
To provide support for hierarchical Azure IoT Edge scenarios, we started working on a connected registry implementation that will allow the Azure container registry functionality to be extended to on-premises environments. For those of you who are not familiar with what a hierarchical IoT Edge scenario is, take a look at the Purdue network model used in the ISA 95 and ISA 99 standards – TL;DR: it is the network architecture that allows segregation of OT and IT traffic in manufacturing networks. While the Azure IoT team has provided a sample of a hierarchical IoT Edge environment, I wanted to reproduce it using physical ARM-based devices.

The problem with configuring a hierarchy of IoT Edge devices at home is that home networking devices do not allow the advanced network configurations that you would normally expect from an enterprise-grade switch or router. While my Eero is good at its mesh WiFi capabilities, it is a very poor implementation of a switch and doesn’t allow the creation of multiple virtual networks or WiFi SSIDs. The routing capabilities are also quite limited, which, to be honest, impacts the ability to create secure IoT networks at home. But I digress… To implement my configuration, I gathered a bunch of Raspberry Pi 4s, Pi Zeros, and an Nvidia Jetson Nano and got to work.

Here is the big picture of what we are implementing:

Hierarchy of IoT Edge devices - Purdue Networks


In this post, I will walk through the configuration of the IT Proxy in the IT DMZ layer shown in the picture above. For that, I decided to use a Raspberry Pi 4 device and configure it with the Squid proxy, similar to the Azure IoT sample linked above. Before that though, let’s look at the device configuration.

Configuring the Raspberry Pi Network Interfaces

The Raspberry Pi 4 has two network interfaces – a WiFi one and an Ethernet one. The idea here is to configure the WiFi interface to connect to my home network using the home network IP range 192.168.0.0/24, wire the Ethernet interface to a simple switch, and assign it a static IP address from the 10.16.8.0/24 network. Routing between the two interfaces should also be established so traffic from the 10.16.8.0/24 network can flow to the 192.168.0.0/24 network. Pulling some details from the Setting up a Raspberry Pi as a routed wireless access point article on the Raspberry Pi’s official website, I ended up with the following configuration.

Warning: Be careful here and don’t copy the commands directly from the Pi’s article! We are routing in the reverse direction – from the Ethernet interface to the WiFi interface.

  • Install the netfilter-persistent and the iptables-persistent plugin to be able to persist the routing rules between reboots:
    sudo DEBIAN_FRONTEND=noninteractive apt install -y netfilter-persistent iptables-persistent
  • Configure a static IP address for the Ethernet interface. To follow the Azure IoT sample, I chose 10.16.8.4. Edit the DHCP client configuration file with sudo vi /etc/dhcpcd.conf and add the following at the end:
    interface eth0
    static ip_address=10.16.8.4/24
  • Enable routing by creating a new file with sudo vi /etc/sysctl.d/routed-proxy.conf and adding the following to it:
    # Enable IPv4 routing
    net.ipv4.ip_forward=1
  • Next, create a firewall rule with:
    sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
  • Last, save the configuration and reboot the device:
    sudo netfilter-persistent save
    sudo systemctl reboot
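After the reboot, you can quickly confirm that IP forwarding and the NAT rule survived. This is just a sanity check and assumes the default eth0/wlan0 interface names used above:

sysctl net.ipv4.ip_forward
sudo iptables -t nat -S POSTROUTING

The first command should print net.ipv4.ip_forward = 1, and the second should list the MASQUERADE rule for wlan0.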

Checking the configuration with ifconfig after reboot should give you something similar to:

pi@raspberrypi:~ $ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.16.8.4  netmask 255.255.255.0  broadcast 10.16.8.255
        inet6 fe80::6816:2aa8:ab8d:9f93  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:77:fb:99  txqueuelen 1000  (Ethernet)
        RX packets 5565  bytes 1359978 (1.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 131  bytes 14399 (14.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 22  bytes 1848 (1.8 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 22  bytes 1848 (1.8 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

wlan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.101  netmask 255.255.0.0  broadcast 192.168.255.255
        inet6 fe80::6788:d8e2:4e8a:558  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:32:77:fb:9a  txqueuelen 1000  (Ethernet)
        RX packets 7619  bytes 1671937 (1.5 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1494  bytes 346279 (338.1 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

To test the connectivity through the newly configured router, I had to set up my Mac with a wired connection using a USB Ethernet adapter. Here is the configuration:

traceroute yields satisfactory results showing that the traffic goes through the Raspberry Pi:

toddysm@MacBook-Pro ~ % traceroute 192.168.0.101
traceroute to 192.168.0.101 (192.168.0.101), 64 hops max, 52 byte packets
 1  192.168.0.101 (192.168.0.101)  9.146 ms  7.669 ms  10.125 ms
toddysm@MacBook-Pro ~ % traceroute 172.217.3.164
traceroute to 172.217.3.164 (172.217.3.164), 64 hops max, 52 byte packets
 1  10.16.8.4 (10.16.8.4)  2.555 ms  0.661 ms  0.377 ms
 2  * *^C
toddysm@MacBook-Pro ~ %

Installing and Configuring Squid Proxy

Installing Squid proxy is as trivial as typing the following command:

sudo apt install squid

However, there are two configuration changes you need to make so that the proxy can be used from machines on the local network. Both settings are in the Squid configuration file /etc/squid/squid.conf:

  • First, the port configuration for the proxy is set to bind to any IP address, but when the proxy starts, it binds to the IPv6 addresses instead of the IPv4 ones. You need to change the following line:
    http_port 3128

    and add the IP address of the Ethernet interface like this:

    http_port 10.16.8.4:3128
  • Second, access to the proxy is enabled only from localhost. Uncomment the following line to enable access from the localnet:
    http_access allow localnet

    Also, make sure that the localnet is defined as:

    acl localnet src 0.0.0.1-0.255.255.255  # RFC 1122 "this" network (LAN)
    acl localnet src 10.0.0.0/8             # RFC 1918 local private network (LAN)
    acl localnet src 100.64.0.0/10          # RFC 6598 shared address space (CGN)
    acl localnet src 169.254.0.0/16         # RFC 3927 link-local (directly plugged) machines
    acl localnet src 172.16.0.0/12          # RFC 1918 local private network (LAN)
    acl localnet src 192.168.0.0/16         # RFC 1918 local private network (LAN)
    acl localnet src fc00::/7               # RFC 4193 local private network range
    acl localnet src fe80::/10              # RFC 4291 link-local (directly plugged) machines

    The latter should already be present in the default configuration file.
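Before testing from another machine, it is a good idea to validate the edited configuration and restart the service. This is a quick sanity check; squid -k parse only verifies the syntax of /etc/squid/squid.conf:

sudo squid -k parse
sudo systemctl restart squid
sudo systemctl status squid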

To make sure that the proxy is properly configured, I changed the networking configuration in my Firefox browser on my Mac as follows:

I also turned off my Mac’s WiFi to make sure that the traffic goes through the wired interface and uses the proxy. The test was successful, and I now have the top layer of the hierarchical IoT Edge network configured.
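If you prefer the command line over a browser, you can run an equivalent check from the wired machine by sending a request through the proxy with curl. This is just a sketch and assumes the proxy address 10.16.8.4:3128 configured above:

curl -x http://10.16.8.4:3128 -I https://www.microsoft.com

A successful response confirms that the traffic is flowing through Squid.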

In the next post, I will go over configuring the L5 of the Purdue network architecture and installing Azure IoT Edge runtime on it.

With the recent Solorigate incident, a lot of emphasis is placed on determining the origin of the software running in an enterprise. For Docker container images, this means embedding in the image a reference to the Dockerfile the image was built from. However, tracking down the origin of software is not so trivial to do. For closed-source software, we blindly trust the vendors, and if we are lucky enough, we may get a signed piece of code. For open-source software, we rarely check the SHA signature and never even think of verifying what source code a binary was produced from. In talks with customers, I quite often hear them asking how they can verify what sources a container image is built from. They want to attribute each image with metadata that links to the Dockerfile used to build the image as well as the Git commit and the developer who triggered the build.

There are many articles that discuss this problem. Here are two recent examples. Richard Lander from the Microsoft .NET team writes in his blog post Staying safe with .NET containers about the pedigree and provenance of the software we run and how to think about it. Josh Hendrick in his post Embedding source code version information in Docker images offers one solution to the problem.

Josh Hendrick’s proposal is in the direction I would go, but one problem I have with it is that it requires special handling in the application that runs in the container to obtain this information. I would prefer to have this information readily available without the need to run the container image. Docker images and the Open Container Initiative already have specified ways to do that without adding special files to your image. In this post, I will outline another way you can embed this information into your images and easily retrieve it without any changes to your application.

Using Docker Image Labels

The Docker image spec already has built-in functionality to add labels to the image. Labels are intended to be set during build time. They also show up when inspecting the image using docker image inspect, which makes them the right choice for specifying the Dockerfile and the other build origin details. One more argument that makes them the right choice for this information is that the labels are built into the image as layers and are thus immutable. If you change a label in an image, the resulting image SHA will change.

To demonstrate how labels can be used to embed the Dockerfile and other origin information into the Docker image, I have published a dynamic labels sample on GitHub. The sample uses a base Python image and implements a simple functionality to print the container’s environment variables. Let’s walk through it step by step.

The Dockerfile is quite simple.

FROM python:slim
ARG IMAGE_COMMITTER
ARG IMAGE_DOCKERFILE
ARG IMAGE_COMMIT_SHA
LABEL "build.user"=${IMAGE_COMMITTER}
LABEL "build.sha"=${IMAGE_COMMIT_SHA}
LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
ADD ./samples/dynamic-labels/source /
CMD ["python", "/show_environment.py"]

Lines 2-4 define the build arguments that need to be set during the build of the image. Lines 5-7 set the three labels build.user, build.sha, and build.dockerfile that we want to embed in the image. build.dockerfile is the URL to the Dockerfile in the GitHub repository, while build.sha is the Git commit that triggered the build. If you build the image locally with some dummy build arguments, you will see that new layers are created for each of the lines 5-7.

toddysm@MacBook-Pro ~ % docker build -t test --build-arg IMAGE_COMMITTER=toddysm --build-arg IMAGE_DOCKERFILE=https://test.com --build-arg IMAGE_COMMIT_SHA=12345 -f ./samples/dynamic-labels/Dockerfile .
Sending build context to Docker daemon  376.3kB
Step 1/9 : FROM python:slim
 ---> 8c84baace4b3
Step 2/9 : ARG IMAGE_COMMITTER
 ---> Running in 71ad05f20d20
Removing intermediate container 71ad05f20d20
 ---> fe56c62b9903
Step 3/9 : ARG IMAGE_DOCKERFILE
 ---> Running in fe468c44e9fc
Removing intermediate container fe468c44e9fc
 ---> b776dca57bd7
Step 4/9 : ARG IMAGE_COMMIT_SHA
 ---> Running in 849a82225c31
Removing intermediate container 849a82225c31
 ---> 3a4c6c23a699
Step 5/9 : LABEL "build.user"=${IMAGE_COMMITTER}
 ---> Running in fd4bfb8d5b5b
Removing intermediate container fd4bfb8d5b5b
 ---> 2e9be17c48ff
Step 6/9 : LABEL "build.sha"=${IMAGE_COMMIT_SHA}
 ---> Running in 892323d73495
Removing intermediate container 892323d73495
 ---> b7bc6559629d
Step 7/9 : LABEL "build.dockerfile"=${IMAGE_DOCKERFILE}
 ---> Running in 98687b8dd9fb
Removing intermediate container 98687b8dd9fb
 ---> 35e97d273cbc
Step 8/9 : ADD ./samples/dynamic-labels/source /
 ---> 9e71859892b1
Step 9/9 : CMD ["python", "/show_environment.py"]
 ---> Running in 366b1b6c3bea
Removing intermediate container 366b1b6c3bea
 ---> e7cb39a21c2a
Successfully built e7cb39a21c2a
Successfully tagged test:latest

You can inspect the image and see the labels by issuing the command docker image inspect --format='{{json .Config.Labels}}' <imagename>.

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' test | jq
{
  "build.dockerfile":"https://test.com",
  "build.sha":"12345",
  "build.user":"toddysm"
}

Now, let’s automate the process with the help of GitHub Actions. I have created one GitHub Action to build and push the image to DockerHub and another to build and push to Azure Container Registry (ACR). Both actions are similar in the steps they use. The first two steps are the same for both actions. They will build the URL to the Dockerfile using the corresponding GitHub Actions variables:

- name: 'Set environment variable for Dockerfile URL for push'
  if: ${{ github.event_name == 'push' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV

- name: 'Set environment variable for Dockerfile URL for pull request'
  if: ${{ github.event_name == 'pull_request' }}
  run: echo "DOCKERFILE_URL=${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/blob/${GITHUB_BASE_REF#refs/*/}/samples/dynamic-labels/Dockerfile" >> $GITHUB_ENV

Then, there are specific steps to sign in to DockerHub or Azure. After that come the build steps where the labels are set. Here, for example, is the build step that uses buildx and automatically pushes the image to DockerHub:

- name: Build and push
  id: docker_build
  uses: docker/build-push-action@v2
  with:
    context: ./
    file: ./samples/dynamic-labels/Dockerfile
    push: true
    tags: ${{ secrets.DOCKER_HUB_REPONAME }}:build-${{ github.run_number }}
    build-args: |
      IMAGE_COMMITTER=${{ github.actor }}
      IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }}
      IMAGE_COMMIT_SHA=${{ github.sha }}

The build step for building the image and pushing to Azure Container Registry uses the traditional docker build approach:

- name: Build and push
  id: docker_build
  uses: azure/docker-login@v1
  with:
    login-server: ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}
    username: ${{ secrets.ACR_REGISTRY_USERNAME }}
    password: ${{ secrets.ACR_REGISTRY_PASSWORD }}
- run: |
    docker build -f ./samples/dynamic-labels/Dockerfile -t ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }} --build-arg IMAGE_COMMITTER=${{ github.actor }} --build-arg IMAGE_DOCKERFILE=${{ env.DOCKERFILE_URL }} --build-arg IMAGE_COMMIT_SHA=${{ github.sha }} .
    docker push ${{ secrets.ACR_REGISTRY_LOGIN_SERVER }}/${{ secrets.ACR_REPOSITORY_NAME }}:build-${{ github.run_number }}

After the actions complete, the images are available in DockerHub and Azure Container Registry. Here is how the image looks in DockerHub:

Docker container image with labels

If you scroll down a little, you will see the labels that appear in the list of layers:

The URL points you to the Dockerfile that was used to create the image, while the commit SHA can be used to identify the exact state of the project sources used to build the image. If you pull the image locally, you can also see the labels using the command:

toddysm@MacBook-Pro ~ % docker pull toddysm/tmstests:build-36
build-36: Pulling from toddysm/tmstests
45b42c59be33: Already exists
8cd3485318db: Already exists
2f564129f025: Pull complete
cf1573f5a21e: Pull complete
ceec8aed2dab: Pull complete
78b1088f77a0: Pull complete
Digest: sha256:7862c2a31970916fd50d3ab38de0dad74a180374d41625f014341c90c4b55758
Status: Downloaded newer image for toddysm/tmstests:build-36
docker.io/toddysm/tmstests:build-36
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' toddysm/tmstests:build-36
{
  "build.dockerfile":"https://github.com/CrimsonPinnacle/container-image-inspector/blob/development/samples/dynamic-labels/Dockerfile",
  "build.sha":"e80e6ef86f86a11d6a73aea8d8c41700c4d3d7c5",
  "build.user":"toddysm"
}

To summarize, the benefit of using labels for embedding the Dockerfile and other origin information into the container images is that those are considered immutable layers of the image. Thus, they cannot be changed without changing the image.
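A quick way to convince yourself of this is to build the sample image twice with different build arguments and compare the resulting image IDs. This is just a sketch that reuses the Dockerfile above; the tags and argument values are arbitrary:

docker build -q -t labels-a --build-arg IMAGE_COMMITTER=toddysm --build-arg IMAGE_DOCKERFILE=https://test.com --build-arg IMAGE_COMMIT_SHA=11111 -f ./samples/dynamic-labels/Dockerfile .
docker build -q -t labels-b --build-arg IMAGE_COMMITTER=toddysm --build-arg IMAGE_DOCKERFILE=https://test.com --build-arg IMAGE_COMMIT_SHA=22222 -f ./samples/dynamic-labels/Dockerfile .
docker image inspect --format '{{.Id}}' labels-a labels-b

The two image IDs will differ because the build.sha label differs, even though everything else about the two images is the same.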

Who is Using Docker Image Labels?

Unfortunately, labels are not widely used, if used at all 🙁 Checking several popular images from DockerHub yields the following results:

toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' busybox | jq
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' alpine | jq 
null
toddysm@MacBook-Pro ~ % docker image inspect --format='{{json .Config.Labels}}' ubuntu | jq 
null

Tracking down the sources from which the Alpine image is built would require considerably more effort.

What is Next for Checking Docker Image Origins?

There are a couple of community initiatives that will play a role in determining the origin of container images.

  • Notary V2 will allow images to be signed. Having the origin information embedded into the image and adding an official signature to the image will increase the confidence in the legitimacy of the image.
  • OCI manifest specification allows artifacts (i.e. images) to be annotated with arbitrary metadata. Unfortunately, Docker doesn’t support those yet. Hopefully, in the future, Docker images will add support for arbitrary metadata that can be included in the image manifest.
  • An implementation of metadata service (see metadata service draft from Steve Lasker) as part of the registry will enable additional capabilities to provide origin information for the images.

Summary

While image metadata is great for annotating images with useful information and enabling search and querying capabilities, the metadata is kept outside of the image layers and can mutate over time. Verifying the authenticity of the metadata and keeping a history of the changes will be a harder problem to solve. Docker already provides a way to embed the Dockerfile and other image origin information as immutable layers of the image itself. Using dynamically populated Docker image labels, developers can provide origin information right now and increase the supply chain confidence for their images.

In my previous post, Learn More About Your Home Network with Elastic SIEM – Part 1: Setting Up Elastic SIEM, I explained how you can set up Elastic SIEM on a Raspberry Pi. The next thing you would want to do is to collect the logs from your firewall and analyze them. Before I jump into the technical details, I should warn you that you may… not be able to do the steps below if you rely on consumer products or use the equipment provided by your ISP.

Let me go on a short rant here! Every self-respecting router vendor should allow firewall logs to be sent to an external system. A common approach is to use the syslog protocol to collect the logs. If your router does not have this capability… well, I would suggest you buy a new, more advanced one.

I personally invested in a tiny Netgate SG-1100 box that runs the open-source pfSense router/firewall. You can, of course, install pfSense on your own hardware if you don’t want to buy a new device. pfSense allows you to configure up to three external log servers. Logstash, which we configured in the previous post, can play the role of a syslog server and send the events to Elasticsearch. Here is how simple the configuration of the pfSense log shipping looks:

The IP address 192.168.11.72 is the address of the Raspberry Pi where the ELK SIEM is installed, and 5140 is the port that Logstash uses to listen for incoming events. That is all you need to configure pfSense to send the logs to the ELK SIEM.
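Once Logstash is configured to receive these events (the next step below), you can confirm on the Raspberry Pi that it is actually listening on that port. This is a quick check and assumes the 5140 port used above:

sudo ss -tulnp | grep 5140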

Our next step is to configure Logstash to collect the events from pfSense and feed them into an index in Elasticsearch. The following project from Patrick Jennings will help you with the Logstash configuration. If you follow the instructions, you will see the new index show up in Kibana like this:

The last thing we need to do is to create a dashboard in Kibana to show the data collected from the firewall. Patrick Jennings’ project has pre-configured visualizations and a dashboard for the pfSense data. Unfortunately, when you import those, Kibana warns you that they need to be updated. The reason is that they use the old JSON format, while the latest Kibana versions require all objects to be described in the newline-delimited JSON (NDJSON) format (for more details, visit ndjson.org). The pfSense dashboard and visualizations are available in my GitHub repository for Home SIEM.

Now, keep in mind that the pfSense logs will not feed into the SIEM functionality of the Elastic stack because they are not in the Elastic Common Schema (ECS) format. What we have created is just a dashboard that visualizes the firewall log data. Also, the dashboard and the visualizations are built using the pfSense data. If you use a different router/firewall, you will need to update the configuration to visualize the data, and things may not work out of the box. I would be curious to hear feedback on how other routers can send data to ELK.

In subsequent posts, I will describe how you can use Beats to get data from the machines connected to your local network and how you can dig deeper into the collected data.


Last night I had some free time to play with my network, and I ran tcpdump out of curiosity. For a while, I’ve been interested in analyzing what traffic is going through my home network, and the result of my test pushed me to get to work. I have a bunch of Raspberry Pi devices in my drawers, so the simplest thing I could do was grab one and install Elastic SIEM on it. For those of you who don’t know what SIEM is, it stands for Security Information and Event Management. My hope was that with it, I would be able to get a better understanding of the traffic on my home network.

Installing Elastic SIEM on Raspberry Pi

The first thing I had to do was to install the ELK stack on a Raspberry Pi. There are not too many good articles that explain how to set up Elastic SIEM on your Pi. According to Elastic, Elastic SIEM is generally available in Elastic Stack 7.6. So, installing the Elastic Stack should do the job.

A couple of notes before that:

  1. The first thing to keep in mind is that 8GB is the minimum requirement for the ELK stack. You can get away with a 2GB Pi, but if you want to run the whole stack (Elasticsearch, Logstash, and Kibana) on a single device, make sure that you order a Raspberry Pi 4 Model B Quad Core 64 Bit with 4GB. Even this one is a stretch if you collect a lot of information. A good option would be to split the stack over two devices: one for Elasticsearch and Kibana, and another one for the Logstash service
  2. Elastic has no builds for Raspbian. Hence, in the instructions below, I will use the Debian packages and describe the steps to install those on the Pi. This requires some custom configs and scripts, so be prepared for that. Well, this article describes how to hack the installation, and no warranties are provided 🙂
  3. You will not be able to use the ML functionality of Elasticsearch because it is not supported on the low-powered ARM-based device
  4. The steps below assume version 7.7.0 of the ELK stack. If you are installing a different version, make sure that you replace it accordingly in the commands below
  5. Last but not least (in the spirit of no warranties), Elasticsearch has a dependency on libc6 that will be ignored and will break future updates. You have to deal with this at your own risk

Here are the steps to follow.

Installing Elasticsearch on Raspberry Pi

  1. Set up your Raspberry Pi first. Here are the steps to set up your Raspberry Pi. The Raspberry Pi Imager makes it even easier to set up the OS. Once again, I would recommend using a Raspberry Pi 4 Model B Quad Core 64 Bit with 4GB and a larger SD card to save logs for longer.
  2. Make sure all packages are up to date, and your Raspbian OS is fully patched.
    sudo apt-get update
    sudo apt-get upgrade
  3. Install the ZIP utility; we will need it later on for the Logstash configuration.
    sudo apt-get install zip
  4. Then, install the Java JRE because the Elastic Stack requires it. You can use the open-source JRE to avoid licensing troubles with Oracle’s.
    sudo apt-get install default-jre
  5. Once this is done, go ahead and download the Debian package for Elasticsearch. Make sure that you download the package with no JDK in it.
    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.7.0-no-jdk-amd64.deb
  6. Once this is done, go ahead and install Elasticsearch using the package manager.
    sudo dpkg -i --force-all --ignore-depends=libc6 elasticsearch-7.7.0-no-jdk-amd64.deb
  7. Next, we need to configure Elasticsearch to use the installed JDK.
    sudo vi /etc/default/elasticsearch

    Set JAVA_HOME to the location of the JDK. Normally this is /usr/lib/jvm/default-java. You can also set JAVA_HOME to the same path in the /etc/environment file, but this is not required.

  8. The last thing you need to do is to disable the ML XPack for Elasticsearch. Change the access mode of the /etc/elasticsearch directory first and edit the Elasticsearch configuration file.
    sudo chmod g+w /etc/elasticsearch
    sudo vi /etc/elasticsearch/elasticsearch.yml

    Change the xpack.ml.enabled setting to false as follows:

    xpack.ml.enabled: false

The above steps install and configure the Elasticsearch service on your Raspberry Pi. You can start the service with:

sudo service elasticsearch start

Or check its status with:

sudo service elasticsearch status
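The first start can take a few minutes on the Pi. Once the service reports that it is running, a simple way to confirm that Elasticsearch answers requests is to query it on the default port (this assumes the default port 9200 and no security enabled):

curl http://localhost:9200

You should get back a small JSON document with the node name and the Elasticsearch version.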

Installing Logstash on Raspberry Pi

Installing Logstash on the Raspberry Pi turned out to be a bit more problematic than Elasticsearch. Again, Elastic doesn’t have a Logstash package that targets the ARM architecture, and you need to tweak the installation manually. StackOverflow posts and GitHub issues were particularly helpful for that – I listed the two I used in the References at the end of this article. Here are the steps:

  1. Download the Logstash Debian package from Elastic’s repository.
    wget https://artifacts.elastic.co/downloads/logstash/logstash-7.7.0.deb
  2. Install the downloaded package using the dpkg package installer.
    sudo dpkg -i logstash-7.7.0.deb
  3. If you run Logstash at this point and encounter an error similar to logstash load error: ffi/ffi -- java.lang.NullPointerException: null, get Alexandre Alouit’s fix from GitHub using:
    wget https://gist.githubusercontent.com/toddysm/6b4b9c63f32a3dfc476a725561fc23af/raw/06a2409df3eba5054d7266a8227b991a87837407/fix.sh
  4. Go to /usr/share/logstash/logstash-core/lib/jars and check the version of the jruby-complete-X.X.X.X.jar JAR
  5. Open the downloaded fix.sh, and replace the version of the jruby-complete-X.X.X.X.jar on line 11 with the one from your distribution. In my case, that was jruby-complete-9.2.11.1.jar
  6. Change the permissions of the downloaded fix.sh script, and run it.
    chmod 755 fix.sh
    sudo ./fix.sh
  7. You can run Logstash with:
    sudo service logstash start

You can check the Logstash logs in /var/log/logstash/logstash-plain.log for information on whether Logstash has started successfully.
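For example, a quick way to spot startup problems is to look at the tail of the log and grep for errors:

sudo tail -n 50 /var/log/logstash/logstash-plain.log
sudo grep -i error /var/log/logstash/logstash-plain.log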

Installing Kibana on Raspberry Pi

Installing Kibana presented different challenges. The problem is that Kibana requires an older version of NodeJS, but the one packaged with the Debian package doesn’t run on Raspbian. What you need to do is replace the NodeJS version after you install the Debian package. Here are the steps:

  1. Download the Kibana Debian package from Elastic’s repository.
    wget https://artifacts.elastic.co/downloads/kibana/kibana-7.7.0-amd64.deb
  2. Install the downloaded package using the dpkg package installer.
    sudo dpkg -i --force-all kibana-7.7.0-amd64.deb
  3. Move the redistributed NodeJS to another folder (or delete it completely) and create a new empty directory node in the Kibana installation directory.
    sudo mv /usr/share/kibana/node /usr/share/kibana/node.OLD
    sudo mkdir /usr/share/kibana/node
  4. Next, download version 10.19.0 of NodeJS. This is the required version of NodeJS for Kibana 7.7.0. If you are installing another version of Kibana, you may want to check what NodeJS version it requires. The best way to do that is to start the Kibana service and it will tell you.
    wget https://nodejs.org/download/release/v10.19.0/node-v10.19.0-linux-armv7l.tar.xz
  5. Unpack the TAR and move the content to the node directory under the Kibana installation directory.
    sudo tar -xJvf node-v10.19.0-linux-armv7l.tar.xz
    sudo mv ./node-v10.19.0-linux-armv7l/* /usr/share/kibana/node
  6. You may also want to create symlinks for the NodeJS executable and its tools.
    sudo ln -s /usr/share/kibana/node/bin/node /usr/bin/node
    sudo ln -s /usr/share/kibana/node/bin/npm /usr/bin/npm
    sudo ln -s /usr/share/kibana/node/bin/npx /usr/bin/npx
  7. Configure Kibana to accept requests on any IP address on the device.
    sudo vi /etc/kibana/kibana.yml

    Set the server.host setting to 0.0.0.0 like this:

    server.host: "0.0.0.0"
  8. You can run Kibana with:
    sudo service kibana start
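Kibana also takes a while to start on the Pi. Once it is up, you can verify that it responds (assuming the default port 5601) with:

curl -s http://localhost:5601/api/status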

Conclusion

Although not officially supported, you can run the complete ELK stack on a Raspberry Pi 4 device. It is not the most trivial installation, but it is not so hard either. In the following posts, I will explain how you can use Elastic SIEM to monitor the traffic on your network.

References

Here are some additional links that you may find useful:

For a while, I’ve been planning to build a cybersecurity research environment in the cloud that I can use to experiment with and research malicious cyber activities. Well, yesterday I received the following message on my cell phone:

Hello mate, your FEDEX package with tracking code GB-6412-GH83 is waiting for you to set delivery preferences: <url_goes_here>

My curiosity to follow the link was so tempting that I said: “Now is the time to get this sandbox working!” I will write about the scam above in another blog post, but in this one, I would like to explain what I needed in the cloud and how I set it up.

Cybersecurity Research Needs

What do I (think I) need from this sandbox environment? I know that my requirements will change over time as I get deeper into the various scenarios, but for now, here is what I wanted to have:

  • First, and foremost, a dedicated network where I can click on malicious links and browse dark web sites without the fear that my laptop or local network will get infected. I also need to have the ability to split the network into subnets for different purposes
  • Next, I needed pre-built VM images for various purposes. Here some examples:
    • A Windows client machine to act as an unsuspecting user. Most probably, this VM will need to have Microsoft Office and other software like Acrobat Reader installed. Other, more advanced software that will help track URLs and monitor processes may also be required on this machine. I will go deeper into this in a separate post
    • Linux machine with networking tools that will allow me to better understand and monitor network traffic. Kali Linux may be the right choice, but Ubuntu and CentOS may also be helpful as different agents
    • I may also need some Windows Server and Linux server images to simulate enterprise environments
  • Very important for me was to be able to create the VM I need quickly, do the work I need, and tear it down after that. Automating the process of VM creation and set up was high up on the list
  • Also, I wanted to make sure that if I forget the VM running, it will be shut down automatically to save money

With those basic requirements, I got to work setting up the cloud environment. For my experiments, I chose Microsoft Azure because I have a good relationship with the Azure team and also some credits that I can use.

Segregating Network Access

As mentioned in the previous section, I need a separate network for the VMs to avoid any possibility of infecting my laptop. Now, the question that comes to mind is: is a single virtual network with several subnets OK or not? Keeping in mind that I can destroy this environment at any time and recreate it (yes, this is the automation part), I decided to go with a single VNet and the following subnets in it:

  • Sandbox subnet to be used to spin up virtual machines that can simulate user behavior. Those will be either single VMs running Windows client and Microsoft Office or set of those if I want to observe the lateral movement within the network. I may also have a Linux machine with Wireshark installed on it to watch network traffic to the web sites that host the malicious pages
  • Honeypot subnet to be used to expose vulnerable devices to the internet. Those may be Windows Server Datacenter VMs or Linux servers with outdated patches and weak or compromised passwords
  • Frontend subnet to be used to host exploit frameworks for red team scenarios. One example can be the Social Engineering Toolkit (SET). Also, simple redirection services or other apps can be placed in this subnet
  • Public subnet that is required in Azure if I need to set up any load balancers for the exploit apps

With all this in mind, I need to make sure that traffic between those subnets is not allowed. Hence, I had to set up a Network Security Group for each subnet to disable inbound and outbound VNet traffic.
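For reference, here is roughly what this looks like with the Azure CLI for one of the subnets. This is a sketch rather than the exact script I used; the resource group, names, and address ranges are placeholders:

az group create --name sec-research-rg --location westus2
az network vnet create --resource-group sec-research-rg --name sec-research-vnet \
  --address-prefixes 10.10.0.0/16 --subnet-name sandbox --subnet-prefixes 10.10.1.0/24

# NSG with rules that block traffic to and from the rest of the VNet
az network nsg create --resource-group sec-research-rg --name sandbox-nsg
az network nsg rule create --resource-group sec-research-rg --nsg-name sandbox-nsg \
  --name DenyVnetInBound --priority 4000 --direction Inbound --access Deny --protocol '*' \
  --source-address-prefixes VirtualNetwork --source-port-ranges '*' \
  --destination-address-prefixes VirtualNetwork --destination-port-ranges '*'
az network nsg rule create --resource-group sec-research-rg --nsg-name sandbox-nsg \
  --name DenyVnetOutBound --priority 4000 --direction Outbound --access Deny --protocol '*' \
  --source-address-prefixes VirtualNetwork --source-port-ranges '*' \
  --destination-address-prefixes VirtualNetwork --destination-port-ranges '*'

# Attach the NSG to the subnet
az network vnet subnet update --resource-group sec-research-rg --vnet-name sec-research-vnet \
  --name sandbox --network-security-group sandbox-nsg

The same pattern repeats for the honeypot, frontend, and public subnets, with the rules adjusted for what each subnet is allowed to reach.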

Cybersecurity Virtual Machine Images

This can be an ever-growing list depending on the scenarios, but below are the first few I would like to start with. I will give them specific names based on the purpose I plan to use them for:

User VM

The purpose of the User VM is to simulate an office worker (think of a receptionist, accountant, or admin). Typically, such machines have a Windows client OS installed as well as other office software like Word, Excel, PowerPoint, Outlook, and Acrobat Reader. Different browsers like Edge, Chrome, and Firefox will also be useful for testing.

At this point, the question that I need to answer is whether I would like to have any other software installed on this machine that will allow me to analyze the memory, reverse engineer binaries, or monitor network traffic on it. I decided to go with a clean user machine to be able to see the exact behavior the user sees and not impact it with the availability of other tools. One other reason I thought this would be a better approach is to avoid malware that checks for the existence of specialized software.

When I started building this VM image, I also had to decide whether I want to have any anti-virus software installed on it. And of course, the question is: “What anti-virus software?” My company is a Sophos Partner, and this would be my obvious choice. I decided to build two VM images: one without anti-virus and one with.

User Analysis VM

This one is the same as the User VM but with added software for malware analysis. The minimum I need installed is:

  • Wireshark for network scanning
  • Cygwin for the necessary Linux tools
  • HEXDump viewer/editor
  • Decompiler

I am pretty sure the list will grow over time, and I will keep tabs on what software I have installed or prefer to use. My intent is to build a Kali Linux type of VM, but for Windows 🙂

Kali VM

This one is the obvious choice. It can be used not only for offensive capabilities but also to cover your identity when accessing malicious sites. It has all the necessary tools for hackers and it is available from the Microsoft Azure Marketplace.

Tor VM

One last VM type I would like to have is a machine on which I can install the Tor browser for private browsing. Similar to the Kali VM, it can be used for hiding the identity of the machine and the user who accesses the malicious sites. It can also be used to gain access to Dark Web sites and forums for cybersecurity research purposes.

Those are the VM images I plan for now. Later on, I may decide on more.

Automating the Security Research VM Creation

Ideally, I would like to be able to create the whole security research environment with a single script. While this is certainly possible, I will not be able to do it in an hour so that I can load the URL above. I will slowly implement the automation based on my needs going forward. However, one thing that I will do immediately is save regular snapshots of my VMs once I install new software. I will also need to version those.

I can use those snapshots to quickly spin up a new VM with the required software if the previous one gets infected (and this will certainly happen). So, for now, my automation will be limited to creating a VM from an Azure Disk snapshot.
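With the Azure CLI, that part is only a couple of commands. This is a sketch with placeholder names; it assumes the snapshot was taken from the OS disk of a Windows VM:

# Create a managed disk from the snapshot
az disk create --resource-group sec-research-rg --name user-vm-osdisk \
  --source user-vm-snapshot-v1

# Create a VM that boots from the restored disk
az vm create --resource-group sec-research-rg --name user-vm-2 \
  --attach-os-disk user-vm-osdisk --os-type windows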

Shutting Down the Security Research VMs

My last requirement was to shut down the security research VMs when I don’t need them. Like every other person in the world, I forget things, and if I leave the VMs running, I can incur expenses that I would not be happy to pay. While I am working on a full-fledged scheduling capability for Azure resources for customers, it is still in the works and I cannot use it yet. Hence, I will leverage the built-in Azure auto-shutdown functionality for Dev/Test workloads and will schedule a daily shutdown at 10:00 PM PST. This way, if I forget to turn off any VM, it will not continue to run all the time.
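The same daily shutdown can also be configured from the Azure CLI. This is a sketch; the VM name and resource group are placeholders, and the time is specified in UTC (10:00 PM PST is 06:00 UTC):

az vm auto-shutdown --resource-group sec-research-rg --name user-vm-2 --time 0600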

With all this, my plan is ready and I can move on to build my security research environment.

In my previous post What to Desire from a Good Image Annotator?, I wrote about the high-level capabilities of an Image Annotation Tool. In this one, I will go over the requirements for the actual image annotations or as you may also know it, tagging. I will use two images as examples. The first one is a scanned receipt. The receipt example can be used to generalize the broader category of scanned documents, whether financial, legal, or others. The second example is of a cityscape. That one can be used to generalize any other image.

Annotating Store Receipt

Let’s start with the receipt. A receipt is a scanned document that contains financial information. Below is just one way that you may want to annotate a receipt.

Annotated receipt

In this example, I have decided to annotate the receipt using the logical grouping of information printed on it. Each region is a rectangle that contains the part of the image that belongs together. Here is the list of regions and their possible annotations:

  • Region ID: 1
    Annotation: Store Logo
    Description: This can be the store logo or just the name printed on the receipt
  • Region ID: 2
    Annotation: Store Details
    Description: This can include information like address, phone number, store number, etc.
  • Region ID: 3
    Annotation: Receipt Metadata
    Description: This can be the date and time, receipt number as well as another receipt specific metadata
  • Region ID: 4
    Annotation: Cashier Details
    Description: This is information about the cashier
  • Region ID: 5
    Annotation: Items
    Description: Those are the purchased items, the quantities and the individual item price
  • Region ID: 6
    Annotation: Receipt Summary
    Description: This is the summary of the information for the purchase like subtotal amount, tax and the total amount
  • Region ID: 7
    Annotation: Customer Information
    Description: This is information about the customer and any loyalty programs she or he participates in
  • Region ID: 8
    Annotation: Merchant Details
    Description: This is additional information about the merchant
  • Region ID: 9
    Annotation: Transaction Type
    Description: This is information about the transaction
  • Region ID: 10
    Annotation: Transaction Details
    Description: This contains information about the transaction with the payment card processor. It can include transaction ID, the card type and number, timestamp, authorization code, etc.
  • Region ID: 11
    Annotation: Transaction Amounts
    Description: This summarizes the amounts for the transaction with the payment card processor
  • Region ID: 12
    Annotation: Transaction Status
    Description: This is the status of the transaction – i.e., Approved or Declined
  • Region ID: 13
    Annotation: Transaction Info
    Description: Those are technical details about the transaction
  • Region ID: 14
    Annotation: Copy Owner
    Description: This is information about the ownership of the receipt. Usually, this is Merchant or Customer
  • Region ID: 15, 16, and 17
    Annotation: Additional Details
    Description: Those can be various things like return policies, disclaimers, advertisement, surveys, notes, and so on. In this example, we have 15 as Return Policy, 16 as Survey and 17 as Additional Notes

When you think about it, the above areas are the ones where your eyes will immediately look to find information. For example, if you want to know what store the receipt is from, you will look directly at the top where the logo should be (Region #1); if you want to know what the total amount is, your eyes will steer towards the receipt summary (Region #6), and so on. The majority of us will follow a similar approach for separating the data because it is something that we do every day in our minds.

A few things to note about the annotations above. First, not every receipt will have all the information from above. Some receipts will have more and some less. Second, annotations evolve. After annotating a certain number of receipts, you start building a pattern and make fewer changes the more you annotate. However, after some time, you may discover that the patterns you developed need to be updated. A straightforward example is coming up with a better name for an annotation. If this happens, you need to go back and change the names. Third, there is no standard way to name those annotations. You and I will undoubtedly have different names for the same thing.

Now, let’s write a few requirements from this receipt example.

  1. The first thing we did is to draw the rectangular regions that we want to annotate. And this is our first and simplest requirement.
  2. The second thing we did is to annotate the rectangular region. When we create the annotation, we should be able to add additional information like a description of the annotation
  3. The third thing we want is to be able to update annotation information retrospectively.

Those are good as a beginning. But to provide more context and back up our requirements, it will be useful to think about how those annotations will be used, i.e., define our use cases. I kind of hinted at those above.

Use Case #1: Logo Recognition

Let’s say you are developing a classification application that is used to recognize the store the receipt is from. You can easily do this by looking at the store logo only and developing a machine learning algorithm that returns the name of the store by recognizing the logo. For this, the only region you will need is Region 1 with the logo. Thus, you can just cut this region from the receipt and train your algorithm only on the logo. That way, you minimize the noise from the rest of the receipt and your algorithm can have better accuracy.

Use Case #2: Receipt Amount Extraction

If your application needs to extract the summary amounts from the receipt, you can concentrate on Region 6. That region contains all the information you will need. A few things you can do with this region are:

  • Binarize the area
  • Straighten the text
  • OCR the text
  • Analyze the extracted text (not an image-related task anymore :))

This use case is applicable to any other area you annotated on the receipt. It doesn’t matter whether you want to obtain the credit card number or the timestamp; the approach will be the same.

Nested Annotations

Now, let’s look at another way to annotate the same receipt.

Annotated Receipt

If your application needs to determine what your shopping habits are based on geography, you will need to extract detailed information about the store location. Thus, you will want to annotate the receipt as above to know which part is the street address, which is the city, etc. But those regions are all nested in Region 2 from our first annotation pass. It will be useful to have both types of annotations and use them for different use cases.

So, an additional requirement for the tool will be the ability to create nested (or overlapping) regions, so the same part of the image can belong to more than one annotation.

That is also very relevant in the next example, where we have areas with buildings but also want to annotate a single building.

Annotating Cityscapes

Annotating landscapes, cityscapes or other images with real objects is very similar to the receipt annotation. However, real objects rarely have regular shapes in pictures. Here is an example from a picture I took in Tokyo some time ago.

Annotated Cityscape

In this example, I have annotated only a few of the objects: two buildings (1 and 2), a crane (3), a soccer field (4), and a tree (5). The requirements for annotating landscapes are not too different from the requirements for annotating documents. There is just one more thing we need to add to the tool to support real-object tagging: the ability to draw free-form (non-rectangular) regions around objects.

There are many use cases that you can develop for real-object recognition, and for that, versatile annotation capabilities will be important in any tool.

Additional Requirements for Annotations

All the requirements that I have listed above are specific to the objects or areas in the pictures. However, we also need the ability to add meta information to the whole picture. Well, you may think we already have a way to do that! We can use the EXIF data. The EXIF data is helpful, and it is automatically populated by the camera or the editing tool. However, it has limited capabilities for free-form meta information because its fields are standardized.

For example, if you want to capture information about who annotated the image last and at what time, you cannot use the EXIF fields for that. You can repurpose some EXIF fields, but you will lose the original information. What we need is a simple way to create key-value metadata for the image. Of course, having the ability to see the EXIF information would be a helpful feature, although maybe not a high-priority one.

With all that, I believe we have enough requirements to start working on the tool design. If you are curious to follow the development or participate in it, you can head over to the Image Annotator GitHub project. The next thing we need to do is some design work. That includes UI design, back-end design, and the data model.

In my last post, I demonstrated how easy it is to create fake accounts on the major social networks. Now, let’s take a look at what we can do with those fake social network accounts. Also, let’s not forget that my goals here are to penetrate a specific individual’s home network (in this case, my own family’s home network) and gain access to files and passwords. To do that, I need to find as much information about this individual as possible, so I can discover a weak link to exploit.

1. Search Engines

The first and most obvious way to find anything on the web is, of course, Google. Have you ever Googled yourself? Searching for “Toddy Mladenov” on Google yields about 4,000 results (at the time of this writing). The first page gives me quite a few useful links:

  • My LinkedIn page
  • Some pages on sys-con.com (must be important if it is so highly ranked by Google). Will check it later
  • Link to my blog
  • Link to my Quora profile
  • Link to a book on Amazon
  • Two different Facebook links (interesting…)
  • Link to a site that can show me my email and phone number (wow! what more do I need 😀)
  • And a link to a site that can show me more about my business profile

The last two links look very intriguing. Whether we want it or not, there are thousands of companies that scrape the Web for information and sell our data for money. Of course, those two links point to such sites. Following those two links I can learn quite a lot about myself:

  • I have (at least) two emails – one with my own domain toddysm.com and one with Gmail. I can get two emails if I register for free on one of the sites or start a free trial on the other (how convenient 😀)
  • I am the CTO and co-founder of Agitare Technologies, Inc.
  • I get immediate access to my company’s location and phone number (for free and without any registrations)
  • Also, they give me more hints about emails t***@***.com (I can get the full email if I decide to join the free trial)
  • Clicking on the link to the company information reveals additional information about the company – revenue, website, as well as names of other employees in the company

OK, enough with those two sites. In just 5 minutes, I got enough to start forming a plan. But those links made me curious. What will happen if I search for “Toddy Mladenov’s email address” and “Toddy Mladenov’s phone number” on Google? That doesn’t give me too much but points me to a few sites that do specialized searches like people search and reverse phone lookup. Following some links, I land on truepeoplesearch.com, from which I can get my work email address. I also learn that my real name is Metodi Mladenov. Another site that I learned about recently is pipl.com. A search on it, not surprisingly, returns my home address, my username (not sure for what), as well as the names of relatives and/or friends.

Doing an image search for “Toddy Mladenov” pulls out my picture as well as an old picture of my family.

Let’s do two more simple searches and be done with this. “Metodi Mladenov email address” and “Metodi Mladenov phone number” give my email addresses with Bellevue College and North Seattle College (NSC). I am sure that the NSC email address belongs to the target person because the picture matches. For the Bellevue College one, I can only assume.

Here is useful information I gathered so far with those searches:

  • Legal name
  • 3 email addresses (all work-related, 1 not confirmed)
  • 2 confirmed and 1 potential employer
  • Home address
  • Company name
  • Company address
  • Company phone number
  • Titles (at different employers)
  • Personal web site
  • Social media profiles (Facebook and LinkedIn so far)

Also, I have hints for two personal emails – one on toddysm.com and one on Gmail.

2. Social Media Sites

As you can assume, social media sites are full of information about you. If you didn’t realize it by now, their business model is to sell that information (no matter what Mark Zuckerberg tries to convince you of). Now, let’s see what my fake accounts will offer.

Starting with Facebook, I signed in with my fake female account (SMJ) and typed the URL to my personal Facebook page that Google returned. What I saw didn’t make me happy at all. Despite my ongoing efforts to lock down my privacy settings on Facebook, it turns out that anyone with a Facebook account can still see quite a lot about me. Mostly it was data from 2016 and before, but enough to learn about my family and some friends. Using the fake account, I was able to see pictures of my family and the conversations that I had with friends around the globe. Of course, I was not able to see the list of my friends, but I was still able to click on some names from the conversations and photos and get to their profiles (that is smart, Facebook!). Not only that, but Facebook was allowing the fake user to share these posts with other people, i.e., on her timeline (yet another smart move from the Facebook developers).

I was easily able to get to my wife’s profile (same last name, calling me “dear” in posts – easy to figure out). From the pictures I was able to see on my and my wife’s profiles, I was also able to deduce that we have three daughters and that one of them is on Facebook, although under a different last name. Unlike her parents (and like many other millennials), my daughter didn’t protect her list of friends – I was able to see all 43 of them without being her “friend.”

So far Facebook gave me valuable information about my target:

  • Family pictures
  • Wife’s name
  • Teenage daughter’s name
  • A couple of close friends

I also learned that my target and his wife are somewhat privacy conscious, but thanks to the sloppy privacy practices that Facebook follows, I am still able to gather important intelligence from it.

As we all know, teens are not so much on Facebook but on Instagram (like there is a big difference). Facebook though was nice enough to give me an entry point to continue my research. On to Instagram. My wife’s profile is private; hence, I was not able to gather any more information about her from Instagram. The case was different with my daughter – with the fake account I was able to see all her followers and the people she follows. Crawling hundreds of followers manually is not something I want to do, so I decided to leave this research for later. I am pretty sure I will be able to get something more out of her Instagram account. For now, I will just note both Instagram accounts and move on.

Next is LinkedIn. I signed in with the male (RJD) fake account that I created. The first interesting thing that I noticed is that I already had eight people accepting my invite (yes, the one I sent by mistake). So, eight people blindly decided to connect with a fake account without a picture, job history details, or any other detailed information whatsoever! For almost all of them I was able to see a personal email address, for some even a phone number, and occasionally an address and birth date – pretty useful information, isn’t it? That made me think how much data I could collect if I built a complete profile.

Getting back on track! Toddy Mladenov’s LinkedIn page gives me the whole work history, associations, and (oops) his birth date. The information confirmed that the emails found previously are most probably valid and can easily be used for a phishing campaign (you see where I am going with this 😀). Now, let’s look at what my wife’s page can give me. I can say that I am really proud of her because her contact info was completely locked down (no email, birth date, or any other sensitive information). Of course, her full resume and work history were freely available. I didn’t expect to see my daughter on LinkedIn, but it turned out she already had a profile there – searching for her name yielded only two results in the US and only one in the area where we live. Doing some cross-checking with the information on Instagram and Facebook made me confident that I had found her LinkedIn profile.

3. Government Websites

Government sites expose a lot of information about businesses and their owners. The Secretary of State site allows you to search for any company name and find not only details like legal name, address, and owners but also the owners’ personal addresses, emails, and sometimes phone numbers. The Department of Revenue website also lets you search for a company by address. Between those two, you can build a pretty good business profile.

Knowing the person’s physical address, the county’s site will give you a lot of details about their property, neighborhood, and neighbors (names are included). Google Maps and Zillow are two more tools that you can use to build a complete physical map of the property – something that you can leverage if you plan to penetrate the WiFi network (I will come back to this in one of my later posts where we will look at how to break into the WiFi network).

Closing Thoughts

I have always assumed that there would be a lot of information about me on the Internet, but until now I didn’t realize the extent to which this information can be used for malicious purposes. The other thing I realized is that although I tried to follow the guidelines from Facebook and other social media sites to “protect” my information, it is not protected enough, which gives me no confidence that those companies will ever take care of our privacy (no matter what they say). Like many other people nowadays, I post things on the Internet without realizing the consequences of my actions.

A couple of notes for myself:

  • I need to figure out a way to remove from the web some of the personal information like emails and phone numbers. Not an easy task, especially in the US
  • Facebook, as always, fails the privacy check. I have to find a way to hide those older posts from non-friends
  • I also need to educate my family on locking down their social media profiles (well, now that I have that information, I don’t want other people to use it)
  • Hide the birthday from my LinkedIn page

In my next post, I will start preparing a phishing campaign with the goal of discovering my home network’s router IP. I will also try to post a few suggestions on how you can restrict access to the information in your social media profiles – something I believe everyone should do.

Disclaimer: This post describes techniques for online reconnaissance and cyber attacks. The activities described in this post are performed with the consent of the impacted people or entities and are intended to improve the security of those people or entities as well as to bring awareness to the general public of the online threats existing today. Using the above steps without consent from the targeted parties is against the law. The reader of this post is fully responsible for any damage or legal actions against her or him, which may result from actions he or she has taken to perform malicious online activities against other people or entities.


In my previous post, How Can I Successfully Hack My Home Network?, I set the stage for my “Hacking My Home” activities. A possible scenario here is that I am given the task to penetrate a high-profile target’s (i.e., my own 😀) home network and collect as much information as possible to use for malicious purposes. Before I start doing anything though, I need to prepare.

Let’s think about what I will need for my activities. A laptop, an Internet connection… Well, (almost) everybody has those nowadays. Let’s look at some not-so-obvious things.

First, I will need some fake online identities that I can use to send emails or connect with people close to my target. For this, of course, I will need one or more email addresses. In the past, creating a Gmail or Hotmail address was trivial. Now though, even Yahoo requires a phone number for verification. That makes things a bit more complicated, but not impossible. A quick search on Google or Bing will give you a good list of burner phones that you can buy for as low as $10 at retailers like Fry’s or Walmart. An even simpler option is the Burner app – for just $5/month you get unlimited calls, texts, and monthly number swaps. There are many uses for it:

  • Use it for general privacy – never give out your real phone number on the web or anywhere else (this alone is worth the money)
  • Use it to register new email addresses or social media accounts
  • Make phone calls without exposing your real number via Caller ID (we all get those calls from marketers and scammers)
  • Most importantly, send text messages without exposing your real number

If you only need a masked phone number, you can also use the Blur app from Abine. There are many other options if you search the web, but be careful about their reputation. I decided to try both Burner and Blur in my next step.

Armed with burner phone numbers, the next step is to create email addresses that I can use for various registrations. Yahoo emails are frowned upon, so I decided to go with Gmail. I will need to create a complete online personality that I can use to send emails, engage in conversations, etc. Without knowing what will work, having both a male and a female personality will be helpful.

The first step is to choose some trustworthy names. Although fancy, names like North West and Reality Winner 😀 will look more suspicious than traditional English names like John or Mary. Not surprisingly, a simple Google search gives you thousands of pages with names that inspire trustworthiness. So, I chose some of those to name my two personalities. Because I don’t want the names to be known, from now on I will use just the initials to differentiate between the male (RJD) and the female (SMJ). Now, let’s go and create some e-mail addresses and social media profiles.

One complication while creating Gmail accounts is the verification process. Google requires a phone number for account creation, and after typing in any of the burner phone numbers, I got an error message saying the number could not be used for verification (kind of a bummer, I thought). Interestingly though, after the initial verification (for which I used my real cell phone number), I was able to change the phone number AND verify it using the burner ones.

Next is Facebook. Despite all the effort Facebook is putting into preventing fake accounts, creating a Facebook account doesn’t even require text verification. With just an email, you can easily go and create one. Of course, if you want to change your username, you will be required to verify your account via mobile phone. The burner numbers work like a charm with Facebook.

After Facebook, creating an Instagram account is a snap. Picking a few celebrities to follow is, of course, essential to building up the profile. Be selective when you pick people to follow from the suggestion list – you don’t want to look like a bot by following every possible person. Later on, after I gather more intelligence on my targets’ interests, I will tweak the accounts I follow to better match those interests, but for now, this will be enough to start building a profile and getting access to information that is behind the Facebook and Instagram walls.

The Twitter registration process is as simple as Instagram’s. You don’t need a mobile phone for Twitter – e-mail is enough for the confirmation code. Once again, choose a few select accounts to follow to start building your online profile and reputation.

Creating a LinkedIn account is a bit more elaborate. Similar to Gmail, LinkedIn requires a real phone number for verification – both burner numbers failed here. LinkedIn doesn’t immediately ask you to add your phone number for notifications, but its account creation wizard does require at least one job title and company. Not unexpectedly, this stumped me a bit because I had to think about which title and company would be the best to choose.

LinkedIn has one of the creepiest experiences for account creation. The scariest part for me was its ability to pull all my contacts from… well, somewhere (I will need to dig into that later). My recommendation is to do the registration on a burner laptop that has none of your personal information on it. The next creepy thing was the recommendations – I got some high-profile people recommended to me and, of course, I clicked on a bunch of them before realizing that this sends invitations to those people. This could have blown my cover because it recommended mostly people working at the company I chose as my employer. Surprisingly though, just a minute later one of the requests was accepted, so I decided that maybe I shouldn’t worry so much. One important thing you need to do on your LinkedIn profile is to change the Profile viewing options in your Privacy settings to Private mode. If you don’t, people whose profiles you look up will get notified who you are, and you want to avoid that.

Although some people may accuse me of profiling, I have to say that I chose to use the female account for Facebook, Instagram, and Twitter and the male one for LinkedIn initially. The reason is that I didn’t want to spend time doing all those registrations before I knew more about the targets.

Now that I have all those fake accounts created, I can move to the next step: doing some online research about my target. For that, I don’t need fully established profiles, but I will try to generate some online activity in the meantime to make those accounts more credible.

In my next post, I will demonstrate how easy it is to gather basic social engineering intelligence using the above accounts as well as free websites. Stay tuned…

Disclaimer: This post describes techniques for online reconnaissance and cyber attacks. The activities described in this post are performed with the consent of the impacted people or entities and are intended to improve the security of those people or entities as well as to bring awareness to the general public of the online threats existing today. Using the above steps without consent from the targeted parties is against the law. The reader of this post is fully responsible for any damage or legal actions against her or him, which may result from actions he or she has taken to perform malicious online activities against other people or entities.

This morning I was looking at our company’s e-mail gateway and cleaning up some of the quarantined messages when I was reminded that while my company’s digital infrastructure may be well protected with firewalls and e-mail gateways, my home network can be wide open and vulnerable to attacks. Like everyone else, I try not to spend too much time configuring my home network and rely on my ISP to “take care of it.” Well, this is a silly approach, because ISPs don’t care about our cybersecurity. It took me hours on the phone, two bricked routers, and a house visit (for which I paid, of course) to convince mine to replace their outdated router with a simpler gateway device so that I could use my own Eero as the main router and Wi-Fi access point. However, replacing an old router is not something that will solve my cybersecurity issues. Hence, I decided to stop procrastinating and take the first steps to execute on my idea to do some penetration testing on my home network. You will be able to find all my steps and (successful and failed) attempts in the Hack My Home series of posts, so let’s get started.

The first thing I need to do is decide what my goals are. The best way to do that is to put myself in the hacker’s shoes. If I were a black-hat hacker who wants to attack an Ordinary Joe, what would I like to get from him? Here are a few things that come to mind:

  • Like many of you, I have a file server or NAS device at home where my family stores a lot of information: pictures, tax returns, scanned personal documents, and whatnot. Having access to this information may turn out to be beneficial. Hacker’s goal #1: Get access to the file share!
  • Having access to personal information may be useful, but if I am looking for fast money or a way to do bigger damage, harvesting credentials may turn out to be better. There is a good chance I can find some passwords in a text file on the file share, but because I don’t save mine in plain text, I need to look for other options. Hacker’s goal #2: Steal a password of a family member!

Here is the moment for a disclaimer. Because this is my home, I believe I have full authority to hack into my own devices. If I discover a device vulnerability, I will follow responsible disclosure practices and will need to delay any posts that describe how to break into the device. Regarding the second goal, stealing a password, I have full (verbal) consent from my family to do that. I also have full access to almost all of their passwords, so I don’t consider this an issue. However, if you are planning to follow my steps, please make sure that you get consent from your family – they may not be so receptive to the idea.

Next are some assumptions. The biggest one is to assume no knowledge of my home network. Initially, I thought I should start with a diagram of my network, but that would assume I already know the details. What I need to do instead is get to the details from the outside using public information. If you think about it, the information that hackers can easily (and legally) obtain is the following:

  • Domain name
  • IP address
  • Email address
  • Home address
  • Phone number
  • Social media profiles

This is an excellent set of starting points, isn’t it? Some of those things may be easier to obtain than others. Hence, I will need to do some research online to figure out everything I need. I will walk through each step in separate posts. For now, let’s figure out the ways I can digitally break into my home and define some simple next steps.

If I know the IP address of my router, I may be able to attack my home remotely over the Internet. So, one of my next steps will be to figure out an approach to discover that IP address.

If I know the location of my home, I may try to attack my home Wi-Fi network and break through it. That is a more complicated approach because it requires me to be close to my home and to use some specialized equipment. There may be other wireless devices in my home that I could get to, but those will also require some proximity to the home to exploit.

Of course, for my testing purposes, I would like to explore both approaches, but I need to start with one of them. Because I think the remote exploit has a higher chance of success, I will start there. As a next step, I need to figure out the entry point to my home from the open Internet, i.e., my router’s IP address.

In my next post, I will walk you through my thought process, the steps and the tools I can use to obtain my home’s IP address.


Recently, I started looking for an image annotation tool that we can use to annotate a few thousand images. There are quite a few to choose from, ranging from paid ones like Labelbox and Dataturks to free and open-source ones like coco-annotator, imglab, and labelimg. The paid ones also have a limited free tier that can give you a sense of their capabilities.

Each of those tools can be used to create an image dataset, but without going into detail on each of them, here are the things we were looking for to consider an image annotation tool fit for our needs:

  • First, and foremost – usability. I like a nice UI, but it shouldn’t take too much time and too many clicks to get to the core functionality of the tool – i.e., annotating images. Navigating within the annotator, creating projects, and adding images should be smooth and easy. Users shouldn’t need to be tech-savvy to set up and start using the annotation tool.
  • Second is security. I wouldn’t put this one so high on the list if I didn’t need to deal with sensitive data. Unfortunately, even some of the commercially available annotation tools neglect basic security features like encryption. Annotating healthcare images like x-rays or financial documents on such platforms is out of the question. Detailed privacy and data handling policies are also crucial for handling sensitive images.
  • The third is scalability. The stand-alone annotation tools are limited to the resources of the machine they run on, while the hosted ones lack detailed information about the limits they impose on the various plans. In addition to the number of projects and users, which are most often quoted in the commercial plans, the number of images per dataset or the total amount of available storage would be good to know.
  • Fourth is versatility. By versatility I don’t mean only the ways one annotates images (polygons vs. rectangles, for example) but also export formats like COCO, PASCAL VOC, YOLO, etc., and the ability to choose different storage backends like local files, AWS S3, Azure Storage, or Dropbox (a rough sketch of such a pluggable storage backend follows this list). Like everything else in technology, there is no single standard, and a good image annotation tool should satisfy the needs of different audiences.
  • The fifth is collaboration. An image annotator should allow many (if not thousands of) people to collaborate on a project. Datasets like PASCAL VOC or COCO consist of hundreds of thousands of images and millions of annotations – work that is beyond the scale of a single person or a small team. Enabling lightweight access for external collaborators is crucial to the success of such a tool.
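To illustrate the storage-backend point, here is a minimal Python sketch of what a pluggable storage interface could look like. The class and method names are my own assumptions for illustration, not the API of any of the tools mentioned above; the S3 variant assumes boto3 is installed and AWS credentials are configured.

    from abc import ABC, abstractmethod
    from pathlib import Path

    class ImageStore(ABC):
        """Hypothetical interface the annotation tool would code against."""
        @abstractmethod
        def save(self, name: str, data: bytes) -> str: ...
        @abstractmethod
        def load(self, name: str) -> bytes: ...

    class LocalFileStore(ImageStore):
        """Stores images on the local file system."""
        def __init__(self, root: str):
            self.root = Path(root)
            self.root.mkdir(parents=True, exist_ok=True)

        def save(self, name: str, data: bytes) -> str:
            path = self.root / name
            path.write_bytes(data)
            return str(path)

        def load(self, name: str) -> bytes:
            return (self.root / name).read_bytes()

    class S3Store(ImageStore):
        """Same interface backed by AWS S3 (requires boto3 and credentials)."""
        def __init__(self, bucket: str):
            import boto3
            self.bucket = bucket
            self.s3 = boto3.client("s3")

        def save(self, name: str, data: bytes) -> str:
            self.s3.put_object(Bucket=self.bucket, Key=name, Body=data)
            return f"s3://{self.bucket}/{name}"

        def load(self, name: str) -> bytes:
            return self.s3.get_object(Bucket=self.bucket, Key=name)["Body"].read()

With an interface like this, an Azure Storage or Dropbox backend becomes just another implementation of the same two methods.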

OK! Let’s be a little bit more concrete about what requirements I have for an image annotation tool.

Image Annotation Tools Usability

Before we go into the usability requirements, let’s look at the typical steps a user must go through to start annotating images:

  1. Select the tool or the service
  2. Install the tool or sign up to the service
  3. Create a project or data set
  4. Choose a data storage location (if applicable)
  5. Determine users and user access (if applicable)
  6. Upload image
  7. Annotate image

To select the tool or the service, users need to be clear about what they are getting. Providing a good description of the features the annotation tool offers is crucial for a successful selection. Some developers rely on trials, but I would like to save myself the time spent registering and installing if I know upfront that the tool will not work for me.

If the tool is stand-alone, its installation should not be a hassle. A lot of the open-source tools rely on the tech-savviness of their users, which can be an obstacle. Besides, I am reluctant to install something on my machine if it may turn out to be not what I need. Trials and easy removal instructions are crucial.

One of the biggest usability issues I saw in the tools was the convoluted flow for creating projects and datasets. Either bugs or unclear flows resulted in a lot of frustration when trying out some of the tools. For me it is simple – it should follow the age-old concept of files and folders.

Being clear about where the data is stored and how to access it or get it out is important. For some of the tools I tested, it took a while to understand where the data was; others used proprietary structures and (surprisingly) offered no export capabilities.

Adding users and determining access is always a hassle (and not only in image annotation tools). Still, there should be a clear workflow for doing that, as well as for opening the dataset to the public if needed.

Although I may have an existing dataset, I may want to add new images to it – either one by one or in bulk. This should be one of the most prominent flows in the annotation tool. At any point in the UI, I should be able to easily upload a single file or multiple files.

For me, the annotation UI should mimic the UI of traditional image editing software like Adobe Photoshop: menus at the top, tools on the left, the working area in the middle, and properties on the right. It may be boring and not very modern, but it is familiar and intuitive.

Securing Annotated Images

We deal with scanned financial documents that can contain highly sensitive information like names, addresses, and sometimes credit card details, account numbers, or even social security numbers. Some of our customers would like a tool that allows them to annotate medical images like x-rays – those images can also contain personal information in their metadata (if, for example, the DICOM format is used).
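As a quick illustration of the DICOM point, here is a minimal Python sketch using the pydicom library to inspect and blank out the most obvious patient identifiers before an image is uploaded for annotation. The file path is a placeholder, and real de-identification requires handling far more tags than shown here.

    import pydicom  # pip install pydicom

    # Read a DICOM file and show the most obvious patient identifiers.
    ds = pydicom.dcmread("scan.dcm")  # placeholder path
    print(ds.get("PatientName"), ds.get("PatientID"), ds.get("PatientBirthDate"))

    # Blank out those identifiers before the image leaves the secure environment.
    for tag in ("PatientName", "PatientID", "PatientBirthDate", "PatientAddress"):
        if tag in ds:
            setattr(ds, tag, "")

    ds.save_as("scan_deidentified.dcm")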

Unless the annotation tool is a standalone tool that people install on their local machines, using HTTPS is a no-brainer and the least you can do from a security point of view (surprisingly, some of the SaaS services fell short even here). However, security goes far beyond that. Things that should be added are:

  • Encrypting the storage where the annotated images are kept. Both hosted and self-managed keys should be allowed.
  • Proper authentication mechanisms. Multi-Factor Authentication should be available for higher security.
  • Good Role-Based Access Control (RBAC). For example, some people should only be able to view the annotated images, while others can annotate and edit them.
  • Change logs kept as part of the application. For example, it will be important to know who made a certain annotation and whether it was correct or not. A minimal sketch of such an RBAC and audit model follows this list.
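Here is a minimal Python sketch of how role checks and an audit trail could fit together. The role names, permission table, and data structures are illustrative assumptions, not taken from any specific product.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from enum import Enum

    class Role(Enum):
        VIEWER = "viewer"        # can only view images and annotations
        ANNOTATOR = "annotator"  # can create and edit annotations
        ADMIN = "admin"          # can manage users, projects, and exports

    # Which roles are allowed to perform which actions (assumed action names).
    PERMISSIONS = {
        "view_annotations": {Role.VIEWER, Role.ANNOTATOR, Role.ADMIN},
        "edit_annotations": {Role.ANNOTATOR, Role.ADMIN},
        "manage_users": {Role.ADMIN},
    }

    @dataclass
    class AuditEntry:
        user: str
        action: str   # e.g. "edit_annotations"
        target: str   # e.g. "image_00042/annotation_3"
        timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    audit_log: list = []

    def authorize(user: str, role: Role, action: str, target: str) -> bool:
        """Check the role against the permission table and record the attempt."""
        allowed = role in PERMISSIONS.get(action, set())
        audit_log.append(AuditEntry(user=user, action=action, target=target))
        return allowed

    # Example: an annotator may edit, a viewer is rejected; both attempts are logged.
    print(authorize("alice", Role.ANNOTATOR, "edit_annotations", "image_00042/ann_3"))  # True
    print(authorize("bob", Role.VIEWER, "edit_annotations", "image_00042/ann_3"))       # False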

Scalability for Image Annotators

A good dataset can contain hundreds of thousands of images – the COCO 2017 training set alone has about 118K images. Depending on the quality of the images, the storage needed can vary from tens of gigabytes to hundreds of gigabytes and beyond. The ability to grow the storage is essential to the success of an image annotation tool.
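A quick back-of-the-envelope calculation shows how fast this adds up; the average file sizes below are assumptions for illustration, not measurements.

    # Back-of-the-envelope storage estimate for an image dataset.
    num_images = 118_000  # roughly the size of the COCO 2017 training set

    for label, avg_bytes in [("compressed JPEG (~200 KB)", 200 * 1024),
                             ("high-resolution photo (~2 MB)", 2 * 1024**2),
                             ("medical scan (~50 MB)", 50 * 1024**2)]:
        total_gb = num_images * avg_bytes / 1024**3
        print(f"{label}: ~{total_gb:,.0f} GB")

    # compressed JPEG (~200 KB): ~23 GB
    # high-resolution photo (~2 MB): ~230 GB
    # medical scan (~50 MB): ~5,762 GB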

On the user side, a dataset of a hundred thousand images may require hundreds of people to annotate it. Being able to support a large user base without a huge impact on cost is also important (hence, per-user licensing may not be the best option for such a SaaS offering, because a single user may annotate only 2-3 images from the whole dataset).

The back end and the APIs handling the annotation of the images should also be able to scale with the number of images and users without problems.

Versatile Export Options for Annotated Images

The image annotation tool is rarely tightly coupled with the machine learning system that will use the images. Also, the same annotations can be used by various teams using different systems to create their machine learning models. A clear explanation of the format used to store the annotations is a must-have, and the ability to export the annotations in common formats is essential for the success and usefulness of the tool.

The word “export” here may mean different things. It doesn’t always have to mean downloading the images and annotations in the desired format; it can simply mean saving the annotations in that format.

I would start by defining a versatile format for storing the image annotations and then offer different “export” options, whether as a download or just as a conversion in the original storage location. A rough sketch of what such a neutral format and a COCO-style export could look like is shown below.
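This is only a sketch under my own assumptions: the record fields are hypothetical, and the output is a simplified COCO-style structure rather than a complete, validated COCO file.

    import json
    from dataclasses import dataclass

    @dataclass
    class BoxAnnotation:
        """Neutral, tool-internal bounding-box record (illustrative field names)."""
        image_file: str
        image_width: int
        image_height: int
        label: str
        x: float       # top-left corner
        y: float
        width: float
        height: float

    def to_coco(annotations):
        """Convert neutral records into a simplified COCO-style dictionary."""
        images, categories, coco_annotations = {}, {}, []
        for ann_id, ann in enumerate(annotations, start=1):
            image = images.setdefault(ann.image_file, {
                "id": len(images) + 1, "file_name": ann.image_file,
                "width": ann.image_width, "height": ann.image_height})
            category = categories.setdefault(ann.label, {
                "id": len(categories) + 1, "name": ann.label})
            coco_annotations.append({
                "id": ann_id,
                "image_id": image["id"],
                "category_id": category["id"],
                "bbox": [ann.x, ann.y, ann.width, ann.height],  # COCO uses [x, y, w, h]
                "area": ann.width * ann.height,
                "iscrowd": 0,
            })
        return {
            "images": list(images.values()),
            "categories": list(categories.values()),
            "annotations": coco_annotations,
        }

    # Example: one bounding box on a scanned invoice (made-up values).
    sample = [BoxAnnotation("invoice_001.png", 1240, 1754, "account_number",
                            220.0, 310.0, 400.0, 48.0)]
    print(json.dumps(to_coco(sample), indent=2))

Keeping the internal format neutral means a YOLO or PASCAL VOC export is just another small conversion function over the same records.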

Collaborating While Annotating Images

Having a single person create an image dataset with hundreds of thousands of images is unrealistic. Such a task requires the collaboration of many people, who can be spread around the world. The ability not only to give them access to annotate the images but also to comment on and suggest changes to existing annotations should be high on the priority list.

Annotations, like software, are not free of bugs. Hence, the image annotation tool should allow for collaboration similar to what modern software development tools enable. This may not be a V1 feature, but it should certainly come soon after. A rough sketch of what a review record could look like is shown below.
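For example, a review could be stored as a small record attached to an annotation; the field names and states here are assumptions for illustration only.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone
    from enum import Enum
    from typing import List, Optional

    class ReviewState(Enum):
        OPEN = "open"
        ACCEPTED = "accepted"
        REJECTED = "rejected"

    @dataclass
    class AnnotationReview:
        annotation_id: int
        reviewer: str
        comment: str
        suggested_bbox: Optional[List[float]] = None  # optional replacement box [x, y, w, h]
        state: ReviewState = ReviewState.OPEN
        created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    # A reviewer disputes annotation 42 and suggests a corrected bounding box.
    review = AnnotationReview(
        annotation_id=42,
        reviewer="reviewer_1",
        comment="Label should be 'invoice_total', not 'account_number'.",
        suggested_bbox=[218.0, 305.0, 410.0, 52.0],
    )
    review.state = ReviewState.ACCEPTED  # the original annotator accepts the suggestion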

Now that I have a good idea of what we would like to have in an image annotation tool, it is time to think about how to implement one that incorporates the above-mentioned functionality. In the next post, I will look at what we would like to annotate and how to approach the data model for annotations.