Scraping Images from Google Search Results using Python

This article relies on the code written by Fabian Bosler:

I’ve only modified Bosler’s code to make it a bit easier to pull images for multiple search terms.

The full code can be found in the series’ Github repository:

Magic Symbols

As I’ve mentioned in my previous article, I needed a lot of images of magic symbols for training a deep convolutional generative adversarial network (DCGAN). Luckily, I landed on Bosler’s article early on.

To get my images, I used the Chrome browser, Chromedriver, Selenium, and a Python script to slowly scrape images from Google’s image search. The scraping was throttled to near-human speed, but still automated the collection of a lot of images.

Regarding this process, I’ll echo Bosler: I’m in no way a legal expert. I’m not a lawyer and nothing I state should be taken as legal advice. I’m just some hack on the internet. However, from what I understand, scraping the SERPs (search engine results pages) is not illegal, at least not for personal use. But using Google’s Image search for automated scraping of images is against their terms of service (ToS). Replicate this project at your own risk. I know when I adjusted my script to search faster, Google banned my IP. I’m glad it was temporary.

Bosler’s Modified Script

The script automatically searches for images and collects their underlying URLs. After searching, it uses the Python requests library to download all the images into a folder named after the search term.

Here are the modifications I made to Bosler’s original script:

  • Added a search term loop. This allows the script to continue running past one search term.
  • The script was getting stuck when it ran into the “Show More Results” button; I’ve fixed the issue.
  • The results are saved in directories associated with the search term. If the script is interrupted and rerun, it checks which directories already exist and removes those terms from the search.
  • I added a timeout feature; thanks to a user on Stack Overflow.
  • I parameterized the number of images to look for per search term, sleep times, and timeout.

Code: Libraries

You will need to install Chromedriver and Selenium–this is explained well in the original article.

You will also need to install Pillow–a Python library for managing images.

You can install it with:

pip install pillow

After installing all the needed libraries the following block of code should execute without error:

import os
import time

import io
import hashlib
import signal
from glob import glob
import requests

from PIL import Image
from selenium import webdriver

If you have any troubles, revisit the original article’s setup explanation or feel free to ask questions in the comments below.

Code: Parameters

I’ve added a few parameters to the script to make it easier to use.

number_of_images = 400
GET_IMAGE_TIMEOUT = 2  # Seconds to wait on an image download before skipping.
SLEEP_BETWEEN_INTERACTIONS = 0.1  # Seconds between image URL checks.
SLEEP_BEFORE_MORE = 5  # Seconds before clicking "Show More Results."
IMAGE_QUALITY = 85  # JPEG quality used when saving images.

output_path = "/path/to/your/image/directory"

The number_of_images tells the script how many images to search for per search term. If the script runs out of images before reaching number_of_images, it will skip to the next term.

GET_IMAGE_TIMEOUT determines how long the script should wait for a response before skipping to the next image URL.

SLEEP_BETWEEN_INTERACTIONS is how long the script should delay before checking the URL of the next image. In theory, this can be set low, as I don’t think it makes any requests of Google. But I’m unsure; adjust at your own risk.

SLEEP_BEFORE_MORE is how long the script should wait before clicking on the “Show More Results” button. This should not be set lower than you can physically search. Your IP will be banned. Mine was.

Code: Search Terms

Here is where the magic happens. The search_terms array should include any terms which you think will get the sorts of images you are targeting.

Below are the exact set of terms I used to collect magic symbol images:

search_terms = [
    "black and white magic symbol icon",
    "black and white arcane symbol icon",
    "black and white mystical symbol",
    "black and white useful magic symbols icon",
    "black and white ancient magic sybol icon",
    "black and white key of solomn symbol icon",
    "black and white historic magic symbol icon",
    "black and white symbols of demons icon",
    "black and white magic symbols from book of enoch",
    "black and white historical magic symbols icons",
    "black and white witchcraft magic symbols icons",
    "black and white occult symbols icons",
    "black and white rare magic occult symbols icons",
    "black and white rare medieval occult symbols icons",
    "black and white alchemical symbols icons",
    "black and white demonology symbols icons",
    "black and white magic language symbols icon",
    "black and white magic words symbols glyphs",
    "black and white sorcerer symbols",
    "black and white magic symbols of power",
    "occult religious symbols from old books",
    "conjuring symbols",
    "magic wards",
    "esoteric magic symbols",
    "demon summing symbols",
    "demon banishing symbols",
    "esoteric magic sigils",
    "esoteric occult sigils",
    "ancient cult symbols",
    "gypsy occult symbols",
    "Feri Tradition symbols",
    "Quimbanda symbols",
    "Nagualism symbols",
    "Pow-wowing symbols",
    "Onmyodo symbols",
    "Ku magical symbols",
    "Seidhr And Galdr magical symbols",
    "Greco-Roman magic symbols",
    "Levant magic symbols",
    "Book of the Dead magic symbols",
    "kali magic symbols",
]
Before searching, the script checks the image output directory to determine whether images have already been gathered for a particular term. If they have, the script will exclude the term from the search. This is part of my “be cool” code. We don’t need to be downloading a bunch of images twice.

The code below grabs all the directories in our output path, then reconstructs the search term from each directory name (i.e., it replaces the “_”s with “ “s).

dirs = glob(output_path + "*")
dirs = [dir.split("/")[-1].replace("_", " ") for dir in dirs]
search_terms = [term for term in search_terms if term not in dirs]
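For example (with hypothetical directory names, not actual output from my run), the round trip from directory name back to search term looks like this:

```python
# Hypothetical directory names created by an earlier run.
dirs = [
    "/images/black_and_white_magic_symbol_icon",
    "/images/conjuring_symbols",
]

# Reconstruct search terms: take the last path segment, swap "_" for " ".
recovered = [d.split("/")[-1].replace("_", " ") for d in dirs]
print(recovered)
# ['black and white magic symbol icon', 'conjuring symbols']
```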

Code: Chromedriver

Before starting the script, we have to kick off a Chromedriver session. Note, you must put the chromedriver executable into a folder listed in your PATH variable for Selenium to find it.

For MacOS users, setting up Chromedriver for Selenium use is a bit tough to do manually. But, using homebrew makes it easy.

brew install chromedriver

If everything is setup correctly, executing the following code will open a Chrome browser and bring up the Google search page.

wd = webdriver.Chrome()

Code: Chrome Timeout

The timeout class below I borrowed from Thomas Ahle at Stack Overflow. It is a dirty way of creating a timeout for the GET request to download the image. Without it, the script can get stuck on unresponsive image downloads.

class timeout:
    def __init__(self, seconds=1, error_message="Timeout"):
        self.seconds = seconds
        self.error_message = error_message

    def handle_timeout(self, signum, frame):
        raise TimeoutError(self.error_message)

    def __enter__(self):
        signal.signal(signal.SIGALRM, self.handle_timeout)
        signal.alarm(self.seconds)

    def __exit__(self, type, value, traceback):
        signal.alarm(0)
Code: Fetch Images

As I hope I’ve made clear, I did not write the code below; I just polished it. I’ll provide a brief explanation, but refer back to Bosler’s article for more information.

Essentially, the script:

  1. Creates a directory corresponding to a search term in the array.
  2. Passes the search term to fetch_image_urls(), the function that drives the Chrome session. The script navigates Google to find images relating to the search term and stores each image link in a list. After it has searched through all the images or reached number_of_images, it returns a list (res) containing all the image URLs.
  3. Passes the list of image URLs to persist_image(), which downloads each image into the corresponding folder.
  4. Repeats steps 1-3 for each search term.

I’ve added extra comments as a guide:

def fetch_image_urls(
    query: str,
    max_links_to_fetch: int,
    wd: webdriver,
    sleep_between_interactions: int = 1,
):
    def scroll_to_end(wd):
        wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(sleep_between_interactions)

    # Build the Google Query.
    search_url = "{q}&oq={q}&gs_l=img"

    # Load the page.
    wd.get(search_url.format(q=query))

    # Declared as a set, to prevent duplicates.
    image_urls = set()
    image_count = 0
    results_start = 0
    while image_count < max_links_to_fetch:
        scroll_to_end(wd)

        # Get all image thumbnail results.
        thumbnail_results = wd.find_elements_by_css_selector("img.Q4LuWd")
        number_results = len(thumbnail_results)

        print(
            f"Found: {number_results} search results. Extracting links from {results_start}:{number_results}"
        )

        # Loop through each image thumbnail identified.
        for img in thumbnail_results[results_start:number_results]:
            # Try to click every thumbnail such that we can get the real image behind it.
            try:
                time.sleep(sleep_between_interactions)
            except Exception:
                continue

            # Extract image urls.
            actual_images = wd.find_elements_by_css_selector("img.n3VNCb")
            for actual_image in actual_images:
                if actual_image.get_attribute(
                ) and "http" in actual_image.get_attribute("src"):

        image_count = len(image_urls)

        # If the number of images found meets `number_of_images`, end the search.
        if image_count >= max_links_to_fetch:
            print(f"Found: {len(image_urls)} image links, done!")
            break

        # If we haven't found all the images we want, let's look for more.
        print("Found:", len(image_urls), "image links, looking for more ...")
        time.sleep(SLEEP_BEFORE_MORE)

        # Check for the button signifying no more images.
        not_what_you_want_button = None
        try:
            not_what_you_want_button = wd.find_element_by_css_selector(".r0zKGf")
        except Exception:
            pass

        # If there are no more images, return.
        if not_what_you_want_button:
            print("No more images available.")
            return image_urls

        # If there is a "Load More" button, click it.
        load_more_button = None
        try:
            load_more_button = wd.find_element_by_css_selector(".mye4qd")
        except Exception:
            pass
        if load_more_button and not not_what_you_want_button:
            wd.execute_script("document.querySelector('.mye4qd').click();")

        # Move the result start point further down.
        results_start = len(thumbnail_results)

    return image_urls

def persist_image(folder_path: str, url: str):
        print("Getting image")
        # Download the image.  If the timeout is exceeded, throw an error.
        with timeout(GET_IMAGE_TIMEOUT):
            image_content = requests.get(url).content
    except Exception as e:
        print(f"ERROR - Could not download {url} - {e}")

        # Convert the image into a byte stream, then save it.
        image_file = io.BytesIO(image_content)
        image ="RGB")
        # Create a unique filepath from the contents of the image.
        file_path = os.path.join(
            folder_path, hashlib.sha1(image_content).hexdigest()[:10] + ".jpg"
        with open(file_path, "wb") as f:
  , "JPEG", quality=IMAGE_QUALITY)
        print(f"SUCCESS - saved {url} - as {file_path}")
    except Exception as e:
        print(f"ERROR - Could not save {url} - {e}")
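The sha1-based naming deserves a note: because the filename is derived from the file’s contents, re-downloading the same image produces the same name, so duplicates overwrite themselves instead of piling up. A quick illustration with stand-in bytes (not a real image):

```python
import hashlib

# Stand-in for downloaded image bytes.
image_content = b"not really an image"

# Same contents always hash to the same name, deduplicating downloads.
file_name = hashlib.sha1(image_content).hexdigest()[:10] + ".jpg"
print(file_name)  # A 10-character hex prefix plus ".jpg".
```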

def search_and_download(search_term: str, target_path="./images", number_images=5):
    # Create a folder name.
    target_folder = os.path.join(target_path, "_".join(search_term.lower().split(" ")))

    # Create the image folder if needed.
    if not os.path.exists(target_folder):
        os.makedirs(target_folder)

    # Open Chrome
    with webdriver.Chrome() as wd:
        # Search for image URLs.
        res = fetch_image_urls(
            sleep_between_interactions=SLEEP_BETWEEN_INTERACTIONS,

        # Download the images.
        if res is not None:
            for elem in res:
                persist_image(target_folder, elem)
            print(f"Failed to return links for term: {search_term}")

# Loop through all the search terms.
for term in search_terms:
    search_and_download(term, output_path, number_of_images)


Scraping the images resulted in a lot of garbage images (noise) along with my ideal training images.

For example, out of all the images shown, I only wanted the image highlighted:

There was also the problem of lots of magic symbols stored in a single image. These “collection” images would need further processing to extract all of the symbols.

However, even with a few rough edges, the script sure as hell beat manually downloading the 10k images I had in the end.

Train a Generative Adversarial Network to Create Magic Symbols

I love folklore dealing with magic. Spells, witches, and summoning the dead. It all piques my interest. I think it inspires me as it is far removed from being a data engineer–I know it might kill aspirations of young data engineers reading, but data engineering can be a bit boring at times. To beat the boredom, I decided to mix my personal and professional interests.

I’ve scraped the internet for images of magic symbols, then trained a deep convolutional generative adversarial network (DCGAN) to generate new magic symbols, which are congruent to real magic symbols. The DCGAN is built using PyTorch. I usually roll with Tensorflow, but I’m working on learning PyTorch.

I’ve taken the “nothing but net” approach with this project. Most of the data augmentation I’ve done during this project has been done using other neural networks. Most of these augmenting nets were written in Tensorflow.

I’ve planned a series of articles, as there is too much to cover in one. A lot of the code has been borrowed and adapted; I’ll do my best to give credit where it’s due.

What was in my Head

Let’s start with the current results. After getting the urge to teach a computer to make a magic sign, it took a couple of days of hacking before I ended up with the images below.

Keep in mind, these are preliminary results. They were generated using my GTX 1060 6GB. The GPU RAM limits the model a lot–at least, until I rewrite the training loop. Why do I mention the small GPU? Well, GANs are an architecture that provides much better results with more neurons, and the 6GB limits the network a lot for a well-performing GAN.

Anyway, ‘nuff caveats. Let’s dig in.


There are a few concepts I’ll refer to a lot throughout these articles–let’s define them real quick.

First, “signal.” I like Wikipedia’s definition, even if it is sleep inducing.

In signal processing, a signal is a function that conveys information about a phenomenon.

One of the mistakes I made early in this project was not defining the desired signal. In future projects, I’ll lead with a written definition and modify it based on what I learn about the signal. However, for this project, here was my eventual definition.

The “magic symbol” signal had the following properties:

  • Used in traditional superstition
  • Defined

These terms became my measuring stick for determining whether an image was included in the training data.

Given poorly defined training images seemed to produce extremely muddy outputs, I decided each image should be “defined.” Meaning, an image must be easily discernible at the resolution in which it was trained.

Here are examples of what I see as “defined”:

And examples of “used in traditional superstition.” The top-left symbol is the Leviathan Cross and bottom-left is the Sigil of Bael.


Again, preliminary results. I’m shopping for a way to scale up the size of the network, which should increase the articulation of the outputs. Overall, the bigger the network the more interesting the results.

Small Symbols (64x64)

The following symbols were generated with a DCGAN using 64x64 dimensions as output. These symbols were then post-processed using a deep denoising variational auto-encoder (DDVAE). It was a fancy way of removing “pepper” from the images.

Large Symbols (128x128)

The following symbols were generated with a GAN using 128x128 dimensions as input and output. These symbols were not post-processed.

Assessment of Outputs

Overall, I’m pleased with the output. Looking at how muddy the 128x128 outputs are, you may be wondering why. Well, a few reasons.

I’ve been able to avoid mode collapse in almost all of my training sessions. Mode collapse is the bane of GANs. Simply put, the generator finds one or two outputs which always trick the discriminator and then produces those every time.

There is a lot of pepper throughout the generated images. I believe a lot of this comes from dirty input data, so when there’s time, I’ll refine my dataset further. However, the denoising auto-encoder seems to be the easiest way to get rid of the noise–as you can see the 64x64 samples (denoised) are much cleaner than the 128x128 samples. Also, I might try applying the denoiser to the inputs, rather than the outputs. In short, I feel training will greatly improve as I continue to refine the training data.

But do they look like real magic symbols? I don’t know. At this point, I’m biased, so I don’t trust my perspective. I did show the output to a coworker and asked, “What does this look like?” He said, “I don’t know, some sort of runes?” And my boss asked, “What are those Satan symbols?” So, I feel I’m on the right track.

How to Send Data between PC and Arduino using Bluetooth LE

A how-to guide on connecting your PC to an Arduino using Bluetooth LE and Python. To make it easier, we will use bleak, an open-source BLE library for Python. The code provided should work for connecting your PC to any Bluetooth LE device.

Before diving in, a few things to know:

  • Bleak is under development. It will have issues
  • Although Bleak is a multi-OS library, Windows support is still rough
  • PC operating systems suck at BLE
  • Bleak is asynchronous; in Python, this means a bit more complexity
  • The code provided is a proof-of-concept; it should be improved before use

Ok, all warnings stated, let’s jump in.


Bleak is a Python package written by Henrik Blidh. Although the package is still under development, it is pretty nifty. It works on Linux, Mac, or Windows. It is non-blocking, which makes writing applications a bit more complex, but extremely powerful, as your code doesn’t have to manage concurrency.


Getting started with BLE using my starter application and bleak is straightforward. You need to install bleak, and I’ve also included a library called aioconsole for handling user input asynchronously:

pip install bleak aioconsole

Once these packages are installed we should be ready to code. If you have any issues, feel free to ask questions in the comments. I’ll respond when able.

The Code

Before we get started, if you’d rather see the full code, it can be found at:

If you are new to Python, the following code may look odd. You’ll see terms like async, await, loop, and future. Don’t let it scare you. These keywords are Python’s way of allowing a programmer to “easily” write asynchronous code in Python.

If you are struggling with asyncio, the built-in asynchronous Python library, I’d highly recommend Łukasz Langa’s detailed video series; it takes a time commitment, but is worth it.

If you are an experienced Python programmer, feel free to critique my code, as I’m new to Python’s asynchronous solutions. I’ve got my big kid britches on.

Enough fluff. Let’s get started.

Application Parameters

There are a few code changes needed for the script to work, at least, with the Arduino and firmware I’ve outlined in the previous article:

The incoming microphone data will be dumped into a CSV; one of the parameters is where you would like to save this CSV. I’ll be saving it to the Desktop. I’m also retrieving the user’s home folder from the HOME environment variable, which is only available on Mac and Linux OS (Unix systems). If you are trying this project from Windows, you’ll need to replace the root_path reference with the full path.

root_path = os.environ["HOME"]
output_file = f"{root_path}/Desktop/microphone_dump.csv"

You’ll also need to specify the characteristics which the Python app should try to subscribe to when connected to the remote hardware. Referring back to our previous project, you should be able to get these from the Arduino code, or the Serial terminal printout.

read_characteristic = "00001143-0000-1000-8000-00805f9b34fb"
write_characteristic = "00001142-0000-1000-8000-00805f9b34fb"


The main method is where all the async code is initialized. Essentially, it creates three different loops, which run asynchronously when possible.

  • Main – you’d put your application’s code in this loop. More on it later
  • Connection Manager – this is the heart of the Connection object I’ll describe more in a moment.
  • User Console – this loop gets data from the user and sends it to the remote device.

You can imagine each of these loops as independent; however, what they are actually doing is pausing their execution whenever one of the loops encounters a blocking I/O event. For example, when input is requested from the user, or when waiting for data from the remote BLE device. When one of these loops hits an I/O event, it lets another loop take over until the I/O event completes.

That’s far from an accurate explanation, but like I said, I won’t go in depth on async Python, as Langa’s video series is much better than my squawking.

Though, it’s important to know, ensure_future is what tells Python to run a chunk of code asynchronously. And I’ve been calling them “loops” because each of the three ensure_future calls has a while True statement in it. That is, they do not return without error.

After creating the different futures, the loop.run_forever() is what causes them to run.

if __name__ == "__main__":
    # Create the event loop.
    loop = asyncio.get_event_loop()

    data_to_file = DataToFile(output_file)
    connection = Connection(
        loop, read_characteristic, write_characteristic, data_to_file.write_to_csv
        loop.run_forever()
    except KeyboardInterrupt:
        print("User stopped program.")
        print("Disconnecting...")
        loop.run_until_complete(connection.cleanup())

Where does bleak come in? You may have been wondering about the code directly before setting up the loops.

    connection = Connection(
        loop, read_characteristic, write_characteristic, data_to_file.write_to_csv
    )

This class wraps the bleak library and makes it a bit easier to use. Let me explain.


You may be asking, “Why create a wrapper around bleak, Thomas?” Well, two reasons. First, the bleak library is still in development and there are several aspects which do not work well. Second, there are additional features I’d like my Bluetooth LE Python class to have. For example, if the Bluetooth LE connection is broken, I want my code to automatically attempt to reconnect. This wrapper class allows me to add these capabilities.

I did try to keep the code highly hackable. I want anybody to be able to use the code for their own applications, with a minimum time investment.

Connection(): init

The Connection class has four required arguments and one optional.

  • loop – this is the loop established by asyncio, it allows the connection class to do async magic.
  • read_characteristic – the characteristic on the remote device containing data we are interested in.
  • write_characteristic – the characteristic on the remote device which we can write data.
  • data_dump_handler – this is the function to call when we’ve filled the rx buffer.
  • data_dump_size – this is the size of the rx buffer. Once it is exceeded, the data_dump_handler function is called and the rx buffer is cleared.
class Connection:
    client: BleakClient = None

    def __init__(
        loop: asyncio.AbstractEventLoop,
        read_characteristic: str,
        write_characteristic: str,
        data_dump_handler: Callable[[str, Any], None],
        data_dump_size: int = 256,
        self.loop = loop
        self.read_characteristic = read_characteristic
        self.write_characteristic = write_characteristic
        self.data_dump_handler = data_dump_handler
        self.data_dump_size = data_dump_size

Alongside the arguments are internal variables which track device state.

The variable self.connected tracks whether the BleakClient is connected to a remote device. It is needed since await self.client.is_connected() currently has an issue where it raises an exception if you call it while not connected to a remote device. Have I mentioned bleak is a work in progress?

        # Device state
        self.connected = False
        self.connected_device = None

self.connected_device hangs on to the device you selected when you started the app. This is needed for reconnecting on disconnect.

The rest of variables help track the incoming data. They’ll probably be refactored into a DTO at some point.

        # RX Buffer
        self.last_packet_time =
        self.rx_data = []
        self.rx_timestamps = []
        self.rx_delays = []

Connection(): Callbacks

There are two callbacks in the Connection class. One to handle disconnections from the Bluetooth LE device. And one to handle incoming data.

Easy one first, the on_disconnect method is called whenever the BleakClient loses connection with the remote device. All we’re doing with the callback is setting the connected flag to False. This will cause the Connection.connect() to attempt to reconnect.

    def on_disconnect(self, client: BleakClient):
        self.connected = False
        # Put code here to handle what happens on disconnect.
        print(f"Disconnected from {}!")

The notification_handler is called by the BleakClient any time the remote device updates a characteristic we are interested in. The callback has two parameters, sender, which is the name of the device making the update, and data, which is a bytearray containing the information received.

I’m converting the data from two-bytes into a single int value using Python’s from_bytes(). The first argument is the bytearray and the byteorder defines the endianness (usually big). The converted value is then appended to the rx_data list.
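For instance, with a hypothetical two-byte packet (not actual microphone data):

```python
# A hypothetical two-byte reading from the remote device.
data = bytearray([0x01, 0x02])

# Big-endian: the first byte is the most significant.
value = int.from_bytes(data, byteorder="big")
print(value)  # 1 * 256 + 2 = 258

# Little-endian would weight the bytes the other way around.
assert int.from_bytes(data, byteorder="little") == 2 * 256 + 1
```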

The record_time_info() method saves the current time and the number of microseconds between the current byte received and the previous byte.

If the length of the rx_data list is greater than the data_dump_size, then the data are passed to the data_dump_handler function and the rx_data list is cleared, along with any time tracking information.

    def notification_handler(self, sender: str, data: Any):
        self.rx_data.append(int.from_bytes(data, byteorder="big"))
        if len(self.rx_data) >= self.data_dump_size:
            self.data_dump_handler(self.rx_data, self.rx_timestamps, self.rx_delays)

Connection(): Connection Management

The Connection class’s primary job is to manage BleakClient’s connection with the remote device.

The manager function is one of the async loops. It continually checks if the Connection.client exists, if it doesn’t then it prompts the select_device() function to find a remote connection. If it does exist, then it executes the connect().

    async def manager(self):
        print("Starting connection manager.")
        while True:
            if self.client:
                await self.connect()
                await self.select_device()
                await asyncio.sleep(15.0, loop=loop)

The connect() is responsible for ensuring the PC’s Bluetooth LE device maintains a connection with the selected remote device.

First, the method checks if the device is already connected; if it is, then it simply returns. Remember, this function is in an async loop.

If the device is not connected, it tries to make the connection by calling self.client.connect(). This is awaited, meaning it will not continue to execute the rest of the method until this function call returns. Then, we check if the connection was successful and update the Connection.connected property.

If the BleakClient is indeed connected, then we add the on_disconnect and notification_handler callbacks. Note, we only added a callback on the read_characteristic. Makes sense, right?

Lastly, we enter an infinite loop which checks every 5 seconds if the BleakClient is still connected, if it isn’t, then it breaks the loop, the function returns, and the entire method is called again.

    async def connect(self):
        if self.connected:
            await self.client.connect()
            self.connected = await self.client.is_connected()
            if self.connected:
                print(f"Connected to {}")
                self.client.set_disconnected_callback(self.on_disconnect)
                await self.client.start_notify(
                    self.read_characteristic, self.notification_handler,
                while True:
                    if not self.connected:
                    await asyncio.sleep(5.0, loop=loop)
                print(f"Failed to connect to {}")
        except Exception as e:

Whenever we decide to end the connection, we can escape the program by hitting CTRL+C. However, before shutting down, the BleakClient needs to free up the hardware. The cleanup method checks if the Connection.client exists; if it does, it tells the remote device we no longer want notifications from the read_characteristic. It also sends a signal to our PC’s hardware and the remote device that we want to disconnect.

    async def cleanup(self):
        if self.client:
            await self.client.stop_notify(self.read_characteristic)
            await self.client.disconnect()

Device Selection

Bleak is a multi-OS package; however, there are slight differences between the different operating systems. One of those is the address of your remote device. Windows and Linux report the remote device by its MAC address. Of course, Mac has to be the odd duck; it uses a Universally Unique Identifier (UUID). Specifically, it uses a CoreBluetooth UUID, or a CBUUID.

These identifiers are important as bleak uses them during its connection process. These IDs are static, that is, they shouldn’t change between sessions, yet they should be unique to the hardware.

The select_device method calls bleak’s discover() method, which returns a list of the devices advertising their connections within range. The code uses the aioconsole package to asynchronously request the user to select a particular device.

    async def select_device(self):
        print("Bluetooth LE hardware warming up...")
        await asyncio.sleep(2.0, loop=loop)  # Wait for BLE to initialize.
        devices = await discover()

        print("Please select device: ")
        for i, device in enumerate(devices):
            print(f"{i}: {}")

        response = -1
        while True:
            response = await ainput("Select device: ")
                response = int(response.strip())
            except ValueError:
                print("Please make valid selection.")
            if response > -1 and response < len(devices):
                print("Please make valid selection.")

After the user has selected a device then the Connection.connected_device is recorded (in case we needed it later) and the Connection.client is set to a newly created BleakClient with the address of the user selected device.

        print(f"Connecting to {devices[response].name}")
        self.connected_device = devices[response]
        self.client = BleakClient(devices[response].address, loop=self.loop)

Utility Methods

Not much to see here, these methods are used to handle timestamps on incoming Bluetooth LE data and clearing the rx buffer.

    def record_time_info(self):
        present_time =
        self.rx_timestamps.append(present_time)
        self.rx_delays.append((present_time - self.last_packet_time).microseconds)
        self.last_packet_time = present_time

    def clear_lists(self):
        self.rx_data.clear()
        self.rx_timestamps.clear()
        self.rx_delays.clear()
Save Incoming Data to File

This is a small class meant to make it easier to record the incoming microphone data along with the time it was received and delay since the last bytes were received.

class DataToFile:

    column_names = ["time", "delay", "data_value"]

    def __init__(self, write_path):
        self.path = write_path

    def write_to_csv(self, times: [datetime], delays: [int], data_values: [Any]):

        if len(set([len(times), len(delays), len(data_values)])) > 1:
            raise Exception("Not all data lists are the same length.")

        with open(self.path, "a+") as f:
            if os.stat(self.path).st_size == 0:
                print("Created file.")
                f.write(",".join([str(name) for name in self.column_names]) + ",\n")
            for i in range(len(data_values)):
                f.write(f"{times[i]},{delays[i]},{data_values[i]},\n")
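To see the append-and-header logic in isolation, here's a minimal self-contained sketch using a temporary file (the helper name mirrors write_to_csv but is standalone):

```python
import os
import tempfile

column_names = ["time", "delay", "data_value"]

def write_rows(path, times, delays, data_values):
    # Refuse ragged input, same guard as DataToFile.write_to_csv.
    if len(set([len(times), len(delays), len(data_values)])) > 1:
        raise Exception("Not all data lists are the same length.")
    with open(path, "a+") as f:
        # Write the header only once, when the file is empty.
        if os.stat(path).st_size == 0:
            f.write(",".join(column_names) + ",\n")
        for i in range(len(data_values)):
            f.write(f"{times[i]},{delays[i]},{data_values[i]},\n")

path = os.path.join(tempfile.mkdtemp(), "mic_data.csv")
write_rows(path, ["12:00:00"], [250], [17])
write_rows(path, ["12:00:01"], [300], [18])  # appends; header not repeated

with open(path) as f:
    print(f.read())
```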

App Loops

I mentioned three “async loops.” We’ve covered the first one inside the Connection class; the other two live outside it.

The user_console_manager() checks whether the Connection instance has instantiated a BleakClient and is connected to a device. If so, it prompts the user for input in a non-blocking manner. After the user enters input and hits return, the string is converted into a bytearray using map(). Lastly, it is sent by directly calling the Connection.client’s write_gatt_char method. Note, that’s a bit of a code smell; it should be refactored (when I have time).

async def user_console_manager(connection: Connection):
    while True:
        if connection.client and connection.connected:
            input_str = await ainput("Enter string: ")
            bytes_to_send = bytearray(map(ord, input_str))
            await connection.client.write_gatt_char(write_characteristic, bytes_to_send)
            print(f"Sent: {input_str}")
        else:
            await asyncio.sleep(2.0, loop=loop)
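The bytearray(map(ord, input_str)) conversion is worth a standalone look. Note it only works cleanly for characters with code points below 256, so str.encode() is the more general spelling:

```python
input_str = "LED ON"

# Each character's code point becomes one byte.
bytes_to_send = bytearray(map(ord, input_str))
print(bytes_to_send)  # bytearray(b'LED ON')

# Equivalent for ASCII, and safe for non-ASCII input too:
assert bytes_to_send == bytearray(input_str.encode("utf-8"))
```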

The last loop is the one designed to hold the application code. Right now, it only simulates application logic by sleeping for five seconds.

async def main():
    while True:
        await asyncio.sleep(5)


Well, that’s it. You may run into problems, especially if you are using the above code from Linux or Windows. But if you do, I’ll do my best to provide support. Just leave me a comment below.

Getting Started with Bluetooth LE on the Arduino Nano 33 Sense

This article will show you how to program the Arduino Nano 33 BLE devices to use Bluetooth LE.


Bluetooth Low Energy and I go way back. I was one of the first using the HM-10 module back in the day. Recently, my mentor introduced me to the Arduino Nano 33 BLE Sense. Great little board–packed with sensors!

Shortly after firing it up, I got excited. I’ve been wanting to start creating my own smartwatch for a long time (as long as the Apple Watch has sucked, really). And it looks like I wasn’t the only one:

This one board had many of the sensors I wanted, all in one little package. The board is a researcher’s nocturnal emission.

Of course, my excitement was tamed when I realized there weren’t tutorials on how to use the Bluetooth LE portion. So, after a bit of hacking I figured I’d share what I’ve learned.

Blue on Everything

This article will be part of a series. Here, we will be building a Bluetooth LE peripheral from the Nano 33, but it’s hard to debug without having a central device to find and connect to the peripheral.

The next article in this series will show how to use Python to connect to Bluetooth LE peripherals (above gif). This should allow one to connect to the Nano 33 from a PC. In short, stick with me. I’ve more Bluetooth LE content coming.

How to Install the Arduino Nano 33 BLE Board

After getting your Arduino Nano 33 BLE board there’s a little setup to do. First, open up the Arduino IDE and navigate to the “Boards Manager.”

Search for Nano 33 BLE and install the board Arduino nRF528xBoards (MBed OS).

Your Arduino IDE should be ready to work with the Nano 33 boards, except for BLE. For that, we need another library.

How to Install the ArduinoBLE Library

There are a few different Arduino libraries for Bluetooth LE–usually respective to the hardware. Unfortunately, this means we would need a different library to work with the Bluetooth LE on an ESP32, for example. Oh well. Back to the problem at hand.

The official library for working with the Arduino boards equipped with BLE is:

It works pretty well, though the documentation is a bit spotty.

To get started, you’ll need to fire up the Arduino IDE and go to Tools, then Manage Libraries...

In the search box that comes up type ArduinoBLE and then select Install next to the library:

That’s pretty much it, we can now include the library at the top of our sketch:

#include <ArduinoBLE.h>

And access the full API in our code.

Project Description

If you are eager, feel free to skip this information and jump to the code.

Before moving on, if the following terms are confusing:

  • Peripheral
  • Central
  • Master
  • Slave
  • Server
  • Client

You might check out EmbeddedFM’s explanation:

I’ll be focusing on getting the Arduino Nano 33 BLE Sense to act as a peripheral BLE device. As a peripheral, it’ll advertise itself as having a service with two characteristics, one for reading, the other for writing.

UART versus Bluetooth LE

Usually when I’m working with a Bluetooth LE (BLE) device I want it to send and receive data. And that’ll be the focus of this article.

I’ve seen this send-n-receive’ing of data over BLE referred to as “UART emulation.” I think that’s fair; UART is a classic communication protocol for a reason. I like the comparison as a mental framework for our BLE code.

We will have an rx property to get data from a remote device and a tx property where we can send data. Throughout the Arduino program, you’ll see my naming scheme using this analogy. That stated, there are clear differences between BLE communication and UART. BLE is arguably more complex and versatile.

Data from the Arduino Microphone

To demonstrate sending and receiving data, we need data to send. We are going to grab information from the microphone on the Arduino Sense and send it to the remote connected device. I’ll not cover the microphone code here, as I don’t understand it well enough to explain. However, here are a couple of reads:


Time to code. Below is what I hacked together, with annotations from the “gotchas” I ran into.

One last caveat: I used Jithin’s code as the base of my project:

Although, I’m not sure any of the original code is left. Cite your sources.

And if you’d rather look at the full code, it can be found at:


We load in the BLE and the PDM libraries to access the APIs to work with the microphone and the radio hardware.

#include <ArduinoBLE.h>
#include <PDM.h>

Service and Characteristics

Let’s create the service. First, we create the name displayed in the advertizing packet, making it easy for a user to identify our Arduino.

We also create a Service called microphoneService, passing it the full Universally Unique ID (UUID) as a string. When setting the UUID, there are two options: a 16-bit or a 128-bit version. If you use one of the standard Bluetooth LE services, the 16-bit version is good. However, if you are looking to create a custom service, you will need to explore creating a full 128-bit UUID.

Here, I’m using the full UUIDs but with a standard service and characteristic, as it makes it easier to connect other hardware to our prototype, as the full UUID is known.
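For a concrete picture of how the two relate: a 16-bit assigned number is shorthand that slots into the Bluetooth Base UUID. A quick sketch (the helper name is mine):

```python
# The Bluetooth Base UUID, with a slot for the 16-bit assigned number.
BASE_UUID = "0000{:04x}-0000-1000-8000-00805f9b34fb"

def expand_16bit_uuid(assigned_number: int) -> str:
    # Insert the 16-bit assigned number into the Base UUID.
    return BASE_UUID.format(assigned_number)

# 0x181A is the standard Environmental Sensing service --
# the same full UUID used for uuidOfService in the sketch.
print(expand_16bit_uuid(0x181A))  # 0000181a-0000-1000-8000-00805f9b34fb
```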

If you want to understand UUID’s more fully, I highly recommend Nordic’s article:

Anyway, we are going to use the following UUIDs:

You may notice, reading the Bluetooth specifications, that there are two mandatory characteristics we should be implementing for Generic Access:

For simplicity, I’ll leave these up to the reader. But they must be implemented for a proper Generic Access service.

Right, back to the code.

Here we define the name of the device as it should show to remote devices. Then, the service and two characteristics, one for sending, the other, receiving.

// Device name
const char* nameOfPeripheral = "Microphone";
const char* uuidOfService = "0000181a-0000-1000-8000-00805f9b34fb";
const char* uuidOfRxChar = "00002A3D-0000-1000-8000-00805f9b34fb";
const char* uuidOfTxChar = "00002A58-0000-1000-8000-00805f9b34fb";

Now, we actually instantiate the BLEService object called microphoneService.

// BLE Service
BLEService microphoneService(uuidOfService);

The characteristic responsible for receiving data, rxCharacteristic, has a couple of parameters which tell the Nano 33 how the characteristic should act.

// Setup the incoming data characteristic (RX).
const int RX_BUFFER_SIZE = 256;
bool RX_BUFFER_FIXED_LENGTH = false;

RX_BUFFER_SIZE will be how much space is reserved for the rx buffer. And RX_BUFFER_FIXED_LENGTH will be, well, honestly, I’m not sure. Let me take a second and try to explain my ignorance.

When looking for the correct way to use the ArduinoBLE library, I referred to the documentation:

There are several different ways to initialize a characteristic, as a single value (e.g., BLEByteCharacteristic, BLEFloatCharacteristic, etc.) or as a buffer. I decided on the buffer for the rxCharacteristic. And that’s where it got problematic.

Here’s what the documentation states regarding initializing a BLECharacteristic with a buffer.

BLECharacteristic(uuid, properties, value, valueSize)
BLECharacteristic(uuid, properties, stringValue)
uuid: 16-bit or 128-bit UUID in string format
properties: mask of the properties (BLEBroadcast, BLERead, etc)
valueSize: (maximum) size of characteristic value
stringValue: value as a string

Cool, makes sense. Unfortunately, I never got a BLECharacteristic to work by initializing it with those arguments. I finally dug into the actual BLECharacteristic source and discovered there are two ways to initialize a BLECharacteristic:

BLECharacteristic(new BLELocalCharacteristic(uuid, properties, valueSize, fixedLength))
BLECharacteristic(new BLELocalCharacteristic(uuid, properties, value))

I hate misinformation. Ok, that tale aside, back to our code.

Let’s actually declare the rx and tx characteristics. Notice, we are using a buffered characteristic for our rx and a single byte value characteristic for our tx. This may not be optimal, but it’s what worked.

// RX / TX Characteristics
BLECharacteristic rxChar(uuidOfRxChar, BLEWriteWithoutResponse | BLEWrite, RX_BUFFER_SIZE, RX_BUFFER_FIXED_LENGTH);
BLEByteCharacteristic txChar(uuidOfTxChar, BLERead | BLENotify | BLEBroadcast);

The second argument is where you define how the characteristic should behave. Each property should be separated by a |, as they are constants being ORed together into a single value (a mask).

Here is a list of available properties:

  • BLEBroadcast – will cause the characteristic to be advertized
  • BLERead – allows remote devices to read the characteristic value
  • BLEWriteWithoutResponse – allows remote devices to write to the device without expecting an acknowledgement
  • BLEWrite – allows remote devices to write, while expecting an acknowledgement the write was successful
  • BLENotify – allows a remote device to be notified anytime the characteristic’s value is updated
  • BLEIndicate – the same as BLENotify, but we expect a response from the remote device indicating it read the value
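These property constants are bit flags, which is why ORing them builds a single mask. The values below are the standard GATT characteristic property bits (I believe ArduinoBLE uses the same values under the hood, but check the headers):

```python
# Standard GATT characteristic property bits.
BLE_BROADCAST = 0x01
BLE_READ = 0x02
BLE_WRITE_WITHOUT_RESPONSE = 0x04
BLE_WRITE = 0x08
BLE_NOTIFY = 0x10
BLE_INDICATE = 0x20

# Same shape as the rxChar declaration: writable, with or without response.
properties = BLE_WRITE_WITHOUT_RESPONSE | BLE_WRITE
print(bin(properties))  # 0b1100

# Membership tests are bitwise ANDs against the mask.
assert properties & BLE_WRITE
assert not (properties & BLE_NOTIFY)
```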


There are two global variables which keep track of the microphone data. The first is a small buffer called sampleBuffer, it will hold up to 256 values from the mic.

The volatile int samplesRead is the variable which holds the immediate value from the mic sensor. It is used in the interrupt service routine (ISR) function. The volatile keyword tells the Arduino’s C++ compiler the value in the variable may change at any time and it should check the value when referenced, rather than relying on a cached value in the processor (more on volatiles).

// Buffer to read samples into, each sample is 16-bits
short sampleBuffer[256];

// Number of samples read
volatile int samplesRead;


We initialize the Serial port, used for debugging.

void setup() {

  // Start serial.
  Serial.begin(9600);

  // Ensure serial port is ready.
  while (!Serial);

To see when the BLE actually is connected, we set the pins connected to the built-in RGB LEDs as OUTPUT.

  // Prepare LED pins.
  pinMode(LEDR, OUTPUT);
  pinMode(LEDG, OUTPUT);

Note, there is a bug in the source code where the LEDR and the LEDG are backwards. You can fix this by searching your computer for ARDUINO_NANO33BLE folder and editing the file pins_arduino.h inside.

Change the following:

#define LEDR        (22u)
#define LEDG        (23u)
#define LEDB        (24u)

to:

#define LEDR        (23u)
#define LEDG        (22u)
#define LEDB        (24u)

And save. That should fix the mappings.

The onPDMdata() is an ISR which fires every time the microphone gets new data. And startPDM() starts the microphone integrated circuit.

  // Configure the data receive callback.
  PDM.onReceive(onPDMdata);

  // Start PDM.
  startPDM();
Now Bluetooth LE is set up. We ensure the Bluetooth LE hardware has been powered on within the Nano 33, set the device name and the service to advertize, then add the rx and tx characteristics to the microphoneService. Lastly, we add the microphoneService to the BLE object.

  // Start BLE.
  startBLE();

  // Create BLE service and characteristics.
  BLE.setLocalName(nameOfPeripheral);
  BLE.setAdvertisedService(microphoneService);
  microphoneService.addCharacteristic(rxChar);
  microphoneService.addCharacteristic(txChar);
  BLE.addService(microphoneService);

Now the Bluetooth LE hardware is turned on, we add callbacks which will fire when the device connects or disconnects. Those callbacks are great places to add notifications, setup, and teardown.

We also add a callback which will fire every time the Bluetooth LE hardware has a characteristic written. This allows us to handle data as it streams in.

  // Bluetooth LE connection handlers.
  BLE.setEventHandler(BLEConnected, onBLEConnected);
  BLE.setEventHandler(BLEDisconnected, onBLEDisconnected);

  // Event driven reads.
  rxChar.setEventHandler(BLEWritten, onRxCharValueUpdate);

Lastly, we command the Bluetooth LE hardware to begin advertizing its services and characteristics to the world. Well, at least +/-30ft of the world.

  // Let's tell devices about us.
  BLE.advertise();

Before beginning the main loop, I like spitting out all of the hardware information we set up. This makes it easy to add it into whatever other applications we are developing, which will connect to the newly initialized peripheral.

  // Print out full UUID and MAC address.
  Serial.println("Peripheral advertising info: ");
  Serial.print("Name: ");
  Serial.println(nameOfPeripheral);
  Serial.print("MAC: ");
  Serial.println(BLE.address());
  Serial.print("Service UUID: ");
  Serial.println(microphoneService.uuid());
  Serial.print("rxCharacteristic UUID: ");
  Serial.println(uuidOfRxChar);
  Serial.print("txCharacteristics UUID: ");
  Serial.println(uuidOfTxChar);

  Serial.println("Bluetooth device active, waiting for connections...");


The main loop grabs a reference to the central property from the BLE object. It checks if central exists, then checks whether central is connected. If it is, it calls connectedLight(), which causes the green LED to come on, letting us know the hardware has made a connection.

Then, it checks if there are data in the sampleBuffer array, if so, it writes them to the txChar. After it has written all data, it resets the samplesRead variable to 0.

Lastly, if the device is not connected or not initialized, the loop turns on the disconnected light by calling disconnectedLight().

void loop() {
  BLEDevice central = BLE.central();
  if (central) {
    // Only send data if we are connected to a central device.
    while (central.connected()) {
      connectedLight();

      // Send the microphone values to the central device.
      if (samplesRead) {
        // Write each sample to the tx characteristic.
        for (int i = 0; i < samplesRead; i++) {
          txChar.writeValue(sampleBuffer[i]);
        }
        // Clear the read count.
        samplesRead = 0;
      }
    }
  } else {
    disconnectedLight();
  }
}

Some may have noticed there is probably an issue with how I’m pulling the data from the sampleBuffer; I’ve just noticed it myself while writing this article. There may be a condition where the microphone’s ISR fires in the middle of writing the buffer to the txChar. If I need to fix this, I’ll update this article.

Ok, hard part’s over, let’s move on to the helper methods.

Helper Methods


The startBLE() function initializes the Bluetooth LE hardware by calling BLE.begin(). If it is unable to start the hardware, it will say so via the serial port and then stick forever.

void startBLE() {
  if (!BLE.begin()) {
    Serial.println("starting BLE failed!");
    while (1);
  }
}


This method is called when new data is received from a connected device. It grabs the data from the rxChar by calling readValue, providing a buffer for the data and the buffer’s size. The readValue method returns how many bytes were read. We then loop over each of the bytes in our tmp buffer, cast them to char, and print them to the serial terminal. This is pretty helpful when debugging.

Before ending, we also print out how many bytes were read, just in case we’ve received data which can’t be converted to ASCII. Again, helpful for debugging.

void onRxCharValueUpdate(BLEDevice central, BLECharacteristic characteristic) {
  // central wrote new value to characteristic, update LED
  Serial.print("Characteristic event, read: ");
  byte tmp[256];
  int dataLength = rxChar.readValue(tmp, 256);

  for (int i = 0; i < dataLength; i++) {
    Serial.print((char)tmp[i]);
  }
  Serial.println();
  Serial.print("Value length = ");
  Serial.println(dataLength);
}

LED Indicators

Not much to see here. These functions are called when our device connects or disconnects, respectively.

void onBLEConnected(BLEDevice central) {
  Serial.print("Connected event, central: ");
  Serial.println(central.address());
}

void onBLEDisconnected(BLEDevice central) {
  Serial.print("Disconnected event, central: ");
  Serial.println(central.address());
}

void connectedLight() {
  digitalWrite(LEDR, LOW);
  digitalWrite(LEDG, HIGH);
}

void disconnectedLight() {
  digitalWrite(LEDR, HIGH);
  digitalWrite(LEDG, LOW);
}


I stole this code from an Arduino-provided example. I think it initializes the PDM hardware (microphone) with a 16 kHz sample rate.

void startPDM() {
  // initialize PDM with:
  // - one channel (mono mode)
  // - a 16 kHz sample rate
  if (!PDM.begin(1, 16000)) {
    Serial.println("Failed to start PDM!");
    while (1);
  }
}

Lastly, the onPDMdata callback is fired whenever there are data available to be read. It checks how many bytes are available by calling available() and reads that number of bytes into the buffer. Then, given the data are int16, it divides the number of bytes by two to get the number of samples read.

void onPDMdata() {
  // query the number of bytes available
  int bytesAvailable = PDM.available();

  // read into the sample buffer
  int bytesRead = PDM.read(sampleBuffer, bytesAvailable);

  // 16-bit, 2 bytes per sample
  samplesRead = bytesRead / 2;
}
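On the receiving (central) side, the same two-bytes-per-sample arithmetic runs in reverse: the received byte stream is little-endian int16 samples. A quick Python sketch of the unpacking:

```python
import struct

# Three little-endian int16 samples: zero, max positive, max negative.
raw = b"\x00\x00\xff\x7f\x00\x80"

samples_read = len(raw) // 2  # 16-bit, 2 bytes per sample
samples = struct.unpack(f"<{samples_read}h", raw)
print(samples)  # (0, 32767, -32768)
```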

Final Thoughts

Bluetooth LE is powerful–but tough to get right. To be clear, I’m not saying I’ve gotten it right here, but I’m hoping I’m closer. If you find any issues, please leave me a comment or send me an email and I’ll get them corrected as quickly as I’m able.