Generating LEGO Images for Training a CNN

After having success with training a CNN on our initial dataset, we decided to up the game on generating training images. My buddy Rockets built a nice little turntable and ordered a couple of NEMA17s for each of us. His idea was we could both start generating training images.

I asked if he would be OK with me ordering some RAMPs boards and programming them to synchronize with the PiCamera. I figured it would probably be better for reproducibility if we had solid hardware with custom firmware and software.

After a few hours of coding over a couple of weeks I was able to control the RAMPs within a Python script from either the Raspberry Pi or a desktop computer.

I’ve listed the code parts below with a brief explanation–just in case someone would like to hack them for other projects.

Minimum Viable Hack

A word of warning: I’m an advocate of the minimum viable product, especially when it comes to my personal hacking time. I refer to this as the minimum viable hack. That said, there are known issues in the code below. But it does the job, so I’ve not addressed them.

Here are a few:

  1. A 0x0A (\n) byte is not handled as part of the packet payload (e.g., if MILLI_BETWEEN = 10, bad things will happen).
  2. The motors are always on (reduces motor life).
  3. Pulse width is not adjustable without a firmware update.
  4. The Python code is blocking. This makes the halt feature on the Arduino Mega side fairly useless.
  5. Only RAMPs motor X is set up (this one I will address later, as we will need several drivers before the end of this project).


To move the turn table we used a RAMPs 1.4 board:

Getting things going was straightforward. I put together the hardware, installed the Arduino IDE, and looked up the pinout for the RAMPs controller.

I wrote the firmware to receive serial commands as a packet. The packet structure (at time of writing) looks like this:

MOTOR_PACKET = 0x01 0x01 0x00 0x03 0xE8 0x05 0x0A
INDEX        =  1    2    3    4    5    6    7
  • first_byte = the packet type. Right now there is only one, but I figure we might want to control other I/O on the Arduino later.
  • second_byte = the motor selected, 1-5 (X, Y, Z, E1, E2).
  • third_byte = the motor direction; 0x00 is clockwise and 0x01 is counter-clockwise.
  • fourth_byte = the high byte of the step count.
  • fifth_byte = the low byte of the step count. Together, the two bytes tell the motor how many steps to move before stopping (0x03 0xE8 is 1,000 steps).
  • sixth_byte = the delay between steps in milliseconds.
  • seventh_byte = the end-of-transmission (EOT) character. I’ve used \n.
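
As a rough sketch of how the master side might assemble this packet, here is a version using Python's standard struct module. The name build_motor_packet is made up for illustration and isn't part of the project's code; only the byte layout comes from the list above.

```python
import struct

DRIVE_CMD = 0x01
END_TX = 0x0A  # '\n'

def build_motor_packet(motor, direction, steps, milli_between):
    """Pack: type, motor, direction, steps (high byte first), delay, EOT."""
    # ">BBBHBB" = big-endian: three single bytes, one unsigned 16-bit, two single bytes.
    return struct.pack(">BBBHBB", DRIVE_CMD, motor, direction,
                       steps, milli_between, END_TX)

packet = build_motor_packet(motor=0x01, direction=0x00, steps=1000, milli_between=5)
print(packet.hex())  # 01010003e8050a
```

Packing the step count as one 16-bit value keeps the high/low byte order in a single place instead of spreading shifts and masks around.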

When the firmware receives an EOT character, it parses the packet and calls writeMotor(). This function loops through the number of steps, delaying between each. On every pass through the loop, it checks whether a halt command has been received. If it has, it stops the motor mid-move.
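
To see why a 0x0A inside the payload is a problem, here is a small Python simulation of a receiver which, like the firmware described above, treats the first \n as the end of the packet. The receive function is a simplified stand-in I wrote for illustration, not a port of the firmware.

```python
END_TX = 0x0A

def receive(stream):
    """Buffer bytes until the first 0x0A, then hand back the buffer as a packet."""
    buffer = []
    for byte in stream:
        buffer.append(byte)
        if byte == END_TX:
            return buffer
    return None  # no terminator seen

good = bytes([0x01, 0x01, 0x00, 0x03, 0xE8, 0x05, 0x0A])  # 5 ms between steps
bad = bytes([0x01, 0x01, 0x00, 0x03, 0xE8, 0x0A, 0x0A])   # 10 ms between steps

print(len(receive(good)))  # 7
print(len(receive(bad)))   # 6, the delay byte was mistaken for the terminator
```

The same thing would happen with a step count whose high or low byte is 10. A fixed packet length or a length byte are two common ways around it.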

Again, this code isn’t perfect. Far from it. But it does the job.

#include <avr/interrupt.h> 
#include <avr/io.h> 

// TODO: Pulse width set by initialization.
// TODO: Setup all motors to be selected by master.
// TODO: Add a timer to shutdown motors after threshold.
//       And keep motor enabled until threshold has been met.
// TODO: Handle 0x0A values as part of packet (e.g., if MILLI_BETWEEN = 10).
// TODO: Add a "holding torque" feature; making it so motors never disable.

// For RAMPS 1.4
#define X_STEP_PIN         54
#define X_DIR_PIN          55
#define X_ENABLE_PIN       38
#define X_MIN_PIN           3
#define X_MAX_PIN           2

#define Y_STEP_PIN         60
#define Y_DIR_PIN          61
#define Y_ENABLE_PIN       56
#define Y_MIN_PIN          14
#define Y_MAX_PIN          15

#define Z_STEP_PIN         46
#define Z_DIR_PIN          48
#define Z_ENABLE_PIN       62
#define Z_MIN_PIN          18
#define Z_MAX_PIN          19

#define E_STEP_PIN         26
#define E_DIR_PIN          28
#define E_ENABLE_PIN       24

#define SDPOWER            -1
#define SDSS               53
#define LED_PIN            13

#define FAN_PIN            9

#define PS_ON_PIN          12
#define KILL_PIN           -1

#define HEATER_0_PIN       10
#define HEATER_1_PIN       8
#define TEMP_0_PIN         13   // ANALOG NUMBERING
#define TEMP_1_PIN         14   // ANALOG NUMBERING

#define MOTOR_X         0x01
#define MOTOR_Y         0x02
#define MOTOR_Z         0x03
#define MOTOR_E1        0x04
#define MOTOR_E2        0x05

#define DRIVE_CMD       (char)0x01
#define HALT_CMD        (char)0x0F
#define DIR_CC          (char)0x00
#define DIR_CCW         (char)0x01

#define COMPLETED_CMD   (char)0x07
#define END_TX          (char)0x0A
#define ACK             (char)0x06 // Acknowledge
#define NACK            (char)0x15 // Negative Acknowledge

// Determine the pulse width of motor.
#define MOTOR_ANGLE           1.8

#define RX_BUFFER_SIZE 16

/*
      X     = 0
      Y     = 1
      Z     = 2
      E1    = 3
      E2    = 4
      0x01 = motor_write
      0x02 = motor_halt

      0x00 = CW
      0x01 = CCW

                    0     1     2     3     4     5     6
  MOTOR_PACKET =    01    01    00    03    E8    05    0A
  MOTOR_PACKET =    0x 01010003E8050A

  HALT         = 0x0F
*/

/* Create a structure for the motors
 *  direction_pin = pin to control direction of stepper.
 *  step_pin      = pin to control the steps.
 *  enable_pin    = pin to enable motor.
 */
struct MOTOR {
  uint8_t direction_pin;
  uint8_t step_pin;
  uint8_t enable_pin;
  uint8_t pulse_width_micros;
};

struct BUFFER {
  uint8_t data[RX_BUFFER_SIZE];
  uint8_t bufferSize;
  uint8_t index;
  boolean packetComplete;
  uint8_t shutdownThreshold;
};

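/* Initialize motors */
MOTOR motorX = {
  X_DIR_PIN,
  X_STEP_PIN,
  X_ENABLE_PIN,
  2  // pulse_width_micros: assumed value, the listing doesn't show it.
};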

// Urgent shutdown.
volatile boolean halt = false;
volatile static bool triggered;

/* Initialize RX buffer */
BUFFER rxBuffer;

/* Initialize program */
void setup()
{
  // Match the 115200 baud used on the Python side.
  Serial.begin(115200);

  // Initialize the structures
  motorSetup(motorX);
  rxBuffer.bufferSize = RX_BUFFER_SIZE;

  // Disable holding torque.
  digitalWrite(motorX.enable_pin, HIGH);
}

/* Main */
void loop()
{
  // If packet is packetComplete
  if (rxBuffer.packetComplete) {
    uint8_t packet_type = rxBuffer.data[0];

    switch (packet_type) {
      case DRIVE_CMD: {
          // Unpack the command.
          uint8_t motorNumber = rxBuffer.data[1];
          uint8_t direction = rxBuffer.data[2];
          // Cast before shifting so the high byte isn't shifted in a signed int.
          uint16_t steps = ((uint16_t)rxBuffer.data[3] << 8) | rxBuffer.data[4];
          uint8_t milliSecondsDelay = rxBuffer.data[5];

          // Let the master know command is in process.
          sendAck();

          // Start the motor
          writeMotor(motorX, direction, steps, milliSecondsDelay);
        }
        break;
    }
    // Clear the buffer for the next packet.
    resetBuffer(&rxBuffer);
  }
}

/*  ############### MOTORS ############### */

/* Method for initalizing MOTOR */
void motorSetup(MOTOR motor) {

  // Setup motor pins
  pinMode(motor.direction_pin, OUTPUT);
  pinMode(motor.step_pin, OUTPUT);
  pinMode(motor.enable_pin, OUTPUT);
}


/* Write to MOTOR */
void writeMotor(MOTOR motor, int direction, uint16_t numberOfSteps, int milliBetweenSteps) {

    // Enable motor.
    digitalWrite(motor.enable_pin, LOW);

    // Check direction;
    switch (direction) {
      case DIR_CC:
        digitalWrite(motor.direction_pin, HIGH);
        break;
      case DIR_CCW:
        digitalWrite(motor.direction_pin, LOW);
        break;
    }

    // Move the motor (but keep an eye for a halt command)
    for (uint16_t n = 0; n < numberOfSteps; n++) {
      // Interrupt motor
      if (checkForHalt()) {
        break;
      }
      // One step is one pulse on the step pin.
      digitalWrite(motor.step_pin, HIGH);
      delayMicroseconds(motor.pulse_width_micros);
      digitalWrite(motor.step_pin, LOW);
      delay(milliBetweenSteps);
    }

    // Disable holding torque.
    digitalWrite(motor.enable_pin, HIGH);

    // Let the user know the move is done.
    sendCompletedAction();
}

/*  ############### COMMUNICATION ############### */
void serialEvent() {

  // Get all the data.
  while (Serial.available()) {

    // Read a byte
    uint8_t inByte = (uint8_t)Serial.read();

    // Store the byte in the buffer.
    rxBuffer.data[rxBuffer.index] = inByte;
    rxBuffer.index++;

    // If a complete packet character is found, mark the packet
    // as ready for execution.
    if ((char)inByte == '\n') {
      rxBuffer.packetComplete = true;
    }
  }
}

// Clear the buffer.
void resetBuffer(struct BUFFER *buffer) {
  memset(buffer->data, 0, sizeof(buffer->data));
  buffer->index = 0;
  buffer->packetComplete = false;
}

// Does not count termination char.
int packetLength(BUFFER buffer){
  for(int i = 0; i < buffer.bufferSize; i++) {
    if((char)buffer.data[i] == '\n'){ return i; }
  }
  return -1;
}

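// The replies below are an assumed framing: one status byte followed by
// END_TX, so a readline() on the Python side returns.
void sendAck() {
  Serial.write((uint8_t)ACK);
  Serial.write((uint8_t)END_TX);
}

void sendNack() {
  Serial.write((uint8_t)NACK);
  Serial.write((uint8_t)END_TX);
}

void sendCompletedAction() {
  Serial.write((uint8_t)COMPLETED_CMD);
  Serial.write((uint8_t)END_TX);
}

// Halt is handled outside normal communication protocol.
boolean checkForHalt() {
  if (Serial.available()){
    // Halt command has no termination character.
    if ((uint8_t)Serial.read() == HALT_CMD) {
      return true;
    }
  }
  return false;
}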


Python RAMPS

There are two variants of the Python code. The first is for the Raspberry Pi. It’s where I focused my coding time, as it made sense to generate training images using the same hardware (PiCamera) as would be used for production. However, I also have a simpler desktop version which uses OpenCV and a webcam.

For the Raspberry Pi and desktop versions you will need the following:

  • Python 3.7 – this should be standard on Raspbian Buster.

On the desktop you will need OpenCV. It can be installed using:

pip install opencv-python

In both cases you will need the custom class ramps_control. If you clone the repository and run your script from the ./turn_table directory, that should be handled for you.

What’s it Do?

The turntable script initializes the camera. It then loops over the number of angles at which you want to take images.

A full rotation is 3200 steps, so if you ask for 60 images, the script will rotate the turntable ~53.33 steps. At the end of the rotation, the script will capture an image of your target. Then, it will rotate another ~53.33 steps and take another picture. It will do this 60 times, by which point it should have completed a full rotation.
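
A stepper can only move whole steps, so the 53.33 has to be rounded somewhere. Here is one possible way to spread the leftover steps across the rotation so the table still ends up exactly where it started. This is a sketch of the arithmetic only; steps_for_each_image is a name I made up, and it is not what the script below does.

```python
FULL_ROTATION = 3200
IMAGES_PER_ROTATION = 60

def steps_for_each_image(full_rotation, images):
    """Whole-step moves whose running total tracks the ideal angle."""
    # Each move is the difference between consecutive rounded-down targets,
    # so the moves always add up to exactly one full rotation.
    return [((i + 1) * full_rotation) // images - (i * full_rotation) // images
            for i in range(images)]

moves = steps_for_each_image(FULL_ROTATION, IMAGES_PER_ROTATION)
print(sorted(set(moves)))  # [53, 54]
print(sum(moves))          # 3200
```

Simply truncating to 53 steps per image would leave the table 20 steps (2.25 degrees) short of a full turn after 60 images.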

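#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Sep 25 05:58:48 2019

@author: ladvien
"""

from picamera import PiCamera
import os
from time import sleep
import ramps_control
import serial
import glob

# Parameters

IMAGES_PER_ROTATION     = 60    # The example used in the text above.
FULL_ROTATION           = 3200
STEPS_BEFORE_PIC        = int(FULL_ROTATION / IMAGES_PER_ROTATION) # Whole steps only.

print(f'Steps per image: {STEPS_BEFORE_PIC}')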

# Don't Overwrite
def check_existing_images(output_directory):
    existing_image_files = glob.glob(f'{output_directory}/*.jpg')
    max_file_index = 0
    for file in existing_image_files:
        file_index = file.split('/')[-1].split('_')[1].replace('.jpg', '')
        try:
            file_index = int(file_index)
            if file_index > max_file_index:
                max_file_index = file_index
        except ValueError:
            # Skip files whose names don't end in a number.
            pass
    return max_file_index

# Open Serial
ser = serial.Serial('/dev/ttyUSB0', 115200)

# Init Camera
#picam v2 resolution 3280 x 2464
camera = PiCamera()
PIC_SIZE = 1200

# Init RAMPS
ramps = ramps_control.RAMPS(ser, debug = False)

# Track whether the motor is at work.
motor_moving = False

# Reset the RAMPs program.
ramps.reset_ramps(False)

# Main

part = ''

while True:
    part_candidate = input(f'Enter part number and hit enter. (Default {part}; "q" to quit): ')

    if part_candidate.lower() == 'q':
        break
    elif part_candidate != '':
        part = part_candidate

    output_directory = f'/home/pi/Desktop/lego_images/{part}'

    if not os.path.exists(output_directory):
        os.makedirs(output_directory)

    max_file_index = check_existing_images(output_directory)

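    for i in range(IMAGES_PER_ROTATION):

        # The direction and step delay are assumed values; the listing doesn't show them.
        success = ramps.move(ramps.MOTOR_X,
                             ramps.DIR_CCW,
                             STEPS_BEFORE_PIC,
                             1)
        if success:
            print('Table move a success.')

            file_path = f'{output_directory}/{part}_{i + max_file_index}.jpg'
            camera.capture(file_path)
        # sleep(0.05)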


Python RAMPS Class

To increase reusability of the code, I’ve abstracted the RAMPs controller code into a Python class. This class is called by the script above. It is blocking code which handles sending commands, polling the Arduino, and reporting received information.
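
Before the full listing, here is a stripped-down sketch of the blocking pattern: send, expect an acknowledge, then poll until a completed byte arrives or a timeout passes. FakeSerial, read_byte, and blocking_move are hypothetical names for illustration; the real class talks to a pyserial port and is laid out differently.

```python
from time import monotonic, sleep

ACK, COMPLETED = 0x06, 0x07

class FakeSerial:
    """Hypothetical stand-in for a port which replies ACK, then COMPLETED."""
    def __init__(self, replies=(ACK, COMPLETED)):
        self.replies = list(replies)

    def write(self, packet):
        return len(packet)

    def read_byte(self):
        # Returns the next queued reply, or None if nothing is waiting.
        return self.replies.pop(0) if self.replies else None

def blocking_move(port, packet, timeout=1.0):
    """Send a packet, require an ACK, then poll until COMPLETED or timeout."""
    port.write(packet)
    if port.read_byte() != ACK:
        return False
    deadline = monotonic() + timeout
    while monotonic() < deadline:
        if port.read_byte() == COMPLETED:
            return True
        sleep(0.01)  # don't spin the CPU while the motor turns
    return False

packet = bytes([0x01, 0x01, 0x00, 0x03, 0xE8, 0x05, 0x0A])
print(blocking_move(FakeSerial(), packet))  # True
```

Because the caller sits inside that while loop for the whole move, it has no chance to send a halt, which is the limitation mentioned in the known issues.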

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sat Sep 28 05:39:18 2019

@author: ladvien
"""
from time import sleep, time

"""
      X     = 0
      Y     = 1
      Z     = 2
      E1    = 3
      E2    = 4
      0x01 = motor_write
      0x02 = motor_halt

      0x00 = CW
      0x01 = CCW

                    0     1     2     3     4     5     6
  MOTOR_PACKET =    01    01    00    03    E8    05    0A
"""


class RAMPS:
    DRIVE_CMD       = 0x01
    HALT_CMD        = 0x0F
    DIR_CC          = 0x00
    DIR_CCW         = 0x01
    COMPLETED_CMD   = 0x07
    END_TX          = 0x0A
    ACKNOWLEDGE     = 0x06
    MOTOR_X         = 0x01
    MOTOR_Y         = 0x02
    MOTOR_Z         = 0x03
    MOTOR_E1        = 0x04
    MOTOR_E2        = 0x05
    def __init__(self, ser, debug = False):
        self.ser = ser
        # Stored as self.debug so it doesn't shadow the toggle_debug method.
        self.debug = debug
        self.rx_buffer_size = 256
        self.serial_delay = 0.1
    def toggle_debug(self):
        self.debug = not self.debug
    def print_debug(self, message):
        if self.debug:
            print(message)
    # Prepare for a serial send.
    def encode_packet(self, values):
        return bytearray(values)
    # Prepare a packet the slave will understand
    def prepare_motor_packet(self, motor_num, direction, steps, milli_between):
        steps_1 = (steps >> 8) & 0xFF
        steps_2 = (steps) & 0xFF
        return [self.DRIVE_CMD, motor_num, direction, steps_1, steps_2, milli_between, self.END_TX]
    def read_available(self, as_ascii = False):
        self.print_debug(f'Reading available.')
        # 1. Get all available data.
        # 2. Unless buffer exceeded.
        # 3. Return a list of the data.
        incoming_data = []
        incoming_data_size = 0
        while self.ser.in_waiting > 0:
            incoming_data_size += 1
            if incoming_data_size > self.rx_buffer_size:
                self.print_debug(f'Buffer overflow.')
                return list('RX buffer overflow.')
            if as_ascii:
                incoming_data.append(self.ser.readline().decode('utf-8'))
            else:
                # Extending a list with bytes adds each byte as an int.
                incoming_data += self.ser.readline()

        self.print_debug(f'Completed reading available.')
        return incoming_data

    def check_for_confirm(self, command_expected):
        confirmation = self.read_available()
        if len(confirmation) > 0:
            if confirmation[0] == command_expected:
                return True
        return False

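    # ############### RAMPS UTILITY ###############
    def reset_ramps(self, print_welcome = False):

        self.print_debug(f'Resetting Arduino.')
        # Reset the Arduino Mega by toggling DTR (the delays are assumed values).
        self.ser.dtr = False
        sleep(0.5)
        self.ser.dtr = True
        sleep(2)
        # Get welcome message.
        welcome_message = []
        while self.ser.in_waiting > 0:
            welcome_message.append(self.ser.readline().decode('utf-8') )
        self.print_debug(f'Completed reset.')
        if print_welcome:
            # Print it for the user.
            print(''.join(welcome_message))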

    # ############### MOTOR COMMANDS ###############
    def move(self, motor, direction, steps, milli_secs_between_steps):
        # 1. Create a list containing RAMPs command.
        # 2. Encode it for serial writing.
        # 3. Write to serial port.
        # 4. Check for ACK or NACK.
        # 5. Poll serial for completed command.
        packet = self.prepare_motor_packet(motor,
                                           direction,
                                           steps,
                                           milli_secs_between_steps)
        packet = self.encode_packet(packet)
        self.print_debug(f'Created move packet: {packet}')
        self.write_move(packet)
        # Don't miss the ACK by being in a hurry.
        sleep(self.serial_delay)
        confirmation = self.read_available()
        if len(confirmation) > 0 and confirmation[0] == self.ACKNOWLEDGE:
            self.print_debug(f'Move command acknowledged.')
            # The timeout (in seconds) is an assumed value.
            if self.wait_for_complete(120):
                return True
        return False
    def wait_for_complete(self, timeout):
        # 1. Wait for complete or timeout
        # 2. Return whether the move was successful.
        start_time = time()
        while True:
            now_time = time()
            duration = now_time - start_time
            if(duration > timeout):
                return False
            if self.check_for_confirm(self.COMPLETED_CMD):
                self.print_debug(f'Move command completed.')
                return True
    def write_move(self, packet):
        self.ser.write(packet)
        self.print_debug(f'Executed move packet: {packet}')


That’s pretty much it. I’ve kept this article light, as I’m saving most of my free time for coding. But, feel free to ask questions in the comments below.

Training a CNN to Classify LEGOs

This article is part of a series. It should explain the code used to train our convolutional neural-network (CNN) LEGO classifier.

If you want to code along with this article, we’ve made it available in Google’s Colab:

Or if you want to run the code locally:

It’s a WIP, so comment below if you run into any issues.

Classifier Code:

Our code started with a notebook found on Kaggle:

However, there were problems in the code. I rewrote most of it, so I’m not sure how much of the original is left. Still, cite your sources!

Some of the issues were:

  • It used a model more complex than needed.
  • The code format was a mess.
  • Mismatch of target output and loss.

The last one is super tricky, but critical. It’s a hard-to-catch bug which reports an inflated accuracy. I’ll discuss it more below, but it’s a trap I’ve fallen into myself. Regardless of the issues, it was good jump-starter code, since we had never worked with a CNN.

Project Setup (local only)

If you are running this code locally, you will need to do the following.

Enter the command prompt and navigate to your home directory. We’re going to clone the project repository (repo), then, clone the data repo inside the project folder.

git clone
cd lego_sorter
git clone

Then, open your Python IDE, set your directory to ./lego_sorter, and open

Lastly, if you see a cell like this:

!git clone
!mkdir ./data
!mkdir ./data/output

Skip or delete them; they are only needed when running the Colab notebook. Of course, if you are running the Colab notebook, make sure to execute them.

Classifier Code: Needed Libraries

Below is the code we used. Reviewing it, I see some ways to clean it up, so know it may change in the future.

Here’s a breakdown of why the libraries are needed:

  • tensorflow – Google’s main deep-learning library; it’s the heart of the project.
  • keras – a library abstracting away a lot of the details of creating a machine-learning model.
  • json – we write the classes to file for use later.
  • tensorboard – a tool for visualizing your training session.
  • webbrowser – this opens your web browser to TensorBoard.

# Import needed tools.
import os
import matplotlib.pyplot as plt
import json
import numpy as np
from scipy import stats

# Import Keras
import tensorflow as tf
import tensorflow.keras
from tensorflow.keras.layers import Dense,Flatten, Dropout, Lambda
from tensorflow.keras.layers import SeparableConv2D, BatchNormalization, MaxPooling2D, Conv2D, Activation
from tensorflow.compat.v1.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard, CSVLogger, ReduceLROnPlateau
from tensorflow.keras.preprocessing import image

# Tensorboard
from tensorboard import program
import webbrowser
import time

If you are following along with this code locally and need help setting up these libraries, just drop a comment below. I got you.

Classifier Code: Parameters

The parameters section is the heart of the training. I’ll explain what the parameters are doing and highlight those you might want to tweak.

continue_training       = False
initial_epoch           = 0
clear_logs              = True

input_shape             = (300, 300, 3) # Image shape: (height, width, color channels)
image_size              = (input_shape[0], input_shape[1]) # DOH! image_size is (height, width)
train_test_ratio        = 0.2
zoom_range              = 0.1
shear_range             = 0.1

# Hyperparameters
batch_size              = 16
epochs                  = 40
steps_per_epoch         = 400
validation_steps        = 100 
optimizer               = 'adadelta' 
learning_rate           = 1.0
val_save_step_num       = 1

path_to_graphs          = './data/output/logs/'
model_save_dir          = './data/output/'
train_dir               = './lego_id_training_data/gray_train/'
val_dir                 = './lego_id_training_data/gray_test/'

Parameters: Training Session

The first few parameters help continue from an interrupted training session. For example, if your session is interrupted at epoch 183, then you could set continue_training = True and initial_epoch = 184, then execute the script. This should then load the last best model and pick back up training where you left off. Lastly, if you set clear_logs = True then it clears the Tensorboard information. So, if you continue a session, you will want to set this to False.

This section is a WIP and there are several issues. First, the Tensorboard logs should be saved in separate folders and shouldn’t need to be cleared. Also, when continuing a training session, it resets the best validation score (tracked for saving your model before overfitting) resulting in a temporary dip in performance.

Parameters: Image Data

The input_shape refers to the dimensions of an image: height, width, and color (RGB) channels. image_size comes from the input_shape.

One note on image_size. Early on I tried non-square images (which hurt training for me, so I wouldn’t recommend them) and found out the hard way that most of the image parameters take height first and then width.

For example, this is what’s needed:

    target_size = (height_here, width_here),

I was expecting:

    target_size = (width_here, height_here),

It bit me hard, as most frameworks I’ve used expect width first and then height. I mean, even when we talk screen resolution we list width then height (e.g., 1920x1080). Just be aware when using rectangle images. Always RTFM (because, apparently, I didn’t).
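
Here is a tiny illustration of the two orderings for a non-square image. The 320x240 size is just an example I picked, not something from the project.

```python
# A non-square example: 320 pixels wide, 240 pixels tall.
width, height = 320, 240

input_shape = (height, width, 3)   # Keras ordering: (height, width, channels)
image_size = input_shape[:2]       # what target_size expects: (height, width)
resolution = (width, height)       # what "320x240" means, and what PIL's Image.size returns

print(image_size)   # (240, 320)
print(resolution)   # (320, 240)
```

With square images both tuples are identical, which is exactly why the mix-up can hide for so long.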

The train_test_ratio controls how many images are held back for testing the model. I’d have to run through the code again, but I don’t think this is needed, as the preprocessing script already created a folder of validation images. Hmm, I’ll add it to my tech debt list.

The zoom_range parameter controls how far the generator may randomly zoom in or out on each image (0.1 means up to 10% either way). And, lastly, shear_range sets the maximum angle, in degrees, of a random shear which slants each image before it is fed to the CNN.

Parameters: CNN Hyperparameters

A “hyperparameter” is what machine-learning engineers call a setting which is chosen before training, rather than learned during it, and which may impact the outcome of training a neural-net.

Here are the hyperparameters we’ve exposed:

batch_size refers to the number of photos a neural-net should attempt predictions on before updating the weights of each perceptron. Note, the highest batch size is usually limited by your GPU RAM. Locally, I use a GTX 1060 with 6GB of RAM–I couldn’t get a batch bigger than around 16. YMMV.

steps_per_epoch is the number of batches to go through before considering one epoch complete. epochs is the number of those epochs to run before considering the training complete.

So, the length of training, counted in images shown to the model, would be: training schedule = epochs * steps_per_epoch * batch_size
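
Plugging in the values from the parameters section gives a feel for the scale. This is only the arithmetic; the variable names on the left of the last three assignments are mine.

```python
batch_size = 16
epochs = 40
steps_per_epoch = 400

images_per_epoch = steps_per_epoch * batch_size  # images shown in one epoch
images_total = epochs * images_per_epoch         # images shown over the whole run
weight_updates = epochs * steps_per_epoch        # one update per batch

print(images_per_epoch)  # 6400
print(images_total)      # 256000
print(weight_updates)    # 16000
```

Note these are images shown, not unique images. With augmentation, the same photo comes around many times in slightly different forms.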

validation_steps is the number of batches from the validation data to use for validating the current weights. This will be used when we fit (train) our classifier and when we evaluate it.

optimizer is the name of the optimizer used. This is the heart of training, as it is responsible for deciding how the weights should be updated after each batch.

I’ve set up the code to use only one of three optimizers: adam, sgd, or adadelta.

def get_optimizer(optimizer, learning_rate = 0.001):
    # learning_rate replaces the deprecated lr keyword.
    if optimizer == 'adam':
        return tensorflow.keras.optimizers.Adam(learning_rate = learning_rate, beta_1 = 0.9, beta_2 = 0.999, amsgrad = False)
    elif optimizer == 'sgd':
        return tensorflow.keras.optimizers.SGD(learning_rate = learning_rate, momentum = 0.99)
    elif optimizer == 'adadelta':
        return tensorflow.keras.optimizers.Adadelta(learning_rate = learning_rate, rho = 0.95)
    # Fail loudly instead of silently returning None.
    raise ValueError(f'Unsupported optimizer: {optimizer}')

Here is more information on optimizers.

Easy to read:

Primary source:

The primary reason, as I understand it, to use adadelta over adam is that adadelta adapts its per-weight step size as training progresses, so it should need less hand-tuning of the learning_rate to converge.

However, there are many optimizers. A lot of them available in Keras:

  • Stochastic Gradient Descent (SGD)
  • RMSprop
  • Adagrad
  • Adadelta
  • Adam
  • Nadam
  • Adamax

Keras’ docs on optimizers:

The learning_rate controls how drastically the optimizer should change the perceptrons’ weights when they have made an incorrect prediction. Too high and it won’t converge (learn); too low and it will take a while.

You will find a lot of documentation saying, “The default learning rate of an optimizer is best, it doesn’t need to be changed.” I’ve found this advice to be true, well, mostly. I did run into an issue when using adam’s default setting of 0.001 in this project. The neural-net just didn’t learn–I had to drop it to around 0.0001, which did much better.
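
To make the “too high, too low” trade-off concrete, here is a toy example: plain gradient descent on f(w) = w², which has nothing to do with our CNN beyond sharing the idea of a learning rate. The descend function and the three rates are made up for the demonstration.

```python
def descend(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; returns the final |w|."""
    for _ in range(steps):
        gradient = 2 * w              # derivative of w**2
        w = w - learning_rate * gradient
    return abs(w)

for lr in (1.1, 0.1, 0.001):
    print(lr, descend(lr))
```

With 1.1 every step overshoots the minimum and the value grows; with 0.1 it closes in on zero quickly; with 0.001 it has barely moved after 20 steps.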

A starter read on learning rate:

It’s not exhaustive. If you are interested in tweaking the optimizer or learning rate, Google and read as much as possible.

Lastly, val_save_step_num controls how many training epochs should pass before the validator tests whether your model is performing well on the test set. We have the code set up such that if the validator says the model is performing better than in any of the previous tests within this training session, it will save the model automatically.

Classifier Code: Data Preparation

The make_dir function makes a directory if it doesn’t already exist. We then use it to create our model save directory.

def make_dir(dir_path):
    if not os.path.exists(dir_path):
        os.makedirs(dir_path)

# Create needed dirs
make_dir(model_save_dir)

The next bit saves the classes the train_gen found to a file. This is useful later when we are trying to quickly deploy the model to production.

# Save Class IDs
classes_json = train_gen.class_indices
num_classes = len(train_gen.class_indices)

This saves one object to a JSON file. The key (e.g., “2456”) represents the part number provided by LEGO, and the value is the numeric class assigned by the classifier.

{
    "2456": 0,
    "3001": 1,
    "3002": 2,
    "3003": 3,
    "3004": 4,
    "3010": 5,
    "3039": 6,
    "32064": 7,
    "3660": 8,
    "3701": 9
}

We can do the following after we’ve trained the model:

predicted_lego_code = json_classes[model.predict()]

And the model will return the LEGO class it has identified.
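
Spelled out a little more, the lookup needs two extra steps: the mapping has to be inverted (it goes from LEGO code to index, and we want the reverse), and the model’s output, one probability per class, has to be reduced to the index of the largest one. A minimal sketch, using a made-up probability list in place of a real model.predict() result:

```python
import json

# Same shape as train_gen.class_indices, shortened to three parts.
class_indices = {"2456": 0, "3001": 1, "3002": 2}

# Round-trip through JSON, as if the mapping had been saved and reloaded.
loaded = json.loads(json.dumps(class_indices))

# Invert it so a class index leads back to the LEGO code.
index_to_code = {index: code for code, index in loaded.items()}

# Stand-in for one row of model.predict(): one probability per class.
probabilities = [0.05, 0.90, 0.05]
predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)

print(index_to_code[predicted_index])  # 3001
```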

Classifier Code: Data Generator

When dealing with CNNs, the training data are often too large to fit in RAM, let alone GPU RAM, at once.

Instead, a data generator is used. ImageDataGenerator is a class provided by Keras; it loads training data in manageable chunks to feed to your model during training. Let’s run through using it.

We initialize an ImageDataGenerator. Then, we create two flows. One loads data from the training folder into the model. The other does the same, but loads data from the test folder. The latter will be used to validate the model.

Parameters used in our ImageDataGenerator:

  • shear_range – the maximum angle, in degrees, of a random shear applied to each image. Like the other augmentations, it shows the model slightly different versions of the same photos.
  • zoom_range – how far to randomly zoom in or out before feeding the image to the model.
  • horizontal_flip – if this is set to true, the images are randomly mirrored horizontally. This essentially doubles your training images. Though, it shouldn’t be used in all cases. If the target has a “handedness” to it, then this would destroy accuracy. A simple example of this downfall would be training a CNN to determine whether a baseball player is left or right handed.
  • validation_split – the fraction of images held back for validation.
# These Keras generators will pull files from disk
# and prepare them for training and validation.
augs_gen = ImageDataGenerator (
    shear_range = shear_range,
    zoom_range = zoom_range,
    horizontal_flip = True,
    validation_split = train_test_ratio
)

Now, the parameters of the ImageDataGenerator.flow_from_directory method:

  • target_size – this one bit me. It’s the size of your images as a tuple (e.g., “(150, 150)”). It expects height then width.
  • batch_size – this is the number of images loaded into the GPU RAM and trained on before updating the weights.
  • class_mode – an important argument. This sets up the targets for the model’s attempt at prediction. sparse indicates the targets will be label encoded, one integer per image.

Below lies a tale of woe I keep hinting at.

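If you have more than one class to predict, like us, you have two options: sparse or categorical.

With sparse, each target is a single integer naming the class. For example, four images belonging to classes 1, 2, 3, and 2:

1
2
3
2

With categorical, the same four targets are one-hot encoded, with one column per class:

1 2 3
1 0 0
0 1 0
0 0 1
0 1 0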
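
The relationship between the two encodings is mechanical, as this small sketch shows. It uses 0-based labels, as Keras does, and to_one_hot is a helper written for illustration rather than anything from the project.

```python
def to_one_hot(labels, num_classes):
    """Turn 0-based integer labels into one-hot rows."""
    return [[1 if column == label else 0 for column in range(num_classes)]
            for label in labels]

sparse_targets = [0, 1, 2, 1]
categorical_targets = to_one_hot(sparse_targets, num_classes=3)

for row in categorical_targets:
    print(row)
```

The rows printed match the table above: one 1 per row, in the column of the class.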

However, this is where the bug in the original code was. It had set up the targets as categorical, but it used binary_crossentropy as the loss function. This error is difficult to catch. It’s the machine-learning equivalent of the “there” and “their” error.

With the mismatch of targets and loss function there’s no help either. The model will still compile and train without problems. But the cruel combination of categorical targets and binary_crossentropy leads to an extremely high reported accuracy but an extremely bad production accuracy. The problem, as I understand it, is that each column in the categorical table above gets scored as its own yes-or-no question. If the model predicts 1 where a column is 1, it is counted “correct.” And if the model predicts 0 where a column is 0, it is counted correct too. After all, “it wasn’t 1.” Since most cells in the table are 0, a model which mostly answers 0 looks brilliant. And to be clear, the model isn’t wrong; we’ve just asked it the wrong question.

This is the quintessential “hotdog, not a hotdog” problem.

In short, if you feel your model quickly trains to an accuracy too good to be true, it is.
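
Here is a small, framework-free illustration of how the number can get inflated. It assumes the accuracy being reported scores every cell of the one-hot table independently, which is my understanding of what happens when the loss is binary_crossentropy. Both accuracy functions are simplified stand-ins I wrote, not Keras code.

```python
def binary_accuracy(targets, predictions, threshold=0.5):
    """Score every cell of the one-hot table as its own yes/no question."""
    cells = [(t, int(p > threshold))
             for t_row, p_row in zip(targets, predictions)
             for t, p in zip(t_row, p_row)]
    return sum(t == p for t, p in cells) / len(cells)

def categorical_accuracy(targets, predictions):
    """Score one answer per image: is the largest output in the right column?"""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)
    hits = [argmax(t) == argmax(p) for t, p in zip(targets, predictions)]
    return sum(hits) / len(hits)

num_classes = 10
# One image per class, one-hot encoded.
targets = [[1 if c == label else 0 for c in range(num_classes)]
           for label in range(num_classes)]
# A useless model: it always leans towards class 0, and only weakly.
predictions = [[0.2] + [0.1] * (num_classes - 1) for _ in targets]

print(binary_accuracy(targets, predictions))       # 0.9
print(categorical_accuracy(targets, predictions))  # 0.1
```

The same useless predictions score 90% one way and 10% the other, and the gap only widens as the number of classes grows.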

train_gen = augs_gen.flow_from_directory (
    train_dir,
    target_size = image_size, # THIS IS HEIGHT, WIDTH
    batch_size = batch_size,
    class_mode = 'sparse',
    shuffle = True
)

test_gen = augs_gen.flow_from_directory (
    val_dir,
    target_size = image_size,
    batch_size = batch_size,
    class_mode = 'sparse',
    shuffle = False
)

Classifier Code: Building the Model

Close to done. I’m not going to go over the design of a CNN for two reasons. I’m still learning what it all means and there are much better explanations elsewhere.

However, there are a couple of things important to us.

  • num_classes is the number of LEGOs we are trying to classify.
  • activation on the last layer controls the type of output from the CNN. It will need to correspond with the loss function and with the class_mode setting of the data generators.
  • build_model is a convenience function. It allows us to quickly build a Keras CNN model and return it to be used.
  • model.summary outputs a text diagram of the model.
  • model.compile prepares the entire model for training.
def build_model(opt, input_shape, num_classes):
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Conv2D(32, (3, 3), input_shape = input_shape))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Conv2D(64, (3, 3)))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Conv2D(128, (3, 3)))
    model.add(tf.keras.layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(tf.keras.layers.Flatten())  # this converts our 3D feature maps to 1D feature vectors
    model.add(tf.keras.layers.Dense(num_classes, activation = 'softmax'))
    return model

# Create model

selected_optimizer = get_optimizer(optimizer, learning_rate)

model = build_model(selected_optimizer, input_shape, num_classes)
model.summary()

model.compile(
    loss = 'sparse_categorical_crossentropy',
    optimizer = selected_optimizer,
    metrics = ['accuracy']
)

Classifier Code: Creating Callbacks

Before we execute training, we should set up the Keras callbacks.

These pre-written callback functions will be passed to the model and executed at important points throughout the training session.

  • ModelCheckpoint – this callback runs after the number of epochs set by val_save_step_num. It compares the latest val_loss against past scores. If it is the best val_loss yet, it will save the model and, more importantly, the weights to the best_model_weights path.
  • TensorBoard – writes logs so the training session can be visualized in TensorBoard.
best_model_weights = model_save_dir + 'base.model'

checkpoint = ModelCheckpoint(
    best_model_weights,
    monitor = 'val_loss',
    verbose = 1,
    save_best_only = True,
    mode = 'min',
    save_weights_only = False,
    period = val_save_step_num
)

tensorboard = TensorBoard(
    log_dir = model_save_dir + '/logs'
)
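
The "save only when it is the best so far" behavior can be sketched in a few lines of plain Python. This is a simplified model of the idea, not Keras's actual code; the function name and the validation losses are made up.

```python
def epochs_that_save(val_losses, period=1):
    """Return the 1-based epochs at which a 'save best only' checkpoint would save.

    val_losses: validation loss per epoch (made-up numbers in the example below).
    period: only consider saving every `period` epochs.
    """
    best = float('inf')
    saved = []
    for epoch, val_loss in enumerate(val_losses, start=1):
        if epoch % period != 0:
            continue              # not a checkpoint epoch
        if val_loss < best:       # mode = 'min': lower is better
            best = val_loss
            saved.append(epoch)
    return saved

val_losses = [0.9, 0.7, 0.8, 0.75, 0.65, 0.7]
every_epoch = epochs_that_save(val_losses, period=1)
every_other = epochs_that_save(val_losses, period=2)
print(every_epoch, every_other)
```

Note how a larger period can skip over a good epoch entirely, which is the trade-off for validating and saving less often.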

Before any Keras callbacks can be added to the training session, they must be gathered into a list, as that is how the training method expects to receive them.

callbacks = [checkpoint, tensorboard]

Classifier Code: Training

Gross, I need to rewrite this portion of the code. It is a kludgy way to restart a training session after an interruption.

It checks if you indicated you want to continue a session. It then loads the best saved model and evaluates it on the test data.

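if continue_training:
    # Load the best model saved by the checkpoint callback.
    model = tf.keras.models.load_model(best_model_weights)
    model_score = model.evaluate_generator(test_gen, steps = validation_steps)

    print('Model Test Loss:', model_score[0])
    print('Model Test Accuracy:', model_score[1])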

And here, we come to the end. The following call executes the training session. It initializes the callbacks, then trains for the number of epochs set. Each epoch, it pulls batches of data from train_gen (a DataGenerator), attempts predictions, and then updates the weights based on the outcomes. After the number of epochs set in the checkpoint callback, the model pulls data from test_gen, data it has “never” seen before, and attempts predictions. If the outcome of the test is better than the outcome of any previous test, the model is saved.

history = model.fit_generator(
    train_gen,
    steps_per_epoch  = steps_per_epoch,
    validation_data  = test_gen,
    validation_steps = validation_steps,
    epochs = epochs,
    verbose = 1,
    callbacks = callbacks
)
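
The values of steps_per_epoch and validation_steps are set elsewhere in the script. A common way to pick them, assumed here rather than taken from the project, is the number of batches needed to see every image once. The image count, split, and batch size below are hypothetical.

```python
import math

def steps_for(num_images, batch_size):
    """Batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)

total_images = 19000    # hypothetical dataset size
test_percent = 30       # hypothetical share withheld for testing
batch_size = 32         # hypothetical batch size

num_test = total_images * test_percent // 100
num_train = total_images - num_test

steps_per_epoch = steps_for(num_train, batch_size)
validation_steps = steps_for(num_test, batch_size)
print(steps_per_epoch, validation_steps)
```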

Whew, that’s it. The above model converged for me after 20 minutes to 98% validation accuracy. However, there’s lots left to do. As I’ve said before, “Just because we have high validation accuracy does not mean we will have high production accuracy.” In the future, I’ll be writing about the turntable for quickly generating training data. It’s nifty: it’s based on a NEMA17, a RAMPS kit, and an RPi with an RPi Camera. It’s the bomb-dot-com.

A LEGO Classifier -- CNN and Elbow Grease

I’ve a robot friend. To be clear, the friend is not a robot; rather, we build robots together. One of the projects we tossed about is building a LEGO sorting machine. Rockets is the friend’s name, and he (again, not a robot) teaches robotics to kids. For their designs, LEGOs are the primary component. Unfortunately, this results in much time spent preparing for an event.

He mentioned to me, “What I really need is a sorting machine.” He then proceeded to explain his plan for building one.

I was skeptical for some time, but I finally got drawn in when he talked about incorporating a deep neural-network. More specifically, a convolutional neural-network (CNN). I’d been looking for an excuse to build a CNN. This was a good one.

Anyway, these blog posts are our journal of building the LEGO sorter.

Before we get started, a note about this series: I won’t spend much time on explaining parts of the work where it is better documented elsewhere. Instead, I’m going to focus on stuff I’ve found everyone else omitting. Like, putting the neural-network to work. This one bugged me. Everyone loves to say, “Dude, my classifier has a validation accuracy of 99.999%!” That’s great, but as we found out, validation accuracy doesn’t always translate into production accuracy.


If you don’t want to listen to my rambling or want to do things the easy way, you can jump straight into the code using Google’s Colab:

This notebook is set up to download Rockets’s data and train the classifier. Thanks to Google for providing a GPU to train on and GitHub for hosting the data.

Or if you want to run the code locally, Rockets made the training data public. Just know, you’ll need a GPU.

Then jump to the code by clicking here.

The Idea

It was pretty straightforward to begin with. We’d find some images of LEGOs on the internet and then train a CNN to classify them by their part code. It was a bit naive, but that’s where most projects begin, right? Hopeful naiveté.

Anyway, we searched the webs for projects like this, as we hoped they had prepared images. Google told us several folks were doing similar work. I’m not going to list them all, only what I considered worth a read:

This is an extremely well documented project by Paco Garcia.

So, after reading a few articles, we figured we could do this. We just needed data. After a bit more searching we found the following datasets:

I wasn’t happy about these datasets. Their structures weren’t great and they were not designed to help train a classifier. But then, Rockets found Paco had actually opened his dataset to the public:

One bit more, Paco also made his code public:

Paco, you are a robot friend, too!

Alright, we were encouraged by Paco. We knew the project would be possible. However, we didn’t want to step on brownfield. We needed the green. Or if you don’t speak dev, we didn’t want to do this the easy way and replicate Paco’s work. We wanted to really beat ourselves up by doing everything from scratch.

Creating a Dataset

As I stated before, I didn’t like any datasets but Paco’s. It had real images and was meant to train a classifier. But they weren’t the LEGOs we wanted to classify. Rockets’s LEGO projects involve a lot of Technic bricks, which didn’t seem to be in Paco’s mix. So, we set out to create our own.

Our first attempt at creating training images was rendering them from .stl files found on the internet using the Python version of the Visualization Toolkit. I won’t cover it here since it was a fail, and I’ll write an article later about the stuff we tried that didn’t work.

Anyway, while I was working on it, Rockets had a brilliant plan. He created an instrument to take pictures of a LEGO on a spin plate. It used a Raspberry Pi, Pi Cam, stepper motor, and unicorn farts.

Then Rockets began taking pictures of 10 classes of LEGOs. Not sure how long this took, but shortly he pinged me saying he had 19,000 images. (Ok, ok, he might be part robot.)

I’m not going to attempt explaining the build, as I believe Rockets will do this later. Besides, about the only part I understand is the unicorn flatulence.

Alright! Now I needed to get my butt in gear and fix up the software.

Preprocessing Code

Before we could start training a CNN on Rockets’s images we needed to do some preprocessing. First, the images came in at full resolution, but we needed to crop them, as the CNN trains better on square images. Of course, the images would need to be cropped so as not to lose the target data (the LEGO).

For example

Also, the trainer would be expecting a file structure something like this:

├── test
│   ├── 2456
│   │     └── 2456_0001.jpg
│   │     └── 2456_0002.jpg
│   │     └── 2456_0003.jpg
│   │     └── ....
│   ├── 3001
│   ├── 3002
│   ├── 3003
│   ├── 3004
│   ├── 3010
│   ├── 3039
│   ├── 32064
│   ├── 3660
│   └── 3701
└── train
    ├── 2456
    ├── 3001
    ├── 3002
    ├── 3003
    ├── 3004
    ├── 3010
    ├── 3039
    ├── 32064
    ├── 3660
    └── 3701

Therefore, I’ve written a Python script to do the following:

  1. Take a path where images are stored by name of the class
  2. Load the image
  3. Resize the image to specified size
  4. Crop from the center of the image out
  5. Create a train and test folder
  6. Create sub-folders in train and test with the class name
  7. Shuffle the images in the process
  8. Save the cropped file in the appropriate folder, depending what percentage of images you want to withhold for testing.
  9. Repeat steps 2-8 for every image

Let’s jump into the code.

The full code can be found here:

But I’ll walk through the code below.

Preprocessing Code: Needed Libraries

import os
import glob
import cv2
import random

The only non-standard Python library we are using is:

This may be a bit tricky depending on which OS you are using and whether you are using Anaconda or straight Python. However, the following is what we used:

pip install

If you have any trouble loading the cv2 library, it probably means there was an issue installing OpenCV. Just let me know in the comments and I can help debug.

Preprocessing Code: Processing Parameters

The following control the flow of preprocessing:

  • dry_run: if set to true, it does not save the images, but does everything else
  • gray_scale: converts the images to gray-scale.
  • root_path: the root folder of the project
  • show_image: shows the before and after of the image.
  • output_img_size: adjust this to the size of your desired output image
  • grab_area: half the side length, in pixels, of the square cropped from the center of the original image before resizing (500 grabs a 1000 x 1000 square)
  • train_test_split: the fraction of images to withhold for testing
  • shuffle_split: should the images be shuffled in the process
  • part_numbers: a list of all the class folders contained in the input
# Parameters

dry_run                 = False # If true, will print output directory.
gray_scale              = True

root_path               = './data/'
input_path              = f'{root_path}raw/size_1080/'
output_path             = f'{root_path}cropped/'

show_image              = False

output_img_size         = (300, 300)
grab_area               = 500
train_test_split        = 0.3
shuffle_split           = True

part_numbers            = [
    '2456', '3001', '3002', '3003', '3004',
    '3010', '3039', '32064', '3660', '3701',
]

Below is the main loop. It repeats for every class folder listed in part_numbers.

for part_number in part_numbers:

    part_input_path  = f'{input_path}{part_number}/'
    # Get input file paths.
    image_files = glob.glob(f'{part_input_path}*.jpg')
    num_files = len(image_files)

    # Image index.
    index = 0

    # If true, the images will be loaded and then split at random.
    if shuffle_split:
        file_index = random.sample(range(1, num_files), num_files - 1)
    else:
        file_index = range(1, num_files)
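
The index above assumes the files are numbered 0001.jpg, 0002.jpg, and so on without gaps, and range(1, num_files) stops at num_files - 1. If that assumption does not hold for your images, one alternative is to shuffle the file paths themselves. This is a sketch of that idea, not what the script does; the helper name, the seed, and the example paths are hypothetical.

```python
import random

def ordered_files(image_files, shuffle=True, seed=None):
    """Return the file paths in processing order.

    Shuffling the paths (rather than numeric indexes) avoids assuming
    anything about how the files are named.
    """
    files = sorted(image_files)              # glob order is not guaranteed
    if shuffle:
        random.Random(seed).shuffle(files)   # a fixed seed makes the split repeatable
    return files

# Hypothetical paths standing in for the result of glob.glob().
example = [f'./data/raw/size_1080/3001/{n:04d}.jpg' for n in range(1, 8)]
shuffled = ordered_files(example, shuffle=True, seed=42)
print(shuffled)
```

Passing a seed means the same images land in test and train every time you rerun the preprocessing.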

This is the inner loop. It loads each of the image files in the class folder, modifies it, and saves it to the output folders.

    for file_num in file_index:
        # Increment the file index.
        index += 1
        # Load the image
        input_file_path = f'{input_path}{part_number}/{str(file_num).zfill(4)}.jpg'
        print(f'LOADED: {input_file_path}')
        # Crop raw image from center.
        img = cv2.imread(input_file_path)

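        # Get the center of the image. img.shape is (rows, columns, channels),
        # so the first value is the height (y) and the second is the width (x).
        c_y, c_x = img.shape[0] // 2, img.shape[1] // 2
        img = img[c_y - grab_area: c_y + grab_area, c_x - grab_area: c_x + grab_area]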
        # Resize image
        img = cv2.resize(img, output_img_size, interpolation = cv2.INTER_AREA)
        # Should we convert it to grayscale?
        if gray_scale:
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Show to user.
        if show_image:
            cv2.imshow('image', img)
            cv2.waitKey(0)  # Wait for a key press so the window is actually drawn.

        # Determine if it should be output to train or test.
        test_or_train = 'train'        
        if index < int(num_files * train_test_split): 
            test_or_train = 'test'
        # Prepare the output folder.
        if gray_scale:
            part_output_folder = f'{output_path}gray_scale/{test_or_train}/{part_number}/'
        else:
            part_output_folder = f'{output_path}color/{test_or_train}/{part_number}/'
        # Make the output directory, if it doesn't exist.
        if not os.path.exists(part_output_folder):
            os.makedirs(part_output_folder)

        # Create part path.
        part_image_path = f'{part_output_folder}{part_number}_{index}.jpg'
        # Output
        if dry_run:
            print(f'Would have saved to: {part_image_path}')
        else:
            print(f'SAVED: {part_image_path}')
            cv2.imwrite(part_image_path, img)
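
The center crop is the step most likely to go wrong, because NumPy images are indexed rows first (height, then width). Here is a small sketch of the arithmetic without OpenCV. The helper name is mine, the clamping to the image size is an extra safety step the script does not have, and the 1920 x 1080 frame is an assumed example size.

```python
def center_crop_bounds(height, width, grab_area):
    """Return (top, bottom, left, right) for a square crop around the center.

    The crop is 2 * grab_area pixels per side, shrunk if the image is too small.
    """
    half = min(grab_area, height // 2, width // 2)
    c_y, c_x = height // 2, width // 2
    return c_y - half, c_y + half, c_x - half, c_x + half

# Assumed example: a 1920 x 1080 frame, which has shape (1080, 1920, 3).
top, bottom, left, right = center_crop_bounds(1080, 1920, 500)
print(top, bottom, left, right)   # used as img[top:bottom, left:right]

# A frame smaller than the requested crop gets the largest centered square instead.
small = center_crop_bounds(600, 800, 500)
print(small)
```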

Fairly straightforward. Just make sure to run the script from the main directory. For example:

└── <--- run from here
└── data
    ├── test
    │   ├── 2456
    │   │     └── 2456_0001.jpg
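
Once the script has run, a quick sanity check is to count how many images ended up in each class folder for each split. The sketch below is one way to do that. The helper name is hypothetical, and it builds a tiny throwaway folder tree so it can run without the real dataset.

```python
import os
import tempfile

def count_images(split_root):
    """Map each class folder under split_root to its number of .jpg files."""
    counts = {}
    for class_name in sorted(os.listdir(split_root)):
        class_dir = os.path.join(split_root, class_name)
        if os.path.isdir(class_dir):
            counts[class_name] = sum(
                1 for name in os.listdir(class_dir) if name.lower().endswith('.jpg')
            )
    return counts

# Build a tiny fake tree: 7 train images and 3 test images for class 3001.
with tempfile.TemporaryDirectory() as root:
    for split, n in (('train', 7), ('test', 3)):
        class_dir = os.path.join(root, split, '3001')
        os.makedirs(class_dir)
        for i in range(n):
            open(os.path.join(class_dir, f'3001_{i}.jpg'), 'w').close()
    train_counts = count_images(os.path.join(root, 'train'))
    test_counts = count_images(os.path.join(root, 'test'))

print(train_counts, test_counts)
```

On the real data you would point count_images at the train and test folders and check that the ratio roughly matches train_test_split.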

Or, if you don’t want to do it the hard way, Rockets has made his images available.


Next, I’m going to dive into the TensorFlow CNN code. Stay tuned, my robot friends!

Setup a Local MySQL Database

The last two articles have been about getting oriented to SQL. However, the information in them will disappear quickly if we don’t give you a way to practice on data meaningful to you. Let’s face it, as much fun as it is to find out random employees’ salaries, those don’t mean anything to you.

This article will show you how to set up a copy of MySQL Server on your PC, connect to it, load data from a CSV, and query those data. There’s a lot to get done, so let’s get started.

Local MySQL Server Setup

Each of the three operating systems is a little different in how you must go about setting up a local copy of MySQL Server. Unfortunately, Windows is the most complex. Anyway, feel free to skip to the appropriate section.


Windows

First, download the MySQL MSI Installer.

After you’ve downloaded it, open the file.

If you are prompted to “Upgrade,” go ahead and say “Yes.”

The installer is a bit confusing, but don’t worry, most everything is fine left on its default.

Click on MySQL Server then the Add button. Add “MySQL Server” and “Connector/ODBC x64.” Then click “Next.” You will see an installation summary; click on “Execute” and wait for the download to finish and the install wizard to begin.

As I stated, most of the install wizard questions we will leave as default.

On the “Accounts and Roles” section you will need to decide on your password for the MySQL Server on your local PC. I obviously didn’t pick a great one. MySQL Server will automatically set up a user with the name of root and the password you set here. This root user will have permissions to do anything to the server.

Execute the installer and let it finish.

Once it finishes you should now have MySQL Server installed on your local PC. Skip to the last section to test it out.


Mac

Mac’s a bit simpler.

Download the .dmg installer.

Click on “No thanks, just start my download” and when the download is finished, double click on it.

Double click on the installer. You will need to enter your system password for the installer to set up MySQL Server, but you will also need to provide a password for the MySQL Server root user. Don’t confuse the two, unless you plan for them to be the same.

Once it finishes you should now have MySQL Server installed on your Mac. Skip to the last section to test it out.

Testing your Local SQL Server

Go ahead and open MySQL Workbench and let’s connect to this new local server.

Click on the “New Connection” icon and leave everything default except the “Connection Name.” There, enter localhost.

Double click on the new connection and enter the password you created during installation. Voila!

Let’s run a command to make sure everything is working.

SHOW databases;

You should see:

Loading CSV

Create a database

CREATE DATABASE name_of_your_database;

Before we create a table, make sure we are using the created database.

USE name_of_your_database;

Now, we need to go over a bit of boring stuff before we get to loading the CSV. Sorry, I’ll try to keep it brief.


In SQL, every field has something called a “datatype.” You can think of a datatype as a tag on your data telling the computer how to read them.

Ultimately, a computer can’t make sense of any human words. It has to convert everything into 0s and 1s before it understands it. If this conversion were left up to the computer entirely, it might see a word and say, “Oh, yah, this is one of those French words,” when it is actually English; thus, the conversion to 0s and 1s would be incorrect.

You may have encountered this in a spreadsheet. If you open a spreadsheet and see something like

The data I actually provided the spreadsheet were:


Notice the zero in front of 6579, this was due to the computer saying, “Oh, these data look like numbers–and since the human didn’t tell me otherwise, I’m going to treat them like numbers. And, well, it is perfectly valid to drop the leading zero of a number.”
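
The same thing happens in most programming languages, not just spreadsheets. Here is a tiny Python illustration, using 06579 as the example value: read as a number the leading zero disappears, read as text it stays put.

```python
raw = '06579'             # what the human typed

as_number = int(raw)      # treated as a number: the leading zero is dropped
as_text = str(raw)        # treated as text: kept exactly as typed

print(as_number)
print(as_text)
```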

I wish all datatypes were this simple; however, the above example is about as simple as it gets. We can try to skip over a lot of nuances of datatypes and focus on the five we will probably see the most:

  • DATE
  • TIME
  • INT (short for integer)
  • FLOAT (short for floating-point number)
  • CHAR (short for character)

Here is what samples of the above data would look like in a spreadsheet:

DATE        TIME                 INT  FLOAT  CHAR
2019-10-01  2019-10-01 12:01:22  42   42.4   The answer to it all.


Dates are pretty straightforward, they store a year, month, and day as a number. However, when we retrieve this number it is put in the human readable format listed above.


Time is exactly like DATE, but it also includes hours, minutes, and seconds (sometimes milliseconds).


An INT stores a whole number no bigger than 2,147,483,647. However, one thing an INT cannot do is store partial numbers. For example, if we try to store 0.5 in an INT field it will probably get converted to 1.


FLOATs fill in where INTs fail. That is, a FLOAT stores a number, but only up to the precision you specify. For example, if we tried to store a 0.5 in a FLOAT with two precision points we’d be fine. However, if we tried to store 0.4567 in a FLOAT with only two precision points, then it would be converted to 0.46, or rounded up.
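
The rounding described for INT and FLOAT can be mimicked with Python’s decimal module. This is only an illustration of the rounding, not of how MySQL stores numbers internally; the helper name is made up, and I am assuming halves round up, as in the 0.5 example.

```python
from decimal import Decimal, ROUND_HALF_UP

def store_with_precision(value, places):
    """Round a decimal string to `places` digits after the point, halves rounding up."""
    quantum = Decimal(10) ** -places      # places=2 -> Decimal('0.01'), places=0 -> Decimal('1')
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

as_int = store_with_precision('0.5', 0)         # like storing 0.5 in an INT
fits = store_with_precision('0.5', 2)           # fits within two precision points
rounded = store_with_precision('0.4567', 2)     # too precise, so it gets rounded

print(as_int, fits, rounded)
```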


CHAR is meant to store human readable text. When you put data into a CHAR field, the SQL program knows this is human readable information and doesn’t try to figure it out at all. It leaves it literally as it is. This is why CHARs are known as “literals.” They are also called “strings,” because the computer sees them as a bunch of characters strung together.

SQL Datatypes

In SQL there are a lot of datatypes, however, some you may never need to use. One way SQL is a bit different than a spreadsheet is it wants to know ahead of time the size it needs to make the field.

CHAR Revisited

This will mainly impact us when dealing with CHAR. When the SQL program creates a CHAR field it wants to know the maximum number of characters which will ever go into the field.

For example:

  • CHAR(19) could hold the following: <-------19-------->
  • CHAR(5) could hold the following: <-5->

One important note: if you put a single character in a CHAR(5) field, then the SQL program will fill in the other four characters with spaces. In short, a CHAR field will always be full.
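
In MySQL specifically, stored CHAR values are right-padded with spaces up to the declared length. A rough Python imitation of a fixed-width field looks like this. The function name is hypothetical, and raising an error for values that are too long is my simplification (it mirrors what MySQL does in strict mode).

```python
def store_char(value, length):
    """Mimic a fixed-width CHAR(length) field: reject long values, pad short ones."""
    if len(value) > length:
        raise ValueError(f'{value!r} does not fit in CHAR({length})')
    return value.ljust(length)   # pad on the right with spaces

stored = store_char('a', 5)
print(repr(stored), len(stored))
```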


There is another type of character field whose stored size varies with the data you put in it, up to a maximum you set. The VARCHAR datatype stands for “variable character” field. It will allow you to store up to 65,535 characters on MySQL. This is around 3 pages of text.


Why have CHAR at all? Shouldn’t we always use VARCHAR for everything just in case? Well, usually, but not always.

Often, when you design a database you want to make it as efficient as possible (I mean, it’s going to be a successful business product, right?). The maximum size of the data a human will try to stuff in the field is important to the SQL program, as it tries to store data in such a way that it minimizes space used and maximizes efficiency in retrieving the data.

In short, CHAR has a few advantages for your database. Take Social Security Numbers: if your database has to store these data, then the field should probably be a CHAR, as these data have historically been 9 characters (11 if you include dashes).

Pop quiz, why don’t we store a Social Security Number as an INT?

Creating the Table

Ok, I’ve put you through a crash course of datatypes to get you to this point.

We are going to:

  1. Create a database called tasksDB
  2. Activate tasksDB
  3. Create a table on tasksDB, setting each field’s datatype
  4. Then import a CSV into this table
  5. Lastly, we will write a query against the table
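
If you want to see the whole flow in one place before touching MySQL, the five steps above can be rehearsed in Python with the built-in sqlite3 module. To be clear, this is a stand-in and not MySQL: SQLite is a different database with looser datatypes, and the CSV rows below are made up. The column names follow the tasks table defined later in this article; the task_id definition is an assumption.

```python
import csv
import io
import sqlite3

# Stand-in for task.csv; the rows are made up for illustration.
csv_text = """title,start_date,due_date,status,priority,description
Write post,2019-10-01,2019-10-05,1,2,Draft the MySQL article
Film demo,2019-10-02,2019-10-08,0,1,Record the Workbench steps
"""

# Steps 1-2: create and "use" a database (in memory, so nothing is left behind).
conn = sqlite3.connect(':memory:')

# Step 3: create the table, setting each field's datatype (SQLite flavors).
conn.execute("""
    CREATE TABLE tasks (
        task_id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed definition
        title       TEXT NOT NULL,
        start_date  TEXT,
        due_date    TEXT,
        status      INTEGER NOT NULL,
        priority    INTEGER NOT NULL,
        description TEXT
    )
""")

# Step 4: import the CSV.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [
    (r['title'], r['start_date'], r['due_date'],
     int(r['status']), int(r['priority']), r['description'])
    for r in reader
]
conn.executemany(
    'INSERT INTO tasks (title, start_date, due_date, status, priority, description) '
    'VALUES (?, ?, ?, ?, ?, ?)',
    rows,
)

# Step 5: query the table.
result = conn.execute(
    'SELECT task_id, title, priority FROM tasks ORDER BY priority'
).fetchall()
print(result)
conn.close()
```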

Ready!? Let’s do it!

Creating Database

Open Workbench, type, and run the following:


CREATE DATABASE tasksDB;

SHOW databases;

USE tasksDB;

CREATE TABLE tasks (
    task_id INT AUTO_INCREMENT, -- assumed definition for the key column
    title VARCHAR(255) NOT NULL,
    start_date DATE,
    due_date DATE,
    status TINYINT NOT NULL,
    priority TINYINT NOT NULL,
    description TEXT,
    PRIMARY KEY (task_id)
);

SELECT * FROM tasks;

LOAD DATA INFILE './task.csv'  INTO TABLE tasks;

SELECT * FROM tasks;