Ladvien's Lab

Latest Posts

Training a Toxic Comment Detector

I'm writing up my learning-notes from implementing a "toxic comment" detector using a convolutional neural network (CNN). This is a common project across the interwebs; however, the articles I've seen on the matter leave a few bits out. So, I'm attempting to augment public knowledge--not write a comprehensive tutorial.

A common omission is what the data look like as they travel through pre-processing. I'll try to show how the data look before falling into the neural-net black-hole. However, I'll stop short of reviewing the CNN setup, as that is explained much better elsewhere. I've put all the original code, relevant project links, tutorial links, and other resources towards the bottom.

The Code

Code: Imports

from __future__ import print_function

import numpy as np
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Dense, Input, GlobalMaxPooling1D, Conv1D, Embedding, MaxPooling1D
from keras.models import Model
from keras.initializers import Constant
import gensim.downloader as api
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score

The above code includes several packages which would need to be downloaded. The easiest way is to use pip .

pip install keras
pip install gensim
pip install pandas

Code: Variables

BASE_DIR = 'your project directory'
TRAIN_TEXT_DATA_DIR = BASE_DIR + 'train.csv'
MAX_SEQUENCE_LENGTH = 100
MAX_NUM_WORDS = 20000
EMBEDDING_DIM = 300
VALIDATION_SPLIT = 0.2

The above variables define the preprocessing actions and the neural-network.

TRAIN_TEXT_DATA_DIR

The directory containing the data file train.csv

MAX_SEQUENCE_LENGTH

The toxic_comment data set contains comments collected from Wikipedia. MAX_SEQUENCE_LENGTH is used in the preprocessing stages to truncate any comment longer than MAX_SEQUENCE_LENGTH words. For example, a comment like:

You neeed to @#$ you mother!$@#$&...

Probably doesn't need much more for the network to discern it's a toxic comment. Also, if we build the network around the longest comment, it will become unnecessarily large and slow. Much like the human brain (see Overchoice ), we should provide only as much information as is needed to make a good decision.

MAX_NUM_WORDS

This variable is the maximum number of words to include--or, vocabulary size.

Much like truncating the sequence length, the maximum vocabulary should not be overly inclusive. The number 20,000 comes from a "study" stating an average person only uses 20,000 words. Of course, I've not found a primary source stating this--not saying it's not out there, but I've not found it yet. (Halfhearted search results are in the appendix.)

Regardless, it seems to help us justify keeping the NN nimble.

EMBEDDING_DIM

In my code, I've used gensim to download pre-trained word embeddings. But beware, not all pre-trained embeddings have the same number of dimensions. This variable defines the size of the embeddings used. Please note, if you use embeddings other than glove-wiki-gigaword-300 , you will need to change this variable to match.

VALIDATION_SPLIT

A bit of code toward the end will split our data into training and validation sets. This percentage represents how much of the data to hold back for validation.

Code: Load Embeddings

print('Loading word vectors.')
# Load embeddings
info = api.info()
embedding_model = api.load("glove-wiki-gigaword-300")

The info object describes the gensim embeddings available. You can use any of the listed embeddings in the format api.load('name-of-desired-embedding') . One nice feature of gensim 's api.load is it will automatically download the embeddings from the Internet and load them into Python. Of course, once they've been downloaded, gensim will load the local copy. This makes it easy to experiment with different embedding layers.
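If you'd like to browse what's available before committing to a download, here's a minimal sketch (the exact model names come from gensim's registry):

import gensim.downloader as api

# List every pre-trained model gensim's downloader knows about.
for model_name in sorted(api.info()['models']):
    print(model_name)

# While experimenting, a smaller embedding downloads much faster.
small_model = api.load("glove-wiki-gigaword-50")  # 50-dimensional GloVe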

Code: Process Embeddings

index2word = embedding_model.index2word   # list: position -> word
vocab_size = len(embedding_model.vocab)
word2idx = {}
# Invert the list into a word -> index dictionary.
for index in range(vocab_size):
    word2idx[index2word[index]] = index

The two lookup objects, index2word and word2idx , are key to using the embeddings.

The word2idx is a dictionary where the keys are the words contained in the embedding and the values are the integers they represent.

word2idx = {
    "the": 0,
    ",": 1,
    ".": 2,
    "of": 3,
    "to": 4,
    "and": 5,
    ....
    "blah": 12984,
    ...
}  

index2word is a list where the values are the words, and a word's position in the list is its index--the same integer found in word2idx .

index2word = ["the", ",", ".", "of", "to", "and", ...]

These will be used to turn our comment strings into integer vectors.
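For example, here's a toy sketch (not part of the pipeline) of what that conversion looks like:

comment = "you need to stop"
# One integer per word; words missing from the embeddings are skipped.
vector = [word2idx[word] for word in comment.lower().split() if word in word2idx]
print(vector)  # a list of integers, one per known word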

After this bit of code we should have three objects.

  1. embedding_model -- Pre-trained relationships between words; a 400,000 x 300 matrix (vocabulary size x embedding dimensions).
  2. index2word -- A list containing all the words. A word's index in this list corresponds to its row in the word embeddings.
  3. word2idx -- A dictionary whose keys are the words as strings and whose values are the integers representing those words. These integers correspond with the indices in the embedding_model . Essentially, the reverse of index2word .

Code: Get Toxic Comments Labels

print('Loading Toxic Comments data.')
toxic_comments = pd.read_csv(TRAIN_TEXT_DATA_DIR)

print('Getting Comment Labels.')
prediction_labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
labels = toxic_comments[prediction_labels].values

This loads train.csv as a Pandas dataframe called toxic_comments . We then grab all of the comment labels using their column names. This becomes a second numpy matrix called labels .

We will use the text in the toxic_comments dataframe to predict the data found in the labels matrix. That is, toxic_comments will be our x_train and labels our y_train .
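A quick shape check makes the x / y relationship concrete (the row count below is from Kaggle's train.csv ; treat it as illustrative):

print(toxic_comments.shape)  # (159571, 8): id, comment_text, and the six labels
print(labels.shape)          # (159571, 6): one row per comment, one column per label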

You may notice the labels are also included in toxic_comments . But they will not be used as inputs, as we will only be taking the comment_text column to become our sequences here in a moment.

toxic_comments dataframe

|   | id | comment_text | toxic | severe_toxic | obscene | threat | insult | identity_hate |
|---|----|--------------|-------|--------------|---------|--------|--------|---------------|
| 5 | 00025465d4725e87 | Congratulations from me as well, use the tools well. · talk | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 0002bcb3da6cb337 | COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK | 1 | 1 | 1 | 0 | 1 | 0 |
| 7 | 00031b1e95af7921 | Your vandalism to the Matt Shirvington article has been reverted. Please don't do it again, or you will be banned. | 0 | 0 | 0 | 0 | 0 | 0 |

labels ( y_train ) numpy matrix

| 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 |

Code: Convert Comments to Sequences

print('Tokenizing and sequencing text.')

comments = toxic_comments['comment_text'].fillna("<DT>").values

tokenizer = Tokenizer(num_words=MAX_NUM_WORDS)
tokenizer.fit_on_texts(comments)                    # Learn word frequencies.
sequences = tokenizer.texts_to_sequences(comments)  # Words -> integers.
word_index = tokenizer.word_index

print('Found %s sequences.' % len(sequences))

The Tokenizer object comes from the Keras API. It takes chunks of text, cleans them, and then converts each word to a unique integer value.

The num_words argument tells the Tokenizer to keep only the num_words most frequent words. This makes it necessary to run fit_on_texts() on the targeted texts before using the Tokenizer. The fit function counts the occurrences of each word throughout all the texts provided, then ranks the words by frequency. This frequency rank can be found in the tokenizer.word_index property.

For example, looking at the dictionary below, if num_words = 7 , every word from "i" on would be excluded--only words ranked 1 through 6 are kept.

{
    "the": 1,
    "to": 2,
    "of": 3,
    "and": 4,
    "a": 5,
    "you": 6,
    "i": 7,
    "is": 8,
    ...
    "hanumakonda": 210334,
    "956ce": 210335,
    "automakers": 210336,
    "ciu": 210337
}
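Here's a toy sketch showing the frequency ranking and the num_words cutoff in action (separate from the project code):

from keras.preprocessing.text import Tokenizer

toy_texts = ["the cat sat", "the cat ran", "the dog sat"]
toy_tokenizer = Tokenizer(num_words=3)
toy_tokenizer.fit_on_texts(toy_texts)
print(toy_tokenizer.word_index)
# {'the': 1, 'cat': 2, 'sat': 3, 'ran': 4, 'dog': 5}
print(toy_tokenizer.texts_to_sequences(["the dog sat"]))
# [[1]] -- only words ranked below num_words survive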

Also, as we are loading the data, we are filling any missing values with a dummy token (i.e., "<DT>" ). This probably isn't the best way to handle missing values; however, given the amount of data, it's probably best to try training the network using this method first, then come back and handle na values more strategically. Diminishing returns and all that.

Code: Padding

data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)

This is an easy one. It pads our sequences so they are all the same length. The pad_sequences function is part of the Keras library. A couple of important arguments have default values: padding and truncating .

Here's the Keras docs explanation:

padding: String, 'pre' or 'post': pad either before or after each sequence.

truncating: String, 'pre' or 'post': remove values from sequences larger than maxlen, either at the beginning or at the end of the sequences.

Both arguments default to pre .

Lastly, the maxlen argument controls where padding and truncation happen. And we are setting it with our MAX_SEQUENCE_LENGTH variable.
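Here's a small sketch of both defaults at work (toy sequences, not project code):

from keras.preprocessing.sequence import pad_sequences

toy_seqs = [[5, 8, 2],                   # shorter than maxlen -> padded
            [3, 1, 4, 1, 5, 9, 2, 6]]    # longer than maxlen -> truncated
print(pad_sequences(toy_seqs, maxlen=5))
# [[0 0 5 8 2]     zeros added at the front ('pre' padding)
#  [1 5 9 2 6]]    values dropped from the front ('pre' truncating)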

padding-sequences-before-after

Code: Applying Embeddings

num_words = min(MAX_NUM_WORDS, len(word_index)) + 1
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word_index.items():
    # Words ranked beyond our vocabulary cap don't get a row.
    if i >= num_words:
        continue
    try:
        embedding_vector = embedding_model.get_vector(word)
        embedding_matrix[i] = embedding_vector
    except KeyError:
        # Word not in the pre-trained embeddings; row i stays all zeros.
        continue

Here's where stuff gets good. The code above will take all the words from our tokenizer , look up the word-embedding (vector) for each word, then add this to the embedding matrix . The embedding_matrix will be converted into a keras.layers.Embedding object.

I think of an Embedding layer as a transformation tool sitting at the top of our neural-network. It takes the integer representing a word and outputs its word-embedding vector. It then passes the vector into the neural-network. Simples!

Probably best to visually walk through what's going on. But first, let's talk about the code before the for-loop .

num_words = min(MAX_NUM_WORDS, len(word_index)) + 1

This gets the maximum number of words to be added to our embedding layer. If the tokenizer found fewer words than our "average English speaker's vocabulary"--20,000--we'll use all of the words found in the tokenizer. Otherwise, the for-loop skips any word ranked beyond num_words . And remember, the tokenizer has kept the words in order of their frequency--so, the words which are lost aren't as critical. (The + 1 reserves row 0, which is never assigned a word and lines up with the 0 value used for padding.)

embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))

This initializes our embedding_matrix , which is a numpy array with all values set to zero. Note, if the EMBEDDING_DIM size does not match the size of the word-embeddings loaded, the code will execute, but you will get a bad embedding matrix. Further, you might not notice until your network isn't training. I mean, not that this happened to me --I'm just guessing it could happen to someone .

for word, i in word_index.items():
    if i >= num_words:
        continue
    try:
        embedding_vector = embedding_model.get_vector(word)
        embedding_matrix[i] = embedding_vector
    except KeyError:
        continue

Here's where the magic happens. The for-loop iterates over the words in the tokenizer 's word_index . It attempts to find each word in the word-embeddings, and if found, it adds the vector to the embedding matrix at the row matching the word's index in the word_index object. If the word isn't in the pre-trained embeddings, the KeyError is caught and that row stays all zeros.

Confused? Me too. Let's visualize it.

Let's walk through the code with a word in mind: "of".

for word, i in word_index.items():

By now the for-loop is two words in; "the" and "to" have already been added. Remember, the tokenizer's word_index starts counting at 1, so for this iteration word = "of" and i = 3 .

embedding_vector = embedding_model.get_vector(word)

The word-embedding for the word "of" is

-0.076947, -0.021211, 0.21271, -0.72232, -0.13988, -0.12234, ...

This list is contained in a numpy.array object.

embedding_matrix[i] = embedding_vector

Lastly, the word-embedding vector representing "of" gets added to row 3 of the embedding matrix--the fourth row, since indexing starts at 0 and row 0 is never assigned a word.

Here's how the embedding matrix should look after the word "of" is added. (The index and word columns are added for readability; row 0 never gets a word, so it stays all zeros and effectively serves as the padding row.)

| index | word | 1 | 2 | 3 | 4 | ... |
|-------|------|---|---|---|---|-----|
| 0 | (padding) | 0 | 0 | 0 | 0 | ... |
| 1 | the | 0.04656 | 0.21318 | -0.0074364 | -0.45854 | ... |
| 2 | to | -0.25756 | -0.057132 | -0.6719 | -0.38082 | ... |
| 3 | of | -0.076947 | -0.021211 | 0.21271 | -0.72232 | ... |
| ... | ... | ... | ... | ... | ... | ... |

Also, for a deeper visualization, check the image below. The picture labeled "word embeddings" is actually the output of our embedding_matrix . The big difference? Words from our corpus (all the text contained in the comment_text column) which are not found in the gensim embedding_model get rows of all zeroes, and embedding_model words which never appear in our corpus aren't included at all.

embedding-matrix

Code: Creating Embedding Layer

embedding_layer = Embedding(num_words,
                            EMBEDDING_DIM,
                            embeddings_initializer=Constant(embedding_matrix),
                            input_length=MAX_SEQUENCE_LENGTH,
                            trainable=False)

Here we are creating the first layer of our NN. The primary parameter passed into the Keras Embedding class is the embedding_matrix , which we created above. However, there are several other attributes of the embedding_layer we must define. Keep in mind our embedding_layer will take an integer representing a word as input and output a vector, which is the word-embedding.

First, the embedding_layer needs to know the input dimensions: how many distinct words it must be able to look up. This has to match the number of rows in our embedding_matrix , which is num_words . (Passing the full len(word2idx) --all 400,000 GloVe words--would disagree with the 20,001-row matrix we just built, and Keras would complain.)

One note on the layer's input: there are two "input" arguments for the keras.layers.Embedding class initializer which can be confusing. They are input_dim and input_length . The input_dim is the number of possible values provided to the layer (the vocabulary size). The input_length is how many values will be passed in a sequence.

Here are the descriptions from the Keras documentation:

input_dim

int > 0. Size of the vocabulary, i.e. maximum integer index + 1.

input_length

Length of input sequences, when it is constant. This argument is required if you are going to connect Flatten then Dense layers upstream (without it, the shape of the dense outputs cannot be computed).

In our case, the input_dim will be the vocabulary size and input_length is the number of words in a sequence, which should be MAX_SEQUENCE_LENGTH . This is also why we padded comments shorter than MAX_SEQUENCE_LENGTH ; the embedding layer expects a consistent size.

Next, the embedding_layer needs to know the dimensions of the output. The output is going to be a word-embedding vector, which should be the same size as the word embeddings loaded from the gensim library. We defined this size with the EMBEDDING_DIM variable.

Lastly, the trainable option is set to False so the word-embedding relationships are not updated as we train our toxic_comment detector. You could set it to True , but come on, let's be honest, are we going to do better than the Stanford folks who trained these GloVe embeddings?
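If you want to convince yourself the layer behaves as described, here's an optional sanity check--a sketch--wrapping the layer in a throwaway model to inspect shapes:

# The layer should map (batch, MAX_SEQUENCE_LENGTH) integers
# to (batch, MAX_SEQUENCE_LENGTH, EMBEDDING_DIM) floats.
check_input = Input(shape=(MAX_SEQUENCE_LENGTH,))
check_model = Model(check_input, embedding_layer(check_input))
print(check_model.output_shape)  # (None, 100, 300)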

Code: Splitting the Data

nb_validation_samples = int(VALIDATION_SPLIT * data.shape[0])
x_train = data[:-nb_validation_samples]
y_train = labels[:-nb_validation_samples]
x_val = data[-nb_validation_samples:]
y_val = labels[-nb_validation_samples:]

Here we are forming our data as inputs. The data matrix becomes x_train and x_val ; the labels matrix becomes y_train and y_val . And here marks the end of pre-processing.
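One caveat: this split simply takes the tail of the file for validation. If the CSV has any ordering to it, shuffling first gives a fairer split--a sketch, not in the original code:

# Shuffle data and labels together before carving off the validation set.
indices = np.arange(data.shape[0])
np.random.shuffle(indices)
data = data[indices]
labels = labels[indices]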

But! Let's recap before you click away:

  1. Load the word-embeddings. These are pre-trained word relationships; a 400,000 x 300 matrix.
  2. Create two look up objects: index2word and word2idx
  3. Get our toxic_comment and labels data.
  4. Create a tokenizer object and fit it on the comment text.
  5. Convert the comment_text column from the toxic_comments dataframe into the sequences list.
  6. Pad all the sequences so they are the same size.
  7. Look up the word-embedding vector for each unique word in sequences . Store the word-embedding vector in the embedding_matrix . If the word is not found in the embeddings, leave its row all zeroes. Also, limit the embedding-matrix to the 20,000 most used words.
  8. Create a Keras Embedding layer from the embedding_matrix
  9. Split the data for training and validation.

And that's it. The prepared embedding_layer will become the first layer in the network.

Code: Training

Like I stated at the beginning, I'm not going to review training the network, as there are many better explanations--and I'll link them in the Appendix. However, for those interested, here's the rest of the code.

input_ = Input(shape=(MAX_SEQUENCE_LENGTH,))
x = embedding_layer(input_)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 5, activation='relu')(x)
x = MaxPooling1D(5)(x)
x = Conv1D(128, 3, activation='relu')(x)
x = GlobalMaxPooling1D()(x)
x = Dense(128, activation='relu')(x)
output = Dense(len(prediction_labels), activation='sigmoid')(x)
model = Model(input_, output)
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])

print('Training model.')
# happy learning!
history = model.fit(x_train, y_train, epochs=2, batch_size=512, validation_data=(x_val, y_val))
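As an aside, the matplotlib and roc_auc_score imports from the top of the script come into play here, if you'd like to eyeball the training curves and score the validation set--a quick sketch:

# Plot training vs. validation accuracy across epochs.
plt.plot(history.history['acc'], label='train')
plt.plot(history.history['val_acc'], label='validation')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()

# ROC AUC is friendlier than raw accuracy here, given how few comments are toxic.
predictions = model.predict(x_val)
print('Validation ROC AUC: %.4f' % roc_auc_score(y_val, predictions))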

Oh! There's one more bit I'd like to go over, which most other articles have left out. Prediction.

Code: Predictions

I mean, training a CNN is fun and all, but how does one use it? Essentially, it comes down to repeating the steps above, but with less data.

def create_prediction(model, sequence, tokenizer, max_length, prediction_labels):
    # Convert the sequence to tokens and pad it.
    sequence = tokenizer.texts_to_sequences(sequence)
    sequence = pad_sequences(sequence, maxlen=max_length)

    # Make a prediction
    sequence_prediction = model.predict(sequence, verbose=1)

    # Round the prediction probabilities into binary labels.
    sequence_prediction = pd.DataFrame(sequence_prediction).round(0)

    # Label the predictions
    sequence_prediction.columns = prediction_labels
    return sequence_prediction

# Create a test sequence
sequence = ["""
            Put your test sentence here.
            """]
prediction = create_prediction(model, sequence, tokenizer, MAX_SEQUENCE_LENGTH, prediction_labels)

The function above needs the following arguments:

  * The pre-trained model . This is the Keras model we just trained.
  * A sequence you'd like to determine whether it is "toxic".
  * The tokenizer , which is used to encode the prediction sequence the same way as the training sequences.
  * max_length , which must be the same as the maximum size of the training sequences.
  * The prediction_labels , a list of strings containing the human-readable labels for the predicted tags (e.g. "toxic", "severe_toxic", "insult", etc.)

Really, the function takes all the important parts of our pre-processing and reuses them on the prediction sequence.

One piece of the function you might tweak is the .round(0) . I've put this there to convert the predictions into binary. That is, if the prediction for a sequence is .78 , it is rounded up to 1 . This is due to the binary nature of the prediction. Either a comment is toxic or it is not. Either 0 or 1 .
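If you'd rather keep the raw confidence scores, or use a stricter cutoff, the rounding is easy to swap out inside create_prediction , after the padding step--a sketch:

raw = model.predict(sequence, verbose=1)   # probabilities in [0, 1]
binary = (raw > 0.5).astype(int)           # same effect as .round(0)
cautious = (raw > 0.9).astype(int)         # flag only high-confidence toxicity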

Well, that's what I got. Thanks for sticking it out. Let me know if you have any questions.

Appendix

Full Code

Tutorials

If you want to know more about gensim and how it can be used with Keras:

  * Depends on the Definition

Data

The data are hosted by Kaggle.

Please note, you will have to sign-up for a Kaggle account.

Average Person's Vocabulary Size

Primary sources on vocabulary size:

  * How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age
  * How Large Can a Receptive Vocabulary Be?
  * Toward a Meaningful Definition of Vocabulary Size
  * Vocabulary size revisited: the link between vocabulary size and academic achievement

Image source: Louis Reed
Distributing Machine Learning Jobs

Boss

A human sends a machine learning job to the Boss. A Job is a JSON object containing the desired machine learning script and the parameters needed for successful execution. The Boss stores the Job and creates an Order. The Order is another JSON object representing the state of a requested Job.

         Job #4
 0                        Boss
/|\ +----------------->   ____
/ \                       +""+
                          +__+
                         [ ==.]`)
                   +----+====== 0 +--+
                   +                 |
                Order #3           Job #3
                   |                 |
                Order #2           Job #2
                   |                 |
                Order #1           Job #1

Worker

The Worker uses node-schedule to fire an HTTP request to the Boss letting it know the Worker is "bored." The Boss will then search through the Orders for the oldest unassigned Order; if it finds one, it returns this Order to the Worker as a JSON object. At this point, the Boss updates the Order's status to "assigned."

The Worker sends another HTTP request, this time requesting the Job information associated with the Order the Boss had assigned.

          Boss
          ____
          +""+
          +__+
         [ ==.]`)
   +----+====== 0 +--+
   +                 +            If the Boss finds an unassigned
Order #3           Job #3         Order it is returned. The worker requests the
   +                 +            related Job. The Boss updates the
Order #2           Job #2         the Order status to "assigned"
   +                 +                   Worker
Order #1           Job #1<-+              ____
  ^                        +----------->  +""+
  |                                       +__+
  +------------------------------------+ [ ==.]`)
          The worker checks with
          the boss periodically
          for the oldest submitted
          Order.

The Worker passes the Job information into the appropriate machine learning Python script as command-line arguments (via python-shell ). The script is executed and, whether successful or not, an Outcome object is passed back to the Worker Node through stdout .

Worker
 ____
 +""+     Job #1
 +__+ +--------------->  Python Script
[ ==.]                         +
  ^                            |
  |                            |
  |                            v
  +------------------------ Outcome #1
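From the Python script's point of view, the whole contract is just a few lines--a sketch with illustrative names, not one of the actual job scripts:

import sys
import json

# The Worker passes the Job JSON as the first command-line argument...
job = json.loads(sys.argv[1])

# ...the script does its work, then reports an Outcome through stdout.
outcome = {"status": 200, "jobId": job.get("_id")}
print(json.dumps(outcome))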

The Worker then makes a callback API call and passes the Outcome object to the Boss to be stored in the database.

          Boss                                Worker
          ____                                 ____
          +""+                                 +""+
          +__+                                 +__+
         [ ==.]`)                             [ ==.]`)
   +----+====== 0 +------+                       +
   |         |           |                       |
Order #3   Job #3     Outcome #1 <---------------+
   |         |
Order #2   Job #2
   |         |
Order #1   Job #1

MongoDB on Mac

brew install mongodb
nano /usr/local/etc/mongod.conf

Your file should look something like this

systemLog:
  destination: file
  path: /usr/local/var/log/mongodb/mongo.log
  logAppend: true
storage:
  dbPath: /usr/local/var/mongodb
net:
  bindIp: 127.0.0.1

Change the dbPath to where you'd like Mongo to store your databases. Then, start and enable Mongo with brew's services.

brew services start mongodb

Sample Objects

Order

{
    "_id" : "5bcc93d67f0b3f4844c87c7a",
    "jobId" : "5bcc93d67f0b3f4844c87c79",
    "createdDate" : ISODate("2018-10-21T14:57:26.980Z"),
    "status" : "unassigned",
}

Job

{
    "_id" : ObjectId("5bcc93d67f0b3f4844c87c79"),
    "hiddenLayers" : [ 
        {
            "activation" : "relu",
            "widthModifier" : 4,
            "dropout" : 0.2
        }, 
        {
            "activation" : "relu",
            "widthModifier" : 2.3,
            "dropout" : 0.2
        }, 
        {
            "activation" : "relu",
            "widthModifier" : 1.3,
            "dropout" : 0.2
        }
    ],
    "dataFileName" : "wine_data.csv",
    "scriptName" : "nn.py",
    "projectName" : "wine_data",
    "depedentVariable" : "con_lot",
    "crossValidateOnly" : true,
    "crossValidationCrossingType" : "neg_mean_squared_error",
    "batchSize" : 100000,
    "epochs" : 3000,
    "patienceRate" : 0.05,
    "slowLearningRate" : 0.01,
    "loss" : "mse",
    "pcaComponents" : -1,
    "extraTreesKeepThreshd" : 0,
    "saveWeightsOnlyAtEnd" : false,
    "optimizer" : "rmsprop",
    "lastLayerActivator" : "",
    "learningRate" : 0.05,
    "l1" : 0.1,
    "l2" : 0.1,
    "minDependentVarValue" : 0,
    "maxDependentVarValue" : 1500,
    "scalerType" : "standard",
}

Outcomes

{
    "_id" : ObjectId("5bcc88fa7f0b3f4844c87c78"),
    "status" : 200,
    "jobId" : "5bcc724d7449f746b5aa6fe8",
    "loss" : 15109.168650257,
    "metric" : 14281.4453526111,
}

Code

Worker

server.js

var express = require('express');
var bodyParser = require('body-parser');
var pythonRunner = require('./preprocessing-services/python-runner');
var schedule = require('node-schedule');
var axios = require('axios');
var fs = require('fs');
var {Worker} = require('./worker/worker');

// Get Worker Node configuration
var fs = require('fs');
var config = JSON.parse(fs.readFileSync('./python-scripts/worker-node-configure.json', 'utf8'));

if(!config) { 
    console.log('No configuration file found.')
    process.exit();
}

// Boss' address
bossAddress = config.bossAddress;
nodeName = config.nodeName;
console.log(`Boss's address is ${bossAddress}`);
console.log(`This worker's name is ${nodeName}`);

var worker = new Worker('bored');

// Start server and add Middleware
var app = express();
const port = 3000;
app.use(bodyParser.json())

// Start checking for Boredom
var j = schedule.scheduleJob('*/1 * * * *', function(){
    if (worker.status === 'bored') {
        console.log('Worker is bored.');
        axios({
            method: 'post',
            url: bossAddress + `/bored/${nodeName}`
        }).then((response) => {
            let orderId = response.data._id
            let jobId = response.data.jobId;
            console.log(`Boss provided jobID #${jobId}`);
            axios({
                method: 'get',
                url: bossAddress + `/retrieve/job/${jobId}`
            }).then((response) => {
                let job = response.data;
                console.log(`Worker found the details for jobID #${jobId}`);
                job.callbackAddress = bossAddress;
                job.assignmentId = orderId;
                pythonRunner.scriptRun(job, worker)
                .then((response) => {
                    console.log('Worker started job, will let Boss know when finished.');
                });
            }).catch((error) => {
                console.log(error);
            });
        }).catch((error) => {
            console.log('Failed to find new job.')
        });
    }
});

// Python script runner interface
app.post('/scripts/run', (req, res) => {
    try {
        let pythonJob = req.body;
        pythonRunner.scriptRun(pythonJob)
        .then((response) => {
            console.log(response);
            res.send(response);
        });
    } catch (err) {
        res.send(err);
    }
});

app.listen(port, () => {
    console.log(`Started on port ${port}`);
});

python-runner.js

let {PythonShell} = require('python-shell')
var fs = require('fs');
var path = require('path');
var axios = require('axios');

var scriptRun = function(pythonJob, worker){
    console.log(worker);
    worker.status = 'busy';
    return new Promise((resolve, reject) => {
        try {
            let callbackAddress = pythonJob.callbackAddress;
            let options = {
                mode: 'text',
                pythonOptions: ['-u'], // get print results in real-time
                scriptPath: path.relative(process.cwd(), 'python-scripts/'),
                args: [JSON.stringify(pythonJob)]
            };
            PythonShell.run(pythonJob.scriptName, options, function (err, results) {
                if (err) throw err;
                try {
                    result = JSON.parse(results.pop());
                    if(result) {
                        console.log(callbackAddress + '/callback')
                        axios({
                            method: 'post',
                            url: callbackAddress + '/callback',
                            data: result
                        }).then((response) => {
                            console.log(`Worker let the Boss know job is complete.`);
                            worker.status = 'bored';
                        }).catch((error) => {
                            worker.status = 'bored'
                        });
                    } else {
                        worker.status = 'bored'
                    }
                } catch (err) {
                   worker.status = 'bored'
                }
            });
            resolve({'message': 'job started'});
        } catch (err) {
            reject(err)
            worker.status = 'bored'
        }
    });
}
module.exports = {scriptRun}

Boss

server.js

const express = require('express');
const bodyParser = require('body-parser');
const axios = require('axios');
var timeout = require('connect-timeout')

const {mongoose} = require('./backend/database-services/dl-mongo');
const workerNode = require('./backend/services/worker-node');
const work = require('./backend/services/work');

// Database collection
var {Job} = require('./backend/database-services/models/job');
var {Order} = require('./backend/database-services/models/order');


const bossAddress = 'http://maddatum.com'

// Server setup.
var app = express();
const port = 3000;

// Add request parameters.
app.use((req, res, next) => {
    res.setHeader('Access-Control-Allow-Origin', '*');
    res.setHeader('Access-Control-Allow-Headers', 
                  'Origin, X-Requested-With, Content-Type, Accept'); 
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, PUT, PATCH, DELETE, OPTIONS');
    next();
});

// Add the middleware.
app.use(bodyParser.json())

/*
This route is for creating new Jobs on the queue
*/
app.post('/job/:method', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    try {
        switch (req.params.method) {
            case 'create':
                work.create(req.body)
                .then((response) =>{
                    res.send(response);
                }).catch((error) => {
                    res.send({'error': error })
                });
                break;
            default:
                res.send({'error': 'No method selected.'})
        }
    } catch (err) {
        res.send({'error': 'Error with request shape.', err})
    }
});

/*
This route is for adding new WorkerNodes to the database.
*/
app.post('/worker-node/:method', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    try {
        switch (req.params.method) {
            case 'create':
                workerNode.create(req.body)
                .then((response) =>{
                    res.send(response);
                }).catch((error) =>{
                    res.send({'error': error.message});
                })
            break;
            default:
                throw err;
        }
    } catch (err) {
        res.send({'error': 'Error with request shape.', err })
    }
});

app.post('/callback', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    let outcome = req.body;
    console.log(outcome);
    try {
        work.file(outcome)
        .then((response) =>{
            console.log(response);
            res.send(response);
        })
    } catch (err) {
        res.send({'error': 'Error with request shape.', err })
    }
});

/*
Route for Worker Node to let the Boss know it needs a Job.
The oldest Job which is unassigned is provided.
*/
app.post('/bored/:id', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    try {
        let workerNodeId = req.params.id;
        console.log(`${workerNodeId} said it's bored.`);
        if (!workerNodeId) { throw {'error': 'No id provided.'}}
        Order.findOne({ status: 'unassigned' }, {}, { sort: { 'created_at' : -1 } }, (err, order) => {
            console.log(`Found a work order, #${order._id}`)
            order.status = 'assigned';
            console.log(`Provided ${workerNodeId} with ${order.jobId}`);
            order.save()
            .then((doc) => {
                console.log(`Updated the Order #${doc.id}'s status to ${order.status}`);
                res.send(doc);
            });
        })
        .catch((err) => {
            res.send({'message': `No work to do.  Don't get used to it.`})
        });
    } catch (err) {
        res.send({'error': 'Error with request shape.', err })
    }
});

/*
Retrieve Orders or Job
*/
app.get('/retrieve/:type/:id?/:param1?', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    try {
        let type = req.params.type;
        let id = req.params.id;
        let param1 = req.params.param1;
        switch(type) {
            case 'order':
                Order.find().then((response) => {
                    res.send(response);
                });
                break;
            case 'job':
                if (!id)  { throw {'error': 'Missing Id'} }
                Job.findOne({'_id': id })
                .then((response) => {
                    res.send(response);
                });
                break;
            default:
                throw error
        }
    } catch (err) {
        res.send({'error': 'Error with request shape.', err })
    }
})

app.listen(port, () => {
    console.log(`Started on port ${port}`);
});
Image source: Louis Reed
Using Python, NodeJS, Angular, and MongoDB to Create a Machine Learning System

I've started designing a system to manage data analysis tools I build.

  1. An illegitimate REST interface
  2. Interface for existing Python scripts
  3. Process for creating micro-services from Python scripts
  4. Interface for creating machine learning jobs to be picked up by free machines.
  5. Manage a job queue for work machines to systematically tackle machine learning jobs
  6. Data storage and access
  7. Results access and job meta data
  8. A way to visualize results

I've landed on a fairly complicated process of handling the above. I've tried cutting frameworks, as I know it'll be a nightmare to maintain, but I'm not seeing it.

  • Node for creating RESTful interfaces between the HQ Machine and the Worker Nodes
  • Node on the workers to ping the HQ machine periodically to see if there are jobs to run
  • MongoDB on the HQ Machine to store the job results data, paths to datasets, and possibly primary data
  • Angular to interact with the HQ Node, providing the UI for job creation and results viewing.
  • ngx-datatables for viewing tabular results.
  • ngx-charts for viewing job results (e.g., visualizing variance and linearity )
  • Python for access to all the latest awesome ML frameworks
  • python-shell (npm) for creating an interface between Node and Python.

Utilizing all Machines in the House

Machine learning is a new world for me. But, it's pretty dern cool. I like making machines do the hard stuff while I'm off doing other work. It makes me feel extra productive--like, "I created that machine, so any work it does I get credit for. And! The work I did while it was doing its work." This is the reason I own two 3D-printers.

I'm noticing there is a possibility of utilizing the old computers I've got lying around the house for the same effect. The plan is to abstract a neural network script, install it on all the computers lying about, and create an HQ Computer where I can create sets of hyperparameters to be passed to the Worker Nodes throughout the house.

Why? Glad I asked for you. I feel guilty there are computers going unused. There's an old AMD desktop with a GTX 1060 in it, a 2013 MacBook Pro (my son's), and my 2015 MacBook Pro. These don't see much use anymore, since my employer has provided an iMac to work on. They need to earn their keep.

How? Again, glad I asked for you. I'll create a system to make deep-learning jobs from hyperparameter sets and send them to these idle machines, thus trying to get them to solve problems while I'm working on paying the bills. This plays to the power of neural networks: they need little manual tweaking. You simply provide them with hyperparameters and let them run.

Here are the napkin-doodles:

+-Local------------------------------------------------------+
|                                                            |
|        ____                   ____      Each machine runs  |
|        |""|                   |""|      Node and Express   |
|  HQ    |__|             #1    |__|      server, creating   |
|       [ ==.]`)               [ ==.]`)   routes to Python   |
|       ====== 0               ====== 0   scripts using      |
|  The HQ machine runs          ____      stdin and stdout   |
|  Node and Express, but        |""|                         |
|  the routes are for     #2    |__|                         |
|  storing results in a        [ ==.]`)                      |
|  database.                   ====== 0                      |
|                               ____                         |
|                               |""|                         |
|                         #3    |__|        Worker           |
|                              [ ==.]`)     Nodes            |
|                              ====== 0                      |
|                                                            |
+------------------------------------------------------------+
+-Local------------------------------------------------------+
|                 Each worker Node checks         Workers    |
|        ____    with HQ on a set interval         ____      |
|        |""|       for jobs to run                |""|      |
|  HQ    |__|   <--------------------------+ #1    |__|      |
|       [ ==.]`)                                  [ ==.]`)   |
|       ====== 0                                  ====== 0   |
|       ^ |                                        ____      |
|       | |                                  #2    |""|      |
|       | +--------------------------------------->|__|      |
|       |             If there is a job, the      [ ==.]`)   |
|       |             Worker will send a GET      ====== 0   |
|       |              request for the job         ____      |
|       |                  parameters              |""|      |
|       |                                    #3    |__|      |
|       +-----------------------------------------[ ==.]`)   |
|         Once completed, the Worker updates HQ   ====== 0   |
|              with the job results.                         |
+------------------------------------------------------------+

Worker Nodes

The Worker Node code is pretty straightforward. It uses Node, Express, and python-shell to create a bastardized REST interface for simple interactions with the HQ Node controlling the job queue.

Node Side

Here's the proof-of-concept NodeJS code.

var express = require('express');
var bodyParser = require('body-parser');
var pythonRunner = require('./preprocessing-services/python-runner');

var app = express();
const port = 3000;

app.use(bodyParser.json())

// Python script runner interface
app.post('/scripts/run', (req, res) => {
    try {
        let pythonJob = req.body;
        pythonRunner.scriptRun(pythonJob)
        .then((response) => {
            res.send(response);
        });
    } catch (err) {
        res.send(err);
    }
});

app.listen(port, () => {
    console.log(`Started on port ${port}`);
});

The above code is a dead simple NodeJS server using Express. It is using body-parser middleware to shape JSON objects. The pythonJob object looks something like this (real path names have been changed to help protect their anonymity).

{
    "scriptsPath": "/Users/hinky-dink/dl-principal/python-scripts/",
    "scriptName": "union.py",
    "jobParameters": {
        "dataFileName": "",
        "dataPath": "/Users/hinky-dink/bit-dl/data/lot-data/wine_encoded/",
        "writePath": "/Users/hinky-dink/bit-dl/data/lot-data/wine_encoded/",
        "execution": {
            "dataFileOne": "wine_2017_encoded.csv",
            "dataFileTwo": "wine_2018_encoded.csv",
            "outputFilename": "wine_17-18.csv"
        }
    }
}

Each of these attributes will be passed to the Python shell in order to execute the target script ( union.py in the example above). They are passed to the shell as system arguments.

Here's the python-runner.js

let {PythonShell} = require('python-shell')

var scriptRun = function(pythonJob){    
    return new Promise((resolve, reject) => {
        console.log(pythonJob)
        try {
            let options = {
                mode: 'text',
                pythonOptions: ['-u'], // get print results in real-time
                scriptPath: pythonJob.scriptsPath,
                args: [pythonJob.jobParameters.dataFileName, 
                       pythonJob.jobParameters.dataPath, 
                       pythonJob.jobParameters.writePath,
                       JSON.stringify(pythonJob.jobParameters.execution)]
            };
            PythonShell.run(pythonJob.scriptName, options, function (err, results) {
                if (err) throw err;
                try {
                    result = JSON.parse(results.pop());
                    if(result) {
                        resolve(result);
                    } else {
                        reject({'err': ''})
                    }
                } catch (err) {
                    reject({'error': 'Failed to parse Python script return object.'})
                }
            });
        } catch (err) {
            reject(err)
        }
    });
}
module.exports = {scriptRun}

Python Side

Here's the Python script in the above example. It is meant to detect what type of data is in a table. If a column is continuous, it leaves it alone (I'll probably add a normalization option at some point); if it is categorical, it converts it to a dummy variable . It then saves this encoded data on the Worker Node side (for now). Lastly, it returns a JSON string back to the Node side.

"""
Created on Mon Jun 11 21:12:10 2018
@author: cthomasbrittain
"""

import sys
import json
#
filename = sys.argv[1]
filepath = sys.argv[2]
pathToWriteProcessedFile = sys.argv[3]

request = sys.argv[4]
request = json.loads(request)

try:
    cols_to_remove = request['columnsToRemove']
    unreasonable_increase = request['unreasonableIncreaseThreshold']
except:
    # If columns aren't contained or no columns, exit nicely
    result = {'status': 400, 'message': 'Expected script parameters not found.'}
    print(str(json.dumps(result)))
    quit()

pathToData = filepath + filename


# Clean Data --------------------------------------------------------------------
# -------------------------------------------------------------------------------

# Importing data transformation libraries
import pandas as pd

# The following method will do the following:
#   1. Add a prefix to columns based upon datatypes (cat and con)
#   2. Convert all continuous variables to numeric (float64)
#   3. Convert all categorical variables to objects
#   4. Rename all columns with prefixes, convert to lower-case, and replace
#      spaces with underscores.
#   5. Continuous blanks are replaced with 0 and categorical 'not collected'
# This method will also detect manually assigned prefixes and adjust the 
# columns and data appropriately.  
# Prefix key:
# a) con = continuous
# b) cat = categorical
# c) rem = removal (discards entire column)

def add_datatype_prefix(df, date_to_cont = True):    
    import pandas as pd
    # Get a list of current column names.
    column_names = list(df.columns.values)
    # Encode each column based with a three letter prefix based upon assigned datatype.
    # 1. con = continuous
    # 2. cat = categorical

    for name in column_names:
        if df[name].dtype == 'object':
            try:
                df[name] = pd.to_datetime(df[name])
                if(date_to_cont):
                    new_col_names = "con_" + name.lower().replace(" ", "_").replace("/", "_")
                    df = df.rename(columns={name: new_col_names})
                else:
                    new_col_names = "date_" + name.lower().replace(" ", "_").replace("/", "_")
                    df = df.rename(columns={name: new_col_names})                    
            except ValueError:
                pass

    column_names = list(df.columns.values)

    for name in column_names:
        if name[0:3] == "rem" or "con" or "cat" or "date":
            pass
        if df[name].dtype == 'object':
            new_col_names = "cat_" + name.lower().replace(" ", "_").replace("/", "_")
            df = df.rename(columns={name: new_col_names})
        elif df[name].dtype == 'float64' or df[name].dtype == 'int64' or df[name].dtype == 'datetime64[ns]':
            new_col_names = "con_" + name.lower().replace(" ", "_").replace("/", "_")
            df = df.rename(columns={name: new_col_names})
    column_names = list(df.columns.values)

    # Get lists of columns for conversion
    con_column_names = []
    cat_column_names = []
    rem_column_names = []
    date_column_names = []

    for name in column_names:
        if name[0:3] == "cat":
            cat_column_names.append(name)
        elif name[0:3] == "con":
            con_column_names.append(name)
        elif name[0:3] == "rem":
            rem_column_names.append(name)
        elif name[0:4] == "date":
            date_column_names.append(name)

    # Make sure continuous variables are correct datatype. (Otherwise, they'll be dummied).
    for name in con_column_names:
        df[name] = pd.to_numeric(df[name], errors='coerce')
        df[name] = df[name].fillna(value=0)

    for name in cat_column_names:
        df[name] = df[name].apply(str)
        df[name] = df[name].fillna(value='not_collected')

    # Remove unwanted columns    
    df = df.drop(columns=rem_column_names, axis=1)
    return df

# ------------------------------------------------------
# Encoding Categorical variables
# ------------------------------------------------------

# The method below creates dummy variables from columns with
# the prefix "cat".  There is the argument to drop the first column
# to avoid the Dummy Variable Trap.
def dummy_categorical(df, drop_first = True):
    # Get categorical data columns.
    columns = list(df.columns.values)
    columnsToEncode = columns.copy() 

    for name in columns:
        if name[0:3] != 'cat':          
            columnsToEncode.remove(name)

    # if there are no columns to encode, return unmutated.
    if not columnsToEncode:
        return df


    # Encode categories
    for name in columnsToEncode:

        if name[0:3] != 'cat':
            continue

        tmp = pd.get_dummies(df[name], drop_first = drop_first)
        names = {}

        # Get a clean column name.
        clean_name = name.replace(" ", "_").replace("/", "_").lower()
        # Get a dictionary for renaming the dummy variables in the scheme of old_col_name + response_string
        if clean_name[0:3] == "cat":
            for tmp_name in tmp:
                tmp_name = str(tmp_name)
                new_tmp_name = tmp_name.replace(" ", "_").replace("/", "_").lower()
                new_tmp_name = clean_name + "_" + new_tmp_name
                names[tmp_name] = new_tmp_name

        # Rename the dummy variable dataframe
        tmp = tmp.rename(columns=names)

        # join the dummy variable back to original dataframe.
        df = df.join(tmp)

    # Drop all old categorical columns
    df = df.drop(columns=columnsToEncode, axis=1)
    return df

# Read the file
df = pd.read_csv(pathToData)

# Drop columns such as unique IDs
try:
    df = df.drop(cols_to_remove, axis=1)
except:
    # If columns aren't contained or no columns, exit nicely
    result = {'status': 404, 'message': 'Problem with columns to remove.'}
    print(str(json.dumps(result)))
    quit()

# Get the number of columns before hot encoding
num_cols_before = df.shape[1]

# Encode the data.
df = add_datatype_prefix(df)
df = dummy_categorical(df)

# Get the new dataframe shape.
num_cols_after = df.shape[1]


percentage_increase = num_cols_after / num_cols_before

result = ""

if percentage_increase > unreasonable_increase:
    message = "\"error\": \"Feature increase is greater than unreasonableIncreaseThreshold, most likely a unique id was included."
    result = {'status': 400, 'message': message}
else:
    filename = filename.replace(".csv", "")
    import os
    if not os.path.exists(pathToWriteProcessedFile):
        os.makedirs(pathToWriteProcessedFile)


    writeFile = pathToWriteProcessedFile + filename + "_encoded.csv"
    df.to_csv(path_or_buf=writeFile, sep=',')


    # Process the results and return JSON results object
    result = {'status': 200, 'message': 'encoded data', 'path': writeFile}

print(str(json.dumps(result)))

That's the premise. I'll be adding more services in a series of articles.

Image source: Darius Bashar
Recording Brain Waves -- Mongo Database with a NodeJS API

Saving Brain Waves to Remote MongoDB by way of Node REST API

In this section I'm going to focus on getting a remote Linux server set up with MongoDB and NodeJS. This will allow us to make POST requests to our Linux server, saving the EEG data.

I'm going to assume you are able to SSH into your Ubuntu 16.04 LTS server for this guide. You don't have a server? No sweat. I wrote a blog post which explains how to get a cheap Linux server set up.

1. Install MongoDB

SSH into your server. I'm assuming this is a fresh new Linux install. Let's start with upgrading the packages.

sudo apt-get update -y

I'll be following the Mongo website for instructions on installing the MongoDB Community version on Ubuntu.

Let's get started. Add the Debian package key.

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4

We need to create a list file.

echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list

Now reload the package database.

sudo apt-get update

If you try to update and run into this error

E: The method driver /usr/lib/apt/methods/https could not be found.
N: Is the package apt-transport-https installed?
E: Failed to fetch https://repo.mongodb.org/apt/ubuntu/dists/xenial/mongodb-org/4.0/InRelease  
E: Some index files failed to download. They have been ignored, or old ones used instead.

Then install apt-transport-https

sudo apt-get install apt-transport-https

Now, let's install MongoDB.

sudo apt-get install -y mongodb-org

Voila!

2. Setup MongoDB

We still need to do a bit of setup. First, let's check and make sure Mongo is fully installed.

sudo service mongod start

This starts the MongoDB daemon, the program which runs in the background and waits for someone to make connection with the database.

Speaking of which, let's try to connect to the database

mongo

You should get the following:

root@localhost:~# mongo
MongoDB shell version v4.0.2
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.2
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
    http://docs.mongodb.org/
Questions? Try the support group
    http://groups.google.com/group/mongodb-user
Server has startup warnings:
2018-09-02T03:52:18.996+0000 I STORAGE  [initandlisten]
2018-09-02T03:52:18.996+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-09-02T03:52:18.996+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-09-02T03:52:19.820+0000 I CONTROL  [initandlisten]
2018-09-02T03:52:19.820+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-09-02T03:52:19.820+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-09-02T03:52:19.820+0000 I CONTROL  [initandlisten]
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).

The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
>

This is good. It means Mongo is up and running. Notice, it is listening on 127.0.0.1:27017 . If you try to access the database from anywhere other than the local machine, it will refuse the connection. The plan: have NodeJS connect to the MongoDB database locally, then send all of our data to Node and let it handle security.

In the Mongo command line type:

quit()

And hit enter. This should bring you back to the Linux command prompt.

A few notes on MongoDB on Ubuntu.

  • The configuration file is located at /etc/mongod.conf
  • Log file is at /var/log/mongodb/mongod.log
  • The database is stored at /var/lib/mongodb , but this can be changed in the config file.

Oh, and one last bit. Still at the Linux command prompt type:

sudo systemctl enable mongod

You should get back

Created symlink from /etc/systemd/system/multi-user.target.wants/mongod.service to /lib/systemd/system/mongod.service.

This sets up a symlink which will cause Linux to load mongod every time it boots--you won't need to start it manually.

Next, NodeJS.

3. Install NodeJS and npm

Type

sudo apt-get install nodejs -y

This should install NodeJS , but we also need the Node package manager, npm .

sudo apt-get install npm -y

Let's upgrade npm . This is important, as the mind-wave-journal-server depends on recent versions of several packages that are not accessible to earlier versions of npm .

The following commands should prepare npm for upgrading, then upgrade.

sudo npm cache clean -f
sudo npm install -g n
sudo n stable
sudo n latest

Let's reboot the server to make sure all of the upgrades are in place.

sudo reboot now

When the server boots back up, ssh back in.

Check and make sure your mongod is still running

mongo

If mongo doesn't start, then revisit step 2.

Let's check our node and npm versions.

node -v

I'm running node v10.9.0

npm -v

I'm running npm v6.2.0

4. Clone, Install, and Run the mind-wave-journal-server

I've already created a basic Node project, which we'll be able to grab from my Github account.

If you don't already have git installed, let's do it now.

sudo apt-get install git -y

Now, grab the Node server I built.

git clone https://github.com/Ladvien/mind-wave-journal-server.git
cd mind-wave-journal-server/

Install all the needed Node packages.

npm install

This should download all the packages needed to run the little server program I wrote to store the EEG data into the Mongo database.

Let's run the mind-wave-journal-server .

node server/server.js

This should be followed with:

root@localhost:~/mind-wave-journal-server# node server/server.js
(node:1443) DeprecationWarning: current URL string parser is deprecated, and will be removed in a future version. To use the new parser, pass option { useNewUrlParser: true } to MongoClient.connect.
Started on port 8080

5. Testing mind-wave-journal-server with Postman

Now, we are going to use Postman to test our new API.

For this next part you'll need either a Mac or Chrome, as Postman has a native Mac app or a Chrome app.

I'm going to show the Chrome application.

Head over to the Chrome app store:

add-postman-chrome-app

After you add the Postman app it should redirect you to your Chrome applications. Click on the Postman icon.

run-postman-chrome-app

Your choice, but I skipped the sign-up option for now.

skipped-signup-postman-chrome-app

Select Create a Request skipped-signup-postman-chrome-app

The purpose of Postman, in a nutshell: we are going to use it to create POST requests and send them to the mind-wave-journal-server , to make sure it's ready for the iOS app to start making POST requests of its own, saving the EEG data to our Mongo server.

Let's create our first test POST request. Start by naming the request Test eegsamples . Create a folder to put the new request in; I named mine mind-wave-journal-server . Then click

create-request-postman-chrome-app

You will need to set the type as POST . The url will be

http://your_ip_address:8080/eegsamples

create-request-postman-chrome-app

Now select the Headers section and add the header Content-Type: application/json

create-request-postman-chrome-app

Lastly, select Body , then raw and enter the following JSON into the text area:

{  
   "highBeta":5,
   "lowGamma":6,
   "theta":55,
   "lowAlpha":2,
   "highAlpha":3,
   "lowBeta":4,
   "highGamma":7,
   "blink":55,
   "attention":8,
   "meditation":9,
   "time":4
}

And then! Hit Send

create-request-postman-chrome-app

If all goes well, then you should get a similar response in the Postman response section

create-request-postman-chrome-app

Notice, the response is similar to what we sent. However, there is an additional _id . This is great. It is the id assigned by MongoDB when the document is inserted. In short, it means the sample successfully saved to the database.
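
If you'd rather test from a script than Postman, the same request can be fired with a few lines of Node--just a convenience sketch using the axios package (swap in your server's IP):

const axios = require('axios');

const sample = {
    highBeta: 5, lowGamma: 6, theta: 55, lowAlpha: 2, highAlpha: 3,
    lowBeta: 4, highGamma: 7, blink: 55, attention: 8, meditation: 9, time: 4
};

// Should print the sample back with MongoDB's _id attached.
axios.post('http://your_ip_address:8080/eegsamples', sample)
    .then(response => console.log(response.data))
    .catch(error => console.error(error.message));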

6. Now What?

Several caveats.

First, each time you restart your server you will need to manually start your mind-wave-journal-server . You can turn it into a Linux service and enable it. If this interests anyone, let me know in the comments and I'll add it.

Second, notice I don't currently have a way to retrieve data from MongoDB. The easiest way will probably be using Robo 3T . Like the first caveat, if anyone is interested let me know and I'll add instructions. Otherwise, this series will stay on track to setup a Mongo BI connection to the database for viewing in Tableau (eh, gross).

Your Node server is ready to be called by the iOS app. In the next article I'll return to building the MindWaveJournal app in iOS.

Image source: Darius Bashar
Recording Brain Waves -- iOS SDK Setup

Step 1: iOS App

I'm going to assume you have Xcode installed.

Step 1.1: Install CocoaPods

CocoaPods is a package handler for Xcode. We will be using it to install Alamofire , which is a Swift library for making HTTP requests. We will need HTTP call support as we will call our server to store the EEG samples.

sudo gem install cocoapods

After you hit Return it will prompt for your password

cocoapods-installation

Step 1.2: Setup Xcode Project

Now, let's setup a project folder. This is the main folder where all the iOS app code will live. It's a bad habit, but I usually put mine on the Desktop.

Open Xcode and select "Create a new Xcode project"

xcode-project-start

Then select "Single View App" and click "Next"

xcode-project-start

Let's call the project MindWaveJournaler and click "Next" xcode-project-start

Choose your Desktop as location for the project and click "Create" xcode-project-start

Step 1.3: Development Environment Setup

You've created a Project Folder, but we have to setup the project folder to be used with CocoaPods. After, we will use CocoaPods to install Alamofire.

Back in the terminal, type:

cd ~/Desktop/MindWaveJournaler
pod init

This creates a Podfile in the root folder of our project. We can list CocoaPod packages in the Podfile and then run pod install in the same directory; this will cause CocoaPods to install all the packages we listed.

Sadly, we are really only doing this for Alamofire right now. But, later, when we start building on to this app it will allow us to quickly access third-party frameworks.

Ok, back to typing:

open -a Xcode Podfile

This will open the Podfile for editing in Xcode. Now let's insert our desired pod information.

Copy information below and paste it into your file:

# Uncomment the next line to define a global platform for your project
platform :ios, '11.4'

target 'MindWaveJournaler' do
  # Comment the next line if you're not using Swift and don't want to use dynamic frameworks
  use_frameworks!

  # Pods for MindWaveJournaler
  pod 'Alamofire', '~> 4.7'

  target 'MindWaveJournalerTests' do
    inherit! :search_paths
    # Pods for testing
  end

  target 'MindWaveJournalerUITests' do
    inherit! :search_paths
    # Pods for testing
  end

end

You may notice the only changes we made were

platform :ios, '11.4'
...
pod 'Alamofire', '~> 4.7'

The first line tells CocoaPods which version of iOS we are targeting with our app (this silences a warning, but shouldn't be required). The other tells CocoaPods which version of Alamofire we'd like to use on this project.

Ok, now let's run this Podfile.

Back in the same directory as the Podfile type:

pod install

You should see CocoaPods do its thing with output much like below.

cocoapods-installed-alamofire

Step 1.4: Install NeuroSky iOS SDK

NeuroSky has a "Swift SDK." Really, it's an Objective-C SDK which is "bridged" into Swift. Essentially, this means we won't be able to see what's going on inside the SDK, but we can use functions from the pre-compiled binaries.

I've not been impressed with NeuroSky's website. Or the SDK. It does the job, but not much more.

Anyway, the SDK download is annoyingly behind a sign-up wall.

Visit the link above and click on "Add to Cart"

neurosky-sdk-sign-up

Then "Proceed to Checkout"

neurosky-sdk-sign-up

Lastly, you have to enter your "Billing Information." Really, this is only your email address, last name, street address, city, and zip.

(Really NeuroSky? This is very 1990.)

Eh, I made mine up.

Anyway, after you enter your information, click "Continue to PayPal" (What? I just provided my information...). You should be rewarded with a download link. Click it and download the files.

neurosky-sdk-sign-up

Unzip the files and navigate to the lib folder

iOS Developer Tools 4.8 -> MWM_Comm_SDK_for_iOS_V0.2.9 -> lib

Copy all files from the lib folder into the main directory of the MindWaveJournaler project folders.

neurosky-sdk-lib

Step 1.5: Workspace Setup

CocoaPods works by creating a .xcworkspace file. It contains all the information needed to compile your project with all of the CocoaPod packages installed. In our case the file will be called MindWaveJournaler.xcworkspace . And every time you want to work on your project, you must open it with this specific file.

It can be a bit confusing because Xcode created a .xcodeproj file which is tempting to click on. xcworkspace

Go ahead and open the MindWaveJournaler.xcworkspace file. The workspace should open with one warning, which we will resolve shortly.

But first, another caveat. CoreBluetooth, Apple's Bluetooth LE Framework, only works when compiled for and run on an actual device. It does not work in the iOS Simulator. Once upon a time it did, if your Mac had the hardware, however, my version of the story is Apple didn't like having to support the confusion and dropped it.

eeg-apple-workspace

Moving on. Click on the yellow warning. Then click on the warning in the sidebar. This should create a prompt asking if you'd like to make some changes. This should automatically make some tweaks to the build settings which should make our project mo' betta.

Click Perform Changes . eeg-apple-workspace-resolve-warning

This should silence the warning and make your project error free. Go ahead and hit Play button and let it compile to the simulator (we aren't testing the Bluetooth, so it's ok). Everything should compile correctly, if not, just let me know the specifics of your problems in the comments.

Step 1.5: Enable Secure HTTP Request

There are still a few tweaks we need to make to the Xcode workspace to get everything working.

First, open the ViewController.swift file and add import Alamofire right below import UIKit . If auto-complete lists Alamofire as an option you know the workspace is detecting its presence. Good deal.

Now, for Alamofire to be able to securely make HTTP requests, an option needs to be added to the Info.plist file. I scratched my head as to why the HTTP calls were not being made successfully until I found Manab Kumar Mal's StackOverflow post:

Thanks, buddy.

Ok, following his instructions open up the Info.plist file in your MindWaveJournaler folder. Now add an entry by right-clicking and selecting Add Row . Change the Application Category to NSAppTransportSecurity and make sure it's set as dictionary . Now, click the plus sign by the new dictionary and set this attribute as NSAllowsArbitraryLoads , setting the type bool , and the value as YES .

eeg-apple-workspace-add-secure-layer

Step 1.5: Setup Objective-C Bridge Header for MindWave SDK

There are a few other bits of housekeeping, though. As I mentioned earlier, the MindWave SDK is an Objective-C precompiled binary. It is usable in a Swift project, but requires setting up a "bridge header" file.

Start by creating the bridge header file. Go to File -> New -> File...

bridge-header-file

Then select Header and click Next .

bridge-header-file

Name the file YourProjectName-Bridging-Header and make sure the file is saved to the same folder which contains the .xcworkspace file , then click Create .

bridge-header-file

The header file should automatically open. Copy and paste the following to the bottom of the header file.

#import "MWMDevice.h"
#import "MWMDelegate.h"
#import "MWMEnum.h"

My entire file looked like this once done.

MindWaveJournaler-Bridging-Header.h

//
//  MindWaveJournaler-Bridging-Header.h
//  MindWaveJournaler
//
//  Created by Casey Brittain on 8/3/18.
//  Copyright © 2018 Honeysuckle Hardware. All rights reserved.
//

#ifndef MindWaveJournaler_Bridging_Header_h
#define MindWaveJournaler_Bridging_Header_h


#endif /* MindWaveJournaler_Bridging_Header_h */

#import "MWMDevice.h"
#import "MWMDelegate.h"
#import "MWMEnum.h"

Let's tell the Swift compiler we have a header file. In Xcode go to Project File -> Build Settings -> All then in the search box type Swift Compiler - General (if you don't include the hyphen and spaces it won't find it).

bridge-header-file

Double-click on the line Objective-C Bridging Header directly underneath the name of your project (see red box in image). Copy and paste the following into the box and click off to save the change.

$(PROJECT_DIR)/$(PROJECT_NAME)-Bridging-Header.h

This creates a relative path to your Bridging-Header file. In a little bit we are going to try to compile; if you get errors around this file not being found, then it's probably not named per our naming scheme ( YourProjectName-Bridging-Header ) or it wasn't saved in the same folder as the .xcworkspace file. No worries, if you have troubles just leave me a comment below.

bridge-header-file

One last thing to do before we're ready to code. We still need to import the MindWave SDK into our project.

bridge-header-file

Right click on your project file and select New Group . Name the group MindWave SDK . Now right click on the folder you created and select Add Files to "MindWave SDK"... . Navigate to the lib folder containing the MindWave SDK and select all files inside it.

mindwave-sdk

When you add the SDK, Xcode should automatically detect the binary file ( libMWMSDK.a ) and create a link to it. But, let's make sure, just in case. Click on your project file, then go to the General tab.

mindwave-sdk

It needs to be linked under the Build Phases tab as well, under Linked Frameworks and Libraries .

mindwave-sdk

That's it. Let's test and make sure your app is finding the SDK appropriately.

Open the ViewController file and under viewDidLoad() after the existing code, type:

let mwDevice = MWMDevice()
mwDevice.scanDevice()

Watch for autocomplete detecting the existence of the MindWave SDK

mindwave-sdk

Now for the true test, Compile and Run . But, before we do, please be aware--this will only work on an actual iOS device. If you try to run it in the iOS simulator it will fail. It actually fails on two accounts: first, CoreBluetooth will not work in the iOS simulator; second, the MindWave SDK binaries were compiled specifically for the ARM architecture.

Ok! Enough preamble. Connect and select your iOS device and hit Run .

mindwave-app-run

If all goes well you should see two things: a blank white screen on your phone and a concerning message in the Xcode console.

corebluetooth-error-api-misuse

The CoreBluetooth error has to do with firing up the iOS Bluetooth services without checking to make sure the iOS BLE is turned on and ready to go. This is a good thing; it probably means the MindWave SDK has been found and is functioning properly.

If you get any other errors, let's chat. I'll help if I can.

This is part of a series, which I'm writing with care as I've time. I'll get the next part out ASAP.

Image source: Darius Bashar
Recording Brain Waves to MongoDB

Description

This project takes brain wave readings from a MindWave Mobile 2+ and transmits them to an iOS app via Bluetooth LE. The iOS app makes calls to a remote Node server, which is a minimal REST API, passing off the brain wave sample. The Node server stores the data on a MongoDB server. The MongoDB server is then exposed to business intelligence applications using the MongoDB BI Connector. Lastly, using Tableau Professional Desktop, the data is accessed and visualizations are created.

Whew.

To recap:

  • MindWave Mobile 2+
  • iOS App (tentatively named Mind Wave Journaler; Swift)
  • REST Server (mind-wave-journaler; NodeJS)
  • MongoDB BI Connector Server
  • Tableau Desktop Professional

The end result is a system which could allow a remote EEG analyst to examine samples nearly in real time.

eeg-visualization

Below, I'm going to show how I was able to setup the system. But, before that a few words of warning.

Gotchas

Hacker Haters

This isn't a hacker friendly project. It relies on several paid licenses, an Apple Developer License ($99) and Tableau Desktop Professional ($10,000,000,000 or something). Of course, the central piece of hardware, the MindWave Mobile, is also $99, but I think that one is fair. Oh! Let's not forget, even though you bought an Apple Developer license, you still need a Mac (or Hackintosh) to compile the app.

However, as a proof-of-concept, I think it's solid. Hopefully a good hacker will be able to see how several tweaks in the system could make it dirt cheap to deploy.

Minimum Viable Hack..er, Product

The source code provided here is minimally viable . Fancy words meaning only base functionality was implemented. There are many other things which could be done to improve each piece of the system.

Not to be a douche, but please don't point them out. That's the only thing I ask for providing this free information.

There are many improvements I know can be made. The reason they were not made had nothing to do with my ignorance (well, at least a majority of them), but rather my time constraints.

I Hate Tableau

That's it. I hate Tableau.

Getting Started

Let's make a list of what's needed before beginning this project.

Regarding the business intelligence platform--if anyone has a free suggestion, please leave it in the comments below. The first improvement I'd like to make to the entire system is to get away from Tableau. Have I mentioned I hate it?

Ok, let's get started!

Setting up Nginx on Linode

I've used Jekyll to create my website. A lot of the heavy lifting was done by Michael Rose in the form of a Jekyll theme he created called Hpstr.

Much respect.

But, setup was pretty painful for me. I knew nothing about websites, let alone creating a static page website . So, I've decided to journal a lot of the nuances I ran into--try to save someone some time. Or, save myself some time when something goes wrong.

These articles will not be on CSS, JavaScript, or HTML. After tinkering with computers for 20 years, I still suck at CSS and HTML--no, there are much better resources on the matter.

I actually recommend spending $30 on the following Udemy courses. They are great courses and will get you everything you need to be competitive.

(Note, make sure to get them on sale. Second note, they go on sale a lot.)

I'm not getting a kickback from Udemy; I list these courses because they are the ones I've taken and will vouch they are great courses to go with this guide series.

1. Orientation

A lot of other articles will recommend setting up Jekyll locally, building your site to perfection, then renting a server when the time comes. I don't recommend going this route.

In one way it makes sense to get a feel for Jekyll before deploying. You aren't paying money while you learn. But, building a Jekyll site out locally, with all the bells and whistles, may cause a lot of problems deploying it. Was it the 5th gem or the 12th gem which is causing problems? No, I found it's better to go for broke and start building the site on the web.

To compare the work steps

Common Workflow          My Workflow
-----------------------  -----------------------
Setup Jekyll Locally     Get Server
Deploy Site Locally      Setup Server
Refine                   Setup Jekyll on Server
Deploy Site Locally      Setup Jekyll Locally
Refine                   Deploy Site to Server
Deploy Site Locally      Refine
Get Server               Deploy Site to Server
Setup Server             Refine
Setup Jekyll on Server   Deploy Site to Server
Deploy Site to Server    Beer
Beer                     Second Beer

A couple of reasons I prefer my workflow.

First, the psychological payoff doesn't happen until the gross stuff is out of the way. Setting up the server side is tedious and can be boring. But, it is necessary for your site to be up and running on your own server. The payoff being when your site is available to your buddy in Maine who can see the friggin awesome site you've built.

If you put the kudos and warm fuzzies at the beginning--meaning, you deploy your site locally and tell yourself how great it looks--it robs you of the drive needed to trudge through the server side setup. Science!

Second, there are many different variables to account for between your local machine and the server. For example, if you are building Jekyll from a Windows machine and serving it on Ubuntu there can often be dependency differences which you must troubleshoot. Best to start doing it right away (see first point).

Ok, have I persuaded you? No? Then why are you still reading? Ha!

Also, the one thing you'll have while setting up the server side that I did not: this guide. I plan to setup a new site while writing these articles to assure this guide stays relevant. But if I miss anything, I'm available to help in the comments. It makes my day to save someone some development time.

2. Choose a Server Provider

Ever rented a server before? I hadn't either.

Here is my tip sheet laden with my opinion.

a. Don't Go Flashy

I don't recommend going with a flashy name. E.g, GoDaddy, HostGator, etc. The general rule is, if they are pushy with their marketing they probably aren't a solid choice.

The two solid choices right now are:

  • Digital Ocean
  • Linode

b. Go with Linux

Oh! And go with Linux!

I had a CEO one time who forced me to use Windows on our server. Man, it was a flop.

First, Windows back-ends aren't well documented on the web. They cost more. There are fewer free tools. You know what, let me just refer you to others' rants.

There is a reason 80% (circa 2014) of servers are deployed using Linux, jus' sayin'.

c. Go Small and Scale

If you go with Digital Ocean or Linode, they both have reasonably priced starter servers, which can in turn be scaled. Meaning, you can pay more later for additional server resources without having to completely rebuild your server.

Ok! For this article I'm going to use Linode. I like them. They're who I started with, and I was extremely happy with their quality and reliability.

3. Get a Server

Head over to

Linode

And Sign Up

Login, then go to Add Linode . Here, select the smallest sized Linode possible. When I started, the small servers were $5 a month--but it looks like they've gone up. My guess is, you can find them on sale occasionally.

You don't have to select the smallest--but I think it's plenty for a Jekyll blog.

Once you've selected the size of server, scroll to the bottom and select a location central to your audience. If there isn't one, then simply select the location closest to you.

Then select Add this Linode!

Once you've added your Linode you will be re-directed to your Linodes dashboard

Notice, the IP Address is the IP address of your very first server! Waahoo!

It'll take it a second, but the status of your linode should change from Being created to Brand new . When it does, you will be ready for the fun!

4. Setup Linux

Let's get Linux setup on your machine. Click on the name of your Linode.

This should load the server dashboard for your server. Looking something like this.

Don't be alarmed. There is a lot going on here, but we are going to take it one step at a time. Don't worry, I got you.

First, let's tell the computer which manages your server to install Linux on it. You can do this by going to Deploy an Image

Beware ye Stackscripts!

A stackscript is a script meant for a machine with newly installed Linux. The script tells the machine to do a bunch of automated setup work to prepare it for a particular task--in our case, to be a server. I'm not going to show how to use them in this walkthrough, for a few reasons. We will learn more setting things up ourselves, and therefore will be able to maintain the result. Also, I've not found a stackscript which is specifically for Jekyll; most of them have a lot of extra stuff we don't need.

Ok, back to work. Let's fill out our setup request

Be sure to save your password somewhere! Not a lot of ways to recover it. Once everything is selected hit Deploy

Your server will quickly be formatted and a fresh copy of Ubuntu 16.04 LTS installed. Oh, and I've not yet mentioned how we'll actually be talking to this server:

5. SSH

SSH stands for "secure shell." The shell being the command prompt environment on which Linux is based. This is going to be our main way of interacting with the server. It may feel terse and inhumane, but I strongly encourage you to embrace the command line. If you do, the powers of Linux will be yours for free.

And besides, I'm writing this tutorial around it, so you kinda have to use it to keep following along.

Ok, let's fire up your machine. Open up the Linode dashboard and click on your linode's name. At the top right there should be a box called Server Status and it is probably Powered Off . Let's turn it on by hitting the Boot button.

Wait until the status below shows your linode has fully booted.

Now, I'm assuming you are using Linux or Mac as your local operating system. On either, open a terminal and type

ssh root@your.ip.number.here

And press enter.

You should see something along the lines of

[ladvien@ladvien ladvien.github.io]$ ssh root@your.ip.number.here
The authenticity of host 'your.ip.number.here (your.ip.number.here)' can't be established.
ECDSA key fingerprint is SHA256:ee2BPBSeaZAFbVdpWFj1oHLxdPdGoxCaSRl3lu6u2Fc.
Are you sure you want to continue connecting (yes/no)?

Type yes and hit enter.

You will then be prompted for the password you entered as the root password during the setup phase in the Linode Manager.

6. Nginx Setup

You are now on your server. Do you feel a bit like Mr. Robot? Live the feeling. And don't let anyone give you a hard time for being a shell noob. Embrace the shell.

I'm not going to go over Linux stuff in detail. Please refer to a more in-depth tutorial; they are all over the Internet. But, I will point out, the Tab key works as an auto-complete. This is the single most important tidbit of working in a shell. Instead of having to type out a long file name, type the first two letters and hit Tab. It'll try to fill it in for you.

Let's start our server setup.

Your server is simply a computer. But, we are going to install a program on your computer which will cause anyone visiting your IP address in a browser to see parts of your file system. The visitor's browser loads information from your file system and, if the files are in a language the browser understands, renders it for the visitor. These files will be in HTML and CSS produced by Jekyll.
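
Just to demystify what a web server does, here is a toy sketch in Node--not something we'll use, and nginx does this job far better:

// A toy web server: read a file from disk and hand it to whoever asks.
// nginx does exactly this, plus a thousand things this sketch doesn't.
const http = require('http');
const fs = require('fs');

http.createServer((request, response) => {
    fs.readFile('/var/www/html/index.nginx-debian.html', (err, html) => {
        if (err) {
            response.writeHead(500);
            return response.end('Server error');
        }
        response.writeHead(200, { 'Content-Type': 'text/html' });
        response.end(html);
    });
}).listen(80);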

Ok. The server program we will be using is called nginx . It is not the oldest or the most common. But I find its use straightforward and it seems pretty darn fast too.

But first, let's update the Linux system. At your server's command line type:

sudo apt-get update

And hit enter. This causes all the repository links to be updated. The repository links are libraries of Internet addresses telling your computer where it can find free stuff! Everything is swag on Linux.

Let's take a second to check something before we install nginx . Open any browser, type your linode's IP address in the browser address bar, and hit enter. Most likely, nothing will happen. The browser is trying to make contact with your server, but there is no program installed on your server to serve the website to a browser. That's what nginx will do.

Let's download nginx now

sudo apt-get install nginx

It will ask if you want to install nginx ; say yes.

Once it's installed, let's test and make sure it works.

Type

nginx

It should respond with

nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to [::]:80 failed (98: Address already in use)
nginx: [emerg] still could not bind()

Great! Odd as it looks, this output means nginx is installed and already running as a service--the bind() failures are because nginx is already holding port 80 and serving its default page. We just need to configure nginx to serve our files.

Also, open a browser and type your server's IP address again. Hit enter. This time you should see:

Wow, you are now serving an HTML page to the world, for anyone who visits your website. Pretty cool, eh? I think so.

Want to see something pretty cool?

Type (note, do not include sudo here)

nano /var/www/html/index.nginx-debian.html

You should see the content of the html file being served by nginx .

<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

Change

<h1>Welcome to nginx!</h1>

To

<h1>Welcome to the Jungle, baby!</h1>

Then hit CTRL + O , which should save the file. Then hit CTRL + X to exit the nano editor.

Now, switch back to your browser, go back to your website's IP address, and hit refresh. You should see.

Not seeing it? You didn't change the <title> instead of the <h1> , right? Ask me how I know that...

Friggin awesome! Let's move on to setting up Nginx, so you can serve your own website.

Linode actually has a great walkthrough on setting up Nginx.

But, for now, we are going to stick with the basic nginx setup. There will be other articles in this series where I show how to edit nginx to make the website better.

7. Jekyll

Let's setup Jekyll locally. To utilize Jekyll we are going to need to download and install the following programs.

Ruby

Ruby is a programming environment which contains a package manager we will use a lot, called gem ( https://en.wikipedia.org/wiki/RubyGems ). For example, when we type gem install cool-program it is the Ruby environment pulling cool-program from the Internet and installing it on your machine.

Bundler

Bundler is a program which helps pull all the dependencies needed to run a program together. As they say in the README, "Bundler makes sure Ruby applications run the same code on every machine."

Git

Git is a version control program. It also has the ability to pull source code off the Internet. We are going to use it at first to pull a theme, but eventually, we will manage your website's Jekyll source code with it.

Homebrew (Mac Only)

Homebrew , often referred to as Brew, is a program which is like apt for Linux. It is a command line tool which lets you pull programs from the Internet and install them locally.

Ok, let's get going

At your local computer's terminal type:

Linux

sudo apt-get install ruby
gem install jekyll

Mac

To setup Ruby correctly on Mac we are going to first install Homebrew, the command line package manager for Mac mentioned above. This is the equivalent of apt in Linux.

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install ruby
gem install jekyll
gem install bundler

8. Get a Jekyll Starter

Jekyll is great for creating websites, but there is a lot of boilerplate. I found it much easier to clone someone else's Jekyll starter site than make my own from scratch.

For this series we are going to use the Neo-HPSTR theme.

Open the terminal and pick a directory where you would like to put a copy of your website. For me, I'm on Linux and will use the home directory.

Now, let's download our theme.

git clone https://github.com/aron-bordin/neo-hpstr-jekyll-theme

Git clones the neo-HPSTR theme from the Internet and puts it in a directory called /neo-hpstr-jekyll-theme . Feel free to rename the directory to the name of your website; for example, my directory is called ladvien.com . We are getting close to putting this website on-line, just a few more steps.

9. Build the Jekyll Theme

Open your website's directory

cd neo-hpstr-jekyll-theme

And enter

bundler install

This will pull all the needed programs to make this theme build on your computer. Note, you may be required to enter your password for file access.

Ok, moment of truth. Type

bundle exec jekyll build

You should see a response similar to

Configuration file: /home/ladvien/neo-hpstr-jekyll-theme/_config.yml
       Deprecation: The 'gems' configuration option has been renamed to 'plugins'. Please update your config file accordingly.
            Source: /home/ladvien/neo-hpstr-jekyll-theme
       Destination: /home/ladvien/neo-hpstr-jekyll-theme/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
                    done in 1.103 seconds.
 Auto-regeneration: disabled. Use --watch to enable.

But, if you didn't get any errors, you should be good.

Breaking this down, we used the bundler program to execute the jekyll program. We passed the build command to the jekyll program, which tells jekyll to take all your Jekyll files and compile them into your website. The bundler program made sure jekyll had everything it needed to compile correctly.

In your file explorer, navigate to your website directory and enter the _site directory. This directory contains your entire website after compilation.

jekyll_site_folder

Open this folder and then double click on the file index.html . This should open your website locally in the browser.

jekyll_site_locally

But this isn't what we want. Let's get it on the webserver we setup.

Open the command prompt and switch directories to your website's main directory. Then, type

scp -r _site/* root@your.website.ip.address:/var/www/html/

This should copy all of your compiled website files to your webserver. Go to your website address and you should see the website on-line! Booyah!

10. That It?

Noooooo , this was the bare minimum setup. Here's a list of what I plan to tackle in this series.

  • Editing the _config.yml file to customize your theme
  • Setup your code on Github
  • Adding SSL encryption
  • Tweaking the server to zip assets before sending them to your viewers
  • Make the server more secure -- this is called hardening
  • Create a script which will automatically compile Jekyll, send it to Github , and then copy the compiled files to your website.
Creating a GPU Accelerated Deep-Learning Environment on Arch Linux

This article logs a weekend of efforts to create a deep-learning environment which meets the following criteria

It was a tough one.

UPDATE: 2019-01-19

It seems the Anaconda conda install tool now takes care of the GPU setup.

The following steps:

  • Install NVIDIA
  • Downgrade CUDA to match CUDNN

Can now be replaced by installing tensorflow-gpu after installing Anaconda.

Run the following once conda is setup:

conda install -vv tensorflow-gpu

TL;DR

There was an error I had a hell of a time debugging. Installing the toolchain is fairly straightforward, except CUDA. At the time of writing this article (2018-04-29), there is a version mismatch between CUDA and CUDNN in the Arch Linux repositories.

This resulted in the following error every time I tried to import tensorflow in Python.

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

The Arch Linux package CUDA was pulling the latest version 9.1.1 (at writing) and the Arch Linux package CUDNN was looking for version 9.0. That little mismatch cost me 10 hours.

0. Other Arch Linux Deep-Learning Articles

There are a couple other Arch Linux deep-learning setup walkthroughs. Definitely need to give these guys credit, they are smarter than me. However, neither walkthrough had everything I was looking for.

This article was alright. But it focused a lot on preparing Arch Linux from the bare metal, which is usually the right idea with Arch, if you are on a resource budget--for example, running on a server or Raspberry Pi. But the extra few bytes of RAM saved don't really justify the time spent on meticulous tuning when we will be talking in megabytes and not bytes. And let my immolation begin.

Also, this article doesn't include information on GPU support. Whaawhaa.

This one was a bit closer to what I needed. In fact, I did use the middle part. However, the version mismatch was not mentioned. Of course, it's not the author's fault; at the time he wrote it, I'm guessing the repositories matched.

Alright, on to my attempt.

1. Install Antergos (Arch Linux)

I love me some Arch Linux. It's lightweight and avoids the long-term issues of other flavors. Plus, it is meant to be headless, so it's great for embedded projects. Given how many embedded projects I take on, I became accustomed to using it daily; eventually, I made it my main desktop flavor. I shouldn't sound too Linux-snobby, though--I dual-boot it on my MacBook Pro. The one issue with Arch Linux is it can be a little unfriendly to new users--or those with limited time who cannot be bothered with the nuances of setup. Enter Antergos.

Antergos is essentially Arch Linux with a desktop environment and a GUI installer. A perfect choice for my deep-learning endeavors. Really, you should check it out. Go now.

We're going to use it for this project.

Download the iso file

You'll need a little jumpdrive; 4GB should work.

I use Etcher as it makes it painless to create boot media.

Once Etcher is installed, insert the jumpdrive, open Etcher, and then select the Antergos iso file. Here's the usual warning: if you have anything on your jumpdrive, it's about to get deleted forever.

Insert the media into the machine you want to install Arch on and boot from the jumpdrive.

Windows

You will need to hit a special key during the boot sequence to enter the BIOS' boot menu

Mac

While booting hold down the Option key.

If all goes well you should see a menu which says

Welcome to GRUB!

And then shows an Antergos boot menu. Select boot Antergos Live.

Once the boot sequence is finished you should see the Antergos desktop environment start and shortly after cnchi , which is Antergos' GUI installer

Select Install It . The installer is fairly self-explanatory. But, if you run into any issues, please feel free to ask me questions in the comments. I'm glad to help.

Once the installer is complete you will be prompted to restart the computer. It's go time.

2. Install NVIDIA

When you boot up the installed Antergos open the terminal.

We will start with installing the base NVIDIA packages. As part of it, we are going to get the wrong version of CUDA. But, I found downloading the NVIDIA stack as whole packages and then replacing CUDA with an earlier version much easier than trying to pull everything together myself.

Ok, here we go.

sudo pacman -S nvidia nvidia-utils cuda cudnn

That might take awhile.

...

So, how you been? Oh--wait, it's done.

Ok, to initialize the changes reboot.

sudo reboot now

3. Downgrade CUDA to match CUDNN

That should have gotten everything at once. Now, let's downgrade CUDA from 9.1 to 9.0 .

wget https://archive.archlinux.org/packages/c/cuda/cuda-9.0.176-4-x86_64.pkg.tar.xz

This downloads a pkg file for CUDA 9.0, which is what the most recent version of Tensorflow is expecting (at this time, 1.8). I found the easiest way to replace CUDA 9.1 with 9.0 was to simply double-click on the downloaded file in the GUI file browser. This opens it in Antergos' answer to a GUI-based package manager. It will warn you this package will downgrade your CUDA version and ask you to commit to the changes. Hit the commit button.
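
If you'd rather stay in the terminal, pacman 's -U flag installs a local package file; this should be equivalent to the GUI route above:

sudo pacman -U cuda-9.0.176-4-x86_64.pkg.tar.xz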

Wait for the file to be replaced before moving on.

4. Anaconda

Anaconda is a great package manager for data (mad) scientist tools. It is Python centric, but also supports R and other stuff I don't know how to use yet.

We will be using it to prepare our system to support deep-learning projects.

Download the Linux version suited for your computer.

Once the file is downloaded right click on the file and select Show In Folder . Once there, right-click in the open space and select Open in Terminal .

Make Anaconda executable and then run it.

chmod +x Anaconda3-5.1.0-Linux-x86_64.sh
./Anaconda3-5.1.0-Linux-x86_64.sh

The Anaconda installation is off and running. It will ask you to agree to a license. After, it will ask whether you want to install Anaconda in its default directory. We do.

Now, it will install every data scientist package known to existence. Mwhahaa. Erm.

When it asks

Do you wish the installer to prepend the Anaconda3 install location
to PATH in your /home/ladvien/.bashrc ? [yes|no]

Type yes . This will make Anaconda accessible throughout your system.

Of course, this new path variable will not be loaded until you start your user session again (log off and back on). But we can force it to load by typing.

cd ~
source .bashrc

Double check we are using the Anaconda version of Python.

[ladvien@ladvien ~]$ which python
/home/ladvien/anaconda3/bin/python

If it doesn't refer to anaconda somewhere in this path, then we need to fix that. Let me know in the comments below and I'll walk you through correcting it.

If it does, then let's move forward!

5. Tensorflow and Keras

Alright, almost done.

Let's go back to the command prompt and type:

sudo pacman -S python-pip

This will download Python's module download manager pip . This is usually packaged with Python, but isn't included on Arch.

How'd we get Python? Anaconda installed it.

Let's download Tensorflow with GPU support.

sudo pip install tensorflow-gpu --upgrade --ignore-installed

Let's test and see if it's worked. At command prompt type

python

And in Python

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

You should see a response similar to

2018-05-01 05:25:25.929575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.66GiB
2018-05-01 05:25:25.929619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-01 05:25:26.333292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-01 05:25:26.333346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-05-01 05:25:26.333356: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-05-01 05:25:26.333580: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5442 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1
2018-05-01 05:25:26.455082: I tensorflow/core/common_runtime/direct_session.cc:284] Device mapping:
/job:localhost/replica:0/task:0/device:GPU:0 -> device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1

Which means you are good to go! At this point, Python is setup to do accelerated deep-learning. Most deep-learning peeps stop here, as Python is the deep-learning language. However, like a pirate I'm an R sort of guy.

6. Installing R and RStudio

To setup a GPU accelerated deep-learning environment in R there isn't a lot of additional setup. There are keras and tensorflow R packages, which connect the R code to a Python backend.

To get R in Arch Linux open the terminal and type:

sudo pacman -S r

And what's R without RStudio? Actually, it's still R, which is bad-ass unto itself--but anyway, let's not argue. Time to download RStudio...because you insist.

In terminal

cd ~
git clone https://aur.archlinux.org/rstudio-desktop-bin.git
cd rstudio-desktop-bin
makepkg -i

After, you should find RStudio in the Antergos Menu.

You can right click on the icon and click Add to Panel to make a shortcut.

Open up RStudio and let's finish this up.

7. R Packages for Deep Learning

Inside RStudio's code console type

install.packages("tensorflow")

This will install the package which will help the R environment find the Tensorflow Python modules.

Then,

install.packages("keras")

Keras is the boss package; it's going to connect all the needed Python modules to Tensorflow so we can focus on just the high-level deep-learning tuning. It's awesome.

Once the keras package is installed, we need to load it and connect it to the underlying infrastructure we setup.

library(keras)
install_keras(method = "conda", tensorflow = "gpu")

This will install the underlying Keras packages using the Anaconda ecosystem and the Tensorflow Python modules using CUDA and CUDNN. Note, a lot of this we setup manually, so it should report the needed modules are already there. However, this step is still needed to awaken R to the fact those modules exist.

Alright, moment of truth. Let's run this code in R.

library(tensorflow)

with(tf$device("/gpu:0"), {
  const <- tf$constant(42)
})

sess <- tf$Session()
sess$run(const)

If all went well, it should provide you with a familiar output

> library(tensorflow)
>
> with(tf$device("/gpu:0"), {
+   const <- tf$constant(42)
+ })
/home/dl/.virtualenvs/r-tensorflow/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
>
> sess <- tf$Session()
2018-05-01 05:55:07.412011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.7715
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.38GiB
2018-05-01 05:55:07.412057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-01 05:55:07.805042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-01 05:55:07.805090: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-05-01 05:55:07.805115: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-05-01 05:55:07.805348: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5150 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
> sess$run(const)
[1] 42

8. Scream Hello World

And the payoff?

Using the prepared Deep Dream script from the Keras documentation

Voila!

Google Vision API using Raspberry Pi and Node

This is a jumpstart guide to connecting a Raspberry Pi Zero W to the Google Vision API.

1. Get an Account

Sadly, Google Vision API is not a completely free service. At the time of writing, an API account provides 1000 free Google Vision API calls a month. Then, it's $1.00 for each additional 1000 calls.

I know, I know, not too bad. But this isn't a commercial project. I'm wanting to use it for a puttering little house bot. If my wife gets a bill for $40 because I decided to stream images to the API, well, it'll be a dead bot. Anyway, I thought I'd still explore the service for poo-and-giggles.

To get an account visit

And sign-in with an existing Google account or create one.

2. Enter Billing Information

Now, here's the scary part: you must enter your billing information before getting going. Remember, you will be charged if you go over 1000 calls.

Again, if you exceed your 1,000 free calls you will be charged. (What? I said that already? Oh.)

3. Enable Cloud Vision API

After setting up billing information we still need to enable the Cloud Vision API. This is a security feature, essentially, all Google APIs are disabled by default so if someone accidentally gets access they don't unleash hell everywhere.

Now search for Vision and click the button. Here there should be a glaring Enable button. Press it.

The last thing we need to do is get the API key. This needs to be included in the API call headers for authentication.

Do not let anyone get your API key. And do not hardcode it in your code. Trust me, this will bite you. If this accidentally gets pushed onto the web, a web crawler will find it quickly and you will be paying bajillions of dollars.

Let this article scare you a bit.

Let's go get your API Key. Find the Credentials section

You probably won't see any credentials created, as you probably have not created any yet.

Let's create a new API Key.

I'd name the key something meaningful and limit it to only the Google Cloud API.

Go ahead and copy your API key, as we will need it in the next step.

4. Raspberry Pi Side Setup

The articles listed at the top of this one will help you setup the Raspberry Pi for this step. But if you are doing things differently, most of this should still work for you. However, when we get to the part about environment variables, that'll be different for other Linux flavors.

Start by SSH'ing into your Pi.

And update all packages

sudo pacman -Syu

We're going to create an environment variable for the Google Cloud Vision API. This is to avoid hardcoding your API key into the code further down. That will work , but I highly recommend you stick with me and setup an environment variable manager to handle the API.

Switch to the root user by typing

su

Enter your password.

The next thing we do is add your Google Vision API Key as an environment variable to the /etc/profile file; this should cause it to be initialized at boot.

Type, replacing YOUR_API_KEY with your actual API Key.

echo 'export GOOGLE_CLOUD_VISION_API_KEY=YOUR_API_KEY' >> /etc/profile

Now reboot the Pi so that takes effect.

sudo reboot

Log back in. Let's check to make sure it's loading the API key.

echo $GOOGLE_CLOUD_VISION_API_KEY

If your API key is echoed back, you should be good to go.

5. Project Setup

Let's create a project directory.

mkdir google-vis
cd google-vis

Now let's initialize a new Node project.

npm init

Feel free to customize the package details if you like. If you're lazy like me, hit enter until you are back to the command prompt.

Let's add the needed Node libraries. It's just one: the axios library, which enables async web requests.

npm install axios

Also, let's create a resource directory and download our lovely test image. Ah, miss Hepburn!

Make sure you are in the google-vis/resources project directory when downloading the image.

mkdir resources
cd resources
wget /images/hepburn.png

6. NodeJS Code

Create a file in the google-vis directory called app.js

nano app.js

Then paste in the code below and save the file by typing CTRL+O and exiting using CTRL+X.

// https://console.cloud.google.com/
const axios = require('axios');
const fs = require('fs');

const API_KEY = process.env.GOOGLE_CLOUD_VISION_API_KEY

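// Warn if the API key is missing -- note the request below will still be attempted without it.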
if (!API_KEY) {
  console.log('No API key provided')
} 

function base64_encode(file) {
    // read binary data
    var bitmap = fs.readFileSync(file);
    // convert binary data to base64 encoded string
    return new Buffer(bitmap).toString('base64');
}
var base64str = base64_encode('./resources/hepburn.png');

const apiCall = `https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}`;

const reqObj = {
    requests:[
        {
          "image":{
            "content": base64str
          },
          "features":[
                {
                    "type":"LABEL_DETECTION",
                    "maxResults":5
                },
                {
                    "type":"FACE_DETECTION",
                    "maxResults":5            
                },
                {
                    "type": "IMAGE_PROPERTIES",
                    "maxResults":5
                }
            ]
        }
      ]
}

axios.post(apiCall, reqObj).then((response) => {
    console.log(response);
    console.log(JSON.stringify(response.data.responses, undefined, 4));
}).catch((e) => {
    console.log(e.response);
});

This code grabs the API key environment variable and creates a program constant from it.

const API_KEY = process.env.GOOGLE_CLOUD_VISION_API_KEY

This is how we avoid hardcoding the API key.

7. Run

Let's run the program.

node app.js

If all went well you should get similar output to below

data: { responses: [ [Object] ] } }
[
    {
        "labelAnnotations": [
            {
                "mid": "/m/03q69",
                "description": "hair",
                "score": 0.9775374,
                "topicality": 0.9775374
            },
            {
                "mid": "/m/027n3_",
                "description": "eyebrow",
                "score": 0.90340185,
                "topicality": 0.90340185
            },
            {
                "mid": "/m/01ntw3",
                "description": "human hair color",
                "score": 0.8986981,
                "topicality": 0.8986981
            },
            {
                "mid": "/m/0ds4x",
                "description": "hairstyle",
                "score": 0.8985265,
                "topicality": 0.8985265
            },
            {
                "mid": "/m/01f43",
                "description": "beauty",
                "score": 0.87356544,
                "topicality": 0.87356544
            }
        ],
  ....
]

8. And so much more...

This article is short--a jump start. However, there is lots of potential here. For example, sending your own images using the Raspberry Pi Camera

Please feel free to ask any questions regarding how to use the output.

There are other feature detection requests.
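
For instance, the request object from earlier can be pointed at other feature types. A hedged sketch (these type names come from Google's Vision docs; the maxResults values are arbitrary):

const reqObjOther = {
    requests:[
        {
          "image":{
            "content": base64str
          },
          "features":[
                { "type":"TEXT_DETECTION", "maxResults":5 },     // OCR the image
                { "type":"LANDMARK_DETECTION", "maxResults":5 }, // famous places
                { "type":"LOGO_DETECTION", "maxResults":5 }      // brand logos
            ]
        }
      ]
};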

However, I'm going to end the article and move on to rolling my own vision detection systems. As soon as I figure out stochastic gradient descent.

1B1 Robot

Not too long ago there was a post on Hackaday about a little four-wheeled bot made with a Raspberry Pi and some eBay motor drivers.

Raspberry Pi Zero Drives Tiny RC Truck

I really liked the little chassis, ordered one, and was happy to find it was delivered with the motors already mounted. (As I become an aged hacker, it's the little time savers which are genuinely appreciated.)

On buying the chassis I'd already decided to use one of my Raspberry Pi Zero W's (rp0w) to control the bot. I really like Arch Linux on the rp0w. It's lightweight and the packages are well curated. Again, it's the little time savers. I liked the combination even more once I found a way to set up the rp0w headlessly, which meant I could go from SD card to SSH'ing into the little Linux board.

Coincidentally, I purchased several DRV8830 modules from eBay. This is a sad story -- I've played with the DRV8830 chip a long time ago:

Sparkfun did a great job of documenting the IC and creating an Arduino library to go with it. So, I was disheartened to find Sparkfun had EOL'ed the boards.

Probably because buttholes like me kept buying them off eBay. I've got some mixed feelings here -- one of them is guilt.

Anyway, I was surprised to find the mounting holes on the DRV8830s matched a set on the chassis. I decided to attempt using one module to drive two motors, thereby only needing two DRV8830 modules to drive the entire bot.

I've had some thermal paste lying about for years--it works nicely as an adhesive. Also, I was hoping to use the chassis to heatsink the motor drivers.

A bit of a tangent. At work one of the skills which is useful for our team is being able to work with APIs. For a while I've wanted to learn NodeJS, since it seems to be the goto framework for solid back-end business applications. It doesn't hurt that StackOverflow's Developer Survey for the last few years has shown JavaScript is a solid language to stay sharp on. Specifically, being able to work within the NodeJS framework makes one pretty darn marketable.

Ok, for these reasons I decided to build this bot using NodeJS. I've written a separate article on setting up NodeJS, working with i2c-bus, and porting the DRV8830 Sparkfun library to NodeJS.

  • Not yet written (shesh, been busy. Judge much? :P)

It didn't take any time at all to get the little motor spinning using NodeJS, largely due to Michael Hord's (Sparkfun) MiniMoto library. (Again, some guilt here.)

I drove the motor shown using two series Li-Ion batteries connected to a buck converter set to output ~5.0v. The motor spun nicely and pulled around 200mA. However, the real test would be connecting two geared motors per DRV8830.

'use strict';
var i2c = require('i2c-bus'), i2c1 = i2c.openSync(1);
var sleep = require('sleep');
var drv8830 = require('./drv8830');

// Each DRV8830 sits on the same i2c bus at its own address.
const motorAddressOne = 0x61;
const motorAddressTwo = 0x67;

var motor1 = new drv8830(motorAddressOne, i2c1);
var motor2 = new drv8830(motorAddressTwo, i2c1);

// Drive both motors forward, pivot, then stop.
motor1.drive(50);
motor2.drive(50);
sleep.msleep(3500);
motor1.drive(-50);
motor2.drive(50);
sleep.msleep(3500); // give the pivot time to run before stopping
motor1.stop();
motor2.stop();

It was time to wire up the chassis motors and create a test of the system. The wire used was some eBay single core aluminum wire (the cheap stuff). Wiring was pretty straightforward.

However, I did make a little i2c bus board from perfboard and JST connectors. Adding both ceramic and electrolytic decoupling capacitors for smoothing and to aid peak discharge.

Note the heaping amount of heatsink goop on the underside of the perfboard, this was a hacker's solution to galvanically isolating the perfboard from the steel chassis.

One-B-One Schematic

+--------------+                    +------------------+           +------------------+
|              |                    |                  |           |                  |
|              +--+LEAD1+----+OUT1+-+                  |VCC----+5V-+                  |
|              |                    |                  |           |                  |
| Motor 1      +--+LEAD2+----+OUT2+-+   DRV8830+A      +----GND----+  Buck Regulator  |
|              |                    |                  |           |                  |
|              |                    |                  |           |                  |
|              |                    |                  |           |                  |
+--------------+                    +-----+---+--------+           +--+--+------------+
                                          |   |                       |  |
                                      SDA1|   | SCL1               5V |  | GND
                                          |   |                       |  |
                                          |   |                       |  |
                                          |   |                       |  |
                                          |   |                       |  |
                                     +----+---+--------+              |  |
                                     |                 |              |  |
                                     |                 |              |  |
                        +----+VCC2+--+  ADUM1250ARZ    ++VCC1+--------+  |
                        |            |                 |                 |
                        |   ++GND2+--+                 ++GND1+-----------+
                        |   |        |                 |
                        |   |        +----+--+---------+
                        |   |             |  |
                        |   |         SDA2|  | SCL2
                        |   |             |  |
                        |   |             |  |
                        |   |             |  |
                  +-----+---+-------------+--+-------+

                            Raspberry Pi Zero W

The ADUM1250ARZ is a bi-directional galvanic isolator for digital communication up to 1Mbps. It's the first chip I ever designed a PCB for, and it's still my favorite. Essentially, the ADUM1250 separates the rp0w from the noisy motors--and more importantly, if I screw something up on the motor side, it won't kill my rp0w. The ADUM1250 isn't necessary for most people, just me.

The last bit I had to figure out was the Raspberry Pi's power. I attempted to use a single Li-Ion battery and a boost regulator to power it, but the regulators I bought were DOA.

Then I remembered the load-sharing and boost converter circuit I'd salvaged from a battery bank. The charge circuit was built for Li-Po chemistry, and the only Li-Po I had lying about was a 350mAh. I wired it up and was surprised the whole thing worked, with the added benefit of being able to charge the rp0w's battery without disconnecting it. Booyah!

The last piece was for the video. I pulled the npm package keypress and wrote this little program.

'use strict';
// Open i2c bus 1, load the ported DRV8830 library, and pull in keypress.
var i2c = require('i2c-bus'), i2c1 = i2c.openSync(1);
var sleep = require('sleep');
var drv8830 = require('./drv8830');
var keypress = require('keypress');

// I2C addresses of the two DRV8830 modules.
const motorAddressOne = 0x61;
const motorAddressTwo = 0x67;

var motor1 = new drv8830(motorAddressOne, i2c1);
var motor2 = new drv8830(motorAddressTwo, i2c1);

var turnSpeed = 33;
var driverSideSpeed = 63;
var passengerSideSpeed = 63;

// Make `process.stdin` begin emitting "keypress" events.
keypress(process.stdin);

// Listen for the "keypress" event.
process.stdin.on('keypress', function (ch, key) {
  // Some keypresses only report a character; ignore those.
  if (!key) { return; }
  // Ctrl+C releases stdin; falling through to default stops the motors.
  if (key.ctrl && key.name == 'c') {
    process.stdin.pause();
  }
  switch (key.name) {
    case 'w':
        // Full speed ahead.
        motor1.drive(driverSideSpeed);
        motor2.drive(passengerSideSpeed);
        break;
    case 's':
        // Ramp both motors up gradually instead of jumping to full speed.
        var motors = [motor1, motor2];
        setDriveWithAcceleration(motors, driverSideSpeed, 10);
        break;
    case 'd':
        // Spin the sides in opposite directions to turn.
        motor1.drive(turnSpeed);
        motor2.drive(turnSpeed * -1);
        break;
    case 'a':
        // And the mirror image to turn the other way.
        motor1.drive(turnSpeed * -1);
        motor2.drive(turnSpeed);
        break;
    default:
        // Any other key stops the bot.
        motor1.stop();
        motor2.stop();
  }
});
process.stdin.setRawMode(true);
process.stdin.resume();

// Step the motors from 0 up to the desired speed, pausing between steps.
var setDriveWithAcceleration = function (motors, desiredSpeed, accelTimeMilliSec) {
    for (var i = 0; i < desiredSpeed; i++) {
        motors[0].drive(i);
        motors[1].drive(i);
        sleep.msleep(accelTimeMilliSec);
    }
};
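
The helper steps the motors up from zero rather than slamming them straight to 63 in one write. A matching ramp-down would look nearly identical; here's a hypothetical sketch (setStopWithDeceleration isn't part of the program above, just the mirror image of the helper that is):

// Hypothetical mirror of setDriveWithAcceleration: step down from the
// current speed, then brake both motors.
var setStopWithDeceleration = function (motors, currentSpeed, decelTimeMilliSec) {
    for (var i = currentSpeed; i > 0; i--) {
        motors[0].drive(i);
        motors[1].drive(i);
        sleep.msleep(decelTimeMilliSec);
    }
    motors[0].stop();
    motors[1].stop();
};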

Then, I shot the following video and called it donesies.