Distributing Machine Learning Jobs

Boss

A human sends a machine learning Job to the Boss. A Job is a JSON object containing the desired machine learning script and the parameters needed for successful execution. The Boss stores the Job and creates an Order. The Order is another JSON object representing the state of a requested Job.

         Job #4
 0                        Boss
/|\ +----------------->   ____
/ \                       +""+
                          +__+
                         [ ==.]`)
                   +----+====== 0 +--+
                   +                 |
                Order #3           Job #3
                   |                 |
                Order #2           Job #2
                   |                 |
                Order #1           Job #1

Worker

The Worker uses node-schedule to fire an HTTP request to the Boss letting it know the Worker is “bored.” The Boss then searches the Orders for the oldest unassigned Order; if it finds one, it returns the Order to the Worker as a JSON object. At this point, the Boss updates the Order’s status to “assigned.”

The Worker sends another HTTP request, this time requesting the Job information associated with the Order the Boss had assigned.

          Boss
          ____
          +""+
          +__+
         [ ==.]`)
   +----+====== 0 +--+
   +                 +            If the Boss finds an unassigned
Order #3           Job #3         Order it is returned. The worker requests the
   +                 +            related Job. The Boss updates the
Order #2           Job #2         Order status to "assigned"
   +                 +                   Worker
Order #1           Job #1<-+              ____
  ^                        +----------->  +""+
  |                                       +__+
  +------------------------------------+ [ ==.]`)
          The worker checks with
          the boss periodically
          for the oldest submitted
          Order.
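
You can exercise this exchange by hand. Assuming the Boss listens on port 3000 (as it does in the code below), the following mimics a bored Worker named worker-1; the jobId comes from the Order returned by the first call:

curl -X POST http://localhost:3000/bored/worker-1
curl http://localhost:3000/retrieve/job/5bcc93d67f0b3f4844c87c79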

The Worker passes the Job information into the appropriate machine learning Python script as a command-line argument. The script is executed and, whether successful or not, an Outcome object is passed back to the Worker Node through stdout.

Worker
 ____
 +""+     Job #1
 +__+ +--------------->  Python Script
[ ==.]                         +
  ^                            |
  |                            |
  |                            v
  +------------------------ Outcome #1

The Worker then makes a callback API call, passing the Outcome object to the Boss to be stored in the database.

          Boss                                Worker
          ____                                 ____
          +""+                                 +""+
          +__+                                 +__+
         [ ==.]`)                             [ ==.]`)
   +----+====== 0 +------+                       +
   |         |           |                       |
Order #3   Job #3     Outcome #1 <---------------+
   |         |
Order #2   Job #2
   |         |
Order #1   Job #1

MongoDB on Mac

brew install mongodb
nano /usr/local/etc/mongod.conf

Your file should look something like this

systemLog:
  destination: file
  path: /usr/local/var/log/mongodb/mongo.log
  logAppend: true
storage:
  dbPath: /usr/local/var/mongodb
net:
  bindIp: 127.0.0.1

Change the dbPath to where you’d like Mongo to store your databases. Then, start and enable Mongo with brew’s services.

brew services start mongodb
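
To confirm the daemon came up, check brew's service list or connect with the mongo shell:

brew services list
mongo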

Sample Objects

Order

{
    "_id" : "5bcc93d67f0b3f4844c87c7a",
    "jobId" : "5bcc93d67f0b3f4844c87c79",
    "createdDate" : ISODate("2018-10-21T14:57:26.980Z"),
    "status" : "unassigned",
}

Job

{
    "_id" : ObjectId("5bcc93d67f0b3f4844c87c79"),
    "hiddenLayers" : [ 
        {
            "activation" : "relu",
            "widthModifier" : 4,
            "dropout" : 0.2
        }, 
        {
            "activation" : "relu",
            "widthModifier" : 2.3,
            "dropout" : 0.2
        }, 
        {
            "activation" : "relu",
            "widthModifier" : 1.3,
            "dropout" : 0.2
        }
    ],
    "dataFileName" : "wine_data.csv",
    "scriptName" : "nn.py",
    "projectName" : "wine_data",
    "depedentVariable" : "con_lot",
    "crossValidateOnly" : true,
    "crossValidationCrossingType" : "neg_mean_squared_error",
    "batchSize" : 100000,
    "epochs" : 3000,
    "patienceRate" : 0.05,
    "slowLearningRate" : 0.01,
    "loss" : "mse",
    "pcaComponents" : -1,
    "extraTreesKeepThreshd" : 0,
    "saveWeightsOnlyAtEnd" : false,
    "optimizer" : "rmsprop",
    "lastLayerActivator" : "",
    "learningRate" : 0.05,
    "l1" : 0.1,
    "l2" : 0.1,
    "minDependentVarValue" : 0,
    "maxDependentVarValue" : 1500,
    "scalerType" : "standard",
}

Outcomes

{
    "_id" : ObjectId("5bcc88fa7f0b3f4844c87c78"),
    "status" : 200,
    "jobId" : "5bcc724d7449f746b5aa6fe8",
    "loss" : 15109.168650257,
    "metric" : 14281.4453526111,
}
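
The Mongoose models the code below requires (./models/job and ./models/order) aren't listed in this post. As a reference, a minimal Order model matching the sample object above might look like this; the exact schema is an assumption:

var mongoose = require('mongoose');

// Minimal sketch of an Order model matching the sample object above.
var Order = mongoose.model('Order', new mongoose.Schema({
    jobId: { type: String, required: true },
    createdDate: { type: Date, default: Date.now },
    status: { type: String, default: 'unassigned' }
}));

module.exports = {Order};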

Code

Worker

server.js

var express = require('express');
var bodyParser = require('body-parser');
var pythonRunner = require('./preprocessing-services/python-runner');
var schedule = require('node-schedule');
var axios = require('axios');
var fs = require('fs');
var {Worker} = require('./worker/worker');

// Get Worker Node configuration
var config = JSON.parse(fs.readFileSync('./python-scripts/worker-node-configure.json', 'utf8'));

if(!config) { 
    console.log('No configuration file found.')
    process.exit();
}

// Boss' address and this worker's name, from the config file.
var bossAddress = config.bossAddress;
var nodeName = config.nodeName;
console.log(`Boss's address is ${bossAddress}`);
console.log(`This worker's name is ${nodeName}`);

var worker = new Worker('bored');

// Start server and add Middleware
var app = express();
const port = 3000;
app.use(bodyParser.json())

// Start checking for Boredom
var j = schedule.scheduleJob('*/1 * * * *', function(){
    if (worker.status === 'bored') {
        console.log('Worker is bored.');
        axios({
            method: 'post',
            url: bossAddress + `/bored/${nodeName}`
        }).then((response) => {
            let orderId = response.data._id
            let jobId = response.data.jobId;
            console.log(`Boss provided jobID #${jobId}`);
            axios({
                method: 'get',
                url: bossAddress + `/retrieve/job/${jobId}`
            }).then((response) => {
                let job = response.data;
                console.log(`Worker found the details for jobID #${jobId}`);
                job.callbackAddress = bossAddress;
                job.assignmentId = orderId;
                pythonRunner.scriptRun(job, worker)
                .then((response) => {
                    console.log('Worker started job, will let Boss know when finished.');
                });
            }).catch((error) => {
                console.log(error);
            });
        }).catch((error) => {
            console.log('Failed to find new job.')
        });
    }
});

// Python script runner interface
app.post('/scripts/run', (req, res) => {
    try {
        let pythonJob = req.body;
        pythonRunner.scriptRun(pythonJob, worker)
        .then((response) => {
            console.log(response);
            res.send(response);
        });
    } catch (err) {
        res.send(err);
    }
});

app.listen(port, () => {
    console.log(`Started on port ${port}`);
});
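
The Worker class required at the top of server.js (./worker/worker) is little more than a status holder; a minimal sketch, assuming nothing beyond what server.js and python-runner.js touch:

// worker/worker.js -- minimal sketch; the real class may carry more state.
class Worker {
    constructor(status) {
        // 'bored' means ready for a new Job; 'busy' means one is running.
        this.status = status;
    }
}

module.exports = {Worker};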

python-runner.js

let {PythonShell} = require('python-shell')
var fs = require('fs');
var path = require('path');
var axios = require('axios');

var scriptRun = function(pythonJob, worker){
    console.log(worker);
    worker.status = 'busy';
    return new Promise((resolve, reject) => {
        try {
            let callbackAddress = pythonJob.callbackAddress;
            let options = {
                mode: 'text',
                pythonOptions: ['-u'], // get print results in real-time
                scriptPath: path.relative(process.cwd(), 'python-scripts/'),
                args: [JSON.stringify(pythonJob)]
            };
            PythonShell.run(pythonJob.scriptName, options, function (err, results) {
                if (err) throw err;
                try {
                    let result = JSON.parse(results.pop());
                    if(result) {
                        console.log(callbackAddress + '/callback')
                        axios({
                            method: 'post',
                            url: callbackAddress + '/callback',
                            data: result
                        }).then((response) => {
                            console.log(`Worker let the Boss know job is complete.`);
                            worker.status = 'bored';
                        }).catch((error) => {
                            worker.status = 'bored'
                        });
                    } else {
                        worker.status = 'bored'
                    }
                } catch (err) {
                   worker.status = 'bored'
                }
            });
            resolve({'message': 'job started'});
        } catch (err) {
            reject(err)
            worker.status = 'bored'
        }
    });
}
module.exports = {scriptRun}

Boss

server.js

const express = require('express');
const bodyParser = require('body-parser');
const axios = require('axios');
var timeout = require('connect-timeout')

const {mongoose} = require('./backend/database-services/dl-mongo');
const workerNode = require('./backend/services/worker-node');
const work = require('./backend/services/work');

// Database collection
var {Job} = require('./backend/database-services/models/job');
var {Order} = require('./backend/database-services/models/order');


const bossAddress = 'http://maddatum.com'

// Server setup.
var app = express();
const port = 3000;

// Add request parameters.
app.use((req, res, next) => {
    res.setHeader('Access-Control-Allow-Origin', '*');
    res.setHeader('Access-Control-Allow-Headers', 
                  'Origin, X-Requested-With, Content-Type, Accept'); 
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, PUT, PATCH, DELETE, OPTIONS');
    next();
});

// Add the middleware.
app.use(bodyParser.json())

/*
This route is for creating new Jobs on the queue
*/
app.post('/job/:method', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    try {
        switch (req.params.method) {
            case 'create':
                work.create(req.body)
                .then((response) =>{
                    res.send(response);
                }).catch((error) => {
                    res.send({'error': error })
                });
                break;
            default:
                res.send({'error': 'No method selected.'})
        }
    } catch (err) {
        res.send({'error': 'Error with request shape.', err})
    }
});

/*
This route is for adding new WorkerNodes to the database.
*/
app.post('/worker-node/:method', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    try {
        switch (req.params.method) {
            case 'create':
                workerNode.create(req.body)
                .then((response) =>{
                    res.send(response);
                }).catch((error) =>{
                    res.send({'error': error.message});
                })
            break;
            default:
                res.send({'error': 'No method selected.'});
        }
    } catch (err) {
        res.send({'error': 'Error with request shape.', err })
    }
});

app.post('/callback', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    let outcome = req.body;
    console.log(outcome);
    try {
        work.file(outcome)
        .then((response) =>{
            console.log(response);
            res.send(response);
        })
    } catch (err) {
        res.send({'error': 'Error with request shape.', err })
    }
});

/*
Route for Worker Node to let the Boss know it needs a Job.
The oldest Job which is unassigned is provided.
*/
app.post('/bored/:id', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    try {
        let workerNodeId = req.params.id;
        console.log(`${workerNodeId} said it's bored.`);
        if (!workerNodeId) { throw {'error': 'No id provided.'}}
        // Oldest first: sort ascending on the Order's creation date.
        Order.findOne({ status: 'unassigned' }, {}, { sort: { 'createdDate': 1 } }, (err, order) => {
            if (err || !order) {
                return res.send({'message': `No work to do.  Don't get used to it.`});
            }
            console.log(`Found a work order, #${order._id}`);
            order.status = 'assigned';
            console.log(`Provided ${workerNodeId} with ${order.jobId}`);
            order.save()
            .then((doc) => {
                console.log(`Updated the Order #${doc.id}'s status to ${order.status}`);
                res.send(doc);
            });
        });
    } catch (err) {
        res.send({'error': 'Error with request shape.', err })
    }
});

/*
Retrieve Orders or Job
*/
app.get('/retrieve/:type/:id?/:param1?', (req, res) => {
    if (!req.body) { return { 'message': 'No request provided.' }};
    try {
        let type = req.params.type;
        let id = req.params.id;
        let param1 = req.params.param1;
        switch(type) {
            case 'order':
                Order.find().then((response) => {
                    res.send(response);
                });
                break;
            case 'job':
                if (!id)  { throw {'error': 'Missing Id'} }
                Job.findOne({'_id': id })
                .then((response) => {
                    res.send(response);
                });
                break;
            default:
                throw {'error': 'Unknown retrieve type.'};
        }
    } catch (err) {
        res.send({'error': 'Error with request shape.', err })
    }
})

app.listen(port, () => {
    console.log(`Started on port ${port}`);
});
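
The work service required above (./backend/services/work) isn't listed in this post. Based on how it's called, a sketch of its two functions might look like the following; the Outcome model and the 'complete' status value are assumptions:

var {Job} = require('../database-services/models/job');
var {Order} = require('../database-services/models/order');
var {Outcome} = require('../database-services/models/outcome');

// Sketch: store the Job, then queue an unassigned Order pointing at it.
var create = function(jobJson) {
    return new Job(jobJson).save()
    .then((job) => {
        return new Order({
            jobId: job._id,
            createdDate: new Date(),
            status: 'unassigned'
        }).save();
    });
};

// Sketch: file the Outcome a Worker posts to /callback and close out the Order.
var file = function(outcome) {
    return new Outcome(outcome).save()
    .then((saved) => {
        return Order.findOneAndUpdate({ jobId: outcome.jobId }, { status: 'complete' })
        .then(() => saved);
    });
};

module.exports = {create, file};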

Using Python, NodeJS, Angular, and MongoDB to Create a Machine Learning System

I’ve started designing a system to manage the data analysis tools I build. It needs:

  1. An illegitimate REST interface
  2. Interface for existing Python scripts
  3. Process for creating micro-services from Python scripts
  4. Interface for creating machine learning jobs to be picked up by free machines.
  5. Manage a job queue for work machines to systematically tackle machine learning jobs
  6. Data storage and access
  7. Results access and job meta data
  8. A way to visualize results

I’ve landed on a fairly complicated process for handling the above. I’ve tried cutting frameworks, as I know this stack will be a nightmare to maintain, but I’m not seeing where.

  • Node for creating RESTful interfaces between the HQ Machine and the Worker Nodes
  • Node on the workers to ping the HQ machine periodically to see if there are jobs to run
  • MongoDB on the HQ Machine to store the job results data, paths to datasets, and possibly primary data
  • Angular to interact with the HQ Node for the job-creation and results-viewing UI.
  • ngx-datatables for viewing tabular results.
  • ngx-charts for viewing job results (e.g., visualizing variance and linearity)
  • Python for access to all the latest awesome ML frameworks
  • python-shell (npm) for creating an interface between Node and Python.

Utilizing all Machines in the House

Machine learning is a new world for me. But, it’s pretty dern cool. I like making machines do the hard stuff while I’m off doing other work. It makes me feel extra productive–like, “I created that machine, so any work it does I get credit for. And! The work I did while it was doing its work.” This is the reason I own two 3D-printers.

I’m noticing there is a possibility of utilizing old computers I’ve got lying around the house for the same effect. The plan is to abstract a neural network script, install it on all the computers lying about, and create an HQ Computer where I can create sets of hyperparameters to pass to the Worker Nodes throughout the house.

Why? Glad I asked for you. I feel guilty there are computers going unused. There’s an old AMD desktop with a GTX 1060 in it, a 2013 MacBook Pro (my son’s), and my 2015 MacBook Pro. These don’t see much use anymore, since my employer has provided an iMac to work on. They need to earn their keep.

How? Again, glad I asked for you. I’ll create a system to make deep-learning jobs from hyperparameter sets and send them to these idle machines, thus trying to get them to solve problems while I’m working on paying the bills. This draws on the power of neural networks: they need little manual tweaking. You simply provide them with hyperparameters and let them run.

Here are the napkin-doodles:

+-Local------------------------------------------------------+
|                                                            |
|        ____                   ____      Each machine runs  |
|        |""|                   |""|      Node and Express   |
|  HQ    |__|             #1    |__|      server, creating   |
|       [ ==.]`)               [ ==.]`)   routes to Python   |
|       ====== 0               ====== 0   scripts using      |
|  The HQ machine runs          ____      stdin and stdout   |
|  Node and Express, but        |""|                         |
|  the routes are for     #2    |__|                         |
|  storing results in a        [ ==.]`)                      |
|  database.                   ====== 0                      |
|                               ____                         |
|                               |""|                         |
|                         #3    |__|        Worker           |
|                              [ ==.]`)     Nodes            |
|                              ====== 0                      |
|                                                            |
+------------------------------------------------------------+
+-Local------------------------------------------------------+
|                 Each worker Node checks         Workers    |
|        ____    with HQ on a set interval         ____      |
|        |""|       for jobs to run                |""|      |
|  HQ    |__|   <--------------------------+ #1    |__|      |
|       [ ==.]`)                                  [ ==.]`)   |
|       ====== 0                                  ====== 0   |
|       ^ |                                        ____      |
|       | |                                  #2    |""|      |
|       | +--------------------------------------->|__|      |
|       |             If there is a job, the      [ ==.]`)   |
|       |             Worker will send a GET      ====== 0   |
|       |              request for the job         ____      |
|       |                  parameters              |""|      |
|       |                                    #3    |__|      |
|       +-----------------------------------------[ ==.]`)   |
|         Once completed, the Worker updates HQ   ====== 0   |
|              with the job results.                         |
+------------------------------------------------------------+

Worker Nodes

The Worker Node code is pretty straightforward. It uses Node, Express, and python-shell to create a bastardized REST interface allowing simple interactions with the HQ Node controlling the job queue.

Node Side

Here’s the proof-of-concept NodeJS code.

var express = require('express');
var bodyParser = require('body-parser');
var pythonRunner = require('./preprocessing-services/python-runner');

var app = express();
const port = 3000;

app.use(bodyParser.json())

// Python script runner interface
app.post('/scripts/run', (req, res) => {
    try {
        let pythonJob = req.body;
        pythonRunner.scriptRun(pythonJob)
        .then((response) => {
            res.send(response);
        });
    } catch (err) {
        res.send(err);
    }
});

app.listen(port, () => {
    console.log(`Started on port ${port}`);
});

The above code is a dead simple NodeJS server using Express. It is using body-parser middleware to shape JSON objects. The pythonJob object looks something like this (real path names have been changed to protect their anonymity).

{
    "scriptsPath": "/Users/hinky-dink/dl-principal/python-scripts/",
    "scriptName": "union.py",
    "jobParameters": {
    	"dataFileName": "",
        "dataPath": "/Users/hinky-dink/bit-dl/data/lot-data/wine_encoded/",
        "writePath": "/Users/hinky-dink/bit-dl/data/lot-data/wine_encoded/",
        "execution": {
        	"dataFileOne": "wine_2017_encoded.csv",
        	"dataFileTwo": "wine_2018_encoded.csv",
        	"outputFilename": "wine_17-18.csv"
        }
    }
}

Each of these attributes is passed to the Python shell in order to execute the named script (here, union.py). They are passed to the shell as system arguments.

Here’s the python-runner.js

let {PythonShell} = require('python-shell')
 
var scriptRun = function(pythonJob){    
    return new Promise((resolve, reject) => {
        console.log(pythonJob)
        try {
            let options = {
                mode: 'text',
                pythonOptions: ['-u'], // get print results in real-time
                scriptPath: pythonJob.scriptsPath,
                args: [pythonJob.jobParameters.dataFileName, 
                       pythonJob.jobParameters.dataPath, 
                       pythonJob.jobParameters.writePath,
                       JSON.stringify(pythonJob.jobParameters.execution)]
            };
            PythonShell.run(pythonJob.scriptName, options, function (err, results) {
                if (err) throw err;
                try {
                    let result = JSON.parse(results.pop());
                    if(result) {
                        resolve(result);
                    } else {
                        reject({'err': ''})
                    }
                } catch (err) {
                    reject({'error': 'Failed to parse Python script return object.'})
                }
            });
        } catch (err) {
            reject(err)
        }
    });
}
module.exports = {scriptRun}
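
With the server above running on a Worker, the route can be exercised by saving the sample pythonJob object to a file and posting it:

curl -X POST http://localhost:3000/scripts/run \
  -H "Content-Type: application/json" \
  -d @pythonJob.json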

Python Side

Here’s the Python script from the above example. It is meant to detect what type of data is in each column of a table. If a column is continuous, it leaves it alone (I’ll probably add a normalization option at some point); if it is categorical, it converts it to a dummy variable. It then saves this encoded data on the Worker Node side (for now). Lastly, it returns a JSON string back to the Node side.

"""
Created on Mon Jun 11 21:12:10 2018
@author: cthomasbrittain
"""

import sys
import json
#
filename = sys.argv[1]
filepath = sys.argv[2]
pathToWriteProcessedFile = sys.argv[3]

request = sys.argv[4]
request = json.loads(request)

try:
    cols_to_remove = request['columnsToRemove']
    unreasonable_increase = request['unreasonableIncreaseThreshold']
except:
    # If columns aren't contained or no columns, exit nicely
    result = {'status': 400, 'message': 'Expected script parameters not found.'}
    print(str(json.dumps(result)))
    quit()

pathToData = filepath + filename


# Clean Data --------------------------------------------------------------------
# -------------------------------------------------------------------------------

# Importing data transformation libraries
import pandas as pd

# The following method will do the following:
#   1. Add a prefix to columns based upon datatypes (cat and con)
#   2. Convert all continuous variables to numeric (float64)
#   3. Convert all categorical variables to objects
#   4. Rename all columns with prefixes, convert to lower-case, and replace
#      spaces with underscores.
#   5. Continuous blanks are replaced with 0 and categorical 'not collected'
# This method will also detect manually assigned prefixes and adjust the 
# columns and data appropriately.  
# Prefix key:
# a) con = continuous
# b) cat = categorical
# c) rem = removal (discards entire column)

def add_datatype_prefix(df, date_to_cont = True):    
    import pandas as pd
    # Get a list of current column names.
    column_names = list(df.columns.values)
    # Encode each column based with a three letter prefix based upon assigned datatype.
    # 1. con = continuous
    # 2. cat = categorical
    
    for name in column_names:
        if df[name].dtype == 'object':
            try:
                df[name] = pd.to_datetime(df[name])
                if(date_to_cont):
                    new_col_names = "con_" + name.lower().replace(" ", "_").replace("/", "_")
                    df = df.rename(columns={name: new_col_names})
                else:
                    new_col_names = "date_" + name.lower().replace(" ", "_").replace("/", "_")
                    df = df.rename(columns={name: new_col_names})                    
            except ValueError:
                pass
    
    column_names = list(df.columns.values)
    
    for name in column_names:
        if name[0:3] == "rem" or "con" or "cat" or "date":
            pass
        if df[name].dtype == 'object':
            new_col_names = "cat_" + name.lower().replace(" ", "_").replace("/", "_")
            df = df.rename(columns={name: new_col_names})
        elif df[name].dtype == 'float64' or df[name].dtype == 'int64' or df[name].dtype == 'datetime64[ns]':
            new_col_names = "con_" + name.lower().replace(" ", "_").replace("/", "_")
            df = df.rename(columns={name: new_col_names})
    column_names = list(df.columns.values)
    
    # Get lists of columns for conversion
    con_column_names = []
    cat_column_names = []
    rem_column_names = []
    date_column_names = []
    
    for name in column_names:
        if name[0:3] == "cat":
            cat_column_names.append(name)
        elif name[0:3] == "con":
            con_column_names.append(name)
        elif name[0:3] == "rem":
            rem_column_names.append(name)
        elif name[0:4] == "date":
            date_column_names.append(name)
            
    # Make sure continuous variables are correct datatype. (Otherwise, they'll be dummied).
    for name in con_column_names:
        df[name] = pd.to_numeric(df[name], errors='coerce')
        df[name] = df[name].fillna(value=0)
    
    for name in cat_column_names:
        df[name] = df[name].apply(str)
        df[name] = df[name].fillna(value='not_collected')
    
    # Remove unwanted columns    
    df = df.drop(columns=rem_column_names, axis=1)
    return df

# ------------------------------------------------------
# Encoding Categorical variables
# ------------------------------------------------------

# The method below creates dummy variables from columns with
# the prefix "cat".  There is the argument to drop the first column
# to avoid the Dummy Variable Trap.
def dummy_categorical(df, drop_first = True):
    # Get categorical data columns.
    columns = list(df.columns.values)
    columnsToEncode = columns.copy() 

    for name in columns:
        if name[0:3] != 'cat':          
            columnsToEncode.remove(name)

    # if there are no columns to encode, return unmutated.
    if not columnsToEncode:
        return df


    # Encode categories
    for name in columnsToEncode:

        if name[0:3] != 'cat':
            continue

        tmp = pd.get_dummies(df[name], drop_first = drop_first)
        names = {}
        
        # Get a clean column name.
        clean_name = name.replace(" ", "_").replace("/", "_").lower()
        # Get a dictionary for renaming the dummy variables in the scheme of old_col_name + response_string
        if clean_name[0:3] == "cat":
            for tmp_name in tmp:
                tmp_name = str(tmp_name)
                new_tmp_name = tmp_name.replace(" ", "_").replace("/", "_").lower()
                new_tmp_name = clean_name + "_" + new_tmp_name
                names[tmp_name] = new_tmp_name
        
        # Rename the dummy variable dataframe
        tmp = tmp.rename(columns=names)
        
        # join the dummy variable back to original dataframe.
        df = df.join(tmp)
    
    # Drop all old categorical columns
    df = df.drop(columns=columnsToEncode, axis=1)
    return df

# Read the file
df = pd.read_csv(pathToData)

# Drop columns such as unique IDs
try:
    df = df.drop(cols_to_remove, axis=1)
except:
    # If columns aren't contained or no columns, exit nicely
    result = {'status': 404, 'message': 'Problem with columns to remove.'}
    print(str(json.dumps(result)))
    quit()
    
# Get the number of columns before hot encoding
num_cols_before = df.shape[1]

# Encode the data.
df = add_datatype_prefix(df)
df = dummy_categorical(df)

# Get the new dataframe shape.
num_cols_after = df.shape[1]


percentage_increase = num_cols_after / num_cols_before

result = ""

if percentage_increase > unreasonable_increase:
    message = "\"error\": \"Feature increase is greater than unreasonableIncreaseThreshold, most likely a unique id was included."
    result = {'status': 400, 'message': message}
else:
    filename = filename.replace(".csv", "")
    import os
    if not os.path.exists(pathToWriteProcessedFile):
        os.makedirs(pathToWriteProcessedFile)
        
    
    writeFile = pathToWriteProcessedFile + filename + "_encoded.csv"
    df.to_csv(path_or_buf=writeFile, sep=',')
    
    
    # Process the results and return JSON results object
    result = {'status': 200, 'message': 'encoded data', 'path': writeFile}
 
print(str(json.dumps(result)))

That’s the premise. I’ll be adding more services to it in a series of articles.

Recording Brain Waves -- Mongo Database with a NodeJS API

Saving Brain Waves to Remote MongoDB by way of Node REST API

In this section I’m going to focus on getting a remote Linux server set up with MongoDB and NodeJS. This will allow us to make POST requests to our Linux server, saving the EEG data.

I’m going to assume you are able to SSH into your Ubuntu 16 LTS server for this guide. You don’t have a server? No sweat. I wrote a guide on setting up a blog that explains how to get a cheap Linux server running.

1. Install MongoDB

SSH into your server. I’m assuming this is a fresh Linux install. Let’s start with upgrading the packages.

sudo apt-get update -y

I’ll be following the Mongo website for instructions on installing the MongoDB Community version on Ubuntu.

Let’s get started. Add the Debian package key.

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 9DA31620334BD75D9DCB49F368818C72E52529D4

We need to create a list file.

echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/4.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.0.list

Now reload the package lists

sudo apt-get update

If you try to update and run into this error

E: The method driver /usr/lib/apt/methods/https could not be found.
N: Is the package apt-transport-https installed?
E: Failed to fetch https://repo.mongodb.org/apt/ubuntu/dists/xenial/mongodb-org/4.0/InRelease  
E: Some index files failed to download. They have been ignored, or old ones used instead.

Then install apt-transport-https

sudo apt-get install apt-transport-https

Now, let’s install MongoDB.

sudo apt-get install -y mongodb-org

Voila!

2. Setup MongoDB

We still need to do a bit of setup. First, let’s check and make sure Mongo is fully installed.

sudo service mongod start

This starts the MongoDB daemon, the program which runs in the background and waits for someone to make a connection to the database.

Speaking of which, let’s try to connect to the database

mongo

You should get the following:

root@localhost:~# mongo
MongoDB shell version v4.0.2
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.2
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
	http://docs.mongodb.org/
Questions? Try the support group
	http://groups.google.com/group/mongodb-user
Server has startup warnings:
2018-09-02T03:52:18.996+0000 I STORAGE  [initandlisten]
2018-09-02T03:52:18.996+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-09-02T03:52:18.996+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-09-02T03:52:19.820+0000 I CONTROL  [initandlisten]
2018-09-02T03:52:19.820+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2018-09-02T03:52:19.820+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2018-09-02T03:52:19.820+0000 I CONTROL  [initandlisten]
---
Enable MongoDB's free cloud-based monitoring service, which will then receive and display
metrics about your deployment (disk utilization, CPU, operation statistics, etc).

The monitoring data will be available on a MongoDB website with a unique URL accessible to you
and anyone you share the URL with. MongoDB may use this information to make product
improvements and to suggest MongoDB products and deployment options to you.

To enable free monitoring, run the following command: db.enableFreeMonitoring()
To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
>

This is good. It means Mongo is up and running. Notice it is listening on 127.0.0.1:27017. If you try to access the database from anywhere other than the local machine, it will refuse the connection. The plan is to have NodeJS connect to the MongoDB database locally. Then, we’ll send all of our data to Node and let it handle security.

In the Mongo command line type:

quit()

And hit enter. This should bring you back to the Linux command prompt.

A few notes on MongoDB on Ubuntu.

  • The configuration file is located at /etc/mongod.conf
  • Log file is at /var/log/mongodb/mongod.log
  • The database is stored at /var/lib/mongodb, but this can be changed in the config file.

Oh, and one last bit. Still at the Linux command prompt type:

sudo systemctl enable mongod

You should get back

Created symlink from /etc/systemd/system/multi-user.target.wants/mongod.service to /lib/systemd/system/mongod.service.

This sets up a symlink which causes Linux to load mongod every time it boots–you won’t need to start it manually.

Next, NodeJS.

3. Install NodeJS and npm

Type

sudo apt-get install nodejs -y

This should install NodeJS, but we also need the Node Package Managers npm.

sudo apt-get install npm -y

Let’s upgrade npm. This is important, as the mind-wave-journal-server depends on recent versions of several packages that are not accessible to earlier versions of npm.

The following commands should prepare npm for upgrading, then upgrade.

sudo npm cache clean -f
sudo npm install -g n
sudo n stable
sudo n latest

Let’s reboot the server to make sure all of the upgrades are in place.

sudo reboot now

When the server boots back up, ssh back in.

Check and make sure your mongod is still running

mongo

If mongo doesn’t start, then revisit step 2.

Let’s check our node and npm versions.

node -v

I’m running node v10.9.0

npm -v

I’m running npm v6.2.0

4. Clone, Install, and Run the mind-wave-journal-server

I’ve already created a basic Node project, which we’ll be able to grab from my Github account.

If you don’t already have git installed, let’s do it now.

sudo apt-get install git -y

Now, grab the Node server I built.

git clone https://github.com/Ladvien/mind-wave-journal-server.git
cd mind-wave-journal-server/

Install all the needed Node packages.

npm install

This should download all the packages needed to run the little server program I wrote to store the EEG data into the Mongo database.

Let’s run the mind-wave-journal-server.

node server/server.js

This should be followed with:

root@localhost:~/mind-wave-journal-server# node server/server.js
(node:1443) DeprecationWarning: current URL string parser is deprecated, and will be removed in a future version. To use the new parser, pass option { useNewUrlParser: true } to MongoClient.connect.
Started on port 8080
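
For reference, the heart of server/server.js is just an Express route that saves whatever JSON it receives into Mongo. A simplified sketch, not the repo's exact code (EegSample is a stand-in model name):

var express = require('express');
var bodyParser = require('body-parser');
var mongoose = require('mongoose');

mongoose.connect('mongodb://127.0.0.1:27017/mind-wave-journal');

// Schemaless stand-in model; the repo defines a proper schema.
var EegSample = mongoose.model('EegSample', new mongoose.Schema({}, { strict: false }));

var app = express();
app.use(bodyParser.json());

// Save each posted sample and echo the stored document back.
app.post('/eegsamples', (req, res) => {
    new EegSample(req.body).save()
    .then((doc) => res.send(doc))
    .catch((err) => res.status(400).send(err));
});

app.listen(8080, () => console.log('Started on port 8080'));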

5. Testing mind-wave-journal-server with Postman

Now, we are going to use Postman to test our new API.

For this next part you’ll need either a Mac or Chrome, as Postman has a native Mac app or a Chrome app.

I’m going to show the Chrome application.

Head over to the Chrome app store:

add-postman-chrome-app

After you add the Postman app it should redirect you to your Chrome applications. Click on the Postman icon.

run-postman-chrome-app

Your choice, but I skipped the sign-up option for now.

skipped-signup-postman-chrome-app

Select Create a Request

skipped-signup-postman-chrome-app

In a nutshell, we are going to use Postman to create POST requests and send them to the mind-wave-journal-server, making sure it’s ready for the iOS app to start saving EEG data to our Mongo server.

Let’s create our first test POST request. Start by naming the request Test eegsamples. Create a folder to put the new request in; I named it mind-wave-journal-server. Then click

create-request-postman-chrome-app

You will need to set the type as POST. The url will be

http://your_ip_address:8080/eegsamples

create-request-postman-chrome-app

Now select the Headers section and add the Content-Type: application/json

create-request-postman-chrome-app

Lastly, select Body, then raw and enter the following JSON into the text area:

{  
   "highBeta":5,
   "lowGamma":6,
   "theta":55,
   "lowAlpha":2,
   "highAlpha":3,
   "lowBeta":4,
   "highGamma":7,
   "blink":55,
   "attention":8,
   "meditation":9,
   "time":4
}

And then! Hit Send

create-request-postman-chrome-app

If all goes well, then you should get a similar response in the Postman response section

create-request-postman-chrome-app

Notice, the response is similar to what we sent. However, there is an additional _id. This is great. It is the id assigned to the record by MongoDB when the data was entered. In short, it means the sample was successfully saved to the database.
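
If you'd rather skip Postman, the same test can be made with curl from any terminal:

curl -X POST http://your_ip_address:8080/eegsamples \
  -H "Content-Type: application/json" \
  -d '{"highBeta":5,"lowGamma":6,"theta":55,"lowAlpha":2,"highAlpha":3,"lowBeta":4,"highGamma":7,"blink":55,"attention":8,"meditation":9,"time":4}'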

6. Now What?

Several caveats.

First, each time you restart your server you will need to start the mind-wave-journal-server manually. You can turn it into a Linux service and enable it. If this interests anyone, let me know in the comments and I’ll add it.

Second, notice I don’t currently have a way to retrieve data from MongoDB. The easiest way will probably be using Robo 3T. Like the first caveat, if anyone is interested let me know and I’ll add instructions. Otherwise, this series will stay on track to set up a Mongo BI connection to the database for viewing in Tableau (eh, gross).

Your Node server is ready to be called by the iOS app. In the next article I’ll return to building the MindWaveJournal app in iOS.

Recording Brain Waves -- iOS SDK Setup

Step 1: iOS App

I’m going to assume you have Xcode installed.

Step 1.1: Install CocoaPods

CocoaPods is a package handler for Xcode. We will be using it to install Alamofire, which is a Swift library for making HTTP requests. We need HTTP call support since we will call our server to store the EEG samples.

sudo gem install cocoapods

After you hit Return it will prompt for your password

cocoapods-installation

Step 1.2: Setup Xcode Project

Now, let’s set up a project folder. This is the main folder where all the iOS app code will live. It’s a bad habit, but I usually put mine on the Desktop.

Open Xcode and select “Create a new Xcode project”

xcode-project-start

Then select “Single View App” and click “Next”

xcode-project-start

Let’s call the project MindWaveJournaler and click “Next”

xcode-project-start

Choose your Desktop as the location for the project and click “Create”

xcode-project-start

Step 1.3: Development Environment Setup

You’ve created a project folder, but we have to set it up to be used with CocoaPods. After that, we will use CocoaPods to install Alamofire.

Back in the terminal, type:

cd ~/Desktop/MindWaveJournaler
pod init

This creates a Podfile in the root folder of our project. We can list CocoaPod packages in the Podfile and run pod install in the same directory; this causes CocoaPods to install all the packages we listed.

Sadly, we are really only doing this for Alamofire right now. But, later, when we start building on to this app it will allow us to quickly access third-party frameworks.

Ok, back to typing:

open -a Xcode Podfile

This will open the Podfile for editing in Xcode. Now let’s insert our desired pod information.

Copy information below and paste it into your file:

# Uncomment the next line to define a global platform for your project
platform :ios, '11.4'

target 'MindWaveJournaler' do
  # Comment the next line if you're not using Swift and don't want to use dynamic frameworks
  use_frameworks!

  # Pods for MindWaveJournaler
  pod 'Alamofire', '~> 4.7'

  target 'MindWaveJournalerTests' do
    inherit! :search_paths
    # Pods for testing
  end

  target 'MindWaveJournalerUITests' do
    inherit! :search_paths
    # Pods for testing
  end

end

You may notice the only changes we made were

platform :ios, '11.4'
...
pod 'Alamofire', '~> 4.7'

The first line tells CocoaPods which version of iOS we are targeting with our app (this silences a warning, but shouldn’t be required). The other tells CocoaPods which version of Alamofire we’d like to use on this project.

Ok, now let’s run this Podfile.

Back in the same directory as the Podfile type:

pod install

You should see CocoaPods do its thing with output much like below.

cocoapods-installed-alamofire

Step 1.4: Install NeuroSky iOS SDK

NeuroSky has a “Swift SDK.” Really, it’s an Objective-C SDK which is “bridged” into Swift. Essentially, this means we won’t be able to see what’s going on inside the SDK, but we can use functions from the pre-compiled binaries.

I’ve not been impressed with NeuroSky’s website. Or the SDK. It does the job, but not much more.

Anyway, the SDK download is annoyingly behind a sign-up wall.

Visit the link above and click on “Add to Cart”

neurosky-sdk-sign-up

Then “Proceed to Checkout”

neurosky-sdk-sign-up

Lastly, you have to enter your “Billing Information.” Really, this is only your email address, last name, street address, city, and zip.

(Really NeuroSky? This is very 1990.)

Eh, I made mine up.

Anyway, after you enter your information, click “Continue to PayPal” (What? I just provided my information…). You should be rewarded with a download link. Click it and download the files.

neurosky-sdk-sign-up

Unzip the files and navigate to the lib folder

iOS Developer Tools 4.8 -> MWM_Comm_SDK_for_iOS_V0.2.9 -> lib

Copy all the files from the lib folder into the main directory of the MindWaveJournaler project folder.

neurosky-sdk-lib

Step 1.5: Workspace Setup

CocoaPods works by creating a .xcworkspace file. It contains all the information needed to compile your project with all of the CocoaPod packages installed. In our case the file is called MindWaveJournaler.xcworkspace. Every time you want to work on your project, you must open it with this specific file.

It can be a bit confusing because Xcode created a .xcodeproj file which is tempting to click on.

xcworkspace

Go ahead and open the MindWaveJournaler.xcworkspace file. The workspace should open with one warning, which we will resolve shortly.

But first, another caveat. CoreBluetooth, Apple’s Bluetooth LE framework, only works when compiled for and run on an actual device. It does *not* work in the iOS Simulator. Once upon a time it did, if your Mac had the hardware; my version of the story is Apple didn’t like having to support the confusion and dropped it.

eeg-apple-workspace

Moving on. Click on the yellow warning, then click on the warning in the sidebar. This should create a prompt asking if you’d like to make some changes. Accepting will automatically make some tweaks to the build settings which should make our project mo’ betta.

Click Perform Changes.

eeg-apple-workspace-resolve-warning

This should silence the warning and make your project error free. Go ahead and hit the Play button and let it compile to the simulator (we aren’t testing the Bluetooth, so it’s ok). Everything should compile correctly; if not, just let me know the specifics of your problems in the comments.

Step 1.6: Enable HTTP Requests

There are still a few tweaks we need to make to the Xcode workspace to get everything working.

First, open the ViewController.swift file and add import Alamofire right below import UIKit. If auto-complete lists Alamofire as an option you know the workspace is detecting its presence. Good deal.

Now, for Alamofire to be able to make HTTP requests to our plain-HTTP server, an option needs to be added to the Info.plist file. I scratched my head as to why the HTTP calls were not being made successfully until I found Manab Kumar Mal’s StackOverflow post:

Thanks, buddy.

Ok, following his instructions open up the Info.plist file in your MindWaveJournaler folder. Now add an entry by right-clicking and selecting Add Row. Change the Application Category to NSAppTransportSecurity and make sure it’s set as dictionary. Now, click the plus sign by the new dictionary and set this attribute as NSAllowsArbitraryLoads, setting the type bool, and the value as YES.

eeg-apple-workspace-add-secure-layer
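
If you prefer editing the file as source (right-click Info.plist and choose Open As -> Source Code), the resulting entry looks like this:

<key>NSAppTransportSecurity</key>
<dict>
    <key>NSAllowsArbitraryLoads</key>
    <true/>
</dict>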

Step 1.7: Setup Objective-C Bridge Header for MindWave SDK

There are a few other bits of housekeeping, though. As I mentioned earlier, the MindWave SDK is an Objective-C precompiled binary. It is usable in a Swift project, but requires setting up a “bridging header” file.

Start by creating the bridge header file. Go to File -> New -> File...

bridge-header-file

Then select Header and click Next.

bridge-header-file

Name the file YourProjectName-Bridging-Header and make sure the file is saved to the same folder which contains the .xcworkspace file, then click Create.

bridge-header-file

The header file should automatically open. Copy and paste the following to the bottom of the header file.

#import "MWMDevice.h"
#import "MWMDelegate.h"
#import "MWMEnum.h"

My entire file looked like this once done.

MindWaveJournaler-Bridging-Header.h

//
//  MindWaveJournaler-Bridging-Header.h
//  MindWaveJournaler
//
//  Created by Casey Brittain on 8/3/18.
//  Copyright © 2018 Honeysuckle Hardware. All rights reserved.
//

#ifndef MindWaveJournaler_Bridging_Header_h
#define MindWaveJournaler_Bridging_Header_h


#endif /* MindWaveJournaler_Bridging_Header_h */

#import "MWMDevice.h"
#import "MWMDelegate.h"
#import "MWMEnum.h"

Let’s tell the Swift compiler we have a header file. In Xcode go to Project File -> Build Settings -> All, then in the search box type Swift Compiler - General (if you don’t include the hyphen and spaces it won’t find it).

bridge-header-file

Double-click on the line Objective-C Bridging Header directly underneath the name of your project (see red box in image). Copy and paste the following into the box and click off to save the change.

$(PROJECT_DIR)/$(PROJECT_NAME)-Bridging-Header.h

This creates a relative path to your Bridging-Header file. In a little bit we are going to try to compile; if you get errors around this file not being found, then it’s probably not named per our naming scheme (YourProjectName-Bridging-Header) or it wasn’t saved in the same folder as the .xcworkspace file. No worries, if you have troubles just leave me a comment below.

bridge-header-file

One last thing to do before we’re ready to code. We still need to import the MindWave SDK into our project.

bridge-header-file

Right click on your project file and select New Group. Name the group MindWave SDK. Now right click on the folder you created and select Add Files to "MindWave SDK".... Navigate to the lib folder containing the MindWave SDK and select all files inside it.

mindwave-sdk

When you add the SDK, Xcode should automatically detect the binary file (libMWMSDK.a) and create a link to it. But, let’s make sure, just in case. Click on your project file, then go to the General tab.

mindwave-sdk

It needs to be linked under the Build Phases tab as well, under Linked Frameworks and Libraries.

mindwave-sdk

That’s it. Let’s test and make sure your app is finding the SDK appropriately.

Open the ViewController file and under viewDidLoad() after the existing code, type:

let mwDevice = MWMDevice()
mwDevice.scanDevice()

Watch for autocomplete detecting the existence of the MindWave SDK

mindwave-sdk

Now for the true test: Compile and Run. But before we do, please be aware–this will only work on an actual iOS device. If you try to run it in the iOS Simulator it will fail, on two accounts: first, CoreBluetooth does not work in the simulator; second, the MindWave SDK binaries were compiled specifically for the ARM architecture.

Ok! Enough preamble. Connect and select your iOS device and hit Run.

mindwave-app-run

If all goes well you should see two things: a blank white screen on your phone and a concerning message in the Xcode console.

corebluetooth-error-api-misuse

The CoreBluetooth error has to do with firing up the iOS Bluetooth services without checking to make sure the iOS BLE is turned on and ready to go. This is a good thing; it probably means the MindWave SDK has been found and is functioning properly.

If you get any other errors, let’s chat. I’ll help if I can.

This is part of a series, which I’m writing with care as I’ve time. I’ll get the next part out ASAP.

Recording Brain Waves to MongoDB

Description

This project takes brain wave readings from a MindWave Mobile 2+ and transmits them to an iOS app via Bluetooth LE. The iOS app makes calls to a remote Node server, a minimal REST API, passing off the brain wave sample. The Node server stores the data on a MongoDB server. The MongoDB data is then exposed to business intelligence applications through the MongoDB BI Connector. Lastly, using Tableau Desktop Professional, the data is accessed and visualizations created.

Whew.

To recap:

The end result is a system which could allow a remote EEG analyst to examine samples nearly in real time.

eeg-visualization

Below, I’m going to show how I was able to setup the system. But, before that a few words of warning.

Gotchas

Hacker Haters

This isn’t a hacker-friendly project. It relies on several paid licenses: an Apple Developer license ($99) and Tableau Desktop Professional ($10,000,000,000 or something). Of course, the central piece of hardware, the MindWave Mobile, is also $99, but I think that one is fair. Oh! Let’s not forget, even though you bought an Apple Developer license, you still need a Mac (or Hackintosh) to compile the app.

However, as a proof-of-concept, I think it’s solid. Hopefully a good hacker will be able to see how several tweaks in the system could make it dirt cheap to deploy.

Minimum Viable Hack..er, Product

The source code provided here is minimally viable. Fancy words meaning only base functionality was implemented. There are many other things which could be done to improve each piece of the system.

Not to be a douche, but please don’t point them out. That’s the only thing I ask for providing this free information.

There are many improvements I know can be made. The reason they were not made had nothing to do with my ignorance (well, at least a majority of them), but rather my time constraints.

I Hate Tableau

That’s it. I hate Tableau.

Getting Started

Let’s make a list of what’s needed before beginning this project.

Regarding the business intelligence platform–if anyone has free suggestions, please leave them in the comments below. The first improvement I’d like to make to the entire system is to get away from Tableau. Have I mentioned I hate it?

Ok, let’s get started!