Churning Data into Information
I work with a lot of data on behalf of an agency without a lot of money. Exploring free-to-use and open-source tools is key to being effective in my job.
Recently, I’ve written a couple of series on how to use R and SQL to sort through Homeless Management Information System (HMIS) data.
These data are essential to local governments helping individuals experiencing homelessness to be housed quickly and appropriately.
But one area where R and SQL have not delivered is online interactive dashboards. Data is one thing, but easy-to-digest information is key to showing stakeholders how the system is working to end homelessness.
In other projects I’ve generated graphs as images and uploaded them to a static link, then regenerated and replaced the image each time the data changed. But most web servers cache images, so viewers may keep seeing the old graph. Not ideal.
This has pushed me to try to learn D3.
I’m not going to lie: I’ve felt confused by languages, IDEs, and libraries before, and I’ve overcome most of these challenges. But I’ve never been as confused as I am by the layout and syntax of D3. The dyslexic feeling I get when working in D3 has discouraged me from spending much time on it.
But recently I decided to take another stab at it, and this time I lucked out and found C3.js.
Essentially, C3 is a library that greatly simplifies D3. It boils building a graph down to a set of options passed to the C3 graph builder (c3.generate()) as a JSON-like object.
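Because the whole chart is described by one options object, that object can be produced by other tools. As a rough illustration, here is a minimal Python sketch that converts CSV text into C3's column-oriented data.columns layout (one list per series, series name first) and wraps it in an options object using bindto, data.x, and a category x axis. The #chart selector, the line chart type, and the sample CSV are assumptions made for illustration.

```python
import csv
import io
import json


def csv_to_c3_columns(csv_text):
    """Convert CSV text (header row first) into C3's data.columns layout.

    C3 expects one list per series: the series name followed by its values.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]  # skip blank lines
    header, body = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        values = [row[i] for row in body]
        # Keep the first column (e.g. dates) as text; convert the others to numbers.
        if i > 0:
            values = [float(v) for v in values]
        columns.append([name] + values)
    return columns


def c3_options(csv_text, bindto="#chart"):
    """Build a C3 options object, using the first CSV column as the x axis."""
    columns = csv_to_c3_columns(csv_text)
    return {
        "bindto": bindto,  # CSS selector of the element that will hold the chart
        "data": {
            "x": columns[0][0],  # name of the column holding x-axis values
            "columns": columns,
            "type": "line",
        },
        "axis": {"x": {"type": "category"}},
    }


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample = "week,entries,exits\n2016-07-04,12,9\n2016-07-11,15,11\n"
    print(json.dumps(c3_options(sample), indent=2))
```

The printed JSON can be pasted into a c3.generate() call as-is, since JSON is valid JavaScript object syntax.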
Using this CSV:
Produces the following graph:
I did run into one hiccup during setup. The most recent version of D3 (version 4.0) overhauled much of its API, so it will not work with C3. But D3 v3 is still available from the D3 CDN:
Load this library, follow the instructions on the C3 site, and you’ll be generating graphs in no time.
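The C3 setup boils down to loading C3's stylesheet, D3 v3, and C3 itself, then calling c3.generate() on a page element. Below is a minimal Python sketch that writes such a page. It assumes c3.min.css and c3.min.js were downloaded from the C3 site next to the page, that D3 v3 is loaded from https://d3js.org/d3.v3.min.js, and that the data sits in a CSV that C3 fetches itself through its data.url option. The file names are hypothetical.

```python
from pathlib import Path

# Assumed asset locations: D3 v3 from the D3 CDN, C3 files saved alongside the page.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="c3.min.css">
  <script src="https://d3js.org/d3.v3.min.js"></script>
  <script src="c3.min.js"></script>
</head>
<body>
  <div id="chart"></div>
  <script>
    // Given data.url, C3 downloads and parses the CSV in the browser.
    c3.generate({
      bindto: '#chart',
      data: { url: '__CSV_URL__', type: 'line' }
    });
  </script>
</body>
</html>
"""


def write_chart_page(path, csv_url):
    """Write a minimal C3 page whose chart reads its data from csv_url."""
    # str.replace avoids clashing with the JavaScript braces in the template.
    html = PAGE_TEMPLATE.replace("__CSV_URL__", csv_url)
    Path(path).write_text(html, encoding="utf-8")
    return html
```

For example, write_chart_page("index.html", "data.csv") produces a page that reads data.csv each time it loads. Updating the chart then only requires replacing data.csv on the server.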
Updating Data Securely and On Schedule
Now that I could use R and SQL to sort through my data and quickly generate graphs with D3 and C3, it’d be really nice to automate much of this. Luckily, I’d run into a few other tools that made it pretty easy to replace the data behind my C3 graphs.
Rsync is primarily a Linux tool, but it is available on Windows as well. It’s handy because it quickly reconciles two file trees, transferring only what has changed (think of a manual Dropbox).
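By default, rsync decides which files need transferring with a "quick check": a file is copied if it is missing on the destination or if its size or modification time differs. The toy Python sketch below imitates only that comparison to show the idea. It is not rsync itself: it copies nothing, and rsync additionally uses a delta-transfer algorithm to send only the changed parts of files.

```python
import os


def files_to_sync(src_root, dst_root):
    """Return relative paths under src_root that a size/mtime quick check would copy."""
    pending = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.exists(dst):
                pending.append(rel)  # missing on the destination
                continue
            s, d = os.stat(src), os.stat(dst)
            # Quick-check criteria: size, and modification time (compared to the second here).
            if s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime):
                pending.append(rel)
    return sorted(pending)
```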
It will also sync a local file tree with a server file tree over an SSH connection. For example, I use the following command to sync the data mentioned above to the server:
After you run this command, it prompts for the password to access the server and then syncs the two file trees. Nifty!
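To script the same step, here is a hedged Python sketch that assembles a typical rsync-over-SSH invocation and runs it with subprocess. The flags are standard rsync options (-a archive mode, -v verbose, -z compress, -e to choose the remote shell). The login, host, and paths are hypothetical placeholders.

```python
import subprocess


def rsync_command(local_dir, user, host, remote_dir):
    """Build an rsync-over-SSH command that mirrors local_dir into remote_dir."""
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    source = local_dir.rstrip("/") + "/"
    return [
        "rsync",
        "-avz",       # archive mode, verbose, compress during transfer
        "-e", "ssh",  # use SSH as the remote shell
        source,
        f"{user}@{host}:{remote_dir}",
    ]


def run_sync(cmd):
    """Run the command, raising CalledProcessError if rsync fails."""
    subprocess.run(cmd, check=True)
```

Calling run_sync(rsync_command("./c3_data", "username", "example.org", "/var/www/html/data/")) would still stop at the SSH password prompt, which is the problem the rest of this section works around.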
This allows me to quickly update the data on the graph. Now, if only there were a way to automatically insert my password, then I could write a script to automate the whole process.
Python Keyring is a tool that lets you save passwords to, and retrieve them from, your computer’s keyring.
It is compatible with:
- Mac OS X Keychain
- Freedesktop Secret Service (requires secretstorage)
- KWallet (requires dbus)
- Windows Credential Vault
If you have Python installed you can install the Keyring tool with Pip:
Afterward, you can store a password in the keyring using the command-line tool. You will need to replace username with the name of your server login.
And retrieve it with:
This means we can store our password in the keyring and retrieve it securely from a script.
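From Python, the same entries are reachable through keyring.set_password(service, username, password) and keyring.get_password(service, username). The latter returns None when nothing is stored. Below is a minimal sketch of a retrieval helper. The getter parameter is a hypothetical hook added only so the function can be exercised without a real keyring, and the service name in the usage line is made up.

```python
def fetch_password(service, username, getter=None):
    """Return the password stored in the system keyring for (service, username).

    getter defaults to keyring.get_password; passing a stand-in is handy for tests.
    """
    if getter is None:
        import keyring  # installed with: pip install keyring
        getter = keyring.get_password
    password = getter(service, username)
    if password is None:
        # keyring returns None, rather than raising, when no entry exists.
        raise LookupError(f"no password stored for {username!r} under {service!r}")
    return password
```

In a script, fetch_password("c3-server", "username") would return whatever was stored under that (hypothetical) service name, without the password ever appearing in the script itself.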
Great! Now we could write a script to have Rsync sync any local data changes to the server. Right? Well, almost. We needed one more tool.
There is a problem with calling Rsync over SSH from a script: SSH asks for the password directly at the terminal, so the script has no way to hand it over. Sigh.
Luckily, I’m not the only one with this problem, and someone created a tool to solve it: SSHPass.
If you are on a Mac, you’ll need to use Brew (Homebrew) to install SSHPass.
There we go! Now we can automate the whole process.
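One way to glue the pieces together: SSHPass's -e flag reads the password from the SSHPASS environment variable, which keeps it off the command line where other users could see it in the process list. The Python sketch below pulls the password from the keyring and passes it to rsync through SSHPass that way. The service name, login, host, and paths are hypothetical.

```python
import os
import subprocess


def build_sync(local_dir, user, host, remote_dir, password):
    """Return (argv, env) for an unattended rsync run through sshpass."""
    argv = [
        "sshpass", "-e",  # -e: read the password from the SSHPASS environment variable
        "rsync", "-avz", "-e", "ssh",
        local_dir.rstrip("/") + "/",
        f"{user}@{host}:{remote_dir}",
    ]
    # Copy the current environment and add the password only for the child process.
    env = dict(os.environ, SSHPASS=password)
    return argv, env


def sync(local_dir, user, host, remote_dir, service="c3-server"):
    """Look up the password in the keyring, then run rsync via sshpass."""
    import keyring  # pip install keyring
    password = keyring.get_password(service, user)
    if password is None:
        raise LookupError(f"no keyring entry for {user!r} under {service!r}")
    argv, env = build_sync(local_dir, user, host, remote_dir, password)
    subprocess.run(argv, env=env, check=True)
```

A call such as sync("./c3_data", "username", "example.org", "/var/www/html/data/") would then run unattended. SSHPass cannot answer SSH's first-connection host-key prompt, so it is worth connecting once by hand beforehand.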
I wrote this script to do the dirty work:
Ok! One last bit of sugar on this whole process. Let’s create a Cron job. This will run the script in the background at an interval of our choosing.
In my case, a staff member pulls data and runs a master script every Monday. So I’ll set my automated script to update my C3 graph data on Tuesday, when I know new data is available.
You can use Nano to edit your Cron job list.
To run a Cron job on Tuesday, we would set the fifth asterisk (the day-of-week field, where 0 is Sunday) to 2.
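A crontab line has five schedule fields (minute, hour, day of month, month, day of week) followed by the command, and the day of week counts from 0 for Sunday. The small Python helper below builds a weekly entry from a day name. The 6:00 a.m. time and the script path are hypothetical.

```python
DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"]


def weekly_cron_entry(day, hour, minute, command):
    """Build a crontab line that runs command once a week on day at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    weekday = DAYS.index(day.lower())  # cron counts Sunday as 0, so Tuesday is 2
    # Fields: minute, hour, day of month, month, day of week, then the command.
    return f"{minute} {hour} * * {weekday} {command}"


if __name__ == "__main__":
    # Hypothetical script path; the printed line goes into the Cron job list.
    print(weekly_cron_entry("Tuesday", 6, 0, "/home/username/update_c3_data.py"))
```

Running it prints 0 6 * * 2 /home/username/update_c3_data.py, with the 2 in the fifth field selecting Tuesday.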
And don’t forget to make the
I’m a hacker hacking with a hacksaw!