Trends of Homelessness, Rapid Rehousing, and Permanent Supportive Housing
Individuals Experiencing Homelessness
This graph shows the week-to-week trend of individuals experiencing homelessness in Tarrant County who meet at least one of the following conditions:
The person stayed at least one night in a Night-by-Night shelter during the week counted.
The person was contacted by Street Outreach during the week counted.
The person was active in an Entry / Exit shelter program during the week counted.
The count is most likely inflated by approximately 33%, as it is known to contain a large number of duplicate records. The software used to generate the data has no administrator option to merge duplicates; a request has been made for a mass merge.
Active in Rapid Rehousing
Another trend found in the graph is a week-to-week count of those homeless who are active in a Rapid Rehousing (RRH) project.
The duplicate issue should not be as pronounced here: even if a duplicate were created during the sheltered phase of a participant’s stay in homelessness, only one of the pair would be enrolled into the housing project. Enrollment into housing therefore acts as a natural filter.
Active in Permanent Supportive Housing
This trend is similar to the RRH trend.
Notice the line is flat. This is to be expected, as entries and exits are rare in Permanent Supportive Housing projects.
This graph relates to the Trends of Homelessness, Rapid Rehousing, and Permanent Supportive Housing graph. It looks at the last week of the same data and asks: of those participants who are still actively homeless (and therefore eligible for housing), what sorts of barriers do these individuals face? HUD refers to these groups of individuals with particular difficulties as “subpopulations.”
It is important to understand these barriers are not mutually exclusive. For example, Jane could report both a mental health problem and a substance abuse disorder, and she would therefore be counted in both subpopulations.
The three categories are defined as follows:
Eligible for Rapid Rehousing: individuals who are actively in a homeless situation and have not met the chronically homeless threshold.
Eligible for Permanent Supportive Housing: individuals who are actively in a homeless situation and have met the chronically homeless threshold.
All Eligible for Housing: the sum of Eligible for Rapid Rehousing and Eligible for Permanent Supportive Housing.
It should be noted that Eligible for Rapid Rehousing and Eligible for Permanent Supportive Housing are mutually exclusive. Therefore, All Eligible for Housing is an accurate count, save for the duplicates described above.
These data are essential to local governments helping individuals experiencing homelessness to be housed quickly and appropriately.
But one area where R and SQL have not delivered is online interactive dashboards. Data are one thing, but easy-to-digest information is key to informing stakeholders how the system is working to end homelessness.
In other projects I’ve attempted to generate graphs as images and upload them to a static link, then regenerate and replace the image each time the data change. But most web servers cache images, so this is not ideal.
I’m not going to lie: I’ve felt confused by languages, IDEs, and libraries, and I’ve overcome most of those challenges. But I’ve never been as confused as I was by the layout and syntax of D3. The dyslexic feeling I get trying to work in D3 has discouraged me from spending much time on it.
But recently I decided to take another stab at it, and this time I lucked out and found C3.js.
I did run into one hiccup during setup. The most recent version of D3 (v4.0) has had much of its API overhauled, and as such, it will not work with C3. But D3 v3 is still available from the D3 CDN:
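The includes look something like this (the D3 URL is the official CDN link; the local c3.min.css and c3.min.js paths assume you’ve downloaded C3 from its site):

```html
<!-- D3 v3 (not v4), straight from the D3 CDN -->
<script src="https://d3js.org/d3.v3.min.js" charset="utf-8"></script>

<!-- C3's stylesheet and script, downloaded from the C3 site -->
<link href="c3.min.css" rel="stylesheet">
<script src="c3.min.js"></script>
```

Note D3 has to load before C3, since C3 is built on top of it.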
Calling this library and following the instructions outlined on the C3 site, you can be generating graphs in little time.
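A minimal chart is just a div and a call to c3.generate(); the weekly counts below are made-up placeholders, not the real Tarrant County data:

```html
<div id="chart"></div>
<script>
  var chart = c3.generate({
      bindto: '#chart',          // the div to draw into
      data: {
          columns: [
              // hypothetical week-to-week counts
              ['Homeless', 510, 525, 519, 540],
              ['Active in RRH', 118, 121, 125, 130]
          ]
      }
  });
</script>
```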
Now that I can use R and SQL to sort through my data, and quickly generate graphs using D3 and C3, it would be really nice if a lot of this could be automated. Luckily, I ran into a few other tools which made it pretty easy to replace the data behind my C3 graphs.
Rsync is primarily a Linux tool, but it is available on Windows as well. It is nice since it allows you to quickly reconcile two file trees (think of a manual Dropbox).
It will also let you sync a local file tree with a server’s file tree across an SSH connection. For example, I use a command like the following to sync the data mentioned above to the server.
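The command looks roughly like this; the user, host, and paths here are placeholders rather than my actual server details:

```bash
# Push the local data/ folder to the web server over SSH
rsync -avz data/ user@example.com:/var/www/html/data/
```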
After running this command it will prompt for a password to access the server, then proceed to sync the two file trees. Nifty!
This allows me to quickly update the data behind the graph. Now, if only there were a way to automatically supply my password, then I could write a script to automate the whole process.
Python Keyring is a tool which allows you to save and retrieve passwords from your PC’s keyring.
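A sketch of the idea, using keyring’s set_password() and get_password() functions (the service and account names here are made up for illustration):

```python
import keyring  # third-party library: pip install keyring

# Store the server password once, in the operating system's keyring
keyring.set_password("web-server", "user", "my-secret-password")

# Later, a sync script can retrieve it instead of prompting
password = keyring.get_password("web-server", "user")
```

With the password retrievable in code, the rsync step can be wrapped in a script that runs unattended.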
It has worked well so far. However, I’ve not fully tested the Bluetooth LE hardware on the custom kernel.
One of the issues I’ve had is not being able to connect to new hotspots headlessly. Usually, you’d boot the rp0w connected to a monitor, keyboard, and mouse, and edit wpa_supplicant.conf directly. But what if you want to go to a new location with only your laptop and the rp0w? How would you add the Wi-Fi credentials to the rp0w without a monitor?
For a while, I tried to get the Ethernet gadget setup to work on the rp0w without any luck. I think the problem relates to trying to use the gadget hardware on a Mac rather than a Windows machine.
In the end, I decided I would add a script which would do the following:
Mount the /boot partition (which is editable through a PC’s SD card reader).
Look for a file on /boot called “wpa_supplicant.txt” and copy it to /etc/wpa_supplicant.conf
Look for a file on /boot called “interfaces.txt” and copy it to /etc/network/interfaces
Unmount the /boot partition
I saved this script in /root as wifi_setup.sh, then added a call to it in /etc/rc.local.
Here’s wifi_setup.sh:
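This is a sketch reconstructed from the steps above; the boot device name, /dev/mmcblk0p1, is the usual one on the Pi Zero W but is an assumption:

```bash
#!/bin/sh
# wifi_setup.sh -- called from /etc/rc.local at boot

# 1. Mount the /boot partition
mount /dev/mmcblk0p1 /boot

# 2. If wpa_supplicant.txt was dropped onto /boot, install it
if [ -f /boot/wpa_supplicant.txt ]; then
    cp /boot/wpa_supplicant.txt /etc/wpa_supplicant.conf
fi

# 3. Same for a replacement network interfaces file
if [ -f /boot/interfaces.txt ]; then
    cp /boot/interfaces.txt /etc/network/interfaces
fi

# 4. Unmount /boot again
umount /boot
```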
This has let me add a new network from my laptop with merely an SD card reader.
In this work challenge we will combine Client and Enrollment data, then add a non-HMIS dataframe which contains user account information. After these data are merged, we will parse them for missing data elements and provide a by-user list of data errors.
As stated above, the data needed are:
A dataframe containing HMIS user contact info.
The key to this challenge is literally at the end of every HMIS CSV. Each exported CSV contains some metadata which describes how the data were produced.
DateCreated should represent when the respective row was actually entered into the HMIS. DateUpdated is the last time that row was modified and saved in the HMIS. UserID identifies the case manager who last modified these data. Lastly, ExportID is the number which identifies a collection of HMIS CSVs as belonging to the same batch.
We are going to focus on the UserID element. Notice, you will not find usernames, real names, email addresses, or really any contact information for individual HMIS users. However, having a unique user ID in each CSV still allows HUD to use internal validity tests to determine the reliability of the user.
For our purposes, we are going to take another source of data containing all of the UserIDs and contact information for the users. This will probably differ for each HMIS software vendor, but each vendor should have a way to export a list of the users in the system with their UserIDs, which will allow us to join these data to the HMIS CSVs.
For those participating in the work challenge from my CoC, I’ll provide a CSV with these user data.
After actual user names are joined to the CSVs, we will begin to parse the CSVs for data errors. If you aren’t yet familiar with the term parse in computer science, think of it as diagramming a sentence, where we make the computer do all the work. Instead of a sentence, we will be diagramming a row of data to determine whether it contains any errors.
What’s an HMIS Data Error?
The HMIS Data Dictionary is specific about what a data error is.
8 – Client doesn’t know
9 – Client refused
99 – Data not collected
Here’s an example of a Client.csv which contains one of each type of error.
Here are the data errors:
Tesa’s first name is blank
Sarah’s DOB is blank
Fela’s SSN is an incomplete response (must be 9 digits)
Sarah’s SSN is non-determinable
Sarah’s DisablingCondition was not collected.
Tesa refused to provide a VeteranStatus.
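To give a flavor of the parsing, here is one way it might look in R. The column names come from the HMIS Client.csv specification; the isDataError() helper is my own, not part of any library:

```r
clientDf <- read.csv("Client.csv", stringsAsFactors = FALSE)

# TRUE when a value is blank or one of HUD's error codes (8, 9, 99)
isDataError <- function(x) {
  is.na(x) | x == "" | x %in% c(8, 9, 99)
}

# Flag rows where any element of interest contains an error;
# SSNs also count as errors when they aren't nine digits long
clientDf$HasError <- isDataError(clientDf$FirstName) |
  isDataError(clientDf$DOB) |
  isDataError(clientDf$VeteranStatus) |
  isDataError(clientDf$SSN) |
  nchar(clientDf$SSN) != 9
```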
We are going to take HMIS data and join it with a dataframe containing end-user information. Then we will create a query to subset the dataframe so we get one containing only rows with data errors. Lastly, we will get counts of the types of data errors and the names of the end-users who’ve caused the most data errors.
The data elements we will look into for errors:
To get this information we will need to do the following:
Load Client.csv, Enrollment.csv, and Users.xlsx
Left join the clientDf and enrollmentDf.
Left join the usersDf to the result of step 2.
Parse the data elements listed above for data errors
Create a dataframe which contains only rows with data errors
Use the SQL COUNT function to count the number of data errors for each element listed above.
Use the SQL COUNT function to count how many times an end-user’s name is associated with a row containing errors.
Create a dataframe of these counts
Save the dataframe containing the error counts into an Excel file (.xlsx)
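Sketched end-to-end in R, the steps might look like the following. PersonalID and the error codes come from the HMIS CSV specification; the Users.xlsx layout (a UserID column plus a UserName column) is an assumption about the user file I’ll provide:

```r
library(sqldf)      # run SQL queries against dataframes
library(XLConnect)  # read and write Excel documents

# 1. Load the data
clientDf     <- read.csv("Client.csv")
enrollmentDf <- read.csv("Enrollment.csv")
usersDf      <- readWorksheetFromFile("Users.xlsx", sheet = 1)

# 2. Left join clients and enrollments on the HMIS client key.
#    Drop Enrollment's UserID first so Client.csv's UserID is kept
#    (a simplifying assumption about who gets the error attributed).
enrollmentDf$UserID <- NULL
joinedDf <- merge(clientDf, enrollmentDf, by = "PersonalID", all.x = TRUE)

# 3. Left join the users to the result of step 2
joinedDf <- merge(joinedDf, usersDf, by = "UserID", all.x = TRUE)

# 4 & 5. Keep only rows containing a data error
errorDf <- sqldf("SELECT * FROM joinedDf
                  WHERE FirstName IS NULL
                     OR DOB IS NULL
                     OR SSN IS NULL OR LENGTH(SSN) != 9
                     OR VeteranStatus IN (8, 9, 99)
                     OR DisablingCondition IN (8, 9, 99)")

# 6 & 7. Count how often each end-user appears on an error row
errorCountsDf <- sqldf("SELECT UserName, COUNT(*) AS ErrorCount
                        FROM errorDf
                        GROUP BY UserName
                        ORDER BY ErrorCount DESC")

# 8. Save the counts to an Excel file
wb <- loadWorkbook("DataErrorCounts.xlsx", create = TRUE)
createSheet(wb, name = "Error Counts")
writeWorksheet(wb, errorCountsDf, sheet = "Error Counts")
saveWorkbook(wb)
```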
Below are the resources which should help for each step:
We’ve worked a bit with Comma-Separated Values (.csv) files, but they aren’t the only way to store data. There are a lot of data storage formats, each with its strengths and weaknesses. One of the deficits of the CSV format is it cannot store formatting or graphs. This is the reason the Excel format (.xls or .xlsx) has become another industry standard.
Excel is a program created by Microsoft to allow people to easily work with spreadsheets. With it, they created a way of storing data which allows formatting and other information to be included. In fact, Excel documents have become so sophisticated that programmers can include entire programs within a document. This is the reason you’ll often get the “Enable Content” button when opening an Excel document: it means there is code embedded in the document which will run if you click “Enable.” (Be careful, as malicious programmers can insert code which could royally blink up your computer.)
When working with HMIS data, being able to load and write Excel documents is necessary. Unfortunately, it adds a lot of complexity.
There are several R libraries which will allow us to work with Excel documents in R. They have different strengths, therefore, I’ll focus on two libraries, rather than one.
Installing either of these libraries should be as simple as running the following code:
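From CRAN, that is just:

```r
install.packages("XLConnect")
install.packages("openxlsx")
```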
However, there are lots of ifs. Both of these libraries rely on the rJava library, and unfortunately there is often a mismatch of computer architectures. What does that mean? Well, often you’ll have installed R for amd64, but rJava is easiest to get working with R for i386.
Just know, RStudio has a way to set the version of R you are using: go to Tools, then Global Options. On Windows, at the top of the R General section you will see the option to change your R version. If you are having difficulty getting the Excel libraries working, try switching the R version to i386. (Don’t forget to restart RStudio after switching.)
Past this, I’d be more than happy to help you troubleshoot; just leave a comment below or shoot me an email. But it can get pretty hairy, especially on a Mac.
Working with XLConnect
Nowadays, I only use XLConnect to load data from Excel sheets; I’ve just been too lazy to re-write all my code to use one library (which would be openxlsx). In my opinion, the reason to use XLConnect is it’s a little easier to understand how it loads data. Its weaknesses are that it doesn’t have as much flexibility in formatting the Excel documents it saves, and that saving Excel sheets can be confusing.
Loading Data from Xlsx Documents
Loading data using XLConnect is a little different than using the read.csv function. Like I stated earlier, Xlsx documents contain other information besides data. One critical piece of information is the sheet number.
Unlike CSVs, a single Excel document can contain multiple spreadsheets. Each of these sheets appears as a tab when you open the document in Excel.
XLConnect doesn’t make any assumptions; it wants you to tell it which sheet you’d like to load.
Here’s how to load an Excel document, the first sheet, in XLConnect:
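A minimal call, using a workbook named VI-SPDAT v2.0.xlsx (the dataframe name is arbitrary):

```r
library(XLConnect)

# Read the first sheet into a dataframe, starting at the first row
viSpdatDf <- readWorksheetFromFile("VI-SPDAT v2.0.xlsx",
                                   sheet = 1,
                                   startRow = 1)
```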
It is similar to the read.csv() function, but notice the file in the path refers to VI-SPDAT v2.0.xlsx. You want to make sure your file format is either .xlsx or .xls, as the readWorksheetFromFile() function only works with Excel documents.
Also, there are two other parameters. The first, sheet = 1, tells XLConnect to read in only the first sheet. Just know, you could set it to whatever sheet number you’d like. For reference, the sheets are numbered 1, 2, 3, 4, etc., left to right, as opened in Excel. So even if your sheets have different names, XLConnect will still load the data respective to their numerical order.
The second parameter is startRow = 1. This lets you tell R which row the dataframe starts on; for example, your Excel document might begin with a header section which doesn’t contain data.
We could skip down to row three, where the column headers are, by telling XLConnect startRow = 3.
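That call would look like:

```r
# Column headers live on row 3, so skip the first two rows
viSpdatDf <- readWorksheetFromFile("VI-SPDAT v2.0.xlsx",
                                   sheet = 1,
                                   startRow = 3)
```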
Writing a Dataframe to Excel Document
Writing Excel documents is a little more complex, and this is one reason I’m not a huge fan of XLConnect.
Here’s how you’d write an Excel file:
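Something like the following (peopleDf here is a small made-up dataframe, just so the example stands on its own):

```r
library(XLConnect)

# A little example dataframe to save
peopleDf <- data.frame(Name = c("Tesa", "Sarah", "Fela"),
                       Age  = c(34, 28, 45))

# 1. Create a workbook
peopleWorkbook <- loadWorkbook("People.xlsx", create = TRUE)

# 2. Create a worksheet called "My People" inside the workbook
myPeopleWorksheet <- createSheet(peopleWorkbook, name = "My People")

# 3. Add peopleDf to the worksheet, then save as People.xlsx
writeWorksheet(peopleWorkbook, peopleDf, sheet = "My People")
saveWorkbook(peopleWorkbook)
```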
After running this code you should have a file called People.xlsx in your working directory (remember, getwd() will provide the working directory). If you open the file, it should look something like this:
This looks a little complex, but that’s just because XLConnect makes it look complex. Here’s what it is doing:
A workbook is created, which is a place where worksheets can be stored.
myPeopleWorksheet is created inside the workbook created above. The sheet is called “My People”
The worksheet has our peopleDf added to it, then it is saved as a file called “People.xlsx” in our working directory.
Like I said, it’s a lot of unneeded complexity, in my opinion.
Why use Excel Documents
After the added complexity of reading and saving Excel documents you might wonder what the benefit is? Great question.
As stated at the beginning, Excel documents can contain other information besides just data: formatting, images, graphs, and a lot of other stuff. And one of the reasons for writing report scripts is to automate all redundant tasks.
Imagine you’ve got a data set of 12,000 participant enrollments. You want to create a spreadsheet which puts the enrollments in descending order, and you want to create this report daily.
If you used write.csv(), you would need to open the CSV after creating it, manually add the sort, save it as an Excel file, then send it out. I guarantee that after doing that for several weeks you are going to want a way to automate it, especially if you decide the headers need to have font size 18 as well.
Excel documents allow us to store these formatting tweaks, and XLConnect allows us to insert them automatically.
Adding formatting can get a little more complex and will be the focus of another article. There we will use openxlsx, as it is much easier for outputting formatting (again, just my opinion).