This article is part of a series.
- Part 1 - HMIS, R, and SQL -- Introduction
- Part 2 - HMIS, R, and SQL -- Basics
- Part 3 - Preamble to Mixing R and SQL
- Part 4 - HMIS, R, SQL -- Work Challenge One
- Part 5 - Read and Write CSVs in R
- Part 6 - Working with Dates in SQLdf
- Part 7 - Filter to Most Recent HUD Assessment
- Part 8 - This Article
- Part 9 - Comparing Values in R and SQL
- Part 10 - Working with R Excel Libraries
- Part 11 - HMIS, R, SQL -- Work Challenge Three
- Part 12 - HMIS, R, SQL -- Work Challenge Four
- Part 13 - SQL CASE and R Paste
Providing Chronically Homeless List
With this work challenge we are going to take the concepts we’ve learned from the first challenge and build on them. We will combine two dataframes derived from Client.csv and Enrollment.csv. Then, we will apply HUD’s formula to get a by-name-list of those who are chronically homeless.
The current definition of chronically homeless is found in HUD’s federal register:
A “chronically homeless” individual is defined to mean a homeless individual with a disability who lives either in a place not meant for human habitation, a safe haven, or in an emergency shelter, or in an institutional care facility if the individual has been living in the facility for fewer than 90 days and had been living in a place not meant for human habitation, a safe haven, or in an emergency shelter immediately before entering the institutional care facility. In order to meet the “chronically homeless” definition, the individual also must have been living as described above continuously for at least 12 months, or on at least four separate occasions in the last 3 years, where the combined occasions total a length of time of at least 12 months. Each period separating the occasions must include at least 7 nights of living in a situation other than a place not meant for human habitation, in an emergency shelter, or in a safe haven.
There are several data elements which will be needed for us to calculate whether someone is chronically homeless. These data elements are reported to case-managers and entered into a HUD Entry Assessment when a client enters a program.
Here’s a list of the data elements we will use:
All of the above data elements are found in the Enrollment.csv. Therefore, similar to the last Challenge, we will need to join the Client.csv and the Enrollment.csv.
We’ve covered how to get all data from CSVs into one dataframe using joins. This Challenge will build on that skill. The new concepts here will be combining logic to get to a specific answer.
In SQL we will use the following logic operators:
- IS (==)
- NOT (!=)
- AND (&&)
- OR (||)
- > (greater than)
- < (less than)
For example, let’s take the chronically homeless definition and turn it into something a computer can understand using these logic operators. We can do this by re-writing the definition several times, each time dropping what makes sense to humans and leaving what makes sense to computers.
For example, this should make sense to most humans.
A chronically homeless individual is disabled and been homeless greater than 364 days. Or, is disabled and been homeless greater than three times in three years and the time spent in homelessness adding up to greater than 364 days.
That paragraph seems a little hard to read, right? But still, humans should be able to understand it. Now, let’s look at the same paragraph emphasizing the logic operators.
A chronically homeless individual IS disabled AND been homeless GREATER THAN 364 days. OR, IS disabled AND been homeless GREATER THAN three times in three years AND the time spent in homelessness adding up to GREATER THAN 364 days.
This is skill of a Computational-Thinker, taking a definition like HUD provided and re-write it from something a human would understand into something a computer will understand.
The next step is re-writing the paragraph in something called pseudo-code.
This helps us make sure everything is in place to feed to the computer. The next step will be actually writing the SQL code.
Below is the following code to get chronically homeless:
This may look overwhelming, but that’ll be the purpose of this week’s Challenge, to demonstrate this is code is actually pretty simple when broke down into its basic parts.
That’s the real lesson here, every complex question may be made extremely simple when taken once piece at a time. The power of computational-thinking is extraordinary.
We are going to merge the two data sets and to discover the following:
- A list of individuals who are chronically homeless.
- Export this list to an Excel document.
To get this information we will need to do the following:
- Load the Client.csv into the dataframe clientDf.
- Load the Enrollment.csv into the dataframe enrollmentDf.
- Inner join the clientDf to enrollmentDf.
- Calculate whether someone is chronically homeless.
- Filter to those who are chronically homeless.
- Write the by-name-list of individuals to an Excel document.
Below are the resources which should help for each step:
Step 1 & 2
- R Programming A-Z – Video 41 – Loading and Importing Data in R
- R Programming A-Z – Video 21 – Functions in R
- Read and Write CSVs in R
- The Complete SQL Bootcamp – Video #51 – Overview of Inner Joins
- The Complete SQL Bootcamp – Video #52 – Example of Inner Joins * HMIS, R, SQL – Basics
Step 4 & 5
- HMIS, R, SQL – Queries
- HMIS Data Dictionary
- Query Results Using Boolean Logic (all sections and videos)
- Writing Excel Workbooks – Tutorial Coming