A Reading in SIS Forms
The code for this report is specific to data logged in the G-324A-19 form and the incident summaries portion of that form. The incident summary portion of this form is probably the most complex data structure in the project and also contains the largest volume of quantitative information. Therefore, it seemed like a good place to start with an initial proof of concept.
At the outset, there were issues reading in the Google Sheet. Several of the numeric columns were read into R as lists, which was undesirable. In the Google Sheet, Craig set the columns to plain text rather than auto so that the data are read into R as character columns. This seemed to resolve the issue for a time, and it was possible to use the type_convert() function from the readr library (Wickham, Hester, and Bryan 2022) to convert these columns to numbers. The col_types argument of the read_sheet function of the googlesheets4 library (Bryan 2021) was used to explicitly set the column types to character. After applying the custom clean_facility_names function, additional transformations, such as converting facility to a factor, are applied.
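The read-as-character, then convert pattern described above might look roughly like the following sketch. The column names and values are hypothetical stand-ins, and the read_sheet call is shown only as a comment because it needs authentication and a real sheet ID; the project's clean_facility_names function is not reproduced here.

```r
library(readr)
library(tibble)

# In the report the data come from Google Sheets, along the lines of:
#   raw <- googlesheets4::read_sheet(sheet_id, col_types = "c")
# where sheet_id is a placeholder and "c" reads every column as character.
# A small hypothetical tibble stands in for that result here.
raw <- tibble(
  facility       = c("Facility A", "Facility B"),
  total_detained = c("120", "85"),
  use_of_force   = c("3", "0")
)

# Let readr guess the numeric columns while keeping facility as text.
converted <- type_convert(
  raw,
  col_types = cols(facility = col_character())
)

# One of the follow-up transformations: treat facility as a factor.
converted$facility <- factor(converted$facility)

str(converted)
```

Reading everything as character first keeps list columns from appearing when a sheet column mixes cell types, at the cost of one explicit conversion step.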
There were some issues using lubridate (Spinu, Grolemund, and Wickham 2021) to wrangle the month and year fields into a proper date. Craig was not able to discern the cause of the issue. However, changing the month format on the incident sheet from abbreviated to fully written-out months solved it. Those changes were made in the master Google Sheet, and the data were read using the googlesheets4 library.
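As an illustration of combining month and year fields into a date, the sketch below pastes a fully written-out month and a year together with a fixed day and parses the result with lubridate's dmy function. The column names are hypothetical, the first of the month is an arbitrary choice, and month-name parsing is assumed to run in an English locale; this is not necessarily how the project code does it.

```r
library(lubridate)
library(tibble)

# Hypothetical month and year fields, stored as character
# the way they arrive from the sheet.
incidents <- tibble(
  month = c("September", "October", "July"),
  year  = c("2019", "2019", "2021")
)

# Build a "day month year" string and parse it.
# Assumes an English locale for the month names.
incidents$date <- dmy(paste(1, incidents$month, incidents$year))

incidents
```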
At present, 382 SIS and inspection cover letter combinations out of approximately 300 inspections are complete. The completed inspections range in time from 2019-09-06 to 2021-07-01. Students are now working on older SIS forms, which are structured differently. The preliminary EDA in this document is restricted to the more current SIS forms, which were first used in May 2019.
A.1 Summary Tables
Summary tables provide an overview of how many instances occur within a particular category of data or how frequently a particular issue is recorded at a given facility. Summary tables are generated using a combination of group_by and summarize, with the results piped to kable and kableExtra. This produces results similar to a “pivot table” in Excel.
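A minimal sketch of the group_by, summarize, and kable pipeline is shown below. The data, column names, and caption are hypothetical, and the styling call is just one plausible kableExtra option rather than the one used in the report.

```r
library(dplyr)
library(knitr)
library(kableExtra)

# Hypothetical long-format incident counts.
incidents <- tibble::tibble(
  facility = c("Facility A", "Facility A", "Facility A",
               "Facility B", "Facility B", "Facility B"),
  category = c("use_of_force", "use_of_force", "grievances",
               "use_of_force", "grievances", "grievances"),
  count    = c(2, 1, 5, 0, 3, 4)
)

# One row per facility and category, like an Excel pivot table.
summary_tbl <- incidents %>%
  group_by(facility, category) %>%
  summarize(total = sum(count), .groups = "drop")

# Render as an HTML table; format is set explicitly so that
# kable_styling has an HTML table to work on outside of knitr.
summary_html <- summary_tbl %>%
  kable(format = "html", caption = "Incidents by facility and category") %>%
  kable_styling(full_width = FALSE)
```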
A.2 Facet Plots
Facet plots partition a plot into a matrix of panels. These plots are produced for several categories of data, where each panel represents a facility for a given variable reported in the SIS form. Within each category, columns are pivoted longer and plotted by date. Graphing is done using the ggplot2 library (Wickham, Chang, et al. 2021), with the facet_wrap function providing a means to compare multiple facilities simultaneously. Such a plot can help identify trends and guide more specific questions.
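The pivot-then-facet approach can be sketched as follows. The facilities, measures, and values are made up for illustration, and pivot_longer from tidyr is assumed as the pivoting step; the actual columns depend on the SIS category being plotted.

```r
library(tidyr)
library(ggplot2)
library(tibble)

# Hypothetical wide data: one row per facility and inspection date.
sis <- tibble(
  facility     = rep(c("Facility A", "Facility B"), each = 3),
  date         = rep(as.Date(c("2019-09-01", "2019-10-01", "2019-11-01")),
                     times = 2),
  grievances   = c(5, 7, 4, 3, 2, 6),
  use_of_force = c(1, 0, 2, 0, 1, 1)
)

# Pivot the measure columns longer so each row is one
# facility, date, and measure combination.
sis_long <- pivot_longer(
  sis,
  cols      = c(grievances, use_of_force),
  names_to  = "measure",
  values_to = "count"
)

# One panel per facility, measures plotted over time.
p <- ggplot(sis_long, aes(x = date, y = count, colour = measure)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ facility) +
  labs(x = "Inspection date", y = "Reported count")

print(p)
```

Faceting by facility keeps a shared axis scale across panels by default, which makes differences between facilities easier to see at a glance.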