Symphony Care Network: Tackling COVID-19 with Panoply and Power BI

Panoply FTW (genuine)!

We just became Panoply customers in March. With this hackathon, our goal was to show how quickly we could pull in data from multiple sources. Because this was a real-life data set, we needed to create a fully functional, live solution in just a few days. The challenge for us was that we needed to make use of both incremental loading and truncation of tables. For this project, we are collecting data from three sources:

  • Amazon S3 Bucket (initial loading of tables > 100 MB)
  • Microsoft SQL Server (about five different tables)
  • Google Sheets (two separate sheets with multiple tabs)

The first two data sources on the list are relatively straightforward. Most of our tables are over 100 MB, so we need to load them through S3 (Panoply has a 100 MB limit on the File Upload data source).

The Google Sheets are a different story. Because our Electronic Medical Record (EMR) vendor was slow to create any kind of tracking forms for BOTH patients AND employees, we had to temporarily create our own. There were a couple of tricks to getting the Google Sheets into the correct format. First, the column headers need to be in the first row (sort of a no-brainer, but some tools let you specify the header row). Second, users had accidentally (or intentionally) changed the row size of some of the rows. Panoply didn’t like that, and we received a few failed data collection messages. We ran a quick script to set all the rows to the same size, and then Panoply was able to load them successfully:

[Screenshot: Google Sheets data sources collecting successfully in Panoply]

The Google Sheets documentation (https://panoply.io/docs/data-sources/google-sheets/) comes in pretty handy in understanding the collection process and how the data will appear in tables once loaded.

Now that the Google Sheets are loaded, we turn our attention to the S3 loads. We used Alteryx Designer to load the data directly from the Microsoft SQL Server database table to the Amazon S3 bucket. Below is a sample of the initial full load of all records with no filtering or formatting applied:

[Screenshot: Alteryx Designer workflow for the initial full load from SQL Server to S3]

We loaded patient diagnoses, census, patient profiles, and COVID-19 assessment form answers. After the initial load, we created Microsoft SQL Server data sources to perform incremental loading on the same tables we had just loaded via S3.
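Conceptually, once an incremental key is configured, each collection only pulls rows changed since the previous run's high-water mark. Panoply manages this for you, but a minimal sketch of the effective query, assuming a hypothetical client_diagnosis table with a modified_date column, looks like this:

```sql
-- Illustrative only: Panoply tracks the high-water mark itself once an
-- incremental key is configured. Roughly what each hourly run fetches:
SELECT *
FROM dbo.client_diagnosis
WHERE modified_date > '2020-04-01 13:00:00';  -- max value from the previous run
```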

Update: The primary key in Panoply should be entered within curly brackets. So instead of client_diagnosis_id it should be entered as {client_diagnosis_id}. Note that in the post we used “id” instead. Thanks to @alon for correcting this setting.

We had to make two updates at this point. First, the DATETIME data type coming over from SQL Server was causing an error in the incremental loading. We went back to the SQL we used to load the table and created a TIMESTAMP instead:

[Screenshot: extraction SQL with the DATETIME converted to a TIMESTAMP]

Second, Panoply creates an “id” field by default, but if you include a field named “id” in the load, you can designate that field as the primary key, which will then be used to update changed records during incremental loading:

[Screenshot: Panoply table settings with “id” set as the primary key]
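Putting those two updates together, the extraction query ends up looking something like this sketch (the table and column names are hypothetical, not our actual schema):

```sql
-- Illustrative extraction query for the incremental load:
--   * expose the source key column (per the update above, the primary-key
--     setting in Panoply is entered as {client_diagnosis_id})
--   * CONVERT with style 126 turns the DATETIME into an ISO-8601
--     timestamp string that the incremental loader parses cleanly
SELECT
    client_diagnosis_id,
    diagnosis_code,
    CONVERT(varchar(23), modified_date, 126) AS modified_date
FROM dbo.client_diagnosis;
```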

We are going to incrementally load all of our datasets except the COVID-19 database tables. We’ll perform full loads each time on the coronavirus data, since the row count will be less than 10k and the underlying schema may change. In addition, we’ll place this table in a new schema called “covid”.

If you’ve never created a new schema before, you can issue this command in the Workbench:

[Screenshot: CREATE SCHEMA command in the Workbench]
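For reference, the command itself is a one-liner (the IF NOT EXISTS guard is optional but makes it safe to re-run):

```sql
-- Run once in the Panoply Workbench to create the new schema.
CREATE SCHEMA IF NOT EXISTS covid;
```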

The last step is to schedule the jobs to run hourly:

[Screenshot: hourly schedule settings in Panoply]

Now that we’ve got all the data sources lined up and ready to go, we can start to build out the visualization. Here’s a snapshot of a few of the final data sources:

[Screenshot: final data source tables in Panoply]

“I have the Power”… BI


Note: Due to privacy rules, patient names and dates have been blurred in the screenshots.

We built three main functional reports inside Power BI:

  • Patient Tracking of COVID-19 & Employee Tracking of COVID-19
  • Patient Vitals Screening
  • COVID-19 Assessment Form List (the form is completed each shift for patients who have a lab-confirmed positive case, are showing symptoms, or have been exposed to others who are COVID-19 positive)

The patient and employee trackers both have very similar designs. The data source for these reports is Panoply via the Google Sheets integration. We have dubbed these dashboards “Powerboards” because they are functional, with filters and slicers that allow the clinical team to select patients by facility, date, and test result. Four summary cards indicate the total positive tests for the selected cohort, the number sent to the hospital (as opposed to those still in-house), the total patients tested, and the total patients being tracked (regardless of any filters applied).

As mentioned in the previous section, this data is pulled hourly into Panoply. The top right corner of the Powerboard contains a card showing the last time the report was refreshed. The card is an important indicator and will be discussed more in the “Hack” section below.

The table has a default sort by facility and patient but is customizable by the user. The primary columns of interest are the symptoms, the test result, and whether the patient was sent to the hospital.

[Screenshot: Patient Tracking Powerboard]

The second functional report is the Patient Vitals Screener. The data for this report comes through Panoply via the SQL Server data sources, and incremental loading is key to keeping it timely. The vitals table has over 200 million rows, and Panoply is set to incrementally refresh it hourly. This report shows the critical vitals related to a coronavirus diagnosis: temperature, respiration, and oxygen saturation (O2 sats).

[Screenshot: Patient Vitals Screener]
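As a rough sketch of the query shape behind the screener, here's one way to pull the most recent reading of each vital per patient in SQL (the vitals table and column names are hypothetical):

```sql
-- Most recent reading per patient and vital type (illustrative schema).
SELECT patient_id, vital_type, vital_value, recorded_at
FROM (
    SELECT patient_id, vital_type, vital_value, recorded_at,
           ROW_NUMBER() OVER (
               PARTITION BY patient_id, vital_type
               ORDER BY recorded_at DESC) AS rn
    FROM vitals
    WHERE vital_type IN ('Temperature', 'Respiration', 'O2 Sat')
) AS latest
WHERE rn = 1;
```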

After we consulted with the clinical team, they told us they wanted the ability to see the vital-sign details over time. In this report, users can right-click on any row and drill through to a second report that shows temperature and respirations on a timeline chart.

[Screenshot: drill-through timeline of temperature and respirations]

The third and final interactive visual is the COVID-19 assessment form list. The form is completed each shift for patients under surveillance. Having access to this data provides the clinical team with the ability to monitor the progression of each patient and potentially treat symptoms in near real-time.

[Screenshot: COVID-19 assessment form list]

A key feature of the report is a link back to our web-based EMR for a quick review of the full assessment. The visual contains five key indicators: O2 Sats, Respirations, Temperature, Breath Sounds (for each lung), and Cough. Each value shown is the most recent one from the assessments. The report also applies conditional formatting to several of the fields, providing a visual guide to values that are falling out of range. Another indicator is the “Errors” card, which shows how many assessments have failed validation and need to be corrected. Clicking the card navigates users to a list of the errored-out assessments.

[Screenshot: list of errored-out assessments]

Again, a link is provided to navigate directly to the assessment so the user can correct the form and re-submit it.
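To give a sense of the out-of-range logic behind the conditional formatting, here's a sketch in SQL. The thresholds are common clinical examples rather than our exact cutoffs, and the table and column names are hypothetical:

```sql
-- Example out-of-range flags (illustrative thresholds and schema).
SELECT
    assessment_id,
    temperature,
    CASE WHEN temperature >= 100.4 THEN 'out of range' ELSE 'ok' END AS temperature_flag,
    o2_sats,
    CASE WHEN o2_sats < 90 THEN 'out of range' ELSE 'ok' END AS o2_flag
FROM covid.assessments;
```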

Putting the “Hack” in Hackathon

There were two game-changing features we used in Power BI to make the assessment and vitals reports work flawlessly. First, the assessments are loaded into Panoply as one row per answer: every temperature, O2 sat, breath sound, and cough reading is an individual row for each patient. To get the data into a columnar format, we used the Pivot function in Power BI. Here's a sample of the data as rows:

[Screenshot: assessment data as one row per answer]

Now we apply the Pivot function to get the data into columns grouped by the Assessment ID:

[Screenshot: assessment data pivoted into columns by Assessment ID]
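For anyone who prefers to do this upstream instead, the same reshaping can be expressed in SQL as a conditional aggregation (the question labels and table name are illustrative):

```sql
-- One row per assessment, with each answer pivoted into its own column.
SELECT
    assessment_id,
    MAX(CASE WHEN question = 'Temperature'   THEN answer END) AS temperature,
    MAX(CASE WHEN question = 'Respirations'  THEN answer END) AS respirations,
    MAX(CASE WHEN question = 'O2 Sats'       THEN answer END) AS o2_sats,
    MAX(CASE WHEN question = 'Breath Sounds' THEN answer END) AS breath_sounds,
    MAX(CASE WHEN question = 'Cough'         THEN answer END) AS cough
FROM assessment_answers
GROUP BY assessment_id;
```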

The second issue we faced is that, although we have incremental loading in Panoply, Power BI was reloading all the vitals for each patient on every refresh. So we needed a way to load incrementally in Power BI as well. Thankfully, there is a way to do an incremental refresh in Power BI.

The abbreviated version of the instructions: (1) create parameters in Power BI for a (2) start and end date, then (3) use those dates as filters on the data sets you want to incrementally refresh. The change reduced our load time from 35 minutes down to about 90 seconds.
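Under the hood, the filter those parameters apply boils down to a date window on each refresh cycle, roughly like this sketch (RangeStart and RangeEnd are the parameter names Power BI expects for incremental refresh; the table, column, and dates are illustrative):

```sql
-- Effective query per refresh: only rows inside the current window.
SELECT *
FROM vitals
WHERE recorded_at >= '2020-04-14 00:00:00'   -- RangeStart
  AND recorded_at <  '2020-04-15 00:00:00';  -- RangeEnd
```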

The last “hack” was dealing with the filthy, dirty data that came from the Google Sheets. We made heavy use of Power Query within Power BI to clean as much of the data as possible.

Here’s where the “last updated” card came into play. We did experience a few Power BI load failures due to bad data. The last-updated card (along with alerts) helped notify the team that the data had not refreshed and that we needed to update the Power BI cleaning steps.

Wrap it up!

The speed with which we were able to build out an end-to-end solution was stunning (about four days). The integrations between each component (the data sources, Panoply, and Power BI) let us quickly adjust and make changes on the fly without having to go back and redo our work. As we move forward through the coronavirus pandemic, the team will continue to build on the success of this project as we move more of our data into Panoply. Next up: all of our timekeeping data in Kronos!



Nice work @NPT! Thanks for the detailed walkthrough!

Thanks Trevor! Love the GIF!

~ Nathan

Wow, 200 million rows 0_0 By the way, I think you can upload files over 100 MB through Google Drive as well.
Anyway, glad all that data is being put to use. Thanks for your work!

Thanks Jonathan! :+1:

~ Nathan