Troubleshooting GA data source



I’ve created multiple GA data sources for different GA accounts, and across the board I’m getting 0 values for one measure and many duplicate values for several others. In general, three measures have lower values than I’d expect. Any suggestions for troubleshooting? I was originally told to ignore the primary key setup. Should I adjust that?


One possible source of unexpected results is sampling, which happens depending on how many records we are fetching from GA. We avoid sampling as much as possible: whenever a request comes back sampled, we slice the date range into smaller ranges and fetch again. We keep slicing down to the smallest date range possible, which is a single day. If the single-day API call still returns sampled data, we keep it as is and mark those specific rows with __sampled = 1 in the table. Changing the date range of the data source will not change the result.
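To make the slicing behaviour concrete, here is a minimal sketch of the idea, not the actual implementation. It assumes a hypothetical `fetch(start, end)` callable that returns the rows plus a flag saying whether GA sampled the response, and it assumes the range is cut in half each time; the real slicing strategy may differ.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List, Tuple

# Hypothetical fetcher: (start, end) -> (rows, was_sampled).
Fetcher = Callable[[date, date], Tuple[List[Dict], bool]]


def fetch_unsampled(start: date, end: date, fetch: Fetcher) -> List[Dict]:
    """Fetch the inclusive range [start, end], slicing while data is sampled."""
    rows, sampled = fetch(start, end)
    if not sampled:
        # Unsampled response: keep the rows and flag them as clean.
        return [{**row, "__sampled": 0} for row in rows]
    if start == end:
        # A single day is the smallest slice; keep the rows but mark them.
        return [{**row, "__sampled": 1} for row in rows]
    # Assumption: split the range in half and retry each part separately.
    mid = start + timedelta(days=(end - start).days // 2)
    first_half = fetch_unsampled(start, mid, fetch)
    second_half = fetch_unsampled(mid + timedelta(days=1), end, fetch)
    return first_half + second_half
```

Because a sampled single day is kept and flagged rather than retried, asking for a narrower date range cannot improve those rows, which is why changing the data source's date range does not change the result.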

One way to rule out sampling is to first truncate the table and then load only 7 days of data. If the numbers now appear correct, the issue is (almost surely) sampling. If the numbers are the same as before, we should take a closer look at the reports in GA to understand what is being generated. It could be that these numbers are in fact all that we can get from GA.
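If you want to compare the two loads systematically, something like the following sketch could help. It assumes you have already totalled each metric over the same 7 days, once from the original table and once from the reload; the metric names and the `compare_totals` helper are illustrative only.

```python
from typing import Dict, Tuple


def compare_totals(
    before: Dict[str, float],
    after: Dict[str, float],
    tolerance: float = 0.0,
) -> Dict[str, Tuple[float, float]]:
    """Return the metrics whose totals differ between the two loads."""
    changed = {}
    for metric in sorted(set(before) | set(after)):
        old = before.get(metric, 0)
        new = after.get(metric, 0)
        if abs(new - old) > tolerance:
            changed[metric] = (old, new)
    return changed


# Illustrative totals for the same 7 days, before and after the reload.
original_load = {"sessions": 900, "users": 400}
seven_day_reload = {"sessions": 1000, "users": 400}
print(compare_totals(original_load, seven_day_reload))
```

An empty result means the numbers did not move, which points toward the GA reports themselves. A non-empty result suggests sampling affected the metrics listed.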

Barring sampling, we usually get exactly what we request from Google. The next big “gotcha” is that different combinations of metrics and dimensions change the results GA returns in relatively unpredictable ways, so you need to ensure the data source in Panoply has exactly the same metrics and dimensions selected as the chart/report in GA.
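A quick way to spot a mismatch is to list both selections and diff them. This is only a sketch: the `field_mismatch` helper is hypothetical, and the field names are examples rather than your actual configuration.

```python
from typing import Dict, Iterable, List


def field_mismatch(
    source_fields: Iterable[str],
    report_fields: Iterable[str],
) -> Dict[str, List[str]]:
    """List the metrics/dimensions present on one side but not the other."""
    source, report = set(source_fields), set(report_fields)
    return {
        "only_in_data_source": sorted(source - report),
        "only_in_ga_report": sorted(report - source),
    }


# Example selections (illustrative only).
data_source = ["ga:date", "ga:sessions", "ga:source"]
ga_report = ["ga:date", "ga:sessions", "ga:users"]
print(field_mismatch(data_source, ga_report))
```

Both lists should come back empty before you compare numbers. Any extra dimension on either side can change how GA aggregates the metrics.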

The surest way to check this is to use this interface from Google: you can sign in there, enter your metrics and dimensions, and see what comes back. This is what the GA reports UI uses on the backend and is also what we are querying, so it is a good source of truth.