What is an incremental key and why do I need one?

Technically speaking, an incremental key is a field that specifies the update time of a row or a running number that states the most up-to-date rows in comparison to others. This field can be used to collect only the most up-to-date rows, saving vast amounts of time in the collection process and improving performance.

Incremental keys are great for providing shorter update times and faster scheduling. Panoply uses an incremental key to only pull the information that was updated since the last pull and then only updates the rows that have been changed. This means that we’re only updating the rows that have been changed, and not writing the same data to the table.

In most sources, incremental keys are already included, so there’s no need to make any changes. However, in some cases with some data sources, you might need to select your own incremental key to benefit from its advantages. Otherwise, Panoply will pull all of the data from within the source with every refresh.

A couple of practical tips for using incremental keys with panoply connectors:

  1. Incremental keys need to be added manually for any datasource that is not an API–so if it’s mongo, postgres, mysql, SQL server, Cassandra, JSONL, or a CSV file you must specify these manually. Always. Panoply does not infer these automatically for these data sources.
  2. For APIs, Panoply may or may not infer these automatically. If an option to specify an incremental key does not appear in the UI when you are setting up the data source, it likely means the API does not support incremental loads.
  3. If you do not see a place to specify an incremental key and you believe the API does support incremental loads, please contact us!
  4. Incremental keys work on a per-job basis so if you have two tables with different incremental keys (say “updated_at” vs “updatedat”, you will need to define two separate connections to distinguish these two keys)