Hey y’all, greetings from Panoply’s Data Architect team.
We recently had a client who was migrating data into Panoply from MongoDB and we wanted to share some tips.
Practical tuning steps for importing data from MongoDB
- Be sure to check the frequency of your import schedule.
In the screenshot below, you will notice many jobs report finishing ~13 hours ago. A clump like this may indicate that all jobs are set to run once daily at the same time.
You should increase the frequency to every few hours or even every hour. To do this, click the clock icon and choose your frequency.
- Set the “incremental key” field for the MongoDB data source. It appears under the “advanced” options (see the screenshot below). The key is the name of the attribute used for the incremental update, so that each run only has to collect data that is new or changed since the previous run. It is specified using dot notation. Read more on incremental keys here: https://panoply.io/docs/manage-data/incremental-key/
- Exclude any unused columns. Columns take time to process, especially long strings (varchar(256), for example), URLs that trigger our parser, or IPs that trigger geo-IP lookups. Exclusions are set in the advanced section of the data source definition, using dot notation.
- Break up the collected tables into multiple data source definitions. This increases import parallelism compared with using a single definition to collect all tables.
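
If dot notation is new to you, here is a minimal Python sketch of what a dot-notation path refers to in a nested MongoDB document, for both the incremental key and the exclusions. It only illustrates the idea and is not how Panoply implements either feature. The sample documents, the field names (`meta.updated_at`, `client.ip`), and the helper functions are all hypothetical.

```python
from datetime import datetime

# Hypothetical documents, shaped like records in a MongoDB collection.
docs = [
    {"_id": 1, "meta": {"updated_at": datetime(2019, 1, 1)},
     "client": {"ip": "203.0.113.7", "referrer": "https://example.com/a"}},
    {"_id": 2, "meta": {"updated_at": datetime(2019, 1, 3)},
     "client": {"ip": "203.0.113.8", "referrer": "https://example.com/b"}},
    {"_id": 3, "meta": {"updated_at": datetime(2019, 1, 5)},
     "client": {"ip": "203.0.113.9", "referrer": "https://example.com/c"}},
]


def get_by_path(doc, path):
    """Resolve a dot-notation path such as 'meta.updated_at' in a nested dict."""
    value = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None  # the path does not exist in this document
        value = value[key]
    return value


def drop_path(doc, path):
    """Return a copy of doc without the attribute named by a dot-notation path."""
    head, _, rest = path.partition(".")
    result = dict(doc)  # shallow copy, so the input document is left untouched
    if head not in doc:
        return result
    if not rest:
        del result[head]
    elif isinstance(doc[head], dict):
        result[head] = drop_path(doc[head], rest)
    return result


def incremental_batch(documents, key, last_seen):
    """Keep only documents whose key value is newer than the last collected one.

    The comparable MongoDB query filter would be {key: {"$gt": last_seen}}.
    """
    batch = []
    for doc in documents:
        value = get_by_path(doc, key)
        if value is not None and value > last_seen:
            batch.append(doc)
    return batch


# Assumed incremental key "meta.updated_at" and assumed exclusion "client.ip".
batch = incremental_batch(docs, "meta.updated_at", datetime(2019, 1, 2))
trimmed = [drop_path(doc, "client.ip") for doc in batch]
```

With these sample values, only the documents updated after January 2 are kept, and each one loses its nested `client.ip` attribute while `client.referrer` stays in place. That is the combined effect the incremental key and exclusion tips aim for: fewer rows per run and fewer columns per row.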