Identifying Parent-Child Table Relationships the Easy Way

If you’ve ever ingested “nested” data into Panoply (Zendesk, Hubspot, Salesforce, and Shopify for example), you know that we store the nested data in subtables. To get the most out of your data you have to transform your data into views that make it easy to work with. Here is a query you can use to show the relationships between parent and child tables and the corresponding keys you can join the child table to the parent table.


SELECT 	child_table, c_id.column child_key, parent_table, p_id.column parent_key, source_id, materialized_view
FROM (
SELECT c.relname AS child_table, 
        CASE
            WHEN d.description like '%"materialized":true%' THEN true
            ELSE false
        END AS materialized_view, 
        split_part("replace"(regexp_substr(d.description, '"sourceId":"[^"]*"'), '"', ''), ':', 2) AS source_id, 
        split_part("replace"(regexp_substr(d.description, '"parent":"[^"]*"'), '"', ''), 'public-', 2) AS parent_table
   FROM pg_description d
   JOIN pg_class c ON c.oid = d.objoid
) AS table_relationships
JOIN pg_table_def c_id ON table_relationships.child_table=c_id.tablename AND c_id.column ilike '%'||parent_table||'\\_id'
JOIN pg_table_def p_id ON table_relationships.parent_table=p_id.tablename AND p_id.column = 'id'
WHERE parent_table <> ''
ORDER BY child_table;

This is what the query ends up looking like:

So now, using the data from the Zendesk results shown above I can easily write a join clause into my query that joins the parent and child tables together. Like so…

SELECT * 
FROM zendesk_tickets_fields tf
JOIN zendesk_tickets t on tf.zendesk_tickets_id=t.id;

If you want to list all the column in the child tables too, here you go:

SELECT 	cols.column child_column, child_table, c_id.column child_key, parent_table, p_id.column parent_key, source_id, materialized_view
FROM (
SELECT c.relname AS child_table, 
        CASE
            WHEN d.description like '%"materialized":true%' THEN true
            ELSE false
        END AS materialized_view, 
        split_part("replace"(regexp_substr(d.description, '"sourceId":"[^"]*"'), '"', ''), ':', 2) AS source_id, 
        split_part("replace"(regexp_substr(d.description, '"parent":"[^"]*"'), '"', ''), 'public-', 2) AS parent_table
   FROM pg_description d
   JOIN pg_class c ON c.oid = d.objoid
) AS table_relationships
JOIN pg_table_def c_id ON table_relationships.child_table=c_id.tablename AND c_id.column ilike '%'||parent_table||'\\_id'
JOIN pg_table_def p_id ON table_relationships.parent_table=p_id.tablename AND p_id.column = 'id'
JOIN pg_table_def cols ON table_relationships.child_table=cols.tablename AND cols.column <> c_id.column
WHERE parent_table <> ''
ORDER BY child_table, child_column;