Accurate row count

faq

#1

I suspect that some of my data may have been incorrectly duplicated. I’d like to get an accurate row count excluding any duplicates. Any recommendations?

Thanks in advance!


#2

Open the Tables page and run a query like:

select count(*) from table foo

This will give you an accurate row count.


#3

Wouldn’t that provide a count of all of the rows on the table?

In the example of if its duplicated or not, wouldn’t you want to look for a unique identifier and do a count on total table vs Distinct count on unique identifier to try to see what data may be duplicated?


#4

Hey Adam - depends what Steve is observing. The id field defines uniqueness for a table, so if rows have been duplicated incorrectly, the most common reason is an incorrectly set id (aka primary key) field. This field is set in the data source panel. It can be a single column or any combination of columns.


#5

An additional note here about the COUNT query. The number of rows on the Tables page is based on svv_table_info where the value includes rows marked for deletion but not yet vacuumed. Panoply has several automatic procedures to take care of vacuuming the tables but they do not run immediately after a table is updated, so you may see differences. If you have duplicate data that you’ve just deleted, and you want to reexamine the number of rows in a table, a COUNT query will give the correct result.

As noted above, it would go like this:

SELECT COUNT(*) FROM {table_name};