ChatGPT解决这个技术问题 Extra ChatGPT

How do you find the row count for all your tables in Postgres

I'm looking for a way to find the row count for all my tables in Postgres. I know I can do this one table at a time with:

SELECT count(*) FROM table_name;

but I'd like to see the row count for all the tables and then order by that to get an idea of how big all my tables are.


S
Shubham Chaudhary

There's three ways to get this sort of count, each with their own tradeoffs.

If you want a true count, you have to execute the SELECT statement like the one you used against each table. This is because PostgreSQL keeps row visibility information in the row itself, not anywhere else, so any accurate count can only be relative to some transaction. You're getting a count of what that transaction sees at the point in time when it executes. You could automate this to run against every table in the database, but you probably don't need that level of accuracy or want to wait that long.

WITH tbl AS
  (SELECT table_schema,
          TABLE_NAME
   FROM information_schema.tables
   WHERE TABLE_NAME not like 'pg_%'
     AND table_schema in ('public'))
SELECT table_schema,
       TABLE_NAME,
       (xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, TABLE_NAME), FALSE, TRUE, '')))[1]::text::int AS rows_n
FROM tbl
ORDER BY rows_n DESC;

The second approach notes that the statistics collector tracks roughly how many rows are "live" (not deleted or obsoleted by later updates) at any time. This value can be off by a bit under heavy activity, but is generally a good estimate:

SELECT schemaname,relname,n_live_tup 
  FROM pg_stat_user_tables 
ORDER BY n_live_tup DESC;

That can also show you how many rows are dead, which is itself an interesting number to monitor.

The third way is to note that the system ANALYZE command, which is executed by the autovacuum process regularly as of PostgreSQL 8.3 to update table statistics, also computes a row estimate. You can grab that one like this:

SELECT 
  nspname AS schemaname,relname,reltuples
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE 
  nspname NOT IN ('pg_catalog', 'information_schema') AND
  relkind='r' 
ORDER BY reltuples DESC;

Which of these queries is better to use is hard to say. Normally I make that decision based on whether there's more useful information I also want to use inside of pg_class or inside of pg_stat_user_tables. For basic counting purposes just to see how big things are in general, either should be accurate enough.


For completions sake, please add this for the first option (thanks goes to @a_horse_with_no_name): with tbl as (SELECT table_schema,table_name FROM information_schema.tables where table_name not like 'pg_%' and table_schema in ('public')) select table_schema, table_name, (xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, table_name), false, true, '')))[1]::text::int as rows_n from tbl ORDER BY 3 DESC;
@Greg Smith Which version introduced n_live_tup? My Redshift database lacks that column. It's a derivative of Postgres 8.0.2.
The 'second approach' query (using pg_stat_user_tables) returned mostly zeros in n_live_tup for me because ANALYZE had never been run. Rather than run ANALYZE on every schema/table and wait forever for an answer, I first checked the results using 'third approach' and that one (using pg_class) returned very accurate counts.
@BrianD, it is possible to execute the analyze at database level using analyzedb utility as "analyzedb -d dbname"
@estani, thanks! Your sql is the only one from this answer where i haven't discovered obvious errors. For example Gregs second approach showed zeros for all tables, and third approach was accurate only in new DB where i restored the dump (compared to the proposed query output actual counts in original DB tables were different)
a
a_horse_with_no_name

Here is a solution that does not require functions to get an accurate count for each table:

select table_schema, 
       table_name, 
       (xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count
from (
  select table_name, table_schema, 
         query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count
  from information_schema.tables
  where table_schema = 'public' --<< change here for the schema you want
) t

query_to_xml will run the passed SQL query and return an XML with the result (the row count for that table). The outer xpath() will then extract the count information from that xml and convert it to a number

The derived table is not really necessary, but makes the xpath() a bit easier to understand - otherwise the whole query_to_xml() would need to be passed to the xpath() function.


Very clever. It's a pity there is no query_to_jsonb().
@a_horse_with_no_name, will it give any performance issue on busy and huge tables while executing?
@Spike: performance issues compared to what? The major performance bottleneck is running a select count(*) on every table.
@a_horse_with_no_name, by executing x_path function against a 100 million records.
This gives a TRUE count and the accepted answer doesn't which is expected. Thanks!
C
Community

To get estimates, see Greg Smith's answer.

To get exact counts, the other answers so far are plagued with some issues, some of them serious (see below). Here's a version that's hopefully better:

CREATE FUNCTION rowcount_all(schema_name text default 'public')
  RETURNS table(table_name text, cnt bigint) as
$$
declare
 table_name text;
begin
  for table_name in SELECT c.relname FROM pg_class c
    JOIN pg_namespace s ON (c.relnamespace=s.oid)
    WHERE c.relkind = 'r' AND s.nspname=schema_name
  LOOP
    RETURN QUERY EXECUTE format('select cast(%L as text),count(*) from %I.%I',
       table_name, schema_name, table_name);
  END LOOP;
end
$$ language plpgsql;

It takes a schema name as parameter, or public if no parameter is given.

To work with a specific list of schemas or a list coming from a query without modifying the function, it can be called from within a query like this:

WITH rc(schema_name,tbl) AS (
  select s.n,rowcount_all(s.n) from (values ('schema1'),('schema2')) as s(n)
)
SELECT schema_name,(tbl).* FROM rc;

This produces a 3-columns output with the schema, the table and the rows count.

Now here are some issues in the other answers that this function avoids:

Table and schema names shouldn't be injected into executable SQL without being quoted, either with quote_ident or with the more modern format() function with its %I format string. Otherwise some malicious person may name their table tablename;DROP TABLE other_table which is perfectly valid as a table name.

Even without the SQL injection and funny characters problems, table name may exist in variants differing by case. If a table is named ABCD and another one abcd, the SELECT count(*) FROM... must use a quoted name otherwise it will skip ABCD and count abcd twice. The %I of format does this automatically.

information_schema.tables lists custom composite types in addition to tables, even when table_type is 'BASE TABLE' (!). As a consequence, we can't iterate oninformation_schema.tables, otherwise we risk having select count(*) from name_of_composite_type and that would fail. OTOH pg_class where relkind='r' should always work fine.

The type of COUNT() is bigint, not int. Tables with more than 2.15 billion rows may exist (running a count(*) on them is a bad idea, though).

A permanent type need not to be created for a function to return a resultset with several columns. RETURNS TABLE(definition...) is a better alternative.


A
Aur Saraf

The hacky, practical answer for people trying to evaluate which Heroku plan they need and can't wait for heroku's slow row counter to refresh:

Basically you want to run \dt in psql, copy the results to your favorite text editor (it will look like this:

 public | auth_group                     | table | axrsosvelhutvw
 public | auth_group_permissions         | table | axrsosvelhutvw
 public | auth_permission                | table | axrsosvelhutvw
 public | auth_user                      | table | axrsosvelhutvw
 public | auth_user_groups               | table | axrsosvelhutvw
 public | auth_user_user_permissions     | table | axrsosvelhutvw
 public | background_task                | table | axrsosvelhutvw
 public | django_admin_log               | table | axrsosvelhutvw
 public | django_content_type            | table | axrsosvelhutvw
 public | django_migrations              | table | axrsosvelhutvw
 public | django_session                 | table | axrsosvelhutvw
 public | exercises_assignment           | table | axrsosvelhutvw

), then run a regex search and replace like this:

^[^|]*\|\s+([^|]*?)\s+\| table \|.*$

to:

select '\1', count(*) from \1 union/g

which will yield you something very similar to this:

select 'auth_group', count(*) from auth_group union
select 'auth_group_permissions', count(*) from auth_group_permissions union
select 'auth_permission', count(*) from auth_permission union
select 'auth_user', count(*) from auth_user union
select 'auth_user_groups', count(*) from auth_user_groups union
select 'auth_user_user_permissions', count(*) from auth_user_user_permissions union
select 'background_task', count(*) from background_task union
select 'django_admin_log', count(*) from django_admin_log union
select 'django_content_type', count(*) from django_content_type union
select 'django_migrations', count(*) from django_migrations union
select 'django_session', count(*) from django_session
;

(You'll need to remove the last union and add the semicolon at the end manually)

Run it in psql and you're done.

            ?column?            | count
--------------------------------+-------
 auth_group_permissions         |     0
 auth_user_user_permissions     |     0
 django_session                 |  1306
 django_content_type            |    17
 auth_user_groups               |   162
 django_admin_log               |  9106
 django_migrations              |    19
[..]

I like this idea
In Atom, I had to regex search and replace like this: select '$1', count(*) from $1 union/g
Also, the post says: "You'll need to remove the union and add the semicolon at the end." This is a typo. You need to remove /g (keep union) and add one semicolon (;) at the very end. Don't forget to remove the last union before the semicolon.
"Don't forget to remove the last union before the semicolon" is what I meant :) Added the word "last" to clarify
For VSCode what worked for me was select '$1', count(*) from $1 union
i
ig0774

If you don't mind potentially stale data, you can access the same statistics used by the query optimizer.

Something like:

SELECT relname, n_tup_ins - n_tup_del as rowcount FROM pg_stat_all_tables;

@mlissner: If your autovacuum interval is too long or you haven't run a manual ANALYZE on the table, the statistics can get way off. Its a question of database load and how the database is configured (if the statistics are updated more frequently, the stats will be more accurate, but it could reduce runtime performance). Ultimately, the only way to get accurate data is to run select count(*) from table for all tables.
x
xvga

Simple Two Steps: (Note : No need to change anything - just copy paste) 1. create function

create function 
cnt_rows(schema text, tablename text) returns integer
as
$body$
declare
  result integer;
  query varchar;
begin
  query := 'SELECT count(1) FROM ' || schema || '.' || tablename;
  execute query into result;
  return result;
end;
$body$
language plpgsql;

2. Run this query to get rows count for all the tables

select sum(cnt_rows) as total_no_of_rows from (select 
  cnt_rows(table_schema, table_name)
from information_schema.tables
where 
  table_schema not in ('pg_catalog', 'information_schema') 
  and table_type='BASE TABLE') as subq;

or To get rows counts tablewise

select
  table_schema,
  table_name, 
  cnt_rows(table_schema, table_name)
from information_schema.tables
where 
  table_schema not in ('pg_catalog', 'information_schema') 
  and table_type='BASE TABLE'
order by 3 desc;

S
Stew-au

Not sure if an answer in bash is acceptable to you, but FWIW...

PGCOMMAND=" psql -h localhost -U fred -d mydb -At -c \"
            SELECT   table_name
            FROM     information_schema.tables
            WHERE    table_type='BASE TABLE'
            AND      table_schema='public'
            \""
TABLENAMES=$(export PGPASSWORD=test; eval "$PGCOMMAND")

for TABLENAME in $TABLENAMES; do
    PGCOMMAND=" psql -h localhost -U fred -d mydb -At -c \"
                SELECT   '$TABLENAME',
                         count(*) 
                FROM     $TABLENAME
                \""
    eval "$PGCOMMAND"
done

At its essence, this just boils down to the same select count(*) from table_name; in the OP!
A
ADTC

I usually don't rely on statistics, especially in PostgreSQL.

SELECT table_name, dsql2('select count(*) from '||table_name) as rownum
FROM information_schema.tables
WHERE table_type='BASE TABLE'
    AND table_schema='livescreen'
ORDER BY 2 DESC;
CREATE OR REPLACE FUNCTION dsql2(i_text text)
  RETURNS int AS
$BODY$
Declare
  v_val int;
BEGIN
  execute i_text into v_val;
  return v_val;
END; 
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

This is nice but first query should also include the schema for the rownum value. If there are conflicting names in different schemas this will not work as expected. So this part of the query should look more like dsql2('select count(*) from livescreen.'||table_name) or better it could be turned into a function of its own.
e
estani

Extracted from my Comment in the answer from GregSmith to make it more readable:

with tbl as (
  SELECT table_schema,table_name 
  FROM information_schema.tables
  WHERE table_name not like 'pg_%' AND table_schema IN ('public')
)
SELECT 
  table_schema, 
  table_name, 
  (xpath('/row/c/text()', 
    query_to_xml(format('select count(*) AS c from %I.%I', table_schema, table_name), 
    false, 
    true, 
    '')))[1]::text::int AS rows_n 
FROM tbl ORDER BY 3 DESC;

Thanks to @a_horse_with_no_name


P
Pradeep Maurya

This worked for me

SELECT schemaname,relname,n_live_tup FROM pg_stat_user_tables ORDER BY n_live_tup DESC;


This gives some interesting numbers, but it's not (always?) the row count. OK, the docs say it's estimated: postgresql.org/docs/9.3/… (how to update these stats?)
G
Gnanam

I don't remember the URL from where I collected this. But hope this should help you:

CREATE TYPE table_count AS (table_name TEXT, num_rows INTEGER); 

CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count  AS '
DECLARE 
    the_count RECORD; 
    t_name RECORD; 
    r table_count%ROWTYPE; 

BEGIN
    FOR t_name IN 
        SELECT 
            c.relname
        FROM
            pg_catalog.pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE 
            c.relkind = ''r''
            AND n.nspname = ''public'' 
        ORDER BY 1 
        LOOP
            FOR the_count IN EXECUTE ''SELECT COUNT(*) AS "count" FROM '' || t_name.relname 
            LOOP 
            END LOOP; 

            r.table_name := t_name.relname; 
            r.num_rows := the_count.count; 
            RETURN NEXT r; 
        END LOOP; 
        RETURN; 
END;
' LANGUAGE plpgsql; 

Executing select count_em_all(); should get you row count of all your tables.


It's good idea to quote column names (like quote_ident(t_name.relname)) to ensure proper support for unusual names ("column-name", for example).
To drop it afterwards: DROP FUNCTION count_em_all();
Got an error: select count_em_all(); ERROR: syntax error at or near "group" LINE 1: SELECT COUNT() AS "count" FROM group ^ QUERY: SELECT COUNT() AS "count" FROM group CONTEXT: PL/pgSQL function count_em_all() line 18 at FOR over EXECUTE statement
Great! To select and sort - SELECT * FROM count_em_all() as r ORDER BY r.num_rows DESC;
B
Bhargav Modi

I made a small variation to include all tables, also for non-public tables.

CREATE TYPE table_count AS (table_schema TEXT,table_name TEXT, num_rows INTEGER); 

CREATE OR REPLACE FUNCTION count_em_all () RETURNS SETOF table_count  AS '
DECLARE 
    the_count RECORD; 
    t_name RECORD; 
    r table_count%ROWTYPE; 

BEGIN
    FOR t_name IN 
        SELECT table_schema,table_name
        FROM information_schema.tables
        where table_schema !=''pg_catalog''
          and table_schema !=''information_schema''
        ORDER BY 1,2
        LOOP
            FOR the_count IN EXECUTE ''SELECT COUNT(*) AS "count" FROM '' || t_name.table_schema||''.''||t_name.table_name
            LOOP 
            END LOOP; 

            r.table_schema := t_name.table_schema;
            r.table_name := t_name.table_name; 
            r.num_rows := the_count.count; 
            RETURN NEXT r; 
        END LOOP; 
        RETURN; 
END;
' LANGUAGE plpgsql; 

use select count_em_all(); to call it.

Hope you find this usefull. Paul


ERROR: "r.table_schema" is not a known variable
u
user5359531

Here is a much simpler way.

tables="$(echo '\dt' | psql -U "${PGUSER}" | tail -n +4 | head -n-2 | tr -d ' ' | cut -d '|' -f2)"
for table in $tables; do
printf "%s: %s\n" "$table" "$(echo "SELECT COUNT(*) FROM $table;" | psql -U "${PGUSER}" | tail -n +3 | head -n-2 | tr -d ' ')"
done

output should look like this

auth_group: 0
auth_group_permissions: 0
auth_permission: 36
auth_user: 2
auth_user_groups: 0
auth_user_user_permissions: 0
authtoken_token: 2
django_admin_log: 0
django_content_type: 9
django_migrations: 22
django_session: 0
mydata_table1: 9011
mydata_table2: 3499

you can update the psql -U "${PGUSER}" portion as needed to access your database

note that the head -n-2 syntax may not work in macOS, you could probably just use a different implementation there

Tested on psql (PostgreSQL) 11.2 under CentOS 7

if you want it sorted by table, then just wrap it with sort

for table in $tables; do
printf "%s: %s\n" "$table" "$(echo "SELECT COUNT(*) FROM $table;" | psql -U "${PGUSER}" | tail -n +3 | head -n-2 | tr -d ' ')"
done | sort -k 2,2nr

output;

mydata_table1: 9011
mydata_table2: 3499
auth_permission: 36
django_migrations: 22
django_content_type: 9
authtoken_token: 2
auth_user: 2
auth_group: 0
auth_group_permissions: 0
auth_user_groups: 0
auth_user_user_permissions: 0
django_admin_log: 0
django_session: 0

S
Syed Mushtaq

You Can use this query to generate all tablenames with their counts

select ' select  '''|| tablename  ||''', count(*) from ' || tablename ||' 
union' from pg_tables where schemaname='public'; 

the result from the above query will be

select  'dim_date', count(*) from dim_date union 
select  'dim_store', count(*) from dim_store union
select  'dim_product', count(*) from dim_product union
select  'dim_employee', count(*) from dim_employee union

You'll need to remove the last union and add the semicolon at the end !!

select  'dim_date', count(*) from dim_date union 
select  'dim_store', count(*) from dim_store union
select  'dim_product', count(*) from dim_product union
select  'dim_employee', count(*) from dim_employee  **;**

RUN !!!


C
Community

I like Daniel Vérité's answer. But when you can't use a CREATE statement you can either use a bash solution or, if you're a windows user, a powershell one:

# You don't need this if you have pgpass.conf
$env:PGPASSWORD = "userpass"

# Get table list
$tables = & 'C:\Program Files\PostgreSQL\9.4\bin\psql.exe' -U user -w -d dbname -At -c "select table_name from information_schema.tables where table_type='BASE TABLE' AND table_schema='schema1'"

foreach ($table in $tables) {
    & 'C:\path_to_postresql\bin\psql.exe' -U root -w -d dbname -At -c "select '$table', count(*) from $table"
}

M
MrMesees

I wanted the total from all tables + a list of tables with their counts. A little like a performance chart of where most time was spent

WITH results AS ( 
  SELECT nspname AS schemaname,relname,reltuples
    FROM pg_class C
    LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
    WHERE 
      nspname NOT IN ('pg_catalog', 'information_schema') AND
      relkind='r'
     GROUP BY schemaname, relname, reltuples
)

SELECT * FROM results
UNION
SELECT 'all' AS schemaname, 'all' AS relname, SUM(reltuples) AS "reltuples" FROM results

ORDER BY reltuples DESC

You could of course put a LIMIT clause on the results in this version too so that you get the largest n offenders as well as a total.

One thing that should be noted about this is that you need to let it sit for a while after bulk imports. I tested this by just adding 5000 rows to a database across several tables using real import data. It showed 1800 records for about a minute (probably a configurable window)

This is based from https://stackoverflow.com/a/2611745/1548557 work, so thank you and recognition to that for the query to use within the CTE


P
Peter Vandivier

If you're in the psql shell, using \gexec allows you to execute the syntax described in syed's answer and Aur's answer without manual edits in an external text editor.

with x (y) as (
    select
        'select count(*), '''||
        tablename||
        ''' as "tablename" from '||
        tablename||' '
    from pg_tables
    where schemaname='public'
)
select
    string_agg(y,' union all '||chr(10)) || ' order by tablename'
from x \gexec

Note, string_agg() is used both to delimit union all between statements and to smush the separated datarows into a single unit to be passed into the buffer.

\gexec Sends the current query buffer to the server, then treats each column of each row of the query's output (if any) as a SQL statement to be executed.


R
Radhakrishna

below query will give us row count and size for each table

select table_schema, table_name, pg_relation_size('"'||table_schema||'"."'||table_name||'"')/1024/1024 size_MB, (xpath('/row/c/text()', query_to_xml(format('select count(*) AS c from %I.%I', table_schema, table_name), false, true,'')))[1]::text::int AS rows_n from information_schema.tables order by size_MB desc;


S
SuperNova

You can just do select from

select from table_name;