Postgres and Indexes on Foreign Keys and Primary Keys

C

Craig Ringer

PostgreSQL automatically creates indexes on primary keys and unique constraints, but not on the referencing side of foreign key relationships.

When Pg creates an implicit index it will emit a NOTICE-level message that you can see in psql and/or the system logs, so you can see when it happens. Automatically created indexes are visible in \d output for a table, too.

The documentation on unique indexes says:

PostgreSQL automatically creates an index for each unique constraint and primary key constraint to enforce uniqueness. Thus, it is not necessary to create an index explicitly for primary key columns.

and the documentation on constraints says:

Since a DELETE of a row from the referenced table or an UPDATE of a referenced column will require a scan of the referencing table for rows matching the old value, it is often a good idea to index the referencing columns. Because this is not always needed, and there are many choices available on how to index, declaration of a foreign key constraint does not automatically create an index on the referencing columns.

Therefore you have to create indexes on foreign-keys yourself if you want them.

Note that if you use primary-foreign-keys, like 2 FK's as a PK in a M-to-N table, you will have an index on the PK and probably don't need to create any extra indexes.

While it's usually a good idea to create an index on (or including) your referencing-side foreign key columns, it isn't required. Each index you add slows DML operations down slightly, so you pay a performance cost on every INSERT, UPDATE or DELETE. If the index is rarely used it may not be worth having.

I hope this edit is OK; I've added links to the relevant documentation, a quote that makes it utterly explicit that the referencing side of FK relationships doesn't produce an implicit index, shown how to see indexes in psql, rephrased the 1st par for clarity, and added a note that indexes aren't free so it's not always right to add them.

@CraigRinger, how do you determine if the benefit of an index surpasses its cost? Do I profile unit tests before/after adding an index and check for an overall performance gain? Or is there a better way?

@Gili That's a topic for a separate dba.stackexchange.com question.

The docs also give a hint about when you would want to create indexes for foreign-keys:

If the referenced column(s) are changed frequently, it might be wise to add an index to the referencing column(s) so that referential actions associated with the foreign key constraint can be performed more efficiently.

N

Nux

This query will list missing indexes on foreign keys, original source.

Edit: Note that it will not check small tables (less then 9 MB) and some other cases. See final WHERE statement.

-- check for FKs where there is no matching index
-- on the referencing side
-- or a bad index

WITH fk_actions ( code, action ) AS (
    VALUES ( 'a', 'error' ),
        ( 'r', 'restrict' ),
        ( 'c', 'cascade' ),
        ( 'n', 'set null' ),
        ( 'd', 'set default' )
),
fk_list AS (
    SELECT pg_constraint.oid as fkoid, conrelid, confrelid as parentid,
        conname, relname, nspname,
        fk_actions_update.action as update_action,
        fk_actions_delete.action as delete_action,
        conkey as key_cols
    FROM pg_constraint
        JOIN pg_class ON conrelid = pg_class.oid
        JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
        JOIN fk_actions AS fk_actions_update ON confupdtype = fk_actions_update.code
        JOIN fk_actions AS fk_actions_delete ON confdeltype = fk_actions_delete.code
    WHERE contype = 'f'
),
fk_attributes AS (
    SELECT fkoid, conrelid, attname, attnum
    FROM fk_list
        JOIN pg_attribute
            ON conrelid = attrelid
            AND attnum = ANY( key_cols )
    ORDER BY fkoid, attnum
),
fk_cols_list AS (
    SELECT fkoid, array_agg(attname) as cols_list
    FROM fk_attributes
    GROUP BY fkoid
),
index_list AS (
    SELECT indexrelid as indexid,
        pg_class.relname as indexname,
        indrelid,
        indkey,
        indpred is not null as has_predicate,
        pg_get_indexdef(indexrelid) as indexdef
    FROM pg_index
        JOIN pg_class ON indexrelid = pg_class.oid
    WHERE indisvalid
),
fk_index_match AS (
    SELECT fk_list.*,
        indexid,
        indexname,
        indkey::int[] as indexatts,
        has_predicate,
        indexdef,
        array_length(key_cols, 1) as fk_colcount,
        array_length(indkey,1) as index_colcount,
        round(pg_relation_size(conrelid)/(1024^2)::numeric) as table_mb,
        cols_list
    FROM fk_list
        JOIN fk_cols_list USING (fkoid)
        LEFT OUTER JOIN index_list
            ON conrelid = indrelid
            AND (indkey::int2[])[0:(array_length(key_cols,1) -1)] @> key_cols

),
fk_perfect_match AS (
    SELECT fkoid
    FROM fk_index_match
    WHERE (index_colcount - 1) <= fk_colcount
        AND NOT has_predicate
        AND indexdef LIKE '%USING btree%'
),
fk_index_check AS (
    SELECT 'no index' as issue, *, 1 as issue_sort
    FROM fk_index_match
    WHERE indexid IS NULL
    UNION ALL
    SELECT 'questionable index' as issue, *, 2
    FROM fk_index_match
    WHERE indexid IS NOT NULL
        AND fkoid NOT IN (
            SELECT fkoid
            FROM fk_perfect_match)
),
parent_table_stats AS (
    SELECT fkoid, tabstats.relname as parent_name,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as parent_writes,
        round(pg_relation_size(parentid)/(1024^2)::numeric) as parent_mb
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = parentid
),
fk_table_stats AS (
    SELECT fkoid,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as writes,
        seq_scan as table_scans
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = conrelid
)
SELECT nspname as schema_name,
    relname as table_name,
    conname as fk_name,
    issue,
    table_mb,
    writes,
    table_scans,
    parent_name,
    parent_mb,
    parent_writes,
    cols_list,
    indexdef
FROM fk_index_check
    JOIN parent_table_stats USING (fkoid)
    JOIN fk_table_stats USING (fkoid)
WHERE table_mb > 9
    AND ( writes > 1000
        OR parent_writes > 1000
        OR parent_mb > 10 )
ORDER BY issue_sort, table_mb DESC, table_name, fk_name;

Doesn't appear to work. Returns 0 rows when I know I have columns without indexes on them that reference domain tables.

@juanitogan Watch the where clauses: Amongst others, it only takes tables into consideration which size is more than 9 MB.

@Matthias - Ah, got it. Thanks. Yeah, I obviously didn't take time to read through the code. It wasn't critical enough to bother. The OP could've mentioned the limitations. Maybe I'll check it out again sometime.

@SergeyB it seems to give false positive on referenced columns having primary key constraint on them, thus automatically having an index but the query still flags them.

d

dland

If you want to list the indexes of all the tables in your schema(s) from your program, all the information is on hand in the catalog:

select
     n.nspname  as "Schema"
    ,t.relname  as "Table"
    ,c.relname  as "Index"
from
          pg_catalog.pg_class c
     join pg_catalog.pg_namespace n on n.oid        = c.relnamespace
     join pg_catalog.pg_index i     on i.indexrelid = c.oid
     join pg_catalog.pg_class t     on i.indrelid   = t.oid
where
        c.relkind = 'i'
    and n.nspname not in ('pg_catalog', 'pg_toast')
    and pg_catalog.pg_table_is_visible(c.oid)
order by
     n.nspname
    ,t.relname
    ,c.relname

If you want to delve further (such as columns and ordering), you need to look at pg_catalog.pg_index. Using psql -E [dbname] comes in handy for figuring out how to query the catalog.

+1 because the use of pg_catalog and psql -E is really very useful

"For reference \di will also list all the indexes in the database." (comment copied form other answer, applies here as well)

M

Milen A. Radev

Yes - for primary keys, no - for foreign keys (more in the docs).

\d <table_name>

in "psql" shows a description of a table including all its indexes.

For reference \di will also list all the indexes in the database.

\d+ TABLE also list all idexes.

N

Nabi

I love how this is explained in the article Cool performance features of EclipseLink 2.5

Indexing Foreign Keys The first feature is auto indexing of foreign keys. Most people incorrectly assume that databases index foreign keys by default. Well, they don't. Primary keys are auto indexed, but foreign keys are not. This means any query based on the foreign key will be doing full table scans. This is any OneToMany, ManyToMany or ElementCollection relationship, as well as many OneToOne relationships, and most queries on any relationship involving joins or object comparisons. This can be a major perform issue, and you should always index your foreign keys fields.

If we should always index our foreign keys fields, why don't the database engines already do that? It seems to me there is more to this than meets the eye.

@Bobort Since adding index incurs performance penalty on all inserts, updates and deletes, and lots of foreign keys could really add up in this case. That's why this behavior is opt-in I guess - developer should make conscious choice in this matter. There could also be cases when foreign key is used to enforce data integrity, but is not queried often or queried at all - in this case performance penalty of index would be for nothing

There also tricky cases with compound indices, since those are applied left to right: i.e compound index on [user_id, article_id] on comments table would effectively cover both querying ALL comments by user (e.g. to show aggregated comments log on website) and fetching all comments made by this user for a specific article. Adding a separate index on user_id in this case is effectively a waste of disk space and cpu time on inserts/updates/deletes.

Aha! Then the advice is poor! We should NOT always index our foreign keys. As @Dr.Strangelove has pointed out, there are actually times when we don't want to index them! Thank you so much, Dr.!

Why aren't they indexed by default? Is there an important use case which makes this necessary?

c

coterobarros

This function, based on the work by Laurenz Albe at https://www.cybertec-postgresql.com/en/index-your-foreign-key/, list all the foreign keys with missing indexes. The size of the table is shown, as for small tables the scanning performance could be superior to the index one.

--
-- function:    missing_fk_indexes
-- purpose:     List all foreing keys in the database without and index in the referencing table.
-- author:      Based on the work of Laurenz Albe
-- see:         https://www.cybertec-postgresql.com/en/index-your-foreign-key/
--
create or replace function missing_fk_indexes () 
returns table (
  referencing_table regclass,
  fk_columns        varchar,
  table_size        varchar,
  fk_constraint     name,
  referenced_table  regclass
)
language sql as $$
  select
    -- referencing table having ta foreign key declaration
    tc.conrelid::regclass as referencing_table,
    
    -- ordered list of foreign key columns
    string_agg(ta.attname, ', ' order by tx.n) as fk_columns,
    
    -- referencing table size
    pg_catalog.pg_size_pretty (
      pg_catalog.pg_relation_size(tc.conrelid)
    ) as table_size,
    
    -- name of the foreign key constraint
    tc.conname as fk_constraint,
    
    -- name of the target or destination table
    tc.confrelid::regclass as referenced_table
    
  from pg_catalog.pg_constraint tc
  
  -- enumerated key column numbers per foreign key
  cross join lateral unnest(tc.conkey) with ordinality as tx(attnum, n)
  
  -- name for each key column
  join pg_catalog.pg_attribute ta on ta.attnum = tx.attnum and ta.attrelid = tc.conrelid
  
  where not exists (
    -- is there ta matching index for the constraint?
    select 1 from pg_catalog.pg_index i
    where 
      i.indrelid = tc.conrelid and 
      -- the first index columns must be the same as the key columns, but order doesn't matter
      (i.indkey::smallint[])[0:cardinality(tc.conkey)-1] @> tc.conkey) and 
      tc.contype = 'f'
    group by 
      tc.conrelid, 
      tc.conname, 
      tc.confrelid
    order by 
      pg_catalog.pg_relation_size(tc.conrelid) desc
$$;

test it this way,

select * from missing_fk_indexes();

you'll see a list like this.

   referencing_table    |    fk_columns    | table_size |                fk_constraint                 | referenced_table 
------------------------+------------------+------------+----------------------------------------------+------------------
 stk_warehouse          | supplier_id      | 8192 bytes | stk_warehouse_supplier_id_fkey               | stk_supplier
 stk_reference          | supplier_id      | 0 bytes    | stk_reference_supplier_id_fkey               | stk_supplier
 stk_part_reference     | reference_id     | 0 bytes    | stk_part_reference_reference_id_fkey         | stk_reference
 stk_warehouse_part     | part_id          | 0 bytes    | stk_warehouse_part_part_id_fkey              | stk_part
 stk_warehouse_part_log | dst_warehouse_id | 0 bytes    | stk_warehouse_part_log_dst_warehouse_id_fkey | stk_warehouse
 stk_warehouse_part_log | part_id          | 0 bytes    | stk_warehouse_part_log_part_id_fkey          | stk_part
 stk_warehouse_part_log | src_warehouse_id | 0 bytes    | stk_warehouse_part_log_src_warehouse_id_fkey | stk_warehouse
 stk_product_part       | part_id          | 0 bytes    | stk_product_part_part_id_fkey                | stk_part
 stk_purchase           | parent_id        | 0 bytes    | stk_purchase_parent_id_fkey                  | stk_purchase
 stk_purchase           | supplier_id      | 0 bytes    | stk_purchase_supplier_id_fkey                | stk_supplier
 stk_purchase_line      | reference_id     | 0 bytes    | stk_purchase_line_reference_id_fkey          | stk_reference
 stk_order              | freighter_id     | 0 bytes    | stk_order_freighter_id_fkey                  | stk_freighter
 stk_order_line         | product_id       | 0 bytes    | stk_order_line_product_id_fkey               | cnt_product
 stk_order_fulfillment  | freighter_id     | 0 bytes    | stk_order_fulfillment_freighter_id_fkey      | stk_freighter
 stk_part               | sibling_id       | 0 bytes    | stk_part_sibling_id_fkey                     | stk_part
 stk_order_part         | part_id          | 0 bytes    | stk_order_part_part_id_fkey                  | stk_part

For those who decided to systematically create and index on every referencing column, this other version could be more efficient:

--
-- function:    missing_fk_indexes2
-- purpose:     List all foreing keys in the database without and index in the referencing table.
--              The listing contains create index sentences
-- author:      Based on the work of Laurenz Albe
-- see:         https://www.cybertec-postgresql.com/en/index-your-foreign-key/
--
create or replace function missing_fk_indexes2 () 
returns setof varchar
language sql as $$
  select
    -- create index sentence
    'create index on ' || 
    tc.conrelid::regclass || 
    '(' || 
    string_agg(ta.attname, ', ' order by tx.n) || 
    ')' as create_index
        
  from pg_catalog.pg_constraint tc
  
  -- enumerated key column numbers per foreign key
  cross join lateral unnest(tc.conkey) with ordinality as tx(attnum, n)
  
  -- name for each key column
  join pg_catalog.pg_attribute ta on ta.attnum = tx.attnum and ta.attrelid = tc.conrelid
  
  where not exists (
    -- is there ta matching index for the constraint?
    select 1 from pg_catalog.pg_index i
    where 
      i.indrelid = tc.conrelid and 
      -- the first index columns must be the same as the key columns, but order doesn't matter
      (i.indkey::smallint[])[0:cardinality(tc.conkey)-1] @> tc.conkey) and 
      tc.contype = 'f'
    group by 
      tc.conrelid, 
      tc.conname, 
      tc.confrelid
    order by 
      pg_catalog.pg_relation_size(tc.conrelid) desc
$$;

Now the output is the create index sentence you have to add to your database.

select * from missing_fk_indexes2();

                   missing_fk_indexes2                    
----------------------------------------------------------
 create index on stk_warehouse(supplier_id)
 create index on stk_reference(supplier_id)
 create index on stk_part_reference(reference_id)
 create index on stk_warehouse_part(part_id)
 create index on stk_warehouse_part_log(dst_warehouse_id)
 create index on stk_warehouse_part_log(part_id)
 create index on stk_warehouse_part_log(src_warehouse_id)
 create index on stk_product_part(part_id)
 create index on stk_purchase(parent_id)
 create index on stk_purchase(supplier_id)
 create index on stk_purchase_line(reference_id)
 create index on stk_order(freighter_id)
 create index on stk_order_line(product_id)
 create index on stk_order_fulfillment(freighter_id)
 create index on stk_part(sibling_id)
 create index on stk_order_part(part_id)

Very useful, @coterobarros. Next step is to have it generate the DDL script in case you want to create them...

This is aweseome piece of code. Thank you.

Q

Quassnoi

For a PRIMARY KEY, an index will be created with the following message:

NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "index" for table "table"

For a FOREIGN KEY, the constraint will not be created if there is no index on the referenced table.

An index on referencing table is not required (though desired), and therefore will not be implicitly created.

M

Mustafa

And here's a bash script that generates the SQL to create indexes for missing indexes on foreign keys using @sergeyB's SQL.

#!/bin/bash

read -r -d '' SQL <<EOM

WITH fk_actions ( code, action ) AS (
    VALUES ( 'a', 'error' ),
        ( 'r', 'restrict' ),
        ( 'c', 'cascade' ),
        ( 'n', 'set null' ),
        ( 'd', 'set default' )
),
fk_list AS (
    SELECT pg_constraint.oid as fkoid, conrelid, confrelid as parentid,
        conname, relname, nspname,
        fk_actions_update.action as update_action,
        fk_actions_delete.action as delete_action,
        conkey as key_cols
    FROM pg_constraint
        JOIN pg_class ON conrelid = pg_class.oid
        JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
        JOIN fk_actions AS fk_actions_update ON confupdtype = fk_actions_update.code
        JOIN fk_actions AS fk_actions_delete ON confdeltype = fk_actions_delete.code
    WHERE contype = 'f'
),
fk_attributes AS (
    SELECT fkoid, conrelid, attname, attnum
    FROM fk_list
        JOIN pg_attribute
            ON conrelid = attrelid
            AND attnum = ANY( key_cols )
    ORDER BY fkoid, attnum
),
fk_cols_list AS (
    SELECT fkoid, array_to_string(array_agg(attname), ':') as cols_list
    FROM fk_attributes
    GROUP BY fkoid
),
index_list AS (
    SELECT indexrelid as indexid,
        pg_class.relname as indexname,
        indrelid,
        indkey,
        indpred is not null as has_predicate,
        pg_get_indexdef(indexrelid) as indexdef
    FROM pg_index
        JOIN pg_class ON indexrelid = pg_class.oid
    WHERE indisvalid
),
fk_index_match AS (
    SELECT fk_list.*,
        indexid,
        indexname,
        indkey::int[] as indexatts,
        has_predicate,
        indexdef,
        array_length(key_cols, 1) as fk_colcount,
        array_length(indkey,1) as index_colcount,
        round(pg_relation_size(conrelid)/(1024^2)::numeric) as table_mb,
        cols_list
    FROM fk_list
        JOIN fk_cols_list USING (fkoid)
        LEFT OUTER JOIN index_list
            ON conrelid = indrelid
            AND (indkey::int2[])[0:(array_length(key_cols,1) -1)] @> key_cols

),
fk_perfect_match AS (
    SELECT fkoid
    FROM fk_index_match
    WHERE (index_colcount - 1) <= fk_colcount
        AND NOT has_predicate
        AND indexdef LIKE '%USING btree%'
),
fk_index_check AS (
    SELECT 'no index' as issue, *, 1 as issue_sort
    FROM fk_index_match
    WHERE indexid IS NULL
    UNION ALL
    SELECT 'questionable index' as issue, *, 2
    FROM fk_index_match
    WHERE indexid IS NOT NULL
        AND fkoid NOT IN (
            SELECT fkoid
            FROM fk_perfect_match)
),
parent_table_stats AS (
    SELECT fkoid, tabstats.relname as parent_name,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as parent_writes,
        round(pg_relation_size(parentid)/(1024^2)::numeric) as parent_mb
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = parentid
),
fk_table_stats AS (
    SELECT fkoid,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as writes,
        seq_scan as table_scans
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = conrelid
)
SELECT relname as table_name,
    cols_list
FROM fk_index_check
    JOIN parent_table_stats USING (fkoid)
    JOIN fk_table_stats USING (fkoid)
ORDER BY issue_sort, table_mb DESC, table_name;
EOM

DB_NAME="dbname"
DB_USER="dbuser"
DB_PASSWORD="dbpass"
DB_HOSTNAME="hostname"
DB_PORT=5432

export PGPASSWORD="$DB_PASSWORD"
psql -h $DB_HOSTNAME -p $DB_PORT -U $DB_USER -d $DB_NAME -t -A -F"," -c "$SQL" | while read -r line; do
  IFS=','
  parts=($line)
  unset IFS
  tableName=${parts[0]}
  colsList=${parts[1]}

  indexName="${tableName}_${colsList//:/_}_index"
  printf -- "\n--Index: %s\nDROP INDEX IF EXISTS %s;\n
CREATE INDEX %s\n\t\tON %s USING btree\n\t(%s);
  " "$indexName" "$indexName" "$indexName" "$tableName" "$colsList"
done

Postgres and Indexes on Foreign Keys and Primary Keys

Follow WeChat

Want to stay one step ahead of the latest teleworks?

相似问题

Platform

Support

Contact US