ChatGPT解决这个技术问题 Extra ChatGPT

外键和主键的 Postgres 和索引

Postgres 会自动将索引放在外键和主键上吗?我怎么知道?是否有返回表上所有索引的命令?


C
Craig Ringer

PostgreSQL 会自动在主键和唯一约束上创建索引,但不会在外键关系的引用端创建索引。

当 Pg 创建一个隐式索引时,它会发出一条 NOTICE 级别的消息,您可以在 psql 和/或系统日志中看到该消息,因此您可以看到它何时发生。自动创建的索引在表的 \d 输出中也是可见的。

documentation on unique indexes 说:

PostgreSQL 自动为每个唯一约束和主键约束创建索引以强制唯一性。因此,没有必要为主键列显式创建索引。

constraints 上的文档说:

由于从引用表中删除行或更新引用列将需要扫描引用表以查找与旧值匹配的行,因此索引引用列通常是一个好主意。因为这并不总是需要,并且有很多关于如何索引的选择,外键约束的声明不会自动在引用列上创建索引。

因此,如果需要,您必须自己在外键上创建索引。

请注意,如果您在 M-to-N 表中使用主外键,例如 2 个 FK 作为 PK,您将在 PK 上有一个索引,并且可能不需要创建任何额外的索引。

虽然在(或包括)引用端外键列上创建索引通常是个好主意,但这不是必需的。您添加的每个索引都会稍微减慢 DML 操作,因此您需要为每个 INSERTUPDATEDELETE 支付性能成本。如果索引很少使用,它可能不值得拥有。


我希望这个编辑没问题;我已经添加了相关文档的链接,一个引文清楚地表明 FK 关系的引用端不会产生隐式索引,展示了如何在 psql 中查看索引,为了清楚起见重新表述了第一个 par,并添加了一个请注意,索引不是免费的,因此添加它们并不总是正确的。
@CraigRinger,您如何确定索引的收益是否超过其成本?我是否在添加索引之前/之后分析单元测试并检查整体性能提升?或者,还有更好的方法?
@Gili 这是一个单独的 dba.stackexchange.com 问题的主题。
docs 还提示您何时希望为外键创建索引:If the referenced column(s) are changed frequently, it might be wise to add an index to the referencing column(s) so that referential actions associated with the foreign key constraint can be performed more efficiently.
N
Nux

此查询将列出外键上缺少的索引original source

编辑:注意它不会检查小表(小于 9 MB)和其他一些情况。请参阅最后的 WHERE 语句。

-- check for FKs where there is no matching index
-- on the referencing side
-- or a bad index

WITH fk_actions ( code, action ) AS (
    VALUES ( 'a', 'error' ),
        ( 'r', 'restrict' ),
        ( 'c', 'cascade' ),
        ( 'n', 'set null' ),
        ( 'd', 'set default' )
),
fk_list AS (
    SELECT pg_constraint.oid as fkoid, conrelid, confrelid as parentid,
        conname, relname, nspname,
        fk_actions_update.action as update_action,
        fk_actions_delete.action as delete_action,
        conkey as key_cols
    FROM pg_constraint
        JOIN pg_class ON conrelid = pg_class.oid
        JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
        JOIN fk_actions AS fk_actions_update ON confupdtype = fk_actions_update.code
        JOIN fk_actions AS fk_actions_delete ON confdeltype = fk_actions_delete.code
    WHERE contype = 'f'
),
fk_attributes AS (
    SELECT fkoid, conrelid, attname, attnum
    FROM fk_list
        JOIN pg_attribute
            ON conrelid = attrelid
            AND attnum = ANY( key_cols )
    ORDER BY fkoid, attnum
),
fk_cols_list AS (
    SELECT fkoid, array_agg(attname) as cols_list
    FROM fk_attributes
    GROUP BY fkoid
),
index_list AS (
    SELECT indexrelid as indexid,
        pg_class.relname as indexname,
        indrelid,
        indkey,
        indpred is not null as has_predicate,
        pg_get_indexdef(indexrelid) as indexdef
    FROM pg_index
        JOIN pg_class ON indexrelid = pg_class.oid
    WHERE indisvalid
),
fk_index_match AS (
    SELECT fk_list.*,
        indexid,
        indexname,
        indkey::int[] as indexatts,
        has_predicate,
        indexdef,
        array_length(key_cols, 1) as fk_colcount,
        array_length(indkey,1) as index_colcount,
        round(pg_relation_size(conrelid)/(1024^2)::numeric) as table_mb,
        cols_list
    FROM fk_list
        JOIN fk_cols_list USING (fkoid)
        LEFT OUTER JOIN index_list
            ON conrelid = indrelid
            AND (indkey::int2[])[0:(array_length(key_cols,1) -1)] @> key_cols

),
fk_perfect_match AS (
    SELECT fkoid
    FROM fk_index_match
    WHERE (index_colcount - 1) <= fk_colcount
        AND NOT has_predicate
        AND indexdef LIKE '%USING btree%'
),
fk_index_check AS (
    SELECT 'no index' as issue, *, 1 as issue_sort
    FROM fk_index_match
    WHERE indexid IS NULL
    UNION ALL
    SELECT 'questionable index' as issue, *, 2
    FROM fk_index_match
    WHERE indexid IS NOT NULL
        AND fkoid NOT IN (
            SELECT fkoid
            FROM fk_perfect_match)
),
parent_table_stats AS (
    SELECT fkoid, tabstats.relname as parent_name,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as parent_writes,
        round(pg_relation_size(parentid)/(1024^2)::numeric) as parent_mb
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = parentid
),
fk_table_stats AS (
    SELECT fkoid,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as writes,
        seq_scan as table_scans
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = conrelid
)
SELECT nspname as schema_name,
    relname as table_name,
    conname as fk_name,
    issue,
    table_mb,
    writes,
    table_scans,
    parent_name,
    parent_mb,
    parent_writes,
    cols_list,
    indexdef
FROM fk_index_check
    JOIN parent_table_stats USING (fkoid)
    JOIN fk_table_stats USING (fkoid)
WHERE table_mb > 9
    AND ( writes > 1000
        OR parent_writes > 1000
        OR parent_mb > 10 )
ORDER BY issue_sort, table_mb DESC, table_name, fk_name;

似乎不起作用。当我知道我的列上没有引用域表的索引时,返回 0 行。
@juanitogan 注意 where 子句:除其他外,它只考虑大小超过 9 MB 的表。
@Matthias - 啊,明白了。谢谢。是的,我显然没有花时间阅读代码。这还不够重要,不能打扰。 OP本可以提到这些限制。也许我会在某个时候再次检查它。
@SergeyB 似乎对具有主键约束的引用列给出了误报,因此自动具有索引,但查询仍然标记它们。
d
dland

如果您想从程序中列出架构中所有表的索引,所有信息都在目录中:

select
     n.nspname  as "Schema"
    ,t.relname  as "Table"
    ,c.relname  as "Index"
from
          pg_catalog.pg_class c
     join pg_catalog.pg_namespace n on n.oid        = c.relnamespace
     join pg_catalog.pg_index i     on i.indexrelid = c.oid
     join pg_catalog.pg_class t     on i.indrelid   = t.oid
where
        c.relkind = 'i'
    and n.nspname not in ('pg_catalog', 'pg_toast')
    and pg_catalog.pg_table_is_visible(c.oid)
order by
     n.nspname
    ,t.relname
    ,c.relname

如果您想进一步深入研究(例如列和排序),您需要查看 pg_catalog.pg_index。使用 psql -E [dbname] 可以方便地确定如何查询目录。


+1 因为 pg_catalog 和 psql -E 的使用真的非常有用
“供参考 \di 还将列出数据库中的所有索引。” (从其他答案复制的评论,也适用于此处)
M
Milen A. Radev

是 - 用于主键,否 - 用于外键(docs 中有更多内容)。

\d <table_name>

"psql" 中显示了一个表的描述,包括它的所有索引。


作为参考,\di 还将列出数据库中的所有索引。
\d+ TABLE 还列出了所有的索引。
N
Nabi

我喜欢文章 Cool performance features of EclipseLink 2.5 中对此的解释

索引外键 第一个功能是自动索引外键。大多数人错误地认为数据库默认索引外键。好吧,他们没有。主键是自动索引的,但外键不是。这意味着任何基于外键的查询都将进行全表扫描。这是任何 OneToMany、ManyToMany 或 ElementCollection 关系,以及许多 OneToOne 关系,以及涉及连接或对象比较的任何关系的大多数查询。这可能是一个主要的性能问题,您应该始终索引您的外键字段。


如果我们应该总是索引我们的外键字段,为什么数据库引擎不这样做呢?在我看来,这不仅仅是表面上看到的。
@Bobort由于添加索引会导致所有插入,更新和删除的性能损失,并且在这种情况下确实会增加很多外键。这就是为什么我猜这种行为是可选的——开发人员应该在这件事上做出有意识的选择。在某些情况下,外键用于强制数据完整性,但不经常查询或根本不查询 - 在这种情况下,索引的性能损失将是徒劳的
复合索引也有棘手的情况,因为它们是从左到右应用的:即评论表上 [user_id, article_id] 上的复合索引将有效地涵盖用户查询所有评论(例如,在网站上显示聚合评论日志)和获取所有评论此用户对特定文章的评论。在这种情况下,在 user_id 上添加单独的索引实际上是在插入/更新/删除时浪费磁盘空间和 CPU 时间。
啊哈!然后建议很差!我们不应该总是索引我们的外键。正如@Dr.Strangelove 指出的那样,实际上有时我们不想索引它们!非常感谢你,博士!
为什么它们默认不被索引?是否有一个重要的用例需要这样做?
c
coterobarros

此函数基于 Laurenz Albe 在 https://www.cybertec-postgresql.com/en/index-your-foreign-key/ 的工作,列出所有缺少索引的外键。显示了表的大小,对于小表,扫描性能可能优于索引表。

--
-- function:    missing_fk_indexes
-- purpose:     List all foreing keys in the database without and index in the referencing table.
-- author:      Based on the work of Laurenz Albe
-- see:         https://www.cybertec-postgresql.com/en/index-your-foreign-key/
--
create or replace function missing_fk_indexes () 
returns table (
  referencing_table regclass,
  fk_columns        varchar,
  table_size        varchar,
  fk_constraint     name,
  referenced_table  regclass
)
language sql as $$
  select
    -- referencing table having ta foreign key declaration
    tc.conrelid::regclass as referencing_table,
    
    -- ordered list of foreign key columns
    string_agg(ta.attname, ', ' order by tx.n) as fk_columns,
    
    -- referencing table size
    pg_catalog.pg_size_pretty (
      pg_catalog.pg_relation_size(tc.conrelid)
    ) as table_size,
    
    -- name of the foreign key constraint
    tc.conname as fk_constraint,
    
    -- name of the target or destination table
    tc.confrelid::regclass as referenced_table
    
  from pg_catalog.pg_constraint tc
  
  -- enumerated key column numbers per foreign key
  cross join lateral unnest(tc.conkey) with ordinality as tx(attnum, n)
  
  -- name for each key column
  join pg_catalog.pg_attribute ta on ta.attnum = tx.attnum and ta.attrelid = tc.conrelid
  
  where not exists (
    -- is there ta matching index for the constraint?
    select 1 from pg_catalog.pg_index i
    where 
      i.indrelid = tc.conrelid and 
      -- the first index columns must be the same as the key columns, but order doesn't matter
      (i.indkey::smallint[])[0:cardinality(tc.conkey)-1] @> tc.conkey) and 
      tc.contype = 'f'
    group by 
      tc.conrelid, 
      tc.conname, 
      tc.confrelid
    order by 
      pg_catalog.pg_relation_size(tc.conrelid) desc
$$;

以这种方式测试它,

select * from missing_fk_indexes();

你会看到这样的列表。

   referencing_table    |    fk_columns    | table_size |                fk_constraint                 | referenced_table 
------------------------+------------------+------------+----------------------------------------------+------------------
 stk_warehouse          | supplier_id      | 8192 bytes | stk_warehouse_supplier_id_fkey               | stk_supplier
 stk_reference          | supplier_id      | 0 bytes    | stk_reference_supplier_id_fkey               | stk_supplier
 stk_part_reference     | reference_id     | 0 bytes    | stk_part_reference_reference_id_fkey         | stk_reference
 stk_warehouse_part     | part_id          | 0 bytes    | stk_warehouse_part_part_id_fkey              | stk_part
 stk_warehouse_part_log | dst_warehouse_id | 0 bytes    | stk_warehouse_part_log_dst_warehouse_id_fkey | stk_warehouse
 stk_warehouse_part_log | part_id          | 0 bytes    | stk_warehouse_part_log_part_id_fkey          | stk_part
 stk_warehouse_part_log | src_warehouse_id | 0 bytes    | stk_warehouse_part_log_src_warehouse_id_fkey | stk_warehouse
 stk_product_part       | part_id          | 0 bytes    | stk_product_part_part_id_fkey                | stk_part
 stk_purchase           | parent_id        | 0 bytes    | stk_purchase_parent_id_fkey                  | stk_purchase
 stk_purchase           | supplier_id      | 0 bytes    | stk_purchase_supplier_id_fkey                | stk_supplier
 stk_purchase_line      | reference_id     | 0 bytes    | stk_purchase_line_reference_id_fkey          | stk_reference
 stk_order              | freighter_id     | 0 bytes    | stk_order_freighter_id_fkey                  | stk_freighter
 stk_order_line         | product_id       | 0 bytes    | stk_order_line_product_id_fkey               | cnt_product
 stk_order_fulfillment  | freighter_id     | 0 bytes    | stk_order_fulfillment_freighter_id_fkey      | stk_freighter
 stk_part               | sibling_id       | 0 bytes    | stk_part_sibling_id_fkey                     | stk_part
 stk_order_part         | part_id          | 0 bytes    | stk_order_part_part_id_fkey                  | stk_part

对于那些决定在每个引用列上系统地创建和索引的人来说,这个其他版本可能更有效:

--
-- function:    missing_fk_indexes2
-- purpose:     List all foreing keys in the database without and index in the referencing table.
--              The listing contains create index sentences
-- author:      Based on the work of Laurenz Albe
-- see:         https://www.cybertec-postgresql.com/en/index-your-foreign-key/
--
create or replace function missing_fk_indexes2 () 
returns setof varchar
language sql as $$
  select
    -- create index sentence
    'create index on ' || 
    tc.conrelid::regclass || 
    '(' || 
    string_agg(ta.attname, ', ' order by tx.n) || 
    ')' as create_index
        
  from pg_catalog.pg_constraint tc
  
  -- enumerated key column numbers per foreign key
  cross join lateral unnest(tc.conkey) with ordinality as tx(attnum, n)
  
  -- name for each key column
  join pg_catalog.pg_attribute ta on ta.attnum = tx.attnum and ta.attrelid = tc.conrelid
  
  where not exists (
    -- is there ta matching index for the constraint?
    select 1 from pg_catalog.pg_index i
    where 
      i.indrelid = tc.conrelid and 
      -- the first index columns must be the same as the key columns, but order doesn't matter
      (i.indkey::smallint[])[0:cardinality(tc.conkey)-1] @> tc.conkey) and 
      tc.contype = 'f'
    group by 
      tc.conrelid, 
      tc.conname, 
      tc.confrelid
    order by 
      pg_catalog.pg_relation_size(tc.conrelid) desc
$$;

现在输出是您必须添加到数据库中的 create index 句子。

select * from missing_fk_indexes2();
                   missing_fk_indexes2                    
----------------------------------------------------------
 create index on stk_warehouse(supplier_id)
 create index on stk_reference(supplier_id)
 create index on stk_part_reference(reference_id)
 create index on stk_warehouse_part(part_id)
 create index on stk_warehouse_part_log(dst_warehouse_id)
 create index on stk_warehouse_part_log(part_id)
 create index on stk_warehouse_part_log(src_warehouse_id)
 create index on stk_product_part(part_id)
 create index on stk_purchase(parent_id)
 create index on stk_purchase(supplier_id)
 create index on stk_purchase_line(reference_id)
 create index on stk_order(freighter_id)
 create index on stk_order_line(product_id)
 create index on stk_order_fulfillment(freighter_id)
 create index on stk_part(sibling_id)
 create index on stk_order_part(part_id)

非常有用,@coterobarros。下一步是让它生成 DDL 脚本,以防您想创建它们...
这是一段很棒的代码。谢谢你。
Q
Quassnoi

对于 PRIMARY KEY,将使用以下消息创建索引:

NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "index" for table "table" 

对于 FOREIGN KEY,如果 referenced 表上没有索引,则不会创建约束。

不需要引用表的索引(尽管需要),因此不会隐式创建。


M
Mustafa

这是一个 bash 脚本,它生成 SQL 以使用 @sergeyB 的 SQL 为外键上缺失的索引创建索引。

#!/bin/bash

read -r -d '' SQL <<EOM

WITH fk_actions ( code, action ) AS (
    VALUES ( 'a', 'error' ),
        ( 'r', 'restrict' ),
        ( 'c', 'cascade' ),
        ( 'n', 'set null' ),
        ( 'd', 'set default' )
),
fk_list AS (
    SELECT pg_constraint.oid as fkoid, conrelid, confrelid as parentid,
        conname, relname, nspname,
        fk_actions_update.action as update_action,
        fk_actions_delete.action as delete_action,
        conkey as key_cols
    FROM pg_constraint
        JOIN pg_class ON conrelid = pg_class.oid
        JOIN pg_namespace ON pg_class.relnamespace = pg_namespace.oid
        JOIN fk_actions AS fk_actions_update ON confupdtype = fk_actions_update.code
        JOIN fk_actions AS fk_actions_delete ON confdeltype = fk_actions_delete.code
    WHERE contype = 'f'
),
fk_attributes AS (
    SELECT fkoid, conrelid, attname, attnum
    FROM fk_list
        JOIN pg_attribute
            ON conrelid = attrelid
            AND attnum = ANY( key_cols )
    ORDER BY fkoid, attnum
),
fk_cols_list AS (
    SELECT fkoid, array_to_string(array_agg(attname), ':') as cols_list
    FROM fk_attributes
    GROUP BY fkoid
),
index_list AS (
    SELECT indexrelid as indexid,
        pg_class.relname as indexname,
        indrelid,
        indkey,
        indpred is not null as has_predicate,
        pg_get_indexdef(indexrelid) as indexdef
    FROM pg_index
        JOIN pg_class ON indexrelid = pg_class.oid
    WHERE indisvalid
),
fk_index_match AS (
    SELECT fk_list.*,
        indexid,
        indexname,
        indkey::int[] as indexatts,
        has_predicate,
        indexdef,
        array_length(key_cols, 1) as fk_colcount,
        array_length(indkey,1) as index_colcount,
        round(pg_relation_size(conrelid)/(1024^2)::numeric) as table_mb,
        cols_list
    FROM fk_list
        JOIN fk_cols_list USING (fkoid)
        LEFT OUTER JOIN index_list
            ON conrelid = indrelid
            AND (indkey::int2[])[0:(array_length(key_cols,1) -1)] @> key_cols

),
fk_perfect_match AS (
    SELECT fkoid
    FROM fk_index_match
    WHERE (index_colcount - 1) <= fk_colcount
        AND NOT has_predicate
        AND indexdef LIKE '%USING btree%'
),
fk_index_check AS (
    SELECT 'no index' as issue, *, 1 as issue_sort
    FROM fk_index_match
    WHERE indexid IS NULL
    UNION ALL
    SELECT 'questionable index' as issue, *, 2
    FROM fk_index_match
    WHERE indexid IS NOT NULL
        AND fkoid NOT IN (
            SELECT fkoid
            FROM fk_perfect_match)
),
parent_table_stats AS (
    SELECT fkoid, tabstats.relname as parent_name,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as parent_writes,
        round(pg_relation_size(parentid)/(1024^2)::numeric) as parent_mb
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = parentid
),
fk_table_stats AS (
    SELECT fkoid,
        (n_tup_ins + n_tup_upd + n_tup_del + n_tup_hot_upd) as writes,
        seq_scan as table_scans
    FROM pg_stat_user_tables AS tabstats
        JOIN fk_list
            ON relid = conrelid
)
SELECT relname as table_name,
    cols_list
FROM fk_index_check
    JOIN parent_table_stats USING (fkoid)
    JOIN fk_table_stats USING (fkoid)
ORDER BY issue_sort, table_mb DESC, table_name;
EOM

DB_NAME="dbname"
DB_USER="dbuser"
DB_PASSWORD="dbpass"
DB_HOSTNAME="hostname"
DB_PORT=5432

export PGPASSWORD="$DB_PASSWORD"
psql -h $DB_HOSTNAME -p $DB_PORT -U $DB_USER -d $DB_NAME -t -A -F"," -c "$SQL" | while read -r line; do
  IFS=','
  parts=($line)
  unset IFS
  tableName=${parts[0]}
  colsList=${parts[1]}

  indexName="${tableName}_${colsList//:/_}_index"
  printf -- "\n--Index: %s\nDROP INDEX IF EXISTS %s;\n
CREATE INDEX %s\n\t\tON %s USING btree\n\t(%s);
  " "$indexName" "$indexName" "$indexName" "$tableName" "$colsList"
done