
How do I delete a fixed number of rows with sorting in PostgreSQL?

I'm trying to port some old MySQL queries to PostgreSQL, but I'm having trouble with this one:

DELETE FROM logtable ORDER BY timestamp LIMIT 10;

PostgreSQL doesn't allow ordering or limits in its delete syntax, and the table doesn't have a primary key, so I can't use a subquery. Additionally, I want to preserve the behavior where the query deletes exactly the given number of records -- for example, if the table contains 30 rows but they all have the same timestamp, I still want to delete 10, although it doesn't matter which 10.

So: how do I delete a fixed number of rows with sorting in PostgreSQL?

Edit: No primary key means there's no log_id column or similar. Ah, the joys of legacy systems!

Why not add the primary key? Piece o' cake in postgresql: alter table foo add column id serial primary key.
That was my initial approach, but other requirements prevent it.

mu is too short

You could try using the ctid:

DELETE FROM logtable
WHERE ctid IN (
    SELECT ctid
    FROM logtable
    ORDER BY timestamp
    LIMIT 10
)

The ctid is:

The physical location of the row version within its table. Note that although the ctid can be used to locate the row version very quickly, a row's ctid will change if it is updated or moved by VACUUM FULL. Therefore ctid is useless as a long-term row identifier.

There's also oid but that only exists if you specifically ask for it when you create the table.


This works, but how reliable is it? Are there any 'gotchas' I need to look out for? Is it possible for VACUUM FULL or autovacuum to cause problems if they change the ctid values in the table while the query is running?
Incremental VACUUMs won't change ctids, I don't think. Since that just compacts within each page, and the ctid is just the line number not a page offset. A VACUUM FULL or a CLUSTER operation would change the ctid, but those operations take an access exclusive lock on the table first.
@Whatsit: My impression of the ctid documentation is that ctid is stable enough to make this DELETE work okay but not stable enough to, for example, put in another table as a ghetto-FK. Presumably you don't UPDATE the logtable so you don't have to worry about that changing ctids and VACUUM FULL does lock the table (postgresql.org/docs/current/static/routine-vacuuming.html) so you don't have to worry about the other way that ctids can change. @araqnid's PostgreSQL-Fu is pretty strong and the docs agree with him to boot.
Thanks to both of you for the clarification. I did look into the docs but I wasn't certain I was interpreting them correctly. I'd never encountered ctids before this.
This is actually a pretty bad solution, since Postgres is not able to use a TID scan in joins (and IN is a particular case of a join). If you look at the plan, it should be quite terrible. So "very quickly" applies only when you specify the ctid explicitly. This is still the case as of version 10.
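One way around the join limitation described above is to rewrite the IN as = ANY over an array, which PostgreSQL can satisfy with a TID scan when comparing ctid against explicit values. A minimal sketch against the question's logtable (untested here; plans may vary by version, so check with EXPLAIN):

```sql
-- Materialize the 10 oldest ctids into an array first, then delete by
-- explicit ctid comparison, which is eligible for a TID scan.
DELETE FROM logtable
WHERE ctid = ANY (ARRAY(
    SELECT ctid
    FROM logtable
    ORDER BY timestamp
    LIMIT 10
));
```

This is the same trick the next answer applies to an id column, just using the system column ctid since the table has no primary key.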
pd40

The Postgres docs recommend using an array instead of IN with a subquery. This should work much faster:

DELETE FROM logtable 
WHERE id = any (array(SELECT id FROM logtable ORDER BY timestamp LIMIT 10));

This and some other tricks can be found here


@Konrad Garus Here you go link, 'Fast first n rows removing'
@BlakeRegalia No, because there's no primary key in the table specified. This will delete all the rows with an "ID" found in the first 10. If all the rows have the same ID all the rows will be deleted.
If any (array( ... )); is faster than in ( ... ) that sounds like a bug in the query optimiser - it should be able to spot that transformation and do the same thing with the data itself.
I found this method to be considerably slower than using IN on an UPDATE (which might account for the difference).
Measurement on a 12 GB table: the first query took 450..1000 ms, the second one 5..7 seconds.

Fast one:

delete from cs_logging
where id = any (array(
    select id from cs_logging
    where date_created < now() - interval '1 days' * 30
      and partition_key like '%I'
    order by id limit 500
));

Slow one:

delete from cs_logging
where id in (
    select id from cs_logging
    where date_created < now() - interval '1 days' * 30
      and partition_key like '%I'
    order by id limit 500
);

Using ctid was a lot slower (minutes).
Konrad Garus
delete from logtable where log_id in (
    select log_id from logtable order by timestamp limit 10);

user2449151

If you don't have a primary key, you can use the WHERE ... IN syntax with a composite key.

delete from table1
where (schema, id, lac, cid) in (
    select schema, id, lac, cid
    from table1
    where lac = 0
    limit 1000
);

This worked for me.


Patrick Hüsler

Assuming you want to delete ANY 10 records (without the ordering), you could do this:

DELETE FROM logtable AS t1
WHERE t1.ctid < (
    SELECT t2.ctid
    FROM logtable AS t2
    WHERE (SELECT count(*) FROM logtable t3 WHERE t3.ctid < t2.ctid) = 10
    LIMIT 1
);

For my use case, deleting 10M records, this turned out to be faster.


This is not performant! Use instead: delete from "table" where ("table_ID") in (select "table_ID" from "table" limit 10)
Bernhard

You could write a procedure that loops, deleting rows one at a time; the procedure could take a parameter to specify the number of rows you want to delete. But that's a bit of overkill compared to MySQL.
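A minimal sketch of such a procedure, assuming PostgreSQL 11+ (for CREATE PROCEDURE) and using ctid so it works without a primary key; the procedure name delete_oldest_rows is made up for illustration:

```sql
-- Deletes up to n rows from logtable, oldest timestamp first,
-- one row per iteration, identified by its ctid.
CREATE OR REPLACE PROCEDURE delete_oldest_rows(n integer)
LANGUAGE plpgsql AS $$
DECLARE
    i integer;
BEGIN
    FOR i IN 1..n LOOP
        DELETE FROM logtable
        WHERE ctid = (SELECT ctid FROM logtable ORDER BY timestamp LIMIT 1);
        EXIT WHEN NOT FOUND;  -- stop early if the table runs out of rows
    END LOOP;
END;
$$;

CALL delete_oldest_rows(10);
```

Each iteration re-sorts the table to find the single oldest row, so this will be far slower than the single-statement ctid approaches above; it is only worth considering if you need per-row control.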