When I have a column with separated values, I can use the unnest()
function:
myTable
id | elements
---+------------
1 |ab,cd,efg,hi
2 |jk,lm,no,pq
3 |rstuv,wxyz
select id, unnest(string_to_array(elements, ',')) AS elem
from myTable
id | elem
---+-----
1 | ab
1 | cd
1 | efg
1 | hi
2 | jk
...
How can I include element numbers? I.e.:
id | elem | nr
---+------+---
1 | ab | 1
1 | cd | 2
1 | efg | 3
1 | hi | 4
2 | jk | 1
...
I want the original position of each element in the source string. I've tried with window functions (row_number()
, rank()
etc.) but I always get 1
. Maybe because they are in the same row of the source table?
I know it's a bad table design. It's not mine, I'm just trying to fix it.
Postgres 9.4 or later
Use WITH ORDINALITY
for set-returning functions:
When a function in the FROM clause is suffixed by WITH ORDINALITY, a bigint column is appended to the output which starts from 1 and increments by 1 for each row of the function's output. This is most useful in the case of set returning functions such as unnest().
In combination with the LATERAL
feature in pg 9.3+, and according to this thread on pgsql-hackers, the above query can now be written as:
SELECT t.id, a.elem, a.nr FROM tbl AS t LEFT JOIN LATERAL unnest(string_to_array(t.elements, ',')) WITH ORDINALITY AS a(elem, nr) ON true;
LEFT JOIN ... ON true
preserves all rows in the left table, even if the table expression to the right returns no rows. If that's of no concern you can use this otherwise equivalent, less verbose form with an implicit CROSS JOIN LATERAL
:
SELECT t.id, a.elem, a.nr
FROM tbl t, unnest(string_to_array(t.elements, ',')) WITH ORDINALITY a(elem, nr);
Or simpler if based off an actual array (arr
being an array column):
SELECT t.id, a.elem, a.nr
FROM tbl t, unnest(t.arr) WITH ORDINALITY a(elem, nr);
Or even, with minimal syntax:
SELECT id, a, ordinality
FROM tbl, unnest(arr) WITH ORDINALITY a;
a
is automatically table and column alias. The default name of the added ordinality column is ordinality
. But it's better (safer, cleaner) to add explicit column aliases and table-qualify columns.
Postgres 8.4 - 9.3
With row_number() OVER (PARTITION BY id ORDER BY elem)
you get numbers according to the sort order, not the ordinal number of the original ordinal position in the string.
You can simply omit ORDER BY
:
SELECT *, row_number() OVER (PARTITION by id) AS nr
FROM (SELECT id, regexp_split_to_table(elements, ',') AS elem FROM tbl) t;
While this normally works and I have never seen it fail in simple queries, PostgreSQL asserts nothing concerning the order of rows without ORDER BY
. It happens to work due to an implementation detail.
To guarantee ordinal numbers of elements in the blank-separated string:
SELECT id, arr[nr] AS elem, nr
FROM (
SELECT *, generate_subscripts(arr, 1) AS nr
FROM (SELECT id, string_to_array(elements, ' ') AS arr FROM tbl) t
) sub;
Or simpler if based off an actual array:
SELECT id, arr[nr] AS elem, nr FROM (SELECT *, generate_subscripts(arr, 1) AS nr FROM tbl) t;
Related answer on dba.SE:
How to preserve the original order of elements in an unnested array?
Postgres 8.1 - 8.4
None of these features are available, yet: RETURNS TABLE
, generate_subscripts()
, unnest()
, array_length()
. But this works:
CREATE FUNCTION f_unnest_ord(anyarray, OUT val anyelement, OUT ordinality integer)
RETURNS SETOF record
LANGUAGE sql IMMUTABLE AS
'SELECT $1[i], i - array_lower($1,1) + 1
FROM generate_series(array_lower($1,1), array_upper($1,1)) i';
Note in particular, that the array index can differ from ordinal positions of elements. Consider this demo with an extended function:
CREATE FUNCTION f_unnest_ord_idx(anyarray, OUT val anyelement, OUT ordinality int, OUT idx int)
RETURNS SETOF record
LANGUAGE sql IMMUTABLE AS
'SELECT $1[i], i - array_lower($1,1) + 1, i
FROM generate_series(array_lower($1,1), array_upper($1,1)) i';
SELECT id, arr, (rec).*
FROM (
SELECT *, f_unnest_ord_idx(arr) AS rec
FROM (
VALUES
(1, '{a,b,c}'::text[]) -- short for: '[1:3]={a,b,c}'
, (2, '[5:7]={a,b,c}')
, (3, '[-9:-7]={a,b,c}')
) t(id, arr)
) sub;
id | arr | val | ordinality | idx
----+-----------------+-----+------------+-----
1 | {a,b,c} | a | 1 | 1
1 | {a,b,c} | b | 2 | 2
1 | {a,b,c} | c | 3 | 3
2 | [5:7]={a,b,c} | a | 1 | 5
2 | [5:7]={a,b,c} | b | 2 | 6
2 | [5:7]={a,b,c} | c | 3 | 7
3 | [-9:-7]={a,b,c} | a | 1 | -9
3 | [-9:-7]={a,b,c} | b | 2 | -8
3 | [-9:-7]={a,b,c} | c | 3 | -7
Compare:
Normalize array subscripts so they start with 1
Try:
select v.*, row_number() over (partition by id order by elem) rn from
(select
id,
unnest(string_to_array(elements, ',')) AS elem
from myTable) v
Use Subscript Generating Functions.
http://www.postgresql.org/docs/current/static/functions-srf.html#FUNCTIONS-SRF-SUBSCRIPTS
For example:
SELECT
id
, elements[i] AS elem
, i AS nr
FROM
( SELECT
id
, elements
, generate_subscripts(elements, 1) AS i
FROM
( SELECT
id
, string_to_array(elements, ',') AS elements
FROM
myTable
) AS foo
) bar
;
More simply:
SELECT
id
, unnest(elements) AS elem
, generate_subscripts(elements, 1) AS nr
FROM
( SELECT
id
, string_to_array(elements, ',') AS elements
FROM
myTable
) AS foo
;
If the order of element is not important, you can
select
id, elem, row_number() over (partition by id) as nr
from (
select
id,
unnest(string_to_array(elements, ',')) AS elem
from myTable
) a
I think this is related, using a correlated subquery to assign arbitrary ranked / ordinal values to the final set. It's more of a practical applied use using PG array handling to De-Pivot a dataset (works w/ PG 9.4).
WITH _students AS ( /** CTE **/
SELECT * FROM
( SELECT 'jane'::TEXT ,'doe'::TEXT , 1::INT
UNION
SELECT 'john'::TEXT ,'doe'::TEXT , 2::INT
UNION
SELECT 'jerry'::TEXT ,'roe'::TEXT , 3::INT
UNION
SELECT 'jodi'::TEXT ,'roe'::TEXT , 4::INT
) s ( fn, ln, id )
) /** end WITH **/
SELECT s.id
, ax.fanm
, ax.anm
, ax.val
, ax.num
FROM _students s
,UNNEST /** MULTI-UNNEST() BLOCK **/
(
( SELECT ARRAY[ fn, ln ]::text[] AS anm
/** CORRELATED SUBQUERY **/
FROM _students s2 WHERE s2.id = s.id
)
,( SELECT ARRAY[ 'first name', 'last name' ]::text[] AS fanm )
,( SELECT ARRAY[ '9','8','7'] AS val)
,( SELECT ARRAY[ 1,2,3,4,5 ] AS num)
) ax ( anm, fanm, val, num )
;
DE-PIVOTED RESULT SET:
+--+----------+-----+----+---+
|id|fanm |anm |val |num|
+--+----------+-----+----+---+
|2 |first name|john |9 |1 |
|2 |last name |doe |8 |2 |
|2 |NULL |NULL |7 |3 |
|2 |NULL |NULL |NULL|4 |
|2 |NULL |NULL |NULL|5 |
|1 |first name|jane |9 |1 |
|1 |last name |doe |8 |2 |
|1 |NULL |NULL |7 |3 |
|1 |NULL |NULL |NULL|4 |
|1 |NULL |NULL |NULL|5 |
|4 |first name|jodi |9 |1 |
|4 |last name |roe |8 |2 |
|4 |NULL |NULL |7 |3 |
|4 |NULL |NULL |NULL|4 |
|4 |NULL |NULL |NULL|5 |
|3 |first name|jerry|9 |1 |
|3 |last name |roe |8 |2 |
|3 |NULL |NULL |7 |3 |
|3 |NULL |NULL |NULL|4 |
|3 |NULL |NULL |NULL|5 |
+--+----------+-----+----+---+
unnest2() as exercise
Older versions before pg v8.4 need a user-defined unnest()
. We can adapt this old function to return elements with an index:
CREATE FUNCTION unnest2(anyarray)
RETURNS setof record AS
$BODY$
SELECT $1[i], i
FROM generate_series(array_lower($1,1),
array_upper($1,1)) i;
$BODY$ LANGUAGE sql IMMUTABLE;
RETURNS TABLE
, yet. I added a chapter to my answer discussing a solution.
setof record
.
Success story sharing
WITH ORDINALITY
is preferred overgenerate_subscripts()
? It looks to me likegenerate_subscripts()
is better as it shows the actual element location in the array. This is useful, for example, when updating the array... should I be usingWITH ORDINALITY
instead?WITH ORDINALITY
is the general solution to get row numbers for any set returning function in an SQL query. It's the fastest, reliable way and it also happens to work perfectly for 1-dimenstional, 1-based arrays (the default for Postgres arrays, consider this). If you work with any other kind of arrays (most people don't), and you actually need to preserve / work with the original subscripts, thengenerate_subscripts()
is the way to go. Butunnest()
flattens everytihng to begin with ...Table functions appearing in FROM can also be preceded by the key word LATERAL, but for functions the key word is optional; the function's arguments can contain references to columns provided by preceding FROM items in any case.