It's easy to find duplicates on a single field:
SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1
So if we have a table
ID NAME EMAIL
1 John asd@asd.com
2 Sam asd@asd.com
3 Tom asd@asd.com
4 Bob bob@asd.com
5 Tom asd@asd.com
This query will give us John, Sam, Tom, Tom because they all have the same email.
However, what I want is to get duplicates with the same email and name.
That is, I want to get "Tom", "Tom".
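The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove/change the duplicates, so I need to find them first.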
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
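Simply group on both of the columns.
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY, but this has changed with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
Recent PostgreSQL supports it.
SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
MySQL is unpredictable and you need sql_mode=only_full_group_by: see "GROUP BY lname ORDER BY showing wrong results" and "Which is the least expensive aggregate function in the absence of ANY()" (see the comments on the accepted answer).
Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).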
Try this:
declare @YourTable table (id int, name varchar(10), email varchar(50))
INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')
SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
Output:
name email CountOf
---------- ----------- -----------
John John-email 2
sam sam-email 2
(2 row(s) affected)
If you want the IDs of the dups, use this:
SELECT
y.id,y.name,y.email
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
Output:
id name email
----------- ---------- ------------
1 John John-email
2 John John-email
5 sam sam-email
6 sam sam-email
(4 row(s) affected)
To delete the duplicates, try:
DELETE d
FROM @YourTable d
INNER JOIN (SELECT
y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
) dt2 ON d.id=dt2.id
WHERE dt2.RowRank!=1
SELECT * FROM @YourTable
Output:
id name email
----------- ---------- --------------
1 John John-email
3 fred John-email
4 fred fred-email
5 sam sam-email
(4 row(s) affected)
Try this:
SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )
If you want to delete the duplicates, here's a much simpler way to do it than having to find even/odd rows in a triple sub-select:
SELECT u.id, u.name, u.email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
And so to delete:
DELETE FROM users
WHERE id IN (
SELECT u.id/*, u.name, u.email*/
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)
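Much easier to read and understand, IMHO.
Note: the only issue is that you have to execute the request until there are no rows deleted, since you delete only 1 of each duplicate each time.
Also note that MySQL rejects this DELETE with the error: You can't specify target table 'users' for update in FROM clause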
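In contrast to other answers, this lets you view the whole records, with all columns if there are any. In the PARTITION BY part of the row_number function, choose the columns that should be unique/duplicate.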
SELECT *
FROM (
SELECT a.*
, Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r
FROM Customers AS a
) AS b
WHERE r > 1;
When you want to select ALL duplicated records with ALL fields, you can write it like this:
CREATE TABLE test (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, c1 integer
, c2 text
, d date DEFAULT now()
, v text
);
INSERT INTO test (c1, c2, v) VALUES
(1, 'a', 'Select'),
(1, 'a', 'ALL'),
(1, 'a', 'multiple'),
(1, 'a', 'records'),
(2, 'b', 'in columns'),
(2, 'b', 'c1 and c2'),
(3, 'c', '.');
SELECT * FROM test ORDER BY 1;
SELECT *
FROM test
WHERE (c1, c2) IN (
SELECT c1, c2
FROM test
GROUP BY 1,2
HAVING count(*) > 1
)
ORDER BY 1;
Tested in PostgreSQL.
SELECT name, email
FROM users
WHERE email in
(SELECT email FROM users
GROUP BY email
HAVING COUNT(*)>1)
A little late to the party, but I found a really cool workaround for finding all duplicate IDs:
SELECT email, GROUP_CONCAT(id)
FROM users
GROUP BY email
HAVING COUNT(email) > 1;
Note that GROUP_CONCAT will stop after some predetermined length, so you might not get all the ids.
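As a rough illustration of the GROUP_CONCAT idea, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's sqlite3 module. The table and rows are the ones from the question; the dictionary name ids_by_email is just an assumption for this example. In MySQL the cut-off mentioned above is governed by the group_concat_max_len system variable, which is not something this SQLite sketch reproduces.

```python
import sqlite3

# In-memory SQLite database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# One row per duplicated email, with the ids of that group joined by commas.
rows = conn.execute(
    """
    SELECT email, GROUP_CONCAT(id) AS ids
    FROM users
    GROUP BY email
    HAVING COUNT(email) > 1
    """
).fetchall()

# SQLite does not guarantee the order inside GROUP_CONCAT, so sort after splitting.
ids_by_email = {email: sorted(int(i) for i in ids.split(",")) for email, ids in rows}
print(ids_by_email)
```

Splitting the concatenated string back into integers is only needed if you want to work with the ids afterwards; for eyeballing the duplicates the raw string is usually enough.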
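This selects/deletes all duplicate records except one record from each group of duplicates. So the delete leaves all unique records plus one record from each group of duplicates.
Select duplicates: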
SELECT *
FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Delete duplicates:
DELETE FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Be aware that with a large number of records this can cause performance problems.
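To make the select-then-delete pattern concrete, here is a small sketch using Python's sqlite3 module. It assumes the users table from the question stands in for the placeholder table, with name and email playing the role of column1 and column2; those substitutions are mine, not part of the answer.

```python
import sqlite3

# In-memory SQLite database with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# Every (name, email) group keeps its lowest id; anything else is an extra copy.
keepers = "SELECT MIN(id) FROM users GROUP BY name, email"

# Step 1: look at what would be removed before touching the data.
extra_ids = [row[0] for row in conn.execute(
    f"SELECT id FROM users WHERE id NOT IN ({keepers}) ORDER BY id"
)]

# Step 2: delete exactly those rows.
conn.execute(f"DELETE FROM users WHERE id NOT IN ({keepers})")
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]

print(extra_ids, remaining_ids)
```

Running the SELECT first is a cheap safety check: the DELETE uses the same subquery, so the ids printed in step 1 are the ones that disappear in step 2.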
Try this code:
WITH CTE AS
( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER()OVER(PARTITION BY Name,Age ORDER BY ccn)
FROM ccnmaster )
select * from CTE
If you work with Oracle, this way would be preferable:
create table my_users(id number, name varchar2(100), email varchar2(100));
insert into my_users values (1, 'John', 'asd@asd.com');
insert into my_users values (2, 'Sam', 'asd@asd.com');
insert into my_users values (3, 'Tom', 'asd@asd.com');
insert into my_users values (4, 'Bob', 'bob@asd.com');
insert into my_users values (5, 'Tom', 'asd@asd.com');
commit;
select *
from my_users
where rowid not in (select min(rowid) from my_users group by name, email);
select name, email
, case
when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
else 'No'
end "duplicated ?"
from users
In case you want to see whether there are any duplicate rows in your table, I used the queries below:
create table my_table(id int, name varchar(100), email varchar(100));
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (2, 'Aman', 'aman@rms.com');
insert into my_table values (3, 'Tom', 'tom@rms.com');
insert into my_table values (4, 'Raj', 'raj@rms.com');
Select COUNT(1) As Total_Rows from my_table
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc
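The two counts only tell you something once you compare them, so here is a minimal sketch of that comparison using Python's sqlite3 module with the same sample rows. The variable names total_rows, distinct_rows and has_duplicates are assumptions made for this example.

```python
import sqlite3

# In-memory SQLite database; the table deliberately has no primary key,
# so two fully identical rows can be stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INT, name VARCHAR(100), email VARCHAR(100))")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "shekh", "shekh@rms.com"),
        (1, "shekh", "shekh@rms.com"),
        (2, "Aman", "aman@rms.com"),
        (3, "Tom", "tom@rms.com"),
        (4, "Raj", "raj@rms.com"),
    ],
)

# Count every row, then count rows after collapsing identical ones.
total_rows = conn.execute("SELECT COUNT(1) FROM my_table").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(1) FROM (SELECT DISTINCT * FROM my_table) AS abc"
).fetchone()[0]

# If DISTINCT removed anything, at least one row is duplicated in full.
has_duplicates = total_rows > distinct_rows
print(total_rows, distinct_rows, has_duplicates)
```

This only detects rows that are identical in every column; it does not say which rows they are.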
SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;
I think this will work properly to search for repeated values in a particular column.
select id,name,COUNT(*) from user group by Id,Name having COUNT(*)>1
select emp.ename, emp.empno, dept.loc
from emp
inner join dept
on dept.deptno=emp.deptno
inner join
(select ename, count(*) from
emp
group by ename, deptno
having count(*) > 1)
t on emp.ename=t.ename order by emp.ename
/
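This is the easy thing I've come up with. It uses a common table expression (CTE) and a partition window (I think these features are in SQL 2008 and later).
This example finds all students with a duplicate name and date of birth. The fields you want to check for duplication go in the OVER clause. You can include any other fields you want in the projection.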
with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
ORDER BY DOB, LName
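For readers who want to try the windowed-count idea without a SQL Server instance, here is a small sketch using Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later), uses COUNT(*) in place of SUM(1), which gives the same number here, and the four student rows are invented purely for illustration.

```python
import sqlite3

# In-memory SQLite database with a few made-up students.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblStudent ("
    "StudentId INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, DateOfBirth TEXT)"
)
conn.executemany(
    "INSERT INTO tblStudent VALUES (?, ?, ?, ?)",
    [
        (1, "Ann", "Lee", "2001-03-04"),
        (2, "Ann", "Lee", "2001-03-04"),  # same name and birth date as student 1
        (3, "Ann", "Lee", "1999-12-31"),  # same name, different birth date
        (4, "Raj", "Patel", "2000-07-15"),
    ],
)

# Each row is tagged with the size of its (name, birth date) partition;
# rows whose partition holds more than one student are the duplicates.
duplicate_students = conn.execute(
    """
    WITH cte AS (
        SELECT StudentId, FirstName, LastName, DateOfBirth,
               COUNT(*) OVER (PARTITION BY FirstName, LastName, DateOfBirth) AS RowCnt
        FROM tblStudent
    )
    SELECT StudentId, RowCnt FROM cte WHERE RowCnt > 1 ORDER BY StudentId
    """
).fetchall()

print(duplicate_students)
```

Unlike a ROW_NUMBER() filter, this returns every member of a duplicated group, including the first one.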
How can we count the duplicated values, whether a value is repeated 2 times or more than 2 times? Just count them, not group-wise.
As simple as:
select COUNT(distinct col_01) from Table_01
By using a CTE we can also find duplicate values like this:
with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]
)
select * from MyCTE where Duplicate>1
I think this will help you:
SELECT name, email, COUNT(* )
FROM users
GROUP BY name, email
HAVING COUNT(*)>1
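The answers above already cover this question well. Still, I want to list the possible approaches in one place, so that readers can see how each one works and pick the solution that best fits their need. This is one of the most common queries a SQL developer runs into, across different business use cases and sometimes in interviews as well.
Creating sample data
I will start by setting up some sample data taken from this question.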
Create table NewTable (id int, name varchar(10), email varchar(50))
INSERT NewTable VALUES (1,'John','asd@asd.com')
INSERT NewTable VALUES (2,'Sam','asd@asd.com')
INSERT NewTable VALUES (3,'Tom','asd@asd.com')
INSERT NewTable VALUES (4,'Bob','bob@asd.com')
INSERT NewTable VALUES (5,'Tom','asd@asd.com')
https://i.stack.imgur.com/ljKwM.png
1. Using the GROUP BY clause
SELECT
name,email, COUNT(*) AS Occurence
FROM NewTable
GROUP BY name,email
HAVING COUNT(*)>1
https://i.stack.imgur.com/A2bC4.png
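How it works:
The GROUP BY clause groups the rows by the values in both the name and email columns.
Then the COUNT() function returns the number of occurrences of each (name, email) group.
Then the HAVING clause keeps only the duplicate groups, which are the groups that occur more than once.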
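The three steps above can be watched one at a time. The following is a minimal sketch using Python's sqlite3 module and the NewTable sample data; the variable names all_groups and duplicate_groups, and the ORDER BY added for a stable result, are assumptions made for this example.

```python
import sqlite3

# In-memory SQLite database with the NewTable sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NewTable (id INT, name VARCHAR(10), email VARCHAR(50))")
conn.executemany(
    "INSERT INTO NewTable VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# Steps 1 and 2: GROUP BY forms the (name, email) groups, COUNT(*) sizes them.
all_groups = conn.execute(
    """
    SELECT name, email, COUNT(*) AS occurrences
    FROM NewTable
    GROUP BY name, email
    ORDER BY name, email
    """
).fetchall()

# Step 3: HAVING drops every group that occurs only once.
duplicate_groups = conn.execute(
    """
    SELECT name, email, COUNT(*) AS occurrences
    FROM NewTable
    GROUP BY name, email
    HAVING COUNT(*) > 1
    """
).fetchall()

print(all_groups)
print(duplicate_groups)
```

Comparing the two results shows that HAVING filters groups, not rows: four groups go in and only the (Tom, asd@asd.com) group comes out.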
2. Using a CTE:
To return the entire row for each duplicate row, use a common table expression (CTE) to join the result of the above query with the NewTable table:
WITH cte AS (
SELECT
name,
email,
COUNT(*) occurrences
FROM NewTable
GROUP BY
name,
email
HAVING COUNT(*) > 1
)
SELECT
t1.Id,
t1.name,
t1.email
FROM NewTable t1
INNER JOIN cte ON
cte.name = t1.name AND
cte.email = t1.email
ORDER BY
t1.name,
t1.email;
https://i.stack.imgur.com/J2V0X.png
3. Using the ROW_NUMBER() function
WITH cte AS (
SELECT
name,
email,
ROW_NUMBER() OVER (
PARTITION BY name,email
ORDER BY name,email) rownum
FROM
NewTable t1
)
SELECT
*
FROM
cte
WHERE
rownum > 1;
https://i.stack.imgur.com/tKofK.png
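How it works:
ROW_NUMBER() distributes the rows of the NewTable table into partitions by the values in the name and email columns. Duplicate rows have repeated values in the name and email columns, but different row numbers.
The outer query filters out the first row of each group, so only the extra copies remain.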
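To see the row numbers before they are filtered, here is a small sketch using Python's sqlite3 module. It assumes a SQLite build with window function support (3.25 or later), and it orders each partition by id rather than by name and email, an assumption made here so that the choice of "first row" is deterministic.

```python
import sqlite3

# In-memory SQLite database with the NewTable sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE NewTable (id INT, name VARCHAR(10), email VARCHAR(50))")
conn.executemany(
    "INSERT INTO NewTable VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

numbering = """
    SELECT id, name, email,
           ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS rownum
    FROM NewTable
"""

# Every row with the number it received inside its (name, email) partition.
numbered = conn.execute(numbering + " ORDER BY id").fetchall()

# Keeping rownum > 1 leaves only the second and later copies of each group.
extra_copies = conn.execute(
    f"WITH cte AS ({numbering}) SELECT id, name, email FROM cte WHERE rownum > 1"
).fetchall()

print(numbered)
print(extra_copies)
```

Only the second Tom row gets row number 2; every other row is the first of its partition and is filtered out.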
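I believe you now have a sound idea of how to find duplicates and can apply the same logic in all of these scenarios. Thanks.
This should also work, maybe give it a try.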
Select * from Users a
where EXISTS (Select * from Users b
where ( a.name = b.name
OR a.email = b.email)
and a.ID != b.id)
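This is especially good in your case if you are searching for duplicates that differ by some kind of prefix or general change, e.g. a new domain in the email. Then you can use replace() on these columns.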
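The replace() suggestion can be sketched as follows with Python's sqlite3 module. Everything specific here is an assumption for illustration: the two domains old.example and new.example, the three sample rows, and the choice to require both name and normalised email to match (AND rather than the OR used above), which is closer to what the question asks for.

```python
import sqlite3

# In-memory SQLite database; Tom appears twice, once per email domain.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (ID INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        (1, "Tom", "tom@old.example"),
        (2, "Tom", "tom@new.example"),
        (3, "Sam", "sam@old.example"),
    ],
)

# Rewrite the new domain to the old one on both sides of the comparison,
# so addresses that differ only by domain are treated as equal.
near_duplicates = conn.execute(
    """
    SELECT a.ID, a.name, a.email
    FROM Users a
    WHERE EXISTS (
        SELECT 1
        FROM Users b
        WHERE a.name = b.name
          AND REPLACE(a.email, '@new.example', '@old.example')
            = REPLACE(b.email, '@new.example', '@old.example')
          AND a.ID != b.ID
    )
    ORDER BY a.ID
    """
).fetchall()

print(near_duplicates)
```

A plain equality check on email would report no duplicates for this data; the normalisation is what pairs the two Tom rows.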
SELECT * FROM users u where rowid = (select max(rowid) from users u1 where
u.email=u1.email);
SELECT name, email,COUNT(email)
FROM users
WHERE email IN (
SELECT email
FROM users
GROUP BY email
HAVING COUNT(email) > 1)
Note that you cannot use COUNT without GROUP BY, unless it refers to the entire table.
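The most important thing here is to have the fastest query. The ids of the duplicates should also be identified. A self-join is a good option, but for a faster query it is better to first find the (username, email) values that have duplicates and then join with the original table to get the ids of the duplicated rows. Finally, order by any column except id so that duplicated rows end up next to each other.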
SELECT u.*
FROM users AS u
JOIN (SELECT username, email
FROM users
GROUP BY username, email
HAVING COUNT(*)>1) AS w
ON u.username=w.username AND u.email=w.email
ORDER BY u.email;
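Here is a minimal sketch of that two-step approach using Python's sqlite3 module. The rows are the question's sample data with the name column called username to match the query above. The index idx_users_username_email is a hypothetical addition, on the assumption that an index on the grouped columns is what helps on a large table; whether it does depends on the database engine.

```python
import sqlite3

# In-memory SQLite database with the question's rows (name stored as username).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "John", "asd@asd.com"),
        (2, "Sam", "asd@asd.com"),
        (3, "Tom", "asd@asd.com"),
        (4, "Bob", "bob@asd.com"),
        (5, "Tom", "asd@asd.com"),
    ],
)

# Hypothetical index on the grouped columns (an assumption, not from the answer).
conn.execute("CREATE INDEX idx_users_username_email ON users (username, email)")

# Inner query finds the duplicated (username, email) pairs;
# the join then pulls back the full rows, ids included.
duplicate_rows = conn.execute(
    """
    SELECT u.*
    FROM users AS u
    JOIN (SELECT username, email
          FROM users
          GROUP BY username, email
          HAVING COUNT(*) > 1) AS w
      ON u.username = w.username AND u.email = w.email
    ORDER BY u.email, u.id
    """
).fetchall()

print(duplicate_rows)
```

The extra u.id in the ORDER BY is only there to make the output order stable within a group.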
If you wish to find duplicate data (by one or several criteria) and select the actual rows:
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/
Delete records whose names are duplicated:
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS T FROM @YourTable
)
DELETE FROM CTE WHERE T > 1
To check for duplicate records in a table:
select * from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
or
select * from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
To delete the duplicate records in a table:
delete from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
or
delete from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
Another simple way you can try uses an analytic function:
SELECT * from
(SELECT name, email,
COUNT(name) OVER (PARTITION BY name, email) cnt
FROM users)
WHERE cnt >1;
SELECT column1, COUNT(*) FROM TABLE_NAME GROUP BY column1 HAVING COUNT(*) > 1;
You may want to try this:
SELECT NAME, EMAIL, COUNT(*)
FROM USERS
GROUP BY 1,2
HAVING COUNT(*) > 1