If you want to remove duplicates from a result set, which command should you use?

Question

Introduction

Nội dung chính Show

Setting Up Test Database
Create Test Database
Add Table and Data
Display the Contents of the Dates Table
Display Duplicate Rows
Removing Duplicate Rows
Option 1: Remove Duplicate Rows Using INNER JOIN
Option 2: Remove Duplicate Rows Using an Intermediate Table
Option 3: Remove Duplicate Rows Using ROW_NUMBER()
Which command is used to delete the duplicates?
Which keyword is used to remove duplicate rows in the result set?
How remove duplicates excel?

There are several instances in which you may encounter duplicate rows in your MySQL database. This guide will walk you through the process of how to remove duplicate row values in MySQL.

Prerequisites

A system with MySQL installed
A MySQL root user account
Access to a terminal window / command line (Ctrl-Alt-T, Search > Terminal)

Setting Up Test Database

If you already have a MySQL database to work on, skip ahead to the next section.

Otherwise, open a terminal window and type in the following:

mysql –u root –p

When prompted, enter the root password for your MySQL installation. If you have a specific user account, use those credentials instead of root.

The system prompt should change to:

mysql>

Note: If you aren’t able to connect to the MySQL server, you may get the message that access has been denied. Refer to our article on how to solve this MySQL error if you need assistance.

Create Test Database

You can create a new table in an existing database. To do so, find the appropriate database by listing all existing instances with:

SHOW DATABASES;

Alternatively, you can create a new database by entering the following command:

CREATE DATABASE IF NOT EXISTS testdata;

To start working in your new testdata database use:

USE testdata;

Add Table and Data

Once in the database, add a table with the data below using the following command:

CREATE TABLE dates (
id INT PRIMARY KEY AUTO_INCREMENT,
day VARCHAR(2) NOT NULL,
month VARCHAR(10) NOT NULL,
year VARCHAR(4) NOT NULL

);

INSERT INTO dates (day,month,year)
VALUES (’29’,’January’,’2011’),
(’30’,’January’,’2011’),
(’30’,’January’,’2011’),
(’14’,’February,’2017’),
(’14’,’February,’2018’),
(‘23’,’March’,’2018’),
(‘23’,’March’,’2018’),
(‘23’,’March’,’2019’),
(‘29’,’October’,’2019’),
(‘29’,’November’,’2019’),
(‘12’,’November’,’2017’),
(‘17’,’August’,’2018’),
(‘05’,’June’,’2016’);

Display the Contents of the Dates Table

To see a display of all the dates you entered, ordered by year, type:

SELECT * FROM dates ORDER BY year;

The output should show a list of dates in the appropriate order.

Display Duplicate Rows

To find out whether there are duplicate rows in the test database, use the command:

SELECT
     day, COUNT(day),
     month, COUNT(month),
     year, COUNT(year)
FROM 
     dates
GROUP BY
     day,
     month,
     year
HAVING 
     COUNT(day) > 1
     AND COUNT(month) > 1
     AND COUNT(year) > 1;

The system will display any values that are duplicates. In this case, you should see:

This format works to select multiple columns. If you have a column with a unique identifier, such as an email address on a contact list or a single date column, you can simply select from that one column.

Removing Duplicate Rows

Prior to using any of the below-mentioned methods, remember you need to be working in an existing database. We will be using our sample database:

USE testdata;

Option 1: Remove Duplicate Rows Using INNER JOIN

To delete duplicate rows in our test MySQL table, use MySQL JOINS and enter the following:

delete t1 FROM dates t1
INNER  JOIN dates t2
WHERE
    t1.id < t2.id AND
    t1.day = t2.day AND
    t1.month = t2.month AND
    t1.year = t2.year;

You may also use the command from Display Duplicate Rows to verify the deletion.

Note: If you have a unique column identifier, you can substitute it for the month, day, and year column identifiers, omitting the AND operators. This is designed to help you delete rows with multiple identical columns.

Option 2: Remove Duplicate Rows Using an Intermediate Table

You can create an intermediate table and use it to remove duplicate rows. This is done by transferring only the unique rows to the newly created table and deleting the original one (with the remaining duplicate rows).

To do so follow the instructions below.

1. Create an intermediate table that has the same structure as the source table and transfer the unique rows found in the source:

CREATE TABLE [copy_of_source] SELECT DISTINCT [columns] FROM [source_table];

For instance, to create a copy of the structure of the sample table dates the command is:

CREATE TABLE copy_of_dates SELECT DISTINCT id, day, month, year FROM dates;

2. With that done, you can delete the source table with the drop command and rename the new one:

DROP TABLE [source_table];

ALTER TABLE [copy_of_source] RENAME TO [source_table];

For example:

DROP TABLE dates;

ALTER TABLE copy_of_dates RENAME TO dates;

Option 3: Remove Duplicate Rows Using ROW_NUMBER()

Important: This method is only available for MySQL version 8.02 and later. Check MySQL version before attempting this method.

Another way to delete duplicate rows is with the ROW_NUMBER() function.

SELECT *. ROW_NUMBER () Over (PARTITION BY [column] ORDER BY [column]) as [row_number_name];

Therefore, the command for our sample table would be:

SELECT *. ROW_NUMBER () Over (PARTITION BY id ORDER BY id) as row_number;

The results include a row_number column. The data is partitioned by id and within each partition there are unique row numbers. Unique values are labeled with row number 1, while duplicates are 2, 3, and so on.

Therefore, to remove duplicate rows, you need to delete everything except the ones marked with 1. This is done by running a DELETE query with the row_number as the filter.

To delete duplicate rows run:

DELETE FROM [table_name] WHERE row_number > 1;

In our example dates table, the command would be:

DELETE FROM dates WHERE row_number > 1;

The output will tell you how many rows have been affected, that is, how many duplicate rows have been deleted.

You can verify there are no duplicate rows by running:

SELECT * FROM [table_name];

For instance:

SELECT * FROM dates;

Conclusion

You should now be able to remove duplicate rows in MySQL and improve your database performance. Remember, new implementations of MySQL strict mode have limited the functionality of the required functions.

Which command is used to delete the duplicates?

The ROW NUMBER() function can be used in connection with a common table expression (CTE) to sort the data and then remove the duplicate records.

Which keyword is used to remove duplicate rows in the result set?

The DISTINCT keyword eliminates duplicate rows from a result.

How remove duplicates excel?

In Excel, there are several ways to filter for unique values—or remove duplicate values: To filter for unique values, click Data > Sort & Filter > Advanced. To remove duplicate values, click Data > Data Tools > Remove Duplicates.

SQL duplicate row