Macro Tutorial: Find Duplicates in CSV File

Published: Mar 1 2019 — Updated: Jun 12 2019

A typical task when working with CSV files is to find and process duplicates, very often to delete them. Since version 0.9.7 Tablecruncher provides Javascript as a builtin macro language. This tutorial shows how to find and delete entries with a duplicate column. The process for finding duplicates has been extremly simplified with the introduction of flagged rows in version 0.9.8.

Please be aware, that Tablecrucher is still in beta. Always make backups of your files before manipulating them with Tablecruncher. The macro API is subject to change and more powerful features will be added in the future, making tasks like this one easier.

Step 1: Our initial file

This is our initial file that serves as an example for this tutorial.

Step 2: Sort the column with the values to check for duplicates

Now we’re going to sort the column which possibly contains duplicate entries. This step ensures all rows with duplicates are grouped together. To achieve this, just right-click the header of our column “E-Mail” or use Data > Sort … and choose the correct column from the dropdown.

After clicking Sort the table is sorted by the value of the chosen column.

Step 4: Select column

We select our sorted column by clicking on its column header.

Step 5: Flag lines with duplicates

With our sorted column selected, we choose Macro > Execute Macro …, click the little “+” sign and add a new macro with the name “Find Duplicates”. Insert the following code into the source area of the macro editor.

// Flag duplicate rows
for( r = ROWMIN+1; r<=ROWMAX; ++r) {
  var cell = getString(r,COLMIN);
  var prevCell = getString(r-1,COLMIN);
  if( cell == prevCell ) {
    flagRow(r);
  }
}

This code loops over the second selected row (ROWMIN+1) to the last row (ROWMAX). In cell we store the content of the recent cell, while in prevCell there’s the content of the cell above it. If cell equals prevCell we’ve found a duplicate, so we flag that row using flagRow(r).

Just click execute and the macros flags all rows containing duplicates:

Step 6: Delete all flagged rows

To finally delete the duplicates, click Data > Delete Flagged Row(s) ….

Final thoughts

This tutorial should give you some ideas on how to implement your data cleansing tasks utilizing the power of the builtin Javascript language. You can learn more about Tablecruncher’s Javascript macros in our documentation.

If you’ve questions regarding this tutorial or if you have ideas for further tutorials, don’t hesitate to contact me by e-mail: info@tablecruncher.com

Download (Apple Silicon) Download (Intel)
Version 1.7.0
requires macOS Sequoia, Sonoma, Ventura (Apple Silicon) or macOS Sequoia, Sonoma, Ventura, Monterey, Big Sur, Catalina (Intel)