Published: Mar 1 2019 — Updated: Jun 12 2019
A typical task when working with CSV files is to find and process duplicates, very often to delete them. Since version 0.9.7 Tablecruncher provides Javascript as a builtin macro language. This tutorial shows how to find and delete entries with a duplicate column. The process for finding duplicates has been extremly simplified with the introduction of flagged rows in version 0.9.8.
Please be aware, that Tablecrucher is still in beta. Always make backups of your files before manipulating them with Tablecruncher. The macro API is subject to change and more powerful features will be added in the future, making tasks like this one easier.
This is our initial file that serves as an example for this tutorial.
Now we’re going to sort the column which possibly contains duplicate entries. This step ensures all rows with duplicates are grouped together. To achieve this, just right-click the header of our column “E-Mail” or use Data > Sort … and choose the correct column from the dropdown.
After clicking Sort the table is sorted by the value of the chosen column.
We select our sorted column by clicking on its column header.
With our sorted column selected, we choose Macro > Execute Macro …, click the little “+” sign and add a new macro with the name “Find Duplicates”. Insert the following code into the source area of the macro editor.
// Flag duplicate rows
for( r = ROWMIN+1; r<=ROWMAX; ++r) {
var cell = getString(r,COLMIN);
var prevCell = getString(r-1,COLMIN);
if( cell == prevCell ) {
flagRow(r);
}
}
This code loops over the second selected row (ROWMIN+1
) to the last row (ROWMAX
). In cell
we store the content of the recent cell, while in prevCell
there’s the content of the cell above it. If cell
equals prevCell
we’ve found a duplicate, so we flag that row using flagRow(r)
.
Just click execute and the macros flags all rows containing duplicates:
To finally delete the duplicates, click Data > Delete Flagged Row(s) ….
This tutorial should give you some ideas on how to implement your data cleansing tasks utilizing the power of the builtin Javascript language. You can learn more about Tablecruncher’s Javascript macros in our documentation.
If you’ve questions regarding this tutorial or if you have ideas for further tutorials, don’t hesitate to contact me by e-mail: info@tablecruncher.com