#57778 - 12/01/20 10:26 AM remove duplicate record in CSV file
dstauder Offline
OL User

Registered: 06/28/02
Posts: 78
Loc: St. Louis, Missouri USA

I am trying to remove a duplicate record in a CSV file before I process the rest of the file. Is there a way to accomplish this within a workflow?



#57785 - 12/03/20 02:30 PM Re: remove duplicate record in CSV file [Re: dstauder]
gleesons Offline
OL Newbie

Registered: 06/14/16
Posts: 16
Hi dstauder,

There are no native plugins that can do this, but it is definitely possible with creative use of the plugins available in Workflow. The possible solutions range widely in complexity, depending on what information you have to work with.

It's important to note that CSV files can be manipulated with all of the plugins that are meant for text/line-printer data, as CSV is basically just text data with an enforced structure. When it comes to a CSV "record", it's also just a line in a text file.

At a high level, what you're looking to do is:
a) Identify the duplicate line in your file (its position may be fixed and known in advance, or you may need more sophisticated logic to find it dynamically)
b) Recreate the original file without that line. If the duplicate line happens to be at the start or end of the file, you can instead remove it quite simply with the "add/remove text" plugin.
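
To make step a) concrete, here is a minimal sketch in Python of finding a duplicate dynamically. It is an illustration of the logic only, not a Workflow plugin configuration. The function name find_duplicate_lines is hypothetical, and the sketch assumes that a "duplicate" means a line whose text exactly matches an earlier line and that no field contains an embedded line break.

```python
def find_duplicate_lines(lines):
    """Return the 0-based indexes of lines that exactly repeat an earlier line."""
    seen = set()
    duplicates = []
    for index, line in enumerate(lines):
        # Ignore the line terminator so "a,1\n" and "a,1" compare as equal.
        key = line.rstrip("\r\n")
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates
```

If duplicates should be judged on one column only (an ID, for example), the key would be that field rather than the whole line.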

There are many approaches to "recreating the file". One would be to loop through your data line-by-line with the Emulated Data Splitter set to split on every page (which means on every line with CSV emulation), recreating the file as you go with a Send to Folder task that concatenates to a fixed file name. When the loop reaches the duplicate line, you skip the step that concatenates that piece to the file being built. This decision to skip or append is controlled with a standard text condition: one path leads to the Send to Folder task mentioned above, and the other leads, most naturally, to a "Delete" task.
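
For readers who find the flow easier to follow as code, the skip-or-append loop can be sketched in Python as below. This is only an analogy for the Workflow process, not how the plugins are implemented. The function name rebuild_without_duplicates and both path parameters are hypothetical, and the sketch assumes duplicates are exact repeats of an earlier line with no line breaks inside quoted fields.

```python
def rebuild_without_duplicates(source_path, target_path):
    """Copy source to target line by line, skipping exact repeats.

    Returns the number of lines that were skipped.
    """
    seen = set()
    skipped = 0
    # newline="" keeps the original line endings untouched on read and write.
    with open(source_path, newline="") as source, \
            open(target_path, "w", newline="") as target:
        for line in source:
            key = line.rstrip("\r\n")
            if key in seen:
                # The "Delete" branch: drop this piece instead of appending it.
                skipped += 1
                continue
            seen.add(key)
            # The "Send to Folder" branch: append this piece to the rebuilt file.
            target.write(line)
    return skipped
```

The set plays the role of the text condition: it decides, for each line, which of the two branches is taken.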

I hope this helps!

Edited by gleesons (12/03/20 02:32 PM)