Removing specific records from files when the key is duplicated


 
# 1  
Old 05-21-2014

Hello

I have been trying to remove a row from a file when it has the same first three columns as another row. I have tried lots of different combinations of suggestions from this forum but can't get it exactly right.

What I have is:
Code:
900 - 1000 = 0
900 - 1000 =  2562
1000 - 1100 = 0
1000 - 1100 =  931
1100 - 1200 = 0
1100 - 1200 =  469
1200 - 1300 = 0
1300 - 1400 = 0
1300 - 1400 =  175
1400 - 1500 = 0
1400 - 1500 =  112

What I want is:
Code:
900 - 1000 =  2562
1000 - 1100 =  931
1100 - 1200 =  469
1200 - 1300 = 0
1300 - 1400 =  175
1400 - 1500 =  112

Any help would be greatly appreciated

# 2  
Old 05-21-2014
Let's give it a try.

Code:
awk '{a[$1]=$0; next}END{for (i in a) {print a[i]}}' filename | sort -n
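
Applied to the sample above (assuming the data is saved in a file called filename, as in the command), this keeps the last line read for each value of the first field and then sorts numerically, which should reproduce the requested output:

Code:
$ awk '{a[$1]=$0; next} END{for (i in a) print a[i]}' filename | sort -n
900 - 1000 =  2562
1000 - 1100 =  931
1100 - 1200 =  469
1200 - 1300 = 0
1300 - 1400 =  175
1400 - 1500 =  112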

# 3  
Old 05-21-2014
1. Thanks for the quick reply.
2. Your recommendation works with a larger amount of data too, and now I have a big bunch of data that I need to parse - but I can handle that.
3. I owe you a tasty beverage if you are in my neck of the woods.
# 4  
Old 05-21-2014
Quote:
Originally Posted by Aia
Let's give it a try.

Code:
awk '{a[$1]=$0; next}END{for (i in a) {print a[i]}}' filename | sort -n

It works great but I don't get it.

To me it looks like you save each record in an array, using $1 as the index. That's OK, I understand. Then you decide to jump to the next record... why?

And then it magically works: everything is saved in the array and printed at the end.

I can't see the light in this one. Could you explain it a little bit?
# 5  
Old 05-21-2014
Quote:
Originally Posted by Aia
Let's give it a try.

Code:
awk '{a[$1]=$0; next}END{for (i in a) {print a[i]}}' filename | sort -n

Since it is the first three columns, technically that would need to be:
Code:
awk '{a[$1,$2,$3]=$0; next} .....
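
Putting the two posts together, the complete command would presumably read:

Code:
awk '{a[$1,$2,$3]=$0; next} END{for (i in a) print a[i]}' filename | sort -n

Indexing on all three fields means two lines only share an array slot when the whole key (for example 900 - 1000) matches, not just the first number.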


# 6  
Old 05-21-2014
Quote:
Originally Posted by Scrutinizer
Since it is the first three columns, technically that would need to be:
Code:
awk '{a[$1,$2,$3]=$0; next} .....

Now I understand.

Because there's always a first occurrence that equals 0, which does not count, and it has the same index in the array, the second value overwrites the first. So it's always the second value for a given pattern that gets saved, whenever there is a second value with the same pattern.

This time the key was paying attention to the index and to how awk stores records in the array.

Thanks.
# 7  
Old 05-22-2014
Quote:
Originally Posted by Kibou
Now I understand.

Because there's always a first occurrence that equals 0, which does not count, and it has the same index in the array, the second value overwrites the first. So it's always the second value for a given pattern that gets saved, whenever there is a second value with the same pattern.

This time the key was paying attention to the index and to how awk stores records in the array.

Thanks.
There is no test for 0. It is not necessarily the second line with a given value for the first three fields that is saved in the array; it is the last line with that value for the first three fields. If there is one line with 900, -, and 1000 as the first three fields, respectively, a[$1, $2, $3]'s value (in this case a["900", "-", "1000"]'s value) will be that entire line. If there is more than one line with 900, -, and 1000 as the first three fields, a[$1, $2, $3]'s value will be the last line starting with those three values.

When processing an array with:
Code:
for(i in a)

the elements are processed in an unspecified order (not necessarily the order in which they were found in the input file). This is why Aia used sort -n to print the output in the same order as the (sorted) input file.
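
If the original file order matters and a numeric sort is not appropriate, a common alternative idiom (a sketch, assuming GNU tac is available) is to reverse the file, keep only the first occurrence of each key (which is the last occurrence in the original order), and reverse back:

Code:
# tac reverses the line order (GNU coreutils);
# !seen[$1,$2,$3]++ is true only the first time a given key is encountered
tac filename | awk '!seen[$1,$2,$3]++' | tac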