awk function to remove lines that contain contents of another file
Hi,
I'd be grateful for your help with the following. I have a file (file.txt) with 10 columns and about half a million lines, which in simplified form looks like this:
I'd like to remove all the lines where, say, "b" and "d" appear in the first (ID) column. The output that I want is:
In reality, there are about 100,000 lines that I want to remove.
I therefore have a reference file (referencefile.txt) that lists all the IDs that I want removed from file.txt. In this example, the reference file would simply contain "b" and "d" on successive lines.
I am using grep at the moment, and while it works, it is proving painfully slow.
Is there a way of using awk (or anything else for that matter) to speed up the process?
This can require a lot of memory, depending on how much you have in reference.txt.
This is simple awk, which could be rewritten as something difficult to read for non-awkers.
We have posters who do that, which is okay as long as you can follow what they show you.
I do not understand the ! and ++ in {! arr[$0]++; next}
Replace it with {arr[$1]; next}. Not storing a value in the array saves some memory! Using $1 instead of $0 strips surrounding whitespace, which makes sense if there is invisible trailing space (and embedded spaces wouldn't work anyway when later comparing with $1). The next jumps to the next input record, so there is no need to check FILENAME again. {print $0} is the default action when there is only a condition.
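For anyone reading along, here is a minimal sketch of the whole idea, not the exact command posted above. It assumes whitespace-separated columns with the ID in the first one, the sample rows are made up for illustration, and it uses the common NR == FNR test to detect the first file instead of comparing FILENAME.

```shell
# Hypothetical sample data: the listings from the original post are not shown here.
printf '%s\n' 'a 1 x' 'b 2 y' 'c 3 z' 'd 4 w' 'e 5 v' > file.txt
printf '%s\n' 'b' 'd' > referencefile.txt

# While reading the first file (NR == FNR), store each ID as an array key.
# For the second file, print only lines whose first field is not a key.
awk 'NR == FNR { ids[$1]; next } !($1 in ids)' referencefile.txt file.txt > filtered.txt

cat filtered.txt
```

Each line of file.txt costs one hash lookup, so nothing is rescanned per ID. One caveat: if referencefile.txt is empty, NR == FNR also holds for file.txt and nothing is printed.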
Hello friend,
In the hostgroup file, I have defined lots of hostgroups. I need to remove a few of them without manually editing the file. I need a script or syntax.
I want to search for particular hostgroup_members and delete the hostgroup definition for them.
for example.
define hostgroup{
hostgroup_name... (8 Replies)
I have been searching and trying to come up with an awk that will perform the following on a
converted text file (original is a pdf).
1. Since the first two lines are (begin with) text they are removed
2. if $1 is a number then all text is merged (combined) into one line until the next... (3 Replies)
I am trying to remove each line in which $2 is FP or RFP. I believe the below will remove one instance but not both. Thank you :).
file
12
123 FP
11
10 RFP
awk
awk -F'\t' '
$2 != "FP"' file
desired output
12
11 (6 Replies)
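A possible fix, as a sketch only: test both values in one condition. This assumes the file really is tab-separated, as the -F'\t' in the attempt suggests. The sample file is recreated here from the listing above, and kept.txt is just a name picked for the example.

```shell
# Recreate the sample input, assuming a tab between the two columns.
printf '12\n123\tFP\n11\n10\tRFP\n' > file

# Keep a line only when its second field is neither FP nor RFP.
# Lines with no second field have an empty $2, so they are kept.
awk -F'\t' '$2 != "FP" && $2 != "RFP"' file > kept.txt

cat kept.txt
```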
I am trying to remove lines in the target.txt file if $5 before the - in that file matches sorted_list. I have tried grep and awk. Thank you :).
grep
grep -v -F -f targets.bed sort_list
grep -vFf sort_list targets
awk
awk -F, '
> FILENAME == ARGV {to_remove=1; next}
> ! ($5 in... (2 Replies)
Sorry for the weird title but i have the following problem.
We have several files which have between 10000 and about 500000 lines in them. From these files we want to remove lines which contain a pattern which is located in another file (around 20000 lines, all EAN codes). We also want to get... (28 Replies)
I have a function which does awk processing
sub mergeDescription {
system (q@awk -F'~' '
NR == FNR {
A = $1
B = $2
C = $0
next
}
{
n = split ( C, V, "~" )
if... (3 Replies)
So, this issue is driving me nuts! I was hoping to get a lending hand here...
I have 2 files:
file1.txt contains:
this is example1
this is example2
this is example3
this is example4
this is example5
file2.txt contains:
example3
example5
Basically, I need a script or command to... (4 Replies)
Hi,
I have two files, in which the second file has exactly the same contents of the first file with some additional records. Now, if I want to remove those matching lines from file2 and print only the extra contents which the first file does not have, I could use the below unsophisticated... (3 Replies)
Hi
Let's say a flat file contains 1000 lines. The cursor is at line number 530.
Now I'd like to delete all the lines in one shot. How can it be done? (2 Replies)