Replacing lines matching a multi-line pattern (sed/perl/awk)

02-24-2014

Dear Unix Forums,

I am hoping you can help me with a pattern matching problem.

What am I trying to do?
I want to replace multiple lines of a text file (that match a multi-line pattern) with a single line of text. These patterns can span several lines and do not always have the same number of line breaks in between.

Example input file

Code:

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

One of the simpler patterns may look like this (pseudocode; meaning that only parts of the line are relevant and that the number of line breaks can vary):

Code:

^ * @CAL RtlInitAnsiString @PA1 0x0012f740 * ((1 to 3 line breaks)) * @CAL memmove @PA1 0x0012f740 * $

The matching text should be replaced with a string (e.g. "@MATCH").
In the end, the file should look like this:

Desired output

Code:

@MATCH //replaced lines 1 and 2
@MATCH //replaced lines 3 to 5, including the irrelevant BlaCall
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

Current solution for adjacent lines only:

Code:

sed 'N;s/\@LIB.*\@CAL RtlInitAnsiString .*\@PA1 0x0012f740.*\n.*\@CAL memmove \@PA1 0x0012f740.*/\@MATCH/' inputfile

Unfortunately, this does not seem to work across several line breaks (i.e. when there is a "gap" between the lines containing RtlInitAnsiString and memmove).

Stuff I tried that didn't match anything:

Code:

perl -pne 'BEGIN {undef $/} s/\@LIB.*\@CAL RtlInitAnsiString \@PA1 0x0012f740.*\@CAL memmove \@PA1 0x0012f740.*/\@MATCH/' inputfile
perl -0pe 's/^\@LIB.*\@CAL RtlInitAnsiString .*\@PA1 0x0012f740.*\@CAL memmove \@PA1 0x0012f740.*$/\@MATCH/gm' inputfile
perl -0pe 's/^\@LIB*\@CAL RtlInitAnsiString *\@PA1 0x0012f740*.*\@CAL memmove \@PA1 0x0012f740*$/\@MATCH/s' inputfile

Any ideas how to get this kind of multi-line pattern matching to work? I'd prefer sed or perl, but awk is fine too.

Thanks in advance!

thefang
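
A note for anyone reading along: the perl attempts above most likely fail for two reasons that apply to Perl and Python regexes alike. Without the `s` modifier (`re.DOTALL` in Python), `.` does not match a newline, so `.*` cannot bridge the gap between two lines. And in the third attempt `*` is a quantifier on the preceding character, not a shell-style wildcard. A minimal Python 3 sketch of both effects follows; the shortened two-line sample and the variable names are made up for illustration.

```python
import re

# Shortened, made-up sample: two adjacent log lines.
sample = (
    "@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @RET0\n"
    "@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @RET 0x0012f8bc\n"
)

first = r"@CAL RtlInitAnsiString @PA1 0x0012f740"
second = r"@CAL memmove @PA1 0x0012f740"

# Without DOTALL, "." stops at the newline, so ".*" cannot span two lines.
no_dotall = re.search(first + r".*" + second, sample)

# With DOTALL, "." also matches "\n" and the two lines can be bridged.
with_dotall = re.search(first + r".*" + second, sample, re.DOTALL)

# "*" quantifies the preceding character; it is not a glob wildcard.
# "@LIB*@CAL" means "@LI", zero or more "B", then "@CAL" immediately.
glob_style = re.search(r"@LIB*@CAL", sample)

print(no_dotall is None)        # True
print(with_dotall is not None)  # True
print(glob_style is None)       # True
```

Note that an unbounded `.*` with DOTALL will happily span the whole file, which is why the gap has to be limited explicitly.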

02-24-2014

From python shell:

Code:

>>> import re
>>> text = '''
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... '''
>>> 
>>> pattern = re.compile('^.*?@CAL RtlInitAnsiString @PA1 0x0012f740.*?@CAL memmove @PA1 0x0012f740.*?$',re.MULTILINE|re.DOTALL)
>>> 
>>> print(re.sub(pattern,'@MATCH',text))
@MATCH
@MATCH
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

I am not working with Perl right now, but I'm sure the translation should be pretty straightforward.

Last edited by Klashxx; 02-24-2014 at 01:10 PM..

Klashxx

02-24-2014

What should the output be for the input:

Code:

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

(which is your sample input file with line 2 removed)? Is line 1 kept in the output as is, or should lines 1 through 4 be changed to a single:

Code:

@MATCH

output line?

What should happen if there are more than 3 newlines between @CAL RtlInitAnsiString and @CAL memmove, and there are no other occurrences of @CAL RtlInitAnsiString between them?

Don Cragun

02-24-2014

We need answers to Don's questions, as it looks like there can be a lot of varying cases; the requirements analysis is not complete.

With the data you have provided so far, try this:

Code:

awk '
  /@CAL RtlInitAnsiString @PA1 0x0012f740/{s=1}
  s && /@CAL memmove @PA1 0x0012f740/{ print "@MATCH"; s=0; next }
  !s' infile

--ahamed

ahamed101

02-25-2014

Thank you all for your replies!

In response to Don's questions: This input file...

Code:

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

...should turn into this (minimal "destruction"):

Code:

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@MATCH
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS

If there are more than 3 newlines between the first and second part of the pattern, nothing should happen. The replacement should only be executed as long as the "maximum gap" (in this case 3) is not exceeded. So if the input file looked like this:

Code:

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc

...the script should NOT replace the large block of "RtlAnsiStringToUnicodeString".

Thanks again!

thefang
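
For readers: the clarified rules (a closing line within at most three intervening lines, the most recent opening line wins, nothing replaced otherwise) can also be expressed as a line-by-line scan instead of one large regex. Below is a minimal Python 3 sketch under those assumptions. The names `START`, `END`, `MAX_GAP` and `collapse` are made up for illustration, the sample lines are shortened, and the nearest closing line is chosen when several are in range.

```python
import re

# Assumed names for illustration; the two patterns are taken from the thread.
START = re.compile(r"@CAL RtlInitAnsiString @PA1 0x0012f740")
END = re.compile(r"@CAL memmove @PA1 0x0012f740")
MAX_GAP = 3  # maximum number of unrelated lines allowed in between


def collapse(lines, replacement="@MATCH"):
    """Replace START..END runs (at most MAX_GAP lines apart) with one line."""
    out = []
    i = 0
    while i < len(lines):
        end_index = None
        if START.search(lines[i]):
            # Inspect the next MAX_GAP + 1 lines for the closing line.
            for j in range(i + 1, min(i + MAX_GAP + 2, len(lines))):
                if END.search(lines[j]):
                    end_index = j
                    break
                if START.search(lines[j]):
                    # A newer start wins; leave the older one untouched.
                    break
        if end_index is None:
            out.append(lines[i])
            i += 1
        else:
            out.append(replacement)
            i = end_index + 1
    return out


# Shortened version of the second sample input from the thread.
sample = [
    "@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @RET0",
    "@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @RET0",
    "@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @RET 0x0012f8bc",
    "@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @RET 0x0012f8bc",
    "@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8",
]
print("\n".join(collapse(sample)))
```

Here the first RtlInitAnsiString line is kept, lines 2 to 4 become a single `@MATCH`, and the last line is passed through unchanged.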

02-25-2014

Interesting regex (python shell again):

Code:

>>> import re
>>> text = '''
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... '''
>>>
>>> pattern = re.compile(r'''
... ^[^\n]+@CAL\sRtlInitAnsiString\s@PA1\s0x0012f740[^\n]+\n
... (?:(?!^[^\n]+RtlInitAnsiString)[^\n]+\n){1,3}
... ^[^\n]+@CAL\smemmove\s@PA1\s0x0012f740[^\n]+
... ''', re.X|re.M|re.S)
>>> 
>>> print(re.sub(pattern, '@MATCH', text))
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@MATCH
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
>>> 
>>> text2 = '''
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
... @LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
... @LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc
... '''
>>> print(re.sub(pattern, '@MATCH', text2))

@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc 
@MATCH
@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @PA2 "CriticalSectionTimeout" @RET0 
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8 @PA2 0x0012f740 @PA3 FALSE @RET STATUS_SUCCESS
@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @PA2 0x0012f68c @PA3 4 @RET 0x0012f8bc

Last edited by Klashxx; 02-25-2014 at 09:20 AM.. Reason: Avoid capturing last \n

Klashxx

02-25-2014

Thanks Klashxx,

your python script works!
I changed the pattern to...

Code:

>>> pattern = re.compile(r'''
... ^[^\n]+@CAL\sRtlInitAnsiString\s@PA1\s0x0012f740[^\n]+\n
... (?:(?!^[^\n]+RtlInitAnsiString)[^\n]+\n){0,3}
... ^[^\n]+@CAL\smemmove\s@PA1\s0x0012f740[^\n]+
... ''', re.X|re.M|re.S)

...so it would also match adjacent lines. Now I have to figure out how to turn this into a "one-liner" (I currently use "eval" to loop through a file containing pattern matching commands, mostly "sed") and what each part of the expression does (up until now, my scripting endeavors were limited to rather basic stuff).

Does anyone know how python compares to other approaches (awk, etc.) in terms of performance? The files I plan to analyze have upwards of 50,000 lines each and are matched against hundreds of single-line and multi-line patterns.

Cheers

Last edited by thefang; 02-25-2014 at 10:37 AM.. Reason: python<>perl mixup

thefang
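
For readers wondering what each part of the expression does: below is a commented Python 3 sketch of the same `{0,3}` pattern, wrapped in a function so it can be applied to the contents of a whole file. The names `PATTERN` and `replace_blocks` and the shortened sample lines are made up for illustration. `re.DOTALL` is dropped here on the assumption that it is not needed, since the pattern contains no bare `.`.

```python
import re

# Verbose mode ignores unescaped whitespace and allows "#" comments,
# which is why literal spaces are written as \s.
PATTERN = re.compile(r"""
    ^[^\n]+ @CAL\sRtlInitAnsiString\s@PA1\s0x0012f740 [^\n]+ \n  # opening line
    (?:                                  # one gap line, which ...
        (?! ^[^\n]+ RtlInitAnsiString )  # must not be another opening call
        [^\n]+ \n                        # and is otherwise any complete line
    ){0,3}                               # zero to three gap lines
    ^[^\n]+ @CAL\smemmove\s@PA1\s0x0012f740 [^\n]+  # closing line, newline kept
""", re.VERBOSE | re.MULTILINE)


def replace_blocks(text, replacement="@MATCH"):
    """Return text with every opening..closing block replaced by one line."""
    return PATTERN.sub(replacement, text)


# Shortened version of the second sample input from the thread.
lines = [
    "@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @RET0",
    "@LIB ADVAPI32.dll @CAL RtlInitAnsiString @PA1 0x0012f740 @RET0",
    "@LIB ADVAPI32.dll @CAL BlaCall @PA1 0x0012f741 @RET 0x0012f8bc",
    "@LIB ADVAPI32.dll @CAL memmove @PA1 0x0012f740 @RET 0x0012f8bc",
    "@LIB ADVAPI32.dll @CAL RtlAnsiStringToUnicodeString @PA1 0x7ffdfbf8",
]
text = "\n".join(lines) + "\n"
print(replace_blocks(text), end="")
```

The pattern is compiled once and reused for every call. With hundreds of patterns, one option is to keep the compiled objects in a list and run each over file contents that were read once. How that compares with awk or sed would have to be measured on the real files.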
