Remove html tags with bash

05-22-2008

Hello,

Is there a way to go through a file and remove certain HTML tags with bash? If it needs sed or awk, that'll do too.

The reason I want this is that I have a monitor script which generates a logfile in HTML, and every time it generates a logfile, the closing tags are written again. The tags I want removed are </body> and </html>, which are the last two lines of the HTML file.

I found similar topics, but none of them do what I need.

https://www.unix.com/shell-programmin...-line-sed.html

Thanks in advance for the help.

dejavu88

05-22-2008

Try this:

Code:

awk '/<\/body>/ || /<\/html>/{next}1' file
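Since the question also allows sed, the same line-based filter could be sketched as below. This is only a minimal sketch, assuming each closing tag sits on its own line as described in the question; the `sample` file and its contents are hypothetical and exist only to make the snippet runnable.

```shell
# Hypothetical sample file standing in for the generated HTML log.
sample=$(mktemp)
printf '%s\n' '<html>' '<body>' '<p>entry 1</p>' '</body>' '</html>' > "$sample"

# Delete every line that contains </body> or </html>; print all other lines.
result=$(sed -e '/<\/body>/d' -e '/<\/html>/d' "$sample")
printf '%s\n' "$result"

# Clean up the sample file.
rm -f "$sample"
```

Like the awk one-liner, this drops the whole line, so anything else that shares a line with one of the tags would be removed too.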

Regards

Franklin52

05-22-2008

It kinda works, but somehow I have to forward the output to a new file.

Code:

awk '/<\/body>/ || /<\/html>/{next}1' file.html > file2.html

Is there a way to make it write the output back to the original file (file.html)?

When I use:

Code:

awk '/<\/body>/ || /<\/html>/{next}1' file.html > file.html

I get a blank file.

All the code before the </body> and </html> tags should remain in the file.

Thanks

dejavu88

05-22-2008

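You can't redirect the output to the input file: the shell truncates file.html before awk starts reading it, which is why you end up with a blank file. Redirect the output to a temporary file and then move that over the original, something like this: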

Code:

awk '/<\/body>/ || /<\/html>/{next}1' file.html > file1.html

mv file1.html file.html
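The temporary-file step can be made a little more defensive. The sketch below is one possible variant, not the only way to do it: it assumes `mktemp` is available, and the function name `strip_closing_tags` and the sample file are made up for illustration. The original is replaced only if awk exits successfully.

```shell
# Hypothetical helper: rewrite a file without its </body> and </html> lines.
strip_closing_tags() {
    file=$1
    # Create the temp file next to the original so mv stays on one filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    # Replace the original only if awk succeeded; otherwise discard the temp file.
    if awk '/<\/body>/ || /<\/html>/{next}1' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a hypothetical sample file.
log=$(mktemp)
printf '%s\n' '<html>' '<body>' '<p>entry 1</p>' '</body>' '</html>' > "$log"
strip_closing_tags "$log"
cat "$log"
```

Note that `mktemp` creates its file with restrictive permissions, so after the `mv` the file may have a different mode than the original had.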

Regards

Franklin52

05-22-2008

I just figured it out a few minutes ago, the same way as in your code, before you replied. Thanks for all the help.

dejavu88
