Hi Guys,
I have a very large XML feed (2.7 MB) which crashes the server when it is parsed. To reduce the load on the server, I have a cron job running every 5 minutes. This job fetches the file from the feed host and stores it on the local machine.
This does not solve the problem, as the whole file still gets loaded on the server. The file looks something like this:
<?xml version="1.0" standalone="no"?>
<IRXML CorpMasterID="">
<NewsReleases PubDate="20081104" PubTime="16:48:03">
<NewsCategory Category="">
<NewsRelease ReleaseID="" DLU="20081104 16:47:00" ArchiveStatus="Current"
RNSSource="">
<Title></Title>
<ExternalURL/>
<Date Date="20081104" Time="16:33:00">11/4/2008 4:33:00 PM</Date>
<ContentNetworkingLinks/>
<Categories>
<Category></Category>
</Categories>
</NewsRelease>
<NewsRelease ReleaseID="" DLU="20081104 09:19:00" ArchiveStatus="Current"
RNSSource="">
<Title></Title>
<ExternalURL/>
<Date Date="20081104" Time="09:01:00">11/4/2008 9:01:00 AM</Date>
<ContentNetworkingLinks/>
<Categories>
<Category></Category>
</Categories>
</NewsRelease>
I want to write a shell script that extracts only the parts running from each opening <NewsRelease> tag to its closing </NewsRelease> tag.
Something like:
<NewsRelease ReleaseID="" DLU="20081104 09:19:00" ArchiveStatus="Current"
RNSSource="">
<Title></Title>
<ExternalURL/>
<Date Date="20081104" Time="09:01:00">11/4/2008 9:01:00 AM</Date>
<ContentNetworkingLinks/>
<Categories>
<Category></Category>
</Categories>
</NewsRelease>
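To make the goal concrete, here is a rough sketch of the extraction I have in mind. It is written in Python rather than as a pure shell script (it could still be called from the cron job), it uses a regular expression instead of a real XML parser, and the name extract_releases is made up for illustration. It assumes <NewsRelease> elements are never nested and always have an explicit closing tag.

```python
import re

# Matches one <NewsRelease ...> ... </NewsRelease> block.
# \b stops the pattern from matching the enclosing <NewsReleases> tag,
# and DOTALL lets "." cross line breaks, so it does not matter whether
# the file is on one line or many.
RELEASE_RE = re.compile(r"<NewsRelease\b[^>]*>.*?</NewsRelease>", re.DOTALL)


def extract_releases(text):
    """Return every <NewsRelease> block found in text, in document order.

    Hypothetical helper: assumes blocks are not nested and are not
    written in self-closing form.
    """
    return RELEASE_RE.findall(text)
```

Called on the contents of the downloaded file, this would return a list of strings, one per release, which could then be written out or parsed one at a time instead of parsing the whole feed at once.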
Also, there is one more problem: on Unix, when the file is downloaded there are no line breaks, so the complete file appears to be on one line.
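For the single-line problem, something along these lines might be enough to make the file usable with line-oriented tools such as grep or sed. Again this is only a sketch in Python, with a made-up function name (break_at_releases), and it only adds line breaks around each <NewsRelease> block; any whitespace inside a block is left as it is.

```python
import re


def break_at_releases(text):
    """Insert line breaks around each <NewsRelease> block (hypothetical helper)."""
    # Start a new line before every opening <NewsRelease ...> tag.
    # \b keeps the enclosing <NewsReleases> tag from matching.
    text = re.sub(r"\s*(<NewsRelease\b)", r"\n\1", text)
    # End the line after every closing </NewsRelease> tag.
    return re.sub(r"(</NewsRelease>)\s*", r"\1\n", text)
```

If a block has no line breaks inside it, it ends up on exactly one line; otherwise it simply starts on a fresh line.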
Any help would be appreciated. Thanks,
Shridhar