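One line, e.g. 40556271 1319211119.897235 0.0047939, is roughly 44 bytes. Assuming every line is that size, the following command will split the file into smaller chunks of about 1000 lines each. Note that -b cuts on byte boundaries, so a chunk can start or end in the middle of a line if the line lengths vary.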
Code:
split -b 44000 infile
I have created a file with similar entries. It has 5531904 lines in total and is around 233MB.
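If the goal is exactly 1000 lines per chunk regardless of line length, splitting by line count avoids the byte estimate. A minimal sketch, assuming a throwaway demo file and the hypothetical output prefix chunk_ (neither is from the thread):

```shell
# Demo input: 2500 numbered lines in a temporary directory (hypothetical data).
demo_dir=$(mktemp -d)
seq 1 2500 > "$demo_dir/infile"

# -l splits on line boundaries, so no line is ever cut in half.
# Output files are named <prefix>aa, <prefix>ab, ...
split -l 1000 "$demo_dir/infile" "$demo_dir/chunk_"

# chunk_aa and chunk_ab hold 1000 lines each, chunk_ac the remaining 500.
wc -l "$demo_dir"/chunk_*
```

This still splits by line count, not by the value in column 1, so it only helps when a fixed number of lines per file is what is wanted.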
If I am not mistaken, NR % 1000 would split the file into smaller files of 1000 lines each.
In my case I am interested in column 1 values spanning a range of 1000 (say 1319211119 to 1319212119), and that does not translate to 1000 lines every time.
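Code:
a="1319130869"
b=$((a + 1000))   # upper bound of the first range
count="1"
# as_many_times_as_i_need is a placeholder for the number of ranges to extract
for i in $(seq 1 as_many_times_as_i_need)
do
    # copy every line whose column 1 value lies between a and b to its own file
    awk -v myvarA="$a" -v myvarB="$b" '($1 >= myvarA && $1 <= myvarB) { print $0 }' in.txt > "out$count.txt"
    a=$b
    b=$((a + 1000))
    count=$((count + 1))
done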
As I was saying, with a file as huge as mine it would be a waste to search from the start every time. Since 'a', 'b' and the column 1 values all keep increasing, I thought it would be quicker to save the last matching NR for the previous 'a' to 'b' range and, after incrementing 'a' and 'b', pick up from where I left off.
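The "pick up from where I left off" idea could be sketched roughly as below. This is only an illustration under stated assumptions: the input is sorted on column 1, the demo data and the names demo_dir, step, c and seen are made up, and each range is half-open (a <= value < b) so a boundary line is not written to two files, which differs slightly from the <= test in the loop above.

```shell
# Tiny demo input, sorted on column 1 (hypothetical data).
demo_dir=$(mktemp -d)
printf '%s\n' '100 a' '150 b' '1099 c' '1100 d' '2500 e' > "$demo_dir/in.txt"

a=100       # lower bound of the current range
step=1000   # width of each range
c=1         # line number to resume reading from
count=1
while [ "$count" -le 3 ]; do
    b=$((a + step))
    # Read from line c onward; awk stops at the first value >= b and
    # reports how many lines it consumed (END still runs after exit).
    seen=$(tail -n +"$c" "$demo_dir/in.txt" |
        awk -v lo="$a" -v hi="$b" -v out="$demo_dir/out$count.txt" '
            $1 >= hi { exit }
            { seen++ }
            $1 >= lo { print > out }
            END { print seen + 0 }')
    c=$((c + seen))   # the next range starts where this one stopped
    a=$b
    count=$((count + 1))
done
```

Each pass still has to skip the first c-1 lines, but tail does that much faster than awk can parse and compare every one of them.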
---------- Post updated at 08:12 AM ---------- Previous update was at 08:01 AM ----------
Quote:
Originally Posted by ahamed101
Let us start once more.
How do you want to split the files? Yes, we know there is a number in the first column. What is next?
--ahamed
I know what the first value is, so I write everything from the first value ('a') to the first value + 1000 ('b') to a new file. The range is in actual column values, not in number of lines.
I want to do this repeatedly (in some cases up to 10000 times) with different 'a' and 'b' values. After every iteration 'a' becomes the old 'b', and 'b' becomes its previous value + 1000.
This is the main logic, and it works (see the previous code).
Now, for the successive iterations, I want to do something like this.
Take a variable 'c' that holds NR values. Once I reach the last row within 'b', I store that line number in 'c', increment 'a' and 'b', and then resume the search from line number 'c', so that I don't have to begin from the start again.
Please bear with me. I hope the working logic is at least clear now. Sorry for the trouble, guys.
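Because the column 1 values only ever increase, the same result can also be reached in a single pass, with awk advancing the range itself instead of the shell restarting it. A minimal sketch, not necessarily the solution given in the thread; it assumes sorted input and half-open ranges, and the demo data and the names start, step and prefix are hypothetical:

```shell
# Tiny demo input, sorted on column 1 (hypothetical data).
demo_dir=$(mktemp -d)
printf '%s\n' '100 a' '150 b' '1099 c' '1100 d' '2500 e' > "$demo_dir/in.txt"

awk -v start=100 -v step=1000 -v prefix="$demo_dir/out" '
    BEGIN { hi = start + step; n = 1; out = prefix n ".txt" }
    $1 < start { next }            # ignore lines before the first range
    {
        # Move on to the range that contains this value, closing the
        # finished file so the number of open files stays small.
        while ($1 >= hi) {
            close(out)
            hi += step
            n++
            out = prefix n ".txt"
        }
        print > out
    }
' "$demo_dir/in.txt"
```

The file is read exactly once, so there is no line number to remember between iterations. A range that contains no lines simply produces no file, while the numbering of the later files is unchanged.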
Thanks a million, ahamed!
I can't thank you enough. I can see from the output that it works as I want. Now I am trying to understand it, as it is a bit too advanced for me. Can you tell me what p=1 in the else branch and p=0 in the first loop do?