Sponsored Content
Full Discussion: extract data from 2 files
Top Forums Shell Programming and Scripting extract data from 2 files Post 302598950 by chihung on Wednesday 15th of February 2012 08:12:12 PM
Old 02-15-2012
Try this:
Please let us know the performance like
Code:
#! /bin/sh

awk '
# only process file1 and store all the mappings
FNR==NR {
        # store mapping of column 2 in file1
        n=split($2,a,",")
        for(i=1;i<=n;++i) {
                map2[$1,a[i]]=1
        }

        # store mapping of column 3 in file1
        n=split($3,a,",")
        for(i=1;i<=n;++i) {
                map3[$1,a[i]]=1
        }

        # store original file
        file1[$1]=$0

        next
}

# start to process file2
# if column2 in file2 exists in file1, concatenate the index to string str2
# if column3 in file2 exists in file1, concatenate the index to string str3
{
        if ( map2[$5,$2]==1 && match2[$5,$2]!=1 ) {
                str2[$5]=sprintf("%s,%s",str2[$5],$2)
                match2[$5,$2]=1
        }
        if ( map3[$5,$3]==1 && match3[$5,$3]!=1 ) {
                str3[$5]=sprintf("%s,%s",str3[$5],$3)
                match3[$5,$3]=1
        }
}
END {
        for ( i in file1 ) {
                print file1[i], substr(str2[i],2), substr(str3[i],2)
        }
}' file1 file2


Last edited by chihung; 02-15-2012 at 10:25 PM..
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Perl script for extract data from xml files

Hi All, Prepare a perl script for extracting data from xml file. The xml data look like as AC StartTime="1227858839" ID="88" ETime="1227858837" DSTFlag="false" Type="2" Duration="303" /> <AS StartTime="1227858849" SigPairs="119 40 98 15 100 32 128 18 131 23 70 39 123 20 120 27 100 17 136 12... (3 Replies)
Discussion started by: allways4u21
3 Replies

2. Shell Programming and Scripting

extract the relevant data files for a quarter

CTB_KT_OllyotLvos_20081204_164352_200811.txt CTB_KT_LN_utahfwd_20081204_164352_200811.txt CTB_KT_LN_utahfwd_Summ_20081204_164352_200811.txt CTB_KT_PML_astdt_prFr_20081204_210153_200811.txt CTB_KT_PML_astdt_prOt_20081204_210153_200811.txt CTB_KT_PML_astdt_Nopr_20081204_210153_200811.txt... (7 Replies)
Discussion started by: w020637
7 Replies

3. UNIX for Dummies Questions & Answers

AWK, extract data from multiple files

Hi, I'm using AWK to try to extract data from multiple files (*.txt). The script should look for a flag that occurs at a specific position in each file and it should return the data to the right of that flag. I should end up with one line for each file, each containing 3 columns:... (8 Replies)
Discussion started by: Liverpaul09
8 Replies

4. Shell Programming and Scripting

How to extract data from indexed files (ISAM files) maintained in an unix server.

Hi, Could someone please assist on a quick way of How to extract data from indexed files (ISAM files) maintained in an UNIX(AIX) server.The file data needs to be extracted in flat text file or CSV or excel format . Usually we have programs in microfocus COBOL to extract data, but would like... (2 Replies)
Discussion started by: devina
2 Replies

5. Shell Programming and Scripting

extract data with awk from html files

Hello everyone, I'm new to this forum and i am new as a shell scripter. my problem is to have html files in a directory and I would like to extract from these some data that lies between two different lines Here's my situation <td align="default"> oxidizability (mg / l): data_to_extract... (6 Replies)
Discussion started by: sbobotex
6 Replies

6. Shell Programming and Scripting

Extract data with awk and write to several files

Hi! I have one file with data that looks like this: 1 data data data data 2 data data data data 3 data data data data . . . 1 data data data data 2 data data data data 3 data data data data . . . I would like to have awk to write each block to a separate file, like this: 1... (3 Replies)
Discussion started by: LinWin
3 Replies

7. Shell Programming and Scripting

How to extract information from two files with data range

Hi, I want to make a query about extracting data from two files that both have data ranges. the data that i want to extract; when there is matching between file1 column 2 is equal to file2 column2 , and file1 column 3 and column 4 is within the range of file2 columns 3 and 4. I would like rows... (1 Reply)
Discussion started by: houkto
1 Replies

8. UNIX for Dummies Questions & Answers

Extract common data out of multiple files

I am trying to extract common list of Organisms from different files For example I took 3 files and showed expected result. In real I have more than 1000 files. I am aware about the useful use of awk and grep but unaware in depth so need guidance regarding it. I want to use awk/ grep/ cut/... (7 Replies)
Discussion started by: macmath
7 Replies

9. Shell Programming and Scripting

Extract data in tabular format from multiple files

Hi, I have directory with multiple files from which i need to extract portion of specif lines and insert it in a new file, the new file will contain a separate columns for each file data. Example: I need to extract Value_1 & Value_3 from all files and insert in output file as below: ... (2 Replies)
Discussion started by: belalr
2 Replies

10. Shell Programming and Scripting

Match and extract data using two files

Hello, Using the information in file 1, I would like to extract from file2 all rows which matchs in column 3. file 1 1233 1230 1231 1232 file2 65733.00 19775.00 1220 65733.00 19793.00 1220 65733.00 19801.00 1220 65733.00 19809.00 1231 65733.00 19817.00 ... (2 Replies)
Discussion started by: jiam912
2 Replies
JOIN(1) 						      General Commands Manual							   JOIN(1)

NAME
join - relational database operator SYNOPSIS
join [ options ] file1 file2 DESCRIPTION
Join forms, on the standard output, a join of the two relations specified by the lines of file1 and file2. If file1 is `-', the standard input is used. File1 and file2 must be sorted in increasing ASCII collating sequence on the fields on which they are to be joined, normally the first in each line. There is one line in the output for each pair of lines in file1 and file2 that have identical join fields. The output line normally con- sists of the common field, then the rest of the line from file1, then the rest of the line from file2. Fields are normally separated by blank, tab or newline. In this case, multiple separators count as one, and leading separators are dis- carded. These options are recognized: -an In addition to the normal output, produce a line for each unpairable line in file n, where n is 1 or 2. -e s Replace empty output fields by string s. -jn m Join on the mth field of file n. If n is missing, use the mth field in each file. -o list Each output line comprises the fields specifed in list, each element of which has the form n.m, where n is a file number and m is a field number. -tc Use character c as a separator (tab character). Every appearance of c in a line is significant. SEE ALSO
sort(1), comm(1), awk(1) BUGS
With default field separation, the collating sequence is that of sort -b; with -t, the sequence is that of a plain sort. The conventions of join, sort, comm, uniq, look and awk(1) are wildly incongruous. JOIN(1)
All times are GMT -4. The time now is 04:09 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy