Interpolation if there is no exact match for value
Dear all, could you help me with the following question? There are two datasets (below). I need to match the BP values in data1 against the BP values in data2 and add the corresponding CM value from data2 to data1. If there is no exact match, the corresponding CM value should be calculated by interpolation. I have put more detailed steps below (these are the steps I would take if I could write the code myself; unfortunately I can't).
(data1)
(data2)
1) Take a BP value from data1 and look for an exact match in the BP column of data2.
If an exact match is found, add the corresponding CM value from data2 to data1 (as a third column).
2) If there is no exact match in data2 for the BP value from data1, then:
a) find the two nearest BP values in data2 (i.e. the closest data2 BP value below the data1 BP value and the closest one above it),
b) take the CM values that correspond to those two nearest BP values in data2 and use linear interpolation to calculate the CM value that corresponds to the BP value from data1,
c) add this interpolated CM value to data1 (as a third column).
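The steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name `add_cm` and the in-memory list layout are made up here, data2 is assumed to be sorted by BP in ascending order, and a BP value that falls outside the range of data2 gets `None` because there are no two surrounding points to interpolate between.

```python
from bisect import bisect_left

def add_cm(data1_bps, data2):
    """Return (bp, cm) pairs for every BP in data1_bps.

    data1_bps: list of BP values from data1.
    data2: list of (bp, cm) tuples, sorted by bp ascending (assumption).
    cm is None when bp lies outside the BP range covered by data2.
    """
    bps = [bp for bp, _ in data2]
    out = []
    for bp in data1_bps:
        i = bisect_left(bps, bp)  # index of first data2 BP >= bp
        if i < len(bps) and bps[i] == bp:
            # Step 1: exact match, copy the CM value.
            out.append((bp, data2[i][1]))
        elif 0 < i < len(bps):
            # Step 2: interpolate between the nearest lower and upper BP.
            bp_lo, cm_lo = data2[i - 1]
            bp_hi, cm_hi = data2[i]
            cm = cm_lo + (bp - bp_lo) * (cm_hi - cm_lo) / (bp_hi - bp_lo)
            out.append((bp, cm))
        else:
            # No lower or no upper neighbour in data2.
            out.append((bp, None))
    return out
```

With made-up values (not taken from the real datasets), `add_cm([150], [(100, 1.0), (200, 2.0)])` gives `[(150, 1.5)]`.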
I would appreciate your suggestions!
Thanks a lot in advance!
Would this help?:-
As to the interpolation, I'm a bit stuck. Are both files sorted numerically according to the first column before we start? We might be able to work on that.
Will there be a BP in data2 for every BP in data1?
Hi Robin,
1) "Are both files sorted numerically according to the first column before we start?"
BP in both datasets is in ascending order (from smaller to larger numbers).
2) "Will there be a BP in data2 for every BP in data1?"
No. Some BP values from data1 have an exact match in data2. That case is easy (I just copy the corresponding CM into data1).
But there are BP values in data1 that do not have an exact match in data2. In this case, the common practice (as I was told) is to use interpolation to calculate the CM value (using the two CM values that correspond to the two nearest BP values in data2).
E.g. we have the BP value 752721 in data1, and it has no exact match in data2.
But in data2 there are two nearest BP values (and their corresponding CM values):
So I can use those two nearest BP values (and their CM values) and the linear interpolation formula to calculate the CM value that corresponds to BP 752721 from data1. I should get 2,013145, and paste this value into data1 in the same row as 752721.
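For reference, the standard linear interpolation formula can be written as below. The subscripts lo and hi are notation introduced here for the nearest lower and nearest upper BP value in data2 and their CM values; BP is the value from data1.

```latex
\mathrm{CM} = \mathrm{CM}_{lo}
  + \left(\mathrm{BP} - \mathrm{BP}_{lo}\right)
    \cdot \frac{\mathrm{CM}_{hi} - \mathrm{CM}_{lo}}{\mathrm{BP}_{hi} - \mathrm{BP}_{lo}}
```

When BP lies exactly halfway between the two data2 values, the result is the average of the two CM values.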
So, as I understand it, this calculation should be done for all BP values in data1 that do not have an exact match in data2.
I hope my details do not add to the confusion. Thanks a lot for your help!
Please show us the expected output for the given input, with the arithmetic. With the given sample I don't think you can generate even a single value by linear interpolation: for none of the BP values in data1 is there a higher BP value in data2.
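One way to check this kind of coverage problem before interpolating is to count how many data1 BP values fall outside the BP range of data2. A minimal sketch in Python, where the function name `count_out_of_range` and the plain-list inputs are assumptions made for illustration:

```python
def count_out_of_range(data1_bps, data2_bps):
    """Count data1 BP values below the smallest / above the largest data2 BP.

    Such values have no lower or no upper neighbour in data2,
    so they cannot be linearly interpolated.
    """
    lo, hi = min(data2_bps), max(data2_bps)
    below = sum(1 for bp in data1_bps if bp < lo)
    above = sum(1 for bp in data1_bps if bp > hi)
    return below, above
```

If both counts are zero, every BP in data1 either matches exactly or sits between two data2 values.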
Yes, sorry, I gave only the 'heads' of the real input as an example, because both datasets are very long. E.g.
data1 has 61951 lines
data2 has 256895 lines.
If I extract some arbitrary lines, they look like this:
data1
data2
To Scrutinizer:
No, this is my job. I'm a biologist and have to create a reference genetic map to run a set of analyses on my SNP data.
My previous posts are also related to my work. The main problem is that I am a biologist (genetics) and am only learning to understand code (I can't yet say that I write it).
I appreciate your help. If I have done something wrong, please let me know; it was not intentional.
Okay, let's try this:-
How does that do? It doesn't work very well on the sample input (as already mentioned by others). How does it work with the real thing? It may well be a bit slow, but the logic is explicit and easy to follow. An awk script may be able to do things faster if you can code one.
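On speed: since both files are already sorted by BP, one option is to walk through them together in a single pass instead of searching data2 again for every BP. The sketch below shows that idea in Python. It is not the script from this post, the name `merge_interpolate` is made up, and reading the files is left out because it depends on the exact column layout.

```python
def merge_interpolate(data1_bps, data2_rows):
    """Yield (bp, cm) for each BP in data1_bps in one pass.

    data1_bps: iterable of BP values, sorted ascending (assumption).
    data2_rows: iterable of (bp, cm) tuples, sorted by bp ascending (assumption).
    cm is None when bp is outside the BP range of data2.
    """
    rows = iter(data2_rows)
    prev = None              # last data2 row with BP below the current bp
    cur = next(rows, None)   # first data2 row with BP >= the current bp
    for bp in data1_bps:
        # Move forward in data2 until cur reaches or passes bp.
        while cur is not None and cur[0] < bp:
            prev, cur = cur, next(rows, None)
        if cur is None or (cur[0] != bp and prev is None):
            yield bp, None                      # no upper or no lower neighbour
        elif cur[0] == bp:
            yield bp, cur[1]                    # exact match
        else:
            (b0, c0), (b1, c1) = prev, cur
            yield bp, c0 + (bp - b0) * (c1 - c0) / (b1 - b0)
```

Each line of either input is looked at only once, so the work grows with the combined length of the two files rather than with their product.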