[R] Help with isolating and comparing data from two files.

From: ajn21 <ajn21_at_case.edu>
Date: Sun, 22 May 2011 21:00:10 -0700 (PDT)


I was hoping someone could help me, or at least point me in the right direction, with a problem I am having. I am a new R user, and the tutorials I've been reading haven't been much help so far.

The problem is relatively simple as I've already created working solutions in Java and Perl, but I need a solution in R as well.

I have two text files, say pos.txt and reg.txt. In pos.txt, the data looks like this, for example:

c22 1445 - CG 1 4
c22 1542 + CG 2 3
c22 1678 + CG 13 15

etc. for thousands of lines. The most important column is column 2, which lists "position" (e.g. 1445, 1542, 1678). In reg.txt, data is listed as:

c22 1440 1500 cpg: 44 56 ......
c22 1520 1700 cpg: 56 87 ......
c22 1800 1900 cpg: 58 90 ......


where the values in column 2 are the "start" positions and the values in column 3 are the "end" positions. There are 10 columns in total, but I have listed only the first few. Also, the two text files have different numbers of lines.

Essentially, I need to take each position listed in column 2 of pos.txt and find the region in reg.txt (based on its start and end positions) that contains it. Then I need to print:

c22 "start" "end" "position" + 1 5

where the last 3 columns also come from pos.txt (i.e. the lines don't all end in + 1 5; each one ends with the values from the corresponding columns of the matching pos.txt line). Also, the position needs to fall between the start and end positions.

So far I've been able to use read.table to create a data frame for each text file, and I've also named each column (e.g. reg.data$end) and I can output each column individually. However, the problem I keep facing is how to compare the numbers for "position" in pos.txt to the numbers for "start" and "end" in reg.txt. I tried to use:

if ((pos >= start) | (pos <= end))..

but an error comes up that says the files aren't the same length.

In Java and Perl I used nested loops to cycle through each element in one file, and compare it to every element in the other file, and then printed to a new text file. As such, I was trying to learn a bit more about arrays in R, but if you know of a better way in R to do this then please let me know.
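
As a rough illustration of the comparison described above, the nested loops could be replaced in R by a vectorized all-pairs test built with outer(). This is only a sketch under stated assumptions: the data frames are toy copies of the example rows in this post, the column names (chr, pos, strand, type, n1, n2, start, end) are made up because the post does not give them, and the choice of which three pos.txt columns go into the output is a guess. Note that it uses & rather than |, since a position inside a region must satisfy both bounds at once.

```r
# Toy data copied from the example rows above. All column names are
# assumptions; in practice these frames would come from read.table().
pos.data <- data.frame(
  chr    = c("c22", "c22", "c22"),
  pos    = c(1445, 1542, 1678),
  strand = c("-", "+", "+"),
  type   = c("CG", "CG", "CG"),
  n1     = c(1, 2, 13),
  n2     = c(4, 3, 15),
  stringsAsFactors = FALSE
)
reg.data <- data.frame(
  chr   = c("c22", "c22", "c22"),
  start = c(1440, 1520, 1800),
  end   = c(1500, 1700, 1900),
  stringsAsFactors = FALSE
)

# inside[i, j] is TRUE when position i lies within region j on the same
# chromosome. outer() compares every position with every region, so the
# two inputs may have different lengths.
inside <- outer(pos.data$pos, reg.data$start, ">=") &
  outer(pos.data$pos, reg.data$end, "<=") &
  outer(pos.data$chr, reg.data$chr, "==")

# Row (position) and column (region) index of every matching pair.
hits <- which(inside, arr.ind = TRUE)

# Assemble the output: region columns first, then the pos.txt columns.
# Which pos.txt columns belong here is an assumption.
result <- data.frame(
  chr      = reg.data$chr[hits[, "col"]],
  start    = reg.data$start[hits[, "col"]],
  end      = reg.data$end[hits[, "col"]],
  position = pos.data$pos[hits[, "row"]],
  strand   = pos.data$strand[hits[, "row"]],
  n1       = pos.data$n1[hits[, "row"]],
  n2       = pos.data$n2[hits[, "row"]],
  stringsAsFactors = FALSE
)

print(result)
```

The result could then be written out with write.table(result, "out.txt", quote = FALSE, row.names = FALSE, col.names = FALSE), where out.txt is a hypothetical file name. The outer() approach builds a matrix with one cell per position/region pair, which should be manageable for a few thousand lines in each file but would likely need a different strategy for much larger inputs.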

Any help is greatly appreciated.

Thank you,


Received on Mon 23 May 2011 - 04:37:57 GMT
