Re: [R] reading in a subset of a large data set

From: jim holtman <jholtman_at_gmail.com>
Date: Fri, 11 Jul 2008 12:58:06 -0400

If the data you want are contiguous, then use the 'skip' argument of read.table to skip the records that come before the block, and 'nrows' to read only the number of records you want.
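
As a rough sketch of the contiguous case (the file, its tab-separated layout, and the column names 'chromosome' and 'position' are made up for illustration, not taken from the original data):

```r
# Build a small example file so the sketch is self-contained (hypothetical data).
path <- tempfile(fileext = ".txt")
example <- data.frame(chromosome = rep(c("1", "2", "X"), each = 4),
                      position   = 1:12,
                      stringsAsFactors = FALSE)
write.table(example, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read one row with the header to recover the column names.
col_names <- names(read.table(path, header = TRUE, sep = "\t", nrows = 1))

# Skip the header line plus the first 4 data rows, then read the next 4 rows.
block <- read.table(path, header = FALSE, sep = "\t",
                    skip = 1 + 4, nrows = 4,
                    col.names = col_names,
                    colClasses = c("character", "integer"))
```

Here 'block' should hold only data rows 5 through 8. Supplying 'colClasses' is optional, but it keeps the column types the same no matter which block is read.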

If you want to select a random sample, then check out http://article.gmane.org/gmane.comp.lang.r.general/78318/match=random+read

In your case, where you want to read records conditionally based on their values, you may have to read the file in chunks: read in a subset, select the records you want, and then continue reading the file. At the end, you can combine the selected records into a single data frame.
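
One way this might look, as a minimal sketch only: it assumes a tab-separated file with a header line, and the column names, chunk size, and example data are invented for illustration. The file is opened as a connection so that each read.table call continues where the previous one stopped.

```r
# Build a small example file so the sketch is self-contained (hypothetical data).
path <- tempfile(fileext = ".txt")
example <- data.frame(chromosome = rep(c("1", "2", "X"), times = 5),
                      position   = 1:15,
                      stringsAsFactors = FALSE)
write.table(example, path, sep = "\t", row.names = FALSE, quote = FALSE)

chunk_size <- 4   # rows per chunk; something much larger makes sense for real data
con <- file(path, open = "r")

# The first line holds the column names.
col_names <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]

kept <- list()
repeat {
  # Guard against an error at end of input; a short or empty chunk
  # also ends the loop below. Note this hides other read errors too.
  chunk <- tryCatch(
    read.table(con, header = FALSE, sep = "\t", nrows = chunk_size,
               col.names = col_names,
               colClasses = c("character", "integer")),
    error = function(e) NULL
  )
  if (is.null(chunk)) break
  # Keep only the records of interest from this chunk.
  kept[[length(kept) + 1]] <- chunk[chunk$chromosome == "1", , drop = FALSE]
  if (nrow(chunk) < chunk_size) break
}
close(con)

# Reconstruct a single data frame from the selected pieces.
result <- do.call(rbind, kept)
rownames(result) <- NULL
```

Only one chunk plus the selected records are in memory at any time, so the chunk size can be tuned to whatever the machine handles comfortably.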

On Fri, Jul 11, 2008 at 12:25 PM, Stacey Burrows <stacey.burrows_at_yahoo.ca> wrote:
> I have a huge dataset for which I only want to read in a subset of it. Is it possible to use read.table to read in only a subset of the data? For example, something like read.table('~/data.txt', subset = chromosome=='1' )
>
> If not, then why not? This seems to be a feature available in all other statistical software.
>
> Thanks,
> Stacey
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Received on Fri 11 Jul 2008 - 17:43:09 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 11 Jul 2008 - 18:31:16 GMT.