Re: [R] how to skip last lines while reading the data in R

From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Date: Mon, 28 Jan 2008 15:22:13 +0000 (GMT)

On Mon, 28 Jan 2008, mrafi wrote:

> but then the number of levels would remain the same...!!

Please explain: the levels of factors are taken from the data which is actually read.
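
A small sketch of that point, using made-up inline data rather than the poster's file (the column names, values and trailer lines are all assumptions for illustration). It suggests that once the trailing lines are excluded via nrows, values occurring only in those lines should not appear among the levels:

```r
# Hypothetical data: the last two lines are trailer rows we want to skip.
txt <- c("id grp",
         "1 a",
         "2 b",
         "3 a",
         "4 TOTAL",
         "5 END")

# Read everything: 'grp' picks up the trailer values as levels.
# stringsAsFactors is set explicitly because its default differs across R versions.
all_rows <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Read all but the last two data lines (the header line is not counted by nrows).
n_data  <- length(txt) - 1L
trimmed <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE,
                      nrows = n_data - 2L)

levels(all_rows$grp)   # includes the trailer values
levels(trimmed$grp)    # only values from the rows actually read
```

If the levels still look wrong after a read like this, the unwanted values are presumably present in the rows that were kept.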

>
>
> Prof Brian Ripley wrote:
>>
>> On Mon, 28 Jan 2008, Barry Rowlingson wrote:
>>
>>> Henrique Dallazuanna wrote:
>>>> Perhaps:
>>>>
>>>> data <-
>>>> read.table(textConnection(rev(rev(readLines('data.txt'))[-(1:2)])))
>>>>
>>>
>>> Euurgh! Am I the only one whose sense of aesthetics is enraged by
>>> this? To get rid of the last two items you reverse the vector, remove
>>> the first two items, then reverse the vector again?
>>>
>>> One liners are fine for R Golf games, but in the real world, I'd get
>>> the length of the vector and cut directly off the end. Consider these:
>>>
>>> # reverse/trim/reverse:
>>> rev1 <- function(x,n=100,m=5){
>>>   for(i in 1:n){
>>>     y=rev(rev(x)[-(1:m)])
>>>   }
>>>   return(y)
>>> }
>>>
>>> # get length, trim
>>> rev2 <- function(x,n=100,m=5){
>>>   for(i in 1:n){
>>>     y=x[1:(length(x)-m)]
>>>   }
>>>   return(y)
>>> }
>>>
>>> > system.time(rev1(1:1000,10000,5))
>>> [1] 1.864 0.008 2.044 0.000 0.000
>>> > system.time(rev2(1:1000,10000,5))
>>> [1] 0.384 0.008 0.421 0.000 0.000
>>>
>>>
>>> Result: faster, more directly readable code.
>>
>> And if you know the file size, just use
>>
>> read.table('data.txt', nrows=<#file_rows>-2)
>>
>> (and wc -l will tell you the number of rows more efficiently than using a
>> text connection: if you must use a temporary home, use file() with no
>> arguments, as that is much more efficient).
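
As a rough sketch of the nrows approach, assuming a small made-up file written to a temporary path (the file contents and the variable names path, n_lines and dat are hypothetical). The line count is taken inside R with readLines() so the example is self-contained; for a large file, wc -l would avoid reading it twice:

```r
# Write a hypothetical file: three data lines followed by two trailer lines.
path <- tempfile(fileext = ".txt")
writeLines(c("1 a", "2 b", "3 a", "rows 3", "end file"), path)

# Count the lines in the file (stand-in for the number reported by 'wc -l').
n_lines <- length(readLines(path))

# Read everything except the last two lines.
dat <- read.table(path, nrows = n_lines - 2L)

nrow(dat)
unlink(path)
```

This reads the file once for counting and once for parsing, so it trades some speed for not needing an external tool.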
>>
>> --
>> Brian D. Ripley, ripley_at_stats.ox.ac.uk
>> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
>> University of Oxford, Tel: +44 1865 272861 (self)
>> 1 South Parks Road, +44 1865 272866 (PA)
>> Oxford OX1 3TG, UK Fax: +44 1865 272595
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>

-- 
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Mon 28 Jan 2008 - 15:42:39 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 28 Jan 2008 - 16:30:12 GMT.
