Re: [R] how to complete this task on data management

From: Petr Pikal <petr.pikal_at_precheza.cz>
Date: Wed 23 Aug 2006 - 23:03:57 EST

Hi

This is a little bit more precise. My suggestion works with unordered data and finds the row index at which the values drop below a threshold for the second time.

which(diff(cumsum(diff(data<3.5)==1)<2)!=0)+2
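
To see where the +2 comes from, the one-liner can be unpacked step by step. This is only a sketch on a made-up unordered vector that starts above the threshold; x, threshold and the intermediate names are illustrative and not part of the original data.

```r
# Hypothetical unordered vector (assumption, not from the thread)
x <- c(4, 5, 1, 6, 7, 2, 3)
threshold <- 3.5

below   <- x < threshold      # FALSE FALSE TRUE FALSE FALSE TRUE TRUE
rises   <- diff(below) == 1   # TRUE where a value at or above the threshold
                              # is followed by one below it
n_rises <- cumsum(rises)      # number of such drops seen so far
before2 <- n_rises < 2        # TRUE until the second drop is reached

# Each diff() shortens the vector by one element, so the position found
# here has to be shifted by 2 to point back into x.
idx <- which(diff(before2) != 0) + 2
idx                           # 6, the start of the second run below 3.5
```

If the vector never drops below the threshold a second time, which() returns an empty result, so it is worth checking length(idx) before using it.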

However, with ordered data you need to modify it slightly, so that every crossing of the threshold is counted, not only the drops below it:

which(diff(cumsum(diff(data<3.5)!=0)<2)!=0)+2
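
As a quick check, here is the modified expression applied to the example data frame from the original post, followed by one possible way to keep only the rows before the index. The names idx, idx_rises_only and kept are illustrative.

```r
# Example data from the original post
data <- data.frame(x = c(1:5, 1, 2, 3))

# Counting only drops below the threshold (== 1) finds nothing here,
# because the data start below 3.5 and drop below it only once more.
idx_rises_only <- which(diff(cumsum(diff(data < 3.5) == 1) < 2) != 0) + 2
length(idx_rises_only)   # 0

# Counting every crossing (!= 0) finds the start of the trailing run.
idx <- which(diff(cumsum(diff(data < 3.5) != 0) < 2) != 0) + 2
idx                      # 6

# Keep the first (idx - 1) rows; drop = FALSE keeps a data frame
# even though there is only one column.
kept <- data[seq_len(idx - 1), , drop = FALSE]
kept
```

This assumes exactly the pattern in the example: the data start below the threshold, rise above it, and then fall below it again.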

I bet there is some other solution
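
One such alternative, sketched here with rle(), looks only at the last run of values below the threshold. The function name trailing_run_start is made up for illustration, and the sketch assumes there are no missing values in x.

```r
# Return the index where the trailing run of values below 'threshold'
# starts, or NA if the vector does not end with such a run.
# (Hypothetical helper; assumes x contains no NA.)
trailing_run_start <- function(x, threshold) {
  runs <- rle(x < threshold)          # run lengths of TRUE/FALSE
  n <- length(runs$lengths)
  if (n == 0 || !runs$values[n]) return(NA_integer_)
  length(x) - runs$lengths[n] + 1L
}

x <- c(1:5, 1, 2, 3)
idx <- trailing_run_start(x, 3.5)
idx                                   # 6
x[seq_len(idx - 1)]                   # 1 2 3 4 5
```

Unlike the diff/cumsum construction, this does not depend on how many times the data cross the threshold earlier in the vector.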

HTH
Petr

On 23 Aug 2006 at 19:23, zhijie zhang wrote:

Date sent:      	Wed, 23 Aug 2006 19:23:49 +0800
From:           	"zhijie zhang" <epistat@gmail.com>
To:             	"Petr Pikal" <petr.pikal@precheza.cz>
Subject:        	Re: [R] how to complete this task on data management


> Dear friends,
> I'd like to explain it clearly.
>   x
> 1 1
> 2 2
> 3 3
> 4 4
> 5 5
> 6 1
> 7 2
> 8 3
> I want to retain the first part of the dataset (1,2,3,4,5) if the
> continuous data (1,2,3) in the latter part of the dataset is less than
> 3.5. In fact, I want to know the row index (it's 6 in this dataset) at
> which the latter part drops below 3.5. My dataset is very large, so I
> need to find the index automatically. My idea is:
> First: find the continuous data in the latter part of the dataset
> that is less than a certain value (here it's 3.5).
>   x
> 6 1
> 7 2
> 8 3
>
> Second: identify the index (here it's 6) which corresponds to the
> first value in the latter part of the dataset.
>   x
> 6 1
> Finally: select the first (index-1) rows (6-1=5).
>   x
> 1 1
> 2 2
> 3 3
> 4 4
> 5 5
>
> Thanks very much.
>
>
> On 8/23/06, Petr Pikal <petr.pikal@precheza.cz> wrote:
> >
> > Hi
> >
> > I am not sure what you really want. If you are trying to preserve the
> > first part of your object, just exclude it from the operation, e.g.
> >
> > data[-(1:5),] will exclude the first five rows from your data frame.
> >
> > However, it is unclear what you want to do next. Instead of the three
> > items, do you want to add only one different item?
> >
> > data.frame(x=c(data[(1:5),],6))
> >
> > or another vector
> >
> > data.frame(x=c(data[(1:5),],some.other.data))
> >
> > The following, probably too complicated, construction tells you the
> > position of the second value lower than some threshold (in this
> > case 3.5) in a vector.
> >
> > which(diff(cumsum(diff(data<3.5)==1)<2)!=0)+2
> >
> > HTH
> > Petr
> >
> >
> >
> > On 23 Aug 2006 at 11:23, zhijie zhang wrote:
> >
> > Date sent: Wed, 23 Aug 2006 11:23:03 +0800
> > From: "zhijie zhang" <epistat@gmail.com>
> > To: R-help@stat.math.ethz.ch
> > Subject: [R] how to complete this task on data
> > management
> >
> > > Dear friends,
> > > While cleaning my dataset, I ran into a difficulty.
> > > Suppose my dataset is:
> > > > data<-data.frame(x=c(1:5,1,2,3))
> > > > data
> > >   x
> > > 1 1
> > > 2 2
> > > 3 3
> > > 4 4
> > > 5 5
> > > 6 1
> > > 7 2
> > > 8 3
> > > Now I need to add the data which are less than 3.5 at the bottom,
> > > not including the top data, so the result should be:
> > >   x
> > > 1 1
> > > 2 2
> > > 3 3
> > > 4 4
> > > 5 5
> > > 6 6
> > > I tried to use "data[data$x>3.5,]" to do it, but it also deletes
> > > the first several numbers. How can I finish it? Thanks very much.
> > > -- Kind Regards, Zhi Jie,Zhang
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html and provide commented,
> > > minimal, self-contained, reproducible code.
> >
> > Petr Pikal
> > petr.pikal@precheza.cz
> >
> >
>
>
> --
> Kind Regards,
> Zhi Jie,Zhang ,PHD
> Department of Epidemiology
> School of Public Health
> Fudan University
> Tel:86-21-54237149
>

Petr Pikal
petr.pikal@precheza.cz



Received on Wed Aug 23 23:08:31 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Thu 24 Aug 2006 - 00:21:47 EST.
