Re: [R] Use of Factors

From: jim holtman <jholtman_at_gmail.com>
Date: Thu, 20 Mar 2008 21:05:03 -0500

Run str() on your object and you will see that the columns are factors. They may have gotten that way when you read the data in and a column contained character data. To convert HR back to numeric, do:

cpx_interp$HR <- as.numeric(as.character(cpx_interp$HR))
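
The double conversion matters because calling as.numeric() directly on a factor returns the internal integer level codes rather than the values that are printed. A minimal sketch with made-up values (the data frame `d` and its contents are hypothetical, not the poster's data):

```r
# Hypothetical data: HR stored as a factor, as str() would report it
d <- data.frame(
  SubjId = c("A", "A", "B", "B"),
  HR     = factor(c("62.5", "67.6", "59.7", "78.6"))
)

# Direct conversion gives the integer level codes, not the heart rates
codes <- as.numeric(d$HR)

# Going through character recovers the values that are displayed
d$HR <- as.numeric(as.character(d$HR))
str(d$HR)

# With a numeric column, tapply() and mean() work as intended
mean_hr <- tapply(d$HR, d$SubjId, mean)
mean_hr
```

The help page ?factor also mentions as.numeric(levels(f))[f] as an equivalent form of the same conversion.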

On Thu, Mar 20, 2008 at 9:26 AM, Beck, Kenneth (STP) <Kenneth.Beck_at_bsci.com> wrote:
> I'm relatively new to R and am trying to do a relatively simple task. I have
> a data set with several variables arranged by SubjID and visit, with
> multiple observations for each combination. I run a linear regression on
> those observations, then generate a set of interpolated values
> from the regression at fixed intervals along "x". I now want to average
> each of those across all the SubjIDs. When I use either by() or
> tapply(), I get an error indicating the interpolated values are factors,
> even though they display like floating point numbers. The mean
> function returns a value that is obviously wrong, though the count of
> observations in the subsets is correct. I am including code snippets to
> try to demonstrate how this is all created; sorry for the length of this.
>
> Here is the output when I try to use the mean function:
> mean_interp_HR=tapply(cpx_interp$HR[cpx_interp$visit==1 &
> cpx_interp$xl==0],cpx_interp$SubjId[cpx_interp$visit==1 &
> cpx_interp$xl==0],mean)
> Warning in mean.default(X[[1L]], ...) :
> argument is not numeric or logical: returning NA
> Warning in mean.default(X[[2L]], ...) :
> argument is not numeric or logical: returning NA
> Warning in mean.default(X[[3L]], ...) :
> argument is not numeric or logical: returning NA
> Warning in mean.default(X[[4L]], ...) :
> argument is not numeric or logical: returning NA
> Warning in mean.default(X[[5L]], ...) :
> argument is not numeric or logical: returning NA
>
> Look at the data I am submitting to tapply and mean:
> > cpx_interp$HR[cpx_interp$visit==1 & cpx_interp$xl==0]
> [1] 62.5252140470478 67.6151493460742 68.3931063786315 78.6591518601803
> 59.7674671000443
> 90 Levels: 62.5252140470478 66.046907240618 69.5686004341883
> 69.8766646005142 71.9631282463843 ... 85.4270562298357
> > cpx_interp$SubjId[cpx_interp$visit==1 & cpx_interp$xl==0]
> [1] ADENPV07 ADENPVJN ADENPV0Z ADENPVM9 ADENPVMB
> Levels: ADENPV07 ADENPVJN ADENPV0Z ADENPVM9 ADENPVMB
>
> Why is the $HR variable listed as "90 levels" as if it is a factor? Why
> is it not treated as floating point to get a simple mean?
>
> Here is how the HR values are generated:
>
> # create the array
> interp_out=array(,c(18,length(cols2)))
> # create the values to interpolate to
> interp_out[,3]=c(0,25,50,75,100,125,150,175,200,0,25,50,75,100,125,150,175,200);
> # fill the visits
> interp_out[,2]=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2)
> # fill the SubjID
> interp_out[,1]=SubjID;
> # now fill in interpolated values for each visit
> interp_out[1:9,4]=hrv1;interp_out[10:18,4]=hrv2;
>
> # hrv1 & hrv2 come from the following function; the "lm" parameter is
> # output from the standard lm() function:
> interpolateToXL = function(lm,maxxl){
> int_values=matrix(nrow=9,ncol=1)
> int_values[1,]=coef(lm)[1];
> if (maxxl>25)
> int_values[2,]=coef(lm)[1]+coef(lm)[2] * 25
> if (maxxl>50)
> int_values[3,]=coef(lm)[1]+coef(lm)[2] * 50
> if (maxxl>75)
> int_values[4,]=coef(lm)[1]+coef(lm)[2] * 75
> if (maxxl>100)
> int_values[5,]=coef(lm)[1]+coef(lm)[2] * 100
> if (maxxl>125)
> int_values[6,]=coef(lm)[1]+coef(lm)[2] * 125
> if (maxxl>150)
> int_values[7,]=coef(lm)[1]+coef(lm)[2] * 150
> if (maxxl>175)
> int_values[8,]=coef(lm)[1]+coef(lm)[2] * 175
> if (maxxl>200)
> int_values[9,]=coef(lm)[1]+coef(lm)[2] * 200
> return (int_values)
> }
>
>
> Ken Beck PhD
> Research Scientist
> Boston Scientific CRM (Guidant)
> 10-212
> kenneth.beck_at_bsci.com
>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

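Judging from the quoted code, one likely source of the factors is the array itself: an R array holds a single type, so writing the character SubjID into column 1 of interp_out turns every column, including the interpolated HR values, into character strings. If that array was later converted to a data frame (a step not shown in the post, so this is an assumption), R versions before 4.0.0 would by default turn those strings into factors. A sketch with hypothetical values, building a data frame directly so that each column keeps its own type:

```r
# An array has one type for all of its cells
m <- array(NA, c(2, 2))
m[, 2] <- c(62.5, 67.6)      # array is numeric at this point
m[, 1] <- "ADENPV07"         # a character assignment coerces the whole array
typeof(m)

# Alternative: a data frame keeps a separate type per column
xl <- seq(0, 200, by = 25)                 # the nine interpolation points
out <- data.frame(
  SubjID = "ADENPV07",                     # hypothetical single subject
  visit  = rep(1:2, each = length(xl)),
  xl     = rep(xl, times = 2),
  HR     = NA_real_,                       # placeholder for hrv1 and hrv2
  stringsAsFactors = FALSE
)
str(out)
```

With this layout, the values returned by interpolateToXL() could be assigned to out$HR as numbers, and tapply() would then need no conversion.
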
-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

Received on Fri 21 Mar 2008 - 02:11:06 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 21 Mar 2008 - 04:30:24 GMT.
