Re: [R] data.frame transformation

From: andrija djurovic <djandrija_at_gmail.com>
Date: Tue, 15 Mar 2011 07:48:29 +0100

Thank you, Bill, for this additional solution.

Andrija

On Tue, Mar 15, 2011 at 12:16 AM, <Bill.Venables_at_csiro.au> wrote:

> It is possible to do it with numeric comparisons, as well, but to make life
> comfortable you need to turn off the warning system temporarily.
>
> df <- data.frame(q1 = c(0,0,33.33,"check"),
>                  q2 = c(0,33.33,"check",9.156),
>                  q3 = c("check","check",25,100),
>                  q4 = c(7.123,35,100,"check"))
>
> conv <- function(x, cutoff) {
>   oldOpt <- options(warn = -1)
>   on.exit(options(oldOpt))
>   x <- as.factor(x)
>   lev <- as.numeric(levels(x))
>   levels(x)[!is.na(lev) & lev < cutoff] <- "."
>   x
> }
>
> Check:
> > (df1 <- data.frame(lapply(df, conv, cutoff = 10)))
>      q1    q2    q3    q4
> 1     .     . check     .
> 2     . 33.33 check    35
> 3 33.33 check    25   100
> 4 check     .   100 check
> >
>
> Bill Venables.
>
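A minimal sketch of the same numeric-comparison idea that avoids changing the global `warn` option: `suppressWarnings()` silences only the expected "NAs introduced by coercion" warnings from `as.numeric()`, and assigning into `df[]` keeps the result a data.frame with its original row names. It assumes the columns are character (or factor) vectors; the helper name `dot_small` is hypothetical and not part of Bill's code.

```r
# Hypothetical helper: replace cells that parse as numbers below `cutoff` with "."
dot_small <- function(x, cutoff = 10) {
  x <- as.character(x)                    # use the labels, never factor level codes
  num <- suppressWarnings(as.numeric(x))  # text such as "check" becomes NA, silently
  x[!is.na(num) & num < cutoff] <- "."    # only genuine numbers below the cutoff
  x
}

df <- data.frame(q1 = c(0, 0, 33.33, "check"),
                 q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100),
                 q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)

# df[] <- ... replaces the columns in place, so df stays a data.frame with row names 1:4
df[] <- lapply(df, dot_small, cutoff = 10)
print(df)
```

Unlike Bill's version, the columns stay character rather than factor, which keeps later string handling simple.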
> -----Original Message-----
> From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of David Winsemius
> Sent: Tuesday, 15 March 2011 6:29 AM
> To: andrija djurovic
> Cc: r-help_at_r-project.org
> Subject: Re: [R] data.frame transformation
>
>
> On Mar 14, 2011, at 3:51 PM, andrija djurovic wrote:
>
> > I would like to hide cells with values less than 10%, so whether I use
> > "." or just "" makes no difference to me. I also tried apply combined
> > with as.character:
> >
> > apply(df, 2, function(x) ifelse(as.character(x) < 10,".",x))
> >
> > This is probably not a good solution, but it works, except that I
> > lose the row names; that is why I was wondering whether there is some
> > other way to do this.
> >
> > Anyway, thank you both. I will try to do this before combining numbers
> > and strings.
>
> I saw your later assertion that it didn't work, which surprised me. My
> version of your data followed my advice not to use factors, and your
> code did run without error when the columns were character rather than
> factor. I put back the row numbers by coercing the result back to a
> data.frame, since `apply` returns a matrix.
>
> > df <- data.frame(q1=c(0,0,33.33,"check"), q2=c(0,33.33," check",9.156),
> + q3=c("check","check",25,100), q4=c(7.123,35,100,"check"),
> + stringsAsFactors=FALSE)
> > as.data.frame(apply(df, 2, function(x) ifelse(as.character(x) < 10, ".", x)))
>      q1    q2    q3    q4
> 1     .     . check 7.123
> 2     . 33.33 check    35
> 3 33.33     .    25   100
> 4 check 9.156   100 check
>
> The danger of relying on character collation is that any string whose
> leading character sorts below "1", such as a <blank> or some
> punctuation, will get "dotted".
>
> > "," < "1"
> [1] TRUE
> > "." < "1"
> [1] TRUE
> > "-" < "1"
> [1] TRUE
>
> And "1.check" would also get "dotted":
>
> > "1.check" < 10
> [1] TRUE
>
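The output above also shows the opposite failure: 7.123 in q4 and 9.156 in q2 were not dotted. With `<` on a character vector, the number 10 is coerced to the string "10" and compared character by character, and "7" and "9" sort after "1". A small sketch contrasting string and numeric comparison on a few of the thread's values (the vector `vals` is only an illustration):

```r
vals <- c("0", "7.123", "9.156", "33.33", "100")

# String comparison: 10 is coerced to "10" and compared as text
as_string <- vals < 10

# Numeric comparison: parse the strings first, then compare numbers
as_number <- as.numeric(vals) < 10

print(data.frame(vals, as_string, as_number))
```

Only the numeric version flags 7.123 and 9.156. How punctuation and letters sort against digits also depends on the locale's collation, which is one more reason to compare numbers as numbers.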
> >
> > Andrija
> >
> > On Mon, Mar 14, 2011 at 8:11 PM, David Winsemius <dwinsemius_at_comcast.net> wrote:
> >
> > On Mar 14, 2011, at 2:52 PM, andrija djurovic wrote:
> >
> > Hi R users,
> >
> > I have the following data frame:
> >
> > df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33,"check",9.156),
> > q3=c("check","check",25,100),q4=c(7.123,35,100,"check"))
> >
> > and I would like to replace every element that is less than 10
> > with . (dot) in order to obtain this:
> >
> >      q1    q2    q3    q4
> > 1     .     . check     .
> > 2     . 33.33 check    35
> > 3 33.33 check    25   100
> > 4 check     .   100 check
> >
> > I had a lot of difficulties because each variable is a factor.
> >
> > Right, so comparisons with "<" will not work: on a factor, R warns
> > that '<' is not meaningful for factors and returns NA. I would
> > sidestep the factor problem with stringsAsFactors=FALSE in the
> > data.frame call. You might want to reconsider the "." as a missing
> > value. If you are coming from a SAS background, you should try to
> > get comfortable with NA or NA_character_ as a value.
> >
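To see why factors get in the way here, a short sketch (the factor `f` is illustrative): `<` on a factor returns NA with a warning, and `as.numeric()` applied directly to a factor returns the internal level codes rather than the printed values, so the labels have to go through `as.character()` first.

```r
f <- factor(c("0", "33.33", "check"))

# '<' is not meaningful for factors: R warns and returns NA
cmp <- suppressWarnings(f < 10)

# as.numeric() on a factor gives the level codes 1, 2, 3, not the values
codes <- as.numeric(f)

# Convert the labels first; "check" becomes NA
vals <- suppressWarnings(as.numeric(as.character(f)))

print(cmp)
print(codes)
print(vals)
```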
> >
> > df<-data.frame(q1=c(0,0,33.33,"check"),q2=c(0,33.33,"check",9.156),
> > q3=c("check","check",25,100),q4=c(7.123,35,100,"check"),
> > stringsAsFactors=FALSE)
> >
> > is.na(df) <- t(apply(df, 1, function(x) as.numeric(x) < 10))
> >
> > Warning messages:
> > 1: In FUN(newX[, i], ...) : NAs introduced by coercion
> > 2: In FUN(newX[, i], ...) : NAs introduced by coercion
> > 3: In FUN(newX[, i], ...) : NAs introduced by coercion
> > 4: In FUN(newX[, i], ...) : NAs introduced by coercion
> > > df
> >      q1    q2    q3    q4
> > 1  <NA>  <NA> check  <NA>
> > 2  <NA> 33.33 check    35
> > 3 33.33 check    25   100
> > 4 check  <NA>   100 check
> >
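The `t(apply(df, 1, ...))` call above builds the NA mask row by row: `apply` returns each row's result as a column, hence the transpose, and the coercion warnings come from `as.numeric("check")`. A column-wise sketch of the same `is.na<-` approach, assuming character columns as above, uses `sapply()` over the columns and `!is.na(num) &` so that the mask contains only TRUE/FALSE:

```r
df <- data.frame(q1 = c(0, 0, 33.33, "check"),
                 q2 = c(0, 33.33, "check", 9.156),
                 q3 = c("check", "check", 25, 100),
                 q4 = c(7.123, 35, 100, "check"),
                 stringsAsFactors = FALSE)

# Logical matrix with the same shape as df: TRUE where a cell is a number below 10
small <- sapply(df, function(x) {
  num <- suppressWarnings(as.numeric(x))
  !is.na(num) & num < 10
})

# Set those cells to NA (NA_character_ in these character columns)
is.na(df) <- small
print(df)
```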
> >
> > Could someone help me with this?
> >
> > Thanks in advance for any help.
> >
> > Andrija
> >
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius, MD
> > West Hartford, CT
> >
> >
>
> David Winsemius, MD
> West Hartford, CT
>
>




Received on Tue 15 Mar 2011 - 06:59:24 GMT
