From: Ram H. Sharma <sharma.ram.h_at_gmail.com>

Date: Fri, 18 Mar 2011 13:01:53 -0400

R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Fri 18 Mar 2011 - 21:05:15 GMT

I wish there were a simpler solution to this apparently simple problem! Thanks for the help.
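
The list-based approach suggested in the quoted message below can be kept fairly short. What follows is only a minimal sketch under stated assumptions: the seed, the object names `extremes` and `long`, and the output file names are illustrative and not from the thread, and the `na.rm` handling is an addition to the `fout` function posted by Dennis. It keeps the extreme values in a list, converts the list to a long two-column data frame with `stack()`, and writes both a CSV file and a plain-text copy of the printed output.

```r
# Simulated data, as in the original post (the seed is an assumption,
# added only so the sketch is reproducible).
set.seed(1)
datafr1 <- data.frame(var1 = rnorm(500, 10, 4),
                      var2 = rnorm(500, 20, 8),
                      var3 = rnorm(500, 30, 18),
                      var4 = rnorm(500, 40, 20))

# fout() as posted in the thread, with NA handling added (assumption).
fout <- function(x) {
  lim <- median(x, na.rm = TRUE) + c(-2, 2) * mad(x, na.rm = TRUE)
  x[!is.na(x) & (x < lim[1] | x > lim[2])]
}

# lapply() always returns a list, so differing lengths are not a problem.
extremes <- lapply(datafr1, fout)

# Long format: one row per extreme value, plus the variable it came from.
long <- stack(extremes)
names(long) <- c("value", "variable")

# Hypothetical output locations; replace with real paths as needed.
csv_file <- file.path(tempdir(), "extremes.csv")
txt_file <- file.path(tempdir(), "extremes.txt")

write.csv(long, csv_file, row.names = FALSE)       # for further manipulation
capture.output(print(extremes), file = txt_file)   # what the screen would show
```

The long format sidesteps the "differing number of rows" error because every variable contributes as many rows as it has extreme values, and `split(long$value, long$variable)` recovers the list if it is needed again.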

On Fri, Mar 18, 2011 at 10:04 AM, jim holtman <jholtman_at_gmail.com> wrote:

> I think it was suggested that you save your output to a 'list'; then
> you will have it in a format that can accept a variable number of
> items in each element, and in a form that you can easily process to
> create whatever other output you might need.
**>
**> On Fri, Mar 18, 2011 at 7:24 AM, Ram H. Sharma <sharma.ram.h_at_gmail.com> wrote:
**> > Hi Dennis and R-users
**> >
**> > Thank you for more help. I am pretty close, but the remaining challenge is
**> > forcing output of differing lengths into a single data frame.
**> >
**> >> x <- data.frame(apply(datafr1, 2, fout))
**> > Error in data.frame(var1 = c(-0.70777998321315, 0.418602152926712,
**> > 2.08356737154810, :
**> > arguments imply differing number of rows: 28, 12, 20, 19
**> >
**> > As I need to work with >2000 variables, my intention is to save this
**> > output in a form that can be manipulated further. The main goal is to save
**> > the extreme values for each variable in a data frame; the secondary goal is
**> > to automatically write the output printed on the screen to a text file.
**> >
**> > Thank you for help once again.
**> >
**> > Ram
**> >
**> >
**> > On Fri, Mar 18, 2011 at 3:16 AM, Dennis Murphy <djmuser_at_gmail.com> wrote:
**> >
**> >> Hi:
**> >>
**> >> Is this what you're after?
**> >>
**> >> fout <- function(x) {
**> >>     # limits: median -/+ 2 * MAD
**> >>     lim <- median(x) + c(-2, 2) * mad(x)
**> >>     # keep only the values outside those limits
**> >>     x[x < lim[1] | x > lim[2]]
**> >> }
**> >> > apply(datafr1, 2, fout)
**> >> $var1
**> >> [1] 17.5462078 18.4548214 0.7083442 1.9207578 -1.2296787 17.4948240
**> >> [7] 19.5702558 1.6181150 20.9791652 -1.3542099 1.8215087 -1.0296303
**> >> [13] 20.5237930 17.5366497 18.5657566 0.9335419 19.7519983 17.8607968
**> >> [19] 19.1307524 19.6145711 21.8037136 19.1532175 -2.6688409 19.6949309
**> >> [25] 1.9712347
**> >>
**> >> $var2
**> >> [1] 37.3822087 35.6490641 35.6000785 38.5981086 -1.6504275 37.1419290
**> >> [7] 37.7605230 40.3508689 0.6639900 2.4695841 38.8209491 39.9087921
**> >> [13] 38.9907585 35.8279437 2.7870799 37.0941113 0.6308583 36.4556638
**> >> [19] -10.2384849 2.8480199 -7.7680457 35.7076539 -0.5467739 3.4702765
**> >> [25] 40.4818580 3.2864273 1.4917174
**> >>
**> >> $var3
**> >> [1] 74.252563 68.396391 68.845461 -5.006545 66.083402 76.036577
**> >> [7] 75.112586 -6.374241 63.883549 64.041216 -19.764360 -15.051017
**> >> [13] -9.782767 64.696013 70.970648 -4.562031 -22.135003 70.549310
**> >> [19] 69.495915 -4.095587 86.612375 87.029526 70.072126 -6.421695
**> >> [25] 65.737536
**> >>
**> >> $var4
**> >> [1] 81.476483 87.098767 -10.451616 91.927329 86.588952 85.080950
**> >> [7] 84.958645 -9.456368 86.270876 -22.936779 83.314032
**> >>
**> >> Double checks:
**> >> > apply(datafr1, 2, function(x) median(x) + c(-2, 2) * mad(x))
**> >>          var1      var2      var3      var4
**> >> [1,]  2.12167  3.779415 -3.736066 -3.471752
**> >> [2,] 17.37176 34.929800 62.969733 80.224799
**> >> > apply(datafr1, 2, range)
**> >>           var1      var2      var3      var4
**> >> [1,] -2.668841 -10.23848 -22.13500 -22.93678
**> >> [2,] 21.803714  40.48186  87.02953  91.92733
**> >>
**> >> Assuming you wanted to do this columnwise (by variable), it appears to
**> >> be doing the right thing.
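
The double checks above can also be done programmatically instead of by eye. The following is a minimal sketch, assuming simulated data like that in the original post; the seed and the names `limits`, `ok` and `counts` are illustrative and not from the thread. It uses `lapply()` rather than `apply()`, because `apply()` returns a list only when the results differ in length and a matrix when they happen to be equal, whereas `lapply()` returns a list in every case.

```r
# Simulated data as in the original post (the seed is an assumption).
set.seed(123)
datafr1 <- data.frame(var1 = rnorm(500, 10, 4),
                      var2 = rnorm(500, 20, 8),
                      var3 = rnorm(500, 30, 18),
                      var4 = rnorm(500, 40, 20))

# fout() exactly as posted in the thread.
fout <- function(x) {
  lim <- median(x) + c(-2, 2) * mad(x)
  x[x < lim[1] | x > lim[2]]
}

extremes <- lapply(datafr1, fout)

# 2 x 4 matrix: row 1 holds the lower limits, row 2 the upper limits.
limits <- sapply(datafr1, function(x) median(x) + c(-2, 2) * mad(x))

# For each variable, check that every returned value lies outside its limits.
ok <- mapply(function(v, lo, hi) all(v < lo | v > hi),
             extremes, limits[1, ], limits[2, ])

# Number of extreme values per variable.
counts <- lengths(extremes)

print(ok)
print(counts)
```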
**> >>
**> >> HTH,
**> >> Dennis
**> >>
**> >>
**> >> On Thu, Mar 17, 2011 at 7:04 PM, Ram H. Sharma <sharma.ram.h_at_gmail.com> wrote:
**> >>
**> >>> Dear R community members
**> >>>
**> >>> I have been struggling with this simple question but have not found an
**> >>> appropriate solution, so please help.
**> >>>
**> >>> # my data, though I have a large number of variables
**> >>> var1 <- rnorm(500, 10,4)
**> >>> var2 <- rnorm(500, 20, 8)
**> >>> var3 <- rnorm(500, 30, 18)
**> >>> var4 <- rnorm(500, 40, 20)
**> >>> datafr1 <- data.frame(var1, var2, var3, var4)
**> >>>
**> >>> # my unsuccessful code
**> >>> nvar <- ncol(datafr1)
**> >>> for (i in 1:nvar) {
**> >>>     out1 <- NULL
**> >>>     out2 <- NULL
**> >>>     medianx <- median(getdata[,i], na.rm = TRUE)
**> >>>     show(madx <- mad(getdata[,i], na.rm = TRUE))
**> >>>     MD1 <- c(medianx + 2*madx)
**> >>>     MD2 <- c(medianx - 2*madx)
**> >>>     out1[i] <- which(getdata[,i] > MD1)  # store data that are greater than median + 2 mad
**> >>>     out2[i] <- which (getdata[,1] < MD2) # store data that are less than median - 2 mad
**> >>>     resultdf <- data.frame(out1, out2)
**> >>>     write.table (resultdf, "out.csv", sep=",")
**> >>> }
**> >>>
**> >>>
**> >>> My idea here is to store those values that are either greater than
**> >>> median + 2*MAD or less than median - 2*MAD. Each variable has a
**> >>> different length of output.
**> >>>
**> >>> The following is the last error message:
**> >>> Error in data.frame(out1, out2) :
**> >>> arguments imply differing number of rows: 2, 0
**> >>> In addition: Warning messages:
**> >>> 1: In out1[i] <- which(getdata[, i] > MD1) :
**> >>> number of items to replace is not a multiple of replacement length
**> >>> 2: In out2[i] <- which(getdata[, 1] < MD2) :
**> >>> number of items to replace is not a multiple of replacement length
**> >>> 3: In out1[i] <- which(getdata[, i] > MD1) :
**> >>> number of items to replace is not a multiple of replacement length
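
The warnings above arise because `out1[i]` is a single slot while `which()` returns a whole vector of row indices, so only the first index is kept; in addition, `out1` and `out2` are reset to `NULL` on every pass through the loop, and the second test reads column 1 instead of column `i`. A loop-based version that avoids these problems might look like the sketch below. It is only an illustration: it assumes that `getdata` in the post refers to the same data frame as `datafr1`, and the seed and the names `out_high` and `out_low` are hypothetical.

```r
# Simulated data as in the original post (the seed is an assumption).
set.seed(42)
datafr1 <- data.frame(var1 = rnorm(500, 10, 4),
                      var2 = rnorm(500, 20, 8),
                      var3 = rnorm(500, 30, 18),
                      var4 = rnorm(500, 40, 20))

nvar <- ncol(datafr1)

# Create the containers once, before the loop, as lists: each element
# can then hold an index vector of any length.
out_high <- vector("list", nvar)
out_low  <- vector("list", nvar)
names(out_high) <- names(out_low) <- names(datafr1)

for (i in seq_len(nvar)) {
  medianx <- median(datafr1[, i], na.rm = TRUE)
  madx    <- mad(datafr1[, i], na.rm = TRUE)
  # [[i]] stores the whole vector of row numbers in element i.
  out_high[[i]] <- which(datafr1[, i] > medianx + 2 * madx)
  out_low[[i]]  <- which(datafr1[, i] < medianx - 2 * madx)
}

# Row numbers of the high and low extremes for the first variable.
str(out_high$var1)
str(out_low$var1)
```

Storing row numbers rather than values makes it possible to look up the full records later, for example with `datafr1[out_high$var1, ]`.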
**> >>>
**> >>> Thank you in advance for helping me.
**> >>>
**> >>> Best regards;
**> >>> RHS
**> >>>
**> >>
**> >>
**> >
**> >
**>
**>
**>
**> --
**> Jim Holtman
**> Data Munger Guru
**>
**> What is the problem that you are trying to solve?
**>

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Fri 18 Mar 2011 - 21:10:23 GMT.
