[R] matlab/gauss code in R

From: Sebastian Kruk <residuo.solow_at_gmail.com>
Date: Sun, 24 Jun 2007 20:27:20 -0300


Hi all!

I would like to port some MATLAB or GAUSS code to R.

Could you help me?
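To make the question concrete, here is the kind of translation involved (a hypothetical sketch; the MATLAB fragment and variable names are invented for illustration):

```r
## MATLAB:  s = 0; for i = 1:10, s = s + i^2; end
## A literal R translation of the loop:
s <- 0
for (i in 1:10) {
  s <- s + i^2
}
## Idiomatic R usually replaces such loops with vectorized code:
s2 <- sum((1:10)^2)
stopifnot(s == s2)   # both give 385
```

For reading MATLAB .mat data files (as opposed to translating code), the R.matlab package provides readMat(); GAUSS code generally has to be translated by hand.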

Bye,

Sebastián.

2007/6/23, r-help-request_at_stat.math.ethz.ch <r-help-request_at_stat.math.ethz.ch>:
> Send R-help mailing list submissions to
> r-help_at_stat.math.ethz.ch
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://stat.ethz.ch/mailman/listinfo/r-help
> or, via email, send a message with subject or body 'help' to
> r-help-request_at_stat.math.ethz.ch
>
> You can reach the person managing the list at
> r-help-owner_at_stat.math.ethz.ch
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of R-help digest..."
>
>
> Today's Topics:
>
> 1. what is "better" when combining data frames? merge vs. rbind
> & cbind (Thomas Pujol)
> 2. extract index during execution of sapply (Christian Bieli)
> 3. multiple return (Manuele Pesenti)
> 4. Re: what is "better" when combining data frames? merge vs.
> rbind & cbind (Duncan Murdoch)
> 5. Re: multiple return (Mahbub Latif)
> 6. Re: how to create cumulative histogram from two independent
> variables? (Jim Lemon)
> 7. Re: Adding exponents (superscript format) to a plot (John Kane)
> 8. Re: using lme on multiple datasets in one shot (Douglas Bates)
> 9. Re: abline plots at wrong abscissae after boxplot (S Ellison)
> 10. Boxplot issues (S Ellison)
> 11. Re: help on the use of ldBand (Frank E Harrell Jr)
> 12. Re: multiple return (ONKELINX, Thierry)
> 13. Re: extract index during execution of sapply (Martin Morgan)
> 14. vectorize a function (Robin Hankin)
> 15. Re: Tools For Preparing Data For Analysis (Kevin E. Thorpe)
> 16. Re: Data consistency checks in functions (Kuhn, Max)
> 17. Re: vectorize a function (Christos Hatzis)
> 18. Re: extract index during execution of sapply (Ben Bolker)
> 19. Re: extract index during execution of sapply (Thomas Lumley)
> 20. fitCopula (Oden, Kevin)
> 21. how to ave this? (Weiwei Shi)
> 22. fitCopula (Oden, Kevin)
> 23. Re: how to ave this? (Weiwei Shi)
> 24. Re: Result depends on order of factors in unbalanced designs
> (lme, anova)? (Prof Brian Ripley)
> 25. Re: Tools For Preparing Data For Analysis (Christophe Pallier)
> 26. Re: interpretation of F-statistics in GAMs (Simon Wood)
> 27. Re: Boxplot issues (Martin Maechler)
> 28. Re: Overlaying lattice graphs (continued) (Sébastien)
> 29. Matrix library, CHOLMOD error: problem too large (Jose Quesada)
> 30. Re: Matrix library, CHOLMOD error: problem too large
> (Duncan Murdoch)
> 31. Imputing missing values in time series (Horace Tso)
> 32. Re: Imputing missing values in time series (Erik Iverson)
> 33. Re: Imputing missing values in time series (Horace Tso)
> 34. Re: Switching X-axis and Y-axis for histogram (hadley wickham)
> 35. Re: Imputing missing values in time series (Leeds, Mark (IED))
> 36. Re: Imputing missing values in time series (Horace Tso)
> 37. Re: Overlaying lattice graphs (continued) (hadley wickham)
> 38. Re: Overlaying lattice graphs (continued) (Deepayan Sarkar)
> 39. Re: Stacked barchart color (hadley wickham)
> 40. Re: Visualize quartiles of plot line (hadley wickham)
> 41. "heatmap" color still a spectrum for binary outcomes?
> (Patrick Ayscue)
> 42. Barchart legend position (Spilak,Jacqueline [Edm])
> 43. Lattice: hiding only some strips (Michael Hoffman)
> 44. Re: Overlaying lattice graphs (continued) (Sébastien)
> 45. Bayesian Networks (Mario.Carvalho.Fernandes_at_bpi.pt)
> 46. connecting to running process possible? (Charles Cosse)
> 47. Re: Overlaying lattice graphs (continued) (hadley wickham)
> 48. RServe (java2R) question (Guanrao Chen)
> 49. Re: Lattice: hiding only some strips (Deepayan Sarkar)
> 50. interaction contrast (szhan_at_uoguelph.ca)
> 51. Re: Barchart legend position (Deepayan Sarkar)
> 52. How to run "mathematica" or "c" programs in R? (Zhang Jian)
> 53. Re: Imputing missing values in time series (Horace Tso)
> 54. Asteriscs in a plot to represent approximate size of p-values
> (Judith Flores)
> 55. merging more than two data frames (Andrew Yee)
> 56. speed issues / pros & cons: dataframe vs. matrix (Thomas Pujol)
> 57. Re: Matrix *package*, CHOLMOD error: problem too large
> (Martin Maechler)
> 58. Re: speed issues / pros & cons: dataframe vs. matrix
> (Duncan Murdoch)
> 59. Re: Lattice: hiding only some strips (Michael Hoffman)
> 60. Re: Lattice: hiding only some strips (deepayan.sarkar_at_gmail.com)
> 61. Re: how to ave this? (Mike Meredith)
> 62. Re: how to ave this? (Dimitris Rizopoulos)
> 63. Re: merging more than two data frames (Mark Wardle)
> 64. latex of ftable (Hmisc?) (Dieter Menne)
> 65. Re: latex of ftable (Hmisc?) (Chuck Cleland)
> 66. Re: Data consistency checks in functions (Mike Meredith)
> 67. Re: latex of ftable (Hmisc?) (Dieter Menne)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 21 Jun 2007 12:36:16 -0700 (PDT)
> From: Thomas Pujol <thomas.pujol_at_yahoo.com>
> Subject: [R] what is "better" when combining data frames? merge vs.
> rbind & cbind
> To: r-help_at_stat.math.ethz.ch
> Message-ID: <510958.56904.qm@web59308.mail.re1.yahoo.com>
> Content-Type: text/plain
>
> I often need to "combine" data frames, sometimes "vertically" and other times "horizontally".
>
> When it "better" to use merge? When is it better to use rbind or cbind?
>
> Are there clear pros and cons of each approach?
>
>
> ---------------------------------
>
> [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 22 Jun 2007 12:31:39 +0200
> From: Christian Bieli <christian.bieli_at_unibas.ch>
> Subject: [R] extract index during execution of sapply
> To: R help list <r-help_at_stat.math.ethz.ch>
> Message-ID: <467BA50B.60408@unibas.ch>
> Content-Type: text/plain; charset=ISO-8859-15; format=flowed
>
> Hi there
> During execution of sapply I want to extract the number of times the
> function given to sapply has been executed. I came up with:
>
> mylist <- list(a=3,b=6,c=9)
> sapply(mylist,function(x)as.numeric(gsub("[^0-9]","",deparse(substitute(x)))))
>
> This works fine, but looks quite ugly. I'm sure that there's a more
> elegant way to do this.
>
> Any suggestion?
>
> Christian
>
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 22 Jun 2007 12:37:37 +0200
> From: Manuele Pesenti <amicogodzilla_at_bruttocarattere.org>
> Subject: [R] multiple return
> To: r-help_at_stat.math.ethz.ch
> Message-ID: <200706221237.37479.amicogodzilla@bruttocarattere.org>
> Content-Type: text/plain; charset="us-ascii"
>
> Dear User,
> what's the correct way to obtain a multiple return from a function?
>
> for example creating the simple function:
>
> somma <- function (a, b) {
> c <- a+b
> return (a, b, c)
> }
>
> when I call it, it runs but returns the following output:
>
> > somma(5, 7)
> $a
> [1] 5
>
> $b
> [1] 7
>
> $c
> [1] 12
>
> Warning message:
> return multi-argomento sono deprecati in: return(a, b, c)
>
> i.e. multi-return is deprecated...
>
> thanks a lot
> best regards
> Manuele
>
> --
> Manuele Pesenti
> manuele_at_inventati.org
> amicogodzilla_at_jabber.linux.it
> http://mpesenti.polito.it
>
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 22 Jun 2007 06:40:23 -0400
> From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
> Subject: Re: [R] what is "better" when combining data frames? merge
> vs. rbind & cbind
> To: Thomas Pujol <thomas.pujol_at_yahoo.com>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID: <467BA717.7080202@stats.uwo.ca>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 21/06/2007 3:36 PM, Thomas Pujol wrote:
> > I often need to "combine" data frames, sometimes "vertically" and other times "horizontally".
> >
> > When it "better" to use merge? When is it better to use rbind or cbind?
> >
> > Are there clear pros and cons of each approach?
>
> If rbind or cbind work, use them. They are much simpler, but much less
> flexible.
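The distinction in the reply above can be seen in a short sketch (the data frames are invented for illustration):

```r
## Two frames sharing a key column
a <- data.frame(id = 1:3, x = c(10, 20, 30))
b <- data.frame(id = c(2, 3, 4), y = c("p", "q", "r"))

rbind(a, a)              # stacks rows; column names must match exactly
cbind(a, z = 7:9)        # pastes columns side by side; rows align by position
merge(a, b)              # joins on the common column 'id'; keeps ids 2 and 3 only
merge(a, b, all = TRUE)  # full outer join; unmatched cells become NA
```

So rbind/cbind trust that the frames already line up, while merge matches rows by key values.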
>
> Duncan Murdoch
>
>
>
> ------------------------------
>
> Message: 5
> Date: Fri, 22 Jun 2007 17:20:01 +0600
> From: "Mahbub Latif" <mlatif_at_isrt.ac.bd>
> Subject: Re: [R] multiple return
> To: amicogodzilla_at_bruttocarattere.org
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID:
> <5faba43d0706220420m6f34f831l7353b680925dae34@mail.gmail.com>
> Content-Type: text/plain
>
> one way --
>
>
> somma <- function (a, b) {
> c <- a+b
> return (list(a=a, b=b, c=c))
> }
>
> Mahbub.
>
> On 6/22/07, Manuele Pesenti <amicogodzilla_at_bruttocarattere.org> wrote:
> >
> > Dear User,
> > what's the correct way to obtain a multiple return from a function?
> >
> > for example creating the simple function:
> >
> > somma <- function (a, b) {
> > c <- a+b
> > return (a, b, c)
> > }
> >
> > when I call it, it runs but returns the following output:
> >
> > > somma(5, 7)
> > $a
> > [1] 5
> >
> > $b
> > [1] 7
> >
> > $c
> > [1] 12
> >
> > Warning message:
> > return multi-argomento sono deprecati in: return(a, b, c)
> >
> > i.e. multi-return is deprecated...
> >
> > thanks a lot
> > best regards
> > Manuele
> >
> > --
> > Manuele Pesenti
> > manuele_at_inventati.org
> > amicogodzilla_at_jabber.linux.it
> > http://mpesenti.polito.it
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> A H M Mahbub Latif, PhD
> Assistant Professor
> Applied Statistics
> Institute of Statistical Research and Training
> University of Dhaka, Dhaka 1000, Bangladesh
> web : http://www.isrt.ac.bd/mlatif
> ----
> Computers are like air conditioners: They stop working properly if you open
> windows.
>
> [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 6
> Date: Fri, 22 Jun 2007 21:36:29 +1000
> From: Jim Lemon <jim_at_bitwrit.com.au>
> Subject: Re: [R] how to create cumulative histogram from two
> independent variables?
> To: Jose Borreguero <borreguero_at_gmail.com>, R-help_at_stat.math.ethz.ch
> Message-ID: <467BB43D.3060004@bitwrit.com.au>
> Content-Type: text/plain; charset=us-ascii; format=flowed
>
> Jose Borreguero wrote:
> > Hi all,
> > I am extremely newbie to R. Can anybody jump-start me with any clues as to
> > how do I get a cumulative histogram from two independent variables,
> > cumhist(X,Y) ?
> > -jose
> >
> Hi Jose,
>
> Is this something like you want?
>
> var1<-sample(1:10,100,TRUE)
> var2<-sample(1:10,100,TRUE)
> barplot(rbind(hist(var1,plot=FALSE)$counts,hist(var2,plot=FALSE)$counts))
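If a truly cumulative histogram is wanted, one possibility is cumsum() over the counts (a sketch building on the example above; explicit common breaks are assumed so the two count vectors are comparable):

```r
set.seed(1)                       # invented data for illustration
var1 <- sample(1:10, 100, TRUE)
var2 <- sample(1:10, 100, TRUE)
## Force a common set of bins, then stack running totals of the counts
brk <- 0:10
h1 <- hist(var1, breaks = brk, plot = FALSE)
h2 <- hist(var2, breaks = brk, plot = FALSE)
barplot(rbind(cumsum(h1$counts), cumsum(h2$counts)),
        beside = TRUE, names.arg = brk[-1],
        legend.text = c("var1", "var2"))
```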
>
> Jim
>
>
>
> ------------------------------
>
> Message: 7
> Date: Fri, 22 Jun 2007 07:44:23 -0400 (EDT)
> From: John Kane <jrkrideau_at_yahoo.ca>
> Subject: Re: [R] Adding exponents (superscript format) to a plot
> To: Judith Flores <juryef_at_yahoo.com>, RHelp <r-help_at_stat.math.ethz.ch>
> Message-ID: <466583.57100.qm@web32802.mail.mud.yahoo.com>
> Content-Type: text/plain; charset=iso-8859-1
>
> # Using expression to add superscripts to the labels
>
> vec=c(1,10,100,1000,10000,100000,1000000,10000000)
> plot(vec,vec,log="xy", axes=F)
> axis(1, at=10^c(0,2,4,6), labels=expression(1, 10^2,
> 10^4, 10^6))
> axis(2, at=10^c(0,2,4,6), labels=expression(1, 10^2,
> 10^4, 10^6), las=1)
> box()
>
> --- Judith Flores <juryef_at_yahoo.com> wrote:
>
> > Hi,
> >
> > I need to add exponents to a label in one of the
> > axes of a plot, how can I do this?
> >
> > Thank you,
> >
> > Judith
> >
> >
> >
> >
> ____________________________________________________________________________________
> > Food fight? Enjoy some healthy debate
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> > reproducible code.
> >
>
>
>
> ------------------------------
>
> Message: 8
> Date: Fri, 22 Jun 2007 06:45:26 -0500
> From: "Douglas Bates" <bates_at_stat.wisc.edu>
> Subject: Re: [R] using lme on multiple datasets in one shot
> To: "maitra_at_iastate.edu" <maitra_at_iastate.edu>
> Cc: R-help <r-help_at_stat.math.ethz.ch>
> Message-ID:
> <40e66e0b0706220445j5b5d74d7nb46237f83f0b8432@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 6/21/07, maitra_at_iastate.edu <maitra_at_iastate.edu> wrote:
> > Dear list,
>
> > I would like to do a huge number of lme's using the same design matrix
> > (and fixed and random effects). Is it possible to do this efficiently?
> > Doing otherwise is not an option for my example.
>
> > Basically, I am wanting to do the following which is possible using lm:
>
> > X <- matrix(rnorm(50),10,5)
> > Y <- matrix(rnorm(50),10,5)
> > lm(Y~X)
>
> > with lme. Any suggestions?
>
> It would not be easy to do this. Neither lme nor lmer were designed
> to make this easy to do. There is a better chance of accomplishing
> this by creating a custom function based on the current lmer but the
> modifications required are non-trivial.
>
> This is a reasonable thing to want to accomplish and I will add it to
> the "To Do" list for lmer. However it is not something I will be able
> to get to soon.
>
>
>
> ------------------------------
>
> Message: 9
> Date: Fri, 22 Jun 2007 12:49:38 +0100
> From: "S Ellison" <S.Ellison_at_lgc.co.uk>
> Subject: Re: [R] abline plots at wrong abscissae after boxplot
> To: <r-help_at_stat.math.ethz.ch>, <bwilfley_at_tripleringtech.com>
> Message-ID: <s67bc579.034@tedmail2.lgc.co.uk>
> Content-Type: text/plain; charset=US-ASCII
>
> Boxplot positions and labels are not the same thing.
> You have groups 'called' "2", "3", "4". As factors - which is what bocplot will turn them into - they will be treated as arbitrary labels and _numbered_ 1:3 (try as.numeric(factor(x)).
>
> So your lm() used 2:4, but your plot (and abline) uses 1:3 for positions and "2" - "4" as labels.
>
> The best option used to be to plot the boxes at positions 2:4. Look at the at= parameter in boxplot.
> But that is now of little help because there is no way of overriding xlim, leaving you no alternative but to reformulate your model with an offset or something.
>
> I will take up the boxplot xlim issue separately on R-dev; it's not the only such issue.
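A sketch of the at= approach described above (invented data; this assumes a boxplot()/bxp() that honours a user-supplied xlim, which later R versions do accept through the ... argument):

```r
## Groups labelled 2, 3, 4; by default boxplot draws them at positions 1:3
x <- rep(2:4, each = 10)
y <- x + rnorm(30)
boxplot(y ~ x, at = 2:4, xlim = c(1.5, 4.5))  # boxes at their numeric values
abline(lm(y ~ x))   # the fitted line now crosses the boxes where it should
```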
>
> Steve Ellison.
>
> >>> "Brian Wilfley" <bwilfley_at_tripleringtech.com> 21/06/2007 22:44:17 >>>
> I'm trying to add lines to a plot originally made with "boxplot", but
> the lines appear in the wrong place.
>
> *******************************************************************
> This email and any attachments are confidential. Any use, co...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 10
> Date: Fri, 22 Jun 2007 13:02:20 +0100
> From: "S Ellison" <S.Ellison_at_lgc.co.uk>
> Subject: [R] Boxplot issues
> To: <r-help_at_stat.math.ethz.ch>
> Message-ID: <s67bc871.078@tedmail2.lgc.co.uk>
> Content-Type: text/plain; charset=US-ASCII
>
> Boxplot and bxp seem to have changed behaviour a bit of late (R 2.4.1). Or maybe I am mis-remembering.
>
> An annoying feature is that while at=3:6 will work, there is no way of overriding the default xlim of 0.5 to n+0.5. That prevents plotting boxes on, for example, interval scales - a useful thing to do at times. I really can see no good reason for bxp to hard-code the xlim=c(0.5, n+0.5) in the function body; it should be a parameter default conditional on horizontal=, not hard coded.
>
> Also, boxplot does not drop empty groups. I'm sure it used to. I know it is good to be able to see where a factor level is unpopulated, but it's a nuisance with fractional factorials and some nested or survey problems when many are known to be missing and are of no interest. Irrespective of whether my memory is correct, the option would be useful. How hard can it be to add a 'drop.empty=F' default to boxplot to allow it to switch?
>
> Obviously, these are things I can fix locally. But who 'owns' boxplot so I can provide suggested code to them for later releases?
>
> Steve Ellison
>
>
>
> *******************************************************************
> This email and any attachments are confidential. Any use, co...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 11
> Date: Fri, 22 Jun 2007 07:35:51 -0500
> From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>
> Subject: Re: [R] help on the use of ldBand
> To: Tomas Goicoa <tomas.goicoa_at_unavarra.es>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID: <467BC227.8000701@vanderbilt.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Tomas Goicoa wrote:
> > Hi R Users,
> >
> > I am trying to use the ldBand package. Together
> > with the package, I have downloaded the ld98
> > program (version for windows) as indicated in the
> > help page on ldBand. I did it, but obtained an
> > error message "Error in (head + 1):length(w) :
> > Argument NA/NaN" when I copied the help examples,
> > so it seems that a conection between R and ld98
> > is not well performed in my computer. Did I put
> > ld98.exe in the wrong place? If so, where should
> > I put it? Thanks a lot in advance,
> >
> >
> > Berta Ibáñez Beroiz
>
> Do you mean the Hmisc package? Do you mean the ldBands function? Did
> you put ld98.exe in a place that is in your system path as specified in
> the ldBands help file? And please read the posting guide referenced below.
>
> Frank
>
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> Frank E Harrell Jr Professor and Chair School of Medicine
> Department of Biostatistics Vanderbilt University
>
>
>
> ------------------------------
>
> Message: 12
> Date: Fri, 22 Jun 2007 13:27:35 +0200
> From: "ONKELINX, Thierry" <Thierry.ONKELINX_at_inbo.be>
> Subject: Re: [R] multiple return
> To: <amicogodzilla_at_bruttocarattere.org>, <r-help_at_stat.math.ethz.ch>
> Message-ID:
> <2E9C414912813E4EB981326983E0A104031A9629@inboexch.inbo.be>
> Content-Type: text/plain; charset="us-ascii"
>
> Put the return values in a vector or list
>
> somma <- function (a, b) {
> c <- a+b
> return (c(a = a, b = b, c = c))
> }
>
> somma(5,7)
> a b c
> 5 7 12
>
>
> somma <- function (a, b) {
> c <- a+b
> return (list(a = a, b = b, c = c))
> }
>
> somma(5,7)
> $a
> [1] 5
>
> $b
> [1] 7
>
> $c
> [1] 12
>
> Cheers,
>
> Thierry
> ------------------------------------------------------------------------
> ----
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature
> and Forest
> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
> methodology and quality assurance
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
> tel. + 32 54/436 185
> Thierry.Onkelinx_at_inbo.be
> www.inbo.be
>
> Do not put your faith in what statistics say until you have carefully
> considered what they do not say. ~William W. Watt
> A statistical analysis, properly conducted, is a delicate dissection of
> uncertainties, a surgery of suppositions. ~M.J.Moroney
>
>
>
> > -----Oorspronkelijk bericht-----
> > Van: r-help-bounces_at_stat.math.ethz.ch
> > [mailto:r-help-bounces_at_stat.math.ethz.ch] Namens Manuele Pesenti
> > Verzonden: vrijdag 22 juni 2007 12:38
> > Aan: r-help_at_stat.math.ethz.ch
> > Onderwerp: [R] multiple return
> >
> > Dear User,
> > what's the correct way to obtain a multiple return from a function?
> >
> > for example creating the simple function:
> >
> > somma <- function (a, b) {
> > c <- a+b
> > return (a, b, c)
> > }
> >
> > when I call it, it runs but returns the following output:
> >
> > > somma(5, 7)
> > $a
> > [1] 5
> >
> > $b
> > [1] 7
> >
> > $c
> > [1] 12
> >
> > Warning message:
> > return multi-argomento sono deprecati in: return(a, b, c)
> >
> > i.e. multi-return is deprecated...
> >
> > thanks a lot
> > best regards
> > Manuele
> >
> > --
> > Manuele Pesenti
> > manuele_at_inventati.org
> > amicogodzilla_at_jabber.linux.it
> > http://mpesenti.polito.it
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> ------------------------------
>
> Message: 13
> Date: Fri, 22 Jun 2007 06:20:27 -0700
> From: Martin Morgan <mtmorgan_at_fhcrc.org>
> Subject: Re: [R] extract index during execution of sapply
> To: Christian Bieli <christian.bieli_at_unibas.ch>
> Cc: R help list <r-help_at_stat.math.ethz.ch>
> Message-ID: <6phodj85dno.fsf@gopher4.fhcrc.org>
> Content-Type: text/plain; charset=us-ascii
>
> Christian,
>
> A favorite of mine is to use lexical scope and a 'factory' model:
>
> > fun_factory <- function() {
> + i <- 0 # 'state' variable(s), unique to each fun_factory
> + function(x) { # fun_factory return value; used as sapply FUN
> + i <<- i + 1 # <<- assignment finds i
> + x^i # return value of sapply FUN
> + }
> + }
> >
> > sapply(1:10, fun_factory())
> [1] 1 4 27 256 3125 46656
> [7] 823543 16777216 387420489 10000000000
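For the original question, when only the call index is needed, a simpler alternative is to iterate over positions with seq_along() (a sketch, not from the thread):

```r
mylist <- list(a = 3, b = 6, c = 9)
## i is the iteration counter; the element is recovered via mylist[[i]]
sapply(seq_along(mylist), function(i) {
  message("call number ", i, ": element '", names(mylist)[i], "'")
  mylist[[i]] * 2
})
## returns c(6, 12, 18)
```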
>
>
> Christian Bieli <christian.bieli_at_unibas.ch> writes:
>
> > Hi there
> > During execution of sapply I want to extract the number of times the
> > function given to sapply has been executed. I came up with:
> >
> > mylist <- list(a=3,b=6,c=9)
> > sapply(mylist,function(x)as.numeric(gsub("[^0-9]","",deparse(substitute(x)))))
> >
> > This works fine, but looks quite ugly. I'm sure that there's a more
> > elegant way to do this.
> >
> > Any suggestion?
> >
> > Christian
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Martin Morgan
> Bioconductor / Computational Biology
> http://bioconductor.org
>
>
>
> ------------------------------
>
> Message: 14
> Date: Fri, 22 Jun 2007 14:28:07 +0100
> From: Robin Hankin <r.hankin_at_noc.soton.ac.uk>
> Subject: [R] vectorize a function
> To: RHelp help <r-help_at_stat.math.ethz.ch>
> Message-ID: <F77DEE3F-E5AA-4A9B-A722-18F7DA006F46@noc.soton.ac.uk>
> Content-Type: text/plain; charset=US-ASCII; format=flowed
>
> Hello everyone
>
> suppose I have an integer vector "a" of length "n" and
> a symmetric matrix "M" of size n-by-n.
>
> Vector "a" describes a partition of a set of "n" elements
> and matrix M describes a penalty function: row i column
> j represents the penalty if element i and element j
> are in the same partition.
>
> Toy example follows; the real case is much larger
> and I need to evaluate my penalty function many times.
>
> If a <- c(1,1,2,1,3) then elements 1,2,4 are in the
> same partition; element 3 is in a partition on its own
> and element 5 is in a partition on its own.
>
> The total penalty can be described by the following (ugly)
> function:
>
> f <- function(a,M){
> out <- 0
> for(i in unique(a)){
> out <- out + sum(M[which(a==i),which(a==i)])
> }
> return(out)
> }
>
>
> so with
>
> M <- matrix(rpois(25,3),5,5)
> M <- M+t(M)
> diag(M) <- 0
> a <- c(1,2,1,1,3)
>
> f(a,M) gives the total penalty.
>
>
> QUESTION: how to rewrite f() so that it has no loop?
>
>
>
>
>
>
> --
> Robin Hankin
> Uncertainty Analyst
> National Oceanography Centre, Southampton
> European Way, Southampton SO14 3ZH, UK
> tel 023-8059-7743
>
>
>
> ------------------------------
>
> Message: 15
> Date: Fri, 22 Jun 2007 09:56:20 -0400
> From: "Kevin E. Thorpe" <kevin.thorpe_at_utoronto.ca>
> Subject: Re: [R] Tools For Preparing Data For Analysis
> To: r-help_at_stat.math.ethz.ch
> Message-ID: <467BD504.4040705@utoronto.ca>
> Content-Type: text/plain; charset=ISO-8859-1
>
> I am posting to this thread that has been quiet for some time because I
> remembered the following question.
>
> Christophe Pallier wrote:
> > Hi,
> >
> > Can you provide examples of data formats that are problematic to read and
> > clean with R ?
>
> Today I had a data manipulation problem that I don't know how to do in R
> so I solved it with perl. Since I'm always interested in learning more
> about complex data manipulation in R I am posting my problem in the
> hopes of receiving some hints for doing this in R.
>
> If anyone has nothing better to do than play with other people's data,
> I would be happy to send the raw files off-list.
>
> Background:
>
> I have been given data that contains two measurements of left
> ventricular ejection fraction. One of the methods is echocardiogram
> which sometimes gives a true quantitative value and other times a
> semi-quantitative value. The desire is to compare echo with the
> other method (MUGA). In most cases, patients had either quantitative
> or semi-quantitative. Some patients had both. The data came
> to me in excel files with, basically, no patient identifiers to link
> the "both" with the semi-quantitative patients (the "both" patients
> were in multiple data sets).
>
> What I wanted to do was extract from the semi-quantitative data file
> those patients with only semi-quantitative. All I have to link with
> are the semi-quantitative echo and the MUGA and these pairs of values
> are not unique.
>
> To make this more concrete, here are some portions of the raw data.
>
> "Both"
>
> "ID NUM","ECHO","MUGA","Semiquant","Quant"
> "B",12,37,10,12
> "D",13,13,10,13
> "E",13,26,10,15
> "F",13,31,10,13
> "H",15,15,10,15
> "I",15,21,10,15
> "J",15,22,10,15
> "K",17,22,10,17
> "N",17.5,4,10,17.5
> "P",18,25,10,18
> "R",19,25,10,19
>
> Semi-quantitative
>
> "echo","muga","quant"
> 10,20,0 <-- keep
> 10,20,0 <-- keep
> 10,21,0 <-- remove
> 10,21,0 <-- keep
> 10,24,0 <-- keep
> 10,25,0 <-- remove
> 10,25,0 <-- remove
> 10,25,0 <-- keep
>
> Here is the perl program I wrote for this.
>
> #!/usr/bin/perl
>
> open(BOTH, "quant_qual_echo.csv") || die "Can't open quant_qual_echo.csv";
> # Discard first row;
> $_ = <BOTH>;
> while(<BOTH>) {
> chomp;
> ($id, $e, $m, $sq, $qu) = split(/,/);
> $both{$sq,$m}++;
> }
> close(BOTH);
>
> open(OUT, "> qual_echo_only.csv") || die "Can't open qual_echo_only.csv";
> print OUT "pid,echo,muga,quant\n";
> $pid = 2001;
>
> open(QUAL, "qual_echo.csv") || die "Can't open qual_echo.csv";
> # Discard first row
> $_ = <QUAL>;
> while(<QUAL>) {
> chomp;
> ($echo, $muga, $quant) = split(/,/);
> if ($both{$echo,$muga} > 0) {
> $both{$echo,$muga}--;
> }
> else {
> print OUT "$pid,$echo,$muga,$quant\n";
> $pid++;
> }
> }
> close(QUAL);
> close(OUT);
>
> open(OUT, "> both_echo.csv") || die "Can't open both_echo.csv";
> print OUT "pid,echo,muga,quant\n";
> $pid = 3001;
>
> open(BOTH, "quant_qual_echo.csv") || die "Can't open quant_qual_echo.csv";
> # Discard first row;
> $_ = <BOTH>;
> while(<BOTH>) {
> chomp;
> ($id, $e, $m, $sq, $qu) = split(/,/);
> print OUT "$pid,$sq,$m,0\n";
> print OUT "$pid,$qu,$m,1\n";
> $pid++;
> }
> close(BOTH);
> close(OUT);
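One way to reproduce the first half of the Perl script in R (a sketch; the file and column names are taken from the snippets above, and each "both" row cancels exactly one matching semi-quantitative row, as in the Perl version):

```r
both <- read.csv("quant_qual_echo.csv")   # columns ID.NUM, ECHO, MUGA, Semiquant, Quant
qual <- read.csv("qual_echo.csv")         # columns echo, muga, quant

## Number each repeat of an (echo, muga) pair within each file, then drop
## from 'qual' only as many copies of a pair as 'both' contains.
key <- function(e, m) paste(e, m, sep = ":")
occ <- function(k) ave(seq_along(k), k, FUN = seq_along)  # 1, 2, 3... within key
qk <- key(qual$echo, qual$muga);      qn <- occ(qk)
bk <- key(both$Semiquant, both$MUGA); bn <- occ(bk)
qual_only <- qual[!(paste(qk, qn) %in% paste(bk, bn)), ]
```

The duplicated-pair counting is what plain merge() or %in% on the pair alone would get wrong, since those would remove every copy of a matched pair rather than one per "both" row.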
>
>
> --
> Kevin E. Thorpe
> Biostatistician/Trialist, Knowledge Translation Program
> Assistant Professor, Department of Public Health Sciences
> Faculty of Medicine, University of Toronto
> email: kevin.thorpe_at_utoronto.ca Tel: 416.864.5776 Fax: 416.864.6057
>
>
>
> ------------------------------
>
> Message: 16
> Date: Fri, 22 Jun 2007 09:58:39 -0400
> From: "Kuhn, Max" <Max.Kuhn_at_pfizer.com>
> Subject: Re: [R] Data consistency checks in functions
> To: "Anup Nandialath" <anup_nandialath_at_yahoo.com>,
> <r-help_at_stat.math.ethz.ch>
> Message-ID:
> <71257D09F114DA4A8E134DEAC70F25D308B9745C@groamrexm03.amer.pfizer.com>
> Content-Type: text/plain; charset="us-ascii"
>
> Anup,
>
> There are two ways to pass arguments to functions in R: as named
> arguments or by position*.
>
> Users *can* supply arguments that are inconsistent with the order that
> you specify in the function definition, but only if they are used as
> named arguments:
>
> myfun(X = someMatrix, values = aVector, theta = whatever)
>
> If the arguments are passed by position (as in myfun(beta, val1)), R
> will assume that the first argument is theta, the second is X, etc since
> that is how the function is defined.
>
> My suggestion would be to leave these arguments without defaults and put
> a lot of checks in the function (using is.matrix, is.vector and a few
> that check the content of the data in those objects).
>
>
> * You can also mix the two:
>
> foo(data, outcome, start = rep(0, 3))
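A sketch of the checks suggested above, using missing() to detect whether 'values' was supplied and is.matrix()/is.vector() to validate types (the derivation of values from X and the final computation are placeholders, invented for illustration):

```r
myfun <- function(theta, X, values) {
  if (missing(values)) {
    if (missing(X) || !is.matrix(X)) stop("need a matrix 'X' to derive 'values'")
    values <- colMeans(X)      # hypothetical derivation, for illustration only
  }
  if (!is.vector(values)) stop("'values' must be a vector")
  sum(theta * values)          # placeholder computation
}

myfun(theta = 1:2, values = c(3, 4))   # named call: X is bypassed entirely
```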
>
>
>
> Max
>
> -----Original Message-----
> From: r-help-bounces_at_stat.math.ethz.ch
> [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of Anup Nandialath
> Sent: Friday, June 22, 2007 12:19 AM
> To: r-help_at_stat.math.ethz.ch
> Subject: [R] Data consistency checks in functions
>
> Dear friends,
>
> I'm writing a function with three arguments
>
> myfun <- function(theta, X, values)
>
> {
> ....
> ....
> }
>
> in this function, I'm trying to write consistency checks. In order to
> compute the statistic of interest I only need theta and values. The idea
> of having X in there is that, if values is not provided by the user,
> then values is computed from X.
>
> my problem is I'm trying to write consistency checks. For instance if I
> say
>
> output <- myfun(beta, val1), how do I ensure that R reads this as
> passing arguments to "theta" and "values". In other words is it possible
> to bypass X completely if values is provided. Also how is it possible
> for R to recognize the second argument as being values and not X. This
> is important because X is a matrix and values is a vector. Therefore any
> checks using the dimensions of either one will land in trouble if it
> does not correctly capture that.
>
> Thanks in advance
> Sincerely
>
> Anup
>
>
> ---------------------------------
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> ----------------------------------------------------------------------
> LEGAL NOTICE\ Unless expressly stated otherwise, this messag...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 17
> Date: Fri, 22 Jun 2007 10:17:24 -0400
> From: "Christos Hatzis" <christos_at_nuverabio.com>
> Subject: Re: [R] vectorize a function
> To: "'Robin Hankin'" <r.hankin_at_noc.soton.ac.uk>, "'RHelp help'"
> <r-help_at_stat.math.ethz.ch>
> Message-ID:
> <001b01c7b4d8$0e8129f0$0e010a0a@headquarters.silicoinsights>
> Content-Type: text/plain; charset="us-ascii"
>
> How about:
>
> sum(sapply(unique(a), function(x) {b <- which(a==x); sum(M[b, b])}))
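Another loop-free form (a sketch, building on the f() and toy data from the original post): the total penalty is the sum of M[i, j] over all pairs in the same block, which an indicator matrix expresses directly:

```r
f2 <- function(a, M) sum(M * outer(a, a, "=="))

## outer(a, a, "==") is TRUE exactly where elements i and j share a partition,
## so the elementwise product keeps only the within-block penalties.
M <- matrix(rpois(25, 3), 5, 5); M <- M + t(M); diag(M) <- 0
a <- c(1, 2, 1, 1, 3)
stopifnot(f2(a, M) == f(a, M))   # agrees with the looped f() defined above
```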
>
> HTH
> -Christos
>
> Christos Hatzis, Ph.D.
> Nuvera Biosciences, Inc.
> 400 West Cummings Park
> Suite 5350
> Woburn, MA 01801
> Tel: 781-938-3830
> www.nuverabio.com
>
>
> > -----Original Message-----
> > From: r-help-bounces_at_stat.math.ethz.ch
> > [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of Robin Hankin
> > Sent: Friday, June 22, 2007 9:28 AM
> > To: RHelp help
> > Subject: [R] vectorize a function
> >
> > Hello everyone
> >
> > suppose I have an integer vector "a" of length "n" and a
> > symmetric matrix "M" of size n-by-n.
> >
> > Vector "a" describes a partition of a set of "n" elements and
> > matrix M describes a penalty function: row i column j
> > represents the penalty if element i and element j are in the
> > same partition.
> >
> > Toy example follows; the real case is much larger and I need
> > to evaluate my penalty function many times.
> >
> > If a <- c(1,1,2,1,3) then elements 1,2,4 are in the same
> > partition; element 3 is in a partition on its own and element
> > 5 is in a partition on its own.
> >
> > The total penalty can be described by the following (ugly)
> > function:
> >
> > f <- function(a,M){
> > out <- 0
> > for(i in unique(a)){
> > out <- out + sum(M[which(a==i),which(a==i)])
> > }
> > return(out)
> > }
> >
> >
> > so with
> >
> > M <- matrix(rpois(25,3),5,5)
> > M <- M+t(M)
> > diag(M) <- 0
> > a <- c(1,2,1,1,3)
> >
> > f(a,M) gives the total penalty.
> >
> >
> > QUESTION: how to rewrite f() so that it has no loop?
> >
> >
> >
> >
> >
> >
> > --
> > Robin Hankin
> > Uncertainty Analyst
> > National Oceanography Centre, Southampton European Way,
> > Southampton SO14 3ZH, UK
> > tel 023-8059-7743
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
>
>
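For completeness, a fully vectorized version of f() (an editorial sketch, not posted in the thread): the total penalty sums M[i, j] over every pair (i, j) sharing a partition label, so an indicator matrix built with outer() replaces the loop entirely.

```r
# Loop-free penalty: outer(a, a, "==") is TRUE exactly where a[i] == a[j],
# so multiplying by M and summing gives the same total as the block loop.
f2 <- function(a, M) sum(M * outer(a, a, "=="))

# Original looped version from the thread, for comparison:
f <- function(a, M) {
  out <- 0
  for (i in unique(a)) out <- out + sum(M[which(a == i), which(a == i)])
  out
}

set.seed(1)
M <- matrix(rpois(25, 3), 5, 5)
M <- M + t(M)
diag(M) <- 0
a <- c(1, 2, 1, 1, 3)
stopifnot(f(a, M) == f2(a, M))
```

For large n this trades the loop for one n-by-n logical matrix, which is fine as long as n-by-n fits in memory.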
>
> ------------------------------
>
> Message: 18
> Date: Fri, 22 Jun 2007 13:23:08 +0000 (UTC)
> From: Ben Bolker <bolker_at_ufl.edu>
> Subject: Re: [R] extract index during execution of sapply
> To: r-help_at_stat.math.ethz.ch
> Message-ID: <loom.20070622T151700-639@post.gmane.org>
> Content-Type: text/plain; charset=us-ascii
>
> Christian Bieli <christian.bieli <at> unibas.ch> writes:
>
> >
> > Hi there
> > During execution of sapply I want to extract the number of times the
> > function given to sapply has been executed. I came up with:
> >
> > mylist <- list(a=3,b=6,c=9)
> > sapply(mylist,function(x)as.numeric(gsub("[^0-9]","",deparse(substitute(x)))))
> >
> > This works fine, but looks quite ugly. I'm sure that there's a more
> > elegant way to do this.
> >
> > Any suggestion?
> >
> > Christian
> >
>
> I would love to have an answer to this -- when I run
> into this kind of problem I usually end up using mapply:
> e.g., suppose I have
>
> mylist <- replicate(5,list(x=runif(10),y=runif(10)),simplify=FALSE)
>
> and I want to plot each element in a different color. I'd like
> to be able to do
>
> plot(0:1,0:1,type="n")
> lapply(mylist,plot,col=i)
>
> but instead I do
>
> mapply(function(x,i) points(x,col=i),mylist,1:5)
>
> would it be too ugly to have a special variable called INDEX
> that could be used within an sapply/lapply statement?
>
>
>
> ------------------------------
>
> Message: 19
> Date: Fri, 22 Jun 2007 07:45:14 -0700 (PDT)
> From: Thomas Lumley <tlumley_at_u.washington.edu>
> Subject: Re: [R] extract index during execution of sapply
> To: Ben Bolker <bolker_at_ufl.edu>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID:
> <Pine.LNX.4.64.0706220732470.20743@homer22.u.washington.edu>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> On Fri, 22 Jun 2007, Ben Bolker wrote:
>
> > Christian Bieli <christian.bieli <at> unibas.ch> writes:
> >
> >>
> >> Hi there
> >> During execution of sapply I want to extract the number of times the
> >> function given to sapply has been executed. I came up with:
> >>
> >> mylist <- list(a=3,b=6,c=9)
> >> sapply(mylist,function(x)as.numeric(gsub("[^0-9]","",deparse(substitute(x)))))
> >>
> >> This works fine, but looks quite ugly. I'm sure that there's a more
> >> elegant way to do this.
> >>
> >> Any suggestion?
> >>
> >> Christian
> >>
> >
> > I would love to have an answer to this -- when I run
> > into this kind of problem I usually end up using mapply:
> > e.g., suppose I have
> >
> > mylist <- replicate(5,list(x=runif(10),y=runif(10)),simplify=FALSE)
> >
> > and I want to plot each element in a different color. I'd like
> > to be able to do
> >
> > plot(0:1,0:1,type="n")
> > lapply(mylist,plot,col=i)
> >
> > but instead I do
> >
> > mapply(function(x,i) points(x,col=i),mylist,1:5)
> >
> > would it be too ugly to have a special variable called INDEX
> > that could be used within an sapply/lapply statement?
> >
>
> There are two distinct suggestions here: a variable that says *how many*
> times the function has been called, and a variable that says *which
> element* is currently being operated on. The first seems undesirable as
> order of evaluation really should not matter in the apply functions.
>
> The second makes more sense but is still a little tricky. AFAICS there is
> no way for lapply() to find out whether FUN will accept an argument INDEX
> without an "unused argument(s)" error, so it can't just be passed as an
> argument. This suggests having yet another apply function, that would
> assume an INDEX argument and might be written
> yapply<-function(X,FUN, ...) {
> index<-seq(length.out=length(X))
> mapply(FUN,X,INDEX=index,MoreArgs=list(...))
> }
>
> However, I think it would be preferable in many cases for INDEX to be
> names(X) if it exists, rather than 1:n. In any case, it is easy to write
> the function.
>
> -thomas
>
>
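Thomas's yapply() sketch can be tried directly; here is a self-contained restatement (editorial, following his code) showing FUN receiving each element's position through the INDEX argument:

```r
# FUN must accept an INDEX argument; it receives the element's position.
# (seq_len is used in place of seq(length.out = ...); same result.)
yapply <- function(X, FUN, ...) {
  index <- seq_len(length(X))
  mapply(FUN, X, INDEX = index, MoreArgs = list(...))
}

mylist <- list(a = 3, b = 6, c = 9)
yapply(mylist, function(x, INDEX) x * INDEX)
#  a  b  c
#  3 12 27
```

As Thomas notes, passing names(X) as INDEX when X is named would often be the more useful variant.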
>
> ------------------------------
>
> Message: 20
> Date: Fri, 22 Jun 2007 10:49:08 -0400
> From: "Oden, Kevin" <kevin.oden_at_wachovia.com>
> Subject: [R] fitCopula
> To: <r-help_at_stat.math.ethz.ch>
> Message-ID:
> <E3A68C90920A014CBB128279519B1B35042FEB87@M1WACA0030I001.cibna.msds.wachovia.net>
>
> Content-Type: text/plain
>
> I am using R 2.5.0 on Windows XP and am trying to fit a copula. I see the
> following code works for some users; however, my code crashes in
> chol(). Any suggestions?
>
>
>
> > mycop <- tCopula(param=0.5, dim=8, dispstr="ex", df=5)
>
> > x <- rcopula(mycop, 1000)
>
> > myfit <- fitCopula(x, mycop, c(0.6, 10), optim.control=list(trace=1),
> method="Nelder-Mead")
>
> Nelder-Mead direct search function minimizer
>
> function value for initial parameters = -1747.582044
>
> Scaled convergence tolerance is 2.6041e-05
>
> Stepsize computed as 1.000000
>
> Error in chol(x, pivot = FALSE) : the leading minor of order 2 is not
> positive definite
>
>
>
> Kevin D. Oden
>
> e: kevin.oden_at_wachovia.com <mailto:kevin.oden_at_wachovia.com>
>
>
>
>
> [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 21
> Date: Fri, 22 Jun 2007 10:58:35 -0400
> From: "Weiwei Shi" <helprhelp_at_gmail.com>
> Subject: [R] how to ave this?
> To: "R Help" <R-help_at_stat.math.ethz.ch>
> Message-ID:
> <cdf817830706220758r10e93178x971a53e574e9488d@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi,
>
> I have a list that looks like this:
> [[1]]
> fc tt
> 50 0.07526882 0.000000000
> 100 0.09289617 0.000000000
> 150 0.12359551 0.000000000
>
> [[2]]
> fc tt
> 50 0.02040816 0.000000000
> 100 0.03626943 0.005025126
> 150 0.05263158 0.010101010
>
> and I am wondering how to "average" it so that I have one matrix t0 at
> the end, and t0[1,1] = (0.075..+0.0204..)/2
>
> Thanks,
>
> --
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>
>
>
> ------------------------------
>
> Message: 23
> Date: Fri, 22 Jun 2007 11:20:19 -0400
> From: "Weiwei Shi" <helprhelp_at_gmail.com>
> Subject: Re: [R] how to ave this?
> To: "R Help" <R-help_at_stat.math.ethz.ch>
> Message-ID:
> <cdf817830706220820k7db2f82dv3e2a2e7d7a39ff69@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> one of my approaches is:
>
> x0 = sapply(mylist, cbind)
>
> and manipulate from x0 (x0[1:(nrow(x0)/2), ] corresponds to fc and the
> lower part is tt),
>
> but it is not a neat way.
>
>
> On 6/22/07, Weiwei Shi <helprhelp_at_gmail.com> wrote:
> > Hi,
> >
> > I have a list that looks like this:
> > [[1]]
> > fc tt
> > 50 0.07526882 0.000000000
> > 100 0.09289617 0.000000000
> > 150 0.12359551 0.000000000
> >
> > [[2]]
> > fc tt
> > 50 0.02040816 0.000000000
> > 100 0.03626943 0.005025126
> > 150 0.05263158 0.010101010
> >
> > and I am wondering how to "average" it so that I have one matrix t0 at
> > the end, and t0[1,1] = (0.075..+0.0204..)/2
> >
> > Thanks,
> >
> > --
> > Weiwei Shi, Ph.D
> > Research Scientist
> > GeneGO, Inc.
> >
> > "Did you always know?"
> > "No, I did not. But I believed..."
> > ---Matrix III
> >
>
>
> --
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>
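When every element of the list is a matrix of the same dimensions, the element-wise average can be computed without any reshaping (editorial sketch, not posted in the thread):

```r
# Reduce("+", ...) adds the matrices cell by cell; dividing by the list
# length then gives the element-wise mean. Values below are illustrative,
# rounded from the example in the thread.
mylist <- list(
  matrix(c(0.075, 0.093, 0.124, 0, 0, 0), nrow = 3,
         dimnames = list(c(50, 100, 150), c("fc", "tt"))),
  matrix(c(0.020, 0.036, 0.053, 0, 0.005, 0.010), nrow = 3,
         dimnames = list(c(50, 100, 150), c("fc", "tt")))
)
t0 <- Reduce("+", mylist) / length(mylist)
t0["50", "fc"]   # (0.075 + 0.020) / 2 = 0.0475
```

This keeps dimnames and works for any number of matrices, unlike the sapply/cbind reshaping above.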
>
>
> ------------------------------
>
> Message: 24
> Date: Fri, 22 Jun 2007 16:26:57 +0100 (BST)
> From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
> Subject: Re: [R] Result depends on order of factors in unbalanced
> designs (lme, anova)?
> To: Karl Knoblick <karlknoblich_at_yahoo.de>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID: <Pine.LNX.4.64.0706221617190.11817@gannet.stats.ox.ac.uk>
> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>
> 'anova' is rather a misnomer here. In terms of the description in
> ?anova.lme, you have
>
> When only one fitted model object is present, a data frame with
> the sums of squares, numerator degrees of freedom, denominator
> degrees of freedom, F-values, and P-values for Wald tests for the
> terms in the model ...
>
> but there are no 'sums of squares' shown. However, the crucial part of
> that help page is
>
> type: an optional character string specifying the type of sum of
> squares to be used in F-tests for the terms in the model. If
> '"sequential"', the sequential sum of squares obtained by
> including the terms in the order they appear in the model is
> used; else, if '"marginal"', the marginal sum of squares
> obtained by deleting a term from the model at a time is used.
> This argument is only used when a single fitted object is
> passed to the function. Partial matching of arguments is
> used, so only the first character needs to be provided.
> Defaults to '"sequential"'.
>
> so these are sequential fits (just like anova for an lm fit), and yes,
> sequential fits do in general depend on the sequence of terms.
>
> The issues of interpretation are exactly those of unbalanced linear
> models, and you will find advice on that in many places, e.g. in MASS.
>
>
> On Thu, 21 Jun 2007, Karl Knoblick wrote:
>
> > Dear R-Community!
> >
> > For example I have a study with 4 treatment groups (10 subjects per group) and 4 visits. Additionally, the gender is taken into account. I think - and hope this is a good idea (!) - this data can be analysed using lme as below.
> >
> > In a balanced design everything is fine, but in an unbalanced design there are differences depending on fitting y~visit*treat*gender or y~gender*visit*treat - at least with anova (see example). Does this make sense? Which ordering might be the correct one?
> >
> > Here the example script:
> > library(nlme)
> > set.seed(123)
> > # Random generation of data:
> > NSubj<-40 # No. of subjects
> > set.seed(1234)
> > id<-factor(rep(c(1:NSubj),4)) # ID of subjects
> > treat<-factor(rep(rep(1:4,each=5),4)) # Treatment 4 Levels
> > gender<-factor(rep(rep(1:2, each=20),4))
> > visit<-factor(rep(1:4, each=NSubj))
> > y<-runif(4*NSubj) # Results
> > # Add effects
> > y<-y+0.01*as.integer(visit)
> > y<-y+0.02*as.integer(gender)
> > y<-y+0.024*as.integer(treat)
> > df<-data.frame(id, treat, gender, visit, y)
> > # groupedData object for lme
> > gdat<-groupedData(y ~ visit|id, data=df)
> > # fits - different ordering of factors
> > fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id)
> > anova(fit1)
> > fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id)
> > anova(fit2)
> > # Result: identical (balanced design so far), ok
> > # Now change gender of subject 1
> > gdat$gender[c(1,41,81,121)]<-2
> > # once more, fits with different ordering of factors
> > fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id)
> > anova(fit1)
> > fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id)
> > anova(fit2)
> > # Result: There are differences!!
> >
> > Hope anybody can help or give me advice on how to interpret these results correctly, or how to avoid this problem! Is there a better way to analyse these data than lme?
> >
> > Thanks!
> > Karl
>
> --
> Brian D. Ripley, ripley_at_stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>
>
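The order dependence described above is not specific to lme(); any non-orthogonal design shows it with sequential sums of squares. A minimal sketch using plain lm() (editorial; the data and design are invented for illustration):

```r
# Unbalanced (non-proportional) two-factor design: the sequential SS for a
# term depends on where it enters the model formula.
set.seed(1)
d <- data.frame(a = factor(rep(1:2, c(5, 7))),   # 5 vs 7 observations
                b = factor(rep(1:2, 6)))          # alternating levels
d$y <- rnorm(12) + as.numeric(d$a)

s1 <- anova(lm(y ~ a + b, data = d))  # 'a' fitted first: SS(a)
s2 <- anova(lm(y ~ b + a, data = d))  # 'a' fitted last:  SS(a | b)
s1["a", "Sum Sq"] == s2["a", "Sum Sq"]  # FALSE here: the two SS differ
```

In a balanced design the factors are orthogonal and both orders give identical tables, which is exactly what Karl's example demonstrates.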
>
> ------------------------------
>
> Message: 25
> Date: Fri, 22 Jun 2007 17:27:42 +0200
> From: "Christophe Pallier" <christophe_at_pallier.org>
> Subject: Re: [R] Tools For Preparing Data For Analysis
> To: "Kevin E. Thorpe" <kevin.thorpe_at_utoronto.ca>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID:
> <dea6cb960706220827y4c281c31t493106eeaffedd4b@mail.gmail.com>
> Content-Type: text/plain
>
> If I understand correctly (from your Perl script)
>
> 1. you count the number of occurrences of each "(echo, muga)" pair in the
> first file.
>
> 2. you remove from the second file the lines that correspond to these
> occurrences.
>
> If this is indeed your aim, here's a solution in R:
>
> cumcount <- function(x) {
> y <- numeric(length(x))
> for (i in 1:length(y)) {
> y[i] = sum(x[1:i] == x[i])
> }
> y
> }
>
> both <- read.csv('both_echo.csv')
> v <- table(paste(both$echo, "_", both$muga, sep=""))
>
> semi <- read.csv('qual_echo.csv')
> s <- paste(semi$echo, "_", semi$muga, sep="")
> cs = cumcount(s)
> count = v[s]
> count[is.na(count)]=0
>
> semi2 <- data.frame(semi, s, cs, count, keep = cs > count)
>
> > semi2
> echo muga quant s cs count keep
> 1 10 20 0 10_20 1 0 TRUE
> 2 10 20 0 10_20 2 0 TRUE
> 3 10 21 0 10_21 1 1 FALSE
> 4 10 21 0 10_21 2 1 TRUE
> 5 10 24 0 10_24 1 0 TRUE
> 6 10 25 0 10_25 1 2 FALSE
> 7 10 25 0 10_25 2 2 FALSE
> 8 10 25 0 10_25 3 2 TRUE
>
>
> My code is not very readable...
> Yet, the 'trick' of using a helper function like 'cumcount' might be
> instructive.
>
> Christophe Pallier
>
>
> On 6/22/07, Kevin E. Thorpe <kevin.thorpe_at_utoronto.ca> wrote:
> >
> > I am posting to this thread that has been quiet for some time because I
> > remembered the following question.
> >
> > Christophe Pallier wrote:
> > > Hi,
> > >
> > > Can you provide examples of data formats that are problematic to read
> > and
> > > clean with R ?
> >
> > Today I had a data manipulation problem that I don't know how to do in R
> > so I solved it with perl. Since I'm always interested in learning more
> > about complex data manipulation in R I am posting my problem in the
> > hopes of receiving some hints for doing this in R.
> >
> > If anyone has nothing better to do than play with other people's data,
> > I would be happy to send the raw files off-list.
> >
> > Background:
> >
> > I have been given data that contains two measurements of left
> > ventricular ejection fraction. One of the methods is echocardiogram
> > which sometimes gives a true quantitative value and other times a
> > semi-quantitative value. The desire is to compare echo with the
> > other method (MUGA). In most cases, patients had either quantitative
> > or semi-quantitative. Some patients had both. The data came
> > to me in Excel files with, basically, no patient identifiers to link
> > the "both" with the semi-quantitative patients (the "both" patients
> > were in multiple data sets).
> >
> > What I wanted to do was extract from the semi-quantitative data file
> > those patients with only semi-quantitative. All I have to link with
> > are the semi-quantitative echo and the MUGA and these pairs of values
> > are not unique.
> >
> > To make this more concrete, here are some portions of the raw data.
> >
> > "Both"
> >
> > "ID NUM","ECHO","MUGA","Semiquant","Quant"
> > "B",12,37,10,12
> > "D",13,13,10,13
> > "E",13,26,10,15
> > "F",13,31,10,13
> > "H",15,15,10,15
> > "I",15,21,10,15
> > "J",15,22,10,15
> > "K",17,22,10,17
> > "N",17.5,4,10,17.5
> > "P",18,25,10,18
> > "R",19,25,10,19
> >
> > Semi-quantitative
> >
> > "echo","muga","quant"
> > 10,20,0 <-- keep
> > 10,20,0 <-- keep
> > 10,21,0 <-- remove
> > 10,21,0 <-- keep
> > 10,24,0 <-- keep
> > 10,25,0 <-- remove
> > 10,25,0 <-- remove
> > 10,25,0 <-- keep
> >
> > Here is the perl program I wrote for this.
> >
> > #!/usr/bin/perl
> >
> > open(BOTH, "quant_qual_echo.csv") || die "Can't open quant_qual_echo.csv";
> > # Discard first row;
> > $_ = <BOTH>;
> > while(<BOTH>) {
> > chomp;
> > ($id, $e, $m, $sq, $qu) = split(/,/);
> > $both{$sq,$m}++;
> > }
> > close(BOTH);
> >
> > open(OUT, "> qual_echo_only.csv") || die "Can't open qual_echo_only.csv";
> > print OUT "pid,echo,muga,quant\n";
> > $pid = 2001;
> >
> > open(QUAL, "qual_echo.csv") || die "Can't open qual_echo.csv";
> > # Discard first row
> > $_ = <QUAL>;
> > while(<QUAL>) {
> > chomp;
> > ($echo, $muga, $quant) = split(/,/);
> > if ($both{$echo,$muga} > 0) {
> > $both{$echo,$muga}--;
> > }
> > else {
> > print OUT "$pid,$echo,$muga,$quant\n";
> > $pid++;
> > }
> > }
> > close(QUAL);
> > close(OUT);
> >
> > open(OUT, "> both_echo.csv") || die "Can't open both_echo.csv";
> > print OUT "pid,echo,muga,quant\n";
> > $pid = 3001;
> >
> > open(BOTH, "quant_qual_echo.csv") || die "Can't open quant_qual_echo.csv";
> > # Discard first row;
> > $_ = <BOTH>;
> > while(<BOTH>) {
> > chomp;
> > ($id, $e, $m, $sq, $qu) = split(/,/);
> > print OUT "$pid,$sq,$m,0\n";
> > print OUT "$pid,$qu,$m,1\n";
> > $pid++;
> > }
> > close(BOTH);
> > close(OUT);
> >
> >
> > --
> > Kevin E. Thorpe
> > Biostatistician/Trialist, Knowledge Translation Program
> > Assistant Professor, Department of Public Health Sciences
> > Faculty of Medicine, University of Toronto
> > email: kevin.thorpe_at_utoronto.ca Tel: 416.864.5776 Fax: 416.864.6057
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Christophe Pallier (http://www.pallier.org)
>
> [[alternative HTML version deleted]]
>
>
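The cumcount() helper above runs in quadratic time; base R's ave() computes the same running within-group count in one call (editorial sketch, not from the thread):

```r
# Running occurrence count: for each position, how many times has this
# value appeared so far (including the current occurrence)?
# ave() splits seq_along(x) by the values of x, applies seq_along within
# each group, and puts the results back in the original positions.
cumcount2 <- function(x) ave(seq_along(x), x, FUN = seq_along)

s <- c("10_20", "10_20", "10_21", "10_21", "10_24")
cumcount2(s)
# [1] 1 2 1 2 1
```

Combined with the table() lookup in Christophe's code, this gives the same keep/remove flags without the explicit loop.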
>
> ------------------------------
>
> Message: 26
> Date: Fri, 22 Jun 2007 16:30:44 +0100
> From: Simon Wood <s.wood_at_bath.ac.uk>
> Subject: Re: [R] interpretation of F-statistics in GAMs
> To: r-help_at_stat.math.ethz.ch
> Message-ID: <200706221630.44317.s.wood@bath.ac.uk>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On Friday 15 June 2007 08:06, robert.ptacnik_at_niva.no wrote:
> > dear listers,
> > I use gam (from mgcv) for evaluation of shape and strength of relationships
> > between a response variable and several predictors.
> > How can I interpret the 'F' values given in the GAM summary? Is it
> > appropriate to treat them in a similar manner as the T-statistics in a
> > linear model, i.e. larger values mean that this variable has a stronger
> > impact than a variable with smaller F?
> - I'd be a bit cautious about this (even for T-statistics and linear models
> it's not quite clear to me what `impact' means if judged this way). These gam
> F statistics are only meant to provide a rough and ready means of judging
> approximate significance of terms, and I'm unsure about interpreting a
> comparison of such F ratios: for example the F statistics can be based on
> different numbers of degrees of freedom, depending on the term concerned...
>
> > When I run my analysis for two different response variables (but identical
> > predictors), is there a way to compare the F values among tests (like to
> > standardize them by the sum of F within each test?) I append two summaries
> > below.
> - Again, I don't really know how this would work. I'd be more inclined to
> compare the plotted terms and associated CIs (and maybe the p-values),
> especially if you are using GAMs in a quite exploratory way (e.g. if the
> assumption of an additive structure is really a convenience, rather than
> being something that is suggested by the underlying science).
>
> best,
> Simon
>
> >
> >
> > ### example 1 ###
> >
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > dep[sel, i] ~ s(date, k = 3) + s(depth, k = kn) + s(temp, k = kn) +
> > s(light, k = kn) + s(PO4, k = kn) + s(DIN, k = kn) + s(prop.agpla,
> > k = kn)
> >
> > Parametric coefficients:
> > Estimate Std. Error t value Pr(>|t|)
> > (Intercept) 5.1048 0.0384 132.9 <2e-16 ***
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > Approximate significance of smooth terms:
> > edf Est.rank F p-value
> > s(date) 1.669 2 12.161 1.07e-05 ***
> > s(depth) 1.671 2 36.125 4.85e-14 ***
> > s(temp) 1.927 2 6.686 0.00156 **
> > s(light) 1.886 2 12.604 7.20e-06 ***
> > s(PO4) 1.676 2 3.237 0.04143 *
> > s(DIN) 1.000 1 38.428 3.41e-09 ***
> > s(prop.agpla) 1.405 2 15.987 3.79e-07 ***
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > R-sq.(adj) = 0.687 Deviance explained = 70.5%
> > GCV score = 0.31995 Scale est. = 0.30076 n = 204
> >
> > ### example 2 ###
> > Family: gaussian
> > Link function: identity
> >
> > Formula:
> > dep[sel, i] ~ s(date, k = 3) + s(depth, k = kn) + s(temp, k = kn) +
> > s(light, k = kn) + s(PO4, k = kn) + s(DIN, k = kn) + s(prop.agpla,
> > k = kn)
> >
> > Parametric coefficients:
> > Estimate Std. Error t value Pr(>|t|)
> > (Intercept) 7.13588 0.05549 128.6 <2e-16 ***
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > Approximate significance of smooth terms:
> > edf Est.rank F p-value
> > s(date) 1.944 2 15.997 3.67e-07 ***
> > s(depth) 1.876 2 25.427 1.52e-10 ***
> > s(temp) 1.000 1 2.866 0.0921 .
> > s(light) 1.751 2 4.212 0.0162 *
> > s(PO4) 1.950 2 10.632 4.14e-05 ***
> > s(DIN) 1.805 2 10.745 3.73e-05 ***
> > s(prop.agpla) 1.715 2 2.674 0.0715 .
> > ---
> > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >
> > R-sq.(adj) = 0.479 Deviance explained = 50.9%
> > GCV score = 0.6863 Scale est. = 0.64348 n = 209
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html and provide commented, minimal,
> > self-contained, reproducible code.
>
> --
> > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> > +44 1225 386603 www.maths.bath.ac.uk/~sw283
>
>
>
> ------------------------------
>
> Message: 27
> Date: Fri, 22 Jun 2007 17:53:01 +0200
> From: Martin Maechler <maechler_at_stat.math.ethz.ch>
> Subject: Re: [R] Boxplot issues
> To: "S Ellison" <S.Ellison_at_lgc.co.uk>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID: <18043.61533.242252.779598@stat.math.ethz.ch>
> Content-Type: text/plain; charset=us-ascii
>
> >>>>> "SE" == S Ellison <S.Ellison_at_lgc.co.uk>
> >>>>> on Fri, 22 Jun 2007 13:02:20 +0100 writes:
>
> SE> Boxplot and bxp seem to have changed behaviour a bit of late (R 2.4.1). Or maybe I am mis-remembering.
> SE> An annoying feature is that while at=3:6 will work, there is no way of overriding the default xlim of 0.5 to n+0.5. That prevents plotting boxes on, for example, interval scales - a useful thing to do at times. I really can see no good reason for bxp to hard-code the xlim=c(0.5, n+0.5) in the function body; it should be a parameter default conditional on horizontal=, not hard coded.
>
> SE> Also, boxplot does not drop empty groups. I'm sure it used to. I know it is good to be able to see where a factor level is unpopulated, but it's a nuisance with fractional factorials and some nested or survey problems when many are known to be missing and are of no interest. Irrespective of whether my memory is correct, the option would be useful. How hard can it be to add a 'drop.empty=F' default to boxplot to allow it to switch?
>
> SE> Obviously, these are things I can fix locally. But who 'owns' boxplot so I can provide suggested code to them for later releases?
>
>
> Legally speaking, I think that's a hard question the answer of
> which may even depend on the country where it is answered.
> I would like to say it is owned by the R Foundation.
>
> Suggested improvements of the R "base code" should be made and
> discussed on the R-devel mailing list. That's exactly the main
> purpose of that list.
> Such propositions typically make it into the code base
> if you are convincing and you provide code improvements that
> convince at least one member of R core that it's worth his time
> to implement, document, *and* test the changes.
>
> Also, as on R-help, it helps to work with small reproducible
> (ideally "cut-n-pastable") R code examples.
>
> Regards,
> Martin Maechler
>
> SE> Steve Ellison
>
>
>
> ------------------------------
>
> Message: 28
> Date: Fri, 22 Jun 2007 12:19:35 -0400
> From: Sébastien <pomchip_at_free.fr>
> Subject: Re: [R] Overlaying lattice graphs (continued)
> To: Deepayan Sarkar <deepayan.sarkar_at_gmail.com>
> Cc: R-help <r-help_at_stat.math.ethz.ch>
> Message-ID: <467BF697.8050203@free.fr>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Hi Deepayan,
>
> The following code creates a dummy dataset with the same structure as
> my usual datasets. I did not try to implement the changes proposed by
> Hadley, hoping that a solution can be found using the original dataset.
>
> ######### My code
>
> # Creating dataset
>
> nPts<-10 # number of time points
> nInd<-6 # number of individuals
> nModel<-3 # number of models
>
> TimePts<-rep(1:nPts,nInd*nModel)               # creates the "Time" column
> Coef<-rep(rnorm(6,0.1,0.01),each=nPts,nModel)  # coefficients used to generate the observations
> Obs<-10*exp(-Coef*TimePts)                     # creates the observations
>
> Pred<-numeric(nPts*nInd*nModel)  # initialization added; the loop below fails without it
> for (i in 1:60){
> Pred[i]<-jitter(10*exp(-Coef[i]*TimePts[i]))
> Pred[i+60]<-jitter(5)
> Pred[i+120]<-jitter(10-Coef[i+120]*TimePts[i])
> }                                # creates the predicted values
>
> colPlot<-rep(1,nPts*nInd*nModel)                      # creates the "Plot" column
> colModel<-gl(nModel,nPts*nInd,labels=c("A","B","C"))  # creates the "Model" column
> colID<-gl(nInd,nPts,nPts*nInd*nModel)                 # creates the "ID" column
>
> mydata<-data.frame(colPlot,colModel,colID,TimePts,Obs,Pred)  # creates the dataset
> names(mydata)<-c("Plot","Model","Individuals","Time","Observed","Predicted")
>
> # Plotting as indicated by Deepayan
>
> xyplot(Observed + Predicted ~ Time | Individuals + Model,
> data = mydata,
> panel = panel.superpose.2, type = c("p", "l"),
> layout = c(0, nlevels(mydata$Individuals))) #,
> #<...>)
>
> ####### End of code
>
> This code is not exactly what I am looking for, although it is pretty
> close. In the present case, I would like to have a Trellis plot with 6
> panels (one for each individual), where the Observations and the
> Predicted are plotted as symbols and lines, respectively. All three
> models should be plotted on the same panel. Unfortunately, it looks to
> me as if 3 successive xyplots are created by the code above but only the
> last one remains displayed. I tried to play with
> panel.superpose, panel.superpose.2 and type, without much success.
>
> I also tried the following code, which creates 18 panels and distinguishes
> all (Individuals, Model) couples... so, not what I want.
>
> xyplot(Observed + Predicted ~ Time | Individuals+Model, data = mydata,
> type = c("p", "l"), distribute.type = TRUE)
>
> Sebastien
>
>
> Deepayan Sarkar wrote:
> > On 6/21/07, Sébastien <pomchip_at_free.fr> wrote:
> >> Hi Hadley,
> >>
> >> Hopefully, my dataset won't be too hard to change. Can I modify the
> >> aspect of each group using your code (symbols for observed and lines for
> >> predicted)?
> >>
> >> Sebastien
> >>
> >> hadley wickham wrote:
> >> > Hi Sebastian,
> >> >
> >> > I think you need to rearrange your data a bit. Firstly, you need to
> >> > put observed on the same footing as the different models, so you would
> >> > have a new column in your data called value (previously observed and
> >> > predicted) and a new model type ("observed"). Then you could do:
> >
> > Yes, and ?make.groups (and reshape of course) could help with that.
> > This might not be strictly necessary though.
> >
> > However, I'm finding your pseudo-code confusing. Could you create a
> > small example data set that can be used to try out some real code?
> > Just from your description, I would have suggested something like
> >
> > xyplot(Observed + Predicted ~ Time | Individuals + Model,
> > data = mydata,
> > panel = panel.superpose.2, type = c("p", "l"),
> > layout = c(0, nlevels(mydata$Individuals)),
> > <...>)
> >
> > If all you want is to plot one page at a time, there are easier ways
> > to do that.
> >
> > -Deepayan
> >
> >> >
> >> > xyplot(value ~ time | individuals, data=mydata, groups=model)
> >> >
> >> > Hadley
> >> >
> >> >
> >> > On 6/21/07, Sébastien <pomchip_at_free.fr> wrote:
> >> >> Dear R Users,
> >> >>
> >> >> I recently posted an email on this list about the use of
> >> data.frame and
> >> >> overlaying multiple plots. Deepayan kindly indicated to me the
> >> >> panel.superposition command which worked perfectly in the context
> >> of the
> >> >> example I gave.
> >> >> I'd like to go a little bit further on this topic using a more
> >> complex
> >> >> dataset structure (actually the one I want to work on).
> >> >>
> >> >> >mydata
> >> >>    Plot Model Individuals Time Observed Predicted
> >> >> 1     1     A           1 0.05       10      10.2
> >> >> 2     1     A           1 0.10       20      19.5
> >> >> etc...
> >> >> 10    1     B           1 0.05       10       9.8
> >> >> 11    1     B           1 0.10       20      20.2
> >> >> etc...
> >> >>
> >> >> There are p "levels" in mydata$Plot, m in mydata$Model, n in
> >> >> mydata$Individuals and t in mydata$Time (Note that I probably use the
> >> >> word "levels" improperly, as not all columns are factors). Basically,
> >> this
> >> >> dataset summarizes the t measurements obtained in n individuals as
> >> well
> >> >> as the predicted values from m different modeling approaches
> >> (applied to
> >> >> all individuals). Therefore, the observations are repeated m times in
> >> >> the Observed columns, while the predictions appear only once for a
> >> >> given model and a given individual.
> >> >>
> >> >> What I want to write is a R batch file creating a Trellis graph,
> >> where
> >> >> each panel corresponds to one individual and contains the
> >> observations
> >> >> (as scatterplot) plus the predicted values for all models (as
> >> lines of
> >> >> different colors)... $Plot is just a token: it might be used to not
> >> >> overload graphs in case there are too many tested models. The fun
> >> part
> >> >> is that the values of p, m, n and t might vary from one dataset to
> >> the
> >> >> other, so everything has to be coded dynamically.
> >> >>
> >> >> For the plotting part I was thinking about having a loop in my code
> >> >> containing something like that:
> >> >>
> >> >> for (i in 1:nlevels(mydata$Model)) {
> >> >>
> >> >> subdata<-subset(mydata, Model==levels(mydata$Model)[i])
> >> >> xyplot(Observed + Predicted ~ Time | Individuals, data =
> >> >> subdata) #plus additional formatting code
> >> >>
> >> >> }
> >> >>
> >> >> Unfortunately, this code simply creates a new Trellis plot instead of
> >> >> adding the models one by one on the panels. Any idea or link to a
> >> >> useful command will be welcome.
> >> >>
> >> >> Sebastien
> >> >>
> >> >> ______________________________________________
> >> >> R-help_at_stat.math.ethz.ch mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >> >
> >> >
> >>
> >> ______________________________________________
> >> R-help_at_stat.math.ethz.ch mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
>
>
> ------------------------------
>
> Message: 29
> Date: Fri, 22 Jun 2007 19:26:56 +0200
> From: "Jose Quesada " <quesada_at_gmail.com>
> Subject: [R] Matrix library, CHOLMOD error: problem too large
> To: "r-help_at_lists.r-project.org" <r-help_at_stat.math.ethz.ch>
> Message-ID: <op.tub2q6d64hcap5@delllap.ugr.es>
> Content-Type: text/plain; format=flowed; delsp=yes;
> charset=iso-8859-15
>
>
> I have a pretty large sparse matrix of integers:
> > dim(tasa)
> [1] 91650 37651
>
> I need to add one to it in order to take logs, but I'm getting the
> following error:
>
> > tasa = log(tasa + 1)
> CHOLMOD error: problem too large
> Error in asMethod(object) : Cholmod error `problem too large'
>
> I have 2 GB of RAM, and the current workspace is barely 300 MB.
> Is there any workaround for this? Does anyone have experience with this error?
>
> Thanks,
> -Jose
>
> --
> Jose Quesada, PhD.
> http://www.andrew.cmu.edu/~jquesada
>
>
>
> ------------------------------
>
> Message: 30
> Date: Fri, 22 Jun 2007 14:04:03 -0400
> From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
> Subject: Re: [R] Matrix library, CHOLMOD error: problem too large
> To: Jose Quesada <quesada_at_gmail.com>
> Cc: "r-help_at_lists.r-project.org" <r-help_at_stat.math.ethz.ch>
> Message-ID: <467C0F13.4060004@stats.uwo.ca>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 6/22/2007 1:26 PM, Jose Quesada wrote:
> > I have a pretty large sparse matrix of integers:
> >> dim(tasa)
> > [1] 91650 37651
> >
> > I need to add one to it in order to take logs, but I'm getting the
> > following error:
> >
> >> tasa = log(tasa + 1)
> > CHOLMOD error: problem too large
> > Error in asMethod(object) : Cholmod error `problem too large'
> >
> > I have 2 Gb of RAM, and the current workspace is barely 300mb.
> > Is there any workaround to this? Anyone has any experience with this error?
> >
>
> If tasa is sparse, then tasa+1 will not be sparse, so that's likely your
> problem. You might have better luck with
>
> log1p(tasa)
>
> if the authors of the Matrix package have written a method for log1p();
> if not, you'll probably have to do it yourself.
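>
> A sparsity-preserving sketch of that "do it yourself" route (an
> assumption about the Matrix internals, not tested against this dataset):
> because log1p(0) == 0, applying log1p() only to the stored non-zero
> entries in the x slot gives the same result as log1p(tasa) without ever
> forming the dense tasa + 1:
>
> library(Matrix)
> # small stand-in for the 91650 x 37651 matrix, stored as dgCMatrix
> m <- Matrix(c(0, 4, 0, 0, 0, 9, 0, 0, 0), 3, 3, sparse = TRUE)
> m@x <- log1p(m@x)   # transform non-zeros in place; structural zeros stay zero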
>
> Duncan Murdoch
>
>
>
> ------------------------------
>
> Message: 31
> Date: Fri, 22 Jun 2007 11:44:52 -0700
> From: "Horace Tso" <Horace.Tso_at_pgn.com>
> Subject: [R] Imputing missing values in time series
> To: <r-help_at_stat.math.ethz.ch>
> Message-ID: <467BB6340200006500006924@pgn.com>
> Content-Type: text/plain; charset=ISO-8859-15
>
> Folks,
>
> This must be a rather common problem with real life time series data
> but I don't see anything in the archive about how to deal with it. I
> have a time series of natural gas prices by flow date. Since gas is not
> traded on weekends and holidays, I have a lot of missing values,
>
> FDate Price
> 11/1/2006 6.28
> 11/2/2006 6.58
> 11/3/2006 6.586
> 11/4/2006 6.716
> 11/5/2006 NA
> 11/6/2006 NA
> 11/7/2006 6.262
> 11/8/2006 6.27
> 11/9/2006 6.696
> 11/10/2006 6.729
> 11/11/2006 6.487
> 11/12/2006 NA
> 11/13/2006 NA
> 11/14/2006 6.725
> 11/15/2006 6.844
> 11/16/2006 6.907
>
> What I would like to do is to fill the NAs with the price from the
> previous date (gas used during holidays is purchased from the week
> before). Though real simple, I wonder if there is a function to perform
> this task. Some of the imputation functions I'm aware of (e.g. impute,
> transcan in Hmisc) seem to deal with completely different problems.
>
> 2.5.0/Windows XP
>
> Thanks in advance.
>
> HT
>
>
>
> ------------------------------
>
> Message: 32
> Date: Fri, 22 Jun 2007 14:01:52 -0500
> From: Erik Iverson <iverson_at_biostat.wisc.edu>
> Subject: Re: [R] Imputing missing values in time series
> To: Horace Tso <Horace.Tso_at_pgn.com>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID: <467C1CA0.1080606@biostat.wisc.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I think my example should work for you, but I couldn't think of a way to
> do this without an iterative while loop.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
>
> while(any(is.na(test)))
> test[is.na(test)] <- test[which(is.na(test))-1]
>
> test
> [1] 1 2 3 3 4 4 4 5 5 6 7 7
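>
> A loop-free alternative (a sketch, not from the thread; it assumes the
> first element is non-missing, since there is nothing to carry forward
> otherwise): cummax() over the positions of the non-NA values gives, at
> each index, the position of the most recent observed value.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
> idx <- cummax(seq_along(test) * !is.na(test))  # position of last non-NA so far
> test[idx]
> [1] 1 2 3 3 4 4 4 5 5 6 7 7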
>
> Horace Tso wrote:
> > Folks,
> >
> > This must be a rather common problem with real life time series data
> > but I don't see anything in the archive about how to deal with it. I
> > have a time series of natural gas prices by flow date. Since gas is not
> > traded on weekends and holidays, I have a lot of missing values,
> >
> > FDate Price
> > 11/1/2006 6.28
> > 11/2/2006 6.58
> > 11/3/2006 6.586
> > 11/4/2006 6.716
> > 11/5/2006 NA
> > 11/6/2006 NA
> > 11/7/2006 6.262
> > 11/8/2006 6.27
> > 11/9/2006 6.696
> > 11/10/2006 6.729
> > 11/11/2006 6.487
> > 11/12/2006 NA
> > 11/13/2006 NA
> > 11/14/2006 6.725
> > 11/15/2006 6.844
> > 11/16/2006 6.907
> >
> > What I would like to do is to fill the NAs with the price from the
> > previous date (gas used during holidays is purchased from the week
> > before). Though real simple, I wonder if there is a function to perform
> > this task. Some of the imputation functions I'm aware of (e.g. impute,
> > transcan in Hmisc) seem to deal with completely different problems.
> >
> > 2.5.0/Windows XP
> >
> > Thanks in advance.
> >
> > HT
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> ------------------------------
>
> Message: 33
> Date: Fri, 22 Jun 2007 12:08:26 -0700
> From: "Horace Tso" <Horace.Tso_at_pgn.com>
> Subject: Re: [R] Imputing missing values in time series
> To: "Erik Iverson" <iverson_at_biostat.wisc.edu>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID: <467BBBBA0200006500006928@pgn.com>
> Content-Type: text/plain; charset=US-ASCII
>
> Erik, indeed it gets the work done. I was hoping to avoid the dreaded looping, though.....
>
> Thanks.
>
> Horace
>
> >>> Erik Iverson <iverson_at_biostat.wisc.edu> 6/22/2007 12:01 PM >>>
> I think my example should work for you, but I couldn't think of a way to
> do this without an iterative while loop.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
>
> while(any(is.na(test)))
> test[is.na(test)] <- test[which(is.na(test))-1]
>
> test
> [1] 1 2 3 3 4 4 4 5 5 6 7 7
>
> Horace Tso wrote:
> > Folks,
> >
> > This must be a rather common problem with real life time series data
> > but I don't see anything in the archive about how to deal with it. I
> > have a time series of natural gas prices by flow date. Since gas is not
> > traded on weekends and holidays, I have a lot of missing values,
> >
> > FDate Price
> > 11/1/2006 6.28
> > 11/2/2006 6.58
> > 11/3/2006 6.586
> > 11/4/2006 6.716
> > 11/5/2006 NA
> > 11/6/2006 NA
> > 11/7/2006 6.262
> > 11/8/2006 6.27
> > 11/9/2006 6.696
> > 11/10/2006 6.729
> > 11/11/2006 6.487
> > 11/12/2006 NA
> > 11/13/2006 NA
> > 11/14/2006 6.725
> > 11/15/2006 6.844
> > 11/16/2006 6.907
> >
> > What I would like to do is to fill the NAs with the price from the
> > previous date (gas used during holidays is purchased from the week
> > before). Though real simple, I wonder if there is a function to perform
> > this task. Some of the imputation functions I'm aware of (e.g. impute,
> > transcan in Hmisc) seem to deal with completely different problems.
> >
> > 2.5.0/Windows XP
> >
> > Thanks in advance.
> >
> > HT
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> ------------------------------
>
> Message: 34
> Date: Fri, 22 Jun 2007 21:13:09 +0200
> From: "hadley wickham" <h.wickham_at_gmail.com>
> Subject: Re: [R] Switching X-axis and Y-axis for histogram
> To: "Donghui Feng" <donghui.feng_at_gmail.com>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID:
> <f8e6ff050706221213j3dc25aeaw6c6d4df7a4511e06@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> It's trivial to do this with ggplot2 (http://had.co.nz):
>
> qplot(rating, data=movies, geom="histogram") + coord_flip()
> qplot(rating, data=movies, geom="histogram", binwidth=0.1) + coord_flip()
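>
> In base graphics (which the original hist() question used), a sketch of
> the same idea: compute the histogram without plotting, then draw the
> counts as horizontal bars; the breaks argument sets the bin step on the
> value axis:
>
> h <- hist(rnorm(1000), breaks = seq(-5, 5, by = 0.5), plot = FALSE)
> barplot(h$counts, horiz = TRUE, names.arg = head(h$breaks, -1),
>         xlab = "Frequency")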
>
> Hadley
>
> On 6/22/07, Donghui Feng <donghui.feng_at_gmail.com> wrote:
> > Dear all,
> >
> > I'm creating a histogram with the function hist(). But
> > right now what I get is column representation (as normal).
> > I'm wondering if I could switch X-axis and Y-axis and
> > get row-representation of frequencies?
> >
> > One more question: can I define the step of each axis
> > for the histogram?
> >
> > Thanks so much!
> >
> > Donghui
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> ------------------------------
>
> Message: 35
> Date: Fri, 22 Jun 2007 15:16:09 -0400
> From: "Leeds, Mark \(IED\)" <Mark.Leeds_at_morganstanley.com>
> Subject: Re: [R] Imputing missing values in time series
> To: "Erik Iverson" <iverson_at_biostat.wisc.edu>, "Horace Tso"
> <Horace.Tso_at_pgn.com>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID:
> <D3AEEDA31E57474B840BEBC25A8A834401957418@NYWEXMB23.msad.ms.com>
> Content-Type: text/plain; charset="us-ascii"
>
> I have a function that does this type of thing, but it works off a pure
> vector so it would have to be modified.
> If you make your object a zoo object, then that object has many functions
> associated with it, and na.locf would do what you need, I think.
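>
> For example (a sketch; na.locf() is in the zoo package and carries the
> last observation forward, which matches the "use the previous date's
> price" rule):
>
> library(zoo)
> prices <- zoo(c(6.28, 6.58, 6.586, 6.716, NA, NA, 6.262),
>               as.Date("2006-11-01") + 0:6)
> na.locf(prices)   # NAs on 11/5 and 11/6 become 6.716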
>
>
> -----Original Message-----
> From: r-help-bounces_at_stat.math.ethz.ch
> [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of Erik Iverson
> Sent: Friday, June 22, 2007 3:02 PM
> To: Horace Tso
> Cc: r-help_at_stat.math.ethz.ch
> Subject: Re: [R] Imputing missing values in time series
>
> I think my example should work for you, but I couldn't think of a way to
> do this without an iterative while loop.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
>
> while(any(is.na(test)))
> test[is.na(test)] <- test[which(is.na(test))-1]
>
> test
> [1] 1 2 3 3 4 4 4 5 5 6 7 7
>
> Horace Tso wrote:
> > Folks,
> >
> > This must be a rather common problem with real life time series data
> > but I don't see anything in the archive about how to deal with it. I
> > have a time series of natural gas prices by flow date. Since gas is
> > not traded on weekends and holidays, I have a lot of missing values,
> >
> > FDate Price
> > 11/1/2006 6.28
> > 11/2/2006 6.58
> > 11/3/2006 6.586
> > 11/4/2006 6.716
> > 11/5/2006 NA
> > 11/6/2006 NA
> > 11/7/2006 6.262
> > 11/8/2006 6.27
> > 11/9/2006 6.696
> > 11/10/2006 6.729
> > 11/11/2006 6.487
> > 11/12/2006 NA
> > 11/13/2006 NA
> > 11/14/2006 6.725
> > 11/15/2006 6.844
> > 11/16/2006 6.907
> >
> > What I would like to do is to fill the NAs with the price from the
> > previous date (gas used during holidays is purchased from the week
> > before). Though real simple, I wonder if there is a function to perform
> > this task. Some of the imputation functions I'm aware of (e.g. impute,
> > transcan in Hmisc) seem to deal with completely different problems.
> >
> > 2.5.0/Windows XP
> >
> > Thanks in advance.
> >
> > HT
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> --------------------------------------------------------
>
> This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 36
> Date: Fri, 22 Jun 2007 12:21:40 -0700
> From: "Horace Tso" <Horace.Tso_at_pgn.com>
> Subject: Re: [R] Imputing missing values in time series
> To: "Erik Iverson" <iverson_at_biostat.wisc.edu>, "Mark (IED) Leeds"
> <Mark.Leeds_at_morganstanley.com>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID: <467BBED40200006500006931@pgn.com>
> Content-Type: text/plain; charset=US-ASCII
>
> Mark, thanks for the tips. I thought you financial folks must have run into things like these before. Just wonder why this problem wasn't asked more often on this list.
>
> H.
>
>
> >>> "Leeds, Mark (IED)" <Mark.Leeds_at_morganstanley.com> 6/22/2007 12:16 PM >>>
> I have a function that does this type of thing, but it works off a pure
> vector so it would have to be modified.
> If you make your object a zoo object, then that object has many functions
> associated with it, and na.locf would do what you need, I think.
>
>
> -----Original Message-----
> From: r-help-bounces_at_stat.math.ethz.ch
> [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of Erik Iverson
> Sent: Friday, June 22, 2007 3:02 PM
> To: Horace Tso
> Cc: r-help_at_stat.math.ethz.ch
> Subject: Re: [R] Imputing missing values in time series
>
> I think my example should work for you, but I couldn't think of a way to
> do this without an iterative while loop.
>
> test <- c(1,2,3,NA,4,NA,NA,5,NA,6,7,NA)
>
> while(any(is.na(test)))
> test[is.na(test)] <- test[which(is.na(test))-1]
>
> test
> [1] 1 2 3 3 4 4 4 5 5 6 7 7
>
> Horace Tso wrote:
> > Folks,
> >
> > This must be a rather common problem with real life time series data
> > but I don't see anything in the archive about how to deal with it. I
> > have a time series of natural gas prices by flow date. Since gas is
> > not traded on weekends and holidays, I have a lot of missing values,
> >
> > FDate Price
> > 11/1/2006 6.28
> > 11/2/2006 6.58
> > 11/3/2006 6.586
> > 11/4/2006 6.716
> > 11/5/2006 NA
> > 11/6/2006 NA
> > 11/7/2006 6.262
> > 11/8/2006 6.27
> > 11/9/2006 6.696
> > 11/10/2006 6.729
> > 11/11/2006 6.487
> > 11/12/2006 NA
> > 11/13/2006 NA
> > 11/14/2006 6.725
> > 11/15/2006 6.844
> > 11/16/2006 6.907
> >
> > What I would like to do is to fill the NAs with the price from the
> > previous date (gas used during holidays is purchased from the week
> > before). Though real simple, I wonder if there is a function to perform
> > this task. Some of the imputation functions I'm aware of (e.g. impute,
> > transcan in Hmisc) seem to deal with completely different problems.
> >
> > 2.5.0/Windows XP
> >
> > Thanks in advance.
> >
> > HT
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> --------------------------------------------------------
>
> This is not an offer (or solicitation of an offer) to buy/se...{{dropped}}
>
>
>
> ------------------------------
>
> Message: 37
> Date: Fri, 22 Jun 2007 21:40:26 +0200
> From: "hadley wickham" <h.wickham_at_gmail.com>
> Subject: Re: [R] Overlaying lattice graphs (continued)
> To: " S?bastien " <pomchip_at_free.fr>
> Cc: R-help <r-help_at_stat.math.ethz.ch>
> Message-ID:
> <f8e6ff050706221240m240391dfrf83f5eb6d47e1766@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Sebastian,
>
> I think the following does what you want:
>
> library(ggplot2)
> names(mydata) <- tolower(names(mydata))
>
> obs <- rename(subset(mydata, model=="A", -predicted), c("observed" = "value"))
> obs$model <- factor("observed")
> pred <- rename(mydata[, -5], c("predicted" = "value"))
> all <- rbind(obs, pred)
>
> ggplot(all, aes(x = time, y = value, colour=model)) +
> geom_point(data = subset(all, model != "Observed")) +
> geom_line(data= subset(all, model == "Observed")) +
> facet_grid(. ~ individuals)
>
> Hadley
>
> On 6/22/07, S?bastien <pomchip_at_free.fr> wrote:
> > Hi Deepayan,
> >
> > The following code creates a dummy dataset which has a structure similar
> > to my usual datasets. I did not try to implement the changes proposed by
> > Hadley, hoping that a solution can be found using the original dataset.
> >
> > ######### My code
> >
> > # Creating dataset
> >
> > nPts<-10 # number of time points
> > nInd<-6 # number of individuals
> > nModel<-3 # number of models
> >
> > TimePts<-rep(1:nPts,nInd*nModel)               # creates the "Time" column
> > Coef<-rep(rnorm(6,0.1,0.01),each=nPts,nModel)  # coefficients for generating the observations
> > Obs<-10*exp(-Coef*TimePts)                     # creates the observations
> >
> > Pred<-numeric(nPts*nInd*nModel)   # initialize Pred before filling it
> > for (i in 1:60){
> > Pred[i]<-jitter(10*exp(-Coef[i]*TimePts[i]))
> > Pred[i+60]<-jitter(5)
> > Pred[i+120]<-jitter(10-Coef[i+120]*TimePts[i])
> > }
> > # creates the predicted values
> >
> > colPlot<-rep(1,nPts*nInd*nModel)                      # creates the "Plot" column
> > colModel<-gl(nModel,nPts*nInd,labels=c("A","B","C"))  # creates the "Model" column
> > colID<-gl(nInd,nPts,nPts*nInd*nModel)                 # creates the "ID" column
> >
> > mydata<-data.frame(colPlot,colModel,colID,TimePts,Obs,Pred)
> > # creates the dataset
> > names(mydata)<-c("Plot","Model","Individuals","Time","Observed","Predicted")
> >
> > # Plotting as indicated by Deepayan
> >
> >
> > xyplot(Observed + Predicted ~ Time | Individuals + Model,
> > data = mydata,
> > panel = panel.superpose.2, type = c("p", "l"),
> > layout = c(0, nlevels(mydata$Individuals))) #,
> > #<...>)
> >
> > ####### End of code
> >
> > This code is not exactly what I am looking for, although it is pretty
> > close. In the present case, I would like to have a Trellis plot with 6
> > panels (one for each individual), where the observations and the
> > predictions are plotted as symbols and lines, respectively. All three
> > models should be plotted on the same panel. Unfortunately, it looks to
> > me as if 3 successive xyplots are created by the code above but only the
> > last one remains displayed. I tried to play with
> > panel.superpose, panel.superpose.2 and type, without much success.
> >
> > I also tried the following code, which creates 18 panels and distinguishes
> > all (Individuals, Model) pairs... so, not what I want.
> >
> > xyplot(Observed + Predicted ~ Time | Individuals+Model, data = mydata,
> > type = c("p", "l"), distribute.type = TRUE)
> >
> > Sebastien
> >
> >
> > Deepayan Sarkar a ?crit :
> > > On 6/21/07, S?bastien <pomchip_at_free.fr> wrote:
> > >> Hi Hadley,
> > >>
> > >> Hopefully, my dataset won't be too hard to changed. Can I modify the
> > >> aspect of each group using your code (symbols for observed and lines for
> > >> predicted)?
> > >>
> > >> Sebastien
> > >>
> > >> hadley wickham a ?crit :
> > >> > Hi Sebastian,
> > >> >
> > >> > I think you need to rearrange your data a bit. Firstly, you need to
> > >> > put observed on the same footing as the different models, so you would
> > >> > have a new column in your data called value (previously observed and
> > >> > predicted) and a new model type ("observed"). Then you could do:
> > >
> > > Yes, and ?make.groups (and reshape of course) could help with that.
> > > This might not be strictly necessary though.
> > >
> > > However, I'm finding your pseudo-code confusing. Could you create a
> > > small example data set that can be used to try out some real code?
> > > Just from your description, I would have suggested something like
> > >
> > > xyplot(Observed + Predicted ~ Time | Individuals + Model,
> > > data = mydata,
> > > panel = panel.superpose.2, type = c("p", "l"),
> > > layout = c(0, nlevels(mydata$Individuals)),
> > > <...>)
> > >
> > > If all you want is to plot one page at a time, there are easier ways
> > > to do that.
> > >
> > > -Deepayan
> > >
> > >> >
> > >> > xyplot(value ~ time | individauls, data=mydata, group=model)
> > >> >
> > >> > Hadley
> > >> >
> > >> >
> > >> > On 6/21/07, S?bastien <pomchip_at_free.fr> wrote:
> > >> >> Dear R Users,
> > >> >>
> > >> >> I recently posted an email on this list about the use of
> > >> data.frame and
> > >> >> overlaying multiple plots. Deepayan kindly indicated to me the
> > >> >> panel.superposition command which worked perfectly in the context
> > >> of the
> > >> >> example I gave.
> > >> >> I'd like to go a little bit further on this topic using a more
> > >> >> complex dataset structure (actually the one I want to work on).
> > >> >>
> > >> >> >mydata
> > >> >>    Plot Model Individuals Time Observed Predicted
> > >> >> 1     1     A           1 0.05       10      10.2
> > >> >> 2     1     A           1 0.10       20      19.5
> > >> >> etc...
> > >> >> 10    1     B           1 0.05       10       9.8
> > >> >> 11    1     B           1 0.10       20      20.2
> > >> >> etc...
> > >> >>
> > >> >> There are p "levels" in mydata$Plot, m in mydata$Model, n in
> > >> >> mydata$Individuals and t in mydata$Time (note that I probably use the
> > >> >> word "levels" improperly, as not all columns are factors). Basically,
> > >> >> this dataset summarizes the t measurements obtained in n individuals
> > >> >> as well as the predicted values from m different modeling approaches
> > >> >> (applied to all individuals). Therefore, the observations are repeated
> > >> >> m times in the Observed column, while the predictions appear only once
> > >> >> for a given model and a given individual.
> > >> >>
> > >> >> What I want to write is an R batch file creating a Trellis graph,
> > >> >> where each panel corresponds to one individual and contains the
> > >> >> observations (as a scatterplot) plus the predicted values for all
> > >> >> models (as lines of different colors)... $Plot is just a token: it
> > >> >> might be used to avoid overloading graphs in case there are too many
> > >> >> tested models. The fun part is that the values of p, m, n and t might
> > >> >> vary from one dataset to the other, so everything has to be coded
> > >> >> dynamically.
> > >> >>
> > >> >> For the plotting part I was thinking about having a loop in my code
> > >> >> containing something like that:
> > >> >>
> > >> >> for (i in 1:nlevels(mydata$Model)) {
> > >> >>
> > >> >> subdata <- subset(mydata, Model == levels(mydata$Model)[i])
> > >> >> xyplot(Observed + Predicted ~ Time | Individuals,
> > >> >> data = subdata) # plus additional formatting code
> > >> >>
> > >> >> }
> > >> >>
> > >> >> Unfortunately, this code simply creates a new Trellis plot instead of
> > >> >> adding the models one by one on the panels. Any idea or link to a
> > >> >> useful command will be welcome.
> > >> >>
> > >> >> Sebastien
> > >> >>
> > >> >> ______________________________________________
> > >> >> R-help_at_stat.math.ethz.ch mailing list
> > >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> >> PLEASE do read the posting guide
> > >> >> http://www.R-project.org/posting-guide.html
> > >> >> and provide commented, minimal, self-contained, reproducible code.
> > >> >>
> > >> >
> > >> >
> > >>
> > >> ______________________________________________
> > >> R-help_at_stat.math.ethz.ch mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
> >
>
>
>
> ------------------------------
>
> Message: 38
> Date: Fri, 22 Jun 2007 12:47:51 -0700
> From: "Deepayan Sarkar" <deepayan.sarkar_at_gmail.com>
> Subject: Re: [R] Overlaying lattice graphs (continued)
> To: " S?bastien " <pomchip_at_free.fr>
> Cc: R-help <r-help_at_stat.math.ethz.ch>
> Message-ID:
> <eb555e660706221247v59cf4f4cpdaee99e0a1372c84@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> On 6/22/07, S?bastien <pomchip_at_free.fr> wrote:
> > Hi Deepayan,
> >
> > The following code creates a dummy dataset which has a structure similar
> > to my usual datasets. I did not try to implement the changes proposed by
> > Hadley, hoping that a solution can be found using the original dataset.
> >
> > ######### My code
> >
> > # Creating dataset
> >
> > nPts<-10 # number of time points
> > nInd<-6 # number of individuals
> > nModel<-3 # number of models
> >
> > TimePts<-rep(1:nPts,nInd*nModel)               # creates the "Time" column
> > Coef<-rep(rnorm(6,0.1,0.01),each=nPts,nModel)  # coefficients for generating the observations
> > Obs<-10*exp(-Coef*TimePts)                     # creates the observations
> >
> > Pred<-numeric(nPts*nInd*nModel)   # initialize Pred before filling it
> > for (i in 1:60){
> > Pred[i]<-jitter(10*exp(-Coef[i]*TimePts[i]))
> > Pred[i+60]<-jitter(5)
> > Pred[i+120]<-jitter(10-Coef[i+120]*TimePts[i])
> > }
> > # creates the predicted values
> >
> > colPlot<-rep(1,nPts*nInd*nModel)                      # creates the "Plot" column
> > colModel<-gl(nModel,nPts*nInd,labels=c("A","B","C"))  # creates the "Model" column
> > colID<-gl(nInd,nPts,nPts*nInd*nModel)                 # creates the "ID" column
> >
> > mydata<-data.frame(colPlot,colModel,colID,TimePts,Obs,Pred)
> > # creates the dataset
> > names(mydata)<-c("Plot","Model","Individuals","Time","Observed","Predicted")
>
> The way you have structured your data makes no sense to me. In
> particular, your 'Observed' data is the same set of 60 numbers
> repeated 3 times, and this is not reflected in the data structure at
> all. What would you want to happen if the numbers were not repeated?
> Would you always plot the first 60, or would you plot all of them?
>
> If I understand what you are trying to do, this might be a more
> transparent approach:
>
>
> nPts<-10 # number of time points
> nInd<-6 # number of individuals
>
> TimePts <- rep(1:nPts, nInd)
> Coef <- rep(rnorm(6,0.1,0.01), each = nPts)
> Obs <- 10 * exp(-Coef * TimePts)
> colID <- gl(nInd, nPts)
>
> mydata <- data.frame(Time = TimePts, Observed = Obs, Individuals = colID)
>
> fmA <- lm(Observed ~ Time, mydata)
> fmB <- lm(Observed ~ poly(Time, 2), mydata)
> fmC <- lm(Observed ~ poly(Time, 2) * Individuals, mydata)
>
> mydata$PredA <- predict(fmA)
> mydata$PredB <- predict(fmB)
> mydata$PredC <- predict(fmC)
>
> xyplot(Observed + PredA + PredB + PredC ~ Time | Individuals,
> data = mydata,
> type = c("p", "l", "l", "l"),
> distribute.type = TRUE)
>
> -Deepayan
>
>
>
> ------------------------------
>
> Message: 39
> Date: Fri, 22 Jun 2007 21:55:52 +0200
> From: "hadley wickham" <h.wickham_at_gmail.com>
> Subject: Re: [R] Stacked barchart color
> To: owenman <solberg_at_speakeasy.net>
> Cc: r-help_at_stat.math.ethz.ch
> Message-ID:
> <f8e6ff050706221255j37e15382n4179d5b276486c0b@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi Owen,
>
> The bars should be stacked in the order specified by the factor. Try
> using factor(..., levels=...) to explicitly order them the way you
> want. If that doesn't work, please provide a small replicable example
> and I'll look into it.
>
> Hadley
>
> On 6/18/07, owenman <solberg_at_speakeasy.net> wrote:
> >
> > Hi Hadley,
> > Great, I am starting to get it. It's working for me, but there is one more
> > thing I am having trouble with. The ordering of the stacked bars seems to
> > be dictated by the name of the color, I guess because of the fill=color
> > argument in aes(). In other words, if I set up my colors like this:
> > y$color = c("gray1","gray35","gray45","gray65") the bars get stacked in the
> > opposite order than if I set up the colors like this: y$color =
> > c("gray65","gray45","gray35","gray1"). How can I control the order of the
> > bars independent of the name of the colors? Thanks so much in advance!
> > Really neat package you've made.
> >
> > FYI, my plot command now looks like this:
> >
> > p = ggplot(y, aes(x=locus, y=Freq, fill=color))
> > p = p + geom_bar(position="fill")
> > p = p + scale_fill_identity(labels=levels(y$Fnd), grob="tile", name="Fnd
> > Results")
> > p = p + coord_flip()
> >
> > And the data table is similar as before:
> >
> > > y
> > Fnd locus Freq color
> > 1 signeg DPB1 0.013071895 gray1
> > 2 neg DPB1 0.581699346 gray35
> > 3 pos DPB1 0.379084967 gray45
> > 4 sigpos DPB1 0.026143791 gray65
> > 5 signeg DPA1 0.068181818 gray1
> > 6 neg DPA1 0.659090909 gray35
> > 7 pos DPA1 0.250000000 gray45
> > 8 sigpos DPA1 0.022727273 gray65
> >
> >
> >
> > hadley wrote:
> > >
> > > Hi Owen,
> > >
> > > The identity scale won't create a legend, unless you tell it what
> > > labels it should use - there's an example at
> > > http://had.co.nz/ggplot2/scale_identity.html. Otherwise, if you have
> > > a continuous scale and you want something that works in black and
> > > white, p + scale_fill_gradient(low="white", high="black") might be
> > > easier.
> > >
> > > Hadley
> > >
> > >
> > >>
> > >> > y$color = factor(y$Fnd)
> > >> > y$color = c("black","darkgray","lightgray","white")
> > >> > y
> > >> Fnd locus Freq color
> > >> 1 signeg A 0.087248322 black
> > >> 2 neg A 0.711409396 darkgray
> > >> 3 pos A 0.201342282 lightgray
> > >> 4 sigpos A 0.000000000 white
> > >> 5 signeg C 0.320754717 black
> > >> 6 neg C 0.603773585 darkgray
> > >> 7 pos C 0.075471698 lightgray
> > >> 8 sigpos C 0.000000000 white
> > >> 9 signeg B 0.157534247 black
> > >> 10 neg B 0.732876712 darkgray
> > >> 11 pos B 0.109589041 lightgray
> > >> 12 sigpos B 0.000000000 white
> > >>
> > >> > p = ggplot(y, aes(x=locus, y=Freq, fill=color)) +
> > >> > geom_bar(position="fill") + scale_fill_identity()
> > >> > p
> > >>
> > >>
> > >>
> > >>
> > >> hadley wrote:
> > >> >
> > >> >
> > >> > Hi Dieter,
> > >> >
> > >> > You can do this with ggplot2 (http://had.co.nz/ggplot2) as follows:
> > >> >
> > >> > library(ggplot2)
> > >> >
> > >> > barley1 <- subset(barley, site=="Grand Rapids" & variety %in%
> > >> > c("Velvet","Peatland"))
> > >> > barley1[] <- lapply(barley1, "[", drop=TRUE)
> > >> >
> > >> > qplot(variety, yield, data=barley1, geom="bar", stat="identity",
> > >> > fill=factor(year))
> > >> >
> > >> > barley1$fill <- c("red","green","blue","gray")
> > >> > qplot(variety, yield, data=barley1, geom="bar", stat="identity",
> > >> > fill=fill) + scale_fill_identity()
> > >> >
> > >> > See http://had.co.nz/ggplot2/scale_identity.html and
> > >> > http://had.co.nz/ggplot2/position_stack.html for more details.
> > >> >
> > >> > Hadley
> > >> >
> > >> >
> > >>
> > >>
> > >> --
> > >> View this message in context:
> > >> http://www.nabble.com/Stacked-barchart-color-tf3909162.html#a11149419
> > >> Sent from the R help mailing list archive at Nabble.com.
> > >>
> > >>
> > >> ______________________________________________
> > >> R-help_at_stat.math.ethz.ch mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > >> http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >
> > > ______________________________________________
> > > R-help_at_stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> >
> >
> > --
> > View this message in context: http://www.nabble.com/Stacked-barchart-color-tf3909162.html#a11182581
> >
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help_at_stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> ------------------------------
>
> Message: 40
> Date: Fri, 22 Jun 2007 22:07:37 +0200
> From: "hadley wickham" <h.wickham_at_gmail.com>
> Subject: Re: [R] Visualize quartiles of plot line
> To: "Arne Brutschy" <abr-r-project_at_xylon.de>
> Cc: R-help_at_stat.math.ethz.ch
> Message-ID:
> <f8e6ff050706221307s72ebc36v31537365bc7ff667@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 6/17/07, Arne Brutschy <abr-r-project_at_xylon.de> wrote:
> > Hi,
> >
> > thanks for your tips - all of them worked. After a bit of fiddling, I
> > managed to get what I wanted.
>
> Glad to hear it.
>
> > hadley wickham wrote:
> > h> You might want to read the introductory chapters in the ggplot book,
> > h> available from http://had.co.nz/ggplot2, which will give you more of a
> > h> background. Please let me know places where you think the
> > h> documentation is inconsistent so I can try and make it better.
> > I already did. :) A general problem: the examples are nice and easy to
> > get, but it's often hard to apply them to my own specific problem.
> > It's more a problem of the general structure: what has to go where.
> > Most of the examples use qplot, but what do I have to do if I'm
> > trying to create a more complex plot? Hmm, it's hard to describe.
> >
> > Example: I know how to set the title when using qplot (qplot(...,
> > main="asdf")). Where do I have to put it when I'm using ggplot()?
> > Stuff like this is unclear...
>
> p <- ggplot(...) + ...
> p$title <- "Title goes here"
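As a side note for later readers: the `p$title` assignment above is specific to early ggplot2 versions; a minimal sketch of the form later versions support (assuming the built-in `mtcars` data):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
# Early ggplot2: p$title <- "Title goes here"
# Later versions provide an explicit helper:
p <- p + ggtitle("Title goes here")   # equivalently: labs(title = "...")
```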
>
> It is admittedly hard to figure this out from the current documentation, though.
>
> > A more general problem is that the manual pages are very, eh,
> > minimalistic. The overall reference page is good and nicely
> > structured, but the big idea is sort of missing. All components are
> > linked, but the basics like layout, ggplot, aes etc. are harder to find
> > - and their help pages are the shortest. Especially the small details
> > are hard to figure out: lists of attributes etc.
>
> Yes, that's definitely something I'm working on for the book.
> Unfortunately, I don't have
> that much time and it is a lot of work. Every comment helps though.
>
> > Hmm, I know this is not really helpful. I can't describe my problems
> > properly, I guess. Perhaps the documentation simply has to improve
> > based on users questions. :\
> >
> > How old is this package? I think it's really, really great, but are
> > there many users? Is there an additional mailing list or forum where I
> > can get more information?
>
> It's pretty young still, although the precursor ggplot package has
> been around for about a year. I really have no idea how many users
> there are. For questions, either email me or R-help.
>
> > Some more questions:
> >
> > Why doesn't ggplot2 work with layout()? I'm using viewport now, which
> > works fine for me, but there should be a note in the docs perhaps.
>
> Because it works with the grid drawing package - see the last chapter
> in the ggplot book for some details on how to use the grid
> equivalents.
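The grid approach Hadley refers to can be sketched like this (a standard grid viewport idiom; the two plots are placeholders built from `mtcars`):

```r
library(ggplot2)
library(grid)

p1 <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(factor(cyl))) + geom_bar()

# grid.layout() is the grid equivalent of base graphics' layout():
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))
print(p1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
```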
>
> > How do I change the legend. The auto-creation of it might be nice,
> > but I want a) to add a title b) change the order to ascending and c)
> > add a short description like:
> >
> > DeltaConfig
> > [ ] 0 best
> > [ ]
> > [ ] 5
> > [ ]
> > [ ]10 worst
> >
> > I don't know if this is possible, but it would be nice to explain what
> > the colors/values might mean if it's not clear from the beginning
> > (like diamonds.size). The only thing I found was the attribute
> > legend.justification in ggopt, which isn't fully documented.
>
> The legends aren't very customisable at the moment - look at the
> examples for the scale functions to see what you can do. You can set
> the legend title easily, and you can change the labels by changing the
> levels of the factor, or by setting the breaks argument. I agree there
> could be more options. If you could provide me with a picture of what
> you want, I'll add it to my to-do list to think about.
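For the continuous case, the scale arguments mentioned above look roughly like this (DeltaConfig and the breaks/labels come from the request quoted above; the data frame is invented, and exact argument names may differ in 2007-era ggplot2):

```r
library(ggplot2)

d <- data.frame(x = letters[1:3], y = 1:3, DeltaConfig = c(0, 5, 10))

ggplot(d, aes(x, y, fill = DeltaConfig)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(name   = "DeltaConfig",      # legend title
                      low    = "white", high = "black",
                      breaks = c(0, 5, 10),
                      labels = c("0  best", "5", "10  worst"))
```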
>
> > Additionally, how can I change the order of the facets? I currently
> > have a plot with a smoother for each model (all in the same plot),
> > which sorts the models like this: dyn, dl4, dl3. Below that, I have a
> > facet with point-plots for each model which sorts them the other way
> > round, which is a bit confusing.
>
> Again, change the order of the underlying factor.
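Sketched concretely: facet (and legend) order follows the level order of the conditioning factor, so set the levels explicitly before plotting (the model names are taken from the question above; the data frame is a stand-in):

```r
d <- data.frame(model = c("dl3", "dl4", "dyn"), y = 1:3)

# Facets are drawn in level order, so fix it explicitly:
d$model <- factor(d$model, levels = c("dyn", "dl4", "dl3"))
levels(d$model)
# "dyn" "dl4" "dl3"
```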
>
> > BTW, what's the "strip" and the associated attributes?
>
> The strip is the label associated with the facet.
>
> > Again, I think this package is great - nice work! All the above isn't
> > meant as general criticism, but is being said in order to improve the
> > documentation.
>
> I do appreciate your comments and they definitely help me to make a
> better product.
>
> Thanks,
>
> Hadley
>
>
>
> ------------------------------
>
> Message: 41
> Date: Fri, 22 Jun 2007 16:14:52 -0400
> From: "Patrick Ayscue" <payscue_at_gmail.com>
> Subject: [R] "heatmap" color still a spectrum for binary outcomes?
> To: r-help_at_stat.math.ethz.ch
> Message-ID:
> <ea3a49790706221314n270f8036i72fb4356c579303@mail.gmail.com>
> Content-Type: text/plain
>
> I have a matrix of a time-series binary response variable for around 200
> individuals that I would like to display. I am approaching success using the
> "heatmap" function in the "stats" package without dendrograms; however, I am
> running into trouble in that the colors get lighter with more positive
> outcomes in a column (time point). Is there a way to make the colors
> uniform irrespective of the number of similar values in the column, or is
> this part of the heatmap function?
>
> Other suggestions for representing the data graphically are certainly
> welcome as well.
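One likely explanation, sketched below: heatmap() rescales each row by default (scale = "row"), so identical 0/1 values can render as different shades; turning scaling off and supplying a two-colour palette keeps the colours uniform. (The matrix m is simulated stand-in data.)

```r
set.seed(1)
# Fake binary time-series matrix: 200 individuals x 10 time points.
m <- matrix(rbinom(200 * 10, size = 1, prob = 0.3), nrow = 200)

# scale = "none" disables the default per-row rescaling;
# col gives exactly one colour per outcome (0 -> white, 1 -> black).
heatmap(m, Rowv = NA, Colv = NA, scale = "none",
        col = c("white", "black"))
```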
>
> Thanks,
> Patrick
>
> [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 42
> Date: Fri, 22 Jun 2007 14:19:43 -0600
> From: "Spilak,Jacqueline [Edm]" <Jacqueline.Spilak_at_EC.gc.ca>
> Subject: [R] Barchart legend position
> To: <r-help_at_stat.math.ethz.ch>
> Message-ID:
> <4A6AB38B55B49C44A22E021A83CBEDDB015EB9A1@sr-pnr-exch3.prairie.int.ec.gc.ca>
>
> Content-Type: text/plain
>
> I am using barchart to make charts for some data with a lot more
> functions and labels and such in the command.
>
> barchart(Freq ~ factor(HH), data = dataset1, groups = year)
>
> So I have my data grouped by year and I get a legend at the top of the
> graph, which is great because I need the legend for the different years,
> but it is in a weird spot. So how can I manipulate the legend, i.e. move
> it, shrink it, do anything with it? I have searched the help archives
> and found nothing, and I have looked up the legend section in ?barchart,
> but that has not helped or I am doing something wrong. Any help is
> greatly appreciated.
> Jacquie
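In lattice, the grouped-display legend is controlled through the auto.key argument; a sketch with reconstructed stand-in data (the real dataset1, HH, Freq and year are the poster's objects):

```r
library(lattice)

dataset1 <- data.frame(HH   = rep(1:3, times = 2),
                       Freq = c(0.2, 0.5, 0.3, 0.4, 0.1, 0.5),
                       year = factor(rep(c(2005, 2006), each = 3)))

# auto.key moves and resizes the legend: space picks the side,
# columns and cex control the layout and text size.
barchart(Freq ~ factor(HH), data = dataset1, groups = year,
         auto.key = list(space = "right", columns = 1, cex = 0.8))
```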
>
> [[alternative HTML version deleted]]
>
>
>
> ------------------------------
>
> Message: 43
> Date: Fri, 22 Jun 2007 17:25:14 +0100
> From: Michael Hoffman <b3i4old02_at_sneakemail.com>
> Subject: [R] Lattice: hiding only some strips
> To: r-help_at_stat.math.ethz.ch
> Message-ID: <f5gt5i$f6k$1@sea.gmane.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> I am using R 2.4.0 and lattice to produce some xyplots conditioned on a
> factor and a shingle. The shingle merely chops up the data along the
> x-axis, so it is easy to identify which part of the shingle a panel is
> in by looking at the x-axis markings. I only want to have a strip at the
> top for the factor.
>
> Is this possible? I looked into calculateGridLayout() and it seems to me
> that there isn't an easy way to do it without rewriting that function
> (and others).
>
> Many thanks
> --
> Michael Hoffman
>
>
>
> ------------------------------
>
> Message: 44
> Date: Fri, 22 Jun 2007 16:42:13 -0400
> From: Sébastien <pomchip_at_free.fr>
> Subject: Re: [R] Overlaying lattice graphs (continued)
> To: hadley wickham <h.wickham_at_gmail.com>
> Cc: R-help <r-help_at_stat.math.ethz.ch>
> Message-ID: <467C3425.1030000@free.fr>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hadley,
>
> I have some trouble running your code with ggplot version 0.4.1. Is the
> ggplot2 package mandatory?
>
> Sebastien
>
> hadley wickham wrote:
> > Hi Sebastian,
> >
> > I think the following does what you want:
> >
> > library(ggplot2)
> > names(mydata) <- tolower(names(mydata))
> >
> > obs <- rename(subset(mydata, model=="A", -predicted), c("observed" =
> > "value"))
> > obs$model <- factor("observed")
> > pred <- rename(mydata[, -5], c("predicted" = "value")...
>
> [Message clipped]



R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sun 24 Jun 2007 - 23:38:04 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 26 Jun 2007 - 17:32:29 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.