Re: [R] Subsetting to unique values

From: Beat Bapst <beat.bapst_at_braunvieh.ch>
Date: Mon, 09 Jun 2008 07:57:16 +0200


Dear Paul,

Try :

ddTable[ match(unique(ddTable$Id), ddTable$Id), ]
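For instance, a quick check with the small frame from the original question:

ddTable <- data.frame(Id=c(1,1,2,2), name=c("Paul","Joe","Bob","Larry"))
ddTable[ match(unique(ddTable$Id), ddTable$Id), ]
#   Id name
# 1  1 Paul
# 3  2  Bob

match() returns the position of the first occurrence of each unique Id, so this keeps the first row per Id.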

Regards,

Beat Bapst

-----Original Message-----
From: r-help-bounces_at_r-project.org
[mailto:r-help-bounces_at_r-project.org] On behalf of r-help-request_at_r-project.org
Sent: Saturday, 7 June 2008 12:00
To: r-help_at_r-project.org
Subject: R-help Digest, Vol 64, Issue 7

Send R-help mailing list submissions to

        r-help_at_r-project.org

To subscribe or unsubscribe via the World Wide Web, visit

        https://stat.ethz.ch/mailman/listinfo/r-help
or, via email, send a message with subject or body 'help' to

        r-help-request_at_r-project.org

You can reach the person managing the list at

        r-help-owner_at_r-project.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of R-help digest..."

Today's Topics:

  1. Aggregating data using external aggregation rules (ANGELO.LINARDI_at_bancaditalia.it)
  2. Re: Lattice: key does not accept German umlaute (Dieter Menne)
  3. Re: which question (Dieter Menne)
  4. Re: Y values below the X plot (Jim Lemon)
  5. Re: Lattice: key does not accept German umlaute (Prof Brian Ripley)
  6. Merging two dataframes (Michael Pearmain)
  7. Re: Lattice: key does not accept German umlaute (Bernd Weiss)
  8. boxplot changes fontsize of labels (Sebastian Merz)
  9. simple data question (stephen sefick)
  10. Re: Multiple comment.char under read.table (Daniel Folkinshteyn)
  11. Re: simple data question (Daniel Folkinshteyn)
  12. Re: R: Securities earning covariance (Gabor Grothendieck)
  13. Re: Merging two dataframes (Daniel Folkinshteyn)
  14. Re: Aggregating data using external aggregation rules (Gabor Grothendieck)
  15. request: a class having max frequency (Muhammad Azam)
  16. Re: request: a class having max frequency (Chuck Cleland)
  17. Re: request: a class having max frequency (Michael Conklin)
  18. Re: request: a class having max frequency (Daniel Folkinshteyn)
  19. Re: Problem in executing R on server (Erik Iverson)
  20. Re: request: a class having max frequency (Chuck Cleland)
  21. Manipulating DataSets (Neil Gupta)
  22. Subsetting to unique values (Emslie, Paul [Ctr])
  23. Re: How can I display a characters table ? (Katharine Mullen)
  24. Giovanna Jonalasinio is out of the office, I'm away (Giovanna.Jonalasinio_at_uniroma1.it)
  25. Re: Subsetting to unique values (Chuck Cleland)
  26. Re: simple data question (stephen sefick)
  27. Re: Subsetting to unique values (John Kane)
  28. Re: which question (Eleni Christodoulou)
  29. Re: Subsetting to unique values (Adrian Dusa)
  30. Startup speed for a lengthy script (Dennis Fisher)
  31. Re: Java to R interface (Dumblauskas, Jerry)
  32. Re: which question (Richard Pearson)
  33. Re: Merging two dataframes (Daniel Folkinshteyn)
  34. fit.variogram sgeostat error (Alexys Herleym Rodriguez Avellaneda)
  35. lsmeans (Dani Valverde)
  36. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  37. store filename (DAVID ARTETA GARCIA)
  38. Re: label outliers in geom_boxplot (ggplot2) (hadley wickham)
  39. Re: Improving data processing efficiency (Patrick Burns)
  40. Re: Improving data processing efficiency (Gabor Grothendieck)
  41. Store filename (DAVID ARTETA GARCIA)
  42. where to download BRugs? (Nanye Long)
  43. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  44. Re: Improving data processing efficiency (Gabor Grothendieck)
  45. Re: Improving data processing efficiency (Gabor Grothendieck)
  46. How to force two regression coefficients to be equal but opposite in sign? (Woolner, Keith)
  47. Re: Store filename (Daniel Folkinshteyn)
  48. Re: Store filename (Henrique Dallazuanna)
  49. fit.contrast error (Dani Valverde)
  50. Re: where to download BRugs? (Uwe Ligges)
  51. Re: choosing an appropriate linear model (Levi Waldron)
  52. reorder breaking by half (avilella)
  53. Re: rmeta package: metaplot or forestplot of meta-analysis under DSL (random) model (Thomas Lumley)
  54. Problem with subset (Luca Mortarini)
  55. Re: Manipulating DataSets (Charles C. Berry)
  56. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  57. Re: lsmeans (John Fox)
  58. Re: reorder breaking by half (Daniel Folkinshteyn)
  59. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  60. Re: Improving data processing efficiency (Gabor Grothendieck)
  61. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  62. Re: How to force two regression coefficients to be equal but opposite in sign? (Greg Snow)
  63. Re: Subsetting to unique values (jim holtman)
  64. Re: where to download BRugs? (Prof Brian Ripley)
  65. Re: Problem with subset (Charles C. Berry)
  66. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  67. Re: ggplot questions (Thompson, David (MNR))
  68. Re: Improving data processing efficiency (Patrick Burns)
  69. Re: ggplot questions (hadley wickham)
  70. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  71. Re: boxplot changes fontsize of labels (Prof Brian Ripley)
  72. Re: Improving data processing efficiency (Greg Snow)
  73. Re: Improving data processing efficiency (Gabor Grothendieck)
  74. Random Forest (Bertrand Pub Michel)
  75. mean (Marco Chiapello)
  76. Re: Java to R interface (madhura)
  77. R (D)COM Server not working on windows domain account (Evans_CSHL)
  78. Random Forest and for multivariate response data (Bertrand Pub Michel)
  79. Random Forest (Bertrand Pub Michel)
  80. R + Linux (steven wilson)
  81. Re: Improving data processing efficiency (Greg Snow)
  82. Re: mean (ctu_at_bigred.unl.edu)
  83. Re: Improving data processing efficiency (Patrick Burns)
  84. Plot matrix as many lines (Alberto Monteiro)
  85. Re: mean (Chuck Cleland)
  86. col.names ? (tolga.i.uzuner_at_jpmorgan.com)
  87. Re: ggplot questions (Thompson, David (MNR))
  88. Re: mean (Douglas Bates)
  89. New vocabulary on a Friday afternoon. Was: Improving data processing efficiency (Greg Snow)
  90. Re: R + Linux (Douglas Bates)
  91. editing a data.frame (john.polo)
  92. Re: Plot matrix as many lines (Henrique Dallazuanna)
  93. calling a C function with a struct (John Nolan)
  94. Re: col.names ? (Henrique Dallazuanna)
  95. Re: Plot matrix as many lines (Chuck Cleland)
  96. Re: col.names ? (Chuck Cleland)
  97. Re: col.names ? (William Pepe)
  98. Re: col.names ? (tolga.i.uzuner_at_jpmorgan.com)
  99. Re: Subsetting to unique values (Jorge Ivan Velez)
  100. Re: calling a C function with a struct (Duncan Murdoch)
  101. Re: R + Linux (Kevin E. Thorpe)
  102. Re: R + Linux (Markus Jäntti)
  103. color scale mapped to B/W (Michael Friendly)
  104. Re: Random Forest (Yasir Kaheil)
  105. Re: R + Linux (Roland Rau)
  106. Re: R + Linux (Dirk Eddelbuettel)
  107. Re: R + Linux (Prof Brian Ripley)
  108. Re: R + Linux (Esmail Bonakdarian)
  109. Re: R + Linux (Abhijit Dasgupta)
  110. Re: editing a data.frame (Daniel Folkinshteyn)
  111. Re: R + Linux (Daniel Folkinshteyn)
  112. Re: R + Linux (Jonathan Baron)
  113. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  114. Re: R + Linux (Esmail Bonakdarian)
  115. FW: R + Linux (Horace Tso)
  116. Re: Improving data processing efficiency (Don MacQueen)
  117. Re: Improving data processing efficiency (hadley wickham)
  118. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  119. Re: color scale mapped to B/W (hadley wickham)
  120. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  121. Re: color scale mapped to B/W (Achim Zeileis)
  122. Re: color scale mapped to B/W (Greg Snow)
  123. Re: Improving data processing efficiency (Daniel Folkinshteyn)
  124. Re: color scale mapped to B/W (Greg Snow)
  125. Re: Improving data processing efficiency (Esmail Bonakdarian)
  126. Re: Improving data processing efficiency (Horace Tso)
  127. Re: Improving data processing efficiency (Esmail Bonakdarian)
  128. Problem of installing Matrix (ronggui)
  129. Re: Improving data processing efficiency (Charles C. Berry)
  130. Re: Improving data processing efficiency (hadley wickham)
  131. Re: color scale mapped to B/W (hadley wickham)
  132. Re: editing a data.frame (john.polo)
  133. error message with dat (Paul Adams)
  134. Re: Problem of installing Matrix (Prof Brian Ripley)
  135. Predicting a single observation using LME (Rebecca Sela)
  136. expected risk from coxph (survival) (Reid Tingley)
  137. txt file, 14000+ rows, only last 8000 appear (RobertsLRRI)
  138. functions for high dimensional integral (ZT2008)
  139. compilation failed on MacOSX.5 / icc 10.1 / ifort 10.1 / R 2.7.0 (Mathieu Prevot)
  140. Re: expected risk from coxph (survival) (Dieter Menne)
  141. Re: txt file, 14000+ rows, only last 8000 appear (Paul Smith)
  142. Re: color scale mapped to B/W (Achim Zeileis)
  143. Re: Predicting a single observation using LME (Dieter Menne)
  144. Re: lsmeans (Dieter Menne)
  145. Re: functions for high dimensional integral (Prof Brian Ripley)

Message: 1
Date: Fri, 6 Jun 2008 12:12:36 +0200
From: <ANGELO.LINARDI_at_bancaditalia.it>
Subject: [R] Aggregating data using external aggregation rules
To: <r-help_at_R-project.org>
Message-ID: <C844A6B20A3322429988FDA0E042FFDB01501AEA@SERVPE2.ac.bankit.it>
Content-Type: text/plain; charset="us-ascii"

Dear R experts,

I am currently facing a tricky problem which I have read a lot about in the various R mailing lists without finding exactly what I need. I have a big data frame DF (about 2,000,000 rows) with 7 columns being variables and 1 being a measure (using reshape package nomenclature). There are no "duplicates" in it.
For each of the variables I have some "rules" to apply: COD_IN is the value of the variable in the DF, and COD_OUT the one it is to be transformed to; once the "new codes" are obtained in the DF, I have to aggregate the "new DF" (for example, summing the measure).
Usually the total transformation (merge+aggregate) greatly decreases the number of lines in the data frame, but sometimes it can grow, depending on the rule. Just to give an idea, the first "rule" on v1 maps 820 different values into 7.
Using SQL and a database this can be done in a very straightforward way (for example on the variable v1):

Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
From DF, RULE_v1
Where v1=COD_IN
Group by v2, v3,v4, v5, v6, v7

So the first choice would be using a database; the second one would be splitting the data frame and then joining the results. Is there any other way to do the merge+aggregate that avoids the growth caused by the merge?

Thank you in advance

Angelo Linardi
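A minimal R sketch of the merge+aggregate route, using hypothetical toy versions of DF and RULE_v1 (the real tables are of course much larger):

# toy data: recode v1 via a rule table, then sum the measure
DF <- data.frame(v1 = c("a", "b", "c", "a"),
                 v2 = c("x", "x", "y", "y"),
                 measure = c(1, 2, 3, 4))
RULE_v1 <- data.frame(COD_IN  = c("a", "b", "c"),
                      COD_OUT = c("A", "A", "B"))

# merge() plays the role of the SQL join; aggregate() the GROUP BY + SUM
m <- merge(DF, RULE_v1, by.x = "v1", by.y = "COD_IN")
aggregate(m["measure"], by = list(v1 = m$COD_OUT, v2 = m$v2), FUN = sum)
#   v1 v2 measure
# 1  A  x       3
# 2  A  y       4
# 3  B  y       3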

Message: 2
Date: Fri, 6 Jun 2008 10:18:34 +0000 (UTC)
From: Dieter Menne <dieter.menne_at_menne-biomed.de>
Subject: Re: [R] Lattice: key does not accept German umlaute
To: r-help_at_stat.math.ethz.ch
Message-ID: <loom.20080606T101648-707@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

Bernd Weiss <bernd.weiss <at> uni-koeln.de> writes:

> library(lattice)
> ## gives an error
> xyplot(1~1, key = list(text = list(c("M\344nner"))))
>
> Is this a bug?

You forgot to mention your version, assuming 2.7.0 unpatched.

Corrected by Brian Ripley in developer version (and probably also in patched)

http://finzi.psych.upenn.edu/R/Rhelp02a/archive/129251.html

Dieter


Message: 3
Date: Fri, 6 Jun 2008 10:23:32 +0000 (UTC)
From: Dieter Menne <dieter.menne_at_menne-biomed.de>
Subject: Re: [R] which question
To: r-help_at_stat.math.ethz.ch
Message-ID: <loom.20080606T102224-478@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

Eleni Christodoulou <elenichri <at> gmail.com> writes:

> I was trying to select a column of a data frame using the *which* command. I
> was actually selecting the rows of the data frame using *which, *and then
> displayed a certain column of it. The command that I was using is:
> sequence=*mydata*[*which*(human[,3] %in% genes.sam.names),*9*]
....
Please provide a running example. The *mydata* are difficult to read.

Dieter


Message: 4
Date: Fri, 06 Jun 2008 20:44:31 +1000
From: Jim Lemon <jim_at_bitwrit.com.au>
Subject: Re: [R] Y values below the X plot
To: jpardila <bluejp_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <4849150F.6000501@bitwrit.com.au>
Content-Type: text/plain; charset=us-ascii; format=flowed

jpardila wrote:
> Dear List,
> I am creating a plot and I want to insert the tabular data below the X axis.
> I mean for every value of X I want to show the value in Y as a table below
> the plot. I think the attached image gives an idea of what I mean by this.
>
> Below is the code I am using now... but as you see the Y values don't have
> the right location. Maybe I should insert them as a table? Any ideas on
> that. This should be easy to do but I don't have much experience in R.
> Many thanks in advance,
> JP
>
> http://www.nabble.com/file/p17670311/legend.jpg legend.jpg
> -------------------------
> img1<-c(-5.28191709,-5.364480081,-4.829456677,-5.325101503,-5.212952356,-5.181171896,-5.211122693,-5.153677663,-5.292961077,-5.151612394,-5.056544559,-5.151457115,-5.332984571,-5.325259917,-5.523870109,-5.429800485,-5.436455325)
> img2<-c(-5.55,-5.56,-5.72,-5.57,-5.34,-5.18,-5.18,-5.36,-5.46,-5.32,-5.29,-5.37,-5.42,-5.45,-5.75,-5.75,-5.77)
> angle<-26:42
> plot(img1~angle, type="o", xlab="Incident angle", ylab="sigma",
> ylim=c(-8,-2),lwd=2,col=8, pch=19,cex=1,axes=FALSE)
> lines(img2~angle,lwd=2,type="o", col=1, pch=19,cex=1)
> legend(38,-2,format(img1,digits=2), cex=0.8)
> legend(40,-2,format(img2,digits=2),cex=0.8)
> legend(26, -2, c("Image 1","Image 2"), cex=0.8,lwd=2,col=c("8","1"), pch=19,
> lty=1:2,bty="n")
> abline(h = -1:-8, v = 25:45, col = "lightgray", lty=3)
>
> axis(1, at=2*0:22)
> axis(2, at=-8:-2)
> -----------------------------------
Hi JP,
I thought I could do this with addtable2plot, but I hadn't coded a column spacing into it (maybe next version). However, this is almost what you want, and I'm sure you can work out how to add the lines.

plot(img1~angle, type="o", xlab="Incident angle", ylab="sigma",
  ylim=c(-8,-2), lwd=2, col=8, pch=19, cex=1, axes=FALSE)
box()
lines(img2~angle, lwd=2, type="o", col=1, pch=19, cex=1)
tablerownames <- "Angle\nImage1\nImage2"
mtext(c(tablerownames,
  paste(angle, round(img1,2), round(img2,2), sep="\n")),
  1, line=1, at=c(24.7, angle), cex=0.5)

Jim


Message: 5
Date: Fri, 6 Jun 2008 11:48:56 +0100 (BST)
From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Subject: Re: [R] Lattice: key does not accept German umlaute
To: Bernd Weiss <bernd.weiss_at_uni-koeln.de>
Cc: r-help_at_stat.math.ethz.ch
Message-ID: <alpine.LFD.1.10.0806061139280.14980@gannet.stats.ox.ac.uk>
Content-Type: text/plain; charset="iso-8859-15"; Format="flowed"

Well, you failed to give the 'at a minimum information' asked for in the posting guide, and \344 is locale-specific. I see 'MingW32' below, so will guess this is German-language Windows. We don't know what the error was, either.

It works correctly for me in CP1252 with R-patched, and gives an error in 2.7.0 (and works in 2.6.2). I think it was fixed as side effect of

     o	Rare string width calculations in package grid were not
 	interpreting the string encoding correctly.

although it is not the same problem that NEWS item refers to.

My error message in 2.7.0 was

Error in grid.Call.graphics("L_setviewport", pvp, TRUE) :

   invalid input 'M?nner' in 'utf8towcs'

which is what makes me think this was to do with sizing the viewport.

So please update to R-patched and try again.

On Fri, 6 Jun 2008, Bernd Weiss wrote:

>
> library(lattice)
>
> ## works as expected
> xyplot(1~1, key = list(text = list(c("Maenner"))))
>
> ## works as expected
> xyplot(1~1, key = list(text = list(c("Maenner"))), xlab = "M\344nner")
>
> ## gives an error
> xyplot(1~1, key = list(text = list(c("M\344nner"))))
>
> Is this a bug?
>
> TIA,
>
> Bernd
>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

------------------------------

Message: 6
Date: Fri, 6 Jun 2008 12:30:25 +0100
From: "Michael Pearmain" <mpearmain_at_google.com>
Subject: [R] Merging two dataframes
To: r-help_at_r-project.org
Message-ID: <2763e000806060430k1b16328fw9d5e73e4683a6f13@mail.gmail.com>
Content-Type: text/plain

Hi All,

Newbie question for you all, but I have been looking at the archives and the
help stuff to get a rough idea of what I want to do.

I would like to merge two dataframes together based on a keyed variable in
one dataframe linking to the other dataframe. Only some of the cases will
match, but I would like to keep the others as well.

My dataframes have 67 and 28 cases respectively, and I would like to end up
with one file 67 cases long (all 28 are matched cases).


I can use the merge command to merge the two datasets together, but I still
get some odd results. I'm using the code below:

ETC <- read.csv(file="CSV_Data2.csv",head=TRUE,sep=",")
SURVEY <- read.csv(file="survey.csv",head=TRUE,sep=",")
FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")

The merged file seems to have 1800 cases, while the ETC data file only
has 67 and the SURVEY file only has 28. (Reading the help, it looks as if it
merges each case with all cases in the other file, which is not what I want.)

The matching variables are the 'ord' field and the 'uid' field.
Can anyone advise, please?

--
Michael Pearmain




------------------------------

Message: 7
Date: Fri, 06 Jun 2008 14:22:58 +0200
From: Bernd Weiss <bernd.weiss_at_uni-koeln.de>
Subject: Re: [R] Lattice: key does not accept German umlaute
To: r-help_at_stat.math.ethz.ch
Message-ID: <48492C22.9060102@uni-koeln.de>
Content-Type: text/plain; charset=ISO-8859-15; format=flowed


Prof Brian Ripley wrote:

[...]

| It works correctly for me in CP1252 with R-patched, and gives an error
| in 2.7.0 (and works in 2.6.2).  I think it was fixed as side effect of
|
|     o    Rare string width calculations in package grid were not
|     interpreting the string encoding correctly.
|
| although it is not the same problem that NEWS item refers to.
|
| My error message in 2.7.0 was
|
| Error in grid.Call.graphics("L_setviewport", pvp, TRUE) :
|   invalid input 'M?nner' in 'utf8towcs'
|
| which is what makes me think this was to do with sizing the viewport.
|
|
| So please update to R-patched and try again.


That's it! Thanks for your help.

Bernd



------------------------------

Message: 8
Date: Fri, 6 Jun 2008 14:37:47 +0200
From: Sebastian Merz <sebastian.merz_at_web.de>
Subject: [R] boxplot changes fontsize of labels
To: r-help_at_r-project.org
Message-ID: <20080606143747.5f91ef4d@fred>
Content-Type: text/plain; charset=US-ASCII

Hi all!

So far I have learned some R, but finalizing my plots so they look
publishable seems not to be possible.

I set up some boxplots. Everything works well, but when I put more than
two of them in one plot, the labels of the axes appear smaller than the
normal font size.


> x <- rnorm(30)
> y <- rnorm(30)
> par(mfrow=c(1,4))
> boxplot(x,y, names=c("horray", "hurra"))
> mtext("Jubel", side=1, line=2)
In case I take one or two boxplots this does not happen:
> par(mfrow=c(1,2))
> boxplot(x,y, names=c("horray", "hurra"))
> mtext("Jubel", side=1, line=2)
The cex.axis seems not to be changed, as setting it to 1.0 doesn't change the behaviour. If cex.axis=1.3 in the first example, the font size used by boxplot and by mtext is about the same. But as I use a function to draw quite a few of these plots, this "hack" is not a proper solution.

I couldn't find anything about this behaviour in the documentation or on the net. Can anybody explain? All hints are appreciated.

Thanks,
S. Merz

------------------------------

Message: 9
Date: Fri, 6 Jun 2008 08:43:01 -0400
From: "stephen sefick" <ssefick_at_gmail.com>
Subject: [R] simple data question
To: r-help_at_r-project.org
Message-ID: <c502a9e10806060543s3203756cj7efbe23f6a517bf6@mail.gmail.com>
Content-Type: text/plain

If I wanted to use a name for a column with two words, say Dick Cheney and George Bush, can I put these in quotes ("Dick Cheney" and "George Bush") to get them to read into R, using both read.table and read.zoo, and have the names recognized?

thanks

Stephen

--
Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis

------------------------------

Message: 10
Date: Fri, 06 Jun 2008 08:51:54 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Multiple comment.char under read.table
To: Gundala Viswanath <gundalav_at_gmail.com>
Cc: r-help_at_stat.math.ethz.ch
Message-ID: <484932EA.9040305@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

According to the help file, comment.char only takes one character, so you'll have to do some 'magic' :) I'd suggest first running mydata through sed, replacing one of the comment chars with the other, and then running read.table with the one comment char that remains:

sed -e 's/^\^/!/' mydata.txt > mydata2.txt

Alternatively, you could do read.table twice, once with ! and once with ^, and then pull out all the common rows from the two results.

on 06/06/2008 03:47 AM Gundala Viswanath said the following:
> Hi all,
>
> Suppose I want to read a text file with read.table.
> It contains lines to be skipped that begin with "!" and "^".
>
> Is there a way to include these two values in the read.table function?
> I tried this but it doesn't seem to work.
>
> dat <- read.table("mydata.txt", comment.char = c("!","^") , na.strings
> = "null", sep = "\t");
>
> Please advise.
>
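An in-R alternative to the sed preprocessing, sketched under the assumption of a tab-separated "mydata.txt" like the one described: read the raw lines, drop those starting with either comment character, and hand the rest to read.table.

lines <- readLines("mydata.txt")
# keep lines whose first character is neither "!" nor "^"
keep <- !(substr(lines, 1, 1) %in% c("!", "^"))
dat <- read.table(textConnection(lines[keep]),
                  na.strings = "null", sep = "\t")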
------------------------------

Message: 11
Date: Fri, 06 Jun 2008 09:05:02 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] simple data question
To: stephen sefick <ssefick_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <484935FE.7020706@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Should work - you don't even have to put them in quotes, if your field separator is not a space. Why don't you just try it and see what comes out? :)

on 06/06/2008 08:43 AM stephen sefick said the following:
> if I wanted to use a name for a column with two words say Dick Cheney and
> George Bush
> can I put these in quotes "Dick Cheney" and "George Bush" to get them to
> read into R using both read.table and read.zoo to recognize this.
> thanks
>
> Stephen
>
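One related detail: by default, read.table() converts such headers to syntactically valid names ("Dick Cheney" becomes Dick.Cheney). A small sketch, assuming a hypothetical file "gw.csv" whose header row is "Dick Cheney","George Bush":

df <- read.csv("gw.csv", check.names = FALSE)  # keep the spaces in the names
names(df)             # "Dick Cheney" "George Bush"
df[["Dick Cheney"]]   # names with spaces need [[ ]] or backticks, not plain $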
------------------------------

Message: 12
Date: Fri, 6 Jun 2008 09:05:56 -0400
From: "Gabor Grothendieck" <ggrothendieck_at_gmail.com>
Subject: Re: [R] R: Securities earning covariance
To: ANGELO.LINARDI_at_bancaditalia.it
Cc: r-help_at_r-project.org
Message-ID: <971536df0806060605m43e55dffh712255d835c58e63@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Update your version of zoo to the latest one.

On Fri, Jun 6, 2008 at 3:18 AM, <ANGELO.LINARDI_at_bancaditalia.it> wrote:
> Thank you for your very fast response.
> I just tried to use the zoo package, after having read the vignettes, but I get this error message:
>
> Warning messages:
> 1: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
> 2: In x$EARNINGS :
> $ operator is invalid for atomic vectors, returning NULL
> 3: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
> 4: In x$EARNINGS :
> $ operator is invalid for atomic vectors, returning NULL
> 5: In x$DAY : $ operator is invalid for atomic vectors, returning NULL
> 6: In x$EARNINGS :
> $ operator is invalid for atomic vectors, returning NULL
>
> Am I missing something ?
>
> Thank you again
>
> Angelo Linardi
>
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck_at_gmail.com]
> Sent: Thursday, 5 June 2008 17:55
> To: LINARDI ANGELO
> Cc: r-help_at_r-project.org
> Subject: Re: [R] Securities earning covariance
>
> Check out the three vignettes (i.e. pdf documents in the zoo package). e.g.
>
>
> Lines <- "SEC_ID DAY EARNING
> IT0000001 20070101 5.467
> IT0000001 20070102 5.456
> IT0000001 20070103 4.954
> IT0000001 20070104 3.456
> IT0000002 20070101 1.456
> IT0000002 20070102 1.345
> IT0000002 20070103 1.233
> IT0000003 20070101 0.345
> IT0000003 20070102 0.367
> IT0000003 20070103 0.319
> "
> DF <- read.table(textConnection(Lines), header = TRUE)
> DFs <- split(DF, DF$SEC_ID)
>
> library(zoo)
> f <- function(DF.) zoo(DF.$EARNING, as.Date(format(DF.$DAY), "%Y%m%d"))
> z <- do.call(merge, lapply(DFs, f))
> cov(z) # uses n-1
>
>
> On Thu, Jun 5, 2008 at 11:41 AM, <ANGELO.LINARDI_at_bancaditalia.it> wrote:
>> Good morning,
>>
>> I am a new R user and I am trying to learn how to use it.
>> I am trying to solve this problem.
>> I have a dataframe df of daily securities (for a year) earnings as
>> follows:
>>
>> SEC_ID DAY EARNING
>> IT0000001 20070101 5.467
>> IT0000001 20070102 5.456
>> IT0000001 20070103 4.954
>> IT0000001 20070104 3.456
>> ..........................
>> IT0000002 20070101 1.456
>> IT0000002 20070102 1.345
>> IT0000002 20070103 1.233
>> ..........................
>> IT0000003 20070101 0.345
>> IT0000003 20070102 0.367
>> IT0000003 20070103 0.319
>> ..........................
>>
>> And so on: about 800 different SEC_ID and about 180000 rows.
>> I have to calculate the "covariance" for each couple of securities x
>> and y according to the formula:
>>
>> Cov(x,y) = (sum[(x-x')*(y-y')]/N)/(sx*sy)
>>
>> being x' and y' the mean of securities earning in the year, N the
>> number of observations, sx and sy the standard deviation of x and y.
>> To do this I could build a df2 data frame like this:
>>
>> DAY SEC_ID.x SEC_ID.y EARNING.x
>> EARNING.y x' y' sx sy
>> 20070101 IT0000001 IT0000002 5.467 1.456
>> a b aa bb
>> 20070101 IT0000001 IT0000003 5.467 0.345
>> a c aa cc
>> 20070101 IT0000002 IT0000003 1.456 0.345
>> b c bb cc
>> 20070102 IT0000001 IT0000002 5.456 1.345
>> a b aa bb
>> 20070102 IT0000001 IT0000003 5.456 0.367
>> a c aa cc
>> 20070102 IT0000002 IT0000003 1.345 0.367
>> b c bb cc
>> ........................................................................
>> .......................................................
>>
>> (merging df with itself with a condition SEC_ID.x < SEC_ID.y) and then
>> easily calculate the formula; but the dimensions are too big (the
>> process stops whit an out-of-memory message).
>> Besides partitioning the input and using a loop, are there any smarter
>> solutions (possibly using split and other ways of "subgroup merging")
>> to solve the problem?
>> Are there any "shortcuts" using statistical built-in functions (e.g.
>> cov, vcov) ?
>> Thank you in advance
>>
>> Angelo Linardi
>>
>>
>>
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
------------------------------

Message: 13
Date: Fri, 06 Jun 2008 09:07:22 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Merging two dataframes
To: Michael Pearmain <mpearmain_at_google.com>
Cc: r-help_at_r-project.org
Message-ID: <4849368A.6080903@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Try this:

FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid", all.x = T, all.y = F)

on 06/06/2008 07:30 AM Michael Pearmain said the following:
> Hi All,
>
> Newbie question for you all, but I have been looking at the archives and the
> help stuff to get a rough idea of what I want to do.
>
> I would like to merge two dataframes together based on a keyed variable in
> one dataframe linking to the other dataframe. Only some of the cases will
> match, but I would like to keep the others as well.
>
> My dataframes have 67 and 28 cases respectively, and I would like to end up
> with one file 67 cases long (all 28 are matched cases).
>
>
> I can use the merge command to merge the two datasets together, but I still
> get some odd results. I'm using the code below:
>
> ETC <- read.csv(file="CSV_Data2.csv",head=TRUE,sep=",")
> SURVEY <- read.csv(file="survey.csv",head=TRUE,sep=",")
> FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")
>
> The merged file seems to have 1800 cases, while the ETC data file only
> has 67 and the SURVEY file only has 28. (Reading the help, it looks as if it
> merges each case with all cases in the other file, which is not what I want.)
>
> The matching variables are the 'ord' field and the 'uid' field.
> Can anyone advise, please?
>
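To see why the all.x form keeps the unmatched cases, here is a small self-contained sketch with made-up frames standing in for ETC and SURVEY:

ETC    <- data.frame(ord = 1:4, score = c(10, 20, 30, 40))
SURVEY <- data.frame(uid = c(2, 4), answer = c("yes", "no"))

merge(ETC, SURVEY, by.x = "ord", by.y = "uid", all.x = TRUE)
#   ord score answer
# 1   1    10   <NA>
# 2   2    20    yes
# 3   3    30   <NA>
# 4   4    40     no

Each ETC row appears once, with NA where there is no survey match. In the original call, by.SURVEY and by.ETC are not arguments merge() knows, so they are silently ignored; with no common column names to merge on, merge() returns the Cartesian product, and 67 x 28 = 1876 rows matches the "roughly 1800 cases" observed.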
------------------------------

Message: 14
Date: Fri, 6 Jun 2008 09:10:22 -0400
From: "Gabor Grothendieck" <ggrothendieck_at_gmail.com>
Subject: Re: [R] Aggregating data using external aggregation rules
To: ANGELO.LINARDI_at_bancaditalia.it
Cc: r-help_at_r-project.org
Message-ID: <971536df0806060610m4d80d0fbh2db428a7389a6ef@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Use aggregate() for aggregation and use indexing or subset() for selection. Alternately, try the sqldf package:

http://sqldf.googlecode.com

which allows one to perform SQL operations on data frames.

On Fri, Jun 6, 2008 at 6:12 AM, <ANGELO.LINARDI_at_bancaditalia.it> wrote:
> Dear R experts,
>
> I am currently facing a tricky problem which I have read a lot about in
> the various R mailing lists without finding exactly what I need.
> I have a big data frame DF (about 2,000,000 rows) with 7 columns being
> variables and 1 being a measure (using reshape package nomenclature).
> There are no "duplicates" in it.
> For each of the variables I have some "rules" to apply: COD_IN is the
> value of the variable in the DF, and COD_OUT the one it is to be transformed to;
> once the "new codes" are obtained in the DF, I have to aggregate the "new DF"
> (for example, summing the measure).
> Usually the total transformation (merge+aggregate) greatly decreases the
> number of lines in the data frame, but sometimes it can grow, depending
> on the rule. Just to give an idea, the first "rule" on v1 maps 820
> different values into 7.
> Using SQL and a database this can be done in a very straightforward way
> (for example on the variable v1):
>
> Select COD_OUT, v2, v3, v4, v5, v6, v7, sum(measure)
> From DF, RULE_v1
> Where v1=COD_IN
> Group by v2, v3,v4, v5, v6, v7
>
> So the first choice would be using a database; the second one would be
> splitting the data frame and then joining the results.
> Is there any other way to do the merge+aggregate that avoids the growth caused by the merge?
>
> Thank you in advance
>
> Angelo Linardi
>
>
>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
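For the sqldf route Gabor mentions, a minimal sketch, reusing the hypothetical toy DF and RULE_v1 frames from the note after Message 1:

library(sqldf)
sqldf("SELECT r.COD_OUT AS v1, d.v2, SUM(d.measure) AS measure
       FROM DF d JOIN RULE_v1 r ON d.v1 = r.COD_IN
       GROUP BY r.COD_OUT, d.v2")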
------------------------------

Message: 15
Date: Fri, 6 Jun 2008 06:14:54 -0700 (PDT)
From: Muhammad Azam <mazam72_at_yahoo.com>
Subject: [R] request: a class having max frequency
To: R Help <r-help_at_r-project.org>, R-help request <r-help-request_at_r-project.org>
Message-ID: <302639.59596.qm@web32203.mail.mud.yahoo.com>
Content-Type: text/plain

Dear R users,
I have a very basic question. I tried but could not find the required result. Using

dat <- pima
f <- table(dat[,9])
> f
  0   1
500 268

I want to find the class, here "0", having maximum frequency, i.e. 500. I used
>which.max(f)
which provides

0
1

How can I get only the "0"?

Thanks and best regards,

Muhammad Azam
Ph.D. Student
Department of Medical Statistics, Informatics and Health Economics
University of Innsbruck, Austria

------------------------------

Message: 16
Date: Fri, 06 Jun 2008 09:18:55 -0400
From: Chuck Cleland <ccleland_at_optonline.net>
Subject: Re: [R] request: a class having max frequency
Cc: R Help <r-help_at_r-project.org>
Message-ID: <4849393F.8030900@optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 6/6/2008 9:14 AM, Muhammad Azam wrote:
> Dear R users
> I have a very basic question. I tried but could not find the required result. Using
> dat <- pima
> f <- table(dat[,9])
>
>> f
> 0 1
> 500 268
> I want to find the class, here "0", having maximum frequency, i.e. 500. I used
>> which.max(f)
> which provide
> 0
> 1
> How can I get only the "0"? Thanks and
table(iris$Species)

    setosa versicolor  virginica
        50         50         50

which.max(table(iris$Species))
setosa
     1

names(which.max(table(iris$Species)))
[1] "setosa"
> best regards
>
> Muhammad Azam
> Ph.D. Student
> Department of Medical Statistics,
> Informatics and Health Economics
> University of Innsbruck, Austria
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

------------------------------

Message: 17
Date: Fri, 6 Jun 2008 08:21:41 -0500
From: "Michael Conklin" <michael.conklin_at_markettools.com>
Subject: Re: [R] request: a class having max frequency
To: <r-help_at_r-project.org>, "R-help request" <r-help-request_at_r-project.org>
Message-ID: <8EA061E48306894180DB020B0C6907A1015F37AF@MNMAIL02.markettools.com>
Content-Type: text/plain; charset="US-ASCII"

The 0 is the name of the item and the 1 is the index in f of the maximum class (since f is a table, and the first element of the table is the maximum, which.max returns a 1). So, if you just want to know which class is the maximum, you can say:

names(which.max(f))

Michael Conklin
Chief Methodologist - Advanced Analytics
MarketTools, Inc.
6465 Wayzata Blvd. Suite 170, Minneapolis, MN 55426
Tel: 952.417.4719 | Mobile: 612.201.8978
Michael.Conklin_at_markettools.com
MarketTools(r) http://www.markettools.com

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Muhammad Azam
Sent: Friday, June 06, 2008 8:15 AM
To: R Help; R-help request
Subject: [R] request: a class having max frequency

Dear R users,
I have a very basic question. I tried but could not find the required result. Using

dat <- pima
f <- table(dat[,9])
> f
  0   1
500 268

I want to find the class, here "0", having maximum frequency, i.e. 500. I used
>which.max(f)
which provides

0
1

How can I get only the "0"?

Thanks and best regards,

Muhammad Azam
Ph.D. Student
Department of Medical Statistics, Informatics and Health Economics
University of Innsbruck, Austria

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

------------------------------

Message: 18
Date: Fri, 06 Jun 2008 09:25:44 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] request: a class having max frequency
Cc: R Help <r-help_at_r-project.org>, R-help request <r-help-request_at_r-project.org>
Message-ID: <48493AD8.6050204@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

names(f)[which.max(f)]

on 06/06/2008 09:14 AM Muhammad Azam said the following:
> Dear R users
> I have a very basic question. I tried but could not find the required result. Using
> dat <- pima
> f <- table(dat[,9])
>
>> f
> 0 1
> 500 268
> I want to find the class, here "0", having maximum frequency, i.e. 500. I used
>> which.max(f)
> which provide
> 0
> 1
> How can I get only the "0"? Thanks and
>
>
> best regards
>
> Muhammad Azam
> Ph.D. Student
> Department of Medical Statistics,
> Informatics and Health Economics
> University of Innsbruck, Austria
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
------------------------------

Message: 19
Date: Fri, 06 Jun 2008 08:27:18 -0500
From: Erik Iverson <iverson_at_biostat.wisc.edu>
Subject: Re: [R] Problem in executing R on server
To: Jason Lee <huajie.lee_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <48493B36.1060605@biostat.wisc.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

[[elided Yahoo spam]]

Jason Lee wrote:
> Hi,
>
> I am not too sure its what you meant :-
> Below is the closest data for each session from "top"
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 26792 jason 25 0 283m 199m 2620 R 100 0.6 0:00.38 R
>
> The numbers changed as the processes are running. I am actually sharing
> the server with other few people. I dont think this is a problem.
>
> And, for my own pc,
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 6192 jason 25 0 157m 148m 2888 R 100 14.8 1081:21 R
>
> On Fri, Jun 6, 2008 at 12:46 PM, Erik Iverson <iverson_at_biostat.wisc.edu> wrote:
>
[[elided Yahoo spam]]
>
> Jason Lee wrote:
>
> Hi,
>
> I query free -m,
>
> On my server it is,
>
> total used free shared buffers
> cached
> Mem: 32190 8758 23431 0 742
> 2156
>
> And on my pc,
>
> total used free shared buffers
> cached
> Mem: 1002 986 16 0 132
> 255
>
>
> On the server, the above figure is after I exited the R.
> It seems that there are still alot free MB available if I am not
> wrong.
>
> On Fri, Jun 6, 2008 at 12:29 PM, Erik Iverson
> <iverson_at_biostat.wisc.edu> wrote:
>
> How much RAM is installed in your Sun Solaris server? How
> much RAM
> is installed on your PC?
>
> Jason Lee wrote:
>
> Hi,
>
> I am actually trying to do some matrix multiplications of
> large
> datasets of 3000 columns and 150 rows.
>
> And I am running R version 2.7.0.
>
>
>
> I tried setting R --min-vsize=10M --max-vsize=100M
> --min-nsize=500k --max-nsize=1000M
>
> Yet I still get:-
>
> Error: cannot allocate vector of size 17.7 Mb
>
> I am running on Sun Solaris server.
>
> Please advise.
> Thanks.
> On Fri, Jun 6, 2008 at 11:50 AM, Erik Iverson
> <iverson_at_biostat.wisc.edu> wrote:
>
>
>
> Jason Lee wrote:
>
> Hi R-listers,
>
> I have problem in executing my R on server. It
> returns me
>
> Error: cannot allocate vector of size 15.8 Mb
>
> each time when i execute R on the server. But it
> doesnt
> give me
> any problem
> when i try executing on my own Pc (except it runs
> extremely slow).
>
> Any pointers to this? I tried to read the FAQ on
> this issue
> before in the
> archive but it seems there is no one solution to this.
>
>
> And that is because there is no one cause to this
> issue. I might
> guess your 'server' has less memory than your 'PC',
> but you
> didn't
> say anything about your respective setups, or what you are even
> trying to
> do with R.
>
>
> I tried to
>
> simplified my code but it seems the problem is
> still the
> same.
>
>
>
> Please advise. Thanks.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
>
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
>
>
>
>
------------------------------

Message: 20
Date: Fri, 06 Jun 2008 09:28:38 -0400
From: Chuck Cleland <ccleland_at_optonline.net>
Subject: Re: [R] request: a class having max frequency
Cc: R Help <r-help_at_r-project.org>
Message-ID: <48493B86.3060409@optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 6/6/2008 9:18 AM, Chuck Cleland wrote:
> On 6/6/2008 9:14 AM, Muhammad Azam wrote:
>> Dear R users
>> I have a very basic question. I tried but could not find the
>> required result. using
>> dat <- pima
>> f <- table(dat[,9])
>>
>>> f
>> 0 1 500 268
>> I want to find the class, here "0", having maximum frequency, i.e. 500. I
>> used
>>> which.max(f)
>> which provides 0 1. How can I get only the "0"? Thanks and
>
> table(iris$Species)
>
> setosa versicolor virginica
> 50 50 50
>
> which.max(table(iris$Species))
> setosa
> 1
>
> names(which.max(table(iris$Species)))
> [1] "setosa"
If, as above, more than one category frequency is at the maximum, you might want something like this:

x <- table(iris$Species)

which(x == max(x))
    setosa versicolor  virginica
         1          2          3

names(which(x == max(x)))
[1] "setosa"     "versicolor" "virginica"
>> best regards
>>
>> Muhammad Azam Ph.D. Student Department of Medical Statistics,
>> Informatics and Health Economics University of Innsbruck, Austria
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

------------------------------

Message: 21
Date: Fri, 6 Jun 2008 08:40:18 -0500
From: "Neil Gupta" <neil.gup_at_gmail.com>
Subject: [R] Manipulating DataSets
To: R-help_at_r-project.org
Message-ID: <a51fe2df0806060640j4f1677b4h2e3b332ec0c2fd@mail.gmail.com>
Content-Type: text/plain

Hello R-users,

I have a very simple problem I wanted to solve. I have a large dataset as such:

  Lag X.Symbol     Time TickType ReferenceNumber  Price Size X.Symbol.1   Time.1 TickType.1 ReferenceNumber.1
1  ES 3:ESZ7.GB 08:30:00        B        74390987 151075   44  3:ESZ7.GB 08:30:00          A          74390988
2  ES 3:YMZ7.EC 08:30:00        B        74390993  13686   17  3:YMZ7.EC 08:30:00          A          74390994
3  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
4  YM 3:YMZ7.EC 08:30:00        B        74390998  13686   17  3:YMZ7.EC 08:30:00          A          74390999
5  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
6  YM 3:YMZ7.EC 08:30:00        B        74391000  13686   14  3:YMZ7.EC 08:30:00          A          74391001

  Price.1 Size.1 LeadTime MidPoint Spread
1  151100     22 08:30:00 151087.5     25
2   13688     27 08:30:00  13687.0      2
3  151100     22 08:30:00 151087.5     25
4   13688     27 08:30:00  13687.0      2
5  151100     22 08:30:00 151087.5     25
6   13688     27 08:30:00  13687.0      2

All I wanted to do was take the log(MidPoint[2]) - log(MidPoint[1]) for a symbol "3:ESZ7.GB". So the first one would be log(151087.5) - log(151087.5). I wanted to do this throughout the data set and add that in another column. I would appreciate any help.

Regards,
Neil Gupta
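A sketch of one way to do this, assuming a cut-down toy frame with just the symbol and midpoint columns; ave() applies the log-difference within each symbol while keeping the original row order:

dat <- data.frame(
  Symbol   = c("3:ESZ7.GB", "3:YMZ7.EC", "3:ESZ7.GB", "3:YMZ7.EC"),
  MidPoint = c(151087.5, 13687.0, 151087.5, 13687.0))

# the first observation of each symbol has no previous midpoint, hence NA
logdiff <- function(p) c(NA, diff(log(p)))
dat$LogMidDiff <- ave(dat$MidPoint, dat$Symbol, FUN = logdiff)
dat
#      Symbol MidPoint LogMidDiff
# 1 3:ESZ7.GB 151087.5         NA
# 2 3:YMZ7.EC  13687.0         NA
# 3 3:ESZ7.GB 151087.5          0
# 4 3:YMZ7.EC  13687.0          0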
------------------------------

Message: 22
Date: Fri, 6 Jun 2008 09:35:42 -0400
From: "Emslie, Paul [Ctr]" <emsliep_at_atac.mil>
Subject: [R] Subsetting to unique values
To: <r-help_at_r-project.org>
Message-ID: <9E17510E40D158498789549DD2D5DE7101F3C30D@nex01.atac.mil>
Content-Type: text/plain; charset="us-ascii"

I want to take the first row of each unique Id value from a data frame. For instance

> ddTable <-
data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))

I want a dataset that is

Id Name
1  Paul
2  Bob

> unique(ddTable)

will give me all 4 rows, and

> unique(ddTable$Id)

will give me c(1,2), but not accompanied by the name column.

------------------------------

Message: 23
Date: Fri, 6 Jun 2008 15:58:10 +0200 (CEST)
From: Katharine Mullen <kate_at_few.vu.nl>
Subject: Re: [R] How can I display a characters table ?
To: Maura E Monville <maura.monville_at_gmail.com>
Cc: r-help <r-help_at_stat.math.ethz.ch>
Message-ID: <Pine.GSO.4.56.0806061555001.14989@laurel.few.vu.nl>
Content-Type: TEXT/PLAIN; charset=US-ASCII

Dear Maura,

Try the function textplot from the package gplots; you can say textplot(yourmatrix) and get a plot of a character matrix.

On Fri, 6 Jun 2008, Maura E Monville wrote:
> I would like to generate a graphics text. I have a 67x2 table with
> 5-character strings in col 1 and 2-character strings in col 2.
> Is it possible to make such a table appear on a graphics or a
> message-box pop-up window?
>
> Thank you so much.
> --
> Maura E.M
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
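A minimal sketch of that suggestion, with a made-up 2x2 character matrix standing in for the real 67x2 table:

library(gplots)   # provides textplot()
m <- cbind(col1 = c("ABCDE", "FGHIJ"),
           col2 = c("01", "02"))
textplot(m)       # draws the character matrix on the current graphics device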
------------------------------

Message: 24
Date: Fri, 6 Jun 2008 16:01:23 +0200
From: Giovanna.Jonalasinio_at_uniroma1.it
Subject: [R] Giovanna Jonalasinio is out of the office, I'm away
To: r-help_at_r-project.org
Message-ID: <OF43D77A5D.F29EAA64-ONC1257460.004D0831-C1257460.004D0831@Uniroma1.it>
Content-Type: text/plain

Automatic reply from 06/06/08 until 14/06/08:

I'm going to have limited access to my email until the 14th of June 2008.

------------------------------

Message: 25
Date: Fri, 06 Jun 2008 10:08:34 -0400
From: Chuck Cleland <ccleland_at_optonline.net>
Subject: Re: [R] Subsetting to unique values
To: "Emslie, Paul [Ctr]" <emsliep_at_atac.mil>
Cc: r-help_at_r-project.org
Message-ID: <484944E2.1090600@optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 6/6/2008 9:35 AM, Emslie, Paul [Ctr] wrote:
> I want to take the first row of each unique ID value from a data frame.
> For instance
>> ddTable <-
> data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
>> unique(ddTable)
> Will give me all 4 rows, and
>> unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name column.
ddTable <- data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))

!duplicated(ddTable$Id)
[1]  TRUE FALSE  TRUE FALSE

ddTable[!duplicated(ddTable$Id),]
  Id name
1  1 Paul
3  2  Bob

?duplicated
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

------------------------------

Message: 26
Date: Fri, 6 Jun 2008 10:13:55 -0400
From: "stephen sefick" <ssefick_at_gmail.com>
Subject: Re: [R] simple data question
To: "Daniel Folkinshteyn" <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <c502a9e10806060713w199e5511xedebb5d3b1cad7bc@mail.gmail.com>
Content-Type: text/plain

Good point. Thanks.

On Fri, Jun 6, 2008 at 9:05 AM, Daniel Folkinshteyn <dfolkins_at_gmail.com> wrote:
> should work - don't even have to put them in quotes, if your field
> separator is not space. why don't you just try it and see what comes out? :)
>
> on 06/06/2008 08:43 AM stephen sefick said the following:
>
> if I wanted to use a name for a column with two words say Dick Cheney and
>> George Bush
>> can I put these in quotes "Dick Cheney" and "George Bush" to get them to
>> read into R using both read.table and read.zoo to recognize this.
>> thanks
>>
>> Stephen
>>
>>
--
Let's not spend our time and resources thinking about things that are so little or so large that all they really do for us is puff us up and make us feel like gods. We are mammals, and have not exhausted the annoying little problems of being mammals. -K. Mullis

------------------------------

Message: 27
Date: Fri, 6 Jun 2008 07:22:23 -0700 (PDT)
From: John Kane <jrkrideau_at_yahoo.ca>
Subject: Re: [R] Subsetting to unique values
To: r-help_at_r-project.org, "Emslie, Paul [Ctr]" <emsliep_at_atac.mil>
Message-ID: <217081.40430.qm@web32807.mail.mud.yahoo.com>
Content-Type: text/plain; charset=us-ascii

I don't have R on this machine, but will this work?

myrows <- unique(ddTable[,1])
unis <- ddTable[myrows, ]

--- On Fri, 6/6/08, Emslie, Paul [Ctr] <emsliep_at_atac.mil> wrote:
> From: Emslie, Paul [Ctr] <emsliep@atac.mil>
> Subject: [R] Subsetting to unique values
> To: r-help_at_r-project.org
> Received: Friday, June 6, 2008, 9:35 AM
> I want to take the first row of each unique ID value from a
> data frame.
> For instance
> > ddTable <-
> data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
> > unique(ddTable)
> Will give me all 4 rows, and
> > unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name
> column.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.
------------------------------

Message: 28
Date: Fri, 6 Jun 2008 17:29:26 +0300
From: "Eleni Christodoulou" <elenichri_at_gmail.com>
Subject: Re: [R] which question
To: "Dieter Menne" <dieter.menne_at_menne-biomed.de>
Cc: r-help_at_stat.math.ethz.ch
Message-ID: <2293b7660806060729t2d0cf78mefae840b427807f5@mail.gmail.com>
Content-Type: text/plain

An example is:

symbol=human[which(human[,3] %in% genes.sam.names),8]

The data human and genes.sam.names are attached. The result of the above command is:
> symbol
 [1] CCL18                  MARCO                  SYT13
 [4] FOXC1                  CDH3
 [7] CA12                   CELSR1                 NM_018440
[10] MICROTUBULE-ASSOCIATED NM_015529              ESR1
[13] PHGDH                  GABRP                  LGMN
[16] MMP9                   BMP7                   KLF5
[19] RIPK2                  GATA3                  NM_032023
[22] TRIM2                  CCND1                  MMP12
[25] LDHB                   AF493978               SOD2
[28] SOD2                   SOD2                   NME5
[31] STC2                   RBP1                   ROPN1
[34] RDH10                  KRTHB1                 SLPI
[37] BBOX1                  FOXA1                  NM_005669
[40] MCCC2                  CHI3L1                 GSTM3
[43] LPIN1                  DSC2                   FADS2
[46] ELF5                   CYP1B1                 LMO4
[49] AL035297               NM_152398              AB018342
[52] PIK3R1                 NFKBIE                 MLZE
[55] NFIB                   NM_052997              NM_006023
[58] CPB1                   CXCL13                 CBR3
[61] NM_017527              FABP7                  DACH
[64] IFI27                  ACOX2                  CXCL11
[67] UGP2                   CLDN4                  M12740
[70] IGKC                   IGKC                   CLECSF12
[73] AY069977               HOXB2                  SOX11
[76] NM_017422              TLR2
[79] CKS1B                  BC017946               APOBEC3B
[82] HLA-DRB1               HLA-DQB1
[85] CCL13                  C4orf7
[88] NM_173552
21345 Levels: (2 (32 (55.11 (AIB-1) (ALU (CAK1) (CAP4) (CASPASE ... ZYX

As you can see, apart from the gene symbols, which are the required thing, RefSeq IDs are also retrieved...

Thanks a lot,
Eleni

On Fri, Jun 6, 2008 at 1:23 PM, Dieter Menne <dieter.menne_at_menne-biomed.de> wrote:
> Eleni Christodoulou <elenichri <at> gmail.com> writes:
>
> > I was trying to select a column of a data frame using the *which*
> command. I
> > was actually selecting the rows of the data frame using *which, *and then
> > displayed a certain column of it. The command that I was using is:
> > sequence=*mydata*[*which*(human[,3] %in% genes.sam.names),*9*]
> ....
> Please provide a running example. The *mydata* are difficult to read.
>
>
> Dieter
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
[[alternative HTML version deleted]]

------------------------------

Message: 29
Date: Fri, 6 Jun 2008 14:38:50 +0000 (UTC)
From: Adrian Dusa <dusa.adrian_at_gmail.com>
Subject: Re: [R] Subsetting to unique values
To: r-help_at_stat.math.ethz.ch
Message-ID: <loom.20080606T143745-451@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

Emslie, Paul [Ctr] <emsliep <at> atac.mil> writes:
>
> I want to take the first row of each unique ID value from a data frame.
> For instance
> > ddTable <-
> data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
> > unique(ddTable)
> Will give me all 4 rows, and
> > unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name column.
ddTable[-which(duplicated(ddTable$Id)), ]

HTH,
Adrian

------------------------------

Message: 30
Date: Fri, 6 Jun 2008 07:39:32 -0700
From: Dennis Fisher <fisher_at_plessthan.com>
Subject: [R] Startup speed for a lengthy script
To: r-help_at_stat.math.ethz.ch
Message-ID: <808B5F10-1D7A-4A51-9BC2-548FA9391DEC@plessthan.com>
Content-Type: text/plain

Colleagues,

Several days ago, I wrote to the list about a lengthy delay in the startup of a script. I will start with a brief summary of that email. I have a 10,000-line script of which the final 3000 lines constitute a function. The script contains time-markers (cat(date())) so that I can determine how fast it was read. When I invoke the script from the OS ("R --slave < Script.R"; similar performance with R 2.6.1 or 2.7.0 on a Mac / Linux / Windows), the first 7000 lines were read in 5 seconds; it then took 2 minutes to read the remaining 3000 lines. I inquired as to the cause for the lengthy reading of the final 3000 lines.

Subsequently, I whittled the 3000 lines down to ~ 1000 (moving 2000 lines to smaller functions). Now the first 9000 lines still read in ~ 6 seconds, and the final 1000 lines in ~ 15 seconds. Better, but not ideal.

However, I just encountered a new situation that I don't understand. The R code is now embedded in a graphical interface built with Real Basic. When I invoke the script in that environment, the first 9000 lines take the usual 6 seconds. But, to my surprise, the final 1000 [[elided Yahoo spam]]

There is one major difference in the implementation. With the GUI, the commands are "pushed", i.e., the GUI opens R, then sends a continuous stream of code. Does anyone have any idea as to why the delay should be so different in the two settings?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-415-564-2220
www.PLessThan.com

[[alternative HTML version deleted]]

------------------------------

Message: 31
Date: Fri, 6 Jun 2008 09:37:59 -0500
From: "Dumblauskas, Jerry" <jerry.dumblauskas_at_credit-suisse.com>
Subject: Re: [R] Java to R interface
To: r-help_at_r-project.org
Message-ID: <6BEE6042FD73A54BB6E88969828C6B06014622CA@ECHI17P30001A.csfb.cs-group.com>
Content-Type: text/plain

Try and make sure that R is in your Windows Path variable.

I got your message when I first did this, but when I did the above it then worked...

==============================================================================
Please access the attached hyperlink for an important electronic communications disclaimer:

http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
==============================================================================

[[alternative HTML version deleted]]

------------------------------

Message: 32
Date: Fri, 06 Jun 2008 15:41:49 +0100
From: Richard Pearson <richard.pearson_at_postgrad.manchester.ac.uk>
Subject: Re: [R] which question
To: Eleni Christodoulou <elenichri_at_gmail.com>
Cc: Dieter Menne <dieter.menne_at_menne-biomed.de>, r-help_at_stat.math.ethz.ch
Message-ID: <48494CAD.9010204@postgrad.manchester.ac.uk>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I didn't get any attached data, but my suspicion here is that you have somehow got RefSeq IDs in column 8 of human, as well as the gene symbols. Did you read this data in from a text file?

Eleni Christodoulou wrote:
> An example is:
>
> symbol=human[which(human[,3] %in% genes.sam.names),8]
>
> The data* human* and *genes.sam.names* are attached. The result of the above
> command is:
>> symbol
> [1] CCL18 MARCO SYT13
> [4] FOXC1 CDH3
> [7] CA12 CELSR1 NM_018440
> [10] MICROTUBULE-ASSOCIATED NM_015529 ESR1
> [13] PHGDH GABRP LGMN
> [16] MMP9 BMP7 KLF5
> [19] RIPK2 GATA3 NM_032023
> [22] TRIM2 CCND1 MMP12
> [25] LDHB AF493978 SOD2
> [28] SOD2 SOD2 NME5
> [31] STC2 RBP1 ROPN1
> [34] RDH10 KRTHB1 SLPI
> [37] BBOX1 FOXA1 NM_005669
> [40] MCCC2 CHI3L1 GSTM3
> [43] LPIN1 DSC2 FADS2
> [46] ELF5 CYP1B1 LMO4
> [49] AL035297 NM_152398 AB018342
> [52] PIK3R1 NFKBIE MLZE
> [55] NFIB NM_052997 NM_006023
> [58] CPB1 CXCL13 CBR3
> [61] NM_017527 FABP7 DACH
> [64] IFI27 ACOX2 CXCL11
> [67] UGP2 CLDN4 M12740
> [70] IGKC IGKC CLECSF12
> [73] AY069977 HOXB2 SOX11
> [76] NM_017422 TLR2
> [79] CKS1B BC017946 APOBEC3B
> [82] HLA-DRB1 HLA-DQB1
> [85] CCL13 C4orf7
> [88] NM_173552
> 21345 Levels: (2 (32 (55.11 (AIB-1) (ALU (CAK1) (CAP4) (CASPASE ... ZYX
>
> As you can see, apart from gene symbols, which is the required thing, RefSeq
> ID sare also retrieved...
>
> Thanks a lot,
> Eleni
>
>
>
>
>
>
> On Fri, Jun 6, 2008 at 1:23 PM, Dieter Menne <dieter.menne_at_menne-biomed.de>
> wrote:
>
>> Eleni Christodoulou <elenichri <at> gmail.com> writes:
>>
>>> I was trying to select a column of a data frame using the *which*
>> command. I
>>> was actually selecting the rows of the data frame using *which, *and then
>>> displayed a certain column of it. The command that I was using is:
>>> sequence=*mydata*[*which*(human[,3] %in% genes.sam.names),*9*]
>> ....
>> Please provide a running example. The *mydata* are difficult to read.
>>
>>
>> Dieter
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Richard D. Pearson            richard.pearson_at_postgrad.manchester.ac.uk
School of Computer Science,   http://www.cs.man.ac.uk/~pearsonr
University of Manchester,     Tel: +44 161 275 6178
Oxford Road,                  Mob: +44 7971 221181
Manchester M13 9PL, UK.       Fax: +44 161 275 6204
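[Editor's note: a hedged sketch of how one might check Richard's suspicion; the object names (human, genes.sam.names) are taken from the thread, and the data are not available here. The "21345 Levels:" line in the output above is a separate artifact: column 8 is a factor, and subsetting keeps all of the original levels.

symbol <- human[human[, 3] %in% genes.sam.names, 8]
symbol <- factor(symbol)  # re-factor to drop the unused levels
# count entries that look like RefSeq IDs rather than gene symbols
sum(substr(as.character(symbol), 1, 3) == "NM_")

If the count is nonzero, the RefSeq IDs really are in column 8 of the data, as Richard suggests, and no amount of subsetting will separate them.]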
------------------------------

Message: 33
Date: Fri, 06 Jun 2008 10:55:53 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Merging two dataframes
To: Michael Pearmain <mpearmain_at_google.com>, R Help <r-help_at_r-project.org>
Message-ID: <48494FF9.6010900@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Cool. :) Yeah, the argument names are by.x and by.y, so your by.SURVEY and by.ETC were ignored in the black hole of "arguments passed to other methods".

on 06/06/2008 09:11 AM Michael Pearmain said the following:
> Thanks
> Works perfectly.
> Was the problem due to me putting by.survey and by.etc rather than by.y
> and by.x?
>
> I think when i was playing around i tried the all. command in that setup
> as well
>
> Mike
>
>
>
> On Fri, Jun 6, 2008 at 2:07 PM, Daniel Folkinshteyn <dfolkins_at_gmail.com
> <mailto:dfolkins_at_gmail.com>> wrote:
>
> try this:
> FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid", all.x =
> T, all.y = F)
>
> on 06/06/2008 07:30 AM Michael Pearmain said the following:
>
> Hi All,
>
> Newbie question for you all but i have been looking at the
> archieves and the
> help dtuff to get a rough idea of what i want to do
>
> I would like to merge two dataframes together based on a keyed
> variable in
> one dataframe linking to the other dataframe. Only some of the
> cases will
> match but i would like to keep the others as well.
>
> My dataframes have 67 and 28 cases respectively and i would like
> ot end uip
> with one file 67 cases long (all 28 are matched cases).
>
>
> I can use the merge command to merge two datasets together this
> but i still
> get some
> odd results, i'm using the code below;
>
> ETC <- read.csv(file="CSV_Data2.csv",head=TRUE,sep=",")
> 'SURVEY <- read.csv(file="survey.csv",head=TRUE,sep=",")
> 'FullData <- merge(ETC, SURVEY, by.SURVEY = "uid", by.ETC = "ord")
>
> The merged file seems to have 1800 cases while the ETC data file
> only
> has 67 and the SURVEY file only has 28. (Reading the help it
> looks as if it
> merges 1 case with all cases in the other file, which is not
> what i want)
>
> The matching variables fields are the 'ord' field and the 'uid'
> field
> Can anyone advise please?
>
>
>
>
> --
> Michael Pearmain
> Senior Statistical Analyst
>
>
> 1st Floor, 180 Great Portland St. London W1W 5QZ
> t +44 (0) 2032191684
> mpearmain_at_google.com <mailto:mpearmain_at_google.com>
> mpearmain_at_doubleclick.com <mailto:mpearmain_at_doubleclick.com>
>
>
> Doubleclick is a part of the Google group of companies
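[Editor's note: for the archive, a hedged sketch of the corrected call discussed in this thread, using the column names from Michael's post. With the misspelled arguments, merge() apparently found no by columns and fell back to something close to the Cartesian product of the two frames (67 x 28 = 1876 rows), matching the ~1800 cases reported.

FullData <- merge(ETC, SURVEY, by.x = "ord", by.y = "uid",
                  all.x = TRUE, all.y = FALSE)
nrow(FullData)  # should now be 67: all ETC rows, matched where possible

all.x = TRUE keeps the unmatched ETC cases; all.y = FALSE drops any SURVEY rows with no partner.]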
------------------------------

Message: 34
Date: Fri, 6 Jun 2008 10:04:33 -0500
From: "Alexys Herleym Rodriguez Avellaneda" <alexyshr_at_gmail.com>
Subject: [R] fit.variogram sgeostat error
To: r-help_at_r-project.org
Message-ID: <ae405c330806060804t787392bewcd1b7e71da69ea0e@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

When I run the next line, it works fine:

fit.spherical(var, 0, 2.6, 250, type='c', iterations=10, tolerance=1e-06, echo=FALSE, plot.it=T, weighted=TRUE, delta=0.1, verbose=TRUE)

But when I use the next one, it gives an error:

fit.variogram("spherical", var, nugget=0, sill=2.6, range=250, plot.it=TRUE, iterations=0)

This is the error:

Error in fit.variogram("spherical", var, nugget = 0, sill = 2.6, range = 250, :
  unused argument(s) (nugget = 0, sill = 2.6, range = 250, plot.it = TRUE, iterations = 0)

Any suggestions?

Alexys H

------------------------------

Message: 35
Date: Fri, 06 Jun 2008 17:05:58 +0200
From: Dani Valverde <daniel.valverde_at_uab.cat>
Subject: [R] lsmeans
To: R Help <r-help_at_r-project.org>
Message-ID: <48495256.20804@uab.cat>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello,

I have the following function call:

lme(fixed = Error ~ Temperature * Tumour, random = ~1|ID, data = error_DB)

which returns an lme object. I am interested in carrying out some kind of lsmeans on the data returned, but I cannot find any function to do this in R. I have seen the effect() function, but it does not work with lme objects. Any idea?

Best,

Dani

--
Daniel Valverde Saubí

Grup de Biologia Molecular de Llevats
Facultat de Veterinària de la Universitat Autònoma de Barcelona
Edifici V, Campus UAB
08193 Cerdanyola del Vallès - SPAIN

Centro de Investigación Biomédica en Red
en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN)

Grup d'Aplicacions Biomèdiques de la RMN
Facultat de Biociències
Universitat Autònoma de Barcelona
Edifici Cs, Campus UAB
08193 Cerdanyola del Vallès - SPAIN
+34 93 5814126
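[Editor's note: a hedged sketch of one workaround, assuming Temperature and Tumour are factors; predict.lme with level = 0 gives population-level predictions, which over a balanced reference grid play the role of least-squares means. The object names follow Dani's post.

library(nlme)
fit  <- lme(fixed = Error ~ Temperature * Tumour, random = ~1|ID, data = error_DB)
grid <- expand.grid(Temperature = levels(error_DB$Temperature),
                    Tumour      = levels(error_DB$Tumour))
grid$lsmean <- predict(fit, newdata = grid, level = 0)
grid

This gives the point estimates only; standard errors would need the fixed-effects covariance matrix and the corresponding design matrix.]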
------------------------------

Message: 36
Date: Fri, 06 Jun 2008 11:12:32 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: r-help_at_r-project.org
Message-ID: <484953E0.70508@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Anybody have any thoughts on this? Please? :)

on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
> Hi everyone!
>
> I have a question about data processing efficiency.
>
> My data are as follows: I have a data set on quarterly institutional
> ownership of equities; some of them have had recent IPOs, some have not
> (I have a binary flag set). The total dataset size is 700k+ rows.
>
> My goal is this: For every quarter since issue for each IPO, I need to
> find a "matched" firm in the same industry, and close in market cap. So,
> e.g., for firm X, which had an IPO, i need to find a matched non-issuing
> firm in quarter 1 since IPO, then a (possibly different) non-issuing
> firm in quarter 2 since IPO, etc. Repeat for each issuing firm (there
> are about 8300 of these).
>
> Thus it seems to me that I need to be doing a lot of data selection and
> subsetting, and looping (yikes!), but the result appears to be highly
> inefficient and takes ages (well, many hours). What I am doing, in
> pseudocode, is this:
>
> 1. for each quarter of data, getting out all the IPOs and all the
> eligible non-issuing firms.
> 2. for each IPO in a quarter, grab all the non-issuers in the same
> industry, sort them by size, and finally grab a matching firm closest in
> size (the exact procedure is to grab the closest bigger firm if one
> exists, and just the biggest available if all are smaller)
> 3. assign the matched firm-observation the same "quarters since issue"
> as the IPO being matched
> 4. rbind them all into the "matching" dataset.
>
> The function I currently have is pasted below, for your reference. Is
> there any way to make it produce the same result but much faster?
> Specifically, I am guessing eliminating some loops would be very good,
> but I don't see how, since I need to do some fancy footwork for each IPO
> in each quarter to find the matching firm. I'll be doing a few things
> similar to this, so it's somewhat important to up the efficiency of
> this. Maybe some of you R-fu masters can clue me in? :)
>
> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>
> ========== my function below ===========
>
> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
> quarters_since_issue=40) {
>
> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
> cheaper, so typecast the result to matrix
>
> colnames = names(tfdata)
>
> quarterends = sort(unique(tfdata$DATE))
>
> for (aquarter in quarterends) {
> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>
> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
> (tfdata_quarter$IPO.Flag == 0), ]
> tfdata_quarter_ipoissuers = tfdata_quarter[
> tfdata_quarter$IPO.Flag == 1, ]
>
> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
> arow = tfdata_quarter_ipoissuers[i,]
> industrypeers = tfdata_quarter_fitting_nonissuers[
> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
> industrypeers = industrypeers[
> order(industrypeers$Market.Cap.13f), ]
> if ( nrow(industrypeers) > 0 ) {
> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
> arow$Market.Cap.13f, ]) > 0 ) {
> bestpeer =
> industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
> }
> else {
> bestpeer = industrypeers[nrow(industrypeers),]
> }
> bestpeer$Quarters.Since.IPO.Issue =
> arow$Quarters.Since.IPO.Issue
>
> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
> bestpeer$PERMNO] = 1
> result = rbind(result, as.matrix(bestpeer))
> }
> }
> #result = rbind(result, tfdata_quarter)
> print (aquarter)
> }
>
> result = as.data.frame(result)
> names(result) = colnames
> return(result)
>
> }
>
> ========= end of my function =============
>
------------------------------

Message: 37
Date: Fri, 6 Jun 2008 17:36:18 +0200
From: DAVID ARTETA GARCIA <darteta001_at_ikasle.ehu.es>
Subject: [R] store filename
To: "r-help_at_r-project.org" <r-help_at_r-project.org>
Message-ID: <20080606173618.dcn20wksgwsosoo8@www.ehu.es>
Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed"

------------------------------

Message: 38
Date: Fri, 6 Jun 2008 10:36:30 -0500
From: "hadley wickham" <h.wickham_at_gmail.com>
Subject: Re: [R] label outliers in geom_boxplot (ggplot2)
To: "Mihalicza Péter" <mihalicza.peter_at_eski.hu>
Cc: r-help_at_r-project.org
Message-ID: <f8e6ff050806060836i1de4ecfk4aab39e5bed596af@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
> It's too obvious, so I am positive that there is a good reason for not doing
> this, but still:
> why is it not possible, to have an "outlier" output in stat_boxplot that can
> be used at geom_text()?
>
> Something like this, with "upper":
> > dat=data.frame(num=rep(1,20), val=c(runif(18),3,3.5),
> name=letters[1:20])
> > ggplot(dat, aes(y=val, x=num))+stat_boxplot(outlier.size=4,
> + outlier.colour="green")+geom_text(aes(y=..upper..), label="This is upper
> hinge")
>
> Unfortunately, this does not work and gives the error message:
> Error in eval(expr, envir, enclos) : object "upper" not found
>
> Is it because you can only use stat outputs within the stat statements?
> Could it be possible to make them available outside the statements too?
You can generally, but it won't work here. The problem is that you want a different y aesthetic for the statistic (val) than you do for the geom (upper) and there's no way to get around that with the current design of ggplot2.

Hadley

--
http://had.co.nz/

------------------------------

Message: 39
Date: Fri, 06 Jun 2008 16:44:26 +0100
From: Patrick Burns <pburns_at_pburns.seanet.com>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <48495B5A.3040005@pburns.seanet.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

One thing that is likely to speed the code significantly is if you create 'result' to be its final size and then subscript into it. Something like:

result[i, ] <- bestpeer

(though I'm not sure if 'i' is the proper index).

Patrick Burns
patrick_at_burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

Daniel Folkinshteyn wrote:
> Anybody have any thoughts on this? Please? :)
>
> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>> Hi everyone!
>>
>> I have a question about data processing efficiency.
>>
>> My data are as follows: I have a data set on quarterly institutional
>> ownership of equities; some of them have had recent IPOs, some have
>> not (I have a binary flag set). The total dataset size is 700k+ rows.
>>
>> My goal is this: For every quarter since issue for each IPO, I need
>> to find a "matched" firm in the same industry, and close in market
>> cap. So, e.g., for firm X, which had an IPO, i need to find a matched
>> non-issuing firm in quarter 1 since IPO, then a (possibly different)
>> non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing
>> firm (there are about 8300 of these).
>>
>> Thus it seems to me that I need to be doing a lot of data selection
>> and subsetting, and looping (yikes!), but the result appears to be
>> highly inefficient and takes ages (well, many hours). What I am
>> doing, in pseudocode, is this:
>>
>> 1. for each quarter of data, getting out all the IPOs and all the
>> eligible non-issuing firms.
>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>> industry, sort them by size, and finally grab a matching firm closest
>> in size (the exact procedure is to grab the closest bigger firm if
>> one exists, and just the biggest available if all are smaller)
>> 3. assign the matched firm-observation the same "quarters since
>> issue" as the IPO being matched
>> 4. rbind them all into the "matching" dataset.
>>
>> The function I currently have is pasted below, for your reference. Is
>> there any way to make it produce the same result but much faster?
>> Specifically, I am guessing eliminating some loops would be very
>> good, but I don't see how, since I need to do some fancy footwork for
>> each IPO in each quarter to find the matching firm. I'll be doing a
>> few things similar to this, so it's somewhat important to up the
>> efficiency of this. Maybe some of you R-fu masters can clue me in? :)
>>
>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>
>> ========== my function below ===========
>>
>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>> quarters_since_issue=40) {
>>
>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>> cheaper, so typecast the result to matrix
>>
>> colnames = names(tfdata)
>>
>> quarterends = sort(unique(tfdata$DATE))
>>
>> for (aquarter in quarterends) {
>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>
>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>> (tfdata_quarter$IPO.Flag == 0), ]
>> tfdata_quarter_ipoissuers = tfdata_quarter[
>> tfdata_quarter$IPO.Flag == 1, ]
>>
>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>> arow = tfdata_quarter_ipoissuers[i,]
>> industrypeers = tfdata_quarter_fitting_nonissuers[
>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>> industrypeers = industrypeers[
>> order(industrypeers$Market.Cap.13f), ]
>> if ( nrow(industrypeers) > 0 ) {
>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f
>> >= arow$Market.Cap.13f, ]) > 0 ) {
>> bestpeer =
>> industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
>> }
>> else {
>> bestpeer = industrypeers[nrow(industrypeers),]
>> }
>> bestpeer$Quarters.Since.IPO.Issue =
>> arow$Quarters.Since.IPO.Issue
>>
>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>> bestpeer$PERMNO] = 1
>> result = rbind(result, as.matrix(bestpeer))
>> }
>> }
>> #result = rbind(result, tfdata_quarter)
>> print (aquarter)
>> }
>>
>> result = as.data.frame(result)
>> names(result) = colnames
>> return(result)
>>
>> }
>>
>> ========= end of my function =============
>>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
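[Editor's note: a hedged sketch of Patrick's preallocation suggestion applied to the loop above; names are from the posted function, and the bound on the number of matches is an assumption (at most one match per IPO row).

n <- sum(tfdata$IPO.Flag == 1)                 # upper bound on matched rows
result <- matrix(NA, nrow = n, ncol = ncol(tfdata))
k <- 0
# ... inside the existing loops, instead of result = rbind(result, ...):
#         k <- k + 1
#         result[k, ] <- as.matrix(bestpeer)
# ... after the loops, keep only the filled rows:
result <- as.data.frame(result[seq_len(k), , drop = FALSE])

Growing a matrix with rbind() inside a loop copies the whole object on every iteration, which is quadratic in the number of matches; filling a preallocated matrix is linear.]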
------------------------------

Message: 40
Date: Fri, 6 Jun 2008 11:45:44 -0400
From: "Gabor Grothendieck" <ggrothendieck_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <971536df0806060845v66784addo64cf0d3a1bfee377@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Try reading the posting guide before posting.

On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <dfolkins_at_gmail.com> wrote:
> Anybody have any thoughts on this? Please? :)
>
> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>
>> Hi everyone!
>>
>> I have a question about data processing efficiency.
>>
>> My data are as follows: I have a data set on quarterly institutional
>> ownership of equities; some of them have had recent IPOs, some have not (I
>> have a binary flag set). The total dataset size is 700k+ rows.
>>
>> My goal is this: For every quarter since issue for each IPO, I need to
>> find a "matched" firm in the same industry, and close in market cap. So,
>> e.g., for firm X, which had an IPO, i need to find a matched non-issuing
>> firm in quarter 1 since IPO, then a (possibly different) non-issuing firm in
>> quarter 2 since IPO, etc. Repeat for each issuing firm (there are about 8300
>> of these).
>>
>> Thus it seems to me that I need to be doing a lot of data selection and
>> subsetting, and looping (yikes!), but the result appears to be highly
>> inefficient and takes ages (well, many hours). What I am doing, in
>> pseudocode, is this:
>>
>> 1. for each quarter of data, getting out all the IPOs and all the eligible
>> non-issuing firms.
>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>> industry, sort them by size, and finally grab a matching firm closest in
>> size (the exact procedure is to grab the closest bigger firm if one exists,
>> and just the biggest available if all are smaller)
>> 3. assign the matched firm-observation the same "quarters since issue" as
>> the IPO being matched
>> 4. rbind them all into the "matching" dataset.
>>
>> The function I currently have is pasted below, for your reference. Is
>> there any way to make it produce the same result but much faster?
>> Specifically, I am guessing eliminating some loops would be very good, but I
>> don't see how, since I need to do some fancy footwork for each IPO in each
>> quarter to find the matching firm. I'll be doing a few things similar to
>> this, so it's somewhat important to up the efficiency of this. Maybe some of
>> you R-fu masters can clue me in? :)
>>
>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>
>> ========== my function below ===========
>>
>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>> quarters_since_issue=40) {
>>
>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>> cheaper, so typecast the result to matrix
>>
>> colnames = names(tfdata)
>>
>> quarterends = sort(unique(tfdata$DATE))
>>
>> for (aquarter in quarterends) {
>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>
>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>> (tfdata_quarter$IPO.Flag == 0), ]
>> tfdata_quarter_ipoissuers = tfdata_quarter[ tfdata_quarter$IPO.Flag
>> == 1, ]
>>
>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>> arow = tfdata_quarter_ipoissuers[i,]
>> industrypeers = tfdata_quarter_fitting_nonissuers[
>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>> industrypeers = industrypeers[
>> order(industrypeers$Market.Cap.13f), ]
>> if ( nrow(industrypeers) > 0 ) {
>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>> arow$Market.Cap.13f, ]) > 0 ) {
>> bestpeer = industrypeers[industrypeers$Market.Cap.13f
>> >= arow$Market.Cap.13f, ][1,]
>> }
>> else {
>> bestpeer = industrypeers[nrow(industrypeers),]
>> }
>> bestpeer$Quarters.Since.IPO.Issue =
>> arow$Quarters.Since.IPO.Issue
>>
>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>> bestpeer$PERMNO] = 1
>> result = rbind(result, as.matrix(bestpeer))
>> }
>> }
>> #result = rbind(result, tfdata_quarter)
>> print (aquarter)
>> }
>>
>> result = as.data.frame(result)
>> names(result) = colnames
>> return(result)
>>
>> }
>>
>> ========= end of my function =============
>>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
------------------------------

Message: 41
Date: Fri, 6 Jun 2008 17:51:34 +0200
From: DAVID ARTETA GARCIA <darteta001_at_ikasle.ehu.es>
Subject: [R] Store filename
To: "r-help_at_r-project.org" <r-help_at_r-project.org>
Message-ID: <20080606175134.qh2v6povk0o4co4s@www.ehu.es>
Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed"

Hi list,

Is it possible to save the name of a file automatically when reading it using read.table() or some other function? My aim is then to create an output table named after the original one, with a suffix like _out.

Example:

mydata = read.table("Run224_v2_060308.txt", sep = "\t", header = TRUE)

## store name?
myfile = the_name_of_the_file

## do analysis of data and store in a data.frame "myoutput"
## write output in tab format
write.table(myoutput, c(myfile,"_out.txt"), sep="\t")

The name of the new file will be

"Run224_v2_060308_out.txt"

Thanks in advance,

David

------------------------------

Message: 42
Date: Fri, 6 Jun 2008 09:56:27 -0600
From: "Nanye Long" <nanye.long_at_gmail.com>
Subject: [R] where to download BRugs?
To: r-help_at_r-project.org
Message-ID: <d3fc68d40806060856h1ccb5475u40c2ffa08d75ef32@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi all,

Does anyone know where to download the "BRugs" package? I did not find it on the r-project website. Thanks.

NL

------------------------------

Message: 43
Date: Fri, 06 Jun 2008 12:03:13 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <48495FC1.5060900@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

I did! What did I miss?

on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
> Try reading the posting guide before posting.
>
> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <dfolkins_at_gmail.com> wrote:
>> Anybody have any thoughts on this? Please? :)
>>
>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>> Hi everyone!
>>>
>>> I have a question about data processing efficiency.
>>>
>>> My data are as follows: I have a data set on quarterly institutional
>>> ownership of equities; some of them have had recent IPOs, some have not (I
>>> have a binary flag set). The total dataset size is 700k+ rows.
>>>
>>> My goal is this: For every quarter since issue for each IPO, I need to
>>> find a "matched" firm in the same industry, and close in market cap. So,
>>> e.g., for firm X, which had an IPO, i need to find a matched non-issuing
>>> firm in quarter 1 since IPO, then a (possibly different) non-issuing firm in
>>> quarter 2 since IPO, etc. Repeat for each issuing firm (there are about 8300
>>> of these).
>>>
>>> Thus it seems to me that I need to be doing a lot of data selection and
>>> subsetting, and looping (yikes!), but the result appears to be highly
>>> inefficient and takes ages (well, many hours). What I am doing, in
>>> pseudocode, is this:
>>>
>>> 1. for each quarter of data, getting out all the IPOs and all the eligible
>>> non-issuing firms.
>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>> industry, sort them by size, and finally grab a matching firm closest in
>>> size (the exact procedure is to grab the closest bigger firm if one exists,
>>> and just the biggest available if all are smaller)
>>> 3. assign the matched firm-observation the same "quarters since issue" as
>>> the IPO being matched
>>> 4. rbind them all into the "matching" dataset.
>>>
>>> The function I currently have is pasted below, for your reference. Is
>>> there any way to make it produce the same result but much faster?
>>> Specifically, I am guessing eliminating some loops would be very good, but I
>>> don't see how, since I need to do some fancy footwork for each IPO in each
>>> quarter to find the matching firm. I'll be doing a few things similar to
>>> this, so it's somewhat important to up the efficiency of this. Maybe some of
>>> you R-fu masters can clue me in? :)
>>>
>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>
>>> ========== my function below ===========
>>>
>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>> quarters_since_issue=40) {
>>>
>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>>> cheaper, so typecast the result to matrix
>>>
>>> colnames = names(tfdata)
>>>
>>> quarterends = sort(unique(tfdata$DATE))
>>>
>>> for (aquarter in quarterends) {
>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>
>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>> (tfdata_quarter$IPO.Flag == 0), ]
>>> tfdata_quarter_ipoissuers = tfdata_quarter[ tfdata_quarter$IPO.Flag
>>> == 1, ]
>>>
>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>> arow = tfdata_quarter_ipoissuers[i,]
>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>> industrypeers = industrypeers[
>>> order(industrypeers$Market.Cap.13f), ]
>>> if ( nrow(industrypeers) > 0 ) {
>>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>> arow$Market.Cap.13f, ]) > 0 ) {
>>> bestpeer = industrypeers[industrypeers$Market.Cap.13f
>>>> = arow$Market.Cap.13f, ][1,]
>>> }
>>> else {
>>> bestpeer = industrypeers[nrow(industrypeers),]
>>> }
>>> bestpeer$Quarters.Since.IPO.Issue =
>>> arow$Quarters.Since.IPO.Issue
>>>
>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>> bestpeer$PERMNO] = 1
>>> result = rbind(result, as.matrix(bestpeer))
>>> }
>>> }
>>> #result = rbind(result, tfdata_quarter)
>>> print (aquarter)
>>> }
>>>
>>> result = as.data.frame(result)
>>> names(result) = colnames
>>> return(result)
>>>
>>> }
>>>
>>> ========= end of my function =============
>>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
------------------------------

Message: 44
Date: Fri, 6 Jun 2008 12:05:21 -0400
From: "Gabor Grothendieck" <ggrothendieck_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <971536df0806060905t4ed24ec3nf353155b6e129a9f@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

It's summarized in the last line sent to r-help. Note: reproducible and minimal.

On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins_at_gmail.com> wrote:
> i did! what did i miss?
>
> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>
>> Try reading the posting guide before posting.
>>
>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <dfolkins_at_gmail.com>
>> wrote:
>>>
>>> Anybody have any thoughts on this? Please? :)
>>>
>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>
>>>> Hi everyone!
>>>>
>>>> I have a question about data processing efficiency.
>>>>
>>>> My data are as follows: I have a data set on quarterly institutional
>>>> ownership of equities; some of them have had recent IPOs, some have not
>>>> (I
>>>> have a binary flag set). The total dataset size is 700k+ rows.
>>>>
>>>> My goal is this: For every quarter since issue for each IPO, I need to
>>>> find a "matched" firm in the same industry, and close in market cap. So,
>>>> e.g., for firm X, which had an IPO, i need to find a matched non-issuing
>>>> firm in quarter 1 since IPO, then a (possibly different) non-issuing
>>>> firm in
>>>> quarter 2 since IPO, etc. Repeat for each issuing firm (there are about
>>>> 8300
>>>> of these).
>>>>
>>>> Thus it seems to me that I need to be doing a lot of data selection and
>>>> subsetting, and looping (yikes!), but the result appears to be highly
>>>> inefficient and takes ages (well, many hours). What I am doing, in
>>>> pseudocode, is this:
>>>>
>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>> eligible
>>>> non-issuing firms.
>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>> industry, sort them by size, and finally grab a matching firm closest in
>>>> size (the exact procedure is to grab the closest bigger firm if one
>>>> exists,
>>>> and just the biggest available if all are smaller)
>>>> 3. assign the matched firm-observation the same "quarters since issue"
>>>> as
>>>> the IPO being matched
>>>> 4. rbind them all into the "matching" dataset.
>>>>
>>>> The function I currently have is pasted below, for your reference. Is
>>>> there any way to make it produce the same result but much faster?
>>>> Specifically, I am guessing eliminating some loops would be very good,
>>>> but I
>>>> don't see how, since I need to do some fancy footwork for each IPO in
>>>> each
>>>> quarter to find the matching firm. I'll be doing a few things similar to
>>>> this, so it's somewhat important to up the efficiency of this. Maybe
>>>> some of
>>>> you R-fu masters can clue me in? :)
>>>>
>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>
>>>> ========== my function below ===========
>>>>
>>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>>> quarters_since_issue=40) {
>>>>
>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>>>> cheaper, so typecast the result to matrix
>>>>
>>>> colnames = names(tfdata)
>>>>
>>>> quarterends = sort(unique(tfdata$DATE))
>>>>
>>>> for (aquarter in quarterends) {
>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>
>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>>> (tfdata_quarter$IPO.Flag == 0), ]
>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>> tfdata_quarter$IPO.Flag
>>>> == 1, ]
>>>>
>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>> industrypeers = industrypeers[
>>>> order(industrypeers$Market.Cap.13f), ]
>>>> if ( nrow(industrypeers) > 0 ) {
>>>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>> bestpeer = industrypeers[industrypeers$Market.Cap.13f
>>>>>
>>>>> = arow$Market.Cap.13f, ][1,]
>>>>
>>>> }
>>>> else {
>>>> bestpeer = industrypeers[nrow(industrypeers),]
>>>> }
>>>> bestpeer$Quarters.Since.IPO.Issue =
>>>> arow$Quarters.Since.IPO.Issue
>>>>
>>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>> bestpeer$PERMNO] = 1
>>>> result = rbind(result, as.matrix(bestpeer))
>>>> }
>>>> }
>>>> #result = rbind(result, tfdata_quarter)
>>>> print (aquarter)
>>>> }
>>>>
>>>> result = as.data.frame(result)
>>>> names(result) = colnames
>>>> return(result)
>>>>
>>>> }
>>>>
>>>> ========= end of my function =============
>>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
------------------------------

Message: 45
Date: Fri, 6 Jun 2008 12:05:38 -0400
From: "Gabor Grothendieck" <ggrothendieck_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <971536df0806060905u7198d0f6u19979a2e3b5dedc8@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

That is the last line of every message to r-help.

On Fri, Jun 6, 2008 at 12:05 PM, Gabor Grothendieck <ggrothendieck_at_gmail.com> wrote:
> Its summarized in the last line to r-help. Note reproducible and
> minimal.
>
> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins_at_gmail.com> wrote:
>> i did! what did i miss?
>>
>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>>
>>> Try reading the posting guide before posting.
>>>
>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <dfolkins_at_gmail.com>
>>> wrote:
>>>>
>>>> Anybody have any thoughts on this? Please? :)
>>>>
>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>>
>>>>> Hi everyone!
>>>>>
>>>>> I have a question about data processing efficiency.
>>>>>
>>>>> My data are as follows: I have a data set on quarterly institutional
>>>>> ownership of equities; some of them have had recent IPOs, some have not
>>>>> (I
>>>>> have a binary flag set). The total dataset size is 700k+ rows.
>>>>>
>>>>> My goal is this: For every quarter since issue for each IPO, I need to
>>>>> find a "matched" firm in the same industry, and close in market cap. So,
>>>>> e.g., for firm X, which had an IPO, i need to find a matched non-issuing
>>>>> firm in quarter 1 since IPO, then a (possibly different) non-issuing
>>>>> firm in
>>>>> quarter 2 since IPO, etc. Repeat for each issuing firm (there are about
>>>>> 8300
>>>>> of these).
>>>>>
>>>>> Thus it seems to me that I need to be doing a lot of data selection and
>>>>> subsetting, and looping (yikes!), but the result appears to be highly
>>>>> inefficient and takes ages (well, many hours). What I am doing, in
>>>>> pseudocode, is this:
>>>>>
>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>> eligible
>>>>> non-issuing firms.
>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>> industry, sort them by size, and finally grab a matching firm closest in
>>>>> size (the exact procedure is to grab the closest bigger firm if one
>>>>> exists,
>>>>> and just the biggest available if all are smaller)
>>>>> 3. assign the matched firm-observation the same "quarters since issue"
>>>>> as
>>>>> the IPO being matched
>>>>> 4. rbind them all into the "matching" dataset.
>>>>>
>>>>> The function I currently have is pasted below, for your reference. Is
>>>>> there any way to make it produce the same result but much faster?
>>>>> Specifically, I am guessing eliminating some loops would be very good,
>>>>> but I
>>>>> don't see how, since I need to do some fancy footwork for each IPO in
>>>>> each
>>>>> quarter to find the matching firm. I'll be doing a few things similar to
>>>>> this, so it's somewhat important to up the efficiency of this. Maybe
>>>>> some of
>>>>> you R-fu masters can clue me in? :)
>>>>>
>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>
>>>>> ========== my function below ===========
>>>>>
>>>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>>>> quarters_since_issue=40) {
>>>>>
>>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>>>>> cheaper, so typecast the result to matrix
>>>>>
>>>>> colnames = names(tfdata)
>>>>>
>>>>> quarterends = sort(unique(tfdata$DATE))
>>>>>
>>>>> for (aquarter in quarterends) {
>>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>
>>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>>>> (tfdata_quarter$IPO.Flag == 0), ]
>>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>> tfdata_quarter$IPO.Flag
>>>>> == 1, ]
>>>>>
>>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>> industrypeers = industrypeers[
>>>>> order(industrypeers$Market.Cap.13f), ]
>>>>> if ( nrow(industrypeers) > 0 ) {
>>>>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>>> bestpeer = industrypeers[industrypeers$Market.Cap.13f
>>>>>>
>>>>>> = arow$Market.Cap.13f, ][1,]
>>>>>
>>>>> }
>>>>> else {
>>>>> bestpeer = industrypeers[nrow(industrypeers),]
>>>>> }
>>>>> bestpeer$Quarters.Since.IPO.Issue =
>>>>> arow$Quarters.Since.IPO.Issue
>>>>>
>>>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>>> bestpeer$PERMNO] = 1
>>>>> result = rbind(result, as.matrix(bestpeer))
>>>>> }
>>>>> }
>>>>> #result = rbind(result, tfdata_quarter)
>>>>> print (aquarter)
>>>>> }
>>>>>
>>>>> result = as.data.frame(result)
>>>>> names(result) = colnames
>>>>> return(result)
>>>>>
>>>>> }
>>>>>
>>>>> ========= end of my function =============
>>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>
------------------------------

Message: 46
Date: Fri, 6 Jun 2008 12:06:37 -0400
From: "Woolner, Keith" <kwoolner_at_indians.com>
Subject: [R] How to force two regression coefficients to be equal but opposite in sign?
To: <r-help_at_r-project.org>
Message-ID: <F8CE5B1510266D46A27FE586C8D78EEC02F5CEAF@WAHOO.indians.com>
Content-Type: text/plain

Is there a way to set up a regression in R that forces two coefficients to be equal but opposite in sign?

I'm trying to set up a model where a subject appears in a pair of environments where a measurement X is made. There are a total of 5 environments, one of which is a baseline. But each observation is for a subject in only two of them, and not all subjects will appear in each environment.

Each of the environments has an effect on the variable X. I want to measure the relative effects of each environment E on X with a model:

Xj = Xi * Ei / Ej

Ei of the baseline model is set equal to 1. With a log transform, a linear-looking regression can be written as:

log(Xj) = log(Xi) + log(Ei) - log(Ej)

My data looks like:

#   E1   X1    E2   X2
1   A    .20   B    .25

What I've tried in R:

env <- c("A","B","C","D","E")

# Note: data is made up just for this example
df <- data.frame(
  X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
  X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
  E1 = c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
  E2 = c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C")
)

model <- lm(log(X2) ~ log(X1) + E1 + E2, data = df)
summary(model)

Call:
lm(formula = log(X2) ~ log(X1) + E1 + E2, data = df)

Residuals:
      1       2       3       4       5       6       7       8
 0.3240  0.2621 -0.5861 -1.0283  0.5861  0.4422  0.3831 -0.2608
      9      10      11      12      13      14      15
-0.1222  0.9002 -0.5802 -0.3200  0.6452 -0.9634  0.3182

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.54563    1.71558   0.318    0.763
log(X1)      1.29745    0.57295   2.265    0.073 .
E1B         -0.23571    0.95738  -0.246    0.815
E1C         -0.57057    1.20490  -0.474    0.656
E1D         -0.22988    0.98274  -0.234    0.824
E1E         -1.17181    1.02918  -1.139    0.306
E2B         -0.16775    0.87803  -0.191    0.856
E2C          0.05952    1.12779   0.053    0.960
E2D          0.43077    1.19485   0.361    0.733
E2E          0.40633    0.98289   0.413    0.696
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.004 on 5 degrees of freedom
Multiple R-squared: 0.7622, Adjusted R-squared: 0.3343
F-statistic: 1.781 on 9 and 5 DF, p-value: 0.2721

----

What I need to do is force the corresponding environment coefficients to be equal in absolute value, but opposite in sign. That is:

E1B = -E2B
E1C = -E2C
E1D = -E2D
E1E = -E2E

In essence, E1 and E2 are the "same" variable, but can play two different roles in the model depending on whether it's the first part of the observation or the second part.

I searched the archive, and the closest thing I found to my situation was:

http://tolstoy.newcastle.edu.au/R/e4/help/08/03/6773.html

But the response to that thread didn't seem to be applicable to my situation.

Any pointers would be appreciated.

Thanks,
Keith

[[alternative HTML version deleted]]
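[Editor's note: a hedged sketch of one standard way to impose the constraint, assuming E1 and E2 share the same five levels as in the example above. Each environment is encoded as a single difference column (+1 when it is the first environment, -1 when it is the second), so one coefficient per environment serves both roles with opposite signs by construction.

df$E1 <- factor(df$E1, levels = env)
df$E2 <- factor(df$E2, levels = env)
M1 <- model.matrix(~ E1 - 1, df)     # indicator columns for E1
M2 <- model.matrix(~ E2 - 1, df)     # indicator columns for E2
D  <- (M1 - M2)[, -1]                # drop baseline "A" (log(E_A) = 0)
colnames(D) <- env[-1]
model2 <- lm(log(X2) ~ log(X1) + D, data = df)
summary(model2)

The fitted coefficient for column B is the common value of log(E_B), entering with + when B is the first environment and - when it is the second, which is exactly the E1B = -E2B constraint.]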
------------------------------

Message: 47
Date: Fri, 06 Jun 2008 12:20:52 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Store filename
To: DAVID ARTETA GARCIA <darteta001_at_ikasle.ehu.es>
Cc: "r-help_at_r-project.org" <r-help_at_r-project.org>
Message-ID: <484963E4.20709@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Well, where are you getting the filename in the first place? Are you looping over a list of filenames that comes from somewhere?

Generally, for concatenating strings, look at function 'paste':

write.table(myoutput, paste(myfilename, "_out.txt", sep=''), sep="\t")

on 06/06/2008 11:51 AM DAVID ARTETA GARCIA said the following:
> Hi list,
>
> Is it possible to save the name of a filename automatically when reading
> it using read.table() or some other function?
> My aim is to create then an output table with the name of the original
> table with a suffix like _out
>
> example:
>
> mydata = read.table("Run224_v2_060308.txt", sep = "\t", header = TRUE)
>
> ## store name?
>
> myfile = the_name_of_the_file
>
> ## do analysis of data and store in a data.frame "myoutput"
> ## write output in tab format
>
> write.table(myoutput, c(myfile,"_out.txt"),sep="\t")
>
> the name of the new file will be
>
> "Run224_v2_060308_out.txt"
>
> Thanks in advanve,
>
>
>
> David
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
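[Editor's note: a hedged sketch tying the two points together, assuming the files sit in the working directory and end in .txt; sub() strips the extension before the suffix is added, so the result matches David's "Run224_v2_060308_out.txt" example.

for (myfile in list.files(pattern = "\\.txt$")) {
  mydata <- read.table(myfile, sep = "\t", header = TRUE)
  myoutput <- mydata  # ... analysis goes here ...
  outname <- paste(sub("\\.txt$", "", myfile), "_out.txt", sep = "")
  write.table(myoutput, outname, sep = "\t")
}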
------------------------------

Message: 48
Date: Fri, 6 Jun 2008 13:23:41 -0300
From: "Henrique Dallazuanna" <wwwhsd_at_gmail.com>
Subject: Re: [R] Store filename
To: "DAVID ARTETA GARCIA" <darteta001_at_ikasle.ehu.es>
Cc: "r-help_at_r-project.org" <r-help_at_r-project.org>
Message-ID: <da79af330806060923r4f7058c1wa27e104b8cb4ee2e@mail.gmail.com>
Content-Type: text/plain

You can write your own function, something like this:

read.table2 <- function(file, ...) {
    x <- read.table(file, ...)
    attr(x, "file_name") <- file  # keep the source filename on the object
    return(x)
}

mydata <- read.table2("Run224_v2_060308.txt", sep = "\t", header = TRUE)
myfile <- attr(mydata, "file_name")

On Fri, Jun 6, 2008 at 12:51 PM, DAVID ARTETA GARCIA <darteta001_at_ikasle.ehu.es> wrote:
> Hi list,
>
> Is it possible to save the name of a filename automatically when reading it
> using read.table() or some other function?
> My aim is to create then an output table with the name of the original
> table with a suffix like _out
>
> example:
>
> mydata = read.table("Run224_v2_060308.txt", sep = "\t", header = TRUE)
>
> ## store name?
>
> myfile = the_name_of_the_file
>
> ## do analysis of data and store in a data.frame "myoutput"
> ## write output in tab format
>
> write.table(myoutput, c(myfile,"_out.txt"),sep="\t")
>
> the name of the new file will be
>
> "Run224_v2_060308_out.txt"
>
> Thanks in advanve,
>
>
>
> David
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

[[alternative HTML version deleted]]

------------------------------

Message: 49
Date: Fri, 06 Jun 2008 18:32:14 +0200
From: Dani Valverde <daniel.valverde_at_uab.cat>
Subject: [R] fit.contrast error
To: R Help <r-help_at_r-project.org>
Message-ID: <4849668E.7040009@uab.cat>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hello,

I am trying to perform a fit.contrast() on an lme object with this code:

attach(error_DB)
model_temperature <- lme(Error ~ Temperature, data = error_DB, random=~1|ID)
summary(model_temperature)
fit.contrast(model_temperature, "Temperature", c(-1,1), conf.int=0.95)
detach(error_DB)

but I got this error:

Error in `contrasts<-`(`*tmp*`, value = c(-0.5, 0.5)) :
  contrasts apply only to factors

My database is a data frame, very similar to that of the Orthodont dataset. Could anyone give me some advice on how to solve the problem?

Best,

Dani

--
Daniel Valverde Saubí

Grup de Biologia Molecular de Llevats
Facultat de Veterinària de la Universitat Autònoma de Barcelona
Edifici V, Campus UAB
08193 Cerdanyola del Vallès - SPAIN

Centro de Investigación Biomédica en Red
en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN)

Grup d'Aplicacions Biomèdiques de la RMN
Facultat de Biociències
Universitat Autònoma de Barcelona
Edifici Cs, Campus UAB
08193 Cerdanyola del Vallès - SPAIN
+34 93 5814126
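[Editor's note: a hedged sketch of the likely fix; the error message suggests Temperature is numeric, while fit.contrast() (package gmodels) needs the contrasted variable to be a factor. Names follow the post.

library(nlme)
library(gmodels)
error_DB$Temperature <- factor(error_DB$Temperature)
model_temperature <- lme(Error ~ Temperature, data = error_DB, random = ~1|ID)
fit.contrast(model_temperature, "Temperature", c(-1, 1), conf.int = 0.95)

With only two temperature levels, c(-1, 1) estimates the difference between them; attach()/detach() are unnecessary since data = error_DB is given.]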
------------------------------

Message: 50
Date: Fri, 06 Jun 2008 18:46:53 +0200
From: Uwe Ligges <ligges_at_statistik.tu-dortmund.de>
Subject: Re: [R] where to download BRugs?
To: Nanye Long <nanye.long_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <484969FD.3060807@statistik.tu-dortmund.de>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear NL,

BRugs is available from the CRAN extras repository hosted by Brian Ripley. install.packages("BRugs") should install it as before (for R-2.7.x), if you have not changed the list of default repositories.

Best wishes,
Uwe Ligges

Nanye Long wrote:
> Hi all,
>
> Does anyone know where to download the "BRugs" package? I did not find
> it on r-project website. Thanks.
>
> NL
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------ Message: 51 Date: Fri, 6 Jun 2008 12:53:38 -0400 From: "Levi Waldron" <leviwaldron_at_gmail.com> Subject: Re: [R] choosing an appropriate linear model To: "R-help mailing list" <R-help_at_stat.math.ethz.ch> Message-ID: <7a09e3940806060953w5fa5bfb2i750a1e48b5561b5a@mail.gmail.com> Content-Type: text/plain Perhaps this was too big a question, so I'll ask something shorter: I have fit a linear model, and want to use its prediction intervals to calculate the sum of many individual predictions. 1) Some of the lower prediction intervals are negative, which is non-sensical. Should I just set all negative predictions to zero, or is there another way to require non-negative predictions only? 2) I am interested in the sum of many predictions based on the lm. How can I calculate the 95% prediction interval for the sum? Should I calculate a root mean square of the individual errors, or use a bootstrap method, or something else? ps. the data is attached to the end of this email. On Thu, Jun 5, 2008 at 6:25 PM, Levi Waldron <leviwaldron_at_gmail.com> wrote:
> I am trying to model the observed leaching of wood preservative chemicals
> from treated wood during an outdoor experiment where leaching is caused by
> rainfall events. For each rainfall event, the amount of rainfall was
> recorded as well as the amount of preservative chemical leached. A number
> of climatic variables were measured, but the most important is the amount of
> rainfall.
>
> I have tried a simple linear model, with zero intercept because zero
> rainfall cannot cause any leaching (leachdata dataframe is attached to this
> email). The diagnostics show clearly non-normally distributed residuals
> with a simple linear regression, and I am trying to figure out what to do
> about it (see attached diagnostics.png). This dataset contains measurements
> from 57 rainfall events on three replicate samples, for a total of 171
> measurements.
>
> Part of the problem is that physically, the leaching values can only be
> positive, so for the smaller rainfall amounts the residuals are all
> positive. If I allow an intercept then it is significantly positive,
> possibly since the researcher wouldn't have collected measurements for very
> small rain events, but in terms of the model it doesn't make sense
> physically to have a positive intercept, particularly since lab experiments
> have shown that a certain amount of rain exposure is required to wet the
> wood before leaching begins.
>
> I can get more normally distributed residuals by log-transforming the
> response, or using the optimal box-cox transformation of lambda = 0.21,
> which produces nicer-looking residuals but unsatisfactory prediction which
> is the main goal of the model (also attached).
>
> Any advice on how to create a better predictive model? I presume it has
> something to do with glm, especially since I have repeated rainfalls on
> replicate samples, but any advice on the approach to take would be much
> appreciated. The code I used to produce the attached plots is included
> below.
>
>
> leach.lm <- lm(leachate~rainmm-1,data=leachdata)
>
> png("dianostics.png",height=1200,width=700)
> par(mfrow=c(3,2))
> plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
> abline(leach.lm)
> plot(predict(leach.lm)~leachdata$leachate,main="predicted vs. observed
> leaching amount",xlim=c(0,12),ylim=c(0,12),xlab="observed
> leaching",ylab="predicted leaching")
> abline(a=0,b=1)
> plot(leach.lm)
> dev.off()
>
> library(MASS)
> boxcox(leach.lm,plotit=T,lambda=seq(0,0.4,by=0.01))
>
> boxtran <- function(y,lambda,inverse=F){
> if(inverse)
> return((lambda*y+1)^(1/lambda))
> else
> return((y^lambda-1)/lambda)
> }
>
> png("boxcox-dianostics.png",height=1200,width=700)
> par(mfrow=c(3,2))
> logleach.lm <- lm(boxtran(leachate,0.21)~rainmm-1,data=leachdata)
> plot(leachate~rainmm,data=leachdata,main="Data and fitted line")
> x <- leachdata$rainmm
> y <- boxtran(predict(logleach.lm),0.21,T)
> xy <- cbind(x,y)[order(x),]
> lines(xy)
> plot(y~leachdata$leachate,xlim=c(0,12),ylim=c(0,12),main="predicted vs.
> observed leaching amount",xlab="observed leaching",ylab="predicted
> leaching")
> abline(a=0,b=1)
> plot(logleach.lm)
> dev.off()
>
`leachdata` <- structure(list(
  rainmm = c(19.68, 36.168, 18.632, 2.74, 0.822, 9.864, 7.124, 29.592,
    4.384, 11.508, 1.37, 3.288, 9.042, 2.74, 18.906, 4.932, 0.274, 3.836,
    1.918, 4.384, 16.714, 5.754, 12.604, 2.466, 13.014, 2.74, 14.796,
    5.754, 4.93, 5.21, 0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014,
    18.358, 0.274, 1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4,
    25.8, 13, 7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68,
    36.168, 18.632, 2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508,
    1.37, 3.288, 9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384,
    16.714, 5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93,
    5.21, 0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358,
    0.274, 1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13,
    7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54, 19.68, 36.168,
    18.632, 2.74, 0.822, 9.864, 7.124, 29.592, 4.384, 11.508, 1.37,
    3.288, 9.042, 2.74, 18.906, 4.932, 0.274, 3.836, 1.918, 4.384,
    16.714, 5.754, 12.604, 2.466, 13.014, 2.74, 14.796, 5.754, 4.93,
    5.21, 0.548, 1.644, 3.014, 6.028, 18.358, 1.918, 3.014, 18.358,
    0.274, 1.918, 54.2, 43.4, 11.2, 1.6, 3.8, 70.2, 0.2, 24.4, 25.8, 13,
    7.124, 10.96, 7.672, 3.562, 3.288, 6.02, 17.54),
  leachate = c(0.94, 4.74, 2.84, 3.28, 0.07, 1.56, 0.48, 9.63, 1.2, 2.55,
    0.15, 0.67, 0.57, 0.38, 1.81, 0.08, 0.94, 0.79, 0.16, 0.09, 1.2,
    0.61, 0.77, 0.02, 1, 0.26, 1.34, 0.81, 0.18, 0.17, 0.005, 0.25,
    0.42, 1.45, 0.54, 0.24, 0.41, 0.55, 1.59, 1.09, 3.84, 11.52, 6.21,
    3.86, 2.34, 11.02, 2.33, 1.83, 2.4, 0.74, 0.71, 0.55, 0.31, 0.83,
    0.29, 0.48, 0.92, 1.33, 4.8, 1.73, 1.87, 0.21, 1.04, 1.08, 6.74,
    1.23, 2.5, 0.13, 1.29, 0.75, 0.66, 2.14, 0.17, 0.43, 0.69, 0.47,
    0.14, 1.6, 0.56, 1.02, 0.04, 0.75, 0.32, 1.68, 0.58, 0.42, 0.18,
    0.1, 0.34, 0.36, 1.54, 0.38, 0.18, 0.26, 0.005, 0.17, 0.18, 0.4,
    2.13, 0.87, 0.75, 0.52, 3.21, 0.49, 0.85, 1.24, 0.32, 0.5, 0.37,
    0.19, 0.53, 0.3, 0.51, 1.37, 1.25, 3.69, 2.76, 1.82, 0.005, 0.99,
    0.87, 6.93, 1.04, 2.26, 0.14, 1.27, 0.62, 0.6, 2.91, 0.19, 0.41,
    0.47, 0.38, 0.17, 1.56, 0.41, 0.92, 0.02, 0.51, 0.26, 0.86, 0.47,
    0.39, 0.12, 0.08, 0.28, 0.3, 1.16, 0.27, 0.15, 0.22, 0.3, 0.18,
    0.16, 0.47, 6, 1.47, 0.67, 0.35, 2.13, 0.51, 0.85, 1.37, 0.23,
    0.45, 0.34, 0.17, 0.46, 0.23, 0.43, 1.17)),
  .Names = c("rainmm", "leachate"), row.names = c(NA, -171L),
  class = "data.frame")

[[alternative HTML version deleted]]
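[Editor's note: a hedged sketch for question 2, by parametric simulation from the fitted zero-intercept model; it conditions on the estimated coefficient, so the interval is somewhat narrow, and the pmax(0, ...) truncation is one pragmatic answer to question 1 as well.

leach.lm <- lm(leachate ~ rainmm - 1, data = leachdata)
set.seed(1)
mu <- predict(leach.lm)
s  <- summary(leach.lm)$sigma
sums <- replicate(10000,
  sum(pmax(0, mu + rnorm(length(mu), 0, s))))  # simulate, truncate at 0, sum
quantile(sums, c(0.025, 0.975))  # rough 95% interval for the summed leachate

A fuller answer would also resample the coefficient (or bootstrap rows), and the non-normal residuals described in the quoted message argue for the bootstrap variant.]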
> ci=rainbow(7)
> ci
[1] "#FF0000FF" "#FFDB00FF" "#49FF00FF" "#00FF92FF" "#0092FFFF" "#4900FFFF" [7] "#FF00DBFF" I would like "#FF0000FF" "#FFDB00FF" "#49FF00FF" to be at the end of ci, and the rest to be at the beginning. How can I do that? ------------------------------ Message: 53 Date: Fri, 6 Jun 2008 10:11:42 -0700 (PDT) From: Thomas Lumley <tlumley_at_u.washington.edu> Subject: Re: [R] rmeta package: metaplot or forestplot of meta-analysis under DSL (ramdon) model To: "Shi, Jiajun [BSD] - KNP" <jshi1_at_bsd.uchicago.edu> Cc: r-help_at_r-project.org Message-ID: <Pine.LNX.4.64.0806061009100.13806@homer21.u.washington.edu> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed The package has a plot() method for random-effects meta-analyses as well, either those produced by meta.DSL or meta.summaries. There are examples on the help page for meta.DSL. -thomas On Tue, 27 May 2008, Shi, Jiajun [BSD] - KNP wrote:
> Dear all,
>
> I could not draw a forest plot for a meta-analysis under random-effects
> models using the rmeta package. The rmeta package has a default function
> for the MH (fixed-effect) model. Has the rmeta package been updated with
> such a function? Or has someone revised it and kept the code private?
>
> I would appreciate it if you could provide some information on this
> question.
>
> Thanks,
>
> Andrew
>
>
> This email is intended only for the use of the individua...{{dropped:12}}
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
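For reference, a minimal sketch of the plot() method Thomas mentions, along the lines of the meta.DSL help page (the cochrane example data ships with rmeta; untested here):

library(rmeta)
data(cochrane)
d <- meta.DSL(n.trt, n.ctrl, ev.trt, ev.ctrl, names = name, data = cochrane)
plot(d)    # forest-style plot of the random-effects (DSL) meta-analysis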
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley_at_u.washington.edu	University of Washington, Seattle

------------------------------

Message: 54
Date: Fri, 6 Jun 2008 19:12:00 +0200 (CEST)
From: "Luca Mortarini" <l.mortarini_at_isac.cnr.it>
Subject: [R] Problem with subset
To: r-help_at_r-project.org
Message-ID: <56855.213.140.22.79.1212772320.squirrel@mail.isac.cnr.it>
Content-Type: text/plain;charset=iso-8859-1

Hi,

I am new to R and I am looking for a way to extract a subset from a vector. I have a vector of numbers oscillating around zero (a decreasing autocorrelation function) and I would like to extract only the first positive part of the function (from zero lag to the lag where the function inverts its sign for the first time).

I have tried

subset(myvector, myvector > 0)

but this obviously extracts all the positive intervals, not only the first one. Is there a logical statement I can use in subset? I prefer not to use an if statement that would probably slow down the code.

Thanks a lot,
Luca

*********************************************************
dr. Luca Mortarini		l.mortarini_at_isac.cnr.it
Università del Piemonte Orientale
Dipartimento di Scienze e Tecnologie Avanzate

------------------------------

Message: 55
Date: Fri, 6 Jun 2008 10:14:12 -0700
From: "Charles C. Berry" <cberry_at_tajo.ucsd.edu>
Subject: Re: [R] Manipulating DataSets
To: Neil Gupta <neil.gup_at_gmail.com>
Cc: R-help_at_r-project.org
Message-ID: <Pine.LNX.4.64.0806061008470.28293@tajo.ucsd.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Fri, 6 Jun 2008, Neil Gupta wrote:
> Hello R-users,
>
> I have a very simple problem I wanted to solve. I have a large dataset as
> such:
>   Lag  X.Symbol     Time TickType ReferenceNumber  Price Size X.Symbol.1   Time.1 TickType.1 ReferenceNumber.1
> 1  ES 3:ESZ7.GB 08:30:00        B        74390987 151075   44  3:ESZ7.GB 08:30:00          A          74390988
> 2  ES 3:YMZ7.EC 08:30:00        B        74390993  13686   17  3:YMZ7.EC 08:30:00          A          74390994
> 3  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
> 4  YM 3:YMZ7.EC 08:30:00        B        74390998  13686   17  3:YMZ7.EC 08:30:00          A          74390999
> 5  YM 3:ESZ7.GB 08:30:00        B        74391135 151075   49  3:ESZ7.GB 08:30:00          A          74391136
> 6  YM 3:YMZ7.EC 08:30:00        B        74391000  13686   14  3:YMZ7.EC 08:30:00          A          74391001
>   Price.1 Size.1 LeadTime   MidPoint Spread
> 1  151100     22 08:30:00 *151087.5*     25
> 2   13688     27 08:30:00    13687.0      2
> 3  151100     22 08:30:00 *151087.5*     25
> 4   13688     27 08:30:00    13687.0      2
> 5  151100     22 08:30:00   151087.5     25
> 6   13688     27 08:30:00    13687.0      2
>
>
> All I wanted to do was take the Log(MidPoint[2]) - Log(MidPoint[1]) for a
> symbol "3:ESZ7.GB"
> So the first one would be log(151087.5) - log(151087.5). I wanted to do this
> throughout the data set and add that in another column. I would appreciate
> any help.
See example( split ). Note the "### data frame variation", which should serve as a template for your problem.

HTH,

Chuck
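A sketch of that idea applied to the posted data ("dd" is a stand-in name for the posted data frame; column names are taken from the post and untested against the real data). ave() splits by symbol, applies the function within each group, and returns the results in the original row order; the first observation of each symbol gets NA because it has no prior midpoint:

dd$LogMidDiff <- ave(dd$MidPoint, dd$X.Symbol,
                     FUN = function(m) c(NA, diff(log(m))))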
>
> Regards,
>
> Neil Gupta
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu              UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

------------------------------

Message: 56
Date: Fri, 06 Jun 2008 13:25:05 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <484972F1.3070203@gmail.com>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

I thought since the function code (which I provided in full) was pretty short, it would be reasonably easy to just read the code and see what it's doing.

But ok, so... I am attaching a zip file, with a small sample of the data set (tab delimited), and the function code (the posting guidelines claim that "some archive formats" are allowed; I assume zip is one of them).

Would appreciate your comments! :)

on 06/06/2008 12:05 PM Gabor Grothendieck said the following:
> Its summarized in the last line to r-help. Note reproducible and
> minimal.
>
> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins_at_gmail.com> wrote:
>> i did! what did i miss?
>>
>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>> Try reading the posting guide before posting.
>>>
>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn <dfolkins_at_gmail.com>
>>> wrote:
>>>> Anybody have any thoughts on this? Please? :)
>>>>
>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>> Hi everyone!
>>>>>
>>>>> I have a question about data processing efficiency.
>>>>>
>>>>> My data are as follows: I have a data set on quarterly institutional
>>>>> ownership of equities; some of them have had recent IPOs, some have not
>>>>> (I
>>>>> have a binary flag set). The total dataset size is 700k+ rows.
>>>>>
>>>>> My goal is this: For every quarter since issue for each IPO, I need to
>>>>> find a "matched" firm in the same industry, and close in market cap. So,
>>>>> e.g., for firm X, which had an IPO, i need to find a matched non-issuing
>>>>> firm in quarter 1 since IPO, then a (possibly different) non-issuing
>>>>> firm in
>>>>> quarter 2 since IPO, etc. Repeat for each issuing firm (there are about
>>>>> 8300
>>>>> of these).
>>>>>
>>>>> Thus it seems to me that I need to be doing a lot of data selection and
>>>>> subsetting, and looping (yikes!), but the result appears to be highly
>>>>> inefficient and takes ages (well, many hours). What I am doing, in
>>>>> pseudocode, is this:
>>>>>
>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>> eligible
>>>>> non-issuing firms.
>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>> industry, sort them by size, and finally grab a matching firm closest in
>>>>> size (the exact procedure is to grab the closest bigger firm if one
>>>>> exists,
>>>>> and just the biggest available if all are smaller)
>>>>> 3. assign the matched firm-observation the same "quarters since issue"
>>>>> as
>>>>> the IPO being matched
>>>>> 4. rbind them all into the "matching" dataset.
>>>>>
>>>>> The function I currently have is pasted below, for your reference. Is
>>>>> there any way to make it produce the same result but much faster?
>>>>> Specifically, I am guessing eliminating some loops would be very good,
>>>>> but I
>>>>> don't see how, since I need to do some fancy footwork for each IPO in
>>>>> each
>>>>> quarter to find the matching firm. I'll be doing a few things similar to
>>>>> this, so it's somewhat important to up the efficiency of this. Maybe
>>>>> some of
>>>>> you R-fu masters can clue me in? :)
>>>>>
>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>
>>>>> ========== my function below ===========
>>>>>
>>>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>>>> quarters_since_issue=40) {
>>>>>
>>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>>>>> cheaper, so typecast the result to matrix
>>>>>
>>>>> colnames = names(tfdata)
>>>>>
>>>>> quarterends = sort(unique(tfdata$DATE))
>>>>>
>>>>> for (aquarter in quarterends) {
>>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>
>>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>>>> (tfdata_quarter$IPO.Flag == 0), ]
>>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>> tfdata_quarter$IPO.Flag
>>>>> == 1, ]
>>>>>
>>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>> industrypeers = industrypeers[
>>>>> order(industrypeers$Market.Cap.13f), ]
>>>>> if ( nrow(industrypeers) > 0 ) {
>>>>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>>> bestpeer = industrypeers[industrypeers$Market.Cap.13f >=
>>>>> arow$Market.Cap.13f, ][1,]
>>>>> }
>>>>> else {
>>>>> bestpeer = industrypeers[nrow(industrypeers),]
>>>>> }
>>>>> bestpeer$Quarters.Since.IPO.Issue =
>>>>> arow$Quarters.Since.IPO.Issue
>>>>>
>>>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>>> bestpeer$PERMNO] = 1
>>>>> result = rbind(result, as.matrix(bestpeer))
>>>>> }
>>>>> }
>>>>> #result = rbind(result, tfdata_quarter)
>>>>> print (aquarter)
>>>>> }
>>>>>
>>>>> result = as.data.frame(result)
>>>>> names(result) = colnames
>>>>> return(result)
>>>>>
>>>>> }
>>>>>
>>>>> ========= end of my function =============
>>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>
------------------------------

Message: 57
Date: Fri, 06 Jun 2008 13:25:40 -0400
From: "John Fox" <jfox_at_mcmaster.ca>
Subject: Re: [R] lsmeans
To: Dani Valverde <daniel.valverde_at_uab.cat>
Cc: R Help <r-help_at_r-project.org>
Message-ID: <web-214018018@cgpsrv2.cis.mcmaster.ca>
Content-Type: text/plain; charset="ISO-8859-1"

Dear Dani,

I intend at some point to extend the effects package to linear and generalized linear mixed-effects models, probably using lmer() rather than lme(), but as you discovered, it doesn't handle these models now.

It wouldn't be hard, however, to do the computations yourself, using the coefficient vector for the fixed effects and a suitably constructed model matrix to compute the effects; you could also get standard errors by using the covariance matrix of the fixed effects. (A sketch of this computation follows the quoted message below.)

I hope this helps,
 John

On Fri, 06 Jun 2008 17:05:58 +0200
 Dani Valverde <daniel.valverde_at_uab.cat> wrote:
> Hello,
> I have the next function call:
>
> lme(fixed=Error ~ Temperature * Tumour ,random = ~1|ID,
> data=error_DB)
>
> which returns an lme object. I am interested on carrying out some
> kind of lsmeans on the data returned, but I cannot find any function
> to do this in R. I'have seen the effect() function, but it does not
> work with lme objects. Any idea?
>
> Best,
>
> Dani
>
> --
> Daniel Valverde Saub?
>
> Grup de Biologia Molecular de Llevats
> Facultat de Veterin?ria de la Universitat Aut?noma de Barcelona
> Edifici V, Campus UAB
> 08193 Cerdanyola del Vall?s- SPAIN
>
> Centro de Investigaci?n Biom?dica en Red
> en Bioingenier?a, Biomateriales y
> Nanomedicina (CIBER-BBN)
>
> Grup d'Aplicacions Biom?diques de la RMN
> Facultat de Bioci?ncies
> Universitat Aut?noma de Barcelona
> Edifici Cs, Campus UAB
> 08193 Cerdanyola del Vall?s- SPAIN
> +34 93 5814126
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

------------------------------

Message: 58
Date: Fri, 06 Jun 2008 13:27:32 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] reorder breaking by half
To: avilella <avilella_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <48497384.5000505@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

ci = rainbow(7)[c(4:7, 1:3)]

(A generic version of this reordering follows the quoted message below.)

on 06/06/2008 01:02 PM avilella said the following:
> Hi,
>
> I want to reorder the colors given by rainbow(7) so that the last half
> move to the first 4.
>
> For example:
>
>> ci=rainbow(7)
>> ci
> [1] "#FF0000FF" "#FFDB00FF" "#49FF00FF" "#00FF92FF" "#0092FFFF"
> "#4900FFFF"
> [7] "#FF00DBFF"
>
> I would like "#FF0000FF" "#FFDB00FF" "#49FF00FF" to be at the end of
> ci, and the rest to be at the beginning.
>
> How can I do that?
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
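The same reordering written generically (a small aside, not from the thread; k is the number of leading colors to rotate to the end):

n <- 7; k <- 3
ci <- rainbow(n)[c((k + 1):n, 1:k)]   # colors k+1..n first, then 1..k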
------------------------------

Message: 59
Date: Fri, 06 Jun 2008 13:29:56 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Patrick Burns <pburns_at_pburns.seanet.com>
Cc: r-help_at_r-project.org
Message-ID: <48497414.1060802@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Thanks for the tip! I'll try that and see how big of a difference it makes. If I am not sure what exactly the size will be, am I better off making it larger and then stripping off the blank rows, or making it smaller and appending the missing rows? (A sketch addressing this follows the quoted message below.)

on 06/06/2008 11:44 AM Patrick Burns said the following:
> One thing that is likely to speed the code significantly
> is if you create 'result' to be its final size and then
> subscript into it. Something like:
>
> result[i, ] <- bestpeer
>
> (though I'm not sure if 'i' is the proper index).
>
> Patrick Burns
> patrick_at_burns-stat.com
> +44 (0)20 8525 0696
> http://www.burns-stat.com
> (home of S Poetry and "A Guide for the Unwilling S User")
>
> Daniel Folkinshteyn wrote:
>> Anybody have any thoughts on this? Please? :)
>>
>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>> Hi everyone!
>>>
>>> I have a question about data processing efficiency.
>>>
>>> My data are as follows: I have a data set on quarterly institutional
>>> ownership of equities; some of them have had recent IPOs, some have
>>> not (I have a binary flag set). The total dataset size is 700k+ rows.
>>>
>>> My goal is this: For every quarter since issue for each IPO, I need
>>> to find a "matched" firm in the same industry, and close in market
>>> cap. So, e.g., for firm X, which had an IPO, i need to find a matched
>>> non-issuing firm in quarter 1 since IPO, then a (possibly different)
>>> non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing
>>> firm (there are about 8300 of these).
>>>
>>> Thus it seems to me that I need to be doing a lot of data selection
>>> and subsetting, and looping (yikes!), but the result appears to be
>>> highly inefficient and takes ages (well, many hours). What I am
>>> doing, in pseudocode, is this:
>>>
>>> 1. for each quarter of data, getting out all the IPOs and all the
>>> eligible non-issuing firms.
>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>> industry, sort them by size, and finally grab a matching firm closest
>>> in size (the exact procedure is to grab the closest bigger firm if
>>> one exists, and just the biggest available if all are smaller)
>>> 3. assign the matched firm-observation the same "quarters since
>>> issue" as the IPO being matched
>>> 4. rbind them all into the "matching" dataset.
>>>
>>> The function I currently have is pasted below, for your reference. Is
>>> there any way to make it produce the same result but much faster?
>>> Specifically, I am guessing eliminating some loops would be very
>>> good, but I don't see how, since I need to do some fancy footwork for
>>> each IPO in each quarter to find the matching firm. I'll be doing a
>>> few things similar to this, so it's somewhat important to up the
>>> efficiency of this. Maybe some of you R-fu masters can clue me in? :)
>>>
>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>
>>> ========== my function below ===========
>>>
>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>> quarters_since_issue=40) {
>>>
>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>>> cheaper, so typecast the result to matrix
>>>
>>> colnames = names(tfdata)
>>>
>>> quarterends = sort(unique(tfdata$DATE))
>>>
>>> for (aquarter in quarterends) {
>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>
>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>> (tfdata_quarter$IPO.Flag == 0), ]
>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>> tfdata_quarter$IPO.Flag == 1, ]
>>>
>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>> arow = tfdata_quarter_ipoissuers[i,]
>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>> industrypeers = industrypeers[
>>> order(industrypeers$Market.Cap.13f), ]
>>> if ( nrow(industrypeers) > 0 ) {
>>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f
>>> >= arow$Market.Cap.13f, ]) > 0 ) {
>>> bestpeer =
>>> industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
>>> }
>>> else {
>>> bestpeer = industrypeers[nrow(industrypeers),]
>>> }
>>> bestpeer$Quarters.Since.IPO.Issue =
>>> arow$Quarters.Since.IPO.Issue
>>>
>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>> bestpeer$PERMNO] = 1
>>> result = rbind(result, as.matrix(bestpeer))
>>> }
>>> }
>>> #result = rbind(result, tfdata_quarter)
>>> print (aquarter)
>>> }
>>>
>>> result = as.data.frame(result)
>>> names(result) = colnames
>>> return(result)
>>>
>>> }
>>>
>>> ========= end of my function =============
>>>
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
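For reference, a sketch of the preallocation Patrick suggests, grafted onto the posted function (names follow that function; untested). Over-allocating to a known upper bound and trimming afterwards also answers the "make it larger, then strip" question above:

result <- matrix(NA, nrow = nrow(tfdata), ncol = ncol(tfdata))   # safe upper bound
n <- 0
## inside the loops, instead of result = rbind(result, as.matrix(bestpeer)):
##     n <- n + 1
##     result[n, ] <- as.matrix(bestpeer)
## after the loops, drop the unused rows:
result <- result[seq_len(n), , drop = FALSE]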
------------------------------

Message: 60
Date: Fri, 6 Jun 2008 13:35:48 -0400
From: "Gabor Grothendieck" <ggrothendieck_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <971536df0806061035vd6a2941v18303c1ce78bed2d@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

I think the posting guide may not be clear enough and have suggested that it be clarified. Hopefully this better communicates what is required, and why, in a shorter amount of space:

https://stat.ethz.ch/pipermail/r-devel/2008-June/049891.html

On Fri, Jun 6, 2008 at 1:25 PM, Daniel Folkinshteyn <dfolkins_at_gmail.com> wrote:
> i thought since the function code (which i provided in full) was pretty
> short, it would be reasonably easy to just read the code and see what it's
> doing.
>
> but ok, so... i am attaching a zip file, with a small sample of the data set
> (tab delimited), and the function code, in a zip file (posting guidelines
> claim that "some archive formats" are allowed, i assume zip is one of
> them...
>
> would appreciate your comments! :)
>
> on 06/06/2008 12:05 PM Gabor Grothendieck said the following:
>>
>> Its summarized in the last line to r-help. Note reproducible and
>> minimal.
>>
>> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins_at_gmail.com>
>> wrote:
>>>
>>> i did! what did i miss?
>>>
>>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>>>
>>>> Try reading the posting guide before posting.
>>>>
>>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn
>>>> <dfolkins_at_gmail.com>
>>>> wrote:
>>>>>
>>>>> Anybody have any thoughts on this? Please? :)
>>>>>
>>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>>>
>>>>>> Hi everyone!
>>>>>>
>>>>>> I have a question about data processing efficiency.
>>>>>>
>>>>>> My data are as follows: I have a data set on quarterly institutional
>>>>>> ownership of equities; some of them have had recent IPOs, some have
>>>>>> not
>>>>>> (I
>>>>>> have a binary flag set). The total dataset size is 700k+ rows.
>>>>>>
>>>>>> My goal is this: For every quarter since issue for each IPO, I need to
>>>>>> find a "matched" firm in the same industry, and close in market cap.
>>>>>> So,
>>>>>> e.g., for firm X, which had an IPO, i need to find a matched
>>>>>> non-issuing
>>>>>> firm in quarter 1 since IPO, then a (possibly different) non-issuing
>>>>>> firm in
>>>>>> quarter 2 since IPO, etc. Repeat for each issuing firm (there are
>>>>>> about
>>>>>> 8300
>>>>>> of these).
>>>>>>
>>>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>>>> and
>>>>>> subsetting, and looping (yikes!), but the result appears to be highly
>>>>>> inefficient and takes ages (well, many hours). What I am doing, in
>>>>>> pseudocode, is this:
>>>>>>
>>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>>> eligible
>>>>>> non-issuing firms.
>>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>>> industry, sort them by size, and finally grab a matching firm closest
>>>>>> in
>>>>>> size (the exact procedure is to grab the closest bigger firm if one
>>>>>> exists,
>>>>>> and just the biggest available if all are smaller)
>>>>>> 3. assign the matched firm-observation the same "quarters since issue"
>>>>>> as
>>>>>> the IPO being matched
>>>>>> 4. rbind them all into the "matching" dataset.
>>>>>>
>>>>>> The function I currently have is pasted below, for your reference. Is
>>>>>> there any way to make it produce the same result but much faster?
>>>>>> Specifically, I am guessing eliminating some loops would be very good,
>>>>>> but I
>>>>>> don't see how, since I need to do some fancy footwork for each IPO in
>>>>>> each
>>>>>> quarter to find the matching firm. I'll be doing a few things similar
>>>>>> to
>>>>>> this, so it's somewhat important to up the efficiency of this. Maybe
>>>>>> some of
>>>>>> you R-fu masters can clue me in? :)
>>>>>>
>>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>>
>>>>>> ========== my function below ===========
>>>>>>
>>>>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>>>>> quarters_since_issue=40) {
>>>>>>
>>>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>>>>>> cheaper, so typecast the result to matrix
>>>>>>
>>>>>> colnames = names(tfdata)
>>>>>>
>>>>>> quarterends = sort(unique(tfdata$DATE))
>>>>>>
>>>>>> for (aquarter in quarterends) {
>>>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>>
>>>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>>>>> (tfdata_quarter$IPO.Flag == 0), ]
>>>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>>> tfdata_quarter$IPO.Flag
>>>>>> == 1, ]
>>>>>>
>>>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>>> industrypeers = industrypeers[
>>>>>> order(industrypeers$Market.Cap.13f), ]
>>>>>> if ( nrow(industrypeers) > 0 ) {
>>>>>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>>>> bestpeer = industrypeers[industrypeers$Market.Cap.13f >=
>>>>>> arow$Market.Cap.13f, ][1,]
>>>>>>
>>>>>> }
>>>>>> else {
>>>>>> bestpeer = industrypeers[nrow(industrypeers),]
>>>>>> }
>>>>>> bestpeer$Quarters.Since.IPO.Issue =
>>>>>> arow$Quarters.Since.IPO.Issue
>>>>>>
>>>>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>>>> bestpeer$PERMNO] = 1
>>>>>> result = rbind(result, as.matrix(bestpeer))
>>>>>> }
>>>>>> }
>>>>>> #result = rbind(result, tfdata_quarter)
>>>>>> print (aquarter)
>>>>>> }
>>>>>>
>>>>>> result = as.data.frame(result)
>>>>>> names(result) = colnames
>>>>>> return(result)
>>>>>>
>>>>>> }
>>>>>>
>>>>>> ========= end of my function =============
>>>>>>
>>>>> ______________________________________________
>>>>> R-help_at_r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>
>
------------------------------

Message: 61
Date: Fri, 06 Jun 2008 13:36:42 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <484975AA.8000800@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Just in case, I uploaded it to the server; you can get the zip file I mentioned here:

http://astro.temple.edu/~dfolkins/helplistfiles.zip

on 06/06/2008 01:25 PM Daniel Folkinshteyn said the following:
> i thought since the function code (which i provided in full) was pretty
> short, it would be reasonably easy to just read the code and see what
> it's doing.
>
> but ok, so... i am attaching a zip file, with a small sample of the data
> set (tab delimited), and the function code, in a zip file (posting
> guidelines claim that "some archive formats" are allowed, i assume zip
> is one of them...
>
> would appreciate your comments! :)
>
> on 06/06/2008 12:05 PM Gabor Grothendieck said the following:
>> Its summarized in the last line to r-help. Note reproducible and
>> minimal.
>>
>> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn
>> <dfolkins_at_gmail.com> wrote:
>>> i did! what did i miss?
>>>
>>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>>> Try reading the posting guide before posting.
>>>>
>>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn
>>>> <dfolkins_at_gmail.com>
>>>> wrote:
>>>>> Anybody have any thoughts on this? Please? :)
>>>>>
>>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>>> Hi everyone!
>>>>>>
>>>>>> I have a question about data processing efficiency.
>>>>>>
>>>>>> My data are as follows: I have a data set on quarterly institutional
>>>>>> ownership of equities; some of them have had recent IPOs, some
>>>>>> have not
>>>>>> (I
>>>>>> have a binary flag set). The total dataset size is 700k+ rows.
>>>>>>
>>>>>> My goal is this: For every quarter since issue for each IPO, I
>>>>>> need to
>>>>>> find a "matched" firm in the same industry, and close in market
>>>>>> cap. So,
>>>>>> e.g., for firm X, which had an IPO, i need to find a matched
>>>>>> non-issuing
>>>>>> firm in quarter 1 since IPO, then a (possibly different) non-issuing
>>>>>> firm in
>>>>>> quarter 2 since IPO, etc. Repeat for each issuing firm (there are
>>>>>> about
>>>>>> 8300
>>>>>> of these).
>>>>>>
>>>>>> Thus it seems to me that I need to be doing a lot of data
>>>>>> selection and
>>>>>> subsetting, and looping (yikes!), but the result appears to be highly
>>>>>> inefficient and takes ages (well, many hours). What I am doing, in
>>>>>> pseudocode, is this:
>>>>>>
>>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>>> eligible
>>>>>> non-issuing firms.
>>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>>> industry, sort them by size, and finally grab a matching firm
>>>>>> closest in
>>>>>> size (the exact procedure is to grab the closest bigger firm if one
>>>>>> exists,
>>>>>> and just the biggest available if all are smaller)
>>>>>> 3. assign the matched firm-observation the same "quarters since
>>>>>> issue"
>>>>>> as
>>>>>> the IPO being matched
>>>>>> 4. rbind them all into the "matching" dataset.
>>>>>>
>>>>>> The function I currently have is pasted below, for your reference. Is
>>>>>> there any way to make it produce the same result but much faster?
>>>>>> Specifically, I am guessing eliminating some loops would be very
>>>>>> good,
>>>>>> but I
>>>>>> don't see how, since I need to do some fancy footwork for each IPO in
>>>>>> each
>>>>>> quarter to find the matching firm. I'll be doing a few things
>>>>>> similar to
>>>>>> this, so it's somewhat important to up the efficiency of this. Maybe
>>>>>> some of
>>>>>> you R-fu masters can clue me in? :)
>>>>>>
>>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>>
>>>>>> ========== my function below ===========
>>>>>>
>>>>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>>>>> quarters_since_issue=40) {
>>>>>>
>>>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>>>>>> cheaper, so typecast the result to matrix
>>>>>>
>>>>>> colnames = names(tfdata)
>>>>>>
>>>>>> quarterends = sort(unique(tfdata$DATE))
>>>>>>
>>>>>> for (aquarter in quarterends) {
>>>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>>
>>>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>>>>> (tfdata_quarter$IPO.Flag == 0), ]
>>>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>>> tfdata_quarter$IPO.Flag
>>>>>> == 1, ]
>>>>>>
>>>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>>> industrypeers = industrypeers[
>>>>>> order(industrypeers$Market.Cap.13f), ]
>>>>>> if ( nrow(industrypeers) > 0 ) {
>>>>>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>>>> bestpeer = industrypeers[industrypeers$Market.Cap.13f >=
>>>>>> arow$Market.Cap.13f, ][1,]
>>>>>> }
>>>>>> else {
>>>>>> bestpeer = industrypeers[nrow(industrypeers),]
>>>>>> }
>>>>>> bestpeer$Quarters.Since.IPO.Issue =
>>>>>> arow$Quarters.Since.IPO.Issue
>>>>>>
>>>>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>>>> bestpeer$PERMNO] = 1
>>>>>> result = rbind(result, as.matrix(bestpeer))
>>>>>> }
>>>>>> }
>>>>>> #result = rbind(result, tfdata_quarter)
>>>>>> print (aquarter)
>>>>>> }
>>>>>>
>>>>>> result = as.data.frame(result)
>>>>>> names(result) = colnames
>>>>>> return(result)
>>>>>>
>>>>>> }
>>>>>>
>>>>>> ========= end of my function =============
>>>>>>
>>>>> ______________________________________________
>>>>> R-help_at_r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>
------------------------------

Message: 62
Date: Fri, 6 Jun 2008 11:39:27 -0600
From: "Greg Snow" <Greg.Snow_at_imail.org>
Subject: Re: [R] How to force two regression coefficients to be equal but opposite in sign?
To: "Woolner, Keith" <kwoolner_at_indians.com>, "r-help_at_r-project.org" <r-help_at_r-project.org>
Message-ID: <B37C0A15B8FB3C468B5BC7EBC7DA14CC60F685895B@LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii

One simple way is to do something like:
> fit <- lm(y ~ I(x1-x2) + x3, data=mydata)
The first coefficient (after the intercept) will be the slope for x1; the slope for x2 will be the negative of that. This model is nested in the fuller model with x1 and x2 fit separately, so you can test for differences. (A sketch applying this idea to the factor setup in the quoted message follows it below.)

Hope this helps,

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow_at_imail.org
(801) 408-8111
> -----Original Message-----
> From: r-help-bounces_at_r-project.org
> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Woolner, Keith
> Sent: Friday, June 06, 2008 10:07 AM
> To: r-help_at_r-project.org
> Subject: [R] How to force two regression coefficients to be
> equal but opposite in sign?
>
> Is there a way to set up a regression in R that forces two
> coefficients
>
> to be equal but opposite in sign?
>
>
>
> I'm trying to setup a model where a subject appears in a pair of
>
> environments where a measurement X is made. There are a total of 5
>
> environments, one of which is a baseline. But each observation is for
>
> a subject in only two of them, and not all subjects will appear in
>
> each environment.
>
>
>

> Each of the environments has an effect on the variable X. I want to
>
> measure the relative effects of each environment E on X with a model.
>
>
>
> Xj = Xi * Ei / Ej
>
>
>
> Ei of the baseline model is set equal to 1.
>
>
>
> With a log transform, a linear-looking regression can be written as:
>
>
>
> log(Xj) = log(Xi) + log(Ei) - log(Ej)
>
>
>
> My data looks like:
>
>
>
> # E1 X1 E2 X2
>
> 1 A .20 B .25
>
>
>
> What I've tried in R:
>
>
>
> env <- c("A","B","C","D","E")
>
>
>
> # Note: data is made up just for this example
>
>
>
> df <- data.frame(
>   X1 = c(.20,.10,.40,.05,.10,.24,.30,.70,.48,.22,.87,.29,.24,.19,.92),
>   X2 = c(.25,.12,.45,.01,.19,.50,.30,.40,.50,.40,.68,.30,.16,.02,.70),
>   E1 = c("A","A","A","B","B","B","C","C","C","D","D","D","E","E","E"),
>   E2 = c("B","C","D","A","D","E","A","B","E","B","C","E","A","B","C")
> )
>
>
>
> model <- lm(log(X2) ~ log(X1) + E1 + E2, data = df)
>
>
>
> summary(model)
>
>
>
> Call:
>
> lm(formula = log(X2) ~ log(X1) + E1 + E2, data = df)
>
>
>
> Residuals:
>

>        1        2        3        4        5        6        7        8
>   0.3240   0.2621  -0.5861  -1.0283   0.5861   0.4422   0.3831  -0.2608
>        9       10       11       12       13       14       15
>  -0.1222   0.9002  -0.5802  -0.3200   0.6452  -0.9634   0.3182
>
>
>
> Coefficients:
>
> Estimate Std. Error t value Pr(>|t|)
>
> (Intercept) 0.54563 1.71558 0.318 0.763
>
> log(X1) 1.29745 0.57295 2.265 0.073 .
>
> E1B -0.23571 0.95738 -0.246 0.815
>
> E1C -0.57057 1.20490 -0.474 0.656
>
> E1D -0.22988 0.98274 -0.234 0.824
>
> E1E -1.17181 1.02918 -1.139 0.306
>
> E2B -0.16775 0.87803 -0.191 0.856
>
> E2C 0.05952 1.12779 0.053 0.960
>
> E2D 0.43077 1.19485 0.361 0.733
>
> E2E 0.40633 0.98289 0.413 0.696
>
> ---
>
> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
>
>
> Residual standard error: 1.004 on 5 degrees of freedom
>
> Multiple R-squared: 0.7622, Adjusted R-squared: 0.3343
>
> F-statistic: 1.781 on 9 and 5 DF, p-value: 0.2721
>
>
>
> ----
>
>
>
> What I need to do is force the corresponding environment coefficients
>
> to be equal in absolute value, but opposite in sign. That is:
>
>
>
> E1B = -E2B
>
> E1C = -E2C
>
> E1D = -E2D
>
> E1E = -E2E

>
>
>
> In essence, E1 and E2 are the "same" variable, but can play two
>
> different roles in the model depending on whether it's the first part
>
> of the observation or the second part.
>
>
>
> I searched the archive, and the closest thing I found to my situation
>
> was:
>
>
>
> http://tolstoy.newcastle.edu.au/R/e4/help/08/03/6773.html
>
>
>
> But the response to that thread didn't seem to be applicable to my
>
> situation.
>
>
>
> Any pointers would be appreciated.
>
>
>
> Thanks,
>
> Keith
>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
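A sketch of applying Greg's I(x1 - x2) idea to the factor case in the quoted message: build the signed difference of the two environment indicator matrices by hand, so each non-baseline environment gets a single coefficient that enters with +1 when it appears as E1 and -1 when it appears as E2 (df and env as defined above; untested):

M1 <- outer(as.character(df$E1), env, "==") + 0   # 0/1 indicators for E1
M2 <- outer(as.character(df$E2), env, "==") + 0   # 0/1 indicators for E2
D  <- (M1 - M2)[, -1]                             # drop baseline A; +1/-1 coding
colnames(D) <- env[-1]
model2 <- lm(log(X2) ~ log(X1) + D, data = df)    # one coefficient per environment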
------------------------------

Message: 63
Date: Fri, 6 Jun 2008 13:43:27 -0400
From: "jim holtman" <jholtman_at_gmail.com>
Subject: Re: [R] Subsetting to unique values
To: "Emslie, Paul [Ctr]" <emsliep_at_atac.mil>
Cc: r-help_at_r-project.org
Message-ID: <644e1f320806061043t5ab2fd01h586099c91e165ad4@mail.gmail.com>
Content-Type: text/plain

The interesting thing about R is that there are several ways to "skin the cat"; here is yet another solution:
> do.call(rbind, by(ddTable, ddTable$Id, function(z) z[1,,drop=FALSE]))
  Id name
1  1 Paul
2  2 Bob
>
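And one more, for the record (an aside, not from the thread): duplicated() marks repeated Ids, so negating it keeps the first row of each:

ddTable[!duplicated(ddTable$Id), ]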
On Fri, Jun 6, 2008 at 9:35 AM, Emslie, Paul [Ctr] <emsliep_at_atac.mil> wrote:
> I want to take the first row of each unique ID value from a data frame.
> For instance
> > ddTable <-
> data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
> > unique(ddTable)
> Will give me all 4 rows, and
> > unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name column.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html<http://www.r-project.org/posting-guide.html>
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?

	[[alternative HTML version deleted]]

------------------------------

Message: 64
Date: Fri, 6 Jun 2008 18:51:34 +0100 (BST)
From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Subject: Re: [R] where to download BRugs?
To: Nanye Long <nanye.long_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <alpine.LFD.1.10.0806061847500.10799@gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Fri, 6 Jun 2008, Nanye Long wrote:
> Hi all,
>
> Does anyone know where to download the "BRugs" package? I did not find
> it on r-project website. Thanks.
It is Windows-only, and you download it from 'CRAN (extras)', which is part of the default repository set on Windows versions of R. So

    install.packages("BRugs")

is all that is needed, unless you changed something to stop it working. (It is only available for R >= 2.6.0.)

--
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

------------------------------

Message: 65
Date: Fri, 6 Jun 2008 10:54:06 -0700
From: "Charles C. Berry" <cberry_at_tajo.ucsd.edu>
Subject: Re: [R] Problem with subset
To: Luca Mortarini <l.mortarini_at_isac.cnr.it>
Cc: r-help_at_r-project.org
Message-ID: <Pine.LNX.4.64.0806061048150.28293@tajo.ucsd.edu>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

On Fri, 6 Jun 2008, Luca Mortarini wrote:
> Hi,
> I am new to R and I am looking for a way to extract a subset from a
> vector.
> I have a vector of numbers oscillating around zero (a decreasing
> autocorrelation function) and I would like to extract only the first
> positive part of the function (from zero lag to the lag where the function
> inverts its sign for the first time).
> I have tried
>
> subset(myvector, myvector > 0)
>
> but this obviously extracts all the positive intervals, not only the first one.
> Is there a logical statement I can use in subset? I prefer not to use an
For vector subsets you probably want "[", because from help("["):

     For ordinary vectors, the result is simply
     x[subset & !is.na(subset)].

But see ?rle. Something like

     myvector[ 1 : rle( myvector >= 0 )$lengths[ 1 ] ]

should work.

HTH,

Chuck
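A quick check of the rle() idiom on a toy vector (made-up numbers mimicking a decaying autocorrelation function):

myvector <- c(0.9, 0.5, 0.2, -0.1, 0.3, -0.2)
myvector[ 1 : rle(myvector >= 0)$lengths[1] ]   # 0.9 0.5 0.2 -- the first positive run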
> if statement that would probably slow down the code.
> Thanks a lot,
> Luca
>
>
> *********************************************************
> dr. Luca Mortarini l.mortarini_at_isac.cnr.it
> Università del Piemonte Orientale
> Dipartimento di Scienze e Tecnologie Avanzate
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu              UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

------------------------------

Message: 66
Date: Fri, 06 Jun 2008 13:57:39 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <48497A93.6050602@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Ok, sorry about the zip, then. :) Thanks for taking the trouble to clue [[elided Yahoo spam]]

Well, here's a dput-ed version of the small data subset you can use for testing. Below that, an updated version of the function, with extra explanatory comments, and producing an extra column showing exactly what is matched to what. To test, just run the function with the dataset as sole argument.

Thanks again; I'd appreciate any input on this.

=========== begin dataset dput representation ============

structure(list(PERMNO = c(10001L, 10001L, 10298L, 10298L, 10484L, 10484L, 10515L, 10515L, 10634L, 10634L, 11048L, 11048L, 11237L, 11294L, 11294L, 11338L, 11338L, 11404L, 11404L, 11587L, 11587L, 11591L, 11591L, 11737L, 11737L, 11791L, 11809L, 11809L, 11858L, 11858L, 11955L, 11955L, 12003L, 12003L, 12016L, 12016L, 12223L, 12223L, 12758L, 12758L, 13688L, 13688L, 16117L, 16117L, 17770L, 17770L, 21514L, 21514L, 21792L, 21792L, 21821L, 21821L, 22437L, 22437L, 22947L, 22947L, 23027L, 23027L, 23182L, 23182L, 23536L, 23536L, 23712L, 23712L, 24053L, 24053L, 24117L, 24117L, 24256L, 24256L, 24299L, 24299L, 24352L, 24352L, 24379L, 24379L, 24467L, 24467L, 24679L, 24679L, 24870L, 24870L, 25056L, 25056L, 25208L, 25208L, 25232L, 25232L, 25241L, 25590L, 25590L, 26463L, 26463L, 26470L, 26470L, 26614L, 26614L, 27385L, 27385L, 29196L, 29196L, 30411L, 30411L, 32943L, 32943L, 38893L, 38893L, 40708L, 40708L, 41005L, 41005L, 42817L, 42817L, 42833L, 42833L, 43668L, 43668L, 45947L, 45947L, 46017L, 46017L, 48274L, 48274L, 49971L, 49971L, 53786L, 53786L, 53859L, 53859L, 54199L, 54199L, 56371L, 56952L, 56952L, 57277L, 57277L, 57381L, 57381L, 58202L, 58202L, 59395L, 59395L, 59935L, 60169L, 60169L, 61188L, 61188L, 61444L, 61444L, 62690L, 62690L, 62842L, 62842L, 64290L, 64290L, 64418L, 64418L, 64450L, 64450L, 64477L, 64477L, 64557L, 64557L, 64646L, 64646L, 64902L, 64902L, 67774L, 67774L, 68910L, 68910L, 70471L, 70471L, 74406L, 74406L, 75091L, 75091L, 75304L, 75304L, 75743L, 75964L, 75964L, 76026L, 76026L, 76162L, 76170L, 76173L, 78530L, 78530L, 78682L, 78682L, 81569L, 81569L, 82502L, 82502L, 83337L, 83337L, 83919L, 83919L, 88242L, 88242L, 90852L, 90852L, 91353L, 91353L ), DATE = c(19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 
19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900630, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900331, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630, 19900630, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900630, 19900331, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630, 19900331, 19900630), Shares.Owned = c(50100, 50100, 250000, 293500, 3656629, 3827119, 4132439, 3566591, 2631193, 2500301, 775879, 816879, 38700, 1041600, 1070300, 533768, 558815, 61384492, 60466567, 194595, 196979, 359946, 314446, 106770, 107070, 20242, 1935098, 2099403, 1902125, 1766750, 41991, 41991, 34490, 36290, 589400, 596700, 1549395, 1759440, 854473, 762903, 156366785, 98780287, 2486389, 2635718, 122264, 122292, 25455916, 25458658, 71645490, 71855722, 30969596, 30409838, 2738576, 2814490, 20846605, 20930233, 1148299, 505415, 396388, 385714, 25239923, 24117950, 73465526, 73084616, 8096614, 7595742, 3937930, 3820215, 20884821, 19456342, 2127331, 2188276, 2334515, 2813347, 8267264, 8544084, 783277, 810742, 742048, 512956, 9659658, 9436873, 40107717, 41234384, 9111755, 9708782, 12815719, 13144148, 1146100, 8292392, 8271030, 282650, 281273, 4196126, 4273758, 2489363, 2734182, 1579681, 1369192, 51947585, 51941430, 54673, 52585, 317601, 314876, 62626258, 63341772, 8977553, 8940106, 4478872, 4315631, 1246339, 1227442, 68484747, 68041081, 22679902, 21775270, 927147, 936881, 2626449, 2245552, 14029366, 14304855, 2434123, 2184358, 77479654, 81754241, 333070, 282967, 241146, 256146, 11419, 819092, 798490, 1403179, 1326018, 238974451, 237684105, 1889699, 2317096, 4887641, 5972387, 3567239, 1024595, 993627, 5254732, 5459404, 413146, 432697, 5307595, 4813261, 7717872, 8689444, 2431341, 2372096, 909359, 868068, 2110670, 2055349, 23774859, 23573345, 4234466, 4143534, 1192314, 1255105, 3052000, 2605700, 5566270, 5972761, 1470173, 1448403, 28065345, 32961737, 1844441, 2247991, 651758, 655658, 65864806, 82392617, 1942906, 14800, 14657, 6600, 5534, 394064, 163000, 2499320, 1123624, 1227987, 198000, 241000, 3681688, 3409586, 2416988, 2407798, 55081, 48091, 480000, 785710, 1040147, 1171854, 1363994, 1555229, 199237, 192637), Shares.Outstanding.13f = c(1, 1, 7, 7, 8, 8, 8, 8, 6, 6, 8, 8, 4, 4, 4, 18, 19, 228, 228, 2, 2, 3, 3, 5, 5, 7, 9, 9, 6, 6, 2, 2, 3, 3, 7, 7, 14, 15, 3, 3, 429, 429, 17, 16, 2, 2, 43, 41, 127, 126, 86, 86, 15, 15, 51, 51, 7, 7, 3, 3, 67, 67, 211, 211, 35, 35, 14, 14, 49, 49, 12, 12, 22, 22, 31, 31, 4, 4, 4, 5, 34, 34, 64, 64, 56, 56, 27, 27, 47, 28, 28, 2, 2, 10, 10, 8, 8, 13, 13, 87, 87, 1, 1, 3, 3, 101, 101, 38, 36, 49, 56, 22, 22, 245, 247, 36, 35, 6, 6, 22, 22, 30, 30, 11, 11, 151, 151, 2, 2, 3, 
3, 4, 4, 4, 10, 10, 468, 459, 10, 10, 16, 16, 27, 8, 8, 19, 19, 3, 3, 7, 7, 15, 15, 6, 6, 6, 6, 13, 13, 60, 60, 11, 11, 10, 10, 8, 8, 153, 152, 7, 7, 206, 206, 5, 5, 4, 4, 246, 299, 4, 0, 0, 13, 13, 7, 5, 10, 7, 7, 11, 11, 16, 16, 6, 6, 1, 1, 7, 7, 10, 10, 5, 5, 10, 10), Percent.Inst.Owned = c(0.0501, 0.0501, 0.0357142857142857, 0.0419285714285714, 0.457078625, 0.478389875, 0.516554875, 0.445823875, 0.438532166666667, 0.416716833333333, 0.096984875, 0.102109875, 0.009675, 0.2604, 0.267575, 0.0296537777777778, 0.0294113157894737, 0.269230228070175, 0.26520424122807, 0.0972975, 0.0984895, 0.119982, 0.104815333333333, 0.021354, 0.021414, 0.00289171428571429, 0.215010888888889, 0.233267, 0.317020833333333, 0.294458333333333, 0.0209955, 0.0209955, 0.0114966666666667, 0.0120966666666667, 0.0842, 0.0852428571428571, 0.110671071428571, 0.117296, 0.284824333333333, 0.254301, 0.36449134032634, 0.230257079254079, 0.146258176470588, 0.164732375, 0.061132, 0.061146, 0.591998046511628, 0.62094287804878, 0.564137716535433, 0.570283507936508, 0.360111581395349, 0.353602767441860, 0.182571733333333, 0.187632666666667, 0.408756960784314, 0.410396725490196, 0.164042714285714, 0.0722021428571429, 0.132129333333333, 0.128571333333333, 0.376715268656716, 0.359969402985075, 0.348177848341232, 0.346372587677725, 0.231331828571429, 0.2170212, 0.281280714285714, 0.2728725, 0.426220836734694, 0.397068204081633, 0.177277583333333, 0.182356333333333, 0.106114318181818, 0.127879409090909, 0.266685935483871, 0.275615612903226, 0.19581925, 0.2026855, 0.185512, 0.1025912, 0.284107588235294, 0.277555088235294, 0.626683078125, 0.64428725, 0.162709910714286, 0.173371107142857, 0.474656259259259, 0.486820296296296, 0.0243851063829787, 0.296156857142857, 0.295393928571429, 0.141325, 0.1406365, 0.4196126, 0.4273758, 0.311170375, 0.34177275, 0.121513923076923, 0.105322461538462, 0.59709867816092, 0.597027931034483, 0.054673, 0.052585, 0.105867, 0.104958666666667, 0.62006196039604, 0.627146257425743, 0.236251394736842, 0.248336277777778, 0.0914055510204082, 0.0770648392857143, 0.0566517727272727, 0.0557928181818182, 0.279529579591837, 0.275469963562753, 0.629997277777778, 0.622150571428571, 0.1545245, 0.156146833333333, 0.119384045454545, 0.102070545454545, 0.467645533333333, 0.4768285, 0.221283909090909, 0.198578, 0.513110291390729, 0.541418814569536, 0.166535, 0.1414835, 0.080382, 0.085382, 0.00285475, 0.204773, 0.1996225, 0.1403179, 0.1326018, 0.510629168803419, 0.517830294117647, 0.1889699, 0.2317096, 0.3054775625, 0.3732741875, 0.132119962962963, 0.128074375, 0.124203375, 0.276564842105263, 0.287337052631579, 0.137715333333333, 0.144232333333333, 0.758227857142857, 0.687608714285714, 0.5145248, 0.579296266666667, 0.4052235, 0.395349333333333, 0.151559833333333, 0.144678, 0.162359230769231, 0.158103769230769, 0.39624765, 0.392889083333333, 0.384951454545455, 0.376684909090909, 0.1192314, 0.1255105, 0.3815, 0.3257125, 0.0363808496732026, 0.0392944802631579, 0.210024714285714, 0.206914714285714, 0.136239538834951, 0.160008432038835, 0.3688882, 0.4495982, 0.1629395, 0.1639145, 0.267743113821138, 0.275560591973244, 0.4857265, Inf, Inf, 0.000507692307692308, 0.000425692307692308, 0.0562948571428571, 0.0326, 0.249932, 0.160517714285714, 0.175426714285714, 0.018, 0.0219090909090909, 0.2301055, 0.213099125, 0.402831333333333, 0.401299666666667, 0.055081, 0.048091, 0.0685714285714286, 0.112244285714286, 0.1040147, 0.1171854, 0.2727988, 0.3110458, 0.0199237, 0.0192637), Latest.Issue.Date.ByPERMNO = c(19860108, 19860108, 19600101, 
19600101, 19600101, 19600101, 19870728, 19870728, 19870501, 19870501, 19870805, 19870805, 19600101, 19600101, 19600101, 19600101, 19600101, 19730523, 19730523, 19600101, 19600101, 19870811, 19870811, 19870930, 19870930, 19600101, 19880729, 19880729, 19880225, 19880225, 19880602, 19880602, 19860610, 19860610, 19880802, 19880802, 19890629, 19890629, 19600101, 19600101, 19821109, 19821109, 19860619, 19860619, 19871117, 19871117, 19600101, 19600101, 19890308, 19890308, 19900208, 19900208, 19861120, 19861120, 19880803, 19880803, 19600101, 19600101, 19890216, 19890216, 19761202, 19761202, 19890919, 19890919, 19810623, 19810623, 19770615, 19770615, 19831004, 19831004, 19830616, 19830616, 19810519, 19810519, 19850311, 19850311, 19781130, 19781130, 19841016, 19900515, 19800904, 19800904, 19830825, 19830825, 19830601, 19830601, 19811110, 19811110, 19600101, 19890309, 19890309, 19850529, 19850529, 19881122, 19881122, 19840620, 19840620, 19740305, 19740305, 19860718, 19860718, 19600101, 19600101, 19860207, 19860207, 19891003, 19891003, 19870403, 19870403, 19600101, 19600101, 19790403, 19790403, 19850528, 19850528, 19830322, 19830322, 19761202, 19761202, 19841114, 19841114, 19800826, 19800826, 19880517, 19880517, 19860516, 19860516, 19891122, 19891122, 19600101, 19600101, 19600101, 19871119, 19871119, 19760624, 19760624, 19851206, 19851206, 19890615, 19890615, 19860805, 19860805, 19600101, 19890919, 19890919, 19860501, 19860501, 19600101, 19600101, 19890308, 19890308, 19900125, 19900125, 19890714, 19890714, 19880412, 19880412, 19890809, 19890809, 19870306, 19870306, 19751112, 19751112, 19870604, 19870604, 19810625, 19810625, 19600101, 19600101, 19860416, 19860416, 19891027, 19891027, 19890125, 19890125, 19860502, 19860502, 19600101, 19600101, 19900405, 19600101, 19600101, 19600101, 19600101, 19900412, 19900514, 19900518, 19890518, 19890518, 19600101, 19600101, 19900117, 19900117, 19891214, 19891214, 19600101, 19600101, 19600101, 19600101, 19851206, 19851206, 19851211, 19851211, 19600101, 19600101), Quarters.Since.19800101 = c(41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 41L, 42L, 42L, 41L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 41L, 42L, 42L, 41L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 41L, 41L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 42L, 41L, 42L, 41L, 42L, 42L, 42L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 42L, 41L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L, 41L, 42L), Quarters.Since.Latest.Issue = c(17L, 18L, 122L, 121L, 121L, 122L, 11L, 12L, 12L, 13L, 11L, 12L, 122L, 122L, 121L, 121L, 122L, 68L, 69L, 121L, 122L, 12L, 11L, 11L, 10L, 122L, 8L, 7L, 10L, 9L, 8L, 9L, 17L, 16L, 8L, 7L, 4L, 5L, 121L, 122L, 30L, 31L, 16L, 17L, 10L, 11L, 122L, 121L, 6L, 5L, 2L, 1L, 15L, 14L, 7L, 8L, 122L, 121L, 5L, 6L, 55L, 54L, 3L, 4L, 36L, 37L, 53L, 52L, 26L, 27L, 28L, 29L, 37L, 36L, 21L, 22L, 46L, 47L, 22L, 1L, 39L, 40L, 27L, 28L, 28L, 29L, 35L, 34L, 121L, 5L, 
6L, 21L, 20L, 6L, 7L, 24L, 25L, 66L, 65L, 15L, 16L, 121L, 122L, 18L, 17L, 3L, 2L, 13L, 12L, 121L, 122L, 44L, 45L, 20L, 21L, 29L, 30L, 54L, 55L, 22L, 23L, 40L, 39L, 8L, 9L, 16L, 17L, 3L, 2L, 121L, 122L, 122L, 10L, 11L, 57L, 56L, 19L, 18L, 5L, 4L, 15L, 16L, 121L, 3L, 4L, 16L, 17L, 121L, 122L, 6L, 5L, 1L, 2L, 3L, 4L, 8L, 9L, 3L, 4L, 13L, 14L, 59L, 58L, 12L, 13L, 36L, 37L, 122L, 121L, 17L, 16L, 2L, 3L, 6L, 5L, 16L, 17L, 121L, 122L, 1L, 121L, 122L, 121L, 122L, 1L, 1L, 1L, 4L, 5L, 121L, 122L, 1L, 2L, 3L, 2L, 121L, 122L, 121L, 122L, 18L, 19L, 18L, 19L, 121L, 122L), ALTPRC = c(9.9375, 9.875, 0.45313, 0.67188, 7.875, 10, 18, 22, 14.75, 9.75, 0.375, 0.15625, 3.9375, 16, 14.25, 7, 7.125, 27.25, 23.375, 10.75, 13, 3.125, 3.125, 2.6875, 3.4375, 0.5, 8.75, 7, 16.875, 12.375, 2.40625, 3.96875, 4, 4.625, 4.5, 5.125, 26.25, 28.75, 4.5, 5.5, 21.75, 23.25, 15, 14.375, 16.625, 14, 50.5, 48.75, 31.875, 33.125, 41.5, 46, 21, 22.125, 30.75, 30.125, 10.375, 5.5, 11.5, 11, 29, 28.875, 27.25, 26.75, 22.375, 22.25, 33.375, 35, 21, 19.75, 29.875, 28.875, 22.125, 20.125, 21, 18.875, 24.625, 26.75, 21.75, 22, 22.125, 21.125, 24.75, 26.75, 42.75, 43.5, 13.375, 29.625, 0.07813, 25.125, 23.75, 18, 20, 17.5, 18.125, 18.875, 19, 28.875, 30, 23.875, 23.625, 15.5, 15.625, 17.5, 19.5, 34.75, 30.75, 2, 2.25, 18.625, 17.5, 21.375, 19.875, 45.25, 20.125, 37.25, 41.75, 32.25, 32.5, 23.125, 21.875, 35.25, 38.75, 27.875, 27.375, 35.875, 42.125, 24.25, 24.5, 25.125, 23.875, 2.0625, 16.75, 16.25, 34.625, 37.75, 40, 31.625, 19.375, 20, 30.875, 29.375, 0.125, 17.625, 17, 16.625, 17.75, 12.625, 13.25, 26, 19.75, 15.25, 18.625, 18.125, 18, 16.375, 15.625, 18.5, 19, 12.875, 14.375, 32.375, 33.375, 16.375, 16.375, 1.625, 2.8125, 13.875, 14.625, 4.625, 4.5, 18.5, 24.125, 6.375, 5.875, 10.625, 11.625, 6.625, 7.375, 14.75, 0.8125, 0.6875, 2.125, 2.375, 20.25, 7.625, 34, 15.25, 15, 2.09375, 2.375, 19.5, 18.125, 38.5, 30.75, 36, 35.75, 9.375, 11.25, 21.25, 18.625, 6, 5.25, 1.15625, 1.25), HSICIG = c(492, 492, 494, 494, 495, 495, 495, 495, 495, 495, 493, 493, 495, 495, 495, 495, 495, 493, 493, 492, 492, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 495, 493, 493, 494, 494, 492, 492, 492, 492, 493, 493, 492, 492, 493, 493, 493, 493, 495, 495, 492, 492, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 493, 492, 492, 493, 493, 492, 492, 493, 493, 492, 492, 495, 492, 492, 494, 494, 492, 492, 492, 492, 493, 493, 492, 492, 494, 494, 492, 492, 492, 492, 492, 492, 492, 492, 493, 493, 493, 493, 492, 492, 493, 493, 493, 493, 492, 492, 492, 492, 495, 495, 494, 494, 494, 494, 495, 492, 492, 493, 493, 495, 495, 492, 492, 492, 492, 495, 492, 492, 492, 492, 492, 492, 492, 492, 494, 494, 492, 492, 492, 492, 492, 492, 495, 495, 493, 493, 492, 492, 495, 495, 492, 492, 492, 492, 495, 495, 495, 495, 495, 495, 492, 492, 495, 495, 495, 492, 492, 495, 495, 495, 492, 492, 494, 494, 492, 492, 495, 495, 492, 492, 493, 493, 494, 494, 495, 495, 495, 495), Market.Cap.13f = c(9937500, 9875000, 3171910, 4703160, 6.3e+07, 8e+07, 1.44e+08, 1.76e+08, 88500000, 58500000, 3e+06, 1250000, 15750000, 6.4e+07, 5.7e+07, 1.26e+08, 135375000, 6.213e+09, 5329500000, 21500000, 2.6e+07, 9375000, 9375000, 13437500, 17187500, 3500000, 78750000, 6.3e+07, 101250000, 74250000, 4812500, 7937500, 1.2e+07, 13875000, 31500000, 35875000, 367500000, 431250000, 13500000, 16500000, 9330750000, 9974250000, 2.55e+08, 2.3e+08, 33250000, 2.8e+07, 2171500000, 1998750000, 4048125000, 4173750000, 3.569e+09, 3.956e+09, 3.15e+08, 
331875000, 1568250000, 1536375000, 72625000, 38500000, 34500000, 3.3e+07, 1.943e+09, 1934625000, 5749750000, 5644250000, 783125000, 778750000, 467250000, 4.9e+08, 1.029e+09, 967750000, 358500000, 346500000, 486750000, 442750000, 6.51e+08, 585125000, 98500000, 1.07e+08, 8.7e+07, 1.1e+08, 752250000, 718250000, 1.584e+09, 1.712e+09, 2.394e+09, 2.436e+09, 361125000, 799875000, 3672110, 703500000, 6.65e+08, 3.6e+07, 4e+07, 1.75e+08, 181250000, 1.51e+08, 1.52e+08, 375375000, 3.9e+08, 2077125000, 2055375000, 15500000, 15625000, 52500000, 58500000, 3509750000, 3105750000, 7.6e+07, 8.1e+07, 912625000, 9.8e+08, 470250000, 437250000, 11086250000, 4970875000, 1.341e+09, 1461250000, 193500000, 1.95e+08, 508750000, 481250000, 1057500000, 1162500000, 306625000, 301125000, 5417125000, 6360875000, 48500000, 4.9e+07, 75375000, 71625000, 8250000, 6.7e+07, 6.5e+07, 346250000, 377500000, 1.872e+10, 14515875000, 193750000, 2e+08, 4.94e+08, 4.7e+08, 3375000, 1.41e+08, 1.36e+08, 315875000, 337250000, 37875000, 39750000, 1.82e+08, 138250000, 228750000, 279375000, 108750000, 1.08e+08, 98250000, 93750000, 240500000, 2.47e+08, 772500000, 862500000, 356125000, 367125000, 163750000, 163750000, 1.3e+07, 22500000, 2122875000, 2.223e+09, 32375000, 31500000, 3.811e+09, 4969750000, 31875000, 29375000, 42500000, 46500000, 1629750000, 2205125000, 5.9e+07, 0, 0, 27625000, 30875000, 141750000, 38125000, 3.4e+08, 106750000, 1.05e+08, 23031250, 26125000, 3.12e+08, 2.9e+08, 2.31e+08, 184500000, 3.6e+07, 35750000, 65625000, 78750000, 212500000, 186250000, 3e+07, 26250000, 11562500, 12500000), IPO.Flag = c(0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0), IPO.Issue.Date = c(NA, NA, NA, NA, NA, NA, 19860724, 19860724, NA, NA, 19870127, 19870127, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19870811, 19870811, 19870930, 19870930, NA, 19871124, 19871124, 19880225, 19880225, 19880602, 19880602, NA, NA, 19880802, 19880802, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19710324, 19710324, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19710617, 19710617, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 19831014, 19831014, 19861016, 19861016, NA, NA, 19860502, 19860502, NA, NA, 19890419, NA, NA, NA, NA, 19900412, 19900514, 19900518, NA, NA, NA, NA, NA, NA, 19830603, 19830603, NA, NA, NA, NA, 19851206, 19851206, 19851211, 19851211, NA, NA ), Quarters.Since.IPO.Issue = c(NA, NA, NA, NA, NA, NA, 15L, 16L, NA, NA, 13L, 14L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 12L, 11L, 11L, 10L, NA, 11L, 10L, 10L, 9L, 8L, 9L, NA, NA, 8L, 7L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 77L, 78L, NA, NA, NA, NA, NA, NA, NA, NA, NA, 77L, 76L, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 27L, 26L, 14L, 15L, NA, NA, 16L, 17L, NA, NA, 5L, NA, NA, NA, NA, 1L, 1L, 1L, NA, NA, NA, NA, NA, NA, 29L, 28L, NA, NA, NA, NA, 18L, 19L, 18L, 19L, NA, NA)), .Names = c("PERMNO", "DATE", "Shares.Owned", "Shares.Outstanding.13f", "Percent.Inst.Owned", "Latest.Issue.Date.ByPERMNO", "Quarters.Since.19800101", "Quarters.Since.Latest.Issue", "ALTPRC", "HSICIG", "Market.Cap.13f", "IPO.Flag", "IPO.Issue.Date", "Quarters.Since.IPO.Issue"), row.names = c(79L, 85L, 9902L, 9908L, 15739L, 15758L, 16673L, 16675L, 20159L, 20160L, 32879L, 32889L, 38023L, 39404L, 39409L, 40405L, 40420L, 43114L, 43116L, 47939L, 47953L, 48091L, 48120L, 52828L, 52837L, 54612L, 55002L, 55048L, 56506L, 56508L, 59230L, 59247L, 60454L, 60461L, 60845L, 60852L, 66143L, 66147L, 69439L, 69454L, 72218L, 72232L, 81826L, 81840L, 87882L, 87883L, 105814L, 105832L, 106687L, 106709L, 106867L, 106875L, 110008L, 110081L, 113124L, 113125L, 113448L, 113460L, 114419L, 114431L, 116222L, 116234L, 117215L, 117310L, 119463L, 119477L, 119913L, 119927L, 120787L, 120799L, 121214L, 121215L, 121541L, 121548L, 121670L, 121680L, 122420L, 122421L, 123629L, 123679L, 124479L, 124485L, 125607L, 125608L, 126683L, 126716L, 126911L, 126954L, 126986L, 128941L, 128979L, 132991L, 133048L, 133090L, 133091L, 134227L, 134228L, 137449L, 137465L, 146656L, 146710L, 151717L, 151728L, 162724L, 162738L, 186344L, 186346L, 194239L, 194251L, 195124L, 195125L, 203411L, 203426L, 203486L, 203487L, 206821L, 206863L, 218733L, 218734L, 219083L, 219084L, 232389L, 232401L, 241221L, 241222L, 262518L, 262556L, 263151L, 263154L, 264783L, 264811L, 275743L, 278957L, 278958L, 281230L, 281242L, 281957L, 281962L, 286492L, 286504L, 294444L, 294445L, 297641L, 298974L, 298988L, 304628L, 304669L, 306326L, 306339L, 315987L, 316013L, 316939L, 316940L, 327003L, 327032L, 327976L, 327977L, 328372L, 328386L, 328621L, 328622L, 329277L, 329289L, 329983L, 329984L, 331735L, 331746L, 350849L, 350887L, 357747L, 357750L, 366913L, 366917L, 380680L, 380749L, 385635L, 385642L, 394280L, 394281L, 410203L, 417419L, 417420L, 418842L, 418851L, 423401L, 423687L, 423795L, 494497L, 494498L, 496519L, 496520L, 576735L, 576737L, 590042L, 590057L, 606077L, 606087L, 620736L, 620737L, 704834L, 704837L, 749540L, 749573L, 754161L, 754162L ), class = "data.frame") ======== end dataset dput representation ========= =========== begin function code ========== fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata, quarters_since_issue=20) { # tfdata must not have NAs for market.cap result = matrix(nrow=0, ncol=(ncol(tfdata) + 1)) # rbind for matrix is cheaper, so typecast the result to matrix colnames = names(tfdata) #grab the colnames, which we will shove back to the result at the end when we reconvert to data.frame quarterends = sort(unique(tfdata$DATE)) # the data are quarterly, all dates are quarter ends # basic code logic: # grab each quarter's data, in each quarter get the ipo subset, and the eligible matching firm subset # for each ipo from the ipo subset, select a matching firm from the eligible matching firm subset # the said selection is done based on industry group (HSICIG), and market 
cap (Market.Cap.13f)
    # Industry group has to be the same; market cap has to be 'closest one
    # from above', or if that is not available, then 'closest one from below'.

    for (aquarter in quarterends) {
        tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]

        tfdata_quarter_fitting_nonissuers = tfdata_quarter[
            (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
            (tfdata_quarter$IPO.Flag == 0), ]
        tfdata_quarter_ipoissuers = tfdata_quarter[
            tfdata_quarter$IPO.Flag == 1, ]

        for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
            arow = tfdata_quarter_ipoissuers[i,]
            industrypeers = tfdata_quarter_fitting_nonissuers[
                tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
            industrypeers = industrypeers[ order(industrypeers$Market.Cap.13f), ]
            if ( nrow(industrypeers) > 0 ) {
                if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
                          arow$Market.Cap.13f, ]) > 0 ) {
                    bestpeer = industrypeers[industrypeers$Market.Cap.13f >=
                        arow$Market.Cap.13f, ][1,]
                }
                else {
                    bestpeer = industrypeers[nrow(industrypeers),]
                }
                bestpeer$Quarters.Since.IPO.Issue = arow$Quarters.Since.IPO.Issue
                bestpeer$Peer.To.PERMNO = arow$PERMNO
                result = rbind(result, as.matrix(bestpeer))
            }
        }
        print(aquarter)
    }

    result = as.data.frame(result)
    names(result) = c(colnames, 'Peer.To.PERMNO')
    return(result)
}

============== end function code ===========

on 06/06/2008 01:35 PM Gabor Grothendieck said the following:
> I think the posting guide may not be clear enough and have suggested that
> it be clarified. Hopefully this better communicates what is required and why
> in a shorter amount of space:
>
> https://stat.ethz.ch/pipermail/r-devel/2008-June/049891.html
>
>
> On Fri, Jun 6, 2008 at 1:25 PM, Daniel Folkinshteyn <dfolkins_at_gmail.com> wrote:
>> i thought since the function code (which i provided in full) was pretty
>> short, it would be reasonably easy to just read the code and see what it's
>> doing.
>>
>> but ok, so... i am attaching a zip file with a small sample of the data set
>> (tab delimited) and the function code (posting guidelines claim that "some
>> archive formats" are allowed, so i assume zip is one of them)...
>>
>> would appreciate your comments! :)
>>
>> on 06/06/2008 12:05 PM Gabor Grothendieck said the following:
>>> It's summarized in the last line to r-help. Note reproducible and
>>> minimal.
>>>
>>> On Fri, Jun 6, 2008 at 12:03 PM, Daniel Folkinshteyn <dfolkins_at_gmail.com>
>>> wrote:
>>>> i did! what did i miss?
>>>>
>>>> on 06/06/2008 11:45 AM Gabor Grothendieck said the following:
>>>>> Try reading the posting guide before posting.
>>>>>
>>>>> On Fri, Jun 6, 2008 at 11:12 AM, Daniel Folkinshteyn
>>>>> <dfolkins_at_gmail.com>
>>>>> wrote:
>>>>>> Anybody have any thoughts on this? Please? :)
>>>>>>
>>>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>>>> Hi everyone!
>>>>>>>
>>>>>>> I have a question about data processing efficiency.
>>>>>>>
>>>>>>> My data are as follows: I have a data set on quarterly institutional
>>>>>>> ownership of equities; some of them have had recent IPOs, some have
>>>>>>> not
>>>>>>> (I
>>>>>>> have a binary flag set). The total dataset size is 700k+ rows.
>>>>>>>
>>>>>>> My goal is this: For every quarter since issue for each IPO, I need to
>>>>>>> find a "matched" firm in the same industry, and close in market cap.
>>>>>>> So,
>>>>>>> e.g., for firm X, which had an IPO, i need to find a matched
>>>>>>> non-issuing
>>>>>>> firm in quarter 1 since IPO, then a (possibly different) non-issuing
>>>>>>> firm in
>>>>>>> quarter 2 since IPO, etc. Repeat for each issuing firm (there are
>>>>>>> about
>>>>>>> 8300
>>>>>>> of these).
>>>>>>>
>>>>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>>>>> and
>>>>>>> subsetting, and looping (yikes!), but the result appears to be highly
>>>>>>> inefficient and takes ages (well, many hours). What I am doing, in
>>>>>>> pseudocode, is this:
>>>>>>>
>>>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>>>> eligible
>>>>>>> non-issuing firms.
>>>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>>>> industry, sort them by size, and finally grab a matching firm closest
>>>>>>> in
>>>>>>> size (the exact procedure is to grab the closest bigger firm if one
>>>>>>> exists,
>>>>>>> and just the biggest available if all are smaller)
>>>>>>> 3. assign the matched firm-observation the same "quarters since issue"
>>>>>>> as
>>>>>>> the IPO being matched
>>>>>>> 4. rbind them all into the "matching" dataset.
>>>>>>>
>>>>>>> The function I currently have is pasted below, for your reference. Is
>>>>>>> there any way to make it produce the same result but much faster?
>>>>>>> Specifically, I am guessing eliminating some loops would be very good,
>>>>>>> but I
>>>>>>> don't see how, since I need to do some fancy footwork for each IPO in
>>>>>>> each
>>>>>>> quarter to find the matching firm. I'll be doing a few things similar
>>>>>>> to
>>>>>>> this, so it's somewhat important to up the efficiency of this. Maybe
>>>>>>> some of
>>>>>>> you R-fu masters can clue me in? :)
>>>>>>>
>>>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>>>
>>>>>>> ========== my function below ===========
>>>>>>>
>>>>>>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>>>>>>> quarters_since_issue=40) {
>>>>>>>
>>>>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>>>>>>> cheaper, so typecast the result to matrix
>>>>>>>
>>>>>>> colnames = names(tfdata)
>>>>>>>
>>>>>>> quarterends = sort(unique(tfdata$DATE))
>>>>>>>
>>>>>>> for (aquarter in quarterends) {
>>>>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>>>
>>>>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>>>>>>> (tfdata_quarter$IPO.Flag == 0), ]
>>>>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>>>> tfdata_quarter$IPO.Flag
>>>>>>> == 1, ]
>>>>>>>
>>>>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>>>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>>>> industrypeers = industrypeers[
>>>>>>> order(industrypeers$Market.Cap.13f), ]
>>>>>>> if ( nrow(industrypeers) > 0 ) {
>>>>>>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>>>>> bestpeer = industrypeers[industrypeers$Market.Cap.13f >=
>>>>>>> arow$Market.Cap.13f, ][1,]
>>>>>>> }
>>>>>>> else {
>>>>>>> bestpeer = industrypeers[nrow(industrypeers),]
>>>>>>> }
>>>>>>> bestpeer$Quarters.Since.IPO.Issue =
>>>>>>> arow$Quarters.Since.IPO.Issue
>>>>>>>
>>>>>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>>>>> bestpeer$PERMNO] = 1
>>>>>>> result = rbind(result, as.matrix(bestpeer))
>>>>>>> }
>>>>>>> }
>>>>>>> #result = rbind(result, tfdata_quarter)
>>>>>>> print (aquarter)
>>>>>>> }
>>>>>>>
>>>>>>> result = as.data.frame(result)
>>>>>>> names(result) = colnames
>>>>>>> return(result)
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> ========= end of my function =============
>>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help_at_r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>
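A minimal sketch of the two suggestions made in this thread -- preallocate the result instead of rbind-ing row by row, and split the eligible firms by industry once per quarter instead of re-subsetting for every IPO. Column names follow the function above, but the sketch itself (fcn_match_sketch is a hypothetical name) is illustrative only and untested against the full 700k-row dataset:

fcn_match_sketch <- function(tfdata, quarters_since_issue = 40) {
    # upper bound on matches: at most one match per IPO row
    ipos_total <- sum(tfdata$IPO.Flag == 1)
    result <- tfdata[rep(NA_integer_, ipos_total), ]   # preallocated NA rows
    result$Peer.To.PERMNO <- NA
    k <- 0
    for (aquarter in sort(unique(tfdata$DATE))) {
        quarter  <- tfdata[tfdata$DATE == aquarter, ]
        eligible <- quarter[quarter$IPO.Flag == 0 &
                            quarter$Quarters.Since.Latest.Issue > quarters_since_issue, ]
        issuers  <- quarter[quarter$IPO.Flag == 1, ]
        if (nrow(issuers) == 0L || nrow(eligible) == 0L) next
        by_industry <- split(eligible, eligible$HSICIG)   # one split per quarter
        for (i in seq_len(nrow(issuers))) {
            peers <- by_industry[[as.character(issuers$HSICIG[i])]]
            if (is.null(peers)) next
            peers <- peers[order(peers$Market.Cap.13f), ]
            # smallest peer at least as large; else the largest available
            j <- which(peers$Market.Cap.13f >= issuers$Market.Cap.13f[i])[1]
            if (is.na(j)) j <- nrow(peers)
            k <- k + 1
            result[k, names(tfdata)] <- peers[j, ]   # subscript, no rbind
            result$Quarters.Since.IPO.Issue[k] <- issuers$Quarters.Since.IPO.Issue[i]
            result$Peer.To.PERMNO[k] <- issuers$PERMNO[i]
        }
    }
    result[seq_len(k), ]   # drop the unused preallocated rows
}

The matching rule is intended to be the same as in the posted function: the smallest eligible peer at or above the IPO's market cap, falling back to the largest available peer below it.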
------------------------------

Message: 67
Date: Fri, 6 Jun 2008 14:01:01 -0400
From: "Thompson, David (MNR)" <David.John.Thompson_at_ontario.ca>
Subject: Re: [R] ggplot questions
To: "ONKELINX, Thierry" <Thierry.ONKELINX_at_inbo.be>, "hadley wickham" <h.wickham_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <ECF21B71808ECF4F8918C57EDBEE121D028C2532@CTSPITDCEMMVX11.cihs.ad.gov.on.ca>
Content-Type: text/plain; charset="us-ascii"

Thanx Thierry,

Suggestion #1 had no effect. I have been playing with variants on #2 along the way.

DaveT.
>-----Original Message-----
>From: ONKELINX, Thierry [mailto:Thierry.ONKELINX_at_inbo.be]
>Sent: June 6, 2008 04:02 AM
>To: Thompson, David (MNR); hadley wickham
>Cc: r-help_at_r-project.org
>Subject: RE: [R] ggplot questions
>
>David,
>
>1. Try scale_x_continuous(lim = c(0, 360)) + scale_y_continuous(lim =
>c(0, 16))
>2. You could set the colour of the gridlines equal to the background
>colour with ggopt
>
>HTH,
>
>Thierry
>
------------------------------

Message: 68
Date: Fri, 06 Jun 2008 19:03:44 +0100
From: Patrick Burns <pburns_at_pburns.seanet.com>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <48497C00.7000206@pburns.seanet.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

That is going to be situation dependent, but if you have a reasonable
upper bound, then that will be much easier and not far from optimal.

If you pick the possibly too small route, then increasing the size in
largish junks is much better than adding a row at a time.

Pat

Daniel Folkinshteyn wrote:
> thanks for the tip! i'll try that and see how big of a difference that
> makes... if i am not sure what exactly the size will be, am i better
> off making it larger, and then later stripping off the blank rows, or
> making it smaller, and appending the missing rows?
>
> on 06/06/2008 11:44 AM Patrick Burns said the following:
>> One thing that is likely to speed the code significantly
>> is if you create 'result' to be its final size and then
>> subscript into it. Something like:
>>
>> result[i, ] <- bestpeer
>>
>> (though I'm not sure if 'i' is the proper index).
>>
>> Patrick Burns
>> patrick_at_burns-stat.com
>> +44 (0)20 8525 0696
>> http://www.burns-stat.com
>> (home of S Poetry and "A Guide for the Unwilling S User")
>>
>> Daniel Folkinshteyn wrote:
>>> Anybody have any thoughts on this? Please? :)
>>>
>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>> Hi everyone!
>>>>
>>>> I have a question about data processing efficiency.
>>>>
>>>> My data are as follows: I have a data set on quarterly
>>>> institutional ownership of equities; some of them have had recent
>>>> IPOs, some have not (I have a binary flag set). The total dataset
>>>> size is 700k+ rows.
>>>>
>>>> My goal is this: For every quarter since issue for each IPO, I need
>>>> to find a "matched" firm in the same industry, and close in market
>>>> cap. So, e.g., for firm X, which had an IPO, i need to find a
>>>> matched non-issuing firm in quarter 1 since IPO, then a (possibly
>>>> different) non-issuing firm in quarter 2 since IPO, etc. Repeat for
>>>> each issuing firm (there are about 8300 of these).
>>>>
>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>> and subsetting, and looping (yikes!), but the result appears to be
>>>> highly inefficient and takes ages (well, many hours). What I am
>>>> doing, in pseudocode, is this:
>>>>
>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>> eligible non-issuing firms.
>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>> industry, sort them by size, and finally grab a matching firm
>>>> closest in size (the exact procedure is to grab the closest bigger
>>>> firm if one exists, and just the biggest available if all are smaller)
>>>> 3. assign the matched firm-observation the same "quarters since
>>>> issue" as the IPO being matched
>>>> 4. rbind them all into the "matching" dataset.
>>>>
>>>> The function I currently have is pasted below, for your reference.
>>>> Is there any way to make it produce the same result but much
>>>> faster? Specifically, I am guessing eliminating some loops would be
>>>> very good, but I don't see how, since I need to do some fancy
>>>> footwork for each IPO in each quarter to find the matching firm.
>>>> I'll be doing a few things similar to this, so it's somewhat
>>>> important to up the efficiency of this. Maybe some of you R-fu
>>>> masters can clue me in? :)
>>>>
>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>
>>>> ========== my function below ===========
>>>>
>>>> fcn_create_nonissuing_match_by_quarterssinceissue =
>>>> function(tfdata, quarters_since_issue=40) {
>>>>
>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix
>>>> is cheaper, so typecast the result to matrix
>>>>
>>>> colnames = names(tfdata)
>>>>
>>>> quarterends = sort(unique(tfdata$DATE))
>>>>
>>>> for (aquarter in quarterends) {
>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>
>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue)
>>>> & (tfdata_quarter$IPO.Flag == 0), ]
>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>> tfdata_quarter$IPO.Flag == 1, ]
>>>>
>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>> industrypeers = industrypeers[
>>>> order(industrypeers$Market.Cap.13f), ]
>>>> if ( nrow(industrypeers) > 0 ) {
>>>> if (
>>>> nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>> bestpeer =
>>>> industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f,
>>>> ][1,]
>>>> }
>>>> else {
>>>> bestpeer = industrypeers[nrow(industrypeers),]
>>>> }
>>>> bestpeer$Quarters.Since.IPO.Issue =
>>>> arow$Quarters.Since.IPO.Issue
>>>>
>>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>> bestpeer$PERMNO] = 1
>>>> result = rbind(result, as.matrix(bestpeer))
>>>> }
>>>> }
>>>> #result = rbind(result, tfdata_quarter)
>>>> print (aquarter)
>>>> }
>>>>
>>>> result = as.data.frame(result)
>>>> names(result) = colnames
>>>> return(result)
>>>>
>>>> }
>>>>
>>>> ========= end of my function =============
>>>>
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
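To make the "grow in largish chunks" fallback concrete when no good upper bound exists, a generic sketch; the sizes here (64 rows to start, doubling when full) are arbitrary placeholders, not values from the thread:

grow_if_needed <- function(m, k) {
    # double the allocation when full: a few large reallocations
    # instead of one per appended row
    while (k > nrow(m)) {
        bigger <- matrix(NA_real_, nrow = 2 * nrow(m), ncol = ncol(m))
        bigger[seq_len(nrow(m)), ] <- m
        m <- bigger
    }
    m
}

result <- matrix(NA_real_, nrow = 64, ncol = 5)   # deliberately too small
k <- 0
for (i in 1:1000) {
    k <- k + 1
    result <- grow_if_needed(result, k)
    result[k, ] <- rnorm(5)    # stand-in for one computed 'bestpeer' row
}
result <- result[seq_len(k), , drop = FALSE]      # strip unused rows at the end

Doubling costs O(log n) reallocations in total, instead of one reallocation per appended row as with rbind.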
------------------------------

Message: 69
Date: Fri, 6 Jun 2008 13:06:22 -0500
From: "hadley wickham" <h.wickham_at_gmail.com>
Subject: Re: [R] ggplot questions
To: "Thompson, David (MNR)" <David.John.Thompson_at_ontario.ca>
Cc: r-help_at_r-project.org
Message-ID: <f8e6ff050806061106o1ab6583je2c195b42d704689@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
> Does the difference have something to do with ggplot() using ranges
> derived from the data?
> When I modify my original 'test' dataframe with two extra rows as
> defined below, I get expected results in both versions.
Order shouldn't matter - and if it's making a difference, that's a bug. But I'm still not completely sure what you're expecting.
> This highlights my next question (warned you ;-) ), I have been
> unsuccessful in trying to define fixed plotting ranges to generate a
> 'template' graphic that I may reuse with successive 'overstory plot'
> data sets. I have used '+ xlim(0, 360) + ylim(0, 16)' but, this seems to
> not have any effect on the final plot layout.
Could you please produce a small reproducible example that demonstrates
this? It may well be a bug.

Hadley

--
http://had.co.nz/

------------------------------

Message: 70
Date: Fri, 06 Jun 2008 14:09:40 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Patrick Burns <pburns_at_pburns.seanet.com>
Cc: r-help_at_r-project.org
Message-ID: <48497D64.4090905@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Cool, I do have an upper bound, so I'll try it and see how much of a
[[elided Yahoo spam]]

on 06/06/2008 02:03 PM Patrick Burns said the following:
> That is going to be situation dependent, but if you
> have a reasonable upper bound, then that will be
> much easier and not far from optimal.
>
> If you pick the possibly too small route, then increasing
> the size in largish junks is much better than adding
> a row at a time.
>
> Pat
>
> Daniel Folkinshteyn wrote:
>> thanks for the tip! i'll try that and see how big of a difference that
>> makes... if i am not sure what exactly the size will be, am i better
>> off making it larger, and then later stripping off the blank rows, or
>> making it smaller, and appending the missing rows?
>>
>> on 06/06/2008 11:44 AM Patrick Burns said the following:
>>> One thing that is likely to speed the code significantly
>>> is if you create 'result' to be its final size and then
>>> subscript into it. Something like:
>>>
>>> result[i, ] <- bestpeer
>>>
>>> (though I'm not sure if 'i' is the proper index).
>>>
>>> Patrick Burns
>>> patrick_at_burns-stat.com
>>> +44 (0)20 8525 0696
>>> http://www.burns-stat.com
>>> (home of S Poetry and "A Guide for the Unwilling S User")
>>>
>>> Daniel Folkinshteyn wrote:
>>>> Anybody have any thoughts on this? Please? :)
>>>>
>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>> Hi everyone!
>>>>>
>>>>> I have a question about data processing efficiency.
>>>>>
>>>>> My data are as follows: I have a data set on quarterly
>>>>> institutional ownership of equities; some of them have had recent
>>>>> IPOs, some have not (I have a binary flag set). The total dataset
>>>>> size is 700k+ rows.
>>>>>
>>>>> My goal is this: For every quarter since issue for each IPO, I need
>>>>> to find a "matched" firm in the same industry, and close in market
>>>>> cap. So, e.g., for firm X, which had an IPO, i need to find a
>>>>> matched non-issuing firm in quarter 1 since IPO, then a (possibly
>>>>> different) non-issuing firm in quarter 2 since IPO, etc. Repeat for
>>>>> each issuing firm (there are about 8300 of these).
>>>>>
>>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>>> and subsetting, and looping (yikes!), but the result appears to be
>>>>> highly inefficient and takes ages (well, many hours). What I am
>>>>> doing, in pseudocode, is this:
>>>>>
>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>> eligible non-issuing firms.
>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>> industry, sort them by size, and finally grab a matching firm
>>>>> closest in size (the exact procedure is to grab the closest bigger
>>>>> firm if one exists, and just the biggest available if all are smaller)
>>>>> 3. assign the matched firm-observation the same "quarters since
>>>>> issue" as the IPO being matched
>>>>> 4. rbind them all into the "matching" dataset.
>>>>>
>>>>> The function I currently have is pasted below, for your reference.
>>>>> Is there any way to make it produce the same result but much
>>>>> faster? Specifically, I am guessing eliminating some loops would be
>>>>> very good, but I don't see how, since I need to do some fancy
>>>>> footwork for each IPO in each quarter to find the matching firm.
>>>>> I'll be doing a few things similar to this, so it's somewhat
>>>>> important to up the efficiency of this. Maybe some of you R-fu
>>>>> masters can clue me in? :)
>>>>>
>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>
>>>>> ========== my function below ===========
>>>>>
>>>>> fcn_create_nonissuing_match_by_quarterssinceissue =
>>>>> function(tfdata, quarters_since_issue=40) {
>>>>>
>>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix
>>>>> is cheaper, so typecast the result to matrix
>>>>>
>>>>> colnames = names(tfdata)
>>>>>
>>>>> quarterends = sort(unique(tfdata$DATE))
>>>>>
>>>>> for (aquarter in quarterends) {
>>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>
>>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue)
>>>>> & (tfdata_quarter$IPO.Flag == 0), ]
>>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>> tfdata_quarter$IPO.Flag == 1, ]
>>>>>
>>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>> industrypeers = industrypeers[
>>>>> order(industrypeers$Market.Cap.13f), ]
>>>>> if ( nrow(industrypeers) > 0 ) {
>>>>> if (
>>>>> nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>>> bestpeer =
>>>>> industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f,
>>>>> ][1,]
>>>>> }
>>>>> else {
>>>>> bestpeer = industrypeers[nrow(industrypeers),]
>>>>> }
>>>>> bestpeer$Quarters.Since.IPO.Issue =
>>>>> arow$Quarters.Since.IPO.Issue
>>>>>
>>>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>>> bestpeer$PERMNO] = 1
>>>>> result = rbind(result, as.matrix(bestpeer))
>>>>> }
>>>>> }
>>>>> #result = rbind(result, tfdata_quarter)
>>>>> print (aquarter)
>>>>> }
>>>>>
>>>>> result = as.data.frame(result)
>>>>> names(result) = colnames
>>>>> return(result)
>>>>>
>>>>> }
>>>>>
>>>>> ========= end of my function =============
>>>>>
>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
------------------------------

Message: 71
Date: Fri, 6 Jun 2008 19:14:59 +0100 (BST)
From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Subject: Re: [R] boxplot changes fontsize of labels
To: Sebastian Merz <sebastian.merz_at_web.de>
Cc: r-help_at_r-project.org
Message-ID: <alpine.LFD.1.10.0806061905080.10799@gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; format=flowed; charset=US-ASCII

Please read the help for par(mfrow)! AFAICS this is nothing to do with
boxplot(). In a layout with exactly two rows and columns the base value
of "cex" is reduced by a factor of 0.83; if there are three or more of
either rows or columns, the reduction factor is 0.66. See also the
'consider the alternatives' in that entry.

On Fri, 6 Jun 2008, Sebastian Merz wrote:
> Hi all!
>
> So far I have learned some R, but finalizing my plots so they look
> publishable seems not to be possible.
>
> I set up some boxplots. Everything works well but when I put more than
> two of them in one plot the labels of the axes appear smaller than the
> normal font size.
>
>> x <- rnorm(30)
>> y <- rnorm(30)
>> par(mfrow=c(1,4))
>> boxplot(x,y, names=c("horray", "hurra"))
>> mtext("Jubel", side=1, line=2)
>
> In case I take one or two boxplots this does not happen:
>> par(mfrow=c(1,2))
>> boxplot(x,y, names=c("horray", "hurra"))
>> mtext("Jubel", side=1, line=2)
>
> The cex.axis seems not to be changed, as setting it to 1.0 doesn't
> change the behaviour. If cex.axis=1.3 in the first example the font
> size used by boxplot and by mtext is about the same. But as I use a
> function to draw quite a few of these plots, this "hack" is not a proper
> solution.
>
> I couldn't find anything about this behaviour in the documentation or
> on the internet.
>
> Can anybody explain? All hints are appreciated.
>
> Thanks,
> S. Merz
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
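A minimal sketch of the behaviour described above, using the example data from the question; resetting cex after setting mfrow is one possible workaround (an assumption for illustration, not part of the original exchange):

x <- rnorm(30)
y <- rnorm(30)

par(mfrow = c(1, 4))   # three or more columns: base "cex" drops to 0.66
par(cex = 1)           # reset it so subsequent text is drawn full size
boxplot(x, y, names = c("horray", "hurra"))
mtext("Jubel", side = 1, line = 2)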
--
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

------------------------------

Message: 72
Date: Fri, 6 Jun 2008 12:28:15 -0600
From: "Greg Snow" <Greg.Snow_at_imail.org>
Subject: Re: [R] Improving data processing efficiency
To: "Patrick Burns" <pburns_at_pburns.seanet.com>, "Daniel Folkinshteyn" <dfolkins_at_gmail.com>
Cc: "r-help_at_r-project.org" <r-help_at_r-project.org>
Message-ID: <B37C0A15B8FB3C468B5BC7EBC7DA14CC60F6858991@LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii
> -----Original Message-----
> From: r-help-bounces_at_r-project.org
> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Patrick Burns
> Sent: Friday, June 06, 2008 12:04 PM
> To: Daniel Folkinshteyn
> Cc: r-help_at_r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> That is going to be situation dependent, but if you have a
> reasonable upper bound, then that will be much easier and not
> far from optimal.
>
> If you pick the possibly too small route, then increasing the
> size in largish junks is much better than adding a row at a time.
Pat,

I am unfamiliar with the use of the word "junk" as a unit of measure for data objects. I figure there are a few different possibilities:

1. You are using the term intentionally, meaning that you suggest he increases the size in terms of old cars and broken pianos rather than used-up pens and broken pencils.

2. This was a Freudian slip based on your opinion of some datasets you have seen.

3. Somewhere between your mind and the final product "jumps/chunks" became "junks" (possibly a Microsoft "correction", or just typing too fast combined with number 2).

4. "junks" is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above).

Please let it be #4, I would love to be able to tell some clients that I have received a junk of data from them.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow_at_imail.org
(801) 408-8111

------------------------------

Message: 73
Date: Fri, 6 Jun 2008 14:32:47 -0400
From: "Gabor Grothendieck" <ggrothendieck_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Greg Snow" <Greg.Snow_at_imail.org>
Cc: "r-help_at_r-project.org" <r-help_at_r-project.org>, Patrick Burns <pburns_at_pburns.seanet.com>
Message-ID: <971536df0806061132h1d5dfebeyfca3961152f76ba5@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow_at_imail.org> wrote:
>> -----Original Message-----
>> From: r-help-bounces_at_r-project.org
>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Patrick Burns
>> Sent: Friday, June 06, 2008 12:04 PM
>> To: Daniel Folkinshteyn
>> Cc: r-help_at_r-project.org
>> Subject: Re: [R] Improving data processing efficiency
>>
>> That is going to be situation dependent, but if you have a
>> reasonable upper bound, then that will be much easier and not
>> far from optimal.
>>
>> If you pick the possibly too small route, then increasing the
>> size in largish junks is much better than adding a row at a time.
>
> Pat,
>
> I am unfamiliar with the use of the word "junk" as a unit of measure for data objects. I figure there are a few different possibilities:
>
> 1. You are using the term intentionally meaning that you suggest he increases the size in terms of old cars and broken pianos rather than used up pens and broken pencils.
>
> 2. This was a Freudian slip based on your opinion of some datasets you have seen.
>
> 3. Somewhere between your mind and the final product "jumps/chunks" became "junks" (possibly a microsoft "correction", or just typing too fast combined with number 2).
>
> 4. "junks" is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above).
>
5. Chinese sailing vessel. http://en.wikipedia.org/wiki/Junk_(ship)

------------------------------

Message: 74
Date: Fri, 6 Jun 2008 19:38:48 +0200
From: "Bertrand Pub Michel" <michel.bertrand.pub_at_gmail.com>
Subject: [R] Random Forest
To: r-help_at_r-project.org
Message-ID: <2eab3a700806061038o187b384aj342e14547f20ebc7@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hello

Does there exist a package for multivariate random forest, namely for
multivariate response data? It seems to be impossible with the
"randomForest" function and I did not find any information about this
in the help pages ...

Thank you for your help

Bertrand

------------------------------

Message: 75
Date: Fri, 06 Jun 2008 17:43:27 +0200
From: Marco Chiapello <marco.chiapello_at_unito.it>
Subject: [R] mean
To: r-help_at_r-project.org
Message-ID: <1212767007.6257.6.camel@Biochimica2.bioveg.unito.it>
Content-Type: text/plain

Hi,
I have a simple question. If I have a table and I want to have the mean
[[elided Yahoo spam]]
Es:
   c1 c2 c3 mean
1  12 13 14   ??
2  15 24 10   ??
...

Thanks,
Marco

------------------------------

Message: 76
Date: Fri, 6 Jun 2008 07:51:21 -0700 (PDT)
From: madhura <madhura.girish_at_gmail.com>
Subject: Re: [R] Java to R interface
To: r-help_at_r-project.org
Message-ID: <3f0c26b2-a978-484b-ae75-2df2476f2ada@m45g2000hsb.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1

The path to R/bin is in the Windows PATH variable. Yet I get this error.

On Jun 6, 10:37 am, "Dumblauskas, Jerry" <jerry.dumblaus..._at_credit-suisse.com> wrote:
> Try and make sure that R is in your windows Path variable
>
> I got your message when I first did this, but when I did the above it
> then worked...
>
> ==============================================================================
> Please access the attached hyperlink for an important electronic communications disclaimer:
>
> http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
> ==============================================================================
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-h...@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------

Message: 77
Date: Fri, 6 Jun 2008 08:35:51 -0700 (PDT)
From: Evans_CSHL <evans_at_cshl.edu>
Subject: [R] R (D)COM Server not working on windows domain account
To: r-help_at_r-project.org
Message-ID: <17695171.post@talk.nabble.com>
Content-Type: text/plain; charset=us-ascii

I have installed R (D)COM on a (Windows) machine that is part of a
Windows domain. If I run the test file in a local (log into this
machine) administrative account, it works fine. If I run the test file
on a domain account with administrative rights, it will not connect to
the server, even if I change the account type from roaming to local.
Anyone have any ideas?

Thanks,
Gregg

--
View this message in context: http://www.nabble.com/R-%28D%29COM-Server-not-working-on-windows-domain-account-tp17695171p17695171.html
Sent from the R help mailing list archive at Nabble.com.

------------------------------

Message: 78
Date: Fri, 6 Jun 2008 19:40:55 +0200
From: "Bertrand Pub Michel" <michel.bertrand.pub_at_gmail.com>
Subject: [R] Random Forest and for multivariate response data
To: r-help_at_r-project.org
Message-ID: <2eab3a700806061040l7febaae5ub7af3be0d622bf82@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hello

Does there exist a package for multivariate random forest, namely for
multivariate response data? It seems to be impossible with the
"randomForest" function and I did not find any information about this
in the help pages ...

Thank you for your help

Bertrand

------------------------------

Message: 79
Date: Fri, 6 Jun 2008 19:40:55 +0200
From: "Bertrand Pub Michel" <michel.bertrand.pub_at_gmail.com>
Subject: [R] Random Forest
To: r-help_at_r-project.org
Message-ID: <2eab3a700806061040g51003677l9bfaa1b7a4dfa2b2@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hello

Does there exist a package for multivariate random forest, namely for
multivariate response data? It seems to be impossible with the
"randomForest" function and I did not find any information about this
in the help pages ...

Thank you for your help

Bertrand

------------------------------

Message: 80
Date: Fri, 6 Jun 2008 14:13:40 -0400
From: "steven wilson" <swpt07_at_gmail.com>
Subject: [R] R + Linux
To: r-help_at_r-project.org
Message-ID: <25944ea00806061113p30b8edcdo49978d401b05465f@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Dear all;

I'm planning to install Linux on my computer to run R (I'm bored of
W..XP). However, I haven't used Linux before and I would appreciate, if
possible, suggestions/comments about what could be the best option to
install, say Fedora, Ubuntu or OpenSuse, which to my impression are the
most popular ones (at least on the R-help lists). The computer is a PC
desktop with 4GB RAM and an Intel Quad-Core Xeon processor and will be
used only to run R.

Thanks
Steven

------------------------------

Message: 81
Date: Fri, 6 Jun 2008 12:50:23 -0600
From: "Greg Snow" <Greg.Snow_at_imail.org>
Subject: Re: [R] Improving data processing efficiency
To: "Gabor Grothendieck" <ggrothendieck_at_gmail.com>
Cc: "r-help_at_r-project.org" <r-help_at_r-project.org>, Patrick Burns <pburns_at_pburns.seanet.com>
Message-ID: <B37C0A15B8FB3C468B5BC7EBC7DA14CC60F68589AA@LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck_at_gmail.com]
> Sent: Friday, June 06, 2008 12:33 PM
> To: Greg Snow
> Cc: Patrick Burns; Daniel Folkinshteyn; r-help_at_r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow_at_imail.org> wrote:
> >> -----Original Message-----
> >> From: r-help-bounces_at_r-project.org
> >> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Patrick Burns
> >> Sent: Friday, June 06, 2008 12:04 PM
> >> To: Daniel Folkinshteyn
> >> Cc: r-help_at_r-project.org
> >> Subject: Re: [R] Improving data processing efficiency
> >>
> >> That is going to be situation dependent, but if you have a
> reasonable
> >> upper bound, then that will be much easier and not far
> from optimal.
> >>
> >> If you pick the possibly too small route, then increasing
> the size in
> >> largish junks is much better than adding a row at a time.
> >
> > Pat,
> >
> > I am unfamiliar with the use of the word "junk" as a unit
> of measure for data objects. I figure there are a few
> different possibilities:
> >
> > 1. You are using the term intentionally meaning that you
> suggest he increases the size in terms of old cars and broken
> pianos rather than used up pens and broken pencils.
> >
> > 2. This was a Freudian slip based on your opinion of some
> datasets you have seen.
> >
> > 3. Somewhere between your mind and the final product
> "jumps/chunks" became "junks" (possibly a microsoft
> "correction", or just typing too fast combined with number 2).
> >
> > 4. "junks" is an official measure of data/object size that
> I need to learn more about (the history of the term possibly
> being related to 2 and 3 above).
> >
>
> 5. Chinese sailing vessel.
> http://en.wikipedia.org/wiki/Junk_(ship)
>
Thanks for expanding my vocabulary (hmm, how am I going to use that word
in context today?).

So, if 5 is the case, then Pat's original statement can be reworded as:

"If you pick the possibly too small route, then increasing the size in
largish Chinese sailing vessels is much better than adding a row boat at
a time."

While that is probably true, I am not sure what that would mean in terms
of the original data processing question.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow_at_imail.org
(801) 408-8111

------------------------------

Message: 82
Date: Fri, 6 Jun 2008 13:56:39 -0500
From: ctu_at_bigred.unl.edu
Subject: Re: [R] mean
To: r-help_at_r-project.org
Message-ID: <20080606135639.r6hbgm31a8w8s4ko@wm-imp-2.unl.edu>
Content-Type: text/plain; charset=ISO-8859-1; DelSp="Yes"; format="flowed"
> TABLE<-matrix(data=c(12,13,14,15,24,10),byrow=T,nrow=2,ncol=3)
> TABLE
     [,1] [,2] [,3]
[1,]   12   13   14
[2,]   15   24   10

> apply(TABLE,1,mean)
[1] 13.00000 16.33333

Chunhao

Quoting Marco Chiapello <marco.chiapello_at_unito.it>:
> Hi,
> I have a simple question. If I have a table and I want to have the mean
[[elided Yahoo spam]]
> Es:
> c1 c2 c3 mean
> 1 12 13 14 ??
> 2 15 24 10 ??
> ...
>
> Thanks,
> Marco
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
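Since the original question showed the mean as a fourth column, one way to get exactly that layout, building on the TABLE object above (rowMeans() is the vectorized equivalent of the apply() call):

cbind(TABLE, mean = rowMeans(TABLE))
#      [,1] [,2] [,3]     mean
# [1,]   12   13   14 13.00000
# [2,]   15   24   10 16.33333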
------------------------------

Message: 83
Date: Fri, 06 Jun 2008 19:58:05 +0100
From: Patrick Burns <pburns_at_pburns.seanet.com>
Subject: Re: [R] Improving data processing efficiency
To: Gabor Grothendieck <ggrothendieck_at_gmail.com>
Cc: "r-help_at_r-project.org" <r-help_at_r-project.org>, Greg Snow <Greg.Snow_at_imail.org>
Message-ID: <484988BD.4090205@pburns.seanet.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

My guess is that number 2 is closest to the mark. Typing too fast is
unfortunately not one of my habitual attributes.

Gabor Grothendieck wrote:
> On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow@imail.org> wrote:
>
>>> -----Original Message-----
>>> From: r-help-bounces_at_r-project.org
>>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Patrick Burns
>>> Sent: Friday, June 06, 2008 12:04 PM
>>> To: Daniel Folkinshteyn
>>> Cc: r-help_at_r-project.org
>>> Subject: Re: [R] Improving data processing efficiency
>>>
>>> That is going to be situation dependent, but if you have a
>>> reasonable upper bound, then that will be much easier and not
>>> far from optimal.
>>>
>>> If you pick the possibly too small route, then increasing the
>>> size in largish junks is much better than adding a row at a time.
>>>
>> Pat,
>>
>> I am unfamiliar with the use of the word "junk" as a unit of measure for data objects. I figure there are a few different possibilities:
>>
>> 1. You are using the term intentionally meaning that you suggest he increases the size in terms of old cars and broken pianos rather than used up pens and broken pencils.
>>
>> 2. This was a Freudian slip based on your opinion of some datasets you have seen.
>>
>> 3. Somewhere between your mind and the final product "jumps/chunks" became "junks" (possibly a microsoft "correction", or just typing too fast combined with number 2).
>>
>> 4. "junks" is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above).
>>
>>
>
> 5. Chinese sailing vessel.
> http://en.wikipedia.org/wiki/Junk_(ship)
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>
------------------------------

Message: 84
Date: Fri, 6 Jun 2008 17:01:01 -0200
From: "Alberto Monteiro" <albmont_at_centroin.com.br>
Subject: [R] Plot matrix as many lines
To: r-help_at_r-project.org
Message-ID: <20080606185826.M61869@centroin.com.br>
Content-Type: text/plain; charset=iso-8859-1

Suppose that I have a matrix like:

m <- rbind(c(1,2,3,4), c(2,3,2,1))

Is there any way to efficiently plot the _lines_ as if I was doing:

plot(m[1,], type="l")
points(m[2,], type="l", col="red")

(of course, in the "real world" there is much more than just 2 lines and 4 columns...)

Alberto Monteiro

------------------------------

Message: 85
Date: Fri, 06 Jun 2008 15:00:57 -0400
From: Chuck Cleland <ccleland_at_optonline.net>
Subject: Re: [R] mean
To: Marco Chiapello <marco.chiapello_at_unito.it>
Cc: r-help_at_r-project.org
Message-ID: <48498969.8090908@optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 6/6/2008 11:43 AM, Marco Chiapello wrote:
> Hi,
> I have a simple question. If I have a table and I want to have the mean
[[elided Yahoo spam]]
> Es:
> c1 c2 c3 mean
> 1 12 13 14 ??
> 2 15 24 10 ??
> ...
>
> Thanks,
> Marco
VADeaths
      Rural Male Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0

rowMeans(VADeaths)
 50-54  55-59  60-64  65-69  70-74
11.050 16.925 25.875 40.400 60.350

You could have found rowMeans() with the following:

RSiteSearch("row means", restrict="functions")
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

------------------------------

Message: 86
Date: Fri, 6 Jun 2008 20:06:41 +0100
From: tolga.i.uzuner_at_jpmorgan.com
Subject: [R] col.names ?
To: r-help_at_r-project.org
Message-ID: <OFA4DE8424.7D3A4136-ON80257460.0068DA43-80257460.0068FBB0@jpmchase.com>
Content-Type: text/plain

Dear R Users,

A bit of an elementary question, but somehow, I haven't been able to
figure it out. I'd like to change the column names of a data frame, so I
am looking for something like col.names (as in row.names). Could someone
please show me how to change the column names of a data frame?

Thanks,
Tolga

[[alternative HTML version deleted]]

------------------------------

Message: 87
Date: Fri, 6 Jun 2008 15:11:13 -0400
From: "Thompson, David (MNR)" <David.John.Thompson_at_ontario.ca>
Subject: Re: [R] ggplot questions
To: "hadley wickham" <h.wickham_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <ECF21B71808ECF4F8918C57EDBEE121D028C2556@CTSPITDCEMMVX11.cihs.ad.gov.on.ca>
Content-Type: text/plain; charset="us-ascii"

OK,

The original ggplot() construct (below) on the following two dataframes
(test1, test2) generates different outputs, which I have attached. The
output that I expect is that shown in test2.png. My expectation is that
I have set the plotting limits with 'scale_x_continuous(lim = c(0, 360))
+ scale_y_continuous(lim = c(0, 16))', so both data sets should produce
the same output except for the 'o' at plot center and the 'N' at the
top. The only difference in the two dataframes is the inclusion of the
first two rows in test2, with the rplt column changed to character:
> test2[1:2,]
  oplt rplt  az dist
1    0    o   0    0
2    0    N 360   16

Ahhh, wait a second! In composing this message I may have found the
problem. It appears that including the 'scale_x_continuous()' component
twice in my original version was causing (?) the erratic behaviour. And
I have confirmed that the ordering of the layer, scale* and coord*
components does not affect the output. However, I'm still getting more
x-breaks than requested, with radial lines corresponding to 45, 135,
225, 315 degrees (NE, SE, SW, NW). Still open to suggestions on that.

# new version working with both dataframes
ggplot() + coord_polar() +
    layer(data = test1,
          mapping = aes(x = az, y = dist, label = rplt),
          geom = "text") +
    scale_x_continuous(lim = c(0, 360), breaks = c(90, 180, 270, 360),
                       labels = c('E', 'S', 'W', 'N')) +
    scale_y_continuous(lim = c(0, 16), breaks = c(0, 4, 8, 12, 16),
                       labels = c('centre', '4m', '8m', '12m', '16m'))

###### ###### ######

# original version NOT WORKING with test1
ggplot() + coord_polar() +
    scale_x_continuous(lim = c(0, 360)) +
    scale_y_continuous(lim = c(0, 16)) +
    layer(data = test,
          mapping = aes(x = az, y = dist, label = rplt),
          geom = "text") +
    scale_x_continuous(breaks = c(90, 180, 270, 360),
                       labels = c('90', '180', '270', '360'))

# data generating test1.png
test1 <- structure(list(oplt = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    rplt = 1:10,
    az = c(57L, 94L, 96L, 152L, 182L, 185L, 227L, 264L, 332L, 354L),
    dist = c(4.09, 2.8, 7.08, 7.09, 3.28, 7.85, 6.12, 1.97, 7.68, 7.9)),
    .Names = c("oplt", "rplt", "az", "dist"),
    row.names = c(NA, 10L), class = "data.frame")

# data generating test2.png
test2 <- structure(list(oplt = c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
    rplt = c("o", "N", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"),
    az = c(0, 360, 57, 94, 96, 152, 182, 185, 227, 264, 332, 354),
    dist = c(0, 16, 4.09, 2.8, 7.08, 7.09, 3.28, 7.85, 6.12, 1.97, 7.68, 7.9)),
    .Names = c("oplt", "rplt", "az", "dist"),
    row.names = c(NA, 12L), class = "data.frame")

Many, many thanks for your patience and perseverance on this one Hadley,
DaveT.
>-----Original Message-----
>From: hadley wickham [mailto:h.wickham_at_gmail.com]
>Sent: June 6, 2008 02:06 PM
>To: Thompson, David (MNR)
>Cc: r-help_at_r-project.org
>Subject: Re: [R] ggplot questions
>
>> Does the difference have something to do with ggplot() using ranges
>> derived from the data?
>> When I modify my original 'test' dataframe with two extra rows as
>> defined below, I get expected results in both versions.
>
>Order shouldn't matter - and if it's making a difference, that's a
>bug. But I'm still not completely sure what you're expecting.
>
>> This highlights my next question (warned you ;-) ), I have been
>> unsuccessful in trying to define fixed plotting ranges to generate a
>> 'template' graphic that I may reuse with successive 'overstory plot'
>> data sets. I have used '+ xlim(0, 360) + ylim(0, 16)' but,
>this seems to
>> not have any effect on the final plot layout.
>
>Could you please produce a small reproducible example that
>demonstrates this? It may well be a bug.
>
>Hadley
>
>--
>http://had.co.nz/
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test1.png
Type: image/png
Size: 9710 bytes
Desc: test1.png
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080606/02dbba07/attachment-0002.png>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test2.png
Type: image/png
Size: 9306 bytes
Desc: test2.png
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20080606/02dbba07/attachment-0003.png>

------------------------------

Message: 88
Date: Fri, 6 Jun 2008 14:14:49 -0500
From: "Douglas Bates" <bates_at_stat.wisc.edu>
Subject: Re: [R] mean
To: ctu_at_bigred.unl.edu
Cc: r-help_at_r-project.org
Message-ID: <40e66e0b0806061214veef7b96t8b07bae9c9a4d044@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

See also ?rowMeans

On Fri, Jun 6, 2008 at 1:56 PM, <ctu_at_bigred.unl.edu> wrote:
>> TABLE<-matrix(data=c(12,13,14,15,24,10),byrow=T,nrow=2,ncol=3)
>> TABLE
>
> [,1] [,2] [,3]
> [1,] 12 13 14
> [2,] 15 24 10
>>
>> apply(TABLE,1,mean)
>
> [1] 13.00000 16.33333
>
> Chunhao
>
>
> Quoting Marco Chiapello <marco.chiapello_at_unito.it>:
>
>> Hi,
>> I have a simple question. If I have a table and I want to have the mean
[[elided Yahoo spam]]
>> Es:
>> c1 c2 c3 mean
>> 1 12 13 14 ??
>> 2 15 24 10 ??
>> ...
>>
>> Thanks,
>> Marco
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
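Following Douglas Bates's pointer, a minimal sketch of the vectorized version, using the matrix from the thread:

TABLE <- matrix(c(12, 13, 14, 15, 24, 10), byrow = TRUE, nrow = 2, ncol = 3)

rowMeans(TABLE)                       # same values as apply(TABLE, 1, mean)
cbind(TABLE, mean = rowMeans(TABLE))  # append the row means as a new column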
------------------------------

Message: 89
Date: Fri, 6 Jun 2008 13:14:20 -0600
From: "Greg Snow" <Greg.Snow_at_imail.org>
Subject: [R] New vocabulary on a Friday afternoon. Was: Improving data processing efficiency
To: "Patrick Burns" <pburns_at_pburns.seanet.com>, "Gabor Grothendieck" <ggrothendieck_at_gmail.com>
Cc: "r-help_at_r-project.org" <r-help_at_r-project.org>
Message-ID: <B37C0A15B8FB3C468B5BC7EBC7DA14CC60F68589C6@LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii

I still like the number 4 option, so I think we need to come up with a formal definition for a "junk" of data. I read somewhere that Tukey coined the word "bit" as it applies to computers; we can share the credit/blame for "junks" of data.

My proposal for a statistical/data definition of the word junk:

Junk (noun): A quantity of data just large enough to get the client excited about the "great" dataset they provided, but not large enough to draw any useful conclusions.

Example sentence: We just received another junk of data from the boss; who gets to give him the bad news that it still does not prove his pet theory?

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow_at_imail.org
(801) 408-8111
> -----Original Message-----
> From: Patrick Burns [mailto:pburns_at_pburns.seanet.com]
> Sent: Friday, June 06, 2008 12:58 PM
> To: Gabor Grothendieck
> Cc: Greg Snow; r-help_at_r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> My guess is that number 2 is closest to the mark.
> Typing too fast is unfortunately not one of my habitual attributes.
>
> Gabor Grothendieck wrote:
> > On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow
> <Greg.Snow_at_imail.org> wrote:
> >
> >>> -----Original Message-----
> >>> From: r-help-bounces_at_r-project.org
> >>> [mailto:r-help-bounces_at_r-project.org] On Behalf Of Patrick Burns
> >>> Sent: Friday, June 06, 2008 12:04 PM
> >>> To: Daniel Folkinshteyn
> >>> Cc: r-help_at_r-project.org
> >>> Subject: Re: [R] Improving data processing efficiency
> >>>
> >>> That is going to be situation dependent, but if you have a
> >>> reasonable upper bound, then that will be much easier and not far
> >>> from optimal.
> >>>
> >>> If you pick the possibly too small route, then increasing
> the size
> >>> in largish junks is much better than adding a row at a time.
> >>>
> >> Pat,
> >>
> >> I am unfamiliar with the use of the word "junk" as a unit
> of measure for data objects. I figure there are a few
> different possibilities:
> >>
> >> 1. You are using the term intentionally meaning that you
> suggest he increases the size in terms of old cars and broken
> pianos rather than used up pens and broken pencils.
> >>
> >> 2. This was a Freudian slip based on your opinion of some
> datasets you have seen.
> >>
> >> 3. Somewhere between your mind and the final product
> "jumps/chunks" became "junks" (possibly a microsoft
> "correction", or just typing too fast combined with number 2).
> >>
> >> 4. "junks" is an official measure of data/object size that
> I need to learn more about (the history of the term possibly
> being related to 2 and 3 above).
> >>
> >>
> >
> > 5. Chinese sailing vessel.
> > http://en.wikipedia.org/wiki/Junk_(ship)
> >
> > ______________________________________________
> > R-help_at_r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
>
------------------------------

Message: 90
Date: Fri, 6 Jun 2008 14:18:52 -0500
From: "Douglas Bates" <bates_at_stat.wisc.edu>
Subject: Re: [R] R + Linux
To: "steven wilson" <swpt07_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <40e66e0b0806061218r71700a56nedb6cc610150fb49@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jun 6, 2008 at 1:13 PM, steven wilson <swpt07_at_gmail.com> wrote:
> Dear all;
>
> I'm planning to install Linux on my computer to run R (I'm bored of
> W..XP). However, I haven't used Linux before and I would appreciate,
> if possible, suggestions/comments about what could be the best option
> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
> most popular ones (at least on the R-help lists). The computer is a PC
> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
> used only to run R.
Ah yes, we haven't had a good flame war for a long time. Let's start discussing the relative merits of various Linux distributions. That should heat things up a bit.

I can only speak about Ubuntu. I have used it exclusively for several years now and find it to be superb. In my opinion it is easy to install and maintain and has very good support for R (take a bow, Dirk).

------------------------------

Message: 91
Date: Fri, 06 Jun 2008 15:25:15 -0400
From: "john.polo" <jpolo_at_mail.usf.edu>
Subject: [R] editing a data.frame
To: r-help_at_r-project.org
Message-ID: <48498F1B.5020506@mail.usf.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

dear R users,

the data frame (read in from a csv) looks like this:

     TreeTag  Census    Stage  DBH
1    CW-W740    2001 juvenile  5.8
2    CW-W739    2001 juvenile  4.3
3    CW-W738    2001 juvenile  4.7
4    CW-W737    2001 juvenile  5.4
5    CW-W736    2001 juvenile  7.4
6    CW-W735    2001 juvenile  5.4
...
1501 1.00E-20   2001    adult 32.5

i would like to change values under the TreeTag column. as the last value shows, some of the tags have decimals followed by 2 decimal places. i just want whole numbers, i.e. not 1.00E-20, but 1E-20. i have a rough understanding of regexp and grepped all the positions that have the inappropriate tags. i tried sub() a couple of different ways, like

yr1bp$TreeTag[1501] <- sub("1.00", "1", yr1bp$TreeTag[1501])

and after turning yr1bp$TreeTag[1501] into <NA>,

yr1bp$TreeTag[1501] <- sub("", "1E-20", yr1pb$TreeTag[1501])

and

sub("", "1E-20", yr1bp$TreeTag[1501])

but it's not working. i guess it has something to do with the data.frame characteristics i'm not aware of or don't understand. would i somehow have to tear apart the columns, edit them, and then put it back together? not that i know how to do that, but i'm wondering out loud.

john

------------------------------

Message: 92
Date: Fri, 6 Jun 2008 16:31:11 -0300
From: "Henrique Dallazuanna" <wwwhsd_at_gmail.com>
Subject: Re: [R] Plot matrix as many lines
To: "Alberto Monteiro" <albmont_at_centroin.com.br>
Cc: r-help_at_r-project.org
Message-ID: <da79af330806061231o5be8199aq63d1bd983b359da7@mail.gmail.com>
Content-Type: text/plain

Try this:

matplot(t(m), type = 'l', lty = 'solid', col = 'black')

On 6/6/08, Alberto Monteiro <albmont_at_centroin.com.br> wrote:
>
> Suppose that I have a matrix like:
>
> m <- rbind(c(1,2,3,4), c(2,3,2,1))
>
> Is there any way to efficiently plot the _lines_ as if
> I was doing:
>
> plot(m[1,], type="l")
> points(m[2,], type="l", col="red")
>
> (of course, in the "real world" there is much more than
> just 2 lines and 4 columns...)
>
> Alberto Monteiro
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Henrique Dallazuanna
Curitiba-Parana-Brasil
25° 25' 40" S 49° 16' 22" W

[[alternative HTML version deleted]]

------------------------------

Message: 93
Date: Fri, 6 Jun 2008 15:32:29 -0400
From: John Nolan <jpnolan_at_american.edu>
Subject: [R] calling a C function with a struct
To: r-help_at_r-project.org
Message-ID: <OFADEBCB20.A4FB3A39-ON85257460.006B3BBC-85257460.006B7096@american.edu>
Content-Type: text/plain

I am trying to call a precompiled C function that uses a struct as one of its arguments. I could write a wrapper function in C, but I was hoping there is some way to pack fields into an array of type raw that could be passed directly to the function.

Here is some more detail. The C struct is simple, but has mixed types:

struct STRUCT1 {
  long type;
  long nx;
  double *x;
  double a;
  double b;
};
typedef struct STRUCT1 STRUCT1_TYPE;

The C function header is

void func1( long method, STRUCT1 my_struct, double *output);

I would like to have an R list mimicking the C struct, and then use .C to call func1 with this information, e.g.

my.struct <- list(type = 3, nx = 5, x = 1:5, a = 2.5, b = 8.3)
my.func1( 3, convert2raw( my.struct ) )

where the R function convert2raw would return a vector of type raw with the fields of my.struct packed into memory just like STRUCT1, and then I could call func1 with that vector of raws.

Can I write a convert2raw() function and then use

my.func1 <- function( method, buf ) {
  a <- .C("func1", as.integer(method), as.raw(buf), output = double(1))
  return(a$output)
}

John

...........................................................................
John P. Nolan
Math/Stat Department
227 Gray Hall
American University
4400 Massachusetts Avenue, NW
Washington, DC 20016-8050

jpnolan_at_american.edu
202.885.3140 voice
202.885.3155 fax
http://academic2.american.edu/~jpnolan
...........................................................................

[[alternative HTML version deleted]]

------------------------------

Message: 94
Date: Fri, 6 Jun 2008 16:37:25 -0300
From: "Henrique Dallazuanna" <wwwhsd_at_gmail.com>
Subject: Re: [R] col.names ?
To: "tolga.i.uzuner_at_jpmorgan.com" <tolga.i.uzuner_at_jpmorgan.com>
Cc: r-help_at_r-project.org
Message-ID: <da79af330806061237j5578eabfq75da2146ef1b09ca@mail.gmail.com>
Content-Type: text/plain

See ?names

On 6/6/08, tolga.i.uzuner_at_jpmorgan.com <tolga.i.uzuner_at_jpmorgan.com> wrote:
>
> Dear R Users,
>
> A bit of an elementary question, but somehow, I haven't been able to
> figure it out. I'd like to changes the column names of a data frame, so I
> am looking for something like col.names (as in row.names). Could someone
> please show me how to change the column names of a data frame ?
>
> Thanks,
> Tolga
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Henrique Dallazuanna
Curitiba-Parana-Brasil
25° 25' 40" S 49° 16' 22" W

[[alternative HTML version deleted]]

------------------------------

Message: 95
Date: Fri, 06 Jun 2008 15:41:08 -0400
From: Chuck Cleland <ccleland_at_optonline.net>
Subject: Re: [R] Plot matrix as many lines
To: Alberto Monteiro <albmont_at_centroin.com.br>
Cc: r-help_at_r-project.org
Message-ID: <484992D4.3040903@optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 6/6/2008 3:01 PM, Alberto Monteiro wrote:
> Suppose that I have a matrix like:
>
> m <- rbind(c(1,2,3,4), c(2,3,2,1))
>
> Is there any way to efficiently plot the _lines_ as if
> I was doing:
>
> plot(m[1,], type="l")
> points(m[2,], type="l", col="red")
>
> (of course, in the "real world" there is much more than
> just 2 lines and 4 columns...)
m <- rbind(c(1,2,3,4), c(2,3,2,1))
matplot(t(m), type = "l")

?matplot
> Alberto Monteiro
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

------------------------------

Message: 96
Date: Fri, 06 Jun 2008 15:46:47 -0400
From: Chuck Cleland <ccleland_at_optonline.net>
Subject: Re: [R] col.names ?
To: tolga.i.uzuner_at_jpmorgan.com
Cc: r-help_at_r-project.org
Message-ID: <48499427.2060108@optonline.net>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 6/6/2008 3:06 PM, tolga.i.uzuner_at_jpmorgan.com wrote:
> Dear R Users,
>
> A bit of an elementary question, but somehow, I haven't been able to
> figure it out. I'd like to changes the column names of a data frame, so I
> am looking for something like col.names (as in row.names). Could someone
> please show me how to change the column names of a data frame ?
>
> Thanks,
> Tolga
my.iris <- iris

names(my.iris)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

names(my.iris) <- c("Length.Sepal", "Width.Sepal", "Length.Petal", "Width.Petal", "Species")

names(my.iris)
[1] "Length.Sepal" "Width.Sepal"  "Length.Petal" "Width.Petal"  "Species"
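Since the question asked for a col.names analogue: data frames also answer to colnames(), which for a data frame is equivalent to names(). A minimal sketch:

df <- data.frame(A = c(1, 2), B = c(3, 4))
colnames(df)                       # same result as names(df)
colnames(df) <- c("newA", "newB")  # rename the columns in place
df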
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

------------------------------

Message: 97
Date: Fri, 6 Jun 2008 15:52:19 -0400
From: William Pepe <williampepe_at_hotmail.com>
Subject: Re: [R] col.names ?
To: <tolga.i.uzuner_at_jpmorgan.com>, <r-help_at_r-project.org>
Message-ID: <BAY101-W486E51E4FA67FD4847E4D3B4B70@phx.gbl>
Content-Type: text/plain

As a very simple example:

TolgaData <- data.frame(A = c(1, 2), B = c(3, 4))
names(TolgaData) <- c("newA", "newB")
TolgaData

Column names should now be newA and newB.

Best,
Bill

> To: r-help_at_r-project.org
> From: tolga.i.uzuner_at_jpmorgan.com
> Date: Fri, 6 Jun 2008 20:06:41 +0100
> Subject: [R] col.names ?
>
> Dear R Users,
>
> A bit of an elementary question, but somehow, I haven't been able to
> figure it out. I'd like to change the column names of a data frame, so I
> am looking for something like col.names (as in row.names). Could someone
> please show me how to change the column names of a data frame ?
>
> Thanks,
> Tolga

[[alternative HTML version deleted]]

------------------------------

Message: 98
Date: Fri, 6 Jun 2008 20:54:05 +0100
From: tolga.i.uzuner_at_jpmorgan.com
Subject: Re: [R] col.names ?
To: William Pepe <williampepe_at_hotmail.com>
Cc: r-help_at_r-project.org, tolga.i.uzuner_at_jpmorgan.com
Message-ID: <OF1B3AFA12.6E9DEF33-ON80257460.006D4E42-80257460.006D52AB@jpmchase.com>
Content-Type: text/plain

Many thanks to everyone who replied,
Tolga

William Pepe <williampepe_at_hotmail.com>
06/06/2008 20:52

To: <tolga.i.uzuner_at_jpmorgan.com>, <r-help_at_r-project.org>
Subject: RE: [R] col.names ?

As a very simple example:

TolgaData <- data.frame(A = c(1, 2), B = c(3, 4))
names(TolgaData) <- c("newA", "newB")
> TolgaData
Column names should now be newA and newB.

Best,
Bill
> To: r-help@r-project.org
> From: tolga.i.uzuner_at_jpmorgan.com
> Date: Fri, 6 Jun 2008 20:06:41 +0100
> Subject: [R] col.names ?
>
> Dear R Users,
>
> A bit of an elementary question, but somehow, I haven't been able to
> figure it out. I'd like to change the column names of a data frame, so I
> am looking for something like col.names (as in row.names). Could someone
> please show me how to change the column names of a data frame ?
>
> Thanks,
> Tolga
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
[[alternative HTML version deleted]]

------------------------------

Message: 99
Date: Fri, 6 Jun 2008 15:57:37 -0400
From: "Jorge Ivan Velez" <jorgeivanvelez_at_gmail.com>
Subject: Re: [R] Subsetting to unique values
To: "Emslie, Paul [Ctr]" <emsliep_at_atac.mil>
Cc: r-help_at_r-project.org
Message-ID: <317737de0806061257n197e2f1aq871cac79945737ca@mail.gmail.com>
Content-Type: text/plain

Dear Paul,

Try also:

ddTable <- data.frame(Id = c(1, 1, 2, 2), name = c("Paul", "Joe", "Bob", "Larry"))
ddTable[unique(ddTable$Id), ]

HTH,
Jorge

On Fri, Jun 6, 2008 at 9:35 AM, Emslie, Paul [Ctr] <emsliep_at_atac.mil> wrote:
> I want to take the first row of each unique ID value from a data frame.
> For instance
> > ddTable <-
> data.frame(Id=c(1,1,2,2),name=c("Paul","Joe","Bob","Larry"))
>
> I want a dataset that is
> Id Name
> 1 Paul
> 2 Bob
>
> > unique(ddTable)
> Will give me all 4 rows, and
> > unique(ddTable$Id)
> Will give me c(1,2), but not accompanied by the name column.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
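A caveat on the snippet above: ddTable[unique(ddTable$Id), ] indexes by row position, so it only happens to work here because the Ids 1 and 2 coincide with the row numbers of their first occurrences. A sketch of an idiom that keys on the values themselves:

ddTable <- data.frame(Id = c(1, 1, 2, 2), name = c("Paul", "Joe", "Bob", "Larry"))

# !duplicated() flags the first occurrence of each Id, wherever it sits
ddTable[!duplicated(ddTable$Id), ]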
[[alternative HTML version deleted]]

------------------------------

Message: 100
Date: Fri, 06 Jun 2008 16:04:24 -0400
From: Duncan Murdoch <murdoch_at_stats.uwo.ca>
Subject: Re: [R] calling a C function with a struct
To: John Nolan <jpnolan_at_american.edu>
Cc: r-help_at_r-project.org
Message-ID: <48499848.8010100@stats.uwo.ca>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

John Nolan wrote:
> I am trying to call a precompiled C function that uses a struct as one of
> its arguments.
> I could write a wrapper function in C, but I was hoping there is some way
> to
> pack fields into an array of type raw that could be passed directly to the
> function.
>
> Here is some more detail. The C struct is simple, but has mixed types:
>
> struct STRUCT1 {
> long type;
> long nx;
> double *x;
> double a;
> double b;
> };
> typedef struct STRUCT1 STRUCT1_TYPE;
>
> The C function header is
>
> void func1( long method, STRUCT1 my_struct, double *output);
>
> I would like to have an R list mimicking the C struct,
> and then use .C to call func1 with this information, e.g.
>
> my.struct <- list(type=3,nx=5,x=1:5,a=2.5,b=8.3)
> my.func1( 3, convert2raw( my.struct ), )
>
It might be possible, but it would be quite tricky, and I'd guess the "double *x" would be just about impossible. R has no way to see C level pointers. Just write the wrapper in C, it will be easier than writing one in R. Duncan Murdoch
> where R function convert2raw would return a vector of type raw with
> the fields of my.struct packed into memory just like STRUCT1, and then
> I could call func1 with that vector of raws.
>
> Can I write a convert2raw( ) function and then use
> my.func1 <- function( method, buf ) {
> a <- .C("func1", as.integer(method), as.raw(buf) , output=double(1)
> )
> return(a$output)
> }
>
>
> John
>
> ...........................................................................
>
> John P. Nolan
> Math/Stat Department
> 227 Gray Hall
> American University
> 4400 Massachusetts Avenue, NW
> Washington, DC 20016-8050
>
> jpnolan_at_american.edu
> 202.885.3140 voice
> 202.885.3155 fax
> http://academic2.american.edu/~jpnolan
>
> ...........................................................................
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
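To make the suggested C wrapper concrete, a hedged sketch (the shim name func1_wrap and its layout are invented for illustration, not from the thread). The shim unpacks flat arguments into STRUCT1 on the C side; the R side then passes plain vectors via .C:

# Hypothetical C shim, compiled into the shared library (sketched here as
# a comment only). .C passes int*, so the shim takes int and widens to long:
#
#   void func1_wrap(int *method, int *type, int *nx, double *x,
#                   double *a, double *b, double *output) {
#     STRUCT1_TYPE s;
#     s.type = *type; s.nx = *nx; s.x = x; s.a = *a; s.b = *b;
#     func1(*method, s, output);
#   }

my.func1 <- function(method, s) {
  r <- .C("func1_wrap",
          as.integer(method),
          as.integer(s$type),
          as.integer(s$nx),
          as.double(s$x),
          as.double(s$a),
          as.double(s$b),
          output = double(1))
  r$output
}

my.struct <- list(type = 3, nx = 5, x = 1:5, a = 2.5, b = 8.3)
# my.func1(3, my.struct)   # after dyn.load()-ing the compiled library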
------------------------------

Message: 101
Date: Fri, 06 Jun 2008 16:18:23 -0400
From: "Kevin E. Thorpe" <kevin.thorpe_at_utoronto.ca>
Subject: Re: [R] R + Linux
To: Douglas Bates <bates_at_stat.wisc.edu>
Cc: r-help_at_r-project.org, steven wilson <swpt07_at_gmail.com>
Message-ID: <48499B8F.10808@utoronto.ca>
Content-Type: text/plain; charset=ISO-8859-1

Any of the three distros mentioned are sure to be fine. Personally, I find the sysadmin tool in opensuse to be fantastic for a novice.

It comes down to preference. Try some live versions of the distros to see what you like best.

Douglas Bates wrote:
> On Fri, Jun 6, 2008 at 1:13 PM, steven wilson <swpt07@gmail.com> wrote:
>> Dear all;
>
>> I'm planning to install Linux on my computer to run R (I'm bored of
>> W..XP). However, I haven't used Linux before and I would appreciate,
>> if possible, suggestions/comments about what could be the best option
>> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
>> most popular ones (at least on the R-help lists). The computer is a PC
>> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
>> used only to run R.
>
> Ah yes, we haven't had a good flame war for a long time. Let's start
> discussing the relative merits of various Linux distributions. That
> should heat things up a bit.
>
> I can only speak about Ubuntu. I have used it exclusively for several
> years now and find it to be superb. In my opinion it is easy to
> install and maintain and has very good support for R (take a bow,
> Dirk).
>
--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: kevin.thorpe_at_utoronto.ca
Tel: 416.864.5776  Fax: 416.864.6057

------------------------------

Message: 102
Date: Fri, 06 Jun 2008 16:36:34 -0400
From: Markus Jäntti <mjantti_at_abo.fi>
Subject: Re: [R] R + Linux
To: steven wilson <swpt07_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <1212784594.7023.61.camel@hades>
Content-Type: text/plain

I have Debian, Ubuntu, RedHat and CentOS systems, and primarily run R on the Debian and RedHat machines. I have encountered few problems running R on RedHat/CentOS, but I do think the Debian/Ubuntu package management system, combined with the kind provision of packages, makes life a lot simpler. (Yes, many thanks to Dirk!)

Also, the ease of installing and maintaining, along with the highly useful user forums of Ubuntu, would lead me to recommend that particular distribution.

Regards,

Markus

On Fri, 2008-06-06 at 14:13 -0400, steven wilson wrote:
> Dear all;
>
> I'm planning to install Linux on my computer to run R (I'm bored of
> W..XP). However, I haven't used Linux before and I would appreciate,
> if possible, suggestions/comments about what could be the best option
> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
> most popular ones (at least on the R-help lists). The computer is a PC
> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
> used only to run R.
>
> Thanks
> Steven
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Markus Jantti
Abo Akademi University
mjantti_at_abo.fi
http://www.iki.fi/~mjantti

------------------------------

Message: 103
Date: Fri, 06 Jun 2008 16:37:28 -0400
From: Michael Friendly <friendly_at_yorku.ca>
Subject: [R] color scale mapped to B/W
To: R-Help <r-help_at_stat.math.ethz.ch>
Message-ID: <4849A008.2000801@yorku.ca>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

In an R graphic, I'm using

cond.col <- c("green", "yellow", "red")

to represent a quantitative variable, where green means 'OK', yellow represents 'warning' and red represents 'danger'. Using these particular color names, in B/W, red is darkest and yellow is lightest. I'd like to find color designations to replace yellow and green so that when printed in B/W, the yellowish color appears darker than the greenish one.

Is there some tool/code I can use to find these? i.e., something to display a grid of color swatches with color codes/names I can look at in color and B/W to decide?

thanks,

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249  Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA

------------------------------

Message: 104
Date: Fri, 6 Jun 2008 13:45:11 -0700 (PDT)
From: Yasir Kaheil <kaheil_at_gmail.com>
Subject: Re: [R] Random Forest
To: r-help_at_r-project.org
Message-ID: <17700817.post@talk.nabble.com>
Content-Type: text/plain; charset=us-ascii

hi there:

please refer to:
http://www.math.usu.edu/~adele/forests/cc_home.htm
and
http://www.math.usu.edu/~minnotte/S5600S07/R17.txt

thanks

BertrandM wrote:
>
> Hello
>
> Does there exist a package for multivariate random forest, namely for
> multivariate response data ? It seems to be impossible with the
> "randomForest" function and I did not find any information about this
> in the help pages ...
> Thank you for your help
>
> Bertrand
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
-----
Yasir H. Kaheil
Catchment Research Facility
The University of Western Ontario

--
View this message in context: http://www.nabble.com/Random-Forest-tp17698842p17700817.html
Sent from the R help mailing list archive at Nabble.com.

------------------------------

Message: 105
Date: Fri, 06 Jun 2008 16:49:24 -0400
From: Roland Rau <roland.rproject_at_gmail.com>
Subject: Re: [R] R + Linux
To: steven wilson <swpt07_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <4849A2D4.5090206@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear all,

a related follow-up -- with the hope for some feedback from the specialists. Is the following general advice justified?

=========================================================
If one has no more than 4GB RAM and one wants to run primarily R on one's Linux machine, it is a good idea to install the 32-bit version of the operating system.

The reasons are: the machine has 4GB RAM, which implies that the 32-bit version can (theoretically) use the whole available memory address space, so the advantage of addressing more memory with 64 bits is lost on a 4GB computer. Furthermore, 64-bit builds often run slower than 32-bit ones (see Section 8 of the R Admin Manual) due to the larger pointer size.
=========================================================

Thanks,
Roland

steven wilson wrote:
> Dear all;
>
> I'm planning to install Linux on my computer to run R (I'm bored of
> W..XP). However, I haven't used Linux before and I would appreciate,
> if possible, suggestions/comments about what could be the best option
> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
> most popular ones (at least on the R-help lists). The computer is a PC
> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
> used only to run R.
>
> Thanks
> Steven
>
------------------------------

Message: 106
Date: Fri, 6 Jun 2008 15:52:40 -0500
From: Dirk Eddelbuettel <edd_at_debian.org>
Subject: Re: [R] R + Linux
To: "Kevin E. Thorpe" <kevin.thorpe_at_utoronto.ca>
Cc: r-help_at_r-project.org, Douglas Bates <bates_at_stat.wisc.edu>, steven wilson <swpt07_at_gmail.com>
Message-ID: <18505.41880.598299.374536@ron.nulle.part>
Content-Type: text/plain; charset=us-ascii

On 6 June 2008 at 16:18, Kevin E. Thorpe wrote:
| Any of the three distros mentioned are sure to be fine.
| Personally, I find the sysadmin tool in opensuse to be
| fantastic for a novice.
|
| It comes down to preference. Try some live versions of the distros to
| see what you like best.

While that is certainly true, there is a difference that doesn't get mentioned as much. On Debian and Ubuntu you also get numerous add-ons and extensions that the other distros may not have, such as

- 60+ packages from CRAN already in the distro
- the ESS emacs add-on
- out-of-the-box RPy (R/Python) support
- Ggobi and rggobi for visualization
- Rkward as a friendly GUI
- bindings from R to Shogun for data mining
- littler for scripting
- out-of-the-box Rmpi / Open MPI support

and a few things I am probably forgetting.

As you say, preferences. The ease of installation with Ubuntu (and also with the more recent Debian installers) coupled with the richer set of packages tilts this at least in my book. But that's just my $0.02.

Dirk

--
Three out of two people have difficulties with fractions.

------------------------------

Message: 107
Date: Fri, 6 Jun 2008 22:09:10 +0100 (BST)
From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Subject: Re: [R] R + Linux
To: Roland Rau <roland.rproject_at_gmail.com>
Cc: r-help_at_r-project.org, steven wilson <swpt07_at_gmail.com>
Message-ID: <alpine.LFD.1.10.0806062159020.18018@gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

This is not sound advice. For 1GB yes, perhaps 2GB. Beyond that the extra freedom in the address space of a 64-bit system pays off. The user address space of a 32-bit Linux system is (in the examples I have seen) 3 to 3.5GB. See ?"Memory-limits" for why that is restrictive.

There are some anomalies, depending on the CPU. On Intel Core 2 Duos manipulating 64-bit pointers seems to be as efficient as 32-bit ones, and on some platforms (e.g. Mac OS 10.5.3) 64-bit R is actually faster than 32-bit R. So very similar CPUs can give quite different performance differences with 32- vs 64-bit R.

On Fri, 6 Jun 2008, Roland Rau wrote:
> Dear all,
>
> a related follow up -- with the hope for some feedback from the specialists.
> Is the following general advice justified:
> =========================================================
> If one has not more than 4GB RAM and one wants to run primarily R on one's
> Linux machine, it is a good idea to install the 32bit version of the
> operating system.
> The reasons are:
> The machine has 4GB RAM which implies that the 32bit version can
> (theoretically) use the whole available memory address space. The advantage
> of addressing more memory using 64bit is in this instance of a 4GB computer
> lost. Furthermore, 64bit often runs slower than 32bit (see Section 8 of R
> Admin Manual) due to the larger pointer size.
> =========================================================
>
> Thanks,
> Roland
>
>
> steven wilson wrote:
>> Dear all;
>>
>> I'm planning to install Linux on my computer to run R (I'm bored of
>> W..XP). However, I haven't used Linux before and I would appreciate,
>> if possible, suggestions/comments about what could be the best option
>> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
>> most popular ones (at least on the R-help lists). The computer is a PC
>> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
>> used only to run R.
>>
>> Thanks
>> Steven
>>
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

------------------------------

Message: 108
Date: Fri, 06 Jun 2008 17:11:00 -0400
From: Esmail Bonakdarian <esmail.js_at_gmail.com>
Subject: Re: [R] R + Linux
To: steven wilson <swpt07_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <4849A7E4.5040006@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

steven wilson wrote:
>
> I'm planning to install Linux on my computer to run R (I'm bored of
> W..XP). However, I haven't used Linux before and I would appreciate,
> if possible, suggestions/comments about what could be the best option
> install,
Hi,

I have used Linux since the early 1990s, starting with the original Slackware distribution, followed by various versions of Red Hat, Gentoo (compiled from source), Fedora and now Ubuntu.

Ubuntu is my choice for having the least troublesome install and maintenance. It has a very nice package manager, and if your goal is to *use* a Linux system rather than tinker with it, you could do much worse than Ubuntu. I installed R via the package manager a month ago or so; very easy and trouble-free.

Hope that helps,
Esmail

------------------------------

Message: 109
Date: Fri, 06 Jun 2008 17:22:01 -0400
From: Abhijit Dasgupta <adasgupt_at_mail.jci.tju.edu>
Subject: Re: [R] R + Linux
To: Markus Jäntti <mjantti_at_abo.fi>
Cc: r-help_at_r-project.org, steven wilson <swpt07_at_gmail.com>
Message-ID: <4849AA79.6040507@mail.jci.tju.edu>
Content-Type: text/plain

I've had R on an Ubuntu system for about 18 months now, and getting R up and running was a breeze. (I didn't realize it earlier, but Dirk certainly gets my vote of thanks for his efforts in making this process as easy as it is.) Especially in terms of dependencies and the like, the Ubuntu packaging system has made things especially easy. I've also had the experience of installing R on a RedHat Enterprise system on a new server at university, and the dependency issues were much more problematic (albeit I wasn't allowed to use yum because of the way our IT people had set it up), especially at the compiler level. Just my limited experience in this area. In any case, I'm not going back to Windows now if not forced; I've been quite happy with my experience in the Linux world.

Abhijit

Markus Jäntti wrote:
> I have both Debian, Ubuntu, RedHat and CentOS systems, and primary run R
> on the Debian and RedHat machines. I have encountered few problems
> running R on RedHat/CentOS, but I do think the Debian/Ubuntu package
> management system, combined with the kind provision of packages, makes
> life a lot simpler. (Yes, many thanks to Dirk!).
>
> Also, the ease of installing and maintaining among with the highly
> useful user forums of Ubuntu would lead me to recommend that particular
> distribution.
>
> Regards,
>
> Markus
>
> On Fri, 2008-06-06 at 14:13 -0400, steven wilson wrote:
>
>> Dear all;
>>
>> I'm planning to install Linux on my computer to run R (I'm bored of
>> W..XP). However, I haven't used Linux before and I would appreciate,
>> if possible, suggestions/comments about what could be the best option
>> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
>> most popular ones (at least on the R-help lists). The computer is a PC
>> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
>> used only to run R.
>>
>> Thanks
>> Steven
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
[[alternative HTML version deleted]]

------------------------------

Message: 110
Date: Fri, 06 Jun 2008 17:31:53 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] editing a data.frame
To: "john.polo" <jpolo_at_mail.usf.edu>
Cc: r-help_at_r-project.org
Message-ID: <4849ACC9.7050304@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

works for me:

> sub('1.00', '1', '1.00E-20')
[1] "1E-20"

remember, according to ?sub, it's sub(pattern, repl, string). try doing it step by step. first, see what yr1bp$TreeTag[1501] is. then, if it's the right data item, see what the output of sub("1.00", "1", yr1bp$TreeTag[1501]) is. that'll let you figure out where the problem lies.

finally, if all your target strings are of the form 1.00E-20, you could sub the whole thing with a more general regexp:

sub("([0-9])(\\.[0-9]{2})(.*)", "\\1\\3", yourvector)

(it matches a digit, followed by a dot and two digits, followed by "anything else", and takes out the "dot and two digits" bit in the replacement, in the whole vector.)

on 06/06/2008 03:25 PM john.polo said the following:
> dear R users,
>
> the data frame (read in from a csv) looks like this:
> TreeTag Census Stage DBH
> 1 CW-W740 2001 juvenile 5.8
> 2 CW-W739 2001 juvenile 4.3
> 3 CW-W738 2001 juvenile 4.7
> 4 CW-W737 2001 juvenile 5.4
> 5 CW-W736 2001 juvenile 7.4
> 6 CW-W735 2001 juvenile 5.4
> ...
> 1501 1.00E-20 2001 adult 32.5
>
> i would like to change values under the TreeTag column. as the last
> value shows, some of the tags have decimals followed by 2 decimal
> places. i just want whole numbers, i.e. not 1.00E-20, but 1E-20. i have
> a rough understanding of regexp and grepped all the positions that have
> the inappropriate tags. i tried sub() a couple of different ways, like
> yr1bp$TreeTag[1501]<-sub("1.00", "1", yr1bp$TreeTag[1501])
> and after turning yr1bp$TreeTag[1501] into <NA>,
> yr1bp$TreeTag[1501]<-sub("", "1E-20", yr1pb$TreeTag[1501])
> and
> sub("", "1E-20", yr1bp$TreeTag[1501])
> but it's not working. i guess it has something to do with the data.frame
> characteristics i'm not aware of or don't understand. would i somehow
> have to tear apart the columns, edit them, and then put it back
> together? not that i know how to do that, but i'm wondering out loud.
>
> john
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
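One detail worth adding to the step-by-step advice above (a sketch, not from the thread): in sub("1.00", "1", x) the pattern is a regular expression, so the unescaped "." matches any character. Passing fixed = TRUE, or escaping the dot, removes the ambiguity. With some illustrative tags:

tags <- c("CW-W740", "1.00E-20", "12.00E-05")

# treat the pattern as a literal string, so "." cannot match any character
sub("1.00", "1", tags, fixed = TRUE)
# [1] "CW-W740"   "1E-20"     "12.00E-05"

# or strip the ".00" mantissa from any tag of this shape
sub("([0-9])\\.[0-9]{2}(E.*)", "\\1\\2", tags)
# [1] "CW-W740" "1E-20"   "12E-05"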
------------------------------

Message: 111
Date: Fri, 06 Jun 2008 17:34:07 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] R + Linux
To: adasgupt_at_mail.jci.tju.edu
Cc: Markus Jäntti <mjantti_at_abo.fi>, r-help_at_r-project.org, steven wilson <swpt07_at_gmail.com>
Message-ID: <4849AD4F.8040409@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

another vote for ubuntu here - works for me, and quite trouble-free. add the r-project repositories, and you're sure to always have the latest, too. (if you don't care for the latest R, you can of course also just get R from the distro's repos as well)

on 06/06/2008 05:22 PM Abhijit Dasgupta said the following:
> I've had R on an Ubuntu system for about 18 months now, and getting R
> up and running was a breeze. (I didn't realize it earlier, but Dirk
> certainly gets my vote of thanks for his efforts in making this process
> as easy as it is). Specially in terms of dependencies and the like, the
> Ubuntu packaging system has made things specially easy. I've also had
> the experience of installing R on a RedHat Enterprise System on a new
> server at university, and the dependencies issues was much more
> problematic (albeit, I wasn't allowed to use yum because of the way our
> IT people had set it up), specially at the compiler level. Just my
> limited experience in this area. In any case, I'm not going back to
> Windows now if not forced; I've been quite happy with my experience in
> the Linux world.
>
> Abhijit
>
>> Markus Jäntti wrote:
>> I have both Debian, Ubuntu, RedHat and CentOS systems, and primary run R
>> on the Debian and RedHat machines. I have encountered few problems
>> running R on RedHat/CentOS, but I do think the Debian/Ubuntu package
>> management system, combined with the kind provision of packages, makes
>> life a lot simpler. (Yes, many thanks to Dirk!).
>>
>> Also, the ease of installing and maintaining among with the highly
>> useful user forums of Ubuntu would lead me to recommend that particular
>> distribution.
>>
>> Regards,
>>
>> Markus
>>
>> On Fri, 2008-06-06 at 14:13 -0400, steven wilson wrote:
>>
>>> Dear all;
>>>
>>> I'm planning to install Linux on my computer to run R (I'm bored of
>>> W..XP). However, I haven't used Linux before and I would appreciate,
>>> if possible, suggestions/comments about what could be the best option
>>> install, say Fedora, Ubuntu or OpenSuse which to my impression are the
>>> most popular ones (at least on the R-help lists). The computer is a PC
>>> desktop with 4GB RAM and Intel Quad-Core Xeon processor and will be
>>> used only to run R.
>>>
>>> Thanks
>>> Steven
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>
> [[alternative HTML version deleted]]
>
>
>
> ------------------------------------------------------------------------
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
------------------------------

Message: 112
Date: Fri, 6 Jun 2008 17:55:17 -0400
From: Jonathan Baron <baron_at_psych.upenn.edu>
Subject: Re: [R] R + Linux
To: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Cc: Markus Jäntti <mjantti_at_abo.fi>, r-help_at_r-project.org, steven wilson <swpt07_at_gmail.com>
Message-ID: <20080606215517.GA4326@psych.upenn.edu>
Content-Type: text/plain; charset=us-ascii

R works just fine on Fedora 9.

------------------------------

Message: 113
Date: Fri, 06 Jun 2008 18:10:25 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Patrick Burns <pburns_at_pburns.seanet.com>
Cc: r-help_at_r-project.org
Message-ID: <4849B5D1.4070202@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hmm... ok... so i ran the code twice - once with a preallocated result, assigning rows to it, and once with a nrow=0 result, rbinding rows to it, for the first 20 quarters. There was no speedup. In fact, running with a preallocated result matrix was slower than rbinding to the matrix:

for preallocated matrix: Time difference of 1.577779 mins
for rbinding:            Time difference of 1.498628 mins

(the time difference only counts from the start of the loop til the end, so the time to allocate the empty matrix was /not/ included in the time count.)

So, it appears that rbinding a matrix is not the bottleneck. That it was actually faster than assigning rows could have been a random anomaly (e.g. some other process eating a bit of cpu during the run?), or not - at any rate, it doesn't make an /appreciable/ difference.

Any other suggestions? :)

on 06/06/2008 02:03 PM Patrick Burns said the following:
> That is going to be situation dependent, but if you
> have a reasonable upper bound, then that will be
> much easier and not far from optimal.
>
> If you pick the possibly too small route, then increasing
> the size in largish junks is much better than adding
> a row at a time.
>
> Pat
>
> Daniel Folkinshteyn wrote:
>> thanks for the tip! i'll try that and see how big of a difference that
>> makes... if i am not sure what exactly the size will be, am i better
>> off making it larger, and then later stripping off the blank rows, or
>> making it smaller, and appending the missing rows?
>>
>> on 06/06/2008 11:44 AM Patrick Burns said the following:
>>> One thing that is likely to speed the code significantly
>>> is if you create 'result' to be its final size and then
>>> subscript into it. Something like:
>>>
>>> result[i, ] <- bestpeer
>>>
>>> (though I'm not sure if 'i' is the proper index).
>>>
>>> Patrick Burns
>>> patrick_at_burns-stat.com
>>> +44 (0)20 8525 0696
>>> http://www.burns-stat.com
>>> (home of S Poetry and "A Guide for the Unwilling S User")
>>>
>>> Daniel Folkinshteyn wrote:
>>>> Anybody have any thoughts on this? Please? :)
>>>>
>>>> on 06/05/2008 02:09 PM Daniel Folkinshteyn said the following:
>>>>> Hi everyone!
>>>>>
>>>>> I have a question about data processing efficiency.
>>>>>
>>>>> My data are as follows: I have a data set on quarterly
>>>>> institutional ownership of equities; some of them have had recent
>>>>> IPOs, some have not (I have a binary flag set). The total dataset
>>>>> size is 700k+ rows.
>>>>>
>>>>> My goal is this: For every quarter since issue for each IPO, I need
>>>>> to find a "matched" firm in the same industry, and close in market
>>>>> cap. So, e.g., for firm X, which had an IPO, i need to find a
>>>>> matched non-issuing firm in quarter 1 since IPO, then a (possibly
>>>>> different) non-issuing firm in quarter 2 since IPO, etc. Repeat for
>>>>> each issuing firm (there are about 8300 of these).
>>>>>
>>>>> Thus it seems to me that I need to be doing a lot of data selection
>>>>> and subsetting, and looping (yikes!), but the result appears to be
>>>>> highly inefficient and takes ages (well, many hours). What I am
>>>>> doing, in pseudocode, is this:
>>>>>
>>>>> 1. for each quarter of data, getting out all the IPOs and all the
>>>>> eligible non-issuing firms.
>>>>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>>>>> industry, sort them by size, and finally grab a matching firm
>>>>> closest in size (the exact procedure is to grab the closest bigger
>>>>> firm if one exists, and just the biggest available if all are smaller)
>>>>> 3. assign the matched firm-observation the same "quarters since
>>>>> issue" as the IPO being matched
>>>>> 4. rbind them all into the "matching" dataset.
>>>>>
>>>>> The function I currently have is pasted below, for your reference.
>>>>> Is there any way to make it produce the same result but much
>>>>> faster? Specifically, I am guessing eliminating some loops would be
>>>>> very good, but I don't see how, since I need to do some fancy
>>>>> footwork for each IPO in each quarter to find the matching firm.
>>>>> I'll be doing a few things similar to this, so it's somewhat
>>>>> important to up the efficiency of this. Maybe some of you R-fu
>>>>> masters can clue me in? :)
>>>>>
>>>>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>>>>
>>>>> ========== my function below ===========
>>>>>
>>>>> fcn_create_nonissuing_match_by_quarterssinceissue =
>>>>> function(tfdata, quarters_since_issue=40) {
>>>>>
>>>>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix
>>>>> is cheaper, so typecast the result to matrix
>>>>>
>>>>> colnames = names(tfdata)
>>>>>
>>>>> quarterends = sort(unique(tfdata$DATE))
>>>>>
>>>>> for (aquarter in quarterends) {
>>>>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>>>>
>>>>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>>>>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue)
>>>>> & (tfdata_quarter$IPO.Flag == 0), ]
>>>>> tfdata_quarter_ipoissuers = tfdata_quarter[
>>>>> tfdata_quarter$IPO.Flag == 1, ]
>>>>>
>>>>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>>>>> arow = tfdata_quarter_ipoissuers[i,]
>>>>> industrypeers = tfdata_quarter_fitting_nonissuers[
>>>>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>>>>> industrypeers = industrypeers[
>>>>> order(industrypeers$Market.Cap.13f), ]
>>>>> if ( nrow(industrypeers) > 0 ) {
>>>>> if (
>>>>> nrow(industrypeers[industrypeers$Market.Cap.13f >=
>>>>> arow$Market.Cap.13f, ]) > 0 ) {
>>>>> bestpeer =
>>>>> industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f,
>>>>> ][1,]
>>>>> }
>>>>> else {
>>>>> bestpeer = industrypeers[nrow(industrypeers),]
>>>>> }
>>>>> bestpeer$Quarters.Since.IPO.Issue =
>>>>> arow$Quarters.Since.IPO.Issue
>>>>>
>>>>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>>>>> bestpeer$PERMNO] = 1
>>>>> result = rbind(result, as.matrix(bestpeer))
>>>>> }
>>>>> }
>>>>> #result = rbind(result, tfdata_quarter)
>>>>> print (aquarter)
>>>>> }
>>>>>
>>>>> result = as.data.frame(result)
>>>>> names(result) = colnames
>>>>> return(result)
>>>>>
>>>>> }
>>>>>
>>>>> ========= end of my function =============
>>>>>
>>>>
>>>> ______________________________________________
>>>> R-help_at_r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
------------------------------

Message: 114
Date: Fri, 06 Jun 2008 18:15:23 -0400
From: Esmail Bonakdarian <esmail.js_at_gmail.com>
Subject: Re: [R] R + Linux
To: steven wilson <swpt07_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <4849B6FB.2020902@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

FWIW, those who are curious about Linux but are not willing or ready to
abandon the Windows platform can now very easily try out Ubuntu without
having to repartition their hard drive.

Wubi is a project that installs Ubuntu under Windows so that it can be
uninstalled easily and requires no messing around with hard drive
partitions. From the Wubi web site:

"Wubi is an officially supported Ubuntu installer for Windows users that
can bring you to the Linux world with a single click. Wubi allows you to
install and uninstall Ubuntu as any other Windows application, in a
simple and safe way. Are you curious about Linux and Ubuntu? Trying them
out has never been easier!"

For more information see: http://wubi-installer.org/

Esmail

------------------------------

Message: 115
Date: Fri, 6 Jun 2008 15:19:55 -0700
From: Horace Tso <Horace.Tso_at_pgn.com>
Subject: [R] FW: R + Linux
To: "r-help_at_r-project.org" <r-help_at_r-project.org>
Message-ID: <D49782AAF0ACCD4B836AA8D7D40BF417C30FEDE68F@APEXMAIL.corp.dom>
Content-Type: text/plain; charset="us-ascii"

I'll add my $0.02 as I've just gone through a (painful) transition to
Linux. In my case Ubuntu didn't quite work, for reasons I'm still not
sure of (it must be a hardware + driver issue). I eventually put on
openSUSE 10.3 and installed R from an rpm package on the command line.

Getting R in was not simple. I got errors complaining about not finding
BLAS, and a couple of other things I forget. And you can't install
packages with install.packages() at the prompt; I had to download them
as tar.gz files and then install. However, once the preliminaries are
out of the way it seems to work just fine. I use RKWard because Tinn-R
is not available on Linux, and that works just fine, so far.

H

(Sorry folks.....it's Friday afternoon....)

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of steven wilson
Sent: Friday, June 06, 2008 11:14 AM
To: r-help_at_r-project.org
Subject: [R] R + Linux

Dear all;

I'm planning to install Linux on my computer to run R (I'm bored of
W..XP). However, I haven't used Linux before and I would appreciate, if
possible, suggestions/comments about which could be the best option to
install, say Fedora, Ubuntu or openSUSE, which to my impression are the
most popular ones (at least on the R-help lists). The computer is a PC
desktop with 4GB RAM and an Intel Quad-Core Xeon processor and will be
used only to run R.

Thanks
Steven

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

------------------------------

Message: 116
Date: Fri, 6 Jun 2008 15:48:10 -0700
From: Don MacQueen <macq_at_llnl.gov>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins_at_gmail.com>, r-help_at_r-project.org
Message-ID: <p0623090ec46f6b55f10e@[128.115.92.33]>
Content-Type: text/plain; charset="us-ascii" ; format="flowed"

In a case like this, if you can possibly work with matrices instead of
data frames, you might get significant speedup.
(More accurately, I have had situations where I obtained a speedup by
working with matrices instead of data frames.) Even if you have to code
character columns as numeric, it can be worth it. Data frames have
overhead that matrices do not. (Here's where profiling might have given
a clue.) Granted, there has been recent work in reducing the overhead
associated with data frames, but I think it's worth a try. Carrying
along extra columns and doing row subsetting, rbinding, etc., means a
lot more things happening in memory.

So, for example, if all of your matching is based on just a few columns,
extract those columns, convert them to a matrix, do all the matching,
and then, based on some sort of row index, retrieve all of the
associated columns.

-Don

At 2:09 PM -0400 6/5/08, Daniel Folkinshteyn wrote:
>Hi everyone!
>
>I have a question about data processing efficiency.
>
>My data are as follows: I have a data set on quarterly institutional
>ownership of equities; some of them have had recent IPOs, some have
>not (I have a binary flag set). The total dataset size is 700k+ rows.
>
>My goal is this: For every quarter since issue for each IPO, I need
>to find a "matched" firm in the same industry, and close in market
>cap. So, e.g., for firm X, which had an IPO, i need to find a
>matched non-issuing firm in quarter 1 since IPO, then a (possibly
>different) non-issuing firm in quarter 2 since IPO, etc. Repeat for
>each issuing firm (there are about 8300 of these).
>
>Thus it seems to me that I need to be doing a lot of data selection
>and subsetting, and looping (yikes!), but the result appears to be
>highly inefficient and takes ages (well, many hours). What I am
>doing, in pseudocode, is this:
>
>1. for each quarter of data, getting out all the IPOs and all the
>eligible non-issuing firms.
>2. for each IPO in a quarter, grab all the non-issuers in the same
>industry, sort them by size, and finally grab a matching firm
>closest in size (the exact procedure is to grab the closest bigger
>firm if one exists, and just the biggest available if all are
>smaller)
>3. assign the matched firm-observation the same "quarters since
>issue" as the IPO being matched
>4. rbind them all into the "matching" dataset.
>
>The function I currently have is pasted below, for your reference.
>Is there any way to make it produce the same result but much faster?
>Specifically, I am guessing eliminating some loops would be very
>good, but I don't see how, since I need to do some fancy footwork
>for each IPO in each quarter to find the matching firm. I'll be
>doing a few things similar to this, so it's somewhat important to up
>the efficiency of this. Maybe some of you R-fu masters can clue me
>in? :)
>
>I would appreciate any help, tips, tricks, tweaks, you name it! :)
>
>========== my function below ===========
>
>fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>quarters_since_issue=40) {
>
> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>cheaper, so typecast the result to matrix
>
> colnames = names(tfdata)
>
> quarterends = sort(unique(tfdata$DATE))
>
> for (aquarter in quarterends) {
> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>
> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>(tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue)
>& (tfdata_quarter$IPO.Flag == 0), ]
> tfdata_quarter_ipoissuers = tfdata_quarter[
>tfdata_quarter$IPO.Flag == 1, ]
>
> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
> arow = tfdata_quarter_ipoissuers[i,]
> industrypeers = tfdata_quarter_fitting_nonissuers[
>tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
> industrypeers = industrypeers[
>order(industrypeers$Market.Cap.13f), ]
> if ( nrow(industrypeers) > 0 ) {
> if (
>nrow(industrypeers[industrypeers$Market.Cap.13f >=
>arow$Market.Cap.13f, ]) > 0 ) {
> bestpeer =
>industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f,
>][1,]
> }
> else {
> bestpeer = industrypeers[nrow(industrypeers),]
> }
> bestpeer$Quarters.Since.IPO.Issue =
>arow$Quarters.Since.IPO.Issue
>
>#tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>bestpeer$PERMNO] = 1
> result = rbind(result, as.matrix(bestpeer))
> }
> }
> #result = rbind(result, tfdata_quarter)
> print (aquarter)
> }
>
> result = as.data.frame(result)
> names(result) = colnames
> return(result)
>
>}
>
>========= end of my function =============
>
>______________________________________________
>R-help_at_r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
--
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062

------------------------------

Message: 117
Date: Fri, 6 Jun 2008 17:55:04 -0500
From: "hadley wickham" <h.wickham_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Daniel Folkinshteyn" <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org, Patrick Burns <pburns_at_pburns.seanet.com>
Message-ID: <f8e6ff050806061555k4d8b5947vc73e5bc50c419cff@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jun 6, 2008 at 5:10 PM, Daniel Folkinshteyn <dfolkins_at_gmail.com> wrote:
> Hmm... ok... so i ran the code twice - once with a preallocated result,
> assigning rows to it, and once with a nrow=0 result, rbinding rows to it,
> for the first 20 quarters. There was no speedup. In fact, running with a
> preallocated result matrix was slower than rbinding to the matrix:
>
> for preallocated matrix:
> Time difference of 1.577779 mins
>
> for rbinding:
> Time difference of 1.498628 mins
>
> (the time difference only counts from the start of the loop til the end, so
> the time to allocate the empty matrix was /not/ included in the time count).
>
> So, it appears that rbinding a matrix is not the bottleneck. (That it was
> actually faster than assigning rows could have been a random anomaly (e.g.
> some other process eating a bit of cpu during the run?), or not - at any
> rate, it doesn't make an /appreciable/ difference.)
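
(A minimal self-contained sketch of the two allocation strategies being
timed above, with toy sizes; in general preallocation wins as the number
of iterations grows:)

n <- 2000
k <- 10

# strategy 1: grow the result by rbind
res1 <- matrix(nrow = 0, ncol = k)
system.time(for (i in 1:n) res1 <- rbind(res1, rnorm(k)))

# strategy 2: preallocate and assign into rows
res2 <- matrix(NA_real_, nrow = n, ncol = k)
system.time(for (i in 1:n) res2[i, ] <- rnorm(k))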
Why not try profiling? The profr package provides an alternative
display that I find more helpful than the default tools:

install.packages("profr")
library(profr)
p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
plot(p)

That should at least help you see where the slow bits are.

Hadley

--
http://had.co.nz/

------------------------------

Message: 118
Date: Fri, 06 Jun 2008 18:59:02 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: Don MacQueen <macq_at_llnl.gov>
Cc: r-help_at_r-project.org
Message-ID: <4849C136.3080608@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

thanks for the suggestions! I'll play with this over the weekend and
see what comes out. :)

on 06/06/2008 06:48 PM Don MacQueen said the following:
> In a case like this, if you can possibly work with matrices instead of
> data frames, you might get significant speedup.
> (More accurately, I have had situations where I obtained speed up by
> working with matrices instead of dataframes.)
> Even if you have to code character columns as numeric, it can be worth it.
>
> Data frames have overhead that matrices do not. (Here's where profiling
> might have given a clue) Granted, there has been recent work in reducing
> the overhead associated with dataframes, but I think it's worth a try.
> Carrying along extra columns and doing row subsetting, rbinding, etc,
> means a lot more things happening in memory.
>
> So, for example, if all of your matching is based just on a few columns,
> extract those columns, convert them to a matrix, do all the matching,
> and then based on some sort of row index retrieve all of the associated
> columns.
>
> -Don
>
> At 2:09 PM -0400 6/5/08, Daniel Folkinshteyn wrote:
>> Hi everyone!
>>
>> I have a question about data processing efficiency.
>>
>> My data are as follows: I have a data set on quarterly institutional
>> ownership of equities; some of them have had recent IPOs, some have
>> not (I have a binary flag set). The total dataset size is 700k+ rows.
>>
>> My goal is this: For every quarter since issue for each IPO, I need to
>> find a "matched" firm in the same industry, and close in market cap.
>> So, e.g., for firm X, which had an IPO, i need to find a matched
>> non-issuing firm in quarter 1 since IPO, then a (possibly different)
>> non-issuing firm in quarter 2 since IPO, etc. Repeat for each issuing
>> firm (there are about 8300 of these).
>>
>> Thus it seems to me that I need to be doing a lot of data selection
>> and subsetting, and looping (yikes!), but the result appears to be
>> highly inefficient and takes ages (well, many hours). What I am doing,
>> in pseudocode, is this:
>>
>> 1. for each quarter of data, getting out all the IPOs and all the
>> eligible non-issuing firms.
>> 2. for each IPO in a quarter, grab all the non-issuers in the same
>> industry, sort them by size, and finally grab a matching firm closest
>> in size (the exact procedure is to grab the closest bigger firm if one
>> exists, and just the biggest available if all are smaller)
>> 3. assign the matched firm-observation the same "quarters since issue"
>> as the IPO being matched
>> 4. rbind them all into the "matching" dataset.
>>
>> The function I currently have is pasted below, for your reference. Is
>> there any way to make it produce the same result but much faster?
>> Specifically, I am guessing eliminating some loops would be very good,
>> but I don't see how, since I need to do some fancy footwork for each
>> IPO in each quarter to find the matching firm. I'll be doing a few
>> things similar to this, so it's somewhat important to up the
>> efficiency of this. Maybe some of you R-fu masters can clue me in? :)
>>
>> I would appreciate any help, tips, tricks, tweaks, you name it! :)
>>
>> ========== my function below ===========
>>
>> fcn_create_nonissuing_match_by_quarterssinceissue = function(tfdata,
>> quarters_since_issue=40) {
>>
>> result = matrix(nrow=0, ncol=ncol(tfdata)) # rbind for matrix is
>> cheaper, so typecast the result to matrix
>>
>> colnames = names(tfdata)
>>
>> quarterends = sort(unique(tfdata$DATE))
>>
>> for (aquarter in quarterends) {
>> tfdata_quarter = tfdata[tfdata$DATE == aquarter, ]
>>
>> tfdata_quarter_fitting_nonissuers = tfdata_quarter[
>> (tfdata_quarter$Quarters.Since.Latest.Issue > quarters_since_issue) &
>> (tfdata_quarter$IPO.Flag == 0), ]
>> tfdata_quarter_ipoissuers = tfdata_quarter[
>> tfdata_quarter$IPO.Flag == 1, ]
>>
>> for (i in 1:nrow(tfdata_quarter_ipoissuers)) {
>> arow = tfdata_quarter_ipoissuers[i,]
>> industrypeers = tfdata_quarter_fitting_nonissuers[
>> tfdata_quarter_fitting_nonissuers$HSICIG == arow$HSICIG, ]
>> industrypeers = industrypeers[
>> order(industrypeers$Market.Cap.13f), ]
>> if ( nrow(industrypeers) > 0 ) {
>> if ( nrow(industrypeers[industrypeers$Market.Cap.13f
>> >= arow$Market.Cap.13f, ]) > 0 ) {
>> bestpeer =
>> industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ][1,]
>> }
>> else {
>> bestpeer = industrypeers[nrow(industrypeers),]
>> }
>> bestpeer$Quarters.Since.IPO.Issue =
>> arow$Quarters.Since.IPO.Issue
>>
>> #tfdata_quarter$Match.Dummy.By.Quarter[tfdata_quarter$PERMNO ==
>> bestpeer$PERMNO] = 1
>> result = rbind(result, as.matrix(bestpeer))
>> }
>> }
>> #result = rbind(result, tfdata_quarter)
>> print (aquarter)
>> }
>>
>> result = as.data.frame(result)
>> names(result) = colnames
>> return(result)
>>
>> }
>>
>> ========= end of my function =============
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
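
(A minimal sketch of what Don's column-extraction advice might look like
here; hypothetical, and it assumes the matching columns are all numeric,
which Daniel confirms elsewhere in the thread:)

# keep only the columns needed for matching, as a numeric matrix
keycols <- c("DATE", "HSICIG", "Market.Cap.13f", "IPO.Flag",
             "Quarters.Since.Latest.Issue")
keys <- as.matrix(tfdata[, keycols])

# ... run the matching on 'keys' alone, collecting row numbers ...
matched.rows <- integer(0)   # to be filled in by the matching loop

# recover the full records (all columns) only at the end
result <- tfdata[matched.rows, ]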
------------------------------

Message: 119
Date: Fri, 6 Jun 2008 18:02:36 -0500
From: "hadley wickham" <h.wickham_at_gmail.com>
Subject: Re: [R] color scale mapped to B/W
To: "Michael Friendly" <friendly_at_yorku.ca>
Cc: R-Help <r-help_at_stat.math.ethz.ch>
Message-ID: <f8e6ff050806061602k76f3b54ched7e6f732dfc09eb@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jun 6, 2008 at 3:37 PM, Michael Friendly <friendly_at_yorku.ca> wrote:
> In an R graphic, I'm using
>
> cond.col <- c("green", "yellow", "red")
> to represent a quantitative variable, where green means 'OK', yellow
> represents 'warning'
> and red represents 'danger'. Using these particular color names, in B/W, red
> is darkest
> and yellow is lightest. I'd like to find color designations to replace
> yellow and green so
> that when printed in B/W, the yellowish color appears darker than the
> greenish one.
An alternative approach would be to convert the colours into Luv,
adjust luminance appropriately and then convert back:

cond.col <- c("green", "yellow", "red")
col <- col2rgb(cond.col)
col.Luv <- convertColor(t(col), "sRGB", "Luv")
rownames(col.Luv) <- cond.col
col.Luv[, "L"] <- c(8000, 6000, 8000)
t(convertColor(col.Luv, "Luv", "sRGB"))

However, that doesn't actually seem to work - the back-transformed
colours are the same as the original.

Hadley

--
http://had.co.nz/

------------------------------

Message: 120
Date: Fri, 06 Jun 2008 19:10:51 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: hadley wickham <h.wickham_at_gmail.com>
Cc: r-help_at_r-project.org, Patrick Burns <pburns_at_pburns.seanet.com>
Message-ID: <4849C3FB.9090600@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

on 06/06/2008 06:55 PM hadley wickham said the following:
> Why not try profiling? The profr package provides an alternative
> display that I find more helpful than the default tools:
>
> install.packages("profr")
> library(profr)
> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
> plot(p)
>
> That should at least help you see where the slow bits are.
[[elided Yahoo spam]]

------------------------------

Message: 121
Date: Sat, 7 Jun 2008 01:23:21 +0200 (CEST)
From: Achim Zeileis <Achim.Zeileis_at_wu-wien.ac.at>
Subject: Re: [R] color scale mapped to B/W
To: Michael Friendly <friendly_at_yorku.ca>
Cc: R-Help <r-help_at_stat.math.ethz.ch>
Message-ID: <Pine.LNX.4.64.0806070110130.3742@paninaro.stat-math.wu-wien.ac.at>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Fri, 6 Jun 2008, Michael Friendly wrote:
> In an R graphic, I'm using
>
> cond.col <- c("green", "yellow", "red")
> to represent a quantitative variable, where green means 'OK', yellow
> represents 'warning'
> and red represents 'danger'. Using these particular color names, in B/W, red
> is darkest
> and yellow is lightest. I'd like to find color designations to replace
> yellow and green so
> that when printed in B/W, the yellowish color appears darker than the
> greenish one.
>
> Is there some tool/code I can use to find these? i.e., something to display a
> grid
> of color swatches with color codes/names I can look at in color and B/W to
> decide?
You could look at colors in HCL (i.e., polar LUV). For example, you
could choose a dark red HCL = (0, 90, 40) and a light green (120, 70,
90) and a yellow somewhere in between. To emulate what happens when you
print that out, you just set the chroma to zero. There are some
functions helpful for that in "vcd"; you could do

## load package
library("vcd")

## select colors from dark red to light green
c1 <- heat_hcl(3, h = c(0, 120), c = c(90, 70), l = c(40, 90), power = 1.7)
c2 <- heat_hcl(3, h = c(0, 120), c = 0, l = c(40, 90), power = 1.7)

## visualize in color and grayscale emulation
plot(-1, -1, xlim = c(0, 1), ylim = c(0, 2), axes = FALSE)
rect(0:2/3, 0, 1:3/3, 1, col = c1, border = "transparent")
rect(0:2/3, 1, 1:3/3, 2, col = c2, border = "transparent")

The ideas underlying this color choice are described in this report
we've written together with Kurt and Paul:
http://epub.wu-wien.ac.at/dyn/openURL?id=oai:epub.wu-wien.ac.at:epub-wu-01_c87

hth,
Z
> thanks,
>
>
> --
> Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology
> Dept.
> York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
> Toronto, ONT M3J 1P3 CANADA
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
------------------------------

Message: 122
Date: Fri, 6 Jun 2008 17:27:09 -0600
From: "Greg Snow" <Greg.Snow_at_imail.org>
Subject: Re: [R] color scale mapped to B/W
To: "Michael Friendly" <friendly_at_yorku.ca>, "R-Help" <r-help_at_stat.math.ethz.ch>
Message-ID: <B37C0A15B8FB3C468B5BC7EBC7DA14CC60F6BE1A51@LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii

Try this (you need tcl 8.5 and the TeachingDemos package):

library(TeachingDemos)

tmpplot <- function(col1 = 'red', col2 = 'yellow', col3 = 'green') {
    plot(1:10, 1:10, type = 'n')
    rect(1, 1, 4, 4, col = col1)
    rect(1, 4, 4, 7, col = col2)
    rect(1, 7, 4, 10, col = col3)
    rect(6, 1, 9, 4, col = col2grey(col1))
    rect(6, 4, 9, 7, col = col2grey(col2))
    rect(6, 7, 9, 10, col = col2grey(col3))
    text(5, c(2, 5, 8), c(col1, col2, col3))
}

cols <- colors()[-c(152:253, 260:361)]

tkexamp(tmpplot(),
        list(col1 = list('combobox', values = cols, init = 'red'),
             col2 = list('combobox', values = cols, init = 'yellow'),
             col3 = list('combobox', values = cols, init = 'green')))

Hope it helps,

________________________________________
From: r-help-bounces_at_r-project.org [r-help-bounces_at_r-project.org] On Behalf Of Michael Friendly [friendly_at_yorku.ca]
Sent: Friday, June 06, 2008 2:37 PM
To: R-Help
Subject: [R] color scale mapped to B/W

In an R graphic, I'm using

cond.col <- c("green", "yellow", "red")

to represent a quantitative variable, where green means 'OK', yellow
represents 'warning' and red represents 'danger'. Using these
particular color names, in B/W, red is darkest and yellow is lightest.
I'd like to find color designations to replace yellow and green so that
when printed in B/W, the yellowish color appears darker than the
greenish one.

Is there some tool/code I can use to find these? i.e., something to
display a grid of color swatches with color codes/names I can look at
in color and B/W to decide?

thanks,

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

------------------------------

Message: 123
Date: Fri, 06 Jun 2008 19:35:13 -0400
From: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: hadley wickham <h.wickham_at_gmail.com>
Cc: r-help_at_r-project.org, Patrick Burns <pburns_at_pburns.seanet.com>
Message-ID: <4849C9B1.4060308@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> install.packages("profr")
> library(profr)
> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
> plot(p)
>
> That should at least help you see where the slow bits are.
>
> Hadley
>
So profiling reveals that '[.data.frame' and '[[.data.frame' and '['
are the biggest timesuckers...

I suppose I'll try using matrices and see how that stacks up (since all
my cols are numeric, it should be a problem-free approach).

But I'm really wondering if there isn't some neat vectorized approach I
could use to avoid at least one of the nested loops...

------------------------------

Message: 124
Date: Fri, 6 Jun 2008 17:41:15 -0600
From: "Greg Snow" <Greg.Snow_at_imail.org>
Subject: Re: [R] color scale mapped to B/W
To: "Michael Friendly" <friendly_at_yorku.ca>, "R-Help" <r-help_at_stat.math.ethz.ch>
Message-ID: <B37C0A15B8FB3C468B5BC7EBC7DA14CC60F6BE1A52@LP-EXMBVS10.CO.IHC.COM>
Content-Type: text/plain; charset=us-ascii

You may also want to look at the "show.colors" function in the "DAAG"
package to get candidate colors.

________________________________________
From: r-help-bounces_at_r-project.org [r-help-bounces_at_r-project.org] On Behalf Of Michael Friendly [friendly_at_yorku.ca]
Sent: Friday, June 06, 2008 2:37 PM
To: R-Help
Subject: [R] color scale mapped to B/W

In an R graphic, I'm using

cond.col <- c("green", "yellow", "red")

to represent a quantitative variable, where green means 'OK', yellow
represents 'warning' and red represents 'danger'. Using these
particular color names, in B/W, red is darkest and yellow is lightest.
I'd like to find color designations to replace yellow and green so that
when printed in B/W, the yellowish color appears darker than the
greenish one.

Is there some tool/code I can use to find these? i.e., something to
display a grid of color swatches with color codes/names I can look at
in color and B/W to decide?

thanks,

--
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

------------------------------

Message: 125
Date: Fri, 06 Jun 2008 19:45:43 -0400
From: Esmail Bonakdarian <esmail.js_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: hadley wickham <h.wickham_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <4849CC27.6020506@gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed

hadley wickham wrote:
>
Hi,

I tried this suggestion, as I am curious about bottlenecks in my own
R code ...
> Why not try profiling? The profr package provides an alternative
> display that I find more helpful than the default tools:
>
> install.packages("profr")
 > install.packages("profr")
Warning message:
package 'profr' is not available

Any ideas?

Thanks,
Esmail
> library(profr)
> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
> plot(p)
>
> That should at least help you see where the slow bits are.
>
> Hadley
>
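
(An aside: a quick base-R check of whether the currently selected
mirror actually carries a given package; a sketch, not part of the
original exchange:)

pkgs <- available.packages()    # index of the selected repository
"profr" %in% rownames(pkgs)     # FALSE here would explain the warning
chooseCRANmirror()              # pick a different, up-to-date mirror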
------------------------------

Message: 126
Date: Fri, 6 Jun 2008 16:46:35 -0700
From: Horace Tso <Horace.Tso_at_pgn.com>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Cc: "r-help_at_r-project.org" <r-help_at_r-project.org>
Message-ID: <D49782AAF0ACCD4B836AA8D7D40BF417C30FEDE6C1@APEXMAIL.corp.dom>
Content-Type: text/plain; charset="us-ascii"

Daniel, allow me to step off the party line here for a moment. In a
problem like this it's better to code your function in C and then call
it from R. You get a vast amount of performance improvement instantly.
(From what I see, the process of recoding in C should be quite
straightforward.)

H.

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Daniel Folkinshteyn
Sent: Friday, June 06, 2008 4:35 PM
To: hadley wickham
Cc: r-help_at_r-project.org; Patrick Burns
Subject: Re: [R] Improving data processing efficiency
> install.packages("profr")
> library(profr)
> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
> plot(p)
>
> That should at least help you see where the slow bits are.
>
> Hadley
>
So profiling reveals that '[.data.frame' and '[[.data.frame' and '['
are the biggest timesuckers...

I suppose I'll try using matrices and see how that stacks up (since all
my cols are numeric, it should be a problem-free approach).

But I'm really wondering if there isn't some neat vectorized approach I
could use to avoid at least one of the nested loops...

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

------------------------------

Message: 127
Date: Fri, 06 Jun 2008 20:09:41 -0400
From: Esmail Bonakdarian <esmail.js_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: hadley wickham <h.wickham_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <4849D1C5.7090109@gmail.com>
Content-Type: text/plain; charset=windows-1252; format=flowed

Esmail Bonakdarian wrote:
> hadley wickham wrote:
>>
>
> Hi,
>
> I tried this suggestion as I am curious about bottlenecks in my own
> R code ...
>
>> Why not try profiling? The profr package provides an alternative
>> display that I find more helpful than the default tools:
>>
>> install.packages("profr")
>
> > install.packages("profr")
> Warning message:
> package 'profr' is not available
I selected a different mirror in place of the Iowa one and it worked.
Odd, I just assumed all the same packages are available on all mirrors.

------------------------------

Message: 128
Date: Sat, 7 Jun 2008 08:12:05 +0800
From: ronggui <ronggui.huang_at_gmail.com>
Subject: [R] Problem of installing Matrix
To: R-Help <r-help_at_stat.math.ethz.ch>
Message-ID: <38b9f0350806061712l4b9bc55eoa9c5e4ca7c343e19@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

[wincent_at_PC-BSD]export MAKE=gmake
[wincent_at_PC-BSD]sudo R
.....
> install.packages("Matrix")
--- Please select a CRAN mirror for use in this session ---
Loading Tcl/Tk interface ... done
trying URL 'http://bibs.snu.ac.kr/R/src/contrib/Matrix_0.999375-9.tar.gz'
Content type 'application/x-gzip' length 1483674 bytes (1.4 Mb)
opened URL
==================================================
downloaded 1.4 Mb

/usr/local/lib/R/library
* Installing *source* package 'Matrix' ...
** libs
** arch -
"Makefile", line 8: Need an operator
"Makefile", line 13: Need an operator
"Makefile", line 16: Need an operator
"Makefile", line 27: Need an operator
"Makefile", line 29: Need an operator
"Makefile", line 31: Need an operator
make: fatal errors encountered -- cannot continue
ERROR: compilation failed for package 'Matrix'
** Removing '/usr/local/lib/R/library/Matrix'

The downloaded packages are in
        /tmp/Rtmpq3enyj/downloaded_packages
Updating HTML index of packages in '.Library'
Warning message:
In install.packages("Matrix") :
  installation of package 'Matrix' had non-zero exit status
>

> sessionInfo()
R version 2.6.1 (2007-11-26)
i386-portbld-freebsd7.0

locale:
zh_CN.eucCN/zh_CN.eucCN/C/C/C/C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] rcompgen_0.1-17 tcltk_2.6.1     tools_2.6.1
> R.version
               _
platform       i386-portbld-freebsd7.0
arch           i386
os             freebsd7.0
system         i386, freebsd7.0
status
major          2
minor          6.1
year           2007
month          11
day            26
svn rev        43537
language       R
version.string R version 2.6.1 (2007-11-26)

--
HUANG Ronggui, Wincent http://ronggui.huang.googlepages.com/
Bachelor of Social Work, Fudan University, China
Master of sociology, Fudan University, China
Ph.D. Candidate, CityU of HK.

------------------------------

Message: 129
Date: Fri, 6 Jun 2008 17:23:32 -0700
From: "Charles C. Berry" <cberry_at_tajo.ucsd.edu>
Subject: Re: [R] Improving data processing efficiency
To: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org, Patrick Burns <pburns_at_pburns.seanet.com>
Message-ID: <Pine.LNX.4.64.0806061647020.29398@tajo.ucsd.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Fri, 6 Jun 2008, Daniel Folkinshteyn wrote:
>> install.packages("profr")
>> library(profr)
>> p <- profr(fcn_create_nonissuing_match_by_quarterssinceissue(...))
>> plot(p)
>>
>> That should at least help you see where the slow bits are.
>>
>> Hadley
>>
> so profiling reveals that '[.data.frame' and '[[.data.frame' and '[' are the
> biggest timesuckers...
>
> i suppose i'll try using matrices and see how that stacks up (since all my
> cols are numeric, should be a problem-free approach).
>
> but i'm really wondering if there isn't some neat vectorized approach i could
> use to avoid at least one of the nested loops...
>
As far as a vectorized solution, I'll bet you could do ALL the lookups
of non-issuers for all issuers with a single call to findInterval()
(modulo some cleanup afterwards), but the trickery needed to do that
would make your code a bit opaque, and in the end I doubt it would beat
mapply() (read on...) by enough to make it worthwhile.

What you are doing is conditional on industry group and quarter. So
using

indus.quarter <- with(tfdata, paste(as.character(DATE),
                                    as.character(HSICIG), sep = "."))

and then calls like this:

split( <various> , indus.quarter[ relevant.subset ] )

you can create: a list of all issuer market caps according to quarter
and group, a list of all non-issuer caps (that satisfy your 'since
quarter' restriction) according to quarter and group, and a list of all
non-issuer indexes (i.e. row numbers) that satisfy that restriction
according to quarter and group.

Then you write a function that takes the elements of each list for a
given quarter-industry group, looks up the matching non-issuers for
each issuer, and returns their indexes. findInterval() will allow you
to do this lookup for all issuers in one industry group in a given
quarter simultaneously, and will greatly speed this process (but you
will need to deal with the possible non-uniqueness of the non-issuer
caps - perhaps by adding a tiny jitter() to the values).

Then you feed the function and the lists to mapply(). The result is a
list of indexes on the original data.frame. You can unsplit() this if
you like, then use those indexes to build your final "result"
data.frame.

HTH,

Chuck

p.s. And if this all seems like too much work, you should at least
avoid needlessly creating data.frames. Specifically, reorder things so
that

industrypeers = <etc>

is only done ONCE for each industry group by quarter combination, and
change stuff like

nrow(industrypeers[industrypeers$Market.Cap.13f >= arow$Market.Cap.13f, ]) > 0

to

any( industrypeers$Market.Cap.13f >= arow$Market.Cap.13f )
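
(A minimal sketch of the findInterval() lookup for one quarter-industry
cell, with toy numbers; it implements the 'closest bigger, else biggest'
rule from the original function, up to ties on exactly equal caps:)

# non-issuer market caps for one cell, sorted ascending
ni.caps  <- sort(c(1.2, 3.5, 7.1, 9.8))
# issuer caps to be matched within the same cell
iss.caps <- c(0.5, 4.0, 12.0)

# findInterval() counts the non-issuer caps <= each issuer cap;
# +1 points at the closest bigger cap, capped at the largest available
idx <- pmin(findInterval(iss.caps, ni.caps) + 1, length(ni.caps))
ni.caps[idx]   # 1.2 7.1 9.8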
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry                         (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry_at_tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/   La Jolla, San Diego 92093-0901

------------------------------

Message: 130
Date: Fri, 6 Jun 2008 21:32:24 -0500
From: "hadley wickham" <h.wickham_at_gmail.com>
Subject: Re: [R] Improving data processing efficiency
To: "Esmail Bonakdarian" <esmail.js_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <f8e6ff050806061932i2180298sbec7a9d41abbd3d1@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
>> > install.packages("profr")
>> Warning message:
>> package 'profr' is not available
>
> I selected a different mirror in place of the Iowa one and it
> worked. Odd, I just assumed all the same packages are available
> on all mirrors.
The Iowa mirror is rather out of date, as the guy who was looking after
it passed away.

Hadley

--
http://had.co.nz/

------------------------------

Message: 131
Date: Fri, 6 Jun 2008 21:34:39 -0500
From: "hadley wickham" <h.wickham_at_gmail.com>
Subject: Re: [R] color scale mapped to B/W
To: "Achim Zeileis" <Achim.Zeileis_at_wu-wien.ac.at>
Cc: Michael Friendly <friendly_at_yorku.ca>, R-Help <r-help_at_stat.math.ethz.ch>
Message-ID: <f8e6ff050806061934u3702b69bmd275d71c848db864@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jun 6, 2008 at 6:23 PM, Achim Zeileis
<Achim.Zeileis_at_wu-wien.ac.at> wrote:
> On Fri, 6 Jun 2008, Michael Friendly wrote:
>
>> In an R graphic, I'm using
>>
>> cond.col <- c("green", "yellow", "red")
>> to represent a quantitative variable, where green means 'OK', yellow
>> represents 'warning'
>> and red represents 'danger'. Using these particular color names, in B/W,
>> red is darkest
>> and yellow is lightest. I'd like to find color designations to replace
>> yellow and green so
>> that when printed in B/W, the yellowish color appears darker than the
>> greenish one.
>>
>> Is there some tool/code I can use to find these? i.e., something to
>> display a grid
>> of color swatches with color codes/names I can look at in color and B/W to
>> decide?
>
> You could look at colors in HCL (i.e., polar LUV). For example, you could
> choose a dark red HCL = (0, 90, 40) and a light green (120, 70, 90) and a
> yellow somewhere in between.
How did you get to those numbers? I seem to remember there being some
way to convert RGB to HCL, but I can't find it.

Hadley

--
http://had.co.nz/

------------------------------

Message: 132
Date: Fri, 06 Jun 2008 23:22:30 -0400
From: "john.polo" <jpolo_at_mail.usf.edu>
Subject: Re: [R] editing a data.frame
To: Daniel Folkinshteyn <dfolkins_at_gmail.com>
Cc: r-help_at_r-project.org
Message-ID: <4849FEF6.1070803@mail.usf.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Daniel Folkinshteyn wrote:
> works for me:
> > sub('1.00', '1', '1.00E-20')
> [1] "1E-20"
When I input what you wrote, I get the same result, but that doesn't
change the value for TreeTag at row 1501; it's just floating around in
space. If I try it for yr1bp$TreeTag[1501], which is 1.00E-20, I get
this:

> yr1bp$TreeTag[1501] <- sub("1.00", "1", yr1bp$TreeTag[1501])
Warning message:
In `[<-.factor`(`*tmp*`, 1501, value = "1E-20") :
  invalid factor level, NAs generated

and then 1501 turns into:

1501    <NA>  2001  adult  32.5

which is less useful than the way it was originally input.

Thanks for the suggestion.

john
> finally, if all your target strings are of the form 1.00E-20, you
> could sub the whole thing with a more general regexp:
>
> sub("([0-9])(\\.[0-9]{2})(.*)", "\\1\\3", yourvector)
> (it matches a digit, followed by a dot and two digits, followed by
> "anything else", and takes out the "dot and two digits" bit in the
> replacement, in the whole vector.)
thanks for that suggestion. it could come in handy.
> on 06/06/2008 03:25 PM john.polo said the following:
>> dear R users,
>>
>> the data frame (read in from a csv) looks like this:
>> TreeTag Census Stage DBH
>> 1 CW-W740 2001 juvenile 5.8
>> 2 CW-W739 2001 juvenile 4.3
>> 3 CW-W738 2001 juvenile 4.7
>> 4 CW-W737 2001 juvenile 5.4
>> 5 CW-W736 2001 juvenile 7.4
>> 6 CW-W735 2001 juvenile 5.4
>> ...
>> 1501 1.00E-20 2001 adult 32.5
>>
>> i would like to change values under the TreeTag column. as the last
>> value shows, some of the tags have decimals followed by 2 decimal
>> places. i just want whole numbers, i.e. not 1.00E-20, but 1E-20. i
>> have a rough understanding of regexp and grepped all the positions
>> that have the inappropriate tags. i tried sub() a couple of different
>> ways, like
>> yr1bp$TreeTag[1501]<-sub("1.00", "1", yr1bp$TreeTag[1501])
>> and after turning yr1bp$TreeTag[1501] into <NA>,
>> yr1bp$TreeTag[1501]<-sub("", "1E-20", yr1pb$TreeTag[1501])
>> and
>> sub("", "1E-20", yr1bp$TreeTag[1501])
>> but it's not working. i guess it has something to do with the
>> data.frame characteristics i'm not aware of or don't understand.
>> would i somehow have to tear apart the columns, edit them, and then
>> put it back together? not that i know how to do that, but i'm
>> wondering out loud.
>>
>> john
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
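
(For the record, the warning above is the usual factor-assignment trap;
a minimal sketch of two standard fixes follows. fixed = TRUE is added
here so the dot in the pattern is matched literally.)

# option 1: convert the column to character, then edit freely
yr1bp$TreeTag <- as.character(yr1bp$TreeTag)
yr1bp$TreeTag[1501] <- sub("1.00", "1", yr1bp$TreeTag[1501], fixed = TRUE)

# option 2: keep the factor and edit its levels instead
levels(yr1bp$TreeTag) <- sub("1.00", "1", levels(yr1bp$TreeTag), fixed = TRUE)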
------------------------------

Message: 133
Date: Fri, 6 Jun 2008 21:56:24 -0700 (PDT)
Subject: [R] error message with dat
To: r-help_at_r-project.org
Message-ID: <317894.77066.qm@web46105.mail.sp1.yahoo.com>
Content-Type: text/plain

Hello everyone,

I have two problems which I am unable to solve:

1. I am trying to add the row labels (g1-g2000) to the very left of a
data table. The data table is 2000 rows by 62 columns. I have used the
following code:

read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\colon cancer1.txt", header=T, row.names=1)
rowname(dat) <- paste("g", c(1:nrow(dat)), sep="")
file.show(file="C:\\Documents and Settings\\Owner\\My Documents\\colon cancer1.txt")

The error message I get is: error in nrow(dat): object "dat" not found

2. I am also trying to populate a scatter plot with data from two
columns which are 2000 values long. I have tried the following code:

read.table(file="C:\\Documents and Settings\\Owner\\My Documents\\colon cancer1.txt", header=T, row.names=1)
file.show(file="C:\\Documents and Settings\\Owner\\My Documents\\colon cancer1.txt")
plot(50,1500,type='p',xlab='normal',ylab='tumor',main='Tumor sample vs. Normal Sample - 2000 genes')
plot(50,1500,type='p',xlab='normal1',ylab='normal2',main='Two normal samples - first 20 genes',pch=15,col='blue')
plot(dat(,1), dat(,2))

I get the following error message: error in plot(dat(,1), dat(,2)):
could not find function dat

I am not sure how I am supposed to use the function dat, that is, where
and how to define it so it points to the table? Any help would be
appreciated.

Paul

        [[alternative HTML version deleted]]
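
(The immediate problems above: the result of read.table() is never
assigned to dat, the function is rownames() rather than rowname(), and
data frame columns are indexed with square brackets, not parentheses. A
minimal corrected sketch:)

dat <- read.table("C:\\Documents and Settings\\Owner\\My Documents\\colon cancer1.txt",
                  header = TRUE, row.names = 1)
rownames(dat) <- paste("g", seq_len(nrow(dat)), sep = "")   # labels g1-g2000
plot(dat[, 1], dat[, 2], xlab = "normal", ylab = "tumor",
     main = "Tumor sample vs. Normal sample - 2000 genes")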
------------------------------

Message: 134
Date: Sat, 7 Jun 2008 06:00:25 +0100 (BST)
From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Subject: Re: [R] Problem of installing Matrix
To: ronggui <ronggui.huang_at_gmail.com>
Cc: R-Help <r-help_at_stat.math.ethz.ch>
Message-ID: <alpine.LFD.1.10.0806070557360.28358@gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

From the DESCRIPTION for Matrix:
SystemRequirements: GNU make

Presumably you have a BSD make on your FreeBSD system. This has come up
before, and FreeBSD users have succeeded with GNU make.

On Sat, 7 Jun 2008, ronggui wrote:
> [wincent@PC-BSD]export MAKE=gmake
> [wincent_at_PC-BSD]sudo R
> .....
>> install.packages("Matrix")
> --- Please select a CRAN mirror for use in this session ---
> Loading Tcl/Tk interface ... done
> trying URL 'http://bibs.snu.ac.kr/R/src/contrib/Matrix_0.999375-9.tar.gz'
> Content type 'application/x-gzip' length 1483674 bytes (1.4 Mb)
> opened URL
> ==================================================
> downloaded 1.4 Mb
>
> /usr/local/lib/R/library
> * Installing *source* package 'Matrix' ...
> ** libs
> ** arch -
> "Makefile", line 8: Need an operator
> "Makefile", line 13: Need an operator
> "Makefile", line 16: Need an operator
> "Makefile", line 27: Need an operator
> "Makefile", line 29: Need an operator
> "Makefile", line 31: Need an operator
> make: fatal errors encountered -- cannot continue
> ERROR: compilation failed for package 'Matrix'
> ** Removing '/usr/local/lib/R/library/Matrix'
>
> The downloaded packages are in
> /tmp/Rtmpq3enyj/downloaded_packages
> Updating HTML index of packages in '.Library'
> Warning message:
> In install.packages("Matrix") :
> installation of package 'Matrix' had non-zero exit status
>>
>
>> sessionInfo()
> R version 2.6.1 (2007-11-26)
> i386-portbld-freebsd7.0
>
> locale:
> zh_CN.eucCN/zh_CN.eucCN/C/C/C/C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] rcompgen_0.1-17 tcltk_2.6.1 tools_2.6.1
>
>> R.version
> _
> platform i386-portbld-freebsd7.0
> arch i386
> os freebsd7.0
> system i386, freebsd7.0
> status
> major 2
> minor 6.1
> year 2007
> month 11
> day 26
> svn rev 43537
> language R
> version.string R version 2.6.1 (2007-11-26)
>
> --
> HUANG Ronggui, Wincent http://ronggui.huang.googlepages.com/
> Bachelor of Social Work, Fudan University, China
> Master of sociology, Fudan University, China
> Ph.D. Candidate, CityU of HK.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
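
(A sketch of the equivalent fix from inside R, assuming gmake is
installed and on the PATH, as an alternative to exporting MAKE in the
shell as the original poster did:)

Sys.setenv(MAKE = "gmake")    # point package installation at GNU make
install.packages("Matrix")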
--
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

------------------------------

Message: 135
Date: Sat, 7 Jun 2008 01:39:51 -0400 (EDT)
From: Rebecca Sela <rsela_at_stern.nyu.edu>
Subject: [R] Predicting a single observation using LME
To: r-help <r-help_at_r-project.org>
Message-ID: <32407705.667441212817191240.JavaMail.root@calliope.stern.nyu.edu>
Content-Type: text/plain; charset=utf-8

When I use a model fit with LME, I get an error if I try to use
"predict" with a dataset consisting of a single line.

For example, using this data:
> simpledata
              Y t D ID
1  -1.464740870 1 0  1
2   1.222911373 2 0  1
3  -0.605996798 3 0  1
4   0.155692707 4 0  1
5   3.849619772 1 0  2
6   4.289213902 2 0  2
7   2.369407737 3 0  2
8   2.249052533 4 0  2
9   0.920044316 1 0  3
10  2.003262622 2 0  3
11  0.003833438 3 0  3
12  1.578300927 4 0  3
13 -0.842322442 1 1  4
14 -0.657256158 2 1  4
15  1.504491575 3 1  4
16  2.896007045 4 1  4
17  0.990505440 1 1  5
18  2.722942793 2 1  5
19  4.395861278 3 1  5
20  4.849296475 4 1  5
21  3.049616421 1 1  6
22  2.874405962 2 1  6
23  4.359511097 3 1  6
24  6.165419699 4 1  6
> testLME <- lme(Y~t+D,data=simpledata,random=~1|ID)
> predict(testLME, simpledata[1,])
Error in val[revOrder, level + 1] : incorrect number of dimensions

This has occurred with other datasets as well. Is this a bug in the
code, or am I doing something wrong?

(Also, is there a way to parse a formula of the type given to "random"?
For example, given ~1+t|ID, I'd like to be able to extract all the
variable names to the left of | and to the right of |, the way one can
with a normal formula.)

Thanks in advance!

Rebecca

------------------------------

Message: 136
Date: Sat, 7 Jun 2008 03:25:55 -0300
From: Reid Tingley <r_tingley_at_hotmail.com>
Subject: [R] expected risk from coxph (survival)
To: <r-help_at_r-project.org>
Message-ID: <BAY122-W2997A32B5EA5B347FFA87586B60@phx.gbl>
Content-Type: text/plain

Hello,

When I try to obtain the expected risk for a new dataset using coxph in
the survival package I get an error. Using the example from ?coxph:
> test1 <- list(time= c(4,3,1,1,2,2,3),
+               status=c(1,NA,1,0,1,1,0),
+               x= c(0,2,1,1,1,0,0),
+               sex= c(0,0,0,0,1,1,1))
> cox <- coxph( Surv(time, status) ~ x + strata(sex), test1)  # stratified model
>
> new <- list(time= c(5,1,1,2,2,4,3),
+             status=c(1,NA,1,0,0,1,1),
+             x= c(0,2,1,1,1,0,0),
+             sex= c(0,0,0,0,1,1,1))
>
> predict(cox, new, type="expected")
Error in predict.coxph(cox, new, type = "expected") :
  Method not yet finished
I assume that this is something that has simply not yet been
incorporated into the survival package. Does anyone know of a way to
calculate the expected risk for a new data set? Is this even possible?

I would appreciate any help that you could give me.

Cheers,
Reid

        [[alternative HTML version deleted]]

------------------------------

Message: 137
Date: Fri, 6 Jun 2008 14:30:56 -0700 (PDT)
From: RobertsLRRI <raymond.roberts_at_ncf.edu>
Subject: [R] txt file, 14000+ rows, only last 8000 appear
To: r-help_at_r-project.org
Message-ID: <17701519.post@talk.nabble.com>
Content-Type: text/plain; charset=us-ascii

when I load my data file in txt format into the R workstation I lose
about 6000 rows; this is a problem. Is there a limit to the display
capabilities of the workstation? Is all the information there and I
just can't see the first couple thousand rows?

--
View this message in context: http://www.nabble.com/txt-file%2C-14000%2B-rows%2C-only-last-8000-appear-tp17701519p17701519.html
Sent from the R help mailing list archive at Nabble.com.

------------------------------

Message: 138
Date: Fri, 6 Jun 2008 16:14:35 -0700 (PDT)
Subject: [R] functions for high dimensional integral
To: r-help_at_r-project.org
Message-ID: <17702978.post@talk.nabble.com>
Content-Type: text/plain; charset=us-ascii

I need to compute a high dimensional integral. Currently I'm using the
function adapt in R package adapt. But this method is kind of slow to
me. I'm wondering if there are other solutions. Thanks.

Zhongwen

--
View this message in context: http://www.nabble.com/functions-for-high-dimensional-integral-tp17702978p17702978.html
Sent from the R help mailing list archive at Nabble.com.

------------------------------

Message: 139
Date: Sat, 7 Jun 2008 05:58:26 +0200
From: "Mathieu Prevot" <mathieu.prevot_at_ens.fr>
Subject: [R] compilation failed on MacOSX.5 / icc 10.1 / ifort 10.1 / R 2.7.0
To: r-help_at_R-project.org
Message-ID: <3e473cc60806062058h36500ae3g670c88ecaf687c5b@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I got the following problem when I type make. The error is not verbose
enough for me to find the problem. Please cc me, I'm not subscribed.

Thanks,
Mathieu

---------------------------
make[4]: `vfonts.so' is up to date.
building system startup profile
building package 'base'
all.R is unchanged
../../../library/base/R/base is unchanged
dyld: lazy symbol binding failed: Symbol not found: _Rf_ScalarString
  Referenced from: /Users/mathieuprevot/PRE/R-2.7.0/lib/libR.dylib
  Expected in: dynamic lookup

dyld: Symbol not found: _Rf_ScalarString
  Referenced from: /Users/mathieuprevot/PRE/R-2.7.0/lib/libR.dylib
  Expected in: dynamic lookup

/bin/sh: line 1: 26329 Done                    cat ./makebasedb.R
     26330 Trace/BPT trap          | R_DEFAULT_PACKAGES=NULL LC_ALL=C ../../../bin/R --vanilla --slave > /dev/null
make[3]: *** [all] Error 133
make[2]: *** [R] Error 1
make[1]: *** [R] Error 1
make: *** [R] Error 1

------------------------------

Message: 140
Date: Sat, 7 Jun 2008 08:15:48 +0000 (UTC)
From: Dieter Menne <dieter.menne_at_menne-biomed.de>
Subject: Re: [R] expected risk from coxph (survival)
To: r-help_at_stat.math.ethz.ch
Message-ID: <loom.20080607T081341-321@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

Reid Tingley <r_tingley <at> hotmail.com> writes:
> When I try to obtain the expected risk for a new dataset using coxph
> in the survival package I get an error.
> Using the example from ?coxph:
# Example rewritten by DM; please do not use HTML mail
library(survival)
test1 <- list(time= c(4,3,1,1,2,2,3),
              status=c(1,NA,1,0,1,1,0),
              x= c(0,2,1,1,1,0,0),
              sex= c(0,0,0,0,1,1,1))
cox <- coxph( Surv(time, status) ~ x + strata(sex), test1)  # stratified model

new <- list(time= c(5,1,1,2,2,4,3),
            status=c(1,NA,1,0,0,1,1),
            x= c(0,2,1,1,1,0,0),
            sex= c(0,0,0,0,1,1,1))

predict(cox, new, type="expected")

#    }
#    else if (type == "expected") {
#        if (missing(newdata))
#            pred <- y[, ncol(y)] - object$residuals
#        else stop("Method not yet finished")

Looks like this is "by design"; see the code above. You might try to
use cph and predict.Design from Frank Harrell's Design package instead.

Dieter

------------------------------

Message: 141
Date: Sat, 7 Jun 2008 09:18:43 +0100
From: "Paul Smith" <phhs80_at_gmail.com>
Subject: Re: [R] txt file, 14000+ rows, only last 8000 appear
To: r-help_at_r-project.org
Message-ID: <6ade6f6c0806070118nfbd437eo8aa214af16b52de3@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

On Fri, Jun 6, 2008 at 10:30 PM, RobertsLRRI <raymond.roberts_at_ncf.edu> wrote:
>
> when I load my data file in txt format into the R workstation I lose about
> 6000 rows, this is a problem. Is there a limit to the display capabilities
> for the workstation? is all the information there and I just can't see the
> first couple thousand rows?
> --
> View this message in context: http://www.nabble.com/txt-file%2C-14000%2B-rows%2C-only-last-8000-appear-tp17701519p17701519.html
Does nrow(your_data.frame) return the correct number of rows? If so, R
read all lines.

Paul

------------------------------

Message: 142
Date: Sat, 7 Jun 2008 10:21:58 +0200 (CEST)
From: Achim Zeileis <Achim.Zeileis_at_wu-wien.ac.at>
Subject: Re: [R] color scale mapped to B/W
To: hadley wickham <h.wickham_at_gmail.com>
Cc: Michael Friendly <friendly_at_yorku.ca>, R-Help <r-help_at_stat.math.ethz.ch>
Message-ID: <Pine.LNX.4.64.0806071011450.10440@paninaro.stat-math.wu-wien.ac.at>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Fri, 6 Jun 2008, hadley wickham wrote:
> On Fri, Jun 6, 2008 at 6:23 PM, Achim Zeileis
> <Achim.Zeileis_at_wu-wien.ac.at> wrote:
>> On Fri, 6 Jun 2008, Michael Friendly wrote:
>>
>>> In an R graphic, I'm using
>>>
>>> cond.col <- c("green", "yellow", "red")
>>> to represent a quantitative variable, where green means 'OK', yellow
>>> represents 'warning'
>>> and red represents 'danger'. Using these particular color names, in B/W,
>>> red is darkest
>>> and yellow is lightest. I'd like to find color designations to replace
>>> yellow and green so
>>> that when printed in B/W, the yellowish color appears darker than the
>>> greenish one.
>>>
>>> Is there some tool/code I can use to find these? i.e., something to
>>> display a grid
>>> of color swatches with color codes/names I can look at in color and B/W to
>>> decide?
>>
>> You could look at colors in HCL (i.e., polar LUV). For example, you could
>> choose a dark red HCL = (0, 90, 40) and a light green (120, 70, 90) and a
>> yellow somewhere in between.
>
> How did you get to those numbers?

From scratch:

- hues 0 and 120 because Michael wanted red and green
- luminances 40 and 90 for sequential colors from dark to light
- chromas 90 and 70, for two reasons: only small differences in chroma
  seemed necessary, and 90 and 70 are close to the maximal values given
  the other two coordinates for each color
> I seem to remember there being
> some way to convert RGB to HCL, but I can't find it.
I always use "colorspace" for that. See

example("polarLUV", package = "colorspace")

Best,
Z
> Hadley
>
>
> --
> http://had.co.nz/
>
>
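
(A concrete sketch of the conversion being asked about, using the
coercions in the colorspace package; the hex value is arbitrary:)

library(colorspace)
x <- hex2RGB("#00A000")    # start from an (s)RGB colour
as(x, "polarLUV")          # coordinates: L (luminance), C (chroma), H (hue)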
------------------------------

Message: 143
Date: Sat, 7 Jun 2008 08:29:27 +0000 (UTC)
From: Dieter Menne <dieter.menne_at_menne-biomed.de>
Subject: Re: [R] Predicting a single observation using LME
To: r-help_at_stat.math.ethz.ch
Message-ID: <loom.20080607T082734-878@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

Rebecca Sela <rsela <at> stern.nyu.edu> writes:
>
> When I use a model fit with LME, I get an error if I try to use
> "predict" with a dataset consisting of a single line.
>
> For example, using this data:
> > simpledata
> Y t D ID
> 23 4.359511097 3 1 6
> 24 6.165419699 4 1 6
>
> This happened:
> > testLME <- lme(Y~t+D,data=simpledata,random=~1|ID)
> > predict(testLME, simpledata[1,])
> Error in val[revOrder, level + 1] : incorrect number of dimensions
>
> This has occurred with other datasets as well. Is this a bug in the code, or
> am I doing something wrong?

No, this looks like a bug due to dimension-dropping when using one row.
Probably nobody used it with one value before. As a workaround, do some
cheating:

predict(testLME, simpledata[c(1,2),])

Dieter

------------------------------

Message: 144
Date: Sat, 7 Jun 2008 08:36:05 +0000 (UTC)
From: Dieter Menne <dieter.menne_at_menne-biomed.de>
Subject: Re: [R] lsmeans
To: r-help_at_stat.math.ethz.ch
Message-ID: <loom.20080607T083356-644@post.gmane.org>
Content-Type: text/plain; charset=us-ascii

John Fox <jfox <at> mcmaster.ca> writes:
> I intend at some point to extend the effects package to linear and
> generalized linear mixed-effects models, probably using lmer() rather
> than lme(), but as you discovered, it doesn't handle these models now.
>
> It wouldn't be hard, however, to do the computations yourself, using
> the coefficient vector for the fixed effects and a suitably constructed
> model-matrix to compute the effects; you could also get standard errors
> by using the covariance matrix for the fixed effects.
>

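(A minimal sketch of the hand computation John describes, reusing
Rebecca's toy data from message 135; on an lme fit, fixef() and vcov()
give the fixed-effect estimates and their covariance matrix:)

library(nlme)
fit  <- lme(Y ~ t + D, data = simpledata, random = ~ 1 | ID)

grid <- expand.grid(t = 1:4, D = c(0, 1))    # covariate grid for the effects
X    <- model.matrix(~ t + D, data = grid)

grid$fit <- drop(X %*% fixef(fit))                 # population-level estimates
grid$se  <- sqrt(diag(X %*% vcov(fit) %*% t(X)))   # their standard errors
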
Douglas Bates
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2007q2/000222.html)
writes:

>> My big problem with lsmeans is that I have never been able to
>> understand how they should be calculated and, more importantly, why
>> one should want to calculate them. In other words, what do lsmeans
>> represent and why should I be interested in these particular values?

Truly Confused, torn apart by the Masters,

Dieter

------------------------------

Message: 145
Date: Sat, 7 Jun 2008 10:56:13 +0100 (BST)
From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>
Subject: Re: [R] functions for high dimensional integral
Cc: r-help_at_r-project.org
Message-ID: <alpine.LFD.1.10.0806071049170.10617@gannet.stats.ox.ac.uk>
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed

On Fri, 6 Jun 2008, ZT2008 wrote:
> I need to compute a high dimensional integral. Currently I'm using the
> function adapt in R package adapt. But this method is kind of slow to me.
> I'm wondering if there are other solutions. Thanks.
What does 'high' mean? Numerical quadrature will be slow in more than a
handful of dimensions. What to recommend depends on what you know about
the function -- Evans & Swartz (2000), 'Approximating Integrals via
Monte Carlo and Deterministic Methods', is a good reference on
integration for statisticians. But accurate evaluation of an integral
in more than 2 or 3 dimensions is potentially a very computer-intensive
task -- people spend days of CPU time using e.g. MCMC to do just that.
> Zhongwen
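
(A minimal sketch of the plain Monte Carlo approach alluded to above,
for an integral over the unit hypercube; real problems need variance
reduction or MCMC:)

# crude Monte Carlo estimate of the integral of f over [0,1]^d
mc.integrate <- function(f, d, n = 1e5) {
  x <- matrix(runif(n * d), ncol = d)
  vals <- apply(x, 1, f)
  c(estimate = mean(vals), std.error = sd(vals) / sqrt(n))
}

# example: integrate exp(-sum(x^2)) over the 5-dimensional unit cube
mc.integrate(function(x) exp(-sum(x^2)), d = 5)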
--
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

------------------------------

_______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

End of R-help Digest, Vol 64, Issue 7

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Mon 09 Jun 2008 - 06:32:23 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 09 Jun 2008 - 15:30:41 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
