Re: [Rd] Randomness not due to seed

From: Paul Johnson <pauljohn32_at_gmail.com>
Date: Mon, 25 Jul 2011 11:49:55 -0500

On Tue, Jul 19, 2011 at 8:13 AM, jeroen00ms <jeroen.ooms_at_stat.ucla.edu> wrote:
> I am working on a reproducible computing platform for which I would like to
> be able to _exactly_ reproduce an R object. However, I am experiencing
> unexpected randomness in some calculations. I have a hard time finding out
> exactly how it occurs. The code below illustrates the issue.
>
> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
>
> makelm <- function(){
>        return(lm(dist~speed, data=cars));
> }
>
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
>
> When inspecting both objects there seem to be some rounding differences.
> Setting a seed does not make a difference. Is there any way I can remove
> this randomness and exactly reproduce the object every time?
>

William Dunlap was correct. The sequence of comparisons below shows that the "terms" component is what makes identical() fail: its ".Environment" attribute points to a different environment in each object (0x1b76ae0 versus 0x1cf06b8). Everything else associated with this model (the coefficients, the R-squared, the covariance matrix, etc.) matches exactly.

> mylm1 <- lm(dist~speed, data=cars);
> mylm2 <- lm(dist~speed, data=cars);
> identical(mylm1, mylm2); #TRUE
[1] TRUE
> makelm <- function(){
+ return(lm(dist~speed, data=cars));
+ }
> mylm1 <- makelm();
> mylm2 <- makelm();
> identical(mylm1, mylm2); #FALSE
[1] FALSE
> identical(coef(mylm1), coef(mylm2))
[1] TRUE
> identical(summary(mylm1), summary(mylm2))
[1] FALSE
> identical(coef(summary(mylm1)), coef(summary(mylm2)))
[1] TRUE
> all.equal(mylm1, mylm2)
[1] TRUE
> identical(summary(mylm1)$r.squared, summary(mylm2)$r.squared)
[1] TRUE
> identical(summary(mylm1)$adj.r.squared, summary(mylm2)$adj.r.squared)
[1] TRUE
> identical(summary(mylm1)$sigma, summary(mylm2)$sigma)
[1] TRUE
> identical(summary(mylm1)$fstatistic, summary(mylm2)$fstatistic)
[1] TRUE
> identical(summary(mylm1)$residuals, summary(mylm2)$residuals)
[1] TRUE
> identical(summary(mylm1)$cov.unscaled, summary(mylm2)$cov.unscaled)
[1] TRUE
> identical(summary(mylm1)$call, summary(mylm2)$call)
[1] TRUE
> identical(summary(mylm1)$terms, summary(mylm2)$terms)
[1] FALSE
> summary(mylm2)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1

attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1b76ae0>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"
>
> summary(mylm1)$terms
dist ~ speed
attr(,"variables")
list(dist, speed)
attr(,"factors")
      speed
dist      0
speed     1

attr(,"term.labels")
[1] "speed"
attr(,"order")
[1] 1
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x1cf06b8>
attr(,"predvars")
list(dist, speed)
attr(,"dataClasses")
     dist     speed
"numeric" "numeric"
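
The two printouts differ only in the ".Environment" attribute, so one way to check this is to compare the formula environments directly and then compare the fits again after pointing both at the same environment. The following is only a sketch: reset_terms_env() is a hypothetical helper written for this message, not something in base R, and it assumes the only environments stored in the fit are the ones on fit$terms and on the "terms" attribute of fit$model.

```r
# Fit the same model twice through a wrapper, as in the original post.
makelm <- function() {
  lm(dist ~ speed, data = cars)
}
mylm1 <- makelm()
mylm2 <- makelm()

# Each call to makelm() has its own evaluation frame, and the formula
# created inside that call remembers the frame as its environment.
same_env <- identical(environment(formula(mylm1)),
                      environment(formula(mylm2)))

# Hypothetical helper (an assumption for illustration): point every
# stored terms environment at the global environment.
reset_terms_env <- function(fit) {
  attr(fit$terms, ".Environment") <- globalenv()
  attr(attr(fit$model, "terms"), ".Environment") <- globalenv()
  fit
}

# Compare the fits again once the environments agree.
same_after_reset <- identical(reset_terms_env(mylm1),
                              reset_terms_env(mylm2))

print(same_env)
print(same_after_reset)
```

Changing the formula environment can alter where later calls such as update() look up variables, so for a reproducibility check it is probably safer to apply this to copies used only for the comparison, or to rely on all.equal() as above.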
-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas

______________________________________________
R-devel_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Received on Mon 25 Jul 2011 - 16:53:32 GMT


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 25 Jul 2011 - 17:30:12 GMT.
