[R] Example function for bigglm (biglm) data input from file

From: Yeh, Richard C <richard.c.yeh_at_bankofamerica.com>
Date: Mon 22 Jan 2007 - 19:01:53 GMT


This is a commented example function for use as the data argument to the bigglm function (biglm package), for when you want to read the data from a file (instead of a URL), or rescale or modify the data before fitting the model. I hope it may be of help to someone out there.

make.data <- function (filename, chunksize, ...) {
  conn <- NULL;
  function (reset=FALSE) {
    if (reset) {

      if (!is.null(conn)) {
        close(conn);
      };
      # This is for a file.
      # For other methods, see: help("connections")
      # and replace the following definition of conn
      # (and possibly the read.table call).
      conn <<- file (description=filename, open="r");
    } else {
      # It's best that the file you use has no header 
      # line, because when you use the connection to 
      # read each excerpt, any header won't get re-read.
      # If you choose to skip the first line, then the 
      # first line of each excerpt will be skipped.
      rval <- read.table (conn, nrows=chunksize, 
        skip=0, header=FALSE,...);
      # Note: at end of input, read.table returns a zero-row
      # data frame only if col.names was supplied; without it,
      # read.table stops with "no lines available in input".
      if (nrow(rval)==0) {
        # Then we have reached the end of the input.
        # Clean up:
        close(conn);
        conn <<- NULL;
        rval <- NULL;
      } else {
        # We did not reach the end of the input,
        # so this function will return data.
        # Here, you can define any derived fields
        # or put instructions to rescale input data
        # that you want done after the data are read
        # but before they are used for fitting.
        # For example:
        rval$rescaled_column <- rval$original_column / 1000000.0;
        # If you don't want to do anything like this,
        # then delete this "else" clause, and make
        # the end of the function resemble the URL
        # example in bigglm.
      };

      return(rval);
    }
  }
};
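
To see what bigglm gets from such a function, it can help to drive the reader by hand. The following is only a sketch: it uses a compact copy of make.data (without the rescaling step) so that it runs on its own, and the temporary file, the column names x and y, and the names reader and chunk_rows are made up for illustration.

```r
# Compact copy of make.data (no rescaling), so this block is self-contained.
make.data <- function(filename, chunksize, ...) {
  conn <- NULL
  function(reset = FALSE) {
    if (reset) {
      if (!is.null(conn)) close(conn)
      conn <<- file(description = filename, open = "r")
    } else {
      rval <- read.table(conn, nrows = chunksize, header = FALSE, ...)
      if (nrow(rval) == 0) {
        # End of input: clean up and signal it with NULL.
        close(conn)
        conn <<- NULL
        rval <- NULL
      }
      return(rval)
    }
  }
}

# Hypothetical test data: 5 rows, 2 columns, no header line.
tmp <- tempfile()
write.table(data.frame(x = 1:5, y = (1:5) / 10), tmp,
            row.names = FALSE, col.names = FALSE)

reader <- make.data(tmp, chunksize = 2,
                    colClasses = c("integer", "numeric"),
                    col.names = c("x", "y"))

reader(reset = TRUE)          # (re)open the connection
chunk_rows <- integer(0)
repeat {
  chunk <- reader()           # next chunk, or NULL at end of input
  if (is.null(chunk)) break
  chunk_rows <- c(chunk_rows, nrow(chunk))
}
print(chunk_rows)             # number of rows in each chunk
unlink(tmp)
```

With five rows and chunksize = 2, the reader should hand back chunks of 2, 2 and 1 rows and then NULL. Calling reader(reset = TRUE) again starts a fresh pass over the file.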

a <- make.data ( filename = "myfile", chunksize = 1000000,

  # In our definition of make.data, any remaining
  # arguments get passed to the read.table function by
  # the ... argument.
  # Define column types:
  colClasses = c ("character", "character",
    "integer", "numeric", "numeric"),

  # Define the column names in the call:
  # (recall that we cannot rely on the file header)
  col.names = c("fromState", "toState",
    "first", "original_column", "second") );

library(biglm);

bigglm (formula = toState ~ 1 + first + rescaled_column,
  data = a, family = binomial(link='logit'),
  weights = ~second);

summary(.Last.value)
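
If the file does have a header line, one possible workaround for the restriction noted in the comments of make.data is to consume the header once on every reset and reuse its names for each chunk. This is an untested-in-production sketch, not part of biglm: the name make.data.header, the sep argument, and the small test file are assumptions made for illustration.

```r
# Hypothetical variant of make.data for files that DO have a header line.
make.data.header <- function(filename, chunksize, sep = "", ...) {
  conn <- NULL
  col_names <- NULL
  function(reset = FALSE) {
    if (reset) {
      if (!is.null(conn)) close(conn)
      conn <<- file(description = filename, open = "r")
      # Read the header once per pass and remember the column names.
      header <- readLines(conn, n = 1)
      col_names <<- scan(text = header, what = "", sep = sep, quiet = TRUE)
      invisible(NULL)
    } else {
      # The connection is now positioned after the header,
      # so every chunk is read with header = FALSE.
      rval <- read.table(conn, nrows = chunksize, header = FALSE,
                         sep = sep, col.names = col_names, ...)
      if (nrow(rval) == 0) {
        close(conn)
        conn <<- NULL
        rval <- NULL
      }
      return(rval)
    }
  }
}

# Hypothetical test file with a header line and 3 data rows.
tmp <- tempfile()
write.table(data.frame(x = 1:3, y = c(0.5, 1.5, 2.5)), tmp,
            row.names = FALSE)

reader <- make.data.header(tmp, chunksize = 2)
reader(reset = TRUE)
first  <- reader()   # rows 1-2, named from the header
second <- reader()   # row 3
last   <- reader()   # NULL: end of input
unlink(tmp)
```

Because the names are passed to read.table through col.names, the end-of-input check on nrow(rval) keeps working in the same way as in make.data.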

R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue Jan 23 06:08:18 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 22 Jan 2007 - 20:30:38 GMT.