Re: [R] GLM/GAM and unobserved heterogeneity

From: Spencer Graves <spencer.graves_at_pdf.com>
Date: Thu 25 Aug 2005 - 14:08:28 EST

          Have you considered "lmer" in library(lme4)? See, for example, sec. 4 on "Two-level models for binary data" in vignette("MlmSoftRev") with library(mlmRev), in addition to www.r-project.org -> "Documentation: Newsletter" -> "R News Volume 5/1" -> "Fitting Linear Mixed Models in R" by Doug Bates, pp. 27-30.
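
          A minimal sketch of a two-level logit for loan-month data follows. It is illustrative only: the data are simulated, and the names loan_id, month, score, terminated and frailty are hypothetical, not taken from the original database. In current lme4 releases, generalized models are fitted with glmer() rather than lmer(), so the sketch uses glmer() and skips that step if lme4 is not installed. Because each loan contributes at most one termination, the estimated loan-level variance may be imprecise.

```r
# Simulated loan-month data; every name and parameter value here is hypothetical.
set.seed(1)
n_loans    <- 300
max_months <- 24

score   <- rnorm(n_loans)   # static covariate (e.g. a standardized credit score)
frailty <- rnorm(n_loans)   # unobserved loan-level heterogeneity

rows <- vector("list", n_loans)
for (i in seq_len(n_loans)) {
  # Monthly termination probability for loan i on the logit scale
  p     <- plogis(-3 - 0.5 * score[i] + frailty[i])
  ended <- rbinom(max_months, 1, p)
  # Observe the loan up to its first termination, else censor at max_months
  last  <- if (any(ended == 1)) which(ended == 1)[1] else max_months
  rows[[i]] <- data.frame(loan_id    = i,
                          month      = seq_len(last),
                          score      = score[i],
                          terminated = ended[seq_len(last)])
}
loan_months <- do.call(rbind, rows)
loan_months$loan_id <- factor(loan_months$loan_id)

# Ordinary logit: ignores the unobserved loan-level heterogeneity
fit_glm <- glm(terminated ~ score, family = binomial, data = loan_months)
print(coef(fit_glm))

# Two-level logit: a random intercept per loan stands in for the "frailty"
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_glmm <- lme4::glmer(terminated ~ score + (1 | loan_id),
                          family = binomial, data = loan_months)
  # The loan_id variance component measures the remaining heterogeneity
  print(lme4::VarCorr(fit_glmm))
}
```

          Refitting the second model after adding covariates and comparing the loan_id variance component is one way to see how much heterogeneity remains unexplained.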

          If you have more questions after reviewing this material, please submit another question, preferably following the posting guide, "http://www.R-project.org/posting-guide.html". The posting guide is not just another symbol of bureaucracy. It was written to help questioners improve their chances of getting the information they want quickly. I believe it is quite effective when it is used. Many people get answers to their questions in minutes, but that requires a question that a potential respondent can understand and answer sensibly in seconds.

          spencer graves

Kyle G. Lundstedt wrote:

> Hello,
> I'm interested in correcting for and measuring unobserved
> heterogeneity ("missing variables") using R. In particular, I'm
> searching for a simple way to measure the amount of unobserved
> heterogeneity remaining in a series of increasingly complex models
> (adding additional variables to each new model) on the same data.
> I have a static database of 400,000 or so individual mortgage
> loans, each of which is observed monthly from origination (t=0) until
> termination (a binary yes/no variable). In my update database, there
> are up to 60 months of observed data for each loan in the static
> database, and an individual loan has an "average life" of roughly 36
> months.
> Each loan has static covariates observed at origination, such as
> original loan amount and credit score, as well as time-varying
> covariates (TVC) such as age, interest rates, and house prices.
> Because these TVC change each month, I've constructed a modeling
> database that merges the static database with the update database.
> The resulting "loan-month" modeling database has one observation
> for every loan-month, and the static covariates remain the same for
> all loan-months for a given loan. Thus, the modeling database has
> roughly 14.4 million loan-month records. A loan is considered
> "active" as long as it has not yet terminated or been censored; my
> interest is in predicting termination.
> This type of data is often referred to as "event history" or
> "discrete hazard" data. The standard R package to apply to such data
> is "survival", with which I could estimate a Cox proportional hazard
> model using coxph. The advantage of such an approach is that
> unobserved heterogeneity is easily addressed using the "frailty" term.
> The disadvantages, at least for my purposes, are two-fold.
> First, my audience is unfamiliar with hazard models. Second, my
> monthly data has many "ties" (many terminations in the same month),
> so I've been told that coxph won't work well on a large dataset with
> many ties.
> On the other hand, because the data is measured discretely each
> month, many references suggest applying generalized linear models
> (GLM, "logit"-type models) or even generalized additive models
> (GAM, "logit"-type models that incorporate nonlinearity in individual
> covariates). The advantage to this approach is that GLM and GAM are
> readily available in R, and my audience is very familiar with logit-
> type models.
> The disadvantage, however, is that I am totally unfamiliar with
> ways to correct for and measure unobserved heterogeneity using GLM/
> GAM-type models. I've been told that unobserved heterogeneity in the
> hazard framework is analogous to random effects in the GLM/GAM
> framework, but there seem to be a number of R packages that address
> this issue in different ways.
> So, I'd greatly appreciate suggestions on a simple way to
> incorporate unobserved heterogeneity into a GLM/GAM-type model. I'm
> not much of a statistician, so simple examples are always helpful.
> I'm also happy to track down specific article/book references, if
> folks think those might be of help.
>
> Many thanks,
> Kyle
> ---
> kyle at hotmail . com
> (email altered in obvious ways)
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

-- 
Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA

spencer.graves@pdf.com
www.pdf.com <http://www.pdf.com>
Tel:  408-938-4420
Fax: 408-280-7915

Received on Thu Aug 25 14:12:50 2005
