Re: [R] GLM/GAM and unobserved heterogeneity

From: Spencer Graves
Date: Thu 25 Aug 2005 - 14:08:28 EST

          Have you considered "lmer" in library(lme4)? See, for example, sec. 4 on "Two-level models for binary data" in vignette("MlmSoftRev") with library(mlmRev), in addition to -> "Documentation: Newsletter" -> "R News Volume 5/1" -> "Fitting Linear Mixed Models in R" by Doug Bates, pp. 27-30.
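          As a rough sketch (the notation is mine, not taken from the references above), the kind of two-level model for binary data that lmer can fit with a binomial family would treat each loan-month as a Bernoulli trial and add a loan-level random intercept, which plays the role of the frailty term in coxph:

```latex
y_{it} \mid u_i \sim \operatorname{Bernoulli}(h_{it}), \qquad
\log\frac{h_{it}}{1 - h_{it}} = \alpha_t + \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + u_i, \qquad
u_i \sim N(0, \sigma_u^2)
```

          Here y_{it} = 1 if loan i terminates in month t (a loan contributes rows only while it is active), h_{it} is its discrete-time hazard, alpha_t is a baseline effect for month t, x_{it} holds the static and time-varying covariates, and u_i is the unobserved loan-level heterogeneity. Setting sigma_u^2 = 0 gives an ordinary logistic GLM on the loan-month records. Comparing the estimated sigma_u^2 across successively richer models is one rough gauge of how much heterogeneity the added covariates absorb, though in logit models such comparisons should be read with care, because the scale of the coefficients also shifts as covariates are added.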

          If you have more questions after reviewing this material, please submit another question, preferably following the posting guide! "". The posting guide is not just another symbol of bureaucracy. It was written to help questioners improve their chances of getting the information they want quickly. I believe it is quite effective when it is used. Many people get answers to their questions within minutes, but that requires a question that a potential respondent can understand and answer sensibly in seconds.

          spencer graves

Kyle G. Lundstedt wrote:

> Hello,
> I'm interested in correcting for and measuring unobserved
> heterogeneity ("missing variables") using R. In particular, I'm
> searching for a simple way to measure the amount of unobserved
> heterogeneity remaining in a series of increasingly complex models
> (adding more variables to each new model) on the same data.
>
> I have a static database of 400,000 or so individual mortgage
> loans, each of which is observed monthly from origination (t=0) until
> termination (a binary yes/no variable). In my update database, there
> are up to 60 months of observed data for each loan in the static
> database, and an individual loan has an "average life" of roughly 36
> months.
>
> Each loan has static covariates observed at origination, such as
> original loan amount and credit score, as well as time-varying
> covariates (TVC) such as age, interest rates, and house prices.
> Because these TVC change each month, I've constructed a modeling
> database that merges the static database with the update database.
> The resulting "loan-month" modeling database has one observation
> for every loan-month, and the static covariates remain the same for
> all loan-months for a given loan. Thus, the modeling database has
> roughly 14.4 million loan-month records. A loan is considered
> "active" as long as it has not yet terminated or been censored; my
> interest is in predicting termination.
>
> This type of data is often referred to as "event history" or
> "discrete hazard" data. The standard R package to apply to such data
> is "survival", with which I could estimate a Cox proportional hazard
> model using coxph. The advantage of such an approach is that
> unobserved heterogeneity is easily addressed using the "frailty" term.
> The disadvantages, at least for my purposes, are two-fold.
> First, my audience is unfamiliar with hazard models. Second, my
> monthly data has many "ties" (many terminations in the same month),
> so I've been told that coxph won't work well on a large dataset with
> many ties.
>
> On the other hand, because the data is measured discretely each
> month, many references suggest applying generalized linear models
> (GLM, "logit"-type models) or even generalized additive models
> (GAM, "logit"-type models that incorporate nonlinearity in individual
> covariates). The advantage to this approach is that GLM and GAM are
> readily available in R, and my audience is very familiar with
> logit-type models.
>
> The disadvantage, however, is that I am totally unfamiliar with
> ways to correct for and measure unobserved heterogeneity using GLM/
> GAM-type models. I've been told that unobserved heterogeneity in the
> hazard framework is analogous to random effects in the GLM/GAM
> framework, but there seem to be a number of R packages that address
> this issue in different ways.
>
> So, I'd greatly appreciate suggestions on a simple way to
> incorporate unobserved heterogeneity into a GLM/GAM-type model. I'm
> not much of a statistician, so simple examples are always helpful.
> I'm also happy to track down specific article/book references, if
> folks think those might be of help.
>
> Many thanks,
> Kyle
> ---
> kyle at hotmail . com
> (email altered in obvious ways)
> ______________________________________________
> mailing list
> PLEASE do read the posting guide!
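The loan-month layout described in the question can be illustrated with a toy sketch. It is written in Python rather than R purely for illustration, and the field names (loan_id, orig_amount, credit_score, rate, status) and the three-loan data set are made up. It merges the static covariates onto every monthly record, codes termination as a 0/1 response, and then computes the empirical monthly hazard h(t) (terminations in month t divided by loans active in month t) and the implied survival S(t), the product of (1 - h(j)) over months j <= t.

```python
from collections import defaultdict

# Hypothetical static data: one record per loan, fixed at origination.
static = {
    "A": {"orig_amount": 200_000, "credit_score": 720},
    "B": {"orig_amount": 150_000, "credit_score": 680},
    "C": {"orig_amount": 300_000, "credit_score": 750},
}

# Hypothetical update data: one record per loan per observed month,
# (loan_id, month, time-varying rate, status at end of month).
# A loan has rows only while active: its last row is the month it
# terminates, or the last month observed if it is censored.
updates = [
    ("A", 1, 5.0, "active"), ("A", 2, 5.1, "active"), ("A", 3, 5.2, "paid_off"),
    ("B", 1, 5.0, "active"), ("B", 2, 5.1, "paid_off"),
    ("C", 1, 5.0, "active"), ("C", 2, 5.1, "active"),  # C is censored after month 2
]


def build_loan_months(static, updates):
    """Merge static covariates onto each monthly record: one row per loan-month."""
    rows = []
    for loan_id, month, rate, status in updates:
        row = {"loan_id": loan_id, "month": month, "rate": rate}
        row.update(static[loan_id])                  # static covariates repeat every month
        row["terminated"] = int(status != "active")  # binary response for a logit model
        rows.append(row)
    return rows


def discrete_hazard(rows):
    """Empirical hazard h(t) = terminations in month t / loans active in month t,
    and survival S(t) = product over j <= t of (1 - h(j))."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for r in rows:
        at_risk[r["month"]] += 1
        events[r["month"]] += r["terminated"]
    hazard, survival, s = {}, {}, 1.0
    for t in sorted(at_risk):
        hazard[t] = events[t] / at_risk[t]
        s *= 1.0 - hazard[t]
        survival[t] = s
    return hazard, survival


rows = build_loan_months(static, updates)
hazard, survival = discrete_hazard(rows)
print(hazard)
print(survival)
```

These 0/1 loan-month rows are what a logit GLM (or GAM) would be fitted to. A logit model whose only covariates are one indicator per month would reproduce the monthly proportions h(t) wherever they lie strictly between 0 and 1.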

Spencer Graves, PhD
Senior Development Engineer
PDF Solutions, Inc.
333 West San Carlos Street Suite 700
San Jose, CA 95110, USA
Tel:  408-938-4420
Fax: 408-280-7915

Received on Thu Aug 25 14:12:50 2005
