[R] Regression lines for differently-sized groups on the same plot

From: Laura M Marx <marxlau1_at_msu.edu>
Date: Wed 20 Jul 2005 - 09:53:23 EST


Hi there,
  I've looked through the very helpful advice about adding fitted lines to plots in the r-help archive, but can't find a post that offers a solution to my specific problem. I need to plot logistic regression fits from three differently-sized data subsets on a plot of the entire dataset. A description and code are below:
  I have an unbalanced dataset consisting of three different species (hem, yb, and sm), with unequal numbers of wood pieces in each species group. I am trying to generate a plot that shows the size of the wood piece on the X axis, the probability of it having tree seedlings growing on it on the Y axis (the response is a binary yes/no variable), and three fitted curves showing how the probability of having tree seedlings changes with increasing wood piece size for each species.
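To make the setup reproducible, here is a minimal sketch with simulated data shaped like the dataset described above. The group sizes, the made-up coefficients and the names `fits` and `n` are hypothetical; only the column names `logarea` and `hempresence` and the `glm(..., family = binomial(logit))` call come from the post. It fits one logistic regression per species and shows that each fit has only as many fitted values as its own subset has rows.

```r
# Hypothetical simulated stand-in for the real logreg data frame:
# three wood species with deliberately unequal numbers of logs.
set.seed(1)
n <- c(hem = 40, yb = 25, sm = 15)
species <- factor(rep(names(n), times = n))
logarea <- runif(sum(n), min = 1, max = 10)

# Made-up relationship: seedlings more likely on larger logs,
# with a different baseline for each species.
intercept <- c(hem = -2, yb = -3, sm = -4)[as.character(species)]
hempresence <- rbinom(sum(n), size = 1,
                      prob = plogis(intercept + 0.5 * logarea))
logreg <- data.frame(species, logarea, hempresence)

# One logistic regression per species, each fitted on its own subset
fits <- lapply(split(logreg, logreg$species), function(d)
  glm(hempresence ~ logarea, family = binomial(logit), data = d))

# Number of fitted values per model: one per row of that subset,
# so none of them matches the 80 x values of the full dataset.
sapply(fits, function(f) length(fitted(f)))
# returns c(hem = 40, sm = 15, yb = 25)
```

Pairing any of these fitted vectors with the full `logarea` column is what triggers "x and y lengths differ".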
  I have no problem generating fits using glm, and no problem creating the plot. However, if I try to add a fitted curve based only on the hem data subset to a plot that shows the entire dataset, I get an error message saying that the lengths of those datasets differ: "Error in xy.coords(x, y) : x and y lengths differ". I can see R's point: you can't plot a regression line of babies born as a function of stork abundance on a graph of cherries produced (Y) versus rainfall (X), which, for all the program knows, is what I'm trying to do.
  As a temporary fix, I added NAs to the end of the hem, yb, and sm subsets to make them the same length as the entire dataset. I can now add my fitted curves to the plot, but the lines are not connected. That is, if the hem group only contains wood pieces that are 1, 4, and 10 meters long, the plot has an X axis that ranges from 1 to 10, but line segments for the hem group regression line only appear above 1, 4, and 10.
  How can I fix this? An ideal solution would also not require me to make the hem subset of my data the same length as the full dataset (although the regression summaries with the NAs (or zeroes) added and taken away are identical). I'd also settle for a workaround that makes R connect the pieces of the curve, so that I get a solid line rather than small dots and dashes where actual data exist.
  Thanks so much for your help!
  Laura Marx
  Michigan State University, Dept. of Forestry

#Note: hemdata has all the rows that are not hemlock species replaced with
#"NA"s.

hemhem=glm(hempresence~logarea, family=binomial(logit), data=hemdata)
hemyb=glm(hempresence~logarea, family=binomial(logit), data=birchdata)
hemsm=glm(hempresence~logarea, family=binomial(logit), data=mapledata)

attach(logreg) #logreg is the full dataset
plot(logarea, hempresence, xlab = "Surface area of log (m2)",
     ylab = "Probability of hemlock seedling presence",
     type = "n", font.lab = 2, cex.lab = 1.5, axes = TRUE)

lines(logarea,fitted(hemhem), lty=1, lwd=2)
lines(logarea,fitted(hemyb), lty="dashed", lwd=2)
lines(logarea,fitted(hemsm), lty="dotted", lwd=2)
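One common approach, which is not part of the original post, is to skip fitted() entirely and evaluate each model on a shared, evenly spaced grid of x values with `predict(..., newdata = ..., type = "response")`. Every curve then has exactly as many y values as grid points, so no NA padding is needed, and because the grid is sorted and contains no NAs the line is drawn unbroken (lines() leaves a gap wherever x or y is NA). The sketch below assumes simulated data; the names `grid`, `ltys` and `curves` are hypothetical.

```r
# Hypothetical simulated data with the same shape as in the question:
# three species, unequal group sizes. Replace with the real logreg.
set.seed(2)
n <- c(hem = 40, yb = 25, sm = 15)
logreg <- data.frame(species = factor(rep(names(n), times = n)),
                     logarea = runif(sum(n), 1, 10))
logreg$hempresence <- rbinom(nrow(logreg), 1,
                             plogis(-3 + 0.5 * logreg$logarea))

# Empty plot of the full dataset, as in the original code
plot(hempresence ~ logarea, data = logreg, type = "n",
     xlab = "Surface area of log (m2)",
     ylab = "Probability of hemlock seedling presence")

# One sorted grid of x values shared by all three curves
grid <- data.frame(logarea = seq(min(logreg$logarea), max(logreg$logarea),
                                 length.out = 100))
ltys <- c(hem = "solid", yb = "dashed", sm = "dotted")

curves <- list()
for (sp in names(ltys)) {
  # Fit on this species' rows only; no NA padding required
  fit <- glm(hempresence ~ logarea, family = binomial(logit),
             data = subset(logreg, species == sp))
  # One predicted probability per grid point, so x and y lengths match
  p <- predict(fit, newdata = grid, type = "response")
  lines(grid$logarea, p, lty = ltys[[sp]], lwd = 2)
  curves[[sp]] <- p
}
legend("topleft", legend = names(ltys), lty = ltys, lwd = 2)
```

If you would rather keep using fitted(), a workaround in the same spirit is to pair the fitted values with the subset's own x values in sorted order, e.g. `o <- order(d$logarea); lines(d$logarea[o], fitted(fit)[o])`, where `d` is the (hypothetical) subset the model was fitted on, assuming it has no missing values.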

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed Jul 20 09:57:34 2005
