Re: [R] ANCOVA error again

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Mon, 21 Apr 2008 17:08:45 +0100

On Mon, 2008-04-21 at 17:21 +0200, Birgit Lemcke wrote:
> Hello Gavin,
>
> Thanks for your answer.
> If I run it without "with" I get the same error.
> The "with" was only a test for functions that do not have a
> data argument. I am still learning, and therefore I sometimes
> just try things out.

OK, as per Prof. Ripley's off-list reply to us both, the R developers and helpeRs on the list can't diagnose and fix the segfault without a reproducible example, i.e. the data and the exact code that reproduce it. R shouldn't segfault, so this is something that could potentially be fixed, but not without a reproducible example.
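As a hedged illustration of what such a reproducible example could look like when the real data cannot be posted: simulate a small data frame with the same kind of columns and print it with `dput()`, so anyone can paste it straight into R together with the failing call and the output of `sessionInfo()`. The column names are borrowed from the thread, but the `toy` object, its size and its values are hypothetical.

```r
# Minimal sketch of a self-contained reproducible example (hypothetical data).
set.seed(1)                          # make the random draws repeatable
n <- 20
toy <- data.frame(
  Sex           = factor(sample(c("female", "male"), n, replace = TRUE)),
  BracShape     = factor(sample(c("linear", "ovate"), n, replace = TRUE)),
  BracLengthMin = round(runif(n, min = 1, max = 5), 2)
)
toy$BracLengthMin[3] <- NA           # keep a missing value, as in the real data

# dput() prints R code that recreates the object; paste that output into
# the e-mail along with the exact failing call and sessionInfo().
dput(toy)

# Check that the printed code really recreates the same data frame.
toy_copy <- eval(parse(text = paste(capture.output(dput(toy)), collapse = "\n")))
all.equal(toy, toy_copy)
```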

>
> I do plan to simplify the model once I have it working for the
> whole lot.
> I don't want to predict the gender.
> I would like to know which of my variables best separate the data
> into the already existing groups: male and female.
> If all my variables were continuous, I could probably have used a
> discriminant function analysis, but most of the variables are
> categorical.

That is what I meant: you /are/ trying to predict sex. It is known, and you want to find rules that allow you to assign unknowns to one of the two sexes. Discriminant analysis (LDA) is one technique in the broad topic of classification (not to be confused with clustering; ecologists often call clustering "classification"), or supervised learning. Here categorical variables can be handled just fine using classification trees.
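A rough sketch of that suggestion: the rpart package (shipped with R as a recommended package) fits a classification tree to a mix of factor and numeric predictors without any dummy coding. The data below are simulated and hypothetical; with the real data the call would presumably be `rpart(Sex ~ ., data = FemMal85_Sex, method = "class")`, assuming `Sex` is stored as a factor.

```r
library(rpart)   # recursive partitioning; a recommended package shipped with R

set.seed(42)
n <- 200
Sex <- factor(rep(c("female", "male"), each = n / 2))
# Hypothetical data: one factor and one numeric trait that differ by sex,
# plus one factor that is pure noise.
dat <- data.frame(
  Sex           = Sex,
  BracShape     = factor(ifelse(Sex == "male",
                                sample(c("linear", "ovate"), n, TRUE, prob = c(0.8, 0.2)),
                                sample(c("linear", "ovate"), n, TRUE, prob = c(0.2, 0.8)))),
  FlowLengthMax = rnorm(n, mean = ifelse(Sex == "male", 6, 4)),
  Noise         = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)

# method = "class" requests a classification (not regression) tree.
fit <- rpart(Sex ~ ., data = dat, method = "class")
print(fit)

# Training-set confusion matrix (optimistic; cross-validate for a fair estimate).
pred <- predict(fit, type = "class")
table(observed = dat$Sex, predicted = pred)

# rpart's own importance measure, which also credits surrogate splits.
fit$variable.importance
```

rpart keeps observations with missing predictor values by using surrogate splits, which may suit a dataset with many NAs.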

A good introduction from the ecologist's point of view is:

Glenn De'ath and Katharina E. Fabricius (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81(11): 3178–3192. DOI: 10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2

And, in the same journal, random forests are introduced in:

Cutler et al. (2007) Random forests for classification in ecology. Ecology 88(11): 2783–2792. DOI: 10.1890/07-0539.1

Then take a look at Andy Liaw and Matthew Wiener (2002) Classification and regression by randomForest. R News 2(3): 18–22, for an introduction to using randomForest in R if you want to give that a try. See the variable importance example in that newsletter for one approach that could be used instead of your multiple testing idea.
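A minimal sketch of the variable-importance idea, assuming the randomForest package from CRAN is installed (it is not part of a default R installation). The data are simulated and hypothetical; `importance = TRUE` asks for the permutation-based mean decrease in accuracy in addition to the Gini measure.

```r
library(randomForest)  # from CRAN: install.packages("randomForest") if needed

set.seed(123)
n <- 300
Sex <- factor(rep(c("female", "male"), each = n / 2))
# Hypothetical predictors: Signal depends on Sex, Noise1 and Noise2 do not.
dat <- data.frame(
  Sex    = Sex,
  Signal = rnorm(n, mean = ifelse(Sex == "male", 1, -1)),
  Noise1 = rnorm(n),
  Noise2 = factor(sample(c("a", "b"), n, replace = TRUE))
)

# A factor response makes randomForest grow classification trees.
rf <- randomForest(Sex ~ ., data = dat, ntree = 500, importance = TRUE)
print(rf)   # includes the out-of-bag (OOB) error estimate

# type = 1: mean decrease in accuracy when each predictor is permuted.
imp <- importance(rf, type = 1)
imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]
```

By default `randomForest()` stops on missing values (`na.action = na.fail`); the package's `na.roughfix()` and `rfImpute()` are two ways around that.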

You might also want to take a look at:

Glenn De'ath (2007) Boosted trees for ecological modeling and prediction. Ecology 88(1): 243–251. DOI: 10.1890/0012-9658(2007)88[243:BTFEMA]2.0.CO;2

>
> My plan is to drop one of the interacting variables in each case
> and then compare the models with the remaining variables using a
> chi-square test.

That sounds like the definition of data dredging to me ;-)

>
> But I am always open to suggestions, because I am still not very
> good at statistics.
>
> I still get the same error message and don't know how to
> fix it.

Unless you know C and the R internals very well, you can't fix this. You can try a different approach, such as the classification/supervised learning one I provide references for. You are on a hiding to nothing if you proceed with your current approach...

All the best,

G

>
> Greets
>
> B.
>
> On 21.04.2008 at 16:52, Gavin Simpson wrote:
> > On Mon, 2008-04-21 at 15:43 +0200, Birgit Lemcke wrote:
> >> Hello R users!
> >>
> >> I got another error message.
> >
> > Something here is causing compiled code to segfault ("crash"). I
> > don't know exactly what the problem is (I'll let those much more
> > acquainted with R look into that), but you seem to be using R's
> > model formulae in a non-standard way.
> >
> > You don't need with() wrapping your call to glm(), just include a data
> > frame as the data argument:
> >
> > ModelFemMal85 <- glm(Sex ~ .^2, data = FemMal85_Sex,
> > na.action = na.exclude, family = binomial)
> >
> > Will do what you appear to have attempted below (all main effects
> > plus first-order interactions). This is a simpler call, so see
> > whether it works in R without causing the segfault.
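A small simulated sketch of that call (hypothetical data frame `d` and column names) illustrates two points from this thread. With `family = binomial`, a two-level factor response is accepted directly; the first level is treated as "failure", so the model describes the probability of the second level. And `Sex ~ .^2` expands to all main effects plus all two-way interactions. Leaving `family` at its gaussian default with a factor response is, as John Fox suggests further down, the likely source of the `Ops.factor` warnings quoted there.

```r
set.seed(7)
n <- 100
# Hypothetical data: a two-level factor response and three predictors.
d <- data.frame(
  Sex  = factor(sample(c("female", "male"), n, replace = TRUE)),
  x1   = rnorm(n),
  x2   = rnorm(n),
  Type = factor(sample(c("lax", "imbricate"), n, replace = TRUE))
)
d$x1[c(5, 17)] <- NA   # a few missing values, as in the real data

# Main effects plus all two-way interactions; binomial family for a factor response.
m <- glm(Sex ~ .^2, data = d, na.action = na.exclude, family = binomial)

# Intercept + 3 main effects + 3 two-way interactions = 7 coefficients
# (Type has two levels, so each term involving it needs one column).
length(coef(m))

# na.exclude pads the fitted values with NA for the two dropped rows,
# so they line up with the original 100 rows.
length(fitted(m))
```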
> >
> > However, I would consider what on earth you are going to do with
> > such a huge number of coefficients in the model: over 3500 if I
> > interpreted your formula correctly and assuming that the variables
> > are all continuous. Do you have many, many more than 3500
> > observations?
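The growth in model size is easy to check. This sketch counts the terms R generates for a small formula with four hypothetical variables, then applies the same arithmetic to 85 variables, assuming each enters as a single column (continuous or two-level factor). Note that `.^2` gives main effects plus two-way interactions only, whereas chaining every variable with `*`, as in the formula quoted below, requests interactions of every order.

```r
# Term labels for four hypothetical variables a, b, c, d.
two_way <- attr(terms(y ~ (a + b + c + d)^2), "term.labels")
all_way <- attr(terms(y ~ a * b * c * d), "term.labels")
length(two_way)   # 4 main effects + choose(4, 2) = 6 interactions: 10
length(all_way)   # every non-empty subset of the 4 variables: 2^4 - 1 = 15

# The same arithmetic for p = 85 single-column predictors.
p <- 85
n_coef_two_way <- 1 + p + choose(p, 2)  # intercept + main effects + two-way terms
n_coef_two_way                          # 3656, i.e. "over 3500"
n_terms_all_way <- 2^p - 1              # all interaction orders: about 3.9e25 terms
n_terms_all_way
```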
> >
> > If you are trying to predict the sex of individuals, why not try
> > some of the classification techniques available in R? A simple
> > technique would be a classification tree (packages rpart and party,
> > for example). These will help with feature selection and do include
> > interactions, though not in exactly the same way you have done so
> > here. Bagging, boosting or randomForests could be used to improve
> > predictions (or make them more stable). Check out the Machine
> > Learning and Environmetrics Task Views for additional info and
> > pointers to relevant R packages/functions.
> >
> > My two pennies worth,
> >
> > G
> >
> >>
> >> I used this code:
> >>
> >> with(FemMal85_Sex, {
> >> ModelFemMal85 <- glm(Sex ~ outLatTep_like_other
> >>   * outLatTep_like_conduplicate * outLatTep_keeled_winged
> >>   * spathellae_conspicuous * spathellae_inconspicuous_absent
> >>   * InfSpath_persistence * InfSpath_caducuous
> >>   * bractsSpacing_lax * bractsSpacing_imbricate
> >>   * InfType_sparsely_paniculate * InfType_racemose
> >>   * InfType_paniculate * InfType_globose
> >>   * bracApexShape_truncate * bracApexShape_rounded
> >>   * bracApexShape_obtuse * bracApexShape_acute
> >>   * bracApexShape_acuminate * bracApexShape_apiculate
> >>   * bracApexShape_aciculate
> >>   * BracUpperMarg_like_rest * BracUpperMarg_memebranous
> >>   * BracUpperMarg_honeycombed_cells
> >>   * InfSpathText_coriaceous * InfSpathText_hyaline
> >>   * InfSpathText_chartaceous * InfSpathText_cartilaginous
> >>   * InfSpathText_membranous
> >>   * spikShapeSide_linear * spikShapeSide_oblong
> >>   * spikShapeSide_square * spikShapeSide_elliptical
> >>   * spikShapeSide_ovate * spikShapeSide_obovate
> >>   * spikShapeSide_obtriangular * spikShapeSide_orbicular
> >>   * spikShapeSide_undifferentiated
> >>   * SpikApexShape_truncate * SpikApexShape_rounded
> >>   * SpikApexShape_obtuse * SpikApexShape_acute
> >>   * SpikApexShape_undifferentiated
> >>   * BracShape_linear * BracShape_oblong * BracShape_square
> >>   * BracShape_elliptical * BracShape_ovate * BracShape_obovate
> >>   * BracShape_orbicular
> >>   * BracText_bony * BracText_coriaceous * BracText_hyline
> >>   * BracText_chartaceous * BracText_cartilaginous
> >>   * BracText_membranous * BracText_centrChartaceousMargMembranous
> >>   * TepText_bony * TepText_coriaceous * TepText_chartaceous
> >>   * TepText_cartilaginous * TepText_membranous
> >>   * InfLengthMin * InfLengthMax * InfWidthMin * InfWidthMax
> >>   * SpathellaeLengthMin * SpathellaeLengthMax
> >>   * SpikLengthMin * SpikLengthMax
> >>   * FlowNumbSpikMin * FlowNumbSpikMax
> >>   * BracLengthMin * BracLengthMax
> >>   * FlowLengthMin * FlowLengthMax
> >>   * InfSpathLengthToSpikMin * InfSpathLengthToSpikMax
> >>   * TepInOutMin * TepInOutMax
> >>   * BracLengthtoFlowMin * BracLengthtoFlowMax
> >>   * BracMargMin * BracMargMax
> >>   * BracAwnToBodyMin * BracAwnToBodyMax,
> >>   na.action = na.exclude, family = binomial)
> >> })
> >>
> >> and got this error message:
> >>
> >> *** caught segfault ***
> >> address 0xbf7fffb0, cause 'memory not mapped'
> >>
> >> Traceback:
> >> 1: terms.formula(formula, data = data)
> >> 2: terms(formula, data = data)
> >> 3: model.frame.default(formula = Sex ~ outLatTep_like_other *
> >> outLatTep_like_conduplicate *........... * BracAwnToBodyMax,
> >> drop.unused.levels = TRUE)
> >> 4: model.frame(formula = Sex ~ outLatTep_like_other *
> >> outLatTep_like_conduplicate *........... * BracAwnToBodyMax,
> >> drop.unused.levels = TRUE)
> >> 5: eval(expr, envir, enclos)
> >> 6: eval(mf, parent.frame())
> >> 7: glm(Sex ~ outLatTep_like_other * outLatTep_like_conduplicate
> >> *............* BracAwnToBodyMax, family = binomial)
> >> 8: eval.with.vis(expr, envir, enclos)
> >> 9: eval.with.vis(ei, envir)
> >> 10: source("/Users/birgitlemcke/Job/Doktorarbeit/R/Protokolle_Codes/Protokoll21.04.08.R")
> >>
> >> Possible actions:
> >> 1: abort (with core dump, if enabled)
> >> 2: normal R exit
> >> 3: exit R without saving workspace
> >> 4: exit R saving workspace
> >> Selection:
> >>
> >>
> >>
> >> (Where the traceback shows "...........", I deleted some of the 85 variables.)
> >>
> >> What does this message mean?
> >>
> >> Thanks a lot in advance.
> >>
> >> B.
> >>
> >>
> >> On 21.04.2008 at 14:50, John Fox wrote:
> >>> Dear Birgit,
> >>>
> >>> My guess is that you forgot to specify the argument
> >>> family=binomial in
> >>> the call to glm().
> >>>
> >>> Had you included the commands that you used as well as the error
> >>> that
> >>> was produced, it wouldn't be necessary to guess.
> >>>
> >>> I hope this helps,
> >>> John
> >>>
> >>> On Mon, 21 Apr 2008 14:23:13 +0200
> >>> Birgit Lemcke <birgit.lemcke_at_systbot.uzh.ch> wrote:
> >>>> R version 2.6.2 PowerBook G4
> >>>>
> >>>> Hello R User,
> >>>>
> >>>> I am trying to perform an ANCOVA using the glm function.
> >>>> I have a dataset with continuous and categorical explanatory
> >>>> variables, and my response variable is also categorical (binary).
> >>>>
> >>>> Error: NA/NaN/Inf in foreign function call (arg 4)
> >>>> In addition: Warning messages:
> >>>> 1: In Ops.factor(y, mu) : - not meaningful for factors
> >>>> 2: In Ops.factor(eta, offset) : - not meaningful for factors
> >>>> 3: In Ops.factor(y, mu) : - not meaningful for factors
> >>>>
> >>>> My dataset contains NAs, but if I use na.exclude I get the
> >>>> same error message.
> >>>>
> >>>> I thought the function should work with my dataset. What am I
> >>>> doing wrong?
> >>>>
> >>>> Thanks in advance for your help.
> >>>>
> >>>> Birgit
> >>>>
> >>>>
> >>>> Birgit Lemcke
> >>>> Institut für Systematische Botanik
> >>>> Zollikerstrasse 107
> >>>> CH-8008 Zürich
> >>>> Switzerland
> >>>> Ph: +41 (0)44 634 8351
> >>>> birgit.lemcke_at_systbot.uzh.ch
> >>>>
> >>>> 175 years of UZH
> >>>> «Marvel. Experience. Understand. Hands-on natural science.»
> >>>> MNF anniversary event for young and old.
> >>>> 19 April 2008, 10:00 to 02:00
> >>>> Campus Irchel, Winterthurerstrasse 190, 8057 Zürich
> >>>> More information: http://www.175jahre.uzh.ch/naturwissenschaft
> >>>>
> >>>> ______________________________________________
> >>>> R-help_at_r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>> --------------------------------
> >>> John Fox, Professor
> >>> Department of Sociology
> >>> McMaster University
> >>> Hamilton, Ontario, Canada
> >>> http://socserv.mcmaster.ca/jfox/
> >>
> > --
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> > Dr. Gavin Simpson [t] +44 (0)20 7679 0522
> > ECRC, UCL Geography, [f] +44 (0)20 7679 0565
> > Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
> > Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
> > UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >
>

Received on Mon 21 Apr 2008 - 16:11:46 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Mon 21 Apr 2008 - 16:30:30 GMT.
