Re: [R] Incorrect 'n' returned by survfit()

From: Thomas Lumley
Date: Sat 28 Oct 2006 - 17:28:35 GMT

On Wed, 25 Oct 2006, yongchuan wrote:

> I've a data set with 60000 rows of data representing 6000+ distinct
> loans. I did a coxph() regression on it (see call below), but a
> subsequent survfit() call on the coxph object is almost certainly
> wrong. It gives n=6 when it should be more like 6000+ (I think).
>
> > survfit(resultag)
> Call: survfit.coxph(object = resultag)
>
>       n  events  median 0.95LCL 0.95UCL
>       6     489     Inf       2     Inf
>
> When I reduced the dataset to just 1000 rows, the survfit()
> call on the coxph object looks more correct.
>
> > survfit(resulting)
> Call: survfit.coxph(object = resulting)
>
>       n  events  median 0.95LCL 0.95UCL
>     115      15     Inf     Inf     Inf
>
> Is there a limit to the size of the data set that I read in?
> Or am I just doing something silly above?
>
> (This is the coxph regression:
> resultag <- coxph(Surv(Start, Stop, PrepayDate) ~ modBalance + closingCoupon + lienPosition + originalFICO, table))

You may be misunderstanding the `n` column in the output. If you read the help for print.survfit you will find:

      The "number of observations" is not well-defined for counting
      process data. Previous versions of this code used the number at
      risk at the first time point. This is misleading if many
      individuals enter late or change strata. The original S code for
      the current version uses the number of records, which is
      misleading when the counting process data actually represent a
      fixed cohort with time-dependent covariates.

      Four possibilities are provided, controlled by 'print.n' or by
      'options(survfit.print.n)': '"none"' prints 'NA', '"records"'
      prints the number of records, '"start"' prints the number at the
      first time point and '"max"' prints the maximum number at risk.
      The initial default is '"start"'.
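
To make the four choices concrete, here is a small sketch in base R on a made-up counting-process data set. The data frame `d`, its `id` column and the vector `at_risk` are invented for illustration and are not the poster's loans. It assumes a record (Start, Stop] is at risk at time t when Start < t <= Stop, and it takes the "first time point" to be the smallest Stop value; the survival package's own bookkeeping may differ in detail.

```r
# Hypothetical data: 4 subjects, 7 (Start, Stop] records, most entering late
d <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3, 4),
  Start = c(0, 1, 1, 2, 1, 2, 2),
  Stop  = c(1, 2, 2, 3, 2, 3, 3)
)

# Number of records at risk at each distinct Stop time
# (assumption: at risk at t means Start < t <= Stop)
times   <- sort(unique(d$Stop))
at_risk <- sapply(times, function(t) sum(d$Start < t & d$Stop >= t))

counts <- c(
  records  = nrow(d),               # what "records" would report
  start    = at_risk[1],            # number at risk at the first time point
  max      = max(at_risk),          # maximum number at risk
  subjects = length(unique(d$id))   # distinct subjects, for comparison
)
print(counts)
# records = 7, start = 1, max = 3, subjects = 4
```

With late entry, "start" (1) is far below both the number of distinct subjects (4) and the number of records (7), which is the same pattern as n=6 for 6000+ loans. In the version of the package whose help is quoted above, setting options(survfit.print.n = "records") or passing print.n when printing should change which of these counts is shown.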


Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle

Received on Sun Oct 29 04:35:07 2006
