# Re: [R] strange behavior of loess() & predict()

From: Hong Ooi <Hong.Ooi_at_iag.com.au>
Date: Wed 07 Dec 2005 - 10:41:14 EST

The problem appears to be that your original data contain several tied values:

> table(x)

x
1.8   2 2.2 2.4 2.6 2.8   3 3.2 3.4 3.6   4
  1   2   2   2   5   7   2   3   1   2   1

IIRC the maths and programming behind loess assume unique values for the predictor.
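
A quick way to see how heavily tied a predictor is before fitting is sketched below. The `x` here is rebuilt from the values and counts in the table above purely for illustration, so the ordering of the observations is an assumption and will not match the original data.

```r
# x rebuilt from the table(x) output (values and their counts), for illustration only
vals   <- c(1.8, 2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.4, 3.6, 4)
counts <- c(1, 2, 2, 2, 5, 7, 2, 3, 1, 2, 1)
x <- rep(vals, times = counts)

length(x)                  # 28 observations in total
length(unique(x))          # 11 distinct predictor values
anyDuplicated(x) > 0       # TRUE: at least one value is tied
max(table(x)) / length(x)  # 0.25: a quarter of the data sits on a single x value
```

With span=.5 each local fit uses roughly half the observations, so when a quarter of them share one x value a neighbourhood may contain very few distinct x values.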

One way to get around this is to jitter your data:

> x2 <- jitter(x)
> modj <- loess(y ~ x2, span=.5, degree=1)
> predict(modj, data.frame(x2=X))

```
 3.156192 3.141705 3.126918 3.112996 3.101108 3.087696 3.063471 3.038609 3.024639 3.032585 3.059480 3.091774
 3.115763 3.117743 3.092979 3.040798 2.988283 2.957976 2.950648 3.008358 3.070065 3.127379 3.193501 3.149428
 3.082843 3.010998 2.939407 2.888213 2.841487 2.812815 2.801583 2.807181 2.837887 2.899130 2.978165 3.062088
 3.137995 3.204628 3.271813 3.339450 3.407396 3.475510 3.543843 3.612450 3.681267 3.750227 3.819267 3.888321
 3.957324 4.026212
```
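
Since jitter() adds random noise, the numbers above will differ from run to run. A minimal, self-contained sketch of the jitter approach with a fixed seed follows. The `y` values and the prediction grid `X` are simulated stand-ins (the original data were not posted in this message), and `x` is rebuilt from the table(x) counts, so treat all three as assumptions.

```r
set.seed(1)  # arbitrary seed, only so the jittered values can be reproduced

# Hypothetical stand-in data: x rebuilt from the table(x) counts, y simulated
x <- rep(c(1.8, 2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.4, 3.6, 4),
         times = c(1, 2, 2, 2, 5, 7, 2, 3, 1, 2, 1))
y <- 3 + 0.2 * (x - 3)^2 + rnorm(length(x), sd = 0.1)

x2 <- jitter(x)          # adds a small amount of uniform noise to each value
anyDuplicated(x2) == 0   # ties should now be gone

modj <- loess(y ~ x2, span = 0.5, degree = 1)

# The newdata column is named x2 here to match the model formula y ~ x2
X <- seq(min(x2), max(x2), length.out = 50)  # hypothetical prediction grid
pj <- predict(modj, data.frame(x2 = X))
length(pj)               # one prediction per grid point
```

Refitting with a few different seeds is a cheap way to see how much the fitted curve depends on the particular jitter.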

Another way is to summarise your data using table() and aggregate(), and fit a weighted model where the weights are the counts for each unique x-value:

> dtab <- aggregate(data.frame(y=y), by=list(x=x), FUN=mean)
> dtab$x <- as.numeric(as.character(dtab$x))
> dtab$w <- table(x)
> modt <- loess(y ~ x, span=.5, degree=1, weights=w, data=dtab)
> predict(modt, data.frame(x=X))

```
 3.186959 3.163133 3.136244 3.110822 3.091396 3.076705 3.047705 3.018362 3.007143 3.032246 3.069599 3.092369
 3.098049 3.084134 3.053633 3.027429 3.012429 3.013908 3.036517 3.060372 3.076116 3.086870 3.095758 3.097287
 3.073824 3.031238 2.976659 2.917402 2.863489 2.821469 2.796398 2.793336 2.823850 2.892363 2.980322 3.068725
 3.140843 3.208920 3.279124 3.351965 3.427952 3.504330 3.577149 3.647119 3.714984 3.781486 3.847369 3.913375
 3.980249 4.048733
```

There's probably a way to make the aggregate and table calls neater.
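
One possible tidier version is sketched here: it builds the summary table in a single data.frame() call using tapply() and table(), which both order their results by the sorted unique values of x. As before, `x` is rebuilt from the table(x) counts and `y` is simulated, so both are assumptions rather than the original data.

```r
set.seed(1)  # arbitrary seed for the simulated response

# Hypothetical stand-in data: x rebuilt from the table(x) counts, y simulated
x <- rep(c(1.8, 2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.4, 3.6, 4),
         times = c(1, 2, 2, 2, 5, 7, 2, 3, 1, 2, 1))
y <- 3 + 0.2 * (x - 3)^2 + rnorm(length(x), sd = 0.1)

# One row per distinct x: mean response and number of observations
dtab <- data.frame(
  x = sort(unique(x)),
  y = as.vector(tapply(y, x, mean)),
  w = as.vector(table(x))
)

# Weighted fit on the aggregated data, weights = counts per unique x
modt <- loess(y ~ x, span = 0.5, degree = 1, weights = w, data = dtab)
fitted(modt)  # one fitted value per distinct x
```

This avoids the factor-to-numeric conversion, because x never passes through aggregate()'s grouping variable.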

--
Hong Ooi
Senior Research Analyst, IAG Limited
388 George St, Sydney NSW 2000
+61 (2) 9292 1566

-----Original Message-----
From: r-help-bounces@stat.math.ethz.ch [mailto:r-help-bounces@stat.math.ethz.ch] On Behalf Of Leo Gürtler
Sent: Wednesday, 7 December 2005 8:10 AM
To: r-help@stat.math.ethz.ch
Cc: gavin.simpson@ucl.ac.uk
Subject: Re: [R] strange behavior of loess() & predict()

Gavin Simpson wrote:

Dear list,

I am very sorry for being inaccurate in my question. But re-reading the
predict.loess help page does not provide a solution. As long as predict
is used on a new dataset based on this dataset, the strange values
remain and can be reproduced.
Adding a new element to both vectors (at the beginning, e.g. "1" for
each vector) results in plausible values - but not in every case.
Even switching x and y is sufficient (i.e. x as predictor and y as
dependent variable). So my question is:

Is it normal - or under which conditions does it take place - that
predict.loess predicts values that are almost 20000/max(y) ~ 5000 times
higher than expected?

best,

leo gürtler

>On Tue, 2005-12-06 at 18:09 +0100, Leo Gürtler wrote:
>
>
>>Dear all,

>>
>>
><snip>
>
>
>># here is the difference!!
>>predict(mod, data.frame(x=X), se=TRUE)
>>predict(mod, x=X, se=TRUE)
>>
>>
>><--- end of snip --->
>>
>>I assume this has some reason but I do not understand this reason.
>>Merci,
>>
>>
>
>Not sure if this is the reason, but there is no argument x in
>predict.loess, and:
>
>a <- predict(mod, se = TRUE)
>
>gives you the same results as:
>
>b <- predict(mod, x=X, se=TRUE)
>
>so the x argument appears to be being passed on/in the ... arguments and
>ignored? As such, you have no newdata, so mod$x is used.
>
>Now, when you do:
>
>c <- predict(mod, data.frame(x=X), se=TRUE)
>
>You have used an un-named argument in position 2. R takes this to be
>what you want to use for newdata and so works with this data rather than
>the one in mod$x as in the first case:
>
># now named second argument - gets ignored as in a and b
>d <- predict(mod, x = data.frame(x=X), se=TRUE)
>
>all.equal(a, b) # TRUE
>all.equal(a, c) # FALSE
>all.equal(a, d) # TRUE
>
># this time we assign X to x by using (), the result is used as newdata
>e <-  predict(mod, (x=X), se=TRUE)
>
>all.equal(c, e) # TRUE
>
>If in doubt, name your arguments and check the help! ?predict.loess
>would have quickly shown you where the problem lay.
>
>HTH
>
>G
>
>
>
>>best regards
>>
>>leo gürtler
>>
>>______________________________________________
>>R-help@stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>
>>

--

email: leog@anicca-vijja.de
www: http://www.anicca-vijja.de/
