From: Anthony Mathelier <anthony.mathelier_at_gmail.com>

Date: Mon, 16 Jun 2008 18:14:40 +0200

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 16 Jun 2008 - 19:39:56 GMT

Date: Mon, 16 Jun 2008 18:14:40 +0200

OK, it seems like I do not succeed in expressing what I do, or want to do.
So, I give you the example that bring me to this kind of analysis. I wrote
the paper "Chromosomal periodicity of evolutionary conserved gene pairs"
(which you can download at http://www.pnas.org/cgi/reprint/104/25/10559). In
figure 2, they have a histogram of distances between genes on a chromosome
and they make a discrete fourier transform analysis to exhibit a period of
117kb. They explain how they did in the first paragraph of "Distributions of
distances and positions and fourier transform" (last page). I thought that
this kind of analysis was made by spec.pgram with a histogram. But perhaps I
am wrong because I really do not understand what they mean by "the histogram
was tranformed into a continuous probability density by using a Gaussian
smoothing window and normalizing the total density over the entire genome to
1. A discrete Fourier transform of the data were computed from 0 to 1,000kb
by using a Tukey window to taper the end (ratio of 0.5 for tapered to
untapered length.".

I hope it explains better what I want to obtain from my distances.
Best regards,

Anthony

On Mon, Jun 16, 2008 at 5:25 PM, stephen sefick <ssefick_at_gmail.com> wrote:

> To get some sort of frequency which in your case seem to be cycles per

*> distance? Is a valid use of a fourier transform as long as it is a distance
**> that is measured in a way that would be analogous to a time series- In
**> other words if the distance proceeds from an origin in one direction-
**> geophysicists do this often with the realization of an earthquake picked up
**> by sensors that are a distance away from the origin of the epicenter, but
**> they are looking for coherencies in the signal from one place to the next in
**> the frequency domain seperated by distance- this is called beam forming-
**> They use the raw signal- by binning (making a histogram) the data you are
**> loosing the signal- you are looking at frequency of occurance of certain
**> values not for the underlying periodicities of the data (in time or
**> space). You are fitting cos and isin function to you data to see if there
**> is periodicity- the power is the integration of the convolution of this sin
**> and cosine function with your data- It seems to me meaningless to preform
**> this convolution agianst something that is not a signal (the histogram). If
**> you want to use a frequency domain technique you have to have a frequency to
**> investigate- a histogram does not have this- I is a frequency of occurance
**> by bin size which is NOT what you want (your would have cycles/binlength
**> that doesn't make any sense to me) to do this analysis on- You want a
**> signal- dissolved oxygen curve, sunspot record, etc. through time, or
**> distance as stated above- you are looking for the frequency of a waveform-
**> Anyway, I may be misunderstanding- supply some code and explain the data
**> otherwise this line of though- in my limited expertise- is a dead end, but
**> agian I still don't know what it is that you are, exactly, trying to do- and
**> what your dataset constits. I hope these ruminations help
**>
**> I recommend doing this analysis on the raw data- It doesn't matter that you
**> don't have the same amount of data points- as long as both sets of data have
**> circa ten times the length of (cycles/distance) what you want to detect- If
**> things in your case are spaced by one meter then the lowest cycle
**> perdistance that you can reliably detect if 0.5 meters, this is all
**> speculation because you don't have a problem with reproducible code, and we
**> have no idea what you are measuring or what your data looks like- without
**> this information there is no way that I can say one way or the other that
**> you approach (suggested non-histogram) would be right or wrong.
**>
**> Stephen
**>
**>
**> On Mon, Jun 16, 2008 at 9:33 AM, Anthony Mathelier <
**> anthony.mathelier_at_gmail.com> wrote:
**>
**>> Perhaps I'm applying spec.pgram wrong as you said. I will explain what I
**>> want, so you can tell me why I'm wrong and perhaps what I have to do to do
**>> it well.
**>> I have some points in a 1-D space and I want to know if they are spaced at
**>> a certain periodic distance. So, I computed all the distances between points
**>> in my space. Then, I would like to know if a certain distance (period), or
**>> multiples of a certain distance, is preferred to space my data. I made a
**>> histogram of the distances and apply the spec.pgram function to know the
**>> frequence (so the period) which is the most important to space the original
**>> data.
**>> But, when I have to sets of data (without necessarily the same number of
**>> observation in each set), I want to compare the importance of the period
**>> given by spec.pgram between the sets. Could I normalize the amplitude of the
**>> peaks given by spec.pgram?
**>> So, am I wrong to apply this methodology to exhibit a periodic distance
**>> between my data? If, true, what could you recommend me to do this?
**>> Thanks in advance for your answers.
**>> Best regards,
**>>
**>> Anthony
**>>
**>> On Tue, Jun 10, 2008 at 6:13 PM, stephen sefick <ssefick_at_gmail.com>
**>> wrote:
**>>
**>>> I from a first thought I would say that you are apply this wrong! The
**>>> fourier transform convolves a function (cos(x)+isin(x) (this may not be the
**>>> exact formula but I don't have my books near)) to the data and then
**>>> integrates over -1/2 to 1/2 takes the modulus and plots this- the
**>>> periodogram. The reason you preform a fourier transform is to look at
**>>> recurring frequencies in the data, which are in the time domain. The
**>>> fourier transform converts the time series into the frequency domain and
**>>> viola you have a peak into the hidden/recurring parts of your signal. From
**>>> your explaination your are applying this technique wrong- look at schumway,
**>>> MASS4, et al. books to get a handle on how this technique is used. If you
**>>> are to apply a time series analysis please use it on a time series. Maybe
**>>> your logic is not flawed but I don't see how a histogram with its associated
**>>> binning is a better candidate for time series analysis than the original
**>>> time series if at all.
**>>> good luck
**>>>
**>>> Stephen
**>>>
**>>> On Tue, Jun 10, 2008 at 8:49 AM, Matthieu Stigler <
**>>> Matthieu.Stigler_at_gmail.com> wrote:
**>>>
**>>>> Hello
**>>>>
**>>>> I don't know exactly what you want to do but:
**>>>>
**>>>> -why do you use in your example h$counts and not h? Furthermore helpl
**>>>> file says it should be a time series, why then rather not your time series?
**>>>>
**>>>> -usually na.action will make the "default" action, which you can see by
**>>>> getOptions("na.action")
**>>>>
**>>>> -here in this function it is provided in the function values na.action =
**>>>> na.fail so it will just remove the NA in the time series
**>>>>
**>>>> -if you want to study a function, I advise you to copy it entirely,
**>>>> rename it and then just insert print(curiousobject...) in the function, this
**>>>> will allow you to let the function run and grasp the interessting objects,
**>>>> like:
**>>>>
**>>>> study<-function (x, spans = NULL, kernel = NULL, taper = 0.1, pad = 0,
**>>>> fast = TRUE, demean = FALSE, detrend = TRUE, plot = TRUE,
**>>>> na.action = na.fail, ...)
**>>>> {
**>>>> series <- deparse(substitute(x))
**>>>> x <- na.action(as.ts(x))
**>>>> print(x)
**>>>> xfreq <- frequency(x)
**>>>> ...}
**>>>> study(sunspots)
**>>>>
**>>>> -when you provide an example, instead of giving an external reference
**>>>> for the data, try to search a convenient internal data (accessed by data()
**>>>> ), so one will be able to reproduce your problems. Here you could use
**>>>> sunspots
**>>>>
**>>>> -to obtain the commented code... I don't know it...
**>>>>
**>>>> -good luck
**>>>>
**>>>> Matthieu
**>>>>
**>>>>
**>>>>
**>>>>
**>>>>
**>>>> Hi everyone,
**>>>>>
**>>>>> first of all, I would like to say that I am a newbie in R, so I
**>>>>> apologize in
**>>>>> advance if my questions seem to be too easy for you.
**>>>>>
**>>>>> Well, I'm looking for periodicity in histograms. I have histograms of
**>>>>> certain phenomenons and I'm asking whether a periodicity exists in
**>>>>> these
**>>>>> data. So, I make a periodogram with the function spec.pgram. For
**>>>>> instance,
**>>>>> if I have a histogram h, I call spec.pgram by spec.pgram (h, log="no",
**>>>>> taper=0.5). So, I have some peaks that appear and I would like to
**>>>>> interpret
**>>>>> them but I do not know how they are computed and so what a peak with a
**>>>>> value
**>>>>> of 10000 represents in comparison with a peak of value 600 with another
**>>>>> histogram.
**>>>>> I looked at the source code of the function spec.pgram to better
**>>>>> understand
**>>>>> what is behind. But, when I apply the source code line by line, I've
**>>>>> got a
**>>>>> problem. For instance, I make:
**>>>>>
**>>>>>
**>>>>>> >data = scan ("file.txt")
**>>>>>> >h = hist (data, breaks=max(data)/5000)
**>>>>>>
**>>>>>>
**>>>>> #then I apply the first two lines of the spec.pgram function
**>>>>>
**>>>>>
**>>>>>> >series <- deparse(substitute(h$counts))
**>>>>>> >x <- na.action(as.ts(h$counts))
**>>>>>> >x
**>>>>>>
**>>>>>>
**>>>>> NULL
**>>>>> I do not understand why when I apply the first two lines of the
**>>>>> function I
**>>>>> have x which is equal to NULL (which make a mistake in the following
**>>>>> lines
**>>>>> of the code) but if I apply the function directly with h$counts it
**>>>>> gives me
**>>>>> a result.
**>>>>> So, if someone can explain to me what is the problem and/or how
**>>>>> spec.pgram
**>>>>> exactly computes the periodogram and how to interpret it with my data,
**>>>>> I
**>>>>> would be so grateful.
**>>>>> And subsidiary questions:
**>>>>> - Is it possible to have the commented source code of the function?
**>>>>> - I do not understand what is the function na.action in the second line
**>>>>> of
**>>>>> spec.pgram, so if you can explain it to me.
**>>>>>
**>>>>> Thanks in advance for your answers.
**>>>>> Best regards,
**>>>>>
**>>>>> Anthony Mathelier
**>>>>>
**>>>>> [[alternative HTML version deleted]]
**>>>>>
**>>>>
**>>>> ______________________________________________
**>>>> R-help_at_r-project.org mailing list
**>>>> https://stat.ethz.ch/mailman/listinfo/r-help
**>>>> PLEASE do read the posting guide
**>>>> http://www.R-project.org/posting-guide.html
**>>>> and provide commented, minimal, self-contained, reproducible code.
**>>>>
**>>>
**>>>
**>>>
**>>> --
**>>> Let's not spend our time and resources thinking about things that are so
**>>> little or so large that all they really do for us is puff us up and make us
**>>> feel like gods. We are mammals, and have not exhausted the annoying little
**>>> problems of being mammals.
**>>>
**>>> -K. Mullis
**>>
**>>
**>>
**>
**>
**> --
**> Let's not spend our time and resources thinking about things that are so
**> little or so large that all they really do for us is puff us up and make us
**> feel like gods. We are mammals, and have not exhausted the annoying little
**> problems of being mammals.
**>
**> -K. Mullis
**>
*

[[alternative HTML version deleted]]

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Mon 16 Jun 2008 - 19:39:56 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Mon 16 Jun 2008 - 20:32:15 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*