#1
Old 07-03-2011, 06:44 PM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Is there an example of an *exact* normal distribution in nature?

The normal (or Gauss) distribution is a terrific approximation in a wide variety of settings. And it's backed by the central limit theorem, which I admittedly do not understand.

But is it exactly observed in nature, for samples with an arbitrarily large number of observations? By exact, I mean that large samples consistently fail to reject the hypothesis of normality.

In finance, rates of return tend to be leptokurtic: they have long tails. There's even some (negative?) skew. That doesn't stop analysts from using the Gaussian distribution as a rough-and-ready model (sometimes to the chagrin of their investors, but that's another matter).

What about other settings? Biology? Demography? Chemistry? Atmospheric science? Engineering?

I'm guessing that a reliably perfect Gaussian process is empirically rare, since there's typically some extreme event that kicks in periodically. A large and important part of reality may consist of sums of very small errors, but I suspect that another aspect involves sporadic big honking errors. For every few thousand leaky faucets, we get a burst water main.


While I'm at it, why do the Shapiro-Wilk and Shapiro-Francia tests for normality cap out with sample sizes of 2000 and 5000?
------------------

I posed this question at the now-defunct website teemings.org in 2008. The short answer (by the estimable ultrafilter) was, "no, nothing's going to be exactly normally distributed. On the other hand, particularly in cases where the central limit theorem applies, the difference between what you see and what you'd expect to see is negligible."

A tighter version of my question follows:


1. Has anybody stumbled upon an unsimulated and naturally occurring dataset with, say, more than 100 observations that looks exactly like a textbook normal curve?

2. Does any natural process consistently spin off Gaussian distributions, with p values consistent with normality virtually all the time? (Presumably this would be produced by something other than the central limit theorem (CLT) alone.) Ultrafilter says, "No", if I understand him correctly.

3. Does any natural process consistently spin off large datasets (thousands of observations each) where normality is *not* rejected at least 95% of the time (at the 5% significance level)? If the CLT is the only thing in play, there should be natural processes like this. But I suspect that black swans are pretty much ubiquitous.

Then again, it should be possible to isolate a process that hews to a conventional Gaussian.


Bonus question: Do any of the tests for normality evaluate moments higher than 4?
----
As always, Wikipedia is helpful: http://en.wikipedia.org/wiki/Normal_...ion#Occurrence, but I'm not sure whether I should trust its claims of exactness.

Finally, if anybody has an empirical dataset whose underlying process is plausibly Gauss, whose sample is huge, and which is in a reasonably accessible computer format, feel free to link to it. I'll run some tests at some point using the statistical package Stata. Datasets are admittedly "all around the internet", but extracting lots of fairly large (2000+) samples typically requires some work.

Last edited by Measure for Measure; 07-03-2011 at 06:45 PM.
#2
Old 07-03-2011, 07:18 PM
Guest
Join Date: May 2001
Location: In another castle
Posts: 18,988
Your question is a little goofy. If I generate 20 random samples from a standard normal distribution, I would expect the hypothesis of normality to be rejected for one of them at alpha = .05. That doesn't mean that the points it produces aren't normally distributed, just that strange things can happen on a small sample.

In fact, I just ran this experiment, and the seventh sample didn't look normal to the robust Jarque-Bera test (p = .014).
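A minimal sketch of that experiment (Python/scipy here purely for illustration; scipy's D'Agostino K-squared test stands in for the robust Jarque-Bera, and the per-sample size of 100 is an arbitrary choice):
Code:
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha = 0.05
rejections = 0
for i in range(20):
    sample = rng.normal(size=100)          # 100 draws from a true N(0, 1)
    p = stats.normaltest(sample).pvalue    # joint skewness/kurtosis test
    if p < alpha:
        rejections += 1
        print(f"sample {i}: p = {p:.3f} (rejected despite a genuinely normal source)")
print(f"{rejections} of 20 truly normal samples rejected at alpha = {alpha}")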
#3
Old 07-03-2011, 07:56 PM
Guest
Join Date: Aug 2010
Posts: 1,901
Exact normal distributions can be found in physics. They are not rare. Wikipedia has some examples.
#4
Old 07-03-2011, 08:26 PM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Quote:
Originally Posted by ultrafilter View Post
Your question is a little goofy. If I generate 20 random samples from a standard normal distribution, I would expect the hypothesis of normality to be rejected for one of them at alpha = .05. That doesn't mean that the points it produces aren't normally distributed, just that strange things can happen on a small sample.

In fact, I just ran this experiment, and the seventh sample didn't look normal to the robust Jarque-Bera test (p = .014).
Question #2 is a little goofy (provided the CLT is the only driver of normality, a fair assumption). Question #3 covers your concern, though. (You addressed question #1 in 2008: to wit, a Gauss random number generator will typically produce distributions that don't look perfectly normal but are statistically indistinguishable from a perfect normal distribution. So Q1 is a hunt for an anomaly of sorts.)

iamnotbatman: Ok. It's just that Wikipedia's image of "the ground state of a quantum harmonic oscillator" doesn't look remotely Gauss, as it lacks a pair of inflection points. And velocities of ideal gases are not empirical observations (are they?).

Last edited by Measure for Measure; 07-03-2011 at 08:27 PM.
#5
Old 07-03-2011, 08:45 PM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Sorry, my question is actually a lotta goofy. "Gauss is an approximation MfM! Who cares what happens at the 12th moment?"

I reply that models based upon rough approximations made by highly paid analysts got us into trouble during the dawn of this Little Depression. And in a more general context, the late and great Peter Kennedy wrote:
Quote:
Originally Posted by Peter Kennedy
It is extremely convenient to assume that errors are distributed normally, but there exists little justification for this assumption... Poincare is said to have claimed that "everyone believes in the [Gaussian] law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an empirical fact."
#6
Old 07-03-2011, 11:21 PM
Charter Member
Moderator
Join Date: Jan 2000
Location: The Land of Cleves
Posts: 73,120
Wiki's picture for the quantum harmonic oscillator is a bit confusing. You see how it has those pixellated-looking horizontal bands across it? Each of those is one energy level. So the bottom-most band represents the ground state (with intensity as a function of position instead of height as a function of position, like on a graph). The ground state of a harmonic oscillator is, in fact, a perfect Gaussian.

But that doesn't really answer the question, either-- It just moves it to the question of whether there are really any perfect harmonic oscillators in nature. The harmonic oscillator, like the Gaussian itself, is something that doesn't actually show up exactly, very often, but which is a very good and simple approximation for a lot of things that do show up. The best I can come up with is a charged particle in a magnetic field with no other influences on it, but that "no other influences" is a killer. Especially since the particle would also have to have no magnetic dipole moment.

But I'm a bit unclear about what the OP means by a "natural" process. For instance, would rolling a whole bunch of dice and adding them up count as "natural"? Because it's really easy to get an arbitrarily-good Gaussian that way.
#7
Old 07-04-2011, 02:48 AM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Quote:
Originally Posted by Chronos View Post
But I'm a bit unclear about what the OP means by a "natural" process. For instance, would rolling a whole bunch of dice and adding them up count as "natural"? Because it's really easy to get an arbitrarily-good Gaussian that way.
Nice point. Roll a reasonably fair die and you have a uniform random number generator. Add a bunch of them up, and the CLT kicks in. So that's one example.
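A hedged sketch of the dice example (Python rather than the thread's Stata, and the counts are arbitrary): sum enough fair dice and the sample skewness sits near 0, the kurtosis near 3, and a normality test usually fails to reject.
Code:
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
dice_per_trial = 50
trials = 10_000
sums = rng.integers(1, 7, size=(trials, dice_per_trial)).sum(axis=1)  # 50 fair dice per trial

print("skewness:", stats.skew(sums))                        # near 0
print("kurtosis:", stats.kurtosis(sums, fisher=False))      # near 3 (exact excess is about -0.025)
print("normality p-value:", stats.normaltest(sums).pvalue)  # typically well above 0.05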

But the spirit of the OP seeks:
a) a roughly one-million-observation empirical Gaussian dataset (give or take an order of magnitude) on the internet that I can evaluate in Stata, and/or
b) verification of the hypothesis that in practice no natural process is wholly governed by the CLT: black swans are ubiquitous for example. And that's just the 4th moment.

Whether hypothesis b) is falsified depends upon whether you consider rolled dice a natural process. Are there even better examples?
Quote:
It just moves it to the question of whether there are really any perfect harmonic oscillators in nature.
My knowledge of physics is pretty sparse so I can't tell which phenomena are a step or two removed from direct observation. I am not contesting that a regression of a near-Gaussian process could strip away the non-Gaussian bits, leaving behind systematically Gaussian errors. That's what I meant by Gaussian processes, whose existence I accept. Solid examples of such processes would be welcome as well. But I'm really asking about pure Gaussian empirical data.

Anyway, Wikipedia gives examples of exact Gaussian processes. I can't evaluate them: I can't tell which are theoretical artifacts and which correspond to actual datasets. Would anyone like to take a crack? http://en.wikipedia.org/wiki/Normal_...ion#Occurrence
#8
Old 07-04-2011, 05:09 AM
Guest
Join Date: Aug 2010
Posts: 1,901
Wikipedia gives three examples of exact normality. In truth, none of them are exact, although in practice I believe that the exactness in all three cases could have been confirmed (and I'm quite sure it has, though I'm too lazy to find a cite) over thousands of trials to many decimal places.

The first example is the velocities of the molecules in an ideal gas. One reason this isn't truly exact (in the sense of wanting a true continuous distribution) is that the number of molecules is finite. But in practice, for a macroscopic ensemble like a liter of gas, any departure from normality would be impossible to detect with 21st-century technology.
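For reference, the standard kinetic-theory claim here is about each Cartesian velocity component (the speed itself follows the non-normal Maxwell speed distribution): under the ideal-gas assumptions,

$$ f(v_x) \;=\; \sqrt{\frac{m}{2\pi k_B T}}\; \exp\!\left(-\frac{m v_x^2}{2 k_B T}\right), \qquad v_x \sim \mathcal{N}\!\left(0,\; \frac{k_B T}{m}\right). $$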

The second example is the ground state of the quantum harmonic oscillator. As Chronos pointed out, there may not be any truly perfect quantum harmonic oscillators in nature. But any smooth, continuous potential with a local minimum has a ground state whose position distribution is, to a very good approximation, normal. Nature provides such minima in abundance (in magnetic or electric field configurations, vibrations of diatomic molecules...). The problem is that if you include the finite size of the universe, or the effect of tiny perturbations due to interactions with the quantum vacuum, or even the tiny contribution of gravitational waves from the stars in the sky, you are bound to imperceptibly distort your potentials so that they are not exactly Gaussian.
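To make the ground-state claim concrete, the textbook wavefunction gives a position distribution that is exactly normal with variance ħ/(2mω):

$$ \psi_0(x) = \left(\frac{m\omega}{\pi\hbar}\right)^{1/4} e^{-m\omega x^2/2\hbar}, \qquad |\psi_0(x)|^2 = \sqrt{\frac{m\omega}{\pi\hbar}}\, e^{-m\omega x^2/\hbar}, \quad \text{i.e. } x \sim \mathcal{N}\!\left(0,\; \frac{\hbar}{2m\omega}\right). $$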

The third example is the position of a particle in quantum mechanics. If you measure the position exactly, then wait some time and measure again, the distribution of measured positions is exactly Gaussian. Of course, this assumes an idealized particle and no other potentials in the vicinity, which, as I mentioned above, is never going to be achieved perfectly in practice. Similarly, you are not going to be able to perfectly measure the position of the particle in the first place.
#9
Old 07-04-2011, 08:12 AM
Guest
Join Date: May 2005
Posts: 5,889
Radioactive decay.
#10
Old 07-04-2011, 08:36 AM
Guest
Join Date: Aug 2010
Posts: 1,901
Quote:
Originally Posted by Kevbo View Post
Radioactive decay.
That's not a normal distribution.
#11
Old 07-04-2011, 12:59 PM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
iamnotbatman:
I have difficulty working through these science examples as I lack a background in either college physics or chemistry. Apologies for my inaccuracy and imprecision: I'm really winging it here.

Ok. In nature there are no ideal gases, no perfect harmonic oscillators and no perfect measures of any given particle's position. These models approximate the world quite well though. This isn't exactly what I'm getting at.

Is there a dataset with ~10K observations of the velocities of the molecules of a liter of nitrogen? [1] If so, I'm not asking whether the distribution produced is perfectly Gauss. I'm asking whether it is consistently indistinguishable from Gauss at the 95% level of confidence. Sufficiently small perturbations only matter with sufficiently large sample sizes. (My call for huge datasets was so that I could take a set of n=2000 subsamples and work out the share of them that reject normality at 5%.)
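For what it's worth, here is a sketch of that subsampling procedure (Python/scipy rather than Stata, purely for illustration; the block size of 2000 and the joint skewness/kurtosis test are the assumptions):
Code:
import numpy as np
from scipy import stats

def rejection_share(data, block_size=2000, alpha=0.05):
    """Share of consecutive blocks that reject normality at the given level."""
    data = np.asarray(data)
    n_blocks = len(data) // block_size
    if n_blocks == 0:
        return float("nan")
    rejections = 0
    for b in range(n_blocks):
        block = data[b * block_size:(b + 1) * block_size]
        if stats.normaltest(block).pvalue < alpha:   # joint skewness/kurtosis test
            rejections += 1
    return rejections / n_blocks

# Sanity check on truly normal data: the share should hover around alpha (5%).
rng = np.random.default_rng(5)
print(rejection_share(rng.normal(size=1_000_000)))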

In practice though, I might wonder whether measurement errors make a difference. Ok, now I'm wandering into the goofy again.



[1] Seriously, what are we measuring here? Is it the average velocity of molecules in a liter of gas? I'm guessing that the data would be measuring temperature and pressure, which is something different. The Gauss would be used to transform empirical observation into a postulated velocity: it would address some processes and set others aside. This is something different than "A Gaussian dataset". I'm calling it "A Gaussian process". So what I'm gathering from this thread is that "There are lots of examples of the exact Gauss in physics", though I don't have a clear idea of any particular Gauss dataset in physics. Could we specify the experimental setup in greater detail?
#12
Old 07-04-2011, 05:12 PM
Guest
Join Date: Aug 2010
Posts: 1,901
Quote:
Originally Posted by Measure for Measure View Post
iamnotbatman:
I have difficulty working through these science examples as I lack a background in either college physics or chemistry. Apologies for my inaccuracy and imprecision: I'm really winging it here.

Ok. In nature there are no ideal gases, no perfect harmonic oscillators and no perfect measures of any given particle's position. These models approximate the world quite well though. This isn't exactly what I'm getting at.

Is there a dataset with ~10K observations of the velocities of the molecules of a liter of nitrogen? [1] If so, I'm not asking whether the distribution produced is perfectly Gauss. I'm asking whether it is consistently indistinguishable from Gauss at the 95% level of confidence. Sufficiently small perturbations only matter with sufficiently large sample sizes. (My call for huge datasets was so that I could take a set of n=2000 subsamples and work out the share of them that reject normality at 5%.)

In practice though, I might wonder whether measurement errors make a difference. Ok, now I'm wandering into the goofy again.



[1] Seriously, what are we measuring here? Is it the average velocity of molecules in a liter of gas? I'm guessing that the data would be measuring temperature and pressure, which is something different. The Gauss would be used to transform empirical observation into a postulated velocity: it would address some processes and set others aside. This is something different than "A Gaussian dataset". I'm calling it "A Gaussian process". So what I'm gathering from this thread is that "There are lots of examples of the exact Gauss in physics", though I don't have a clear idea of any particular Gauss dataset in physics. Could we specify the experimental setup in greater detail?
It might help to know what you're after. For example, why can't you produce your own dataset using a software package such as Root, R, or Mathematica (or C++)? In terms of the physical world, surely we have many indirect ways of testing normality. But you seem to be after a direct measurement of normality, which, from my perspective, is odd, or without motivation that makes sense to me. For example, we know that the normality of the velocity distribution is true because of the macroscopic properties of the gas that we can measure with great precision. Usually we would not resort to looking at the literal distribution of individual velocities. Unfortunately, I don't know of a paper published on your question in which the authors include their actual dataset for you to peruse.

For the gas velocity distribution (we would be measuring the distribution of velocities of the gas molecules), an example of an experimental setup would be a chamber of gas at some temperature and pressure, with a very tiny opening through which the gas molecules can escape more or less one at a time at a very low rate. You then have a collimator that makes sure the 'beam' of molecules is straight, followed by a detector that records the velocities (perhaps ensuring the particles are ionized and measuring how much they are deflected in a given electric field). This was in fact done, I think, in 1926 by Stern, in a series of experiments, perhaps with input from Gerlach and Einstein, which showed more or less directly that the velocities are normally distributed. I do not think they published their dataset, though.
#13
Old 07-04-2011, 10:48 PM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
iamnotbatman:
Thanks for spelling out the ideal gas example.

I concede that a random number generator can produce Gauss random variables; I've even fooled around with that a little. My original motivation was linked to the tendency in the social sciences to assume Gauss errors without credible or even explicit justification. Now that's not necessarily a bad thing: it depends upon the problem in question. And in physics I now understand that there are very good reasons for assuming that certain types of distributions are indeed Gauss. But it is bogus and dangerous to blithely assume Gauss for financial market returns in the face of strong evidence to the contrary, at least without applying robustness checks and general due diligence. And yet that is what was done routinely some years back: this sort of mismanagement formed one of the necessary conditions for the financial crisis and subsequent Little Depression.

Construct the Casual Fact
So much for high-level motivation. For this thread I'm working on a more general level. I'd like to say something along the lines of, "The Gauss distribution doesn't exist empirically in nature: what we have are distribution mixtures." But I don't think that's quite correct. I'm trying to work out the proper rough characterization about the prevalence of observed exact Gaussians.

Again, in many applications this doesn't matter. If you're conducting an hypothesis test and the underlying distribution is even Laplace, applying the student's t-statistic probably won't steer you that far wrong. Type I and II errors will be less than optimal, but arguably acceptable. Or so I speculate: I haven't read the relevant Monte Carlo study. But if you are forecasting central tendency and dispersion, that's another matter entirely.
#14
Old 07-04-2011, 10:55 PM
Guest
Join Date: May 2001
Location: In another castle
Posts: 18,988
Quote:
Originally Posted by Measure for Measure View Post
But it is bogus and dangerous to blithely assume Gauss for financial market returns in the face of strong evidence to the contrary, at least without applying robustness checks and general due diligence. And yet that is what was done routinely some years back: this sort of mismanagement formed one of the necessary conditions for the financial crisis and subsequent Little Depression.
That's so far down the list of causes that it barely even rates a mention.
#15
Old 07-04-2011, 11:00 PM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Quote:
Originally Posted by ultrafilter View Post
That's so far down the list of causes that it barely even rates a mention.
I disagree, but that's a matter for another thread. The failure of due diligence and ignoring once-every-eight-year events was a big part of the story. Admittedly over-application of the Gaussian distribution is only a proximate cause: it's not like the financial engineers lacked sufficient training or understanding.
------

Note to future readers (if any): interesting discussion of the Gaussian distribution occurs here: http://boards.academicpursuits.us/sdmb/...d.php?t=614864

Last edited by Measure for Measure; 07-04-2011 at 11:02 PM.
#16
Old 07-05-2011, 04:44 AM
Guest
Join Date: Aug 2010
Posts: 1,901
Quote:
Originally Posted by Measure for Measure View Post
iamnotbatman:
Thanks for spelling out the ideal gas example.

I concede that a random number generator can produce Gauss random variables; I've even fooled around with that a little. My original motivation was linked to the tendency in the social sciences to assume Gauss errors without credible or even explicit justification. Now that's not necessarily a bad thing: it depends upon the problem in question. And in physics I now understand that there are very good reasons for assuming that certain types of distributions are indeed Gauss. But it is bogus and dangerous to blithely assume Gauss for financial market returns in the face of strong evidence to the contrary, at least without applying robustness checks and general due diligence. And yet that is what was done routinely some years back: this sort of mismanagement formed one of the necessary conditions for the financial crisis and subsequent Little Depression.

Construct the Casual Fact
So much for high-level motivation. For this thread I'm working on a more general level. I'd like to say something along the lines of, "The Gauss distribution doesn't exist empirically in nature: what we have are distribution mixtures." But I don't think that's quite correct. I'm trying to work out the proper rough characterization about the prevalence of observed exact Gaussians.

Again, in many applications this doesn't matter. If you're conducting an hypothesis test and the underlying distribution is even Laplace, applying the student's t-statistic probably won't steer you that far wrong. Type I and II errors will be less than optimal, but arguably acceptable. Or so I speculate: I haven't read the relevant Monte Carlo study. But if you are forecasting central tendency and dispersion, that's another matter entirely.
I read the Taleb book a couple years back and found it very compelling, so I know where you are coming from. But it is important to separate out two separate things, because generally what causes the 'fat tail' is not the underlying thing you are trying to measure, for which the theoretical gaussian distribution is well understood and thoroughly correct, but some external factor. In physics we would call this external factor the 'systematic error' in the measurement, and it is considered of great importance to account for it correctly. Consider an example:

If you toss a coin N times, the number of times the coin shows 'heads' (N_h) is a random variable that is binomially distributed, but for large N it is approximately normally distributed. (Btw, if you are a masochist you can use this method to produce your own dataset, but you should really just use a software method.) Now, one source of systematic error is whether or not the coin is 'fair'. But let's ignore that, because even if it wasn't fair, the distribution shouldn't have a fat tail.

Now suppose you needed a coin tossed a trillion times, so you farmed out the work to some company that used a robot to flip coins and employed image-recognition software to determine which side of the coin landed up. The question is, do you trust the company to do this without error? Perhaps the robot can flip with such precision that if it produces the same 'flipping' force the coin will always land heads-up; perhaps the company's software was never tested beyond a few million flips, and since some of their variables were 32-bit and reset after 2^32 flips, after a few billion flips the robot gets into a pattern where it keeps throwing heads over and over again. If the company was incompetent (which is extremely common in the real world), they may not notice the bug, and hand you the dataset claiming that the systematic error is zero. But in reality you would get a very fat tail -- not because the underlying process was non-gaussian, but because of external factors which were not correctly accounted for.

I think a common problem in the financial world may be an arrogance regarding the evaluation of systematic errors combined with a lot of top-down pressure and under-regulated competitive pressure (tragedy of the commons, etc.) -- not honestly accounting for systematic error. For example, if you have companies competing to build coin-flipping machines in the marketplace, they are going to make competitive shortcuts and unrealistic promises regarding low systematic errors. Cheaper and flimsier coin-flipping machines may be built and trusted because it is necessary to compete against others making the same mistakes. Needless to say, in such a market I would not 'bet on' normally distributed data -- humans make mistakes, and it is unrealistic to expect that the probability of making such a mistake is as small as a normal distribution says it can be.
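A hedged simulation sketch of that thought experiment (Python; the toss count and the 1-in-500 'sticky counter' failure rate are made-up numbers): the clean binomial counts are essentially normal, and the unmodeled fault, not the coin, supplies the fat tail.
Code:
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N = 10_000              # tosses per experiment
experiments = 5_000

heads = rng.binomial(N, 0.5, size=experiments).astype(float)
print("clean kurtosis:", stats.kurtosis(heads, fisher=False))   # ~3, as for a normal

# Hypothetical systematic error: in roughly 1 of every 500 experiments the
# counter sticks and reports far too many heads.
faulty = rng.random(experiments) < 1 / 500
heads_reported = np.where(faulty, 0.9 * N, heads)
print("contaminated kurtosis:", stats.kurtosis(heads_reported, fisher=False))  # fat-tailed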

I think your focus on normal distributions could be expanded to *any* distribution that claims probabilities that can be vanishingly small (disregarding boundary conditions). In the real world your systematic error is generally large enough that when you add it to the statistical prediction, you always expect some fatness in your tails, to some extent. Most people know this intuitively. For example, we know that in quantum mechanics it is possible for your stapler to tunnel through your desk and onto the floor. The probability is governed by a tail very much like that of the normal curve, and is unimaginably small. But if you are doing a home experiment, you have to account for the possibility that someone took your stapler while you weren't looking, and someone else dropped a stapler near your desk and it ended up below yours. That is in fact analogous to one of the systematic errors that must be controlled for when doing some of these actual quantum mechanical experiments.
#17
Old 07-05-2011, 01:04 PM
Charter Member
Moderator
Join Date: Jan 2000
Location: The Land of Cleves
Posts: 73,120
Strictly speaking, an error which causes fat tails on both sides wouldn't be a systematic error, since systematic errors by definition will bias your data in one direction.
#18
Old 07-05-2011, 02:59 PM
Guest
Join Date: Aug 2010
Posts: 1,901
Quote:
Originally Posted by Chronos View Post
Strictly speaking, an error which causes fat tails on both sides wouldn't be a systematic error, since systematic errors by definition will bias your data in one direction.
Surely it would be a systematic error biased in one direction -- the direction away from the mean! The wikipedia article has examples in the first section of this type of systematic error.
#19
Old 07-06-2011, 12:16 AM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
iamnotbatman :
As I said, my original curiosity arose from the ubiquitous assumption of Gaussian errors. The financial crisis added some additional motivation though. Most analysts are aware of these issues, but I at least don't have a solid grasp of them.

I like the systematic error concept. I'm inclined to abandon my "No observed Gauss anywhere" notion: there are solid reasons to believe in Gaussian processes in certain physics contexts. Let me propose another conceptual handle: "In practice, most dataset errors reflect some sort of distribution mixture. Following the central limit theorem, the sum of lots of equally weighted distributions will be Gauss. But in practice, Gauss will be an approximation, since the weights won't be equal.[1] " That should encompass the systematic error concept to some extent. One of the problems in the social sciences is that there typically aren't solid theoretic reasons for believing in any particular exact error distribution (even if there are plausible arguments for Gaussian approximations or whatever). Furthermore your dependent variable typically reflects a lot of unmeasurables and even unponderables.
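A quick illustrative sketch of the mixture idea (Python; the 3% weight and the 5x scale are arbitrary): a mildly unequal mixture of normals is symmetric and looks bell-shaped in the middle, yet it carries large excess kurtosis and fails normality tests.
Code:
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000
wide = rng.random(n) < 0.03                       # 3% of errors come from a wider regime
errors = np.where(wide, rng.normal(0, 5, n), rng.normal(0, 1, n))

print("skewness:", stats.skew(errors))                        # ~0 (the mixture is symmetric)
print("kurtosis:", stats.kurtosis(errors, fisher=False))      # far above 3 (about 20 in expectation)
print("normality p-value:", stats.normaltest(errors).pvalue)  # effectively 0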
Quote:
Originally Posted by iamnotbatman
I think a common problem in the financial world may be an arrogance regarding the evaluation of systematic errors combined with a lot of top-down pressure and under-regulated competitive pressure (tragedy of the commons, etc) -- not honestly accounting for systematic error.
I agree, but fear that financial returns are somewhat more complicated than that. Still, "Gauss plus a swan every 5-10 years" is probably a lot better than simple Gauss.
Quote:
Originally Posted by Chronos
Strictly speaking, an error which causes fat tails on both sides wouldn't be a systematic error, since systematic errors by definition will bias your data in one direction.
FWIW, financial returns typically have a negative skew as well.

I downloaded a dataset of 15,000 observations from yahoo. It consists of daily percentage price changes of the S&P Composite, a weighted sum of large capitalization stocks.[2] I'll compare it to Gauss in an upcoming post.


[1] (Whether the sample size is sufficient to distinguish your empirical distribution from pure Gauss is a separate matter.)

[2] It's the S&P 500, except there were fewer companies in the index during the 1950s.

Last edited by Measure for Measure; 07-06-2011 at 12:18 AM.
#20
Old 07-07-2011, 12:42 AM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
So here's what the distribution of the S&P Composite looks like. As a control, I specified a normally distributed random variable with the same mean and standard deviation:

Code:
    Variable |     Obs      Mean    Std. Dev.       Min        Max
-------------+------------------------------------------------------
dailypret500 |   15474  .0003306      .00967   -.204669      .1158
     normrnd |   15475  .0002605    .0097017   -.036254   .0388416
The mean daily returns of the two series differed, even with a sample size exceeding 15,000, and the standard deviations differed as well. Daily price returns ranged from -20.5% to 11.6%; the Gauss random variable had a range of only -3.6% to +3.9%.

Now consider kurtosis and skew:
Code:
 DailyPret500
-------------------------------------------------------------
 Percentiles Smallest
 1% -.025709 -.204669
 5% -.014333 -.09035
10% -.009873 -.089295 Obs 15474
25% -.004123 -.088068 Sum of Wgt. 15474

50% .0004635 Mean .0003306
 Largest Std. Dev. .00967
75% .004963 .070758
90% .010133 .090994 Variance .0000935
95% .014507 .10789 Skewness -.6616936
99% .025685 .1158 Kurtosis 25.22477

 normrnd
-------------------------------------------------------------
 Percentiles Smallest
 1% -.0224537 -.036254
 5% -.0157981 -.0340847
10% -.0121544 -.0337052 Obs 15475
25% -.0062004 -.03304 Sum of Wgt. 15475

50% .0002771 Mean .0002605
 Largest Std. Dev. .0097017
75% .0068832 .0339581
90% .0126438 .0363308 Variance .0000941
95% .016328 .0382705 Skewness -.0217871
99% .0227815 .0388416 Kurtosis 3.00358
The SP500 had a kurtosis of 25.2. In contrast, Gauss random variables have a kurtosis of 3, while Laplace random variables have a kurtosis of 6. The SP500 also has a negative skew. Yes, the kurtosis and skew are significantly different from Gauss:
Code:
                    Skewness/Kurtosis tests for Normality
                                                 ------- joint ------
    Variable | Pr(Skewness)   Pr(Kurtosis)   adj chi2(2)    Prob>chi2
-------------+-------------------------------------------------------
dailypret500 |      0.000          0.000             .            .
     normrnd |      0.268          0.904          1.24       0.5379
Here's a histogram of the SP500:
http://wm55.inbox.com/thumbs/44_130b...5_oP.png.thumb

It looks a little like the Burj Dubai. Here it is with an overlaid perfect normal distribution:
http://wm55.inbox.com/thumbs/45_130b...c_oP.png.thumb

The Burj 500 is pointier, has longer tails, and is somewhat skewed.

Are outliers driving this effect? Let's see what happens if we remove the 30 most negative and the 30 most positive returns. That's about one day per year on average, so we are removing swans of all colors.
Code:
 DailyPret500
-------------------------------------------------------------
 Percentiles Smallest
 1% -.024287 -.04356
 5% -.014088 -.043181
10% -.009789 -.04279 Obs 15414
25% -.004107 -.042532 Sum of Wgt. 15414

50% .0004635 Mean .0003492
 Largest Std. Dev. .0087627
75% .004948 .040826
90% .010061 .041729 Variance .0000768
95% .014279 .041867 Skewness -.0402395
99% .024358 .04241 Kurtosis 5.482383

 normrnd
-------------------------------------------------------------
 Percentiles Smallest
 1% -.0224537 -.036254
 5% -.015802 -.0340847
10% -.0121599 -.0337052 Obs 15414
25% -.0062022 -.03304 Sum of Wgt. 15414

50% .0002828 Mean .0002593
 Largest Std. Dev. .0097034
75% .0068737 .0339581
90% .0126438 .0363308 Variance .0000942
95% .0163285 .0382705 Skewness -.0216746
99% .0227815 .0388416 Kurtosis 3.004975

                    Skewness/Kurtosis tests for Normality
                                                 ------- joint ------
    Variable | Pr(Skewness)   Pr(Kurtosis)   adj chi2(2)    Prob>chi2
-------------+-------------------------------------------------------
dailypret500 |      0.041          0.000             .       0.0000
     normrnd |      0.272          0.876          1.23       0.5400
Well, it's improved. Kurtosis is down to 5.5, which isn't Gauss but is at least close to Laplace. Both skew and kurtosis remain significantly different from Gauss at the 5% level.

In short, Gauss is a pretty rough approximation for financial returns and while black swans are a big part of the story, they are not the only part. Note that I picked a period of relative economic stability. 1890-1950 was far more tumultuous. For that matter 1830-1890 wasn't exactly smooth sailing either.
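For readers without Stata, a rough Python equivalent of the pipeline above (the CSV name and column labels are placeholders; scipy's joint skewness/kurtosis test stands in for sktest):
Code:
import numpy as np
import pandas as pd
from scipy import stats

# Placeholder file: daily closing prices with columns "Date" and "Close".
prices = pd.read_csv("sp_composite_daily.csv", parse_dates=["Date"])
returns = prices["Close"].pct_change().dropna().to_numpy()

def describe(x, label):
    print(label, "n =", len(x),
          "mean =", x.mean(), "sd =", x.std(ddof=1),
          "skew =", stats.skew(x),
          "kurtosis =", stats.kurtosis(x, fisher=False),
          "normality p =", stats.normaltest(x).pvalue)

describe(returns, "all returns:")

# Drop the 30 most negative and 30 most positive days ("swans of all colors").
trimmed = np.sort(returns)[30:-30]
describe(trimmed, "trimmed returns:")

# Matched normal control with the same mean and standard deviation.
rng = np.random.default_rng(2)
control = rng.normal(returns.mean(), returns.std(ddof=1), size=len(returns))
describe(control, "normal control:")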
#21
Old 07-07-2011, 01:01 AM
Guest
Join Date: May 2001
Location: In another castle
Posts: 18,988
Do you know what it means for a time series to be nonstationary?
#22
Old 07-07-2011, 01:20 AM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Quote:
Originally Posted by ultrafilter View Post
Do you know what it means for a time series to be nonstationary?
Well, I would frankly need to review the ins and outs of that. But I'm running percentage returns, not prices (and not even first differences), so I wasn't worried about unit roots. If you're interested, here's the Dickey-Fuller test:
Code:
. dfuller dailypret500, regress
Dickey-Fuller test for unit root Number of obs = 15473
 ---------- Interpolated Dickey-Fuller ---------
 Test 1% Critical 5% Critical 10% Critical
 Statistic Value Value Value
------------------------------------------------------------------------------
 Z(t) -120.229 -3.430 -2.860 -2.570
------------------------------------------------------------------------------
MacKinnon approximate p-value for Z(t) = 0.0000
------------------------------------------------------------------------------
D. |
dailypret500 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dailypret500 |
 L1 | -.9660763 .0080353 -120.23 0.000 -.9818264 -.9503262
_cons | .0003187 .0000777 4.10 0.000 .0001663 .000471
------------------------------------------------------------------------------
Unit roots are rejected at the 1% level.
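For reference, a roughly equivalent check in Python, assuming statsmodels' augmented Dickey-Fuller test and a placeholder file of daily returns (its automatic lag selection differs slightly from the dfuller call above):
Code:
import numpy as np
from statsmodels.tsa.stattools import adfuller

returns = np.loadtxt("returns.csv")        # placeholder: one column of daily returns
adf_stat, p_value, *rest = adfuller(returns)
print(f"ADF statistic = {adf_stat:.2f}, p-value = {p_value:.4f}")
# A p-value near zero rejects the unit-root null, matching the Stata output above.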

Last edited by Measure for Measure; 07-07-2011 at 01:21 AM.
#23
Old 07-07-2011, 01:24 AM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
...strictly speaking some financial analysts like to use log changes, but I doubt that presentation would make a big difference. I could be wrong though.


ETA: IIRC, stationarity implies constant variance as well, which is something that financial returns don't have. Variance tends to be autocorrelated.

Last edited by Measure for Measure; 07-07-2011 at 01:26 AM.
#24
Old 07-09-2011, 05:08 PM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Quote:
Originally Posted by Measure for Measure View Post
Let me propose another conceptual handle: "In practice, most dataset errors reflect some sort of distribution mixture. Following the central limit theorem, the sum of lots of equally weighted distributions will be Gauss. But in practice, Gauss will be an approximation, since the weights won't be equal.[1] " That should encompass the systematic error concept to some extent.
I must admit that this characterization doesn't exactly leap out from my sample. Here's another try: "Data will reflect some sort of underlying structure. Errors may do the same, except that unobserved and even unobservable variables may play a prominent role. So absent evidence to the contrary, the distribution posited for the error term might be thought of as an approximation, possibly a rough one."


Poking around the internet, it seems that EasyFit is one of a few pieces of specialty software used for fitting many different distributions to a dataset; this sort of procedure doesn't appear to be a standard part of the usual general-purpose statistical packages. Stata 8 fits the normal, for example, but that's all AFAIK. Then again, I know little about these tests: I don't know whether measuring and matching moments would be straightforward.
#25
Old 07-09-2011, 05:37 PM
Guest
Join Date: May 2001
Location: In another castle
Posts: 18,988
Quote:
Originally Posted by Measure for Measure View Post
Well, I would frankly need to review the ins and outs of that.
Yes, you do, because your entire argument seems to be based on the assumption that financial models assume a stationary return distribution, which is demonstrably false.

You also don't really need a test to see that the S&P 500 returns are not stationary. I did a quick plot of the returns over the period 1999-2010, and it's immediately obvious that you're not looking at a stationary series. Any longer time period would show much more variability.
#26
Old 07-09-2011, 06:14 PM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Oh, but I agree financial returns can be and are modeled with GARCH: 2nd moments are autocorrelated after all. In fact I had GARCH in mind when I spoke of "Underlying structure". But returns are also commonly modeled assuming normally distributed returns. Look at the Value at Risk literature. Consider the Black-Scholes model of options pricing. Read the business press when they speak of once every 10,000 year events that for some reason seem to recur every 5 years. Taleb wrote an entire book on this, which admittedly I haven't read.

Again, I'm not claiming that financial professionals are unaware of these issues, although I guess I am saying that they are known to be blown off now and then. Permit me to quote from John Hull's Options, Futures and Other Derivatives, 4th ed. He has a chapter on "Estimating Volatilities and Correlations" [with GARCH]:
Quote:
Originally Posted by John Hull
Summary

Most popular option pricing models, such as Black-Scholes, assume that the volatility of the underlying asset is constant. This assumption is far from perfect. In practice, the volatility of an asset, like the asset's price, is a stochastic variable. ...
Emphasis added. I'll also note that Hull's Fundamentals book lacks a chapter on, or any mention of, GARCH, although that book is used to train stock traders.

Last edited by Measure for Measure; 07-09-2011 at 06:17 PM.
#27
Old 07-09-2011, 11:22 PM
Guest
Join Date: Dec 2007
Location: NW Chicago Suburbs
Posts: 2,363
Quote:
Originally Posted by iamnotbatman View Post
Exact normal distributions can be found in physics. They are not rare. Wikipedia has some examples.
Yup, my immediate thought was radioactive decay, since the number of particles is so huge (thank you, Avogadro) as to make it a nearly perfect normal distribution.

Also, the physics department at my Uni back in the day had a really neat demo lab. One of the demos was an impressively large bean machine. I used to love playing with that. And it always produced a nice-looking normal curve. Not quite nature, since it was constructed by humans, but the principles are correct.
#28
Old 07-10-2011, 03:37 AM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
I believe that radioactive decay is a Poisson Process.
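That said, the two views are compatible: the count of decays in a fixed interval is Poisson, and for the enormous expected counts a macroscopic sample gives, the Poisson is itself extremely close to a normal, which may be what the earlier posts had in mind:

$$ N \sim \mathrm{Poisson}(\lambda), \qquad \frac{N-\lambda}{\sqrt{\lambda}} \;\xrightarrow{d}\; \mathcal{N}(0,1) \quad \text{as } \lambda \to \infty. $$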
#29
Old 07-10-2011, 06:16 AM
Member
Join Date: May 2000
Location: Massachusetts
Posts: 41,651
Quote:
Originally Posted by Chronos View Post
Wiki's picture for the quantum harmonic oscillator is a bit confusing. You see how it has those pixellated-looking horizontal bands across it? Each of those is one energy level. So the bottom-most band represents the ground state (with intensity as a function of position instead of height as a function of position, like on a graph). The ground state of a harmonic oscillator is, in fact, a perfect Gaussian.

But that doesn't really answer the question, either-- It just moves it to the question of whether there are really any perfect harmonic oscillators in nature. The harmonic oscillator, like the Gaussian itself, is something that doesn't actually show up exactly, very often, but which is a very good and simple approximation for a lot of things that do show up. The best I can come up with is a charged particle in a magnetic field with no other influences on it, but that "no other influences" is a killer. Especially since the particle would also have to have no magnetic dipole moment.

But I'm a bit unclear about what the OP means by a "natural" process. For instance, would rolling a whole bunch of dice and adding them up count as "natural"? Because it's really easy to get an arbitrarily-good Gaussian that way.


There are plenty of harmonic oscillators in nature for which this is a pretty good approximation -- it works extremely well for diatomic molecules. If you use the better-fitting Morse potential instead of a perfect harmonic potential, the lowest-order state is pretty close to a perfect Gaussian.


And, of course, the cross section through a TEM00 mode in a laser is a perfect Gaussian. In the real world, of course, there will invariably be dust specks in the beam, and the mirrors will be of finite extent. I doubt if anything in the real world can ever be a perfect Gaussian, because I doubt if anything in the real world will ever be a perfect function of any sort.
#30
Old 07-13-2011, 03:54 AM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Ignorance fought, Cal. Thanks to all the participants in this discussion.

I'd like to wind up a few loose threads.
Quote:
Originally Posted by Measure for Measure
If you're conducting an hypothesis test and the underlying distribution is even Laplace, applying the student's t-statistic probably won't steer you that far wrong. Type I and II errors will be less than optimal, but arguably acceptable. Or so I speculate...
I retract my speculation and will simply say that I don't know the implications of non-normality for hypothesis testing. One advocate of robust regression[1] states:
Quote:
Originally Posted by Frank Hampel
Now, the scientists are often told (or hope) that small deviations from normality do not matter, that the t-test is "robust" against small deviations from the mathematical assumption of normality. But this is true, roughly speaking, only for the level of the t-test; the power (and the corresponding length of confidence intervals, as well as the efficiency of the arithmetic mean) is very sensitive even to small deviations from normality.

...Tukey [122], (summarizing earlier work) showed the nonrobustness of the arithmetic mean even under slight deviations from normality.

...The avoidable efficiency losses of least squares (and other "classically" optimal procedures) compared with good robust procedures are more typically in the range of 10% - 100% than in the range of 1% - 10% (as some confusing papers and a misinterpretation of the Gauss -Markov theorem would seem to suggest, cf. below);
A less efficient estimator implies that we are not making the best guess possible of the parameter's true value. The author goes on to say that informal eyeballing techniques can drop the efficiency loss to 10-20%: I guess he has outliers in mind. I'm not saying this guy is the final authority though: I'm just retracting my overly hasty remarks.
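A small Monte Carlo sketch of Hampel's efficiency point (Python; the 5% contamination rate and the 9x scale are arbitrary choices): under a slight departure from normality, the sample median can estimate the center with noticeably less variance than the sample mean.
Code:
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 20_000
means = np.empty(reps)
medians = np.empty(reps)
for r in range(reps):
    contaminated = rng.random(n) < 0.05                       # 5% of points from a wider normal
    x = np.where(contaminated, rng.normal(0, 9, n), rng.normal(0, 1, n))
    means[r] = x.mean()
    medians[r] = np.median(x)

print("variance of the sample mean:  ", means.var())
print("variance of the sample median:", medians.var())   # noticeably smaller here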
----

GARCH is a method of taking serially correlated variances into account. So in the context of the stock market, a large move today makes a large move tomorrow more likely -- though we won't know the direction of that move. According to one set of authors,[2] while volatility clustering in normal GARCH models will increase the kurtosis of the series, it generally doesn't do so sufficiently to reflect the kurtosis (or long tails) of financial market returns. Like other researchers, they opt for a GARCH model with non-Gauss innovations.
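A simulation sketch of that claim (Python; the GARCH(1,1) parameters are illustrative, not estimated from data): normal innovations plus volatility clustering push the kurtosis above 3, but nowhere near the ~25 measured for the raw S&P series earlier in the thread.
Code:
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
omega, alpha, beta = 1e-6, 0.08, 0.90     # persistence alpha + beta = 0.98
T = 100_000
r = np.empty(T)
sigma2 = omega / (1 - alpha - beta)       # start at the unconditional variance
for t in range(T):
    r[t] = np.sqrt(sigma2) * rng.normal()                # normal innovation scaled by current volatility
    sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2   # next period's conditional variance

print("kurtosis of simulated returns:", stats.kurtosis(r, fisher=False))  # above 3 (about 4.4 in theory), far below 25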

I would think that a normal GARCH model would produce zero skew, though I haven't verified this. Eriksson and Forsberg (2004)[3] use a GARCH model with conditional skewness. That seems to me an odd way of modelling momentum in returns, but I frankly don't understand this properly. Anyway, they appear to use the Wald distribution in their GARCH model rather than Gauss.


Still, the OP is about the applicability and robustness of Gauss in general, and is not confined to financial markets. As computing power is cheap, it might not be a bad idea for the researcher to examine the descriptive statistics for empirical errors of uncertain provenance. But whether such a procedure would involve a risk of inappropriate data mining is something that I would have to think harder about.

ETA: Bean machines. No data. Discrete output. If it was made continuous by measuring impact location on a plate it would have an odd shape unless the bottom pins were moving.

--------------------------------------
[1] See Robust Inference by Frank Hampel (2000).
[2] See Kurtosis of GARCH and Stochastic Volatility Models with Non-normal Innovations by Xuezheng Bai, Jeffrey R. Russell, and George C. Tiao, July 27, 2001.
[3] See The Mean Variance Mixing GARCH(1,1) Model: A New Approach to Estimate Conditional Skewness by Anders Eriksson and Lars Forsberg, 2004. All these working papers are available as PDFs via Google.

Last edited by Measure for Measure; 07-13-2011 at 03:56 AM. Reason: Bean machine comment.
#31
Old 07-13-2011, 12:03 PM
Charter Member
Moderator
Join Date: Jan 2000
Location: The Land of Cleves
Posts: 73,120
A bean machine meets the conditions of the Central Limit Theorem, so it'll be a good approximation to within the limitations imposed by the binning, the finite number of beans, and the truncated tails. But of course you still have those limitations.
#32
Old 07-16-2011, 05:46 PM
Guest
Join Date: May 2007
Posts: 9,028
If man's current understanding of physics is correct, then I would guess that a true normal distribution in nature is flat out impossible.

One property of normal distributions is that there is a nonzero probability of exceeding any given value.

Thus, if the distribution of velocities of a set of particles is truly normal, then there is a nonzero chance that one or more of the particles will exceed the speed of light, which is impossible if man's current understanding of physics is correct.

Similarly, if the position of a particle after time t is normally distributed, then there is a nonzero chance that the particle moved faster than the speed of light.
#33
Old 07-21-2011, 01:13 AM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Trivial Hijack!

Quote:
Originally Posted by Peter Kennedy
It is extremely convenient to assume that errors are distributed normally, but there exists little justification for this assumption... Poincare is said to have claimed that "everyone believes in the [Gaussian] law of errors, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an empirical fact.
The provenance of this quote has been less than clear for a few decades, but Google resolves the matter. Poincaré recounts a conversation with Gabriel Lippmann in Calcul des probabilités (1912). Others had difficulty locating the precise citation.

Boring Historical Details

Here's the original French:
Quote:
Originally Posted by Poincaré in Calcul des probabilités
108. Cela ne nous apprendrait pas grand'chose si nous n'avions aucune donnée sur phi et psi. On a donc fait une hypothèse sur phi, et cette hypothèse a été appelée loi des erreurs.
Elle ne s'obtient pas par des déductions rigoureuses ; plus d'une démonstration qu'on a voulu en donner est grossière, entre autres celle qui s'appuie sur l'affirmation que la probabilité des écarts est proportionnelle aux écarts. Tout le monde y croit cependant, me disait un jour M. Lippmann, car les expérimentateurs s'imaginent que c'est un théorème de mathématiques, et les mathématiciens que c'est un fait expérimental.
Voici comment Gauss y est arrivé.
Lorsque nous cherchons la meilleure valeur à donner à z, nous n'avons pas d'autre ressource que de prendre la moyenne entre x1, x2, ..., xn en l'absence de toute considération qui justifierait un autre choix. Il faut donc que la loi des erreurs s'adapte à cette façon d'opérer. Gauss cherche quelle doit être phi pour que la valeur la plus probable soit la valeur moyenne.
Emphasis in original: pp. 170-71. Here is one translation combining Google Translate, my dismal high-school French, and some poetic license:
This doesn't teach us much if we don't have data on phi and psi. So we form an hypothesis for phi and call it The Law of Errors.

It cannot be obtained by rigorous deduction; more than one attempted demonstration is crude, among them the one that relies on the assertion that the probability of the deviations is proportional to the deviations.[1] Everyone believes in it nevertheless, as M. Lippmann told me one day, the experimenters because they imagine it is a mathematical theorem, and the mathematicians because they imagine it is an experimental fact.

Here's how Gauss did it.

When we seek the best value to give to z, we have no resource other than taking the average of x1, x2, ..., xn, in the absence of any consideration that would justify another choice. The law of errors must therefore be compatible with this way of operating. Gauss looks for what phi must be so that the most probable value is the mean value.
And here's my paraphrase: as the great physicist Gabriel Lippmann once told Poincare, "Everybody believes in the Gaussian Law of Errors, the experimenters because they imagine that it is a mathematical theorem and the mathematicians because they believe it is an empirical fact."


[1] Translator's note: huh?
#34
Old 07-23-2011, 09:43 PM
Charter Member
Join Date: Oct 2000
Location: Twitter: @MeasureMeasure
Posts: 12,926
Quote:
Originally Posted by Measure for Measure
What about other settings? Biology? Demography? Chemistry? Atmospheric science? Engineering?
The American astronomer and mathematician Charles Sanders Peirce wondered about the Law of Errors as well. To test it, he hired a laborer with no scientific background to respond to a signal by pressing a telegraph key. Peirce would measure the gap between the signal and the response in milliseconds. He did this in 1872, for 24 days, 500 times per day.

By the central limit theorem, he hypothesized that the resulting distribution would be Gauss. He was happy with his results: these graphs do indeed appear to be approximately normal,[1] as he came to label that distribution. This seemed to him to justify the use of least-squares methods.

Sort of. His data was re-evaluated in 1929 by Edwin B. Wilson and Margaret M. Hilferty. They concluded that the sample had many more outliers than a Gaussian, and a positive skew as well. The dataset was revisited in 2009 by Roger Koenker of the University of Illinois using modern significance tests. Gaussian skewness was rejected on 19 out of 24 days; Gaussian kurtosis was rejected on all days. The author suggested that median approaches might be superior to mean ones, and that quantile approaches might be even better.

How did Peirce, sometimes referred to as one of the two greatest American scientists of the 1800s, form his conclusion? Well the plots actually do reveal some visual skew and kurtosis. But they also conform to Tukey's Maxim: "All distributions are normal in the middle."


[1] See Peirce, C. S. (1873): "On the Theory of Errors of Observation," Report of the Superintendent of the U.S. Coast Survey, pp. 200-224. Reprinted in The New Elements of Mathematics (1976), collected papers of C. S. Peirce, ed. C. Eisele, Humanities Press: Atlantic Highlands, N.J., vol. 3, part 1, pp. 639-676.

Last edited by Measure for Measure; 07-23-2011 at 09:44 PM.