Procedure for hypothesis testing

The general procedure for a hypothesis test is the same for all variants:

  1. You set up your hypotheses (null and alternative hypothesis)
  2. You choose the test that best suits your question
  3. You choose the significance level \(\alpha\)
  4. You collect your data
  5. You use this data to calculate a summarizing figure, the test statistic (also called the test variable)
  6. You determine the distribution of this test statistic
  7. You calculate either the critical region or the p-value
  8. You use the result from step 7 to decide whether the null hypothesis is rejected or retained.

These eight steps were already mentioned in the article “What are hypothesis tests?”; here they are described in more detail:

1. Set up the hypotheses

First of all, you formulate your question and put it in the form of two hypotheses. The important point is that you want to refute the null hypothesis \(H_0\) and prove that the alternative hypothesis \(H_1\) holds instead. Therefore \(H_0\) and \(H_1\) must contradict each other. In the introductory article we already had the example with the beer mugs: there we wanted to prove that, on average, not enough beer is filled into the beer mugs at the Oktoberfest. Our hypotheses are formulated as follows:

  • \(H_0\): The average content of a beer mug is greater than or equal to one liter
  • \(H_1\): The average content of a beer mug is less than one liter

It is important, as I said, that the claim we want to prove is formulated as the alternative hypothesis \(H_1\). The article "What is in \(H_0\) and what is in \(H_1\)?" provides more detailed help on this.

If we now denote the average content of a beer mug by \(\mu\), we can formulate the hypotheses more briefly and mathematically unambiguously:

  • \(H_0:\; \mu \geq 1\,\text{liter}\)
  • \(H_1:\; \mu < 1\,\text{liter}\)

One-sided and two-sided tests

There are three possible ways to form a pair of hypotheses. They are divided into one-sided and two-sided tests, depending on the direction in which the alternative hypothesis points:

We have just seen a one-sided test in the example above: we want to find out whether the average content of a beer mug is smaller than one liter. The alternative hypothesis points in only one direction, namely "less than one liter". In general, this pair of hypotheses looks like this:

  • \(H_0:\; \mu \geq a\)
  • \(H_1:\; \mu < a\)

There are also one-sided tests in the other direction. Then the alternative hypothesis is that the parameter is greater than some predetermined value. For example, if you want to sound an alarm whenever an average temperature rises above a certain value, you would need such a test. The hypotheses are then:

  • \(H_0:\; \mu \leq a\)
  • \(H_1:\; \mu > a\)

In a two-sided test you only want to find out whether a parameter is different from a predetermined value, regardless of whether it is smaller or larger. An example would be a test in a food factory to check whether the filling weight of a package stays at the specified value. You need an alarm if the weight deviates in either direction, up or down. The hypotheses are then generally:

  • \(H_0:\; \mu = a\)
  • \(H_1:\; \mu \neq a\)

Intermediate task

One would like to prove by means of a test that career starters with a master’s degree earn more on average than career starters with a bachelor’s degree. To this end, 100 young professionals are asked about their degree and starting salary.

What are the null and alternative hypotheses in this case?

2. Select test

In order to decide which test is the right one, you first have to set up the null and alternative hypotheses and determine the scale level of all variables involved (the target variable and, if present, the influencing variable(s)). The test can then be selected, for example, using a table, as I show in an article here. In the example with the beer mugs above, we have a normally distributed target variable and no influencing variable; according to the table, the one-sample t-test is the right choice here.

Once you have chosen the appropriate test, it is automatically determined which test statistic you have to calculate later and what distribution it has.

3. Define the significance level

A hypothesis can never be confirmed or refuted with absolute certainty, only with a certain probability. So it can always happen that, purely by chance, our sample contains mostly under-filled beer mugs and we calculate an average of, for example, \(\bar{x} = 940\,\text{ml}\). We would then falsely "prove" that, on average, too little beer is filled into the mugs, even though the real average content is actually one liter.

Formulated in statistical language, this means: we would reject the null hypothesis even though it is actually true.

Before performing the test, you have to commit to a significance level, called \(\alpha\), which caps the probability that such a mistake happens to us. The more certain we want to be about our decision, the lower we have to choose this error probability. In the vast majority of cases, both in practice and in exams, this value is set to \(\alpha = 5\%\).

\(\alpha\) and \(\beta\) errors

In addition to the error of rejecting \(H_0\) even though it is true, there is a second wrong decision that can happen during testing: too little beer is actually bottled on average, but our test cannot prove it. Then we keep the null hypothesis (enough beer) even though in reality the alternative hypothesis (too little beer) is true.

A total of four cases can occur in a test:

  1. We reject \(H_0\), i.e. we assume \(H_1\).
    1. In reality, \(H_0\) is true: here we wrongly reject \(H_0\). This is the \(\alpha\) error, also called type I error. It occurs with a probability of \(\alpha\), because that is exactly how the test is constructed (a small simulation sketch after this list illustrates this). The level \(\alpha\) therefore regulates how sure you can be that \(H_1\) is actually true when the test rejects \(H_0\).
    2. In reality, \(H_1\) is true: everything is OK. \(H_1\) is correct, and we assume \(H_1\).
  2. We keep \(H_0\).
    1. In reality, \(H_0\) is true: everything is OK. \(H_0\) is true, and we do not believe in \(H_1\).
    2. In reality, \(H_1\) is true: in this case our conjecture is true (i.e. \(H_1\), which we want to prove, holds), but the test could not confirm it because we keep \(H_0\). This is the so-called \(\beta\) error, also called type II error. We cannot control this probability directly; it depends on the type of test and on the significance level \(\alpha\).
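What \(\alpha\) means can also be checked by simulation. The following is a minimal sketch, assuming Python with NumPy and SciPy (version 1.6 or newer for the alternative argument); the simulated standard deviation of 16 ml is my own assumption, not a value from the article. If we simulate many samples from a world in which \(H_0\) is true, a test at level \(\alpha = 5\%\) wrongly rejects \(H_0\) in roughly 5% of them.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_simulations = 10_000
false_rejections = 0

for _ in range(n_simulations):
    # A world in which H0 is true: the true average content really is 1000 ml
    sample = rng.normal(loc=1000, scale=16, size=10)
    # One-sided one-sample t-test, H1: the mean is less than 1000 ml
    test = stats.ttest_1samp(sample, popmean=1000, alternative="less")
    if test.pvalue < alpha:
        false_rejections += 1

# The share of wrong rejections should be close to alpha = 0.05
print(false_rejections / n_simulations)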

4. Collect data

The next step is to collect data. You don't have to do that in an exam, of course, but in real situations, collecting data is usually the most time-consuming step.

In our example we would go to the Oktoberfest, order, say, ten mugs of beer and measure their contents. The results could look like this:

Jug \(x_i\)     1      2      3      4      5      6      7      8      9      10
Content (ml)    968    1001   987    995    1010   983    994    962    979    965
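As a small sketch, assuming Python with NumPy (the variable name is mine), the measured contents can be typed in and summarized; the mean and sample standard deviation computed here reappear in step 5.

import numpy as np

# Measured contents of the ten beer mugs, in millilitres
contents_ml = np.array([968, 1001, 987, 995, 1010, 983, 994, 962, 979, 965], dtype=float)

print(contents_ml.mean())        # sample mean, 984.4 ml
print(contents_ml.std(ddof=1))   # sample standard deviation, about 16.057 ml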

5. Calculate the test statistic

The data are now evaluated under the assumption that \(H_0\) holds, i.e. that everything is in order and the average content of a beer mug really is one liter.

In order to be able to make a test decision later, you calculate a summarizing figure from the data whose distribution is known (and which is usually available as a distribution table in the formula collection in exams).

The idea of the test in our case works as follows: we calculate the average content of the collected (hihi) beer mugs. In our case this is \(\bar{x} = 984.4\,\text{ml}\).

The question the test answers is: "Assuming the true average content is 1000 ml, is this result of 984.4 ml still plausible enough that it could have been caused by random fluctuations, or is it so implausible that the true average is not 1000 ml but lower?"

We could of course be subjective and say: "984 ml is already low; the mean value is definitely not 1000 ml." But that is not a clear decision rule. What would we say with an average of 985 ml? At 990 ml? At 995 ml?

The test now packages this question into a mathematical formula and a decision rule. From the data we compute a test statistic (also called a test variable), which in this case is a standardized version of the mean \(\bar{x}\):

\[ T = \sqrt{n}\,\frac{\bar{x} - \mu_0}{s} \]

All the standardizations in this formula are there so that the test does not care about the scale or units in which the data were measured: the test statistic is forced into a known, tabulated distribution (more on this in step 6).

In our example we determine \(\bar{x} = 984.4\,\text{ml}\) and \(s = 16.057\). We take the value \(\mu_0 = 1000\) from the null hypothesis. Our test statistic \(T\) is thus

\[ T = \sqrt{n}\,\frac{\bar{x} - \mu_0}{s} = \sqrt{10}\,\frac{984.4 - 1000}{16.057} = -3.072 \]
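As a quick check, here is the same calculation typed into Python, using only the summary values above (the variable names are mine):

import math

n, xbar, s, mu0 = 10, 984.4, 16.057, 1000
T = math.sqrt(n) * (xbar - mu0) / s
print(round(T, 3))  # -3.072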

6. Determine the distribution of the test statistic

In order to determine which values of the test statistic are "normal", i.e. still acceptable, you have to know what distribution this test statistic has. The test statistic of a binomial test, for example, has the distribution \(B(n, p)\), i.e. a binomial distribution with n = "number of observations" and p = "probability under the null hypothesis". In a t-test, the test statistic has a \(t(n-1)\) distribution, i.e. a t-distribution with \(n-1\) degrees of freedom.

A test is usually designed so that this distribution is a "simple" one, e.g. a normal distribution with mean 0 and standard deviation 1. The reason is that books, exams, etc. then only need to include a single table for the normal distribution, namely the one with mean 0 and standard deviation 1.

And this fact is also the reason why we calculate the test statistic in a slightly more laborious way. We could simply take the mean of the data as the test statistic. Instead, we standardize it by subtracting \(\mu_0\), dividing by \(s\), and multiplying by \(\sqrt{n}\). The advantage of this variant is, as just described, that the test statistic is "forced" into a distribution for which we have a table.
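A small sketch of why this standardization helps, assuming Python with NumPy (the helper function is my own): the same test statistic comes out whether the data are recorded in millilitres or in litres, so a single t-table covers both.

import numpy as np

contents_ml = np.array([968, 1001, 987, 995, 1010, 983, 994, 962, 979, 965], dtype=float)

def t_statistic(x, mu0):
    # T = sqrt(n) * (mean - mu0) / s, with the sample standard deviation s
    return np.sqrt(len(x)) * (x.mean() - mu0) / x.std(ddof=1)

print(t_statistic(contents_ml, 1000))       # data in millilitres: about -3.07
print(t_statistic(contents_ml / 1000, 1))   # same data in litres: the same value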

7. Complete the test: Two options

Now there are two ways to answer the question of whether our mean is still plausible or not:

Completing the test: via the critical region (in exams usually with the help of a distribution table)

For the first way of making the test decision, we determine a critical region. If the test statistic does not lie in this critical region, we assume that the beer mugs are filled correctly. But if the test statistic does lie in the critical region, we have evidence that in reality less than 1000 ml is filled into a beer mug.

The critical region is a fixed range for a given type of test which, if \(H_0\) is true, the test statistic only reaches very rarely (namely with a probability of \(\alpha\)). If the test statistic nevertheless falls into this critical region, we have a strong reason to believe in \(H_1\) instead.

In a one-sided test, this region lies only on one side: there is a single critical limit, and depending on the direction of the test, you check whether the test statistic is above or below this limit. In a two-sided test, the critical region consists of two parts, i.e. there are two limits, one on the left and one on the right, and you check whether the test statistic lies between the two limits or outside them (in either direction).
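Instead of looking the limit up, the critical region for our one-sided beer mug test can also be computed; here is a sketch assuming Python with SciPy (the cutoff is the lower \(\alpha\) quantile of the \(t(9)\) distribution):

from scipy import stats

alpha = 0.05
t_crit = stats.t.ppf(alpha, df=9)   # lower 5% quantile of t(9), about -1.833

T = -3.072                          # test statistic from step 5
print(t_crit)
print(T < t_crit)                   # True: T lies in the critical region, so H0 is rejected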

The critical limit can just as well be read off from a distribution table without any problems. That is how it was done before the computer age, and that is how it is still done in exams. In practice, however, it is now more common to work with p-values:

Completing the test: Using the p-value (mostly in statistics programs)

Alternatively, we can calculate a p-value from the test statistic. This value tells us how likely such an extreme deviation from the mean \(\mu_0 = 1000\,\text{ml}\) is, assuming correct filling with an average of 1000 ml.

If this probability is very low (more precisely: if it is below the chosen significance level \(\alpha\)), we again have evidence that in reality less than 1000 ml is filled into a beer mug. If the p-value is higher, however, this could not be proven and the null hypothesis is retained.
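This is exactly what statistics programs compute for you. A minimal sketch for our example, assuming Python with SciPy (version 1.6 or newer for the alternative argument):

from scipy import stats

contents_ml = [968, 1001, 987, 995, 1010, 983, 994, 962, 979, 965]

# One-sample t-test, H1: the true mean is less than 1000 ml
result = stats.ttest_1samp(contents_ml, popmean=1000, alternative="less")
print(result.statistic)   # about -3.07, the test statistic from step 5
print(result.pvalue)      # roughly 0.007, well below alpha = 0.05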

8. Make test decision

Now you have calculated all the values needed to make your test decision. If you decided on the critical region, you compare two values: the test statistic and the critical region. You simply check whether the test statistic lies inside or outside this region. If it lies outside, then "everything is fine" and we keep the null hypothesis; but if it lies inside the critical region, we have found enough evidence to accept the alternative hypothesis.

If you decided on the p-value in step 7, the last step is a little easier: you compare two other values, the p-value and the significance level \(\alpha\). If the p-value is above the significance level \(\alpha\), we keep the null hypothesis; but if the p-value is below \(\alpha\), we have found enough evidence to accept the alternative hypothesis.
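Written as code, the decision rule for the p-value route is just a comparison; this sketch uses the rounded p-value of 0.007 from the SciPy sketch in step 7 and made-up print messages:

alpha = 0.05
p_value = 0.007   # rounded value from the sketch in step 7

if p_value < alpha:
    print("Reject H0: the data indicate an average content below one liter.")
else:
    print("Keep H0: not enough evidence against an average content of one liter.")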
