Your investment advisor proposes a monthly income investment scheme that promises a variable return each month. You will invest in it only if you are assured of an average monthly income of $180. Your advisor also tells you that over the past 300 months, the scheme's returns had an average value of $190 and a standard deviation of $75. Should you invest in this scheme?
Hypothesis testing can help with this kind of decision.
This article assumes that readers are familiar with the normal distribution table and formula, p-values and related statistical basics.
Hypothesis testing (or significance testing) is a mathematical model for testing a claim, idea or hypothesis about a parameter of interest in a given population, using data measured in a sample. Calculations performed on selected samples yield information about characteristics of the entire population, which provides a systematic way to test claims or ideas about the population as a whole.
Here is a simple example: (A) A school principal reports that students in her school score an average of 7 out of 10 in exams. To test this “hypothesis”, we record the marks of, say, 30 students (the sample) out of the school's entire student population (say, 300) and calculate the sample mean. We can then compare the (calculated) sample mean to the (reported) population mean and check whether the data are consistent with the hypothesis.
Another example: (B) The annual return of a particular mutual fund is claimed to be 8%. Assume that the mutual fund has been in existence for 20 years. We take a random sample of the fund's annual returns for, say, five years (the sample) and calculate their mean. We then compare the (calculated) sample mean to the (claimed) population mean to verify the hypothesis.
Different methodologies exist for hypothesis testing, but the same four basic steps are involved:
Step 1: Define the hypothesis:
Usually the reported value (or the claimed statistic) is stated as the hypothesis and presumed to be true. For the above examples, the hypotheses will be:
- Example A: Students in the school score an average of 7 out of 10 in exams
- Example B: Annual return of the mutual fund is 8% per annum
This stated description constitutes the “Null Hypothesis (H0)” and is assumed to be true. A jury trial starts by assuming the suspect's innocence and then determines whether that assumption is false. Similarly, hypothesis testing starts by stating and assuming the “Null Hypothesis”, and the process then determines whether the assumption is likely to be true or false.
The important point to note is that we are testing the null hypothesis because there is an element of doubt about its validity. Whatever contradicts the stated null hypothesis is captured in the Alternative Hypothesis (H1). For the above examples, the alternative hypotheses will be:
- Students score an average which is not equal to 7
- Annual return of the mutual fund is not equal to 8% per annum
In summary, the alternative hypothesis is a direct contradiction of the null hypothesis.
In a trial, the jury assumes the suspect's innocence (the null hypothesis), and the prosecutor has to prove otherwise (the alternative). Similarly, the researcher has to produce evidence that the null hypothesis is false. If the prosecutor fails to prove the alternative, the jury has to let the "suspect" go, basing the decision on the null hypothesis. Likewise, if the researcher fails to prove the alternative hypothesis (or simply does nothing), the null hypothesis continues to be assumed true.
Step 2: Set the decision criteria
The decision criteria have to be based on certain parameters of the dataset, and this is where the normal distribution comes into the picture.
According to a standard result about sampling distributions, “For any sample size n, the sampling distribution of X̅ is normal if the population X from which the sample is drawn is normally distributed.” Hence, the means of all possible samples one could select are normally distributed. (By the central limit theorem, this also holds approximately for large samples drawn from populations that are not normal.)
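To see this result at work, the sketch below draws many samples from a normal population and checks that the sample means cluster around the population mean with a spread close to σ/√n (the standard error). The population mean of 190, standard deviation of 75, sample size of 30 and number of repetitions are illustrative assumptions, not figures from the text.

```python
import math
import random
import statistics

def simulate_sample_means(mu, sigma, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from N(mu, sigma) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(repetitions):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Illustrative (assumed) population parameters and sample size
MU, SIGMA, N = 190.0, 75.0, 30
means = simulate_sample_means(MU, SIGMA, N, repetitions=5000)

observed_se = statistics.stdev(means)   # observed spread of the sample means
theoretical_se = SIGMA / math.sqrt(N)   # standard error sigma / sqrt(n)

print(f"mean of sample means: {statistics.fmean(means):.2f}")
print(f"observed SE: {observed_se:.2f}, theoretical SE: {theoretical_se:.2f}")
```

The two standard-error figures should agree to within a few percent; increasing `N` shrinks both, which is why larger samples give sharper tests.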
For example, suppose we want to determine whether the average daily return of stocks listed on the XYZ stock market around New Year's time is greater than 2%.
H0: Null Hypothesis: mean = 2%
H1: Alternative Hypothesis: mean > 2% (This is what we want to prove)
Take a sample (say, 50 of the 500 stocks) and compute the sample mean.
For a normal distribution, about 95% of the values lie within 2 standard deviations of the population mean. Hence, the normality assumption for the sample means (which the central limit theorem supports for large samples) allows us to set 5% as a significance level. This makes sense because, under this assumption, there is less than a 5% probability (100 - 95) of obtaining values that lie more than 2 standard deviations from the population mean. Depending on the nature of the dataset, other significance levels such as 1% or 10% can be used. For financial calculations (including behavioral finance), 5% is the generally accepted limit. If a calculated sample mean lies beyond the usual 2 standard deviations, we have a strong case for treating it as an outlier under the null hypothesis and rejecting the null hypothesis. Standard deviations are extremely important to understanding statistical data.
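The 95% and 2-standard-deviation figures can be checked directly from the standard normal distribution. This sketch uses Python's standard-library `statistics.NormalDist`; it shows that about 95.4% of values lie within 2 standard deviations, and that the exact cutoff leaving 5% in the two tails combined is about 1.96 standard deviations.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Exact cutoff leaving 2.5% in each tail, i.e. 5% in total
two_tailed_cutoff = std_normal.inv_cdf(1 - 0.05 / 2)

print(f"within 2 SD:    {share_within(2):.4f}")
print(f"within 1.96 SD: {share_within(1.96):.4f}")
print(f"two-tailed 5% cutoff: {two_tailed_cutoff:.3f}")
```

This is why textbooks often quote 1.96 rather than 2 as the cutoff for a 5% two-tailed test.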
Graphically, it is represented as follows:
In the above example, if the sample mean is much larger than 2% (say 3.5%), then we reject the null hypothesis. The alternative hypothesis (mean > 2%) is accepted, which supports the conclusion that the average daily return of the stocks is indeed above 2%.
However, if the sample mean is only slightly greater than 2% (say, around 2.2%), it is not obvious whether we can reject the null hypothesis. The challenge is how to decide such close cases. To draw a conclusion from the selected samples and results, a level of significance has to be determined, which enables a conclusion to be made about the null hypothesis. The significance level, together with the direction set by the alternative hypothesis, establishes the "critical value" used to decide such close cases. As per the standard definition, “A critical value is a cutoff value that defines the boundaries beyond which less than 5% of sample means can be obtained if the null hypothesis is true. Sample means obtained beyond a critical value will result in a decision to reject the null hypothesis”. In the above example, if we have defined the critical value as 2.1%, and the calculated mean comes to 2.2%, then we reject the null hypothesis. A critical value establishes a clear demarcation between rejecting and retaining the null hypothesis.
More examples follow later. First, though, let's look at the remaining steps and some key concepts.
Step 3: Calculate the test statistic:
This step involves calculating the required figure(s) for the selected sample, such as the sample mean, the test statistic (e.g., the z-score) and the p-value. These calculations are covered in a later section with examples.
Step 4: Make conclusions about the hypothesis
With the computed value(s), decide on the null hypothesis. If the probability of getting a sample mean at least as extreme as the one observed, assuming the null hypothesis is true, is less than 5%, then the conclusion is to reject the null hypothesis. Otherwise, retain the null hypothesis.
Types of Errors in decision making:
There are four possible outcomes in sample-based decision making, depending on whether the decision correctly applies to the entire population:
| | Decision to Retain | Decision to Reject |
|---|---|---|
| H0 applies to entire population | Correct | Incorrect (Type 1 error, α) |
| H0 does not apply to entire population | Incorrect (Type 2 error, β) | Correct |
The “Correct” cases are the ones where the decisions taken on the samples are truly applicable to the entire population. The cases of errors arise when one decides to retain (or reject) the null hypothesis based on sample calculations, but that decision does not really apply for the entire population. These cases constitute Type 1 (alpha) and Type 2 (beta) errors, as indicated in the table above.
Choosing an appropriate critical value limits Type 1 (alpha) errors to an acceptable level; they cannot be eliminated entirely.
Alpha denotes the probability of a Type 1 error, which is the level of significance, and is chosen by the researcher. To maintain the standard 5% significance level (a 95% confidence level), alpha is set at 5%.
As per the applicable decision-making benchmarks and definitions:
- “This (alpha) criterion is usually set at 0.05 (α = 0.05), and we compare the alpha level to the p value. When the probability of a Type I error is less than 5% (p < 0.05), we decide to reject the null hypothesis; otherwise, we retain the null hypothesis.”
- The technical term used for this probability is p-value. It is defined as “the probability of obtaining a sample outcome, given that the value stated in the null hypothesis is true. The p value for obtaining a sample outcome is compared to the level of significance”.
- A Type II error, or beta error, is defined as “the probability of incorrectly retaining the null hypothesis, when in fact it is not applicable to the entire population.”
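The statement that alpha is the probability of a Type I error can be checked by simulation: generate many samples for which the null hypothesis is true, run a one-tailed z-test at α = 0.05 on each, and count how often H0 is (wrongly) rejected. The rejection rate should come out close to 5%. The null mean of 180, standard deviation of 75, sample size of 30 and number of trials are illustrative assumptions.

```python
import math
import random
import statistics
from statistics import NormalDist

def type1_error_rate(mu0, sigma, n, alpha, trials, seed=1):
    """Fraction of samples drawn under H0 (true mean = mu0) that a one-tailed z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # reject H0 when Z > z_crit
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # H0 is true by construction
        z = (statistics.fmean(sample) - mu0) / standard_error
        if z > z_crit:
            rejections += 1   # a Type I error: rejecting a true H0
    return rejections / trials

# Assumed values: H0 mean 180, sigma 75, samples of 30, 4000 repetitions
rate = type1_error_rate(mu0=180, sigma=75, n=30, alpha=0.05, trials=4000)
print(f"observed Type I error rate: {rate:.3f}")
```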
A few more examples will demonstrate this and other calculations.
Example 1. A monthly income investment scheme exists that promises variable monthly returns. An investor will invest in it only if assured of an average monthly income of $180. The investor has a sample of 300 months' returns, which has a mean of $190 and a standard deviation of $75. Should he or she invest in this scheme?
Let's set up the problem. The investor will invest in the scheme if he or she is assured of the desired $180 average return. Here,
H0: Null Hypothesis: mean = 180
H1: Alternative Hypothesis: mean > 180
Method 1 - Critical Value Approach:
Identify a critical value XL for the sample mean, which is large enough to reject the null hypothesis – i.e. reject the null hypothesis if sample mean >= critical value XL
P(Type I (alpha) error) = P(reject H0 given that H0 is true),
which occurs when the sample mean exceeds the critical limit, i.e.
= P(X̅ >= XL given that H0 is true) = alpha
Taking alpha = 0.05 (i.e. 5% significance level), Z0.05 = 1.645 (from the Z-table or normal distribution table)
=> XL = 180 + 1.645 * (75/sqrt(300)) = 187.12
Since the sample mean (190) is greater than the critical value (187.12), the null hypothesis is rejected, and the conclusion is that the average monthly return is indeed greater than $180, so the investor can consider investing in this scheme.
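The critical-value calculation for Example 1 can be reproduced in a few lines. This sketch takes Z0.05 from `statistics.NormalDist().inv_cdf` instead of a printed Z-table, so it uses about 1.6449 rather than the rounded 1.645; the critical value still rounds to $187.12.

```python
import math
from statistics import NormalDist

mu0 = 180.0          # mean under H0 (required monthly income)
sample_mean = 190.0  # observed mean over 300 months
sigma = 75.0         # standard deviation of monthly returns
n = 300              # number of months in the sample
alpha = 0.05         # significance level

z_alpha = NormalDist().inv_cdf(1 - alpha)     # one-tailed critical z, about 1.645
x_l = mu0 + z_alpha * sigma / math.sqrt(n)    # critical value XL for the sample mean

reject_h0 = sample_mean >= x_l
print(f"Z0.05 = {z_alpha:.3f}, XL = {x_l:.2f}, reject H0: {reject_h0}")
```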
Method 2 - Using standardized test statistics:
One can also use the standardized value z.
Test statistic: Z = (sample mean - population mean)/(std-dev/sqrt(sample size)), i.e.
Z = (190 - 180)/(75/sqrt(300)) = 2.309
At the 5% significance level, the rejection region is Z > Z0.05 = 1.645.
Since Z = 2.309 is greater than 1.645, the null hypothesis can be rejected, leading to the same conclusion as above.
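The standardized version of the same check, as a small sketch: compute Z and compare it with the one-tailed critical value. The helper name `z_statistic` is illustrative, not from the text.

```python
import math
from statistics import NormalDist

def z_statistic(sample_mean, mu0, sigma, n):
    """Standardized test statistic Z = (sample mean - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

z = z_statistic(sample_mean=190, mu0=180, sigma=75, n=300)
z_crit = NormalDist().inv_cdf(0.95)   # Z0.05 for a one-tailed test

print(f"Z = {z:.3f}, critical Z = {z_crit:.3f}, reject H0: {z > z_crit}")
```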
Method 3 - P-value calculation:
We aim to find P(sample mean >= 190, given that the mean = 180)
= P(Z >= (190 - 180)/(75/sqrt(300)))
= P(Z >= 2.309) ≈ 0.0105 = 1.05%
Using the following table to interpret p-values, a p-value of about 1.05% falls in the 1% to 5% band, which indicates strong evidence that average monthly returns are higher than $180.
| p-value | Inference |
|---|---|
| less than 1% | Confirmed evidence supporting alternative hypothesis |
| between 1% and 5% | Strong evidence supporting alternative hypothesis |
| between 5% and 10% | Weak evidence supporting alternative hypothesis |
| greater than 10% | No evidence supporting alternative hypothesis |
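The p-value in Method 3 is the upper-tail area beyond Z under the standard normal curve. The sketch below computes it with `NormalDist().cdf` and maps it to the evidence bands in the table. The function name `evidence_band` is illustrative, and treating a p-value of exactly 1%, 5% or 10% as belonging to the higher band is an assumption, since the table does not specify boundaries.

```python
import math
from statistics import NormalDist

def evidence_band(p_value):
    """Map a p-value to the evidence bands of the table (boundary handling is an assumption)."""
    if p_value < 0.01:
        return "Confirmed evidence supporting alternative hypothesis"
    if p_value < 0.05:
        return "Strong evidence supporting alternative hypothesis"
    if p_value < 0.10:
        return "Weak evidence supporting alternative hypothesis"
    return "No evidence supporting alternative hypothesis"

z = (190 - 180) / (75 / math.sqrt(300))   # about 2.309
p_value = 1 - NormalDist().cdf(z)         # P(Z >= z), one-tailed

print(f"p-value = {p_value:.4f} -> {evidence_band(p_value)}")
```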
Example 2: A new stock broker (XYZ) claims that his brokerage rates are lower than those of your current stock broker (ABC). Data available from an independent research firm indicate that the mean and standard deviation of the brokerage bills of all ABC clients are $18 and $6, respectively.
A sample of 100 ABC clients is taken, and their brokerage charges are recalculated at XYZ's new rates. If the sample mean is $18.75 and the standard deviation is the same ($6), can any inference be made about the difference in the average brokerage bill between the ABC and XYZ brokers?
H0: Null Hypothesis: mean = 18
H1: Alternative Hypothesis: mean ≠ 18 (This is what we want to prove)
Rejection region: Z <= -Z0.025 or Z >= Z0.025 (assuming a 5% significance level, split 2.5% in each tail)
Z = (sample mean - mean)/(std-dev/sqrt(sample size))
= (18.75 - 18)/(6/sqrt(100)) = 1.25
This calculated Z value falls between the two limits defined by -Z0.025 = -1.96 and Z0.025 = 1.96.
We therefore conclude that there is insufficient evidence to infer any difference between the rates of your existing and new brokers.
Alternatively, the p-value = P(Z < -1.25) + P(Z > 1.25)
= 2 * 0.1056 = 0.2112 = 21.12%, which is greater than 0.05 (5%), leading to the same conclusion.
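Example 2 is a two-tailed test, so α is split across both tails and the p-value doubles the single-tail area. A minimal sketch with the numbers from the example:

```python
import math
from statistics import NormalDist

mu0, sigma, n = 18.0, 6.0, 100   # H0 mean bill, standard deviation, sample size
sample_mean = 18.75              # mean bill of the sample under XYZ's rates
alpha = 0.05

z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # 1.25
z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96, alpha split across both tails
p_value = 2 * (1 - NormalDist().cdf(abs(z)))       # P(Z <= -|z|) + P(Z >= |z|)

reject_h0 = abs(z) >= z_crit
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Computed without table rounding, the p-value is about 0.2113, matching the hand calculation above.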
Graphically, it is represented by the following:
Criticisms of the Hypothesis Testing Method:
- It is a statistical method based on assumptions (such as normality) that may not hold
- It is error prone, as described by alpha (Type 1) and beta (Type 2) errors
- Interpretation of the p-value can be ambiguous, leading to confusing results
The Bottom Line
Hypothesis testing provides a mathematical model for validating a claim or idea at a chosen confidence level. However, like most statistical tools and models, it has limitations. Its use for making financial decisions should be considered critically, keeping all of its assumptions in mind. Alternative methods such as Bayesian inference are also worth exploring for similar analysis.
Social science research, and by extension business research, uses a number of different approaches to study a variety of issues. This research may be a very informal, simple process or it may be a formal, somewhat sophisticated process. Regardless of the type of process, all research begins with a generalized idea in the form of a research question or a hypothesis. A research question usually is posed at the beginning of a research effort or in a specific area of study that has had little formal research. A research question may take the form of a basic question about some issue or phenomenon, or a question about the relationship between two or more variables. For example, a research question might be: "Do flexible work hours improve employee productivity?" Another question might be: "How do flexible hours influence employees' work?"
A hypothesis differs from a research question; it is more specific and makes a prediction. It is a tentative statement about the relationship between two or more variables. The major difference between a research question and a hypothesis is that a hypothesis predicts an experimental outcome. For example, a hypothesis might state: "There is a positive relationship between the availability of flexible work hours and employee productivity."
Hypotheses provide the following benefits:
- They determine the focus and direction for a research effort.
- Their development forces the researcher to clearly state the purpose of the research activity.
- They determine what variables will not be considered in a study, as well as those that will be considered.
- They require the researcher to have an operational definition of the variables of interest.
The worth of a hypothesis often depends on the researcher's skills. Since the hypothesis is the basis of a research study, it is necessary for the hypothesis to be developed with a great deal of thought and contemplation. There are basic criteria to consider when developing a hypothesis, in order to ensure that it meets the needs of the study and the researcher. A good hypothesis should:
- Have logical consistency. Based on the current research literature and knowledge base, does this hypothesis make sense?
- Be in step with the current literature and/or provide a good basis for any differences. Though it does not have to support the current body of literature, it is necessary to provide a good rationale for stepping away from the mainstream.
- Be testable. If one cannot design the means to conduct the research, the hypothesis means nothing.
- Be stated in clear and simple terms in order to reduce confusion.
HYPOTHESIS TESTING PROCESS
Hypothesis testing is a systematic method used to evaluate data and aid the decision-making process. Following is a typical series of steps involved in hypothesis testing:
- State the hypotheses of interest
- Determine the appropriate test statistic
- Specify the level of statistical significance
- Determine the decision rule for rejecting or not rejecting the null hypothesis
- Collect the data and perform the needed calculations
- Decide to reject or not reject the null hypothesis
Each step in the process will be discussed in detail, and an example will follow the discussion of the steps.
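As a rough illustration of how the six steps fit together, here is a sketch of a one-sample z-test, assuming the population standard deviation is known (or the sample is large enough to substitute the sample standard deviation). The function name `z_test` and its `alternative` values ("two-sided", "greater", "less") are illustrative choices, not part of the original text.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """One-sample z-test; returns (z, p_value, reject_h0)."""
    # Step 1: H0 is "population mean == mu0"; `alternative` sets H1 (!=, >, or <).
    # Step 2: test statistic Z = (sample mean - mu0) / (sigma / sqrt(n)).
    # Step 3: alpha is the chosen level of significance.
    # Step 4: decision rule - reject H0 when the p-value is below alpha.
    # Step 5: compute the statistic and its p-value from the data.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    # Step 6: reject or fail to reject H0.
    return z, p_value, p_value < alpha

# Usage with the figures of Example 1 (one-tailed, H1: mean > 180)
z, p, reject = z_test(190, 180, 75, 300, alternative="greater")
print(f"Z = {z:.3f}, p = {p:.4f}, reject H0: {reject}")
```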
STATING THE HYPOTHESES.
A research study includes at least two hypotheses: the null hypothesis and the alternative hypothesis. The hypothesis being tested is referred to as the null hypothesis and is designated as H0. It is also referred to as the hypothesis of no difference and should include a statement of equality (=, ≥, or ≤). The alternative hypothesis presents the alternative to the null and includes a statement of inequality (≠, <, or >). The null hypothesis and the alternative hypothesis are complementary.
The null hypothesis is the statement that is believed to be correct throughout the analysis, and it is the null hypothesis upon which the analysis is based. For example, the null hypothesis might state that the average age of entering college freshmen is 21 years.
H0: The average age of entering college freshmen = 21 years
If the data one collects and analyzes indicate that the average age of entering college freshmen is greater than or less than 21 years, the null hypothesis is rejected. In this case the alternative hypothesis could be stated in the following three ways: (1) the average age of entering college freshmen is not 21 years (the average age of entering college freshmen ≠ 21); (2) the average age of entering college freshmen is less than 21 years (the average age of entering college freshmen < 21); or (3) the average age of entering college freshmen is greater than 21 years (the average age of entering college freshmen > 21 years).
The choice of which alternative hypothesis to use is generally determined by the study's objective. The preceding second and third examples of alternative hypotheses involve the use of a "one-tailed" statistical test. This is referred to as "one-tailed" because a direction (greater than [>] or less than [<]) is implied in the statement. The first example represents a "two-tailed" test. An inequality is expressed (age ≠ 21 years), but it does not imply a direction. One-tailed tests are used more often in management and marketing research because there usually is a need to imply a specific direction in the outcome. For example, it is more likely that a researcher would want to know if Product A performed better than Product B (Product A performance > Product B performance), or vice versa (Product A performance < Product B performance), rather than whether Product A performed differently from Product B (Product A performance ≠ Product B performance). Additionally, more useful information is gained by knowing that employees who work from 7:00 a.m. to 4:00 p.m. are more productive than those who work from 3:00 p.m. to 12:00 a.m. (early shift employee production > late shift employee production), rather than simply knowing that these employees have different levels of productivity (early shift employee production ≠ late shift employee production).
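In a z-test, the practical difference between the two kinds of test is where the rejection cutoff sits. A one-tailed test puts all of α in one tail, so its cutoff lies closer to the mean and an effect in the predicted direction is easier to detect; a two-tailed test splits α between both tails. A minimal sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()

# critical[alpha] = (one-tailed cutoff, two-tailed cutoff) for the standard normal
critical = {}
for alpha in (0.05, 0.01):
    one_tailed = std_normal.inv_cdf(1 - alpha)       # all of alpha in one tail
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # alpha split between two tails
    critical[alpha] = (one_tailed, two_tailed)
    print(f"alpha={alpha}: one-tailed z={one_tailed:.3f}, two-tailed z=±{two_tailed:.3f}")
```

At α = .05 this gives 1.645 for a one-tailed test versus 1.960 for a two-tailed test.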
Both the alternative and the null hypotheses must be determined and stated prior to the collection of data. Before the alternative and null hypotheses can be formulated, it is necessary to decide on the desired or expected conclusion of the research. Generally, the desired conclusion of the study is stated in the alternative hypothesis. This is true as long as the null hypothesis can include a statement of equality. For example, suppose that a researcher is interested in exploring the effects of amount of study time on test scores. The researcher believes that students who study longer perform better on tests. Specifically, the researcher expects that students who spend four hours studying for an exam will get a better score than those who study two hours. In this case the hypotheses might be:
H0: The average test scores of students who study 4 hours for the test = the average test scores of those who study 2 hours.
H1: The average test scores of students who study 4 hours for the test > the average test scores of those who study 2 hours.
As a result of the statistical analysis, the null hypothesis can be rejected or not rejected. As a principle of rigorous scientific method, this subtle but important point means that the null hypothesis cannot be accepted. If the null is rejected, the alternative hypothesis can be accepted; however, if the null is not rejected, we can't conclude that the null hypothesis is true. The rationale is that evidence that supports a hypothesis is not conclusive, but evidence that negates a hypothesis is ample to discredit it. The analysis of study time and test scores provides an example. If the results of one study indicate that the test scores of students who study 4 hours are significantly better than those of students who study 2 hours, the null hypothesis can be rejected because the researcher has found one case in which the null is not true. However, if the results indicate that the test scores of those who study 4 hours are not significantly better than those of students who study 2 hours, the null hypothesis cannot be rejected. Nor can one conclude that the null hypothesis is true, because these results are only one set of score comparisons. Just because the null hypothesis appears to hold in one situation does not mean it is always true.
DETERMINING THE APPROPRIATE TEST STATISTIC.
The appropriate test statistic (the statistic to be used in statistical hypothesis testing) is based on various characteristics of the sample and population of interest, including sample size and distribution. The test statistic can assume many numerical values. Since the value of the test statistic has a significant effect on the decision, one must use the appropriate statistic in order to obtain meaningful results. Most test statistics follow this general pattern:
Test statistic = (sample statistic − hypothesized value of the parameter) / (standard error of the sample statistic)
For example, the appropriate statistic to use when testing a hypothesis about a population mean is:
Z = (X̅ − μ) / (σ / √n)
In this formula Z = test statistic, X̅ = mean of the sample, μ = hypothesized mean of the population, σ = standard deviation of the population (often estimated by the sample standard deviation when the sample is large), and n = number in the sample.
SPECIFYING THE STATISTICAL SIGNIFICANCE LEVEL.
As previously noted, one can reject a null hypothesis or fail to reject a null hypothesis. A null hypothesis that is rejected may, in reality, be true or false. Additionally, a null hypothesis that fails to be rejected may, in reality, be true or false. The outcome that a researcher desires is to reject a false null hypothesis or to fail to reject a true null hypothesis. However, there always is the possibility of rejecting a true hypothesis or failing to reject a false hypothesis.
Rejecting a null hypothesis that is true is called a Type I error, and failing to reject a false null hypothesis is called a Type II error. The probability of committing a Type I error is termed α and the probability of committing a Type II error is termed β. As the value of α increases, the probability of committing a Type I error increases. As the value of β increases, the probability of committing a Type II error increases. While one would like to decrease the probability of committing both types of errors, for a fixed sample size the reduction of α results in an increase of β and vice versa. The best way to reduce the probability of both types of error at once is to increase the sample size.
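The tradeoff between α and β, and the effect of sample size, can be made concrete for a one-tailed z-test of H0: μ = μ0 against H1: μ > μ0. If the true mean is actually μ1 > μ0, the probability of a Type II error is β = Φ(z_α − (μ1 − μ0)/(σ/√n)), where Φ is the standard normal cumulative distribution function. The scenario below (μ0 = 180, a hypothetical true mean μ1 = 190, σ = 75) is an assumption chosen for illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def type2_error(mu0, mu1, sigma, n, alpha):
    """beta for a one-tailed z-test of H0: mu = mu0 vs H1: mu > mu0 when the true mean is mu1."""
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = (mu1 - mu0) / (sigma / math.sqrt(n))   # true effect in standard-error units
    return std_normal.cdf(z_alpha - shift)         # chance Z stays below the cutoff

# Assumed scenario: H0 mean 180, hypothetical true mean 190, sigma 75
for n in (50, 200):
    for alpha in (0.01, 0.05):
        beta = type2_error(180, 190, 75, n, alpha)
        print(f"n={n:3d} alpha={alpha:.2f} beta={beta:.3f}")
```

For a fixed n, lowering α raises β; raising n lowers β at either α.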
The probability of committing a Type I error, α, is called the level of significance. Before data are collected, one must specify a level of significance, or the probability of committing a Type I error (rejecting a true null hypothesis). There is an inverse relationship between a researcher's desire to avoid making a Type I error and the selected value of α; if not making the error is particularly important, a low probability of making the error is sought. The greater the desire is to not reject a true null hypothesis, the lower the selected value of α. In theory, the value of α can be any value between 0 and 1. However, the most common values used in social science research are .05, .01, and .001, which correspond to a 95 percent, 99 percent, and 99.9 percent probability, respectively, of not making a Type I error when the null hypothesis is true. The tradeoff for choosing a higher level of certainty (a lower α) is that it will take much stronger statistical evidence to reject the null hypothesis.
DETERMINING THE DECISION RULE.
Before data are collected and analyzed it is necessary to determine under what circumstances the null hypothesis will be rejected or fail to be rejected. The decision rule can be stated in terms of the computed test statistic, or in probabilistic terms. The same decision will be reached regardless of which method is chosen.
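The statement that both forms of the decision rule reach the same decision can be checked numerically. For a two-tailed z-test, rejecting when |Z| ≥ z_(α/2) is equivalent to rejecting when the p-value ≤ α. The sketch below compares the two rules over a grid of Z values; the function names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()

def reject_by_statistic(z, alpha):
    """Decision rule stated in terms of the test statistic (two-tailed)."""
    return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)

def reject_by_p_value(z, alpha):
    """Decision rule stated in probabilistic terms (two-tailed p-value)."""
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return p_value <= alpha

z_values = [i / 100 for i in range(-400, 401)]   # Z from -4.00 to 4.00
rules_agree = all(
    reject_by_statistic(z, a) == reject_by_p_value(z, a)
    for z in z_values
    for a in (0.05, 0.01)
)
print(f"both rules give the same decision: {rules_agree}")
```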
COLLECTING THE DATA AND PERFORMING THE CALCULATIONS.
The method of data collection is determined early in the research process. Once a research question is determined, one must make decisions regarding what type of data is needed and how the data will be collected. This decision establishes the basis for how the data will be analyzed. One should use only approved research methods for collecting and analyzing data.
DECIDING WHETHER TO REJECT THE NULL HYPOTHESIS.
This step involves the application of the decision rule. The decision rule allows one to reject or fail to reject the null hypothesis. If one rejects the null hypothesis, the alternative hypothesis can be accepted. However, as discussed earlier, if one fails to reject the null he or she can only suggest that the null may be true.
XYZ Corporation is a company that is focused on a stable workforce that has very little turnover. XYZ has been in business for 50 years and has more than 10,000 employees. The company has always promoted the idea that its employees stay with them for a very long time, and it has used the following line in its recruitment brochures: "The average tenure of our employees is 20 years." Since XYZ isn't quite sure if that statement is still true, a random sample of 100 employees is taken, and the average tenure turns out to be 19 years with a standard deviation of 2 years. Can XYZ continue to make its claim, or does it need to make a change?
- State the hypotheses.
H0: μ = 20 years
H1: μ ≠ 20 years
- Determine the test statistic. Since we are testing a population mean that is normally distributed, the appropriate test statistic is Z = (X̅ − μ) / (σ / √n), with the sample standard deviation used in place of σ.
- Specify the significance level. Since the firm would like to keep its present message to new recruits, it selects a fairly weak significance level (α = .05). Since this is a two-tailed test, half of the alpha is assigned to each tail of the distribution. In this situation the critical values of Z are +1.96 and −1.96.
- State the decision rule. If the computed value of Z is greater than or equal to +1.96 or less than or equal to −1.96, the null hypothesis is rejected.
- Reject or fail to reject the null. The computed value is Z = (19 − 20) / (2 / √100) = −5. Since −5 is less than −1.96, the null is rejected. The mean tenure is not 20 years; therefore, XYZ needs to change its statement.
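The XYZ tenure calculation can be checked with a few lines of Python. As in the text, the sample standard deviation of 2 years stands in for σ, a common approximation when the sample is this large.

```python
import math
from statistics import NormalDist

mu0 = 20.0          # claimed average tenure (H0)
sample_mean = 19.0  # average tenure in the sample
s = 2.0             # sample standard deviation, used in place of sigma
n = 100             # employees sampled
alpha = 0.05

z = (sample_mean - mu0) / (s / math.sqrt(n))   # (19 - 20) / (2 / 10)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff, about 1.96
reject_h0 = abs(z) >= z_crit

print(f"Z = {z:.2f}, critical values = ±{z_crit:.2f}, reject H0: {reject_h0}")
```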
Donna T. Mayo
Revised by Marcia Simmering