Sampling considerations are the characteristics an analyst should take into account when selecting a sample so as to improve the chances of accurately estimating population parameters. In general, larger samples are preferred to smaller ones because, at a given confidence level, a larger n produces a narrower confidence interval. A narrower interval at the same confidence level means that the population parameter is estimated more precisely than it would be with a smaller sample.
Further, a large sample is desirable in statistical analysis because the standard error of the sample mean is inversely proportional to the square root of the sample size, i.e., the standard error falls as n increases, although at a diminishing rate.
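The link between sample size and precision can be sketched numerically. The snippet below assumes the usual formula for the standard error of the sample mean (population standard deviation divided by the square root of n) and a 95% interval based on z = 1.96. The function names and the value sigma = 20 are illustrative assumptions, not part of the reading.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean, assuming a known population sigma."""
    return sigma / math.sqrt(n)

def ci_width(sigma: float, n: int, z: float = 1.96) -> float:
    """Full width of a z-based confidence interval around the sample mean."""
    return 2 * z * standard_error(sigma, n)

# Hypothetical population standard deviation of 20.
for n in (25, 100, 400):
    print(n, round(standard_error(20.0, n), 2), round(ci_width(20.0, n), 2))
```

In this sketch, each fourfold increase in n only halves the standard error and the interval width, which is why the gain in precision from adding observations diminishes.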
However, larger samples have the following shortcomings:
Data mining is the practice of analyzing historical data so as to unearth trends and other inherent relationships between variables. Analysts may then use such trends to predict future behavior.
Data mining bias occurs when analysts search the same data repeatedly until a pattern emerges, giving rise to trends that appear statistically significant but are spurious and, sometimes, non-existent.
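A small simulation can illustrate how this happens. The sketch below tests many purely random "predictors" against a purely random target and counts how many clear a significance cutoff by chance alone. Everything in it is an assumption for illustration: the function names, the 200 candidates, the 60 observations, and the cutoff of 0.254, which is roughly the two-tailed 5% critical correlation for 60 observations.

```python
import math
import random
import statistics

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def count_spurious(n_predictors=200, n_obs=60, threshold=0.254, seed=0):
    """Count random predictors whose correlation with a random target
    exceeds the threshold, even though no true relationship exists."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_predictors):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(correlation(candidate, target)) > threshold:
            hits += 1
    return hits

print(count_spurious())
```

With a 5% cutoff, roughly 5% of unrelated candidates would be expected to pass, so the more variables an analyst tries, the more "significant" relationships turn up by chance.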
Sample selection bias refers to the exclusion of part of the population from the sample because data for that part are unavailable. This undermines randomness, since excluding a certain class of data amounts to sampling from a subset of the population. The resulting estimate is, as such, not representative of the population as a whole.
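The effect can be sketched with a simulated population in which data availability depends on the value being measured. The reporting rule below (values above 90 are always reported, lower values only 30% of the time) and all names and figures are hypothetical assumptions chosen purely for illustration.

```python
import random
import statistics

def make_population(n=2000, seed=2):
    """Return (value, reported) pairs; low values are often unreported."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        value = rng.gauss(100, 20)
        # Hypothetical availability rule: low values are reported less often.
        reported = value > 90 or rng.random() < 0.3
        population.append((value, reported))
    return population

population = make_population()
true_mean = statistics.fmean(v for v, _ in population)
reported_mean = statistics.fmean(v for v, reported in population if reported)

print(round(true_mean, 2), round(reported_mean, 2))
```

Under these assumptions, a sample drawn only from the reported records overstates the population mean, however randomly it is drawn from those records.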
Survivorship bias arises when a sample excludes financial vehicles that no longer exist. Consequently, conclusions may underestimate or overestimate the population parameters. For example, most mutual fund databases that track performance exclude funds that underperformed and were subsequently closed. Analyzing only the “surviving” funds may overestimate average mutual fund returns.
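The mutual fund example can be sketched as a simulation. The return distribution (mean 5%, standard deviation 10%), the closure threshold of -5%, and the function names are all hypothetical assumptions; real fund closures depend on far more than a single period's return.

```python
import random
import statistics

def simulate_funds(n_funds=1000, mean=0.05, stdev=0.10, seed=1):
    """Draw one hypothetical annual return per fund."""
    rng = random.Random(seed)
    return [rng.gauss(mean, stdev) for _ in range(n_funds)]

def survivors(returns, closure_threshold=-0.05):
    """Keep only funds whose return stayed above the closure threshold."""
    return [r for r in returns if r > closure_threshold]

all_funds = simulate_funds()
alive = survivors(all_funds)

print(len(all_funds), round(statistics.fmean(all_funds), 4))
print(len(alive), round(statistics.fmean(alive), 4))
```

Because the closed funds are exactly the worst performers, the average over survivors is higher than the average over all funds that ever existed.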
Look-ahead bias arises when an analyst assumes that information was available on a certain date when, in fact, it was not. For example, analysts may assume that end-of-year financial information, such as the annual profit generated, is available in January, yet most companies take up to 60 additional days to release their results.
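One way to guard against this in a backtest is to check each data point against its release date rather than its period-end date. The sketch below conservatively applies the full 60-day lag mentioned above to every company; the function name, the constant, and the dates are illustrative assumptions.

```python
from datetime import date, timedelta

# Assumption: results become public 60 days after the fiscal year-end.
REPORTING_LAG = timedelta(days=60)

def available_on(fiscal_year_end: date, as_of: date,
                 lag: timedelta = REPORTING_LAG) -> bool:
    """True if results for the fiscal year could be known on the as_of date."""
    return as_of >= fiscal_year_end + lag

fy_end = date(2023, 12, 31)
print(available_on(fy_end, date(2024, 1, 15)))  # False: not yet released
print(available_on(fy_end, date(2024, 3, 15)))  # True: past the 60-day lag
```

A test that uses the 2023 annual profit to make a decision dated 15 January 2024 would fail this check, flagging the look-ahead.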
Time period bias involves the inappropriate generalization of time-specific results, i.e., results that apply only to certain seasons or periods. Most entities experience seasonal variation in performance, so some months are more productive than others. For example, ice cream producers across Europe may record higher sales during the summer and lower sales during the winter. A sample of such entities drawn during winter will therefore estimate winter-specific parameters, which should not be generalized to the whole year.
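The ice cream example can be made concrete with a few lines of code. The monthly sales figures below are invented solely for illustration and do not come from any real company.

```python
import statistics

# Hypothetical monthly ice cream sales (units), January through December.
monthly_sales = [40, 45, 60, 80, 110, 150, 170, 165, 120, 80, 50, 42]

# Winter sample: December, January, February (0-based month indices).
winter_sales = [monthly_sales[i] for i in (11, 0, 1)]

full_year_mean = statistics.fmean(monthly_sales)
winter_mean = statistics.fmean(winter_sales)

print(round(full_year_mean, 2), round(winter_mean, 2))
```

In this made-up series the winter-only average is less than half the full-year average, so an estimate built on the winter sample would badly understate typical monthly sales.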
Reading 10 LOS 10k:
Describe the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.