Sampling considerations are the characteristics that should be taken into account when selecting a sample so as to increase the chances of accurately estimating the population parameters. In general, larger samples are preferred to smaller ones. This is because, at a given confidence level, the confidence intervals that result from a large *n* are narrower than those that result from a small *n*. A narrower interval does not carry more confidence; rather, it locates the population parameter more precisely at the same level of confidence.

There is yet another reason why a large sample may be desirable in any statistical analysis: the standard error is inversely proportional to the square root of the sample size, i.e., the standard error falls as *n* increases.
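As a minimal sketch of this relationship, the Python snippet below computes the standard error of the sample mean, σ/√n, and the half-width of a 95% confidence interval for a few sample sizes. The population standard deviation of 20 and the function names are illustrative assumptions, and the z-value of 1.96 assumes a normally distributed sample mean.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


def ci_half_width(sigma: float, n: int, z: float = 1.96) -> float:
    """Half-width of a z-based confidence interval (z = 1.96 for ~95%)."""
    return z * standard_error(sigma, n)


if __name__ == "__main__":
    sigma = 20.0  # hypothetical population standard deviation
    for n in (25, 100, 400):
        print(f"n={n:>3}  SE={standard_error(sigma, n):.2f}  "
              f"95% CI = mean +/- {ci_half_width(sigma, n):.2f}")
```

Each fourfold increase in *n* halves the standard error, so precision improves with sample size but at a diminishing rate.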

However, larger samples have the following shortcomings:

- Some population parameters tend to change over time. A good example is the performance of the stock market, which is driven by ever-changing factors. Mixing “old” data with more recent data may therefore produce an estimate that is not only unreliable but also outdated.
- Taking a larger sample may increase the overall sampling costs.

Data snooping is the practice of analyzing historical data so as to unearth trends and other inherent relationships between variables. Analysts may then use such trends to predict future behavior.

Data snooping bias occurs when analysts search the same data excessively until a pattern appears, giving rise to trends that are statistically insignificant and sometimes non-existent.
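The effect can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than a prescribed test: it generates purely random "returns" and purely random candidate "signals" (all names and parameters are hypothetical) and reports the strongest correlation found. Because every series is noise, any apparent relationship is spurious by construction.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_spurious_correlation(n_signals, n_obs, seed=0):
    """Largest |correlation| between random returns and n_signals random signals."""
    rng = random.Random(seed)
    returns = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_signals):
        signal = [rng.gauss(0, 1) for _ in range(n_obs)]  # pure noise
        best = max(best, abs(correlation(signal, returns)))
    return best


if __name__ == "__main__":
    for n_signals in (1, 20, 200):
        best = best_spurious_correlation(n_signals, n_obs=30)
        print(f"signals tried: {n_signals:>3}  best |correlation|: {best:.2f}")
```

The more signals are tried against the same data, the stronger the best "relationship" tends to look, even though none exists.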

Sample selection bias arises when a section of the population is excluded from the sample because data are unavailable. This undermines randomness, since the sample is effectively drawn from a subset of the population. The resulting estimate is not representative of the population as a whole.

Survivorship bias entails the exclusion of information relating to financial vehicles that ceased to exist during the sampling period. Consequently, conclusions affected by survivorship bias may underestimate or overestimate the population parameters. For example, most mutual fund databases that track performance may exclude funds that have since closed. As such, analyzing only the “surviving” funds may overestimate the average mutual fund earnings.
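A minimal simulation makes the mutual fund example concrete. In the sketch below, the return distribution, the closure threshold of -10%, and the function names are all hypothetical assumptions chosen for illustration; no real fund data is used. Funds whose return falls below the threshold are treated as closed and dropped from the "database".

```python
import random
import statistics


def simulate_fund_returns(n_funds, seed=0):
    """Hypothetical annual returns: normal with mean 5% and std dev 15%."""
    rng = random.Random(seed)
    return [rng.gauss(0.05, 0.15) for _ in range(n_funds)]


def survivor_returns(returns, closure_threshold=-0.10):
    """Keep only funds that did not fall below the assumed closure threshold."""
    return [r for r in returns if r >= closure_threshold]


if __name__ == "__main__":
    all_funds = simulate_fund_returns(1000)
    survivors = survivor_returns(all_funds)
    print(f"all funds:      n={len(all_funds)}  "
          f"mean={statistics.fmean(all_funds):.2%}")
    print(f"survivors only: n={len(survivors)}  "
          f"mean={statistics.fmean(survivors):.2%}")
```

Because the dropped funds are exactly the worst performers, the survivors' average can never be lower than the average across all funds.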

Backfill bias arises when successful funds submit their past performance to a database, which then compiles all previous data. Meanwhile, unsuccessful funds don’t report to the database. Their poor performance is, therefore, excluded from the database.

*Note to candidates:* Make sure to be able to differentiate between survivorship bias and backfill bias for the exam!

Look-ahead bias manifests itself when an analyst assumes that information is readily available on a certain date when, in fact, it is not. For example, analysts may assume that end-of-year financial information, such as the annual profit generated, is available in January. Such an assumption is misleading when most companies take up to 60 additional days to release their results.
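One way to guard against this in a backtest is to check each data point against the date on which it could actually have been known. The sketch below is illustrative only: the function names are hypothetical, and it assumes a fixed 60-day reporting lag counted from the fiscal year-end, which is one reading of the example above.

```python
from datetime import date, timedelta


def release_date(period_end: date, reporting_lag_days: int = 60) -> date:
    """Assumed date on which results for the period become public."""
    return period_end + timedelta(days=reporting_lag_days)


def is_available(period_end: date, as_of: date,
                 reporting_lag_days: int = 60) -> bool:
    """True if the period's results could have been known on `as_of`."""
    return as_of >= release_date(period_end, reporting_lag_days)


if __name__ == "__main__":
    fiscal_year_end = date(2023, 12, 31)  # hypothetical fiscal year-end
    for as_of in (date(2024, 1, 15), date(2024, 3, 1)):
        usable = is_available(fiscal_year_end, as_of)
        print(f"{as_of}: annual results usable? {usable}")
```

A strategy that trades on the annual profit in mid-January would fail this check, which flags the look-ahead bias before it contaminates the results.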

Time period bias involves an inappropriate generalization of time-specific results – results that only apply to certain seasons or periods. Most entities experience seasonal variation in performance, so some months may be more productive than others. For example, ice cream producers across Europe may record higher sales during summer and lower sales during winter. Therefore, a sample of such entities drawn during winter will estimate winter-specific parameters rather than parameters that hold for the whole year.