The term bias, in a statistical context, has a variety of meanings. These include: selection bias, recall bias, estimation bias, systematic bias and observer bias. Each of these terms relates to a specific technical aspect of the overall concept of bias, but excludes the broader questions of bias in the conceptualization of a problem or in the general manner in which results are collected, processed, analyzed and interpreted.
The most common usage of the term relates to selection bias. This refers to the selection of individuals or entities in a manner that is not representative of the population of interest. In general, selection bias can be minimized by careful study design, but there are many pitfalls in this process and it is easy for bias to occur without being at all obvious. One example of selection bias is self-selection: sample surveys are typically completed only by willing participants, so some degree of selection bias is present by definition. Sampling 'easy to collect' data is another possible source of bias. For example, when taking samples of surface water and groundwater in order to measure the level of certain chemicals in the groundwater, most samples are taken from rivers, streams and wells. These sampling points are convenient but may not be representative of either the groundwater as a whole or its variation across the spatial extent of the study area.
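The effect of self-selection can be illustrated with a small simulation. The sketch below assumes a hypothetical survey in which an individual's willingness to respond rises with the very quantity being measured; the scale, seed and response model are purely illustrative:

```python
import random

random.seed(42)

# Hypothetical population: scores on some 0-100 attitude scale
population = [random.gauss(50, 10) for _ in range(100_000)]

# Self-selection: the probability of completing the survey is assumed
# to grow with the respondent's own score, so high scorers answer more often
respondents = [x for x in population if random.random() < x / 100]

pop_mean = sum(population) / len(population)
sample_mean = sum(respondents) / len(respondents)

# The self-selected sample mean overstates the population mean
print(round(pop_mean, 1), round(sample_mean, 1))
```

Under this response mechanism the survey mean sits roughly two points above the population mean (the gap is approximately the population variance divided by the mean), even though every individual answered truthfully.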
When sampling human populations some degree of self-selection is almost inevitable, so it is particularly important to examine just how representative the samples are for the problem at hand. Self-selection can be acceptable if the sample does indeed represent the population of interest — for example, women in the age group 18-30 who are single and regularly use social networking sites might be recruited by an invitation to female members of a range of such sites, and those accepting the invitation may be regarded as an acceptable sample.
A special example of selection bias occurs when a meta-analysis is conducted. In this instance bias can arise in two ways: (i) the studies available for inclusion are typically restricted to those that have been published, and published studies tend to favor those showing a distinct result over those reporting no effect (sometimes referred to as publication bias); and (ii) selecting for inclusion only those studies that confirm a particular viewpoint will increase the potential bias significantly (akin to the broader issues of bias mentioned at the start of this topic). This issue is discussed at length in the context of medical research by Ben Goldacre in his 2012 book "Bad Pharma" ([GOL1]), a follow-up to his well-known book "Bad Science" ([GOL2]).
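Publication bias of type (i) can be demonstrated with a short simulation. A minimal sketch, assuming 2,000 hypothetical studies of an effect whose true size is zero, with only 'significant' positive findings reaching publication; all values are invented:

```python
import random
import statistics

random.seed(1)

TRUE_EFFECT = 0.0   # the effect genuinely does not exist
SE = 0.2            # assumed standard error of each study's estimate

# Each study's estimate scatters around the true (zero) effect
estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(2_000)]

# Publication filter: only significant positive results (z > 1.96) appear in print
published = [e for e in estimates if e / SE > 1.96]

meta_all = statistics.mean(estimates)   # near zero, as it should be
meta_pub = statistics.mean(published)   # well above zero: publication bias
```

A meta-analysis restricted to the published subset would report a clearly positive effect where none exists; funnel plots and related diagnostics are the usual tools for detecting this kind of distortion.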
Recalled results are not as reliable as measured or monitored results. For example, when completing questionnaires respondents often over-estimate or over-elaborate in some areas and under-estimate or omit important topics in others. In medical research, cohort studies are generally less susceptible to bias than case-control studies, since the latter may be subject to selection bias or recall bias (respondents being asked to recall information after the event).
Bias is also used to refer to the difference between the expected value of a sample estimate and the true or population value of the parameter being estimated, i.e. estimation bias. For example, the variance of a truly representative sample of size n from a much larger population will, on average, underestimate the population variance, since the sample will generally not span the full range of values found in the population. To obtain an unbiased estimate the sample variance is adjusted by a factor n/(n-1), known as Bessel's correction, which tends to 1 as the sample size increases.
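The divisor adjustment can be checked empirically. The sketch below repeatedly draws small samples from a standard normal distribution (true variance 1) and averages the two estimators; the sample size and trial count are arbitrary choices:

```python
import random

random.seed(0)

n, trials = 5, 20_000
# True variance of the standard normal being sampled is 1.0

biased_total = 0.0
unbiased_total = 0.0
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
    biased_total += ss / n          # divide by n: underestimates on average
    unbiased_total += ss / (n - 1)  # divide by n-1: Bessel's correction

avg_biased = biased_total / trials      # close to (n-1)/n = 0.8
avg_unbiased = unbiased_total / trials  # close to the true variance, 1.0
```

Multiplying the divide-by-n estimate by n/(n-1) is equivalent to dividing the sum of squared deviations by n-1 directly, which is how most statistical software computes the sample variance.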
In more common usage, bias is applied to the deliberate or unintentional systematic distortion of information. This usage has practical importance within the field of statistical analysis, for example in studies that involve measurements using specialized equipment. It is not unusual for equipment to be incorrectly calibrated (e.g. the zero point is not precisely located at zero) or to drift over time or in response to environmental factors (e.g. temperature fluctuations). Careful study design and regular device calibration are approaches to addressing such problems, although some effects may be impossible to control (e.g. solar activity affecting data recorded by satellite remote sensing devices).
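A zero-point offset combined with drift can often be removed after the fact if reference readings are available. A minimal sketch with invented numbers, assuming a constant true value and reference checks at the first and last time steps:

```python
# Hypothetical instrument error model: fixed zero offset plus linear drift
TRUE_VALUE = 20.0     # constant quantity being measured (e.g. 20 degrees C)
OFFSET = 0.5          # zero point mis-set by +0.5
DRIFT_PER_STEP = 0.1  # reading creeps upward at each time step

steps = 10
readings = [TRUE_VALUE + OFFSET + DRIFT_PER_STEP * t for t in range(steps)]

# Calibration: a known reference at t=0 reveals the offset, and a second
# reference reading at the final step gives the drift rate
est_offset = readings[0] - TRUE_VALUE
est_drift = (readings[-1] - readings[0]) / (steps - 1)

corrected = [r - est_offset - est_drift * t for t, r in enumerate(readings)]
# corrected now matches the true value at every step
```

In practice drift is rarely exactly linear, so repeated calibration against known references remains necessary rather than a one-off correction.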
Similar issues arise with so-called observer bias. In this case those involved in designing, administering and/or analyzing an experiment or other data collection exercise influence the results, typically unintentionally. For example, the presence of an observer might alter the results in an animal or human behavior study (see, for example, the so-called Hawthorne effect), or when recording very small changes in temperature in an enclosed space. Such effects can generally be minimized by careful experimental design, by the use of multiple independent observers, and, in some experiments, by ensuring that the observers are unaware of which treatments have been applied to which cases.