If the frequency distribution for a dataset is broadly unimodal and right-skewed (positively skewed, with a long upper tail), the natural log transform (logarithms base e) will adjust the pattern to make it more symmetric and closer to a Normal distribution. For variates whose values may range from 0 upwards, a value of 1 is often added before taking logs, since ln(0) is undefined. The data may be back-transformed with the exp() function. The basic transform is illustrated in the graph at the end of this section, and is:

z=ln(x) or z=ln(x+1)

Note that there is a simple relationship between natural (base e) and base10 logs:

ln(x)=loge(x)=log10(x)/log10(e)=log10(x)*ln(10)≈2.3026*log10(x)

The back transform is:

x=exp(z) or x=exp(z)-1
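The transform, its inverse, and the base conversion can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `log_transform` and `back_transform`, the `offset` parameter, and the sample values are assumptions made for this example, not part of any particular package.

```python
import math

def log_transform(values, offset=0.0):
    """Return ln(v + offset) for each value; use offset=1 when zeros may occur."""
    shifted = [v + offset for v in values]
    if any(s <= 0 for s in shifted):
        raise ValueError("every value plus offset must be strictly positive")
    return [math.log(s) for s in shifted]

def back_transform(z_values, offset=0.0):
    """Invert log_transform: exp(z) - offset."""
    return [math.exp(z) - offset for z in z_values]

# Hypothetical data that includes a zero, so 1 is added before taking logs
x = [0.0, 1.0, 9.0, 99.0]
z = log_transform(x, offset=1.0)
x_back = back_transform(z, offset=1.0)

# Converting between bases: ln(x) = log10(x) / log10(e)
ln_via_log10 = math.log10(50.0) / math.log10(math.e)
```

The same offset must be used in both directions, otherwise the back-transformed values will be shifted by a constant.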

Some software packages provide the option of using the exponential function itself as a transform. The form is typically:

To illustrate the use of log transforms we note that very often datasets show marked skewness to the right (a long upper tail) and have all values >0. This is common with both physical data (e.g. recorded levels of trace elements in soil samples) and census data (e.g. family incomes). In the example illustrated below we have taken the recorded levels of zinc in parts per million (ppm) in 98 soil samples from the Maas region of the Netherlands (see Burrough and McDonnell, 1998, Appendix 3 [BUR1] for this dataset). The left-hand histogram shows the source data and the right-hand chart the data after transformation.

Zinc levels in 98 soil samples, ppm

With a relatively small sample size, the transformed data is unlikely to provide a very close match to a Normal distribution, but as can be seen from the table below the skewness of the dataset has been greatly reduced, as has the kurtosis. Note that the back transform of a statistic computed on the transformed data is not the same as that statistic computed on the source data. For example, the back transform of the mean of the logged data is exp(5.8707)≈355, whereas the mean of the source data is 481.03, i.e. exp(mean(log(x)))≠mean(x).

| Measure | X | Ln(X) |
|---|---|---|
| Mean | 481.03 | 5.8707 |
| Standard deviation | 398.81 | 0.77831 |
| Skewness | 1.4486 | 0.30696 |
| Kurtosis | 4.6766 | 1.8789 |

Basic Loge data transforms
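The effect of the transform on skewness and kurtosis, and the gap between the back-transformed mean and the source mean, can be checked with a short Python sketch. The values below are hypothetical (they are not the zinc dataset), and the moment-based definitions used here (third and fourth central moments divided by powers of the population variance, with kurtosis not reduced by 3) are an assumption; the table above may have been computed with a different convention.

```python
import math

def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Moment coefficient of skewness: m3 / m2^1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment coefficient of kurtosis: m4 / m2^2 (not excess kurtosis)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

# Hypothetical right-skewed sample: each value is double the previous one
x = [10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 640.0]
z = [math.log(v) for v in x]

skew_x, skew_z = skewness(x), skewness(z)
kurt_x, kurt_z = kurtosis(x), kurtosis(z)

mean_x = sum(x) / len(x)                  # arithmetic mean of the source data
back_mean = math.exp(sum(z) / len(z))     # back-transformed mean of the logs
```

Here the logged values are evenly spaced, so their skewness is zero to within rounding error, while the source values are strongly right-skewed. The back-transformed mean of the logs is the geometric mean of the source data, which for positive data that are not all equal is always smaller than the arithmetic mean.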

Variants of the basic log transforms, known as Johnson transforms (after Johnson, 1949, 1970 [JOH1], [JOH2]), are provided by some packages such as Minitab. In each case the transform is an adjustment to the standard form to incorporate additional parameters, which are selected according to which set provides the best fit to a Normal distribution (see Chou et al., 1998 [CHO1] for details of the fitting procedure used by Minitab). These variants are:

In each case η>0 and δ>0. The case zL essentially corresponds to the family of lognormal distributions, with zB and zU providing bounded and unbounded variations. Letting Y=(x-ε)/λ it can be seen that the three variants simplify to distributions whose shape depends on the values assigned to γ and δ only.
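As a rough illustration, the three Johnson families can be sketched in Python using the form commonly given in the literature, z=γ+δ·f(Y) with Y=(x-ε)/λ, where f is ln(Y) for the lognormal family, ln(Y/(1-Y)) for the bounded family, and the inverse hyperbolic sine of Y for the unbounded family. The function names and the use of `delta` as the multiplier are assumptions made for this example; the parameterisation and symbols used by a given package (Minitab included) may differ, and this sketch does not attempt the parameter fitting step.

```python
import math

def johnson_sl(x, gamma, delta, eps=0.0):
    """Lognormal family (zL): gamma + delta * ln(x - eps), requires x > eps."""
    return gamma + delta * math.log(x - eps)

def johnson_sb(x, gamma, delta, eps, lam):
    """Bounded family (zB): gamma + delta * ln(Y / (1 - Y)), requires 0 < Y < 1."""
    y = (x - eps) / lam
    return gamma + delta * math.log(y / (1.0 - y))

def johnson_su(x, gamma, delta, eps, lam):
    """Unbounded family (zU): gamma + delta * asinh(Y), defined for any real Y."""
    y = (x - eps) / lam
    return gamma + delta * math.asinh(y)

# Hypothetical parameter values, chosen only to exercise the functions
z_l = johnson_sl(math.e, gamma=2.0, delta=3.0)                  # ln(e) = 1
z_b = johnson_sb(0.5, gamma=0.0, delta=1.0, eps=0.0, lam=1.0)   # midpoint of (0, 1)
z_u = johnson_su(0.0, gamma=0.0, delta=1.0, eps=0.0, lam=1.0)   # asinh(0) = 0
```

With γ=0 and δ=1 the lognormal case reduces to the basic transform ln(x-ε), which is why it can be regarded as a generalisation of the simple log transform.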

References

[BUR1] Burrough P A, McDonnell R A (1998) Principles of Geographical Information Systems. Appendix 3. Oxford University Press, Oxford

[CHO1] Chou Y, Polansky A M, Mason R L (1998) Transforming Nonnormal Data to Normality in Statistical Process Control. Journal of Quality Technology, 30, 133-141

[FAR1] Farnum N R (1996) Using Johnson Curves to Describe Non-Normal Process Data. Quality Engineering, 9(2), 329-336

[JOH1] Johnson N L (1949) Systems of frequency curves generated by methods of translation. Biometrika, 36, 149-176

[JOH2] Johnson N L, Kotz S (1970) Continuous Univariate Distributions - 1. Ch. 12 section 4.3. Houghton-Mifflin, Boston