Test Datasets and data archives


The following is a brief list of some of the best-known sources of publicly available datasets, many of them released in connection with particular statistics books and articles. The list is followed by a small sample of frequently cited test datasets.

Population datasets: the Princeton University Office of Population Research data archive is available at: http://opr.princeton.edu/archive/

Installed R packages include a large number of datasets that are used to illustrate the various functions, both in the base distribution and in add-on packages such as MASS, spatstat and tree. The standard package datasets alone contains over 100 datasets, including swiss, iris (also supplied as the three-way array iris3) and morley, which are used in this document. In some instances, as below, the data tables have been reproduced in this Handbook for ease of access.

Climate datasets: IPCC: http://www.ipcc-data.org/

Time series datasets (Box-Jenkins): http://www.stat.wisc.edu/~reinsel/bjr-data/index.html

StatLib: http://lib.stat.cmu.edu/datasets/ provides a large number of datasets, mostly uploaded in the 1990s, including the data used in several well-known books such as Chatfield's (2003) The Analysis of Time Series and Diggle's (1990) Time Series, together with links to other data archives.

UK Data Archive: social science and humanities datasets, including medical data: http://www.data-archive.ac.uk/

Datasets used in M J Crawley (2007) “The R Book” can be obtained from: http://www.bio.ic.ac.uk/research/mjcraw/therbook/

NIST StRD (Statistical Reference Datasets): http://www.itl.nist.gov/div898/strd/general/dataarchive.html

Swiss fertility data

This dataset comprises a standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland in about 1888. There are 47 observations on 6 variables, each of which is in percent, i.e., in [0, 100]. All variables except ‘Fertility’ give proportions of the population.

Fertility: Ig, ‘common standardized fertility measure’.
Agriculture: % of males involved in agriculture as occupation.
Examination: % of draftees receiving the highest mark on the army examination.
Education: % of draftees with education beyond primary school.
Catholic: % ‘catholic’ (as opposed to ‘protestant’).
Infant.Mortality: % of live births who live less than 1 year.

Province        Fertility  Agriculture  Examination  Education  Catholic  Infant.Mortality
Courtelary           80.2         17.0           15         12      9.96              22.2
Delemont             83.1         45.1            6          9     84.84              22.2
Franches-Mnt         92.5         39.7            5          5     93.40              20.2
Moutier              85.8         36.5           12          7     33.77              20.3
Neuveville           76.9         43.5           17         15      5.16              20.6
Porrentruy           76.1         35.3            9          7     90.57              26.6
Broye                83.8         70.2           16          7     92.85              23.6
Glane                92.4         67.8           14          8     97.16              24.9
Gruyere              82.4         53.3           12          7     97.67              21.0
Sarine               82.9         45.2           16         13     91.38              24.4
Veveyse              87.1         64.5           14          6     98.61              24.5
Aigle                64.1         62.0           21         12      8.52              16.5
Aubonne              66.9         67.5           14          7      2.27              19.1
Avenches             68.9         60.7           19         12      4.43              22.7
Cossonay             61.7         69.3           22          5      2.82              18.7
Echallens            68.3         72.6           18          2     24.20              21.2
Grandson             71.7         34.0           17          8      3.30              20.0
Lausanne             55.7         19.4           26         28     12.11              20.2
La Vallee            54.3         15.2           31         20      2.15              10.8
Lavaux               65.1         73.0           19          9      2.84              20.0
Morges               65.5         59.8           22         10      5.23              18.0
Moudon               65.0         55.1           14          3      4.52              22.4
Nyone                56.6         50.9           22         12     15.14              16.7
Orbe                 57.4         54.1           20          6      4.20              15.3
Oron                 72.5         71.2           12          1      2.40              21.0
Payerne              74.2         58.1           14          8      5.23              23.8
Paysd'enhaut         72.0         63.5            6          3      2.56              18.0
Rolle                60.5         60.8           16         10      7.72              16.3
Vevey                58.3         26.8           25         19     18.46              20.9
Yverdon              65.4         49.5           15          8      6.10              22.5
Conthey              75.5         85.9            3          2     99.71              15.1
Entremont            69.3         84.9            7          6     99.68              19.8
Herens               77.3         89.7            5          2    100.00              18.3
Martigwy             70.5         78.2           12          6     98.96              19.4
Monthey              79.4         64.9            7          3     98.22              20.2
St Maurice           65.0         75.9            9          9     99.06              17.8
Sierre               92.2         84.6            3          3     99.46              16.3
Sion                 79.3         63.1           13         13     96.83              18.1
Boudry               70.4         38.4           26         12      5.62              20.3
La Chauxdfnd         65.7          7.7           29         11     13.79              20.5
Le Locle             72.7         16.7           22         13     11.22              18.9
Neuchatel            64.4         17.6           35         32     16.92              23.0
Val de Ruz           77.6         37.6           15          7      4.97              20.0
ValdeTravers         67.6         18.7           25          7      8.65              19.5
V. De Geneve         35.0          1.2           37         53     42.34              18.0
Rive Droite          44.7         46.6           16         29     50.43              18.2
Rive Gauche          42.8         27.7           22         29     58.33              19.3

Iris data

This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica. There are 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.

Sample  Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
     1           5.1          3.5           1.4          0.2  setosa
     2           4.9          3.0           1.4          0.2  setosa
     3           4.7          3.2           1.3          0.2  setosa
     4           4.6          3.1           1.5          0.2  setosa
     5           5.0          3.6           1.4          0.2  setosa
     6           5.4          3.9           1.7          0.4  setosa
     7           4.6          3.4           1.4          0.3  setosa
     8           5.0          3.4           1.5          0.2  setosa
     9           4.4          2.9           1.4          0.2  setosa
    10           4.9          3.1           1.5          0.1  setosa
    11           5.4          3.7           1.5          0.2  setosa
    12           4.8          3.4           1.6          0.2  setosa
    13           4.8          3.0           1.4          0.1  setosa
    14           4.3          3.0           1.1          0.1  setosa
    15           5.8          4.0           1.2          0.2  setosa
    16           5.7          4.4           1.5          0.4  setosa
    17           5.4          3.9           1.3          0.4  setosa
    18           5.1          3.5           1.4          0.3  setosa
    19           5.7          3.8           1.7          0.3  setosa
    20           5.1          3.8           1.5          0.3  setosa
    21           5.4          3.4           1.7          0.2  setosa
    22           5.1          3.7           1.5          0.4  setosa
    23           4.6          3.6           1.0          0.2  setosa
    24           5.1          3.3           1.7          0.5  setosa
    25           4.8          3.4           1.9          0.2  setosa
    26           5.0          3.0           1.6          0.2  setosa
    27           5.0          3.4           1.6          0.4  setosa
    28           5.2          3.5           1.5          0.2  setosa
    29           5.2          3.4           1.4          0.2  setosa
    30           4.7          3.2           1.6          0.2  setosa
    31           4.8          3.1           1.6          0.2  setosa
    32           5.4          3.4           1.5          0.4  setosa
    33           5.2          4.1           1.5          0.1  setosa
    34           5.5          4.2           1.4          0.2  setosa
    35           4.9          3.1           1.5          0.2  setosa
    36           5.0          3.2           1.2          0.2  setosa
    37           5.5          3.5           1.3          0.2  setosa
    38           4.9          3.6           1.4          0.1  setosa
    39           4.4          3.0           1.3          0.2  setosa
    40           5.1          3.4           1.5          0.2  setosa
    41           5.0          3.5           1.3          0.3  setosa
    42           4.5          2.3           1.3          0.3  setosa
    43           4.4          3.2           1.3          0.2  setosa
    44           5.0          3.5           1.6          0.6  setosa
    45           5.1          3.8           1.9          0.4  setosa
    46           4.8          3.0           1.4          0.3  setosa
    47           5.1          3.8           1.6          0.2  setosa
    48           4.6          3.2           1.4          0.2  setosa
    49           5.3          3.7           1.5          0.2  setosa
    50           5.0          3.3           1.4          0.2  setosa
    51           7.0          3.2           4.7          1.4  versicolor
    52           6.4          3.2           4.5          1.5  versicolor
    53           6.9          3.1           4.9          1.5  versicolor
    54           5.5          2.3           4.0          1.3  versicolor
    55           6.5          2.8           4.6          1.5  versicolor
    56           5.7          2.8           4.5          1.3  versicolor
    57           6.3          3.3           4.7          1.6  versicolor
    58           4.9          2.4           3.3          1.0  versicolor
    59           6.6          2.9           4.6          1.3  versicolor
    60           5.2          2.7           3.9          1.4  versicolor
    61           5.0          2.0           3.5          1.0  versicolor
    62           5.9          3.0           4.2          1.5  versicolor
    63           6.0          2.2           4.0          1.0  versicolor
    64           6.1          2.9           4.7          1.4  versicolor
    65           5.6          2.9           3.6          1.3  versicolor
    66           6.7          3.1           4.4          1.4  versicolor
    67           5.6          3.0           4.5          1.5  versicolor
    68           5.8          2.7           4.1          1.0  versicolor
    69           6.2          2.2           4.5          1.5  versicolor
    70           5.6          2.5           3.9          1.1  versicolor
    71           5.9          3.2           4.8          1.8  versicolor
    72           6.1          2.8           4.0          1.3  versicolor
    73           6.3          2.5           4.9          1.5  versicolor
    74           6.1          2.8           4.7          1.2  versicolor
    75           6.4          2.9           4.3          1.3  versicolor
    76           6.6          3.0           4.4          1.4  versicolor
    77           6.8          2.8           4.8          1.4  versicolor
    78           6.7          3.0           5.0          1.7  versicolor
    79           6.0          2.9           4.5          1.5  versicolor
    80           5.7          2.6           3.5          1.0  versicolor
    81           5.5          2.4           3.8          1.1  versicolor
    82           5.5          2.4           3.7          1.0  versicolor
    83           5.8          2.7           3.9          1.2  versicolor
    84           6.0          2.7           5.1          1.6  versicolor
    85           5.4          3.0           4.5          1.5  versicolor
    86           6.0          3.4           4.5          1.6  versicolor
    87           6.7          3.1           4.7          1.5  versicolor
    88           6.3          2.3           4.4          1.3  versicolor
    89           5.6          3.0           4.1          1.3  versicolor
    90           5.5          2.5           4.0          1.3  versicolor
    91           5.5          2.6           4.4          1.2  versicolor
    92           6.1          3.0           4.6          1.4  versicolor
    93           5.8          2.6           4.0          1.2  versicolor
    94           5.0          2.3           3.3          1.0  versicolor
    95           5.6          2.7           4.2          1.3  versicolor
    96           5.7          3.0           4.2          1.2  versicolor
    97           5.7          2.9           4.2          1.3  versicolor
    98           6.2          2.9           4.3          1.3  versicolor
    99           5.1          2.5           3.0          1.1  versicolor
   100           5.7          2.8           4.1          1.3  versicolor
   101           6.3          3.3           6.0          2.5  virginica
   102           5.8          2.7           5.1          1.9  virginica
   103           7.1          3.0           5.9          2.1  virginica
   104           6.3          2.9           5.6          1.8  virginica
   105           6.5          3.0           5.8          2.2  virginica
   106           7.6          3.0           6.6          2.1  virginica
   107           4.9          2.5           4.5          1.7  virginica
   108           7.3          2.9           6.3          1.8  virginica
   109           6.7          2.5           5.8          1.8  virginica
   110           7.2          3.6           6.1          2.5  virginica
   111           6.5          3.2           5.1          2.0  virginica
   112           6.4          2.7           5.3          1.9  virginica
   113           6.8          3.0           5.5          2.1  virginica
   114           5.7          2.5           5.0          2.0  virginica
   115           5.8          2.8           5.1          2.4  virginica
   116           6.4          3.2           5.3          2.3  virginica
   117           6.5          3.0           5.5          1.8  virginica
   118           7.7          3.8           6.7          2.2  virginica
   119           7.7          2.6           6.9          2.3  virginica
   120           6.0          2.2           5.0          1.5  virginica
   121           6.9          3.2           5.7          2.3  virginica
   122           5.6          2.8           4.9          2.0  virginica
   123           7.7          2.8           6.7          2.0  virginica
   124           6.3          2.7           4.9          1.8  virginica
   125           6.7          3.3           5.7          2.1  virginica
   126           7.2          3.2           6.0          1.8  virginica
   127           6.2          2.8           4.8          1.8  virginica
   128           6.1          3.0           4.9          1.8  virginica
   129           6.4          2.8           5.6          2.1  virginica
   130           7.2          3.0           5.8          1.6  virginica
   131           7.4          2.8           6.1          1.9  virginica
   132           7.9          3.8           6.4          2.0  virginica
   133           6.4          2.8           5.6          2.2  virginica
   134           6.3          2.8           5.1          1.5  virginica
   135           6.1          2.6           5.6          1.4  virginica
   136           7.7          3.0           6.1          2.3  virginica
   137           6.3          3.4           5.6          2.4  virginica
   138           6.4          3.1           5.5          1.8  virginica
   139           6.0          3.0           4.8          1.8  virginica
   140           6.9          3.1           5.4          2.1  virginica
   141           6.7          3.1           5.6          2.4  virginica
   142           6.9          3.1           5.1          2.3  virginica
   143           5.8          2.7           5.1          1.9  virginica
   144           6.8          3.2           5.9          2.3  virginica
   145           6.7          3.3           5.7          2.5  virginica
   146           6.7          3.0           5.2          2.3  virginica
   147           6.3          2.5           5.0          1.9  virginica
   148           6.5          3.0           5.2          2.0  virginica
   149           6.2          3.4           5.4          2.3  virginica
   150           5.9          3.0           5.1          1.8  virginica
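A minimal sketch of a typical first look at these data: the snippet copies the Petal.Length column from the table above, grouped by species, and reports each species' sample size, mean and range. It uses only Python's statistics module; grouping the values into a dictionary keyed by species name is an illustrative choice, not part of the dataset itself.

```python
import statistics

# Petal.Length (cm) for the 150 flowers, 50 per species, in sample order.
petal_length = {
    "setosa": [
        1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5,
        1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5,
        1.7, 1.5, 1.0, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6,
        1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5,
        1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4,
    ],
    "versicolor": [
        4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9,
        3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9,
        4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5,
        3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0,
        4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1,
    ],
    "virginica": [
        6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1,
        5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0,
        5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8,
        6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4,
        5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1,
    ],
}

# Summarize each species: count, mean and observed range.
for species, values in petal_length.items():
    print(f"{species:<10} n={len(values)} "
          f"mean={statistics.mean(values):.3f} "
          f"range=[{min(values)}, {max(values)}]")
```

The ranges show that petal length alone separates setosa from the other two species, while versicolor and virginica overlap, which is why that pair is the harder case in classification examples built on these data.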

Michelson-Morley data

The classical data of Michelson and Morley on the speed of light. The data consist of five experiments, each consisting of 20 consecutive ‘runs’. The response is the speed-of-light measurement, suitably coded (km/sec, with 299000 subtracted). The dataset contains the following components:

Expt: the experiment number, from 1 to 5.
Run: the run number within each experiment.
Speed: the speed-of-light measurement.

The data are here viewed as a randomized block experiment with ‘experiment’ and ‘run’ as the factors. ‘Run’ may also be considered a quantitative variate to account for linear (or polynomial) changes in the measurement over the course of a single experiment.

Speed by experiment (rows, Expt 1 to 5) and run (columns, Run 1 to 20):

Run       1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
Expt 1  850  740  900 1070  930  850  950  980  980  880 1000  980  930  650  760  810 1000 1000  960  960
Expt 2  960  940  960  940  880  800  850  880  900  840  830  790  810  880  880  830  800  790  760  800
Expt 3  880  880  880  860  720  720  620  860  970  950  880  910  850  870  840  840  850  840  840  840
Expt 4  890  810  810  820  800  770  760  740  750  760  910  920  890  860  880  720  840  850  850  780
Expt 5  890  840  780  810  760  810  790  810  820  850  870  870  810  740  810  940  950  800  810  870
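The randomized block view described above can be made concrete with an additive two-way analysis of variance, treating Expt and Run as factors with one observation per cell. The sketch below computes it from first principles in plain Python; the function name randomized_block_anova is hypothetical, and the results should correspond to R's summary(aov(Speed ~ Run + Expt, data = morley)) once Expt and Run have been converted to factors.

```python
# Coded speed-of-light measurements from the table above:
# one row per experiment (Expt 1-5), one column per run (Run 1-20).
speed = [
    [850, 740, 900, 1070, 930, 850, 950, 980, 980, 880,
     1000, 980, 930, 650, 760, 810, 1000, 1000, 960, 960],
    [960, 940, 960, 940, 880, 800, 850, 880, 900, 840,
     830, 790, 810, 880, 880, 830, 800, 790, 760, 800],
    [880, 880, 880, 860, 720, 720, 620, 860, 970, 950,
     880, 910, 850, 870, 840, 840, 850, 840, 840, 840],
    [890, 810, 810, 820, 800, 770, 760, 740, 750, 760,
     910, 920, 890, 860, 880, 720, 840, 850, 850, 780],
    [890, 840, 780, 810, 760, 810, 790, 810, 820, 850,
     870, 870, 810, 740, 810, 940, 950, 800, 810, 870],
]


def randomized_block_anova(y):
    """Additive two-way ANOVA for a complete table with one value per cell.

    Rows are one factor (here Expt) and columns the other (here Run).
    Returns sums of squares, degrees of freedom and F ratios.
    """
    a, b = len(y), len(y[0])
    grand_mean = sum(sum(row) for row in y) / (a * b)
    row_means = [sum(row) / b for row in y]
    col_means = [sum(row[j] for row in y) / a for j in range(b)]

    ss_total = sum((v - grand_mean) ** 2 for row in y for v in row)
    ss_rows = b * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = a * sum((m - grand_mean) ** 2 for m in col_means)
    # With one value per cell, any interaction is pooled into the error term.
    ss_resid = ss_total - ss_rows - ss_cols

    df_rows, df_cols = a - 1, b - 1
    df_resid = df_rows * df_cols
    ms_resid = ss_resid / df_resid
    return {
        "SS_expt": ss_rows,
        "SS_run": ss_cols,
        "SS_resid": ss_resid,
        "df": (df_rows, df_cols, df_resid),
        "F_expt": (ss_rows / df_rows) / ms_resid,
        "F_run": (ss_cols / df_cols) / ms_resid,
    }


result = randomized_block_anova(speed)
for name, value in result.items():
    print(f"{name}: {value}")
```

The F ratio for experiments is noticeably larger than that for runs, consistent with systematic differences between the five experiments. Converting the ratios to p-values needs an F distribution function (for example scipy.stats.f.sf), which is left out here to keep the sketch dependency-free.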

 

Time Series Data Sets from Box, Jenkins, and Reinsel (1994)

The series below is Series G from the Box-Jenkins-Reinsel book; it is analyzed in our discussion of ARIMA models. Other series from this book are available from:

http://www.stat.wisc.edu/~reinsel/bjr-data/index.html

Series G. International Airline Passengers: Monthly Totals (thousands), 1949-1960 (one row per year, January to December, starting with 1949)

Year  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1949  112  118  132  129  121  135  148  148  136  119  104  118
1950  115  126  141  135  125  149  170  170  158  133  114  140
1951  145  150  178  163  172  178  199  199  184  162  146  166
1952  171  180  193  181  183  218  230  242  209  191  172  194
1953  196  196  236  235  229  243  264  272  237  211  180  201
1954  204  188  235  227  234  264  302  293  259  229  203  229
1955  242  233  267  269  270  315  364  347  312  274  237  278
1956  284  277  317  313  318  374  413  405  355  306  271  306
1957  315  301  356  348  355  422  465  467  404  347  305  336
1958  340  318  362  348  363  435  491  505  404  359  310  337
1959  360  342  406  396  420  472  548  559  463  407  362  405
1960  417  391  419  461  472  535  622  606  508  461  390  432
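Box and Jenkins model this series on the log scale after taking a first difference and a seasonal (lag 12) difference, the preliminary step for their (0,1,1)×(0,1,1)12 "airline model". The sketch below performs only that transformation in plain Python; it does not fit the ARIMA model itself, the helper name difference is hypothetical, and the Handbook's ARIMA discussion may use a different specification.

```python
import math

# Series G: international airline passengers, monthly totals in thousands,
# January 1949 to December 1960 (one row per year).
series_g = [
    112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118,
    115, 126, 141, 135, 125, 149, 170, 170, 158, 133, 114, 140,
    145, 150, 178, 163, 172, 178, 199, 199, 184, 162, 146, 166,
    171, 180, 193, 181, 183, 218, 230, 242, 209, 191, 172, 194,
    196, 196, 236, 235, 229, 243, 264, 272, 237, 211, 180, 201,
    204, 188, 235, 227, 234, 264, 302, 293, 259, 229, 203, 229,
    242, 233, 267, 269, 270, 315, 364, 347, 312, 274, 237, 278,
    284, 277, 317, 313, 318, 374, 413, 405, 355, 306, 271, 306,
    315, 301, 356, 348, 355, 422, 465, 467, 404, 347, 305, 336,
    340, 318, 362, 348, 363, 435, 491, 505, 404, 359, 310, 337,
    360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405,
    417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432,
]
assert len(series_g) == 144


def difference(x, lag=1):
    """Return the lagged differences x[t] - x[t - lag] for t >= lag."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


# The log transform stabilizes the growing seasonal amplitude; the first
# difference removes the trend and the lag-12 difference the seasonality.
log_y = [math.log(v) for v in series_g]
w = difference(difference(log_y, lag=1), lag=12)

print("observations after differencing:", len(w))
print("mean of differenced series:", round(sum(w) / len(w), 5))
```

Differencing once at lag 1 and once at lag 12 costs 13 observations, so 131 of the original 144 remain for model identification and fitting.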