6. The dendrogram for the French food expenditures, Ward
algorithm.
MVAclusfood.xpl
coherent with our previous analysis, we standardize each variable. The dendrogram of the
Ward method is displayed in Figure 11.7. Two dominant clusters are visible. A further
refinement into, say, four clusters could be considered at a lower level of distance.
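The procedure described here (standardize each variable, then merge clusters with the Ward criterion and stop at two or four groups) can be sketched as follows. This is a minimal, naive illustration using only the Python standard library. The data are synthetic stand-ins, not the Boston housing data, and the function names `standardize`, `ward_delta` and `ward` are hypothetical helpers introduced for this sketch. The merge cost is assumed to be the usual Ward increase in within-cluster sum of squares.

```python
import random
from statistics import mean, pstdev

def standardize(rows):
    """Center each column and scale it to unit standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def centroid(points):
    return [mean(c) for c in zip(*points)]

def ward_delta(a, b):
    """Increase in within-cluster sum of squares when clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

def ward(points, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the cheapest pair.

    Returns the clusters (lists of row indices) and the merge costs."""
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = ward_delta([points[k] for k in clusters[i]],
                               [points[k] for k in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, heights

# Synthetic data (an assumption for illustration): two well-separated groups.
random.seed(0)
raw = ([[random.gauss(0, 1) for _ in range(3)] for _ in range(15)] +
       [[random.gauss(10, 1) for _ in range(3)] for _ in range(15)])
z = standardize(raw)

two, heights = ward(z, 2)   # the two dominant clusters
four, _ = ward(z, 4)        # a finer partition, cut at a lower level
print([sorted(c) for c in two])
print([len(c) for c in four])
```

Stopping the merging earlier (four clusters instead of two) corresponds to cutting the dendrogram at a lower level of distance.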
To interpret the two clusters, we present the mean values and their respective standard
errors of the thirteen $\widetilde{X}$ variables by group in Table 11.6. Comparing the mean values for
both groups shows that all the differences in the means are individually significant and that
cluster one corresponds to housing districts with better living quality and higher house prices,
whereas cluster two corresponds to less favored districts in Boston. This can be confirmed,
for instance, by a lower crime rate, a higher proportion of residential land, a lower proportion
of blacks, etc. for cluster one. Cluster two is identified by a higher proportion of older
houses, a higher pupil/teacher ratio and a higher percentage of the lower status population.
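A group-wise summary of this kind can be computed in a few lines. The sketch below uses a made-up variable and made-up cluster labels, not the values of Table 11.6. It also assumes that the standard error is the sample standard deviation divided by the square root of the group size, and that a difference in means is judged by a simple two-sample statistic; the text does not state which test was used.

```python
from math import sqrt
from statistics import mean, stdev

def group_summary(values, labels):
    """Return {label: (mean, standard error)} for a single variable."""
    out = {}
    for g in sorted(set(labels)):
        x = [v for v, lab in zip(values, labels) if lab == g]
        out[g] = (mean(x), stdev(x) / sqrt(len(x)))
    return out

# Hypothetical toy values of one variable and the cluster label of each row.
values = [0.1, 0.2, 0.3, 2.0, 3.0, 4.0]
labels = [1, 1, 1, 2, 2, 2]

summary = group_summary(values, labels)
(m1, se1), (m2, se2) = summary[1], summary[2]

# Difference in means relative to its estimated standard error (an assumed,
# Welch-type statistic; large absolute values indicate a clear difference).
stat = (m1 - m2) / sqrt(se1 ** 2 + se2 ** 2)
print(summary, stat)
```

Repeating the computation for each variable gives one row of means and standard errors per variable and cluster.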
This interpretation is supported by visual inspection of all the variables presented in the
scatterplot matrices in Figures 11.8 and 11.9. For example, the lower right boxplot of Figure 11.9
and the correspondingly colored clusters in the last row confirm the role of each variable in
318
11
Cluster Analysis
Figure 11.7. Dendrogram of the Boston housing data using the Ward
algorithm (plot title: Ward method; horizontal axis: index*E2; vertical axis: distance).
MVAclusbh.xpl
determining the clusters. This interpretation coincides with the previous PC analysis
(Figure 9.11). The quality of life factor is clearly visible in Figure 11.10, where cluster
membership is distinguished by the shape and color of the points plotted against the
first two principal components. The first PC completely separates the two clusters
and corresponds, as we have discussed in Chapter 9, to a quality of life and house indicator.
11.5 Exercises
EXERCISE 11.1 Prove formula (11.16).
EXERCISE 11.2 Prove that $I_R = \operatorname{tr}(S_R)$, where $S_R$ denotes the empirical covariance matrix
of the observations contained in $R$.
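Before attempting the proof, the identity can be checked numerically. The sketch below assumes that the inertia $I_R$ is the average squared Euclidean distance of the observations in $R$ to their centroid and that $S_R$ uses the divisor $n_R$; both definitions are assumptions here, since they are not restated in this section. The four points are made up.

```python
from statistics import mean

# Hypothetical cluster R with four observations in two dimensions.
R = [[1.0, 2.0], [2.0, 0.0], [4.0, 4.0], [5.0, 2.0]]
n = len(R)
center = [mean(col) for col in zip(*R)]

# Inertia: average squared Euclidean distance to the centroid.
inertia = sum(sum((x - c) ** 2 for x, c in zip(row, center)) for row in R) / n

# Trace of the empirical covariance matrix (divisor n): the sum of the
# variances of the individual variables.
trace_S = sum(sum((x - c) ** 2 for x in col) / n
              for col, c in zip(zip(*R), center))

print(inertia, trace_S)
```

The two quantities agree because both sum the same squared deviations, once row by row and once column by column. This is the idea behind the proof.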
Figure 11.8. Scatterplot matrix for variables $\widetilde{X}_1$ to $\widetilde{X}_7$ of the Boston
housing data.
MVAclusbh.xpl
EXERCISE 11.3 Prove that
$$
\Delta(R, P + Q) = \frac{n_R + n_P}{n_R + n_P + n_Q}\,\Delta(R, P)
 + \frac{n_R + n_Q}{n_R + n_P + n_Q}\,\Delta(R, Q)
 - \frac{n_R}{n_R + n_P + n_Q}\,\Delta(P, Q),
$$
when the centroid formula is used to define $d^2(R, P + Q)$.
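A numerical check can make the statement concrete before proving it. The sketch below assumes that $\Delta(P, Q) = \frac{n_P n_Q}{n_P + n_Q}\, d^2(P, Q)$ with $d^2$ the squared Euclidean distance between the group centroids. This definition is an assumption here, since it is not restated in this section. The three groups are made up.

```python
from statistics import mean

def centroid(cluster):
    return [mean(col) for col in zip(*cluster)]

def delta(a, b):
    """Assumed Ward quantity: n_a * n_b / (n_a + n_b) times the squared
    Euclidean distance between the two centroids."""
    ca, cb = centroid(a), centroid(b)
    d2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2

# Hypothetical groups of different sizes.
R = [[0.0, 0.0], [1.0, 1.0]]
P = [[4.0, 0.0], [5.0, 2.0], [6.0, 1.0]]
Q = [[0.0, 5.0]]
nR, nP, nQ = len(R), len(P), len(Q)
N = nR + nP + nQ

lhs = delta(R, P + Q)   # P + Q is the merged group (list concatenation)
rhs = ((nR + nP) * delta(R, P)
       + (nR + nQ) * delta(R, Q)
       - nR * delta(P, Q)) / N

print(lhs, rhs)
```

Both sides agree up to rounding error, which is what the exercise asks to establish in general.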
Figure 11.9. Scatterplot matrix for variables $\widetilde{X}_8$ to $\widetilde{X}_{14}$ of the Boston
housing data.
MVAclusbh.xpl
EXERCISE 11.4 Repeat the 8-point example (Example 11.5) using the complete linkage and
the Ward algorithm. Explain the difference to single linkage.
EXERCISE 11.5 Explain the differences between various proximity measures by means of
an example.
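As a starting point for such an example, the sketch below compares three common distance measures on made-up points. The function names and the points are hypothetical and chosen only to show that the ranking of neighbors can change with the measure.

```python
def l1(x, y):
    """City-block (L1) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))

def l2(x, y):
    """Euclidean (L2) distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def linf(x, y):
    """Maximum (L-infinity) distance."""
    return max(abs(a - b) for a, b in zip(x, y))

origin = (0.0, 0.0)
a = (3.0, 3.0)
b = (0.0, 5.0)

for name, dist in (("L1", l1), ("L2", l2), ("Linf", linf)):
    print(name, dist(origin, a), dist(origin, b))
```

Under the L1-norm the point `b` is closer to the origin than `a`, whereas under the L2-norm and the maximum norm `a` is closer. A clustering algorithm may therefore merge different pairs first depending on the proximity measure.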
Figure 11.10. Scatterplot of the first two PCs displaying the two clusters
(plot title: first vs. second PC; horizontal axis: PC1; vertical axis: PC2).
MVAclusbh.xpl
EXERCISE 11.6 Repeat the bank notes example (Example 11.6) with another random
sample of 20 notes.
EXERCISE 11.7 Repeat the bank notes example (Example 11.6) with another clustering
algorithm.
EXERCISE 11.8 Repeat the bank notes example (Example 11.6) or the 8-point example
(Example 11.5) with the L1-norm.
EXERCISE 11.9 Analyze the U.S. companies example (Table B.5) using the Ward algorithm
and the L2-norm.
EXERCISE 11.10 Analyze the U.S. crime data set (Table B.10) with the Ward algorithm
and the L2-norm on standardized variables (use only the crime variables).
EXERCISE 11.11 Repeat Exercise 11.10 with the U.S. health data set (use only the number
of deaths variables).
EXERCISE 11.12 Redo Exercise 11.10 with the $\chi^2$-metric. Compare the results.
EXERCISE 11.13 Redo Exercise 11.11 with the $\chi^2$-metric and compare the results.
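For these two exercises, a squared $\chi^2$-distance between two rows of a table of counts can be sketched as below. The sketch assumes the usual definition, in which row profiles (rows divided by their totals) are compared and each column is weighted by the inverse of its share of the grand total. The function name `chi2_distance_sq` and the counts are hypothetical, not taken from Table B.10.

```python
def chi2_distance_sq(table, i1, i2):
    """Squared chi-square distance between rows i1 and i2 of a count table."""
    total = sum(sum(row) for row in table)
    col_share = [sum(col) / total for col in zip(*table)]
    r1, r2 = sum(table[i1]), sum(table[i2])
    return sum((table[i1][j] / r1 - table[i2][j] / r2) ** 2 / col_share[j]
               for j in range(len(col_share)))

# Hypothetical counts: rows are regions, columns are categories.
counts = [
    [10, 20, 30],
    [20, 40, 60],   # same profile as the first row, twice the volume
    [30, 20, 10],
]

print(chi2_distance_sq(counts, 0, 1))
print(chi2_distance_sq(counts, 0, 2))
```

Rows with the same profile have distance zero regardless of their totals. This is the main difference from the L2-norm on raw or standardized counts and is worth keeping in mind when comparing the results.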
12 Discriminant Analysis
Discriminant analysis is used in situations where the clusters are known a priori. The aim
of discriminant analysis is to classify an observation, or several observations, into these
known groups. For instance, in credit scoring, a bank knows from past experience that there
are good customers (who repay their loan without any problems) and bad customers (who
showed difficulties in repaying their loan). When a new customer asks for a loan, the bank
has to decide whether or not to give the loan. The past records of the bank provide two
data sets: multivariate observations $x_i$ on the two categories of customers (including, for
example, age, salary, marital status, the amount of the loan, etc.). The new customer is
a new observation x with the same variables. The discrimination rule has to classify the
customer into one of the two existing groups and the discriminant analysis should evaluate
the risk of a possible “bad decision”.
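To make the credit scoring setting concrete, the sketch below fits one common linear classification rule to two made-up groups of past customers and applies it to new observations. It is only an illustration under stated assumptions: the two variables (age, salary in thousands), all numbers, and the helper names `fit_linear_rule` and `classify` are hypothetical; the rule assumes a common covariance matrix in both groups and equal misclassification costs, and it is not necessarily the rule derived later in this chapter.

```python
from statistics import mean

def mean_vec(rows):
    return [mean(c) for c in zip(*rows)]

def scatter(rows, m):
    """2x2 matrix of summed cross-products around the mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_linear_rule(good, bad):
    """Return direction a = S^{-1}(m_good - m_bad) and the midpoint of the means,
    where S is the pooled covariance matrix of the two groups."""
    m1, m2 = mean_vec(good), mean_vec(bad)
    s1, s2 = scatter(good, m1), scatter(bad, m2)
    dof = len(good) + len(bad) - 2
    S = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    inv = [[S[1][1] / det, -S[0][1] / det],
           [-S[1][0] / det, S[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    a = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    mid = [(m1[0] + m2[0]) / 2, (m1[1] + m2[1]) / 2]
    return a, mid

def classify(x, a, mid):
    """Allocate x to 'good' if it lies on the good side of the midpoint."""
    score = a[0] * (x[0] - mid[0]) + a[1] * (x[1] - mid[1])
    return "good" if score > 0 else "bad"

# Hypothetical past records: (age, salary in thousands).
good = [(45.0, 60.0), (50.0, 72.0), (38.0, 55.0), (55.0, 80.0)]
bad = [(25.0, 20.0), (30.0, 28.0), (22.0, 25.0), (35.0, 22.0)]

a, mid = fit_linear_rule(good, bad)
print(classify((48.0, 70.0), a, mid))
print(classify((24.0, 21.0), a, mid))
```

The sketch only produces an allocation. Evaluating the risk of a wrong decision, as mentioned above, would additionally require estimating the misclassification probabilities of the rule.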
Many other examples are described below, and in most applications, the groups correspond
to natural classifications or to groups known from history (as in the credit scoring example).
These groups could have been formed by a cluster analysis performed on past data.
Section 12.1 presents the allocation rules when the populations are known, i.e., when we know