Skip to main content


Showing posts from September, 2010

Monte Carlo testing of classification groups

This is another article on the theme of defining groups in a hierarchical classification. A previous article described homogeneity analysis to visualize how any well any number of groups, defined at the same level accounts for the variability in the dataset, as measured by within-group pairwise distances. Here we will look at testing whether splitting a particular group into two daughter groups is supported by the data.

Suppose we have a group G containing N objects which we are considering splitting into two groups G1 and G2 consisting of N1 and N2 objects respectively. A Monte Carlo approach to testing whether this is worth doing is as follows:

1. Calculate the mean within-group pairwise distance that would result from defining G1 and G2.

2. Randomly allocate objects to two groups of sizes N1 and N2 and calculate the resulting mean within-group pairwise distance.

3. Repeat step 2 a large number of times (e.g. 999) to create a vector of Monte Carlo means. Add the actual mean from step 1 …