5. In the early 1900s, several investigators were interested in predicting behavioural and social outcomes among people based on physical characteristics. Macdonnell (1902) reports a correlation matrix for the following seven physical variables measured on 3000 British criminals: (1) head length (HEADLEN), (2) head breadth (HEADBDTH), (3) face breadth (FACEBDTH), (4) left finger length (FINGLEN), (5) left forearm length (FOREARM), (6) left foot length (FOOT), and (7) height (HEIGHT). Assume that all original variables were measured in centimetres. The R output of a principal components analysis based on Macdonnell's correlation matrix is presented below.

PC6 PC7 -.016505 0.017690 PC5 PC4 PC2 PC3 PC1 .087441 0.005455 0.067843 0.882232 0.364447 0.276214 HEADLEN 0.034908 -.083151 -.255886 .382679 -.069875 .036635 0.687188 0.211852 0.639506 HEADBDTH -.074653 .697667 0.103419 0.113410 0.033762 0.512374 -.234869 -.276601 -.178344 0.102549 FACEBDTH FINGLEN 0.295062 0.503388 0.618945 0.318252 0.437587 0.455737 0.450262 0.290290 0.038723 -.784745 FOREARM 0.014469 0.034273 870496 0.053181 -.059009 FOOT 0.233015 0.352716 .083677 -.769546 .006241 HEIGHT 0.435716 -.179459

The eigenvalues of the correlation matrix are as follows:

        Eigenvalue   Proportion   Cumulative
PC1     3.79931      0.542759     0.54276
PC2     1.50195      0.214565     0.75732
PC3     0.65048      0.092926     0.85025
PC4     0.35994      0.051419     0.90167
PC5     0.33915      0.048450     0.95012
PC6     0.23525      0.033608     0.98373
PC7     0.11391      0.016274     1.00000

(a) One of the goals of principal components analysis is to reduce the dimension of the original data. How would you choose the number of principal components to retain for subsequent analyses? In this example, how many components would you retain? (5 marks)

(b) Another goal of PCA is to visualise the dataset. Given the observation below from the dataset (the criminal's name is John Doe), and based on the number of PCs you have chosen, write down the coordinates of John Doe in the new coordinate system. Show your intermediate and final results to three significant figures. (6 marks)

HEADLEN   HEADBDTH   FACEBDTH   FINGLEN   FOREARM   FOOT   HEIGHT
20.0      16.0       14.0       10.0      30.0      26.0   180.0

(c) Is it more appropriate for this example to perform PCA based on standardised (scaled) or unstandardised (unscaled) variables? Justify your answer. (4 marks)
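As a rough check on the eigenvalue table, the Proportion and Cumulative columns can be reproduced from the eigenvalues alone, because the eigenvalues of a correlation matrix of seven variables sum to 7. The following is a minimal Python sketch under that assumption; the variable names are illustrative and are not part of the original R output.

```python
# Eigenvalues copied from the table in the question (PC1 to PC7).
eigenvalues = [3.79931, 1.50195, 0.65048, 0.35994, 0.33915, 0.23525, 0.11391]

# For a correlation matrix, the total variance equals the number of variables.
n_variables = len(eigenvalues)

# Proportion of total variance explained by each component.
proportions = [ev / n_variables for ev in eigenvalues]

# Running total of the proportions (the Cumulative column).
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Print the three columns side by side for comparison with the table.
for i, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"PC{i}  {ev:8.5f}  {p:8.6f}  {c:7.5f}")
```

The printed columns can be compared row by row with the table; any retention rule based on explained variance reads directly off the cumulative column.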