T-square Self Study
Tuesday, September 18, 2001

Begin this self-study by reviewing Chapter 5 in Johnson and Wichern (1998: 224-257, 273-275).

Now refer to Exercise 5.16 (page 285), Example 5.5 (page 244), and the data in Table 5.2 (page 245) in Johnson and Wichern (1998), which deal with the college test data, and perform the following tasks.

  1. Write a SAS program that creates a SAS data set from the data in Table 5.2.
    An ASCII file of the data can be obtained from the web link
    http://www.stat.lsu.edu/faculty/moser/exst7037/jwdata/T5-2.DAT.
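    If you are somewhat new to SAS programming, the following fragment can help you get started. The rows of periods stand for the omitted data lines; replace them with the full data, because SAS would otherwise read each period as a missing value.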
    Data College;
     Input X1 X2 X3;
     Label X1="Social Science and History"
           X2="Verbal"
           X3="Science";
    Datalines;
      468  41  26
      428  39  26
      514  53  21
      547  67  33
      .     .   .
      .     .   .
      .     .   .
      474  41  16
      441  47  26
      607  67  32
    ;
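
    As an alternative to typing the data after Datalines, the downloaded ASCII file can be read with an INFILE statement. The fragment below is only a sketch: the file path is hypothetical, and it assumes the file holds three whitespace-separated values per line with no header line.

    ```sas
    /* Sketch only: the path below is hypothetical; point it at your saved copy of T5-2.DAT. */
    Data College;
     Infile "C:\exst7037\T5-2.DAT";   /* assumed: three values per line, no header */
     Input X1 X2 X3;
     Label X1="Social Science and History"
           X2="Verbal"
           X3="Science";
    Run;

    /* List the first few observations to confirm the file was read as expected. */
    Proc Print Data=College (Obs=5);
    Run;
    ```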
    
  2. Use PROC CORR of SAS to compute the mean vector, covariance matrix, and correlation matrix for the college data. You will need to add the COV option to the PROC CORR statement. See the SAS on-line documentation (http://www.lsu.edu/ocs/sas.html) for additional details on PROC CORR (look in SAS/STAT). Compare these sample statistics with those given in Johnson and Wichern (1998:244).
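
    A minimal sketch of this step, assuming the data set is named College with variables X1, X2, and X3 as in the fragment above:

    ```sas
    /* COV adds the covariance matrix to the default output,   */
    /* which already includes the means and the correlations.  */
    Proc Corr Data=College Cov;
     Var X1 X2 X3;
    Run;
    ```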
     
  3. Verify the simultaneous T-square 95% confidence intervals given for mu1, mu2, and mu3 on page 244. I suggest that you work them out without looking at the details given on page 244, then compare your calculations with those in the book.
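
    If you would like to check your hand calculations by machine, one possible PROC IML sketch follows. It assumes the data set College (variables X1, X2, X3) contains no missing values; the matrix names (X, Xc, S, c2, half, lower, upper) are illustrative only. Each interval is computed as xbar_i plus or minus sqrt[ p(n-1)/(n-p) * F(p, n-p; alpha) ] * sqrt(s_ii/n).

    ```sas
    Proc IML;
     Use College;
     Read All Var {X1 X2 X3} Into X;
     Close College;
     n = Nrow(X);                        /* sample size                      */
     p = Ncol(X);                        /* number of variables              */
     xbar = X[:,];                       /* 1 x p row vector of sample means */
     Xc = X - Repeat(xbar, n, 1);        /* data centered at the means       */
     S = Xc` * Xc / (n - 1);             /* sample covariance matrix         */
     alpha = 0.05;
     c2 = p*(n-1)/(n-p) * Finv(1-alpha, p, n-p);   /* T-square critical value    */
     half = Sqrt(c2 * Vecdiag(S)` / n);            /* half-widths, 1 x p         */
     lower = xbar - half;
     upper = xbar + half;
     Print lower, upper;
    Quit;
    ```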
     
  4. Now compute Bonferroni simultaneous 95% confidence intervals for mu1, mu2, and mu3, and compare these results with those found above for the simultaneous T-square intervals. If both are simultaneous 95% confidence intervals, why are they not equivalent?
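
    Both sets of intervals have the form xbar_i plus or minus c * sqrt(s_ii/n); only the multiplier c differs. As a hedged sketch (again assuming the data set College with no missing values; the names cT2 and cBonf are illustrative), the two multipliers can be printed side by side to help with the comparison:

    ```sas
    Proc IML;
     Use College;
     Read All Var {X1 X2 X3} Into X;
     Close College;
     n = Nrow(X);
     p = Ncol(X);
     alpha = 0.05;
     /* T-square multiplier: sqrt of the scaled F critical value */
     cT2 = Sqrt(p*(n-1)/(n-p) * Finv(1-alpha, p, n-p));
     /* Bonferroni multiplier: t critical value with alpha split over p two-sided intervals */
     cBonf = Tinv(1 - alpha/(2*p), n-1);
     Print cT2 cBonf;
    Quit;
    ```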
     
  5. (See Exercise 5.16.) Suppose that the vector [500, 50, 30]' represents average scores for thousands of college students over the last 10 years. Is there reason to believe that the group of students represented by the scores in Table 5.2 is scoring differently? Explain.

    Extend your SAS program to test the above hypothesis. Remember that you will first need to translate the observed variables (X1, X2, and X3) to new variables (Y1, Y2, and Y3) that have expectations of zero under the above hypothesis (revisit the one-sample sweat data example). Use the Wilks' lambda statistic reported by PROC GLM (see pages 232-233) as your test statistic.
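
    One way this could look is sketched below. The data set name College0 is hypothetical; Y1, Y2, and Y3 are the shifted variables described above, and the intercept-only model with MANOVA H=Intercept is a common way to obtain the one-sample test from PROC GLM, not necessarily the exact form used in the sweat data example.

    ```sas
    /* Shift each variable by its hypothesized mean so that E(Y) = 0 under H0. */
    Data College0;
     Set College;
     Y1 = X1 - 500;
     Y2 = X2 - 50;
     Y3 = X3 - 30;
    Run;

    /* Intercept-only multivariate model; NOUNI suppresses the univariate output. */
    Proc GLM Data=College0;
     Model Y1 Y2 Y3 = / Nouni;
     Manova H=Intercept;          /* reports Wilks' lambda for H0: mean vector of Y is zero */
    Run;
    Quit;
    ```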
     
  6. Construct QQ plots from the marginal distributions of social science and history, verbal, and science scores. Also construct the three possible scatter diagrams from the pairs of variables. Finally, construct the chi-square QQ plot for squared distances from the sample mean vector. Do these data appear to be normally distributed? Discuss. If not, check for outliers and/or try transformations (see pages 200-208) to attempt to improve the analysis, then repeat the analyses above.
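
    For the chi-square QQ plot, a possible sketch follows. It assumes the data set College with no missing values, pairs each squared distance with the chi-square quantile at probability (j - 1/2)/n, where j is the rank of that distance, and writes the pairs to a hypothetical data set named ChiQQ for plotting.

    ```sas
    Proc IML;
     Use College;
     Read All Var {X1 X2 X3} Into X;
     Close College;
     n = Nrow(X);
     p = Ncol(X);
     Xc = X - Repeat(X[:,], n, 1);         /* center at the sample mean vector      */
     S = Xc` * Xc / (n - 1);               /* sample covariance matrix              */
     D2 = Vecdiag(Xc * Inv(S) * Xc`);      /* squared generalized distances, n x 1  */
     Q = Cinv((Rank(D2) - 0.5) / n, p);    /* matching chi-square quantiles, p d.f. */
     Out = D2 || Q;
     Create ChiQQ From Out [Colname={"D2" "ChiSq"}];
     Append From Out;
     Close ChiQQ;
    Quit;

    /* Points near a straight line through the origin with slope 1 are consistent with normality. */
    Proc Gplot Data=ChiQQ;
     Plot D2*ChiSq;
    Run;
    Quit;
    ```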

    Add code to your SAS program to construct the QQ plots and scatter plots. You may use PROC UNIVARIATE, PROC PLOT, PROC GPLOT, and/or SAS/INSIGHT, and other procedures as you like. Note that here you may operate on either the original variables or the residuals, since the problem is a one-sample problem. However, I suggest that you get in the habit of checking the residuals from the model. Thus, add an OUTPUT statement to your PROC GLM code from the T-square analysis (again, see the sweat data example).
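
    A sketch of one possible arrangement is given below. The data set names College0 and Resids and the residual names R1, R2, and R3 are hypothetical, and the plotting choices (PROC UNIVARIATE for the QQ plots, PROC GPLOT for the scatter diagrams) are just one of the options listed above.

    ```sas
    /* Shifted variables, as in the T-square analysis. */
    Data College0;
     Set College;
     Y1 = X1 - 500;
     Y2 = X2 - 50;
     Y3 = X3 - 30;
    Run;

    /* Same intercept-only model, now saving the residuals. */
    Proc GLM Data=College0;
     Model Y1 Y2 Y3 = / Nouni;
     Manova H=Intercept;
     Output Out=Resids R=R1 R2 R3;   /* one residual variable per response */
    Run;
    Quit;

    /* Marginal normal QQ plots of the residuals. */
    Proc Univariate Data=Resids Normal;
     Var R1 R2 R3;
     QQplot R1 R2 R3 / Normal(Mu=Est Sigma=Est);
    Run;

    /* The three pairwise scatter diagrams. */
    Proc Gplot Data=Resids;
     Plot R1*R2 R1*R3 R2*R3;
    Run;
    Quit;
    ```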
     
  7. Use PROC IML in SAS to extract the eigenvalues and eigenvectors of the covariance matrix for the college test data (see the example on our web site). Determine the lengths and directions for the axes of the 95% confidence ellipsoid for mu.
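
    As a rough sketch of this step (not the example from the web site), the fragment below assumes the data set College with no missing values. The axes of the ellipsoid point along the eigenvectors of S, with half-lengths sqrt(lambda_i) * sqrt[ p(n-1)/(n(n-p)) * F(p, n-p; alpha) ]; the names lambda, E, and halflen are illustrative.

    ```sas
    Proc IML;
     Use College;
     Read All Var {X1 X2 X3} Into X;
     Close College;
     n = Nrow(X);
     p = Ncol(X);
     Xc = X - Repeat(X[:,], n, 1);      /* center at the sample mean vector */
     S = Xc` * Xc / (n - 1);            /* sample covariance matrix         */
     Call Eigen(lambda, E, S);          /* eigenvalues; eigenvectors are the columns of E */
     alpha = 0.05;
     c2 = p*(n-1)/(n*(n-p)) * Finv(1-alpha, p, n-p);
     halflen = Sqrt(c2 * lambda);       /* half-length of each axis; full length is twice this */
     Print lambda E halflen;
    Quit;
    ```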