T-square Self Study
Tuesday, September 18, 2001
Begin this self-study by reviewing Chapter 5 in Johnson and Wichern (1998: 224-257, 273-275).
Now refer to Exercise 5.16 (page 285), Example 5.5 (page 244), and the
data in Table 5.2 (page 245) in Johnson and Wichern (1998) dealing
with the college test data, and perform the following tasks.
- Write a SAS program that creates a SAS data set from the data in Table 5.2.
An ASCII file of the data can be obtained from the web link
http://www.stat.lsu.edu/faculty/moser/exst7037/jwdata/T5-2.DAT.
If you are somewhat new to SAS programming, the following fragment can
help you get started.
Data College;
   Input X1 X2 X3;
   Label X1="Social Science and History"
         X2="Verbal"
         X3="Science";
   Datalines;
468 41 26
428 39 26
514 53 21
547 67 33
. . .
. . .
. . .
474 41 16
441 47 26
607 67 32
;
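If you would rather read the downloaded ASCII file than paste the values after DATALINES, a sketch along the following lines may work. The file name and location ('T5-2.DAT' in the current working directory) are assumptions; adjust the path to wherever you saved the file, and confirm that its columns really are in the order X1, X2, X3.

```sas
/* Sketch: read the three scores from the downloaded file.      */
/* Assumes T5-2.DAT is in the current working directory and     */
/* holds three blank-separated values per line.                 */
Data College;
   Infile 'T5-2.DAT';
   Input X1 X2 X3;
   Label X1="Social Science and History"
         X2="Verbal"
         X3="Science";
Run;

/* Print the first few observations as a quick check */
Proc Print Data=College(Obs=5);
Run;
```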
- Use PROC CORR of SAS to compute the mean vector, covariance matrix,
and correlation matrix for the college data. You will need to
add the COV option to the PROC CORR statement. See the SAS on-line
documentation
(http://www.lsu.edu/ocs/sas.html)
for additional details on PROC CORR (look in SAS/STAT). Compare these
sample statistics with those given in Johnson and Wichern (1998:244).
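A minimal sketch of this step, assuming the data set is named College as in the fragment above:

```sas
/* COV adds the sample covariance matrix to the default output, */
/* which already contains the means and the correlation matrix. */
Proc Corr Data=College Cov;
   Var X1 X2 X3;
Run;
```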
- Verify the simultaneous T-square 95% confidence intervals given for
mu1, mu2, and mu3 on page 244.
I would suggest that you work them without looking at the details
given on page 244, then compare your calculations with those in the
book.
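One possible way to check your hand calculations is sketched below in PROC IML. It assumes the College data set from above and uses the usual T-square interval, xbar_i +/- sqrt[ p(n-1)/(n-p) * F(p, n-p; 0.05) ] * sqrt(s_ii/n). The matrix names (X, Xc, S, c, se) are only illustrative.

```sas
Proc IML;
   Use College;
   Read All Var {X1 X2 X3} Into X;
   Close College;

   n    = nrow(X);                      /* sample size               */
   p    = ncol(X);                      /* number of variables       */
   xbar = X[:,];                        /* 1 x p row of sample means */
   Xc   = X - repeat(xbar, n, 1);       /* centered data             */
   S    = Xc` * Xc / (n - 1);           /* sample covariance matrix  */

   alpha = 0.05;
   /* T-square multiplier: sqrt of p(n-1)/(n-p) times the upper F point */
   c  = sqrt( p*(n-1)/(n-p) * finv(1-alpha, p, n-p) );
   se = sqrt( vecdiag(S) / n );         /* p x 1 standard errors     */

   Lower = xbar` - c*se;
   Upper = xbar` + c*se;
   Print c, Lower Upper;
Quit;
```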
- Now compute Bonferroni simultaneous 95% confidence intervals for
mu1, mu2, and mu3, and compare these
results with those found above for the simultaneous T-square
intervals. If both are simultaneous 95% confidence intervals, why
are they not equivalent?
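The Bonferroni intervals differ from the T-square intervals only in the multiplier applied to sqrt(s_ii/n). A hedged PROC IML sketch, again assuming the College data set and using illustrative matrix names, prints both multipliers next to the Bonferroni limits so you can compare them yourself:

```sas
Proc IML;
   Use College;
   Read All Var {X1 X2 X3} Into X;
   Close College;

   n    = nrow(X);
   p    = ncol(X);
   xbar = X[:,];
   Xc   = X - repeat(xbar, n, 1);
   S    = Xc` * Xc / (n - 1);
   se   = sqrt( vecdiag(S) / n );

   alpha = 0.05;
   /* Bonferroni: split alpha over the p two-sided intervals */
   tBon = tinv(1 - alpha/(2*p), n-1);
   /* T-square multiplier, shown only for comparison */
   cT2  = sqrt( p*(n-1)/(n-p) * finv(1-alpha, p, n-p) );

   Lower = xbar` - tBon*se;
   Upper = xbar` + tBon*se;
   Print tBon cT2, Lower Upper;
Quit;
```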
- (See Exercise 5.16.)
Suppose that the vector [500, 50, 30]' represents average
scores for thousands of college students over the last 10 years. Is
there reason to believe that the group of students represented by the
scores in Table 5.2 is scoring differently? Explain.
Extend your SAS program to test the above hypothesis. Remember that
first you will need to translate the observed variables (X1, X2, and X3)
to new variables (Y1, Y2, and Y3) that would have expectations of zero
under the above hypothesis (revisit the one-sample sweat data example).
Use the Wilks' lambda statistic reported by PROC GLM (see pages 232-233)
for your test statistic.
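A sketch of how the translation and test might be coded is given below. The data set name College0 is an assumption made for illustration; the intercept-only MODEL statement with MANOVA H=INTERCEPT asks PROC GLM to test whether the mean vector of (Y1, Y2, Y3) is zero.

```sas
/* Shift each score by its hypothesized mean so that E(Y) = 0 under H0 */
Data College0;
   Set College;
   Y1 = X1 - 500;
   Y2 = X2 - 50;
   Y3 = X3 - 30;
Run;

/* Intercept-only multivariate model; NOUNI suppresses the       */
/* univariate analyses, and MANOVA H=INTERCEPT reports Wilks'    */
/* lambda along with the other multivariate test statistics.     */
Proc GLM Data=College0;
   Model Y1 Y2 Y3 = / Nouni;
   Manova H=Intercept;
Run;
Quit;
```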
- Construct QQ plots from the marginal distributions of social science and
history, verbal, and science scores. Also construct the three possible
scatter diagrams from the pairs of variables. Finally, construct the
chi-square QQ plot for squared distances from the sample mean vector.
Do these data appear to be normally distributed? Discuss. If not,
check for outliers and/or try transformations (see pages 200-208)
to attempt to improve the analysis, then repeat the analyses above.
Add to your SAS program code to construct the QQ plots and scatter plots.
You may use PROC UNIVARIATE, PROC PLOT, PROC GPLOT, and/or SAS/INSIGHT,
and other procedures
as you like. Note that here you may operate on the original variables
or residuals since the problem is a one-sample problem. However, I
suggest that you get in the habit of checking the residuals from the
model. Thus add an OUTPUT statement to your PROC GLM code from the
T-square analysis (again, see the sweat data example).
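The plots described in this task could be produced along the following lines. This is only one possible arrangement, assuming the College data set from above. The names Resid, R1-R3, ChiQQ, q, and d2 are illustrative, and PROC GPLOT requires SAS/GRAPH.

```sas
/* Residuals from the intercept-only model (each X minus its mean) */
Proc GLM Data=College;
   Model X1 X2 X3 = / Nouni;
   Output Out=Resid R=R1 R2 R3;
Run;
Quit;

/* Marginal normal QQ plots of the residuals */
Proc Univariate Data=Resid Noprint;
   QQplot R1 R2 R3 / Normal(Mu=Est Sigma=Est);
Run;

/* The three pairwise scatter diagrams */
Proc Gplot Data=College;
   Plot X1*X2 X1*X3 X2*X3;
Run;
Quit;

/* Chi-square QQ plot: ordered squared distances from the sample */
/* mean vector against chi-square(p) quantiles at (j - 0.5)/n    */
Proc IML;
   Use College;
   Read All Var {X1 X2 X3} Into X;
   Close College;
   n    = nrow(X);
   p    = ncol(X);
   xbar = X[:,];
   Xc   = X - repeat(xbar, n, 1);
   S    = Xc` * Xc / (n - 1);
   d2   = vecdiag( Xc * inv(S) * Xc` );   /* squared distances */
   Call Sort(d2, 1);                      /* ascending order   */
   j    = T(1:n);
   q    = cinv( (j - 0.5)/n, p );
   Create ChiQQ Var {q d2};
   Append;
   Close ChiQQ;
Quit;

Proc Gplot Data=ChiQQ;
   Plot d2*q;
Run;
Quit;
```

If the points in the last plot fall near a straight line through the origin with slope 1, that is consistent with multivariate normality; judging the plot is left to you.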
- Use PROC IML in SAS to extract the eigenvalues and eigenvectors of
the covariance matrix for the college test data (see the example
on our web site). Determine the lengths and directions for the
axes of the 95% confidence ellipsoid for mu.
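A hedged sketch of the eigen-analysis, assuming the College data set from above: the columns of E give the axis directions, and the half-length of each axis is taken here as sqrt(lambda_i) * sqrt[ p(n-1)/(n(n-p)) * F(p, n-p; 0.05) ]. The matrix names are illustrative.

```sas
Proc IML;
   Use College;
   Read All Var {X1 X2 X3} Into X;
   Close College;

   n    = nrow(X);
   p    = ncol(X);
   xbar = X[:,];
   Xc   = X - repeat(xbar, n, 1);
   S    = Xc` * Xc / (n - 1);

   /* Eigenvalues (largest first) and matching eigenvectors of S */
   Call Eigen(lambda, E, S);

   alpha   = 0.05;
   c2      = p*(n-1)/(n-p) * finv(1-alpha, p, n-p);
   HalfLen = sqrt(lambda) * sqrt(c2 / n);   /* half-length of each axis */

   Print lambda HalfLen, E;
Quit;
```

Each full axis length is twice the corresponding HalfLen value, and the ellipsoid is centered at the sample mean vector.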