In a diary I wrote a few days ago (In defense of standardized tests), I noted that, to measure change in pupil ability, frequent testing was needed, and that testing once a year was not a good method. Some people expressed an interest in hearing more about why this is so, and about good methods for measuring change.
It's all below the fold
There is a lot of interest in measuring change in all sorts of areas:
Am I losing weight?
Is Bush losing support?
Is our children learning? (hehe).
At first, this seems quite simple. Weigh yourself Monday. Weigh yourself next Monday. Did your weight go down or up?
But it's not so simple, and for a couple of reasons.
The first reason is that, in all the questions above (and most of the ones we are interested in) there is error in our measurement. Scales aren't perfect. Polls aren't perfect. And tests of ability are a long way from perfect (ANY test of ability, standardized or not). Further, even if the scale or test were perfect, it wouldn't measure exactly what you want. Even a perfect test can only measure a student's ability on a particular day. If the kid has a cold, or didn't sleep well, or whatever, then, even if the test is an accurate view of the child's ability on that day, it isn't a good measure of his or her true ability.
This problem can be dealt with by having many measurements. For example, if you only test a kid twice, and his scores are 92 and 75, then you have to figure that his score went down, and that he is not doing well. There's no way to factor in the cold he had on the second day. But suppose you test him many times, and his scores are
92 91 93 94 72 94 92 91
now you can see that the 72 is some kind of mistake. You may not know what the problem is (he was sick, the student next to him was hitting him, he fell asleep, whatever) but you know that the 72 is weird in some way, and his ability is more or less constant.
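To make that concrete, here is a minimal sketch in Python (the 10-point cutoff is my own illustrative choice, not a standard rule) of flagging a measurement that sits far from the rest:

```python
# Repeated test scores for one student; 72 is the suspect measurement.
scores = [92, 91, 93, 94, 72, 94, 92, 91]

# The median is robust to a single odd value, unlike the mean.
ordered = sorted(scores)
n = len(ordered)
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2 if n % 2 == 0 else ordered[n // 2]

# Flag anything more than 10 points from the median as suspect.
suspect = [s for s in scores if abs(s - median) > 10]

print(median)   # 92.0
print(suspect)  # [72]
```

With only two measurements there is nothing to compare against, so the 72 would look like a real decline; with eight, it stands out as the oddball.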
The second problem is that many of the common methods for measuring change make assumptions about the data that are completely unrealistic. This one is harder to deal with, but it is possible, and it's the main topic of this diary.
There are a number of good books on this sort of analysis, but to my mind, one of the best and clearest is Longitudinal Data Analysis by Donald Hedeker and Robert Gibbons. Don gave me permission to quote a bunch from the beginning of the book, and I do that, with my own comments interspersed, since Hedeker and Gibbons assume a background that might not be realistic here.
If you DO have some statistics background, but want to learn more about longitudinal studies, I highly recommend the Hedeker and Gibbons book
Broadly speaking, there are six methods that have been proposed for dealing with the measurement of change over time. None of them is perfect, but some are better than others.
Before getting into the six methods, a little background is in order. When you have one variable that you think is related to a bunch of other variables, perhaps even causally, then you are in the field of regression. By far the most common kind of regression analysis is called ordinary least squares regression. This is very useful in many situations, but it makes certain assumptions. There is a good web page on regression here. Longitudinal data violates the assumption of independence. If you weigh yourself today and tomorrow, clearly there is no independence.
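As a tiny sketch of what ordinary least squares actually does (pure Python, with made-up data of my own), here is the closed-form fit of a line y = a + b*x:

```python
# Made-up (x, y) pairs with a roughly linear relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.9]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form OLS estimates for simple regression: the slope is
# cov(x, y) / var(x), and the intercept is chosen so the fitted
# line passes through the point of means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(round(slope, 2), round(intercept, 2))  # 1.97 0.11
```

OLS picks the line that minimizes the squared vertical distances to the points, and its standard errors are only trustworthy when each point is independent of the others, which is exactly what repeated measurements on one person are not.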
The first method eliminates the longitudinal problem by reducing the repeated measures into a summary score. This could be an average, or a gain score, or a trend, or a more complex score. Suppose, for example, you have measurements as follows over several days:
182 181 183 181 179 180
you could take the average (181.0) or the change (180 - 182 = -2) or find a line that fits the points, or a number of other things. There are three problems with this method:
- "our uncertainty in the derived measure is [inversely] proportional to the number of measurements" - that is, you are better off if you've measured something six times than if you've measured it twice. This causes big problems if different subjects have different numbers of measurements.
- "By reducing multiple measurements to a single variable, there is typically a substantial loss of statistical power" - briefly, statistical power relates to your ability to find what you are looking for.
- "the use of time-varying covariates is not possible", for example, if you weigh yourself every day, but go on a diet after 5 days, and then off the diet 2 days later, there is no way to account for that.
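Here is a small sketch (Python, using the weights above) of the three summary scores just mentioned: the average, the gain score, and a fitted trend, i.e. the OLS slope of weight on day:

```python
weights = [182, 181, 183, 181, 179, 180]  # one measurement per day
days = list(range(len(weights)))          # 0, 1, 2, 3, 4, 5

# Summary 1: the average of all measurements.
average = sum(weights) / len(weights)

# Summary 2: the gain score, last minus first.
gain = weights[-1] - weights[0]

# Summary 3: the trend, the OLS slope of weight against day.
mean_d = sum(days) / len(days)
mean_w = sum(weights) / len(weights)
slope = (sum((d - mean_d) * (w - mean_w) for d, w in zip(days, weights))
         / sum((d - mean_d) ** 2 for d in days))

print(round(average, 1), gain, round(slope, 2))  # 181.0 -2 -0.51
```

Notice that however you summarize, six numbers have been collapsed into one, which is where the loss of power comes from.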
"Second, perhaps the simplest but most restrictive model is the ANOVA for repeated measures"
The problem with this method is that it assumes compound symmetry, which means that the variances and covariances are constant over time. What this means, in plain language, is that the relationship between your weight today and your weight yesterday is just as strong as the relationship between your weight today and your weight 2 weeks ago. Clearly, this is nonsensical. The part about the variances means that the amount of spread in people's weights is constant over time, and this is also typically untrue. Oh, and if you have only two measurements, this is the same as a t-test.
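To see why constant covariances are unrealistic, here is a quick simulation (Python; the drifting-weight process and its parameters are my own illustrative assumptions) showing that the correlation between measurements typically fades as they get farther apart in time:

```python
import random

random.seed(0)

def lag_corr(series, lag):
    """Correlation between the series and itself shifted by `lag`."""
    a = series[:-lag]
    b = series[lag:]
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# Simulate daily weights that drift: today's weight is mostly
# yesterday's weight plus a little noise (an AR(1) process).
weights = [180.0]
for _ in range(999):
    weights.append(180.0 + 0.8 * (weights[-1] - 180.0) + random.gauss(0, 1))

near = lag_corr(weights, 1)   # weights one day apart
far = lag_corr(weights, 14)   # weights two weeks apart
print(near > far)  # True: the nearby correlation is stronger
```

Compound symmetry would force `near` and `far` to be equal, which is exactly the assumption the simulation shows to be wrong for data like this.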
"Third, MANOVA models have also been proposed for analysis of longitudinal data". MANOVA stands for multivariate analysis of variance. The main problem with this method is that it does not allow for missing values. Yet, in most of the research that we will be interested in, there are missing values.
The above three methods cannot be recommended. Now, three that can be recommended.
"Fourth, generalized mixed-effects regression models.....[more on these below].....are quite robust to missing data, and can handle time-invariant as well as time-variant covariates......the disadvantage is that they are computationally complex....."
Fifth, covariance pattern models model the variance-covariance matrix directly. The advantage is that they are computationally simpler than the mixed-effects models (and therefore allow estimation of the full likelihood). "The disadvantage is that they do not attempt to distinguish within-subject variance". That is, if you attempt to model the effect of a new education method by assigning some students to it, this method does not allow separate estimation of the effect of a person being a particular person vs. the effect of a person being in a particular group.
Finally, GEE or generalized estimating equations are often useful, but they "assume that missing data are only ignorable if the missing data are explained by covariates in the model". Briefly, this means that the reasons why Johnny missed school on Tuesday are captured by something in the model. This is unrealistic in much of the sort of research that we are interested in.
For the above reasons, I (like Hedeker and Gibbons) think that mixed models are often the way to go. So, what are they?
Well.....there's one more preliminary. It's almost impossible to write about these models without matrices. So....VERY briefly:
A matrix is a rectangular array of numbers:
1 2 3
4 5 6
is an array. Adding two matrices is easy, but they have to be the same size. Just add each element to its corresponding element:
1 2 3 4 5 6 5 7 9
4 5 6 + 7 8 9 = 11 13 15
Multiplying two matrices is harder, and I am not going to try to show it in this format. See this site
But, if none of that makes sense to you, just ignore it and think of it as ordinary multiplication.
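For the curious, here is the same arithmetic in plain Python (the addition uses the 2x3 matrices above; the multiplication example, with the second matrix laid on its side so the sizes line up, is my own):

```python
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[4, 5, 6],
     [7, 8, 9]]

# Addition: add each element to its corresponding element.
add = [[A[i][j] + B[i][j] for j in range(3)] for i in range(2)]
print(add)  # [[5, 7, 9], [11, 13, 15]]

# Multiplication: entry (i, j) is the dot product of row i of the
# first matrix with column j of the second, so a 2x3 times a 3x2
# gives a 2x2. Here C is B laid on its side (its transpose).
C = [[4, 7],
     [5, 8],
     [6, 9]]
mul = [[sum(A[i][k] * C[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
print(mul)  # [[32, 50], [77, 122]]
```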
OK. In regular regression, we model a dependent variable (Y) as a linear combination of several independent variables (X).
In matrix terms
Y = XB + e
where B is a bunch of parameters to estimate, and e is the error.
Suppose we wanted to predict a person's weight based on age, sex, and height. Then we have 3 independent variables, and the above turns into
<math>
Weight = b <sub> 0 </sub> + b <sub> 1 </sub> age + b <sub> 2 </sub> female + b <sub> 3 </sub> height + e
</math>
Regression makes assumptions about e; specifically, it says that the errors are independent and identically distributed with mean 0 and constant variance.
The independence assumption is violated with longitudinal data, so we use a different model
Y = XB + ZG + e
Z is a design matrix much like X, and G is a bunch of what are called random effects.
The essential idea is that we let each individual have a random intercept and slope. So a person's equation is partially based on general characteristics about that person (e.g. race, sex, age, or whatever is relevant) and partially on them being them.
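As a rough sketch of the random intercept and slope idea (a Python simulation; all the numbers are my own invented example, not real data), each subject's weight trajectory gets its own starting point and its own rate of change, scattered around a shared population trend:

```python
import random

random.seed(1)

# Population (fixed) effects: everyone starts near 180 and loses
# about 0.5 per week on average.
POP_INTERCEPT = 180.0
POP_SLOPE = -0.5

weeks = list(range(8))
subject_slopes = []
for _ in range(20):
    # Random effects: this subject's personal deviation from the
    # population intercept and slope ("them being them").
    u0 = random.gauss(0, 5.0)
    u1 = random.gauss(0, 0.3)
    ys = [POP_INTERCEPT + u0 + (POP_SLOPE + u1) * w + random.gauss(0, 0.5)
          for w in weeks]

    # Fit each subject's own line (OLS slope) to see the spread.
    mw = sum(weeks) / len(weeks)
    my = sum(ys) / len(ys)
    slope = (sum((w - mw) * (y - my) for w, y in zip(weeks, ys))
             / sum((w - mw) ** 2 for w in weeks))
    subject_slopes.append(slope)

avg_slope = sum(subject_slopes) / len(subject_slopes)
spread = max(subject_slopes) - min(subject_slopes)
print(round(avg_slope, 2))  # close to the population slope of -0.5
print(round(spread, 2))     # but individual slopes vary quite a bit
```

A mixed-effects model runs this logic in reverse: given the observed trajectories, it estimates the population intercept and slope and, at the same time, how much individuals vary around them.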
The price you pay for this flexibility is a lot more complexity, both in terms of conceiving the model properly and interpreting the results. There are questions about the covariance structure that need to be answered, for instance. This diary is already very long and complex, but if people have questions, I will try to answer them.
Thanks for reading!