I. Goals.
A. What are within participants designs?
B. Why run within participants designs?
C. Why not run within participants designs?
D. Repeated measures designs.
E. Analysis.
II. What are within participants designs? Last time,
we looked at between participants two group designs. Let's put it
all together:
As you go down, the groups should become more and more like one another
prior to treatment. The next line will be for within participants
designs where you know the groups are exactly equal because it's the same
participant in every condition.
So far, we've been pretending that we only know about one type of design:
between participants designs. In these designs, each participant
sees one and only one experimental condition. In within participants
designs, each participant sees every experimental condition.
Consider our love experiment. To make it within participants,
we would have everybody be in love and not in love, and measure them in
each condition.
III. Why run within participants designs? Several
reasons:
A. Equate the groups in your experiment. Random assignment
eliminates any systematic differences between the groups in your experiment.
It does nothing to eliminate chance differences. But, if every participant
is in every group, then there can't be any differences, because they're
all the same people.
*B. Efficiency. You greatly reduce the number of participants
that you need. Consider our experiment. We have two groups,
so, taking 10 participants per condition as a rough estimate of an appropriate
sample size, we need 20 participants to do the experiment. But, if
each participant participated in each condition, we could get 10 participants
in each condition and still only run 10 participants. That greatly
reduces the amount of work that we have to do. As you increase the
number of groups, this becomes a big issue.
More important, it lets us do the kind of quality research we'd like
to do. Ten participants per cell is nowhere near as many as we'd
like. Instead, we might bump that up to thirty per cell.
*C. Statistical power. Remember our basic equation:
between groups variation (you made) / within groups variation (error).
In a within participants design a portion of the within groups variation
goes away, and we get a smaller term in the denominator. That helps
us to achieve our goal of maximizing between groups while minimizing within
groups.
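To see the smaller denominator at work, here is a rough sketch that analyzes the same made-up scores two ways: once as if two separate groups produced them, and once as if each pair of scores came from one participant. The data and the function names (between_t, within_t) are assumptions for illustration only. The participants differ a lot from one another, but each one shifts by about the same amount between conditions.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical scores: position i in each list is the same participant.
cond_a = [2, 4, 6, 8, 10]
cond_b = [3, 6, 7, 10, 12]


def between_t(x, y):
    """t treating x and y as two separate groups of equal size."""
    n = len(x)
    pooled_variance = (variance(x) + variance(y)) / 2
    standard_error = sqrt(pooled_variance * 2 / n)
    return (mean(y) - mean(x)) / standard_error


def within_t(x, y):
    """t treating x and y as paired scores from the same participants."""
    # Subtracting within each participant removes stable differences
    # between participants from the error term.
    diffs = [b - a for a, b in zip(x, y)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


print("Analyzed as between participants:", round(between_t(cond_a, cond_b), 2))
print("Analyzed as within participants: ", round(within_t(cond_a, cond_b), 2))
```

The numerator (the mean difference) is the same in both analyses. Only the error term changes.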
IV. Why not run within participants designs?
A. Sometimes your manipulation has a permanent impact on the
participants. We'll call these sorts of effects carry-over effects
because the effect of previous manipulations carries over into new manipulations.
For example, if you want to use a surprise recall test in your experiment,
you can only surprise your participants once. Knowledge that a test
is coming carries over into subsequent conditions. Or, if you are
trying various therapies for depression, you can't return participants
to their original state after a particular therapy and try out another
one. The effect of the first therapy will carry over. (This
would prevent us from running the love experiment within participants.)
There are also some transient changes that you produce that still make
it impractical to do within participants designs in some situations.
For example, if you want to measure the effect of caffeine on performance
and you have levels of 0, 1, or 2 cups of coffee, you can run the order
0 -> 1 -> 2 cups, but not the order 2 -> 0 -> 1. If you can't run
all possible orders, you have to worry about order effects, which we'll
discuss in a moment. You could introduce a sufficient waiting period
between conditions (say one day for the coffee experiment), but that greatly
reduces the efficiency aspect of running within participants designs.
Consider a more subtle kind of change in the participants. In
the late fifties, a group of researchers were investigating the duration
of short term memory traces. They would have participants memorize
some information and then count backwards for a certain period of time
before trying to recall. What they discovered was that people's memories
gradually declined to virtually nothing after about 18 seconds. However,
a different group of researchers worried that what was actually happening
in these memory experiments was a phenomenon called build-up of proactive
interference (once you've crammed a lot of similar sounding stuff into
your head it gets harder to cram in more because the old items interfere).
To test this, they replicated the experiment, but each participant did
only one trial with a particular delay period. Obviously, they ended
up running a lot more participants. But, they found that memory traces
will last well past 18 seconds. So, a subtle change in the participants
was actually responsible for the effect. This kind of thing has to
be looked for when using within participants designs.
B. Order effects: Whenever the order of the stimuli has
an impact on the results of the experiment you have order effects.
1. Some of the most common kinds:
a. Practice effects: The more times you do something the
better you get at it. Say we did our coffee experiment and the DV
was performance on a pursuit rotor task (the participant tries to keep
a pointer on the same spot on a rapidly spinning disk). Since most
people have never done this before, we might expect their performance to
improve over time just from practice. If this is confounded with
a particular order (as in our 0 -> 1 -> 2 cups example) then you can't
tell if changes are due to the condition or the order (practice).
This can work for your hypothesis (if we expect caffeine to improve performance)
or against your hypothesis (if we expect caffeine to hurt performance).
b. Fatigue effects: The opposite of practice effects.
The longer participants do a task the more tired they get. So, they
don't have the ability/motivation to devote the same attention to tests
at the end of the experiment as they had at the beginning.
2. What do we do about order effects?
a. Counterbalancing: Simple definition: Every possible
ordering of conditions is presented to participants in the experiment.
This guarantees that every condition appears equally often in every position
in the sequence and that every condition is preceded and followed by every
other condition equally often. You can have full counterbalancing
where every order is shown, or partial counterbalancing where a subset
of orders is used, but the orders are carefully chosen to be representative.
For our purposes, when I say counterbalancing, I mean full counterbalancing.
Consider a simple experiment with two conditions, A and B.
This gives two possible orderings: A -> B and B -> A. We
can randomly assign participants to an order so that we get equal numbers
of participants in each order, and our experiment is then fully counterbalanced.
If there's an order effect, this should cancel it out. Any time
there's an effect of having A before B it's canceled by the effect of having
B before A.
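Here is a small worked version of that cancellation. All of the numbers are invented for illustration: condition A has a true score of 50, condition B has a true score of 60, and whichever condition goes second gets a constant 10-unit boost.

```python
# Hypothetical numbers, chosen only for illustration.
TRUE_A, TRUE_B = 50, 60   # scores with no order effect
SECOND_BOOST = 10         # constant bump for whichever condition goes second

# Order A -> B: B goes second and gets the boost.
a_then_b = {"A": TRUE_A, "B": TRUE_B + SECOND_BOOST}
# Order B -> A: A goes second and gets the boost.
b_then_a = {"A": TRUE_A + SECOND_BOOST, "B": TRUE_B}

# Only one order run: the order effect is confounded with the condition.
one_order_diff = a_then_b["B"] - a_then_b["A"]

# Both orders, equal numbers of participants in each: average across orders.
mean_a = (a_then_b["A"] + b_then_a["A"]) / 2
mean_b = (a_then_b["B"] + b_then_a["B"]) / 2
counterbalanced_diff = mean_b - mean_a

print("True difference:", TRUE_B - TRUE_A)
print("A -> B only:    ", one_order_diff)
print("Counterbalanced:", counterbalanced_diff)
```

Both condition means are pushed up by the same amount, so the difference between them comes out right.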
Counterbalancing eliminates shifts in sizes of effects provided one
very important assumption is met. Namely, that there are no differential
order effects. In other words, the change caused by having A before
B is equal to the change caused by having B before A. Look at the
example above. We assumed that going second always added a constant
effect (say 10 units), regardless of which condition was first and which
was second. But, what if having A first increased B by 10, but having
B first had no impact on A (the case if A was more tiring than B)?
Then, counterbalancing wouldn't cancel out the effects. You have
to carefully consider the context of your experiment to determine if effects
like this exist. If they do, you probably can't use a within participants
design.
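A quick sketch of why differential order effects are a problem. The function counterbalanced_difference and all of the numbers are made up for illustration (true scores of 50 for A and 60 for B). It averages over both orders, as counterbalancing does, and lets the two orders carry different boosts.

```python
def counterbalanced_difference(true_a, true_b, boost_b_after_a, boost_a_after_b):
    """Observed B - A difference, averaged over orders A -> B and B -> A."""
    # Order A -> B: A is measured first (unchanged), B is changed by following A.
    a_first, b_second = true_a, true_b + boost_b_after_a
    # Order B -> A: B is measured first (unchanged), A is changed by following B.
    b_first, a_second = true_b, true_a + boost_a_after_b
    mean_a = (a_first + a_second) / 2
    mean_b = (b_first + b_second) / 2
    return mean_b - mean_a


# Constant order effect: going second adds 10 either way.
constant = counterbalanced_difference(50, 60, 10, 10)
# Differential order effect: A first adds 10 to B, B first adds nothing to A.
differential = counterbalanced_difference(50, 60, 10, 0)

print("True difference:          ", 60 - 50)
print("Constant order effect:    ", constant)
print("Differential order effect:", differential)
```

With the constant effect the true difference is recovered. With the differential effect the observed difference is inflated even though the design is fully counterbalanced.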
Why don't people always do full counterbalancing? With two groups,
it's easy because you only have two orders. But, the more groups
you have the more orders you get. The rule to determine the number
of orders is:
Number of orders = N!
Where N is the number of conditions (! is factorial, meaning multiply
N * (N - 1) * (N - 2) * ... * 1). To work some of these for you:
3 conditions require 6 orders, 4 conditions require 24 orders, and 6 conditions
require 720 orders. Keeping in mind that you need at least one participant
per order for the thing to work out correctly, you can see how this can
defeat the purpose of doing a within participants design.
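If you want to check the counts or list the orders, Python's standard library will do it. This is just a sketch. The condition labels A, B, and C are placeholders.

```python
from itertools import permutations
from math import factorial

# Number of orders needed for full counterbalancing.
for n_conditions in (2, 3, 4, 6):
    print(n_conditions, "conditions require", factorial(n_conditions), "orders")

# Every possible order for three conditions.
orders = list(permutations("ABC"))
for order in orders:
    print(" -> ".join(order))
```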
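b. You can also use a Latin square to counterbalance if the number
of groups is too large for full counterbalancing. A Latin square uses
only as many orders as there are conditions, arranged so that every
condition appears once in every position.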
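One simple way to build a Latin square is to rotate the list of conditions one step for each new order. The sketch below does that. The function name latin_square is an assumption for illustration. Note that this simple rotation only balances position (each condition shows up once in each position). It does not balance which condition comes right before which, so it is weaker than full counterbalancing.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: one order (row) per condition."""
    n = len(conditions)
    # Row r starts at condition r and wraps around the list.
    return [[conditions[(row + pos) % n] for pos in range(n)] for row in range(n)]


square = latin_square(["A", "B", "C", "D"])
for order in square:
    print(" -> ".join(order))
```

With four conditions this uses 4 orders instead of 24. You would assign equal numbers of participants to each row.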
c. Randomization is your friend. To avoid all the headaches,
most people randomize condition orders (especially when you get over four
or five conditions). The more conditions you have and the more orders
you use (the more participants you run) the better off you'll be.
The procedure is simple: Make a new random order for each participant
in the experiment.
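The procedure can be sketched like this. The function name random_orders and the seed argument are assumptions for illustration. The seed is only there so you can reproduce the same set of orders later.

```python
import random


def random_orders(conditions, n_participants, seed=None):
    """Return one freshly shuffled condition order per participant."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_participants):
        order = list(conditions)  # copy so the original list is untouched
        rng.shuffle(order)
        orders.append(order)
    return orders


orders = random_orders(["A", "B", "C", "D", "E"], n_participants=10, seed=1)
for participant, order in enumerate(orders, start=1):
    print(f"Participant {participant}: {' -> '.join(order)}")
```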
V. Repeated measures designs. In a repeated measures design, you measure each
condition more than once for each participant. For example, we
have two conditions, and each participant produces three scores for each
condition (they get three trials in love and three trials not in love).
Trial: One observation in a repeated measures design. The Stroop
experiment worked like this because you had 50 trials each of words and
boxes.
Why do these? One way to improve statistical power is to collect
more than one observation per participant per condition. Imagine
collecting three trials from each participant in each condition.
Then, instead of just having one sample from the participant, you have
three. If you take some measure of central tendency from those samples
(say the mean) and use it as the participant's score, it will decrease
variability. How? Let's say on one trial the participants are
incompatible and don't fall in love. Right away their concentration
score will be affected because they aren't as in love as they're supposed
to be. If that's the only trial you get from the participant, then
you'll get a different score than you should. When you put that participant's
score into the pot with other participants’ scores, the variability will
be higher than it should be. By collecting multiple observations
you average out some of this chance variability and get more stable estimates
(closer to the true value).
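A small simulation makes the point. Everything here is assumed for illustration: every simulated participant has the same true score (5), each trial adds random noise, and we compare using one trial as the participant's score against using the mean of three trials.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)    # fixed seed so the run is repeatable
TRUE_SCORE = 5.0          # assumed true score for every participant
TRIAL_NOISE_SD = 2.0      # assumed chance variability on any one trial
N_PARTICIPANTS = 1000
TRIALS = 3

single_trial_scores = []
three_trial_means = []
for _ in range(N_PARTICIPANTS):
    trials = [rng.gauss(TRUE_SCORE, TRIAL_NOISE_SD) for _ in range(TRIALS)]
    single_trial_scores.append(trials[0])   # score based on one trial only
    three_trial_means.append(mean(trials))  # score based on the mean of three

print("Spread of one-trial scores:  ", round(stdev(single_trial_scores), 2))
print("Spread of three-trial means: ", round(stdev(three_trial_means), 2))
```

The scores based on three trials bunch more tightly around the true value, which is the reduction in variability described above.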
These are most common in perception type experiments where each trial
requires the participant to make a very simple judgment and you can collect
a lot of observations in a limited period of time. Keep in mind that
this can introduce a new set of order problems, but randomization can still
bail you out.
Continuing on the theme of within participants designs: The more
data you collect from each participant the better; or the more work each
participant does the less work you do.
VI. Analysis. For a two-group, within-participants
design, you will use a dependent samples t-test for the analysis.
The computations are complex enough that it's worth letting a computer
do it for you. When you finish, here's a sample of how to write up
the results:
“The data were analyzed using a dependent samples t-test. The
independent variable was amount of love, and the conditions were in love
and not in love. The dependent variable was concentration.
The mean concentration scores for people in love and not in love were 2.00
(0.71) and 4.80 (0.45) respectively. With alpha = .05, the two population
means were significantly different, t(8) = -7.57, estimated standard error
= 0.37.”
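If you're curious what the computer is doing, here is a minimal sketch of the dependent samples t-test using only Python's standard library. The scores are made up for illustration (they are not the data behind the write-up above), and the function name dependent_t is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical concentration scores: position i is the same participant.
in_love = [3, 2, 4, 3, 1, 2]
not_in_love = [5, 5, 6, 4, 4, 5]


def dependent_t(x, y):
    """Return (t, degrees of freedom, estimated standard error) for paired scores."""
    # Work with one difference score per participant.
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    t = mean(diffs) / standard_error
    return t, n - 1, standard_error


t, df, se = dependent_t(in_love, not_in_love)
print(f"t({df}) = {t:.2f}, estimated standard error = {se:.2f}")
```

The degrees of freedom are the number of participants minus one, because the test is run on one difference score per participant. If SciPy is installed, scipy.stats.ttest_rel computes the same statistic and also gives you a p-value.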
Back to Langston's Research Methods Page