So in yesterday's notes I provided two logistic regression models that predict accuracy given confidence and remember/know/guess judgment. I noted that I was presenting results of a global regression, rather than the average of individualized subject-level regressions. Unfortunately, when we look at those, they aren't so pretty -- look if you dare:
Why is this happening, you ask? Well, it's mainly due to a few outlier subjects with extraordinary remember and know coefficients. Subject 2's know coefficient is -2142 log odds, and Subject 19's remember coefficient is +1298 log odds. For reference, most of the other coefficients fall between -5 and 5.
So why do we have these outlier subjects? It comes down to strange variability in responding. For example, Subject 2, who had such a weird know coefficient, turns out to respond in a very unusual way. Let's look at what happens when he responds know:
Do you notice what he's doing? He rates his confidence as 100 (on a 0-100 scale) every time he responds know. He's also right every time he responds know. (Someone who misunderstood the instructions?) Obviously, this is going to throw off the prediction equations: when a predictor perfectly sorts correct from incorrect responses, we have what statisticians call complete separation, and the maximum-likelihood coefficient estimates don't converge; they just grow without bound, which is exactly the kind of absurd value we're seeing.
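To see why separation produces those runaway coefficients, here's a minimal sketch with made-up numbers (not the actual dissertation data): a handful of trials where the subject is correct exactly when confidence is high. A plain gradient-ascent fit of the logistic model never settles down; the slope just keeps climbing the longer you let it run.

```python
import numpy as np

# Hypothetical trials, NOT real data: accuracy is 1 exactly when
# confidence is at ceiling -- complete separation.
conf = np.array([0.2, 0.3, 0.4, 1.0, 1.0, 1.0])   # confidence, rescaled to 0-1
correct = np.array([0, 0, 0, 1, 1, 1])            # accuracy on each trial

def fit_logistic(x, y, steps, lr=0.5):
    """Fit accuracy ~ confidence by gradient ascent on the
    logistic log-likelihood; returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))  # predicted P(correct)
        b0 += lr * np.sum(y - p)                  # gradient step, intercept
        b1 += lr * np.sum((y - p) * x)            # gradient step, slope
    return b0, b1

# Under separation the likelihood has no maximum, so the slope
# estimate grows with every additional optimization step.
for steps in (100, 1000, 10000):
    b0, b1 = fit_logistic(conf, correct, steps)
    print(f"{steps:>6} steps: slope = {b1:.1f}")
```

Run it and the slope is larger at every checkpoint. A real fitting routine stops at some tolerance or iteration cap, so instead of "infinity" you get an arbitrary huge number like -2142 or +1298.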
So now the challenge becomes "come up with a rule to exclude outliers," if such a rule is needed. If you talk to open science types, they'll suggest that I should have come up with this rule before looking at the data. But this reveals an issue with such a strategy: This is the first time I've conducted this kind of research, and I didn't know I would be bumping into this type of issue (let alone what a rule for dealing with outliers should be).
Of course, the solution is transparency -- I'll have to say this in my dissertation, and perhaps use the experience of this study to craft an outlier rule for use in the future Experiment 3.
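For what a future pre-registered rule might look like, here's one hedged possibility (my sketch, not a settled choice): flag subject-level coefficients by their distance from the median in median-absolute-deviation units, since the outliers themselves can't inflate the MAD the way they inflate a standard deviation. The coefficient values below are invented stand-ins echoing the ones discussed above.

```python
import numpy as np

def mad_outliers(coefs, cutoff=3.5):
    """Flag values more than `cutoff` robust z-scores from the median.
    Robust z uses the median absolute deviation (MAD) instead of the
    standard deviation, so extreme values can't mask themselves."""
    coefs = np.asarray(coefs, dtype=float)
    med = np.median(coefs)
    mad = np.median(np.abs(coefs - med))
    robust_z = 0.6745 * (coefs - med) / mad   # 0.6745: MAD-to-SD scaling for normal data
    return np.abs(robust_z) > cutoff

# Hypothetical subject-level know coefficients: most between -5 and 5,
# plus two extreme subjects like the ones described above.
coefs = [1.2, -0.8, 3.1, -2.4, 0.5, -4.0, 2.2, -2142.0, 1298.0]
print(mad_outliers(coefs))   # only the last two are flagged
```

The cutoff of 3.5 is a common convention, not anything sacred; the important part is committing to whatever rule this ends up being before Experiment 3's data come in.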