My buddy (and favorite Chemicals Professional) Bennett reports he's interested in a fitness tracker. I've been using my Jawbone UP24 for about four months now, so it's a good time to write about it.
First, some background: I got burned by being an early adopter in this area about a year and a half ago. I went through six different copies of the Jawbone UP, v. 1, which was truly a horrible device. Once the pain of that experience faded, I tried some other trackers -- the Larklife (which I think has been discontinued) and the Fitbit Flex (which I wrote about here). But I finally got an UP24 and have been very happy with it since. Most tech writers seem to favor the Fitbit Force (assuming it isn't giving you horrible skin rashes), but I think the UP24 is an underappreciated piece of tech. Here's why.
Aesthetics and comfort. Let's get this out of the way first: I think the UP24 is one of the best-looking fitness trackers out there. The orange ("persimmon") is what I have, and the black is also a solid choice. If you go with the standard UP (to save $50), there are other nice colors available. These things look more like bracelets or fashion accessories than connected devices. They're also comfortable -- this is something you're buying to wear, literally, 24 hours a day (hence the name) -- so you can't overstate the importance of this issue. The only time the UP24 gets uncomfortable or weird is when I'm pulling my arm out from underneath a pillow in the middle of the night. It's super easy to put on, take off, and charge. (Bennett is a swimmer, though, and the device is not waterproof. Rather, it's described as "showerproof.")
Fitness tracking. Want to record a workout with your UP24? Press the band a few times and it goes into stopwatch mode. When you're done, press it again to turn it off. If your workout was a run, you're basically done. If you did something else that the UP24 can't sense (e.g., bike riding, cross training), you go into the app and tell it what you did. Distance, time, calories burned, and so on are all baked in, and you can edit the intensity of your workout to fine-tune what's reported. I mostly use my UP24 to measure running workouts, which is where the device especially excels. The pedometer is generally accurate at measuring distance, assuming I'm running at my usual slow pace. If I speed up or slow down from that pace, the estimates can get off (but never by more than half a mile or so). The biggest downside of the UP24, though, is that there's no screen on it, so you have no idea how far you've run while you're running. This means that before a workout, I have to map out my route and try to recall it during the run to make sure I cover the right distance.
Sleep tracking. One of the main reasons I got the UP24 was to track my sleep. This hasn't been as useful as I thought it would be (you can view my own public sleep data by clicking here, although the page seems to have broken, and I don't know why). Jawbone released a new UP Coffee app to help users discover correlations between their daily activity and sleep habits, but knowing what we know about statistics, this sort of thing requires a whole lot of data to be collected before the "insights" become accurate.
Interconnectivity. Honestly, this is what makes the Jawbone UP24 the winner for me. Jawbone has put more effort than any other fitness tracker maker into ensuring interconnectivity and interoperability between different "internet of things" devices. Here's what I mean. When I wake up, the living room lights turn on. When I hop on my scale, the data is sent directly to the UP app. Everything is simultaneously copied to my Google Drive, too. I could set up even more triggers, thanks to IFTTT (no other tracker supports this, to my knowledge), to text me when I hit a step limit, send an e-mail or a tweet if I've had a lazy day, or write a blog post if I report that I'm sad. I find this reassuring because it means that when it becomes time to try a new fitness tracker, I can take my data with me (perhaps with a bit of wrangling). Considering that personal fitness tracking has the potential to be a lifelong hobby, an eye toward archival and longevity is huge.
App. I like the Jawbone app a lot, too. I'm not going to get into it here, though. One warning: These guys seem to build for the iPhone first.
The bottom line: The Jawbone UP24 is a comfortable, fashion-conscious wristband that provides a (relatively) inexpensive way to start collecting personal data. I'm happy with mine and love taking it on runs, and am hugely grateful that the device connects to so many other devices and services that I love. When something comes out with more, and more accurate, sensors and a screen (like some of Samsung's new offerings, perhaps), it might be time to upgrade. But for now, I don't foresee taking this thing off for the rest of the year at least.
A lot of angst on the internet today: an article talking about ageism in Silicon Valley and other issues, and another talking about the traditional operations of so-called "revolutionary" media companies. I can't write about the topic as well as the professionals can, but what I can say is this: Enjoy the hype, the money, the drama while you still can, everyone. It's not going to last.
Young people today joke about daytime T.V. -- "Who watches this? What a ridiculous medium! All of it is so empty, so useless!" But every step these companies take brings them a little closer to daytime T.V. status. Eventually, humans will realize that the internet can generate an illusion of meaning that, when you look at it carefully, doesn't really mean all too much.
So I guess this message is to say: Those of you out there doing real, meaningful work, keep it up. Be genuine and do what you believe in. I might not know you, but I am proud of you for doing what you do.
I took an eight-mile jog around Forest Park today while I was thinking about these issues. Technically, it's spring break here at Washington University, so I've been taking some time to enjoy the warmer weather and think about some other projects, etc. (e.g., Skulving -- more about him one of these days); it's been a nice opportunity to clear my mind.
Back to your regularly scheduled research notes soon!
In a Facebook comment, Jason Finley suggested that I graph the logistic regression curves for each individual subject in order to identify outliers, etc., writing, "Thoughtful. Try also making a tiny plot for each subject and eyeballing the lot of them."
So I did that for 20 of the 64 subjects and here's what we get:
It's definitely a mess. Blue is remember, red is know, and green is guess (look here for a bit of background on the project). I'm staring at the plots trying to identify relationships and it isn't easy. Generally, the blue curves are highest to the right of the figures, suggesting that remember responses are more accurate than the other types when made at high confidence. And red is generally higher than green, too. But other than that, I'm at a loss.
The implication, then, is that there's a lot of variability within subjects in this procedure. My opinion is that an aggregated logistic regression equation, although valid for characterizing the overall relationship between confidence and accuracy as a function of remember, know, and guess, won't prove too useful on an individual level.
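For anyone wanting to reproduce the "tiny plot per subject" idea, here is a minimal sketch of fitting one logistic regression per subject and per judgment type, in plain Python with Newton-Raphson. It assumes trials are stored as (subject, judgment, confidence 0-100, correct 0/1) tuples; that layout, the helper names, and the example trials are all hypothetical, not the study's actual data or code. Fitting a separate intercept and slope for each judgment gives the same estimates as a single per-subject model with judgment dummies and the confidence x judgment interaction, because the likelihood splits cleanly by category.

```python
import math
from collections import defaultdict


def sigmoid(z):
    """Numerically stable inverse logit: converts log odds to a probability."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, iters=50):
    """Fit log odds = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # score (gradient) for the intercept
            g1 += (y - p) * x    # score for the slope
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        # Newton step: add (information matrix)^-1 times the score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fit_by_subject(trials):
    """trials: (subject, judgment, confidence, correct) tuples (hypothetical layout)."""
    groups = defaultdict(lambda: ([], []))
    for subject, judgment, conf, correct in trials:
        xs, ys = groups[(subject, judgment)]
        xs.append(conf)
        ys.append(correct)
    return {key: fit_logistic(xs, ys) for key, (xs, ys) in groups.items()}


# Hypothetical trials for illustration only (not the study's data).
trials = (
    [("s1", "remember", c, y) for c, y in
     [(20, 0), (40, 0), (60, 1), (80, 1), (100, 1), (30, 1), (70, 0), (90, 1)]]
    + [("s1", "know", c, y) for c, y in
       [(10, 0), (30, 1), (50, 0), (70, 1), (90, 1), (60, 0)]]
    + [("s2", "remember", c, y) for c, y in
       [(50, 1), (60, 0), (70, 1), (80, 0), (90, 1), (100, 1)]]
)

fits = fit_by_subject(trials)
for (subject, judgment), (b0, b1) in sorted(fits.items()):
    print(f"{subject} {judgment}: log odds = {b0:.3f} + {b1:.3f}c")
```

Each (b0, b1) pair can then be turned into one small curve per subject; cells where accuracy never varies will not converge to finite values, which is worth checking before plotting.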
Here's an interesting finding from this morning's fiddling. I correlated performance on a task in the Memory Lab with performance on a task elsewhere (sorry, I'm being deliberately ambiguous for the moment). The correlation was significant and sizable at r = .93. Of course, any good researcher knows to always plot the data, so here's how they look:
As you can see, the relationship looks somewhat logarithmic (or otherwise curvilinear). What causes that kind of relation? I'm pondering.
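A large r on its own doesn't rule out a curved relationship, which is why the plot matters. A minimal sketch, using made-up, exactly logarithmic data (not the Memory Lab data): Pearson's r against the raw predictor is already above .9, while correlating against log(x) straightens the relationship out completely. The log transform here is just one candidate for "somewhat logarithmic", not a claim about what drives the real data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical, perfectly logarithmic data (not the study's data).
xs = list(range(1, 11))
ys = [math.log(x) for x in xs]

r_raw = pearson_r(xs, ys)                          # linear in raw x
r_log = pearson_r([math.log(x) for x in xs], ys)   # linear in log(x)
print(f"r (raw x) = {r_raw:.3f}, r (log x) = {r_log:.3f}")
```

Comparing the two correlations (or the residuals from each fit) is one quick way to check whether a log-shaped relation describes the real data better than a straight line.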
So in yesterday's notes I provided two logistic regression models that predict accuracy given confidence and remember/know/guess judgment. I noted that I was presenting results of a global regression, rather than the average of individualized subject-level regressions. Unfortunately, when we look at those, they aren't so pretty -- look if you dare:
Why is this happening, you ask? Well, it's mainly due to a few outlier subjects with extraordinary remember and know coefficients. Subject 2's coefficient for know responses is -2142 log odds, and Subject 19's coefficient for remember responses is +1298 log odds. For reference, most of the other values are between -5 and 5.
So why do we have these outlier subjects? It comes down to strange variability in responding. Take Subject 2, who had such weird data for know responses. Let's look at what happens when he responds know:
Do you notice what he's doing? He rates his confidence as 100 (on a 0-100 scale) every time he responds know. He's also right every time he responds know. (Someone who misunderstood the instructions?) Obviously, this is going to throw off the prediction equations.
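One likely reason the coefficients blow up, rather than just drift, is what's usually called complete separation: if every know response is correct, the likelihood keeps improving as the fitted log odds for that category grows, so there is no finite maximum and the reported coefficient is roughly wherever the fitting routine stops. And because his know confidence is always 100, confidence can't be told apart from the know intercept for those trials. A minimal sketch of the first point, using a hypothetical stand-in of 12 correct know trials:

```python
import math


def log_likelihood(log_odds, outcomes):
    """Bernoulli log-likelihood when every trial shares the same log odds."""
    total = 0.0
    for y in outcomes:
        if y == 1:
            total -= math.log1p(math.exp(-log_odds))  # log(p)
        else:
            total -= math.log1p(math.exp(log_odds))   # log(1 - p)
    return total


# Hypothetical stand-in for Subject 2's know trials: confidence is always 100
# and every response is correct, so only one fitted log odds value matters.
know_outcomes = [1] * 12
for b in [1, 5, 10, 50]:
    print(b, log_likelihood(b, know_outcomes))
```

The log-likelihood creeps toward 0 but never reaches a peak, so bigger and bigger coefficients always look "better" to the fitting routine.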
So now the challenge becomes "come up with a rule to exclude outliers," if such a rule is needed. If you talk to open science types, they'll suggest that I should have come up with this rule before looking at the data. But this reveals an issue with such a strategy: This is the first time I've conducted this kind of research, and I didn't know I would be bumping into this type of issue (let alone what a rule for dealing with outliers should be).
Of course, the solution is transparency -- I'll have to say this in my dissertation, and perhaps use the experience of this study to craft an outlier rule for use in the future Experiment 3.
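As a starting point for that future rule, here is one hypothetical candidate that could be written down before seeing new data: flag any (subject, judgment) cell whose logistic slope can't be estimated sensibly, i.e., too few trials, confidence that never varies, or accuracy that never varies. The function name, the tuple layout, and the cutoff of 5 trials are all assumptions for illustration, not a rule from the study.

```python
from collections import defaultdict


def flag_degenerate_cells(trials, min_trials=5):
    """Flag (subject, judgment) cells where a per-subject logistic slope
    cannot be estimated sensibly. min_trials is a hypothetical cutoff."""
    cells = defaultdict(list)
    for subject, judgment, conf, correct in trials:
        cells[(subject, judgment)].append((conf, correct))
    flags = {}
    for key, rows in cells.items():
        reasons = []
        if len(rows) < min_trials:
            reasons.append("too few trials")
        if len({conf for conf, _ in rows}) == 1:
            reasons.append("constant confidence")
        if len({correct for _, correct in rows}) == 1:
            reasons.append("constant accuracy")
        if reasons:
            flags[key] = reasons
    return flags


# Hypothetical data: "s2" mimics the always-100, always-correct know pattern.
trials = [("s1", "remember", c, y) for c, y in
          [(20, 0), (40, 0), (60, 1), (80, 1), (100, 1), (30, 1)]]
trials += [("s2", "know", 100, 1)] * 6
print(flag_degenerate_cells(trials))
```

This only catches the simplest cases; a cell where accuracy is perfectly ordered by confidence (all errors below some cutoff, all hits above it) also separates, and would need an extra check.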
In lab meeting today we watched a 60 Minutes segment by Morley Safer and Suzanne St. Pierre called "Lenell Geter's In Jail." You can watch it by clicking here. Our lab has been interested in applying principles of cognitive psychology to education for a while now, but recently we've been getting into the issue of eyewitness testimony and the relation between confidence and accuracy in memory.
This amazing segment, which runs 25 minutes or so, contains just about every classic mistake you can make when sentencing a man to prison for life. It's got a protagonist and some impressive villains. I want to share some of the details, but it's really worth watching all the way to the end, over dinner or something.
Actually, let's see if I can embed it here:
(Note: The embed doesn't seem to be working, sorry. Click the link up there.)
Now that I have my full complement of subjects (N = 64), I'm able to re-run logistic regression analyses and investigate the degree to which confidence is predictive of accuracy as a function of remember, know, or guess. I'll present two regression models. The first is just the effects of confidence on accuracy as a function of remember, know, or guess:
The regression equations for the figure above (Model 1) are:
For remember, log odds = -1.125 + .031c, where c is confidence (0-100)
For know, log odds = -1.999 + .031c
For guess, log odds = -2.104 + .031c
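To read these equations as accuracies, the log odds can be passed through the inverse logit. A worked example using Model 1's remember equation at maximum confidence (c = 100):

```latex
p(\text{correct} \mid c) = \frac{1}{1 + e^{-(b_0 + b_1 c)}}
\qquad
b_0 + b_1 c = -1.125 + 0.031(100) = 1.975,
\qquad
p = \frac{1}{1 + e^{-1.975}} \approx 0.878
```

Because all three Model 1 equations share the .031 slope, the curves differ only in their intercepts, i.e., they are horizontal shifts of one another on the log-odds scale.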
And here's what the prediction curves look like when we throw in the interaction term (i.e., confidence x remember/know/guess). Let's call this Model 2.
For remember, log odds = -2.946 + .053c
For know, log odds = -1.895 + .029c
For guess, log odds = -1.043 + .005c
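To compare the two models numerically rather than by eye, a minimal sketch that plugs the coefficients above into the inverse logit and prints predicted accuracy at confidence 0, 50, and 100. The coefficients are copied from the equations in this post; the dictionary layout and function name are just illustrative choices.

```python
import math

# (intercept, slope per confidence point) from the equations above.
MODEL_1 = {"remember": (-1.125, 0.031), "know": (-1.999, 0.031), "guess": (-2.104, 0.031)}
MODEL_2 = {"remember": (-2.946, 0.053), "know": (-1.895, 0.029), "guess": (-1.043, 0.005)}


def predicted_accuracy(model, judgment, confidence):
    """Convert the model's log odds at a confidence level (0-100) to a probability."""
    b0, b1 = model[judgment]
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * confidence)))


for name, model in [("Model 1", MODEL_1), ("Model 2", MODEL_2)]:
    for judgment in ("remember", "know", "guess"):
        probs = [predicted_accuracy(model, judgment, c) for c in (0, 50, 100)]
        print(name, judgment, " ".join(f"{p:.3f}" for p in probs))
```

Under Model 2, predicted accuracy for guesses stays between roughly .26 and .37 across the whole confidence scale, while remember rises to about .91 at confidence 100.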
Because logistic regression is not my strong suit, I am researching the best way to evaluate these global models. Right now I am not convinced that the model with the interaction term is doing that much better predicting accuracy. One thing I'm reading about is how we can use signal detection theory to compare models. Here's one way of taking a look:
Model 1: HR = .805, FAR = .402, d' = 1.11
Model 2: HR = .778, FAR = .337, d' = 1.19
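For reference, d' here is the standard signal detection measure z(HR) - z(FAR), where z is the inverse of the standard normal CDF. A minimal sketch that reproduces the two values from the hit and false alarm rates above; how those rates were derived from the model predictions (e.g., which probability cutoff counts as "predicted correct") isn't specified here, so the code simply takes them as given.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate):
    """Signal detection sensitivity: d' = z(HR) - z(FAR)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)


print(round(d_prime(0.805, 0.402), 2))  # Model 1 → 1.11
print(round(d_prime(0.778, 0.337), 2))  # Model 2 → 1.19
```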
So as you can see, not a huge improvement in discrimination (prediction) going from Model 1 to Model 2.
Now, this is the global analysis. I suppose the next step is getting the subject-level equations.
What can we draw from all of this? A possible observation, which we get from Model 2, is that there may really be a difference in the predictiveness of confidence as a function of remember/know/guess. This is helpful because some of the analyses I presented last week suggested that there wasn't one. But what we show here is that a one-point increase in confidence is worth nearly twice as much, in terms of log odds (.053 vs. .029), when one is remembering as when one is knowing. And when people say they're guessing, they really are, regardless of confidence rating. They hover around what I think is chance (i.e., 33% accuracy). This is really neat.
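Slopes in log odds are easier to interpret as odds ratios. A minimal sketch using Model 2's slopes: exp(10 x slope) is the factor by which the odds of a correct response change for a 10-point increase in confidence. The 10-point step is an arbitrary choice for readability.

```python
import math

# Model 2 slopes (log odds per confidence point) from the equations above.
SLOPES = {"remember": 0.053, "know": 0.029, "guess": 0.005}

for judgment, b1 in SLOPES.items():
    # exp(10 * b1): multiplicative change in the odds of being correct
    # for a 10-point increase in confidence.
    print(f"{judgment}: odds x {math.exp(10 * b1):.2f} per 10 points")

print(f"remember/know slope ratio: {SLOPES['remember'] / SLOPES['know']:.2f}")
```

Ten points of confidence multiply the odds of being correct by about 1.70 for remember, 1.34 for know, and only 1.05 for guess; the remember-to-know slope ratio is about 1.83.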
Of course, these analyses are leaving out something rather important: "new" responses, because we never collect remember/know/guess judgments for "new" responses. We need to characterize those responses another day, but you could always turn to DeSoto and Roediger (2014) for one look at this issue.
Here's the data (and the formulae) used to make the figures above, if you're interested.