Adaptive Education Apps

I was recently reading ad copy for some educational software describing itself as having "adaptive technology". The App was nothing special: do a bunch of similar problems, repeat until you get 70% correct, then move on to the next. But it got me thinking. Adaptive is often presented as aspirational - at best we can demo an idea of what it might look like. But what could we do right now? Specifically, what practical changes could we make to the worksheet-grade-repeat model? Currently that system feels dead, scripted and mechanical. How do we make it feel as if the computer is watching you, and deciding how to proceed next?

Probably the main thing is to stop waiting for the "Check my Answer" button. The computer should watch each step. That's not even a clever idea; any adult would do the same - "are you sure 3+6 has a carry?" If a problem doesn't have several small steps, maybe we can rework it so it does. Letting them retry wrong answers works well here - it gives us a chance to count how many tries it took.
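
To make that concrete, here's a rough sketch of per-step checking with retries. The function names and the shape of a "problem" are placeholders I made up, not anything from a real App.

    def run_problem(steps, get_answer, give_feedback):
        """Walk through one multi-step problem, checking each step as it's entered.

        steps: the correct answer for each small step, in order.
        get_answer: asks the child for their entry on step i.
        give_feedback: reacts to a wrong entry ("are you sure 3+6 has a carry?").
        Returns how many wrong tries it took across the whole problem.
        """
        mistakes = 0
        for i, correct in enumerate(steps):
            while True:
                answer = get_answer(i)
                if answer == correct:
                    break              # straight on to the next step
                mistakes += 1          # count the try, then let them retry
                give_feedback(i, answer)
        return mistakes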

Many Apps avoid this. They might be lazy, but I think the real problem is that they're thinking of a test, especially for Apps required by schools. Tests have to meet the standards, with no extra help - they have to feel like tests. But an App is going to be at most 10% testing. The rest is teaching, and should be using every trick that a human would.

We can also time them, but that's not as useful. When you think of a computer timing something, you think of a stopwatch, possibly displayed on-screen. We won't be doing that. The main thing is an "are they stuck" estimate. Restart the timer after each thing they do. If it hits 20 seconds, they're officially stuck. Beyond that, computers are bad at using timers. They can't tell whether you stopped to pet the cat, and can't look at your face and decide you're deep in concentration and not stuck at all.
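
A minimal version of that check might look like the sketch below. The 20-second cutoff is the one from above; the class name and everything else are placeholders.

    import time

    class StuckTimer:
        """Restart the clock after every action; past the cutoff, call them stuck."""

        def __init__(self, cutoff_seconds=20):
            self.cutoff = cutoff_seconds
            self.last_action = time.monotonic()

        def touched(self):
            """Call this whenever the child does anything at all."""
            self.last_action = time.monotonic()

        def is_stuck(self):
            return time.monotonic() - self.last_action >= self.cutoff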

Just those two things produce a big pile of data, which can seem pretty scary. If you haven't considered it: suppose the child took 1 second to incorrectly place a 3 in the first spot, then 12 whole seconds to incorrectly place a 4 in the second spot. Suppose we also record the correct answers - they tried to put a 3 where a 4 goes, then a 4 where a 9 goes. Trying to use all that can be paralyzing. The secret is: don't bother. Record only as much as you need, and simplify it as much as you want. For mistakes, I usually have just a counter - they made X mistakes in this problem, so far. I use that to "grade" the answer, then throw away the mistake count. For timing I don't even write down how many times they "took too long".
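
As a sketch of how little we actually keep, here's the mistake counter turned into a grade. The cutoffs are invented for illustration; the point is that the count is all we store, and even that gets thrown away after grading.

    def grade(mistake_count):
        """Collapse one problem's messy history into a single grade."""
        if mistake_count == 0:
            return "perfect"
        if mistake_count <= 2:
            return "fine"
        return "needs more like it"    # made-up cutoffs; adjust to taste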

Using the data can be a formula with just a little depth, some randomness and some memory. Suppose we compute a 0 to 6 "Help Index" by giving +1 for each mistake, and -1 or -2 (randomly) for each correct choice. After each mistake, we roll a 6-sided die. If it's under the Help Index, we do something. The result feels very lifelike. It's not completely predictable, but we tend to offer less help if they've been doing well.
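
Here's what that formula might look like in code. It's only a sketch of the rule as described; the class and method names are mine.

    import random

    class HelpIndex:
        """A 0 to 6 running score: up one per mistake, down one or two per correct choice."""

        def __init__(self):
            self.value = 0

        def correct(self):
            self.value = max(0, self.value - random.choice((1, 2)))

        def mistake(self):
            self.value = min(6, self.value + 1)

        def should_help(self):
            """After a mistake, roll a 6-sided die; help if it comes up under the index."""
            return random.randint(1, 6) < self.value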

Adding a small amount of controlled randomness to feel more human is an old trick. Another example: our help-when-stuck could wait 8 seconds, then have a 10% cumulative chance to help each second. It will feel like someone who isn't that good at timing things, sometimes just plain jumps the gun, and sometimes gets distracted. In general, a program acting differently than normal is interpreted as choosing to do it, for a reason.
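
One way to read "a 10% cumulative chance each second" is a fresh 10% roll every second after the eighth, so the odds of having helped keep climbing while the exact moment stays unpredictable. That reading is an assumption; the sketch below just simulates it.

    import random

    def seconds_until_help(wait=8, chance=0.10):
        """After `wait` quiet seconds, roll `chance` once per further second until help fires."""
        t = wait
        while True:
            t += 1                       # another second goes by
            if random.random() < chance:
                return t                 # averages about 18 seconds with these numbers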

A running total, like my Help Index, is a cheap way to make it feel as if the computer is learning and remembering. We can make the "are you stuck?" check better by folding in the Help Index. Now the computer's longer and shorter help times feel as if it's gaining or losing confidence in our skills.
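
One arbitrary way to fold the index into the stuck check is to shrink the wait as the index climbs. The numbers below are placeholders.

    def stuck_cutoff(help_index, base=20, per_point=2):
        """Shorten the "officially stuck" wait as the Help Index rises.
        With these made-up numbers, index 0 waits 20 seconds and index 6 waits 8."""
        return base - per_point * help_index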

Making these can be an art. The formula has to be complicated enough that users don't notice it's only a formula. But complexity makes checking for silly results more difficult. The randomness needs to be limited, or else it will feel like just boring old random. You also need to guard against terrible luck - even an 80% chance of acting could leave us not acting until the 10th mistake. We can add little hacks to our formula to prevent the worst of these. And a great thing is that close-enough is good-enough. If our equations result in some funny calls and odd runs, that's no worse than what a human sometimes does.
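
A typical little hack, as a sketch: keep the random roll, but force the action anyway after a few failed rolls in a row. The limit of 3 is invented.

    import random

    class GuardedHelp:
        """Random help with a safety net: never let more than `limit` rolls fail in a row."""

        def __init__(self, limit=3):
            self.limit = limit
            self.misses = 0

        def should_help(self, chance):
            if random.random() < chance or self.misses >= self.limit:
                self.misses = 0
                return True
            self.misses += 1
            return False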

Now that we have a plan for when to do something, we need a list of what to do. The most obvious is increasing difficulty. This is another place where you can be swamped with data. A trick is to arbitrarily divide problems into a handful of difficulties. If we're adding single-digit numbers, we can break it into sums of 3-5, 4-8, and 7-10. They overlap, since why not.
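
In code, those buckets can be as dumb as a list of sum ranges, with problems drawn at random from whichever bucket we're in. The boundaries below are just the made-up ones from above.

    import random

    # Overlapping difficulty buckets for single-digit addition, keyed by the sum.
    DIFFICULTIES = [(3, 5), (4, 8), (7, 10)]

    def random_problem(level):
        """Pick any single-digit pair whose sum lands in the chosen bucket."""
        low, high = DIFFICULTIES[level]
        pairs = [(a, b) for a in range(1, 10) for b in range(1, 10)
                 if low <= a + b <= high]
        return random.choice(pairs)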

Sometimes you have alternate categories, and aren't sure of the order. Don't worry - just pick one. Is sorting by size harder than by color? Is sorting 8 things into 2 bins harder than sorting 5 things into 3 bins? Check the academic research, but then just guess. Pick any order that seems reasonable. For sums, I think 2+4 is harder than 6+2 (since you're "counting up" by the second number), and I use that to create more categories. But is 2+4 harder than 13+2? Again, don't worry - after some research and thought, pick any reasonable-seeming order.

Having only a few difficulties has natural advantages. It's easier to make a formula for when to move up to the next one. It's also easy to "stay in" a difficulty without being obvious about it. You don't need to repeat a problem - you can just randomly select more from the category. And users can clearly see the difficulty jumping up - we can see the computer "adapting" to our success by giving us new types of problems.
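
A sketch of that: "staying in" is just drawing another problem from the same bucket, and the move-up rule below (three clean problems in a row) is something I'm inventing for illustration.

    class LevelPicker:
        """Stay in a difficulty by sampling more from it; move up after a clean streak."""

        def __init__(self, top_level=2, streak_needed=3):
            self.level = 0
            self.top = top_level
            self.streak_needed = streak_needed
            self.streak = 0

        def finished_problem(self, mistakes):
            self.streak = self.streak + 1 if mistakes == 0 else 0
            if self.streak >= self.streak_needed and self.level < self.top:
                self.level += 1      # new types of problems start showing up
                self.streak = 0
            return self.level        # feed this to something like random_problem(level)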

The other useful thing to play with is scaffolding. This is a broad term for making the problem easier to solve - as if we're erecting a scaffold next to it. Suppose we're counting how many blocks there are. Heavy scaffolding would place 1, 2, 3 ... on each block as you tap it. Lighter would speak the digit, but only mark the block with a dot. Even lighter would only place a dot. And, of course, with no scaffolding you perform the count yourself and are on your own to count each block exactly once.
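
The block-counting example might be sketched as a handful of scaffold levels and one tap handler. The `place_label` and `speak` hooks stand in for whatever the real App would use.

    from enum import Enum

    class CountScaffold(Enum):
        NUMBER_EACH = 3     # heavy: stamp 1, 2, 3 ... on each block as it's tapped
        SPEAK_AND_DOT = 2   # lighter: say the digit aloud, but only mark a dot
        DOT_ONLY = 1        # lighter still: just mark which blocks were tapped
        NONE = 0            # no help: keep your own count, touch each block once

    def on_block_tapped(scaffold, count_so_far, place_label, speak):
        """Apply the current scaffold when the child taps the next block."""
        n = count_so_far + 1
        if scaffold is CountScaffold.NUMBER_EACH:
            place_label(str(n))
        elif scaffold is CountScaffold.SPEAK_AND_DOT:
            speak(str(n))
            place_label("•")
        elif scaffold is CountScaffold.DOT_ONLY:
            place_label("•")
        return n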

Scaffolding could mean using multiple choice, or forcing the correct order (for example, adding the 1's place first). We could force everything - like an interactive demonstration. We can add color-coding, or change from all-at-once to one-at-a-time. We could auto-solve subparts, so you only have to choose the correct order. The best ones are simplifications teachers actually use. Adding the scaffolding is like saying "remember how I showed you to imagine it?"

Together with our formulas, we might have the taking-too-long check add a color-coding scaffold, or have the numbering scaffold pop up. If the Help Index goes above 4, the next problems might use multiple choice.
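
Wiring that up can be a few lines of glue. The threshold of 4 is the one from above; the flag names and the idea of a scaffold set are placeholders.

    def adjust_scaffolds(scaffolds, stuck, help_index):
        """Turn scaffolds on in response to our two signals.
        `scaffolds` is just a set of flag names standing in for the real thing."""
        if stuck:
            scaffolds.add("color_coding")      # taking too long: color-code the columns
        if help_index > 4:
            scaffolds.add("multiple_choice")   # struggling overall: next problems get choices
        return scaffolds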

But things are getting complicated again. We now need rules for when to jump up or down a difficulty category, and more rules for changing scaffolding. As before, feel free to grossly simplify. Suppose we want color-coding to go away as they start doing well. When they jump up a difficulty level, should we add it back? Think about it a little, then guess (I think it feels nicer not to - it feels like a vote of confidence if we wait for them to get one wrong).
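
For what it's worth, that guess can be encoded in a few lines. This is only one way to phrase the rule, not the way to do it.

    class ColorCodingRule:
        """Drop color-coding once they're doing well; after a level jump,
        hold off on bringing it back until they actually get one wrong."""

        def __init__(self):
            self.enabled = True

        def doing_well(self):      # e.g. a streak of clean problems
            self.enabled = False

        def level_jumped(self):
            pass                   # deliberately nothing: the vote of confidence

        def made_mistake(self):
            self.enabled = True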

A funny thing: many Apps have difficulty levels and scaffolding, but only per exercise. They have one problem set that uses color-coding, the next has it removed, and the third uses slightly larger numbers. I'm merely saying to combine them into one big exercise. I think many Apps want to have short problem sets - a minute or two to finish. The students get more breaks, the creators don't need crazy formulas like I use, and the problem sets are easier to explain.

Some of the other obvious things we could do, I don't think work. Most hints - like pop-up text or a demo of what to move where - are simply confusing. They mentally jerk you out of the exercise. Being forced to watch videos is annoying, especially having to rewatch the entire explanation as punishment for failure. Short hint videos are the worst of both.

Something that seems perfect for the computer, but doesn't work, is tracking progress on particular problems. For example, there are 81 single-digit addition problems using 1-9. We could record your results for each and retest the ones you get wrong. But that's terrible. Do we really want to ask them all 81 questions on simple adding? Really, they're learning a single skill: start at the first number and finger-count up by the second. If they can solve 8+5 by counting "9, 10, 11, 12, 13", and a dozen more like it, they have the idea.

The only time we'd want to track each specific problem is for true memorization. For example, sight words - words you can't sound out and need to memorize. "one", "why", "our" and so on.

A similar unhelpful idea is remembering what we were bad at in previous problems. Say we're on 2-digit addition and the computer remembers we were bad at 7+5. But if we're here, how bad can we be? And that was a while ago, anyway. It's also not clear how to do things differently - do we use 7+5 more, or less? And users probably won't even notice - it won't feel any more adaptive.

We could try remembering more general things, but that's also rough. Maybe they seemed to learn single-digit addition faster using car pictures, so now we start with cars for all addition problems. But those things are hard to guess - they may have learned despite the car pictures. It's like when a computer recommends books you might like, which you clearly will not. It's better to put those options in a menu.

Looking for specific mistakes tends to have the same problem. If they put a 6 where a 7 should have gone, maybe they forgot the carry, or maybe they just added wrong, or they put it in the wrong spot. A noticeable percentage of the time we'll guess wrong and give completely inappropriate help.

It seems as if we could use Artificial Intelligence. It turns out those tricks for the equations are AI - at least, game AI. They're well-known ways to make a computer duck or monster feel like a real, thinking creature. Non-game, modern AI is no good for us at all. We'd need millions of examples of our App being used by real students, in every kind of situation, plus their end-of-year test results, so the computer could learn what worked.

Finally, now that we can make it feel adaptive, what does that get us? I think, a lot. The possibility of feedback after each step is pretty good. Really, anyone should be doing that. But the real advantage is flow. Whenever we finish a problem set, go back to home base, and start another, it feels like a new problem, requiring a fresh solution. On the other hand, if a problem with larger numbers just pops up, obviously it should be solved the same way as the one they just did. When the next problem suddenly removes scaffolding, which is added back after a few mistakes, we're strongly communicating that it's the same problem either way.

One of the main teaching tricks is connecting everything with a short step to something they already know. Having each step take place in a single problem set is a good way to emphasize that connectedness.

Adaptive problem sets are also more interesting. The computer has something in store for us. Apps are always trying to make the problems more interesting. Many use something external, like rewards. Obviously, if we can, we should make the actual problems themselves more interesting. Not to sound too sappy, but we should show kids that learning, for the sake of learning, is fun.