A useful flashcard application would allow users to set different goals for different cards, control how cards are prioritized during review sessions, and write their own custom scheduling and prioritizing algorithms. Furthermore, it would be very useful to design the application from the start to support experimental data generation and collection, if the user agrees to it.

In a nutshell

  • Allow defining multiple forgetting curve models
    • In theory, this allows for greater accuracy
  • Allow user to select preferred forgetting curve model
  • Allow defining multiple scheduling algorithms
  • Allow cards to be assigned to different scheduling algorithms
    • Allows user to have different goals for different cards
  • Allow user to sort reviews by multiple sorting algorithms
    • Allow for both ascending and descending sorting for each algorithm
  • Allow user to define custom sorting algorithms
    • Allows user to have great control over how cards are prioritized
  • Implement methods for generating experimental data that the user can use to calibrate forgetting curve models
    • Allow user to optionally export/import shared experimental data

Discussion

Choosing a Memory Model

The often-copied SM2 algorithm makes no predictions about the retention rate for any card at any time, stores no parameters for any forgetting curve equation, and does not seem to be derived from any memory model in particular (at least not from what I can see). It also makes some questionable (in my opinion) assumptions and requires the user to expend mental energy attempting to rate the ease of his own recall process.

But it is easy to implement and works well enough as long as your goal is high retention at all times for all cards.

The model of the forgetting curve most associated with SuperMemo (and perhaps, by extension, SM2) is R = e^(-t/s), but this sort of simple model has not been found to fit the existing data well.

In SM2-based applications, one of the few practical uses of this equation is creating an interval modifier with the goal of hitting a specific point on the forgetting curve during review.

The equation for this interval modifier is m = ln(desired R) / ln(current R), which is derived from the above equation as follows.

R = recall probability
m = interval modifier
t = current review interval
s = storage strength (time until R = 1/e)

Let the new interval = t·m. Then:

current R: Rc = e^(-t/s)
desired R: Rd = e^(-tm/s) = e^(m·(-t/s))

Taking logs: ln(Rc) = -t/s and ln(Rd) = m·(-t/s), so ln(Rd)/m = -t/s = ln(Rc).

Therefore m = ln(Rd)/ln(Rc), which also equals log(Rd)/log(Rc).

Unfortunately, as Rd and Rc grow farther apart, m quickly becomes unreasonably large. This simple exponential model may work well enough, but only in a narrow range of R values.
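To make the blow-up concrete, here is a minimal sketch (plain Python, not tied to any particular application) of the modifier for a few current/desired retention pairs:

```python
import math

def interval_modifier(desired_r: float, current_r: float) -> float:
    """m = ln(desired R) / ln(current R), from the simple exponential model above."""
    return math.log(desired_r) / math.log(current_r)

# The modifier stays modest while Rd and Rc are close together...
print(interval_modifier(0.85, 0.90))  # ~1.54
print(interval_modifier(0.80, 0.90))  # ~2.12
# ...but grows quickly as they drift apart.
print(interval_modifier(0.50, 0.95))  # ~13.5
print(interval_modifier(0.50, 0.99))  # ~69
```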

There are several other mathematical models of the forgetting curve that have been shown to provide a good fit to the available data in the peer-reviewed literature.

It seems to me that estimating the shape of a card’s current forgetting curve can be a useful part of scheduling that card for future review. And to do that, I think we should experiment with models that actually fit the available data.

Multiple Cards, Multiple Goals

In a useful SRS flashcard application, the user should be able to assign each card to one of several goals. Here's a list of possible goals (a rough data sketch follows the list).

As a shorthand, I’ll use “retention” and “retention rate” to describe the probability of retention.

  1. Speed: Attempt to present cards in a way that helps the user build speed of recall. This would probably be similar to the goal of fluency, but with speed cards presented multiple times and most likely in the context of a separate speed drill exercise rather than in the context of regular flashcard review.
  2. Fluency (Drilling): The card is scheduled for daily or near-daily review with the goal of making recall of the information easy, bordering on automatic; i.e., recall becomes fast and efficient enough that, eventually, it has little impact on working memory.
  3. High Retention (Juggling): Similar to SM2, the card is scheduled in an attempt to maintain retention probability above a set level, while using the maximum possible intervals between repetitions.
  4. High Efficiency (Getting the best bang for your buck): Card is scheduled in an attempt to maximize the ratio of (decrease in decay rate) / (workload of review). This might be effectively the same as (average recall) / (workload of review).
  5. No Scheduling: Instead of scheduling a review, simply estimate the parameters of the selected forgetting curve so that the retention rate may be estimated at any time. We would then rely on a prioritizing function to select these cards for review.
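As a rough sketch of what per-card goals could look like in data (the names here are illustrative assumptions, not an existing API):

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional

class Goal(Enum):
    SPEED = auto()
    FLUENCY = auto()
    HIGH_RETENTION = auto()
    HIGH_EFFICIENCY = auto()
    NO_SCHEDULING = auto()

@dataclass
class Card:
    card_id: int
    goal: Goal = Goal.HIGH_RETENTION            # each card carries its own goal
    last_review: Optional[float] = None         # timestamp of the most recent review
    scheduled_interval: Optional[float] = None  # seconds until due; None for unscheduled cards
```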

Allowing users to define their own scheduling functions
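One way to make this concrete is to treat a scheduling function as a plain callable that the application looks up by name. This is only a sketch of the idea, building on the Card fields above; the signature and registry are assumptions, not a description of any existing application.

```python
from typing import Callable, Dict, Optional

# A scheduling function maps (card, review grade, current time) to the card's
# next due time, or None to leave the card unscheduled.
SchedulerFn = Callable[[Card, int, float], Optional[float]]

SCHEDULERS: Dict[str, SchedulerFn] = {}

def scheduler(name: str):
    """Register a user-defined scheduling function under a name."""
    def register(fn: SchedulerFn) -> SchedulerFn:
        SCHEDULERS[name] = fn
        return fn
    return register

@scheduler("fluency")
def fluency_schedule(card: Card, grade: int, now: float) -> float:
    # Daily review, regardless of grade, until the card's goal changes.
    return now + 24 * 3600

@scheduler("no_scheduling")
def no_schedule(card: Card, grade: int, now: float) -> Optional[float]:
    # Estimate-only: leave selection entirely to the prioritizing function.
    return None
```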

Prioritizing/Sorting Functions

Falling behind and creating a pile of overdue cards is a well-known fact of user behavior. Any good flashcard application should plan for this with some reasonable prioritizing functions. Here are some ideas for how to prioritize cards.

  1. Prioritize Need: Estimate the retention rates of all cards and present the cards with the lowest estimated retention rates.
  2. Prioritize Ripeness: Similar to prioritizing by need, prioritize by descending ratio of (time since last review) / (length of scheduled interval). I’ll refer to this as the card’s ripeness because I think it’s an intuitive analogy (a small sketch follows this list).
    • To handle cards without a schedule, their scheduled interval could be set to 1. This way, they would always have a ripeness of 1 and would always be selected after cards that are past their scheduled due date.
    • A secondary sorting of cards by algorithm priority would allow perfectly ripe cards (with a ripeness of 1) to be presented before unscheduled cards, but if scheduling is to the nearest second or minute instead of to the nearest day, this is unlikely to be an issue.
  3. Prioritize by Time Since Last Review: OK, by itself, this might seem to be a strange option. Here, you’d just review the cards you hadn’t reviewed in the longest time. You might think no one would choose this option, but you’ll see why I include it in a moment.
  4. Prioritize by Algorithm: Would allow you to prioritize by which algorithm was used to schedule the card’s due date. This would be useful for ensuring that speed drills happen before fluency drills and that high retention cards come before high efficiency cards. On its own, this may not be a highly useful sorting function, but when combined with other options, it could be very helpful.
    • Anki has something like this built in: Learning cards are always presented before review cards, and New cards can be placed before or after review cards.
      • I’ll ignore new cards and focus only on review cards in this article.

Combining Prioritizing Functions

A user’s priorities might best be served by combining prioritizing functions in “Sort by this, then this, then that” fashion.

E.g.: Prioritize by Algorithm, then Ripeness, then Time Since Last Review, then by Need.

This would cause all speed drills to come first, then fluency cards, then high-retention cards, then high-efficiency cards, and finally unscheduled cards. This order would probably fit the priority that most people would desire.

Within each algorithm, you would first see the ripest cards. Where cards are of equal ripeness, you would first see those that have gone longest without a review. You might decide that it’s more efficient to go for cards with long intervals first. Or you might decide that the shorter-interval cards are a higher priority and go with a descending sort here.

Finally, any cards that use the same algorithm, have the same ripeness, and the same time since last review, would be sorted based on their estimated retention rate.
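Python’s tuple comparison makes this kind of chained sort cheap to express. Here is a minimal sketch of the example order above (algorithm, then ripeness, then time since last review, then need), reusing the earlier Card and ripeness sketches; estimated_retention() is a hypothetical helper, stubbed here with the simple exponential curve:

```python
import math

def estimated_retention(card: Card, now: float) -> float:
    # Stand-in only: the simple exponential curve with the scheduled interval as s.
    # A real implementation would use whichever forgetting-curve model the card is assigned.
    return math.exp(-ripeness(card, now))

# Lower rank = reviewed earlier: speed drills, then fluency, then high retention, ...
ALGORITHM_RANK = {Goal.SPEED: 0, Goal.FLUENCY: 1, Goal.HIGH_RETENTION: 2,
                  Goal.HIGH_EFFICIENCY: 3, Goal.NO_SCHEDULING: 4}

def review_order(cards: list[Card], now: float) -> list[Card]:
    """Sort by algorithm, then ripeness, then time since last review, then need."""
    def sort_key(card: Card):
        time_since_review = now - card.last_review if card.last_review is not None else 0.0
        return (
            ALGORITHM_RANK[card.goal],        # ascending: speed drills come first
            -ripeness(card, now),             # descending: ripest cards first
            -time_since_review,               # descending: longest-unreviewed first
            estimated_retention(card, now),   # ascending: lowest estimated retention first
        )
    return sorted(cards, key=sort_key)
```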

Goals Change Over Time

In addition to allowing the user to set (and change) the goal for each card, it could also be highly useful to allow the user to automate the transition from one goal to the next. For example, after 7 consecutive days of successful fluency review, a card’s goal could automatically switch to high retention until it has been remembered for 7 consecutive sessions, then its goal might automatically switch to prioritize need and remain so until manually changed by the user.
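A rule like that could be stored as simple data. The sketch below is illustrative only (the field names, and mapping “prioritize need” onto the no-scheduling goal, are assumptions):

```python
from dataclasses import dataclass

@dataclass
class GoalTransition:
    from_goal: Goal
    to_goal: Goal
    after_consecutive_successes: int

# The example from the paragraph above: fluency -> high retention -> unscheduled/prioritize-need.
TRANSITIONS = [
    GoalTransition(Goal.FLUENCY, Goal.HIGH_RETENTION, after_consecutive_successes=7),
    GoalTransition(Goal.HIGH_RETENTION, Goal.NO_SCHEDULING, after_consecutive_successes=7),
]

def maybe_transition(card: Card, consecutive_successes: int) -> None:
    """Advance the card's goal when its success streak crosses the configured threshold."""
    for rule in TRANSITIONS:
        if card.goal == rule.from_goal and consecutive_successes >= rule.after_consecutive_successes:
            card.goal = rule.to_goal
            return
```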

Collecting Experimental Data

It’s possible to grab data from usage statistics and perform a sort of natural experiment to try to fit a forgetting curve equation to the user’s data. While this is possible, there are many potential problems, such as cards of wildly varying initial difficulty and spotty coverage of points outside the narrow 80-90% range of the forgetting curve.

For experimental purposes, it’s best to design the experiment in a way that minimizes such variations. A very useful feature of a flashcard application would be some way of implementing a somewhat controlled experiment with the user (with his consent). The data from this experiment could be used by the user himself and optionally shared (again with permission) with other users. The shared data pool could be made available to anyone who wants to experiment with it and could be used for generating reasonable default settings.

A Design Starting Point

As a simple starting point for designing such an experiment, let’s assume that all experimental cards are vocabulary words from an obscure language (such as Navajo). Hopefully, this would give them all about the same initial difficulty. Then cards are randomly scheduled for 1 hour, 8 hours, 1 day, 3 days, 7 days, or 14 days later. This series of delays would give us several points on the forgetting curve and these data points could be used to fit an equation.

When next reviewed, the cards are re-presented after a few minutes if failed and then, once remembered, randomly scheduled to one of the same series of intervals times 2.5 (so 2.5 hours, 20 hours, 2.5 days, etc.). This would go on indefinitely, with experimental cards being rescheduled for random.choice([1,8,24,72,168,336]) * 2.5^(number of study sessions) hours.
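In code, the rescheduling rule described above might look like this (base delays in hours, matching the list earlier in this section):

```python
import random

BASE_DELAYS_HOURS = [1, 8, 24, 72, 168, 336]  # 1 h, 8 h, 1 d, 3 d, 7 d, 14 d

def next_experimental_delay(study_sessions_completed: int) -> float:
    """Pick a random base delay, scaled by 2.5 for each completed study session."""
    return random.choice(BASE_DELAYS_HOURS) * 2.5 ** study_sessions_completed

# First session draws from the base delays; the second from the same choices times 2.5; and so on.
```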

This would allow us to not only fit an equation to subsequent forgetting curves, but also to get an idea of how review at 95% recall differs from review at 50% recall and how that possibly changes over time.

Background

I’ve been an Anki user for a very long time, and in that time, I’ve come to know the details of its algorithm fairly well. Anki uses a version of the SuperMemo 2 algorithm (SM2), but with some tweaks such as learning steps, interval modifiers, fewer grading options, and the ability to set a post-lapse interval to a percentage of the last interval.

These tweaks improve upon SM2, but the basic strategy of the algorithm remains: target a subjectively ‘good’ range of the forgetting curve (80-95% ???) and try to schedule reviews for when cards are in that range.

This isn’t necessarily a bad strategy, but it certainly isn’t the only possible approach, and it may not be the best approach for everything.

I went looking for novel approaches: projects where designers had started over, unencumbered by the design decisions of SM2, and taken a first-principles approach to the problem. I didn’t find much. I did, however, find a lot of people who don’t understand that SM2 makes certain assumptions and design decisions that do not necessarily follow from the existence of a forgetting curve.

A common misconception seems to be that the forgetting curve somehow implies how subsequent forgetting curves change due to review… it most certainly does not. There are several equations that can fit the shape of a single forgetting curve, but I don’t know of any serious models of how review at any point on one curve affects the decay rate of the subsequent forgetting curve.

Anyway, thinking about some of the basic assumptions of SM2, its design decisions, and my years of experience using Anki, I came up with the ideas above.