---
title: JLDrill's Scheduling Strategy
in_menu: true
sort_info: 85
---

The Goal of JLDrill
===================

JLDrill has two types of scheduling strategies.  The first is for
short term acquisition of new material.  In this mode the item is
repeatedly shown to the user until the user can remember it correctly
a number of times in a row.  The second mode is for long term review.
In this mode JLDrill uses spaced repetition to occasionally review an
item with the user over days, weeks or months.

The goal of JLDrill is to maximize the learning rate of items
using a spaced repetition algorithm.  In the following two sections
I will describe what I mean by that.

What is Spaced Repetition?
--------------------------

Spaced repetition is the presentation of material separated by spaces
of increasingly large duration.  These spaces are measured in terms of
minutes, hours, days, and months.

Initially a user sees a new item and tries to remember it.  If the
user can remember the item, the system will schedule the item
for review at a later time.  After some time passes (a space)
the user reviews the item again.  If they get it right again,
the system schedules a new review at an even later time.
Each time the user remembers the item, the space between reviews
gets longer.  But if the user forgets the item, then the system
starts again with a very short space between reviews.
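As a rough illustration of the loop just described, here is a minimal sketch of how the next space between reviews could be chosen. It is a generic example, not JLDrill's scheduler. The function name `next_space`, the growth factor of 2 and the 1-day starting space are all assumptions chosen for illustration.

```python
def next_space(current_space_days: float, remembered: bool,
               growth: float = 2.0, initial_space_days: float = 1.0) -> float:
    """Return the space before the next review (generic illustration only)."""
    if remembered:
        # Remembered: the space between reviews gets longer.
        return current_space_days * growth
    # Forgotten: start again with a very short space.
    return initial_space_days


space = 1.0
for outcome in (True, True, True, False, True):
    space = next_space(space, outcome)
    print(outcome, space)
```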

The technique of spaced repetition is based on the concept of a
"forgetting curve".  When someone memorizes something, immediately
after memorizing it the chance of remembering is very high.  As
time passes, though, the chance of remembering the item falls.  The
curve that describes how quickly people forget something is called the
forgetting curve.  The first time you see something, the curve is very steep and
you are likely to forget it after only a short time.

Each time you remember an item, though, the speed of forgetting it
slows down.  That is, the forgetting curve becomes less steep.
Ideally we want a very shallow forgetting curve so that even if we
don't see an item for several months, we are still likely to remember
it.

The intent of a spaced repetition algorithm is to schedule reviews of
an item so that the forgetting curve becomes less steep.  But we also
want to minimize the number of reviews so that we don't waste time
reviewing something we already know.

Learning Rate
-------------

Your ability to remember an item improves every time you correctly
remember it.  But it takes time to remember an item.  Every time you
review an item it might take you 10 or 20 seconds.  A database of
5,000 items (a reasonable amount for learning a language) would take
you roughly 14 to 28 hours to review even once.  Obviously we can't
review every item all the time.  We want to pick and choose the items
that require review the most.

Since the forgetting curve gets less steep every time you remember an
item, you can increase the amount of time between reviews.  Maybe the
first time you wait 1 day.  The second time, 2 days.  Each time you
get it right, you double the time.  In this way you would be able to
go a whole month without reviewing the item after reviewing it only 6
times (an investment of one to two minutes).
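The doubling arithmetic in the paragraph above can be checked with a few lines. This is only a worked example of that arithmetic; the helper name `doubling_spaces` is hypothetical.

```python
from typing import List


def doubling_spaces(reviews: int, first_space_days: int = 1) -> List[int]:
    """Space (in days) that follows each successful review when every space doubles."""
    return [first_space_days * 2 ** i for i in range(reviews)]


spaces = doubling_spaces(6)
print(spaces)  # [1, 2, 4, 8, 16, 32]
# After the 6th correct review the next space is 32 days, i.e. a whole month.
# At 10 to 20 seconds per review, 6 reviews cost 60 to 120 seconds in total.
print(6 * 10, 6 * 20)  # 60 120
```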

But if you forget the item, you have to start at the beginning,
meaning you have wasted all the previous effort.  So you have to make
sure that you minimize the number of times you make mistakes.  This
means reviewing with short spaces so that you have a high probability
of remembering.

Obviously there is a balance to maintain.  The goal of JLDrill is
to maximize the number of items learned in a given amount of time.
Since there is always a chance that you will forget an item,
we have to define what "learned" means.  In this context it means
being able to remember the item after a space of 30 days.

I will define the "learning rate" to be the number of items
"learned" (i.e., have been successfully remembered after a space
of at least 30 days) divided by the total amount of time
invested in those items.  The goal of JLDrill is to maximize
this learning rate.
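Since JLDrill does not measure the learning rate yet, the following is only a sketch of how the definition above could be computed. The `ItemRecord` fields, the per-hour unit and the function names are all hypothetical. The sketch also charges the review time of every item passed in, including items not yet learned; that is one reading of "the time invested in those items", not a confirmed detail.

```python
from dataclasses import dataclass
from typing import Iterable

LEARNED_AFTER_DAYS = 30.0  # "learned" = remembered after a space of at least 30 days


@dataclass
class ItemRecord:
    # Hypothetical per-item bookkeeping; JLDrill does not currently record this.
    seconds_invested: float              # total review time spent on the item
    longest_recalled_space_days: float   # longest space after which it was remembered


def learning_rate(items: Iterable[ItemRecord]) -> float:
    """Items learned per hour of review time invested in the given items."""
    records = list(items)
    learned = sum(1 for r in records if r.longest_recalled_space_days >= LEARNED_AFTER_DAYS)
    hours = sum(r.seconds_invested for r in records) / 3600.0
    return learned / hours if hours > 0 else 0.0


history = [ItemRecord(1800.0, 45.0), ItemRecord(1800.0, 12.0)]
print(learning_rate(history))  # 1.0 item learned per hour
```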

Note: JLDrill doesn't actually measure the learning rate right
now.  It certainly should.

Details of JLDrill's Strategies
===============================

In this section I will describe JLDrill's strategies in more detail.

First we need some basic definitions.

There are three types of vocabulary items in JLDrill: 
    
- A new item is an item that you have never seen before.

- A working item is an item that you have seen, but haven't
completely memorized yet. You may have memorized it once
before, but if so, you have since forgotten it.

- A review item is an item that you have memorized and that you are
likely to remember in the future.

In the program, these items are organized into 3 analogous sets: the
"new set", the "working set" and the "review set".

Initially items are moved from the new set into the working set.  The
working set is of a limited size - usually only 10 or 15 items.  You
simply review the items in the working set a number of times until you
demonstrate that you have memorized them.  Once an item has been
memorized, JLDrill moves it into the review set.

JLDrill is organized a little bit differently than most other spaced
repetition programs.  In other programs, the focus of activity is
centered around reviewing items in what JLDrill calls the review
set.  JLDrill, on the other hand, focuses on the working set.
It tries to keep the working set full.  Once you move an item to
the review set, JLDrill will drill you on items from the review
set until you make a mistake.  The item you forgot is moved into
the working set so that it can be re-memorized.

However, it is not beneficial to review items from the review set
exclusively.  As you may have noticed in the previous section, there
are diminishing returns for reviewing items.  JLDrill keeps track of
how often you get items correct in the review set and once you reach a
rate of about 90% it stops drilling review items.  Instead, when a working
set item is memorized, it is replaced with an item from the new set.

In this way, JLDrill tries to focus activity on learning new and
forgotten items, while retaining a recall rate in the review set of
about 90%.  Details of how it does this follow.
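A minimal sketch of the flow just described, under stated assumptions: when a working-set slot frees up, review items are drilled in priority order until one is forgotten, and once review has reached about 90% the slot is filled from the new set instead. The function `refill_slot`, its callback parameters and the list-based queues are hypothetical, and rescheduling of remembered items is omitted.

```python
from typing import Callable, List, Optional


def refill_slot(review_queue: List[str], new_items: List[str],
                ask: Callable[[str], bool],
                review_finished: Callable[[], bool]) -> Optional[str]:
    """Pick the item that fills one free working-set slot (illustrative sketch)."""
    # While review has not yet reached ~90% success, drill review items in
    # priority order until the user forgets one.
    while review_queue and not review_finished():
        item = review_queue.pop(0)
        if not ask(item):
            return item  # forgotten: it is re-memorized in the working set
        # Remembered items would be rescheduled here (omitted in this sketch).
    # Review is finished (or nothing is left to review): take a new item.
    return new_items.pop(0) if new_items else None


answers = {"a": True, "b": False, "c": True}
print(refill_slot(["a", "b", "c"], ["new1"], answers.__getitem__, lambda: False))  # b
print(refill_slot(["a"], ["new1"], answers.__getitem__, lambda: True))             # new1
```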

Learning Working Set Items
--------------------------

The purpose of the working set is to acquire new or forgotten items.
Theoretically, one could use spaced repetition to acquire new items.
However, this gets to be a bit problematic because very short
spaces between reviews might not be convenient for the user.  The
user is dedicated to the application while using it, so we might
as well review items continuously.

JLDrill simply creates a set of items (by default 10, but configurable
by the user).  Each item is presented to the user once in random
order.  Then they are randomized and presented again.  If the
user gets the item correct a number of times in a row (6 by default),
the item is "promoted" to the review set.
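The presentation order described above can be sketched as a generator that shows each item once per round and reshuffles between rounds. This is an illustration, not JLDrill's code: `shuffled_rounds` is a hypothetical name, and the sketch assumes the working set stays fixed, whereas in the program promoted items are replaced as they leave.

```python
import itertools
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shuffled_rounds(items: Sequence[T], rng: random.Random) -> Iterator[T]:
    """Yield each item once per round, in a fresh random order every round."""
    if not items:
        return  # nothing to drill
    while True:
        order: List[T] = list(items)
        rng.shuffle(order)
        yield from order


working_set = [f"word{i}" for i in range(10)]  # default working-set size: 10
first_two_rounds = list(itertools.islice(shuffled_rounds(working_set, random.Random(0)), 20))
print(first_two_rounds)
```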

Since JLDrill is designed to drill Japanese vocabulary, there are 3
levels in the working set.  In the first level, the user is shown the
kanji and reading for a word and must guess the meaning.  In the
second level the user is shown the kanji and must guess the meaning
and reading.  In the third level, the user is shown the meaning and
must guess the kanji and reading.  Each level must be answered
correctly a number of times (by default 2) before it is promoted to
the next level.  After successfully answering the third level the
requisite number of times, the item is promoted to the review set.  If
the user makes a mistake, the item goes back to level one.
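The three levels behave like a small state machine. The sketch below assumes the defaults given above (2 correct answers per level, 3 levels, so 6 in a row to reach the review set); the class and field names are hypothetical.

```python
from dataclasses import dataclass

LEVEL_PROMPTS = (
    "kanji + reading -> meaning",   # level one
    "kanji -> meaning + reading",   # level two
    "meaning -> kanji + reading",   # level three
)


@dataclass
class WorkingItem:
    level: int = 0          # index into LEVEL_PROMPTS; 0 is level one
    streak: int = 0         # correct answers in a row at the current level
    per_level: int = 2      # default number of correct answers per level

    def prompt(self) -> str:
        return LEVEL_PROMPTS[self.level]

    def answer(self, correct: bool) -> bool:
        """Record an answer; return True when the item should move to the review set."""
        if not correct:
            self.level, self.streak = 0, 0    # any mistake: back to level one
            return False
        self.streak += 1
        if self.streak < self.per_level:
            return False
        self.streak = 0
        if self.level == len(LEVEL_PROMPTS) - 1:
            return True                       # third level passed: promote
        self.level += 1
        return False


item = WorkingItem()
print([item.answer(True) for _ in range(6)])  # five False, then True
```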

The forgetting curve for new items is very steep.  The exact amount of
time required between reviews depends a lot on the user and the type
of items involved.  The user can manipulate the space between reviews
by altering the size of the working set.  Since each item in the set
is reviewed once before they are all shown again, having fewer items
means the items will be repeated faster.

Reviewing Items in the Review Set
---------------------------------

JLDrill tries to maximize the retention of learned items while
minimizing the cost of review by trying to keep the chance that you
can remember items in the review set at 90% or above.  It does this by
roughly ordering the items by their probability of success and only
offering items for review when the probability drops below 90%.

In most spaced repetition programs items are scheduled for review.  When
the item has waited long enough, the item is reviewed.  The algorithms
that do this make a lot of assumptions about the shape of the forgetting
curve in various circumstances.  JLDrill takes a much simpler
approach.

JLDrill simply tries to grossly order the items by probability that
the item will be forgotten.  It then offers the items for review in
that order until the user demonstrates that they can correctly guess
around 90% of them.  At that point it stops offering items for review,
and new items are used instead.

The algorithm for ordering the items is also very simple.  Recall that
the probability of remembering an item is a curve.  At the start, the
chance of remembering an item is 100%.  As time passes the chance of
remembering falls.  This curve is not linear, but the part of the
curve from 100% to 90% is very, very close to linear.

The JLDrill algorithm creates a potential schedule for each item.  It
does this by multiplying the amount of time that had elapsed since
the item's previous successful review by a factor.  It then sorts the
items based on the percentage of time that has elapsed in that 
potential schedule duration.  The factor used for multiplying depends
on how much time has elapsed (see below), but for the following
example, let's say the factor is 2.

For example, imagine there are two items.  Item A has waited 5 days
since the last review.  Item B has waited 50 days since the last
review.  Both items are guessed correctly and a new schedule is made.
Item A is scheduled for 10 days in the future.  Item B is scheduled
for 100 days in the future (twice their previous wait times).  After
waiting one more day, Item A is 10% through its schedule, while Item
B is 1% through its schedule.  Item A is sorted before Item B.
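The Item A / Item B example can be reproduced with a short sketch. The factor is fixed at 2 here as in the example (the real factor depends on elapsed time), and `ReviewItem`, `new_potential_schedule` and `review_order` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


def new_potential_schedule(waited_days: float, factor: float = 2.0) -> float:
    """When an item is remembered, schedule it for (time waited) x factor."""
    return waited_days * factor


@dataclass
class ReviewItem:
    name: str
    potential_schedule_days: float
    days_since_review: float = 0.0

    def percent_elapsed(self) -> float:
        """How far through its potential schedule the item is, in percent."""
        return 100.0 * self.days_since_review / self.potential_schedule_days


def review_order(items: List[ReviewItem]) -> List[ReviewItem]:
    """Items furthest through their potential schedule are offered first."""
    return sorted(items, key=lambda item: item.percent_elapsed(), reverse=True)


item_a = ReviewItem("A", new_potential_schedule(5))    # 10-day schedule
item_b = ReviewItem("B", new_potential_schedule(50))   # 100-day schedule
for item in (item_a, item_b):
    item.days_since_review += 1                          # one more day passes
print([(i.name, i.percent_elapsed()) for i in review_order([item_a, item_b])])
# [('A', 10.0), ('B', 1.0)]
```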

Note that neither item is likely to be reviewed on its actual
scheduled date.  The schedule is only used for creating a rough
estimation of their probability of success.  Each time the application
is used, the items will be presented one after another (highest
percentage of time used first) until an actual measurement of 90%
success is achieved.

Prioritizing Review
-------------------

Because JLDrill doesn't rely on estimating the exact time an
item should be reviewed, the amount of time it chooses for its
potential schedule is less important.  This potential schedule
is only used to order the items, not to determine when they will be
reviewed.  In fact, for long duration items the exact order
isn't even very important because the chance of remembering them
drops off very slowly.

For example, imagine an item will drop from 100% recall rate
to 90% recall rate in 30 days.  Since this part of the forgetting
curve is very close to linear we can say that the chance of
forgetting increases by 1% every three days.  So even if the
item is delayed for 9 days, the chance of remembering only drops
to 87%.  In other words, getting the schedule wrong by 30%
affects the chance of remembering by only 3 percentage points.  As long as
these kinds of items are roughly ordered, any deviations from
perfection will be unnoticeable.
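The arithmetic of this example, written out as a sketch. It assumes, as the paragraph does, that recall falls linearly from 100% to 90% over 30 days and keeps falling at the same rate for a few days beyond that; `linear_recall` is a hypothetical helper, not part of JLDrill.

```python
def linear_recall(days_waited: float, days_to_ninety: float = 30.0) -> float:
    """Recall chance on the near-linear stretch of the forgetting curve.

    Assumes recall falls from 100% to 90% over `days_to_ninety` days and keeps
    falling at the same rate a little beyond that point.
    """
    return 1.0 - 0.10 * days_waited / days_to_ninety


print(round(linear_recall(3), 3))        # 0.99: 1% lost every three days
print(round(linear_recall(30), 3))       # 0.9 when reviewed on schedule
print(round(linear_recall(30 + 9), 3))   # 0.87 when reviewed 9 days late
```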

Short duration items are more problematic.  Since the slope of
the curve is very steep, a delay of even a few days can easily
put the item in the non-linear part of the forgetting curve.
Thus it is important to increasingly prioritize short duration
items as time passes.  JLDrill does this by ordering the items
by percentage of time elapsed in the potential schedule.

For example, if there are 2 items, one with a potential schedule
of 2 days and another with a potential schedule of 10 days,
after waiting one day, the first has waited 50% of its schedule
while the other has waited 10%.  After one more day, the first item
is at 100% of its schedule while the second is only 20%.  In this
way, short duration items bubble to the top of the priority list
as time passes.
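The sketch below replays the 2-day and 10-day example and adds a third, hypothetical "older" item (30-day schedule, already 12 days in) to show a short item climbing past a long one that had a head start. All names and the third item are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def percent_elapsed(days_waited: float, potential_schedule_days: float) -> float:
    """Share of the potential schedule that has already elapsed, in percent."""
    return 100.0 * days_waited / potential_schedule_days


ITEMS: Dict[str, Tuple[float, float]] = {  # name: (schedule days, days already waited at day 0)
    "short": (2.0, 0.0),
    "long": (10.0, 0.0),
    "older": (30.0, 12.0),   # hypothetical item added for illustration
}


def ranking(day: float) -> List[str]:
    """Priority list on a given day, highest percentage elapsed first."""
    pct = {name: percent_elapsed(waited + day, sched) for name, (sched, waited) in ITEMS.items()}
    return sorted(pct, key=pct.get, reverse=True)


for day in (0, 1, 2):
    print(day, ranking(day))
```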

Determining the Potential Schedule
----------------------------------
Again, getting the exact scheduling time is not necessary.  JLDrill
orders the items roughly by percentage chance of remembering
and increasingly prioritizes short duration items as time passes.
Then the items are drilled until a 90% success rate is achieved.
Because of this, as long as the potential schedule is consistent,
it doesn't have to be correct.

JLDrill measures the amount of time between correct answers and
creates a new potential schedule by multiplying that time by a factor.
Note that this potential schedule is almost certainly wrong.
However, it is consistently wrong and will place the items in
roughly the correct order.

The factor it multiplies by is dependent upon the time that has
elapsed.  An item that was reviewed less than a day ago will have
a factor of 2.  In other words, when such an item is correctly
remembered, the new potential schedule is twice the duration that
it had waited.  But for items that waited longer, the factor
is reduced linearly, until at 180 days the factor is 1.  In other
words, items that have waited 180 days or more will be scheduled
for the same amount of time they have waited.
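A sketch of the multiplication factor. The text gives only the end points (2 for items reviewed less than a day ago, 1 at 180 days or more) and says the decrease is linear; where the straight line starts is an assumption here (1 day), and `multiplication_factor` is a hypothetical name.

```python
def multiplication_factor(elapsed_days: float) -> float:
    """Factor applied to the waited time when a review item is remembered.

    ASSUMPTION: 2.0 for anything up to 1 day, then a straight line down to
    1.0 at 180 days, and 1.0 from then on.
    """
    if elapsed_days <= 1.0:
        return 2.0
    if elapsed_days >= 180.0:
        return 1.0
    return 2.0 - (elapsed_days - 1.0) / (180.0 - 1.0)


for days in (0.5, 30, 90.5, 180, 365):
    # elapsed time, factor, resulting potential schedule in days
    print(days, round(multiplication_factor(days), 3), round(days * multiplication_factor(days), 1))
```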

There are two rationales for changing the multiplication factor.
The first is that it was determined that it is useful to review
known items on a regular basis.  By backing off the multiplication,
this is achieved.  Old items are reviewed about every 180 days.
The second is that the cost of reviewing an old item is small, while
the penalty for forgetting an old item is great.  So it makes sense
to give a little bit of extra effort for old items.

New items present a problem.  They have not been scheduled before and
it is difficult to determine how they relate to each other and to the
old items already in the schedule.  Because of this, JLDrill does not
use a fixed starting point for the first schedule.  Instead it
uses a value anywhere from 0 to 5 days depending on how many times the
item was incorrectly remembered in the working set.  The more times it
was incorrect, the closer to 0 the item will be scheduled.
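The text only fixes the range (0 to 5 days) and the direction (more working-set mistakes, shorter first schedule). The sketch below fills in the exact mapping with a made-up curve, 5 / (1 + mistakes), purely to illustrate the step function; it is not JLDrill's formula, and the function name is hypothetical.

```python
def first_potential_schedule(working_set_mistakes: int, max_days: float = 5.0) -> float:
    """HYPOTHETICAL first schedule for a newly promoted item.

    Only the 0 to 5 day range and "more mistakes -> closer to 0" come from the
    text; the 5 / (1 + mistakes) curve is an illustrative assumption.
    """
    if working_set_mistakes < 0:
        raise ValueError("mistake count cannot be negative")
    return max_days / (1 + working_set_mistakes)


for mistakes in range(5):
    print(mistakes, first_potential_schedule(mistakes))
```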

This creates a step function for the first interval.  But this also
causes a problem.  Items that were guessed correctly almost every time
will end up with the exact same potential schedule.  This means
that they will be presented to the user in the same order every
time.  It is desirable to mix up the items every time they are
presented.  To achieve this, the schedule is always varied randomly
by ±10%.  This creates separation between the items and allows
other items to be inserted over time.  This variation is used even
when rescheduling old items.  This allows items which are guessed
identically to end up in completely different places after only a few
generations of scheduling.
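A minimal sketch of the ±10% variation, assuming a uniformly distributed multiplier (the text does not say which distribution JLDrill uses). `vary_schedule` is a hypothetical name; `random.Random.uniform` is the standard-library call.

```python
import random


def vary_schedule(schedule_days: float, rng: random.Random, spread: float = 0.10) -> float:
    """Randomly stretch or shrink a potential schedule by up to +/-10%."""
    return schedule_days * rng.uniform(1.0 - spread, 1.0 + spread)


rng = random.Random(42)
# Five items that were all answered perfectly share the same 5-day schedule...
same = [5.0] * 5
# ...but after variation they land at different points, between 4.5 and 5.5 days.
print([round(vary_schedule(days, rng), 2) for days in same])
```

Because the variation is applied again at every rescheduling, two items that start out identical drift apart a little more each generation.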

Finally, if a user neglects to study, the amount of time since
the last review can be very large.  As such, the new potential
schedule will be even larger.  Even when an item has sat
for a long time, there is a small chance that the item will
be remembered.  In this case the item will be scheduled far
into the future, depriving the user of practice.  Because of
this, the length of time used for generating the new potential
schedule is limited to 25% more than the previous potential
schedule.

For example, let's say an item is scheduled for 1 day in the
future.  But the user waits 10 days before reviewing the item.
Even if the user gets the item correct, the new potential
schedule will only be 2.5 days (1 day plus 25%, or 1.25 days,
multiplied by the factor of 2) instead of 20 days.
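The cap from the example above, as a short sketch. The factor is passed in explicitly because the text does not say whether it is computed from the raw or the capped waiting time; `capped_potential_schedule` is a hypothetical name.

```python
def capped_potential_schedule(waited_days: float, previous_schedule_days: float,
                              factor: float) -> float:
    """New potential schedule when the item is remembered.

    The waited time counts for at most 125% of the previous potential schedule,
    so a long break does not push the item far into the future.
    """
    effective_wait = min(waited_days, 1.25 * previous_schedule_days)
    return effective_wait * factor


print(capped_potential_schedule(10.0, 1.0, factor=2.0))  # 2.5 rather than 20
print(capped_potential_schedule(1.0, 1.0, factor=2.0))   # 2.0: on-time reviews are unaffected
```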

Success Rate Estimation
------------------------

The key feature of JLDrill's algorithm is that it measures when
the user is getting a 90% success rate and stops reviewing.
This is what allows JLDrill to operate without having to
precisely schedule each item.

Originally JLDrill used a Bayesian estimate of the probability
that the current items were at or above 90% success rate.
Because the items in the list are changing over trials (grossly
increasing) a lower limit was placed on this estimate.
When the estimate reached 90% (i.e., 90% confidence that the
rate was 90% or above) review was halted.

This proved to be troublesome, though.  The problem is that
for the most part, when the user is reviewing every day, the
items in the list don't fall too much below 90%.  They hover
around 75 to 90%.  Unfortunately, for items whose actual success
rate is 80%, the Bayesian estimate reaches 90% after only around
24 reviews on average.  When the daily review is large, this
false stopping point will be hit often.

The other problem is that when an item is guessed incorrectly,
the Bayesian estimate always drops a long way.  That's because
we are testing that the value is 90% or above.  If you
get 8 right in a row and get one wrong, your confidence that it's
90% or above drops quickly.

In practice, this made the estimate nearly equivalent to a simple
rule: getting 9 or 10 right in a row means that you are at 90% confidence.
It is not exactly the same, but you can prove to yourself that
it's so close that it doesn't matter.

In the end I decided to modify the algorithm to be something that the
user could relate to, rather than the Bayesian
estimate.  I collect the last 10 results.  When there is a 90%
success rate I start a countdown.  If the user maintains a 90%
success rate for 10 more items, the review is finished.
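The stopping rule just described can be sketched as a small class. Details the text leaves open are filled in as assumptions: the rate only counts once 10 results exist, "90%" means at least 9 of the last 10 correct, and dropping below that cancels the countdown. `ReviewStopper` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Optional


class ReviewStopper:
    """Stop review once a 90% rate over the last 10 results holds for 10 more items."""

    def __init__(self, window: int = 10, min_correct: int = 9, countdown: int = 10):
        self.results: Deque[bool] = deque(maxlen=window)
        self.min_correct = min_correct        # 9 of 10 = 90%
        self.countdown = countdown
        self.remaining: Optional[int] = None  # None: no countdown running

    def record(self, correct: bool) -> bool:
        """Record one review result; return True once review is finished."""
        self.results.append(correct)
        window_full = len(self.results) == self.results.maxlen
        if not window_full or sum(self.results) < self.min_correct:
            self.remaining = None             # below 90%: cancel any countdown
            return False
        if self.remaining is None:
            self.remaining = self.countdown   # just reached 90%: start counting
            return False
        self.remaining -= 1
        return self.remaining <= 0


stopper = ReviewStopper()
for n in range(1, 100):
    if stopper.record(True):
        print("finished after", n, "items")  # 20 with no mistakes
        break
```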

This is almost exactly equivalent to the following use of the
Bayesian estimate: once it reaches 90% confidence, reset the
confidence and start again; once it reaches 90% confidence a
second time, the review is finished.

The rationale for this approach is that we get to a 90% confidence
that we are in the 90+% area.  We then start our estimation
procedure again and test that we remain in the 90+% range.  This
gets us over most of the false peaks.

As I said, I decided simply to keep track of the last 10 results
and maintain a 90% rate for 10 items.  I believe this is more
understandable for the end user.  It is also almost identical in
result to using the Bayesian estimate (although the approach is
*slightly* more likely to finish earlier).  Finally, the code is much
more straightforward since it doesn't require any complicated
math.

Kanji and Meaning Problems
--------------------------
Starting in JLDrill 0.4.1, Kanji problems and Meaning problems
are reviewed with separate schedules.  In previous releases
there was only one schedule and the type of review was
randomly determined.  In the new scheduler, if any of
the problem types are guessed wrong, all of them are restarted
from the beginning.  This allows you to ensure that all
aspects of the vocabulary are remembered properly.

But this causes some issues for scheduling.  The most obvious
problem is that there are now 2 problems scheduled for each
item.  If the user sees them close together, they are likely
to remember the answer from the last review.  In order to
combat this, the problems are scheduled with different random
variations.  Since each schedule is varied by ±10% it doesn't
take long until the problems are far apart in the schedule.

But the other issue is that the item is still being reviewed
twice as often as it was previously.  This means that the recall
rate will be above 90%.  This is especially problematic when
there are items mixed in which don't have kanji.  Those items
are only reviewed once.  The order of the items may be
incorrect because the recall rate is different.  This is
an issue that hasn't been addressed yet.

What Percentage Level to Stop Reviewing
---------------------------------------

JLDrill chooses a 90% rate to stop reviewing.  This was chosen
based on personal preference and claimed success rates of
other spaced repetition algorithms.  However, this rate should
be chosen such that the weighted cost of relearning is equal to
the cost of reviewing.
In other words, the cost of relearning the word over again multiplied
by the chance that the word is forgotten is equal to the cost of
re-reviewing the word so that it isn't forgotten.  This creates a
balance.  This value is quite difficult to calculate and is
made more difficult because the amount of time between reviews
in JLDrill is not predictable.  However, I have found that in
practice 90% seems to work well.  More work needs to be done
in this area.
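Written as an equation, the balance described above says that the expected cost of relearning equals the cost of reviewing. The symbols are introduced here only for illustration: `r` is the recall rate at which review stops (so `1 - r` is the chance of forgetting), `C_review` is the cost of one more review and `C_relearn` is the cost of relearning a forgotten item. Treating each cost as a single fixed number is a simplifying assumption.

```latex
(1 - r)\,C_{\text{relearn}} = C_{\text{review}}
\quad\Longrightarrow\quad
r = 1 - \frac{C_{\text{review}}}{C_{\text{relearn}}}
```

For example, if relearning an item costs ten times as much as reviewing it, then r = 1 - 1/10 = 0.9, the 90% JLDrill uses; if relearning cost only five times as much, the same balance would suggest stopping at 80%.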