Best Order To Learn Spanish Conjugations According To Word Frequency Data

If you want to tackle Spanish verb conjugation in an order that goes from most common uses to least common uses, and in an order that gradually builds your knowledge using the smallest increments possible, I think the curriculum below may be what you’re looking for.

Spanish Verb Conjugation Learning Sequence
- Relative Frequency of Common Conjugations in Spanish Subtitles
  - UPDATE 2020-05-26
    - Bull’s Text-Based Analysis
Design Principles
- Decreasing Frequency
- Baby Steps
Analysis Method
Background
- Mnemonics fail me

Spanish Verb Conjugation Learning Sequence

Note: These suggestions are based only on word frequency data. There are other considerations that you might use to design a learning sequence. For example, you might decide to delay the subjunctive mood because you find its use difficult to understand.

Indicative Present
- regular -ar, -er, and -ir patterns
- irregular verb ir (to go) indicative present
  - ir indicative present + infinitive (e.g. “I go (= am going) to eat”)
- past participle
  - haber indicative present
  - haber indicative present + past participle (= present perfect tense e.g. “I have spoken”)
- present participle
  - estar indicative present
  - estar indicative present + present participle (=present progressive e.g. “I am speaking”)
Imperative
- regular -ar, -er, and -ir patterns
Subjunctive Present
- regular -ar, -er, and -ir patterns
- haber subjunctive present
- haber subjunctive present + past participle (=present perfect subjunctive e.g. “I may have spoken”)
Indicative Preterite (simple past)
- regular -ar, -er, and -ir patterns
Indicative Future
- regular -ar, -er, and -ir patterns
- haber indicative future
- haber indicative future + past participle (= future perfect e.g. “I will have spoken”)
Indicative Imperfect
- regular -ar, -er, and -ir patterns
- estar indicative imperfect
- estar indicative imperfect + present participle (=past progressive e.g. “I was speaking” aka imperfect progressive)
- haber indicative imperfect
- haber indicative imperfect + past participle (=past perfect e.g. “I had spoken”)
Indicative Conditional
- regular -ar, -er, and -ir patterns
- haber indicative conditional
- haber indicative conditional + past participle (=conditional perfect e.g. “I would have spoken”)
Subjunctive Imperfect
- regular -ar, -er, and -ir patterns
- haber subjunctive imperfect
- haber subjunctive imperfect + past participle (=past perfect subjunctive e.g. “I might have spoken”)

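The first four steps (Indicative Present through Indicative Preterite), together with the infinitive and the participles they introduce, cover about 90% of verb usage frequency!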

Relative Frequency of Common Conjugations in Spanish Subtitles

The suggested order above is based on the relative frequencies I found for the most common conjugation patterns (in subtitles), as well as two simple design principles.

Indicative Present 40.3%
Imperative 15.4%
Subjunctive Present 12.2%
Indicative Preterite 9.0%
Infinitive 8.8%
Past Participle 3.7%
Indicative Future 2.9%
Indicative Imperfect 2.7%
Present Participle 2.2%
Indicative Conditional 0.9%
Subjunctive Imperfect 0.8%

If you’re curious to know how I designed the sequence, how I estimated the relative frequencies, and what prompted me to do this in the first place, then read on. I’ve tried to write this article in order of decreasing relevance/interest so you can just stop when you get bored. ;-)

UPDATE 2020-05-26

Thanks to a Reddit post, I discovered an analysis of verb tense frequencies published in a 1947 article in Hispania (Volume 30, Issue 4): Modern Spanish Verb-Form Frequencies, by William E. Bull, et al.

William E. Bull, Alfredo Cantón, William Cord, Rodger Farley, John Finan, Suzanne Jacobs, Robert Jaeger, Marie Koons and Barbara Tuegel Hispania Vol. 30, No. 4 (Nov., 1947), pp. 451-466

Bull analyzed a wide variety of written material, almost all of which was published after 1920. Most of it was published in the 1940s, if I recall correctly. No subtitles were included.

Bull’s Text-Based Analysis

Indicative Present 40.1%
Infinitive 18.7%
Indicative Preterite 12.1%
Indicative Imperfect 7.4%
Present Participle 5.2%
Indicative Present Perfect 3.6%
Subjunctive Present 3.1%
Indicative Future 2.6%
Imperative 2.5%
Subjunctive Imperfect 1.6%
Indicative Conditional 1.6%
Indicative Past Perfect 0.7%
Subjunctive Past Perfect 0.2%
Perfect Infinitive 0.16%
Subjunctive Present Perfect 0.12%
Indicative Future Perfect 0.09%
Subjunctive Future 0.03%
Conditional Perfect 0.02%
Preterite Perfect 0.01%

As you can see, Bull’s analysis can distinguish among the different compound tenses, whereas my quick-and-dirty analysis cannot. Also, Bull includes many exceedingly rare tenses that mine does not. We can see some big differences from the subtitle data; for example, the imperative (where you’re telling someone to do something) is far more common in subtitles than in text. If your goal is mostly to be able to read text, Bull’s analysis might suggest that you learn the tenses in a different order from the one I suggest.
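To see where the two data sets disagree most, the percentages from the two lists above can be compared directly. This is only a rough sketch: it compares the forms that appear under the same name in both lists, and it leaves out Past Participle because Bull counts the compound tenses separately instead. The function name differences is made up for this illustration.

```python
# Percentages copied from the two lists in this article.
subtitles = {
    "Indicative Present": 40.3,
    "Imperative": 15.4,
    "Subjunctive Present": 12.2,
    "Indicative Preterite": 9.0,
    "Infinitive": 8.8,
    "Indicative Future": 2.9,
    "Indicative Imperfect": 2.7,
    "Present Participle": 2.2,
    "Indicative Conditional": 0.9,
    "Subjunctive Imperfect": 0.8,
}
bull = {
    "Indicative Present": 40.1,
    "Infinitive": 18.7,
    "Indicative Preterite": 12.1,
    "Indicative Imperfect": 7.4,
    "Present Participle": 5.2,
    "Subjunctive Present": 3.1,
    "Indicative Future": 2.6,
    "Imperative": 2.5,
    "Subjunctive Imperfect": 1.6,
    "Indicative Conditional": 1.6,
}


def differences(a, b):
    """Return (form, a - b) pairs for forms in both lists, largest gap first."""
    shared = a.keys() & b.keys()
    gaps = [(form, round(a[form] - b[form], 1)) for form in shared]
    return sorted(gaps, key=lambda pair: abs(pair[1]), reverse=True)


# Positive numbers mean the form is more common in subtitles than in Bull's texts.
for form, gap in differences(subtitles, bull):
    print(f"{form:<24}{gap:+.1f}")
```

The largest gaps are the imperative and the subjunctive present (more common in subtitles) and the infinitive (more common in Bull's written material). The gaps are in percentage points, and the two studies count things differently, so treat them as rough indications only.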

Design Principles

Conjugation patterns (paradigms) should be learned in order of decreasing frequency.
The transition from one step to the next should require the learner to master a minimum amount of new information (the baby steps principle).

Decreasing Frequency

By learning the most frequent conjugations first, we take the most efficient steps towards mastering the language. The present tense seems to account for about 40% of verb usage in Spanish (see Analysis Method section), so by learning the present tense conjugations, we can now understand about 40% of the verb usage we see (usage, if not meaning). We should be able to look up the meaning of verbs used in the present tense and understand the sentence.

If we had started with the subjunctive imperfect, we would only understand about 0.8%, or practically none of the verb usage we see. I’d imagine that this would be very discouraging.
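The payoff of going in decreasing order can be made concrete with a running total. The sketch below uses the subtitle percentages from this article; the function name cumulative_coverage is just a label chosen for this example.

```python
from itertools import accumulate

# Subtitle-based percentages from this article.
frequencies = [
    ("Indicative Present", 40.3),
    ("Imperative", 15.4),
    ("Subjunctive Present", 12.2),
    ("Indicative Preterite", 9.0),
    ("Infinitive", 8.8),
    ("Past Participle", 3.7),
    ("Indicative Future", 2.9),
    ("Indicative Imperfect", 2.7),
    ("Present Participle", 2.2),
    ("Indicative Conditional", 0.9),
    ("Subjunctive Imperfect", 0.8),
]


def cumulative_coverage(rows):
    """Sort by decreasing frequency and attach a running total to each row."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    totals = accumulate(pct for _, pct in ordered)
    return [(name, pct, round(total, 1)) for (name, pct), total in zip(ordered, totals)]


coverage = cumulative_coverage(frequencies)
for name, pct, total in coverage:
    print(f"{name:<24}{pct:>6.1f}{total:>8.1f}")
```

The four most frequent tenses add up to 76.9%, and adding the infinitive and the two participles brings the total to 91.6%. The percentages in the list sum to 98.9% rather than 100%, presumably because of rounding.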

The faster the learner sees progress, the more motivated I think he’ll be to continue. And continuing is probably the most important part.

I don’t think this idea is particularly controversial. In fact, it may indeed be dogmatically accepted by most. However, applying this simple principle well may be difficult. It’s tempting to start out by grabbing a list of the 600 most common words off some website and telling yourself that you’re going to memorize them. The problem is that only a fraction of the people who try this manage to stick with memorizing that many words, and even those who do will most likely find that, without any grammar, all those words do disappointingly little to help them understand the language.

Likewise, you could filter out the verbs from that list of most common words, but trying to learn every form of every verb on its own is going to be a lot more work than learning the relatively few patterns common to thousands of verbs.

Baby Steps

This is probably another principle that isn’t controversial, but which is sometimes hard to actually apply.

By following one conjugation pattern with the same pattern for an irregular verb, then a participle form that uses the irregular verb in that pattern, we create two steps that add on small pieces to what we’ve already studied.

As far as organizing the order of these verb tenses, that was about all I could do to break things up into baby steps.

Analysis Method

Warning: I’m going to just call mood/tense/aspect combinations tenses to save time and to make this more readable. I’m sure I’ll misuse other grammatical terms as well, but if you’re a grammar Nazi, you’ll understand what I mean anyway. You grammar Nazis can suck it! ;-)

And while I’m at it… Disclaimer: I’m not an expert in any of this stuff, so you shouldn’t just take my word for it.

The tools

A Concise Grammar Book

I’m no expert in Spanish, so in order to find out what the common conjugation patterns are and how the compound participle versions are formed, I used a little book I have called The QuickStudy for Spanish.

A Spanish Word Frequency Database

For Spanish word frequency data, I grabbed a very large list of word frequencies taken from subtitles. Here’s where I found it: http://crr.ugent.be/archives/679

Using subtitles means we’re using mostly modern Spanish (no old-time Don Quixote stuff from the 1600s), and we’re looking at the kind of things that people commonly say in dialogue as opposed to the more formal language people tend to use in writing. Traditional word frequency lists come from large bodies of written material (some of which are quite old) that often don’t reflect common modern usage. For what I’m doing here, this distinction may not matter so much. However, if your goal is to get up to speed speaking rather than reading old books, then subtitle-based data may be important.

This database includes the actual words, but no indication of what part of speech they are, and the verbs have no indication of what the infinitive form is. Homonyms such as para (a common preposition, but also a form of the verb parar) aren’t separated by meaning, so that will confound the results a little, but maybe not much, as I don’t think Spanish has many homonyms.

The list is downloadable in Excel format. I wrote a little Python script to read it and load it into an sqlite3 database.
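A loading script of that kind might look roughly like the sketch below. It is not the original script. To stay within the standard library, it assumes the spreadsheet has first been exported to CSV, and the column names (word, freq), the table name (words), and the sample rows are all invented for illustration.

```python
import csv
import io
import sqlite3

# Invented sample rows standing in for the real word list, assumed to have been
# exported from the spreadsheet to CSV. The column names are hypothetical.
SAMPLE_CSV = """word,freq
de,1000
hablar,120
hablo,80
para,900
"""


def load_frequencies(csv_file, connection):
    """Create the words table and fill it from an open CSV file object."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, freq INTEGER)"
    )
    rows = ((row["word"].lower(), int(row["freq"])) for row in csv.DictReader(csv_file))
    connection.executemany("INSERT OR REPLACE INTO words VALUES (?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")  # pass a file path instead to keep the data
load_frequencies(io.StringIO(SAMPLE_CSV), connection)
print(connection.execute("SELECT COUNT(*) FROM words").fetchone()[0])
```

With a real export, the StringIO object would be replaced by something like open("wordlist.csv", newline="", encoding="utf-8"), where the file name is again a placeholder.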

Parts of Speech, Verb Conjugations, etc.

The other tool that was necessary was some sort of part-of-speech tagger that can also identify the lemma (dictionary form) of each word so that different forms of the same word (verb in this case) can be grouped together.

Well, I found something for that too: https://www.clips.uantwerpen.be/pattern

I imported the word frequency database into an sqlite3 database and added all words ending in ar, er, and ir to a separate table (called lemmata). This should have captured the infinitive (lemma) version of all the ar, er, and ir verbs, but there could be many non-verbs in the bunch.
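The candidate-infinitive step can be sketched as a single SQL statement. The table and column names (words, lemmata, part_of_speech) and the sample counts are assumptions made up for this example; only the idea of selecting words by their -ar, -er, and -ir endings comes from the description above.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE words (word TEXT PRIMARY KEY, freq INTEGER)")
connection.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [("hablar", 120), ("hablo", 80), ("comer", 95), ("vivir", 70),
     ("mujer", 400), ("ir", 900), ("para", 900)],  # invented counts
)

# Candidate infinitives: any word ending in -ar, -er, or -ir.
# part_of_speech stays NULL until a tagger fills it in later.
connection.execute("CREATE TABLE lemmata (word TEXT PRIMARY KEY, part_of_speech TEXT)")
connection.execute(
    """
    INSERT INTO lemmata (word)
    SELECT word FROM words
    WHERE word LIKE '%ar' OR word LIKE '%er' OR word LIKE '%ir'
    """
)

candidates = [row[0] for row in connection.execute("SELECT word FROM lemmata ORDER BY word")]
print(candidates)
```

In this toy data, mujer (a noun) lands in the candidate table alongside the real infinitives, which is exactly the kind of non-verb that the part-of-speech step has to weed out.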

I created a script to run each word through pattern.es to have it guess the part of speech, and recorded that guess in the lemmata table. Pattern.es is much better at guessing the part of speech when you give it words in context (in a sentence), because then it can try to tell homonyms and different usages of the same word apart; a bare word list offers no such context, so some of these guesses are probably wrong. Next, for every word that pattern.es identified as a verb (with the VB code), I had it conjugate the verb into all forms of each of the tenses that pattern.es can handle (which includes all the common tenses in modern Spanish, but leaves out a few uncommon ones), and into the participle forms. After that, I had my Python script search the word frequency database for each form of the verb and record the total frequency count for that verb/tense combination in a separate table.

Pattern.es can generate all forms of a verb with the lexeme() function, conjugate specific tenses with the conjugate() function and parse sentences (or even single words) to try to identify the parts of speech using the parse() function.
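The tallying step (look up every form of a verb in the word list and add up the counts per tense) can be sketched without pattern.es installed. In the example below, a hand-written dictionary of a few forms of parar stands in for the conjugator's output, and the word counts are invented. The function name tally is hypothetical too.

```python
from collections import defaultdict

# Invented counts standing in for the subtitle word list.
word_counts = {"para": 900, "paro": 12, "paras": 5, "paré": 4, "paró": 9, "parado": 20}

# Hand-written stand-in for a conjugator's output: a few forms of "parar" per tense.
forms_by_tense = {
    "Indicative Present": ["paro", "paras", "para"],
    "Indicative Preterite": ["paré", "paraste", "paró"],
    "Past Participle": ["parado"],
}


def tally(lemma, forms_by_tense, word_counts):
    """Sum the word-list counts of every form, grouped by (lemma, tense)."""
    totals = defaultdict(int)
    for tense, forms in forms_by_tense.items():
        for form in set(forms):  # a form repeated within one tense is counted once
            totals[(lemma, tense)] += word_counts.get(form, 0)  # missing forms count as 0
    return dict(totals)


totals = tally("parar", forms_by_tense, word_counts)
print(totals)
```

Here the present tense of parar gets a count of 917, almost all of it from para, which in real subtitles is usually the preposition. A form that belongs to two tenses (such as hablamos, which is both present and preterite) would be counted under both in this sketch. How the original script dealt with such overlaps isn't stated.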

I ran this script on 1,107 verbs, excluding some, such as ir and haber, that are used as auxiliary verbs and thus have unusually high frequency.

The result was a table with the frequency of several tenses for each of 1,107 verbs. A table that undoubtedly contained errors due to words such as para (a common preposition) contributing to parar (a verb) as well as other homonym issues, but a table with data that was probably accurate enough for my purposes.

The Results

Using sqlite to sum up the totals by tense, I created the relative frequency list below.

Indicative Present 40.3%
Imperative 15.4%
Subjunctive Present 12.2%
Indicative Preterite 9.0%
Infinitive 8.8%
Past Participle 3.7%
Indicative Future 2.9%
Indicative Imperfect 2.7%
Present Participle 2.2%
Indicative Conditional 0.9%
Subjunctive Imperfect 0.8%
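Summing by tense and converting to percentages is a one-query job in SQLite. The sketch below shows the general shape of such a query. The table name verb_tense_counts, its columns, and the counts are invented for illustration and are not the actual schema or data.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE verb_tense_counts (lemma TEXT, tense TEXT, freq INTEGER)")
connection.executemany(
    "INSERT INTO verb_tense_counts VALUES (?, ?, ?)",
    [("hablar", "Indicative Present", 300), ("hablar", "Imperative", 100),
     ("comer", "Indicative Present", 200), ("comer", "Imperative", 100),
     ("comer", "Infinitive", 300)],  # invented counts
)

# Each tense's share of the grand total, as a percentage with one decimal place.
query = """
    SELECT tense,
           ROUND(100.0 * SUM(freq) / (SELECT SUM(freq) FROM verb_tense_counts), 1) AS pct
    FROM verb_tense_counts
    GROUP BY tense
    ORDER BY pct DESC
"""
shares = connection.execute(query).fetchall()
for tense, pct in shares:
    print(f"{tense} {pct}%")
```

Multiplying by 100.0 rather than 100 forces floating-point division, which matters because SQLite would otherwise do integer division on integer counts.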

Interpreting the Results

Now the infinitive form is the plain dictionary form of the verb that everyone learns when they learn vocabulary. It doesn’t need any special attention because of this, and you can’t say anything meaningful with the infinitive unless you know at least one other verb form, so we can make a mental note to maybe introduce new verbs in the infinitive as we go along, but we can essentially jump straight to learning the present tense.

The participles make up a sizable share of verb usage (about 6% combined). This is most likely due to the fact that they are used with auxiliary verbs to form the compound verb tenses (e.g. had been, will have been, etc.). They can really only be used with another verb, so it makes sense to learn how they’re used along with present tense auxiliary verbs first, and then learn their other uses after we’ve learned the tense needed for the auxiliary verb.

There are some irregular verbs in Spanish, but I think we can put those off except for the most common irregulars; the ones that are used as auxiliary verbs (mostly just haber), the ones used for to be (estar and ser), and maybe a few other common ones.

So going through the order above, and by adding in special irregular verbs and participle uses after the relevant tense, I came up with the curriculum at the top of this article. Hopefully, this creates a sequence that does two things: studies conjugations in descending order of frequency, and creates minimal increments from one stage to the next.

Background

Learning Spanish verb conjugations is probably the biggest challenge for native English speakers because it’s a large chunk of the language that has little to no parallel in English. It would seem, then, that verb conjugation is the area where attempts to optimize the learning sequence have the most potential to pay off.

I’ve known bits and pieces of Spanish for a long time; nowhere near fluent, but not an absolute beginner. Throughout the years, I’ve decided at times to get serious about learning more, but I find it hard to stick with it.

Part of my frustration is not knowing enough of the verb conjugations to understand much outside the present tense. No matter how many nouns or adjectives I know, if they aren’t used with verbs in the present tense, they don’t do me a whole lot of good.

Much of Spanish (nouns, adjectives, adverbs, syntax, etc.) is similar enough to English that it isn’t much of a problem. For nouns, for example, you pretty much just need to know the meaning; the plural forms follow simple rules and only pronouns have accusative or dative cases. But where English uses syntax to say who’s doing the action, Spanish uses verb conjugation. This is a significant difference.

From what I can think of off the top of my head, regular English verbs have a grand total of about 5 forms: the regular (infinitive) form, the form with an ‘s’ attached for 3rd person singular (probably a weird hold-over from earlier more complicated versions of English), the past-tense (-ed) form, the -ing form (often used as a noun e.g. running), and the participle form (which is often the same as the past tense). Most tenses are created by adding auxiliary verbs (e.g. will, would, did, had, etc.). Thus, we can say things like, “I jump, I will jump, I did jump, I would jump, I jumped, I had jumped, I have been jumped, I will have been jumping, etc.” with minimal changes to the verb jump.

Spanish uses auxiliary verbs in a similar way, but to a lesser extent. Instead, much of the work of saying when something happened and who did it is done by conjoining the verb stem with different endings (conjugation). Conjugations handle about five tenses in the indicative mood (present, preterite, imperfect, future, conditional), one in the imperative, and two common ones in the subjunctive (though there are more subjunctive tenses that are now uncommon). Two participle forms are used with various auxiliary verbs to create more tenses.

Now each of the tenses requires that a verb be conjugated six ways (one each for the first, second, and third person, in both singular and plural). And each tense can be conjugated according to three main patterns: one each for -ar verbs, -er verbs, and -ir verbs.

So ignoring the participle forms and the auxiliary-verb forms, we have 8 mood/tense combinations times 6 persons times 3 patterns for a grand total of 144 regular endings to learn. For native speakers of other Romance (Latin-based) languages, most of these forms are probably familiar and don’t represent much of a challenge, but for native English speakers, this is probably the biggest challenge of learning Spanish.
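As a small worked example of that arithmetic, here is one of the eight mood/tense combinations written out as data: the regular present indicative endings for the three patterns. The helper name conjugate_present is made up for this sketch, and it only handles fully regular verbs.

```python
# Regular present indicative endings, in the order:
# yo, tú, él/ella/usted, nosotros, vosotros, ellos/ellas/ustedes.
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}


def conjugate_present(infinitive):
    """Return the six present indicative forms of a regular verb."""
    stem, pattern = infinitive[:-2], infinitive[-2:]
    return [stem + ending for ending in PRESENT_ENDINGS[pattern]]


# 8 mood/tense combinations x 6 persons x 3 patterns = 144 slots to fill.
tenses, persons, patterns = 8, 6, len(PRESENT_ENDINGS)
total_slots = tenses * persons * patterns

# Many slots share an ending, so the number of distinct endings is smaller.
distinct_present = {e for endings in PRESENT_ENDINGS.values() for e in endings}

print(total_slots)
print(conjugate_present("hablar"))
print(len(distinct_present), "distinct endings in", persons * patterns, "present-tense slots")
```

In the present indicative, the 18 slots use only 13 distinct endings, because the -er and -ir patterns overlap heavily. So the 144 figure is an upper bound on what has to be memorized, at least for this tense.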

Mnemonics fail me

I tried to create mnemonic devices for all these endings. What I did was create an element for the person, time, and ending. So mnemonics for verbs ending in ar all included cars, the present tense included presents (gifts), the future tense included Doc Brown from Back to The Future, the second person was a female sheep (a ewe), etc., etc.

It seemed like a logical systematic way to go about it. But there were several problems. First, so many mnemonic devices had elements in common, that there was a lot of interference; I couldn’t remember things like whether Bear gave Ewe an ass (donkey), or if Ewe gave Bear an Ace cider. Second, the process for going from the idea of first-person singular preterite -ar to an actual ending (-é) was slow and difficult. Finally, I realized that I was spending a lot of time studying mnemonics that were not going to be very useful; that’s not very efficient. It would probably be more efficient to just learn the endings by rote repetition of the patterns and an example sentence for each form.

This isn’t to say that mnemonics aren’t helpful for learning conjugations; just that the way I was trying to go about it wasn’t. Yet another example of where applying a simple principle can be difficult to do well.

I figured that at least initially, it was probably a good idea to limit study to just a few tenses. The most common tenses seemed like likely contenders. That way, I’d be studying the most common things first, so the extra effort due to the unfamiliarity of the tenses would be somewhat balanced by their increased usefulness. As I gain more competence, fluency, and experience with the language, the less common tenses should be easier to learn, so their decreased usefulness should be mitigated by the (now-lower) difficulty in learning them (due to having more experience with the language).

A quick web search for most common Spanish tenses shows that there are mostly fragmented opinions out there; if anyone has done a systematic evaluation of how common the different mood/tense combinations are, I haven’t come across it. Perhaps it’s in the academic research out there and the search engines just don’t rank it high.

So there you have it. Hopefully my quick and dirty little analysis will be helpful to someone other than just me. ;-)