It seems there is a lot of confusion among WordPress developers and enthusiasts regarding the proper use of the WordPress localization functions. Unfortunately, 99%[1] of the tutorials circulating the Interwebz right now only scratch the surface of localization by mentioning less than a handful of the functions available, and to make things worse, some of them are outdated or just plain wrong. Top that with insufficient knowledge of foreign languages, and you get a topic of localization that’s totally misunderstood or even skipped altogether: Plural Forms.
“What about plural forms?” you ask. 1 is singular, 2 is plural, right? Well, no… Or maybe yes, in your language. Let’s see a bit of code:
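A minimal sketch of the pattern in question, assuming a hypothetical helper `naive_apple_count()` and the text domain `my_domain`. The `__()` fallback is only a stand-in so the snippet runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real __() is used.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}

// Anti-pattern: the code itself decides what counts as singular and plural,
// and offers translators only two unrelated strings.
function naive_apple_count( $apples ) {
    if ( 1 === (int) $apples ) {
        return __( 'One apple', 'my_domain' );
    }
    return sprintf( __( '%s apples', 'my_domain' ), $apples );
}

echo naive_apple_count( 1 ), "\n"; // One apple
echo naive_apple_count( 3 ), "\n"; // 3 apples
```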
If you ever did anything similar to the above code[2], you definitely need to keep reading.
One apple, two apple, three apple
Not all languages are alike and no two languages are exactly the same. If you don’t know the rules that govern a language, you shouldn’t be making any assumptions about it. And when you are trying to create a plugin/theme and want to make it translation-ready so that it can be translated in any language, you definitely can’t make any assumptions whatsoever since you don’t even know the languages that your code will be translated to.
You see, when it comes to numbered words, English is easy. Greek is easy too (happens to be my native language). In both languages counting apples goes like: 1 apple, 2 apples, 3 apples, etc. Other languages differ though. For example, in Turkish and Hungarian it goes like 1 apple, 2 apple, 3 apple but there is a different form for “the apple” and “the apples”. In Japanese however there is no distinction between singular and plural at all. Do you see now how the code above can be problematic? “Wait!” I hear you say. “The two strings above can be translated for Turkish, Hungarian and Japanese” you say.
O RLY? Consider this. Irish has special cases for 1, 2, 3-6, 7-10 and the rest. That’s 5 different ways to change the words following a number, depending on the number. But since you know this now, you could change your code to accommodate, right? How about Russian then, where there is a special case for numbers that end in 1-4 but do not end in 11-14? Getting out of hand already? Consider also Slovenian, where 1 and numbers ending in 02, 03 or 04 have special cases.
Can’t know it all
You see, it’s a whole wild world out there with a huge variety of languages and rules and exceptions and things we don’t know and perhaps will never learn about. Localization and internationalization, however, are issues that have existed for pretty much the whole course of computing history. That’s how gettext was born 20 years ago in the first place, and plural form handling was added some 5 years later. I’m not saying gettext is the only way to localize software, just that it’s open source, tried and proven by thousands of people in my/your place, many years before us. gettext eventually found its place in PHP, and WordPress built its wrapper functions around it to make it easier and cleaner for us to use.
Localizing text with numbers
Fortunately, WordPress provides us with a few function calls to make the whole process of translating numbered texts easy.
Note: It is important to note that English is assumed to be the (base) language that strings are going to be translated from.
There are only 5 functions that we should know about, with two of them just being variations. We will not cover what a domain and a context are, as I assume you are already familiar with them. Let’s take a look at how to use them in action, and we’ll later cover the why.
_n( $single, $plural, $number, $domain );
This is the function that you should probably use more often. It goes hand in hand with sprintf(). $single is the text in singular, $plural is the text in plural, and $number is the variable number depending on which the right string will be used. For example:
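A minimal sketch of that usage, assuming a hypothetical helper `apple_count_label()` and the text domain `my_domain`. The `_n()` fallback is only a stand-in that applies the English rule, so the snippet also runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress (assumption: no translations
// loaded, so the English rule applies). Inside WordPress the real _n() is used.
if ( ! function_exists( '_n' ) ) {
    function _n( $single, $plural, $number, $domain = 'default' ) {
        return 1 === (int) $number ? $single : $plural;
    }
}

// Hypothetical helper: pick the right form first, then fill in the number.
function apple_count_label( $apples ) {
    return sprintf( _n( '%s apple', '%s apples', $apples, 'my_domain' ), $apples );
}

echo apple_count_label( 1 ), "\n"; // 1 apple
echo apple_count_label( 6 ), "\n"; // 6 apples
```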
If $apples equals 1, the string echoed will be 1 apple. If $apples is greater than or equal to 2, for example 6, 6 apples will be printed.
_nx( $single, $plural, $number, $context, $domain );
We use this function just like _n(), except when we need to disambiguate between numbered words or phrases that can be difficult to translate without context. Words that can be used unchanged as verbs and as nouns, homographs and homonyms, polysemes and capitonyms are some of the cases where you’ll need this function.
For example, the word post can be used as a noun (e.g. a WordPress post) as well as a verb (e.g. the action of posting something).
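The two usages could be sketched as follows. The context strings `noun` and `verb`, both helper names and the `my_domain` text domain are assumptions made for illustration; the `_nx()` fallback is only a stand-in so the snippet runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; it ignores the context and
// applies the English rule. Inside WordPress the real _nx() is used.
if ( ! function_exists( '_nx' ) ) {
    function _nx( $single, $plural, $number, $context, $domain = 'default' ) {
        return 1 === (int) $number ? $single : $plural;
    }
}

// First usage: "post" as a noun (hypothetical helper).
function category_post_count_label( $count ) {
    return sprintf( _nx( '%s post', '%s posts', $count, 'noun', 'my_domain' ), $count );
}

// Second usage: "post" as a verb (hypothetical helper).
function user_post_count_label( $count ) {
    return sprintf( _nx( '%s post', '%s posts', $count, 'verb', 'my_domain' ), $count );
}

echo category_post_count_label( 12 ), "\n";
echo user_post_count_label( 1 ), "\n";
```

The English output is identical in both cases; the context only matters to translators, who get two separate entries to translate.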
You would use the appropriate one depending on the situation. For example, the first would be used when counting how many blog posts are in a category, while the second would be used when counting how many messages a user has posted.
Context, of course, can be longer than just a single word.
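A sketch with longer contexts, assuming the hypothetical context strings `unit of time` and `unit of angle` and hypothetical helper names. The `_nx()` fallback is a stand-in for running outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real _nx() is used.
if ( ! function_exists( '_nx' ) ) {
    function _nx( $single, $plural, $number, $context, $domain = 'default' ) {
        return 1 === (int) $number ? $single : $plural;
    }
}

// Same English words, two different translatable entries.
function duration_label( $minutes ) {
    return sprintf( _nx( '%s minute', '%s minutes', $minutes, 'unit of time', 'my_domain' ), $minutes );
}

function arc_label( $minutes ) {
    return sprintf( _nx( '%s minute', '%s minutes', $minutes, 'unit of angle', 'my_domain' ), $minutes );
}

echo duration_label( 5 ), "\n";
echo arc_label( 1 ), "\n";
```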
The word “minute” can now be properly translated in any language that has different words for the time and angle units.
_n_noop( $singular, $plural, $domain );
There are cases where we might need to prepare our strings before we know the actual number that will determine if singular or plural will be used. For example, a library or a plugin might allow you to customize its output. Let’s consider an imaginary function that displays the count of products that a visitor currently has in the cart. The function resides in a plugin and expects a call like this:
Since you know that you need to pass a message that will display a number, you would call it like this:
What this does is, it prepares an array of all your values in the form that they were passed into the _n_noop() function. No actual translation happens at this time, since we don’t know the actual number yet. If you var_dump() the $msg variable, it should look like this:
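A sketch of what such a prepared value contains. The strings `%s product` and `%s products` are assumptions, and the `_n_noop()` fallback mirrors, to the best of my knowledge, the array shape WordPress core returns; it is only there so the snippet runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress. It only records the strings;
// nothing is translated at this point.
if ( ! function_exists( '_n_noop' ) ) {
    function _n_noop( $singular, $plural, $domain = null ) {
        return array(
            0          => $singular,
            1          => $plural,
            'singular' => $singular,
            'plural'   => $plural,
            'context'  => null,
            'domain'   => $domain,
        );
    }
}

$msg = _n_noop( '%s product', '%s products', 'my_domain' );
var_dump( $msg );
```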
_nx_noop( $singular, $plural, $context, $domain );
This function is identical to _n_noop() except it also accepts a disambiguation context, just like _nx().
translate_nooped_plural( $nooped_plural, $count, $domain );
Given a number, this function translates the output of _n_noop() and _nx_noop(). Consider the example function show_cart_count() given above. Since it already receives the messages to be translated, all it needs now is to determine the actual number, and translate and echo the appropriate message.
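One way the plugin side could look, as a sketch under stated assumptions: `get_cart_item_count()` and `get_cart_count_text()` are hypothetical helpers, and the fallbacks for `_n_noop()` and `translate_nooped_plural()` are simplified stand-ins (English rule only) so the snippet runs outside WordPress.

```php
<?php
// Simplified stand-ins so the sketch runs outside WordPress.
if ( ! function_exists( '_n_noop' ) ) {
    function _n_noop( $singular, $plural, $domain = null ) {
        return array( 'singular' => $singular, 'plural' => $plural, 'context' => null, 'domain' => $domain );
    }
}
if ( ! function_exists( 'translate_nooped_plural' ) ) {
    function translate_nooped_plural( $nooped_plural, $count, $domain = 'default' ) {
        // No translations loaded here, so fall back to the English rule.
        return 1 === (int) $count ? $nooped_plural['singular'] : $nooped_plural['plural'];
    }
}

// Hypothetical: a real plugin would read this from the visitor's cart.
function get_cart_item_count() {
    return 3;
}

// Translate as late as possible: the number is known only here.
function get_cart_count_text( $msg ) {
    $count = get_cart_item_count();
    return sprintf( translate_nooped_plural( $msg, $count ), $count );
}

function show_cart_count( $msg ) {
    echo get_cart_count_text( $msg ), "\n";
}

// The caller registers the strings under its own domain.
show_cart_count( _n_noop( '%s product', '%s products', 'my_domain' ) );
```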
Note how I omitted to pass a domain to the translate_nooped_plural() function, although you can clearly see it accepts one in the headline above. This is because translate_nooped_plural() will check if $nooped_plural has a value in its $nooped_plural['domain'] key, and if non-empty, it will use that. This way, a string can be translated according to the translations of the caller, which are stored under the my_domain domain, instead of the plugin_domain translations that may happen to collide.
Nooped plurals are mainly used with WordPress in cases similar to the above. When the action of translating/showing a message must happen in file A with domain AD, but the actual strings must be registered into file B and be included in the language files (.po/.mo) of domain BD, the only elegant solution is to noop them. This way, a project can have its unique domain while being able to accept and use a string from another domain. Konstantin Kovshenin gives a few more examples of nooped plurals.
Generally, they are used to defer translation to a later time. The need for deferred translations arose while programming in C, in cases such as when messages need to be defined as constants. Their size is determined at compile time and it needs to be constant by definition (duh!), but translated strings may have any size.
Internally, the translate_nooped_plural() function uses the _n() and _nx() functions depending on the contents of $nooped_plural, so it really acts as a middleman.
Why sprintf() ?
In all examples above, the values returned by the _n*() functions were wrapped into an sprintf() call, and then echoed. This is because those functions don’t replace the format specifiers (%s, %d, etc) with the actual value of the number passed. For example:
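A minimal sketch of the unwrapped call. The `_n()` fallback is a stand-in that applies the English rule, so the snippet runs outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real _n() is used.
if ( ! function_exists( '_n' ) ) {
    function _n( $single, $plural, $number, $domain = 'default' ) {
        return 1 === (int) $number ? $single : $plural;
    }
}

$apples = 3;

// No sprintf(): the placeholder is returned untouched.
$raw = _n( '%s apple', '%s apples', $apples, 'my_domain' );
echo $raw, "\n"; // %s apples
```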
If $apples was 3, this would print out: %s apples
This is because the job of the _n*() functions is to return the appropriate translation of the string, and nothing else. A Greek translator would have seen “%s apples” and would have translated it as “%s μήλα”. So, the above statement really translates to:
So we need to substitute the %s format specifier with the actual number, and that’s why we need to wrap the whole thing in sprintf() or any other of the *printf() family of functions (printf, sprintf, fprintf, vfprintf, vprintf, vsprintf).
While not necessary, it is good practice to have the _n*() inside the *printf() to avoid unnecessary bugs. Consider this:
This could easily be changed by mistake to:
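Both variants can be sketched side by side. The variable names are assumptions, and the `_n()` fallback is a stand-in for running outside WordPress.

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real _n() is used.
if ( ! function_exists( '_n' ) ) {
    function _n( $single, $plural, $number, $domain = 'default' ) {
        return 1 === (int) $number ? $single : $plural;
    }
}

$apples = 1;

// Mistaken variant: the form is chosen early, while $apples is still 1 ...
$format = _n( '%s apple', '%s apples', $apples, 'my_domain' );

// ... then the value changes somewhere along the way ...
$apples++;

// ... and the stale format is filled in with the new number.
$wrong = sprintf( $format, $apples );

// Safe variant: choose the form and fill in the number in one step, as late as possible.
$right = sprintf( _n( '%s apple', '%s apples', $apples, 'my_domain' ), $apples );

echo $wrong, "\n"; // 2 apple
echo $right, "\n"; // 2 apples
```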
This would print “2 apple” in English, and it would be obviously wrong. The used string was determined way before the printing, and the value changed somewhere along the way. This can easily happen in large and complex projects, so, a good rule of thumb is to translate and sprintf together, as late as possible.
Plural Forms – What .po files look like
Each .po or .pot file carries a number of headers that specify language-specific configuration. The headers of a pretty standard .po/.pot file look something like this:
What’s of interest right now to us though, are the following two lines:
The first line, defines the language of our file to be Greek (el), and the second line defines the rules that govern the usage of plural forms. If I was going to translate the apple example:
I would find the el.po file containing the following lines, generated from the combined information of Plural-Forms and _n():
Which I would go on and translate as:
Note that the msgstr[0] line contains the singular form, and not the translation of the words for the case of %s being equal to 0. msgstr is really a zero-based array, and msgstr[0] is its first element. Similarly, msgstr[1] is the second element, and does not hold the string for %s = 1. This is due to the nplurals=2; plural=(n != 1); line.
nplurals=2; plural=(n != 1); is a C-syntax expression, and as far as the Plural Forms are concerned, it’s almost identical to PHP’s expressions. C variables don’t get a dollar sign in front of them, so you can now start seeing that the content of that line is just two variables getting assigned something. Breaking it into two lines and adding some spaces makes it a bit more clear.
nplurals is just a variable that states how many plurals the language has. According to this, msgstr zero-based elements are created. For Greek, it’s two, 0 and 1. The naming may be a bit unfortunate, as all number cases are included, not just plurals. Greek has one singular for number one, and one plural for the rest of the numbers. Two in total.
plural is the variable that points to the right element of the msgstr array, depending on the value of %s. The number of %s changes name in this line and is signified by n.
(n != 1) is just a comparison. Comparisons always evaluate to false (0) or true (1).
So, when we call _n( '%s apple', '%s apples', $apples, 'my_domain' ) and $apples = 1, (1 != 1) evaluates to false (0), so msgstr[0] is used. If $apples is any other number, e.g. 2, (2 != 1) evaluates to true (1) and msgstr[1] is used.
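The selection can be modelled in a few lines of PHP. This is only an illustration of the rule, not how WordPress implements it internally; the function names are hypothetical and the Greek singular “%s μήλο” is my own translation.

```php
<?php
// The translated entry in el.po would look roughly like this:
//   msgid "%s apple"
//   msgid_plural "%s apples"
//   msgstr[0] "%s μήλο"
//   msgstr[1] "%s μήλα"
$msgstr = array( '%s μήλο', '%s μήλα' );

// plural=(n != 1): a comparison that yields 0 (false) or 1 (true).
function el_plural_index( $n ) {
    return (int) ( $n != 1 );
}

// Pick the array element the rule points at, then fill in the number.
function el_apples( array $msgstr, $n ) {
    return sprintf( $msgstr[ el_plural_index( $n ) ], $n );
}

echo el_apples( $msgstr, 1 ), "\n"; // 1 μήλο
echo el_apples( $msgstr, 2 ), "\n"; // 2 μήλα
```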
A more complex example
Let’s take a similar look now, but this time for Irish (ga). As mentioned earlier, Irish has 5 different cases. Number 1, number 2, numbers 3 to 6, numbers 7 to 10, and all the rest.
When a translator opens up the ga.po file and is faced with our previous apples example, they will see the following:
But what’s all that? What goes where?
Irish plural forms can be expressed with the following string:
Again, let’s rewrite it spacing things out so that it becomes a bit more clear.
nplurals gets assigned a five, which is the number of the total forms. This is straightforward.
plural gets assigned the result of an expression, which is built using nested ternary operators. The ternary operator is just a shorthand if/then/else statement in the form of if ? then : else. In a normal if/then/else format, the above statement could have been rewritten as:
It’s much more clear now, isn’t it? When we call _n( '%s apple', '%s apples', $apples, 'my_domain' ) and $apples = 5, (n < 7) evaluates to true and plural gets assigned a 2, so msgstr[2] is used. Similarly, if $apples = 15, all ifs evaluate to false, so the final else gets executed and plural gets assigned a 4 and msgstr[4] is used.
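The walkthrough above can be sketched as a PHP function. The name `ga_plural_index()` is hypothetical, and the chain of tests is an assumption reconstructed from the description (1, 2, below 7, below 11, everything else).

```php
<?php
// Irish (ga), five forms, written as a plain if/elseif chain.
// Each test relies on the earlier ones having failed.
function ga_plural_index( $n ) {
    if ( $n == 1 ) {
        $plural = 0;
    } elseif ( $n == 2 ) {
        $plural = 1;
    } elseif ( $n < 7 ) {
        $plural = 2;
    } elseif ( $n < 11 ) {
        $plural = 3;
    } else {
        $plural = 4;
    }
    return $plural;
}

echo ga_plural_index( 5 ), "\n";  // 2
echo ga_plural_index( 15 ), "\n"; // 4
```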
The above if statements can be rewritten with inclusive limits and logical operators so that it will be more readable (to some).
Packing this into a single line using the ternary operator (multiple times) becomes:
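A PHP sketch of the packed form with inclusive limits, using a hypothetical function name. One difference from C is worth labelling: PHP does not accept unparenthesized nested ternaries (deprecated in 7.4, an error in 8.0), so each nested branch is wrapped in parentheses here, while the C expression in a Plural-Forms line does not need them.

```php
<?php
// Irish (ga), five forms, as nested ternaries with inclusive limits.
// The extra parentheses are required by PHP, not by the C-style
// expression used in the Plural-Forms header.
function ga_plural_index_inclusive( $n ) {
    return ( $n == 1 ) ? 0
        : ( ( $n == 2 ) ? 1
        : ( ( $n >= 3 && $n <= 6 ) ? 2
        : ( ( $n >= 7 && $n <= 10 ) ? 3 : 4 ) ) );
}

echo ga_plural_index_inclusive( 5 ), "\n";  // 2
echo ga_plural_index_inclusive( 15 ), "\n"; // 4
```

One thing worth checking when rewriting limits: under this inclusive sketch, n = 0 falls through to the last form, whereas a bare n < 7 test would place it in the third.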
So, the following two lines are really the same.
C/PHP whitespace and parentheses rules apply, so the above two lines are identical to the following two:
I do hope however that you opt to keep the spaces for readability purposes :)
Rules of plural forms
The syntax and elements allowed in the Plural-Forms: line are very specific, however not much documentation is available about them. This is normal of course, as you shouldn’t re-invent the wheel and try to make your own expression if one already exists for your language. However, you might find that a given one is wrong and you need to fix it, or be a native speaker of a language that doesn’t have a known plural forms’ expression yet and you need to create it. Here are the rules that govern plural forms:
Allowed:
- Parentheses: (, ) (must be balanced)
- Ternary operator: expr ? statement1 : statement2
- Logical operators: && (and), || (or), ! (not)
- Comparison operators: == (equal to), != (not equal to), < (less than), <= (less than or equal to), > (greater than), >= (greater than or equal to)
- Arithmetic operators: % (modulo) and according to glibc source[3], + (addition), - (subtraction), * (multiplication), / (division)
- Variables: Limited to n
- Numbers: Limited to integers
Not allowed:
- Unary operators: + (positive), - (negative)
- Bitwise operators: & (and), | (or), ^ (xor), ~ (not), << (left shift), >> (right shift)
- Everything else: function calls, variables, strings, constants, etc
If you think I forgot to mention the exponentiation operator ** then I need to remind you that this is a C expression, and not a PHP one. C (and C++) don’t have an operator for exponentiation.
Now, you may try experimenting with the allowed operators but your mileage may vary depending on the tools you use. For example, while experimenting, I’ve found out that Poedit doesn’t handle +, -, / and *, or I just messed things up and wrote invalid expressions.
Zero and One
All these confusing rules, do’s and don’ts, and we still haven’t touched a pretty common scenario. What should we do when we want different texts for the cases of zero and one?
Let’s consider a real case scenario, where we want to display a message with the number of results found by the WordPress search widget. The file search.php is invoked and the global $wp_query->found_posts holds the number of search results. We now need to display a nice message.
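A sketch of the tempting first attempt, with hypothetical helper names. In a template, `$found` would come from `$wp_query->found_posts`. The `_n()` fallback is a stand-in for running outside WordPress, and the second helper only simulates a language whose rule is plural=(n > 1).

```php
<?php
// Stand-in so the sketch runs outside WordPress; inside WordPress the real _n() is used.
if ( ! function_exists( '_n' ) ) {
    function _n( $single, $plural, $number, $domain = 'default' ) {
        return 1 === (int) $number ? $single : $plural;
    }
}

// Anti-pattern: the singular string carries no %s placeholder.
function naive_results_message( $found ) {
    return sprintf( _n( 'One result found', '%s results found', $found, 'my_domain' ), $found );
}

// Simulation of a language whose rule is plural=(n > 1):
// index 0 is used for both 0 and 1.
function pick_with_n_gt_1_rule( array $forms, $n ) {
    return $forms[ (int) ( $n > 1 ) ];
}

echo naive_results_message( 1 ), "\n"; // One result found

// English strings stand in for the translated ones.
$forms = array( 'One result found', '%s results found' );
echo sprintf( pick_with_n_gt_1_rule( $forms, 0 ), 0 ), "\n"; // One result found (for zero results)
```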
This, however, is wrong. We are assuming (based on our knowledge of English) that only the number 1 is singular. And we are so wrong, in so many languages. Take a look at this list of plural forms. For every language that has a plural form of nplurals=1; plural=0; or nplurals=2; plural=(n > 1); we are screwing things up. Languages with those plural forms include French, Japanese, Turkish, Brazilian Portuguese and more. You see, Japanese uses the plural form nplurals=1; plural=0; so the translator will see:
The language only uses singular, so should he translate properly the “One result found” message? Or should he include a %s in the translation? What if the developers did something unorthodox and the singular doesn’t go through sprintf()? A literal %s would be printed on screen.
Now consider French (nplurals=2; plural=(n > 1);).
In English we would say: 0 posts found. 1 post found. 2 posts found.
In French we would say: 0 post found. 1 post found. 2 posts found.
You see, French uses the singular for the number 0 as well as 1. Not including a %s in your singular messes up the translations of other languages. Don’t forget that translators are not necessarily developers too. They probably don’t know that their translations go through sprintf(), or what sprintf() is in the first place.
What you should do
You should always include the %s in your singulars and your plurals. That’s the only way you can make sure your messages are translatable in every language. If you, however, want to have different texts for 0 and 1, you should create other translatable strings explicitly.
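A sketch of that approach, assuming a hypothetical helper `search_results_message()` and the text domain `my_domain`. The `__()` and `_n()` fallbacks are stand-ins so the snippet runs outside WordPress.

```php
<?php
// Stand-ins so the sketch runs outside WordPress; inside WordPress the real functions are used.
if ( ! function_exists( '__' ) ) {
    function __( $text, $domain = 'default' ) {
        return $text;
    }
}
if ( ! function_exists( '_n' ) ) {
    function _n( $single, $plural, $number, $domain = 'default' ) {
        return 1 === (int) $number ? $single : $plural;
    }
}

function search_results_message( $found ) {
    $found = (int) $found;

    // Special wording for 0 and 1 lives in its own translatable strings.
    if ( 0 === $found ) {
        return __( 'No results found', 'my_domain' );
    }
    if ( 1 === $found ) {
        return __( 'Just one result found', 'my_domain' );
    }

    // Both forms keep the %s, so every language's plural rule can use either.
    return sprintf( _n( '%s result found', '%s results found', $found, 'my_domain' ), $found );
}

echo search_results_message( 0 ), "\n"; // No results found
echo search_results_message( 1 ), "\n"; // Just one result found
echo search_results_message( 7 ), "\n"; // 7 results found
```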
Handling the exceptions of 0 and 1 as above, allows the translators to translate strings as they are, instead of as what they should have been. The “No results found” and “Just one result found” can now be translated properly for each language, the way the translators think it’s best.
This is the end
If you made it through and read these words, I hope I helped you understand the proper usage of pluralization functions of WordPress. I don’t claim to be an expert in the subject matter, and my spoken languages are limited to Greek and English, so I could be wrong. However, I do trust the crowd-wisdom that made tools such as the GNU gettext exist, which has proven its power and correctness in countless projects over the years.
If you spot any errors whatsoever, I would appreciate it if you could drop me a quick comment. Or a kudos. Or just a hi!
Where to get plural forms from
There are various attempts to put all known plural forms in one place, either in code or just words. This is my attempt to gather all those attempts in one place.
- The GNU gettext manual itself – Additional functions for plural forms
- Translate House – Plural Forms
- Translate House – Plural Forms – on Github. Probably the most up to date.
- Mozilla.org – PluralForm.jsm
- Mozilla.org – Localization and Plurals
- Launchpad.net – Languages in Launchpad
- Unicode Common Locale Data Repository – Language Plural Rules
[1] Totally made-up statistic. However, you can only find 4-5 or so good WordPress localization tutorials in total.
[2] I’ve been guilty myself.
[3] glibc source code of plural.y and plural.c.
It’s a bit unclear how you manage to get the right plural variation with the “n” functions since you can always only specify one singular and one plural form.
The PO format supports all possible variation, but the WP function does not unless all developers change their code using the “noop” variations (which is unlikely).
_n() function doesn’t support all possible variations.
Strings provided to the gettext functions serve two main purposes; 1) to have their strings extracted in order to build the PO/POT files (and in the case of PO files, that will then have the appropriate amount of plurals depending on the plural form present in the file), and 2) to fallback to those strings in case no appropriate translations are found.
Now, what if your source strings are in a language that have multiple plural forms? How do you use the WordPress/gettext functions? The answer is, you don’t. It is assumed that your “base” language (the one your source code strings are in) is English, or as a side-effect, any language that has only one plural form. According to the gettext manual:
Hope this clears things up a bit.
I was taking for granted that the “n” functions ignore the plural forms, but after some testing, it looks like I was wrong.
Indeed, it makes sense for this function to only accept one plural (because of the reasons you explained), which is different from what it should return.
If the PO header is properly setup, there is a `Plural_Forms` class that, based on the `$number` argument, will properly look at the .po recipe and will know what to display from the compiled .mo version (which `msgstr` should be displayed).
Indeed. In fact, this is what the content below the “A more complex example” heading shows/explains.