Monday, 10 March 2014

Android/Java - formatting a String containing both English and Arabic

I ran across a problem today where I needed to compose a String containing both English (left-to-right) and Arabic (right-to-left) text. Specifically, I needed to format a String made up of three blocks so that it displayed as follows:

[1. English left-to-right text] [2. Arabic right-to-left text] [3. English left-to-right text]

The problem was that some of the English in the third block was displayed to the left of the Arabic in the second block. In other words, the implicit text ordering algorithm couldn't correctly determine where the Arabic right-to-left block ended and the next English left-to-right block began.

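To get around this problem I had to insert explicit directional formatting characters, such as \u202A (left-to-right embedding), \u202B (right-to-left embedding), \u202C (pop directional formatting), \u202D (left-to-right override) and \u202E (right-to-left override). I wrapped the Arabic block roughly as follows: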

[1. English left-to-right text] \u202B [2. Arabic right-to-left text] \u202C [3. English left-to-right text]
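As a rough sketch, the wrapping above might look like this in Java. The class and method names (BidiEmbed, embedRtl, compose) are made up for illustration, and joining the blocks with single spaces is my own assumption rather than anything the layout requires.

```java
final class BidiEmbed {
    // Unicode directional formatting characters (see UAX #9).
    static final char RLE = '\u202B'; // U+202B RIGHT-TO-LEFT EMBEDDING
    static final char PDF = '\u202C'; // U+202C POP DIRECTIONAL FORMATTING

    /** Wraps a right-to-left fragment in RLE ... PDF so it forms one embedded RTL run. */
    static String embedRtl(String rtl) {
        return RLE + rtl + PDF;
    }

    /** Joins the three blocks (hypothetical helper), wrapping only the Arabic block. */
    static String compose(String english1, String arabic, String english2) {
        return english1 + " " + embedRtl(arabic) + " " + english2;
    }
}
```

For example, BidiEmbed.compose("Name:", "\u0645\u0631\u062D\u0628\u0627", "(verified)") returns the two English blocks with the Arabic word between them, enclosed in \u202B and \u202C. Whether that alone renders as intended depends on the actual text and on the component drawing it.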

To get the text to render exactly as you intend, you might need to experiment a little with the directional formatting characters. For example, you might have to wrap the whole String in a left-to-right override (\u202D, closed by a final \u202C) in addition to wrapping the Arabic text, as follows:

\u202D [1. English left-to-right text] \u202B [2. Arabic right-to-left text] \u202C [3. English left-to-right text] \u202C
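Here is a minimal sketch of that nested variant, again assuming single spaces between blocks. BidiOverride, compose and isBalanced are hypothetical names; isBalanced is just a sanity check I would add so that every opening character has a matching \u202C, and it is stricter than the Unicode algorithm itself, which simply ignores a stray \u202C.

```java
final class BidiOverride {
    static final char LRE = '\u202A'; // U+202A LEFT-TO-RIGHT EMBEDDING
    static final char RLE = '\u202B'; // U+202B RIGHT-TO-LEFT EMBEDDING
    static final char PDF = '\u202C'; // U+202C POP DIRECTIONAL FORMATTING
    static final char LRO = '\u202D'; // U+202D LEFT-TO-RIGHT OVERRIDE
    static final char RLO = '\u202E'; // U+202E RIGHT-TO-LEFT OVERRIDE

    /** Wraps everything in LRO ... PDF and the Arabic block in a nested RLE ... PDF. */
    static String compose(String english1, String arabic, String english2) {
        return new StringBuilder()
                .append(LRO)
                .append(english1).append(' ')
                .append(RLE).append(arabic).append(PDF)
                .append(' ').append(english2)
                .append(PDF)
                .toString();
    }

    /** Returns true if each embedding/override is closed by a PDF and no PDF is unmatched. */
    static boolean isBalanced(String text) {
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            switch (text.charAt(i)) {
                case LRE: case RLE: case LRO: case RLO:
                    depth++;
                    break;
                case PDF:
                    if (depth == 0) return false; // closing character with nothing open
                    depth--;
                    break;
                default:
                    break;
            }
        }
        return depth == 0;
    }
}
```

Note that \u202D is an override rather than an embedding, so it forces the outer blocks to be laid out left to right regardless of their characters. The nested \u202B starts a fresh embedding without an override, so the Arabic text inside it is still ordered right to left.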

Here's the Stack Overflow thread which led me to the solution:
http://stackoverflow.com/questions/6177294/string-concatenation-containing-arabic-and-western-characters

For a deeper understanding of directional formatting characters and the Unicode Bidirectional Algorithm, check out the following document:
https://unicode.org/reports/tr9
