Avoiding Mojibake

Just about everyone uses e-mail today, and many of us in Japan do so in English, Japanese, and other languages as well. But anyone who corresponds in Japanese via e-mail knows that we still have a long way to go in terms of ensuring that our e-mail reaches the intended recipient both intact and readable.

With English this has never been much of a problem, but “mojibake” (garbled text) continues to confound those who e-mail in Japanese even today.

To further complicate matters, we have the additional question of message formatting to consider. Should we use plain-old plain text, HTML, or rich text? These are important questions not only for individuals but also for organizations of all kinds that rely on e-mail to communicate with customers, members, or affiliates.

How do you best control how the e-mail you send looks when viewed by the recipient? In this column I’ll discuss the key issues related to e-mail — character encoding, e-mail formats and message encoding types — and offer some tips on composing Japanese e-mail messages.

First, let’s take a look at character sets and encoding. Initially, e-mail servers and clients were designed to send messages using an encoding system called ASCII. This system represents each character using seven bits of data, which makes 128 discrete values (called code points, numbered 0 through 127) available. This was adequate for representing all of the characters used in English, as well as numbers, punctuation marks, etc.
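To make the seven-bit limit concrete, here is a minimal Python sketch. Python is simply my choice for illustration, and the helper name is_ascii_encodable is made up for this example.

```python
# Seven bits give 2**7 = 128 code points, numbered 0 through 127.
ascii_code_points = range(2 ** 7)

text = "Hello, e-mail!"
encoded = text.encode("ascii")

# Every ASCII byte leaves the eighth (high) bit unset.
all_seven_bit = all(byte < 0x80 for byte in encoded)


def is_ascii_encodable(s: str) -> bool:
    """Return True if every character in s has an ASCII code point."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```

English text passes the check, while a word such as 文字化け does not, which is why Japanese needs an encoding of its own.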

Japanese text is encoded (represented electronically) using a variety of methods, each one designed to represent the far greater number of code points required by the Japanese language. Generally speaking, these encoding methods can be divided into two groups: seven-bit and double-byte (a double byte is 16 bits, since 1 byte equals 8 bits). Double-byte encoding systems are used on all Japanese-capable computing platforms, with different variants employed on different operating systems.

Windows and Mac operating systems use an encoding system called Shift-JIS. Shift-JIS is actually a multibyte character set (MBCS) because it uses a single byte to represent ASCII characters and half-width katakana, and two bytes to represent kanji, kana, and full-width Roman characters. UNIX systems use a similar system called EUC (Extended UNIX Code) to accomplish the same thing.
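A quick way to see the multibyte behavior is to count the bytes each character occupies. The sketch below relies on Python's built-in shift_jis and euc_jp codecs, which I am assuming are close enough to what the operating systems above actually use (vendor variants differ in places). The helper byte_lengths is hypothetical.

```python
def byte_lengths(text: str, encoding: str) -> list:
    """Return the number of bytes each character occupies in the encoding."""
    return [len(ch.encode(encoding)) for ch in text]


# An ASCII letter, a half-width katakana, and a kanji.
sample = "A\uff71漢"

sjis_lengths = byte_lengths(sample, "shift_jis")
euc_lengths = byte_lengths(sample, "euc_jp")

# The same kanji maps to different byte values on each platform.
kanji_sjis = "漢".encode("shift_jis")
kanji_euc = "漢".encode("euc_jp")
```

With these codecs the kanji takes two bytes in both encodings, but the byte values differ, so text written in one and read as the other comes out garbled.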

These encoding systems all work fine on their respective platforms. It’s when you go “off platform” that things begin to go awry, and this is particularly true of e-mail.

Many e-mail servers and other network hardware were designed to think of text data in terms of 7 bits, and when they try to process multi- or double-byte text, bad things can happen.

The most obvious and common problem is mojibake, and in many cases the text is unrecoverable because a portion of it (the eighth, or “high” bit) has been stripped off in transit.
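The damage can be imitated in a few lines of Python. This is only a sketch: I am assuming that a seven-bit-only relay behaves like a simple mask that clears the high bit of every byte, and strip_high_bit is a name invented for the example.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the eighth (high) bit of every byte, as a 7-bit relay might."""
    return bytes(b & 0x7F for b in data)


original = "日本語"
sent = original.encode("shift_jis")

# What arrives after the high bits have been stripped in transit.
received = strip_high_bit(sent)

# The recipient's software still tries to read the bytes as Shift-JIS.
garbled = received.decode("shift_jis", errors="replace")
```

The byte count is unchanged, but the stripped bits cannot be recovered from the received bytes, and the decoded result no longer matches what was sent.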

JIS, a seven-bit encoding system, was developed to allow for the safe transmission of Japanese text over the Internet. JIS is the preferred encoding for Japanese e-mail, and using it is one of the best ways to prevent mojibake. If you use an e-mail application such as Microsoft Outlook Express, you’ll note that you are able to select from various Japanese character encodings when composing Japanese e-mail (just click on the “Format” tab and go to “Encoding”). Although Shift-JIS and EUC appear among these options, JIS is the safest choice.
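Why JIS survives the trip can be checked with Python's built-in iso2022_jp codec. The following is a minimal sketch, and the variable names are mine: the encoded bytes all stay below 0x80, with escape sequences marking where the Japanese characters begin and end.

```python
text = "日本語"
jis = text.encode("iso2022_jp")

# ESC $ B switches to the Japanese character set; ESC ( B returns to ASCII.
starts_with_escape = jis.startswith(b"\x1b$B")
ends_with_escape = jis.endswith(b"\x1b(B")

# No byte uses the eighth (high) bit.
seven_bit = all(b < 0x80 for b in jis)

# Clearing the high bit therefore changes nothing, and the text decodes intact.
after_transit = bytes(b & 0x7F for b in jis)
recovered = after_transit.decode("iso2022_jp")
```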

You’ll also see JIS encoding referred to as ISO-2022-JP, and this is the designation used in the header portion of the e-mail that tells your e-mail software how to display the message properly.
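To see where that designation ends up, here is a small sketch using the email package from Python's standard library. The addresses and subject line are placeholders, and this is just one way to build such a message.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"      # placeholder address
msg["To"] = "recipient@example.com"     # placeholder address
msg["Subject"] = "Test message"
msg.set_content("日本語のメールです。", charset="iso-2022-jp")

# The header that tells the recipient's software how to decode the body.
content_type = str(msg["Content-Type"])
charset = msg.get_content_charset()
```

Printing content_type shows the charset parameter that the receiving e-mail client reads before displaying the message.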

Multipurpose Internet Mail Extensions (MIME) and UUENCODE are other systems that were developed to allow for the safe transmission of 8-bit data by e-mail, but their extra encoding step is unnecessary when sending Japanese e-mail, since ISO-2022-JP is already seven-bit. If your e-mail client lets you choose between MIME and UUENCODE e-mail formats, choose MIME with the encoding set to “None” (rather than Base64 or Quoted-Printable) for best results. Again, the Japanese encoding should always be set to JIS.
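The reasoning behind this advice can be sketched in Python. The helper needs_transfer_encoding is hypothetical and only tests for bytes with the high bit set, which is a simplification of what real mail software checks.

```python
import base64


def needs_transfer_encoding(body: bytes) -> bool:
    """Return True if any byte has its eighth (high) bit set."""
    return any(b >= 0x80 for b in body)


text = "日本語のメールです。"
jis_body = text.encode("iso2022_jp")
sjis_body = text.encode("shift_jis")

jis_needs_wrapping = needs_transfer_encoding(jis_body)    # False
sjis_needs_wrapping = needs_transfer_encoding(sjis_body)  # True

# Base64 on top of JIS still works, but only makes the message larger.
wrapped = base64.b64encode(jis_body)
```

Shift-JIS text would need Base64 or Quoted-Printable to pass through seven-bit systems, whereas JIS text gains nothing from the extra layer.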

Now we come to the question of text formatting. Do we use HTML, plain text, or something else, such as rich text? The prevailing logic up until a year or two ago was that plain text was the way to go if you weren’t sure what e-mail client your recipients were using. This was simply because many e-mail clients didn’t support HTML. However, there has been a trend in recent months of large companies moving away from plain text toward HTML in their marketing and other e-mail correspondence.

What’s behind the policy shift?

Well, there’s recent data, for one. In an extensive survey conducted earlier this year by ClickZ (www.clickz.com) and Internet.com (www.internet.com), Edward Grossman revealed some interesting facts about users’ e-mail habits and capabilities.

Surprisingly, fewer than 3 percent of respondents indicated that they were unable to read HTML-based e-mail (though another 8 percent said they weren’t sure). Also, given the choice, only 30 percent responded that they prefer to receive e-mail as plain text.

Most folks use Microsoft e-mail clients (Outlook or Outlook Express) for business and personal e-mail, with the majority of others using Web-based e-mail clients such as Hotmail and Yahoo! Mail, which support HTML. Even AOL supports HTML now (as of version 7.0), meaning you really have to look hard to find people for whom HTML is problematic.

The bottom line? With the vast majority of people today using HTML-ready e-mail clients, there is little reason to cling to the notion that plain text is the safer of the two. Also, HTML just looks better. So send your marketing information and newsletters in HTML, but give people the option for plain text as well.
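One way to offer both versions is to put them in a single message and let the recipient's e-mail client pick. The sketch below uses the multipart/alternative structure via Python's standard email package. It is one possible reading of "the option for plain text" (a subscription preference is another), and the sample text is invented.

```python
from email.message import EmailMessage

plain_text = "こんにちは。ニュースレターです。"
html_text = "<html><body><p><b>こんにちは。</b></p></body></html>"

msg = EmailMessage()
msg["Subject"] = "Newsletter"
# The plain-text part goes first; the HTML alternative follows it.
msg.set_content(plain_text, charset="iso-2022-jp")
msg.add_alternative(html_text, subtype="html", charset="iso-2022-jp")

part_types = [part.get_content_type() for part in msg.iter_parts()]

# A plain-text-only reader can fall back to the first part.
plain_part = msg.get_body(preferencelist=("plain",))
```

Clients that handle HTML show the second part, and the rest fall back to the first, so nobody receives an unreadable message.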

Finally, getting back to Japanese, you may find that there are still times when people reply to you saying your e-mail is unreadable. The only times I experience such problems these days are when sending Japanese e-mail using HTML. In such cases I simply re-send the message using plain text, and it seems to work fine.

The Japan Times: Feb. 27, 2003
