Archive for February, 2003

Feb 28 2003

Avoiding Mojibake

Published by michael under Articles

Just about everyone uses e-mail today, and many of us in Japan do so in English, Japanese, and other languages as well. But anyone who corresponds in Japanese via e-mail knows that we still have a long way to go in terms of ensuring that our e-mail reaches the intended recipient both intact and readable.

With English this has never been much of a problem, but “mojibake” (garbled text) continues to confound those who e-mail in Japanese even today.

To further complicate matters, we have the additional question of message formatting to consider. Should we use plain-old plain text, HTML, or rich text? These are important questions not only for individuals but also for organizations of all kinds that rely on e-mail to communicate with customers, members or affiliates.

How do you best control how the e-mail you send looks when viewed by the recipient? In this column I’ll discuss the key issues related to e-mail — character encoding, e-mail formats and message encoding types — and offer some tips on composing Japanese e-mail messages.

First, let’s take a look at character sets and encoding. Initially, e-mail servers and clients were designed to send messages using an encoding system called ASCII. This system represents each character using seven bits of data, which gives 128 possible discrete values (called code points), numbered 0 through 127. This was adequate for representing all of the characters used in English, as well as numbers, punctuation marks, etc.
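As a quick check on the arithmetic, here is a minimal Python sketch. The helper name is_seven_bit and the sample strings are illustrative assumptions, not something from any e-mail program.

```python
# Seven bits give 2**7 distinct values, numbered 0 through 127.
SEVEN_BIT_CODE_POINTS = 2 ** 7

def is_seven_bit(text: str) -> bool:
    """Return True if every character has a code point below 128."""
    return all(ord(ch) < SEVEN_BIT_CODE_POINTS for ch in text)

english = "Hello, world! 123"
japanese = "\u6587\u5b57\u5316\u3051"  # the word "mojibake" written in Japanese

print(SEVEN_BIT_CODE_POINTS)   # number of code points seven bits can hold
print(is_seven_bit(english))   # English letters, digits and punctuation fit
print(is_seven_bit(japanese))  # Japanese characters do not
```

English text passes the check, while even a single kanji or kana falls outside the 7-bit range.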

Japanese text is encoded (represented electronically) using a variety of methods, each one designed to represent the far greater number of code points required by the Japanese language. Generally speaking, these encoding methods can be divided into two groups: seven-bit and double-byte (a double byte is 16 bits, since 1 byte equals 8 bits). Double-byte encoding systems are used on all Japanese-capable computing platforms, with different variants employed on different operating systems.

Microsoft and Mac operating systems use an encoding system called Shift-JIS. Shift-JIS is actually a multibyte character set (MBCS) because it uses 7 or 8 bits to represent ASCII and Extended ASCII characters (half-width katakana, Roman characters with diacritical marks, etc.) and two bytes to represent kanji, kana, and full-width Roman characters. UNIX systems use a similar system called EUC (Extended UNIX Code) to accomplish the same thing.
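To see the difference in practice, the sketch below encodes a few sample strings with Python's built-in shift_jis and euc_jp codecs, which I am assuming behave like the platform encodings described above. The sample strings and the helper has_high_bit are illustrative choices.

```python
kanji = "\u65e5\u672c\u8a9e"   # three kanji meaning "Japanese language"
ascii_text = "abc"             # plain Roman letters
halfwidth_kana = "\uff71"      # half-width katakana "a"

def has_high_bit(data: bytes) -> bool:
    """Return True if any byte has its eighth (high) bit set."""
    return any(b & 0x80 for b in data)

samples = [("kanji", kanji), ("ascii", ascii_text), ("half-width kana", halfwidth_kana)]
for label, sample in samples:
    sjis = sample.encode("shift_jis")
    euc = sample.encode("euc_jp")
    # Show the byte count and whether 8-bit bytes appear in each encoding.
    print(label, len(sjis), has_high_bit(sjis), len(euc), has_high_bit(euc))
```

Each kanji takes two bytes in both encodings, the Roman letters take one byte each, and the half-width katakana fits in a single Shift-JIS byte. Everything except the plain Roman letters relies on the eighth bit.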

These encoding systems all work fine on their respective platforms. It’s when you go “off platform” that things begin to go awry, and this is particularly true of e-mail.

Many e-mail servers and other network hardware were designed to think of text data in terms of 7 bits, and when they try to process multi- or double-byte text, bad things can happen.

The most obvious and common problem is mojibake, and in many cases the text is unrecoverable because a portion of it (the eighth, or “high” bit) has been stripped off in transit.
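The following Python sketch simulates that failure. The function strip_high_bit is a hypothetical stand-in for a 7-bit mail relay, not a model of any particular server.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the eighth bit of every byte, as a 7-bit-only relay might."""
    return bytes(b & 0x7F for b in data)

original = "\u65e5\u672c\u8a9e\u306e\u30e1\u30fc\u30eb"  # "Japanese e-mail"
sent = original.encode("shift_jis")
received = strip_high_bit(sent)

# Every surviving byte is below 0x80, so it now reads as ASCII junk.
garbled = received.decode("ascii")
print(repr(garbled))

# Two different bytes collapse to the same value, so the damage
# cannot be undone by the recipient.
print(strip_high_bit(b"\x93") == strip_high_bit(b"\x13"))
```

Because bytes that differed only in the high bit become identical, the recipient has no way of knowing which ones were originally set.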

JIS, a seven-bit encoding system, was developed to allow for the safe transmission of Japanese text over the Internet. JIS is the preferred encoding for Japanese e-mail, and using it is one of the best ways to prevent mojibake. If you use an e-mail application such as Microsoft Outlook Express, you’ll note that you are able to select from various Japanese character encodings when composing Japanese e-mail (just click on the “Format” tab and go to “Encoding”). Although Shift-JIS and EUC appear among these options, JIS is the safest choice.
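Here is a minimal Python sketch of why JIS is safe, assuming Python's built-in iso2022_jp codec matches the JIS encoding offered by e-mail clients. The helper strip_high_bit is again a hypothetical stand-in for a 7-bit relay.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the eighth bit of every byte, as a 7-bit-only relay might."""
    return bytes(b & 0x7F for b in data)

text = "\u65e5\u672c\u8a9e mail"  # three kanji followed by ASCII text
jis = text.encode("iso2022_jp")

# JIS switches between character sets with escape sequences
# (ESC $ B for kanji, ESC ( B for ASCII), so no byte needs the high bit.
print(jis)
print(all(b < 0x80 for b in jis))

# A relay that strips the high bit leaves the message untouched.
survived = strip_high_bit(jis)
print(survived.decode("iso2022_jp") == text)
```

Since there is no high bit to lose, the same relay that ruins Shift-JIS or EUC text passes JIS text through unchanged.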

You’ll also see JIS encoding referred to as ISO-2022-JP, and this is the designation used in the header portion of the e-mail that tells your e-mail software how to display the message properly.

Multipurpose Internet Mail Extensions (MIME) and UUENCODE are other systems that were developed to allow for the safe transmission of 8-bit data by e-mail, but their encodings are unnecessary when sending Japanese e-mail, since ISO-2022-JP is already safe for 7-bit transmission. If your e-mail client lets you choose between MIME and UUENCODE e-mail formats, choose MIME with the encoding set to none (rather than Base64 or Quoted-Printable) for best results. Again, the Japanese encoding should always be set to JIS.
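For readers who send mail from a script rather than a desktop client, the settings above can be sketched with Python's standard email package. This is only an illustration under that assumption; the subject line and message body are made up.

```python
from email.mime.text import MIMEText

body = "\u3053\u3093\u306b\u3061\u306f\u3002\u30c6\u30b9\u30c8\u3067\u3059\u3002"  # "Hello. This is a test."

# Ask for the JIS encoding by its header name, ISO-2022-JP.
msg = MIMEText(body, "plain", "iso-2022-jp")
msg["Subject"] = "Test message"

# The header tells the recipient's software how to display the text.
print(msg["Content-Type"])
# With this charset the package applies no Base64 or Quoted-Printable step.
print(msg["Content-Transfer-Encoding"])

# The whole message is 7-bit clean and decodes back to the original text.
raw = msg.as_bytes()
decoded = msg.get_payload(decode=True).decode(msg.get_content_charset())
print(all(b < 0x80 for b in raw), decoded == body)
```

The message carries the ISO-2022-JP designation in its header and is labeled as 7-bit, which corresponds to the "MIME with no encoding" setting recommended above.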

Now we come to the question of text formatting. Do we use HTML, plain text, or something else, such as rich text? The prevailing logic up until a year or two ago was that plain text was the way to go if you weren’t sure what e-mail client your recipients were using. This was simply because many e-mail clients didn’t support HTML. However, there has been a trend in recent months of large companies moving away from plain text toward HTML in their marketing and other e-mail correspondence.

What’s behind the policy shift?

Well, there’s recent data, for one. In an extensive survey conducted earlier this year by ClickZ (www.clickz.com) and Internet.com (www.internet.com), Edward Grossman revealed some interesting facts about users’ e-mail habits and capabilities.

Surprisingly, fewer than 3 percent of respondents indicated that they were unable to read HTML-based e-mail (though another 8 percent said they weren’t sure). Also, given the choice, only 30 percent responded that they prefer to receive e-mail as plain text.

Most folks use Microsoft e-mail clients (Outlook or Outlook Express) for business and personal e-mail, with the majority of others using Web-based e-mail clients such as Hotmail and Yahoo! Mail, which support HTML. Even AOL supports HTML now (as of version 7.0), meaning you really have to look hard to find people for whom HTML is problematic.

The bottom line? With the vast majority of people today using HTML-ready e-mail clients, there is little reason to cling to the notion that plain text is the safer of the two. Also, HTML just looks better. So send your marketing information and newsletters in HTML, but give people the option for plain text as well.

Finally, getting back to Japanese, you may find that there are still times when people reply to you saying your e-mail is unreadable. The only times I experience such problems these days are when sending Japanese e-mail using HTML. In such cases I simply re-send the message using plain text, and it seems to work fine.

The Japan Times: Feb. 27, 2003

Feb 14 2003

Getting Real in 2ch

Published by michael under Articles

It was 1979 when Steve Bellovin, then a graduate student at the University of North Carolina, developed a handful of short programs to facilitate communication via UUCP (Unix-to-Unix Copy) between his university and Duke University. The scripts were subsequently rewritten in the C programming language and extended, becoming the basis for Usenet.

Hiroyuki Nishimura, operator of the discussion Web site 2 Channel, says the site gets some 20 million hits a day.

Usenet is a distributed conferencing system that provides for group discussions over the Internet, and includes thousands of “newsgroups” that cover thousands of topics. Usenet is alive and well today, and continues to enjoy wide popularity with millions of users familiar with configuring and using “news reader” software.

However, it was the ease and user-friendliness of Internet browsers and the World Wide Web that saw discussion groups and message boards really blossom into a cyberculture phenomenon. These systems use the same conceptual components as Usenet newsgroups (discussion topics, messages and threads) but are designed for use with a standard Web browser rather than a news reader. Popular English-language ones include Yahoo! Groups ( groups.yahoo.com ) and MSN Groups ( groups.msn.com ). English-language forums specific to Japan can be found at JapanToday ( forum.japantoday.com ).

But the hands-down king of message boards (or “keijiban”) in Japan is a site called 2 Channel (pronounced “ni channeru”) at www.2ch.net .

2 Channel, or simply “2ch,” is designed and operated by 26-year-old Tokyo native Hiroyuki Nishimura, who describes himself as “forever 19, a lover of sweets, and a ‘hikikomori’ (someone who has withdrawn from society).” He started it in May 1999, he said, because “it seemed like fun.”

The sheer scale of 2ch is impressive for an independently run site. Nishimura told me he gets around 1 million posts a day and more than 20 million hits a day.

At the top level, the message boards are divided into general categories such as Society and Current Events, Academics and Education, Living and Work, Culture and Hobbies, Computing, and Idle Chatter.

Below these are more than 200 subtopics covering everything from art to zoology. The list of message boards is displayed in a navigation frame to the left of the main content window, and you have to scroll for a long, long time to see the entire list.

Some examples of the topics you’ll find there, presented in a simple blue-on-white evocative of pre-1995 Web design, are: Media, SDF, Work (divided into 16 industries), Drugs and Crime, The Occult, Creative Arts, Movies, Biology, Mathematics, Psychology, Japanese History, Hangul, Philosophy, Home Appliances, Digital Cameras, Politics, Ramen, Candy, Credit, Bars, Furniture, Cosmetics, Tobacco, Convenience Stores, Student Life, Jokes, Single Men/Women, Sumo, Sportscars, Baseball, Extreme Sports, Travel, Television Shows, Gambling, Manga, Costume Play, Lost Love, Baldness and Wigs, and the list goes on and on.

The medium of discourse in 2ch message boards is Japanese, with the exception of boards that deal with English or other languages. Although this is an obvious barrier to those who read no Japanese, online dictionaries like Eijiro ( www.alc.co.jp ) can be a big help for people who can read some but need help with the occasional word.

Not surprisingly, there are also a number of adult message boards that cover a wide variety of topics including the sex trade, sex games, SMBD, and homosexuality. Users under 21 are supposedly restricted from accessing these areas, but apparently there is no enforcement mechanism in place to bar minors from participating.

Which brings us to an interesting feature of 2ch, namely that it allows anonymous viewing of and posting to the message boards. This anonymity is key to the site because it allows users to speak their minds as frankly as they wish without fear of recrimination.

As a consequence, the discussions on 2ch tend to become quite spirited, particularly when the topic is a hot-button issue such as the abduction of Japanese citizens by North Korea or the threat of a United States-led war against Iraq. And people do get engaged — Nishimura says the average visitor stays for 1.3 hours and views 59 pages.

The anonymity permits heated debates on all kinds of topics to rage daily on 2ch, with a breadth of opinion and unabashed frankness you almost never encounter in conventional media like television or newspapers.

Predictably, however, the anonymity can also easily lead to so-called flame wars, where the discussion degenerates into a volley of insults and vitriolic attacks. Gone in such cases are the ornamental pleasantries and attention to rank that characterize communication among Japanese under typical circumstances, replaced by “anta” or “omae” (harsher ways of saying “you”) and indelicate rebuffs, like “Die, quickly.”

Thankfully, such flaming seems to happen less frequently on 2ch than, for example, on unmoderated Internet newsgroups.

New threads appear on 2ch daily, and particularly active boards have hundreds of active threads at any given time. Each new day brings with it more news and events, meaning 2 Channel’s thousands of users never run out of things to discuss and debate.

So, if you want to get the “real” Japanese perspective on just about any topic you can imagine, have a look at 2 Channel. No matter what your particular interests may be, you’re sure to find people discussing them there.

The Japan Times: Feb. 13, 2003
