Why do some non-ASCII characters misbehave under (i.) and ($)?

If I paste a non-ASCII character (e.g. an accented vowel or an APL primitive) into a string, it behaves like several characters.

For example, $'abc⌹e' returns 7, not 5.

Further puzzling behaviour:

   $z=: 'abc⌹e'
7
   z i. 'ce'
2 6
   z i. '⌹'
3 4 5
   3 5 $z
abc�
�eabc
⌹ea
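The byte codes make the results above easier to follow. This sketch assumes, as the transcript implies, that the session stores typed text as UTF-8. Under that encoding '⌹' (U+2339) occupies three bytes, 226 140 185:

```j
   z =: 'abc⌹e'
   a. i. z          NB. byte value of each item of z
97 98 99 226 140 185 101
```

So $ counts 7 bytes, and i. finds '⌹' as the three bytes at indices 3 4 5. The reshape 3 5 $ z splits the encoding of '⌹' across rows, and the session displays each incomplete fragment as �.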

What must I do to make 'abc⌹e' behave like a string of 5 characters, with '⌹' behaving like a single character occupying position 3?

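The answer involves the u: primitive, Unicode and UTF-8. The literal 'abc⌹e' is stored as UTF-8 bytes, and '⌹' needs three of them, so $ and i. see 7 items. Converting the string to so-called "wide characters" (wchar) gives one item per character.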
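The following is a minimal session sketch of that conversion. It assumes the meanings that 7 u: (UTF-8 to wide characters) and 8 u: (wide characters back to UTF-8) have in J 6 and later; confirm them against the u: entry for your J version. Monadic u: is deliberately not used here because it maps each byte to a separate wide character instead of decoding UTF-8.

```j
   w =: 7 u: 'abc⌹e'    NB. decode the UTF-8 bytes into wide characters
   $ w
5
   w i. 7 u: 'ce⌹'      NB. the right argument must be converted too
2 4 3
   3 5 $ w
abc⌹e
abc⌹e
abc⌹e
   $ 8 u: w             NB. encode back to UTF-8 bytes for output or files
7
```

Both arguments of i. are converted. The reason is that a plain literal '⌹' still holds three bytes, and those bytes would not match the single wide character held in w.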

See Guides/UnicodeGettingStarted for an extremely simple explanation.

Guides/General FAQ/Puzzling unicode (last edited 2010-11-29 03:16:22 by IanClark)