
Unicode u:  _ _ _ Unicode

J datatypes:
  char (1-byte char): an 8-bit value from 0 to 255
  literal2 (2-byte char): a 16-bit value from 0 to 65535
  literal4 (4-byte char): an unsigned 32-bit value
Encodings:
  ASCII: 0 to 127, a subset of U8
  U8: Unicode code points in a multibyte (UTF-8) encoding of 1-byte chars
  U16: Unicode code points in a multi-literal2 (UTF-16) encoding
  U32: each Unicode code point is represented by exactly one literal4 char

All J primitives and most u: dyads work with values, not encodings. The only exception is ": , which converts literal2 and literal4 to U8-encoded 1-byte chars. The ASCII, U8, U16 and U32 encodings are used in 7&u: , 8&u: and 9&u: .
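
As a small illustration of the difference between values and encodings, the sketch below builds two literal2 chars and compares their count with the length of their U8 encoding. The name t and the code points 233 and 8364 are arbitrary choices for illustration; the outputs shown are what the definitions above imply.

```j
   t =: 4 u: 233 8364   NB. two literal2 chars with values 233 and 8364
   # t                  NB. primitives count values, not encoded bytes
2
   3 u: t               NB. the values themselves
233 8364
   # 8 u: t             NB. the U8 encoding needs 2 + 3 one-byte chars
5
   a. i. 8 u: t         NB. byte values of that U8 encoding
195 169 226 130 172
```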

The monad u: applies to several kinds of arguments:

Argument         Result
char, literal4   same as 2&u:
literal2         copy of argument
integers         same as 4&u:

The inverse of the monad u: is 3&u:
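
A minimal sketch of the monad and its inverse, assuming the behavior tabulated above (the name t is illustrative only):

```j
   t =: u: 97 98 99     NB. integers: same as 4&u: , giving literal2
   3!:0 t               NB. 131072 is the literal2 datatype code
131072
   3 u: t               NB. 3&u: recovers the integers
97 98 99
   3!:0 u: 'abc'        NB. char: same as 2&u: , also giving literal2
131072
```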
 
The dyad u: takes a scalar integer left argument and applies to several kinds of right arguments:

Left  Result       Right
1     char         char: as is
                   literal2: high-order 8 bits discarded
                   literal4: high 3 bytes discarded
2     literal2     char: high-order 8 bits are 0
                   literal2: as is
                   literal4: high 2 bytes discarded
3     integers     char, literal2 or literal4
4     literal2     integers in the range -65536 to 65535
5     char         literal2 or literal4 in the range 0 to 255
6     literal2     pairs of chars are converted to literal2s
7     char or U16  U8: converted to U16
                   ASCII: as is
                   literal2: if all values <128, convert to ASCII, otherwise as is
                   U32: if all values <128, convert to ASCII, otherwise converted to U16
                   integers: the range 0 to 16b10ffff converted to U16
                   an empty right argument produces an empty char result
8     U8           U16: converted to U8
                   U32: converted to U8
                   char: as is
                   integers: the range 0 to 16b10ffff converted to U8
                   an empty right argument produces an empty char result
9     char or U32  U8: converted to U32
                   ASCII: as is
                   U16: if all values <128, convert to ASCII, otherwise converted to U32
                   literal4: as is, and any valid surrogate pairs are converted
                   integers: converted to literal4
                   an empty right argument produces an empty char result
10    literal4     char: promoted to literal4
                   literal2: promoted to literal4
                   literal4: as is
                   integers: converted to literal4
                   each char or literal2 is promoted to literal4 character by character, no U8 or U16 encoding assumed
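
The encoding-aware left arguments can be compared on one code point outside the 16-bit range. This is a sketch only, assuming a J release that supports left arguments 7 through 10 as tabulated above; the code point 128512 (16b1f600) is an arbitrary choice, and the outputs are those the table implies.

```j
   3 u: 7 u: 128512         NB. U16: a surrogate pair of two literal2 chars
55357 56832
   a. i. 8 u: 128512        NB. U8: four 1-byte chars
240 159 152 128
   3 u: 9 u: 128512         NB. U32: exactly one literal4 char
128512
   3!:0 (7 u: 'abc')        NB. ASCII is returned as is; 2 is the char datatype code
2
   3 u: 9 u: 8 u: 233       NB. 9 u: decodes the U8 bytes to one code point
233
   3 u: 10 u: 8 u: 233      NB. 10 u: promotes byte by byte, with no decoding
195 169
```

The last two lines show why 9 u: and 10 u: differ on a char argument: only 9 u: treats the bytes as an encoding.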

1&u: and 2&u: , 3&u: and 4&u: , and 7&u: and 8&u: are inverse pairs.
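
The inverse pairs can be checked with round trips such as the following sketch, where s and t are illustrative names and -: tests whether two arrays match:

```j
   s =: 8 u: 233 8364           NB. a U8-encoded list of 1-byte chars
   (8 u: 7 u: s) -: s           NB. U8 to U16 and back
1
   t =: u: 'abc'                NB. a literal2 list
   (4 u: 3 u: t) -: t           NB. literal2 to integers and back
1
   (1 u: 2 u: 'abc') -: 'abc'   NB. char to literal2 and back
1
```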
 

The display of an array x of 2-byte or 4-byte characters is that of 8 u:"1 x , that is, each row is converted to 1-byte characters in UTF-8 encoding.
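
One consequence, sketched here under the assumption that 8 u:"1 assembles its row results with the usual fill, is that the converted array can be wider than x when rows contain non-ASCII characters. The name x and its contents are illustrative only.

```j
   x =: u: 2 2 $ 233 8364 97 98   NB. a 2 by 2 literal2 table
   $ x
2 2
   $ 8 u:"1 x                     NB. first row needs 5 bytes; second row is padded to match
2 5
```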

Examples:
   ] t=: u: 'We the people' 
We the people
   3!:0 t
131072                         NB. the literal2 datatype numeric code is 131072

   ] t=: 10 u: 'We the people' 
We the people
   3!:0 t
262144                         NB. the literal4 datatype numeric code is 262144

   u: 97 98 99 +/ 0 256 512 1024
ašɡѡ                           NB. 2-byte characters have the same
bŢɢѢ                           NB. display as U8 characters
cţɣѣ

   'a' = u: 97 + 0 256 512 1024
1 0 0 0

   ] t=: (2 4$'abcdefgh') , u: 'wxyz'
abcd                           NB. 1- and 2-byte characters can be catenated together.
efgh                           NB. The 1-byte characters are promoted.
wxyz
   3!:0 t
131072

   ] t=: t , 10 u: 'ABCD'
abcd                           NB. The 2-byte characters are promoted to
efgh                           NB. 4-byte characters.
wxyz
ABCD
   3!:0 t
262144


