u: Unicode

Unicode	u: _ _ _	Unicode

J datatypes:	char	1-byte char: an 8-bit value from 0 to 255
		wchar	2-byte char (wide char): a 16-bit value from 0 to 65535
Encodings:	ASCII	values 0 to 127, a subset of U8
		U8	UTF-8: a Unicode code point value in multibyte encoding

Most u: dyads work with character values, not encodings. The ASCII and U8 encodings are used by 7&u: and 8&u: .
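
The difference between values and encodings can be sketched with a short session. This is an illustration only: the name e is introduced here and is not part of the Dictionary, and the outputs assume a J system that behaves as this page describes.

```j
   e =: 195 169 { a.         NB. two chars: the U8 encoding of code point 233
   3 u: 2 u: e               NB. value-based: each byte becomes one wchar
195 169
   3 u: 7 u: e               NB. encoding-based: the U8 pair decodes to one wchar
233
   3 u: 8 u: 4 u: 233        NB. 8&u: encodes the wchar back to U8 bytes
195 169
```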

The monad u: applies to several kinds of arguments:

Argument	Result
char	same as 2&u:
wchar	copy of argument
integers	same as 4&u:

The inverse of the monad u: is 3&u: .
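
As a rough sketch of the monad's cases, the session below checks the datatype with 3!:0 (2 is char, 131072 is wchar) and then undoes the conversion with 3&u: . The outputs assume a J system that behaves as this page describes.

```j
   3!:0 'abc'                NB. char
2
   3!:0 u: 'abc'             NB. char argument: same as 2&u: , result is wchar
131072
   u: 97 98 99               NB. integer argument: same as 4&u:
abc
   3 u: u: 97 98 99          NB. 3&u: recovers the integers
97 98 99
```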

The dyad u: takes a scalar integer left argument and applies to several kinds of right arguments.

Left	Result	Right
1	char	char: as is
		wchar: high-order 8 bits discarded
2	wchar	char: high-order 8 bits are 0
		wchar: as is
3	integers	char or wchar
4	wchar	integers in the range -65536 to 65535
5	char	wchar in the range 0 to 255
6	wchar	pairs of chars are converted to wchars
7	char or wchar	U8: converted to wchar
		ASCII: as is
		wchar: if all values <128, convert to ASCII, otherwise as is
		an empty right argument produces an empty char result
8	char	wchar: converted to U8
		char: as is
		an empty right argument produces an empty char result
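
A few rows of the table can be tried out as follows. This is a sketch, not a full test of every case: the name w and the sample values 97 353 8364 are chosen here for illustration, and the outputs assume a J system that behaves as the table states.

```j
   w =: 4 u: 97 353 8364     NB. wchar built from integer values
   3!:0 w
131072
   3 u: w                    NB. back to integers
97 353 8364
   3 u: 1 u: w               NB. high-order 8 bits discarded: 353 -> 97, 8364 -> 172
97 97 172
   3 u: 8 u: w               NB. U8 bytes: 1, 2 and 3 bytes for the three characters
97 197 161 226 130 172
   3!:0 (7 u: 'abc')         NB. ASCII is left as char
2
```

Because 5&u: accepts only wchar values in the range 0 to 255, it would be expected to reject w rather than truncate it the way 1&u: does.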

1&u: and 2&u: , 3&u: and 4&u: , and 7&u: and 8&u: are inverse pairs.
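
The inverse pairs can be checked with round trips. The sketch below uses an illustrative name w (not part of the Dictionary) and assumes a J system that behaves as this page describes.

```j
   w =: 4 u: 97 353 8364     NB. sample wchar data
   w -: 4 u: 3 u: w          NB. 3&u: then 4&u:
1
   w -: 7 u: 8 u: w          NB. 8&u: then 7&u:
1
   'abc' -: 1 u: 2 u: 'abc'  NB. 2&u: then 1&u:
1
   3!:0 (1 u: 2 u: 'abc')    NB. and the result is char again
2
   w -: 2 u: 1 u: w          NB. not reversible this way: 1&u: lost the high-order bits
0
```

The last line shows that the 1&u: and 2&u: pairing is exact only for values in the range 0 to 255.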

Examples:

   ] t=: u: 'We the people' 
We the people
   3!:0 t
131072                         NB. the unicode datatype numeric code is 131072

   u: 97 98 99 +/ 0 256 512 1024
aaaa                           NB. 2-byte characters have the same
bbbb                           NB. display as 1-byte characters
cccc 

   'a' = u: 97 + 0 256 512 1024
1 0 0 0

   ] t=: (2 4$'abcdefgh') , u: 'wxyz'
abcd                           NB. 1- and 2-byte characters can be catenated together.
efgh                           NB. The 1-byte characters are promoted.
wxyz
   3!:0 t
131072
