J6 has essentially complete support for unicode in code development and applications. The only minor limitation is that identifiers used in programming must be in 7-bit ascii, but this does not affect the use of unicode in applications. For example:
   a=. '沒有問題'   NB. assign unicode text to a
   沒有=. 1 2 3     NB. identifiers must be 7-bit ascii
|spelling error
Literal text is assumed to be in utf8 format. J also has a 2-byte unicode datatype, and the verb u: converts back and forth. Both representations can be useful when programming, so take care to ensure the right datatype is being used.
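As a minimal sketch of the conversions that u: performs, the session below uses only the primitive itself. The names s and w are illustrative and not part of any library, and the left arguments 3, 4, 7 and 8 are taken from the u: vocabulary entry, so check them against the version of J in use.

```j
   s=. '沒有問題'          NB. literal, held as utf8 bytes
   w=. 7 u: s             NB. utf8 -> 2-byte unicode
   (#s),#w                NB. 12 bytes versus 4 characters
12 4
   3 u: w                 NB. code points as integers
27794 26377 21839 38988
   s -: 8 u: w            NB. 8 u: converts back to utf8
1
   w -: 4 u: 27794 26377 21839 38988   NB. 4 u: builds unicode from code points
1
```

Counting or indexing characters is normally done on the 2-byte form, while the utf8 form is what gets passed to files and other interfaces.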
utf8 used in:
window driver interface
file name in 1!:x family
plot package interface
regular expression of pcre
*c argument of dll
2-byte unicode used in:
manipulation of character array
*w argument of dll
Standard utilities include:
utf8
convert to utf8
ucp
convert to unicode datatype (cp=code point), if necessary
uucp
convert char or utf8 to wchar
ucpcount
code point (glyph or character) count
datatype
noun data type
The name a defined above is in literal text, and therefore assumed to be utf8. More examples:
   a
沒有問題
   datatype a    NB. a is type literal
literal
   #a            NB. the count of a is the count of its utf8 representation
12
   a. i. a       NB. bytes in the utf8 representation
230 178 146 230 156 137 229 149 143 233 161 140
   b=. ucp a     NB. b is a converted to 2-byte unicode
   b             NB. b displays the same as a
沒有問題
   #b            NB. the count of b is the number of characters
4
   datatype b    NB. b is type unicode
unicode
   a -: utf8 b   NB. utf8 converts b back to a
1
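The difference between ucp and uucp only shows up with plain 7-bit ascii text. The sketch below assumes the standard library definitions described above, where ucp converts only if necessary and uucp always produces wchar. The name t is illustrative.

```j
   t=. 'abc'              NB. plain 7-bit ascii text
   datatype ucp t         NB. nothing to convert, so it stays literal
literal
   datatype uucp t        NB. always converted to 2-byte unicode
unicode
   ucpcount t             NB. for ascii, code points and bytes agree
3
   ucpcount '沒有問題'     NB. 4 code points, although the literal has 12 bytes
4
```

This is why uucp, rather than ucp, is the safer choice when a dll expects a *w argument: the result is 2-byte unicode even for names that happen to be pure ascii.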
Scripts
Script cp2utf8 converts plain text files in codepages to utf8.
Script ufread reads unicode text files in various formats.
Renaming Unicode Files
We will define a win32 API verb:
NB.*MoveFile v move file, e.g. from MoveFile to
MoveFile=: 'kernel32 MoveFileW > i *w *w' cd ;&uucp
For testing we will create a file with a name in one unicode range,
   load'files dir'
   'test' fwrite 'Test - 沒有問題'   NB. create a file
4
   fread 'Test - 沒有問題'
test
   0 0{:: 1!:0]'Test -*'   NB. dir find
Test - 沒有問題
rename it into another, and be able to read it by the new name.
   'Test - 沒有問題' MoveFile 'Test - Без проблем'   NB. rename
1
   fread 'Test - Без проблем'
test
   0 0{:: 1!:0]'Test -*'   NB. dir find
Test - Без проблем
Links
User Manual - main unicode docs
J6 Release Highlights - notes on J6 changes
Vocabulary entry for u: - definition of verb u:
Unicode Test Drive - Oleg's notes on unicode
UTF-8 and Unicode Standards - good background reading
