Getting Started with Unicode in J
This is a very quick page of notes, mainly as a reference for answering questions. Maybe it will help someone with the general task of looking up, choosing and using an arbitrary Unicode glyph, which I don't think is easy for a complete beginner to do with the present documentation.
Turn your IQ philco down to approx. 60 and let's go...
Page under reconstruction.
It needs to avoid duplicating material in: Guides/Unicode.
For example, wide as defined here is functionally equivalent to uucp defined in stdlib.ijs, which is the recommended word to use. Other recommended words, always available in locale 'z', are: ucp (Unicode code point, inverse of: utf8) and ucpcount (count code points).
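The equivalence claimed above can be checked directly. This is only a sketch: it assumes the stdlib words uucp, ucp, utf8 and ucpcount are loaded (as they normally are in a standard session), and it copies the definition of wide from the script further down this page.

```j
wide=: 2 u: 7 u: ]          NB. same definition as in the script on this page
s=: 'abc⌹e'                 NB. pasted text: 7 bytes of utf-8
(wide s) -: uucp s          NB. 1 if wide and uucp agree
ucpcount s                  NB. 5 code points
s -: utf8 ucp s             NB. 1 if utf8 undoes ucp
```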
Here's an arbitrary unicode character: ⌹
Yes, it's the APL Domino. But it could be anything from the Unicode collection.
Strictly that should be: anything from Plane 0 of Unicode (the Basic Multilingual Plane), whose code points fit in 16 bits. Unicode has since expanded beyond 16-bit code points into Plane 1, starting with the Linear B Syllabary (range: 10000–1007F).
If you've got the font: APL385 Unicode installed, you'll see it, otherwise you may not. Refer to the last section of the APL to J Phrasebook and troubleshoot from there.
For even greater detail, see: Typesetting/APL Fonts
Copy-paste Domino into the J session (j602) and embed it in the expression:
   z=: 'abc⌹e'
   $z   NB. ...not 5, but 7!!
7
   z i. 'ce'
2 6
   z i. '⌹'
3 4 5
This shows you that Domino is encoded in the J session in utf-8. This is a standard for embedding a general unicode character as several bytes in a string of 1-byte chars.
Inside char vector z, Domino takes up not 1 but 3 consecutive positions: 3 4 5, so the chars either side of it occupy positions 2 and 6.
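To see the three bytes themselves, you can index the string into the alphabet a. (a sketch, assuming the literal was pasted into a session that stores it as utf-8, as described above):

```j
z=: 'abc⌹e'
a. i. z                     NB. 97 98 99 226 140 185 101
a. i. '⌹'                   NB. 226 140 185, i.e. hex E2 8C B9
```

The values 226 140 185 are the utf-8 encoding of the single code point U+2339.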
If you need to tabulate or index unicode characters in an orderly array, you can convert the whole string z to "wide-chars" (wchars):
   ]w=: 7 u: z
abc⌹e
   $w
5
   w i. 'ce'
2 4
   ]Domino=: 3{w
⌹
   $$Domino
0
   w i. Domino
3
You now have an orderly vector w of 5 wchars, which behave themselves under $ and i. .
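Going the other way is just as easy. The sketch below uses 8 u: to turn the wchars back into utf-8 bytes and checks that nothing was lost in the round trip; datatype is the stdlib word, assumed to be loaded.

```j
z=: 'abc⌹e'
w=: 7 u: z                  NB. 5 wchars
b=: 8 u: w                  NB. back to utf-8 bytes
$b                          NB. 7
b -: z                      NB. 1: the round trip is lossless
datatype z                  NB. literal
datatype w                  NB. unicode
```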
Choosing a sister character of Domino
Now consider this task: you've pasted a given unicode char from some given unicode-compliant software or document. You don't know that it's APL but you like the set and you want to use another character from the same set. You've heard of another character, called Quote Quad, looking something like this: ⍞.
You can look up a character code (called a [unicode] code point) on the unicode.org website and download tables of characters in PDF form.
A good place to start if you haven't the foggiest idea where to find your character is here: http://www.unicode.org/standard/where/
The main code charts are here: http://www.unicode.org/charts/index.html
The page is titled: Unicode 6.0 Character Code Charts.
If you know it's an APL char you want, then simply search the page for "APL". (You'll find it here: http://www.unicode.org/charts/PDF/U2300.pdf)
Perhaps though you don't know it is an APL char. Then you must find its code point and look it up in the most generalized way.
Finding the code point of a pasted character
   3 u: 7 u: '⌹'
9017
   require'convert'
   hfd 9017   NB. hex from dec: 9017 to look up in unicode.org
2339
   NB. Let's just confirm that hex numeral is correct...
   u: 16b2339
⌹
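If you do this often, the two steps can be rolled into one verb. What follows is a minimal sketch: hexcp is a hypothetical name (not a stdlib word), and it builds its own hex digits rather than relying on hfd. It only handles code points that fit in 4 hex digits, i.e. Plane 0.

```j
NB. hexcp: hypothetical helper, 4-digit hex code point of each character in y
hexcp=: 3 : 0
  cp=. 3 u: 7 u: y                            NB. decimal code points, one per character
  '0123456789ABCDEF' {~ 16 16 16 16 #: cp     NB. one row of 4 hex digits per character
)
hexcp '⌹'                   NB. 2339
hexcp '⌹⍞'                  NB. two rows: 2339 and 235E
```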
Looking up a character by its code point
At the top of the Code Charts page, http://www.unicode.org/charts/index.html there's a search box: Look up by character code:
You need to type a hex numeral in that box (...the code point). Viz the one you've just computed: 2339.
This reports to you:
Search Results for U+2339
The most current code chart containing U+2339 is:
http://www.unicode.org/charts/PDF/U2300.pdf (0.3 MB)
...and the link allows you to download the relevant table (U2300.pdf).
From this document you can look up Quote Quad, say, and find that its code point is 235e.
   u: 16b235e
⍞
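Since u: accepts a list of integers, you can also build several characters at once, or browse the neighbours of a known code point without opening the PDF. A small sketch using only the two code points found above:

```j
u: 16b2339 16b235e          NB. ⌹⍞  (Domino and Quote Quad together)
u: 16b2339 + i. 4           NB. four consecutive characters starting at Domino
3 u: u: 16b235e             NB. 9054, the decimal code point of Quote Quad
```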
A script to play with
NB. unicode snooper, by Ian Clark, Nov 2010.
BASE=: 4$$ HX=: '0123456789ABCDEF'
h4=: HX {~ BASE #: ] NB. 4-digit hex numeral
wide=: 2 u: 7 u: ] NB. force chars: y into wchars
copt=: 3 u: 7 u: ] NB. code-point (decimal)
PY=: copt '?' NB. init: latest code-point
cu=: monad define NB. see [unicode] char: y
if. 0=$,y do. y=. u: PY+1 end. NB. -->next code-pt
HY=: ,h4 PY=: copt Y=: y
smoutput Y,' U+',HY,' ',": PY
)
nx=: monad define NB. see next [y] char[s]
z=.|{.y,1
cu^:z ''
)
px=: monad define NB. see previous [y] char[s]
PY=: PY-z=.|{.y,1
cu^:z ''
empty PY=: PY-z
)
0 : 0 NB. try these using Ctrl+R ...
cu '⌹'
cu wide '⌹'
cu '' NB. next char
nx '' NB. next char
px '' NB. prev char
nx 16 NB. next 16 chars
px 16 NB. prev 16 chars
)
The above script displays the code-point of a given unicode character (copy/pasted from some string, eg in the J wiki) in both hex and decimal, the hex form being suitable for looking-up at unicode.org.
Enter:
cu '⌹' to see the code-point for Domino: ⌹
nx'' to see the next code-point
nx 16 to see the next 16 code-points
and so on.
-- IanClark 2010-11-27 19:39:49
