Here's what I've figured out about how J internally represents nouns.
Details of J's Internal Data Representation
Data in J is stored as a header - which gives information about the data type, shape and size - followed by the raw data. The header consists of a fixed-length portion giving the basic information, followed by a variable length portion holding shape information.
So far, this page covers only the simpler of the following types (up to boxed). The codes are those given in the documentation for "3!:0 y Type": "The internal type of the noun y, encoded as follows:"
1 | boolean
2 | literal
4 | integer
8 | floating point
16 | complex
32 | boxed
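As a quick check of the codes above, 3!:0 can be applied to small nouns in a J session. This is only a minimal illustration; the ] is there to separate the 0 of 3!:0 from the numeric list that follows it.

```j
   3!:0 ] 1 0 1         NB. boolean
1
   3!:0 'abc'           NB. literal
2
   3!:0 ] 2 3 4         NB. integer
4
   3!:0 ] 1.5 2.5       NB. floating point
8
   3!:0 ] 1j2           NB. complex
16
   3!:0 < 'abc'         NB. boxed
32
```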
The following remain to be explored:
64 | extended integer
128 | rational
1024 | sparse boolean
2048 | sparse literal
4096 | sparse integer
8192 | sparse floating point
16384 | sparse complex
32768 | sparse boxed
65536 | symbol
131072 | unicode
Fixed-length Portion of Header
Byte Positions | Description
0 - 7 | Type (see above)
8 - 11 | Length (#,): number of elements
12 - 15 | Rank (#$): length of shape
Variable-length Portion of Header
Byte Positions | Description
16 -> 15+4*Rank | Shape ($): one 4-byte word per axis; absent when Rank is 0
16+4*Rank -> end | Data
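The layout above can be read back mechanically. The following is a minimal sketch, not part of J itself: word and hdr are hypothetical helper names, and it assumes the 4-byte, least-significant-byte-first words seen in the dumps on this page. Other J versions, or 64-bit systems, may lay the header out differently.

```j
NB. Sketch only: 'word' and 'hdr' are hypothetical names.
NB. Assumes 4-byte little-endian words, as in the dumps on this page.
word=: 256 #. |.                 NB. 4 byte values, LSB first -> integer
hdr=: 3 : 0
 w=. _4 word\ y                  NB. y: list of byte values; split into words
 'type len rank'=. 0 2 3 { w     NB. words 0, 2 and 3 of the fixed header
 shape=. rank {. 4 }. w          NB. rank words of shape follow the header
 data=. (16 + 4*rank) }. y       NB. everything after the shape
 type ; len ; rank ; shape ; data
)
```

Applied to the byte values of the 2 3$'ABC' dump from the Character examples:

```j
   bytes=: 2 0 0 0 0 0 0 0 6 0 0 0 2 0 0 0 2 0 0 0 3 0 0 0 65 66 67 65 66 67 0 0
   > 3 { hdr bytes            NB. shape
2 3
   > 4 { hdr bytes            NB. data, including padding
65 66 67 65 66 67 0 0
```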
Examples
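In the examples that follow, we assume familiarity with a few basics of representation, e.g. 65 <-> a. i. 'A'. Each dump is produced by showNum (defined at the end of this page), which by default prints 8 byte values per row.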
Boolean
   showNum 1
1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
1 0 0 0
   showNum 0 1
1 0 0 0 0 0 0 0
2 0 0 0 1 0 0 0
2 0 0 0 0 1 0 0
   showNum 9$1 0
1 0 0 0 0 0 0 0
9 0 0 0 1 0 0 0
9 0 0 0 1 0 1 0
1 0 1 0 1 0 0 0   NB. Notice fullword padding
   showNum 8$1 0
1 0 0 0 0 0 0 0
8 0 0 0 1 0 0 0
8 0 0 0 1 0 1 0
1 0 1 0 0 0 0 0   NB. but
   showNum 7$1 0
1 0 0 0 0 0 0 0
7 0 0 0 1 0 0 0   NB. Padding seems to be slightly
7 0 0 0 1 0 1 0   NB. excessive in 8-bit case above:
1 0 1 0           NB. is this a performance enhancement?
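On the padding question raised in the comments: one reading that fits every one-byte-per-element dump on this page (boolean and character alike) is that a zero byte always follows the data before the total is rounded up to a full 4-byte word. This is an assumption inferred from the examples, not something the source confirms, and padded is a hypothetical name.

```j
NB. Hypothetical: data bytes occupied by n one-byte elements, assuming
NB. one trailing zero byte and rounding up to a 4-byte word.
padded=: 4 * [: >. 4 %~ >:
   padded 1 2 7 8 9
4 4 8 12 12
```

Under that assumption the 8-element case needs 12 data bytes rather than 8, which matches the dump above.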
Character
   showNum 'A'
2 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
65 0 0 0
   showNum 'AB'
2 0 0 0 0 0 0 0
2 0 0 0 1 0 0 0
2 0 0 0 65 66 0 0
   showNum 2 3$'ABC'
2 0 0 0 0 0 0 0
6 0 0 0 2 0 0 0
2 0 0 0 3 0 0 0
65 66 67 65 66 67 0 0
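The shape and data fields are enough to rebuild the array: the length field (6) is the product of the shape (2 3), and the data bytes index into a. in row-major order. A small illustration using the values from the last dump:

```j
   */ 2 3
6
   2 3 $ 65 66 67 65 66 67 { a.
ABC
ABC
```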
Integer: Least-significant Byte First
   showNum i.0
4 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0
0 0 0 0
   showNum i.1
4 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0
1 0 0 0 0 0 0 0
   showNum i.2
4 0 0 0 0 0 0 0
2 0 0 0 1 0 0 0
2 0 0 0 0 0 0 0
1 0 0 0
   showNum i.3
4 0 0 0 0 0 0 0
3 0 0 0 1 0 0 0
3 0 0 0 0 0 0 0
1 0 0 0 2 0 0 0
   showNum -i. 3
4 0 0 0 0 0 0 0
3 0 0 0 1 0 0 0
3 0 0 0 0 0 0 0   NB. Negative numbers are two's complement.
255 255 255 255 254 255 255 255
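To make the byte order and the two's complement reading concrete, here is a minimal sketch that turns four byte values back into a signed integer. The name int32 is hypothetical, and the arithmetic assumes 4-byte words with the least-significant byte first, as in the dumps above.

```j
NB. Hypothetical helper: signed value of 4 bytes, least-significant first.
int32=: 3 : 0
 u=. 256 #. |. y                     NB. unsigned value of the 4 bytes
 u - 4294967296 * u >: 2147483648    NB. subtract 2^32 when the top bit is set
)
   int32 255 255 255 255
_1
   int32 254 255 255 255
_2
   int32 2 0 0 0
2
```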
Floating point (IEEE standard representation of numbers)
   showNum 1.1
8 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0
154 153 153 153 153 153 241 63
   showNum 1$1.1
8 0 0 0 0 0 0 0
1 0 0 0 1 0 0 0
1 0 0 0 154 153 153 153
153 153 241 63
   showNum 1.1 2.2
8 0 0 0 0 0 0 0
2 0 0 0 1 0 0 0
2 0 0 0 154 153 153 153
   showNum 1.1 2.2 3.3
8 0 0 0 0 0 0 0
3 0 0 0 1 0 0 0
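J's float conversion foreign 3!:5 can round-trip these 8 bytes, which gives one way to check the dumps. The sketch below assumes a little-endian machine, matching the byte order shown above; the second byte sequence is the one that appears for 2.2 in the boxed example on this page.

```j
   a. i. 2 (3!:5) 1.1                               NB. float -> 8 bytes
154 153 153 153 153 153 241 63
   _2 (3!:5) 154 153 153 153 153 153 241 63 { a.    NB. 8 bytes -> float
1.1
   _2 (3!:5) 154 153 153 153 153 153 1 64 { a.
2.2
```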
Boxed
   showNum <'AB'
32 0 0 0 0 0 0 0   NB. Integer "20" at bytes 16-19 is pointer:
1 0 0 0 0 0 0 0    NB. position (number of bytes from start)
20 0 0 0 2 0 0 0   NB. of start of 1st boxed array.
0 0 0 0 2 0 0 0    NB. Note how contents of this box matches
1 0 0 0 2 0 0 0    NB. 2nd character array above.
65 66 0 0
   showNum 'AB';0 1 2
32 0 0 0 0 0 0 0
2 0 0 0 1 0 0 0
2 0 0 0 28 0 0 0   NB. The 2 boxes begin at positions 28
52 0 0 0 2 0 0 0   NB. and 52.
0 0 0 0 2 0 0 0
1 0 0 0 2 0 0 0
65 66 0 0 4 0 0 0
0 0 0 0 3 0 0 0
1 0 0 0 3 0 0 0
0 0 0 0 1 0 0 0
2 0 0 0
   showNum 2 2$'AB';(i.3);1.1 2.2;<'abcde'
32 0 0 0 0 0 0 0
4 0 0 0 2 0 0 0
2 0 0 0 2 0 0 0
40 0 0 0 64 0 0 0
96 0 0 0 132 0 0 0
2 0 0 0 0 0 0 0
2 0 0 0 1 0 0 0
2 0 0 0 65 66 0 0
4 0 0 0 0 0 0 0
3 0 0 0 1 0 0 0
3 0 0 0 0 0 0 0
1 0 0 0 2 0 0 0
8 0 0 0 0 0 0 0
2 0 0 0 1 0 0 0
2 0 0 0 154 153 153 153
153 153 241 63 154 153 153 153
153 153 1 64 2 0 0 0
0 0 0 0 5 0 0 0
1 0 0 0 5 0 0 0
97 98 99 100 101 0 0 0
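The pointer words can be picked out of a boxed dump in the same way as the other header fields. This is a minimal sketch under the assumptions used throughout this page (4-byte, least-significant-byte-first words; one pointer per box, placed right after the shape); word and boxOffsets are hypothetical names. The byte list is just the first 28 bytes of the 'AB';0 1 2 dump, which is as far as the pointers extend.

```j
NB. Sketch only: list the byte offsets at which each box's contents start.
word=: 256 #. |.                 NB. 4 byte values, LSB first -> integer
boxOffsets=: 3 : 0
 w=. _4 word\ y                  NB. y: byte values of a boxed noun's dump
 'len rank'=. 2 3 { w            NB. number of boxes and rank
 len {. (4 + rank) }. w          NB. skip fixed header and shape, take pointers
)
   bytes=: 32 0 0 0 0 0 0 0 2 0 0 0 1 0 0 0 2 0 0 0 28 0 0 0 52 0 0 0
   boxOffsets bytes
28 52
```

The difference 52 - 28 = 24 is the size of the first box's contents, the same 24 bytes shown for 'AB' in the Character examples.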
Definition of showNum
The verb showNum is a basic tool for displaying the internal representation of J nouns.
showNum=: 3 : 0
NB.* showNum: show numeric values of y's internal representation,
NB. formatted x values per row (default 8).
   8 showNum y
:
   width=. x
   vec=. a. i. 3!:1 y               NB. Byte values of the binary representation.
   len=. >.width%~#vec              NB. Number of rows needed.
   leftover=. (len*width)-#vec      NB. Pad last line w/spaces.
   (len,width*4)$;_4{.&.>(":&.>vec),leftover$<' '   NB. 4 characters per value.
)