21. Calling External Programs

Calling a DLL

Interfacing to an external program is one thing you can do the old-fashioned way: by getting a copy of a working program and editing it. You can find starter programs in the J distribution. A good one to start with is \system\packages\winapi\registry.ijs. You can glance through this program to see what it does. It starts with

require 'dll'

which defines the verb cd . Calls to the DLL appear in lines like

rc =. 'Advapi32 RegCreateKeyExA i i *c i *c i i i *i *i'

cd root;key;0;'';0;sam;0;(,_1);(,_1)

(this is a single line in the program; I have shown it here as two lines because it won't fit on the page as a single line)

This example exhibits the elements of a call to a DLL. The left operand of cd is a character list describing the function to be called and what arguments it is expecting. Here we are calling the entry point RegCreateKeyExA in the library Advapi32 (the library has a file extension of .dll under Windows, .so on Linux, and .dylib on the Mac). The sequence of i and *c tokens is the list of descriptors that describe the interface to the function. The first item in that sequence describes the type of value returned by the function; the other items describe the arguments to the function and are a one-for-one rendering of the argument list that would be passed in C. So, the line above is appropriate for calling a function defined with the prototype

int RegCreateKeyExA(int, char *, int, char *, int, int, int, int *, int *);

Naming the Procedure

The first two words passed in to cd are the library name and the procedure name. The library name is the name of the DLL file and follows the search path for your system. The procedure name is the name of a function found in that library.

There are two special library names: 0 and 1. Using these names allows you to call a procedure by address or by function number. When the library name is 0, the procedure name is actually the integer address of the procedure. For example, a library name/procedure of '0 3735928559' would be a call to the procedure at memory address 0xdeadbeef.

When the library name is 1, it indicates that the procedure name gives the integer address of the object being called (actually it gives the address of the address of the vtable for the object). The first parameter gives the index of the procedure in the vtable, and its descriptor (which is the second descriptor, since the first one describes the result of the procedure) must be x or *.

Describing the Operands and Result

The descriptors come next. They can be c (char), s (short), i (int), f (float), d (double), j (complex), w (Unicode), l (64-bit integer), x (J integer, either 32 or 64 bits) or n (placeholder--a value of 0 is used and the result is ignored); all but n can be preceded by * to indicate a non-atomic array of that type. * by itself indicates a non-atomic array of unspecified type.

The first descriptor describes the result of the procedure. Subsequent descriptors describe the operands to the procedure.

Supplying the Operand Data

The right operand of cd is the actual arguments to be passed to the called function. It is a list, with one item for each argument to the called function. The arguments correspond to the argument descriptors one for one (remember that the first descriptor describes the result, so there will be one more descriptor than arguments). Normally the items are boxed, with the contents of each box containing one argument, but if all the operands are scalars of the same type you can save some time by leaving the array of operands unboxed.
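The two ways of supplying operands can be sketched as follows. This is only an illustration under an assumption: it presumes a Windows system, where kernel32 exports MulDiv (which computes a*b/c on three int operands). The verb names muldiv_boxed and muldiv_unboxed are made up for the example.

```j
require 'dll'

NB. Assumption: Windows, where kernel32 exports  int MulDiv(int, int, int)
muldiv_boxed =: 3 : 0
'kernel32 MulDiv i i i i' cd <"0 y    NB. one box per argument
)

muldiv_unboxed =: 3 : 0
'kernel32 MulDiv i i i i' cd y        NB. scalars of one type, left unboxed
)
```

On Windows, muldiv_boxed 6 7 2 and muldiv_unboxed 6 7 2 should both deliver the returned value 21 as item 0 of the result, followed by the three operands.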

An argument whose descriptor does not include * must correspond to a scalar which can be converted to the appropriate type. For example, a descriptor of d can have an argument value of 0 or 4.5, but not 'a' . An argument whose descriptor includes * must be either a non-atomic array or a scalar memory address (see below). The address of the array or memory area is passed to the DLL and the DLL may modify the array or memory area; this is how you return an array from the function. If the type of an array does not match the descriptor, the interpreter makes a copy in the correct format and passes its address to the DLL.

s and f are not native J types. The interpreter always has to convert the argument to the correct format for the DLL; in addition, it converts the result back to J format. Arguments for s and f descriptors can also be character strings, 2 bytes per number for s and 4 bytes per number for f, that contain the byte sequence for the argument. 3!:4 and 3!:5 can be used to convert numbers into character-string form (see below under 'Filling a Structure: Conversions' for details).

Note that if you get things wrong and the function scribbles outside the bounds of an array, J may crash. Note also that a vector of 0s and 1s is held inside J as a vector of Booleans, which are chars. When the function calls for a vector of ints, J will convert any Boolean to int. In the example above, (,_1) reserved space for an int; (,0) would have worked too but would require a conversion.

When the function returns, its returned value will be boxed and appended to the front of the list of boxed operands that were passed to the DLL, to produce the result of the execution of cd . You may use this returned value as you see fit. Any box that contained a non-atomic array may have had its contents modified by the function; you may open the box to get the changed value.
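As a sketch of how the result is laid out, the verb below passes a character buffer to a function that fills it in, then picks the pieces out of the result. It assumes a Windows system, where kernel32 exports GetSystemDirectoryA (which writes a path into the buffer and returns its length); the verb name sysdir and the buffer size of 260 are choices made for this illustration only.

```j
require 'dll'

NB. Assumption: Windows, where kernel32 exports
NB.   unsigned GetSystemDirectoryA(char *buffer, unsigned size)
sysdir =: 3 : 0
r =. 'kernel32 GetSystemDirectoryA i *c i' cd (260 $ ' ') ; 260
'len buf size' =. r     NB. box 0 is the returned value; boxes 1 and 2 are the operands
len {. buf              NB. keep only the characters the function wrote
)
```

The right operand of sysdir is ignored, so sysdir 0 should return the directory name on Windows. The buffer in box 1 of r is the array that was passed in, as modified by the function.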

Options

Flag characters that control the call to the DLL may be included just before the first descriptor. The + and % flags control technical details of the calling convention and will not be discussed here. The > flag indicates that you are not interested in anything but the result from the DLL. When > is given, the result of cd is an unboxed scalar rather than a list of boxes, which can be much faster.

The cd verb has rank 1 1. If you have to call the same entry point many times, it may be faster to have an array of arguments and use cd once (preferably with the > flag).
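A minimal sketch of the > flag combined with rank 1 follows. As before, it assumes a Windows system where kernel32 exports MulDiv (a*b/c on int operands), and the verb name muldiv is invented for the example.

```j
require 'dll'

NB. Assumption: Windows, where kernel32 exports  int MulDiv(int, int, int)
muldiv =: 3 : 0
'kernel32 MulDiv > i i i i' cd y      NB. > asks for the bare returned value
)
```

With a list operand, muldiv 6 7 2 should give the unboxed scalar 21. With a table operand, each row is one set of arguments, so muldiv 3 3 $ 6 7 2 10 3 5 1 1 1 should make three calls from a single use of cd and give 21 6 1.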

Errors

If J was unable to call the DLL, the cd verb fails with a domain error. You can then execute the sentence cder '' which will return a 2-element list indicating what went wrong. The User Guide gives a complete list of errors; the most likely ones are 4 0 (the number of arguments did not match the number of declarations), 5 x (declaration number x was invalid--the count starts with the declaration of the returned value which is number 0), and 6 x (argument number x did not match its declaration--the count starts with the first argument which is number 0 and must match the second declaration).
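One way to capture the error information is to wrap the call so that a failure returns the result of cder instead of stopping execution. This is just a sketch; the verb name cdcheck is hypothetical and not part of the J library.

```j
require 'dll'

NB. cdcheck is a made-up name: x cdcheck y behaves like x cd y,
NB. but returns the 2-element list from cder '' if the call fails.
cdcheck =: 4 : 0
try. x cd y
catch. cder ''
end.
)
```

For example, on a Windows system (an assumption, as is the use of MulDiv from kernel32), 'kernel32 MulDiv i i i i' cdcheck 6 7 supplies two arguments where three are declared, so according to the list above it should return 4 0.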

Memory Management

Passing arrays into the called function is adequate only for simple functions. If the function expects an argument to be a structure, possibly containing pointers to other structures, you will have to allocate memory for the structures, fill the structures appropriately, and free the memory when it is no longer needed. J provides a set of verbs to support such memory management.

Allocate memory: mema (15!:3)

mema length allocates a memory area of length bytes. The result is the address of the memory area, as an integer. It is 0 if the allocation failed. mema has infinite rank.

You must box the memory address before using it as an operand to cd . Do not box the address for use as an operand to memf, memw, or memr .

Free memory: memf (15!:4)

memf address frees the memory area pointed to by address . address must be a value that was returned by mema . Result of 0 means success, 1 means failure. memf has infinite rank.

Write Into a Memory Area: memw (15!:2)

data memw address,byteoffset,count,type
causes data to be written, starting at an offset of byteoffset from the area pointed to by address, for a length of count items whose type is given by type . type is 2 for characters, 4 for integers, 8 for floating-point numbers, 16 for complex numbers; if omitted, the default is 2 . If type is 2, count may be one more than the length of data to cause a string-terminating NUL (\0) to be written after the data .

Read From a Memory Area: memr (15!:1)

memr address,byteoffset,count,type
produces as its result the information starting at an offset of byteoffset from the area pointed to by address, for a length of count items whose type is given by type . type is 2 for characters, 4 for integers, 8 for floating-point numbers, 16 for complex numbers; if omitted, the default is 2 . The result is a list with count items of the type given by type . If type is 2, count may be _1 which causes the read to be terminated before the first NUL (\0) character encountered.
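The four memory verbs can be exercised together in a short sketch like the one below. The area size, offsets, and noun names are arbitrary choices for illustration; the 64-byte area leaves room for three integers at offset 16 whether a J integer is 4 or 8 bytes.

```j
require 'dll'

addr =. mema 64                  NB. 64-byte area; a result of 0 would mean failure
'hello' memw addr,0,6,2          NB. 5 characters plus a terminating NUL
10 20 30 memw addr,16,3,4        NB. three integers starting at byte offset 16
str  =. memr addr,0,_1,2         NB. read characters up to the NUL
ints =. memr addr,16,3,4         NB. read the three integers back
rc   =. memf addr                NB. 0 means the area was freed
```

Here str should be 'hello' and ints should be 10 20 30. To hand the same area to a DLL function, the address would be boxed, as in (<addr), in the right operand of cd .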

Filling a Structure: Conversions

To create a structure to pass into a DLL, you must ensure that every byte is in the right place. The way to do this is to convert your nouns to character strings so that you have complete control over how the bytes are packed. J gives you a set of foreigns that will convert your numbers to character strings.

x (3!:4) y converts the atom or list y, which must be integer to within the comparison tolerance, to a character string in which each atom of y occupies 2^x bytes. x may be 1 (2-byte short result), 2 (4-byte long result), or 3 (8-byte result if you have J64). The 'conversion' is merely a change of type: for example, when an integer is converted to 4-byte character, the 32-bit binary code for the integer is not changed, but it is viewed as 4 characters.

To go in the other direction, x is negative: (-x) (3!:4) y splits a string into 2^x-character pieces and calls each piece an integer. If x is 0, the string is split into 2-byte pieces which are construed as representing an unsigned short which is converted to a J number.
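The following sketch round-trips a few integers through 3!:4. The values and noun names are only illustrative. The order of the bytes inside each string depends on the machine, so the comments mention only lengths and round-trip results.

```j
s    =. 2 (3!:4) 1 2 3       NB. three integers, 4 bytes each
n    =. # s                  NB. 12 characters
back =. _2 (3!:4) s          NB. 1 2 3 again

h        =. 1 (3!:4) 258 _1  NB. two integers, 2 bytes each
signed   =. _1 (3!:4) h      NB. 258 _1
unsigned =. 0 (3!:4) h       NB. 258 65535: each 2-byte piece read as an unsigned short
```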

3!:5 does a similar job for floating-point values. x (3!:5) y, where x is 1 or 2, produces a string in which each atom of y occupies 2^(1+x) bytes: 4 bytes (float) when x is 1, and 8 bytes (double) when x is 2. y is first converted to a floating-point value of the correct size. Negative values of x convert from string to numeric representation.
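As a sketch, the lines below convert floating-point values in both directions and then pack a small record by joining converted pieces. The C layout in the comment is hypothetical, chosen so that no padding bytes are needed; a real structure must follow whatever alignment the called library expects.

```j
d     =. 2 (3!:5) 1.5 2.25    NB. two 8-byte doubles, 16 characters
f     =. 1 (3!:5) 1.5 2.25    NB. two 4-byte floats, 8 characters
dback =. _2 (3!:5) d          NB. 1.5 2.25
fback =. _1 (3!:5) f          NB. 1.5 2.25 (both values are exact in single precision)

NB. Hypothetical C layout:  struct { int id; int count; double weight; }
rec    =. (2 (3!:4) 7 9) , 2 (3!:5) , 0.5   NB. 4 + 4 + 8 = 16 bytes
ids    =. _2 (3!:4) 8 {. rec                NB. 7 9
weight =. _2 (3!:5) 8 }. rec                NB. one-item list holding 0.5
```

A string built this way can be written into an allocated area with memw, using type 2, before the area's address is passed to the DLL.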

Aliasing of Variables

When a noun is assigned the value of another noun, as in

a =. b =. 5

a single memory area is used to hold the value common to both nouns, and the two nouns are said to be aliases of each other. Aliasing obviously reduces the time and space used by a computation. The interpreter takes care to ensure that aliasing is invisible to the programmer; if after the statement above we execute

b =. 6

the interpreter will assign the new value to b only, leaving a unchanged. What actually happens is that the new value is created in a data block of its own and the descriptor for the noun b is changed to point to the new block. (Almost all verbs create their outputs in newly-allocated data blocks. As of J6.01 the exceptions are the verbs ] and [ , together with the verb , and the form u} when they are used in one of the forms that produce in-place modification. Increasing the number of cases recognized for in-place execution is a continuing activity of the J developers.)

If there were nothing more to say about aliasing, I would not single it out for mention from among the dozens of performance-improving tricks used by the interpreter. What makes it worth considering is the effect aliasing has when elements outside the J language touch J's nouns. This can occur in two ways: when a noun is mapped to a file and when a noun is modified by a DLL.

Aliasing of Mapped Nouns

When a noun is mapped to a file, the descriptor for the noun points to the file's data and that pointer is never changed even if a value is assigned to the variable: the whole point of mapping the noun to the file is to cause changes in the noun to be reflected in the file, so any assignment to the noun causes the data to be copied into the area that is mapped to the file.

In addition, when b is a noun mapped to a file and is assigned to another noun as with

a =. b

the noun a, which is aliased to b, also inherits the 'mapped-to-file' attribute. This behavior is necessary to make mapped files useful, because assignments to x and y are implicit whenever a verb is invoked and it would defeat the whole purpose of mapping if the data of the file had to be copied every time the mapped noun was passed to a verb. The combination of aliasing and mapping means that any assignment to a mapped noun also changes the values of all other nouns which share the same mapping. For example, if you pass a mapped noun a as the right operand of a verb that modifies its y, then y, a, and the data in the file will all be modified.

Keeping track of the aliasing is the price you pay for using mapped files. If you need to copy a noun making sure you get a fresh, unmapped data block, you must not assign the mapped noun directly, but instead assign the result of some verb that creates its output in a new data block. For example, as of J6.01 the assignment

a =. a:{b

will create a new data block containing the data of b, and a will point to that new block.

Aliasing of DLL Operands

The J interpreter uses aliasing for boxed cells of an array, so that if you execute

b =. i. 10000 10000

a =. b;5

item 0 of a simply contains a pointer to the data block of b rather than a fresh copy of the 800MB array. In addition, when a list of boxes is used as the right operand of cd, as in

'dll-spec' cd root;key;0;'';0;sam;0;(,_1);(,_1)

any array operands to the DLL function are passed via a pointer to the data in the list of boxes, with no separate copy of the data being made. This means that if the DLL modifies one of its arguments, any nouns aliased to that argument will also be modified: if the DLL function called above modifies its argument 1, the noun key and any noun aliased to key (possibly including private nouns in suspended verbs) will be changed. To protect yourself from such side-effects, you can use (a:{key) in place of key in the invocation of cd , which will make a temporary copy before calling the DLL.

Note that if your named argument has to be converted before it is passed to the DLL, any change made by the DLL will be in the copy and will not be reflected in your copy of the named argument. Since aliasing is usually considered a pernicious side-effect, this uncertainty will seldom trouble you. If for some reason you rely on the aliasing, you will need to ensure that the argument has the correct type. There is no officially-sanctioned way to do this, but as of J6.01 you can use monad <. to ensure that a value is an integer (for i descriptors) and monad _&<. to ensure that a value is floating-point (for d descriptors).