>>  <<  Ndx  Usr  Pri  JfC  LJ  Phr  Dic  Rel  Voc  !:  wd  Help  Learning J

Chapter 28: Data Files

The subject of file-handling in general, and how data is organized in files, is a major topic in itself. In this chapter we will cover only a selection of the facilities available in J.

J functions to read files produce results in the form of character-strings, and similarly functions to write files take strings as arguments. Such a string can be the whole data content of a file when the available memory of the computer is sufficient.

Our approach here will be to look first at some J functions for input and output of strings. Then we look at a few examples of dealing with strings as representing data in various formats. Finally we look at mapped files as an alternative to conventional file-handling.

28.1 Reading and Writing Files

28.1.1 Built-in Verbs

In the following, a filename is a string which is valid as a filename for the operating-system of the computer where we are running J. For example, if the operating-system is Windows we could write:

    F =: 'c:\temp\demofile.xyz'       NB. a filename

or with Linux we could write:

    F =: '/home/username/demofile.xyz'

The built-in verb 1!:2 writes data to a file. The right argument is a boxed filename. The left argument is a character-string, the data to be written. The effect is that the file is created if it does not already exist, and the data becomes the whole content of the file. The result is null.

   'some data' 1!:2 < F    NB. write to file F

The built-in verb 1!:1 reads data from a file. The right argument is a boxed filename. The result is a character-string, the data read.

   data =: 1!:1 < F     NB.  read from file F

   data
some data
   $ data
9

28.1.2 Screen and Keyboard As Files

Screen and keyboard can be treated as files, to provide a simple facility for user-interaction with a running program.

The expression x (1!:2) 2 writes the value of x to "file 2", that is, to the screen. A verb to display to the screen can be written as

   display =: (1!:2) & 2

For example, here is a verb to display the stages in the computation of the greatest common divisor by Euclid's algorithm.

   E =: 4 : 0
display x , y
if. y = 0 do. x else. (x | y) E x end.
)
   
   12 E 15
12 15
3 12
0 3
3 0
3

The value to be displayed by (1!:2) &2 is not limited to strings: in the example above a list of numbers was displayed.

User-input can be requested from the keyboard by reading "file 1", that is, by evaluating (1!:1) 1. The result is a character-string containing the user's keystrokes. For example, a function for user-interaction might be:

   ui =: 3 : 0
display 'please type your name:' 
n  =.  (1!:1) 1
display 'thank you ', n
''
)

and then after executing

     ui ''
   

a dialogue appears on the screen, like this:

   please type your name:
   Waldo
   thank you Waldo

28.1.3 Library Verbs

The library script 'files' provides a number of useful verbs. Here is a brief summary of a selection:

   s fwrite F        write string s to file F
   fread F           read string from file F
   s fappend F       append string s to file F
   fread F;B,L       read slice from file F, starting at B, length L
   s fwrites F       write text s to file F
   freads F          read text from file F
   fexist F          true if file F exists
   ferase F          delete file F

From now on we will use library verbs for our file-handling.

   require 'files'

The library verb fwrite writes data to a file. The right argument is a filename. The left argument is a character-string, the data to be written. The effect is that the file is created if it does not already exist, and the data becomes the whole content of the file.

   'some data' fwrite F    NB. file write
9

The result shows the number of characters written. A result of _1 shows an error: either the left argument is not a string or the right argument is not valid as a filename, or the specified file exists but is read-only.

   (3;4) fwrite F
_1

The library verb fread reads data from a file. The argument is a filename and the result is a character-string.

   ] z =: fread F
some data
   $ z
9

A result of _1 shows an error: the specified file does not exist, or is locked.

   fread 'qwerty'
_1
   fexist 'qwerty'
0
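
Because fread signals failure by returning _1 rather than by raising an error, a program will usually want to test the result before using it. The following is a minimal sketch, assuming a hypothetical helper verb named readordefault (not part of the files library) which returns a default string when the file cannot be read:

   readordefault =: 4 : 0
NB. x is a default string, y is a filename (hypothetical helper)
r =. fread y
if. r -: _1 do. x else. r end.
)

   'no such file' readordefault 'qwerty'
no such file

The test uses match (-:) rather than equals (=) so that a string result of any length is compared as a whole with _1.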

28.2 Large Files

For large files, the memory of the computer may not be sufficient to allow the file to be treated as a single string. We look at this case very briefly. An alternative approach, using mapped files, is covered below.

Write a file with some initial content:

   'abcdefgh' fwrite F
8

We can append some data to the file with library verb fappend.

   'MORE' fappend F
4

To see the effect of fappend (just for this demonstration, but not of course for a large file) we can read the whole file again:

   fread F
abcdefghMORE

We can read a selected slice of the file, say 8 bytes starting from byte 4. In this case we use fread with a right argument of the form filename;start,size.

   start =: 4
   size  =: 8
   fread F ; start, size
efghMORE
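
Slices make it possible to process a file piece by piece, so that only one piece is in memory at a time. As a minimal sketch, here is a hypothetical verb countchar which counts the occurrences of a character in a file by reading it in chunks. The chunk size of 4 bytes is chosen only to suit the tiny demonstration file, and the file size is obtained with the built-in verb 1!:4.

   countchar =: 4 : 0
NB. x is a character, y is a filename (hypothetical helper)
size =. 1!:4 < y                NB. size of the file in bytes
pos  =. 0
n    =. 0
while. pos < size do.
  len =. 4 <. size - pos        NB. chunk of at most 4 bytes
  n   =. n + +/ x = fread y ; pos , len
  pos =. pos + len
end.
n
)

   'e' countchar F
1

For a genuinely large file the chunk size would be made much larger, but the pattern of the loop would be the same.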

28.3 Data Formats

We look now at a few examples of how data may be organized in a file, that is, represented by a string. Hence we look at converting between character strings, with various internal structures, and J variables.

We take it that files are read and written for the purpose of exchanging data between programs. Two such programs we can call "writer" and "reader". Questions which arise include:

  1. Are writer and reader both to be J programs? If so, then there is a convenient J-only format. If not, then we expect to work from a programming-language-independent description of the data.

  2. Are writer and reader to run on computers with the same architecture? If not, then even in the J-to-J situation, some finesse may be needed.

  3. Is the data organized entirely as a repetition of some structure (for example, "fixed length records")? If so, then we may usefully be able to treat it as one or more J arrays. If not, we may need some ad-hoc programming.

28.3.1 J-Only Files

Suppose we aim to handle certain files only in J programs, so that we are free to choose any file format convenient for the J programmer. The "binary representation" is particularly convenient.

For any array A,

   A =:  'Thurs'; 19 4 2001 

the binary representation of A is a character string. There are built-in verbs to convert between arrays and binary representations of arrays.

   arrbin  =: 3!:1   NB. array to binary rep.
   binarr  =: 3!:2   NB. binary rep. to array

If B is the binary representation of A, we see that B is a character string, with a certain length.

   A
+-----+---------+
|Thurs|19 4 2001|
+-----+---------+
   $ B =: arrbin A
88

We can write B to a file, read it back, and do the inverse conversion to recover the value of A :

   B fwrite F
88
   $ Z =: fread F
88
   binarr Z
+-----+---------+
|Thurs|19 4 2001|
+-----+---------+
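
The two steps, conversion and file transfer, can be combined. Here is a minimal sketch, assuming hypothetical helper verbs named putvar and getvar (these names are chosen for illustration and are not library verbs):

   putvar =: 4 : '(arrbin x) fwrite y'   NB. x is any array, y is a filename
   getvar =: 3 : 'binarr fread y'        NB. y is a filename

   n =: A putvar F                       NB. n is the number of characters written
   A -: getvar F
1

The result of 1 from match (-:) confirms that the array recovered from the file is the same as A.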

From J4.06 on, there are variations of the binary representation verbs above to allow for different machine architectures: see the Dictionary under 3!:1.

28.3.2 Text Files

The expression a. (lower-case a dot) is a built-in noun, a character-string containing all 256 characters (byte values 0 to 255) in sequence. The first 128 of these are the ASCII characters.

   65 66 67 { a.
ABC
   $ a.
256

In the ASCII character set, and hence in a., the character at position 0 is the null, at position 10 is line-feed and at position 13 is carriage-return. In J, the names CR and LF are predefined in the standard profile to mean the carriage-return and line-feed characters.

   a. i. CR,LF
13 10

We saw fread and fwrite used for reading and writing character files. Text files are a special kind of character file, in that lines are delimited by CR and/or LF characters.

On some systems the convention is that lines of text are delimited by a single LF and on other systems a CR,LF pair is expected. Regardless of the system on which J is running, for J text variables, the convention is always followed of delimiting a line with single LF and no CR.

Here is an example of a text variable.

   t =: 0 : 0
There is physics
and there is 
stamp-collecting.
)

Evidently it is a string (that is, a 1-dimensional character list) with 3 LF characters and no CR characters.

   $ t
49
   +/ t=LF
3
   +/ t=CR
0

If we aim to write this text variable t to a text file, we must choose between the single-LF or CRLF conventions. There are two useful library verbs, fwrites and freads to deal with this situation.

  • Under Windows, x fwrites y writes text-variable x to file y, in the process converting each LF in x to a CRLF pair in y.
  • Under Linux, x fwrites y writes text-variable x to file y, with no conversion.
  • Under Windows or Linux, z =: freads y reads file y, converting any CRLF pair in y to a single LF in text-variable z.
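
Whichever convention is used in the file, writing with fwrites and reading back with freads should recover the original text variable. A small check, assuming t and F as defined above:

   n =: t fwrites F       NB. n is the number of characters written (system-dependent)
   t -: freads F
1

The value of n is not shown because it depends on whether each LF was expanded to a CRLF pair.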

For convenience in dealing with a text variable such as t, we can cut it into lines. A verb for this purpose is cut (described more fully in Chapter 17).

   cut =: < ;. _2

cut produces a boxed list of lines, removing the LF at the end of each line.

   lines =: cut t
   lines
+----------------+-------------+-----------------+
|There is physics|and there is |stamp-collecting.|
+----------------+-------------+-----------------+

The inverse of cut we can call uncut. It restores the LF at the end of each box and then razes to make a string.

   uncut =: ; @: (,&LF &. >)
   uncut lines
There is physics
and there is 
stamp-collecting.
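
With the lines in boxes, each line can be processed separately. As a small illustration, here are the lengths of the lines, followed by a check that uncut is indeed the inverse of cut for this text:

   # &> lines
16 13 17
   t -: uncut cut t
1

The three lengths, plus one LF per line, account for the 49 characters of t.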

28.3.3 Fixed Length Records with Binary Data

Suppose our data is in two J variables: a table of customer-names, and for each customer a corresponding amount:

   ] cnames =: 'Mr Rochester' ,: 'Jane'
Mr Rochester
Jane
   ,. amts =: _10000 3
_10000
     3

Now suppose the aim is to write this data to a file, formatted in 16-byte records. Each record is to have two fields: customer-name in 12 bytes followed by amount in 4 bytes, as a signed integer. Here is a possible approach.

The plan is to construct, from cnames and amts, an n-by-16 character table, to be called records. For this example, n=2, and records will look like this:

Mr Rochester####
Jane        ####
   

where #### represents the 4 characters of an integer in binary form.

We build the records table by stitching together side by side an n-by-12 table for the customer names field, and an n-by-4 table for the amounts field.

For the customer-names field we already have cnames which is suitable, since it is 12 bytes wide:

   $ cnames
2 12

For the amounts field we convert amts to characters, using ci4 from Chapter 27. The result is a single string, which is reshaped to be n-by-4.

   ci4 =:  2 & (3!:4)  NB. integer to 4 char
   
   amtsfield =: ((# amts) , 4) $ ci4 amts

Now we build the n-by-16 records table by stitching together side-by-side the two "field" tables:

   records =: cnames ,. amtsfield

To inspect records, here is a utility verb which shows a non-printing character as #

   inspect =: 3 : ('A=.a.{~32+i.96';'(A i.y) { A,''#''')
   

   inspect records
Mr Rochester####
Jane        ####
   $ records
2 16

The outgoing string to be written to the file is the ravel of the records.

   (, records) fwrite F
32

The inverse of the process is to recover J variables from the file. We read the file to get the incoming string.

   instr =: fread F

Since the record-length is known to be 16, the number of records is

   NR =: (# instr) % 16

Reshape the incoming string to get the records table.

   inspect records =: (NR,16) $ instr
Mr Rochester####
Jane        ####

and extract the data. The customer-names are obtained directly, as columns 0-11 of records.

   cnames =: (i.12) {"1 records

For the amounts, we extract columns 12-15, ravel into a single string and convert to integers with c4i.

   c4i =: _2 & (3!:4)  NB. 4 char  to integer
   
   amts   =: c4i  , (12+i.4) {"1  records

   cnames
Mr Rochester
Jane
   ,. amts
_10000
     3
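
The two directions of the conversion can be collected into a pair of verbs. This is only a sketch under the same assumptions as above (12-byte names followed by 4-byte integer amounts); the names torecords and fromrecords are hypothetical.

   torecords   =: 4 : 'x ,. ((# y) , 4) $ ci4 y'                NB. x: n-by-12 names, y: n amounts
   fromrecords =: 3 : '((i.12) {"1 y) ; c4i , (12+i.4) {"1 y'   NB. y: n-by-16 records

   records -: cnames torecords amts
1
   (cnames ; amts) -: fromrecords records
1

The verb fromrecords returns the names table and the amounts list as a boxed pair, since the two have different types and shapes.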

28.4 Mapped Files

A file is said to be mapped when the file is temporarily incorporated into the virtual-address-translation mechanism of an executing program. The data in a mapped file appears to the J programmer directly as the value of a J variable - an array. Changes to the value of the variable are changes to the data in the file.

In such a case, we can say, for present purposes, that the file is mapped to the variable or, equivalently, that the variable is mapped to the file.

Mapped files offer the following advantages:

  • Convenience. Data in a file is handled just like data in any J array. There is no reading or writing of the file.

  • Persistent variables. A variable mapped to a file lives in the file, and can persist from one J session to another.

Since mapped files may be very large, further advantages are:

  • Larger arrays are possible. Ordinarily, the size of a J array is limited by the size of available RAM and swap-file. Large amounts of additional virtual memory can be provided for the array by mapping it to a file.

  • Larger files may be handled. Ordinarily, a very large file must be dealt with piecemeal (because of the limit on size of arrays). By mapping, it can be handled as a single variable.

There are two cases.

In the first case, any kind of existing file can be mapped to a variable. We take as given the structure of the data in the file, and then the J program must supply a description of the desired mapping. For example, a file with fixed-length records could be mapped to a character table.

In the second case, a file can be created in J in a special format (called "jmf") specifically for the purpose of mapping to a variable. In this case, the description is automatically derived from the variable and stored in the file along with the data. Thus a "jmf" file is self-describing.

We look now at jmf files.

28.4.1 jmf Files and Persistent Variables

Suppose we have constructed an array V with some valuable data, which from now on we aim to use and maintain over a number of J sessions. Perhaps V is valuable now, or perhaps it will become valuable over subsequent sessions as it is modified and added-to.

Our valuable data V can be an array of numbers, of characters, or of boxes. For a simple example we start with V as a table of numbers.

   ] V =:  2 2 $ 1 2 3 4
1 2
3 4

We can make a persistent variable from V as follows. Step 0 is to ensure the needed library utilities are available.

   require 'files'
   require 'jmf'

Step 1 is to estimate the size, in bytes, of a file required for the value of V. Since we expect that over time V may grow from its present size ultimately to, say, 64 KB, then our estimate S is

   S =: 64000

If in doubt, allow plenty. The size must be given as a positive integer (not a float) and must therefore be less than 2147483648 (2 GB) on a 32-bit machine.

Step 2 is to choose a file-name and, for convenience, define a variable F to hold the file name as a string. Under Windows we could write, say:

    F =: 'c:\temp\persis.jmf' 

or under Linux we could write:

    F =: '/home/username/persis.jmf'

Step 3 is to create file F as a jmf file, large enough to hold S bytes of data. For this purpose the utility function createjmf is available (in locale jmf) so we can write:

   createjmf_jmf_ F;S

(On your system, with a different version of J, you may see a response different from what is shown here.)

At this point, file F exists. If we inspect it we see its actual size is a little larger than S, to accommodate a header record which makes the file self-describing.

   fdir F
+----------+----------------+-----+---+------+
|persis.jmf|2006 7 9 11 30 4|64284|rw-|-----a|
+----------+----------------+-----+---+------+

The content of file F is initially set by createjmf_jmf_ to represent a J value, in fact a zero-length list. The important point is that file F now contains a definite value.

Step 4 is to map the content of file F to a new variable, for which we choose the name P.

   map_jmf_ 'P'; F

This statement means, in effect:

           P =:  value-currently-in-file-F

and we can verify that P is now an empty list:

   P

   $ P
0

Notice particularly that the effect of mapping file F to variable P is to assign the value in F to P and not the other way around. Hence we avoided mapping file F directly onto our valuable array V because V would be overwritten by the preset initial value in F, and lost.

Step 5 is to assign to P the desired value, that of V

   P =: V

Variable P is now a persistent variable, since it is mapped to file F. We can amend P, for example by changing the value at row 0 column 1 to 99.

   P
1 2
3 4
   ] P =: 99 (<0 1) } P
1 99
3  4

or by appending a new row:

   ] P =: P ,  0 0
1 99
3  4
0  0

Step 6 is needed before we finish the current session. We unmap variable P, to ensure file F is closed.

   unmap_jmf_ 'P'
0

The result of 0 indicates success. The variable P no longer exists:

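   P
error
   $ P
$ P

(Because the name P is now undefined, J takes it to be a verb, so the expression $ P is accepted as a sequence of verbs and is echoed rather than evaluated.)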

To demonstrate that the value of P persists in file F we repeat the mapping, processing and unmapping in this or another session. The name P we chose for our persistent variable is only for this session. In another session, the persistent variable in file F can be mapped to any name.

This time we choose the name Q for the persistent variable. We map file F to Q:

   map_jmf_ 'Q' ; F
   
   Q
1 99
3  4
0  0

modify Q:

   ] Q =: Q , 7 8 
1 99
3  4
0  0
7  8

and unmap Q to close file F.

   unmap_jmf_ 'Q'
0
   

28.4.2 Mapped Files Versus Binary Representation

It was said above that the binary representation of any J variable can be written to a file and subsequently read back, and so we have already a way to handle persistent variables.

Using mapped files rather than binary representation has the advantage that the variable in a mapped file is not limited in size by the amount of available RAM and swap-space. On the other hand, the (ultimate) size of a mapped variable must be fixed in advance.

28.4.3 Mapped Files are of Fixed Size

Recall that we created file F large enough for S bytes of data.

   S
64000
   fdir F
+----------+----------------+-----+---+------+
|persis.jmf|2006 7 9 11 30 4|64284|rw-|-----a|
+----------+----------------+-----+---+------+

The variable in file F is currently much smaller than this, and the unused trailing part of the file is filled with junk. However, if we continue to modify Q by appending to it, we reach a limit, by filling the file, and encounter an error. To demonstrate, with a verb fill for the purpose:

   fill =: 3 : 0
try.   while. 1 do. Q =: Q , 99 99 end.
catch. 'full'
end.
)
   
   map_jmf_ 'Q'; F 
   fill ''
full
   
   

The amount of data now in Q can be estimated as 4 bytes per integer (since Q is integer) multiplied by the number of integers, that is, altogether 4 * */$ Q. This result for the final size of Q accords with our original size estimate S.

   4 * */ $ Q
64000
   S
64000
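
As a cross-check on the arithmetic, assuming as above 4 bytes per integer and 2 integers per row of Q, the file has room for:

   S % 4          NB. number of integers
16000
   (S % 4) % 2    NB. number of rows of Q
8000

On a 64-bit J system an integer occupies 8 bytes, so there the corresponding figures would presumably be halved.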

   unmap_jmf_ 'Q'
0
   

28.4.4 Given Files

Now we look at mapping ordinary data files (that is, files other than the special jmf-format files we considered above).

We take the layout of the data in the file as given, and our task is to specify how this layout is to be represented by the type, rank and shape of a J variable, that is, to specify a suitable mapping.

For example, suppose we aim to read a given file G with its data laid out in fixed-length records, each record being 8 characters. Suppose file G was originally created by, say:

    G =: 'c:\temp\data.xyz'
   'ABCD0001EFGH0002IJKL0003MNOP0004' fwrite G
32

The next step is to decide what kind of a variable will be suitable for mapping the data in file G. We decide on an n-by-8 character table. The number of rows, n, will be determined by the amount of data in the file, so we do not specify n in advance.

It is convenient to start with a small example of an n-by-8 character table, which we call a prototype. The choice of n is unimportant.

   prototype =: 1 8 $ 'a'

Now the mapping can be defined by:

   ] mapping =: ((3!:0) ; (}. @: $)) prototype
+-+-+
|2|8|
+-+-+

We see that mapping is a boxed list. The first item is the data-type. Here 2, meaning "character", is produced by 3!:0 prototype. The second item is the trailing dimensions (that is, all but the first) of the prototype. Here 8 is all but the first of 1 8, produced by (}.@:$) prototype. Thus mapping expresses or encodes "n-by-8 characters".
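
The same recipe applies to other layouts. As a hedged illustration (not taken from any particular file), a prototype for a file to be viewed as an n-by-2 table of integers might be built from an integer such as 100. A prototype built from 0 or 1 would not serve, because J classifies those values as boolean (type 1) and not integer (type 4).

   ((3!:0) ; (}. @: $)) 1 2 $ 100
+-+-+
|4|2|
+-+-+
   3!:0 (1 2 $ 0)     NB. boolean, not integer
1

Checking the type of the prototype with 3!:0 before mapping is a cheap way to avoid describing the file wrongly.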

Now mapping is supplied as left argument to (dyadic) map_jmf_. We map file G onto a variable for which we choose the name W thus:

   mapping map_jmf_ 'W'; G

We see that W is now a variable. Its value is the data in the file.

   W
ABCD0001
EFGH0002
IJKL0003
MNOP0004
   $ W
4 8

We can amend the data in the usual way:

   ] W =: 'IJKL9999' 2 } W
ABCD0001
EFGH0002
IJKL9999
MNOP0004

What we cannot do is add another row to the data, because all the space in file G is occupied by the data we already have.

   W
ABCD0001
EFGH0002
IJKL9999
MNOP0004
   W =: W , 'WXYZ0000'
error

We close file G by unmapping variable W:

   unmap_jmf_ 'W'
0

28.4.5 Mapped Variables Are Special

Mapping files to variables offers the programmer significant advantages in functionality and convenience.

The price to be paid for these advantages is that there are some considerations applying to mapped variables which do not apply to ordinary variables. The programmer needs to be aware of, and to manage, these considerations. This is our topic in this section and the next.

If A is an ordinary variable, not mapped, then in the assignment B=: A the value of A is in effect copied to B. A subsequent change to A does not affect the value of B.

   ] A =: 1
1
   ] B =: A
1
   B
1
   ] A =: 2
2
   B
1

By contrast, consider a variable mapped to a file. If the file is very large, there may not be enough space for another copy of the value. Hence copying is to be avoided.

Compare the previous example with the case when A is a mapped variable.

   map_jmf_ 'A';F

   ] A =: 1
1
   ] B =: A
1
   B
1
   ] A =: 2
2
   B
2

We see that B changes with changes to A. In effect B =: A means that B is another name for A, not a copy of the value of A. That is, both A and B refer to the same thing - the value in the file.

Hence it is also the case that A changes with changes to B.

   A
2
   ] B =: 7
7
   A
7

Consider now an explicit verb applied to a mapped variable. Here y becomes another name for the data in the file. Hence assignment to y (even a local assignment) may cause an unintended change to the mapped variable in the file. For example:

   foo =: 3 : ' 3 * y =. y + 1'
   
   

   foo 2
9
   A
7
   foo A
24
   A
8

28.4.6 Unmapping Revisited

The current status of mapped files and variables is maintained by the J system in a "mapping table". The mapping table can be displayed by entering the expression showmap_jmf_ '' but for present purposes here is a utility function to display only selected columns.

   status =: 0 1 8 & {"1  @: showmap_jmf_
   status ''
+-------+----------------+-----+
|name   |fn              |msize|
+-------+----------------+-----+
|A_base_|e:\xi\persis.jmf|64000|
+-------+----------------+-----+
   

We see that currently variable A in locale base is mapped to file F (persis.jmf).

Under "refs" (a column of the mapping table which does not appear among the columns displayed above), the value 3 means that the data in file F is the target of 3 references. One of these is variable A, a second is the variable B (which we know to be another name for A) and the third is for the system itself.

Variables A and B are both in existence:

   A
8
   B
8

For the sake of simplicity, a recommended procedure for closing the file is first to erase all variables such as B which are alternative names for the originally-mapped variable A:

   erase <'B' 
1

The status shows the number of references is reduced.

   status ''
+-------+----------------+-----+
|name   |fn              |msize|
+-------+----------------+-----+
|A_base_|e:\xi\persis.jmf|64000|
+-------+----------------+-----+

Now we can unmap A.

   unmap_jmf_ 'A'
0

The result of 0 means the file is closed and A erased. The status table shows no entries, that is, that no files are mapped.

   status ''
+----+--+-----+
|name|fn|msize|
+----+--+-----+
   

Let us recreate the situation in which A is mapped to F and B is another name for A, so there are 3 references to (the data in) file F.

   map_jmf_ 'A'; F
   B =: A
   status ''
+-------+----------------+-----+
|name   |fn              |msize|
+-------+----------------+-----+
|A_base_|e:\xi\persis.jmf|64000|
+-------+----------------+-----+

What happens if we erase all the variables referring to F ?

   erase 'A';'B'
1 1
   status ''
+-------+----------------+-----+
|name   |fn              |msize|
+-------+----------------+-----+
|A_base_|e:\xi\persis.jmf|64000|
+-------+----------------+-----+

We see there is still a single reference, under the name A even though there is no variable A. This single reference reflects the fact that file F is not yet unmapped.

Thus when we said earlier that file F gets mapped to variable A, it would be more accurate to say that file F gets mapped to the name A, and a variable of that name is created. Even though the variable is subsequently erased, the name A still identifies the mapped file, and can be used as an argument to unmap.

   unmap_jmf_ 'A'
0
   status ''
+----+--+-----+
|name|fn|msize|
+----+--+-----+

For more information, see the "Mapped Files" lab.

This is the end of Chapter 28




The examples in this chapter were executed using J version 601-o-beta. This chapter last updated 27 Jun 2006 .
Copyright © Roger Stokes 2006. This material may be freely reproduced, provided that this copyright notice is also reproduced.
