Jd Technical
J database - columnar analytics

backup - damaged - developer - file handles - links - libjd.so - locales - performance - ref - reference - update - wss

backup

Complete backup or restore is just a copy of the db file folder. Host shell scripts can provide full backup/restore. With large databases and suitable hardware it might be worthwhile to use multiple tasks and compression.
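As a sketch, a full copy can also be driven from J with the host command foreign 2!:0 on Linux/Mac; the paths below are hypothetical and the db is assumed to be closed or idle:
2!:0 'cp -rp /data/jd/mydb /backup/jd/mydb'   NB. copy the entire db folder (hypothetical paths)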

CSV dump/restore with rebuild of dynamics also provides complete backup.

damaged

See tutorial log.

There are many ways a database can be damaged: hardware crashes, OS crashes, JD bugs, application bugs...

A simple example is a table where different cols have different counts. A db with such tables can work in some cases, get unexpected errors, cause further damage (that obscures the original problem), or in the worst case give wrong answers.

jd'validate' checks the integrity of the db (all tables, columns, files) and if there is a problem, marks the db as damaged and signals an error.

jdadmin does a validate to ensure the db is ready for use.

Validation details can be seen with:
jd'info validate'      NB. complete info
jd'info validatebad'   NB. info about problems

The db is marked as damaged and a jdlog record is written if a validation fails.
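A minimal sketch of checking a db from a script, using J's adverse conjunction (::) to trap the error that jd'validate' signals on a damaged db:
ok =. 1:@jd ::0: 'validate'   NB. 1 if validation passed, 0 if an error was signaled
jd'info validatebad'          NB. if ok is 0, shows info about the problems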

developer

Jd is distributed with JAL (the J package manager). The Jd library is at ~addons/data/jd and is loaded with either of the following equivalent lines:
load'data/jd'
load'~addons/data/jd/jd.ijs'

A developer works with a local repo. Use the development library with something like:
load'~/dev/addons/data/jd/jd.ijs'

Loading jd.ijs sets JDP_z_ as the path to the Jd library and this is used for all library references.
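The path can be confirmed from the session after loading, as a quick sketch:
load'data/jd'
JDP   NB. path to the Jd library set by jd.ijs (defined as JDP_z_)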

An automated process copies the developer repo to the addon svn repo to build a new Jd release.

file handles

Jd requires lots of file handles. Using thousands of columns requires thousands of handles.

Jd fails badly if it runs out of handles. When a file cannot be accessed an error is signaled, perhaps in the middle of an operation, and that can leave the database damaged.

A Windows user does not have a limit on file handles.

A Linux/Mac user often has low soft and hard limits on handles, and these must be increased for serious use of Jd. There is no reason not to raise the limit to 100000.

See the soft and hard limits with:
...$ ulimit -n
...$ ulimit -Hn

If the hard limit is high enough, it might be easiest, before starting J, to do:
...$ ulimit -n 100000
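To confirm the limit the J session actually inherited, a quick sketch from inside J using the host command foreign 2!:0 (Linux/Mac):
2!:0 'ulimit -n'   NB. soft limit as seen by shells started from this J process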

To increase the file handle limit for Linux Jd user fred:
...$ ulimit -n   # show current file handle limit
run a superuser text editor and open /etc/security/limits.conf
add the following 2 lines at the end:
fred soft nofile 200000
fred hard nofile 200000
save the file, restart the system, and verify the new ulimit

To increase file handle limit for Mac the steps are similar, but of course different, and details are left to the reader. Yosemite has a low soft limit and a high hard limit.

links

Folder symbolic links (Windows folder junctions) are used to place db cols on different drives.

Possible benefits include more total space and spreading I/O load across drives.

See Admin jdlinkmove, jdlinkset, and jdlinktargets for details.

See tutorial link.

libjd.so

The Jd linux shared library libjd.so is built on a stable release (RH 4/CentOS 4) and should run on most modern linux systems.

If Jd gets an error loading the linux shared library, please report the following to the J database forum:
...$ cat /proc/version
...$ cat /etc/issue
...$ ldd .../libjd.so

locales

Parts of a database (tables, cols, data) correspond directly with the file structure. That is, a table is a folder in the database, each col is a folder in its table folder, and data is a file in its col folder.
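A sketch of the layout for a hypothetical db mydb with table f and cols x and y (the exact file names inside a col folder are omitted):
.../mydb       NB. database folder
.../mydb/f     NB. folder for table f
.../mydb/f/x   NB. folder for col x; holds the data file(s) for x
.../mydb/f/y   NB. folder for col y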

When a database is opened, J locales are created that correspond to the database structure. Each table has a locale with metadata and each col has a locale with metadata and mapped file(s) with the data.

Sometimes it can be useful to dig into the internals.

jdadminx'test'
jd'gen test f 3'
jd'reads from f'
t=. jdgl_jd_'f'     NB. get locale for table f
NAME__t             NB. table name
NAMES__t            NB. col names in table
c=. jdgl_jd_'f x'   NB. get locale for col x in table f
typ__c              NB. column type
PATH__c             NB. path to col dat file
dat__c              NB. mapped file data

performance

See pmhelp_jd_ for info.

Folder pm has scripts for performance measurement.

ref

Command 'ref f a g b' creates f-datl dynamic col and provides only left1 join. A ref is on 1 or more data cols. It is simpler than a reference, requires much less space, and in some cases is faster.
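A minimal sketch, assuming tables f and g already exist and col a of f refers to values in col b of g:
jd'ref f a g b'   NB. creates the f-datl dynamic col; only left1 join is provided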

A ref on a 1gb int data col requires only a single 1gb int datl dynamic col.

Delete/update/insert sets the datl col dirty flag to indicate datl is not valid. The datl col is recalculated (leftcols i. rightcols) as required before use.

File .../db/tab/refcol/jdstate has the dirty flag. It starts as 1. It is set 0 whenever datl is refreshed and set 1 when a delete/update/insert is done to either table. File jdstate is written whenever dirty changes. The database can be closed with dirty 1 and that will be the state when the database is opened.

The time to create a ref between tables each with 100e6 rows on an int col on a platform with 6gb ram is 4 seconds.

If possible ref should be used and reference avoided. Especially for large tables.

reference

Command 'reference f a g b' creates f-datl, f-datr, f-hash_a, f-link_a, g-hash_b, and g-link_b dynamic cols and allows outer/inner/left/right/left1 joins. Hash cols are double size. A reference is on 1 or more data cols.

A reference on an int col with tables of the same size has an overhead of 8 times. A 1gb int col reference has 8gb of dynamic cols. With 2 cols in the reference there would be 16gb of dynamic cols.
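The 8x figure follows from the six dynamic cols listed above: datl, datr, link_a, and link_b are each roughly the size of the data col, while hash_a and hash_b are double size, so a 1gb int col gives about (4 * 1gb) + (2 * 2gb) = 8gb.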

Data col changes (inserts/deletes/updates) affect dynamic cols. This is complicated and expensive. Dynamic col changes are done for every operation. Timings have not been done, but dynamic updates likely make changes significantly slower, especially for large tables.

The time to create a reference between tables each with 100e6 rows on an int col with 24gb ram is 30 minutes. It would take forever with 6gb ram because of thrash.

A reference provides all join types and is fast for a static database. But it has a very high overhead in space and can be very slow for changes. A database that was otherwise reasonable on a machine with limited ram (6gb) would not be able to build the reference.

Delete/update is prevented on a table in a reference as currently the dynamic cols would not reflect the change.

update

Update is done as delete/insert. Changed rows are marked deleted and new rows of merged new/old data are appended.

An option to do update in-place might be nice. Consider a large table with 100 cols where a typical update changes the value in 1 col of 1 row. Update as delete/insert reads 100 different areas of disk to get the old values, then writes 100 different new areas of disk. An update in-place would be much faster.

windows search service

Windows Search Service (content indexing, ...) can cause lots of disk activity and can interfere with Jd file operations; if possible it should be disabled when using Jd.

Disable Windows Search Service as follows:
1. command prompt ...>services.msc
2. scroll down and right click Windows Search
3. click Properties
4. click the Stop button to stop the service if it is running
5. change Startup type: to Disabled
6. click Apply