JDB, interactive graphical interface
- Location
- Heartland Brewery, 34th and 5th, NYC
Meeting Summary
We started off by talking about some figures on the biggest daily changes in the Dow Jones Industrial Average, how graphs of these might be improved, and how such improvements might be incorporated into the interactive graphics tool we're planning. We also talked about the new J database, JDB, and how we might influence and aid this effort. Finally, we talked about how these two efforts might relate to each other.
Agenda for NYCJUG of 20081014
1. Introducing J: publicizing the many packages available.
2. Show-and-tell: JDB introduction - what would be useful to have? What would help support time-series data?
3. Advanced topics: Interactive Graphics: what features should this have? See "Samples of Some Existing Plotting Packages.doc" for examples of what is already out there.
4. Learning and teaching J: frustrations in finding things that you know are there.
+.--------------------+.
To sum up: it is wrong always, everywhere, and for anyone,
to believe anything upon insufficient evidence.
- William Kingdon Clifford, "The Ethics of Belief"
Proceedings
To start the meeting off, we considered a timely topic, much on everyone's mind these days, by looking at a list I'd prepared of the biggest daily moves in the Dow Jones Industrial Average.
The DJI is not the most widely used index these days, but people are familiar with it because it has been around a long time. In fact, the data series I downloaded from Yahoo! Finance begins in October 1928. This makes it almost exactly 80 years old, which is a convenient span to consider for several reasons. One thing I was looking at was how the number of big "up" days compares to the number of big "down" days, and whether this relation has changed over time in any easily characterized way.
Largest Daily Moves in the Dow Jones Industrial Average as of 10/13/2008

| # | Losses: Date | % Decline | Gains: Date | % Gain |
| 1 | 10/19/1987 | -22.6 | 3/15/1933 | 15.3 |
| 2 | 10/28/1929 | -13.5 | 10/6/1931 | 14.9 |
| 3 | 10/29/1929 | -11.7 | 10/30/1929 | 12.3 |
| 4 | 10/5/1931 | -10.7 | 6/22/1931 | 11.9 |
| 5 | 11/6/1929 | -9.9 | 9/21/1932 | 11.4 |
| 6 | 8/12/1932 | -8.4 | 10/13/2008 | 11.1 |
| 7 | 1/4/1932 | -8.1 | 10/21/1987 | 10.1 |
| 8 | 10/26/1987 | -8.0 | 8/3/1932 | 9.5 |
| 9 | 6/16/1930 | -7.9 | 9/5/1939 | 9.5 |
| 10 | 7/21/1933 | -7.8 | 2/11/1932 | 9.5 |
| 11 | 10/9/2008 | -7.3 | 11/14/1929 | 9.4 |
| 12 | 10/18/1937 | -7.2 | 12/18/1931 | 9.4 |
| 13 | 10/27/1997 | -7.2 | 5/6/1932 | 9.1 |
| 14 | 10/5/1932 | -7.2 | 4/19/1933 | 9.0 |
| 15 | 9/17/2001 | -7.1 | 10/8/1931 | 8.7 |
| 16 | 9/24/1931 | -7.1 | 8/8/1932 | 8.2 |
| 17 | 7/20/1933 | -7.1 | 6/10/1932 | 8.0 |
| 18 | 9/29/2008 | -7.0 | 6/19/1933 | 7.6 |
| 19 | 10/13/1989 | -6.9 | 6/3/1931 | 7.1 |
| 20 | 1/8/1988 | -6.9 | 1/6/1932 | 7.1 |
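A ranking like the table above can be produced from a plain series of closing values. What follows is a minimal sketch, written in Python for illustration, of one way to do it. It makes two assumptions. First, the input is a date-sorted list of (date, close) pairs, such as might be parsed from the downloaded Yahoo! Finance file. Second, each move is measured as the percent change from the previous day's close. The function names are hypothetical, and this is not necessarily how the table above was computed.

```python
from datetime import date


def daily_pct_changes(closes):
    """Return (date, percent change from previous close) for every day after the first.

    `closes` is assumed to be a list of (date, closing value) pairs sorted by date.
    """
    return [
        (d1, 100.0 * (c1 - c0) / c0)
        for (_, c0), (d1, c1) in zip(closes, closes[1:])
    ]


def largest_moves(closes, n=20):
    """Return the n largest one-day losses and the n largest one-day gains,
    each ordered from largest magnitude to smallest."""
    changes = daily_pct_changes(closes)
    losses = sorted((c for c in changes if c[1] < 0), key=lambda c: c[1])[:n]
    gains = sorted((c for c in changes if c[1] > 0), key=lambda c: c[1], reverse=True)[:n]
    return losses, gains


if __name__ == "__main__":
    # Made-up closing values, purely to exercise the functions.
    closes = [
        (date(2008, 10, 8), 100.0),
        (date(2008, 10, 9), 90.0),
        (date(2008, 10, 10), 99.0),
        (date(2008, 10, 13), 110.0),
    ]
    losses, gains = largest_moves(closes, n=2)
    for d, pct in losses:
        print("loss", d.strftime("%m/%d/%Y"), round(pct, 1))
    for d, pct in gains:
        print("gain", d.strftime("%m/%d/%Y"), round(pct, 1))
```

Python's sort is stable even with `reverse=True`, so tied moves (such as the three 9.5% gains) keep their chronological input order.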
One convenient thing about this 80-year span is that it divides evenly into four 20-year periods that more or less coincide with important eras in the investing world. The first 20 years, from 1928 through late 1948, cover the Great Crash, the Great Depression, and World War II. The second period covers the post-war era through the culturally seminal year of 1968. The third period covers the great bear market of the early 1970s and the crash of 1987, whose worst day is at the very top of the list for a single day's move. The most recent period covers the aftermath of the '87 crash, the dot-com boom and bust, and the recent turbulence.
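One way to pursue the question of whether the balance of big up and down days has shifted between these eras is to assign each large move to its 20-year period and count by direction. The following Python sketch rests on two assumptions. First, the period boundaries fall on October 1 of 1928, 1948, 1968 and 1988, since the text only places the splits at roughly those years. Second, "big" means an absolute daily change of at least 5%, which is an arbitrary, hypothetical cutoff. The helper names are illustrative only.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

# Assumed boundaries: each 20-year period starts on October 1 of these years.
PERIOD_STARTS = [date(1928, 10, 1), date(1948, 10, 1), date(1968, 10, 1), date(1988, 10, 1)]
PERIOD_LABELS = ["1928-1948", "1948-1968", "1968-1988", "1988-2008"]


def period_of(d):
    """Return the label of the period containing date d, or None if d is too early."""
    i = bisect_right(PERIOD_STARTS, d) - 1
    return PERIOD_LABELS[i] if i >= 0 else None


def count_big_moves(changes, threshold=5.0):
    """Count (period, direction) pairs for days whose absolute percent change
    is at least `threshold`. `changes` is a list of (date, percent change)."""
    counts = Counter()
    for d, pct in changes:
        if abs(pct) >= threshold:
            direction = "up" if pct > 0 else "down"
            counts[(period_of(d), direction)] += 1
    return counts


if __name__ == "__main__":
    # Four moves taken from the table, plus one hypothetical small move below the cutoff.
    changes = [
        (date(1929, 10, 28), -13.5),
        (date(1933, 3, 15), 15.3),
        (date(1987, 10, 19), -22.6),
        (date(1987, 10, 21), 10.1),
        (date(1955, 1, 3), 1.0),
    ]
    counts = count_big_moves(changes)
    for label in PERIOD_LABELS:
        print(label, "up:", counts[(label, "up")], "down:", counts[(label, "down")])
```

The resulting up/down counts per period can then be compared directly, or charted on a common scale.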
One interesting thing to note is that the years 1929-1933 still dominate the top twenty. Another thing to notice is that there are no days in the top twenty for the years between 1939 and 1987.
Here we show the distribution of the largest daily changes upward (darker bar) versus those downward (light bar).
They all look somewhat similar until you pay close attention to the X-axis scales along the bottom of the graphs, which are quite different. Because each histogram is scaled according to its own data, the four periods cannot simply be compared to each other by eye: they differ more than first appears.
Here we see a crude attempt to use a common scale across all four by forcing the same minimum and maximum X-value onto all the charts.
This illustrates how hard it is to do this well, for a few reasons. For one, though the X-scale is the same across all four periods, the Y-scale is not. More importantly, the way I achieved even this minimal commonality of X-ranges was by cheating: I added spurious minimum and maximum values to the three series lacking the true minimums and maximums (from the 1948-1968 and 1968-1988 graphs, respectively), then manually erased the very small spurious bars from each graph after it had been rendered as a picture.
All of this points to some fairly obvious improvements in graphing that are very hard to accomplish with existing packages. In fact, I first noticed this problem when generating graphs of multiple, related series with S-Plus, a language with highly regarded graphing capabilities. S-Plus is virtually the same as the freely available R language, to which J has an interface. Thomas, who started this interactive graphics initiative, has mentioned that they use this interface specifically to generate graphs from J that J cannot produce well on its own.
You may also notice that these charts are slightly out of sync with the table of numbers. That is because I re-ran the numbers after the big market moves in October but have not re-done the charts: re-running the numbers is easy, while re-doing the charts is time-consuming. This difficulty of updating charts was another motivation for Thomas to start work on the interactive graphics project, and it is a common problem for anyone who works with a lot of charts.
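A cleaner alternative to adding spurious values is to compute the numbers before drawing anything. One set of bin edges comes from the combined range of all four series, and one shared Y-axis maximum comes from all the resulting counts. Below is a minimal Python sketch of that idea, with hypothetical helper names. It only computes what a plotting package would need and does not draw anything. It also assumes equal-width bins.

```python
from bisect import bisect_right


def common_edges(series_list, nbins):
    """Equal-width bin edges spanning the combined min and max of all series."""
    lo = min(min(s) for s in series_list)
    hi = max(max(s) for s in series_list)
    if hi == lo:
        raise ValueError("all values are equal; cannot form bins")
    width = (hi - lo) / nbins
    return [lo + i * width for i in range(nbins)] + [hi]


def histogram(values, edges):
    """Count values per bin; bins are [left, right) except the last, which is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v == edges[-1]:
            i = len(counts) - 1  # the global maximum belongs to the last bin
        else:
            i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):  # values outside the edges are ignored
            counts[i] += 1
    return counts


def common_histograms(series_list, nbins=10):
    """Histogram every series on the same bins and return a shared Y-axis maximum."""
    edges = common_edges(series_list, nbins)
    all_counts = [histogram(s, edges) for s in series_list]
    y_max = max(max(c) for c in all_counts)
    return edges, all_counts, y_max


if __name__ == "__main__":
    # Two tiny series built from values in the table above.
    edges, counts, y_max = common_histograms([[-22.6, -8.0, 10.1], [15.3, -13.5, 1.0]], nbins=2)
    print(edges, counts, y_max)
```

Handing the shared edges and Y-maximum to whatever plotting tool is in use should keep both axes identical across all four panels, with no editing of the pictures afterward.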
Beginner's regatta
There are getting to be quite a few packages available for J. Here is the current list:
| Package | Description |
| arc/zip | Zip file utilities based on zlib 1.2.3 and minizip libraries. |
| arc/ziptrees | Zips and Unzips directory trees |
| base library | base library scripts and labs |
| convert/misc | miscellaneous scripts |
| data/dbman | Database manager |
| data/jdb | JDB |
| data/sqlite | sqlite enhanced API for J |
| docs/wikihtml | Offline browsing of wiki sections for Grid, Plot and Project Manager |
| finance/actuarial | Actuarial functions |
| finance/interest | Compound interest functions |
| format/publish | builds pdf reports from markup |
| games/nurikabe | Nurikabe |
| general/dirtrees | Copy and delete directory trees |
| general/dirutils | Additional directory utilities |
| general/inifiles | Platform neutral interface for INI files |
| general/jayscript | J Language Active Script Connector |
| general/jod | JOD J Object Dictionary |
| general/jodsource | JOD Object Dictionary Source |
| general/pcall | Pointer call to a DLL function |
| general/sfl | Standard Function Library from iMatix, a portable function library for C/C++ programs |
| graphics/fvj3 | Materials for Fractals, Visualization and J, 3rd edition including scripts for visualization. |
| graphics/gnuplot | Create gnuplot graphics |
| graphics/graphviz | Graph Visualization |
| graphics/jturtle | Turtle graphics |
| graphics/treemap | Displays a treemap |
| gui/gtk | GTK API |
| gui/jobs | Application framework to host analysis jobs |
| gui/monthview | Displays the Microsoft Monthview calendar control |
| gui/util | GUI utilities |
| math/deoptim | Differential Evolution for optimization of multidimensional functions |
| math/fftw | FFTW |
| math/lapack | LAPACK |
| math/lbfgs | LBFGS for unconstrained nonlinear optimization |
| math/misc | miscellaneous scripts |
| media/animate | Animation Utility |
| media/gdiplus | GDI+ Library |
| media/image3 | Utilities for accessing 24-bit jpeg, png, bmp, tga and portable anymaps in J. |
| media/ming | Flash SWF file generator based on Ming |
| media/paint | Bitmap image-editing application |
| media/platimg | Platform neutral image I/O utilities |
| media/wav | Windows WAV file creation and play |
| stats/base | Basic statistics package |
| stats/dendrite | Dendrite cluster analysis method |
| stats/r | Interfaces to R statistical package |
| stats/rlibrary | R library using Rserve interface |
| tables/csv | Read and write CSV files and strings |
| tables/csvedit | Grid based editor for CSV files |
| tables/dsv | Convert delimiter-separated strings and files to/from boxed arrays |
| tables/excel | Reads Excel files using OLE |
| tables/tara | Platform independent system for reading and writing Excel files |
| web/jhp | J Hypertext Processor |
| xml/loose | Loose XML parser based on regex |
| xml/sax | XML parser based on Expat library |
| xml/xslt | XSL Transform tool |
We need to work on using some of these, giving feedback to their creators to improve them, and publicizing them. This brings us to our next topic, one of these packages: JDB.
Show-and-tell
I've done some preliminary work with JDB. The first thing to do is to load up some different kinds of datasets to see how well it handles them. I have three datasets in mind for this preliminary investigation: the Netflix Challenge data, some options data, and some data on commodities.
These reflect my own interests and data I have readily available. Each of the three should test a different facet of JDB. The Netflix dataset is fairly large and should test JDB's capacity. The options data is a fairly complicated example of time-series data; even today, the major databases do not have time-series handling built in, so each user has to cobble it together ad hoc.
A summary of my experiences to-date with JDB can be found here: DevonMcCormick/JDBWithNetflixChallengeData.
Advanced topics
Learning and teaching J
Scan of Meeting Notes
