The following code supports the effort outlined here to create arbitrarily large data sets for realistic testing of parallel processing.
NB.* parallelProbSets.ijs: generate large random datasets for testing parallel programs.
load 'files dates'
NB. Handle TSV (Tab-separated values) files; 'TAB LF CR'=. 9 10 13{a.
NB.* readTSVFl: read tab-delimited file into variable.
readTSVFl=: ([:<;._1&> TAB ,&.> [:<;._2 [:(],LF#~LF~:_1{]) CR-.~fread)
NB.* getTSVInfo: apply arbitrary function u to the contents of the named .tsv file.
getTSVInfo=: 1 : 'u readTSVFl y'
NB.EG lnkey=: (0&{"1) getTSVInfo&.>rrmlnms
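NB.* getFlsInfo: apply arbitrary function u to the variable read from file y by v.
NB. If global SHOWGFI is set to 1, display each file name (with the result of qts'').
NB. nameExists and qts are not defined in this script and must be supplied elsewhere.
getFlsInfo=: 2 : 0
if. nameExists 'SHOWGFI' do. if. SHOWGFI do. smoutput y,': ',":qts'' end. end.
u v y
)
NB.EG lnkey=: ((0&{"1) getFlsInfo readTSVFl)&.>rrmlnms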
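NB.* appendTSVFl: append boxed table x to the rows already in TSV file y.
appendTSVFl=: 4 : '(x,~readTSVFl y) writeTSVFl y'
NB.* writeTSVFl: write boxed table of strings x to file y as TSV.
writeTSVFl=: 4 : '(enc2TSV x) fwrite y'
NB.* enc2TSV: encode boxed table of strings as TAB-delimited, LF-terminated lines.
enc2TSV=: 13 : ';(LF,~[:}:[:; TAB,&.>~])&.><"1 y'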
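NB. A sketch, not part of the original script: an in-memory inverse of enc2TSV that
NB. mirrors readTSVFl without touching a file, which may help when checking the
NB. encoding. The name dec2TSV is hypothetical; it assumes y ends with LF.
dec2TSV=: 3 : '<;._1&> TAB,&.> <;._2 y-.CR'
NB.EG (dec2TSV enc2TSV tbl) -: tbl=. 2 2$'id';'name';'10';'widget'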
NB. Case 0: present-value cashflows along different interest-rate paths.
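NB.* genCFs: y random cashflow records, each 360 monthly amounts in [1000,10000);
NB. within each month the amounts ascend from the first record to the last.
genCFs=: 13 : '|:/:~"1]1000+100%~<.900000*(360,y)?@$0'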
NB.EG cf0=. genCFs 1e4 NB. 10,000 30-year cashflows
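NB.* elimNeg: shift each row so that its minimum lies in [0.01,0.02).
elimNeg=: 3 : '(100%~>:?0)+y-<./y'"1
NB.* maxRng: scale each row so that its maximum lies in [0.10,0.20).
maxRng=: 3 : 'y*(0.10+10%~?0)%>./y'"1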
NB.* genIRs: y random 360-month interest-rate paths, one path per row.
genIRs=: 3 : 0
irp=. ([: (+/\"1) 1000%~[: <:[:+:0?@$~360,~]) y NB. Rates change randomly: running sum along each path
irp=. maxRng elimNeg irp NB. Rates>0%, <:20%
irp=. irp/:*/"1 >:irp NB. Order for neatness
)
NB.EG ir0=. genIRs 1e4 NB. 10,000 30-year paths
NB.* wrCFIRFls: write x cashflow and x rate records to files numbered y; return >:y.
wrCFIRFls=: 4 : 0
(":&.>genCFs x) writeTSVFl '.tsv',~'CF0_',":y
(":&.>genIRs x) writeTSVFl '.tsv',~'IR0_',":y
>:y
)
NB.EG 1e4 wrCFIRFls^:10]0 NB. Write 10 file sets w/10,000 records each
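NB. A sketch, not part of the original script, of the present-value calculation that
NB. Case 0 is meant to exercise. It assumes (the source does not say) that rates are
NB. annual, cashflows are monthly, and discounting compounds monthly; pvCF is a
NB. hypothetical name. x is a cashflow table and y a rate table of the same shape.
pvCF=: 4 : '+/"1 (x % */\"1 >: y%12)'
NB.EG pv0=. (genCFs 1e4) pvCF genIRs 1e4 NB. One present value per record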
NB. Case 1: sort many records by date, movie, or user.
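NB.* genDMURRecs: y random records of yyyymmdd date (from a 6,264-day window) and
NB. three random IDs below 20000, 1e6, and 10 (presumably movie, user, rating).
genDMURRecs=: 3 : '(100#.todate 70476+?y$6264),.(y,3)?@$20000 1e6 10'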
NB.EG dmur0=. genDMURRecs 1e6
NB.* wrDMURFl: write x random records to file DMUR0_y.tsv; return >:y.
wrDMURFl=: 4 : '>:y[(":&.>genDMURRecs x) writeTSVFl ''.tsv'',~''DMUR0_'',":y'
NB.EG 1e6 wrDMURFl^:10]0 NB. Make 10 sets of 1 million records each
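NB. A sketch, not part of the original script, of the sort that Case 1 describes:
NB. order the rows of record table y by column x. The column meanings (0 date,
NB. 1 movie, 2 user, 3 rating) are inferred from the DMUR name and are an assumption;
NB. sortByCol is a hypothetical name.
sortByCol=: 4 : 'y /: x{"1 y'
NB.EG byUser=. 2 sortByCol genDMURRecs 1e6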