Originally presented on 5/13/2007 to the NYC Financial Engineering Meetup. It illustrates the use of J to manipulate data quickly and easily in order to explore topics in quantitative financial research. The previous work is here and the continuation of this is here.

More Data Munging

Now we'll work on removing parts of the data that give inordinate trouble. For instance, when we look at all the daily (price) returns,

 $rets=. ret1p&.>3{"1&.>CODAT
848
 $;rets
1881993

 load'mystats'
 ss=: 'Returns Distribution' plotHistoMulti ;rets
|domain error: to
| x.+(1{y.)*(*x.-~0{y.)* i.>:<.0.5+(|x.-0{y.)%1{y.
 usus ;rets
__ 0.87735849 __ _

 replInf0=: 3 : '(y.=__)}y.,:0'
 rets=. replInf0&.>rets
 usus ;rets
_101.6 0.87735849 _0.21019811 0.42054006

 (;rets){~10{./:;rets
_101.6 _84.94 _28.25 _26 _4.3333333 _3.7777778 _1.8849558 _1.8333333 _1.8064516 _1.729927
 (;rets){~10{.\:;rets
0.87735849 0.78333333 0.76470588 0.75324675 0.62962963 0.56045752 0.54189944 0.52631579 0.51775148 0.46969697

 pxs=. _1-.~&.>3{"1&.>CODAT
 <./#&>pxs
1
 I. 1=#&>pxs
781 847

 $&.>781 847 { CODAT
+---+---+
|2 6|1 6|
+---+---+

 (#&>CODAT){~10{./:#&>CODAT NB. 10 smallest data series
1 2 3 24 27 28 44 46 66 87

 I. 40>#&>pxs
220 781 844 845 846 847
 COINFO{~I. 40>#&>pxs
+----+------------------------------------+--------+------+----------+
|HLT |Hilton Hotels Corp. |43284810|005643| 1/02/1990|
+----+------------------------------------+--------+------+----------+
|MNST|Monster Worldwide Inc. |61174210|064156| 6/04/2001|
+----+------------------------------------+--------+------+----------+
|COH |Coach Inc. |18975410|140541| 9/01/2004|
+----+------------------------------------+--------+------+----------+
|CIT |CIT Group Inc. |12558110|149738|10/27/2004|
+----+------------------------------------+--------+------+----------+
|LH |Laboratory Corp. of America Holdings|50540R40|014960|11/01/2004|
+----+------------------------------------+--------+------+----------+
|LLL |L-3 Communications Holdings Inc. |50242410|110685|12/01/2004|
+----+------------------------------------+--------+------+----------+

 exclsz=. 40<#&>CODAT NB. Eliminate short series
 'COIX CODAT'=: (<exclsz)#&.>COIX;<CODAT
 COINFO=: (1,exclsz)#COINFO

 rets=. ret1p&.>_1-.~&.>3{"1&.>CODAT
 $;rets
1881564
 usus ;rets
_1 0.87735849 _0.21003814 0.40836759
 $_1-.~;rets
1485670

 rets=. _1-.~&.>ret1p&.>_1-.~&.>3{"1&.>CODAT
 usus ;rets
_0.85158151 0.87735849 0.00046699481 0.024474664

 ss2=. 'Returns Distribution After Initial Eliminations' plotHistoMulti ;rets

 5 10$0.001 roundNums rr{~50{./:rr NB. 50 lowest returns
_0.852 _0.851 _0.766 _0.761 _0.636 _0.623 _0.614 _0.61 _0.597 _0.585
_0.573 _0.537 _0.528 _0.519 _0.496 _0.495 _0.495 _0.494 _0.493 _0.49
 _0.48 _0.48 _0.475 _0.471 _0.467 _0.465 _0.46 _0.455 _0.446 _0.445
_0.445 _0.439 _0.438 _0.436 _0.425 _0.424 _0.424 _0.422 _0.422 _0.418
_0.415 _0.41 _0.409 _0.408 _0.405 _0.401 _0.4 _0.4 _0.4 _0.398

 5 10$0.001 roundNums rr{~50{.\:rr NB. 50 highest returns
0.877 0.783 0.765 0.753 0.677 0.63 0.56 0.542 0.526 0.518
 0.47 0.458 0.451 0.45 0.438 0.427 0.424 0.42 0.41 0.409
0.405 0.394 0.391 0.389 0.384 0.384 0.379 0.369 0.368 0.367
0.366 0.363 0.36 0.358 0.355 0.353 0.351 0.348 0.348 0.348
0.347 0.339 0.338 0.333 0.332 0.331 0.331 0.33 0.33 0.326

NB.* winsorizeAt: winsorize top and bottom at xth highest/lowest observation.
winsorizeAt=: 4 : 0
 grd=. /:y
 y=. (y<:y{~x{grd) }y,:y{~x{grd
 y=. (y>:y{~(-x){grd) }y,:y{~(-x){grd
)
 ss2=. 'Returns Distribution - Winsorized at 4000th' plotHistoMulti 4000 winsorizeAt ;rets

   (#rr)%~rr+/ . =4000 winsorizeAt rr
0.99461657

   usus           NB.  The definition of "usus"
3 : 0
   if. 0=L. y do. (]`|: @.(1<#$y))(<./,>./,mean (,`,: @.(1<#$y)) stddev) y
   else. (<./,>./,mean,stddev)&> y end.
)

   usus 4000 winsorizeAt rr
_0.09881698 0.10666229 0.00048500529 0.022861136

The previous work is here and the continuation of this is here.

DevonMcCormick/Research/HoldingWinnersSellingLosers1DataMunging (last edited 2011-02-28 23:01:29 by DevonMcCormick)