Originally presented 7/19/2007 to the NYC Financial Engineering Meetup. It illustrates the use of J to manipulate data quickly and easily in order to explore topics in quantitative financial research. The previous work is here; there is currently no further work on this available.

Short-Term Return Persistence Without Look-Ahead Bias (continued)

To reiterate, the first steps we’ll take to redo our previous work while avoiding look-ahead bias are:

1. rebuild the decile boundaries at each period,
2. place each past return period in its appropriate decile,
3. calculate the subsequent period (i.e. future at this point) return,
4. place this forward-looking return in its appropriate decile.
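The four steps above can be sketched end to end. This is a minimal Python sketch rather than the J used in the rest of this talk, and it rests on assumptions of ours: the names `decile_breakpoints`, `classify`, and `past_future_deciles` are hypothetical, the rule for splitting a window into equal-count groups is a guess at what a quantilizing function might do, breakpoints are taken as midpoints between adjacent groups, and the "future" return is taken to be the single next period.

```python
from bisect import bisect_right

def decile_breakpoints(window, ngroups=10):
    """Internal breakpoints: midpoints between adjacent equal-count groups."""
    s = sorted(window)
    n = len(s)
    if n < ngroups:
        raise ValueError("need at least one observation per group")
    # Assumed splitting rule: group k ends just before index round(k*n/ngroups).
    cuts = [round(k * n / ngroups) for k in range(1, ngroups)]
    return [(s[c - 1] + s[c]) / 2 for c in cuts]

def classify(value, breakpoints):
    """Group number of value; a value equal to a breakpoint goes to the higher group."""
    return bisect_right(breakpoints, value)

def past_future_deciles(returns, window=105, ngroups=10):
    """List of (past decile, future decile) pairs built without look-ahead."""
    pairs = []
    for t in range(window, len(returns)):
        # Step 1: boundaries come only from returns observed before time t.
        bps = decile_breakpoints(returns[t - window:t], ngroups)
        # Step 2: decile of the most recent past return.
        past = classify(returns[t - 1], bps)
        # Steps 3 and 4: decile of the next (future) return, same boundaries.
        future = classify(returns[t], bps)
        pairs.append((past, future))
    return pairs
```

The key point is that the boundaries used at time `t` never see the return at time `t` or later; everything else about the sketch (window length, number of groups, holding period) is a parameter we would want to vary.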

To simplify this problem, let’s first work through the steps for one fixed set of parameters, then apply the result to a moving window across all our returns. Start with simple daily returns for the initial classification into deciles:

`   rets=: ,2 ret1p\(1{seltkrs){clpxs     NB. Look at only GLW daily returns for now`
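The phrase `2 ret1p\` applies `ret1p` to each overlapping pair of prices. The definition of `ret1p` is not shown in this part of the talk, so the Python sketch below assumes it computes the simple one-period return: later price over earlier price, minus one. The function name `simple_returns` and the prices are made up for illustration and are not GLW data.

```python
def simple_returns(prices):
    """One-period simple returns: p[t] / p[t-1] - 1 for each adjacent pair."""
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]

# Made-up closing prices, for illustration only.
prices = [100.0, 102.0, 99.96]
print(simple_returns(prices))   # roughly +2% followed by -2%
```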

Let’s see what our first group of returns looks like. We’ll look at the daily returns, the cumulative returns, and the decile into which each return falls for this period:

```
   ndiv,cnobs     NB. # divisions (10=deciles), # observations
10 105
   usus tt=. ndiv ntileix cnobs{.rets  NB. Returns->decile #
0 9 4.552381 2.8855774                 NB. “usus” -> min, max, mean, SD
   frtab tt                            NB. Decile frequencies
10 0          NB. EG 10 returns fall into decile 0
11 1
11 2
10 3
 5 4          NB. but only 5 in decile 4
16 5          NB. and 16 into decile 5 – returns cluster around middle
11 6
10 7
10 8
11 9
   ]xx=. (>./,<./)cnobs{.rets              NB. Get greatest and least return in period
0.032092426 _0.031565657
   usus scld=. (<./xx)+(tt%>./tt)*-/xx     NB. to scale deciles to returns (for plot).
_0.031565657 0.032092426 0.00063388159 0.020410036
```
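Two small calculations feed the plot: the cumulative return, and the linear rescaling of decile numbers onto the range of the returns (the `scld` line above). As a hedged Python sketch of how we read them, with hypothetical function names and the assumption that cumulative return means compounding `1 + r` and subtracting one:

```python
def cumulative_returns(rets):
    """Running compounded return: product of (1 + r) so far, minus 1."""
    out, growth = [], 1.0
    for r in rets:
        growth *= 1.0 + r
        out.append(growth - 1.0)
    return out

def scale_to_range(deciles, rets):
    """Map decile numbers linearly onto [min(rets), max(rets)] for plotting.

    Assumes the largest decile number is greater than zero.
    """
    lo, hi = min(rets), max(rets)
    top = max(deciles)
    return [lo + (d / top) * (hi - lo) for d in deciles]
```

Decile 0 lands on the smallest return and the top decile on the largest, which is why the minimum and maximum of `scld` match the two values in `xx`.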

The actual plot is generated like this:

```
   plargs=. 'title GLW Returns for 1st 105 days;pensize 3;key Rets CumRets "Scaled Decile"'
   vals=. ((] ,: [: <: [: */\ >:)cnobs{.rets),scld
   plargs plot vals
```

Which looks like this:

In the plot, the circled portion emphasizes how three slightly different returns on the pointed (darkest) line can land in different deciles, as the first two do, or in the same decile, as the latter two do.

Now we have to apply the decile boundaries established in this initial period to a following period, which raises two immediate questions: how long should the following period be, and how do we represent the preliminary deciles for use in the subsequent period? The second question arises because deciles are defined by the particular values in a set: how do we carry a decile grouping over from one set of values to another?

Let’s clarify these questions by looking at a very simple example. We’ll pick ten random numbers from zero to 99 and classify these into only two groups – “two-tiles” instead of deciles.

```
   2 ntile rr=. ?10$100
+---------------------------+
¦8 9 21 24 29¦40 75 77 82 97¦
+---------------------------+
```

Now we pick new groups of random numbers representing the subsequent period returns. We’ll show them in sorted order for ease of comparison to the two-tile grouping above. A new group could happen to fit neatly into the previously-determined grouping:

```
   /:~newrr=. ?10$100
2 16 22 23 29 40 43 55 57 91
```

Or it might not:

```
   /:~newrr=. ?10$100
2 7 23 30 34 36 45 81 87 95
```

As you can see, the first new group has no values between 29 and 40 – the top of two-tile zero and the bottom of two-tile one – so it’s easy to classify its members. The second new group is more of a problem since it has several values falling “between the cracks”. A simple way to solve this is to define the breakpoint values as the average of the boundary values from adjacent two-tiles, i.e. use 34.5 (average of 29 and 40) as the breakpoint. This would classify the first five values of the second new group into two-tile zero and the higher five values into two-tile one. Note also that we only define internal breakpoints – we leave the ends of the range open to accommodate any new values that fall outside the range of our initial values (as do several of the numbers in our second set).
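The midpoint rule in the paragraph above is easy to check by hand. Here is a small Python sketch using the numbers from the example; the function name `two_tile` is ours, and sending a value that exactly equals the breakpoint to the upper group is a choice we make here.

```python
top_of_zero, bottom_of_one = 29, 40            # boundary values of the original two-tiles
breakpoint_ = (top_of_zero + bottom_of_one) / 2   # 34.5

def two_tile(values, bp):
    """0 for values below the breakpoint, 1 otherwise; both ends are left open."""
    return [0 if v < bp else 1 for v in values]

first_group = [2, 16, 22, 23, 29, 40, 43, 55, 57, 91]
second_group = [2, 7, 23, 30, 34, 36, 45, 81, 87, 95]
print(two_tile(first_group, breakpoint_))    # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(two_tile(second_group, breakpoint_))   # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Because only the internal breakpoint is stored, values such as 2 and 7, which lie below anything in the original set, still classify cleanly into two-tile zero.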

Now that we’ve figured out what to do based on our simple example, we can define a function to calculate breakpoints from quantile groupings. We’ll test it on our small example to verify that it works, then use it on our current set of observations to establish breakpoints for the subsequent set of values.

```
   ntileGrp2BrkPts=: 13 : '-:(}:>./&>y)+}.<./&>y'
   ntileGrp2BrkPts 2 ntile rr
34.5
```
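For readers less used to J, here is how we read `ntileGrp2BrkPts`, as a rough Python equivalent: take the maximum of every group but the last, the minimum of every group but the first, and halve their sums. The name `ntile_groups_to_breakpoints` is hypothetical and the input groups are made up for illustration.

```python
def ntile_groups_to_breakpoints(groups):
    """Midpoint between each group's maximum and the next group's minimum."""
    highs = [max(g) for g in groups[:-1]]   # corresponds to  }: >./&> y
    lows = [min(g) for g in groups[1:]]     # corresponds to  }. <./&> y
    return [(h + l) / 2 for h, l in zip(highs, lows)]

# Made-up groups: three groups give two internal breakpoints.
print(ntile_groups_to_breakpoints([[1, 2], [4, 6], [9, 10]]))   # [3.0, 7.5]
```

A grouping into n pieces always yields n-1 breakpoints, since the two outer ends are left open.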

Also test it on the same numbers broken into more groups so we have some confidence that our solution scales properly:

```
   4 ntile rr
+---------------------------+
¦8 9¦21 24 29¦40 75 77¦82 97¦
+---------------------------+
   ntileGrp2BrkPts 4 ntile rr
15 34.5 79.5

   3 ntile rr
+---------------------------+
¦8 9 21¦24 29 40 75¦77 82 97¦
+---------------------------+
   ]bp=. ntileGrp2BrkPts 3 ntile rr
22.5 76
   bp bpntile bp,/:~rr              NB. bpntile is defined further below
+-----------------------------------+
¦8 9 21¦22.5 24 29 40 75¦76 77 82 97¦
+-----------------------------------+
```

Values equal to the breakpoints are classified into the higher quantile. Now take a look at the breakpoints we’ll get with our first set of observations:

`   0.0001 roundNums bp=. ntileGrp2BrkPts 10 ntile cnobs{.rets`

`_0.0193 _0.0105 _0.0056 _0.0026 _0.0012 0.0051 0.0085 0.0137 0.0218`

Using breakpoints instead of a number of quantiles requires an alternative “ntile” function whose left argument is a list of specific breakpoint values rather than a quantile count:

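```
NB.* bpntile: break vector into pieces based on (internal) breakpoints x.
bpntile=: 4 : 0
grd=. /:x,y                           NB. Grade breakpoints and values together
ptn=. 1,(1) (grd i. i.#x)}0$~#grd     NB. 1s mark where each breakpoint sorts
ptn<;._1 ] 0,grd{x,y                  NB. Cut at the marks, dropping the breakpoints
)
```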
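The same partitioning can be sketched in Python with a binary search in place of J's grade-and-cut. This is not a translation of `bpntile` line by line: the name `bp_ntile` is ours, and we assume the breakpoints are supplied in ascending order.

```python
from bisect import bisect_right

def bp_ntile(breakpoints, values):
    """Split the sorted values into len(breakpoints) + 1 groups."""
    groups = [[] for _ in range(len(breakpoints) + 1)]
    for v in sorted(values):
        # bisect_right counts the breakpoints <= v, so a value equal to a
        # breakpoint is placed in the higher group.
        groups[bisect_right(breakpoints, v)].append(v)
    return groups

rr = [29, 8, 75, 97, 24, 40, 21, 82, 77, 9]
print(bp_ntile([22.5, 76], [22.5, 76] + rr))
# [[8, 9, 21], [22.5, 24, 29, 40, 75], [76, 77, 82, 97]]
```

A group that receives no values simply comes back as an empty list.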

Check that using the breakpoint version gives the same result as the original quantilizing function:

```
   bp=. ntileGrp2BrkPts 10 ntile cnobs{.rets
   (10 ntile cnobs{.rets)-:bp bpntile cnobs{.rets
1
```

The "1" indicates that the two are equivalent.

Revamping Previous Work

We had to come up with an alternative to our original quantilizing function “ntile” once we understood the necessity of basing a partitioning on previously-determined breakpoints rather than on a simple scalar number of quantiles. This leads us to revamp the other functions that rely on the original way of partitioning our returns. So, whereas the function “ntileix” assigns quantile numbers to a vector of returns based simply on the number of quantiles, the new version needs to be based on explicit breakpoints.

This process of re-visiting and revamping old code is a continuous one. In fact, it may be the only way to achieve robust algorithms well-tailored to our own needs. So, as we did before, let’s write a new version of “ntileix”, based on breakpoints, and test it on some small, simple data.

```
NB.* bpntileix: index vector elements by breakpoint-based quantiles.
bpntileix=: 4 : 0
grd=. /:x,y                           NB. Grade breakpoints and values together
ptn=. 1,grd e. i.#x                   NB. 1s mark where each breakpoint sorts
tt=. ptn<;._1 ] 0,grd                 NB. Box the (x,y) indexes of each group
((#&>tt)#i.#tt) ((;tt)-#x)}_1$~#y     NB. Write group # at each value's place in y
)

   3 ntile rr
+---------------------------+
¦8 9 21¦24 29 40 75¦77 82 97¦
+---------------------------+
   bp bpntileix rr
1 0 1 2 1 1 0 2 2 0
   rr,:bp bpntileix rr       NB. Check by eye
29 8 75 97 24 40 21 82 77 9
 1 0  1  2  1  1  0  2  2 0
   bp bpntileix /:~rr        NB. Easier to see on sorted data
0 0 0 1 1 1 1 2 2 2
```
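As a cross-check on the result above, the group number of a value is just the count of breakpoints that are less than or equal to it. A short Python sketch of that idea (the name `bp_ntile_index` is ours; this is a different route to the same answer, not a rendering of the J code):

```python
def bp_ntile_index(breakpoints, values):
    """Group number for each value, kept in the original order of values."""
    return [sum(bp <= v for bp in breakpoints) for v in values]

rr = [29, 8, 75, 97, 24, 40, 21, 82, 77, 9]
print(bp_ntile_index([22.5, 76], rr))   # [1, 0, 1, 2, 1, 1, 0, 2, 2, 0]
```

Unlike the partitioning version, this keeps the values in time order, which is what we need when the next step looks at consecutive observations.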

Using Current Breakpoints on the Next Set of Observations

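Initially, we’ll look at a subsequent observation set that’s the same size as our initial one. Breakpoints computed from a set’s own values split that set almost evenly by construction, but applying the breakpoints from the first set to the next set of observations gives a very different distribution: the extreme deciles 0 and 9 are heavily over-represented and decile 4 is empty.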

```
   nextobs=. cnobs{.cnobs}.rets
   frtab (ntileGrp2BrkPts 10 ntile nextobs) bpntileix nextobs
10 0
11 1
11 2
10 3
10 4
11 5
11 6
10 7
10 8
11 9
   nextile=. bp bpntileix nextobs
   frtab nextile
27 0
18 1
 8 2
 4 3
 9 5
 3 6
 6 7
11 8
19 9
```
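The same effect shows up on the small random-number example from earlier. In the Python sketch below, `freq_table` is a hypothetical stand-in for `frtab`: judging from the output above, we assume it lists a count followed by the group number and omits groups with no members.

```python
from collections import Counter

def freq_table(indices):
    """(count, group) rows in ascending group order; empty groups are omitted."""
    counts = Counter(indices)
    return [(counts[g], g) for g in sorted(counts)]

breakpoints = (22.5, 76)     # from "3 ntile rr" earlier in the text
rr = [29, 8, 75, 97, 24, 40, 21, 82, 77, 9]
new_values = [2, 7, 23, 30, 34, 36, 45, 81, 87, 95]

def classify_all(values):
    return [sum(bp <= v for bp in breakpoints) for v in values]

print(freq_table(classify_all(rr)))           # [(3, 0), (4, 1), (3, 2)]
print(freq_table(classify_all(new_values)))   # [(2, 0), (5, 1), (3, 2)]
```

The set that generated the breakpoints splits 3-4-3, while the new set splits 2-5-3 under the same breakpoints.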

Robustness in Tiling

We’ve made a point of parameterizing our quantiling code to allow groupings other than deciles because we would like our results to be robust – they should apply whatever grouping we use. How do we look at results over multiple groupings?

Here’s one way. Get the observations from one period and build the frequency tables as above, this time tabulating each pair of consecutive quantile numbers. First, we’ll do it for 10 groupings as we’ve been doing.

```
   bp=. ntileGrp2BrkPts 10 ntile cnobs{.rets
   nextobs=. cnobs{.cnobs}.rets
   nextile=. bp bpntileix nextobs
   tt=. |:frtab 2<\nextile
   nx10=: (;0{tt) (;1{tt)}10 10$0
```

Now do the same for 9 and 11 groupings.

```
   nextil9=. (ntileGrp2BrkPts 9 ntile cnobs{.rets) bpntileix nextobs
   tt=. |:frtab 2<\nextil9
   nx9=: (;0{tt) (;1{tt)}9 9$0

   nextil11=. (ntileGrp2BrkPts 11 ntile cnobs{.rets) bpntileix nextobs
   tt=. |:frtab 2<\nextil11
   nx11=: (;0{tt) (;1{tt)}11 11$0
```
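All three cases follow one pattern: `2<\` boxes each pair of consecutive quantile numbers, the pairs are counted, and the counts are placed in a square table. As we read it, the result is a table of transitions from one period's quantile to the next. A hedged Python sketch, in which the name `transition_counts` and the convention of rows for the earlier quantile and columns for the later one are our assumptions:

```python
def transition_counts(indices, ngroups):
    """counts[i][j] = number of times group i is immediately followed by group j."""
    counts = [[0] * ngroups for _ in range(ngroups)]
    for current, following in zip(indices, indices[1:]):
        counts[current][following] += 1
    return counts

# The small three-group index vector from the bpntileix example.
seq = [1, 0, 1, 2, 1, 1, 0, 2, 2, 0]
for row in transition_counts(seq, 3):
    print(row)
# [0, 1, 1]
# [2, 1, 1]
# [1, 1, 1]
```

Ten observations give nine consecutive pairs, so the entries of the table sum to nine.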

The problem is that the resultant weightings are of different lengths.

```
   $&>wtTile&>nx9;nx10;nx11
9 10 11
```

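So, use a little math to make them all conform: take a common multiple of their lengths (here simply the product, 990) and replicate each element of the 9-, 10-, and 11-element vectors 110, 99, and 90 times respectively: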

```
   */9 10 11
990
   cgr=. 110 99 90#&.>wtTile&.>nx9;nx10;nx11
```
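The stretching step can be sketched in Python as follows. The function name `stretch_to_common_length` is hypothetical, and the input vectors are placeholders standing in for the `wtTile` results, whose definition is not shown here.

```python
from math import prod

def stretch_to_common_length(vectors):
    """Repeat each element so that every vector has length prod(len(v))."""
    target = prod(len(v) for v in vectors)
    return [[x for x in v for _ in range(target // len(v))] for v in vectors]

# Placeholder vectors of lengths 9, 10 and 11 (not real weightings).
placeholders = [list(range(9)), list(range(10)), list(range(11))]
stretched = stretch_to_common_length(placeholders)
print([len(v) for v in stretched])   # [990, 990, 990]
```

Each element is repeated in place (110, 99, or 90 times), so the shape of each curve is preserved while all three share one horizontal scale.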

The resulting graph shows us that our pattern holds for slight differences in the choice of quantile:


DevonMcCormick/Research/HoldingWinnersSellingLosers5STReturnPersistenceAvoidingLookAhead2 (last edited 2011-03-01 00:07:39 by DevonMcCormick)