Smooth Operators
Smooth a jagged curve by repeated averaging:
meanSmooth=: 3 : 0
3 meanSmooth y
:
assert. 1=2|x
y=. (({.y)$~-:<:|x),y,({:y)$~-:<:|x NB. Weight end-points to move less.
if. 0<x do. x mean\y
else. (|x)([: mean {. , {:)\y end. NB. Neg. window averages only endpoints.
)[Here's a question: looking at the line which weights the end-points, is the following tacit version of that line any better?]
y=. (-:<:|x)([|.] ,~ [# [: ({:,{.)])y NB. Weight end-points to move less.Test by noising up a sine wave:
pp=. (1 o. i:5j99)++/<:+:3 100?@$0 'title Smoothing Example;pensize 2;keystyle fat;key Original "Noised Original" "3&meanSmooth^:10" "5&meanSmooth^:10" "9&meanSmooth^:10";keypos rti' plot (1 o. i:5j99),pp,(3 meanSmooth^:10] pp),(5 meanSmooth^:10]pp),:9 meanSmooth^:10]pp
An Example Use
Say we have some timing data on runs of a model on different input files arranged in order of increasing file size. We see from the graph that lines/second increases for larger files though it appears to be asymptotic. If we have repeated timings on the same set of files so we can average out noise, we’d like to identify those that are persistently faster or slower than they ought to be.
The trick here lies in determining what the speed ought to be since the expected value varies according to the number of lines per file as seen in the graph below. It would be useful to have some sort of local average of speeds to provide a basis for comparison, i.e. we need a smoother line than the one based on the actual measurements.
Once we have this smoothed line, we can identify the most significant outliers above or below this line to tell us which instances are anomalously slow or fast.
Here’s how we might do it for a vector of averaged lines/second called “base”.
sm1=. 5&meanSmooth"1^:20 base NB. Smoothed basis to identify outliers.
ord=. (0 1)-.~\:|difs=. base-sm1 NB. Don’t consider first two points.
'loo hio'=. (10{.ord#~_1=*ord{difs);10{.ord#~1=*ord{difsWe want to consider low outliers (faster) separately from high (slower) ones. Let’s plot all the results together to ensure it looks like we expect it to. We’re also excluding the first two points since they tend to be outliers only because of the initial steep slope of the curve.
showOutliers=: 3 : 0
'base sm0 sm1 loo hio'=. y
pd 'title Two Smoothers and Some Outliers;pensize 2;keystyle fat'
pd ';key Base "3 Smoother^:10" "5 Smoother^:20"'
pd base,sm0,:sm1 [ pd 'type line'
pd 'type point;pensize 4'
pd loo j. loo{base [ pd hio j. hio{base
pd 'show'
)
sm0=. 3&meanSmooth"1^:10 base
showOutliers base;sm0;sm1;hio;loo
Note that this graph has been post-processed to add axis labels, include the legend entries for the outliers, and to label the X-axis with the actual number of lines per file.
