Given a list of words, find the top m most frequent words and the corresponding frequencies.
Solution
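The dyad x u/. y (key) is useful for such problems. It partitions y into groups of items whose corresponding items of x (the keys) are equal, and applies u to each group. The results appear in the order in which the keys first occur in x. For example, with y a list of boxed words: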
   #/.~ y              NB. the word frequencies corresponding to ~.y
   {./.~ y             NB. the unique words, i.e. ~.y
   ({. , <@#)/.~ y     NB. the unique words and the corresponding frequencies
For the actual problem, we use y (#,{.)/. i.#y. It applies # and {. to the indices of each group of equal words, giving a 2-column table of each word's frequency and the index of its first occurrence.
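A small example makes these phrases concrete. The sentence and the name words are made up for illustration; the expected results are given as NB. comments.

```j
NB. Hypothetical input, made up for illustration.
words=: ;: 'the cat saw the dog and the cat'
~. words                      NB. the cat saw dog and
#/.~ words                    NB. 3 2 1 1 1
{./.~ words                   NB. the cat saw dog and, the same as ~. words
words (#,{.)/. i.#words       NB. 5 by 2 table; rows 3 0, 2 1, 1 2, 1 4, 1 5
```

Each row of the last result pairs a count with the position where that word is first seen, in the same order as ~. words.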
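wordfreq=: 4 : 0
NB. x: number of words wanted (m); y: list of boxed words
'c i'=. |: x {. \:~ y (#,{.)/. i.#y  NB. counts c and first indices i of the x most frequent words
(i{y) ,. <"0 c                        NB. x-by-2 table of word and frequency
)
For example, sample y returns y boxed words drawn at random from a pool in which shorter words are repeated more often: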
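sample=: 3 : 0
a=. 'abcdefghijklmnopqrstuvwxyz'
c=. 3 5 7 9          NB. word lengths
n=. 10^>.-:c         NB. number of random words generated of each length: 100 1000 10000 100000
x=. ; <"1&.> (>.1e4%n)#&.> (n,&.>c) (a {~ ?@$)&.> #a  NB. pool; words repeated 100, 10, 1, 1 times by length
x {~ y ?@$ #x        NB. y words drawn at random from the pool
)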
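A quick check of the arithmetic in sample helps explain the output that follows. This sketch recomputes n and the repeat count per word outside the verb; the names r and expected are introduced here for illustration only.

```j
NB. Recompute sample's intermediate values outside the verb.
c=: 3 5 7 9                   NB. word lengths
n=: 10^>.-:c                  NB. 100 1000 10000 100000 words generated per length
r=: >.1e4%n                   NB. copies of each word in the pool: 100 10 1 1
r*n                           NB. 10000 10000 10000 100000 pool entries per length
+/ r*n                        NB. 130000 words in the pool
expected=: 1e6 * r % +/ r*n   NB. expected occurrences of one word in sample 1e6: about 769 77 7.7 7.7
```

So in sample 1e6 a given 3-letter word is expected roughly 769 times, a 5-letter word about 77 times, and a 7- or 9-letter word fewer than 8 times. The top of the wordfreq list should therefore consist of 3-letter words whose counts sit somewhat above 769, since they are the largest of about a hundred such counts.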
   x=: sample 1e6
   $ x
1000000
   8 {. x
┌───────┬─────────┬─────────┬─────────┬─────────┬─────────┬───┬─────┐
│wghgnkv│xaubfuowg│vlqwuvaji│viajpaaih│qcbamjdfh│dftavyazm│sjj│qjtws│
└───────┴─────────┴─────────┴─────────┴─────────┴─────────┴───┴─────┘
   10 wordfreq x
┌───┬───┐
│sfn│832│
├───┼───┤
│bgp│819│
├───┼───┤
│yhg│818│
├───┼───┤
│abd│815│
├───┼───┤
│ctz│814│
├───┼───┤
│wkt│813│
├───┼───┤
│eim│810│
├───┼───┤
│ovd│808│
├───┼───┤
│rix│807│
├───┼───┤
│yrc│806│
└───┴───┘
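The run above hides the intermediate steps of wordfreq. The sketch below replays them on a small made-up sentence (the names words, t, s and res are illustrative only). It also shows a consequence of sorting whole rows with \:~: among words with equal counts, the one first seen later is listed first. The variant wordfreq1 is hypothetical and not part of the original essay. It grades on the count column alone, and since \: is stable, ties then keep first-occurrence order.

```j
NB. Hypothetical input, made up for illustration.
words=: ;: 'the cat saw the dog and the cat'

NB. The steps inside wordfreq, with 3 as the left argument.
t=: words (#,{.)/. i.#words   NB. rows 3 0, 2 1, 1 2, 1 4, 1 5
s=: \:~ t                      NB. rows 3 0, 2 1, 1 5, 1 4, 1 2
'c i'=: |: 3 {. s              NB. c is 3 2 1, i is 0 1 5
i { words                      NB. the cat and (saw, also seen once, loses the tie)

NB. Hypothetical variant: sort by the count column only.
NB. \: is stable, so words with equal counts stay in first-occurrence order.
wordfreq1=: 4 : 0
t=. y (#,{.)/. i.#y
'c i'=. |: x {. t {~ \: {."1 t
(i{y) ,. <"0 c
)
res=: 3 wordfreq1 words        NB. rows: the 3, cat 2, saw 1
```

Which tie-breaking rule is preferable depends on the application; the original wordfreq is shorter, while wordfreq1 makes the order of equally frequent words predictable from the input.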
Contributed by Roger Hui.
