
Given a list of words, find the top m most frequent words and the corresponding frequencies.

Solution

The dyad x u/. y (key) is useful for such problems. It applies u to each group of items of y whose corresponding items of x (the keys) are identical. The examples below use the reflexive form u/.~ y , in which y serves as its own keys:

```
#/.~ y             NB. the word frequencies corresponding to ~.y
{./.~ y            NB. the unique words, i.e. ~.y
({. , <@#)/.~ y    NB. the unique words and the corresponding frequencies
```
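A small worked instance may make these concrete. The sketch below assumes a hypothetical eight-word list w (not part of the original puzzle); the results noted in the comments follow from the order in which each word first appears.

```j
NB. w is a hypothetical sample list of boxed words, used only for illustration
w=: ;: 'the cat and the dog and the bird'

~. w                NB. unique words in order of first appearance: the cat and dog bird
#/.~ w              NB. 3 1 2 1 1 (frequencies aligned with ~.w)
(~. w) -: {./.~ w   NB. 1 (taking the head of each group recovers ~.w)
$ ({. , <@#)/.~ w   NB. 5 2 (one row per unique word: the word and its boxed frequency)
```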

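For the actual problem, we will use y (#,{.)/. i.#y , which gives a 2-column table holding, for each unique word, its frequency and the index of its first occurrence in y . Sorting that table in descending order and taking the first m rows (m is the left argument x) selects the most frequent words.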
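To see the shape of this intermediate table, here is a sketch on a hypothetical eight-word list w (an assumption made for illustration). Each group handed to (#,{.) is the list of positions at which one word occurs, so # counts the occurrences and {. picks the first position.

```j
NB. w is a hypothetical sample list, not data from the puzzle
w=: ;: 'the cat and the dog and the bird'

t=: w (#,{.)/. i.#w   NB. one row per unique word: frequency, index of first occurrence
t                     NB. rows: 3 0 / 1 1 / 2 2 / 1 4 / 1 7
\:~ t                 NB. rows: 3 0 / 2 2 / 1 7 / 1 4 / 1 1
```

Because \:~ sorts whole rows in descending order, words with equal frequency come out ordered by descending first index.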

```
wordfreq=: 4 : 0
'c i'=. |: x {. \:~ y (#,{.)/. i.#y
(i{y) ,. <"0 c
)
```
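One edge case is worth noting: when the left argument exceeds the number of unique words, the take {. pads the sorted table with rows of zeros, so the result gains extra rows that pair the first word of y with a frequency of 0. A possible variant, sketched here under the hypothetical name wordfreqcap (not part of the original solution), caps the count at the number of unique words.

```j
NB. wordfreqcap is a hypothetical variant of wordfreq, not part of the original solution
wordfreqcap=: 4 : 0
t=. \:~ y (#,{.)/. i.#y      NB. frequency and first index per unique word, descending
'c i'=. |: (x <. #t) {. t    NB. take at most #t rows, avoiding zero-filled padding
(i{y) ,. <"0 c
)

w=: ;: 'the cat and the dog and the bird'   NB. hypothetical sample list
2 wordfreqcap w      NB. 2-row boxed table: the 3 / and 2
# 10 wordfreqcap w   NB. 5 (only five unique words exist)
```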

For example:

```
sample=: 3 : 0
a=. 'abcdefghijklmnopqrstuvwxyz'
c=. 3 5 7 9
n=. 10^>.-:c
x=. ; <"1&.> (>.1e4%n)#&.> (n,&.>c) (a {~ ?@$)&.> #a
x {~ y ?@$ #x
)

x=: sample 1e6
$ x
1000000
8 {. x
┌───────┬─────────┬─────────┬─────────┬─────────┬─────────┬───┬─────┐
│wghgnkv│xaubfuowg│vlqwuvaji│viajpaaih│qcbamjdfh│dftavyazm│sjj│qjtws│
└───────┴─────────┴─────────┴─────────┴─────────┴─────────┴───┴─────┘

10 wordfreq x
┌───┬───┐
│sfn│832│
├───┼───┤
│bgp│819│
├───┼───┤
│yhg│818│
├───┼───┤
│abd│815│
├───┼───┤
│ctz│814│
├───┼───┤
│wkt│813│
├───┼───┤
│eim│810│
├───┼───┤
│ovd│808│
├───┼───┤
│rix│807│
├───┼───┤
│yrc│806│
└───┴───┘
```

Contributed by RogerHui.

Puzzles/Word Frequencies (last edited 2008-12-08 10:45:29 by anonymous)