Top hundred words

Zipf's law is a neat, general fact about the distribution of word frequencies. G. K. Zipf observed that the frequency of the kth most frequent word is approximately proportional to 1/k (Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology (Reading, MA: Addison-Wesley, 1949), cited in Knuth, The Art of Computer Programming, vol. 3, Sorting and Searching (Reading, MA: Addison-Wesley, 1973), 397). Equivalently, rank times frequency should be roughly constant. The top hundred words in this database follow the law reasonably well, though not exactly: the product is about 68,000 for "the" (rank 1) and about 99,000 for "it" (rank 10).
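
One rough way to check the law against the table is to multiply each word's frequency by its rank and see how far the products drift from a constant. The sketch below does this for a handful of rows copied from the table; the names `ROWS`, `zipf_products` and `spread` are made up for this illustration and are not part of the database.

```python
# Rows copied from the table: (word, frequency per million, frequency rank).
ROWS = [
    ("the", 68351.63, 1),
    ("of", 33008.66, 2),
    ("and", 28651.11, 3),
    ("it", 9882.90, 10),
    ("you", 5043.27, 20),
    ("said", 2149.41, 50),
    ("just", 1060.74, 90),
]


def zipf_products(rows):
    """Return (word, rank * frequency) pairs.

    Under Zipf's law the product should be roughly the same for every word.
    """
    return [(word, rank * freq) for word, freq, rank in rows]


def spread(products):
    """Ratio of the largest product to the smallest; 1.0 would be a perfect fit."""
    values = [value for _, value in products]
    return max(values) / min(values)


if __name__ == "__main__":
    products = zipf_products(ROWS)
    for word, product in products:
        print(f"{word:<6}{product:>12.2f}")
    print(f"max/min ratio: {spread(products):.2f}")
```

A ratio close to 1 would indicate a near-perfect fit. For these sample rows the products stay within a factor of two of one another, which matches the loose agreement described above.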

```
word                  frequency  cumulative  frequency  alphabet
                  (per million)   frequency       rank      rank
the                    68351.63   68351.63           1    318525
of                     33008.66  101360.29           2    212425
and                    28651.11  130011.40           3     11331
to                     27599.22  157610.62           4    322312
a                      23160.48  180771.10           5         1
in                     20670.81  201441.91           6    149032
is                     10571.15  212013.06           7    156934
that                   10549.02  222562.08           8    318470
was                     9939.26  232501.34           9    356587
it                      9882.90  242384.23          10    157771
for                     9309.44  251693.67          11    114281
on                      7636.66  259330.33          12    213645
with                    7171.07  266501.39          13    361235
he                      7167.84  273669.23          14    134413
be                      7153.17  280822.40          15     27945
I                       7036.88  287859.28          16    146205
by                      5866.89  293726.17          17     44040
as                      5793.35  299519.52          18     19178
at                      5154.12  304673.64          19     20631
you                     5043.27  309716.91          20    364651
are                     5000.14  314717.05          21     17618
his                     4963.47  319680.52          22    139433
not                     4899.77  329502.56          24    209444
this                    4789.41  334291.97          25    319827
have                    4685.82  338977.79          26    134106
from                    4625.21  343603.01          27    117354
but                     4616.26  348219.26          28     43732
which                   4131.11  352350.37          29    358956
she                     3991.77  356342.14          30    285912
they                    3982.95  360325.09          31    319435
or                      3975.58  364300.67          32    214838
an                      3836.07  368136.73          33     10593
her                     3692.13  371828.86          34    137067
were                    3482.45  375311.31          35    358233
there                   3025.87  378337.18          36    319027
we                      2953.92  381291.10          37    357241
their                   2929.78  384220.88          38    318680
been                    2924.28  387145.16          39     28958
has                     2873.74  390018.90          40    133676
will                    2775.94  392794.84          41    360225
one                     2764.69  395559.53          42    213720
all                     2630.80  398190.33          43      7706
would                   2617.11  400807.44          44    362548
can                     2355.35  403162.80          45     46162
if                      2247.43  405410.22          46    147000
who                     2226.26  407636.48          47    359548
more                    2195.16  409831.64          48    196881
when                    2193.48  412025.12          49    358850
said                    2149.41  414174.53          50    274265
do                      2139.12  416313.65          51     88648
what                    2053.98  418367.63          52    358673
its                     1888.51  422163.66          54    157935
so                      1844.57  424008.24          55    293328
up                      1816.81  425825.05          56    347711
into                    1803.28  427628.33          57    155127
no                      1789.08  429417.41          58    205310
him                     1787.13  431204.53          59    138999
some                    1783.31  432987.85          60    294419
could                   1753.24  434741.08          61     68666
them                    1668.31  436409.39          62    318729
only                    1646.85  438056.24          63    213824
time                    1609.99  439666.22          64    321515
out                     1547.86  441214.09          65    217118
my                      1526.21  442740.30          66    200056
two                     1514.46  444254.76          67    330909
other                   1513.23  445767.98          68    216850
then                    1475.27  447243.25          69    318748
may                     1455.47  448698.73          70    184593
over                    1443.56  450142.28          71    218315
also                    1409.47  451551.75          72      8585
new                     1404.41  452956.16          73    204064
like                    1366.44  454322.60          74    173657
these                   1328.58  455651.18          75    319382
me                      1316.41  456967.59          76    185895
after                   1302.93  458270.52          77      4998
first                   1287.14  459557.66          78    111382
did                     1283.43  462126.98          80     84058
now                     1281.59  463408.56          81    209859
any                     1279.86  464688.42          82     15074
people                  1215.83  465904.26          83    229078
than                    1203.22  467107.47          84    318396
should                  1172.27  468279.75          85    287398
very                    1159.18  469438.93          86    352460
most                    1112.14  470551.07          87    197488
see                     1097.46  471648.52          88    281471
where                   1096.15  472744.67          89    358869
just                    1060.74  473805.41          90    160985