--- In quizbowl_at_y..., "davidlevinson" <levin031_at_t...> wrote:
> If we assign "B" to be the baseline measure and then look at
> ratios, we could develop something a little more stable,
> e.g.
>
> LOG (GoogleCount(B)/GoogleCount(X))
>
> where X is the word in question
> and B is the baseline word or phrase (preferably with
> GoogleCount(B) >> GoogleCount(X) for any X we are likely to test).
>
> B's hit count should be large (B need not be "the"), and B should
> be something common and unlikely to change relative position, e.g.
> "George Washington" (Hits = 1,040,000)
> or
> "William Shakespeare" (Hits = 406,000)
> but not
> "quiz bowl" (Hits = 22,800, not counting "quizbowl")
>
>
> I am open to suggestions for what the baseline word should be.
How about that greatest of answers for when you don't have a good
guess--"Smith" [~1.9 x 10^7 hits]?
--AEI
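
For a rough sense of scale, the quoted LOG(GoogleCount(B)/GoogleCount(X)) measure can be sketched in Python with "Smith" as B and the hit counts mentioned in this thread. This is only an illustration: the names log_ratio and scores are made up here, base 10 is an assumption (the formula does not say which log), and the counts are the snapshots quoted above, not live queries.

```python
import math

# Hit counts as quoted in this thread (snapshots; real counts drift over time).
HITS = {
    "Smith": 1.9e7,                  # approximate figure from the reply
    "George Washington": 1_040_000,
    "William Shakespeare": 406_000,
    "quiz bowl": 22_800,
}


def log_ratio(baseline_hits, term_hits, base=10):
    """Return log(baseline_hits / term_hits). Base 10 is an assumption."""
    if baseline_hits <= 0 or term_hits <= 0:
        raise ValueError("hit counts must be positive")
    return math.log(baseline_hits / term_hits, base)


def scores(hits, baseline):
    """Score every term except the baseline against the baseline's count."""
    b = hits[baseline]
    return {term: log_ratio(b, n) for term, n in hits.items() if term != baseline}


if __name__ == "__main__":
    for term, s in scores(HITS, "Smith").items():
        print(f"{term}: {s:.2f}")
```

With these counts, a rarer phrase gets a larger score: "quiz bowl" comes out at roughly 2.9 and "George Washington" at roughly 1.3. Every score stays positive as long as the baseline has more hits than the term being tested, which is the reason for wanting B >> X.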