Now that I'm finished with finals and papers, I can share some final thoughts on this matter:

> I think a good metric here is required, and with some work (like
> controlling for the size of the google database), is possible.
> I don't know the size of google's database, but the word "the"
> shows up "2,560,000,000" times, so that is a lower bound.

Using "the" has been working quite well. I've yet to see a quizbowl question asking for a definite article.

> If we assign "B" to be the baseline measure, and then look at
> ratios, we could develop something a little bit more stable
> e.g.
>
> LOG (GoogleCount(B)/GoogleCount(X))
>
> where X is the word in question
> and B is the baseline word or wordphrase (and preferably B >>
> any X we are likely to test).

Thought I'd test this and match the scales. A few examples:

For a question about Tales of Hoffmann whose giveaway is Jacques Offenbach:
Levinson-Castrioti difficulty (10 - log 2,020) = 6.69
Proposed scale (Levinson Method?) log(2,560,000,000/2,020) = 6.10

For a question about Thomas Hardy's Return of the Native whose giveaway is Clym Yeobright:
Levinson-Castrioti (or -Mathews) difficulty (10 - log 570) = 7.24
Proposed scale log(2,560,000,000/570) = 6.65

For a question about Eleanor of Aquitaine whose giveaway is "wife of Henry II of England":
Levinson-Castrioti difficulty (10 - log 9,800) = 6.00
Proposed scale log(2,560,000,000/9,800) = 5.41

For a question about the Raman Effect whose giveaway is "named for an Indian physicist":
Levinson-Castrioti difficulty (10 - log 1,030) = 6.99
Proposed scale log(2,560,000,000/1,030) = 6.40

For a question about Mjollnir whose giveaway is "hammer of Thor":
Levinson-Castrioti difficulty (10 - log 1,030) = 6.89
Proposed scale log(2,560,000,000/1,030) = 6.30

Levinson-Castrioti difficulty for "the": 0.60
Proposed scale difficulty for "the": 0.00

I'll have to try this again, maybe a year from now, to see how much "difficulty drift" is occurring on the first scale. I agree, though, that the proposed scale will eliminate negative ranks should they occur in the future, and that it will likely be much more stable as Google's database expands. I'll use Levinson's proposed scale for now.

--Wesley
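
For anyone who wants to rerun these numbers later, here is a rough Python sketch of the two scales as described above. It assumes "log" means base 10, which is what the posted figures seem to imply; the function names and the dictionary are made up for illustration, and the hit counts are just the ones quoted in this message, not fresh searches.

```python
import math

# Hit count for "the" quoted in this thread; used as the baseline B.
BASELINE_COUNT = 2_560_000_000


def fixed_offset_difficulty(count):
    """First scale from the thread: 10 - log10(hit count)."""
    return 10 - math.log10(count)


def baseline_ratio_difficulty(count, baseline=BASELINE_COUNT):
    """Proposed scale: log10(baseline hit count / hit count)."""
    return math.log10(baseline / count)


# Giveaway -> hit count, copied from the examples in this message.
quoted_counts = {
    "Jacques Offenbach": 2020,
    "Clym Yeobright": 570,
    "wife of Henry II of England": 9800,
    "named for an Indian physicist": 1030,
}

if __name__ == "__main__":
    for giveaway, count in quoted_counts.items():
        first = fixed_offset_difficulty(count)
        proposed = baseline_ratio_difficulty(count)
        print(f"{giveaway}: first scale {first:.2f}, proposed scale {proposed:.2f}")
```

For a fixed baseline count, the two scales differ only by the constant 10 - log10(B), about 0.59 with the count above. The proposed scale pins the baseline word at exactly 0, so nothing rarer than the baseline can go negative.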
This archive was generated by hypermail 2.4.0: Sat 12 Feb 2022 12:30:46 AM EST