Difference between revisions of "D-Value"
Matt Weiner (talk | contribs) |
m (more capitalization style) |
||
(6 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | The D- | + | The '''D-Value''' is a statistic that estimates the number of points that a team would expect to score against an "average" quizbowl team. It is useful in comparing the performance of all teams that play a given [[packet]] set, in situations where multiple tournaments use the same packet set, thus making it impossible to directly compare the performance of two different teams. |
− | Starting in 2010, the D- | + | Starting in 2010, the D-Value replaced the [[S-Value]] as the statistic determining wild card bids to [[NAQT]] [[CCCT]] and [[ICT]]. |
== Official Formula == | == Official Formula == | ||
− | D- | + | D-Value = 20 x (Adjusted TPPTH<ref>[[Tossup]] points per tossup heard</ref> + Adjusted BHPTH<ref>[[Bonus]]es heard per tossup heard</ref> x PPB<ref>[[Points per bonus]]</ref>) |
+ | <references /> | ||
== Meaning of Component Statistics == | == Meaning of Component Statistics == | ||
Line 20: | Line 21: | ||
== NAQT Additional Modifications == | == NAQT Additional Modifications == | ||
− | There are two additional modifications done by NAQT to use the D- | + | There are two additional modifications done by NAQT to use the D-Value as a measure of team strength on NAQT questions. |
The first modification uses "difficulty correction factors" (DCs) that account for teams playing in combined (Division I and Division II) fields on the wrong packet set. This ensures that, for instance, a Division II team that is forced to play the Division I SCT set (because not enough Division II teams signed up for the tournament) is not penalized for playing a tougher set. While the DCs were derived arbitrarily, there is a small amount of evidence that they are at least in the right ballpark. | The first modification uses "difficulty correction factors" (DCs) that account for teams playing in combined (Division I and Division II) fields on the wrong packet set. This ensures that, for instance, a Division II team that is forced to play the Division I SCT set (because not enough Division II teams signed up for the tournament) is not penalized for playing a tougher set. While the DCs were derived arbitrarily, there is a small amount of evidence that they are at least in the right ballpark. | ||
− | The second modification uses an "order of finish correction" that accounts for the contingency that a statistically better team finishes behind a statistically worse team at a given SCT. In practice, it can be kind of tricky to determine exactly which teams to include. A good general rule of thumb is that if the average D- | + | The second modification uses an "order of finish correction" that accounts for the contingency that a statistically better team finishes behind a statistically worse team at a given SCT. In practice, it can be kind of tricky to determine exactly which teams to include. A good general rule of thumb is that if the average D-Value of a set of teams is increased by including the next-lowest-finishing team, then the next-lowest-finishing team is included in the set and the new average D-Value is computed. |
These modifications allow NAQT to reward teams for finishing higher at their tournament, and to not penalize (too much) teams that play an inappropriate packet set for their division. | These modifications allow NAQT to reward teams for finishing higher at their tournament, and to not penalize (too much) teams that play an inappropriate packet set for their division. | ||
Line 30: | Line 31: | ||
== Criticism of D-Values == | == Criticism of D-Values == | ||
− | The D- | + | The D-Value has been criticized as an example of [[mathturbation]], given the insufficiency and inaccuracy of the data sets used to compute it. However, since this criticism would apply to any reasonably simple, transparent, and unbiased ranking of teams given current data collection limitations, and since NAQT needs a reasonably simple, transparent, and unbiased ranking of teams to determine ICT wild card bids, the D-Value is seen as a relatively benign example. |
− | The D- | + | The D-Value has also been criticized for its ability to overestimate the ability of teams in exceptionally strong fields and underestimate the ability of teams in exceptionally weak fields. More complex strength-of-schedule adjustments could be used to diminish the influence of fields that strongly deviate from average strength. |
− | Based on where teams have been placed relative to their strength at other tournaments, the D- | + | Based on where teams have been placed relative to their strength at other tournaments, the D-Value clearly overrates the performance of DI teams playing on DII questions against mostly DII opponents in combined fields. Solving the issue of some teams playing what is essentially an entirely different tournament is one of the most difficult problems in the D-Value calculation. There is a fixed numerical de-multiplier in the current formula that assumes all teams experience a fixed and equal benefit from this scenario. Changing the SCT rules so that combined fields do not occur may be an easier and more logical solution to this problem than looking for a magic number that probably does not exist. |
− | One of the chief fallacies in discussions of the D- | + | One of the chief fallacies in discussions of the D-Value is that it "mostly" predicts ICT order of finish correctly, e.g. https://hsquizbowl.org/forums/viewtopic.php?p=340884#p340884 asserts that "about 75% of a team's ICT performance is predicted by the team's D value." This ignores the fact that anyone with a basic familiarity regarding a given year's college quizbowl teams can get a prediction of the ICT standings "about 75% right" and the only purpose of the D-Value is to correctly sort out which teams receive the 25th through 36th ICT invitations vs. which teams are 37th through the mid-40s. Whether the D-Value does any better than proposed alternative formulas or a human-based [[quizbowl death panel|ranking panel]] at sorting out the bubble teams is both a matter of ongoing dispute and the only real question that should be asked about the D-Value's reliability. |
+ | |||
+ | == See also== | ||
+ | |||
+ | [[A-value]] | ||
== External Links == | == External Links == | ||
− | [http://www.naqt.com/college/d-values.html NAQT's Explanation of D- | + | [http://www.naqt.com/college/d-values.html NAQT's Explanation of D-Values] |
[[Category: Statistics]] | [[Category: Statistics]] | ||
[[Category: NAQT]] | [[Category: NAQT]] |
Latest revision as of 08:03, 2 February 2023
The D-Value is a statistic that estimates the number of points that a team would expect to score against an "average" quizbowl team. It is useful for comparing the performance of all teams that play a given packet set when multiple tournaments use that set: teams at different tournaments never play each other, so their performances cannot be compared directly.
Starting in 2010, the D-Value replaced the S-Value as the statistic determining wild card bids to NAQT CCCT and ICT.
Official Formula
D-Value = 20 x (Adjusted TPPTH[1] + Adjusted BHPTH[2] x PPB[3])
Meaning of Component Statistics
The tossup points per tossup heard, or TPPTH, is computed by dividing the total number of tossup points a team scored by the number of tossups it heard. TPPTH is adjusted by multiplying by a strength-of-schedule factor that measures how difficult it was to score tossup points against a given team's slate of opponents, relative to all teams that played the set.
The bonuses heard per tossup heard, or BHPTH, is computed by counting the number of tossups a team answered correctly and dividing by the number of tossups it heard. Strictly speaking, all overtime tossups should be removed before computing BHPTH, but for ease of data collection this is not usually done. Like TPPTH, BHPTH is adjusted by multiplying by the same strength-of-schedule factor.
The points per bonus, or PPB, is just the team's bonus conversion.
The resulting computation, TPPTH + (BHPTH x PPB), gives the expected number of points a team would score on a random tossup against a statistically average tossup-converting team. This number is converted to a more intuitive statistic, the number of points scored in an average game, by multiplying by 20 (since there are typically 20 tossups in an untimed match, and statistics for timed matches are typically normalized to "per 20 tossups heard").
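The computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NAQT's implementation: the function name d_value and its parameter names are hypothetical, and the strength-of-schedule factor is taken as an input (sos_factor) because this article does not describe how it is derived.

```python
def d_value(tossup_points, tossups_correct, tossups_heard, ppb,
            sos_factor=1.0, tossups_per_game=20):
    """Estimate points per game against an average team.

    sos_factor is a hypothetical input standing in for the
    strength-of-schedule factor; 1.0 means an average schedule.
    """
    # Tossup points per tossup heard.
    tppth = tossup_points / tossups_heard
    # Bonuses heard per tossup heard: one bonus per correct tossup.
    bhpth = tossups_correct / tossups_heard
    # Both tossup-based statistics are scaled by the same factor.
    adjusted_tppth = tppth * sos_factor
    adjusted_bhpth = bhpth * sos_factor
    # Expected points per tossup, scaled to a 20-tossup game.
    return tossups_per_game * (adjusted_tppth + adjusted_bhpth * ppb)


# A made-up team: 1100 tossup points and 100 correct tossups
# over 200 tossups heard, with 18.0 points per bonus.
print(d_value(1100, 100, 200, 18.0))  # 290.0
```

In this made-up example the team earns 5.5 tossup points and 0.5 x 18 = 9 bonus points per tossup heard, or 14.5 points per tossup and 290 points per 20-tossup game. A strength-of-schedule factor above 1.0 would raise that figure and a factor below 1.0 would lower it.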
NAQT Additional Modifications
There are two additional modifications done by NAQT to use the D-Value as a measure of team strength on NAQT questions.
The first modification uses "difficulty correction factors" (DCs) that account for teams playing in combined (Division I and Division II) fields on the wrong packet set. This ensures that, for instance, a Division II team that is forced to play the Division I SCT set (because not enough Division II teams signed up for the tournament) is not penalized for playing a tougher set. While the DCs were derived arbitrarily, there is a small amount of evidence that they are at least in the right ballpark.
The second modification uses an "order of finish correction" that accounts for the contingency that a statistically better team finishes behind a statistically worse team at a given SCT. In practice, it can be difficult to determine exactly which teams to include in the correction. A good general rule of thumb is that if the average D-Value of a set of teams is increased by including the next-lowest-finishing team, then the next-lowest-finishing team is included in the set and the new average D-Value is computed.
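The rule of thumb above can be sketched as follows. This is a reading of the rule as stated here, not NAQT's actual procedure. The function name order_of_finish_groups is hypothetical. It is also an assumption that sets are built from the top of the standings down and that a new set begins wherever the rule stops adding teams. The sketch only reports each set and its average; it does not model how the average is then applied to the teams.

```python
def order_of_finish_groups(d_values):
    """Group teams using the rule of thumb described above.

    d_values: raw D-Values listed in order of finish at one SCT,
    highest finisher first. Returns (members, average) pairs.
    Assumption: each new set starts where the previous one ended.
    """
    groups = []
    i = 0
    while i < len(d_values):
        members = [d_values[i]]
        average = float(d_values[i])
        j = i + 1
        # Including the next team raises the average exactly when
        # that team's D-Value exceeds the current average.
        while j < len(d_values) and d_values[j] > average:
            members.append(d_values[j])
            average = sum(members) / len(members)
            j += 1
        groups.append((members, average))
        i = j
    return groups


# Made-up standings: the second-place team has the highest D-Value.
print(order_of_finish_groups([300, 350, 320, 250]))
```

In this made-up example the first two finishers form a set with an average of 325. The third-place team (320) would lower that average, so it is not added and stands alone, as does the fourth-place team.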
These modifications allow NAQT to reward teams for finishing higher at their tournament, and to avoid overly penalizing teams that play an inappropriate packet set for their division.
Criticism of D-Values
The D-Value has been criticized as an example of mathturbation, given the insufficiency and inaccuracy of the data sets used to compute it. However, since this criticism would apply to any reasonably simple, transparent, and unbiased ranking of teams given current data collection limitations, and since NAQT needs a reasonably simple, transparent, and unbiased ranking of teams to determine ICT wild card bids, the D-Value is seen as a relatively benign example.
The D-Value has also been criticized for its tendency to overestimate the ability of teams in exceptionally strong fields and underestimate the ability of teams in exceptionally weak fields. More complex strength-of-schedule adjustments could be used to diminish the influence of fields that strongly deviate from average strength.
Based on where teams have been placed relative to their strength at other tournaments, the D-Value clearly overrates the performance of DI teams playing on DII questions against mostly DII opponents in combined fields. Solving the issue of some teams playing what is essentially an entirely different tournament is one of the most difficult problems in the D-Value calculation. There is a fixed numerical de-multiplier in the current formula that assumes all teams experience a fixed and equal benefit from this scenario. Changing the SCT rules so that combined fields do not occur may be an easier and more logical solution to this problem than looking for a magic number that probably does not exist.
One of the chief fallacies in discussions of the D-Value is the claim that it "mostly" predicts ICT order of finish correctly. For example, https://hsquizbowl.org/forums/viewtopic.php?p=340884#p340884 asserts that "about 75% of a team's ICT performance is predicted by the team's D value." This ignores two facts. First, anyone with a basic familiarity with a given year's college quizbowl teams can get a prediction of the ICT standings "about 75% right." Second, the only purpose of the D-Value is to correctly sort out which teams receive the 25th through 36th ICT invitations and which teams rank 37th through the mid-40s. Whether the D-Value does any better than proposed alternative formulas or a human-based ranking panel at sorting out the bubble teams is both a matter of ongoing dispute and the only real question that should be asked about the D-Value's reliability.