An excellent evening, I'm sure you'll all agree. What keeps me up at night, however, is that it would have taken only one swapped vote for the N70 to beat the objectively better F63. What would happen if our group of sane Port-ers were to be infiltrated by an extremist from The Sherry Forum intent on bringing us down? Well, the consequences don't even bear thinking about: the N70 could have won! In this universe it didn't...
The solution, of course, is to see what would have happened in other universes. So naturally, I went back to Monte Carlo in my finest tuxedo and started modelling us.
Firstly, I created two scenarios of Port tastings, each with 20 wines:
1. All the port is objectively the same
2. All the port is objectively different
Secondly, I created six scoring systems:
(i) - The sum (equivalent to the mean)
(ii) - Olympic sum (remove top and bottom scores)
(iii) - Median score.
(iv) - Filtered Median - to be counted, a wine had to be scored by at least sqrt(voters) tasters
(v) - Quadrature scoring (square root sum)
(vi) - Geometric mean (nth root of the product)
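For the curious, here is a minimal sketch of the six aggregators, assuming each wine's votes arrive as a plain list of points. I've read "quadrature" as the sum of square roots, as described above; the function names are mine, not from the actual simulation code.

```python
import math
import statistics

def score_sum(scores):
    # (i) Sum of all scores; ranks wines the same as the mean
    # when every wine gets the same number of votes
    return sum(scores)

def olympic_sum(scores):
    # (ii) Drop the single highest and lowest score, then sum
    s = sorted(scores)
    return sum(s[1:-1]) if len(s) > 2 else sum(s)

def median_score(scores):
    # (iii) Plain median of the votes
    return statistics.median(scores)

def filtered_median(scores, n_voters):
    # (iv) Median, but a wine scored by fewer than sqrt(voters)
    # tasters is disqualified entirely
    if len(scores) < math.sqrt(n_voters):
        return None
    return statistics.median(scores)

def quadrature(scores):
    # (v) Sum of the square roots of the scores
    return sum(math.sqrt(s) for s in scores)

def geometric_mean(scores):
    # (vi) nth root of the product of the scores
    return math.prod(scores) ** (1 / len(scores))
```

Note that the geometric mean collapses to zero if any wine receives a zero score, which is part of why it behaves so differently from the sum.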
Thirdly, I made 13 stereotypical port tasters, randomly generated. For each taster, it is first decided how they distribute their scores (i.e. a simple 3,2,1, or 3,2,1,0.5,0.5, etc., for all permutations). Once that distribution is decided, they perceive each wine's objective score (1-10) through random noise (σ=1) and score appropriately.
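As a sketch of how one such taster might work, assuming the noise is normal and centred on the true quality (my assumption; the post only says σ=1):

```python
import random

def make_taster(patterns, sigma=1.0):
    # Each taster commits to one points pattern up front,
    # e.g. (3, 2, 1) or (3, 2, 1, 0.5, 0.5)
    pattern = random.choice(patterns)

    def score(objective):
        # objective: dict of wine -> true quality (1-10).
        # The taster perceives each wine through N(0, sigma) noise...
        perceived = {w: q + random.gauss(0, sigma) for w, q in objective.items()}
        # ...then awards their pattern points to their perceived best wines.
        ranked = sorted(perceived, key=perceived.get, reverse=True)
        return dict(zip(ranked, pattern))

    return score
```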
And then for the fourteenth taster, there are 3 scenarios:
a) They are a regular one of us
b) They are a highly skewed one of us (σ=5)
c) They are a member of TSF (a malicious voter who votes backwards). When the wines are objectively similar, they obviously struggle.
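The TSF infiltrator is then just an honest taster with the ranking inverted. A sketch, reusing the same noisy-perception model as above (the skewed taster in scenario b is the honest version with σ=5):

```python
import random

def tsf_score(objective, pattern=(3, 2, 1), sigma=1.0):
    # The infiltrator perceives the wines like anyone else...
    perceived = {w: q + random.gauss(0, sigma) for w, q in objective.items()}
    # ...but deliberately awards points to the wines they believe are
    # WORST. When the wines are objectively similar, their perception
    # is mostly noise, so the sabotage largely misfires.
    ranked = sorted(perceived, key=perceived.get)  # ascending: worst first
    return dict(zip(ranked, pattern))
```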
Finally, I made a measly million universes of Port tastings for each of the 6 scenarios to see what happened.
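The universe loop itself is conceptually simple. A sketch, assuming accuracy is measured only in the "objectively different" scenarios where a true Wine of the Night exists (function and parameter names are mine):

```python
from collections import defaultdict

def run_universes(n_universes, make_objective, tasters, aggregate):
    # make_objective() -> dict of wine -> true quality for one universe
    # tasters: list of scoring functions (objective -> {wine: points})
    # aggregate: turns each wine's list of points into a final score
    hits = 0
    for _ in range(n_universes):
        objective = make_objective()
        true_best = max(objective, key=objective.get)
        ballots = defaultdict(list)
        for taster in tasters:
            for wine, pts in taster(objective).items():
                ballots[wine].append(pts)
        totals = {w: aggregate(s) for w, s in ballots.items()}
        if max(totals, key=totals.get) == true_best:
            hits += 1
    return hits / n_universes  # = Accuracy
```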
Accuracy is the percentage of universes in which the WotN was correctly identified.
CV is the coefficient of variation (standard deviation divided by the mean) - lower is better.
Table 1: Uniform Quality / Lowest CV is Best
Code: Select all
SCENARIO | SUM | OLYMPIC | MEDIAN | FILTER | QUADRAT | GEOMTR
--------------------------------------------------------------------------
All Equal (CV) | 0.5454 | 0.6533 | 0.5383 | 0.5383 | 0.4749 | 0.3565
One Skewed (CV) | 0.5454 | 0.6532 | 0.5387 | 0.5387 | 0.4749 | 0.3560
One Malicious (CV) | 0.5398 | 0.6479 | 0.5371 | 0.5371 | 0.4714 | 0.3541
--------------------------------------------------------------------------
MOST PRECISE METHOD | | | | | | WINNER
--------------------------------------------------------------------------
Table 2: Distinct Quality / Highest Accuracy and Lowest CV are Best
Code: Select all
SCENARIO (Acc / CV) | SUM | OLYMPIC | MEDIAN | FILTER | QUADRAT | GEOMTR
--------------------------------------------------------------------------
All Equal | 79%/0.36| 79%/0.36| 69%/0.28| 69%/0.28| 81%/0.26| 75%/0.28
One Skewed | 77%/0.48| 77%/0.38| 66%/0.32| 66%/0.32| 78%/0.29| 56%/0.42
One Malicious | 78%/0.42| 78%/0.35| 65%/0.32| 65%/0.32| 79%/0.27| 20%/0.35
--------------------------------------------------------------------------
BEST OVERALL METHOD | | | | | WINNER |
--------------------------------------------------------------------------
Summary:
- Quadrature Scoring is the overall most accurate method (81% peak) and the most resilient to intentional sabotage. It also maintains the lowest noise (CV) when a truth is present.
- Geometric Mean is highly precise for consensus among honest judges, but it is dangerously vulnerable to malicious voting (Accuracy drops from 75% to 20%).
- Olympic Scoring is robust against noise but consistently less accurate than the Quadrature method in identifying the true #1 wine.