The problem with tastings

Posted: 13:59 Fri 27 Mar 2026
by akzy
Harken back, if you will, to the TPF award ceremony 2022.

An excellent evening, I'm sure you'll all agree. What keeps me up at night, however, is that it would have taken only one swapped vote for the N70 to beat the objectively better F63. What would happen if our group of sane Port-ers were to be infiltrated by an extremist of The Sherry Forum intent on bringing us down? Well, the consequences don't even bear thinking about: the N70 could have won! In this universe it didn't...

The solution, of course, is to see what would have happened in other universes. So naturally, I went back to Monte Carlo in my finest tuxedo and started modelling us.

I created two scenarios of Port tastings, each with 20 wines:
1. All the port is objectively the same
2. All the port is objectively different

Secondly, I created several scoring systems
(i) - The sum (equivalent to the mean)
(ii) - Olympic sum (remove top and bottom scores)
(iii) - Median score.
(iv) - Filtered Median - For this, the wine had to be scored by at least sqrt(voters) to be counted
(v) - Quadrature scoring (square root sum)
(vi) - Geometric mean (square root product)
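
For anyone wanting to play along at home, here is a rough Python sketch of the six scoring systems, applied to one wine's list of scores (one entry per voter, zero if no points given). Several details are my guesses rather than anything stated above: that the median is taken over non-zero scores only, that "square root sum" means the sum of the square roots (it could equally be read as the root of the summed squares), and that the geometric mean is the usual n-th root of the product. All function names are made up for illustration.

```python
import math
from statistics import median


def score_sum(scores):
    """(i) Plain total of all points given to the wine."""
    return sum(scores)


def score_olympic(scores):
    """(ii) Total after dropping one highest and one lowest score (zeros count)."""
    if len(scores) <= 2:
        return 0
    return sum(sorted(scores)[1:-1])


def score_median(scores):
    """(iii) Median of the non-zero scores (assumption: zeros are ignored)."""
    given = [x for x in scores if x > 0]
    return median(given) if given else 0


def score_filtered_median(scores):
    """(iv) As (iii), but only if at least sqrt(voters) people scored the wine."""
    given = [x for x in scores if x > 0]
    if len(given) < math.sqrt(len(scores)):
        return 0
    return median(given)


def score_quadrature(scores):
    """(v) Assumed reading of 'square root sum': sum of square roots."""
    return sum(math.sqrt(x) for x in scores)


def score_geometric(scores):
    """(vi) Assumed reading: n-th root of the product, so any zero gives zero."""
    if not scores:
        return 0.0
    return math.prod(scores) ** (1 / len(scores))
```

Note that under this reading a single zero sinks the geometric mean of a wine entirely, which may or may not be how the simulation treated it.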

Thirdly, I made 13 stereotypical port tasters, randomly generated. For each taster it is first decided how they distribute their scores (i.e. simple 3,2,1, or 3,2,1,0.5,0.5, etc. for all permutations). Once that distribution is decided, they take a random draw (σ=1) around each wine's objective score (1-10) and score accordingly.

And then the fourteenth taster. Three scenarios here:
a) They are a regular one of us
b) They are a highly skewed one of us (σ=5)
c) They are a member of TSF (a malicious voter who votes backwards). When the wines are objectively similar, they obviously struggle.
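
A minimal sketch of one taster, under my own assumptions: the noise is Gaussian, "votes backwards" means the malicious voter ranks the wines worst-first and then hands out points as normal, and the function name and parameters are invented for illustration. The skewed taster is just the honest one with `sigma=5`.

```python
import random


def taster_votes(truth, pattern=(3, 2, 1), sigma=1.0, malicious=False, rng=random):
    """Return one taster's points per wine.

    truth     -- objective score of each wine
    pattern   -- points for the taster's 1st, 2nd, 3rd... choice
    sigma     -- standard deviation of the taster's perception noise
    malicious -- if True, rank the wines worst-first (assumed meaning of
                 'votes backwards')
    """
    perceived = [t + rng.gauss(0, sigma) for t in truth]
    # Honest tasters put the best perceived wine first; the villain the worst.
    order = sorted(range(len(truth)), key=lambda i: perceived[i],
                   reverse=not malicious)
    votes = [0] * len(truth)
    for points, wine in zip(pattern, order):
        votes[wine] = points
    return votes
```

With `sigma=0` the honest taster simply reproduces the true top three, which makes a handy sanity check.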

Finally, I made a measly million universes of Port tastings for each of the 6 scenarios to see what happened.
Accuracy is the percentage of universes in which the WoTN was correctly identified.
CV is the coefficient of variation, i.e. the standard deviation normalised by the mean (σ/μ) - lower is better.
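
The two metrics can be sketched as below. The helper names are hypothetical, and the post does not say which quantity the CV is taken over, so the second function simply accepts whatever list of values is being compared across universes.

```python
from statistics import mean, pstdev


def accuracy(picked, true_best):
    """Percentage of universes where the picked WoTN matches the true best wine."""
    hits = sum(1 for p, t in zip(picked, true_best) if p == t)
    return 100.0 * hits / len(picked)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (sigma / mu)."""
    mu = mean(values)
    return pstdev(values) / mu if mu else float("nan")
```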

Table 1: Uniform Quality / Lowest CV is Best

SCENARIO             |  SUM   | OLYMPIC | MEDIAN | FILTER | QUADRAT | GEOMTR
--------------------------------------------------------------------------
All Equal (CV)       | 0.5454 | 0.6533  | 0.5383 | 0.5383 | 0.4749  | 0.3565
One Skewed (CV)      | 0.5454 | 0.6532  | 0.5387 | 0.5387 | 0.4749  | 0.3560
One Malicious (CV)   | 0.5398 | 0.6479  | 0.5371 | 0.5371 | 0.4714  | 0.3541
--------------------------------------------------------------------------
MOST PRECISE METHOD  |        |         |        |        |         | WINNER
--------------------------------------------------------------------------
Table 2: Objective Ranking / High % & Low CV Best

SCENARIO (Acc / CV)  |  SUM    | OLYMPIC | MEDIAN  | FILTER  | QUADRAT | GEOMTR
--------------------------------------------------------------------------
All Equal            | 79%/0.36| 79%/0.36| 69%/0.28| 69%/0.28| 81%/0.26| 75%/0.28
One Skewed           | 77%/0.48| 77%/0.38| 66%/0.32| 66%/0.32| 78%/0.29| 56%/0.42
One Malicious        | 78%/0.42| 78%/0.35| 65%/0.32| 65%/0.32| 79%/0.27| 20%/0.35
--------------------------------------------------------------------------
BEST OVERALL METHOD  |         |         |         |         |  WINNER |
--------------------------------------------------------------------------

Summary:
  • Quadrature Scoring is the overall most accurate method (81% peak) and the most resilient to intentional sabotage. It also maintains the lowest noise (CV) when a truth is present.
  • Geometric Mean is highly precise for consensus among honest judges, but it is dangerously vulnerable to malicious voting (Accuracy drops from 75% to 20%).
  • Olympic Scoring is robust against noise but consistently less accurate than the Quadrature method in identifying the true #1 wine.
I suggest a referendum - what do you people believe is best? Has anyone truly understood the threat from outsiders?

Re: The problem with tastings

Posted: 18:33 Fri 27 Mar 2026
by jdaw1
You have decided that TSF (boo hiss!) might send a malicious voter, but you seem to have decided that it would be a stupid malicious voter. Merely reversing the order is a light-weight thing to do. Such an infiltrator should give nothing to the Port that would otherwise be top, and maximum points to the Port that would otherwise come second.
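
A sketch of such a late-voting infiltrator, assuming they can see the running totals before voting. Only "nothing to the leader, maximum to the runner-up" comes from the description above; handing the leftover points to the weakest wines is my own assumption, and the function name is invented.

```python
def strategic_vote(totals, pattern=(3, 2, 1)):
    """Vote that gives the current leader nothing and the runner-up the maximum."""
    ranked = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    vote = [0] * len(totals)
    if len(ranked) < 2:
        return vote
    vote[ranked[1]] = pattern[0]  # maximum points to the runner-up
    # Assumption: spare points go to the weakest wines, where they do no harm
    # to the villain's cause.
    weakest_first = ranked[:1:-1]  # last place upwards, stopping before the runner-up
    for points, wine in zip(pattern[1:], weakest_first):
        vote[wine] = points
    return vote
```

For example, with running totals of 10, 9, 4, 2, 1 this vote turns a one-point lead for the first wine into a two-point win for the second.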

This requires knowledge of the ‘would otherwise’. Our custom has been to show newbies how things work, by allowing them to vote late. This politeness has introduced a vulnerability not understood until this work of brave Comrade Zak. Henceforth we must assume that newbies are hostile, and so for our own safety we should require them to vote early.

Meanwhile, those interested in electoral systems might enjoy some papers located at the home page of the internet, perhaps starting with Various conundrums relating to electoral systems.



Please allow an alternative model.
• Wines have a truth, distributed Normal[0, 1].
• Attendees — who out of politeness should be numbered rather than named — each have a reciprocal-competence, σ, each person’s σ being drawn equiprobably from {⅛, ¼, ½, ¾, 1, 2, 4}.
• For each wine their observation is the wine’s truth, plus a draw from Normal[0, σ].
• People order their observations.
• TPFers score 3,2,1 for best, second, third.
• The evil TSF villain scores 0,3,0,2,0,…,0,1.
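
One universe of this model might be sketched as follows. Assumptions on my part: σ is a standard deviation, the villain's own σ is drawn from the same set as everyone else's, and the villain's pattern is read as 3 points to their second-ranked wine, 2 to their fourth and 1 to their last. The function name and the defaults of 20 wines and 13 honest voters are borrowed from the first post rather than specified here, and the sketch needs at least five wines.

```python
import random

SIGMAS = (1 / 8, 1 / 4, 1 / 2, 3 / 4, 1, 2, 4)


def simulate_tasting(n_wines=20, n_honest=13, rng=None):
    """Return (truth, totals) for one simulated tasting with one villain."""
    rng = rng or random.Random()
    truth = [rng.gauss(0, 1) for _ in range(n_wines)]
    totals = [0] * n_wines

    def ranking(sigma):
        # Observation = truth + noise; wines ordered best-first.
        seen = [t + rng.gauss(0, sigma) for t in truth]
        return sorted(range(n_wines), key=lambda i: seen[i], reverse=True)

    # Honest TPFers: 3, 2, 1 for best, second, third.
    for _ in range(n_honest):
        order = ranking(rng.choice(SIGMAS))
        for points, wine in zip((3, 2, 1), order):
            totals[wine] += points

    # Villain: 0,3,0,2,0,...,0,1 down their own ranking (assumed reading).
    order = ranking(rng.choice(SIGMAS))
    totals[order[1]] += 3
    totals[order[3]] += 2
    totals[order[-1]] += 1
    return truth, totals
```

Repeating this many times and comparing `totals.index(max(totals))` with `truth.index(max(truth))` gives the accuracy of plain sum scoring under this model.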

Re: The problem with tastings

Posted: 20:41 Sun 29 Mar 2026
by akzy
jdaw1 wrote: 18:33 Fri 27 Mar 2026 but you seem to have decided that it would be a stupid malicious voter.
What part of TSF didn't you understand?
jdaw1 wrote: 18:33 Fri 27 Mar 2026 Henceforth we must assume that newbies are hostile, and so for our own safety we should require them to vote early.
To newbies, I am so very sorry but it's for the best.
Glad to see I'm not the only one worried. I thoroughly enjoyed the essay.

The new simulation has completed for the situation you described. The results remain the same in that quadrature scoring still is the safest. It only very narrowly beats out Olympic and our current scoring for accuracy but the standard deviation is significantly lower.

Interestingly, the current scoring and Olympic have the highest accuracy but a large stddev. The filtered median method offers a reasonable balance of accuracy and stddev.

    | Method         | Accuracy (%) | Avg CV (σ/μ) |
    +----------------+--------------+--------------+
    | Sum            | 74.49%       | 1.0270       |
    | Olympic        | 74.49%       | 0.9522       |
    | Median         | 71.37%       | 0.5201       |
    | FilteredMedian | 71.37%       | 0.5201       |
    | Quadrature     | 70.87%       | 0.9354       |
    | GeometricMean  | 22.00%       | 0.3413       |

Re: The problem with tastings

Posted: 21:43 Sun 29 Mar 2026
by jdaw1
Financial markets have much used (links to be posted when sober) a trimmed mean, ignoring the upper and lower quartiles. Is there a workable version of this useable by us?
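
As a starting point, a quartile-trimmed mean for one wine's scores could look like the sketch below. Rounding the quartile size down to a whole number of voters is my assumption, as is counting zeros as scores; the function name is invented.

```python
def trimmed_mean(scores):
    """Mean of the middle scores after dropping the top and bottom quarters."""
    s = sorted(scores)
    k = len(s) // 4  # voters dropped from each end (rounded down)
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept) if kept else 0.0
```

One consequence worth noting: with 14 voters, three scores are dropped from each end, so a wine needs points from at least four people before it registers at all.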

Re: The problem with tastings

Posted: 09:34 Mon 30 Mar 2026
by PhilW
Interesting analysis; so much to respond to...
Just a couple for now.

Before analysis of the more subtle attack described by Julian, the Quadrature method looked the most effective for defence against a simple attacker, but the effort of calculation and hence risk of error might outweigh the risk of TSF (though perhaps not of "vote for my own wine" bias!).

Re: Olympic scoring; when you remove top and bottom, just checking that you're including the zeroes (not just low-points) since the attacker may vote zero on a wine which deserves points, etc (i.e. the zero is the outlier).

Whatever method is used should also be effective for smaller tastings; arguably it's even more important if a TSF-attacker joins an 8-person tasting compared with a 14-person one.

An alternative suggestion which could also assist with removal of the attacker's effect (even Julian's more subtle attacker in some cases) of outlier high/low scoring, while still being viable for pencil and paper at an 18 port tasting:
- Score as normal (but don't total)
- Remove top/bottom (including zeroes) for each wine
- All wines with 0 total are given Nth place (e.g. tasting of five wines, two have zero, so are both 4th=)
[ keep repeating last two steps]
- Remove top/bottom of remaining scores (including zeros) for each wine
- All wines with 0 total are given Nth place (e.g. one of those remaining now has zero, is 3rd)
etc.
The above would remove the need for any calculation to determine order, though would as a consequence not show the degree by which wines were favoured, only an ordering. Variation _could_ be that once only three wines remain (or num_wines/2, or other threshold), they are assigned places based on totals at that time.
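
For checking the pencil-and-paper version, the basic procedure (without the variation) can be sketched as below. This is my reading of the steps: every wine carries one score per voter, zeros included, and each round strikes one highest and one lowest score from every wine still in play. The function name is invented.

```python
def elimination_ranking(scores_by_wine):
    """Map each wine to its place using repeated top/bottom removal.

    scores_by_wine -- dict of wine name -> list of scores, one per voter,
                      with zeros included
    """
    remaining = {wine: sorted(s) for wine, s in scores_by_wine.items()}
    places = {}
    while remaining:
        # Strike one lowest and one highest score from every remaining wine.
        remaining = {wine: s[1:-1] for wine, s in remaining.items()}
        out = [wine for wine, s in remaining.items() if sum(s) == 0]
        # Wines dropping out together share the lowest available place.
        place = len(remaining) - len(out) + 1
        for wine in out:
            places[wine] = place
            del remaining[wine]
    return places
```

Wines that keep non-zero scores until the lists run out end up tied for first, which is where the variation of falling back on totals would earn its keep.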

Re: The problem with tastings

Posted: 21:26 Mon 30 Mar 2026
by Alex Bridgeman
And I thought the problem with tastings was that we ran out of Port too early in the evening.

Re: The problem with tastings

Posted: 21:52 Mon 30 Mar 2026
by jdaw1
Phil: your recent proposal doubtless has many fine technical merits. Pray tell, could a tasting organiser, and hypothetically a tasting organiser who has had his share of the Port being tasted, be confident of being able to effect it accurately?

Re: The problem with tastings

Posted: 08:28 Tue 31 Mar 2026
by PhilW
jdaw1 wrote: 21:52 Mon 30 Mar 2026 Phil: your recent proposal doubtless has many fine technical merits. Pray tell, could a tasting organiser, and hypothetically a tasting organiser who has had his share of the Port being tasted, be confident of being able to effect it accurately?
Hah! In a thread where the square root sum is potentially being proposed, and you're picking me up on "crossing things out correctly might be too difficult at the end of the evening" :) (n.b. you may not be wrong)

Re: The problem with tastings

Posted: 18:12 Tue 31 Mar 2026
by Mike J. W.
Alex Bridgeman wrote: 21:26 Mon 30 Mar 2026 And I thought the problem with tastings was that we ran out of Port too early in the evening.
Yes, with the 1970 Horizontal held this past October being the perfect example.