Skip to main content
Simple Random Candidate Selection
Simple Random Candidate Selection
draft-hoffman-random-candidate-selection-00
This document is an Internet-Draft (I-D).
Anyone may submit an I-D to the IETF.
This I-D is not endorsed by the IETF and has no formal standing in the
IETF standards process.
The information below is for an old version of the document.
| Document | Type |
This is an older version of an Internet-Draft whose latest revision state is "Expired".
|
|
|---|---|---|---|
| Author | Paul E. Hoffman | ||
| Last updated | 2023-06-23 | ||
| RFC stream | (None) | ||
| Formats | |||
| Stream | Stream state | (No stream defined) | |
| Consensus boilerplate | Unknown | ||
| RFC Editor Note | (None) | ||
| IESG | IESG state | I-D Exists | |
| Telechat date | (None) | ||
| Responsible AD | (None) | ||
| Send notices to | (None) |
draft-hoffman-random-candidate-selection-00
Network Working Group P. Hoffman
Internet-Draft ICANN
Intended status: Informational 23 June 2023
Expires: 25 December 2023
Simple Random Candidate Selection
draft-hoffman-random-candidate-selection-00
Abstract
This document describes a process to randomly select a subset of
named candidates from a larger set of candidates. The process uses
an unpredictable value that can be trusted by all candidates.
This draft has a GitHub repository (https://github.com/paulehoffman/
draft-hoffman-random-candidate-selection). Issues and pull requests
can be made there.
Status of This Memo
This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 25 December 2023.
Copyright Notice
Copyright (c) 2023 IETF Trust and the persons identified as the
document authors. All rights reserved.
Hoffman Expires 25 December 2023 [Page 1]
Internet-Draft Candidate Selection June 2023
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document. Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Overview of the Process . . . . . . . . . . . . . . . . . . . 3
2.1. Basic Steps . . . . . . . . . . . . . . . . . . . . . . . 3
3. Specifics for the Process . . . . . . . . . . . . . . . . . . 4
3.1. Start of Ceremony . . . . . . . . . . . . . . . . . . . . 4
3.1.1. Use of the FTSE 100 Index . . . . . . . . . . . . . . 5
3.1.2. Other Public Sources of Randomness . . . . . . . . . 5
3.2. Name Submission . . . . . . . . . . . . . . . . . . . . . 6
3.3. Closing Submissions . . . . . . . . . . . . . . . . . . . 6
3.4. Selecting _D_ . . . . . . . . . . . . . . . . . . . . . . 6
3.5. Calculating Hashes . . . . . . . . . . . . . . . . . . . 6
3.6. Selecting _S_ Candidates . . . . . . . . . . . . . . . . 7
4. Handling Ceremony Process Issues . . . . . . . . . . . . . . 7
5. Performing a Weighted Selection . . . . . . . . . . . . . . . 7
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 7
7. Security Considerations . . . . . . . . . . . . . . . . . . . 8
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 8
8.1. Normative References . . . . . . . . . . . . . . . . . . 8
8.2. Informative References . . . . . . . . . . . . . . . . . 8
Appendix A. Sample Code . . . . . . . . . . . . . . . . . . . . 9
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 12
1. Introduction
It is common to need to pick a subset of people from a larger group
using a random selection method. This is often done on an ad hoc
basis, but for some selections, a more formal process is needed,
particularly if the people in the larger group don't all trust the
administrator of the selection process to be unbiased.
This document gives a simple, understandable process that can be done
for groups and subsets of arbitrary size. The process is purposely
transparent and reproducible. It works with any group of entities
that nave names: people, companies, locations, and so on.
Hoffman Expires 25 December 2023 [Page 2]
Internet-Draft Candidate Selection June 2023
As a simple example, a future leadership committee will have a fixed
size. The members of the committee will be selected from a large
pool of volunteers. Someone is in charge of collecting the names of
the volunteers and making a randomized selection among them for the
leadership committee. They can use the process in this document to
make that selection in a way that is both provably random and
understandable.
Due to the formatting used in this document, the reader is encouraged
to read the HTML version, although the text version is still usable.
See [I-D.thomson-elegy-vrs] for a similar method as described here.
2. Overview of the Process
A few terms are used throughout this document:
ceremony: The act of collecting names into a pool, making a random
selection from the pool, and publishing the entire process in a
clear and transparent method.
ceremony administrator (CA): The person who performs the steps of
the ceremony.
candidate: A person, organization, or other namable entity that is
possibly being selected during the ceremony.
candidate name: The name used by each candidate in the pool. The
candidate name is expressed as a string of Unicode characters in
UTF-8 format [Unicode] [UTF-8].
difficult-to-predict string (_D_): A publicly-visible string that is
only known after the pool of candidates has been closed. (Note
that this is different from what is normally called a "random
number" or a "random string". True random numbers or strings are
designed to be nearly impossible to predict, whereas _D_ in this
process has weak but sufficient randomness.)
selection size (_S_): The number of candidates that will be selected
from the pool.
2.1. Basic Steps
The steps in a ceremony that follows this process is given here. See
Section 3 for more detail on the steps.
1. The CA starts the ceremony by performing the following steps at
the same time:
Hoffman Expires 25 December 2023 [Page 3]
Internet-Draft Candidate Selection June 2023
* Announces an end date for when the pool will be complete.
* Announces a later date on which _D_, the difficult-to-predict
string, will be selected.
* Announces the source where _D_ will be found on that later
date.
* Announces _S_, the number of candidates that will be selected.
* Opens up the pool of candidates for submission.
2. Candidates submit their names to the pool until the closing date,
and the CA puts the allowed names in the pool.
3. On the closing date, the CA publishes the candidate names from
the pool with the hexadecimal value of the UTF-8 encoding for
each candidate name.
4. On the date for selecting _D_, the CA gets _D_ from the announced
source.
5. The CA calculates the hashes used to make the selection. The CA
concatenates each candidate name with _D_ (name first, then _D_),
uses the SHA-256 hash function [SHA-2] on the resulting string,
and records the value of the hash as a UTF-8 string.
6. The CA arranges the set of hash values in alphabetic order from
highest to lowest. They then select the _S_ candidates from the
top of the list (that is, the names whose hash values are
largest).
3. Specifics for the Process
3.1. Start of Ceremony
Much of the trust in the selection process is based on the CA not
being able to influence the selection. If the CA can choose, or even
influence, the value of _D_, they can help establish the outcome of
the selection. Similarly, if one or more of the candidates can
influence the value of _D_, they can increase their chance of being
selected.
To make the process trustworthy, the value of _D_ must be unrelated
to the CA or the candidates, and it must be selected only after the
list of candidates is completed. The most important things for a
ceremony is that the source is announced before the ceremony starts,
that all participants and viewers of a ceremony can find the source
Hoffman Expires 25 December 2023 [Page 4]
Internet-Draft Candidate Selection June 2023
on the date specified by the CA, that all candidates believe that no
candidate can influence _D_ on that date, and that everyone gets the
same value when they go to the source for that date.
3.1.1. Use of the FTSE 100 Index
The process described in this document uses the closing value for the
FTSE 100 Index on the particular day selected by the CA. The FTSE
100 Index is a long-established index based on 100 stocks; it is
sometimes known by its stock ticker as "UKX". A common open source
of those values is the Wall Street Journal. The daily closing for
the FTSE 100 Index at the Wall Street Journal can currently be found
here (https://www.wsj.com/market-data/quotes/index/UK/UKX/historical-
prices).
Note that the location for sources of daily closing values can change
over time. The CA must check that the source they intende to use is
still active, and still available when the ceremony starts.
3.1.2. Other Public Sources of Randomness
Although the procedure in this document uses the FTSE 100 Index as a
public source of randomness, there are many other sources that can be
used by a CA, as long as the source chosen is trusted by the
candidates. There are many other stock indexes with enough stocks in
them to make prediction of the exact value have less than a 0.1%
chance. Having said that, using a future price of a single stock is
probably not a good public source of randomness because candidates
are likely to trust the variability of that less than the variability
of a basket of stocks.
Some systems that use public sources of randomness use the results of
an unrelated lottery, such as the type of lotteries that many
countries hold. These are probably trusted by candidates not be able
to be manipulated. However, lotteries normally are a set of numbers
between 1 and 100, often five or more such numbers. If the CA uses
such a lottery for this procedure, they need to specify how the
numbers from the lottery of the chosen date will be combined,
including whether or not the numbers from 1 to 9 need to be preceded
by a "0" character.
There are other public sources of randomness, such as cameras pointed
at lava lamps and so on. These are probably not good choices for the
type of ceremony described in this document because the operators of
such sources are not publicly trusted entities.
Hoffman Expires 25 December 2023 [Page 5]
Internet-Draft Candidate Selection June 2023
3.2. Name Submission
The CA is the sole arbitrator for whether a candidate is allowed to
enter the pool. The CA is also the sole arbitrator of what name
string (in UTF-8) the candidate can use in the pool.
The order that the candidates join the pool does not affect the
outcome of the selection process. Said another way, the pool is kept
as an unordered set of candidates, not an ordered list of candidates.
It is a good practice for the CA to have consistent rules for the
names, such as only using ASCII space characters (U+0020), only one
space between each name part, no trailing spaces, and so on. These
rules can be more difficult when the candidates are company names
(such as whether the legal standing of the company such as "Inc." is
included), but making consistent rules is not that difficult.
3.3. Closing Submissions
At the closing of submissions, the CA verifies that the length of the
set of candidates in the pool is larger than _S_. If the length is
the same as _S_, the rest of the steps are unneeded (and could be
confusing), because all candidates will automatically be selected.
If the length is shorter than _S_, the ceremony stops because there
are too few candidates.
The method for publishing the set of candidates is determined by the
CA. Figure 3 gives an example of how a CA might publish this
information.
3.4. Selecting _D_
On the day that the CA announced for the selection of _D_, the CA
goes the the source they announced and gets _D_. After the CA
retrieves _D_ from the announced source, they encode _D_ as a UTF-8
string. In the example of the FTSE 100 Index, a closing value for
the day announced at the beginning of the ceremony might be
"7623.10". This would be encoded in UTF-8 as the string of
characters whose value is U+373632372e3130.
3.5. Calculating Hashes
Different programming libraries have different requirements for the
input to hash functions. Appendix A uses the built-in hashlib
library in Python, which requires that text strings have a specified
encoding.
Hoffman Expires 25 December 2023 [Page 6]
Internet-Draft Candidate Selection June 2023
3.6. Selecting _S_ Candidates
The process of selecting is simply taking the _S_ candidates whose
hash value is highest. This can easily be determined by sorting the
text representation of the hash values because in UTF-8 and ASCII,
digits have lower codepoints than letters.
To complete the process in a transparent manner, the CA should
publish all known data for the ceremony. This includes _S_, _D_, the
hexadecimal value of _D_, all of the information for each candidate,
and the full list of selected candidates. Figure 5 shows an example
of what this publication might look like.
4. Handling Ceremony Process Issues
Ceremonies don't always go as planned. For example, after a ceremony
completes, one or more of the selected candidates might be removed
from the selected set due to voluntary withdrawal or established
rules (such as no two candidates being from the same geographic
region). In such cases, no new ceremony is needed: the CA simply
selects the next candidate(s) on the list that is ordered by hash
values.
Similarly, if after the selection process is completed, the size _S_
of the selected set needs to increase, the CA simply selects the next
candidate(s) on the list that is ordered by hash values.
5. Performing a Weighted Selection
In some selections, the CA wants to give candidates a weighted chance
of being selected. For example, a legislature might select its
leadership randomly, but weights the chance of being selected by the
size of the membership of the political party in the legislature.
The CA can create the pool with multiple names for each party, giving
each name a number.
For example, assume a legislature has 27 members of the Orange party,
20 members of the Yellow party, and 7 members of the Green party.
The CA could create a pool consisting of the names "Orange1",
"Orange2", ... "Orange27", "Yellow1", "Yellow2", ... "Yellow20",
"Green1", "Green2", ... "Green7". The selected party would be the
one whose name appears in the first name of the list of hashes.
6. IANA Considerations
This document has no IANA considerations.
Hoffman Expires 25 December 2023 [Page 7]
Internet-Draft Candidate Selection June 2023
7. Security Considerations
The value _D_ used in this process is explicitly not
cryptographically strong; in fact, it might provide only a few bits
of randomness. The FTSE 100 Index might be predictable after the
third digit from the right, but not the last three digits, meaning
that they only have randomness of about 10 bits. The value of _D_ is
concatenated into each candidate string before the whole string is
hashed, so incorrectly predicting even one character of _D_
completely changes the value of the hash for comparison.
Because of this, a candidate who has a lot of leeway in choosing
their name can possibly increase their chance of being selected by as
much as 0.1% with such source of randomness. The procedure in this
document assumes that candidates have very little leeway in choosing
their names; the CA must accept each name before it is put into the
pool. The combination of the limited leeway for choosing the names
in the pool and the necessity to predict _D_ exactly in order to gain
any benefit means that _D_ needs much less randomness that a random
number that would be used during encryption or authentication.
8. References
8.1. Normative References
[SHA-2] Eastlake 3rd, D. and T. Hansen, "US Secure Hash Algorithms
(SHA and SHA-based HMAC and HKDF)", RFC 6234,
DOI 10.17487/RFC6234, May 2011,
<https://www.rfc-editor.org/rfc/rfc6234>.
[Unicode] The Unicode Consortium, "The Unicode Standard (latest
version)", n.d.,
<https://www.unicode.org/versions/latest/>.
[UTF-8] Yergeau, F., "UTF-8, a transformation format of ISO
10646", STD 63, RFC 3629, DOI 10.17487/RFC3629, November
2003, <https://www.rfc-editor.org/rfc/rfc3629>.
8.2. Informative References
[I-D.thomson-elegy-vrs]
Thomson, M., "A Verifiable Random Selection Process", Work
in Progress, Internet-Draft, draft-thomson-elegy-vrs-00,
22 June 2023, <https://datatracker.ietf.org/doc/html/
draft-thomson-elegy-vrs-00>.
Hoffman Expires 25 December 2023 [Page 8]
Internet-Draft Candidate Selection June 2023
Appendix A. Sample Code
The following is a list of figures for an implementation of the
procedure shown in this document.
* The Python script in Figure 1 implements the algorithm from this
document.
* The file that contains the list of names is shown in Figure 2.
(The names are the winners of the Nobel laureates in Literature
for 2016 through 2021.)
* A file showing the UTF-8 representation of the names from Figure 2
is shown in Figure 3. This file is suitable for showing to the
candidates.
* The file that contains the _S_ and _D_ on separate lines is shown
in Figure 4.
* Figure 5 shows the result of running the program with that file as
input.
#!/usr/bin/env python3
# Program to randomly select some candidates from a group
# See draft-hoffman-reandom-candidate-selection
import hashlib, sys
from pathlib import Path
# Helper function to turn a UTF-8 string into its hex representation
def hexify(in_str):
return "".join([hex(c)[2:] for c in in_str.encode("utf8")])
# Santity check the input files given on the command line
if len(sys.argv) == 1:
exit("Must give the name of the candidate file, and possibly " + \
"the selection file, on the command line. Exiting.")
candidate_path = Path(sys.argv[1])
if not candidate_path.exists():
exit(f"The file {str(candidate_path)} doesn't exist. Exiting.")
try:
candidate_f = candidate_path.open(mode="rt", encoding="utf8")
except:
exit("The candidates file doesn't appear to be in UTF-8. Exiting.")
candidate_lines = candidate_f.read().splitlines()
# See if there is a second file for selecting
if len(sys.argv) == 3:
Hoffman Expires 25 December 2023 [Page 9]
Internet-Draft Candidate Selection June 2023
run_including_selection = True
selection_path = Path(sys.argv[2])
if not selection_path.exists():
exit(f"The file {str(selection_path)} doesn't exist. Exiting.")
try:
selection_f = selection_path.open(mode="rt", encoding="utf8")
except:
exit("The selection file doesn't appear to be UTF-8. Exiting.")
selection_lines = selection_f.read().splitlines()
# Extract D and S from the selection file
S_str = selection_lines[0]
try:
S = int(S_str)
except:
print(f"The first line of the selection file, '{S_str}', " + \
"is not an integer. Exiting.")
# D_str is the string for D, D_hex is the hex version for display
D_str = selection_lines[1]
D_hex = hexify(D_str)
else:
run_including_selection = False
# Get the candidates information
C_info = []
for C_str in candidate_lines:
C_hex = hexify(C_str)
if run_including_selection:
C_with_D_str = C_str + D_str
C_with_D_hex = hexify(C_with_D_str)
C_with_D_hash = hashlib.sha256(C_with_D_hex.encode("utf-8"))
C_info.append([C_str, C_hex, C_with_D_str, C_with_D_hex, \
C_with_D_hash.hexdigest()])
else:
C_info.append([C_str, C_hex])
# Print the results
if run_including_selection:
print(f"S is {S}")
print(f"D is \"{D_str}\"")
print(f" U+{D_hex}\n")
print("Candidate information, sorted by hash of name including D")
selected = []
# Sort by the hex of C_with_D_hash
for this_info in sorted(C_info, key=lambda a: a[4], reverse=True):
# Decrement S for each name that is selected
if S > 0:
selected.append(this_info[0])
S -= 1
Hoffman Expires 25 December 2023 [Page 10]
Internet-Draft Candidate Selection June 2023
print(f"{this_info[2]}")
print(f" U+{this_info[3]}")
print(f" {this_info[4]}")
print("\nSelected:\n " + "\n ".join(selected))
else:
for this_info in C_info:
print(f"{this_info[0]}")
print(f" U+{this_info[1]}")
Figure 1: Example Python code for this procedure
Bob Dylan
石黒 一雄
Olga Tokarczuk
Peter Handke
Louise Glück
Abdulrazak Gurnah
Figure 2: Sample name list file
Bob Dylan
U+426f622044796c616e
石黒 一雄
U+e79fb3e9bb9220e4b880e99b84
Olga Tokarczuk
U+4f6c676120546f6b6172637a756b
Peter Handke
U+50657465722048616e646b65
Louise Glück
U+4c6f7569736520476cc3bc636b
Abdulrazak Gurnah
U+416264756c72617a616b204775726e6168
Figure 3: Full information for the names
3
7623.10
Figure 4: Sample selection information file
Hoffman Expires 25 December 2023 [Page 11]
Internet-Draft Candidate Selection June 2023
S is 3
D is "7623.10"
U+373632332e3130
Candidate information, sorted by hash of name including D
石黒 一雄7623.10
U+e79fb3e9bb9220e4b880e99b84373632332e3130
f2e0d3bbd8eac635d799702bead0fbdf07ff79ef94a261789de50e81adb38a13
Louise Glück7623.10
U+4c6f7569736520476cc3bc636b373632332e3130
a54e282cbaa1f29543cd13d9a29e07e3a38413360172b722f8259c2baa3c38dd
Peter Handke7623.10
U+50657465722048616e646b65373632332e3130
8bb3bc197c6462b033e4d8e8cf703b13b1c55172572a85d56c639db5c57d3866
Olga Tokarczuk7623.10
U+4f6c676120546f6b6172637a756b373632332e3130
56166c4e0e6ca027f4150bac5ce83fbf5652e440214fd255308472fed9f8fb1b
Abdulrazak Gurnah7623.10
U+416264756c72617a616b204775726e6168373632332e3130
340413dc6b2574f5ddc5e88e1c986a229d9defccbae249789b07a5d2337981ff
Bob Dylan7623.10
U+426f622044796c616e373632332e3130
05eb403f4f59f5a7b21f5e5a4e8dbfbac59344fd5e8708ab618b5e2ed27a52de
Selected:
石黒 一雄
Louise Glück
Peter Handke
Figure 5: Output of running the program on the list of names and
selection information
Author's Address
Paul Hoffman
ICANN
Email: paul.hoffman@icann.org
Hoffman Expires 25 December 2023 [Page 12]