Simple, fast, flexible and efficient generation of probably unique identifiers (puid, aka random strings) of intuitively specified entropy using predefined or custom characters.
import Puid
let alphaId = try Puid(total: 1e5, risk: 1e12, chars: .alpha)
try alphaId.generate()
// => "uTJtdTPQFk"Puid provides full, explicit control over all important facets of random ID generation: entropy source, characters, and desired randomness. A general overview details information relevant to all PUID implementations.
Creating a random ID generator using Puid is a simple as:
import Puid
let sessionId = try Puid()
try sessionId.generate()
// => "1Uyt1bj-cAgsHRpWjyPya6"Options allow easy and complete control over random ID generation. The above example uses the default for each of:
- entropy source: Cryptographically strong random bytes
- characters: File system & URL safe characters
- randomness: 128 bits of entropy
These defaults are suitable for web session IDs.
Puid provides a CSPRNG entropy source (Puid.Entropy.System.csprng, using SecCopyRandomBytes) and a PRNG entropy source (Puid.Entropy.System.prng, using a seeded XorShift64* PRNG) via the entropy option:
let prngId = try Puid(entropy: .prng)
try prngId.generate()
// => "WONlvSz5wRzw6GUz1LqDTK"
let seeded = try Puid(entropy: .prngSeeded(seed: 42))
try seeded.generate()
// deterministic for the same seedThe entropy option can also designate any implementation of the PuidEntropySource protocol for using a custom entropy source:
let fixedBytes = Puid.Entropy.Fixed(hex: "d0 52 91 fd 13 62 16 fc bc 52 57 d1 a9 17 42 bf bf")
let fixedId = try Puid(entropy: fixedBytes)
try fixedId.generate()
// => "0FKR_RNiFvy8UlfRqRdCv7"Note: The Puid.Entropy.Fixed source is convenient for deterministic testing but not suitable for general use.
A convenience class, Puid.Entropy.Source, provides a means of using any RandomNumberGenerator implementation as a PuidEntropySource. If, for example, you had a favorite PRNG, say FavePrng, that generates a repeatable sequence of random UInt64 numbers via an initialization seed, you could use that PRNG as an entropy source:
let favePrng = Puid.Entropy.Source(using: FavePrng(seed: 42))
let prngId = try Puid(entropy: favePrng)
try prngId.generate()
// => A puid generated using bytes from the custom FavePrng entropy sourceThe characters used in ID generation are designated using the chars option. Puid provides a large set of predefined characters, as well as an option to specify any set of unique characters:
let alphaNumId = try Puid(chars: .alphaNum)
try alphaNumId.generate()
// => "cjm7wFkJQW5igrWUdjFnaA"
let customId = try Puid(chars: .custom("customID_CHARS"))
try customId.generate()
// => "oRmcAACtHsuAIuDSsooItACHIICo_S_IHo"Note: Puid validates that the custom chars are unique to maximizes the entropy captured during ID generation.
A critical aspect of random ID generation is, of course, the randomness of the IDs generated. Puid provides direct specification of ID randomness via the bits option for situations like session IDs (which are recommended to be 128-bit) or for 256-bit security tokens. But a more general, intuitive declaration of randomness is to explicitly specify the total number of IDs actually needed and assign an acceptable risk of repeat:
let randId = try Puid(total: 1e5, risk: 1e12, chars: .safe32)
try randId.generate()
// => "dqHqFD79QGd2TNP"
// Using a self-documenting risk wrapper (same result)
let wrapped = try Puid(total: 100_000, risk: RepeatRisk.oneIn(1e12), chars: .safe32)
try wrapped.generate()In the above example, a total of 100,000 IDs can be generated with a 1 in a trillion risk of repeat. Remember, all random ID generation has an inherent risk of repeat. There is simply no such thing as a univerally unique ID, regardless of the UUID moniker. Rather than blindly use one-size-fits-all (which, for UUID, may be better described as an inefficient, one-size-fits-none solution), Puid allows full control so that risk can be explicitly declared as appropriate for specific application need.
For those instances where bits of entropy is explicitly known:
let token = try Puid(bits: 256, chars: .hexUpper)
try token.generate()
// => "3AE2F836FB09E4D32850ABBA3A20A510B8F47D5CB8EA7CF6BFF10DE58F8FA7BD"Convert between puid strings and representative bits.
- encode(bits:): Accepts exactly the required number of bytes for this generator (ceil(ceil(bitsPerChar) * length / 8)). Throws on invalid size or if any sliced index exceeds the charset size.
- decode(_:): Returns representative bytes for this generator. Supported for ASCII charsets only; throws for non-ASCII charsets or length mismatch.
These functions facilitate storing a binary value of generated IDs while providing a means of recovering the puid String from that binary. They DO NOT provide a a generic binary encoding like Base64. Only the leading ceil(bitsPerChar) bits per character are used. Any unused trailing bits in the last byte are ignored by encode and zeroed by decode. Therefore, bytes -> encode -> decode does not guarantee an exact byte-for-byte round trip. However, puid -> decode -> encode will round-trip the puid string exactly.
import Puid
let alphaId = try Puid(bits: 55, chars: .alphaLower)
let puid = try alphaId.generate() // "arcdpmdppsmh"
let bytes = try alphaId.decode(puid) // [0x04, 0x44, 0x37, 0xb0, 0x6f, 0x7c, 0x98, 0x70]
let puid2 = try alphaId.encode(bits: bytes) // "rwfafkahbmgq"
assert(puid2 == puid)| Name | Count | ERE | ETE | Characters |
|---|---|---|---|---|
| .alpha | 52 | 5.7 | 0.84 | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz |
| .alphaLower | 26 | 4.7 | 0.81 | abcdefghijklmnopqrstuvwxyz |
| .alphaUpper | 26 | 4.7 | 0.81 | ABCDEFGHIJKLMNOPQRSTUVWXYZ |
| .alphaNum | 62 | 5.95 | 0.97 | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 |
| .alphaNumLower | 36 | 5.17 | 0.65 | abcdefghijklmnopqrstuvwxyz0123456789 |
| .alphaNumUpper | 36 | 5.17 | 0.65 | ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 |
| .base16 | 16 | 4.0 | 1.0 | 0123456789ABCDEF |
| .base32 | 32 | 5.0 | 1.0 | ABCDEFGHIJKLMNOPQRSTUVWXYZ234567 |
| .base32Hex | 32 | 5.0 | 1.0 | 0123456789abcdefghijklmnopqrstuv |
| .base32HexUpper | 32 | 5.0 | 1.0 | 0123456789ABCDEFGHIJKLMNOPQRSTUV |
| .base36 | 36 | 5.17 | 0.65 | 0123456789abcdefghijklmnopqrstuvwxyz |
| .base36Upper | 36 | 5.17 | 0.65 | 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ |
| .base58 | 58 | 5.86 | 0.91 | 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz |
| .base62 | 62 | 5.95 | 0.97 | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 |
| .bech32 | 32 | 5.0 | 1.0 | 023456789acdefghjklmnpqrstuvwxyz |
| .boolean | 2 | 1.0 | 1.0 | TF |
| .crockford32 | 32 | 5.0 | 1.0 | 0123456789ABCDEFGHJKMNPQRSTVWXYZ |
| .decimal | 10 | 3.32 | 0.62 | 0123456789 |
| .dna | 4 | 2.0 | 1.0 | ACGT |
| .geohash | 32 | 5.0 | 1.0 | 0123456789bcdefghjkmnpqrstuvwxyz |
| .hex | 16 | 4.0 | 1.0 | 0123456789abcdef |
| .hexUpper | 16 | 4.0 | 1.0 | 0123456789ABCDEF |
| .safeAscii | 90 | 6.49 | 0.8 | !#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_abcdefghijklmnopqrstuvwxyz{|}~ |
| .safe32 | 32 | 5.0 | 1.0 | 2346789bdfghjmnpqrtBDFGHJLMNPQRT |
| .safe64 | 64 | 6.0 | 1.0 | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_ |
| .symbol | 28 | 4.81 | 0.89 | !#$%&()*+,-./:;<=>?@[]^_{|}~ |
| .urlSafe | 66 | 6.04 | 0.63 | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~ |
| .wordSafe32 | 32 | 5.0 | 1.0 | 23456789CFGHJMPQRVWXcfghjmpqrvwx |
| .zBase32 | 32 | 5.0 | 1.0 | ybndrfg8ejkmcpqxot1uwisza345h769 |
Note: The Metrics section explains ERE and ETE.
| Name | Description |
|---|---|
| .base16 | https://datatracker.ietf.org/doc/html/rfc4648#section-8 |
| .base32 | https://datatracker.ietf.org/doc/html/rfc4648#section-6 |
| .base32Hex | Lowercase of .base32HexUpper |
| .base32HexUpper | https://datatracker.ietf.org/doc/html/rfc4648#section-7 |
| .base36 | Used by many URL shorteners |
| .base58 | Bitcoin base58 alphabet (excludes 0, O, I, l) |
| .bech32 | Bitcoin SegWit address encoding |
| .dna | DNA nucleotide bases (Adenine, Cytosine, Guanine, Thymine) |
| .crockford32 | https://www.crockford.com/base32.html |
| .geohash | Used for encoding geographic coordinates |
| .safeAscii | Printable ASCII that does not require escape in String |
| .safe32 | Letters/numbers picked to reduce chance of English words |
| .safe64 | https://datatracker.ietf.org/doc/html/rfc4648#section-5 |
| .urlSafe | https://datatracker.ietf.org/doc/html/rfc3986#section-2.3 |
| .wordSafe32 | Letters/numbers picked to reduce chance of English words |
| .zBase32 | Zooko’s Base32 |
Any String of up to 256 unique characters can be used for puid generation, with custom characters optimized in the same manner as the pre-defined character sets. The characters must be unique. This isn't strictly a technical requirement, PUID could handle duplicate characters, but the resulting randomness of the IDs is maximal when the characters are unique, so PUID enforces that restriction.
Entropy Representation Efficiency (ERE) is a measure of how efficient a string ID represents the entropy of the ID itself. When referring to the entropy of an ID, we mean the Shannon Entropy of the character sequence, and that is maximal when all the permissible characters are equally likely to occur. In most random ID generators, this is the case, and the ERE is solely dependent on the count of characters in the charset, where each character represents log2(count) of entropy (a computer specific calc of general Shannon entropy). For example, for a hex charset there are 16 hex characters, so each character "carries" log2(16) = 4 bits of entropy in the string ID. We say the bits per character is 4 and a random ID of 12 hex characters has 48 bits of entropy.
ERE is measured as a ratio of the bits of entropy for the ID divided by the number of bits require to represent the string (8 bits per ID character). If each character is equally probably (the most common case), ERE is (bits-per-char * id_len) / (8 bits * id_len), which simplifies to bits-per-character/8. The BPC displayed in the Puid Characters table is equivalent to the ERE for that charset.
There is, however, a particular random ID exception where each character is not equally probable, namely, the often used v4 format of UUIDs. In that format, there are hyphens that carry no entropy (entropy is uncertainty, and there is no uncertainly as to where those hyphens will be), one hex digit that is actually constrained to 1 of only 4 hex values and another that is fixed. This formatting results in a ID of 36 characters with a total entropy of 122 bits. The ERE of a v4 UUID is, therefore, 122 / (8 * 36) = 0.4236.
Entropy Transform Efficiency (ETE) is a measure of how efficiently source entropy is transformed into random ID entropy. For charsets with a character count that is a power of 2, all of the source entropy bits can be utilized during random ID generation. Each generated ID character requires exactly log2(count) bits, so the incoming source entropy can easily be carved into appropriate indices for character selection. Since ETE represents the ratio of output entropy bits to input entropy source, when all of the bits are utilized ETE is 1.0.
Even for charsets with power of 2 character count, ETE is only the theoretical maximum of 1.0 if the input entropy source is used as described above. Unfortunately, that is not the case with many random ID generation schemes. Some schemes use the entire output of a call to source entropy to create a single index used to select a character. Such schemes have very poor ETE.
For charsets with a character count that is not a power of 2, some bits will inevitably be discarded since the smallest number of bits required to select a character, ceil(log2(count)), will potentially result in an index beyond the character count. A first-cut, naïve approach to this reality is to simply throw away all the bits when the index is too large.
However, a more sophisticated scheme of bit slicing can actually improve on the naïve approach. Puid extends the bit slicing scheme by adding a bit shifting scheme to the algorithm, wherein a minimum number of bits in the "over the limit" bits are discarded by observing that some bit patterns of length less than ceil(log2(count)) already guarantee the bits will be over the limit, and only those bits need be discarded.
As example, using the :alphanum_lower charset, which has 36 characters, ceil(log2(36)) = 6 bits are required to create a suitable index. However, if those bits start with the bit pattern 11xxxx, the index would be out of bounds regardless of the xxxx bits, so Puid only tosses the first two bits and keeps the trailing four bits for use in the next index. (It is beyond scope to discuss here, but analysis shows this bit shifting scheme does not alter the random characteristics of generated IDs). So whereas the naïve approach would have an ETE of 0.485, Puid achieves an ETE of 0.646, a 33% improvement.
import Puid
let safe = Puid.Chars.metrics(.safe64)
print(safe.avgBits) // 6.0
print(safe.bitShifts) // [BitShift(value: 63, shift: 6)]
print(safe.ere) // 0.75
print(safe.ete) // 1.0
let alpha = Puid.Chars.metrics(.alpha)
print(alpha.avgBits) // > 6.0 (accounts for rare rejects)
print(alpha.bitShifts) // e.g., [BitShift(value: 51, shift: 6), ...]
print(alpha.ere) // ~0.71 (5.7 / 8)
print(alpha.ete) // < 1.0 (non power-of-two charset)These helpers return the maximum total number of IDs before reaching a repeat threshold, and the 1‑in‑N risk for a given total.
- total(atRiskProbability:): Accepts a probability p in (0, 1).
- total(atRiskOneIn:): Accepts a one‑in‑X threshold X > 1.
- risk(after:): Returns a 1‑in‑N value (Double) for the chance of at least one repeat after generating the given total.
import Puid
let id = try Puid(bits: 128, chars: .safe64)
// One‑in‑X form (e.g., one‑in‑a‑trillion)
let maxAtOneIn = id.total(atRiskOneIn: 1e12)
// Probability form (e.g., 1e‑12)
let maxAtProbability = id.total(atRiskProbability: 1e-12)
// Risk as 1‑in‑N for a given total
let oneIn = id.risk(after: 1_000_000)Swift Package Manager URL: https://github.com/puid/Swift-puid
MIT license. See LICENSE.