Puid

Simple, fast, flexible and efficient generation of probably unique identifiers (puid, aka random strings) of intuitively specified entropy using predefined or custom characters.

import Puid

let alphaId = try Puid(total: 1e5, risk: 1e12, chars: .alpha)
try alphaId.generate()
// => "uTJtdTPQFk"

Overview

Puid provides full, explicit control over all important facets of random ID generation: entropy source, characters, and desired randomness. A general overview details information relevant to all PUID implementations.

TOC

Usage

Creating a random ID generator using Puid is a simple as:

import Puid

let sessionId = try Puid()
try sessionId.generate()
// => "1Uyt1bj-cAgsHRpWjyPya6"

Options allow easy and complete control over random ID generation. The above example uses the default for each of:

entropy source: Cryptographically strong random bytes
characters: File system & URL safe characters
randomness: 128 bits of entropy

These defaults are suitable for web session IDs.

TOC

Entropy Source

Puid provides a CSPRNG entropy source (Puid.Entropy.System.csprng, using SecCopyRandomBytes) and a PRNG entropy source (Puid.Entropy.System.prng, using a seeded XorShift64* PRNG) via the entropy option:

let prngId = try Puid(entropy: .prng)
try prngId.generate()
// => "WONlvSz5wRzw6GUz1LqDTK"

let seeded = try Puid(entropy: .prngSeeded(seed: 42))
try seeded.generate()
// deterministic for the same seed

The entropy option can also designate any implementation of the PuidEntropySource protocol for using a custom entropy source:

let fixedBytes = Puid.Entropy.Fixed(hex: "d0 52 91 fd 13 62 16 fc bc 52 57 d1 a9 17 42 bf bf")
let fixedId = try Puid(entropy: fixedBytes)
try fixedId.generate()
// => "0FKR_RNiFvy8UlfRqRdCv7"

Note: The Puid.Entropy.Fixed source is convenient for deterministic testing but not suitable for general use.

A convenience class, Puid.Entropy.Source, provides a means of using any RandomNumberGenerator implementation as a PuidEntropySource. If, for example, you had a favorite PRNG, say FavePrng, that generates a repeatable sequence of random UInt64 numbers via an initialization seed, you could use that PRNG as an entropy source:

let favePrng = Puid.Entropy.Source(using: FavePrng(seed: 42))
let prngId = try Puid(entropy: favePrng)
try prngId.generate()
// => A puid generated using bytes from the custom FavePrng entropy source

TOC

Characters

The characters used in ID generation are designated using the chars option. Puid provides a large set of predefined characters, as well as an option to specify any set of unique characters:

let alphaNumId = try Puid(chars: .alphaNum)
try alphaNumId.generate()
// => "cjm7wFkJQW5igrWUdjFnaA"

let customId = try Puid(chars: .custom("customID_CHARS"))
try customId.generate()
// => "oRmcAACtHsuAIuDSsooItACHIICo_S_IHo"

Note: Puid validates that the custom chars are unique to maximizes the entropy captured during ID generation.

TOC

Randomness

A critical aspect of random ID generation is, of course, the randomness of the IDs generated. Puid provides direct specification of ID randomness via the bits option for situations like session IDs (which are recommended to be 128-bit) or for 256-bit security tokens. But a more general, intuitive declaration of randomness is to explicitly specify the total number of IDs actually needed and assign an acceptable risk of repeat:

let randId = try Puid(total: 1e5, risk: 1e12, chars: .safe32)
try randId.generate()
// => "dqHqFD79QGd2TNP"

// Using a self-documenting risk wrapper (same result)
let wrapped = try Puid(total: 100_000, risk: RepeatRisk.oneIn(1e12), chars: .safe32)
try wrapped.generate()

In the above example, a total of 100,000 IDs can be generated with a 1 in a trillion risk of repeat. Remember, all random ID generation has an inherent risk of repeat. There is simply no such thing as a univerally unique ID, regardless of the UUID moniker. Rather than blindly use one-size-fits-all (which, for UUID, may be better described as an inefficient, one-size-fits-none solution), Puid allows full control so that risk can be explicitly declared as appropriate for specific application need.

For those instances where bits of entropy is explicitly known:

let token = try Puid(bits: 256, chars: .hexUpper)
try token.generate()
// => "3AE2F836FB09E4D32850ABBA3A20A510B8F47D5CB8EA7CF6BFF10DE58F8FA7BD"

Encode / Decode

Convert between puid strings and representative bits.

encode(bits:): Accepts exactly the required number of bytes for this generator (ceil(ceil(bitsPerChar) * length / 8)). Throws on invalid size or if any sliced index exceeds the charset size.
decode(_:): Returns representative bytes for this generator. Supported for ASCII charsets only; throws for non-ASCII charsets or length mismatch.

These functions facilitate storing a binary value of generated IDs while providing a means of recovering the puid String from that binary. They DO NOT provide a a generic binary encoding like Base64. Only the leading ceil(bitsPerChar) bits per character are used. Any unused trailing bits in the last byte are ignored by encode and zeroed by decode. Therefore, bytes -> encode -> decode does not guarantee an exact byte-for-byte round trip. However, puid -> decode -> encode will round-trip the puid string exactly.

import Puid

let alphaId = try Puid(bits: 55, chars: .alphaLower)
let puid = try alphaId.generate() // "arcdpmdppsmh"
let bytes = try alphaId.decode(puid) // [0x04, 0x44, 0x37, 0xb0, 0x6f, 0x7c, 0x98, 0x70]
let puid2 = try alphaId.encode(bits: bytes) // "rwfafkahbmgq"
assert(puid2 == puid)

TOC

Predefined Character Sets

Name	Count	ERE	ETE	Characters
.alpha	52	5.7	0.84	ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
.alphaLower	26	4.7	0.81	abcdefghijklmnopqrstuvwxyz
.alphaUpper	26	4.7	0.81	ABCDEFGHIJKLMNOPQRSTUVWXYZ
.alphaNum	62	5.95	0.97	ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
.alphaNumLower	36	5.17	0.65	abcdefghijklmnopqrstuvwxyz0123456789
.alphaNumUpper	36	5.17	0.65	ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
.base16	16	4.0	1.0	0123456789ABCDEF
.base32	32	5.0	1.0	ABCDEFGHIJKLMNOPQRSTUVWXYZ234567
.base32Hex	32	5.0	1.0	0123456789abcdefghijklmnopqrstuv
.base32HexUpper	32	5.0	1.0	0123456789ABCDEFGHIJKLMNOPQRSTUV
.base36	36	5.17	0.65	0123456789abcdefghijklmnopqrstuvwxyz
.base36Upper	36	5.17	0.65	0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
.base58	58	5.86	0.91	123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
.base62	62	5.95	0.97	ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789
.bech32	32	5.0	1.0	023456789acdefghjklmnpqrstuvwxyz
.boolean	2	1.0	1.0	TF
.crockford32	32	5.0	1.0	0123456789ABCDEFGHJKMNPQRSTVWXYZ
.decimal	10	3.32	0.62	0123456789
.dna	4	2.0	1.0	ACGT
.geohash	32	5.0	1.0	0123456789bcdefghjkmnpqrstuvwxyz
.hex	16	4.0	1.0	0123456789abcdef
.hexUpper	16	4.0	1.0	0123456789ABCDEF
.safeAscii	90	6.49	0.8	!#$%&()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_abcdefghijklmnopqrstuvwxyz{\|}~
.safe32	32	5.0	1.0	2346789bdfghjmnpqrtBDFGHJLMNPQRT
.safe64	64	6.0	1.0	ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_
.symbol	28	4.81	0.89	!#$%&()*+,-./:;<=>?@[]^_{\|}~
.urlSafe	66	6.04	0.63	ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~
.wordSafe32	32	5.0	1.0	23456789CFGHJMPQRVWXcfghjmpqrvwx
.zBase32	32	5.0	1.0	ybndrfg8ejkmcpqxot1uwisza345h769

Note: The Metrics section explains ERE and ETE.

TOC

Description of non-obvious character sets

Name	Description
.base16	https://datatracker.ietf.org/doc/html/rfc4648#section-8
.base32	https://datatracker.ietf.org/doc/html/rfc4648#section-6
.base32Hex	Lowercase of .base32HexUpper
.base32HexUpper	https://datatracker.ietf.org/doc/html/rfc4648#section-7
.base36	Used by many URL shorteners
.base58	Bitcoin base58 alphabet (excludes 0, O, I, l)
.bech32	Bitcoin SegWit address encoding
.dna	DNA nucleotide bases (Adenine, Cytosine, Guanine, Thymine)
.crockford32	https://www.crockford.com/base32.html
.geohash	Used for encoding geographic coordinates
.safeAscii	Printable ASCII that does not require escape in String
.safe32	Letters/numbers picked to reduce chance of English words
.safe64	https://datatracker.ietf.org/doc/html/rfc4648#section-5
.urlSafe	https://datatracker.ietf.org/doc/html/rfc3986#section-2.3
.wordSafe32	Letters/numbers picked to reduce chance of English words
.zBase32	Zooko’s Base32

Custom

Any String of up to 256 unique characters can be used for puid generation, with custom characters optimized in the same manner as the pre-defined character sets. The characters must be unique. This isn't strictly a technical requirement, PUID could handle duplicate characters, but the resulting randomness of the IDs is maximal when the characters are unique, so PUID enforces that restriction.

Metrics

Entropy Representation Efficiency

Entropy Representation Efficiency (ERE) is a measure of how efficient a string ID represents the entropy of the ID itself. When referring to the entropy of an ID, we mean the Shannon Entropy of the character sequence, and that is maximal when all the permissible characters are equally likely to occur. In most random ID generators, this is the case, and the ERE is solely dependent on the count of characters in the charset, where each character represents log2(count) of entropy (a computer specific calc of general Shannon entropy). For example, for a hex charset there are 16 hex characters, so each character "carries" log2(16) = 4 bits of entropy in the string ID. We say the bits per character is 4 and a random ID of 12 hex characters has 48 bits of entropy.

ERE is measured as a ratio of the bits of entropy for the ID divided by the number of bits require to represent the string (8 bits per ID character). If each character is equally probably (the most common case), ERE is (bits-per-char * id_len) / (8 bits * id_len), which simplifies to bits-per-character/8. The BPC displayed in the Puid Characters table is equivalent to the ERE for that charset.

There is, however, a particular random ID exception where each character is not equally probable, namely, the often used v4 format of UUIDs. In that format, there are hyphens that carry no entropy (entropy is uncertainty, and there is no uncertainly as to where those hyphens will be), one hex digit that is actually constrained to 1 of only 4 hex values and another that is fixed. This formatting results in a ID of 36 characters with a total entropy of 122 bits. The ERE of a v4 UUID is, therefore, 122 / (8 * 36) = 0.4236.

Entropy Transform Efficiency

Entropy Transform Efficiency (ETE) is a measure of how efficiently source entropy is transformed into random ID entropy. For charsets with a character count that is a power of 2, all of the source entropy bits can be utilized during random ID generation. Each generated ID character requires exactly log2(count) bits, so the incoming source entropy can easily be carved into appropriate indices for character selection. Since ETE represents the ratio of output entropy bits to input entropy source, when all of the bits are utilized ETE is 1.0.

Even for charsets with power of 2 character count, ETE is only the theoretical maximum of 1.0 if the input entropy source is used as described above. Unfortunately, that is not the case with many random ID generation schemes. Some schemes use the entire output of a call to source entropy to create a single index used to select a character. Such schemes have very poor ETE.

For charsets with a character count that is not a power of 2, some bits will inevitably be discarded since the smallest number of bits required to select a character, ceil(log2(count)), will potentially result in an index beyond the character count. A first-cut, naïve approach to this reality is to simply throw away all the bits when the index is too large.

However, a more sophisticated scheme of bit slicing can actually improve on the naïve approach. Puid extends the bit slicing scheme by adding a bit shifting scheme to the algorithm, wherein a minimum number of bits in the "over the limit" bits are discarded by observing that some bit patterns of length less than ceil(log2(count)) already guarantee the bits will be over the limit, and only those bits need be discarded.

As example, using the :alphanum_lower charset, which has 36 characters, ceil(log2(36)) = 6 bits are required to create a suitable index. However, if those bits start with the bit pattern 11xxxx, the index would be out of bounds regardless of the xxxx bits, so Puid only tosses the first two bits and keeps the trailing four bits for use in the next index. (It is beyond scope to discuss here, but analysis shows this bit shifting scheme does not alter the random characteristics of generated IDs). So whereas the naïve approach would have an ETE of 0.485, Puid achieves an ETE of 0.646, a 33% improvement.

Example

import Puid

let safe = Puid.Chars.metrics(.safe64)
print(safe.avgBits)    // 6.0
print(safe.bitShifts)  // [BitShift(value: 63, shift: 6)]
print(safe.ere)        // 0.75
print(safe.ete)        // 1.0

let alpha = Puid.Chars.metrics(.alpha)
print(alpha.avgBits)   // > 6.0 (accounts for rare rejects)
print(alpha.bitShifts) // e.g., [BitShift(value: 51, shift: 6), ...]
print(alpha.ere)       // ~0.71 (5.7 / 8)
print(alpha.ete)       // < 1.0 (non power-of-two charset)

Risk helpers

These helpers return the maximum total number of IDs before reaching a repeat threshold, and the 1‑in‑N risk for a given total.

total(atRiskProbability:): Accepts a probability p in (0, 1).
total(atRiskOneIn:): Accepts a one‑in‑X threshold X > 1.
risk(after:): Returns a 1‑in‑N value (Double) for the chance of at least one repeat after generating the given total.

import Puid

let id = try Puid(bits: 128, chars: .safe64)

// One‑in‑X form (e.g., one‑in‑a‑trillion)
let maxAtOneIn = id.total(atRiskOneIn: 1e12)

// Probability form (e.g., 1e‑12)
let maxAtProbability = id.total(atRiskProbability: 1e-12)

// Risk as 1‑in‑N for a given total
let oneIn = id.risk(after: 1_000_000)

TOC

Installation

Swift Package Manager URL: https://github.com/puid/Swift-puid

TOC

License

MIT license. See LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.github/workflows		.github/workflows
.swiftpm/xcode		.swiftpm/xcode
.vscode		.vscode
Sources/Puid		Sources/Puid
Tests		Tests
.gitignore		.gitignore
.gitmodules		.gitmodules
.spi.yml		.spi.yml
.swift-format		.swift-format
.swiftlint.yml		.swiftlint.yml
CHANGELOG.md		CHANGELOG.md
LICENSE.txt		LICENSE.txt
Makefile		Makefile
Package.swift		Package.swift
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Puid

TOC

Overview

Usage

Entropy Source

Characters

Randomness

Encode / Decode

Predefined Character Sets

Description of non-obvious character sets

Custom

Metrics

Entropy Representation Efficiency

Entropy Transform Efficiency

Example

Risk helpers

Installation

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Languages

License

puid/Swift-puid

Folders and files

Latest commit

History

Repository files navigation

Puid

TOC

Overview

Usage

Entropy Source

Characters

Randomness

Encode / Decode

Predefined Character Sets

Description of non-obvious character sets

Custom

Metrics

Entropy Representation Efficiency

Entropy Transform Efficiency

Example

Risk helpers

Installation

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Languages

Packages