TextProbe is a simple and extensible PHP library for text analysis and pattern matching. It is designed to help developers probe, parse, and manipulate text efficiently using customizable rules and matchers.
- Easy-to-use API for text matching and parsing
- Extensible architecture: write your own matchers and rules
- Suitable for parsing logs, user input, or any structured text
You can install the library via Composer:

```bash
composer require makarms/text-probe
```

The library comes with several built-in probes to detect common patterns in text:
- `DiscordNewUsernameProbe`: extracts Discord usernames in the new format (e.g., `@username`), enforcing Discord's updated naming rules (length, allowed characters, no consecutive dots).
- `DiscordOldUsernameProbe`: extracts classic Discord usernames in the format `username#1234`, ensuring proper structure and a valid discriminator.
- `EmailProbe`: extracts email addresses.
- `PhoneProbe`: extracts phone numbers (supports various formats).
- `SlackUsernameProbe`: extracts Slack usernames (e.g., `@username`), applying Slack-specific rules such as allowed characters, length limits, and no consecutive dots.
- `TelegramUserLinkProbe`: extracts `t.me` links pointing to Telegram users.
- `TelegramUsernameProbe`: extracts Telegram usernames (e.g., `@username`).
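To illustrate the kind of rules `DiscordNewUsernameProbe` enforces, here is a standalone sketch (not the library's code). It assumes Discord's published constraints for new usernames: 2-32 characters, lowercase ASCII letters, digits, `_` and `.`, with no two consecutive dots. The function name `isValidDiscordUsername` is hypothetical.

```php
<?php

/**
 * Hypothetical helper: checks a handle (without the leading "@") against
 * the assumed Discord rules: 2-32 chars from [a-z0-9_.], no "..".
 */
function isValidDiscordUsername(string $name): bool
{
    // Allowed characters and length in one anchored pattern.
    // \z (not $) so a trailing newline cannot slip through.
    if (preg_match('/^[a-z0-9_.]{2,32}\z/', $name) !== 1) {
        return false;
    }
    // Reject consecutive dots anywhere in the name.
    return !str_contains($name, '..');
}

var_dump(isValidDiscordUsername('john.doe_42')); // bool(true)
var_dump(isValidDiscordUsername('john..doe'));   // bool(false)
var_dump(isValidDiscordUsername('JohnDoe'));     // bool(false), uppercase is not allowed
```

Splitting the "no consecutive dots" rule out of the regex keeps the pattern readable; `str_contains` requires PHP 8.0 or later.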
- `DateProbe`: extracts dates in various formats (e.g., `YYYY-MM-DD`, `DD/MM/YYYY`, `2nd Jan 2023`).
- `DateTimeProbe`: extracts combined date and time in multiple common formats.
- `TimeProbe`: extracts times (e.g., `14:30`, `14:30:15`, with optional AM/PM).
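Matching a date-like string is only half the job; calendar-invalid values such as `31/02/2023` also need rejecting. A library-independent way to sketch that check in plain PHP is to try candidate formats with `DateTimeImmutable::createFromFormat()` and discard results PHP flags with warnings. The format list and the helper name `parseLooseDate` are assumptions for illustration, not the probe's implementation.

```php
<?php

/**
 * Hypothetical helper: tries several date formats and returns the first
 * strict match, or null. The leading "!" resets fields not in the format
 * (time of day) to zero instead of the current time.
 */
function parseLooseDate(string $text): ?DateTimeImmutable
{
    // e.g. 2023-01-02, 02/01/2023, 2nd Jan 2023 ("S" consumes the ordinal suffix)
    $formats = ['!Y-m-d', '!d/m/Y', '!jS M Y'];

    foreach ($formats as $format) {
        $date = DateTimeImmutable::createFromFormat($format, $text);
        if ($date === false) {
            continue;
        }
        // Overflowing values like 31/02 parse "successfully" with a warning.
        // getLastErrors() returns false (PHP 8.2+) or an array with counts.
        $errors = DateTimeImmutable::getLastErrors();
        if ($errors !== false && ($errors['warning_count'] > 0 || $errors['error_count'] > 0)) {
            continue;
        }
        return $date;
    }
    return null;
}

echo parseLooseDate('2nd Jan 2023')?->format('Y-m-d'), "\n"; // 2023-01-02
```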
- `BankBicCodeProbe`: extracts SWIFT/BIC codes (8-11 characters, e.g., `DEUTDEFF500`).
- `BankIbanNumberProbe`: extracts IBAN numbers, supports spaces, and validates them using Mod-97.
- `BankRoutingNumberProbe`: extracts US routing numbers (9 digits) and validates the checksum.
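Both checksums mentioned above are standard and short enough to sketch in plain PHP. The following is an illustrative, library-independent version: the IBAN check (ISO 13616) moves the first four characters to the end, maps letters A-Z to 10-35, and requires the resulting number modulo 97 to equal 1; the ABA routing-number check weights the digits 3, 7, 1 (repeating) and requires the weighted sum to be divisible by 10. The function names are hypothetical.

```php
<?php

/** Hypothetical helper: Mod-97 check for an IBAN; spaces are allowed. */
function isValidIban(string $iban): bool
{
    $iban = strtoupper(str_replace(' ', '', $iban));
    // Country code, two check digits, then an 11-30 character BBAN.
    if (preg_match('/^[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}\z/', $iban) !== 1) {
        return false;
    }
    // Move country code and check digits to the end.
    $rearranged = substr($iban, 4) . substr($iban, 0, 4);

    // Reduce digit by digit so the huge number never overflows an int.
    $remainder = 0;
    foreach (str_split($rearranged) as $char) {
        // Letters become two-digit numbers: A=10 ... Z=35.
        $value = ($char >= 'A' && $char <= 'Z') ? (string) (ord($char) - 55) : $char;
        foreach (str_split($value) as $digit) {
            $remainder = ($remainder * 10 + (int) $digit) % 97;
        }
    }
    return $remainder === 1;
}

/** Hypothetical helper: checksum for a 9-digit US ABA routing number. */
function isValidRoutingNumber(string $number): bool
{
    if (preg_match('/^[0-9]{9}\z/', $number) !== 1) {
        return false;
    }
    $weights = [3, 7, 1, 3, 7, 1, 3, 7, 1];
    $sum = 0;
    foreach ($weights as $i => $weight) {
        $sum += $weight * (int) $number[$i];
    }
    return $sum % 10 === 0;
}
```

Processing the IBAN one digit at a time is the usual trick for avoiding arbitrary-precision arithmetic, since `(a * 10 + d) mod 97` only depends on `a mod 97`.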
The bank card probes below support these formats: plain digits (e.g., `4111111111111111`), digits separated by spaces (e.g., `4111 1111 1111 1111`), or digits separated by dashes (e.g., `4111-1111-1111-1111`). By default, only Luhn-valid numbers are returned.
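The three input formats above can be handled by matching digit groups with a single, consistent separator and then stripping the separators before validation. This is a standalone sketch, not the probe's actual pattern; the helper name `findCardCandidates` and the 13-19 digit length filter are assumptions.

```php
<?php

/**
 * Hypothetical helper: finds card-number candidates written as plain digits
 * or as digit groups separated by one consistent separator (space or dash),
 * and returns them with separators removed. Only 13-19 digit results are kept.
 */
function findCardCandidates(string $text): array
{
    // Group 1 captures the first separator; \1 forces later ones to match it.
    // The lookarounds stop a match from starting or ending inside a longer number.
    preg_match_all('/(?<![0-9])[0-9]+(?:([ -])[0-9]+(?:\1[0-9]+)*)?(?![0-9])/', $text, $matches);

    $candidates = [];
    foreach ($matches[0] as $raw) {
        $digits = str_replace([' ', '-'], '', $raw);
        $length = strlen($digits);
        if ($length >= 13 && $length <= 19) {
            $candidates[] = $digits;
        }
    }
    return $candidates;
}

print_r(findCardCandidates('Pay with 4111 1111 1111 1111 or 5500-0000-0000-0004 today'));
```

A real probe would then run each candidate through a checksum such as Luhn; mixed separators (`4111 1111-1111 1111`) are rejected here by the backreference.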
- `BankCardNumberProbe`: extracts numbers from all supported card schemes listed below (Visa, Mastercard, Amex, and others).
- `BankAmexCardProbe`: American Express (prefixes: 34, 37), 15 digits.
- `BankDinersClubCardProbe`: Diners Club (prefixes: 300-305, 309, 36, 38, 39), 13-14 digits.
- `BankDiscoverCardProbe`: Discover (prefixes: 6011, 65, 644-649, 622126-622925), 16 digits.
- `BankJcbCardProbe`: JCB (prefixes: 3528-3589), 16 digits.
- `BankMaestroCardProbe`: Maestro (prefixes: 5018, 5020, 5038, 5612, 5893, 6304, 6759, 6761-6763), 16-19 digits.
- `BankMastercardCardProbe`: Mastercard (prefixes: 51-55, 2221-2720), 16 digits.
- `BankMirCardProbe`: MIR (prefixes: 2200-2204), 16 digits.
- `BankRupayCardProbe`: RuPay (prefixes: 508, 60, 65, 81, 82), 16 digits.
- `BankTroyCardProbe`: Troy (prefix: 9792), 16 digits.
- `BankUnionpayCardProbe`: UnionPay (prefix: 62), 16-19 digits.
- `BankVerveCardProbe`: Verve (prefixes: 5060, 5061, 6500-6509), 13-19 digits.
- `BankVisaCardProbe`: Visa (prefix: 4), 13-19 digits.
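The scheme table above boils down to "leading digits within a range, plus an allowed length". A standalone sketch of that lookup, using only four of the schemes for brevity, could look like this; the function name and table layout are hypothetical and not taken from the library.

```php
<?php

/**
 * Hypothetical helper: identifies a card scheme from prefix ranges and
 * lengths taken from the list above (subset only). Each range is
 * [low, high]; both bounds have the same number of digits.
 */
function detectCardScheme(string $digits): ?string
{
    $rules = [
        'amex'       => ['ranges' => [[34, 34], [37, 37]], 'lengths' => [15, 15]],
        'mir'        => ['ranges' => [[2200, 2204]], 'lengths' => [16, 16]],
        'mastercard' => ['ranges' => [[51, 55], [2221, 2720]], 'lengths' => [16, 16]],
        'visa'       => ['ranges' => [[4, 4]], 'lengths' => [13, 19]],
    ];

    $length = strlen($digits);
    foreach ($rules as $scheme => $rule) {
        [$minLength, $maxLength] = $rule['lengths'];
        if ($length < $minLength || $length > $maxLength) {
            continue;
        }
        foreach ($rule['ranges'] as [$low, $high]) {
            // Compare as many leading digits as the range bound has.
            $prefix = (int) substr($digits, 0, strlen((string) $low));
            if ($prefix >= $low && $prefix <= $high) {
                return $scheme;
            }
        }
    }
    return null;
}

echo detectCardScheme('5555555555554444'), "\n"; // mastercard
```

Storing ranges as numeric bounds keeps entries like `2221-2720` compact; schemes with overlapping prefixes (for example Discover and RuPay both listing 65) would need an explicit priority order.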
- `BankCardCvvCvcCodeProbe`: extracts CVV/CVC codes (3-4 digits).
- `BankCardExpiryProbe`: extracts card expiration dates (formats `MM/YY`, `MM/YYYY`, `MM-YY`, `MM-YYYY`, etc.).
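Once an expiry string is extracted, you may want to normalize it and check whether it has passed. The sketch below is independent of the library and covers only the four formats named above; the helper names and the "two-digit years mean 20YY" rule are assumptions.

```php
<?php

/**
 * Hypothetical helper: parses MM/YY, MM/YYYY, MM-YY or MM-YYYY and returns
 * [month, four-digit year], or null. Two-digit years are assumed to be 20YY.
 */
function parseCardExpiry(string $text): ?array
{
    if (preg_match('~^(0[1-9]|1[0-2])[/-]([0-9]{2}|[0-9]{4})\z~', $text, $m) !== 1) {
        return null;
    }
    $year = (int) $m[2];
    if (strlen($m[2]) === 2) {
        $year += 2000;
    }
    return [(int) $m[1], $year];
}

/** Cards stay valid through the last day of the expiry month. */
function isExpired(array $expiry, DateTimeImmutable $now): bool
{
    [$month, $year] = $expiry;
    // Compare as "months since year 0" to avoid day-of-month edge cases.
    $current = (int) $now->format('Y') * 12 + (int) $now->format('n');
    return $year * 12 + $month < $current;
}
```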
- `BitcoinAddressProbe`: extracts Bitcoin addresses (Base58 and Bech32 formats).
- `EthereumAddressProbe`: extracts Ethereum addresses (`0x`-prefixed, 40 hex characters).
- `LitecoinAddressProbe`: extracts Litecoin addresses (Base58 or Bech32).
- `RippleAddressProbe`: extracts Ripple/XRP addresses (Base58, starting with `r`).
- `SolanaAddressProbe`: extracts Solana addresses (Base58, 32-44 characters).
- `TronAddressProbe`: extracts TRON addresses (Base58, starting with `T`, 34 characters).
- `UsdcAlgorandAddressProbe`: extracts USDC addresses on Algorand (Base32, 58 characters).
- `UsdcErc20AddressProbe`: extracts USDC ERC20 addresses (Ethereum-compatible, `0x`-prefixed).
- `UsdcSolanaAddressProbe`: extracts USDC addresses on Solana (same format as Solana addresses).
- `UsdtErc20AddressProbe`: extracts USDT ERC20 addresses (Ethereum-compatible, `0x`-prefixed).
- `UsdtOmniAddressProbe`: extracts USDT Omni addresses (Bitcoin-based, starting with `1` or `3`, 26-35 characters).
- `UsdtTrc20AddressProbe`: extracts USDT TRC20 addresses (TRON-based, Base58, starting with `T`, 34 characters).
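The address formats above are largely defined by prefix, alphabet, and length. As a standalone illustration (not the library's patterns), the sketch below finds strings shaped like Ethereum and TRON addresses. It checks shape only: EIP-55 mixed-case checksums and Base58Check checksums are not verified, so a production probe would need more. The helper and constant names are hypothetical.

```php
<?php

// Base58 alphabet (Bitcoin/TRON): digits 1-9 and letters, minus 0, O, I and l.
const BASE58_CHARS = '1-9A-HJ-NP-Za-km-z';

/**
 * Hypothetical helper: returns [type, address] pairs for strings shaped
 * like Ethereum (0x + 40 hex) or TRON (T + 33 Base58) addresses.
 */
function findAddressCandidates(string $text): array
{
    $patterns = [
        'ethereum' => '/\b0x[0-9a-fA-F]{40}\b/',
        'tron'     => '/\bT[' . BASE58_CHARS . ']{33}\b/',
    ];

    $found = [];
    foreach ($patterns as $type => $pattern) {
        preg_match_all($pattern, $text, $matches);
        foreach ($matches[0] as $address) {
            $found[] = [$type, $address];
        }
    }
    return $found;
}

print_r(findAddressCandidates('Send to TR7NHqjeKQxGTCi8q8ZY4pL8otSzgjLj6t'));
```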
- `GeoCoordinatesProbe`: extracts geographic coordinates in various formats (decimal or degrees/minutes/seconds, with N/S/E/W hemisphere letters).
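Because coordinates can be matched in either notation, comparing them usually means converting degrees/minutes/seconds to decimal degrees: `decimal = degrees + minutes / 60 + seconds / 3600`, negated for the southern and western hemispheres. A minimal standalone sketch of that arithmetic (the function name is hypothetical; this is not part of the probe's documented API):

```php
<?php

/**
 * Hypothetical helper: converts degrees/minutes/seconds plus a hemisphere
 * letter to signed decimal degrees (S and W are negative).
 */
function dmsToDecimal(int $degrees, int $minutes, float $seconds, string $hemisphere): float
{
    $decimal = $degrees + $minutes / 60 + $seconds / 3600;
    return in_array(strtoupper($hemisphere), ['S', 'W'], true) ? -$decimal : $decimal;
}

// 40°26'46"N 79°58'56"W
$lat = dmsToDecimal(40, 26, 46, 'N');
$lon = dmsToDecimal(79, 58, 56, 'W');
printf("%.4f, %.4f\n", $lat, $lon); // prints 40.4461, -79.9822
```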
- `HashtagProbe`: extracts hashtags (e.g., `#example`) anywhere in the text, supporting Unicode letters, numbers, and underscores.
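Unicode-aware hashtag matching in PHP needs the PCRE `u` modifier and Unicode property classes such as `\p{L}` (letters) and `\p{N}` (numbers). The standalone sketch below shows one such pattern; it is an assumption about the approach, not the probe's source, and the helper name is hypothetical. Note that `PREG_OFFSET_CAPTURE` reports byte offsets, which differ from character offsets once multibyte text appears.

```php
<?php

/**
 * Hypothetical helper: finds hashtags made of Unicode letters, digits and
 * underscores. Offsets are BYTE offsets into the UTF-8 string.
 */
function findHashtags(string $text): array
{
    preg_match_all('/#[\p{L}\p{N}_]+/u', $text, $matches, PREG_OFFSET_CAPTURE);

    $tags = [];
    foreach ($matches[0] as [$tag, $byteOffset]) {
        $tags[] = ['tag' => $tag, 'offset' => $byteOffset];
    }
    return $tags;
}

// "Привет" is 6 Cyrillic letters = 12 bytes, so "#мир" starts at byte 13.
print_r(findHashtags('Привет #мир and #php_8!'));
```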
- `UUIDProbe`: extracts any valid UUID (v1-v6) without checking the specific version. Supports the standard hyphenated UUID format.
- `UUIDv1Probe`: extracts UUID version 1, matching the format `xxxxxxxx-xxxx-1xxx-xxxx-xxxxxxxxxxxx`, commonly used for time-based identifiers.
- `UUIDv2Probe`: extracts UUID version 2, matching the format `xxxxxxxx-xxxx-2xxx-xxxx-xxxxxxxxxxxx`, typically used in DCE Security contexts.
- `UUIDv3Probe`: extracts UUID version 3, matching the format `xxxxxxxx-xxxx-3xxx-xxxx-xxxxxxxxxxxx`, generated by MD5-hashing a namespace and name.
- `UUIDv4Probe`: extracts UUID version 4, matching the format `xxxxxxxx-xxxx-4xxx-xxxx-xxxxxxxxxxxx`, randomly generated and commonly used as unique identifiers.
- `UUIDv5Probe`: extracts UUID version 5, matching the format `xxxxxxxx-xxxx-5xxx-xxxx-xxxxxxxxxxxx`, generated by SHA-1-hashing a namespace and name.
- `UUIDv6Probe`: extracts UUID version 6, matching the format `xxxxxxxx-xxxx-6xxx-xxxx-xxxxxxxxxxxx`, a time-ordered variant designed for better indexing and sorting.
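As the formats above show, the version is the first hex digit of the third group (character index 14). A standalone sketch that reads it out is below; it also requires the RFC 4122 variant bits (`8`, `9`, `a` or `b` at index 19), which is a choice of this sketch and may be stricter or looser than the library's probes. The function name is hypothetical.

```php
<?php

/**
 * Hypothetical helper: returns the version (1-6) of a hyphenated UUID,
 * or null if the string is not shaped like one.
 */
function uuidVersion(string $uuid): ?int
{
    // Group 1 captures the version nibble; [89ab] is the RFC 4122 variant.
    $pattern = '/^[0-9a-f]{8}-[0-9a-f]{4}-([1-6])[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}\z/i';
    if (preg_match($pattern, $uuid, $m) !== 1) {
        return null;
    }
    return (int) $m[1];
}

var_dump(uuidVersion('550e8400-e29b-41d4-a716-446655440000')); // int(4)
```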
- `DomainProbe`: extracts domain names, including internationalized (Unicode) domains.
- `IPv4Probe`: extracts IPv4 addresses in standard dotted-decimal form, optionally excluding reserved and bogon ranges.
- `IPv6Probe`: extracts IPv6 addresses, including compressed formats, IPv4-mapped addresses, and zone indexes (e.g., `%eth0`).
- `LinkProbe`: extracts hyperlinks, including ones with IP addresses, ports, or no protocol.
- `MacAddressProbe`: extracts MAC addresses in standard colon- or hyphen-separated formats (e.g., `00:1A:2B:3C:4D:5E` or `00-1A-2B-3C-4D-5E`) and ignores malformed patterns.
- `UserAgentProbe`: extracts User-Agent strings, supporting complex structures such as multiple product tokens, OS information, and browser identifiers.
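For the optional range filtering that `IPv4Probe` describes, PHP's built-in filter extension offers comparable checks. The standalone sketch below (not the probe's implementation; the helper name is hypothetical) finds dotted-quad candidates with a regex and validates them with `filter_var()`, adding `FILTER_FLAG_NO_PRIV_RANGE` and `FILTER_FLAG_NO_RES_RANGE` when only public addresses are wanted.

```php
<?php

/**
 * Hypothetical helper: finds dotted-quad candidates, then validates them
 * with filter_var(). With $publicOnly, private and reserved ranges are dropped.
 */
function findIpv4(string $text, bool $publicOnly = false): array
{
    // Reject matches embedded in longer dotted or numeric runs (e.g. 1.2.3.4.5),
    // but still allow a sentence-ending dot after the address.
    preg_match_all('/(?<![0-9.])(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?!\.?[0-9])/', $text, $matches);

    $flags = FILTER_FLAG_IPV4;
    if ($publicOnly) {
        $flags |= FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE;
    }

    // filter_var() also rejects out-of-range octets such as 999.
    return array_values(array_filter(
        $matches[0],
        static fn (string $ip): bool => filter_var($ip, FILTER_VALIDATE_IP, $flags) !== false
    ));
}

print_r(findIpv4('Hosts: 192.168.1.10, 8.8.8.8 and 999.1.1.1.', true));
```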
- `DockerImageProbe`: extracts Docker image names with tags only (e.g., `nginx:1.25.1`, `redis:latest`, `ghcr.io/app/api:v2`). Supports registries, multi-level namespaces, and semantic or custom tags, while ignoring invalid or tagless image names (e.g., `python`, `myapp/web`).
- `DockerContainerIdProbe`: extracts Docker container IDs in short and full form from logs and CLI output (e.g., `docker ps`, `docker logs`, CI and orchestration traces). Detects lowercase hexadecimal IDs of exactly 12 or 64 characters, ignoring strings of other lengths or with non-hex characters.
- `DockerLabelProbe`: extracts Docker label key/value pairs from Dockerfiles and CLI commands (e.g., `LABEL version="1.0.0" description="API" vendor=acme`). Detects fragments of the form `key=value` and `key="value"`, including multiple labels in a single instruction, without fully parsing Dockerfile syntax.
- `DockerCliFlagProbe`: extracts Docker CLI flags from arbitrary text (e.g., `-p 8080:80`, `-v ./src:/app`, `--env KEY=VALUE`, `--name api`, `--rm`). Detects short and long options in both space and equals forms, with or without arguments, which makes it suitable for scanning `docker run` commands, CI scripts, and orchestration logs without full CLI parsing.
- `DockerfileInstructionProbe`: extracts Dockerfile instructions such as `FROM`, `RUN`, `COPY`, `ENV`, and `HEALTHCHECK`, including multiline continuations with `\`. Matches instruction blocks regardless of indentation and detects all core Dockerfile directives case-insensitively.
- `DockerImageDigestProbe`: extracts Docker image digests of the form `sha256:<64-hex>` from logs, Docker/registry output, and SBOM metadata, including references like `image@sha256:<digest>`, always returning only the digest value.
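Two of the rules above translate directly into regular expressions: container IDs are lowercase hex runs of exactly 12 or 64 characters, and digests are `sha256:` followed by 64 hex characters, returned without the image name. The following standalone sketch mirrors those rules; it is not the library's code, and the helper names are hypothetical.

```php
<?php

/**
 * Hypothetical helper: lowercase hex IDs of exactly 12 or 64 characters.
 * \b rejects longer runs; the lookbehind skips the hex part of a digest.
 */
function findContainerIds(string $text): array
{
    preg_match_all('/(?<!sha256:)\b(?:[0-9a-f]{64}|[0-9a-f]{12})\b/', $text, $m);
    return $m[0];
}

/** Hypothetical helper: returns only the "sha256:<64 hex>" part of a reference. */
function findImageDigests(string $text): array
{
    preg_match_all('/sha256:[0-9a-f]{64}\b/', $text, $m);
    return $m[0];
}

$log = "CONTAINER ID 3f4e8a9c2b1d nginx\nimage@sha256:" . str_repeat('ab', 32);
print_r(findContainerIds($log));
print_r(findImageDigests($log));
```

Without the `(?<!sha256:)` lookbehind, the 64-character hex part of every digest would also be reported as a full container ID.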
You can implement your own probes by creating classes that implement the `IProbe` interface.

Each probe also accepts a custom validator for the values it returns: pass an instance of a class implementing the `IValidator` interface to the probe's constructor to override the default validation logic.

For example, `BankCardNumberProbe` uses a default validator based on the Luhn algorithm, but you can provide your own validator to enforce additional rules, such as restricting matches to specific card issuers or formats.
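This README does not show the method signature of `IValidator`, so the following is a standalone sketch rather than a drop-in implementation. It combines the Luhn check with an issuer restriction (Visa only, prefix `4`, 13-19 digits), which is the kind of rule you might put behind a custom validator. The class name `VisaOnlyLuhnValidator` and its `validate()` method are hypothetical; adapt them to the interface's actual method.

```php
<?php

/**
 * Hypothetical validator: accepts only Luhn-valid Visa numbers.
 * Not the library's IValidator; check that interface for the real method.
 */
final class VisaOnlyLuhnValidator
{
    public function validate(string $value): bool
    {
        // Accept the separator styles the card probes match, then normalize.
        $digits = str_replace([' ', '-'], '', $value);
        if (preg_match('/^4[0-9]{12,18}\z/', $digits) !== 1) {
            return false; // Not a Visa prefix, or not 13-19 digits.
        }
        return self::passesLuhn($digits);
    }

    /** Luhn: from the right, double every second digit (minus 9 if > 9); sum % 10 === 0. */
    private static function passesLuhn(string $digits): bool
    {
        $sum = 0;
        $double = false;
        for ($i = strlen($digits) - 1; $i >= 0; $i--) {
            $d = (int) $digits[$i];
            if ($double) {
                $d *= 2;
                if ($d > 9) {
                    $d -= 9;
                }
            }
            $sum += $d;
            $double = !$double;
        }
        return $sum % 10 === 0;
    }
}

$validator = new VisaOnlyLuhnValidator();
var_dump($validator->validate('4111 1111 1111 1111')); // bool(true)
var_dump($validator->validate('5555555555554444'));    // bool(false), Luhn-valid but Mastercard
```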
```php
<?php

require __DIR__ . '/vendor/autoload.php';

use TextProbe\TextProbe;
use TextProbe\Probes\Contact\EmailProbe;

$text = "Please contact us at info@example.com for more details.";

// Register the probes you need, then analyze the text.
$probe = new TextProbe();
$probe->addProbe(new EmailProbe());

$results = $probe->analyze($text);

foreach ($results as $result) {
    echo sprintf(
        "[%s] %s (position %d-%d)\n",
        $result->getProbeType()->name,
        $result->getResult(),
        $result->getStart(),
        $result->getEnd()
    );
}
```

Output:

```
[EMAIL] info@example.com (position 21-37)
```