Version 1.2.5.
This library contains the implementation of several types of string objects and various algorithms for working with strings.
The goal of the library is to make working with strings in C++ as simple and easy as in many other languages, especially scripting languages, while maintaining optimality and performance at the level of C and C++, and even improving them.
It's no secret that working with strings in C++ often causes pain. The std::string class is often inconvenient or inefficient.
Many functions that are usually necessary when working with strings are simply not there, and everyone has to write them themselves.
This library was not made as a universal combine that "can do everything", I implemented what I had to use at work, trying to do it in the most efficient way, and I modestly hope that I succeeded in something and will be useful to other people, either directly or as a source of ideas.
The library does not pretend to be a "change the header and everything works better" solution. I tried to make many methods compatible
with std::string and std::string_view, but I didn't bother with it much. Rewriting old code to work with simstr
will require some effort, but I assure you that it will pay off. And writing new code with its use is easy and enjoyable :)
The main difference between simstr and std::string is that instead of a single universal class, several types of objects are used to work with strings, each of which is good for its own purposes, and at the same time interacts well with each other. If you actively used std::string_view and understood its advantages and disadvantages compared to std::string, then the simstr approach will also be clear to you.
- Strings
char,char16_t,char32_t,wchar_t. - Transparent conversion of strings from one character type to another, with automatic conversion between UTF-8, UTF-16, UTF-32, using simdutf.
- Extensible "String Expression" system. Allows you to efficiently implement the conversion and addition (concatenation) of strings, literals, numbers and possibly other objects.
- String functions:
- Getting substrings.
- Searching for substrings and characters - from the beginning or from the end of the string.
- Various string trimming - right, left, everywhere, by whitespace characters, by specified characters.
- Replacing substrings.
- Replacing a set of characters with a set of corresponding substrings.
- Merging (join) containers of strings into a single string, with specifying separators and options - "skip empty", "separator after last".
- Splitting strings into parts by a specified separator. Splitting is possible directly into a container with strings, or by calling a functor for
each substring, or by iterating using the
Splitteriterator.
- Integration with
formatandsprintfformatting functions (with automatic buffer increase). Formatting is possible forchar,wchar_tstrings and strings compatible withwchar_tin size. That is, under Windows it ischar16_t, under Linux -char32_t. Writing my own formatting library was not part of my plans. - Parsing integers with the possibility of "fine" tuning during compilation - you can set options for checking overflow,
skipping whitespace characters, a specific radix or auto-selection by prefixes
0x,0,0b,0o, admissibility of the+sign. Parsing is implemented for all types of strings and characters. - Parsing double is currently implemented by calling the standard library and only works for
char,wchar_tstrings and types compatible withwchar_tin size. - Minimal Unicode support is included when converting
upper,lowerand case-insensitive string comparison. It only works for characters in the first plane of Unicode (up to 0xFFFF), and when changing case, it does not take into account cases where one code point can be converted into several, that is, the case conversion of characters corresponds tostd::towupper,std::towlowerfor the unicode locale, only faster and can work with any type of characters. - Implemented
hash mapfor string type keys, based onstd::unordered_map, with the possibility of more efficient storage and comparison of keys compared tostd::stringkeys. Case-insensitive key comparison is supported (Ascii or minimal Unicode (see previous paragraph)).
- simple_str<K> - the simplest string (or piece of string), immutable, not owning, analogue of
std::string_view. - simple_str_nt<K> - the same, only declares that it ends with 0. For working with third-party C-API.
- sstring<K> - shared string, immutable, owning, with shared character buffer, SSO support.
- lstring<K, N> - local string, mutable, owning, with a specified size of the SSO buffer.
- Overview and introduction
- Overview article on Habr
- Description of the "Expression Templates" technique used
simstr consists of three header files and two source files. You can connect as a CMake project via add_subdirectory (the simstr library),
you can simply include the files in your project. Building also requires simdutf (when using CMake
it is downloaded automatically).
simstr requires a compiler of standard no lower than C++20 to work - concepts and std::format are used.
The work was tested under Windows on MSVC-19 and Clang-19, under Linux - on GCC-13 and Clang-21.
The work in WASM was also tested, built in Emscripten 4.0.6, Clang-21.
Along with the library, two files are supplied that allow viewing simstr string objects in debuggers
more convenient.
It is described in more detail in here.
Benchmarks are performed using the Google benchmark framework. I tried to make measurements for the most typical operations that occur in normal work. I took measurements on my equipment, under Windows and Linux (in WSL), using MSVC, Clang, GCC compilers. Third-party results are welcome. I also took measurements in WASM, built in Emscripten. I draw your attention to the fact that a 32-bit build is assembled under WASM in Emscripten, which means that the sizes of SSO buffers in objects are smaller.
While no separate usage examples have been prepared, you can look at the texts of tests, benchmarks, and html preparation utilities from the benchmark results.
Also simstr is used in my projects: