
DanHarltey/Fastenshtein


Fastenshtein


One of the fastest .NET Levenshtein projects around.

Fastenshtein is an optimized, fully unit-tested Levenshtein implementation, tuned for both speed and memory usage.
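The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. The sketch below is a textbook single-row dynamic-programming version, included only to illustrate that computation and the kind of memory saving a reused row gives. `LevenshteinSketch` is a hypothetical name, and this is not Fastenshtein's actual source.

```csharp
using System;

// Hypothetical illustration of the classic Levenshtein algorithm.
// This is NOT Fastenshtein's source code.
public static class LevenshteinSketch
{
    public static int Distance(string source, string target)
    {
        if (source.Length == 0) return target.Length;
        if (target.Length == 0) return source.Length;

        // row[j] holds the distance between the current prefix of source
        // and the first j characters of target.
        int[] row = new int[target.Length + 1];
        for (int j = 0; j <= target.Length; j++)
        {
            row[j] = j; // distance from "" to target[0..j)
        }

        for (int i = 1; i <= source.Length; i++)
        {
            int diagonal = row[0]; // previous row's value at column j - 1
            row[0] = i;            // distance from source[0..i) to ""
            for (int j = 1; j <= target.Length; j++)
            {
                int above = row[j]; // previous row's value at column j
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                row[j] = Math.Min(
                    Math.Min(row[j - 1] + 1, above + 1), // insertion, deletion
                    diagonal + cost);                    // substitution or match
                diagonal = above;
            }
        }

        return row[target.Length];
    }
}
```

Keeping one row of `target.Length + 1` integers instead of the full matrix is the standard way to reduce memory from O(n·m) to O(m); Fastenshtein's own optimizations may go further than this.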

The included benchmarking tests compare random words of 3 to 20 characters using Fastenshtein and other NuGet Levenshtein implementations:

Method                   Mean      Ratio  Rank  Gen0      Allocated   Alloc Ratio
Fastenshtein             1.077 ms  1.00   1     -         6345 B      1.000
FastenshteinStatic       1.122 ms  1.04   2     3.9063    265441 B    41.835
NinjaNye                 1.899 ms  1.76   4     76.1719   4274593 B   673.695
StringSimilarity         2.899 ms  2.69   5     7.8125    543770 B    85.701
FuzzyStringsNetStandard  7.351 ms  6.81   6     414.0625  22967283 B  3,619.745

Usage

int levenshteinDistance = Fastenshtein.Levenshtein.Distance("value1", "value2");

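An alternative for comparing one value against many: create an instance once and call DistanceFrom for each candidate. This is quicker because it allocates less memory, but an instance is not thread-safe, so do not share it between threads.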

Fastenshtein.Levenshtein lev = new Fastenshtein.Levenshtein("value1");
foreach (var item in new []{ "value2", "value3", "value4"})
{
	int levenshteinDistance = lev.DistanceFrom(item);
}
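Both the speed-up and the thread-safety caveat follow from reusing state across calls. A minimal sketch of that pattern, assuming the instance keeps the source string and a single working row that every `DistanceFrom` call overwrites. `OneToManySketch` is a hypothetical class, not the library's type, and Fastenshtein's internals may differ:

```csharp
using System;

// Hypothetical sketch of the "one value against many" pattern.
// The source string and one working row are stored on the instance and
// reused by every call. Illustration only, not Fastenshtein's code.
public sealed class OneToManySketch
{
    private readonly string _source;
    private readonly int[] _row; // shared by every call, hence not thread-safe

    public OneToManySketch(string source)
    {
        _source = source;
        _row = new int[source.Length + 1]; // allocated once, not per comparison
    }

    public int DistanceFrom(string target)
    {
        // Reset: distance from "" to each prefix of the source.
        for (int j = 0; j < _row.Length; j++)
        {
            _row[j] = j;
        }

        for (int i = 1; i <= target.Length; i++)
        {
            int diagonal = _row[0];
            _row[0] = i;
            for (int j = 1; j <= _source.Length; j++)
            {
                int above = _row[j];
                int cost = target[i - 1] == _source[j - 1] ? 0 : 1;
                _row[j] = Math.Min(Math.Min(_row[j - 1] + 1, above + 1), diagonal + cost);
                diagonal = above;
            }
        }

        return _row[_source.Length];
    }
}
```

Because `_row` is shared, two threads calling `DistanceFrom` on the same instance at once would overwrite each other's intermediate values. For parallel work, give each thread its own instance, for example through `System.Threading.ThreadLocal<T>`.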

How to include Fastenshtein in Microsoft SQL Server (SQLCLR)

The steps below register Fastenshtein as a CLR scalar-valued function in SQL Server, so the fast Levenshtein implementation can be called directly from T-SQL.

  1. To enable CLR integration for the server:

    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;
  2. Beginning with SQL Server 2017 (14.x), CLR strict security is enabled by default. Either configure CLR strict security for the assembly, or run the following to disable it:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    
    EXEC sp_configure 'clr strict security', 0;
    RECONFIGURE;
  3. Load Fastenshtein onto the server. You must use the .NET Framework 4.6.2 build of the assembly. There are two ways to do this:

    • Using assembly bits. Download "Fastenshtein SQL Assembly Hex" from the latest release, unzip it, and paste the full contents of the "Fastenshtein_net462.hex" file into the statement below:

      CREATE ASSEMBLY FastenshteinAssembly
      FROM 0x{contents of Fastenshtein_net462.hex}
      WITH PERMISSION_SET = SAFE;
    • Using a local path or network location. Place Fastenshtein.dll in a directory that the SQL Server instance can access. To obtain the DLL, either:

      • Compile the project "Fastenshtein" in the Release configuration.

      OR

      • Download the precompiled package from NuGet, unzip it, and use the DLL in the \lib\net462 folder.

      Then create the assembly, adjusting the path to where you placed the DLL:

      CREATE ASSEMBLY FastenshteinAssembly FROM 'C:\Fastenshtein.dll' WITH PERMISSION_SET = SAFE;
  4. Create the function:

    CREATE FUNCTION [Levenshtein](@value1 [nvarchar](MAX), @value2 [nvarchar](MAX))
    RETURNS [int]
    AS 
    EXTERNAL NAME [FastenshteinAssembly].[Fastenshtein.Levenshtein].[Distance]
    GO
  5. It is now ready to be used:

    -- Usage
    DECLARE @retVal AS INTEGER
    SELECT @retVal = [dbo].[Levenshtein]('Test','test')
    SELECT @retVal
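The "assembly bits" option in step 3 is a CREATE ASSEMBLY statement whose FROM clause is the DLL's bytes written as a hexadecimal literal. If you compile Fastenshtein yourself instead of downloading the release file, a statement of the same shape could be generated as below. This is a minimal sketch assuming the released .hex file is simply the DLL bytes in hexadecimal; `AssemblyBitsSketch`, its methods, and any path you pass in are hypothetical:

```csharp
using System;
using System.IO;

// Hypothetical helper: turns the bytes of a compiled assembly into the
// "assembly bits" form of CREATE ASSEMBLY shown in step 3.
public static class AssemblyBitsSketch
{
    public static string BuildCreateAssembly(byte[] dllBytes, string assemblyName)
    {
        // BitConverter yields "4D-5A-..."; removing the dashes gives the hex literal body.
        string hex = BitConverter.ToString(dllBytes).Replace("-", string.Empty);
        return "CREATE ASSEMBLY " + assemblyName + "\n" +
               "FROM 0x" + hex + "\n" +
               "WITH PERMISSION_SET = SAFE;";
    }

    // Reads a locally built DLL, e.g. the net462 Release output (path is an assumption).
    public static string FromFile(string dllPath) =>
        BuildCreateAssembly(File.ReadAllBytes(dllPath), "FastenshteinAssembly");
}
```

The generated text can be run in place of the statement in step 3, after which the CREATE FUNCTION statement in step 4 applies unchanged.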
