
Commit 7ce8a26

Improved SIMD MD5 approach
Parent: cbd655a

5 files changed, +232 -146 lines

README.md

Lines changed: 5 additions & 5 deletions
@@ -1,7 +1,7 @@
 # Advent of Code [![checks-badge]][checks-link] [![docs-badge]][docs-link]
 
 Blazing fast Rust solutions for every [Advent of Code] puzzle from 2015 to 2024, taking
-**501 milliseconds** to solve all 500 stars. Each solution is carefully optimized for performance
+**498 milliseconds** to solve all 500 stars. Each solution is carefully optimized for performance
 while ensuring the code remains concise, readable, and idiomatic.
 
 ## Features
@@ -67,7 +67,7 @@ Performance is reasonable even on older hardware, for example a 2011 MacBook Pro
 
 | Year | [2015](#2015) | [2016](#2016) | [2017](#2017) | [2018](#2018) | [2019](#2019) | [2020](#2020) | [2021](#2021) | [2022](#2022) | [2023](#2023) | [2024](#2024) |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
-| Benchmark (ms) | 15 | 111 | 82 | 35 | 15 | 220 | 8 | 6 | 5 | 4 |
+| Benchmark (ms) | 15 | 109 | 82 | 35 | 14 | 220 | 8 | 6 | 5 | 4 |
 
 ## 2024
 
@@ -335,7 +335,7 @@ Performance is reasonable even on older hardware, for example a 2011 MacBook Pro
 | 2 | [Bathroom Security](https://adventofcode.com/2016/day/2) | [Source](src/year2016/day02.rs) | 29 |
 | 3 | [Squares With Three Sides](https://adventofcode.com/2016/day/3) | [Source](src/year2016/day03.rs) | 24 |
 | 4 | [Security Through Obscurity](https://adventofcode.com/2016/day/4) | [Source](src/year2016/day04.rs) | 79 |
-| 5 | [How About a Nice Game of Chess?](https://adventofcode.com/2016/day/5) | [Source](src/year2016/day05.rs) | 34000 |
+| 5 | [How About a Nice Game of Chess?](https://adventofcode.com/2016/day/5) | [Source](src/year2016/day05.rs) | 33000 |
 | 6 | [Signals and Noise](https://adventofcode.com/2016/day/6) | [Source](src/year2016/day06.rs) | 3 |
 | 7 | [Internet Protocol Version 7](https://adventofcode.com/2016/day/7) | [Source](src/year2016/day07.rs) | 364 |
 | 8 | [Two-Factor Authentication](https://adventofcode.com/2016/day/8) | [Source](src/year2016/day08.rs) | 9 |
@@ -344,7 +344,7 @@ Performance is reasonable even on older hardware, for example a 2011 MacBook Pro
 | 11 | [Radioisotope Thermoelectric Generators](https://adventofcode.com/2016/day/11) | [Source](src/year2016/day11.rs) | 719 |
 | 12 | [Leonardo's Monorail](https://adventofcode.com/2016/day/12) | [Source](src/year2016/day12.rs) | 1 |
 | 13 | [A Maze of Twisty Little Cubicles](https://adventofcode.com/2016/day/13) | [Source](src/year2016/day13.rs) | 3 |
-| 14 | [One-Time Pad](https://adventofcode.com/2016/day/14) | [Source](src/year2016/day14.rs) | 72000 |
+| 14 | [One-Time Pad](https://adventofcode.com/2016/day/14) | [Source](src/year2016/day14.rs) | 71000 |
 | 15 | [Timing is Everything](https://adventofcode.com/2016/day/15) | [Source](src/year2016/day15.rs) | 1 |
 | 16 | [Dragon Checksum](https://adventofcode.com/2016/day/16) | [Source](src/year2016/day16.rs) | 1 |
 | 17 | [Two Steps Forward](https://adventofcode.com/2016/day/17) | [Source](src/year2016/day17.rs) | 3606 |
@@ -366,7 +366,7 @@ Performance is reasonable even on older hardware, for example a 2011 MacBook Pro
 | 1 | [Not Quite Lisp](https://adventofcode.com/2015/day/1) | [Source](src/year2015/day01.rs) | 2 |
 | 2 | [I Was Told There Would Be No Math](https://adventofcode.com/2015/day/2) | [Source](src/year2015/day02.rs) | 8 |
 | 3 | [Perfectly Spherical Houses in a Vacuum](https://adventofcode.com/2015/day/3) | [Source](src/year2015/day03.rs) | 95 |
-| 4 | [The Ideal Stocking Stuffer](https://adventofcode.com/2015/day/4) | [Source](src/year2015/day04.rs) | 13000 |
+| 4 | [The Ideal Stocking Stuffer](https://adventofcode.com/2015/day/4) | [Source](src/year2015/day04.rs) | 12000 |
 | 5 | [Doesn't He Have Intern-Elves For This?](https://adventofcode.com/2015/day/5) | [Source](src/year2015/day05.rs) | 38 |
 | 6 | [Probably a Fire Hazard](https://adventofcode.com/2015/day/6) | [Source](src/year2015/day06.rs) | 454 |
 | 7 | [Some Assembly Required](https://adventofcode.com/2015/day/7) | [Source](src/year2015/day07.rs) | 27 |

src/util/md5.rs

Lines changed: 3 additions & 8 deletions
@@ -141,10 +141,10 @@ fn common(f: u32, a: u32, b: u32, m: u32, s: u32, k: u32) -> u32 {
 pub mod simd {
     use std::array::from_fn;
     use std::simd::num::SimdUint as _;
-    use std::simd::{LaneCount, Simd, SupportedLaneCount};
+    use std::simd::*;
 
     #[inline]
-    pub fn hash_fixed<const N: usize>(buffers: &mut [[u8; 64]; N], size: usize) -> [[u32; N]; 4]
+    pub fn hash_fixed<const N: usize>(buffers: &mut [[u8; 64]; N], size: usize) -> [Simd<u32, N>; 4]
     where
         LaneCount<N>: SupportedLaneCount,
     {
@@ -245,12 +245,7 @@ pub mod simd {
         c = round4(c, d, a, b, m2, 15, 0x2ad7d2bb);
         b = round4(b, c, d, a, m9, 21, 0xeb86d391);
 
-        [
-            (a0 + a).swap_bytes().to_array(),
-            (b0 + b).swap_bytes().to_array(),
-            (c0 + c).swap_bytes().to_array(),
-            (d0 + d).swap_bytes().to_array(),
-        ]
+        [(a0 + a).swap_bytes(), (b0 + b).swap_bytes(), (c0 + c).swap_bytes(), (d0 + d).swap_bytes()]
     }
 
     #[inline]
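Both solvers only destructure the first word of the returned state (`let [result, ..] = hash_fixed(...)`), then test it against `0xfffff000` and `0xffffff00`. The sketch below shows, on plain `u32` values and stable Rust rather than `std::simd`, why those masks mean "five" and "six leading zero hex digits". It assumes, as the `swap_bytes()` calls in the diff suggest, that MD5 serializes each state word little-endian, so the first word must be byte-swapped before its hex rendering lines up with the start of the digest. The helper names `leading_word`, `five_zeroes` and `six_zeroes` are hypothetical and not part of the crate.

```rust
/// MD5 writes state word `a` into digest bytes 0..4 in little-endian order.
/// Swapping the bytes yields the big-endian value whose `{:08x}` rendering
/// equals the first eight hex characters of the digest.
fn leading_word(a: u32) -> u32 {
    a.swap_bytes()
}

/// Five leading zero hex digits = top 20 bits clear.
fn five_zeroes(word: u32) -> bool {
    word & 0xfffff000 == 0
}

/// Six leading zero hex digits = top 24 bits clear.
fn six_zeroes(word: u32) -> bool {
    word & 0xffffff00 == 0
}

fn main() {
    // Digest bytes 00 00 01 db, i.e. a hex digest starting "000001db".
    let a = u32::from_le_bytes([0x00, 0x00, 0x01, 0xdb]);
    let word = leading_word(a);
    // prints "000001db five=true six=false"
    println!("{word:08x} five={} six={}", five_zeroes(word), six_zeroes(word));
}
```

Only one word per lane is needed for these checks, which is consistent with returning the state as `Simd` vectors: the caller can mask and compare all lanes together instead of first converting each word to an array.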

src/year2015/day04.rs

Lines changed: 6 additions & 3 deletions
@@ -101,8 +101,10 @@ fn worker(shared: &Shared) {
 #[cfg(feature = "simd")]
 mod simd {
     use super::*;
+    use crate::util::bitset::*;
     use crate::util::md5::simd::hash_fixed;
-    use std::simd::{LaneCount, SupportedLaneCount};
+    use std::simd::cmp::SimdPartialEq as _;
+    use std::simd::*;
 
     #[expect(clippy::needless_range_loop)]
     fn check_hash_simd<const N: usize>(
@@ -123,12 +125,13 @@ mod simd {
         }
 
         let [result, ..] = hash_fixed(buffers, size);
+        let bitmask = (result & Simd::splat(0xfffff000)).simd_eq(Simd::splat(0)).to_bitmask();
 
-        for i in 0..N {
+        for i in bitmask.biterator() {
             if result[i] & 0xffffff00 == 0 {
                 shared.second.fetch_min(start + offset + i as u32, Ordering::Relaxed);
                 shared.iter.stop();
-            } else if result[i] & 0xfffff000 == 0 {
+            } else {
                 shared.first.fetch_min(start + offset + i as u32, Ordering::Relaxed);
             }
         }
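The new loop compares every lane against the five-zero mask at once, packs the per-lane results into an integer with `to_bitmask()`, and then visits only the set bits through the crate's `biterator()`. Here is a scalar sketch of that pattern on stable Rust. `candidate_mask`, `set_bits` and `scan` are hypothetical stand-ins, and `set_bits` assumes `biterator()` yields the index of each set bit, which the diff implies but does not show. Like the diff, a lane with six zeroes only updates `second`.

```rust
/// Scalar stand-in for
/// `(result & Simd::splat(0xfffff000)).simd_eq(Simd::splat(0)).to_bitmask()`:
/// bit `i` is set when lane `i` has five leading zero hex digits.
fn candidate_mask<const N: usize>(words: [u32; N]) -> u64 {
    assert!(N <= 64, "a u64 bitmask holds at most 64 lanes");
    let mut mask = 0u64;
    for (i, &w) in words.iter().enumerate() {
        if w & 0xfffff000 == 0 {
            mask |= 1u64 << i;
        }
    }
    mask
}

/// Yields the index of each set bit, lowest first (hypothetical `biterator` stand-in).
fn set_bits(mut mask: u64) -> impl Iterator<Item = usize> {
    std::iter::from_fn(move || {
        if mask == 0 {
            None
        } else {
            let i = mask.trailing_zeros() as usize;
            mask &= mask - 1; // clear the lowest set bit
            Some(i)
        }
    })
}

/// Returns (first, second): the smallest index with five but not six zeroes,
/// and the smallest index with six zeroes, mirroring the diff's branches.
fn scan<const N: usize>(start: u32, words: [u32; N]) -> (Option<u32>, Option<u32>) {
    let mut first: Option<u32> = None;
    let mut second: Option<u32> = None;
    for i in set_bits(candidate_mask(words)) {
        let n = start + i as u32;
        // Every lane reaching here already passed the five-zero test,
        // so the old `else if` check collapses to a plain `else`.
        if words[i] & 0xffffff00 == 0 {
            second = Some(second.map_or(n, |s| s.min(n)));
        } else {
            first = Some(first.map_or(n, |f| f.min(n)));
        }
    }
    (first, second)
}

fn main() {
    // Hypothetical lanes: 0xfff passes five zeroes, 0xff also passes six.
    let words = [0x0000_0fff_u32, 0x0000_1000, 0x0000_00ff, 0xffff_ffff];
    let (first, second) = scan(100, words);
    println!("first = {first:?}, second = {second:?}"); // first = Some(100), second = Some(102)
}
```

When no lane matches, which is by far the most common outcome, the bitmask is zero and the loop body never runs, whereas the old `for i in 0..N` loop performed N scalar comparisons per batch.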

src/year2016/day05.rs

Lines changed: 7 additions & 4 deletions
@@ -111,8 +111,10 @@ fn worker(shared: &Shared) {
 #[cfg(feature = "simd")]
 mod simd {
     use super::*;
+    use crate::util::bitset::*;
     use crate::util::md5::simd::hash_fixed;
-    use std::simd::{LaneCount, SupportedLaneCount};
+    use std::simd::cmp::SimdPartialEq as _;
+    use std::simd::*;
 
     #[expect(clippy::needless_range_loop)]
     fn check_hash_simd<const N: usize>(
@@ -133,11 +135,12 @@ mod simd {
         }
 
         let [result, ..] = hash_fixed(buffers, size);
+        let bitmask = (result & Simd::splat(0xfffff000)).simd_eq(Simd::splat(0)).to_bitmask();
 
-        for i in 0..N {
-            if result[i] & 0xfffff000 == 0 {
-                let mut exclusive = shared.mutex.lock().unwrap();
+        if bitmask != 0 {
+            let mut exclusive = shared.mutex.lock().unwrap();
 
+            for i in bitmask.biterator() {
                 exclusive.found.push((start + offset + i as u32, result[i]));
                 exclusive.mask |= 1 << (result[i] >> 8);
 
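In the 2016 Day 5 worker, the old code took the mutex inside the `for i in 0..N` loop, once per matching lane. The new code makes a single `bitmask != 0` test before locking at all, then holds the guard for every matching lane in the batch. Below is a stable-Rust sketch of that shape. `Exclusive`, `record` and the `u32` type of `mask` are assumptions for illustration, and the caller is assumed to pass a bitmask built as in the diff, so every flagged lane has its top 20 bits clear.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for the state behind `shared.mutex` in the diff.
#[derive(Default)]
struct Exclusive {
    found: Vec<(u32, u32)>,
    mask: u32,
}

/// Records every lane whose bit is set in `bitmask`, locking at most once per batch.
/// Assumes each flagged lane has five leading zero hex digits (top 20 bits clear).
fn record<const N: usize>(mutex: &Mutex<Exclusive>, start: u32, words: [u32; N], bitmask: u64) {
    // Common case: nothing matched, so skip the lock entirely.
    if bitmask == 0 {
        return;
    }

    let mut exclusive = mutex.lock().unwrap();
    let mut bits = bitmask;

    while bits != 0 {
        let i = bits.trailing_zeros() as usize;
        bits &= bits - 1; // clear the lowest set bit

        exclusive.found.push((start + i as u32, words[i]));
        // With the top 20 bits clear, `word >> 8` is the sixth hex digit (0..=15),
        // which part two of the puzzle uses as the password position.
        exclusive.mask |= 1u32 << (words[i] >> 8);
    }
}

fn main() {
    let shared = Mutex::new(Exclusive::default());
    record(&shared, 10, [0x0000_0123, 0xffff_ffff, 0x0000_0abc, 0], 0b0101);
    let exclusive = shared.lock().unwrap();
    println!("found = {:?}, mask = {:#b}", exclusive.found, exclusive.mask);
}
```

The early return replaces N per-lane comparisons with one integer test, and when several lanes in the same batch match, the lock is acquired once instead of once per lane.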