-
Notifications
You must be signed in to change notification settings - Fork 339
exactfloat: Remove OpenSSL dependency #453
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
Just one quick comment -- there are cases where S2 needs to evaluate exact arithmetic expressions involving polynomials of degree at least 20 (see s2predicates.cc). These cases actually happen, and for certain kinds of input data they can actually occur frequently. So the performance of large multiplies may indeed be relevant, and it would be good to check how this change affects the various S2 benchmarks, especially those related to snap rounding (i.e. code that uses S2Builder). |
|
I didn't realize we had polynomials that big, where specifically? In the symbolic perturbation logic? I know the benchmarks are scrubbed before pushing to github, but since https://github.com/google/benchmark is open source now maybe @jmr can un-scrub them, then I could verify any changes. |
It will take a while to release them cleanly. It looks like I might need to move the benchmarks to their own files and call Here are the benchmarks that touch s2polygon_benchmark.txt For some, you will need to find your own large polygon like the Pacific Ocean. |
I see we have wasm tests for exactfloat that pass. Most s2geometry tests pass, too. The most obvious failures are wasm not supporting threads. |
Are your benchmarks vs OpenSSL or BoringSSL? How do they compare? |
|
Thanks for the quick turnaround. I believe I was using openSSL proper,
I'll double check and compare against BoringSSL too.
…On Fri, Sep 12, 2025 at 4:07 AM Jesse Rosenstock ***@***.***> wrote:
*jmr* left a comment (google/s2geometry#453)
<#453 (comment)>
I didn't try as hard for performance optimization as OpenSSL
Are your benchmarks vs OpenSSL or BoringSSL? How do they compare?
—
Reply to this email directly, view it on GitHub
<#453 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAGEMKXOBAM2RNTGBKGC2M33SKLOBAVCNFSM6AAAAACGIRXRYWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTEOBUGY2TMNBXGE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
|
One example is ExactEdgeCircumcenterSign() in s2predicates.cc. There are
others in that file as well.
…On Thu, Sep 11, 2025 at 8:05 PM smcallis ***@***.***> wrote:
*smcallis* left a comment (google/s2geometry#453)
<#453 (comment)>
I didn't realize we had polynomials that big, where specifically? In the
symbolic perturbation logic? I know the benchmarks are scrubbed before
pushing to github, but since https://github.com/google/benchmark is open
source now maybe @jmr <https://github.com/jmr> can un-scrub them, then I
could verify any changes.
—
Reply to this email directly, view it on GitHub
<#453 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AICG3CXQDUW42NOHA5HGXKT3SIZ63AVCNFSM6AAAAACGIRXRYWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTEOBTGQ3TQMJUGI>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
|
OK I'm testing against OpenSSL, and I instrumented bignum multiply to print the bit-width of the result. So far the worst case I've found is 17718 bits in the S2BufferOperation.RadiiAndErrorFractionCoverage test. So I added a "mega" class to my benchmarks with 18000 bits: I ran the S2BufferOperation and there were a few +1% regressions but more where it got faster by up to 10% if you believe the numbers: I didn't compute percentages but I'll include them here for posterity: s2edge_crossings_test: encoded_s2shape_index_test: @jmr I can run the s2polygon_test benchmarks too, but there's several flags defined in that file defining parameters for the benchmarks I'll need. |
|
As an aside @jmr if you get those benchmarks released here's a test_main.cc that can work with them: You just link with "gmock" instead of "gmock_main" and that test file and it lets you run the benchmarks like you'd expect with --benchmark_filter. |
|
That all sounds encouraging. I admit I'm surprised that all that carefully
crafted assembly code in the OpenSSL version doesn't have a bigger
performance impact, especially for numbers with thousands of bits.
Cheers,
Eric
…On Sat, Sep 13, 2025 at 12:35 PM smcallis ***@***.***> wrote:
*smcallis* left a comment (google/s2geometry#453)
<#453 (comment)>
As an aside @jmr <https://github.com/jmr> if you get those benchmarks
released here's a test_main.cc that can work with them:
#include "benchmark/benchmark.h"
#include "gtest/gtest.h"
int main(int argc, char* argv[]) {
benchmark::Initialize(&argc, argv);
testing::InitGoogleTest(&argc, (char**)argv);
// Just run benchmark if explicitly requested.
if (!benchmark::GetBenchmarkFilter().empty()) {
benchmark::RunSpecifiedBenchmarks();
exit(0);
}
return RUN_ALL_TESTS();
}
You just link with "gmock" instead of "gmock_main" *and* that test file
and it lets you run the benchmarks like you'd expect with
--benchmark_filter.
—
Reply to this email directly, view it on GitHub
<#453 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AICG3CVZT5FXS3LEA56GPF33SRWYVAVCNFSM6AAAAACGIRXRYWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTEOBYG43DCNZWG4>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
|
OK I used the compare.py that ships with benchmark now and disabled ASLR and CPU governors, and a min_time of 20s to get as fair a comparison as I could. All entries are (bignum - openssl)/openssl: @ericveach I think OpenSSL's philosophy was to minimize the variance in their operations to avoid timing attacks, they admit GMP is faster in general than their bignum. |
|
BTW, I originally implemented ExactFloat using MPFR but had to switch over
to OpenSSL due to the LPGL license on MPFR, which wasn't acceptable for
some Google products. MPFR was indeed faster. There is probably still a
version of ExactFloat based on MPFR (and that supports a lot more
operations and features) buried in Perforce somewhere.
…On Mon, Sep 15, 2025 at 9:45 AM smcallis ***@***.***> wrote:
*smcallis* left a comment (google/s2geometry#453)
<#453 (comment)>
OK I used the compare.py that ships with benchmark now and disabled ASLR
and CPU governors, and a min_time of 20s to get as fair a comparison as I
could. All entries are (bignum - openssl)/openssl:
Benchmark Time CPU Time Old Time New CPU Old CPU New
----------------------------------------------------------------------------------------------------------------------------
BM_BufferPoints/1 -0.0082 -0.0082 18892 18738 18890 18736
BM_BufferPoints/10 -0.0235 -0.0235 215633 210558 215609 210539
BM_BufferPoints/1000 -0.0574 -0.0574 23350243 22009805 23347813 22007591
BM_BufferConvexLoop/10/10 -0.0435 -0.0435 29588 28300 29585 28298
BM_BufferConvexLoop/1000/10 -0.0101 -0.0101 2592775 2566717 2592526 2566441
BM_BufferConvexLoop/1000/1000 +0.0009 +0.0009 2559509 2561839 2559259 2561581
BM_BufferConcaveLoop/10/10 -0.0002 -0.0001 37098 37093 37095 37089
BM_BufferConcaveLoop/1000/10 -0.0126 -0.0126 5177795 5112490 5177284 5111983
BM_BufferConcaveLoop/1000/100 +0.0151 +0.0151 8372052 8498848 8371219 8497918
BM_BufferLoDimFractal/48 -0.0490 -0.0490 179225 170447 179204 170431
BM_BufferLoDimFractal/3072 -0.0521 -0.0521 34339890 32551229 34335649 32548081
BM_BufferHiDimFractal/48 -0.0656 -0.0656 257797 240879 257771 240857
BM_BufferHiDimFractal/3072 -0.0617 -0.0616 29076370 27283717 29073327 27281040
OVERALL_GEOMEAN -0.0286 -0.0286 0 0 0 0
Benchmark Time CPU Time Old Time New CPU Old CPU New
---------------------------------------------------------------------------------------------------------------------------------------------------------
BM_GetIntersection<&S2::GetIntersection> +0.0002 +0.0002 19 19 19 19
BM_GetIntersection<&GetIntersectionStableLDIgnoreValidity> +0.0052 +0.0052 117 118 117 118
BM_GetIntersection<&S2::internal::GetIntersectionExact> +0.0037 +0.0037 1338 1343 1338 1343
OVERALL_GEOMEAN +0.0030 +0.0030 0 0 0 0
Benchmark Time CPU Time Old Time New CPU Old CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------
BM_MutableIndexBuildDestruct/0/0 -0.0047 -0.0046 1055 1050 1055 1050
BM_MutableIndexBuildDestruct/1/0 -0.0001 -0.0010 125978 125965 125889 125765
BM_MutableIndexBuildDestruct/2/0 -0.0019 -0.0019 193622 193247 193603 193229
BM_MutableIndexDecodeDestruct/0/0 -0.0192 -0.0191 118 115 118 115
BM_MutableIndexDecodeDestruct/1/0 -0.0573 -0.0573 12000 11313 11999 11311
BM_MutableIndexDecodeDestruct/2/0 -0.0375 -0.0375 11211 10791 11210 10790
BM_EncodedIndexInitDestruct/0/0 +0.0018 +0.0018 64 64 64 64
BM_EncodedIndexInitDestruct/1/0 -0.0237 -0.0237 188 184 188 184
BM_EncodedIndexInitDestruct/2/0 +0.0164 +0.0164 145 147 145 147
BM_S2PolygonDecodeDestructSnapped/0/0 -0.0530 -0.0533 190 180 190 180
BM_S2PolygonDecodeDestructSnapped/1/0 -0.0176 -0.0176 1274 1251 1274 1251
BM_S2PolygonDecodeDestructSnapped/2/0 -0.0024 -0.0024 12720 12690 12718 12688
BM_S2PolygonDecodeDestructSnapped/0/1 -0.0178 -0.0178 689 676 689 676
BM_S2PolygonDecodeDestructSnapped/1/1 -0.0085 -0.0085 10642 10551 10641 10550
BM_S2PolygonDecodeDestructSnapped/2/1 -0.0332 -0.0332 70155 67827 70148 67821
BM_LaxPolygonDecodeDestructSnapped/0/0 -0.0055 -0.0055 27 27 27 27
BM_LaxPolygonDecodeDestructSnapped/1/0 +0.0507 +0.0507 1297 1363 1297 1363
BM_LaxPolygonDecodeDestructSnapped/2/0 +0.0011 +0.0011 1817 1819 1816 1818
BM_LaxPolygonDecodeDestructSnapped/0/1 +0.0000 -0.0001 126 126 126 126
BM_LaxPolygonDecodeDestructSnapped/1/1 +0.0043 +0.0043 11971 12022 11970 12021
BM_LaxPolygonDecodeDestructSnapped/2/1 +0.0136 +0.0136 13872 14060 13870 14059
BM_EncodedPolygonInitDestructSnapped/0/0 +0.0335 +0.0335 5 5 5 5
BM_EncodedPolygonInitDestructSnapped/1/0 +0.0119 +0.0120 7 7 7 7
BM_EncodedPolygonInitDestructSnapped/2/0 +0.0059 +0.0059 10 10 10 10
BM_EncodedPolygonInitDestructSnapped/0/1 +0.1442 +0.1442 9 10 9 10
BM_EncodedPolygonInitDestructSnapped/1/1 -0.0200 -0.0200 28 28 28 28
BM_EncodedPolygonInitDestructSnapped/2/1 +0.0082 +0.0081 38 38 38 38
BM_S2PolygonGetEdge/0/0 +0.0142 +0.0145 3 3 3 3
BM_S2PolygonGetEdge/1/0 +0.0006 +0.0006 3 3 3 3
BM_S2PolygonGetEdge/2/0 +0.0091 +0.0093 3 3 3 3
BM_LaxPolygonGetEdge/0/0 -0.0052 -0.0053 3 3 3 3
BM_LaxPolygonGetEdge/1/0 -0.0051 -0.0051 3 3 3 3
BM_LaxPolygonGetEdge/2/0 +0.0014 +0.0014 4 4 4 4
BM_EncodedPolygonGetEdge/0/0 -0.0003 -0.0003 3 3 3 3
BM_EncodedPolygonGetEdge/1/0 -0.0010 -0.0011 2 2 2 2
BM_EncodedPolygonGetEdge/2/0 +0.0007 +0.0007 5 5 5 5
BM_EncodedPolygonGetEdge/0/1 +0.0065 +0.0065 28 28 28 28
BM_EncodedPolygonGetEdge/1/1 +0.0065 +0.0065 31 31 31 31
BM_EncodedPolygonGetEdge/2/1 +0.0072 +0.0072 33 33 33 33
BM_ContainsPointMutableIndexS2Polygon/0/0 +0.0074 +0.0074 167 168 167 168
BM_ContainsPointMutableIndexS2Polygon/1/0 -0.0007 -0.0007 179 179 179 179
BM_ContainsPointMutableIndexS2Polygon/2/0 +0.0079 +0.0079 513 517 513 517
BM_ContainsPointEncodedIndexPolygon/0/0 +0.0018 +0.0020 244 244 244 244
BM_ContainsPointEncodedIndexPolygon/1/0 -0.0088 -0.0088 241 239 241 239
BM_ContainsPointEncodedIndexPolygon/2/0 -0.0037 -0.0037 732 729 732 729
BM_ContainsPointEncodedIndexPolygon/0/1 -0.0091 -0.0093 503 499 503 499
BM_ContainsPointEncodedIndexPolygon/1/1 +0.0059 +0.0060 484 487 484 487
BM_ContainsPointEncodedIndexPolygon/2/1 -0.0132 -0.0132 1848 1824 1848 1824
BM_VertexDistanceMutableIndexS2Polygon/0/0 -0.0109 -0.0111 279 276 279 276
BM_VertexDistanceMutableIndexS2Polygon/1/0 -0.0058 -0.0058 725 720 724 720
BM_VertexDistanceMutableIndexS2Polygon/2/0 +0.0154 +0.0159 1905 1935 1904 1934
BM_VertexDistanceEncodedIndexPolygon/0/0 -0.0381 -0.0382 359 346 359 346
BM_VertexDistanceEncodedIndexPolygon/1/0 -0.0099 -0.0095 859 850 858 850
BM_VertexDistanceEncodedIndexPolygon/2/0 -0.0136 -0.0139 2469 2435 2468 2434
BM_VertexDistanceEncodedIndexPolygon/0/1 -0.0124 -0.0124 850 840 850 839
BM_VertexDistanceEncodedIndexPolygon/1/1 -0.0076 -0.0076 1563 1551 1563 1551
BM_VertexDistanceEncodedIndexPolygon/2/1 -0.0053 -0.0053 5153 5126 5153 5125
BM_DecodePolygonBuildIndexContainsPointSnapped/0/0 -0.0111 -0.0106 1505 1488 1504 1488
BM_DecodePolygonBuildIndexContainsPointSnapped/1/0 +0.0041 +0.0041 101370 101788 101359 101777
BM_DecodePolygonBuildIndexContainsPointSnapped/2/0 -0.0273 -0.0273 227031 220832 227008 220811
BM_DecodePolygonBuildIndexContainsPointSnapped/0/1 -0.0104 -0.0104 2057 2036 2057 2036
BM_DecodePolygonBuildIndexContainsPointSnapped/1/1 +0.0147 +0.0147 109824 111436 109813 111427
BM_DecodePolygonBuildIndexContainsPointSnapped/2/1 +0.0133 +0.0133 285840 289652 285811 289623
BM_DecodeIndexAndPolygonContainsPointSnapped/0/0 -0.0143 -0.0143 545 537 545 537
BM_DecodeIndexAndPolygonContainsPointSnapped/1/0 +0.0030 +0.0030 20178 20237 20176 20235
BM_DecodeIndexAndPolygonContainsPointSnapped/2/0 -0.0097 -0.0091 24945 24704 24929 24701
BM_DecodeIndexAndPolygonContainsPointSnapped/0/1 -0.0021 -0.0021 1096 1094 1096 1094
BM_DecodeIndexAndPolygonContainsPointSnapped/1/1 +0.0160 +0.0160 29431 29902 29428 29900
BM_DecodeIndexAndPolygonContainsPointSnapped/2/1 +0.0190 +0.0190 80886 82421 80878 82413
BM_EncodedIndexAndPolygonContainsPointSnapped/0/0 -0.0132 -0.0132 304 300 304 300
BM_EncodedIndexAndPolygonContainsPointSnapped/1/0 +0.0182 +0.0182 485 494 485 494
BM_EncodedIndexAndPolygonContainsPointSnapped/2/0 -0.0152 -0.0152 1047 1031 1047 1031
BM_EncodedIndexAndPolygonContainsPointSnapped/0/1 +0.0098 +0.0098 558 564 558 564
BM_EncodedIndexAndPolygonContainsPointSnapped/1/1 +0.0052 +0.0052 837 841 837 841
BM_EncodedIndexAndPolygonContainsPointSnapped/2/1 +0.0095 +0.0096 2364 2387 2364 2386
OVERALL_GEOMEAN -0.0012 -0.0012 0 0 0 0
@ericveach <https://github.com/ericveach> I think OpenSSL's philosophy
was to minimize the variance in their operations to avoid timing attacks,
they admit GMP is faster in general than their bignum.
—
Reply to this email directly, view it on GitHub
<#453 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AICG3CQ4ZUHA5L2AAGRGCTL3S3UL3AVCNFSM6AAAAACGIRXRYWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTEOJTGA3DANJWGE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
You might even be able to find it in the Google Code archive. [Update: No, it looks like it was removed before the first open-source release.] If not, I can upload it again if it's worth benchmarking against.
@smcallis could you run the BoringSSL benchmarks? Building with bazel should use that. |
|
Yeah I'll run those, probably be a day or two though. |
|
Here's vs boringSSL (no mean feat, the bazel build is pretty crufty at this point): |
|
It looks like the cases that need The most cycles I can find in I would rather optimize for ease of maintenance and not have to understand and carry this bignum code. I'm not sure how "targeting things like WASM are simpler" with this, since WASM already works. Are there other reasons we should want our own bignum? |
|
I agree that ExactFloat performance is not critical, although there are
data sets where it can be dominant. Specifically, it can be used heavily
when there are many data points along perfectly straight lines, e.g. a line
of longitude or the equator. An example would be finding the intersection
or union of two polygons that share a long straight border, where each
border is defined using many vertices but different vertices for each
polygon. That being said, S2 uses various numerical techniques to avoid
using ExactFloat whenever possible, so even in this sort of situation it's
unpredictable how much it will be needed.
BTW I don't think it's worth anyone's effort to dig up the MPFR-based
ExactFloat (which I think had a different name anyway) because of the LGPL
license.
I'm all in favor of removing the OpenSSL dependency if feasible.
Cheers,
Eric
…On Thu, Sep 18, 2025, 07:28 Jesse Rosenstock ***@***.***> wrote:
*jmr* left a comment (google/s2geometry#453)
<#453 (comment)>
It looks like the cases that need ExactFloat are very uncommon. I don't
see any ExactFloat samples when looking for S2Builder::SnapEdge in GWP.
The most cycles I can find in ExactFloat from S2 seem to be because of
ExactCircleEdgeIntersectionSign -> ExactFloat::operator- ->
ExactFloat::SignedSum and it's tiny.
I would rather optimize for ease of maintenance and not have to understand
and carry this bignum code.
I'm not sure how "targeting things like WASM are simpler" with this, since
WASM already works. Are there other reasons we should want our own bignum?
—
Reply to this email directly, view it on GitHub
<#453 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AICG3CXY5KOQAGFYTC2IQXT3TK6SHAVCNFSM6AAAAACGIRXRYWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGMBXG43TSMJTGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
This was admittedly scratching my own itch. You have to have an openssl compiled with wasm installed before you can compile S2 with wasm, and it's known to be obnoxious. I'm travelling now but when I get back I'll see if I can get some estimates on how much smaller the wasm binary is without OpenSSL too. |
It actually wasn't that hard. I had already done it, just hadn't uploaded the results. Base is OpenSSL, experiment is MPFL: Mostly small differences, but I won't be distributing the I will make a new release with the |
|
Managed to get my WASM build up and running again, the output binary is ~10% smaller without OpenSSL in there. I'm sure that effect's larger for just S2 itself too. |
What are the raw numbers? |
|
It's 8.2 MB with and 7.4MB without.
…On Mon, Sep 22, 2025 at 1:52 PM Jesse Rosenstock ***@***.***> wrote:
*jmr* left a comment (google/s2geometry#453)
<#453 (comment)>
the output binary is ~10% smaller without OpenSSL in there
What are the raw numbers?
—
Reply to this email directly, view it on GitHub
<#453 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAGEMKSDGMBI5OZSXUTP5VL3UBHMFAVCNFSM6AAAAACGIRXRYWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTGMRRGIYDAOBZGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
It turns out S2 is no longer the only (or even main) user of Can you:
|
Yeah I'll do all these, I'm employed with Google through October 5th so after that it won't make sense to use my google email though. |
ef7cd7c to
6c83bfe
Compare
|
Should be merged now. |
The cast itself adds 2^32 which is equivalent to taking the absolute value.
|
@ericveach There are ~15 unresolved conversations here where smcallis has responded. Can you (ericv) check that the latest code is ok and either resolve them or request further changes? |
|
This might work (declared just before the function:
…__attribute__((optimize("-fwrapv")))
Alternatively, explicitly writing out a "safe abs" seems to generate the
same code (i.e. a conditional move rather than a branch):
bn_ = Bignum(v < 0 ? -static_cast<unsigned>(x) : x);
This is defined behavior and would only generate I warning if compiled with
-Wsign-conversion. Surely the default warnings don't go that far?
On Thu, Oct 30, 2025 at 8:41 AM Jesse Rosenstock ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/s2/util/math/exactfloat/exactfloat.cc
<#453 (comment)>:
>
-ExactFloat::ExactFloat(const ExactFloat& b)
- : sign_(b.sign_), bn_exp_(b.bn_exp_) {
- BN_copy(bn_.get(), b.bn_.get());
+ if (v == std::numeric_limits<int>::min()) {
Yes, either that or figure out how to apply -fwrapv to this one function.
Since we have a test case for INT_MIN I'm not very worried about this UB.
—
Reply to this email directly, view it on GitHub
<#453 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AICG3CSFCAORRSBNZB7YXWD32IWRDAVCNFSM6AAAAACGIRXRYWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTIMBQGMYDSNBSHE>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@ericveach says he won't have more time for a while (not sure if it applies directly to this) so I'll see if it looks like his comments have been addressed and merge if they have. I'll cut a new release before merging this, though. |
|
If I do find some time, what's the easiest way to see just these unresolved comments? |
What I do:
|
| // Calculates abs(v) without UB. SafeAbs(INT_MIN) == INT_MIN. | ||
| // Generates the same code as std::abs(). | ||
| // https://godbolt.org/z/eT6KW1zGb | ||
| int SafeAbs(int v) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: make return type unsigned and remove cast at call site.
| // noting that the "best" way to implement this is with the Intel MULX, ADCQ, | ||
| // and ADOQ instructions (i.e. the _mulx_u64, _addcarry_u64, and | ||
| // _addcarryx_u64 intrinsics), but GCC and Clang do not support _addcarryx_u64 | ||
| // properly (and have no plans to do so). The issue is that gcc doesn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a link for "have no plans to do so"?
|
|
||
| // We need to add z1*10^half, which we can do by simply adding z1 at a shifted | ||
| // position in the output. | ||
| auto dst_z1 = dst.subspan(half); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less auto in a lot of places. I would like to know what types we have. C++ style is not Rust style.
https://google.github.io/styleguide/cppguide.html#Type_deduction
This removes the dependency on OpenSSL, which we only need for its bignum. The main benefit of this is that targeting things like WASM are simpler without that dependency.
I didn't try as hard for performance optimization as OpenSSL (which also has a hard requirement around constant-time operation for security purposes, thus lots of inline assembly), but benchmarks indicate better performance for small operations (likely due to absl::InlinedVector), and within a factor of two for multiply up to 1024-bits .
The size of exactfloats is largely determined by the number of sequential multiplies that have to be done and we don't have that deep of math. I think the worst case is
A.CrossProd(b).Dot(c)which would be two multiplies deep, which would get us up to 256-bit values (medium size above). So, I don't expect there to be any real perceivable performance difference.