Mixed-precision solver #6521

nrseman · 2025-10-07T16:36:03Z

This PR introduces mixed-precision building blocks for Krylov subspace methods and a highly optimized mixed-precision implementation of ILU0 + BiCGSTAB. Initial testing indicates a 1.6x speed-up per linear iteration with minimal impact on iteration count. For information on how to activate the new solver, see the README.md file in the opm/simulators/linalg/mixed folder.

Looking forward to your feedback.

PS! After rebasing on master yesterday, the SPE10 case I used as one of my performance tests failed endpoint validation due to negative SOGCR. I have not investigated further, but it seems to be a finite-precision issue. You can see my hack to work around the issue in the mixed-dev branch on my fork.

blattms · 2025-10-07T20:27:22Z

jenkins build this please

akva2 · 2025-10-08T06:27:05Z

i see malloc(), yet no free(). i see new[] yet no delete[].

mike drop.

multitalentloes · 2025-10-08T08:42:38Z

1.6 speedup is very promising @nrseman, great work!
I am quite interested in this topic, so I'll ask a few questions if you don't mind (I have not yet looked at the exact implementation).

What is being done in full precision, and what is being done in single (or even lower) precision (linear solver, preconditioner, specific operations, parts of matrices, etc.)?
And was that combination of precisions found experimentally by trial and error, were specific tools used to identify calculations that are not very sensitive to the precision being used, or were other arguments made to arrive at the particular mixed-precision scheme?

nrseman · 2025-10-08T17:55:41Z

i see malloc(), yet no free(). i see new[] yet no delete[].

mike drop.

I have renamed all *_new() methods to *_alloc() and added corresponding *_free() methods which are called from the destructor. @akva2, I don't think I'm using new[] anywhere. If you still see an issue please let me know.

nrseman · 2025-10-08T18:00:01Z

Thank you for your interest and your questions, @multitalentloes. I don't want to start a detailed technical discussion in this PR. I'll contact you separately.

opm/models/discretization/common/tpfalinearizer.hh

opm/simulators/linalg/mixed/bslv.c

multitalentloes

Thank you for the contribution to OPM. I have some surface-level feedback on the code right now, and I have also taken the opportunity to note down some goals for this code further down the line.

Immediate code-fixes:

Compilation error when compiling with CUDA/HIP (see comment)
Compilation error type mismatch during variable assignment (see comment)
Ensure that this PR does not break OPM compilation for ARM CPUs (as they do not support avx2)

General stuff:

Use docstrings, see examples from other files in the codebase
Use clang-format to make it more consistent with the rest of the codebase
Use fmt::print instead of printf

Future work:

Registering the preconditioner in the factory (currently a trivial placeholder). This will make it available to other multi-stage preconditioners in OPM, for instance letting us combine it with the default CPRW preconditioner, which would have a great impact if performance is improved! From what we discussed outside of this PR, this preconditioner is implemented with knowledge of how the BiCGSTAB is performed, and that is something that should be looked into more.
Having these preconditioners registered in the MPI factory as well with wrapBlockPreconditioner so it can be used in parallel simulations

multitalentloes · 2025-11-19T10:06:00Z

opm/simulators/linalg/StandardPreconditioners_serial.hpp

            DUNE_UNUSED_PARAMETER(prm);
            return std::make_shared<MultithreadDILU<M, V, V>>(op.getmat());
        });
+        F::addCreator("mixed-ilu0", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) {


this line produces compilation warnings due to unused variables

Addressed by adding

DUNE_UNUSED_PARAMETER(op);

and will be included in my next update.

multitalentloes · 2025-11-19T10:06:18Z

opm/simulators/linalg/StandardPreconditioners_serial.hpp

+            DUNE_UNUSED_PARAMETER(prm);
+            return std::make_shared<TrivialPreconditioner<V,V>>();
+        });
+        F::addCreator("mixed-dilu", [](const O& op, const P& prm, const std::function<V()>&, std::size_t) {


unused variable warnings here as well

Addressed by adding

DUNE_UNUSED_PARAMETER(op);

and will be included in my next update.

multitalentloes · 2025-11-19T10:07:45Z

opm/simulators/linalg/StandardPreconditioners_serial.hpp

+    virtual void update() override {};
+    virtual bool hasPerfectUpdate() const override {return true;}
+    virtual void pre (X& x, Y& y) override {};
+    virtual void post (X& x) override {};


unused variable warnings for all of these functions that take arguments and have an empty body as well. I think this has been resolved in other places in the code with [[maybe_unused]] or a similar attribute

Resolved by prefixing all arguments with [[maybe_unused]]; this will be part of my next update.

multitalentloes · 2025-11-19T10:11:51Z

opm/simulators/linalg/mixed/wrapper.hpp

+    {
+        int nrows = A.N();
+        int nnz   = A.nonzeroes();
+        int b     = A[0][0].N();


M might be a GpuSparseMatrix when compiling on a machine with a GPU, so this line will cause problems because the [] operator is not available for that class. Probably easiest to do some type logic in the FlexibleSolver to avoid this. Causes a compilation error out of the box for me.

multitalentloes · 2025-11-19T10:29:22Z

opm/models/discretization/common/fvbaselinearizer.hh

+        printf("n = %lu\n",x.dim());
+    }
+
+    void exportSparsity(const char *path='.')


causes a compilation error: '.' is a char, replace it with "." to make it a string/char*

Interestingly, I don't get a compilation error, but your observation is correct and I have made the fix.

multitalentloes · 2025-11-19T10:31:06Z

opm/simulators/linalg/mixed/prec.c

+
+
+/*
+void mat3_view(double const *M, char const *name)


remove dead code?

opm/simulators/linalg/mixed/bslv.c

multitalentloes · 2025-11-19T10:39:04Z

opm/simulators/linalg/mixed/bslv.c

+
+}
+
+int bslv_pbicgstab3m(bslv_memory *mem, bsr_matrix *A, const double *b, double *x)


In general for new functions: Use docstring to document what the function does and what its expected inputs, outputs, and special assumptions are to make the implementation more accessible

@multitalentloes: I'm a bit confused about what you mean by docstring. Do you simply mean a multiline comment, or is it required to adhere to the doxygen standard? I see a mix in the current code. Also, is there a policy on whether the documentation is attached to the function definition or the function declaration?

Ideally it should adhere to the doxygen standard for those who use that. I would guess the declaration is best; at least that is what I have done when the declaration is in a separate header file

multitalentloes · 2025-11-19T10:40:04Z

opm/simulators/linalg/mixed/prec.c

+
+
+
+


Use clang-format on all the files to ensure that things are consistent with existing code. It is easiest to just do it now from the start, to avoid for instance odd whitespace such as these blank lines

opm/models/discretization/common/tpfalinearizer.hh

kjetilly · 2025-11-19T12:24:03Z

opm/simulators/linalg/mixed/wrapper.hpp

+{
+    public:
+
+    MixedSolver(M A, double tol, int maxiter, bool use_dilu)


I am a bit dubious about taking a copy of A here. It seems to work for some inexplicable reason, but why not take a const ref (const M& A)? I think as it stands, the code is storing a pointer (data_) to a memory location that will be deleted after invocation. No idea why this works in practice here, though.

The fact that this works implies that a copy of the matrix object A is a shallow copy. Nevertheless, your solution is better and the change will be included in my next update.

nrseman · 2025-11-19T18:25:43Z

I have addressed all trivial issues and pushed the updates to the PR branch. What remains is the following:

Compile on ARM: Based on conversations with @blattms, the solution will be to perform a compile-time check for avx2 support and conditionally enable the mixed solvers. This implies that the functionality will not be available on ARM for now.
Fix GPU compilation errors: I'd appreciate any help on this as I am currently not set up to compile for GPUs. @kjetilly, @multitalentloes is this something either of you can help me with?
Add function descriptions: Doc strings it is ...
clang-format: I'm saving this to the end.

nrseman · 2025-11-19T18:45:46Z

@blattms: Would the file below work for the compilation test we discussed? I have verified that the code compiles with the following command

gcc -mavx2 -c test_avx2.c

test_avx2.c

#include <immintrin.h>

typedef
struct bsr_matrix
{
    int nrows;
    int ncols;
    int nnz;
    int b;

    int *rowptr;
    int *colidx;
    double *dbl;
    float *flt;

} bsr_matrix;

void bsr_vmspmv3(bsr_matrix *A, const double *x, double *y)
{
    int nrows = A->nrows;
    int *rowptr=A->rowptr;
    int *colidx=A->colidx;
    const float *data=A->flt;

    const int b=3;

    __m256d mm_zeros =_mm256_setzero_pd();
    for(int i=0;i<nrows;i++)
    {
        __m256d vA[3];
        for(int k=0;k<3;k++) vA[k] = mm_zeros;
        for(int k=rowptr[i];k<rowptr[i+1];k++)
        {
            const float *AA=data+9*k;

            int j = colidx[k];
            __m256d vx = _mm256_loadu_pd(x+b*j);

            vA[0] += _mm256_cvtps_pd(_mm_loadu_ps(AA+0))*_mm256_permute4x64_pd(vx,0b00000000); //0b01010101
            vA[1] += _mm256_cvtps_pd(_mm_loadu_ps(AA+3))*_mm256_permute4x64_pd(vx,0b01010101); //0b01010101
            vA[2] += _mm256_cvtps_pd(_mm_loadu_ps(AA+6))*_mm256_permute4x64_pd(vx,0b10101010); //0b01010101
        }

        // sum over columns
        __m256d vy, vz;
        vz = vA[0] + vA[1] + vA[2];

        double *y_i = y+b*i;
        vy = _mm256_loadu_pd(y_i);       // optional blend to keep
        vz =_mm256_blend_pd(vy,vz,0x7);  // 4th element unchanged
        _mm256_storeu_pd(y_i,vz);
    }
}

kjetilly · 2025-11-19T21:11:50Z

@nrseman I think the only thing needed for this to compile with GPU support enabled is the change in FlexibleSolver_impl.hpp here: kjetilly@88cfe8a . Essentially protecting the call to your code in an if constexpr (!is_gpu_operator...), so that the compiler never tries to use your code with a GPU matrix.

nrseman · 2025-11-24T19:23:20Z

@kjetilly: After updating my toolchain I ran into the posix_memalign visibility issue you have mentioned. I am a bit surprised because I was under the impression that all recent versions of glibc make it visible by default. Anyways, my fix is similar to yours and already included among the pushed updates.

blattms · 2025-11-25T07:20:03Z

@blattms: Would the file below work for the compilation test we discussed? I have verified that the code compiles with the following command

Please add a main method to make this compile. We need to be able to run this program.

blattms

I think there should be some kind of test that makes sure that the solver at least runs. Currently, we will only know whether it compiles.

If you do C allocations, then you should probably check the results. I am not sure whether the memory is sufficient for arrays that are later realigned.

Is there any code that checks that:

the blocks have the correct size
that the run is serial and not parallel
etc.

We need to prevent users from misusing this.
I might have missed something of course.

opm/simulators/linalg/mixed/bslv.c

blattms · 2025-11-25T13:53:20Z

opm/simulators/linalg/mixed/bslv.c

@@ -0,0 +1,260 @@
+#define _POSIX_C_SOURCE 200809L // required for posix_memalgin in <stdlib.h>


Any chance we can get rid of this in favor of a compiler argument or requesting a certain C standard?

For some reason, with my toolchain adding -std=c11 does not fix this, but -D_ISOC11_SOURCE does. I removed the define statement from the code, but the macro definition needs to be added by cmake. Right now I do that manually through -DCMAKE_C_FLAGS.

But we do not use posix_memalign anymore. Hence we don't need it. Seems resolved, right?

opm/simulators/linalg/mixed/bslv.c

blattms · 2025-11-25T14:05:03Z

opm/simulators/linalg/mixed/prec.c

+
+prec_t *prec_alloc()
+{
+    prec_t *P = malloc(sizeof(prec_t));


A check seems missing here.

I'm now checking with assert.

blattms · 2025-11-25T14:05:41Z

opm/simulators/linalg/mixed/bsr.c

+
+bsr_matrix* bsr_alloc()
+{
+    bsr_matrix *A=malloc(sizeof(bsr_matrix));


the result of malloc should probably be checked.

I'm now checking with assert.

blattms · 2025-11-25T14:06:31Z

opm/simulators/linalg/mixed/prec.c

+    // offsets for off-diagonal updates
+    int count;
+    count = prec_analyze(U,NULL);
+    P->offsets = malloc(3*(count+1)*sizeof(int));


Check missing...

I'm now checking with assert.

blattms · 2025-11-25T14:19:40Z

opm/simulators/linalg/mixed/bslv.c

+
+bslv_memory *bslv_alloc()
+{
+    bslv_memory *mem = malloc(sizeof(bslv_memory));


Check result?

I'm now checking with assert.

Might be good enough for OPM, because we never compile with -DNDEBUG to not make our simulator too fast 😅 One day that might change, though.

For "normal" projects I would use a check that even triggers in production. Asserts should be used to catch programming errors. This seems to be something that happens if system resources are exhausted.
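The distinction drawn above, between an assert and a check that still triggers when NDEBUG is defined, can be sketched as follows. This is only an illustration under stated assumptions: `demo_memory`, `demo_alloc`, and `demo_free` are hypothetical stand-ins and not the PR's `bslv_memory` API, and reporting to stderr and returning NULL is just one possible policy.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a solver workspace; not the PR's bslv_memory. */
typedef struct demo_memory {
    int n;
    double *work;
} demo_memory;

/* Allocate a workspace for n doubles.
   Failures are reported and signalled with NULL instead of assert(),
   so the check stays active in builds that define NDEBUG. */
demo_memory *demo_alloc(int n)
{
    if (n < 0) return NULL;

    demo_memory *mem = malloc(sizeof *mem);
    if (mem == NULL) {
        fprintf(stderr, "demo_alloc: out of memory\n");
        return NULL;
    }

    mem->n = n;
    mem->work = calloc((size_t)n, sizeof *mem->work);
    if (mem->work == NULL && n > 0) {
        fprintf(stderr, "demo_alloc: out of memory for %d doubles\n", n);
        free(mem);
        return NULL;
    }
    return mem;
}

/* Release a workspace; accepts NULL like free() does. */
void demo_free(demo_memory *mem)
{
    if (mem == NULL) return;
    free(mem->work);
    free(mem);
}
```

A caller on the C++ side could then turn a NULL return into an exception, which keeps the C layer free of abort paths.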

blattms · 2025-11-25T14:22:51Z

opm/simulators/linalg/mixed/bsr.h

+{
+    int nrows;
+    int ncols;
+    int nnz;


Does using int for nnz limit us too much?

Once we start running models with more than 2 billion connections (~25M cells) then yes.
Let us just not lose sight of the fact that we are trying to get a fundamentally new feature into the simulator. Given that the current implementation only allows sequential runs, it is very unlikely that anyone will try to apply it to extremely large models.
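If the int limit is kept, one option would be an explicit guard at setup time. The sketch below is an assumption, not code from the PR: `bsr_index_fits_int` is a hypothetical helper that checks whether every scalar offset of the form `b*b*k`, as used for example in `data+9*k` in the posted kernel, stays within the range of int.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical guard: returns 1 if nnz blocks of b*b scalars can be
   addressed with int offsets (b*b*k for k < nnz), and 0 otherwise. */
int bsr_index_fits_int(size_t nnz, int b)
{
    if (b <= 0) return 0;
    size_t per_block = (size_t)b * (size_t)b;
    return nnz <= (size_t)INT_MAX / per_block;
}
```

For 3x3 blocks this limits nnz to INT_MAX / 9 blocks, so the scalar offset overflows well before the block count itself does.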

blattms · 2025-11-25T14:30:22Z

Unfortunately, you will need to rebase this because of conflicts.

blattms · 2025-11-25T14:46:24Z

This module might get used by others. Please make sure that the new headers are installed by listing them in CMakeLists_files.cmake variable PUBLIC_HEADER_FILES

blattms · 2025-11-25T15:00:49Z

I am also getting some warnings:

/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:505:22: warning: operation on '((Opm::TpfaLinearizer<Opm::Properties::TTag::FlowWaterOnlyProblem>*)this)->Opm::TpfaLinearizer<Opm::Properties::TTag::FlowWaterOnlyProblem>::exportCount_' may be undefined [-Wsequence-point]
  505 |         exportCount_ = exportIndex_==idx ? ++exportCount_ : 0;
      |         ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/build/opm-simulators/opm/models/discretization/common/fvbaselinearizer.hh:316:27: warning: unused parameter 'idx' [-Wunused-parameter]
  316 |     void exportSystem(int idx, char *tag, const char *path="export")
      |                       ~~~~^~~
/build/opm-simulators/opm/models/discretization/common/fvbaselinearizer.hh:316:38: warning: unused parameter 'tag' [-Wunused-parameter]
  316 |     void exportSystem(int idx, char *tag, const char *path="export")
      |                                ~~~~~~^~~
/build/opm-simulators/opm/models/discretization/common/fvbaselinearizer.hh:316:55: warning: unused parameter 'path' [-Wunused-parameter]
  316 |     void exportSystem(int idx, char *tag, const char *path="export")
      |                                           ~~~~~~~~~~~~^~~~~~~~~~~~~
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:507:32: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
  507 |         sprintf(tag,"_%03d_%02d",exportIndex_, exportCount_);
      |                                ^
In member function 'void Opm::TpfaLinearizer<TypeTag>::exportSystem(int, char*, const char*) [with TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]',
    inlined from 'Opm::SimulatorReportSingle Opm::BlackoilModel<TypeTag>::nonlinearIterationNewton(int, const Opm::SimulatorTimerInterface&, NonlinearSolverType&) [with NonlinearSolverType = Opm::NonlinearSolver<Opm::Properties::TTag::FlowGasOilEnergyProblem, Opm::BlackoilModel<Opm::Properties::TTag::FlowGasOilEnergyProblem> >; TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]' at /build/opm-simulators/opm/simulators/flow/BlackoilModel_impl.hpp:300:57:
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:507:16: note: 'sprintf' output between 8 and 25 bytes into a destination of size 16
  507 |         sprintf(tag,"_%03d_%02d",exportIndex_, exportCount_);
      |         ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh: In member function 'Opm::SimulatorReportSingle Opm::BlackoilModel<TypeTag>::nonlinearIterationNewton(int, const Opm::SimulatorTimerInterface&, NonlinearSolverType&) [with NonlinearSolverType = Opm::NonlinearSolver<Opm::Properties::TTag::FlowGasOilEnergyProblem, Opm::BlackoilModel<Opm::Properties::TTag::FlowGasOilEnergyProblem> >; TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]':
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:527:29: warning: '%s' directive writing up to 15 bytes into a region of size between 1 and 256 [-Wformat-overflow=]
  527 |         sprintf(filename,"%s%s.f64",name,tag);
      |                             ^~
In member function 'void Opm::TpfaLinearizer<TypeTag>::exportVector(GlobalEqVector&, const char*, const char*) [with TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]',
    inlined from 'void Opm::TpfaLinearizer<TypeTag>::exportSystem(int, char*, const char*) [with TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]' at /build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:518:21,
    inlined from 'Opm::SimulatorReportSingle Opm::BlackoilModel<TypeTag>::nonlinearIterationNewton(int, const Opm::SimulatorTimerInterface&, NonlinearSolverType&) [with NonlinearSolverType = Opm::NonlinearSolver<Opm::Properties::TTag::FlowGasOilEnergyProblem, Opm::BlackoilModel<Opm::Properties::TTag::FlowGasOilEnergyProblem> >; TypeTag = Opm::Properties::TTag::FlowGasOilEnergyProblem]' at /build/opm-simulators/opm/simulators/flow/BlackoilModel_impl.hpp:300:57:
/build/opm-simulators/opm/models/discretization/common/tpfalinearizer.hh:527:16: note: 'sprintf' output between 5 and 275 bytes into a destination of size 256
  527 |         sprintf(filename,"%s%s.f64",name,tag);
      |         ~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I have not checked this and it might be false positives on architecture ppc64el. This is for a rebased version.
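Whether or not these are false positives, the -Wsequence-point and -Wformat-overflow warnings above are usually easy to silence. The following is a minimal sketch, assuming hypothetical helpers `format_tag` and `next_count` rather than the actual members of TpfaLinearizer: bounded formatting with snprintf, and a counter update that does not modify the variable twice in one expression.

```c
#include <stdio.h>

/* Hypothetical helper: writes "_III_CC" into tag without overrunning it.
   Returns 0 on success and -1 if the output was truncated or failed. */
int format_tag(char *tag, size_t size, int index, int count)
{
    int n = snprintf(tag, size, "_%03d_%02d", index, count);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper replacing the pattern
       count = (index == idx) ? ++count : 0;
   with an expression that reads count once and writes it once. */
int next_count(int count, int index, int idx)
{
    return (index == idx) ? count + 1 : 0;
}
```

The call site would then read `count = next_count(count, index, idx);`, and a nonzero return from `format_tag` can be treated as an error instead of silently writing past a 16-byte buffer.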

blattms · 2025-11-25T15:39:15Z

It seems like I can call the simulator without --matrix-add-well-contributions=true and do not get an error.

What does happen in this case?

Maybe we should enforce that option somehow?

blattms · 2025-11-25T15:50:20Z

opm/simulators/linalg/setupPropertyTree.cpp

+
+    // mixed-precision ILU0
+    if (conf == "mixed-dilu") {
+        return setupMixedDILU(conf, p);


The error message below should probably be adapted.

Also, we need to skip this or fall back to something different once my PR is merged and we do not have avx2.

nrseman · 2025-11-26T16:26:32Z

Fail if number of components is other than three
Fail or use a sensible fall-back if user does not pass --matrix-add-well-contributions=true

These two items have been addressed in my recent push by throwing if either of the above conditions is met.

nrseman · 2025-11-26T16:31:22Z

Check whether there is enough padding for realignment if that is needed

@blattms: I am not sure what you mean here. The only padding occurring is in the initialization of bslv_mem, but it is done up front and there is no realignment. I am simply rounding up the allocations to the nearest multiple of the cache line size. If you have something else in mind, please let me know.
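The rounding described above can be sketched as follows. This is an illustration only: `CACHE_LINE`, `round_up_cache_line`, and `alloc_aligned_doubles` are hypothetical names, the 64-byte line size is an assumption taken from the `aligned_alloc(64, ...)` call quoted elsewhere in this thread, and the sketch assumes a C11 (or later) library that declares aligned_alloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Round nbytes up to the next multiple of CACHE_LINE. */
size_t round_up_cache_line(size_t nbytes)
{
    return (nbytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
}

/* Allocate n doubles on a cache-line boundary; returns NULL on failure.
   The size is padded so it is a multiple of the alignment, which
   aligned_alloc required in the original C11 wording. */
double *alloc_aligned_doubles(size_t n)
{
    if (n > (SIZE_MAX - CACHE_LINE) / sizeof(double)) return NULL; /* overflow guard */
    size_t nbytes = round_up_cache_line(n * sizeof(double));
    if (nbytes == 0) nbytes = CACHE_LINE; /* avoid a zero-size request */
    return aligned_alloc(CACHE_LINE, nbytes);
}
```

Memory obtained this way is released with plain free(), so no separate realignment or bookkeeping step is needed.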

nrseman · 2025-11-26T16:34:26Z

Check whether allocations were successful

@blattms: As mentioned above, all allocations are now followed by asserts to ensure that they were successful.

nrseman · 2025-11-26T16:41:03Z

@multitalentloes: You have been quiet for a very long time. Do you mind summarizing your remaining must have requirements for merging in a check list similar to what @blattms provided above? This PR has been open for almost 2 months. It is time to get this done ...

blattms · 2025-11-26T20:41:28Z

I came across another compiler warning:

/home/mblatt/src/dune/opm/opm-simulators/opm/simulators/linalg/mixed/bslv.c:66:24: warning: implicit declaration of function ‘aligned_alloc’ [-Wimplicit-function-declaration]
   66 |         mem->dtmp[i] = aligned_alloc(64,np*sizeof(double));
      |                        ^~~~~~~~~~~~~
/home/mblatt/src/dune/opm/opm-simulators/opm/simulators/linalg/mixed/bslv.c:66:22: warning: assignment to ‘double *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
   66 |         mem->dtmp[i] = aligned_alloc(64,np*sizeof(double));
      |                      ^

Let's see whether jenkins agrees.

blattms · 2025-11-26T21:04:33Z

Check whether there is enough padding for realignment if that is needed

@blattms: I am not sure what you mean here. The only padding occurring is in the initialization of bslv_mem, but it is done up front and there is no realignment. I am simply rounding up the allocations to the nearest multiple of the cache line size. If you have something else in mind, please let me know.

The comment was for a previous version where you used posix_memalign to change the alignment for A->flt.

blattms · 2025-11-26T21:19:56Z

jenkins build this serial rocm hipify please

blattms · 2025-11-26T21:34:11Z

Compilation with jenkins fails because of forbidden conversions. I'll add a task to my list above.

nrseman · 2025-11-27T00:47:09Z

@blattms: the link to jenkins ci does not work for me

nrseman · 2025-11-27T02:36:29Z

Install headers
Deactivate use of mixed precision code if HAVE_AVX2_EXTENSION is not true

@blattms: I pushed another update that addresses the items above. The changes use HAVE_AVX2_EXTENSION as simple preprocessor guards around the mixed precision code similar to how it is done for other parts of the code. It is not yet hooked up to your avx2 detection in opm-common, and will exclude the mixed precision code by default. You can manually enable mixed precision by passing the -DHAVE_AVX2_EXTENSION=TRUE to cmake during configuration. Be aware that passing -DHAVE_AVX2_EXTENSION=FALSE does not have the desired effect. I do not know cmake well enough to understand why that is the case, but it is not important as long as we can hook this up with your avx2 detection routines from opm-common. I'd appreciate if you have a chance to take a stab at it.

For completeness, I include my full cmake invocation here:

cmake   -D CMAKE_CXX_COMPILER="g++" \
        -D CMAKE_C_COMPILER="gcc" \
        -D CMAKE_C_FLAGS="-O3 -std=c11 -D_ISOC11_SOURCE -march=native" \
        -D CMAKE_VERBOSE_MAKEFILE=1 \
        -D HAVE_AVX2_EXTENSION=TRUE \
     ..

nrseman · 2025-11-27T02:38:53Z

@blattms: Note the -D_ISOC11_SOURCE entry in CMAKE_C_FLAGS above. Without it I get the same implicit declaration warnings for aligned_alloc as you do.

For this we remove the GCC pragmas and set the needed compiler flag when available.

blattms · 2025-11-27T16:32:06Z

You are right, that was a problem with the incorrect C standard.

I cannot persuade CMake to use C11. Only C17 seems to work.

blattms · 2025-11-27T17:39:31Z

I now used OPM/opm-common#4840 and #6638 (which is based on this one with a few fixes and the same as haugenlabs#1) to test compilation. That fails, see https://ci.opm-project.org/job/opm-common-PR-builder/9096/console

The compile error is for a case where the matrix stores float instead of double.

In file included from /var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/FlexibleSolver_impl.hpp:36,
                 from /var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/FlexibleSolver1.cpp:22:
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/mixed/wrapper.hpp: In instantiation of 'Dune::MixedSolver<X, M>::MixedSolver(const M&, double, int, bool) [with X = Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >; M = const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&]':
/usr/include/c++/12/bits/stl_construct.h:119:7:   required from 'void std::_Construct(_Tp*, _Args&& ...) [with _Tp = Dune::MixedSolver<Dune::BlockVector<Dune::FieldVector<float, 1>, allocator<Dune::FieldVector<float, 1> > >, const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, allocator<Opm::MatrixBlock<float, 1, 1> > >&>; _Args = {const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, allocator<Opm::MatrixBlock<float, 1, 1> > >&, const double&, const int&, bool&}]'
/usr/include/c++/12/bits/alloc_traits.h:635:19:   required from 'static void std::allocator_traits<std::allocator<void> >::construct(allocator_type&, _Up*, _Args&& ...) [with _Up = Dune::MixedSolver<Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >, const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&>; _Args = {const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&, const double&, const int&, bool&}; allocator_type = std::allocator<void>]'
/usr/include/c++/12/bits/shared_ptr_base.h:604:39:   required from 'std::_Sp_counted_ptr_inplace<_Tp, _Alloc, _Lp>::_Sp_counted_ptr_inplace(_Alloc, _Args&& ...) [with _Args = {const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&, const double&, const int&, bool&}; _Tp = Dune::MixedSolver<Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >, const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&>; _Alloc = std::allocator<void>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]'
/usr/include/c++/12/bits/shared_ptr_base.h:971:16:   required from 'std::__shared_count<_Lp>::__shared_count(_Tp*&, std::_Sp_alloc_shared_tag<_Alloc>, _Args&& ...) [with _Tp = Dune::MixedSolver<Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >, const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&>; _Alloc = std::allocator<void>; _Args = {const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&, const double&, const int&, bool&}; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]'
/usr/include/c++/12/bits/shared_ptr_base.h:1712:14:   required from 'std::__shared_ptr<_Tp, _Lp>::__shared_ptr(std::_Sp_alloc_shared_tag<_Tp>, _Args&& ...) [with _Alloc = std::allocator<void>; _Args = {const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&, const double&, const int&, bool&}; _Tp = Dune::MixedSolver<Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >, const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&>; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic]'
/usr/include/c++/12/bits/shared_ptr.h:464:59:   required from 'std::shared_ptr<_Tp>::shared_ptr(std::_Sp_alloc_shared_tag<_Tp>, _Args&& ...) [with _Alloc = std::allocator<void>; _Args = {const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&, const double&, const int&, bool&}; _Tp = Dune::MixedSolver<Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >, const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&>]'
/usr/include/c++/12/bits/shared_ptr.h:1009:14:   required from 'std::shared_ptr<typename std::enable_if<(! std::is_array< <template-parameter-1-1> >::value), _Tp>::type> std::make_shared(_Args&& ...) [with _Tp = Dune::MixedSolver<Dune::BlockVector<Dune::FieldVector<float, 1>, allocator<Dune::FieldVector<float, 1> > >, const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, allocator<Opm::MatrixBlock<float, 1, 1> > >&>; _Args = {const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, allocator<Opm::MatrixBlock<float, 1, 1> > >&, const double&, const int&, bool&}; typename enable_if<(! is_array< <template-parameter-1-1> >::value), _Tp>::type = Dune::MixedSolver<Dune::BlockVector<Dune::FieldVector<float, 1>, allocator<Dune::FieldVector<float, 1> > >, const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, allocator<Opm::MatrixBlock<float, 1, 1> > >&>]'
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/FlexibleSolver_impl.hpp:194:88:   required from 'void Dune::FlexibleSolver<Operator>::initSolver(const Opm::PropertyTree&, const Comm&) [with Comm = Dune::Amg::SequentialInformation; Operator = Dune::MatrixAdapter<Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >, Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >, Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > > >]'
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/FlexibleSolver_impl.hpp:300:19:   required from 'void Dune::FlexibleSolver<Operator>::init(Operator&, const Comm&, const Opm::PropertyTree&, std::function<typename Operator::domain_type()>, std::size_t) [with Comm = Dune::Amg::SequentialInformation; Operator = Dune::MatrixAdapter<Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >, Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >, Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > > >; typename Operator::domain_type = Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >; std::size_t = long unsigned int]'
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/FlexibleSolver_impl.hpp:65:13:   required from 'Dune::FlexibleSolver<Operator>::FlexibleSolver(Operator&, const Opm::PropertyTree&, const std::function<typename Operator::domain_type()>&, std::size_t) [with Operator = Dune::MatrixAdapter<Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >, Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >, Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > > >; typename Operator::domain_type = Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >; std::size_t = long unsigned int]'
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/FlexibleSolver1.cpp:27:1:   required from here
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/mixed/wrapper.hpp:54:17: error: cannot convert 'const float*' to 'const double*' in assignment
   54 |         data_ = &A[0][0][0][0];
      |                 ^~~~~~~~~~~~
      |                 |
      |                 const float*
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/mixed/wrapper.hpp: In instantiation of 'void Dune::MixedSolver<X, M>::apply(X&, X&, Dune::InverseOperatorResult&) [with X = Dune::BlockVector<Dune::FieldVector<float, 1>, std::allocator<Dune::FieldVector<float, 1> > >; M = const Dune::BCRSMatrix<Opm::MatrixBlock<float, 1, 1>, std::allocator<Opm::MatrixBlock<float, 1, 1> > >&]':
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/mixed/wrapper.hpp:64:18:   required from here
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/mixed/wrapper.hpp:78:55: error: cannot convert 'float*' to 'const double*'
   78 |         int count = bslv_pbicgstab3m(mem_, jacobian_, &b[0][0], &x[0][0]);
      |                                                       ^~~~~~
      |                                                       |
      |                                                       float*
In file included from /var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/mixed/wrapper.hpp:5:
/var/lib/jenkins/post-builder/workspace/opm-common-PR-builder/deps/opm-simulators/opm/simulators/linalg/mixed/bslv.h:75:70: note:   initializing argument 3 of 'int bslv_pbicgstab3m(bslv_memory*, bsr_matrix*, const double*, double*)'
   75 | int  bslv_pbicgstab3m(bslv_memory *mem, bsr_matrix *A, const double *b, double *x);
      |                                                        ~~~~~~~~~~~~~~^

nrseman · 2025-12-01T16:24:31Z

Just pushed an update with @blattms' changes to support auto-detection of AVX2. Configuration will only succeed when using a version of opm-common that includes the CheckAVX2 CMake module introduced in @blattms' opm-common PR, Added CMake check for AVX2.

nrseman · 2025-12-01T19:46:48Z

@blattms: I added a constexpr guard to skip the mixed-precision solver when single-precision vectors are used. The mixed-precision code block in FlexibleSolver_impl.hpp now looks as follows:

#if HAVE_AVX2_EXTENSION
        } else if (solver_type == "mixed-bicgstab") {
            if constexpr (Opm::is_gpu_operator_v<Operator>) {
                OPM_THROW(std::invalid_argument, "mixed-bicgstab solver not supported for GPU operators");
            } else if constexpr (std::is_same_v<typename VectorType::field_type, float>) {
                OPM_THROW(std::invalid_argument, "mixed-bicgstab solver not supported for single precision.");
            } else {
                const std::string prec_type = prm.get<std::string>("preconditioner.type", "error");
                bool use_mixed_dilu = (prec_type == "mixed-dilu");
                using MatrixType = decltype(linearoperator_for_solver_->getmat());
                linsolver_ = std::make_shared<Dune::MixedSolver<VectorType, MatrixType>>(
                    linearoperator_for_solver_->getmat(),
                    tol,
                    maxiter,
                    use_mixed_dilu);
            }
#endif

The code does the right thing when BUILD_FLOW_FLOAT_VARIANTS is OFF, but fails with UMFPACK related errors when it is ON.

/gnu/store/wffw8rv5fi8jg1lmvbx5gzaqkh4nh2zb-dune-istl-openmpi-2.10.0/include/dune/istl/umfpack.hh: In substitution of ‘template<class M> using Dune::Impl::UMFPackRangeType = typename Dune::Impl::UMFPackVectorChooser<M>::range_type [with M = Dune::BCRSMatrix<Dune::FieldMatrix<float, 4, 4>, std::allocator<Dune::FieldMatrix<float, 4, 4> > >]’:
/gnu/store/wffw8rv5fi8jg1lmvbx5gzaqkh4nh2zb-dune-istl-openmpi-2.10.0/include/dune/istl/umfpack.hh:274:11:   required from ‘class Dune::UMFPack<Dune::BCRSMatrix<Dune::FieldMatrix<float, 4, 4>, std::allocator<Dune::FieldMatrix<float, 4, 4> > > >’
  274 |     using range_type = Impl::UMFPackRangeType<M>;
      |           ^~~~~~~~~~
/home/khaugen/repos/opm/simulators/opm/simulators/wells/MultisegmentWellEquations.cpp:162:60:   required from ‘void Opm::MultisegmentWellEquations<Scalar, IndexTraits, numWellEq, numEq>::apply(const BVector&, BVector&) const [with Scalar = float; IndexTraits = Opm::BlackOilDefaultFluidSystemIndices; int numWellEq = 4; int numEq = 3; BVector = Dune::BlockVector<Dune::FieldVector<float, 3>, std::allocator<Dune::FieldVector<float, 3> > >]’
  162 |     const BVectorWell invDBx = mswellhelpers::applyUMFPack(*duneDSolver_, Bx);
      |                                                            ^~~~~~~~~~~~~
/home/khaugen/repos/opm/simulators/opm/simulators/wells/MultisegmentWellEquations.cpp:476:1:   required from here
  450 |     template class MultisegmentWellEquations<T,BlackOilDefaultFluidSystemIndices,numWellEq,numEq>;                               \
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/gnu/store/wffw8rv5fi8jg1lmvbx5gzaqkh4nh2zb-dune-istl-openmpi-2.10.0/include/dune/istl/umfpack.hh:181:29: error: invalid use of incomplete type ‘struct Dune::Impl::UMFPackVectorChooser<Dune::BCRSMatrix<Dune::FieldMatrix<float, 4, 4>, std::allocator<Dune::FieldMatrix<float, 4, 4> > >, void>’
  181 |     template<class M> using UMFPackRangeType = typename UMFPackVectorChooser<M>::range_type;
      |                             ^~~~~~~~~~~~~~~~
/gnu/store/wffw8rv5fi8jg1lmvbx5gzaqkh4nh2zb-dune-istl-openmpi-2.10.0/include/dune/istl/umfpack.hh:175:12: note: declaration of ‘struct Dune::Impl::UMFPackVectorChooser<Dune::BCRSMatrix<Dune::FieldMatrix<float, 4, 4>, std::allocator<Dune::FieldMatrix<float, 4, 4> > >, void>’
  175 |     struct UMFPackVectorChooser;
      |            ^~~~~~~~~~~~~~~~~~~~
make[2]: *** [CMakeFiles/opmsimulators.dir/build.make:1972: CMakeFiles/opmsimulators.dir/opm/simulators/wells/MultisegmentWellEquations.cpp.o] Error 1
make[2]: Leaving directory '/home/khaugen/repos/opm/simulators/build'
make[1]: *** [CMakeFiles/Makefile2:716: CMakeFiles/opmsimulators.dir/all] Error 2
make[1]: Leaving directory '/home/khaugen/repos/opm/simulators/build'
make: *** [Makefile:149: all] Error 2

I've pushed my latest version to a new branch, mixed-float. It is hard to see how my three lines of code could be the culprit! Are there additional configuration variables I need to set?

nrseman · 2025-12-02T00:53:21Z

This issue seems unrelated to my PR. The current compilation failures occur when applyUMFPack is called from MultisegmentWellEquations.cpp, even though a constexpr guard should prevent UMFPACK from being called when the vector argument is float. What am I doing wrong?
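
For readers following along, here is a general C++ point that may be relevant. It is a sketch, not a diagnosis of this particular build. An `if constexpr` guard only suppresses instantiation of the discarded branch when it sits inside a template and its condition depends on a template parameter. Any type named outside the discarded branch is still instantiated. All names below (`RangeChooser`, `DirectSolver`, `applyGuarded`) are hypothetical stand-ins and are not taken from OPM or Dune.

```cpp
#include <type_traits>

// Hypothetical trait: declared for every type, defined only for double
// (similar in spirit to a solver that has no single-precision backend).
template <class T> struct RangeChooser;
template <> struct RangeChooser<double> { using type = double; };

// Hypothetical direct solver: instantiating DirectSolver<float> is a hard
// error because RangeChooser<float> is an incomplete type.
template <class T>
struct DirectSolver {
    using range_type = typename RangeChooser<T>::type;
    range_type half(range_type x) const { return x / 2; }
};

// The guard works here: the condition depends on the template parameter,
// so for Scalar = float the else-branch is discarded and never instantiated.
template <class Scalar>
double applyGuarded(Scalar x)
{
    if constexpr (std::is_same_v<Scalar, float>) {
        return -1.0; // single precision: skip the direct solver
    } else {
        DirectSolver<Scalar> solver;
        return solver.half(x);
    }
}

// By contrast, the class below cannot be instantiated with Scalar = float,
// however its member functions are guarded, because the data member itself
// names DirectSolver<Scalar>:
//
//   template <class Scalar>
//   struct Equations { DirectSolver<Scalar> solver_; };
```

Calling `applyGuarded(4.0f)` compiles and takes the guarded path, while `applyGuarded(4.0)` uses the solver. If a float instantiation still fails in a setup like this, it is worth checking whether the unsupported type is named somewhere outside the guarded branch.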

nrseman · 2025-12-02T00:54:55Z

@multitalentloes: You have been quiet for a very long time. Do you mind summarizing your remaining must-have requirements for merging in a checklist similar to what @blattms provided above? This PR has been open for almost 2 months. It is time to get this done ...

Bump ...

multitalentloes · 2025-12-02T07:59:52Z

opm/simulators/linalg/mixed/README.md

@@ -0,0 +1,23 @@
+# Mixed-precision linear solvers
+This folder contains mixed-precision building blocks for Krylov subspace methods
+and a highly optimized mixed-precision implementation of ILU0 preconditioned bicgstab.


DILU as well?

multitalentloes · 2025-12-02T08:01:13Z

opm/simulators/linalg/mixed/bslv.h

+
+// Solver memory struct
+typedef
+struct bslv_memory


document this struct too

multitalentloes · 2025-12-02T08:08:44Z

opm/simulators/linalg/mixed/prec.c

+        _mm256_storeu_pd(xi,vz);
+
+        //double z[4];
+        //_mm256_store_pd(z,vz);


some lines of dead code that look like they can be removed

multitalentloes · 2025-12-02T08:25:32Z

opm/simulators/linalg/mixed/README.md

+# Mixed-precision linear solvers
+This folder contains mixed-precision building blocks for Krylov subspace methods
+and a highly optimized mixed-precision implementation of ILU0 preconditioned bicgstab.
+Hopefully, this will inspire the exploration of mixed-precision algorithms in OPM.


Add some more in this readme about what makes it mixed-precision, what is computed and what is stored in what precision. It would make it easier to put in the context of the existing mixed-precision work in OPM and their corresponding publications.
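
As an illustration of the kind of statement such a README could make, here is one common mixed-precision pattern: matrix entries are stored in single precision while vectors and accumulation stay in double precision. This is a generic sketch and not a description of what this PR does. The `CsrMatrixF` struct and the `spmvMixed` function are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR matrix whose values are stored in single precision.
struct CsrMatrixF {
    std::vector<std::size_t> rowPtr; // size = rows + 1
    std::vector<std::size_t> colIdx; // column index of each stored entry
    std::vector<float> val;          // entries stored as float (half the memory traffic)
};

// y = A * x with float storage for A and double accumulation.
std::vector<double> spmvMixed(const CsrMatrixF& A, const std::vector<double>& x)
{
    if (A.rowPtr.empty()) {
        return {};
    }
    const std::size_t rows = A.rowPtr.size() - 1;
    std::vector<double> y(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        double sum = 0.0; // accumulate in double precision
        for (std::size_t k = A.rowPtr[i]; k < A.rowPtr[i + 1]; ++k) {
            // Promote the stored float to double before multiplying.
            sum += static_cast<double>(A.val[k]) * x[A.colIdx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

In a scheme like this the README would state which objects are stored in which precision and in which precision each kernel accumulates.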

multitalentloes · 2025-12-02T08:29:28Z

opm/simulators/linalg/mixed/prec.c

+    for(int k=0;k<9;k++) C[k]=M[k];
+}
+
+void mat3_matfms(double *C, const double *A, const double *B)


this and other matrix helper functions are not documented

Maybe also extract utility functions such as this one to a separate file, as it is not really specific to the precs themselves?

multitalentloes · 2025-12-02T08:58:20Z

Sorry for the delay.

Here is an updated list of things that should be in place before merging:

- More documentation. I highlighted some functions, structs, etc. that must be documented; I would also like the use of compiler intrinsics to be explained so the code is more accessible to all developers.
- Use fmt for printing output; it is more robust and widely used in OPM. Currently printf is used.
- OPM compiles fine on machines without AVX2 (has this been tested? Does it depend on Added CMake check for AVX2 opm-common#4840?).
- clang-format must also be run before merging.

With these resolved, we can get this preliminary version merged; it should then be expanded upon in the way discussed earlier in this PR.
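
To make the requests about documenting intrinsics and building without AVX2 concrete, here is a minimal sketch of a commented kernel with a scalar fallback. It is not code from this PR. The `axpy` function is a hypothetical example, and it relies only on the `__AVX2__` macro that GCC and Clang define when AVX2 code generation is enabled.

```cpp
#include <cstddef>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// y[i] += a * x[i]. Uses 256-bit vector intrinsics when the compiler targets
// AVX2 and plain scalar code otherwise, so the file builds on any machine.
void axpy(double a, const double* x, double* y, std::size_t n)
{
    std::size_t i = 0;
#if defined(__AVX2__)
    // The intrinsics used here are AVX instructions, which AVX2 implies.
    const __m256d va = _mm256_set1_pd(a);              // broadcast a into 4 lanes
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);           // unaligned load of 4 doubles
        __m256d vy = _mm256_loadu_pd(y + i);           // unaligned load of 4 doubles
        vy = _mm256_add_pd(vy, _mm256_mul_pd(va, vx)); // vy += a * vx, lane by lane
        _mm256_storeu_pd(y + i, vy);                   // unaligned store of 4 doubles
    }
#endif
    // Remainder loop; this is the whole loop when AVX2 is not enabled.
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

A one-line comment per intrinsic, as above, is usually enough for developers unfamiliar with SIMD to follow the data flow.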

blattms added the manual:enhancement label (This is an enhancement/improvement that needs to be documented in the manual) Oct 7, 2025

akva2 reviewed Oct 8, 2025

opm/models/discretization/common/tpfalinearizer.hh (outdated)

kjetilly reviewed Oct 15, 2025

opm/simulators/linalg/mixed/bslv.c (outdated)

multitalentloes requested changes Nov 19, 2025

kjetilly reviewed Nov 19, 2025

nrseman force-pushed the mixed-pr branch from 8a7b436 to eb60cda November 19, 2025 15:45

blattms mentioned this pull request Nov 25, 2025

Added CMake check for AVX2 OPM/opm-common#4840 (open)

blattms requested changes Nov 25, 2025

blattms reviewed Nov 25, 2025

kjetil added 7 commits November 25, 2025 12:03

export: poc inline implementation for matrices and vectors

0d37b86

export: move poc implementation to tpfalinearizer.hh

341e092

export: seemingly working implementation

0b11a63

export: deactivate export code for now

29008b7

bench: a handful of norne benchmark results

02538ed

solve: establishing trivial entry point

969e957

solve: populating bsr_matrix object

4c2a177

mixed: throw if well contributions are not added to matrix

3b62234

mixed: prepare for excplicit checking for avx2 support

28eedc3

mixed: add public header files

f4a8ff7

blattms added 2 commits November 27, 2025 13:26

Check for AVX2 support.

f34f3be

Compile mixed precision code only if AVX2 is supported.

40e8839

For this we remove the GCC pragmas and set the needed compiler flag when available.

Force a C standard that provides aligned_alloc

651d71f

I cannot persuade CMake to use C11. Only C17 seems to work.
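
For context on the allocation concern, here is a minimal C++17 sketch of pairing an aligned allocation with its matching free so that the memory cannot leak. It is not the PR's C code. `FreeDeleter`, `AlignedDoubles`, and `allocAligned` are hypothetical names, and `std::aligned_alloc` is assumed to be available, which is not the case on every platform.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Deleter so that memory obtained from std::aligned_alloc is released
// with std::free, as the standard requires.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};
using AlignedDoubles = std::unique_ptr<double[], FreeDeleter>;

// Hypothetical helper: allocate n doubles on a 32-byte boundary, the
// alignment needed for aligned 256-bit loads and stores.
AlignedDoubles allocAligned(std::size_t n)
{
    constexpr std::size_t alignment = 32;
    // aligned_alloc requires the size to be a multiple of the alignment.
    std::size_t bytes = n * sizeof(double);
    bytes = (bytes + alignment - 1) / alignment * alignment;
    if (bytes == 0) {
        bytes = alignment;
    }
    return AlignedDoubles(static_cast<double*>(std::aligned_alloc(alignment, bytes)));
}
```

The same pairing applies in plain C: every `aligned_alloc` or `posix_memalign` call needs a matching `free` on every exit path.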

blattms mentioned this pull request Nov 27, 2025

Feature/compile mixedprec only if avx2 is supported #6638

Draft
