Draft
82 commits
0b97da9
Unfinished changes with prototype function
arpitj1 Jun 6, 2024
69ef423
Loop over linalg.generic's input and output ops
arpitj1 Jun 6, 2024
7678a05
Some comments
arpitj1 Jun 6, 2024
0e88095
Partial changes from coding session to implement fusion of linalg.gen…
arpitj1 Jun 11, 2024
b57c0b8
Incremental changes to fuse linalg and for loop- Logic for shifted op…
arpitj1 Jun 19, 2024
f54c33d
ran clang format
arpitj1 Jun 25, 2024
56e2c54
some compile time fixes
arpitj1 Jun 25, 2024
e253040
Some compile fixes
arpitj1 Jul 2, 2024
e99b8a5
Fixed all the compilation issues. Sample MLIR not raised
arpitj1 Jul 3, 2024
34f595c
Bug fixes, generating some output at getLinalgArgMap
arpitj1 Jul 16, 2024
05bad97
Almost implementated remap in affine dim for multi idx
arpitj1 Jul 17, 2024
5bbf5ef
Added submap op support and refactored the code to use submap
arpitj1 Jul 24, 2024
9018d92
bunch of fixes. Now able to generate raise linalg code
arpitj1 Jul 30, 2024
ec041a0
Now almost working second loop raising to linalg
arpitj1 Jul 31, 2024
23138fc
Fixes to correctly raise 2 level for loops to linalg.generic
arpitj1 Jul 31, 2024
5f20bd7
Missed file update to enable linalg dialect in polygeist
arpitj1 Jul 31, 2024
b0e96aa
Fix for syms and dims calculation
arpitj1 Aug 6, 2024
ea76f0a
More tests added to cover different loop cases
arpitj1 Aug 7, 2024
591c84e
Now able to compile 3/any number of loops with parallel iter type; Ad…
arpitj1 Aug 7, 2024
b0108e3
Non iter-arg variant of matrix-mul and conv are now raised to linalg.…
arpitj1 Aug 7, 2024
4362c80
submap canonicalizer implemented
arpitj1 Aug 21, 2024
77c8168
Added reduction loops for linalg
arpitj1 Aug 22, 2024
98f0119
Fix for incorrect for loop dims
arpitj1 Aug 28, 2024
59eec0b
Linalg.generic 4 loop cases raised- todo: reduction and some if-else …
arpitj1 Sep 5, 2024
a363f13
Adding test case for all passing raising and lowering, example case o…
arpitj1 Sep 18, 2024
814ca51
Added pass remove iter args from scf; Added psuedo code for submap ca…
arpitj1 Oct 12, 2024
701f25a
Added removal of iter_args for affine loops
arpitj1 Oct 12, 2024
d285fb5
Temporary reverted pass registeration as the code was failing
arpitj1 Oct 12, 2024
c40e7a9
WIP commit
arpitj1 Oct 15, 2024
788a3c4
Added submap of submap canonicalizer with test- failing
arpitj1 Oct 18, 2024
8265216
Added canonicalization for linalg with submap and test cases
arpitj1 Oct 25, 2024
532773a
Added modified 2d kernel for harris score- raised successfully to lin…
arpitj1 Oct 25, 2024
e2b4b2d
Added harris score kernel with gradient kernel- just to be able to ra…
arpitj1 Oct 25, 2024
f2ab09e
Initial working implementation of debufferize flow for linalg with ex…
arpitj1 Jan 13, 2025
2342381
Added more complex case to show debufferization ; Fixed bugs in debuf…
arpitj1 Jan 13, 2025
fde88fe
Fixed clang format
arpitj1 Jan 13, 2025
cf9f953
Ran git clang format locally to fix regression failures
arpitj1 Jan 13, 2025
f10c47a
Working implementation for function args memrefType with noinline att…
arpitj1 Jan 17, 2025
490f924
Added debufferization Alloc Removal pass, add working examples with l…
arpitj1 Jan 17, 2025
e20708c
Added support for debufferization across nested regions - working for…
arpitj1 Jan 31, 2025
4a7efe7
Bug fix for erasing the op correctly
arpitj1 Jan 31, 2025
6d8832f
Bug fixes for 1. recursive parent search in sorting users 2. traversi…
arpitj1 Jan 31, 2025
6ca2aeb
Added cases of buffer capture which doesn't debufferize
arpitj1 Jan 31, 2025
803ec30
Canonicalization gets rid of memref capture by loop
arpitj1 Feb 1, 2025
fb0ac18
Working implementation for scf.for op and scf.if op; added bug fix to…
arpitj1 Feb 7, 2025
0472c34
Added data structures to track expandedUsers that can include for loo…
arpitj1 Feb 7, 2025
3272f2c
Added logic in for loop case to find all users of iter_args and updat…
arpitj1 Feb 8, 2025
da2ae5b
Added a bunch of tests with nested regions- all getting connected and…
arpitj1 Feb 8, 2025
a570c1b
Added more complex region cases with mix of if-else statements
arpitj1 Feb 8, 2025
7ee707b
Generic solver to represent linalg.generic as kernel.def ops
arpitj1 May 8, 2025
c8561b4
Adding cases for generic solver
arpitj1 May 12, 2025
07d0dcb
Backup of previous edits
arpitj1 May 28, 2025
009ab9b
Temp changes for kernel dialect
Jun 11, 2025
c0f36d3
Enabled kernel dialect correctly running on sample IR with kernel def…
Jun 11, 2025
6a67379
Added linalgToKernel pass- compile failure
arpitj1 Jun 12, 2025
7f9d00f
Working pattern matching and replacement for linalg generics
arpitj1 Jun 12, 2025
d765bb9
Partial changes for different files for kernel and input
arpitj1 Jun 12, 2025
15ef84e
Crash fix
arpitj1 Jun 13, 2025
44fed6c
Improved lib
arpitj1 Jun 26, 2025
4a95c7f
Removing redundant file
arpitj1 Jun 26, 2025
f1e5f02
Renamed kernel lib
arpitj1 Jun 26, 2025
e941c5e
Added min_abs_index test
arpitj1 Jun 26, 2025
a99fad9
Fixed a bunch of bugs in raiseToLinalg while raising polybench
arpitj1 Jun 27, 2025
4e782d5
Fixed raise to linalg and canonicalizer to generate subview
arpitj1 Jun 28, 2025
bd15b6d
Fixed submap simplification, improved raisedToLinalg to work with non…
arpitj1 Jul 31, 2025
cb34836
Added parallel fission pass
arpitj1 Aug 1, 2025
53c5d14
Added pattern for parallel to seq for loops
arpitj1 Aug 1, 2025
60b81d2
Added raise-to-linalg-pipeline
arpitj1 Aug 1, 2025
7b2f5d9
Added linalgGenericEliminateSubmaps and commented out submapToSubviewOp
arpitj1 Aug 1, 2025
71e441f
Canonicalization fix
arpitj1 Aug 1, 2025
e421a86
bug fix for non nullptr in submap creation
arpitj1 Aug 1, 2025
56724a5
Fix in linalg debufferizer - failure return and only insert memref.co…
arpitj1 Aug 1, 2025
c3c2700
improved matcher to create a dependency graph and use it for matching
arpitj1 Aug 1, 2025
ca12291
Runtime failure but match happening correctly to kernel dialect
arpitj1 Aug 3, 2025
7c204f2
Working match for linalg kernel match for gemm
arpitj1 Aug 3, 2025
37dd847
Added debug prints
arpitj1 Aug 3, 2025
7e3f0d0
Able to raise gemv
arpitj1 Aug 4, 2025
3b56eb3
blas C codes- for raising to linalg
arpitj1 Oct 15, 2025
fa99aa8
Debug prints for RaiseTolinalg and 2. SelectFunc pass to process just…
arpitj1 Oct 15, 2025
ed30a14
Update RemoveIterArgs to work with chain ops before store for affine.for
arpitj1 Oct 17, 2025
a816708
Added int op support
arpitj1 Oct 17, 2025
0edd38e
remote iter args improved and test added
arpitj1 Oct 17, 2025
74 changes: 74 additions & 0 deletions blas/dasum.c
@@ -0,0 +1,74 @@
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

// DASUM: Sum of absolute values
// result = sum(|x[i]|)
// x: vector of length N with stride incx
double dasum(int N, const double* x, int incx) {
    double result = 0.0;

    for (int i = 0; i < N; i++) {
        result += fabs(x[i * incx]);
    }

    return result;
}

// Simple version (stride = 1)
double simple_dasum(int N, const double* x) {
    double result = 0.0;

    for (int i = 0; i < N; i++) {
        result += fabs(x[i]);
    }

    return result;
}

// Single precision version
float sasum(int N, const float* x, int incx) {
    float result = 0.0f;

    for (int i = 0; i < N; i++) {
        result += fabsf(x[i * incx]);
    }

    return result;
}

void print_vector(const double* x, int N, const char* name) {
    printf("%s: [", name);
    for (int i = 0; i < N; i++) {
        printf("%.1f", x[i]);
        if (i < N - 1) printf(", ");
    }
    printf("]\n");
}

int main() {
    const int N = 6;

    double x[] = {1.0, -2.0, 3.0, -4.0, 5.0, -6.0};

    printf("ASUM Test: sum of absolute values\n");
    print_vector(x, N, "x");

    double result = simple_dasum(N, x);

    printf("\nasum(x) = %.1f\n", result);

    printf("\nManual verification:\n");
    printf("|1.0| + |-2.0| + |3.0| + |-4.0| + |5.0| + |-6.0|\n");
    printf("= 1.0 + 2.0 + 3.0 + 4.0 + 5.0 + 6.0\n");
    printf("= 21.0\n");

    // Test with stride
    printf("\n\nTesting with stride=2 (every other element):\n");
    double result_stride = dasum(3, x, 2);
    printf("asum(x[::2]) = %.1f\n", result_stride);
    printf("Manual: |%.1f| + |%.1f| + |%.1f| = %.1f\n",
           x[0], x[2], x[4], fabs(x[0]) + fabs(x[2]) + fabs(x[4]));

    return 0;
}
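A possible follow-up for the strided variant in blas/dasum.c: the loop assumes `incx > 0`. Reference BLAS DASUM returns 0 when `n <= 0` or `incx <= 0`. The block below is a minimal sketch of that guard, not code from this change; `dasum_guarded` is a hypothetical name, and the `ptrdiff_t` cast is an added assumption to keep the index arithmetic out of `int` range issues.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

// Hypothetical helper (not part of this PR): same accumulation loop as
// dasum, plus an early return for non-positive n or incx.
double dasum_guarded(int n, const double *x, int incx) {
    if (n <= 0 || incx <= 0) {
        return 0.0;  // nothing to sum
    }
    assert(x != NULL);

    double result = 0.0;
    for (int i = 0; i < n; i++) {
        // Widen the index before multiplying by the stride.
        result += fabs(x[(ptrdiff_t)i * incx]);
    }
    return result;
}
```

With the test vector from `main`, `dasum_guarded(3, x, 2)` sums `x[0]`, `x[2]` and `x[4]`, while a call with `incx = -1` returns 0 without touching `x`.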
78 changes: 78 additions & 0 deletions blas/daxpy.c
@@ -0,0 +1,78 @@
#include <stdio.h>
#include <stdlib.h>

// DAXPY: Constant times a vector plus a vector
// y = alpha * x + y
// x: vector of length N with stride incx
// y: vector of length N with stride incy (modified in place)
// alpha: scaling factor
void daxpy(int N, double alpha, const double* x, int incx, double* y, int incy) {
    for (int i = 0; i < N; i++) {
        y[i * incy] += alpha * x[i * incx];
    }
}

// Simple version (stride = 1)
void simple_daxpy(int N, double alpha, const double* x, double* y) {
    for (int i = 0; i < N; i++) {
        y[i] += alpha * x[i];
    }
}

// Single precision version
void saxpy(int N, float alpha, const float* x, int incx, float* y, int incy) {
    for (int i = 0; i < N; i++) {
        y[i * incy] += alpha * x[i * incx];
    }
}

void print_vector(const double* x, int N, const char* name) {
    printf("%s: [", name);
    for (int i = 0; i < N; i++) {
        printf("%.2f", x[i]);
        if (i < N - 1) printf(", ");
    }
    printf("]\n");
}

int main() {
    const int N = 5;
    const double alpha = 2.0;

    double x[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double y[] = {10.0, 20.0, 30.0, 40.0, 50.0};

    printf("AXPY Test: y = alpha * x + y\n");
    printf("alpha = %.2f\n", alpha);
    print_vector(x, N, "x");
    print_vector(y, N, "y (before)");

    // Apply axpy
    simple_daxpy(N, alpha, x, y);

    print_vector(y, N, "y (after)");

    printf("\nManual verification:\n");
    printf("y[0] = 2.0*1.0 + 10.0 = 12.00\n");
    printf("y[1] = 2.0*2.0 + 20.0 = 24.00\n");
    printf("y[2] = 2.0*3.0 + 30.0 = 36.00\n");
    printf("y[3] = 2.0*4.0 + 40.0 = 48.00\n");
    printf("y[4] = 2.0*5.0 + 50.0 = 60.00\n");

    // Test with stride
    printf("\n\nTesting with stride=2:\n");
    double x2[] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0};
    double y2[] = {100.0, 200.0, 300.0, 400.0, 500.0, 600.0};

    printf("x: [1, 2, 3, 4, 5, 6]\n");
    printf("y (before): [100, 200, 300, 400, 500, 600]\n");
    printf("Computing: y[::2] += 10.0 * x[::2]\n");

    daxpy(3, 10.0, x2, 2, y2, 2); // y[0,2,4] += 10*x[0,2,4]

    printf("y (after): [%.1f, %.1f, %.1f, %.1f, %.1f, %.1f]\n",
           y2[0], y2[1], y2[2], y2[3], y2[4], y2[5]);
    printf("Expected: [110.0, 200.0, 330.0, 400.0, 550.0, 600.0]\n");

    return 0;
}
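The strided `daxpy` in blas/daxpy.c indexes with `i * incx` and `i * incy`, which is only meaningful for non-negative increments. Reference BLAS DAXPY also accepts negative increments, starting from offset `(1 - n) * inc` so the vector is walked backwards, and returns early when `alpha` is zero. The block below is a rough sketch of that behaviour, not code from this change; `daxpy_signed` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical variant (not part of this PR): y += alpha * x with
// increments of either sign.
void daxpy_signed(int n, double alpha, const double *x, int incx,
                  double *y, int incy) {
    if (n <= 0 || alpha == 0.0) {
        return;  // no elements, or the update would be a no-op
    }
    assert(x != NULL && y != NULL);

    // For a negative increment, start at the last logical element so that
    // stepping by the increment walks back toward index 0.
    ptrdiff_t ix = incx < 0 ? (ptrdiff_t)(1 - n) * incx : 0;
    ptrdiff_t iy = incy < 0 ? (ptrdiff_t)(1 - n) * incy : 0;

    for (int i = 0; i < n; i++) {
        y[iy] += alpha * x[ix];
        ix += incx;
        iy += incy;
    }
}
```

For positive increments this visits the same elements as the `daxpy` loop in the file. With `incx = -1` and `incy = 1`, `x` is read in reverse while `y` is written front to back.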
76 changes: 76 additions & 0 deletions blas/dcopy.c
@@ -0,0 +1,76 @@
#include <stdio.h>
#include <stdlib.h>

// DCOPY: Copy vector x to vector y
// y = x
// x: source vector of length N with stride incx
// y: destination vector of length N with stride incy
void dcopy(int N, const double* x, int incx, double* y, int incy) {
    for (int i = 0; i < N; i++) {
        y[i * incy] = x[i * incx];
    }
}

// Simple version (stride = 1)
void simple_dcopy(int N, const double* x, double* y) {
    for (int i = 0; i < N; i++) {
        y[i] = x[i];
    }
}

// Single precision version
void scopy(int N, const float* x, int incx, float* y, int incy) {
    for (int i = 0; i < N; i++) {
        y[i * incy] = x[i * incx];
    }
}

void print_vector(const double* x, int N, const char* name) {
    printf("%s: [", name);
    for (int i = 0; i < N; i++) {
        printf("%.1f", x[i]);
        if (i < N - 1) printf(", ");
    }
    printf("]\n");
}

int main() {
    const int N = 5;

    double x[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double y[5] = {0.0, 0.0, 0.0, 0.0, 0.0};

    printf("COPY Test\n");
    print_vector(x, N, "x (source)");
    print_vector(y, N, "y (before)");

    // Copy x to y
    simple_dcopy(N, x, y);

    print_vector(y, N, "y (after)");

    // Verify
    printf("\nVerification: ");
    int correct = 1;
    for (int i = 0; i < N; i++) {
        if (x[i] != y[i]) {
            correct = 0;
            break;
        }
    }
    printf("%s\n", correct ? "PASS" : "FAIL");

    // Test with stride
    printf("\n\nTesting with stride:\n");
    double src[] = {10.0, 20.0, 30.0, 40.0, 50.0, 60.0};
    double dst[6] = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0};

    printf("Source: [10, 20, 30, 40, 50, 60]\n");
    printf("Copying every other element (incx=2) to every position (incy=1):\n");
    dcopy(3, src, 2, dst, 1); // Copy src[0,2,4] to dst[0,1,2]
    printf("Result: [%.1f, %.1f, %.1f, %.1f, %.1f, %.1f]\n",
           dst[0], dst[1], dst[2], dst[3], dst[4], dst[5]);
    printf("Expected: [10.0, 30.0, 50.0, 0.0, 0.0, 0.0]\n");

    return 0;
}
79 changes: 79 additions & 0 deletions blas/ddot.c
@@ -0,0 +1,79 @@
#include <stdio.h>
#include <stdlib.h>

// DDOT: Compute dot product of two vectors
// result = sum(x[i] * y[i])
// x: vector of length N with stride incx
// y: vector of length N with stride incy
double ddot(int N, const double* x, int incx, const double* y, int incy) {
    double result = 0.0;

    for (int i = 0; i < N; i++) {
        result += x[i * incx] * y[i * incy];
    }

    return result;
}

// Simple version (stride = 1)
double simple_ddot(int N, const double* x, const double* y) {
    double result = 0.0;

    for (int i = 0; i < N; i++) {
        result += x[i] * y[i];
    }

    return result;
}

// Single precision version
float sdot(int N, const float* x, int incx, const float* y, int incy) {
    float result = 0.0f;

    for (int i = 0; i < N; i++) {
        result += x[i * incx] * y[i * incy];
    }

    return result;
}

int main() {
    const int N = 5;
    double x[] = {1.0, 2.0, 3.0, 4.0, 5.0};
    double y[] = {2.0, 3.0, 4.0, 5.0, 6.0};

    printf("DOT Product Test\n");
    printf("x: [");
    for (int i = 0; i < N; i++) {
        printf("%.1f ", x[i]);
    }
    printf("]\n");

    printf("y: [");
    for (int i = 0; i < N; i++) {
        printf("%.1f ", y[i]);
    }
    printf("]\n\n");

    // Test simple version
    double result = simple_ddot(N, x, y);
    printf("dot(x, y) = %.1f\n", result);

    // Manual verification
    double manual = 0.0;
    for (int i = 0; i < N; i++) {
        manual += x[i] * y[i];
        printf("  %.1f * %.1f = %.1f\n", x[i], y[i], x[i] * y[i]);
    }
    printf("Expected: %.1f, Actual: %.1f\n\n", manual, result);

    // Test with stride
    printf("Testing with stride=2 (every other element):\n");
    double result_stride = ddot(3, x, 2, y, 2);
    printf("dot(x[::2], y[::2]) = %.1f\n", result_stride);
    printf("Manual: %.1f*%.1f + %.1f*%.1f + %.1f*%.1f = %.1f\n",
           x[0], y[0], x[2], y[2], x[4], y[4],
           x[0]*y[0] + x[2]*y[2] + x[4]*y[4]);

    return 0;
}
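The stride test in blas/ddot.c prints `dot(x[::2], y[::2])`, meaning the strided call should agree with a dense dot product over the elements at indices 0, 2, 4. One way to check that relationship is to gather the strided elements into contiguous buffers and compare. The block below is a sketch under that reading, not code from this change; `gather`, `dot_dense`, `dot_strided` and `ddot_stride_matches_dense` are hypothetical names, and positive strides are assumed.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: copy n elements of src, taken every inc positions
// (inc > 0 assumed), into the contiguous buffer dst.
static void gather(int n, const double *src, int inc, double *dst) {
    for (int i = 0; i < n; i++) {
        dst[i] = src[(ptrdiff_t)i * inc];
    }
}

// Dense dot product, same loop shape as simple_ddot in the file.
static double dot_dense(int n, const double *x, const double *y) {
    double result = 0.0;
    for (int i = 0; i < n; i++) {
        result += x[i] * y[i];
    }
    return result;
}

// Strided dot product, same loop shape as ddot in the file.
static double dot_strided(int n, const double *x, int incx,
                          const double *y, int incy) {
    double result = 0.0;
    for (int i = 0; i < n; i++) {
        result += x[(ptrdiff_t)i * incx] * y[(ptrdiff_t)i * incy];
    }
    return result;
}

// Returns 1 when the strided result equals the dense result computed on
// gathered copies. scratch must hold at least 2 * n doubles.
int ddot_stride_matches_dense(int n, const double *x, int incx,
                              const double *y, int incy, double *scratch) {
    assert(n >= 0 && incx > 0 && incy > 0);
    double *xd = scratch;
    double *yd = scratch + n;
    gather(n, x, incx, xd);
    gather(n, y, incy, yd);
    return dot_strided(n, x, incx, y, incy) == dot_dense(n, xd, yd);
}
```

Both paths multiply and accumulate the same values in the same order, so an exact comparison is reasonable for small test inputs like the ones in `main`; a tolerance may be preferable if the two loops are compiled with different floating-point settings.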