@pradeeptrgit
Collaborator

No description provided.

Phanikumar and others added 30 commits April 15, 2024 16:28
AMD Internal : [CPUPL-4598]

Change-Id: Id7fb339ecf3efa2535cf88807773ca928bdbe41c
Test suite files have been reformatted using a customized clang-format tool. A Python script that performs the formatting is also added.

Script usage:
python/python3 fla_format_code.py <path to folder or specific file>

Examples:
python fla_format_code.py test/main/src
python fla_format_code.py test/main/src/test_getrf.c

AMD-Internal: CPUPL-4751
Signed-off-by: tprnaidu <TPratap.Naidu@amd.com>

Change-Id: I46a4e2a441309517bae6d27c014a699572384f76
Rename global_thread_mutex to fla_global_thread_mutex. Make it
static as it is used only in FLA_Context.c

AMD-Internal: CPUPL-4957
Change-Id: I7a9d4273c2c2203ef7ef553a01d1075afa245dbc
details: The netlib CMake file uses the cmake_path command, which is available from CMake 3.20.0 onward
AMD-Internal: [CPUPL-4890]
Signed-off-by: ksaithar <katteboina.saitharun@amd.com>
Change-Id: Ib14c7cbd2ee4ad296c37703d9895558254dfbe94
details: Added extreme value test cases
AMD-Internal: [CPUPL-4768]
Signed-off-by: ksaithar <katteboina.saitharun@amd.com>
Change-Id: Ia648037ca7a191ceec660ecaaf129d577ab710be
Added new test API to verify LAPACK GELSS API functionality
Disabled row space checking test for gelss, gels, gelsd

Signed-off-by: vprasada <vprasada@amd.com>
Change-Id: I5d0ffd416dcb6d2e0412f57094c4ee383a6a8908
Remove variables in code paths where they are set but not used

Signed-off-by: Vibhav Gupta <Vibhav.Gupta@amd.com>
AMD-Internal: CPUPL-4361
Change-Id: If5b1858a178a63bb4b5ba90eb354a562e6fb56fd
…riable

Added support for the environment variable AOCL_ENABLE_INSTRUCTIONS, which lets
users select the ISA code path used by optimized functions. If the user
chooses a higher-level ISA than the target CPU supports, the best
supported architecture on the target CPU is used. If the user chooses a
lower-level ISA, that ISA is used. Any ISA selection lower than AVX2 defaults
to the generic reference code path.

Valid values for AOCL_ENABLE_INSTRUCTIONS: SSE2, AVX, AVX2, AVX512 and GENERIC.
All values are case-insensitive.
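
The selection rules above can be modelled in a few lines. The following is a minimal Python sketch, not the library's implementation: the function name `resolve_isa`, the `cpu_max` argument, the ordering of the ISA levels, and the behaviour for an unset or unrecognised value are all assumptions made for illustration.

```python
import os

# ISA levels ordered from lowest to highest. The ordering is an assumption
# inferred from the commit text, not taken from the library source.
ISA_ORDER = ["GENERIC", "SSE2", "AVX", "AVX2", "AVX512"]


def resolve_isa(requested, cpu_max):
    """Return the code path to use for a requested ISA on a given CPU.

    requested: value of AOCL_ENABLE_INSTRUCTIONS, or None if unset.
    cpu_max:   highest ISA the target CPU supports (hypothetical input).
    """
    cpu_rank = ISA_ORDER.index(cpu_max.upper())
    if requested is None or requested.upper() not in ISA_ORDER:
        # Assumption: an unset or unrecognised value means "best supported".
        chosen = cpu_rank
    else:
        # A request above the CPU's capability is clamped to the best supported.
        chosen = min(ISA_ORDER.index(requested.upper()), cpu_rank)
    # Anything below AVX2 falls back to the generic reference path.
    if chosen < ISA_ORDER.index("AVX2"):
        return "GENERIC"
    return ISA_ORDER[chosen]


# Example: resolve the path for a CPU whose best ISA is assumed to be AVX2.
print(resolve_isa(os.environ.get("AOCL_ENABLE_INSTRUCTIONS"), "AVX2"))
```

Values are upper-cased before lookup, which mirrors the case-insensitive handling described in the commit.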

AMD-Internal: CPUPL-4611

Change-Id: I780278c6a2ebe12ce0e61310917b9d666c8111f5
…t inputs of gbtrf and gbtrs APIs

details: Added early return and incorrect input value test cases
AMD-Internal: [CPUPL-4981] [CPUPL-4982]
Signed-off-by: ksaithar <katteboina.saitharun@amd.com>
Change-Id: Ic3f0afcd8312c4a0b1dc37dce339368c8901c6d3
details: Added extreme value test cases
AMD-Internal: [CPUPL-4981]
Signed-off-by: ksaithar <katteboina.saitharun@amd.com>
Change-Id: I72ec8a07d1f433d8b48a564c0c5172894fe70f4b
AMD-Internal: CPUPL-4751
Signed-off-by: tprnaidu <TPratap.Naidu@amd.com>

Change-Id: Ia03d15ea194522614b5108e037e3f04706519cd3
AMD Internal : [CPUPL-5031]

Change-Id: I09a42edfb1bd12e3b893a2a1ae1f5ff7f15074ff
Added test cases for SYTRF

Signed-off-by: Venkatesha Ch <vprasada@amd.com>
AMD-Internal: CPUPL-4309
Change-Id: Ic137b5a1455c41417f05b1c80cc8b6137b4b099a
1. LAPACK code added for the DTRTRI and DTRTI2 APIs.
2. Inlined the gemv, dscal, dswap and dtrmv BLAS APIs for small input sizes.

AMD Internal : [CPUPL-4604]

Change-Id: I34f0d87dc55b21d2c7384ab7133a89026ded560f
In accordance with the definition of xerbla() in netlib, the return type is updated to void.

Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>

Change-Id: I2e797473e25dd8008a97f0ab8457a1ed0b00ab90
Optimization of code paths of DGESVD for M >= N cases.
Function inlining and operations optimization performed.
Minimization of work buffer usage for the optimized paths.

AMD Internal: CPUPL-4606
Signed off by: Vasanth R (varajago@amd.com)

Change-Id: I68dd865e04d4001bd4baa346149de026ba19525b
Optimize DGELS for small sizes between 8 and 100.
Optimization steps:
- Vectorized the DLANGE code path that finds the largest number
in a matrix
- Skip further processing in DLARF if the last column is found to be 0
on entry
- Vectorized code in DLARF that fuses the DGEMV and DGER operations in the
path taken for left application of an elementary reflector. AVX2 in this
commit; AVX512 will be done in a follow-up commit
- Add ctest coverage for DGELS performance tests with M in the range
10-40 and N = 10

A few eigenvalue netlib tests fail the threshold test with the current
optimization. The DEV and DVX netlib tests compare the eigenvalues generated
by a DGEEV call where only eigenvalues are requested against a call where both
eigenvalues and eigenvectors are requested. The tests compare the outputs
directly by value, not by the norm of their difference. Those failures are
therefore ignored, as the difference in output between the reference and
intrinsic paths is in the 15th decimal digit or beyond.
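
To illustrate why a direct value comparison is stricter than a norm-based one, here is a small Python sketch. It is not the netlib test code: the function names, the sample eigenvalues, and the 1e-14 tolerance are assumptions chosen only to show the effect of a one-ulp difference.

```python
import math


def direct_match(a, b):
    # Element-by-element equality: any last-bit difference is a failure.
    return all(x == y for x, y in zip(a, b))


def relative_diff_norm(a, b):
    # 2-norm of the difference, scaled by the 2-norm of the first vector.
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    ref = math.sqrt(sum(x * x for x in a))
    return diff / ref


# Hypothetical eigenvalues from two calls that differ by one ulp in one entry.
values_only = [2.0, 3.0, 5.0]
values_with_vectors = [2.0, 3.0 + 2.0**-51, 5.0]

print(direct_match(values_only, values_with_vectors))
print(relative_diff_norm(values_only, values_with_vectors) < 1e-14)
```

The first check rejects the pair while the second accepts it, which is the kind of discrepancy the commit message describes.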

AMD-Internal: CPUPL-4593
Change-Id: I866de586a4f57788704c75b862361efa730c940f
Added AOCL-BLAS version of dgbtf2 API. Here the algorithm directly calls the compute kernels of idamax(), dscal(), dswap() and dger(). This change is applicable only when the AOCL-BLAS feature is enabled.

CPUPL-4599
Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: Iaa377a283c89cdb521aa2c73f04e09ac96427d52
Updated the algorithm of dgbtrs to directly call dger() BLIS kernel. This change is applicable only when AOCL_BLAS feature is enabled.

CPUPL-4601
Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: Ie3d98908d16742db71d629bb5e6ca5eab62d081a
With the AOCL-BLAS feature enabled, a drot declaration conflict was observed at build time. Removed blis.h from fla_lapack_avx2_kernels.h to resolve this.

Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: Id5bf257bf688cc8838c910dcf82d420a1b22c75f
1. Singularity check added for non-unit matrices to fix the issues.
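
For context, reference LAPACK's triangular inverse checks the diagonal of a non-unit triangular matrix for an exact zero and reports its 1-based position through `info`. The Python sketch below models only that check; the function name and the list-of-rows matrix layout are assumptions for illustration and do not reflect the optimized code in this commit.

```python
def first_zero_diagonal(a, unit_diag):
    """Return the 1-based index of the first exactly-zero diagonal entry.

    Returns 0 when the matrix is declared unit-triangular (the diagonal is
    implicitly all ones) or when no diagonal entry is zero.
    """
    if unit_diag:
        return 0
    for i, row in enumerate(a):
        if row[i] == 0.0:
            # A zero on the diagonal means the triangular matrix is singular.
            return i + 1
    return 0


upper = [
    [2.0, 1.0, 4.0],
    [0.0, 0.0, 3.0],
    [0.0, 0.0, 5.0],
]
print(first_zero_diagonal(upper, unit_diag=False))
```

A non-zero return value signals that the inverse cannot be computed, so the caller can exit early.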

AMD Internal : [CPUPL-5070]

Change-Id: Id0c2d9fb565d67afed070718608d4ec360e4534a
VU and VL value updates.

Signed-off-by: Venkatesha <vprasada@amd.com>
Change-Id: Ib06075d6f6a796963b4268f487d8532275c364cc
…GTSV

Added incorrect inputs, early return and extreme value test cases.

AMD-Internal: [CPUPL-5001], [CPUPL-5002], [CPUPL-5003], [CPUPL-5004]
Signed-off-by: sujithhp <sujithhp@amd.com>
Change-Id: I0c75116f2d44ce7b57df0dcccb19e412d409caca
Compiler flag -mavx512dq was missed out in auto tools build
of AOCL-LAPACK and was only set in CMake based build. This is
fixed now.

AMD-Internal: CPUPL-5086
Change-Id: I83051c78a6396a267e32102ed5b4e63d647ca448
AMD Internal : [CPUPL-5088]

Change-Id: I1ada887837abb1689725d59fca382f22c911e6aa
Change-Id: I7f754061a237f387848124c6d34955b630669d0f
To prevent symbol redefinition conflicts between BLIS and libFLAME when the AOCL_BLAS feature is enabled, dim_t is renamed to fla_dim_t in the libFLAME library.

Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: I535f10d409ccc0009e2f2b1ef1a1ac7a07f01ea4
Internal BLIS kernels expect signed 64-bit integers in LP64 mode. Accordingly, the necessary variables in the dgbtf2 and dgbtrs APIs have been updated.

Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: I2cf6ea3437a3b79430e079d65704b23c6691fe50
Temporarily disabled AOCL-BLAS version of ZGETRF.

Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
Change-Id: I746ba35d84023ceca98316c9dc468ec10b4a3eb2
…d clang-format tool.

Updates:
Enhanced scripts/fla_format_code.py to accept one or more file or
folder paths at a time as input.

Script usage:
    python/python3 scripts/fla_format_code.py \
        <specific file/folder or list of multiple files/folders paths>
Examples:
    python3 scripts/fla_format_code.py test/main/src
    python3 scripts/fla_format_code.py test/main/src src/map
    python3 scripts/fla_format_code.py test/main/src/test_getrf.c
    python3 scripts/fla_format_code.py test/main/src/test_getrf.c src/map/lapack2flamec/FLA_gesdd.c
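
Accepting a mix of files and folders usually comes down to expanding each argument into a list of source files. The Python sketch below shows one way to do that; it is not the contents of fla_format_code.py, and the function name `collect_sources` and the `.c`/`.h` extension filter are assumptions.

```python
import os


def collect_sources(paths, extensions=(".c", ".h")):
    """Expand a mix of file and folder paths into a flat list of files.

    Files named explicitly are kept as given; folders are walked
    recursively and filtered by extension (an assumed filter).
    """
    found = []
    for path in paths:
        if os.path.isfile(path):
            found.append(path)
        elif os.path.isdir(path):
            for root, _dirs, files in os.walk(path):
                for name in sorted(files):
                    if name.endswith(extensions):
                        found.append(os.path.join(root, name))
    return found
```

Each returned path could then be handed to the formatter one at a time.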

AMD-Internal: CPUPL-4751
Signed-off-by: tprnaidu <TPratap.Naidu@amd.com>

Change-Id: I89a3ca48ccb843d1b61ea8d2bc7326b367d21c1f
Phanikumar and others added 30 commits March 19, 2025 09:34
   1. A division used two pointer variables that hold the same address. One variable picked up the updated value while the other did not, which
      produced wrong results. Copying the numerator into a local variable before the division solves the issue.
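
The commit does not show the affected code, so the following Python sketch only illustrates the general hazard under an assumed scenario: a numerator that aliases storage which is overwritten during the loop. The function names and data are hypothetical.

```python
def divide_first_by_each_buggy(x):
    # x[0] is re-read as the numerator on every iteration, but the first
    # iteration overwrites x[0] itself, so later results use the new value.
    for i in range(len(x)):
        x[i] = x[0] / x[i]
    return x


def divide_first_by_each_fixed(x):
    # Copy the aliased numerator into a local before any element is updated.
    numerator = x[0]
    for i in range(len(x)):
        x[i] = numerator / x[i]
    return x


print(divide_first_by_each_buggy([2.0, 4.0, 8.0]))
print(divide_first_by_each_fixed([2.0, 4.0, 8.0]))
```

The two functions differ only in where the numerator is read, yet they return different results for the same input.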

AMD Internal : [CPUPL-5915]

Change-Id: I532dc852061ba184438e7f89da8890aaae591e4f
Added netlib LAPACK 3.12 test suite support on Windows.

Signed-off-by: Venkatesha <vprasada@amd.com>
Change-Id: Ieda921505d570df8b4ca16f1ecac2ac5f344d1c3
Updated DTL logging to the new macro-based statements

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6333]
Change-Id: I7b7a4ad386382d57c52b2621f98f022b6a505b87
NAN checks added in fringe kernel cases in DLARFG
optimized code. This enables propagation of NAN values
to the output norm from inputs.

Change-Id: I107ac1921d38dd598b61f794dbbffd9a79b10205
Printing of test results moved to validate functions
to have flexibility of printing intermediate results.
Only failed cases are added for printing.

AMD Internal: CPUPL-4319

Change-Id: I2181d95d46f34e30e0e4447c493f312e7a86f5f2
Benchmarked DGESDD for different thread counts and sizes.
Size thresholds derived from the data to choose optimal
number of threads for different sub modules/APIs.
Affected APIs are DORMQR, DORMLQ and DLABRD.

Signed-off-by: Vasanthakumar R <varajago@amd.com>
AMD-Internal: CPUPL-5828

Change-Id: Ie87b1174eafc3c181fdda916bbfe469337b40f75
1. Updated dsyevd(), dlaed4() and dlaed6() to store machine parameters in static variables.
2. Optimized for loop blocks in dlaed4().

Change-Id: I8aa14f34b09727cda1d7eb97053c0e7e6181168a
Signed-off-by: Sridhar Govindaswamy <Sridhar.Govindaswamy@amd.com>
DGELS was not correctly recognizing singular matrices due to precision errors from the dgeqrf and dlarf APIs

This commit makes the following changes to resolve the issue

1. Changing SSE/AVX2/AVX512 fmadd instructions in dgeqrf and dlarf to separate multiply and addition operations.
2. Changes to the vector kernels of dgeqrf and dlarf to improve performance.
3. Further, a change has been made in validate_sygvd to correct OV/UV testing.
4. With the current change, existing netlib-test errors are reduced, but 2 additional errors are introduced for
   DEV and 12 for DVX. These errors appear only in netlib-tests with the custom ilaenv parameters defined
   for these tests. With the default ilaenv values of libflame, all these tests pass.
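
A fused multiply-add rounds once, while a separate multiply and add rounds twice, so the two can disagree in the last bits and, near cancellation, on whether a result is exactly zero. The Python sketch below demonstrates this with an emulated FMA built on exact rational arithmetic; the function names and input values are assumptions chosen to expose the effect and are not taken from dgeqrf or dlarf.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    # Emulate FMA: form a*b + c exactly, then round once to double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def multiply_then_add(a, b, c):
    # Two roundings: the product is rounded before the addition.
    return a * b + c


# a*b equals 1 - 2**-54 exactly, which rounds to 1.0 in double precision.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(multiply_then_add(a, b, c))
print(fused_multiply_add(a, b, c))
```

With these inputs the unfused path yields an exact zero while the fused path yields a tiny non-zero value, so a check that compares against zero can take different branches on the two paths.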

Change-Id: I75f7448fc731bdeffa64515c4f9bf4231b78f652
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-5869
The libflame_interface.hh file is split into multiple files, each
containing a different category of APIs. The split files are
named according to their categories.
Missing APIs were added to libflame_interface.hh, which
also includes the other split .hh files via #include
directives.

AMD-Internal: CPUPL-6373
Change-Id: Id725721470bcefbe63bc3a165824d6ac2d166e04
For a static library build of AOCL-LAPACK, the header path of AOCL-Utils
is sufficient. However, recent changes in the usage of AOCL_ROOT forced the
AOCL-Utils path to be set as well. This is now fixed.

AMD-Internal: CPUPL-6421
Change-Id: I4b5d98dffd6f02ea8b0a5406b279adc3474772da
Generating consolidated libflame_interface.hh file during configure time.

AMD-Internal: CPUPL-6373
Change-Id: I95a3421a2d19d1f4ff4fd6bb7728fd6b80086e6f
In cmake file to set dependent libraries path, the variable name
for AOCL-Utils was wrongly set in Windows path section. This was
causing issue in static library generation. Fixed the same. Also
fixed incorrect header path for AOCL-Utils under Windows OS config.

AMD-Internal: CPUPL-6424

Change-Id: Id0e4223e63ae6102099455eb6aea111af89897f5
Updated validate function to support new status printing macro,
in case of n = 0

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6423]
Change-Id: I1ca509c3ef2bc9a8202a4d619ac97ce139bc5c27
Fix for [unused-variable] and [used uninitialized] found in windows build

Signed-off-by: Venkatesha <vprasada@amd.com>
Change-Id: Ib028834275adbaf0a1bf4b039525dd321f4bb480
This commit makes the following changes:

1. Change libflame package name to aocl-lapack.
2. Change dependent blis package name to aocl-blas.
3. Use aocl-blas/aocl-blas-mt based on threading
   config.
4. aocl-lapack pc file is now generated in
   "{CMAKE_INSTALL_PREFIX}/pkgconfig/" dir instead
   of "{CMAKE_INSTALL_PREFIX}/share/pkgconfig/"
5. The following error is fixed:
   "Error in running lapack tests, when linking to
   AOCL-BLAS using pkgconfig"
6. Using Python3 module instead of hardcoded
   python executable.

Change-Id: I021d67a1ddbb79b6d8d0b5d39a3b676e17fdeb7a
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6426
The vector kernels of LANGE were not correctly handling NaN values
as per reference code. This commit makes following changes:

1. For each loaded vector, the kernel checks for NaN values and records
   the result in a flag register. After the loop, if the flag is set,
   the kernel returns NaN.
2. Enable Extreme test cases for LANGE main test.
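
The flag-based approach can be modelled in scalar Python. This is a sketch of the idea only, not the AVX kernel: the function name, the chunk width standing in for the vector width, and the restriction to the max-absolute-value norm are assumptions.

```python
import math


def max_abs_with_nan_flag(values, width=4):
    """Return max |v| over values, or NaN if any input is NaN."""
    best = 0.0
    saw_nan = False
    for start in range(0, len(values), width):
        chunk = values[start:start + width]  # stands in for one vector load
        # Record NaNs per chunk; a plain max comparison can silently drop them.
        saw_nan = saw_nan or any(math.isnan(v) for v in chunk)
        present = [abs(v) for v in chunk if not math.isnan(v)]
        if present:
            best = max(best, max(present))
    # The flag is checked once, after the loop, as in the commit description.
    return math.nan if saw_nan else best


print(max_abs_with_nan_flag([1.0, -7.0, 3.0, 2.0, 5.0]))
print(max_abs_with_nan_flag([1.0, math.nan, 3.0]))
```

Deferring the check to the end keeps the hot loop free of early exits, at the cost of always scanning the whole input.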

Change-Id: I1c0c7f9ba1d9840f9ebea9aa010a09bdda085209
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6417
-> Fixed DGELSS failure in lapacke_col_major test
	Argument passed to compute_matrix_norm was wrong,
	updated with expected argument.
-> Fixed windows ilp64 build warnings

Signed-off-by: Venkatesha <vprasada@amd.com>
Change-Id: I4f51faadb026a80f3e9c506177a98ac65e5cf592
Updated the doxygen file input path based on recent
cpp changes

Signed-off-by: Venkatesha <vprasada@amd.com>
Change-Id: I395a085b4f14addcbc4e7f022f7bfcf7e7f3903b
Main test suite is displaying zero error for all GESDD tests.
Updated the residual variables to display correct values.

AMD-Internal: CPUPL-6473
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: I222e0bef09de88cbdd47f7d63910ca3cdf46405a
-> This commit fixes defect of aocl-utils dependency not being
reflected in the pkg-config file.
-> pkg-config file is now generated in
   {{INSTALL_DIR}}/lib/pkgconfig

Change-Id: Ic1a60e9a118db1e772ff237541c5c2dad7279293
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6436
HINT paths for AOCL-Utils header needed change. Also,
fixed the logic so that header path of AOCL-Utils is sufficient
for static builds.

AMD-Internal: [CPUPL-6489]
Change-Id: I1877380b31694d64905780887b9452f1b6d81a10
Added test_ormqr.c, validate_ormqr.c for validating performance and
accuracy of LAPACK API ORMQR.

NOTE: 1) Modified ORGQR, ORG2R input m, n to valid values
         in case of config inputs.
      2) Added ORGQR workspace query for getting lwork value instead
         of using lwork from geqrf.
      3) Added code to display the selected interface for main test
         in output including matrix layout for lapacke.

AMD-Internal: CPUPL-6485
Change-Id: Ied015eaeb64df3513bb92c49d9c51ece2cb5a51a
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Added ASAN flag in C_FLAGS, LD_FLAGS to resolve the issue.

AMD-Internal: CPUPL-6475
Change-Id: Idd675ac06c13a8d0b46344811448976b399be559
Fixed an issue with an incorrect argument being passed in the wrapper code
of ZGETRFNPI for upper case and lower case versions. Also fixed minor
spacing issues in the parameters of the SPFFRT2 and SPFFRTx functions.

AMD-Internal: CPUPL-6540

Change-Id: I325bd216c74c2719dfb37fa66f66734c02f2a3c6
Performance regression observed during benchmarking addressed by
re-tuning optimal threads allocation for DGESDD sub-modules.

AMD Internal: CPUPL-6487

Change-Id: Ic98436b7e4d0575bff89d9148386bbcbdd730ac4
BLAS_LIBRARY is not set correctly when using pkgconfig to
find AOCL-BLAS library.

Fixed variable name mismatch while updating BLAS_LIBRARY

Change-Id: Ic2d35c337c88d1a3e2d7deb0f834e6ca397d257b
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6436
-> Updated main test suite to use lapacke.h header file instead of
	current Test Prototype.
-> Created new directory test/main/src/lapacke, where all
	lapacke invoke functions are defined and declared.
-> Updated Makefile and CMakeLists.txt accordingly.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6542]
Change-Id: I61553a8283db81c903c94075ff86bc8d3c559894
Updated main test suite to work with case-insensitive
input character arguments

AMD-Internal: CPUPL-6554
Signed-off-by: dnikku <Deepika.Nikku@amd.com>
Change-Id: Ifc967e2739ff2915353806b6c5944dc28a5b540f
Rolling back aocl-lapack to flame and aocl-blas dependency
to blis to keep external applications dependency on these
package names.

Change-Id: I2c2a8388d6994fa16133970044d09739c01079d4
Signed-off-by: samahmad <Sameer.Ahmad@amd.com>
AMD-Internal: CPUPL-6436
Test suite updates to include lapacke.h resulted in test build failure for make build. Updated Makefile to fix the issue.

Signed-off-by: Venkatesha <vprasada@amd.com>
AMD-Internal: [CPUPL-6583]
Change-Id: If2b66fb41616c854be22ae687f7e1afebd4a1845

6 participants