Releases: IntelPython/dpctl
v0.16.1
This release includes bug fixes and provides a change needed by the numba_dpex project to support dispatching kernels
that consume instances of the sycl::local_accessor template type.
Changed
- Changed behavior of dpctl.tensor.usm_ndarray.__dlpack_device__ method to return device id of the parent unpartitioned device if array is allocated on a sub-device instead of raising an exception: #1604
- Array creation functions and the usm_ndarray constructor in dpctl.tensor submodule now use cached default-selected device to improve performance: #1606
- Changed treatment of axis keyword for dpctl.tensor.tensordot and dpctl.tensor.vecdot to align with Python Array API 2023.12 specification: #1608
- Changed implementation of DPCTLQueue_SubmitRange, DPCTLQueue_SubmitNDRange in DPCTLSyclInterface library to support sycl::local_accessor arguments needed by numba_dpex; changed the enum DPCTLKernelArgType to correspond to C++ disjoint types: #1609, #1611, #1612
Fixed
- Fixed a crash on Windows platform during execution of getter of dpctl.SyclPlatform.default_context property: #1604
- Fixed kernel submission error on NVIDIA CUDA GPUs during dpctl.tensor.matmul operation: #1605
- Fixed corruption of context cache table entries: #1607
- Fixed incorrect result from dpctl.tensor.tensordot reported in issue #1570: #1608
- Fixed the library name specified in the output of python -m dpctl --library: #1615
v0.16.0
This release is virtually identical to 0.15.1 as far as features are concerned.
This release is meant to be built with DPC++ 2024.1.0, which no longer supports older integrated Gen9 Intel GPUs, such as those that came with Intel Core 10th generation and older.
v0.15.1
Summary
This release reaches the milestone of 100% compliance of dpctl.tensor functions with the Python Array API 2022.12 standard for the main namespace.
Added
- Added reduction functions dpctl.tensor.min, dpctl.tensor.max, dpctl.tensor.argmin, dpctl.tensor.argmax, and dpctl.tensor.prod per Python Array API specifications: #1399
- Added dedicated in-place operations for binary elementwise operations and deployed them in Python operators of dpctl.tensor.usm_ndarray type: #1431, #1447
- Added new elementwise functions dpctl.tensor.cbrt, dpctl.tensor.rsqrt, dpctl.tensor.exp2, dpctl.tensor.copysign, dpctl.tensor.angle, and dpctl.tensor.reciprocal: #1443, #1474
- Added statistical functions dpctl.tensor.mean, dpctl.tensor.std, dpctl.tensor.var per Python Array API specifications: #1465
- Added sorting functions dpctl.tensor.sort and dpctl.tensor.argsort, and set functions dpctl.tensor.unique_values, dpctl.tensor.unique_counts, dpctl.tensor.unique_inverse, dpctl.tensor.unique_all: #1483
- Added linear algebra functions from the Array API namespace dpctl.tensor.matrix_transpose, dpctl.tensor.matmul, dpctl.tensor.vecdot, and dpctl.tensor.tensordot: #1490, #1525, #1541
- Added dpctl.tensor.clip function: #1444, #1505
- Added custom reduction functions dpt.logsumexp (reduction using binary function dpctl.tensor.logaddexp), dpt.reduce_hypot (reduction using binary function dpctl.tensor.hypot): #1446
- Added inspection API to query capabilities of Python Array API specification implementation: #1469
- Support for compilation for NVIDIA(R) sycl target with use of CodePlay oneAPI plug-in: #1411, #1124
- Added dpctl.utils.intel_device_info function to query additional information about Intel(R) GPU devices: gh-1428 and gh-1445
- Added support for two new device descriptors, dpctl.SyclDevice.max_mem_alloc_size and dpctl.SyclDevice.max_clock_frequency: #1530
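The custom reductions listed above (logsumexp and reduce_hypot) are described as reductions over a binary function. As a rough illustration of that idea only, the following pure-Python sketch folds a binary function over a sequence. It does not use dpctl and is not how the library implements these functions; the helper names logaddexp, logsumexp and reduce_hypot below are local stand-ins defined here for illustration.

```python
import math
from functools import reduce


def logaddexp(a: float, b: float) -> float:
    """Compute log(exp(a) + exp(b)) for finite inputs without overflow."""
    hi, lo = max(a, b), min(a, b)
    # exp(lo - hi) is at most 1, so it cannot overflow.
    return hi + math.log1p(math.exp(lo - hi))


def logsumexp(values):
    """Reduce a non-empty sequence with the binary function logaddexp."""
    return reduce(logaddexp, values)


def reduce_hypot(values):
    """Reduce a non-empty sequence with the binary function math.hypot."""
    return reduce(math.hypot, values)


# exp(1000.0) overflows a double, but the pairwise reduction stays finite.
lse = logsumexp([1000.0, 1000.0])
# hypot(hypot(3, 4), 12) is the Euclidean norm of (3, 4, 12).
norm = reduce_hypot([3.0, 4.0, 12.0])
```

Expressing the reduction through the binary function keeps intermediate values in a safe range, which is the usual motivation for a dedicated logsumexp over computing log(sum(exp(x))) directly.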
Changed
- Functions dpctl.tensor.result_type and dpctl.tensor.can_cast became device-aware: #1488, #1473
- Implementation of method dpctl.SyclEvent.wait_for changed to use sycl::event::wait instead of sycl::event::wait_and_throw: gh-1436
- dpctl.tensor.astype was changed to support device keyword as per Python Array API specification: #1511
- C++ header files in libtensor/include/kernels containing implementations of SYCL kernels no longer depend on "pybind11.h": #1516
Fixed
v0.15.0
Summary
The 0.15.0 release represents a milestone in which the dpctl.tensor.usm_ndarray object now implements all special Python operators, except __matmul__ and __rmatmul__.
The dpctl.tensor submodule increases its array-API conformance test suite pass rate to 81.8% (passed: 916, failed: 84, skipped: 119).
Details
Added
- Added dpctl.tensor.floor, dpctl.tensor.ceil, dpctl.tensor.trunc elementwise functions.
- Added dpctl.tensor.hypot, dpctl.tensor.logaddexp elementwise functions.
- Added trigonometric (dpctl.tensor.sin, dpctl.tensor.cos, dpctl.tensor.tan) and hyperbolic (dpctl.tensor.sinh, dpctl.tensor.cosh, dpctl.tensor.tanh) elementwise functions and their inverses (dpctl.tensor.asin, dpctl.tensor.asinh, dpctl.tensor.acos, dpctl.tensor.acosh, dpctl.tensor.atan, dpctl.tensor.atanh).
- Added dpctl.tensor.round function.
- Added dpctl.tensor.sign and dpctl.tensor.remainder elementwise functions.
- Added bitwise elementwise functions dpctl.tensor.bitwise_and, dpctl.tensor.bitwise_xor, dpctl.tensor.bitwise_or, dpctl.tensor.bitwise_invert.
- Added bitwise shift functions dpctl.tensor.bitwise_left_shift and dpctl.tensor.bitwise_right_shift.
- Added dpctl.tensor.atan2 and dpctl.tensor.signbit elementwise functions.
- Added dpctl.tensor.minimum and dpctl.tensor.maximum binary elementwise functions.
- Supported equality checking and hashing for dpctl.SyclPlatform.
- Implemented types property for all unary and binary elementwise functions #1361
- Added dpctl.tensor.repeat and dpctl.tensor.tile functions.
- Added dpctl.tensor.matrix_transpose function.
Changed
- Enabled support for Python arithmetic, in-place arithmetic, reflexive arithmetic, comparison, and bitwise operators for dpctl.tensor.usm_ndarray type #1324.
- Removed dpctl.tensor.numpy_usm_shared obsolete class and associated tests which were being skipped #1310
- Transitioned dpctl codebase to Cython 3.
- Improved performance of boolean reduction functions dpctl.tensor.all and dpctl.tensor.any.
- Improved performance of summation function dpctl.tensor.sum.
- Improved in-place arithmetic operations for addition, subtraction and multiplication.
- Updated codebase per SYCL-2020 intel/llvm compiler deprecation warnings.
- Improved performance of advanced boolean indexing for arrays whose size fits in 32-bit signed integer type.
- Removed deprecated DPCTLDevice_GetMaxWorkItemSizes function from the SyclInterface library.
- Improved performance of dpctl.tensor.reshape in the case when a copy is being made.
- Improved performance of dpctl.tensor.roll function.
Fixed
v0.14.5
This release builds on 0.14.3 and 0.14.4 releases and addresses some performance gaps as well as implements several new elementwise functions.
Added
- Added dpctl.tensor.log2 and dpctl.tensor.log10: #1267
- Added dpctl.tensor.negative, dpctl.tensor.positive, dpctl.tensor.square #1268
- Added dpctl.tensor.logical_not, dpctl.tensor.logical_and, dpctl.tensor.logical_or, dpctl.tensor.logical_xor #1270
Changed
- dpctl.tensor.astype behavior for newdtype=None changes #1261
- dpctl.tensor.usm_ndarray constructor default value of dtype keyword argument changed to None: #1265
- Support for out arguments that overlap with inputs for unary elementwise functions #1281
- Copying from one array to another is a no-op if both arrays view into the same memory #1284
v0.14.4
This is a hot-fix for the 0.14.3 release.
Added
- Added dpctl.tensor.less_equal, dpctl.tensor.greater, dpctl.tensor.greater_equal: #1239
Changed
- Optimized in-place arithmetic operations for updating matrix with rows/columns via broadcasting: #1244
Fixed
- Fixed handling of 0d arrays in dpctl.tensor.sum: #1238
v0.14.3
Added
- Added support of axis=None in dpctl.tensor.concat #1125
- Added caching for dpctl.SyclDevice.filter_string property #1127
- Added dpctl.tensor.isdtype from array API #1133
- Added dpctl.tensor.unstack, dpctl.tensor.moveaxis, dpctl.tensor.swapaxes #1137, #1174
- Allow for mutation of dpctl.tensor.usm_ndarray.flags.writable #1141
- Added dpctl.tensor.where from array API #1147
- Include libtensor headers in dpctl installation layout #1185
- Added new properties of dpctl.tensor.usm_ndarray object #1199
- Added a list of unary and binary elementwise functions from array API:
  - #1203: dpctl.tensor.add, dpctl.tensor.divide, dpctl.tensor.isnan, dpctl.tensor.isinf, dpctl.tensor.isfinite, dpctl.tensor.cos, dpctl.tensor.abs, dpctl.tensor.equal
  - #1205: dpctl.tensor.sqrt
  - #1209: implements out keyword argument
  - #1211: dpctl.tensor.multiply, dpctl.tensor.subtract
  - #1214: dpctl.tensor.not_equal
  - #1216: dpctl.tensor.exp, dpctl.tensor.sin
  - #1217: dpctl.tensor.real, dpctl.tensor.imag, dpctl.tensor.proj
  - #1218: dpctl.tensor.log, dpctl.tensor.log1p, dpctl.tensor.expm1
  - #1221: dpctl.tensor.floor_divide
  - #1235: dpctl.tensor.less
  - #1237: in-place support for addition, multiplication and subtraction
- Added dpctl.tensor.all and dpctl.tensor.any #1204
- Added dpctl.tensor.sum #1210
Changed
- Updated examples of native Python extensions built using dpctl #1108
- Used security flags to compile and link native extensions of dpctl #1109
- Changed types of dpctl.tensor.finfo and dpctl.tensor.iinfo output structure per array API spec #1110
- Consolidated multiple USM temporaries life-time management host_tasks to improve test suite stability #1111
- MAINT: Improved cmake target dependency tracking #1112
- MAINT: Improved docstrings for existing dpctl.tensor functions #1123
- Changed default value of mode keyword in dpctl.tensor.take and dpctl.tensor.put from clip to wrap #1132
- Added support for (nested) sequence of dpctl.tensor.usm_ndarray objects in dpctl.tensor.asarray #1139
- Improved exception handling in dpctl.tensor.usm_ndarray.__setitem__ special method #1146
- Simplified implementation of copy-and-cast kernels and removed special casing for 2D arrays to conserve binary size #1165
- Improved speed of dpctl.tensor.usm_ndarray printing functionality #1187
- Require DPC++ RT 2023.1 to build and run dpctl #1195
- Compile offloading native extensions with -fno-sycl-id-queries-fit-in-int fixing gh-1184, #1200
- Transition to conda-forge ecosystem #1213
Fixed
- Fix to add empty values check for dpctl.tensor.place #1105, #1106
- Fixed gh-1089 by improving dpctl.tensor.asarray handling of NumPy arrays viewing into host-accessible USM allocation objects.
- MAINT: Fixed build break with newer GCC and SYCLOS #1118
- Fixed a bug in basic indexing of dpctl.tensor.usm_ndarray #1136
v0.14.2
Added
- Added dpctl.SyclDevice.partition_max_sub_devices property #1005
- Added dpctl.program.SyclKernel.max_sub_group_size property #1028
- Implemented printing of usm_ndarray #1013, #1043, #1060
- Implemented support for advanced indexing for dpctl.tensor.usm_ndarray #1095, #1097, #1099, #1101
- Implemented support for platform listing in dpctl.__main__ script #1014
- Improved performance of dpctl.tensor.asnumpy #1026
- Added UsmNDArray_Make* C-API for constructing dpctl.tensor.usm_ndarray from native allocations #1050, #1067
- Added support for dpctl.SyclDevice.native_vector_width_* device descriptors #1075
- Added dpctl::tensor::usm_ndarray::get_shape_vector and dpctl::tensor::usm_ndarray::get_strides_vector methods #1090
Changed
- Removed dpctl.select_host_device, dpctl.has_host_device, dpctl.SyclDevice.is_host, and dpctl.SyclDevice.has_aspect_host since support for host device has been removed in DPC++ 2023 and from SYCL 2020 spec #1028
- usm_ndarray is made writable by default #1012, and writable flag is now checked by __setitem__.
- Added convenience signature for C++ utility function in "dpctl4pybind11.hpp" #1016
- Improved error reported when attempting to submit kernel that uses a data-type unsupported by target device #1018, #1040
- Updated C++ code to require DPC++ 2023.0.0 or newer #1028, #1066
- The dpctl.tensor.Device class supports print_device_info method #1029, equality comparison, and hashing #1048
- Updated version of pybind11 used to 2.10.2 #1031
- Improved internal utility responsible for reduction of iteration space dimensionality #1044, #1054
- Changed return type of DPCTLUSM_GetPointerType function in SyclInterface library #1061, #1065
- Updated supported version of DLPack to 0.8 #1073
- Implemented queue cache per context/device pair and deployed it in dpctl.memory, dpctl.tensor.from_dlpack and dpctl.tensor array creation functions #1076, #1079
- Maintenance, CI work: #1001, #1009, #1011, #1024, #1030, #1032, #1035, #1037, #1039, #1041, #1045, #1047, #1055, #1057, #1059, #1068, #1070, #1074, #1077, #1078, #1081, #1084, #1085, #1088, #1086, #1092, #1093
Fixed
- Fixed error gh-998 in forming Python exception, #999.
- A small memory leak fixed, #1000
- Improved dtype support in dpctl.tensor.full, PR #1002
- Added missing header file #1008 fixing gh-1007
- Fixed a typo in device-specific dtype mapping #1015
- Fixed default device integer type to align with NumPy's behavior on Windows #1017
- Fixed unexpected overflow in dpctl.tensor.linspace when one of the parameters is the largest floating point value #1034
- Constructors dpctl.tensor.empty, dpctl.tensor.zeros, and usm_ndarray constructor itself no longer allow creating arrays with data types not supported by the targeted device #1042
- Fixed parameter validation in dpctl.SyclQueue constructor #1052
- Fixed usm_type of the resulting array in dpctl.tensor.tril and dpctl.tensor.triu functions #1062
- Used DPC++ configuration files to ensure correct use of conda compiler toolchain on Linux #1072
- Fixed issue with empty argument of dpctl.tensor.meshgrid function #1080
- Fixed linking problem on Windows enabling dpctl to be functional on Windows for devices not supporting some data types #1083
Full Changelog: 0.14.0...0.14.2
v0.14.0
[0.14.0] - 11/18/2022
Added
- Implemented dpctl.tensor.linspace function from array-API #875.
- Implemented dpctl.tensor.eye function from array-API #896.
- Implemented dpctl.tensor.tril and dpctl.tensor.triu functions from array-API #910.
- Added data type objects to dpctl.tensor namespace, finfo, iinfo, can_cast, and result_type functions #913.
- Implemented dpctl.tensor.meshgrid creation function from array-API #920.
- Implemented convenience class to represent output of dpctl.tensor.usm_ndarray.flags property #921.
- Added new device attributes and kernel's device-specific attributes #894.
- Added dpctl.utils.onetrace_enabled context manager for targeted trace collection #903.
- Added support for stream keyword in __dlpack__ method, enabling support for sending usm_ndarray using mpi4py #906.
- dpctl.tensor.asarray can now transition data between incompatible devices, #951.
- Introduced "syclinterface/dpctl_sycl_types_casters.hpp" header file with declaration of conversion routines between SYCL type pointers and SyclInterface library opaque pointers #960.
- Added C-API to dpctl.program.SyclKernel and dpctl.program.SyclProgram. Added type casters for new types to "dpctl4pybind11" and added an example demonstrating its use #970.
- Introduced "dpctl/sycl.pxd" Cython declaration file to streamline use of SYCL functions from Cython, and added an example demonstrating its use #981.
- Added experimental support for sharing data allocated on sub-devices via dlpack #984.
- Added dpctl.SyclDevice.sub_group_sizes property to retrieve supported sizes of sub-group by the device #985.
Changed
- Improved queue compatibility testing in dpctl.tensor's implementation module #900.
- Added automatic measurement of array-API conformance test suite in CI #901.
- Improved performance of array metadata transfer from host to device #912.
- Used os.add_dll_directory on Windows to ensure that DPCTLSyclInterface library can be found #918.
- Refactored dpctl.tensor's implementation module #941 to streamline adding new functionality. Streamlined dpctl::tensor::usm_ndarray class implementation.
- Added debugging messaging in case when DPCTLDynamicLib::getSymbol encounters errors #956.
- Updated code base according to changes in DPC++ compiler #952, #957, #958.
- Changed dpctl to use pybind11 2.10.1 #967.
- Extended dpctl.tensor.full to accept 0d and higher dimensional arrays for fill-value parameter #982 and #995.
Fixed
- Improved SyclDevice constructor error message #893.
- Fixed issue gh-890 about dpctl.tensor.reshape function #915.
- Fixed unexpected UnboundLocalError exception in #922.
- Fixed bugs in dpctl.tensor.arange in #945.
- Fixed issue with type inferencing in dpctl.tensor.asarray in #949.
- Added missing docstrings for dpctl.SyclDevice properties #964.
v0.13.0
Added
- Implemented and deployed dedicated kernels for copying with casting #781, used in __setitem__, implementation of asarray, dpctl.tensor.copy functions.
- Implemented dedicated copying kernel for dpctl.tensor.reshape function #810, added support for copy keyword #807.
- Implemented dedicated kernel to copy with casting from numpy.ndarray into dpctl.tensor.usm_ndarray #817.
- Implemented dpctl.tensor.permute_dims function from array-API #787.
- Implemented dpctl.tensor.expand_dims function from array-API #788.
- Implemented dpctl.tensor.squeeze function from array-API #790.
- Implemented dpctl.tensor.broadcast_to function from array-API #791.
- Implemented dpctl.tensor.broadcast_arrays function from array-API #798.
- Implemented dpctl.tensor.flip function from array-API #801.
- Implemented dpctl.tensor.usm_ndarray.mT property per array-API #805.
- Implemented dpctl.tensor.roll function from array-API #809.
- Implemented dpctl.tensor.arange function from array-API #814.
- Implemented dpctl.tensor.zeros function from array-API #816.
- Implemented dpctl.tensor.ones, dpctl.tensor.full, dpctl.tensor.empty_like, dpctl.tensor.zeros_like, dpctl.tensor.ones_like, dpctl.tensor.full_like functions from array-API #822.
- Implemented DPCTLQueue_Memset function in SyclInterface library #812, and exposed it for dpctl.memory.MemoryUSM* classes #815.
- Implemented dpctl.utils.get_coerced_usm_type to deduce usm type of the output array from types of input arrays in compute-follows-data execution model #797.
- Added dpctl.SyclDevice.profiling_timer_resolution property #825.
- Added dpctl.SyclDevice.platform and dpctl.SyclPlatform.default_context properties #827.
- Provided pybind11 example for functions working on dpctl.tensor.usm_ndarray container applying oneMKL functions #780, #793, #819. The example was expanded to demonstrate implementing iterative linear solvers (Chebyshev solver, and Conjugate-Gradient solver) by asynchronously submitting individual SYCL kernels from Python #821, #833, #838.
- Wrote manual page about working with dpctl.SyclQueue #829.
- Added cmake scripts to dpctl package layout and a way to query the location #853.
- Implemented dpctl.tensor.concat function from array-API #867.
- Implemented dpctl.tensor.stack function from array-API #872.
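The get_coerced_usm_type entry above describes deducing the USM type of an output array from the USM types of its inputs. The sketch below illustrates one way such a rule can work, under the assumption (not stated in these notes) that USM types are ranked host < shared < device and the highest-ranked input wins. The function coerce_usm_type is a hypothetical stand-in written for this illustration and is not the dpctl implementation.

```python
from typing import Optional, Sequence

# Assumed precedence, lowest to highest. This ordering is an assumption
# made for illustration; it is not taken from the release notes.
_USM_RANK = {"host": 0, "shared": 1, "device": 2}


def coerce_usm_type(usm_types: Sequence[str]) -> Optional[str]:
    """Return the highest-ranked USM type among the inputs.

    Returns None when the sequence is empty or contains an unknown type,
    meaning the output type cannot be deduced.
    """
    if not usm_types:
        return None
    if any(t not in _USM_RANK for t in usm_types):
        return None
    return max(usm_types, key=_USM_RANK.__getitem__)


mixed = coerce_usm_type(["host", "device", "shared"])
host_shared = coerce_usm_type(["host", "shared"])
```

Under this assumed ranking, an operation that mixes a "host" array with a "device" array would allocate its result as "device".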
Changed
- Enhanced coverage collection for SyclInterface library by also collecting it during pytest run and combining traces with those collected during C-test run #818. This change also avoids rebuilding the SyclInterface library when building the C-test executable.
- Exported keep_args_alive utility in dpctl4pybind11.hpp header #820. The utility uses sycl::handler::host_task to keep given Python arguments alive until each sycl::event from the given vector of events is complete. The host task is scheduled on the SYCL queue provided as the first argument.
- Changed the size of struct underlying dpctl.SyclEvent to avoid storing Python object previously used to keep kernel arguments scheduled with dpctl.SyclQueue.submit #823.
- Fixed docstring for dpctl.SyclTimer #824.
- Changed type of exceptions raised on failure to create dpctl.SyclDevice from ValueError to dpctl.SyclDeviceCreationError #826.
- Improved performance of pybind11 type casters #837.
- Changed implementation of dpctl.SyclProgram from using deprecated sycl::program to sycl::kernel_bundle #845.
- Removed deprecated device aspects, added new supported aspects #844.
- Updated vendored dlpack.h to version 0.7 #847.
Fixed
- Fixed dpctl.lsplatform() to work correctly when used from within Jupyter notebook #800.
- Fixed script to drive debug build #835 and fixed code to compile in debug mode #836.
- Fixed filter selector string produced in outputs of dpctl.lsplatform(verbosity=2) and dpctl.SyclDevice.print_device_info #866.
- Fixed issue with slicing reported in gh-870 in #871.
New contributor: @npolina4 contributed #867, #872 and reported #870