Conversation
serge-sans-paille
left a comment
There was a problem hiding this comment.
Some early review, a few open questions but nothing looks bad here. Way to go!
| { | ||
| inline void get_cpuid(int reg[4], int level, int count = 0) noexcept; | ||
|
|
||
| inline std::uint32_t get_xcr0_low() noexcept; |
There was a problem hiding this comment.
it would be good to somehow ensure that this type is the same as ref_t
There was a problem hiding this comment.
And this could be a private static method for better encapsulation right?
| constexpr bool fma4() const noexcept | ||
| { | ||
| return utils::bit_is_set<16>(m_regs.reg8[2]); | ||
| } |
There was a problem hiding this comment.
Open question: should we have a macro that makes the generation of those field cleaner to the eye?
There was a problem hiding this comment.
Either way. I am personally not big on macros. I find it easier read this way (could also be made to fit on one line) and find&replace is quite enough a change is required.
|
|
||
| namespace detail | ||
| { | ||
| inline void get_cpuid(int reg[4], int level, int count) noexcept |
There was a problem hiding this comment.
Same here: could be private for better encapsulation.
There was a problem hiding this comment.
On the contrary, I was wondering whether to make it part of public API.
It's a well defined function, unlikely to change, and that could be useful for users to get features we do not expose (not sure we'll go to cover 100% of cpuid).
| * The full license is in the file LICENSE, distributed with this software. * | ||
| ****************************************************************************/ | ||
|
|
||
| #include "../config/xsimd_inline.hpp" |
There was a problem hiding this comment.
why do you need this?
There was a problem hiding this comment.
It for header including what they use (otherwise I get nasty errors in my editor).
But this one did not belong here but in xsimd_register.hpp where XSIMD_INLINE is used.
AntoinePrv
left a comment
There was a problem hiding this comment.
Comments on change may make relative to the previous implementation
| #elif defined(__INTEL_COMPILER) | ||
| __cpuid(reg.data(), level); |
There was a problem hiding this comment.
Should we change to use inline ASM for intel compiler? Missing the count option here.
There was a problem hiding this comment.
I assumle so, feel free to give it a try!
| #elif defined(_MSC_VER) && _MSC_VER >= 1400 | ||
| return static_cast<xcr0_reg_t>(_xgetbv(0)); | ||
|
|
||
| #elif defined(__GNUC__) |
There was a problem hiding this comment.
What about __clang__ and __INTEL_COMPILER? Should we reproduce the get_cpuid pattern?
There was a problem hiding this comment.
Yes, I think so. Also, IIRC, __clang__ comes with __GNUC__ defined too, and Intel compiler defines __GNUC__ or _MSC_VER depending on the platform. So we may simplify this logic (but first we need to verify that my asumption is right).
There was a problem hiding this comment.
It's true that clang defines __GNUC__. And it happens to be true for icx too, see https://godbolt.org/z/56j57vhnT
|
@serge-sans-paille what do you think of this? I wanted to add an |
| #elif defined(_MSC_VER) && _MSC_VER >= 1400 | ||
| return static_cast<xcr0_reg_t>(_xgetbv(0)); | ||
|
|
||
| #elif defined(__GNUC__) |
There was a problem hiding this comment.
Yes, I think so. Also, IIRC, __clang__ comes with __GNUC__ defined too, and Intel compiler defines __GNUC__ or _MSC_VER depending on the platform. So we may simplify this logic (but first we need to verify that my asumption is right).
|
@AntoinePrv please ping me when you need a review! |
|
@serge-sans-paille this is ready. This is intended to make a clean abstraction for CPUID.
|
| regs.reg1 = detail::get_cpuid(0x1); | ||
| regs.reg7 = detail::get_cpuid(0x7); | ||
| regs.reg7a = detail::get_cpuid(0x7, 0x1); | ||
| regs.reg8 = detail::get_cpuid(0x80000001); |
There was a problem hiding this comment.
Open question: could we avoid some get_cpuid call in some situation? For instance reg8 is only used for fma4, and reg7 for avx2 or later, which mean we could skip filling reg7 if we don't have avx, and fma4 if have fma3.
There was a problem hiding this comment.
Let's postpone this as a further optimization then?
| #elif defined(__INTEL_COMPILER) | ||
| __cpuid(reg.data(), level); |
There was a problem hiding this comment.
I assumle so, feel free to give it a try!
| #elif defined(_MSC_VER) && _MSC_VER >= 1400 | ||
| return static_cast<xcr0_reg_t>(_xgetbv(0)); | ||
|
|
||
| #elif defined(__GNUC__) |
There was a problem hiding this comment.
It's true that clang defines __GNUC__. And it happens to be true for icx too, see https://godbolt.org/z/56j57vhnT
|
@serge-sans-paille in 1/2 day I can push another commit (easily revertable) with things I've been trying lately. |
|
ok, ping me when you need an extra pair of eyes.
|
dd51e01 to
d63a75f
Compare
|
@serge-sans-paille with this last commit we have
It's more LOC (though there is more documentation now, and many empty classes / brace lines), but it's also more declarative from now on. We just need to add more members to the enums to read other things. If you like the design I'd be happy to add tests for all classes with know implications (e.g. |
| ssse3 = x86_cpu.ssse3(); | ||
| sse4_1 = x86_cpu.sse4_1(); | ||
| sse4_2 = x86_cpu.sse4_2(); | ||
| fma3_sse42 = x86_cpu.fma3(); |
There was a problem hiding this comment.
do you think you could have supported_arch inherit from x86_cpu to avoid this sequence of assignment?
There was a problem hiding this comment.
Absolutely, this piece of code was meant as temporary.
Though I planned to this this in follow-up PR, once this and arm64_cpu_features is done.
| return x86_cpuid_regs(detail::x86_cpuid(leaf, subleaf)); | ||
| } | ||
|
|
||
| constexpr x86_cpuid_regs() noexcept = default; |
There was a problem hiding this comment.
I'm still not convinced by the need for this default constructor.
There was a problem hiding this comment.
IMHO types without a default constructor are a pain to work with.
This default constructor is well defined, it returns all zeros bitsets.
In fact we use it in this very file for the lazy reading in x86_cpu_features:
- Create well-defined but empty
x86_cpuid_regswhen creatingx86_cpu_features. - Replace them as with a "read" version as needed.
I think I understand you point of view, that is may be weird to have the default ctor for a type intended to read a register, that does not do that. Perhpas this is more of an aggregate parser...
| { | ||
| constexpr I mask = make_bit_mask<I>(static_cast<I>(Bit)); | ||
| return value | mask; | ||
| } |
There was a problem hiding this comment.
Why having these free functions instead of having their implementation in the uint_bitset class directly? Do you plan to reuse them elsewhere?
There was a problem hiding this comment.
Good point, they came before uint_bitset, that is why they are here.
Either way. I could see constexpr bit manipulation utilities used in the rest of xsimd instead of magic numbers.
This is a no-addition, non-breaking refactor of existing CPU id features as first class citizen as part of #1245.
supported_archkeeps the same structure and merges use of both class at the moment but both class could be combined a user-friendlyx86_cpu_features.