XSIMD implementation is slower than scalar performance with mixed precision operations

The following benchmark performs an operation on mixed types (`uint16_t` storage divided by a `double` literal):

```
    static void Xtensor_Uint16_2000x2000_DivideBy2Double_Xtensor(benchmark::State& aState)
    {
        xt::xtensor<uint16_t, 2> vInput = generateRandomInt16From0To100(cContainerAssignShape);
        auto vOutput = xt::xtensor<uint16_t, 2>::from_shape(cContainerAssignShape);

        for (auto _ : aState)
        {
            vOutput = vInput / 2.0;
        }
    }
```

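When the expression is evaluated, the result of `uint16_t / double` is a `double`, because the divisor `2.0` is a double literal. Each result then has to be converted back to `uint16_t` before it is stored into `vOutput`. As a result, `std::copy` is called in the inner loop inside `load_aligned`, and the SIMD path ends up slower than the scalar one. Using `double` for both the input and the output resolves the issue: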

```
    static void Xtensor_Uint16_2000x2000_DivideBy2Double_Xtensor(benchmark::State& aState)
    {
        xt::xtensor<double, 2> vInput = generateRandomInt16From0To100(cContainerAssignShape);
        auto vOutput = xt::xtensor<double, 2>::from_shape(cContainerAssignShape);

        for (auto _ : aState)
        {
            vOutput = vInput / 2.0;
        }
    }
```

Is there a way to keep the `uint16_t` containers and still go through the xsimd path without copying the data?
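
For reference, the type promotion behind the slowdown can be checked in standard C++ without xtensor. This is a minimal sketch, not xtensor's or xsimd's implementation: the aliases `DivByDouble` and `DivByInt` and the helper `divide_by_two` are hypothetical names introduced only for illustration, and the `int` result assumes a mainstream platform where `int` is wider than 16 bits.

```cpp
#include <cstdint>
#include <type_traits>

// Usual arithmetic conversions: dividing a uint16_t never yields a uint16_t.
// With a double divisor the result is double; with an int divisor it is int
// (assuming int is wider than 16 bits, as on mainstream platforms).
using DivByDouble = decltype(std::uint16_t{} / 2.0);
using DivByInt = decltype(std::uint16_t{} / 2);

static_assert(std::is_same<DivByDouble, double>::value, "uint16_t / double is double");
static_assert(std::is_same<DivByInt, int>::value, "uint16_t / int is int");

// Scalar equivalent of one element of `vOutput = vInput / 2.0`:
// the division happens in double, then the result is narrowed (truncated)
// back to uint16_t when it is stored.
inline std::uint16_t divide_by_two(std::uint16_t aValue)
{
    return static_cast<std::uint16_t>(aValue / 2.0);
}
```

Every element therefore goes through a widening conversion before the division and a narrowing conversion after it, which is the conversion work the benchmark with `double` containers avoids.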

XSIMD implementation is slower than scalar performance with mixed precision operations #2881

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

XSIMD implementation is slower than scalar performance with mixed precision operations #2881

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions