XSIMD implementation is slower than scalar performance with mixed precision operations #2881

@spectre-ns

The following benchmark divides a uint16_t tensor by a double and assigns the result back to a uint16_t tensor, so the expression mixes types:

    static void Xtensor_Uint16_2000x2000_DivideBy2Double_Xtensor(benchmark::State& aState)
    {
        xt::xtensor<uint16_t, 2> vInput = generateRandomInt16From0To100(cContainerAssignShape);
        auto vOutput = xt::xtensor<uint16_t, 2>::from_shape(cContainerAssignShape);

        for (auto _ : aState)
        {
            vOutput = vInput / 2.0;
        }
    }

When computing the expression, the result of uint16_t / double is a double. The resulting double then needs to be stored into vOutput as a uint16_t. As a result, std::copy is called in the inner loop inside load_aligned. The following code, which uses double for both the input and the output, resolves this issue:

    static void Xtensor_Uint16_2000x2000_DivideBy2Double_Xtensor(benchmark::State& aState)
    {
        xt::xtensor<double, 2> vInput = generateRandomInt16From0To100(cContainerAssignShape);
        auto vOutput = xt::xtensor<double, 2>::from_shape(cContainerAssignShape);

        for (auto _ : aState)
        {
            vOutput = vInput / 2.0;
        }
    }

Is there a way to implement xsimd without copying data?
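
For reference, here is a minimal sketch of the per-element work the mixed-type assignment implies, written in plain C++ with no xtensor or xsimd dependency. The helper name `divide_by_double` is hypothetical and is not part of either library; it only illustrates the widen, divide, narrow sequence, and does not reproduce how xtensor or xsimd actually load or store batches.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// By the usual arithmetic conversions, uint16_t / double has type double.
static_assert(std::is_same_v<decltype(std::uint16_t{} / 2.0), double>,
              "uint16_t / double promotes to double");

// Hypothetical helper (not an xtensor or xsimd function): each element is
// widened to double, divided, then narrowed back to uint16_t on store.
std::vector<std::uint16_t> divide_by_double(const std::vector<std::uint16_t>& input,
                                            double divisor)
{
    std::vector<std::uint16_t> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        const double widened = static_cast<double>(input[i]);        // uint16_t -> double
        output[i] = static_cast<std::uint16_t>(widened / divisor);   // double -> uint16_t (truncates)
    }
    return output;
}
```

When the input and output are both double, as in the second benchmark, neither conversion is needed, which is consistent with the copy disappearing there.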
