Draft
87 commits
748b15b
Gate weights cache on runtime option instead of compile-time macro (#…
hboyraz May 15, 2026
e6bf149
Revert "Add grammar fields to GenerationConfig for constrained decodi…
GregoryComer May 15, 2026
125d651
Add a16w8 per-op test for mean_dim (#19594) (#19594)
christine-long-meta May 15, 2026
a8cfe2b
Thread method-scoped kernel registry through Program and Method (#19561)
JacobSzwejbka May 15, 2026
d1db6b7
Add CoreML-stable RMSNorm for llama eager paths (#19523) (#19523)
telgamal-1 May 16, 2026
42d87c4
Fix GenerationConfig initialization in qnn_multimodal_runner
kirklandsign May 16, 2026
7dbd972
Fix libc++.so.1 missing for qnn-context-binary-utility (#19622)
psiddh May 16, 2026
824cbff
Prevent _safe_softmax decomposition in trace and rewire replaceSafeSof…
ethansfng May 16, 2026
87b667b
[Executorch] Enable madvise based mmap (#19586)
pytorchbot May 18, 2026
df3fa0d
NXP backend: Add QAT tests for AvgPool, MaxPool and Mul tensor ops (#…
roman-janik-nxp May 18, 2026
c531386
NXP backend: Sync config importer to nxp internal CI changes (#19552)
robert-kalmar May 18, 2026
3c68b67
Arm backend: Stabilize MobileNetV3 fp16 TOSA test (#19590)
bdemirb May 18, 2026
9beca1f
Arm backend: Stabilize VGF bilinear fp16 test (#19613)
bdemirb May 18, 2026
2649109
Arm backend: Update gcc to 15.2
perheld May 13, 2026
ed499c0
Arm backend: Generate random conv2d test inputs lazily (#19556)
zingo May 18, 2026
9fe9952
Arm backend: Add static cache integration test with llama (#18404)
xingguo01 May 18, 2026
a142873
Add a16w8 per-op test for var (#19596)
christine-long-meta May 18, 2026
3ceb89c
Gemma 4 31B: chat template, inv_freq dedup, CI hardening (#19614)
mergennachin May 18, 2026
6ca2589
docs(android): add Java API Reference (Javadoc) link, sidebar entry, …
madhesh60 May 18, 2026
985eeb7
docs: add Android LLM runner page and HuggingFace (#19611)
omkar-334 May 18, 2026
760aa39
Revert "Wire target_config Buck deps on cmsis_nn_py (#19604)" (#19644)
metascroy May 18, 2026
1c9c115
[ExecuTorch][MmapDataLoader] Issue F_RDADVISE on Apple platforms in U…
pytorchbot May 18, 2026
30c9a26
[Docs] Fix python specifiers in tutorials (#19280)
vacu9708 May 18, 2026
2a0a2f8
Update iOS SwiftPM docs for ExecuTorch 1.0.0 (#19565)
Vasanthadithya-mundrathi May 18, 2026
20415bf
Fix out_of_bounds_read in getConstantDataPtr (XNNCompiler.cpp) (T2673…
meta-codesync[bot] May 18, 2026
28f38d6
Add Tensor.copyDataInto to Java API (#19171)
jgibson2 May 18, 2026
d62addb
Guard weight_dequant.args[1] access in _quantize_fused_conv_bias pass
ethansfng May 18, 2026
bed30e8
Wrap iOS18 quantization errors with ExecuTorch-specific hint (#19249)
john-rocky May 18, 2026
7c495fa
Revert "Update iOS SwiftPM docs for ExecuTorch 1.0.0" (#19652)
metascroy May 18, 2026
f2f0b72
Set torch seed for pytest (#19651)
GregoryComer May 18, 2026
12bb0e7
[Docathon] Fix docathon content issues (#19444)
ozgecinko May 18, 2026
869af13
Fix broken tests caused by runtime gtest skipping (#19658)
metascroy May 19, 2026
0a82163
Back out "Make ScalelessRMSNorm a torch.nn.RMSNorm; fix SDPACustom vi…
billmguo May 19, 2026
12c25cf
Add entry_at(i) accessor to LoadBackendOptionsMap (#19645)
metascroy May 19, 2026
d3f80b6
Retry all curl errors (#19656)
GregoryComer May 19, 2026
1e76bb3
Arm backend: Don't execute eagerly with sym-ints (#19638)
oscarandersson8218 May 19, 2026
acf1ad9
Handle rank-changing views in RemovePermutesAroundElementwiseOps (#19…
mcremon-meta May 19, 2026
7324ed4
Arm backend: Preserve inputs for pow zero decomposition (#19637)
Sebastian-Larsson May 19, 2026
e0d310d
Arm backend: Enable MYPY in examples/arm (#19633)
Sebastian-Larsson May 19, 2026
cc3afbe
NXP backend: Enable `constant_pad_nd` with new Neutron flow. (#19543)
MartinPavella May 19, 2026
b213b42
Arm backend: Allow PatternQuantizer to annotate all nodes with None (…
AdrianLundell May 19, 2026
d6a4ba5
Arm backend: Move tosa-dialect tests into package (#19640)
oscarandersson8218 May 19, 2026
5eb8492
Arm backend: Fix stale docgen generation (#19551)
Christoffer-JL May 19, 2026
3a13e41
Arm backend: Reject Squeeze no-op partition (#19662)
Christoffer-JL May 19, 2026
a4c3897
Arm backend: Rename test_arm_baremetal.sh to test_arm_backend.sh (#19…
zingo May 19, 2026
1d754c8
Add a16w8 per-op test for conv1d (#19597)
christine-long-meta May 19, 2026
85bd01d
Add a16w8 per-op test for gelu (#19598)
christine-long-meta May 19, 2026
f3387d0
Arm backend: Add Qwen3-VL_2B_IT FP32 layer tests (#19628)
tom-arm May 19, 2026
1ec6812
Wire target_config Buck deps on cmsis_nn_py
rascani May 19, 2026
6bc1762
Migrate broken tests to ET_SKIP_IF macro (#19659)
metascroy May 19, 2026
41a38d8
Arm backend: Add multiple get_attr folding crash workaround (#19663)
AdrianLundell May 19, 2026
afd32cc
Add a16w8 per-op test for bmm (#19599)
christine-long-meta May 19, 2026
92b7411
Add a16w8 per-op test for split (#19600)
christine-long-meta May 19, 2026
4c5e722
NXP Backend: Update eiQ Neutron SDK to 3.1.1 (#19668)
robert-kalmar May 19, 2026
3d86cc7
Module deep-copies LoadBackendOptionsMap on load (#19673)
metascroy May 19, 2026
f8cfc73
Add MLX backend support for Gemma 4 31B (#19524)
mergennachin May 19, 2026
f220e71
Skip argmin/argmax with dim=None in CoreML partitioner (#19247)
john-rocky May 20, 2026
2d7ffad
Qualcomm AI Engine Direct - Debugger Convergence Phase 2: Migrating t…
winskuo-quic May 20, 2026
0ed5508
Qualcomm AI Engine Direct - Fix Full (#19359)
winskuo-quic May 20, 2026
0c9b0df
Qualcomm AI Engine Direct - Updating Claude skill for new op developm…
qti-horodnic May 20, 2026
2874dcb
Qualcomm AI Engine Direct - Adding QNN backend support for randn core…
qti-horodnic May 20, 2026
bf438ce
Qualcomm AI Engine Direct - Adding QNN backend support for tan core A…
qti-horodnic May 20, 2026
5358bcf
Fix broken ConvBNReLu from new Convert1DConvTo2D pass (#19558) (#19558)
JakeStevens May 20, 2026
10c8958
Drop _loadedBackendOptions ivar from Apple bindings (#19680)
metascroy May 20, 2026
d66a37c
Bump PyTorch pins to 2.12 (#19643)
JacobSzwejbka May 20, 2026
a4bd823
Fix race condition in XNNPACK weights cache during concurrent `init()`
billmguo May 20, 2026
477707f
Qualcomm AI Engine Direct - Minor qnn_config fix (#19388)
winskuo-quic May 20, 2026
3eb57fa
Qualcomm AI Engine Direct - Adding QNN backend support for scatter.sr…
qti-horodnic May 20, 2026
6f052fe
Handle rank-changing views in FuseCascadedTransposeOrPermuteOps (#19539)
mcremon-meta May 20, 2026
885ebb9
Qualcomm AI Engine Direct - heap profiling at runtime with HTP backen…
jethroqti May 20, 2026
82cf123
Arm backend: Fix stale docgen generation pt.2 (#19685)
AdrianLundell May 20, 2026
32a86b6
Arm backend: Add VGF Swin2SR example and OOTB smoke test (#19670)
usamahz May 20, 2026
9851477
Arm backend: remove flag from Vk pipeline session (#19669)
Erik-Lundell May 20, 2026
7724fd7
Arm backend: Clarify aot_arm_compiler being a test script (#19684)
martinlsm May 20, 2026
d87890d
Add conversation-history instrumentation tests for LlmModule (#19679)…
psiddh May 20, 2026
3b5d18d
Revert PyTorch 2.12 pin bump (#19698)
JacobSzwejbka May 20, 2026
8debe93
Add FP8 placeholder support to ExecuTorch serialization (#19043)
YufengShi-dudu May 20, 2026
c0cbc74
Arm backend: Modernise and standalone the executor runner (#19018)
usamahz May 20, 2026
6745047
Add XNNPACK, MobileNetV2, MobileBERT, Llama, ResNet18 to RISC-V testi…
luhenry May 20, 2026
1feb56c
Zephyr: Move CI testing to a script (#19542)
zingo May 20, 2026
fdd4f43
NXP backend: Remove `max_pool2d` maximum kernel size restriction. (#1…
MartinPavella May 20, 2026
6c74cdc
NXP backend: Add support for `sigmoid` with the new Neutron flow. (#1…
MartinPavella May 20, 2026
a76d9cd
NXP backend: Add support for `leaky_relu` with the new Neutron flow. …
MartinPavella May 20, 2026
6ba868e
Thread kernel_registry through Module::load_method (#19641)
wliuyx May 20, 2026
cca3129
Run RISC-V tests with multiple RVV QEMU configurations
luhenry May 16, 2026
7eba60a
Add XNNPACK coverage instrumentation for riscv64
luhenry May 16, 2026
2c8507d
Align RISC-V workflow display name to others
luhenry May 20, 2026
6 changes: 3 additions & 3 deletions .ci/docker/common/install_android.sh
@@ -43,10 +43,10 @@ install_ndk() {
ARCH=$(uname -m)
if [ "${ARCH}" = "aarch64" ]; then
# aarch64 NDK is not cached on S3, download from Google directly
-curl -Os --retry 3 "https://dl.google.com/android/repository/android-ndk-${ANDROID_NDK_VERSION}-linux.zip"
+curl -Os --retry 3 --retry-all-errors "https://dl.google.com/android/repository/android-ndk-${ANDROID_NDK_VERSION}-linux.zip"
else
# The NDK installation is cached on ossci-android S3 bucket
-curl -Os --retry 3 "https://ossci-android.s3.amazonaws.com/android-ndk-${ANDROID_NDK_VERSION}-linux.zip"
+curl -Os --retry 3 --retry-all-errors "https://ossci-android.s3.amazonaws.com/android-ndk-${ANDROID_NDK_VERSION}-linux.zip"
fi
unzip -qo "android-ndk-${ANDROID_NDK_VERSION}-linux.zip"

@@ -62,7 +62,7 @@ install_cmdtools() {

pushd /tmp
# The file is cached on ossci-android S3 bucket
-curl -Os --retry 3 "https://ossci-android.s3.us-west-1.amazonaws.com/${CMDTOOLS_FILENAME}"
+curl -Os --retry 3 --retry-all-errors "https://ossci-android.s3.us-west-1.amazonaws.com/${CMDTOOLS_FILENAME}"
unzip -qo "${CMDTOOLS_FILENAME}" -d /opt

ls -lah /opt/cmdline-tools/bin
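
Reviewer note on the flag added throughout this PR: plain `--retry` only retries failures that curl treats as transient (timeouts and HTTP 408, 429, 500, 502, 503, 504), while `--retry-all-errors` (available since curl 7.71.0) retries on any error. Two caveats from the curl man page seem relevant here: HTTP error statuses only count as errors when `--fail` is also given, and output that is piped or shell-redirected is not reset between attempts, so writing with `-o`/`--output` is safer than `| bash` or `> file`. A minimal sketch of a shared wrapper follows; `fetch` and the `CURL` override variable are hypothetical names for illustration and are not part of this PR.

```shell
# Hypothetical helper (not in this PR): one place for the retry policy.
# Assumes curl >= 7.71.0, the first release that accepts --retry-all-errors.
# CURL is an assumed override hook so the command line can be inspected in tests.
fetch() {
  local url="$1"
  local dest="$2"
  "${CURL:-curl}" --fail --location --silent --show-error \
    --retry 3 --retry-all-errors \
    --output "${dest}" "${url}"
}
```

Usage would look like `fetch "https://ossci-android.s3.amazonaws.com/android-ndk-${ANDROID_NDK_VERSION}-linux.zip" "android-ndk-${ANDROID_NDK_VERSION}-linux.zip"`.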
2 changes: 1 addition & 1 deletion .ci/docker/common/install_cache.sh
@@ -34,7 +34,7 @@ install_ubuntu() {

install_binary() {
echo "Downloading sccache binary from S3 repo"
-curl --retry 3 https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
+curl --retry 3 --retry-all-errors https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
chmod +x /opt/cache/bin/sccache
}

4 changes: 2 additions & 2 deletions .ci/docker/common/install_docs_reqs.sh
@@ -12,10 +12,10 @@ if [ -n "$BUILD_DOCS" ]; then
# Ignore error if gpg-agent doesn't exist (for Ubuntu 16.04)
apt-get install -y gpg-agent || :

-curl --retry 3 -sL https://deb.nodesource.com/setup_16.x | sudo -E bash -
+curl --retry 3 --retry-all-errors -sL https://deb.nodesource.com/setup_16.x | sudo -E bash -
sudo apt-get install -y nodejs

-curl --retry 3 -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
+curl --retry 3 --retry-all-errors -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list

apt-get update
2 changes: 1 addition & 1 deletion .ci/docker/common/install_linter.sh
@@ -15,5 +15,5 @@ source "$(dirname "${BASH_SOURCE[0]}")/utils.sh"
pip_install -r requirements-lintrunner.txt

# Install google-java-format
-curl -L --retry 3 https://github.com/google/google-java-format/releases/download/v1.23.0/google-java-format_linux-x86-64 > /opt/google-java-format
+curl -L --retry 3 --retry-all-errors https://github.com/google/google-java-format/releases/download/v1.23.0/google-java-format_linux-x86-64 > /opt/google-java-format
chmod +x /opt/google-java-format
54 changes: 53 additions & 1 deletion .ci/scripts/export_model_artifact.sh
@@ -195,9 +195,17 @@ case "$HF_MODEL" in
PREPROCESSOR_FEATURE_SIZE=""
PREPROCESSOR_OUTPUT=""
;;
+SocialLocalMobile/gemma-4-31B-it-HQQ-INT4)
+MODEL_NAME="gemma4_31b"
+TASK=""
+MAX_SEQ_LEN=""
+EXTRA_PIP=""
+PREPROCESSOR_FEATURE_SIZE=""
+PREPROCESSOR_OUTPUT=""
+;;
*)
echo "Error: Unsupported model '$HF_MODEL'"
-echo "Supported models: mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Mini-4B-Realtime-2602, openai/whisper-{small, medium, large, large-v2, large-v3, large-v3-turbo}, google/gemma-3-4b-it, Qwen/Qwen3-0.6B, nvidia/diar_streaming_sortformer_4spk-v2, nvidia/parakeet-tdt, facebook/dinov2-small-imagenet1k-1-layer, SocialLocalMobile/Qwen3.5-35B-A3B-HQQ-INT4"
+echo "Supported models: mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Mini-4B-Realtime-2602, openai/whisper-{small, medium, large, large-v2, large-v3, large-v3-turbo}, google/gemma-3-4b-it, Qwen/Qwen3-0.6B, nvidia/diar_streaming_sortformer_4spk-v2, nvidia/parakeet-tdt, facebook/dinov2-small-imagenet1k-1-layer, SocialLocalMobile/Qwen3.5-35B-A3B-HQQ-INT4, SocialLocalMobile/gemma-4-31B-it-HQQ-INT4"
exit 1
;;
esac
@@ -459,6 +467,50 @@ if [ "$MODEL_NAME" = "qwen3_5_moe" ]; then
exit 0
fi

+# Gemma 4 31B uses a prequantized checkpoint and custom export script
+if [ "$MODEL_NAME" = "gemma4_31b" ]; then
+pip install safetensors huggingface_hub gguf
+
+# Download prequantized model outside OUTPUT_DIR to avoid uploading on failure
+LOCAL_MODEL_DIR=$(mktemp -d)
+INDUCTOR_CACHE=$(mktemp -d)
+trap 'rm -rf "$LOCAL_MODEL_DIR" "$INDUCTOR_CACHE"' EXIT
+
+python -c "from huggingface_hub import snapshot_download; snapshot_download('${HF_MODEL}', local_dir='${LOCAL_MODEL_DIR}')"
+
+# Sanity check: run inference on the prequantized model
+echo "::group::Inference sanity check"
+INFERENCE_OUTPUT=$(python -m executorch.examples.models.gemma4_31b.inference \
+--prequantized "$LOCAL_MODEL_DIR" \
+--prompt "What is the capital of France?" \
+--max-new-tokens 32 \
+--temperature 0 \
+--no-compile 2>&1)
+echo "$INFERENCE_OUTPUT"
+if ! echo "$INFERENCE_OUTPUT" | grep -q "Paris"; then
+echo "ERROR: Inference sanity check failed — expected 'Paris' in output"
+exit 1
+fi
+echo "::endgroup::"
+
+# Copy tokenizer for the runner
+cp "$LOCAL_MODEL_DIR/tokenizer.json" "${OUTPUT_DIR}/tokenizer.json"
+
+# Export to .pte/.ptd (short cache dir avoids objcopy symbol length issues)
+echo "::group::Export"
+TORCHINDUCTOR_CACHE_DIR="$INDUCTOR_CACHE" \
+python -m executorch.examples.models.gemma4_31b.export \
+--prequantized "$LOCAL_MODEL_DIR" \
+--output-dir "${OUTPUT_DIR}"
+echo "::endgroup::"
+
+test -f "${OUTPUT_DIR}/model.pte"
+test -f "${OUTPUT_DIR}/aoti_cuda_blob.ptd"
+ls -al "${OUTPUT_DIR}"
+
+exit 0
+fi
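
Reviewer note: the inference sanity check in this block captures combined stdout/stderr and greps it for an expected token. If more prequantized models get the same treatment, that pattern could be factored out. The sketch below is one possible shape, not the PR's code; `check_output_contains` is a hypothetical name, and it uses `grep -F` so the expected text is matched literally rather than as a regular expression.

```shell
# Hypothetical helper (not in this PR).
# Runs a command, echoes its combined stdout/stderr, and succeeds only if the
# command exits 0 and its output contains the expected literal text.
check_output_contains() {
  local expected="$1"
  shift
  local output status=0
  output=$("$@" 2>&1) || status=$?
  printf '%s\n' "${output}"
  if [ "${status}" -ne 0 ]; then
    echo "ERROR: command exited with status ${status}" >&2
    return "${status}"
  fi
  if ! printf '%s\n' "${output}" | grep -qF -- "${expected}"; then
    echo "ERROR: sanity check failed: expected '${expected}' in output" >&2
    return 1
  fi
}
```

A call would take the expected text first and the command after it, for example `check_output_contains "Paris" python -m executorch.examples.models.gemma4_31b.inference --prequantized "$LOCAL_MODEL_DIR" ...` with the same flags as above.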

MAX_SEQ_LEN_ARG=""
if [ -n "$MAX_SEQ_LEN" ]; then
MAX_SEQ_LEN_ARG="--max_seq_len $MAX_SEQ_LEN"
2 changes: 1 addition & 1 deletion .ci/scripts/setup-emscripten.sh
@@ -9,7 +9,7 @@ set -ex

# need version >= 17
install_node() {
-curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
+curl --retry 3 --retry-all-errors -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
source "$HOME/.nvm/nvm.sh"
nvm install 22
}
4 changes: 2 additions & 2 deletions .ci/scripts/setup-macos.sh
@@ -34,7 +34,7 @@ install_buck() {
# team for help.
BUCK2_VERSION=$(cat ci_commit_pins/buck2.txt)
BUCK2=buck2-aarch64-apple-darwin-${BUCK2_VERSION}.zst
-curl -s "https://ossci-macos.s3.amazonaws.com/${BUCK2}" -o "${BUCK2}"
+curl -s --retry 3 --retry-all-errors "https://ossci-macos.s3.amazonaws.com/${BUCK2}" -o "${BUCK2}"

zstd -d "${BUCK2}" -o buck2

@@ -68,7 +68,7 @@ install_sccache() {
# NB: The function is adopted from PyTorch MacOS build workflow
# https://github.com/pytorch/pytorch/blob/main/.github/workflows/_mac-build.yml
if ! command -v sccache &> /dev/null; then
-sudo curl --retry 3 "https://s3.amazonaws.com/ossci-macos/sccache/sccache-v0.4.1-${RUNNER_ARCH}" --output "${SCCACHE_PATH}/sccache"
+sudo curl --retry 3 --retry-all-errors "https://s3.amazonaws.com/ossci-macos/sccache/sccache-v0.4.1-${RUNNER_ARCH}" --output "${SCCACHE_PATH}/sccache"
sudo chmod +x "${SCCACHE_PATH}/sccache"
fi

4 changes: 2 additions & 2 deletions .ci/scripts/setup-mediatek-deps.sh
@@ -14,7 +14,7 @@ install_neuropilot() {
echo "Start installing neuropilot."
mkdir -p "${MEDIATEK_INSTALLATION_DIR}"

-curl -Lo /tmp/neuropilot-express.tar.gz "https://s3.ap-southeast-1.amazonaws.com/mediatek.neuropilot.com/06302508-4c94-4bf2-9789-b0ee44e83e27.gz"
+curl -Lo /tmp/neuropilot-express.tar.gz --retry 3 --retry-all-errors "https://s3.ap-southeast-1.amazonaws.com/mediatek.neuropilot.com/06302508-4c94-4bf2-9789-b0ee44e83e27.gz"
echo "Finishing downloading neuropilot sdk."
tar zxvf /tmp/neuropilot-express.tar.gz --strip-components=1 --directory "${MEDIATEK_INSTALLATION_DIR}"
echo "Finishing unzip neuropilot sdk."
@@ -33,7 +33,7 @@ setup_neuropilot() {
}

setup_calibration_data() {
-curl -Lo /tmp/imagenette2-160.tgz https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
+curl -Lo /tmp/imagenette2-160.tgz --retry 3 --retry-all-errors https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
tar zxvf /tmp/imagenette2-160.tgz --strip-components=1 --directory "${MEDIATEK_INSTALLATION_DIR}"
}

2 changes: 1 addition & 1 deletion .ci/scripts/setup-openvino.sh
@@ -37,7 +37,7 @@ else
echo "Using OpenVINO stable release: ${OPENVINO_BUILD}"
fi

-curl -Lo /tmp/openvino_toolkit.tgz --retry 3 --fail ${OPENVINO_URL}
+curl -Lo /tmp/openvino_toolkit.tgz --retry 3 --retry-all-errors --fail ${OPENVINO_URL}
tar -xzf /tmp/openvino_toolkit.tgz
mv "${OPENVINO_EXTRACTED_DIR}" openvino

2 changes: 1 addition & 1 deletion .ci/scripts/setup-samsung-linux-deps.sh
@@ -43,7 +43,7 @@ download_and_extract() {
local out_file="$3"

echo "Downloading from ${download_url}..."
-curl -fsSL --retry 3 \
+curl -fsSL --retry 3 --retry-all-errors \
-H "apikey: ${API_KEY}" \
-o "${out_file}" \
"${download_url}"
4 changes: 2 additions & 2 deletions .ci/scripts/setup-vulkan-linux-deps.sh
@@ -16,7 +16,7 @@ install_swiftshader() {

_tmp_archive="/tmp/${_swiftshader_archive}"

-curl --silent --show-error --location --fail --retry 3 \
+curl --silent --show-error --location --fail --retry 3 --retry-all-errors \
--output "${_tmp_archive}" "$_https_amazon_aws/${_swiftshader_archive}"

tar -C "${_swiftshader_dir}" -xzf "${_tmp_archive}"
@@ -35,7 +35,7 @@ install_vulkan_sdk() {

_tmp_archive="/tmp/vulkansdk.tar.gz"

-curl --silent --show-error --location --fail --retry 3 \
+curl --silent --show-error --location --fail --retry 3 --retry-all-errors \
--output "${_tmp_archive}" "${_vulkan_sdk_url}"

tar -C "${_vulkan_sdk_dir}" -xJf "${_tmp_archive}"
1 change: 1 addition & 0 deletions .ci/scripts/test_cortex_m_e2e.sh
@@ -19,6 +19,7 @@ et_root_dir=$(realpath "${script_dir}/../..")

# Quantization is the default for the cortex-m55 target; run.sh's
# arg parser only recognizes --no_quantize, so we omit any explicit flag.
+export ARM_FVP_INSTALL_I_AGREE_TO_THE_CONTAINED_EULA=True
bash "${et_root_dir}/examples/arm/run.sh" \
--model_name="${MODEL}" \
--target=cortex-m55 \
2 changes: 1 addition & 1 deletion .ci/scripts/test_ios_ci.sh
@@ -55,7 +55,7 @@ mv $MODEL_NAME*.pte "$APP_PATH/Resources/Models/MobileNet/"

say "Downloading Labels"

-curl https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt \
+curl --retry 3 --retry-all-errors https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt \
-o "$APP_PATH/Resources/Models/MobileNet/imagenet_classes.txt"

say "Creating Simulator"
31 changes: 23 additions & 8 deletions .ci/scripts/test_model_e2e.sh
@@ -228,9 +228,21 @@ case "$HF_MODEL" in
AUDIO_FILE=""
IMAGE_PATH=""
;;
+SocialLocalMobile/gemma-4-31B-it-HQQ-INT4)
+MODEL_NAME="gemma4_31b"
+RUNNER_TARGET="gemma4_31b_runner"
+RUNNER_PATH="gemma4_31b"
+EXPECTED_OUTPUT="Paris"
+PREPROCESSOR=""
+TOKENIZER_URL=""
+TOKENIZER_FILE="tokenizer.json"
+AUDIO_URL=""
+AUDIO_FILE=""
+IMAGE_PATH=""
+;;
*)
echo "Error: Unsupported model '$HF_MODEL'"
-echo "Supported models: mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Mini-4B-Realtime-2602, nvidia/diar_streaming_sortformer_4spk-v2, openai/whisper series (whisper-{small, medium, large, large-v2, large-v3, large-v3-turbo}), google/gemma-3-4b-it, Qwen/Qwen3-0.6B, nvidia/parakeet-tdt, facebook/dinov2-small-imagenet1k-1-layer, SocialLocalMobile/Qwen3.5-35B-A3B-HQQ-INT4"
+echo "Supported models: mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Mini-4B-Realtime-2602, nvidia/diar_streaming_sortformer_4spk-v2, openai/whisper series (whisper-{small, medium, large, large-v2, large-v3, large-v3-turbo}), google/gemma-3-4b-it, Qwen/Qwen3-0.6B, nvidia/parakeet-tdt, facebook/dinov2-small-imagenet1k-1-layer, SocialLocalMobile/Qwen3.5-35B-A3B-HQQ-INT4, SocialLocalMobile/gemma-4-31B-it-HQQ-INT4"
exit 1
;;
esac
@@ -244,19 +256,19 @@ echo "::group::Prepare $MODEL_NAME Artifacts"


# Download tokenizer files (skip for models that bundle tokenizer in export or do not use one)
-if [ "$MODEL_NAME" != "parakeet" ] && [ "$MODEL_NAME" != "voxtral_realtime" ] && [ "$MODEL_NAME" != "sortformer" ] && [ "$MODEL_NAME" != "dinov2" ] && [ "$MODEL_NAME" != "qwen3_5_moe" ]; then
+if [ "$MODEL_NAME" != "parakeet" ] && [ "$MODEL_NAME" != "voxtral_realtime" ] && [ "$MODEL_NAME" != "sortformer" ] && [ "$MODEL_NAME" != "dinov2" ] && [ "$MODEL_NAME" != "qwen3_5_moe" ] && [ "$MODEL_NAME" != "gemma4_31b" ]; then
if [ "$TOKENIZER_FILE" != "" ]; then
-curl -L $TOKENIZER_URL/$TOKENIZER_FILE -o $MODEL_DIR/$TOKENIZER_FILE
+curl -L --retry 3 --retry-all-errors $TOKENIZER_URL/$TOKENIZER_FILE -o $MODEL_DIR/$TOKENIZER_FILE
else
-curl -L $TOKENIZER_URL/tokenizer.json -o $MODEL_DIR/tokenizer.json
-curl -L $TOKENIZER_URL/tokenizer_config.json -o $MODEL_DIR/tokenizer_config.json
-curl -L $TOKENIZER_URL/special_tokens_map.json -o $MODEL_DIR/special_tokens_map.json
+curl -L --retry 3 --retry-all-errors $TOKENIZER_URL/tokenizer.json -o $MODEL_DIR/tokenizer.json
+curl -L --retry 3 --retry-all-errors $TOKENIZER_URL/tokenizer_config.json -o $MODEL_DIR/tokenizer_config.json
+curl -L --retry 3 --retry-all-errors $TOKENIZER_URL/special_tokens_map.json -o $MODEL_DIR/special_tokens_map.json
fi
fi
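
Reviewer note: the skip condition for the tokenizer download now chains six `!=` tests and grows by one clause per model. A `case` statement would express the same membership check in one place. The sketch below is only a suggestion; `needs_tokenizer_download` is a hypothetical name, and the model list is copied from the condition in this hunk.

```shell
# Hypothetical helper (not in this PR).
# Returns 0 when the tokenizer must be downloaded separately, and 1 for models
# that bundle the tokenizer in the export or do not use one.
needs_tokenizer_download() {
  case "$1" in
    parakeet|voxtral_realtime|sortformer|dinov2|qwen3_5_moe|gemma4_31b)
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

The guard would then read `if needs_tokenizer_download "$MODEL_NAME"; then ... fi`, and adding a model becomes a one-word change.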

# Download test files
if [ "$AUDIO_URL" != "" ]; then
-curl -L $AUDIO_URL -o ${MODEL_DIR}/$AUDIO_FILE
+curl -L --retry 3 --retry-all-errors $AUDIO_URL -o ${MODEL_DIR}/$AUDIO_FILE
elif [[ "$MODEL_NAME" == *whisper* ]] || [ "$MODEL_NAME" = "voxtral_realtime" ]; then
if ! command -v ffmpeg >/dev/null; then
if [ "$(uname -s)" = "Linux" ] && command -v apt-get >/dev/null; then
@@ -278,7 +290,7 @@ fi

# Download test image for vision models
if [ -n "${IMAGE_URL:-}" ]; then
-curl -L "$IMAGE_URL" -o "${MODEL_DIR}/test_image.jpg"
+curl -L --retry 3 --retry-all-errors "$IMAGE_URL" -o "${MODEL_DIR}/test_image.jpg"
fi

ls -al
@@ -368,6 +380,9 @@ EOF
qwen3_5_moe)
RUNNER_ARGS="$RUNNER_ARGS --tokenizer_path ${MODEL_DIR}/$TOKENIZER_FILE --prompt 'What is the capital of France?' --max_new_tokens 128 --temperature 0 --cuda_graph"
;;
+gemma4_31b)
+RUNNER_ARGS="$RUNNER_ARGS --tokenizer_path ${MODEL_DIR}/$TOKENIZER_FILE --prompt 'What is the capital of France?' --max_new_tokens 128 --temperature 0 --cuda_graph"
+;;
voxtral_realtime)
RUNNER_ARGS="--model_path ${MODEL_DIR}/model.pte --tokenizer_path ${MODEL_DIR}/$TOKENIZER_FILE --preprocessor_path ${MODEL_DIR}/$PREPROCESSOR --audio_path ${MODEL_DIR}/$AUDIO_FILE --temperature 0"
# Add CUDA data path if present
49 changes: 47 additions & 2 deletions .ci/scripts/test_riscv_qemu.sh
@@ -5,7 +5,7 @@
# LICENSE file in the root directory of this source tree.

# CI wrapper: install RISC-V cross-compile + qemu-user tooling, then run the
-# RISC-V Phase 1 smoke test (export, cross-compile, qemu-user execution) via
+# RISC-V smoke test (export, cross-compile, qemu-user execution) via
# examples/riscv/run.sh. The bundled-IO comparison and Test_result: PASS
# check are done by run.sh.

@@ -14,5 +14,50 @@ set -eu
script_dir=$(realpath "$(dirname "${BASH_SOURCE[0]}")")
et_root_dir=$(realpath "${script_dir}/../..")

+model="add"
+xnnpack=false
+quantize=false
+verbose=false
+verbose_xnnpack=false
+
+usage() {
+cat <<EOF
+Usage: $(basename "$0") [options]
+Options:
+--model=<NAME> Which model to export and run (default: add)
+--xnnpack Enable the XNNPACK backend (AOT partitioner + runtime)
+--quantize Produce an 8-bit quantized model
+--verbose Enable XNNPACK partitioner DEBUG logging and dump the lowered graph
+--verbose-xnnpack Build XNNPACK with XNN_LOG_LEVEL=4 to log microkernel dispatch
+-h, --help Show this help
+EOF
+}
+
+for arg in "$@"; do
+case $arg in
+--model=*) model="${arg#*=}" ;;
+--xnnpack) xnnpack=true ;;
+--quantize) quantize=true ;;
+--verbose) verbose=true ;;
+--verbose-xnnpack) verbose_xnnpack=true ;;
+-h|--help) usage; exit 0 ;;
+*) echo "Unknown option: $arg" >&2; usage; exit 1 ;;
+esac
+done
+
+run_extra_args=()
+if ${xnnpack}; then
+run_extra_args+=(--xnnpack)
+fi
+if ${quantize}; then
+run_extra_args+=(--quantize)
+fi
+if ${verbose}; then
+run_extra_args+=(--verbose)
+fi
+if ${verbose_xnnpack}; then
+run_extra_args+=(--verbose-xnnpack)
+fi
+
bash "${et_root_dir}/examples/riscv/setup.sh"
-bash "${et_root_dir}/examples/riscv/run.sh"
+bash "${et_root_dir}/examples/riscv/run.sh" --model="${model}" "${run_extra_args[@]}"
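
Reviewer note: this script runs under `set -eu`, and `run_extra_args` is empty when no optional flag is passed. On bash 4.4 and later, expanding an empty array with `"${run_extra_args[@]}"` is fine, but older releases (for example the bash 3.2 that macOS ships) report it as an unbound variable under `set -u`. This is probably a non-issue on the Linux CI runners; if portability ever matters, the `${arr[@]+"${arr[@]}"}` idiom sidesteps it. A minimal sketch, where `count_args` is a hypothetical stand-in for `examples/riscv/run.sh`:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for examples/riscv/run.sh: prints how many arguments it got.
count_args() { echo "$#"; }

run_extra_args=()
# With an empty array, the ${arr[@]+...} form expands to nothing instead of
# tripping `set -u` on bash < 4.4.
count_args --model=add ${run_extra_args[@]+"${run_extra_args[@]}"}   # prints 1

run_extra_args+=(--xnnpack --quantize)
# With elements present, each one is passed through as its own quoted word.
count_args --model=add ${run_extra_args[@]+"${run_extra_args[@]}"}   # prints 3
```

The same expansion would slot into the `run.sh` invocation in place of the bare `"${run_extra_args[@]}"`.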