Summary
Compiling a static-shape (iOS) LLM program whose weights are linear blockwise-INT4
(blockwise_shift_scale) for the Neural Engine crashes coreai-build with a SIGSEGV inside
MPSGraph's anePreCompileBinary. The byte-for-byte-identical program structure with
palettized weights (lut_to_dense) compiles cleanly to the ANE. So the ANE pre-compiler
cannot legalize a linear-INT4 static program and segfaults instead of failing gracefully.

This blocks a matched-quantization ANE-vs-GPU comparison on Qwen3-0.6B: the dynamic
(--platform macOS) export ships linear INT4, the static (--platform iOS) export
ships palettized weights, and there is currently no way to put the GPU export's
exact INT4 scheme on the ANE, because the attempt crashes the compiler.

Environment
macOS 27.0 (26A5353q), Apple M4 Max (Mac16,9)
coreai-build: Metal toolchain v27.1.5194.15, build 3600.67.5.8.1
coreai-core 1.0.0b1, coreai-torch 0.4.0, coreai-opt 0.2.0
Qwen/Qwen3-0.6B via coreai.llm.export

Reproduction
Control (works). Uniform 4-bit palettized static export → compiles to ANE:

    uv run coreai.llm.export qwen3-0.6b --platform iOS \
      --compression 4bit_weight_palettized_group32 --output-name qwen3_0_6b_ios_pure4bit
    xcrun coreai-build compile exports/qwen3_0_6b_ios_pure4bit/qwen3_0_6b_ios_pure4bit.aimodel \
      --platform iOS --preferred-compute neural-engine --architecture h18p --output /tmp/ok
    # OK: compiled .aimodelc has 31 `*_ANE_region_*` segments, 0 non-ANE.

Crash. The same static structure with linear INT4 weights. The CLI couples the
quant scheme to the platform (--platform iOS --compression 4bit → RuntimeError: macOS
quantization preset provided, but platform is iOS), so the linear INT4 is applied at the
MLIR level via the same quantize_weights primitive the diffusion pipeline uses
(coreai_models/export/compiler.py::apply_mlir_quantization):

    # repro_ane_int4_crash.py
    # Run from a coreai-models checkout: uv run python repro_ane_int4_crash.py
    import asyncio
    import torch
    from transformers import AutoConfig
    from coreai_models.export.pipeline import ExportConfig
    from coreai_models.export.ios import export_ios_model
    from coreai_models.export.metadata import build_aimodel_metadata
    from coreai_models.models.registry import get_model_entry
    from coreai_opt.coreai_utils import CompressionGranularity, DType, quantize_weights
    from coreai_opt.coreai_utils.common import QScheme

    async def main():
        hf_id, ctx = "Qwen/Qwen3-0.6B", 4096
        cfg = AutoConfig.from_pretrained(hf_id)
        cfg.max_position_embeddings = ctx
        entry = get_model_entry(cfg.model_type)
        model = entry.ios_class.from_hf(hf_id, max_context_length=ctx, target_dtype=torch.float16).eval()
        ec = ExportConfig(hf_model_id=hf_id, variant="iOS", max_context_length=ctx,
                          compute_precision="float16", compression="int4_linear",
                          output_dir="exports", output_name="qwen3_0_6b_ios_int4linear")
        prog = await export_ios_model(model, cfg, ec)
        # linear symmetric INT4, per-block 32: same scheme as the macOS `4bit` preset
        prog = quantize_weights(prog, dtype=DType.INT4, qscheme=QScheme.SYMMETRIC,
                                granularity=CompressionGranularity.PER_BLOCK, block_size=32,
                                weight_num_threshold=32768, in_place=True)
        prog.optimize()
        out = "exports/qwen3_0_6b_ios_int4linear/qwen3_0_6b_ios_int4linear.aimodel"
        prog.save_asset(out, build_aimodel_metadata(hf_id))
        print("saved", out)

    asyncio.run(main())

The produced .aimodel is valid (coreai-build inspect shows the same 34 static-shape
functions as the palettized control: extend_{256..4096}_{8,16,64}, prompt_opt_*,
gather_embeddings_*, load_embeddings; the weight op is blockwise_shift_scale, dtype Int4).
Only the AOT compile crashes.

Expected
Compile to the ANE, or fail with a diagnostic (e.g. "INT4 linear weights are not
supported on the ANE; use palettization"). A segfault is never acceptable.

Actual
coreai-build runs ~5 min at 100% CPU, then terminates with SIGSEGV (exit 139),
no stdout/stderr diagnostic, and no .aimodelc. Reproduced 2/2 runs. Crash report
(~/Library/Logs/DiagnosticReports/coreai-build-*.ips):

Notes
The control (palettized) and crash (linear-INT4) .aimodels differ only in the
weight encoding (lut_to_dense vs blockwise_shift_scale); structure, shapes, and the
fp16 embedding front-end are identical. So the trigger is specifically the linear
blockwise-INT4 weight form on the ANE pre-compile path.
The dynamic (GPU) --platform macOS export lowers to the same blockwise_shift_scale
form and compiles/runs fine on the GPU MPSGraph path; only the ANE pre-compiler
crashes on it.
--preferred-compute neural-engine on the dynamic export is a no-op (still a GPU
MPSGraph delegate, 0 ANE regions), so recompiling the existing GPU INT4 export onto the
ANE is not an alternative.
slice_update lowering crash on the static path).
Happy to attach the full .ips crash reports.
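
For anyone scripting this reproduction, the "SIGSEGV (exit 139)" reading in the Actual section can be checked mechanically. The sketch below is a hypothetical helper, not part of coreai-build or coreai-models. It relies only on two general conventions: shells report death by signal as 128 plus the signal number, and Python's subprocess module reports it as the negated signal number.

```python
import signal


def describe_exit(returncode):
    """Turn a process exit status into a short human-readable description.

    Handles both conventions for death by signal:
      * shell style:      128 + signal number (139 = 128 + 11)
      * subprocess style: negated signal number (-11)
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        return f"failed with exit code {returncode}"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Not a known signal number on this platform; report the raw code.
        return f"failed with exit code {returncode}"
    return f"killed by {name} (signal {signum})"
```

A wrapper script could pass the returncode attribute of a subprocess.run(...) result for the compile step to describe_exit, so that a segfault is logged as such rather than as an opaque non-zero status.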
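
Since the crash leaves no stdout/stderr diagnostic, the .ips file is the only artifact. A minimal sketch for locating the most recent report follows, assuming the directory and file-name pattern quoted in the Actual section; the function name and its parameters are hypothetical.

```python
from pathlib import Path


def newest_crash_report(report_dir=None, pattern="coreai-build-*.ips"):
    """Return the most recently modified crash report, or None if none exist."""
    if report_dir is None:
        # Default location quoted in the report above.
        report_dir = Path.home() / "Library" / "Logs" / "DiagnosticReports"
    reports = sorted(Path(report_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    return reports[-1] if reports else None
```

Calling newest_crash_report() right after a failed compile should return the report for that run, provided no other coreai-build crash was written in between.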
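
For readers less familiar with the two weight encodings compared in the Notes, here is a purely conceptual sketch of how they differ. It assumes that blockwise_shift_scale amounts to "integer code times a per-block scale" (with zero shift in the symmetric case) and that lut_to_dense amounts to "4-bit index into a lookup table". That reading is inferred from the op names only. None of this is the coreai-opt or MPSGraph implementation, and every function name is hypothetical.

```python
BLOCK_SIZE = 32  # per-block granularity used in the repro script


def quantize_linear_int4(weights, block_size=BLOCK_SIZE):
    """Symmetric linear INT4: one scale per block, integer codes in [-8, 7]."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in block)
    return codes, scales


def dequantize_linear_int4(codes, scales, block_size=BLOCK_SIZE):
    """Dense weight = code * scale of its block (symmetric, so zero shift is assumed)."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


def palettize(weights, lut):
    """For every weight, pick the index of the nearest lookup-table entry."""
    return [min(range(len(lut)), key=lambda k: abs(lut[k] - w)) for w in weights]


def dequantize_palettized(indices, lut):
    """Dense weight = lookup-table entry selected by its 4-bit index."""
    return [lut[i] for i in indices]


# Toy data: 64 deterministic weights in [-1, 1], i.e. two blocks of 32.
weights = [((i * 37) % 101 - 50) / 50.0 for i in range(64)]

codes, scales = quantize_linear_int4(weights)
linear_dense = dequantize_linear_int4(codes, scales)

lut = [-1.0 + 2.0 * k / 15 for k in range(16)]  # 16 entries = 4-bit palette
indices = palettize(weights, lut)
lut_dense = dequantize_palettized(indices, lut)
```

Both paths store 4 bits per weight, but the linear form reconstructs values arithmetically while the palettized form reconstructs them by table lookup, which is the only difference between the control and crash models described above.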