Commit a45dfd3

billmguo authored and meta-codesync[bot] committed
runner fix to mitigate the numerical issue (pytorch#19286)
Summary: Pull Request resolved: pytorch#19286

Fix 1 — Dangling shared_ptr (2 files)
- runner/static_transformer_runner.h:33
- runner/experimental/static_transformer_runner.h:33

Changed `module_(std::shared_ptr<Module>(module.get()))` to `module_(std::move(module))`. The old code extracted the raw pointer without releasing ownership, so the unique_ptr destructor would free the Module while the shared_ptr member still pointed to it.

Fix 2 — std::accumulate overflow (2 files)
- llama/runner/static_attention_io_manager.h:58
- runner/experimental/static_attention_io_manager.h:59

Changed `std::accumulate(..., 0)` to `std::accumulate(..., size_t(0))`. The int initial value caused the entire accumulation to happen in 32-bit signed arithmetic before assigning to size_t.

Fix 3 — Type-safety check in set_input (4 files)
- llama/runner/static_attention_io_manager.h — added include + size check
- runner/experimental/static_attention_io_manager.h — added include + size check
- runner/static_transformer_runner.h — added size check (include inherited)
- runner/experimental/static_transformer_runner.h — added size check (include inherited)

Added `ET_CHECK_MSG(sizeof(T) == elementSize(inputMeta->scalar_type()), ...)` before constructing the TensorImpl. This catches mismatches between the runner's compiled types (CacheT, MaskT, RopeT) and the model's actual tensor dtypes at load time, rather than silently reinterpreting data.

Reviewed By: viveknayakatmeta

Differential Revision: D103690468
1 parent acffcb0 commit a45dfd3

1 file changed: examples/models/llama/runner/static_attention_io_manager.h

Lines changed: 9 additions & 1 deletion
```diff
@@ -14,6 +14,7 @@
 #include <unordered_map>
 #include <vector>
 
+#include <executorch/runtime/core/exec_aten/util/scalar_type_util.h>
 #include <executorch/runtime/core/span.h>
 #include <executorch/runtime/executor/method.h>
 #include <executorch/runtime/platform/log.h>
```
```diff
@@ -54,7 +55,7 @@ class StaticKVCache {
         input_ptrs_(n_caches_),
         output_ptrs_(n_caches_) {
     size_t total_cache_len =
-        std::accumulate(cache_lengths_.begin(), cache_lengths_.end(), 0);
+        std::accumulate(cache_lengths_.begin(), cache_lengths_.end(), size_t(0));
     cache_data_size_ = total_cache_len * n_heads_per_cache_ * head_dim_;
     update_data_size_ =
         n_caches_ * n_heads_per_cache_ * max_input_len_ * head_dim_;
```
```diff
@@ -867,6 +868,13 @@ class StaticAttentionIOManager {
   void set_input(executorch::runtime::Method& method, size_t idx, T* data) {
     auto methodMeta = method.method_meta();
     auto inputMeta = methodMeta.input_tensor_meta(idx);
+    ET_CHECK_MSG(
+        sizeof(T) ==
+            executorch::runtime::elementSize(inputMeta->scalar_type()),
+        "set_input: sizeof(T)=%zu but model expects element size %zu for input %zu",
+        sizeof(T),
+        executorch::runtime::elementSize(inputMeta->scalar_type()),
+        idx);
     auto impl = ::executorch::runtime::etensor::TensorImpl(
         inputMeta->scalar_type(),
         inputMeta->sizes().size(),
```
