[BUG] Demo code raises "no output schema" exception #1105

@breadbread1984

Bug description

The demo code in the section "Extract and save Item embeddings" raises a ValueError: no output schema exception.

Steps/Code to reproduce bug

Run the demo as written; the error occurs in the section "Extract and save Item embeddings".

Expected behavior

Environment details

  • Merlin version:
  • merlin 0.0.1
  • merlin-core 0+untagged.1.g6d396aa
  • merlin-dataloader 0+untagged.1.g1441a12
  • merlin-hps 1.0.0
  • merlin-models 0+untagged.1.geb1e541
  • merlin-sok 2.0.0
  • merlin-systems 0+untagged.1.ga19d311
  • Platform: Linux 12ce9556ef42 5.4.0-200-generic
  • Python version: 3.10.12
  • PyTorch version (GPU?): N/A
  • Tensorflow version (GPU?): 2.12.0+nv23.6
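For anyone trying to compare their setup against the versions above, a minimal sketch for collecting them with the standard library follows. It assumes the names in the list are the installed distribution names; `nvtabular` and `tensorflow` are added here only because they appear elsewhere in this report. `collect_versions` and `PACKAGES` are illustrative names, not part of Merlin.

```python
from importlib import metadata

# Distribution names assumed from the environment list in this report;
# adjust them if your installation uses different names.
PACKAGES = [
    "merlin",
    "merlin-core",
    "merlin-dataloader",
    "merlin-hps",
    "merlin-models",
    "merlin-sok",
    "merlin-systems",
    "nvtabular",
    "tensorflow",
]


def collect_versions(names):
    """Map each distribution name to its version, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(PACKAGES).items():
        print(f"{name} {version}")
```

Running the script inside the same container or environment that produced the error prints one line per package, in the same layout as the list above.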

Additional context

  File "/root/raid/common_models/recommend_system/merlin/aliccp/extract_item_feature.py", line 56, in main
    item_embeddings = workflow.fit_transform(Dataset(item_features)).to_ddf().compute()
  File "/usr/local/lib/python3.10/dist-packages/nvtabular/workflow/workflow.py", line 236, in fit_transform
    self.fit(dataset)
  File "/usr/local/lib/python3.10/dist-packages/nvtabular/workflow/workflow.py", line 213, in fit
    self.executor.fit(dataset, self.graph)
  File "/usr/local/lib/python3.10/dist-packages/merlin/dag/executors.py", line 501, in fit
    ).sample_dtypes()
  File "/usr/local/lib/python3.10/dist-packages/merlin/io/dataset.py", line 1169, in sample_dtypes
    _real_meta = self.engine.sample_data(n=n)
  File "/usr/local/lib/python3.10/dist-packages/merlin/io/dataset_engine.py", line 64, in sample_data
    _head = _ddf.partitions[partition_index].head(n)
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/core.py", line 1268, in head
    return self._head(n=n, npartitions=npartitions, compute=compute, safe=safe)
  File "/usr/local/lib/python3.10/dist-packages/dask/dataframe/core.py", line 1302, in _head
    result = result.compute()
  File "/usr/local/lib/python3.10/dist-packages/dask/base.py", line 314, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/dask/base.py", line 599, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/dask/threaded.py", line 89, in get
    results = get_async(
  File "/usr/local/lib/python3.10/dist-packages/dask/local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "/usr/local/lib/python3.10/dist-packages/dask/local.py", line 319, in reraise
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/usr/local/lib/python3.10/dist-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/usr/local/lib/python3.10/dist-packages/dask/optimization.py", line 990, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/usr/local/lib/python3.10/dist-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/usr/local/lib/python3.10/dist-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/usr/local/lib/python3.10/dist-packages/dask/utils.py", line 72, in apply
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/merlin/dag/executors.py", line 103, in transform
    transformed_data = self._execute_node(node, transformable, capture_dtypes, strict)
  File "/usr/local/lib/python3.10/dist-packages/merlin/dag/executors.py", line 117, in _execute_node
    upstream_outputs = self._run_upstream_transforms(
  File "/usr/local/lib/python3.10/dist-packages/merlin/dag/executors.py", line 135, in _run_upstream_transforms
    node_output = self._execute_node(
  File "/usr/local/lib/python3.10/dist-packages/merlin/dag/executors.py", line 117, in _execute_node
    upstream_outputs = self._run_upstream_transforms(
  File "/usr/local/lib/python3.10/dist-packages/merlin/dag/executors.py", line 135, in _run_upstream_transforms
    node_output = self._execute_node(
  File "/usr/local/lib/python3.10/dist-packages/merlin/dag/executors.py", line 125, in _execute_node
    transform_output = self._run_node_transform(node, transform_input, capture_dtypes, strict)
  File "/usr/local/lib/python3.10/dist-packages/merlin/dag/executors.py", line 255, in _run_node_transform
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/merlin/dag/executors.py", line 242, in _run_node_transform
    transformed_data = node.op.transform(selection, input_data)
  File "/usr/local/lib/python3.10/dist-packages/merlin/systems/dag/ops/workflow.py", line 107, in transform
    output = self.workflow._transform_df(transformable)
  File "/usr/local/lib/python3.10/dist-packages/nvtabular/workflow/workflow.py", line 256, in _transform_df
    raise ValueError("no output schema")
ValueError: no output schema
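Most of the frames above belong to dask's scheduler. As a reading aid only, here is a small sketch that keeps just the frames whose path contains a `merlin` or `nvtabular` directory. `frames_from` and `FRAME_RE` are hypothetical helper names, and the regular expression assumes the standard CPython frame format shown above.

```python
import re

# Matches lines such as:  File "/path/to/file.py", line 12, in func
FRAME_RE = re.compile(
    r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)'
)


def frames_from(trace_text, packages=("merlin", "nvtabular")):
    """Return (path, line, function) for frames under any of `packages`.

    A frame is kept when one of the package names appears as a directory
    component of its path, so a user script stored under a directory named
    'merlin' is kept as well.
    """
    frames = []
    for raw in trace_text.splitlines():
        match = FRAME_RE.match(raw)
        if match is None:
            continue  # source lines and the final exception line are skipped
        parts = match["path"].split("/")
        if any(pkg in parts for pkg in packages):
            frames.append((match["path"], int(match["line"]), match["func"]))
    return frames


if __name__ == "__main__":
    sample = '''
  File "/usr/local/lib/python3.10/dist-packages/dask/utils.py", line 72, in apply
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nvtabular/workflow/workflow.py", line 256, in _transform_df
    raise ValueError("no output schema")
'''
    for path, line, func in frames_from(sample):
        print(f"{path}:{line} in {func}")
```

Applied to the full traceback, this leaves the path from `fit_transform` through `merlin/dag/executors.py` and `merlin/systems/dag/ops/workflow.py` down to `_transform_df`, which is where the exception is raised.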

Labels: bug (Something isn't working)