Skip to content

Benchmark test nixlbench/kvcache#10

Open
kpjeeja wants to merge 1 commit into
intel-staging:libfabricfrom
kpjeeja:jeeja_benmark_test
Open

Benchmark test nixlbench/kvcache#10
kpjeeja wants to merge 1 commit into
intel-staging:libfabricfrom
kpjeeja:jeeja_benmark_test

Conversation

@kpjeeja
Copy link
Copy Markdown

@kpjeeja kpjeeja commented Jan 20, 2026

Enable KVCahe/ Nixl Benchmark test to support Intel Gaudi

Signed-off-by: Jeeja <jeejakp@habana.ai>
@github-actions
Copy link
Copy Markdown

👋 Hi kpjeeja! Thank you for contributing to intel-staging/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀


matrix: np.ndarray
mem_type: Literal["cuda", "vram", "cpu", "dram"]
mem_type: Literal["cuda", "vram", "cpu", "dram","hpu"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we setting mem_type as hpu in any test/usecase ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure. Currently it includes both cuda/cpu, so added hpu. KVBench i can't run it as it requires model support

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like it is not used.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove it then

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test uses the mem_type == device in many places, so i will keep the hpu in mem_type similar to cuda

)

if (
not os.environ.get("HABANA_VISIBLE_MODULES")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to check for devices too as its an exported variable

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next line has a device check - self.traffic_pattern.mem_type == "hpu"

if synapse_lib.found() and hlthunk_lib.found()
synapseai_dep = declare_dependency(dependencies: [synapse_lib, hlthunk_lib])
elif hlthunk_lib.found()
# Fallback to just hl-thunk if libSynapse not available
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this fallback ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have adopted it from the nixl meason file.

openmp_dep = dependency('openmp', required: true)

# Check for etcd-cpp-api - use multiple methods for discovery
# Try pkg-config first
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

??

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

etcd it was not finding the path . so i have to add it to search in the default path

# Check for etcd-cpp-api - use multiple methods for discovery
# Try pkg-config first
etcd_dep = dependency('etcd-cpp-api', required : false)
if not etcd_dep.found()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this change ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, otherwise it was never finding the etcd lib.

if not etcd_available
error('No runtime available or not found')
endif
#if not etcd_available
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???

static synDeviceId deviceHandle;
static synStreamHandle stream;

namespace Synapseaiutils {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this file not in utils ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will need some clean. will move it as part of another JIRA

}

int
init_synapse_device() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be easy to upstream if its class based/ cpp

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will take it as a cleanup activity

utils_dep = declare_dependency(
link_with: utils_lib,
dependencies: utils_deps,
dependencies: [synapseaiutils_deps, utils_deps],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if HAVE_SYNAPSEAI is undefined ? wont this cause a problem?

nixl_worker_sources,
include_directories: inc_dir,
dependencies: nixl_worker_deps,
dependencies: [synapseaiutils_deps, nixl_worker_deps],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

getVramDescSynapseai(int devid, size_t buffer_size, uint8_t memset_value) {
void *host_addr = calloc(1, buffer_size);
memset(host_addr, memset_value, buffer_size);
auto device_buffer = Synapseaiutils::allocate_synapse_memory(buffer_size, host_addr);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err check

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it already taken care in the allocation part. i will cleanup with the directory change

Comment thread pyproject.toml

[tool.meson-python.args]
setup = ['-Dinstall_headers=false']
setup = ["-Dinstall_headers=false"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what issue is seen here ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed, not required

Copy link
Copy Markdown

@jeromean jeromean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cosmetic, err checks, code reorg needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants