Inference slowdown issue #72

@susimonxu

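After fine-tuning the CPM-Bee 10B model with OpenDelta, inference with the LoRA weights loaded (as shown below) is about 50% slower than inference with the original model. How can this be resolved?
Is it possible to merge the LoRA weights into the original weights?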

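    import torch
    from cpm_live.models import CPMBeeTorch
    from cpm_live.tokenizers import CPMBeeTokenizer
    from opendelta import LoraModel

    tokenizer = CPMBeeTokenizer()
    model = CPMBeeTorch(config=config)
    delta_model = LoraModel(backbone_model=model, modified_modules=["project_q", "project_v"], backend="hf")  # attach LoRA to the q/v projections
    model.load_state_dict(torch.load(args.delta), strict=False)  # LoRA (delta) weights
    model.load_state_dict(torch.load(ckpt_path), strict=False)  # base checkpoint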
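
For context on the merge question: mathematically, a LoRA update can be folded into the frozen weight as W' = W + (alpha / r) * B A, after which the layer needs only a single matrix multiply per call. The following is a minimal NumPy sketch of that identity, not OpenDelta's or CPM-Bee's implementation. The names `merge_lora`, `lora_A`, `lora_B`, `alpha`, and `r` are hypothetical and chosen for illustration; the actual parameter names and scaling in a given checkpoint may differ.

```python
import numpy as np

def merge_lora(weight, lora_A, lora_B, alpha, r):
    """Fold a LoRA update into a dense weight: W + (alpha / r) * (B @ A).

    Hypothetical helper for illustration. Shapes assumed:
    weight (d_out, d_in), lora_A (r, d_in), lora_B (d_out, r).
    """
    return weight + (alpha / r) * (lora_B @ lora_A)

# Small random example (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 6, 2, 16
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))
B = rng.standard_normal((d_out, r))
x = rng.standard_normal((3, d_in))

# Unmerged forward pass: base projection plus the low-rank branch.
y_unmerged = x @ W.T + (alpha / r) * (x @ A.T @ B.T)

# Merged forward pass: one projection with the folded weight.
W_merged = merge_lora(W, A, B, alpha, r)
y_merged = x @ W_merged.T

# The two paths agree up to floating-point error.
print(np.allclose(y_unmerged, y_merged))
```

If the two outputs match, the merged weight can replace the original projection weight and the low-rank branch can be dropped at inference time. Whether this removes the observed slowdown depends on how the LoRA branch is executed in the actual model.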
