Thank you for your great contribution, and congratulations on WeDetect being accepted to CVPR 2026! 🎉
I'm very impressed by WeDetect's speed compared to other open-vocabulary detectors. I'm planning to fine-tune WeDetect-Tiny for a design-domain task: recognizing artistic text, graphics, logos, etc. on flat design images (like T-shirt prints, cushion covers, and tote bags), in order to separate editable elements from backgrounds.
However, I've noticed that most open-source datasets in the design field (e.g., InfoDet, ARText, Logo datasets, LICA) are annotated in English, while WeDetect was pre-trained with Chinese annotations. This makes me curious about how well the model transfers across languages.
Could you please share some suggestions or best practices for the fine-tuning stage in this scenario? For example:
- Should I translate the English class labels into Chinese to better align with the pre-trained text encoder?
- Or can WeDetect generalize well to English prompts directly?
- Are there any recommended fine-tuning strategies (e.g., learning rate, PEFT vs. full fine-tuning) for this kind of cross-lingual adaptation?
Any advice would be greatly appreciated. Thanks again for your amazing work!