Two freely available candidate datasets have already been identified; more suggestions are welcome:
- Mozilla Common Voice https://voice.mozilla.org/en
CC0 license
- OpenSLR resources http://openslr.org/resources.php
Each resource has its own license, ranging from "unrestricted" to "CC-BY-NC-ND 3.0"
Remark: Some of the OpenSLR data has likely been used to train various STT systems, so results on it may not always be a fair indicator of a system's performance
Open questions:
- Which ground truth materials might we use for evaluating vendors' solutions? Will we build our own dataset? Or both?
- How will we incorporate these resources into our product?