This project includes the data and models described in the paper:
"Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training", Hila Gonen and Yoav Goldberg, arXiv:1810.11895
- Python 2.7
- DyNet 2.0
- Carmel Finite-State Toolkit
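The prerequisites above can be checked before going further. The following is only a minimal sketch, not part of the repository: `check_cmd` is a hypothetical helper, and the script assumes that the interpreter is invoked as `python`, that DyNet is importable as `dynet`, and that the Carmel binary is named `carmel` and is on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository):
# report whether a command is available on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumption: the interpreter is invoked as `python`.
check_cmd python || true
# Assumption: the Carmel binary is named `carmel`.
check_cmd carmel || true

# Print the interpreter version; this project lists Python 2.7.
python --version 2>&1 || true

# Assumption: DyNet's Python module is importable as `dynet`.
if python -c "import dynet" >/dev/null 2>&1; then
    echo "found: dynet"
else
    echo "missing: dynet"
fi
```

Any line reported as missing points to a prerequisite that still needs to be installed, or to a name that differs from the assumptions above on your system.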
As a first step, after cloning/downloading the repository, fetch all data files using the fetch_data.sh script as follows (from the codeswitching folder):
sh fetch_data.sh
This will download all data files needed for this repository into the respective directories.
For the evaluation dataset, please refer to the detailed README in the evaluation_dataset folder.
For the language model, please refer to the detailed README in the language_model folder.
If you find this project useful, please cite the paper:
@article{gonen2018,
  title = {Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training},
  author = {Gonen, Hila and Goldberg, Yoav},
  journal = {arXiv preprint arXiv:1810.11895},
  year = {2018}
}
If you have any questions or suggestions, please contact Hila Gonen.
This project is licensed under the Apache License; see the LICENSE file for details.