Lab Assignment from the AI for Beginners Curriculum.
In this lab, we challenge you to train a Word2Vec model using the Skip-Gram technique: train a network with an embedding layer to predict words that appear near each other in the text.
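In the Skip-Gram setup, each center word is paired with the words inside a fixed window around it, and those (center, context) pairs become the training examples. A minimal pure-Python sketch of this pair generation (the `skipgram_pairs` helper and the sample sentence are illustrative, not part of any starter code):

```python
# Hypothetical helper: build Skip-Gram (center, context) training pairs.
# For each position i, every word within `window` positions of i (excluding
# the center word itself) becomes a context word for tokens[i].
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the quick brown fox".split()
print(skipgram_pairs(tokens, window=1))
# → [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'),
#    ('brown', 'quick'), ('brown', 'fox'), ('fox', 'brown')]
```

In a real run you would map words to integer indices first and feed the pairs to a network whose input embedding layer becomes the learned word vectors.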
You can use any book you like. You can find many free texts on Project Gutenberg, such as Alice's Adventures in Wonderland by Lewis Carroll. Alternatively, you can use Shakespeare's plays, which you can download by running this code:
```python
import tensorflow as tf

path_to_file = tf.keras.utils.get_file(
    'shakespeare.txt',
    'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
```

If you have time and want to explore this topic further, try the following:
- How does the embedding size affect the results?
- How do different text styles affect the results?
- Take several very different kinds of words and their synonyms, obtain their vector representations, use PCA to reduce the dimensions to 2, and plot them in 2D space. Do you see any patterns?
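For the last exercise, the 2-D projection can be done with PCA via NumPy's SVD. A sketch under the assumption that you already have one trained vector per word; the word list and the random stand-in vectors below are placeholders for your real embeddings:

```python
import numpy as np

def pca_2d(vectors):
    # Center the data, then project onto the top 2 principal axes
    # obtained from the SVD of the centered matrix.
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T

words = ["king", "queen", "apple", "orange"]          # illustrative choice
vecs = np.random.default_rng(0).normal(size=(4, 50))  # stand-in embeddings
coords = pca_2d(vecs)
print(coords.shape)  # → (4, 2): one (x, y) point per word
```

You can then plot the points with matplotlib, e.g. `plt.scatter(coords[:, 0], coords[:, 1])`, annotating each point with its word to look for clusters of related words.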
Disclaimer:
This document has been translated using the AI translation service Co-op Translator. While we strive for accuracy, please be aware that machine translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.