Resono is a platform for video creators who want real feedback on their videos. While a viewer watches a video, Resono captures their emotions through their webcam and generates analytics that tell the creator whether the video meets certain thresholds and goals. Additionally, the speakers in the video are diarized (labeled), and their speech is transcribed and summarized. The transcript is also used to generate subtitles.
Resono is meant to be a companion to another project, Shortify.
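
As a rough illustration of the threshold check described above, the sketch below turns per-frame emotion labels into shares and compares one of them against a goal. The function names, the label strings, and the 0.4 threshold are all hypothetical and are not taken from the Resono codebase.

```python
from collections import Counter


def emotion_shares(labels):
    """Return the fraction of frames assigned to each emotion label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: n / total for emotion, n in counts.items()}


def meets_goal(labels, target_emotion, min_share):
    """Check whether target_emotion covers at least min_share of the frames."""
    return emotion_shares(labels).get(target_emotion, 0.0) >= min_share


# Hypothetical per-frame predictions from an emotion classifier.
frames = ["happy", "neutral", "happy", "surprise", "happy", "neutral"]
shares = emotion_shares(frames)
print(shares)
print(meets_goal(frames, "happy", 0.4))
```

A real pipeline would likely weight frames by prediction confidence or by timestamp; this version only counts labels.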
The project combines Python, Deep Learning, and Java.
- We're using JavaFX for the frontend.
- The Facial Emotion Analysis model is our own custom-built CNN model.
- The diarization model is PyAnnote.
- The transcription model is OpenAI's Whisper model.
- After cloning the repo, inside NLP/, run model.py. This is the diarization and transcription Flask server. For now, it returns an 'srt' (subtitle) file for the video in question.
- Inside Java/python_backend/, run server2.py. This is the Flask server that hosts the CNN model for emotion recognition.
- Finally, in Java/webcam-viewer/, run the project using Maven or from IntelliJ.
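
For readers unfamiliar with the 'srt' format mentioned above, here is a minimal sketch of how diarized, transcribed segments could be written out as subtitles. The segment tuple layout, the function names, and the choice to prefix each line with a speaker label are assumptions for illustration, not the project's actual code.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Build SRT text from (start, end, speaker, text) tuples.

    The tuple layout is a hypothetical stand-in for whatever the
    diarization and transcription steps actually produce.
    """
    blocks = []
    for index, (start, end, speaker, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"[{speaker}] {text}\n"
        )
    # Each cue ends with a newline, so joining with "\n" leaves a blank line between cues.
    return "\n".join(blocks)


example = [
    (0.0, 2.5, "SPEAKER_00", "Hello."),
    (2.5, 4.0, "SPEAKER_01", "Hi there."),
]
print(to_srt(example))
```

Each cue consists of a running index, a start and end time separated by `-->`, and the subtitle text, with a blank line between cues.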