I'm preparing a web agent to participate in Webbench and have three questions:
- For READ tasks, do you judge the text answer by exact or fuzzy matching? Do you also require evidence to back up the answer?
- Some CREATE/UPDATE/DELETE tasks require login/auth. What is the best way to send you the login information, or will you supply your own set of login credentials for benchmark testing?
- For submission, do I just need to send you my GitHub repo URL?