Skip to content

1899 validate exercises by fetching them and checking the JSON#147

Open
philschatz wants to merge 22 commits into
mainfrom
validate-exercises
Open

1899 validate exercises by fetching them and checking the JSON#147
philschatz wants to merge 22 commits into
mainfrom
validate-exercises

Conversation

@philschatz
Copy link
Copy Markdown
Member

@philschatz philschatz commented May 13, 2022

A step in the baking process (cnx-epub) fetches the exercise and fails if certain tags that are on the exercise do not correspond to content in the book.

  • openstax/ce#1899

Problem

An exercise was added to a new module. The exercise had an element in the book that was necessary to answer the question and the rules for looking up the element are wonky:

  1. If there are no context-cnxmod:<UUID> tags on the exercise then look up the context-cnxfeature:<elementID> on the page where the exercise is linked (<-- this case silently failed)
  2. If there are context-cnxmod:<UUID> tags then loop through them and pick the first UUID that is in the book

Solution

image

TODO

  • Mock fetch
  • Add tests
  • Probably send custom error messages for each possible error case
  • Maybe use Mementos Extension-specific Configuration Directory to cache the GET requests for the exercise JSON
    • decided to use a tempdir
  • debounce validation so if the user edits the exercise link text we do not make a fetch request for every single key stroke
  • include an in-memory cache so we do not read from the filesystem on every keypress
  • verify that CM's have a way to invalidate the cache (maybe by restarting the workspace?)

Slack Context

@philschatz philschatz requested a review from a team May 13, 2022 20:52
@philschatz philschatz force-pushed the validate-exercises branch from 4f16a5e to 1547176 Compare May 13, 2022 21:24
@philschatz philschatz force-pushed the validate-exercises branch from 1547176 to 4d0d6b7 Compare May 13, 2022 22:38
@philschatz philschatz marked this pull request as ready for review May 18, 2022 05:49
Copy link
Copy Markdown
Contributor

@TylerZeroMaster TylerZeroMaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are some pretty major changes to validation. It's definitely more explicit. It also helps cut down on repetitive .filter(...).map(...) operations.

I am not very well versed in how exercises work; however, I read through the Slack thread and the cnx-epub PR you linked and, as far as I can tell, you covered everything.

@philschatz
Copy link
Copy Markdown
Member Author

philschatz commented May 21, 2022

The CLI found several errors which would show up as errors to CM's when they try to commit any of these repos:

These errors seem to be correct. Just search for MISSING EXERCISE in this AP Biology PDF

Also, we could make them Warnings which would notify CMs without preventing them from committing

@ne-newton
Copy link
Copy Markdown

I will make a card for the CMs to address the above issues.
@philschatz I think your suggestion of warning CMs, without preventing them from commiting is the way to go.

@philschatz philschatz force-pushed the validate-exercises branch from df6926f to 649ec14 Compare May 23, 2022 21:27
@philschatz
Copy link
Copy Markdown
Member Author

I found a way to concisely read/write a cache of the exercise JSON.

For now, I'll write them to a temp directory inside gitpod, so they will disappear when the workspace disappears.

Once that works for a while we can discuss committing the exercises to git.

To be programming-language-agnostic, the filenames in the cache directory are ${encodeURIComponent('/api/exercises?tag=')}.json and every language has a function to encode a URI component. That way characters like / won't mess up with the filenames.

As a proof of concept, this is a proxy server that reads/writes to the cache directory using the format that the Language Server will read/write.

'use strict';

// Usage:
// PORT=3001 TARGET_URL=https://exercises.openstax.org CACHE_DIR=./exercise-cache/ node ./proxy.js

const fs = require('fs');
const path = require('path');
const restify = require('restify')
const bent = require('bent');
const server = restify.createServer();

function getEnv(name) {
  const v = process.env[name]
  if (v) return v
  throw new Error(`Missing required environment variable ${name}`)
}

const port = Number.parseInt(getEnv('PORT'))
const target = getEnv('TARGET_URL')
const cacheDirName = getEnv('CACHE_DIR')
const enableNetwork = !!process.env['ENABLE_NETWORK']

// Always GET json and error if the response is anything except 200
const rootGet = bent(target, 'GET', 'json', 200)

const cachePath = path.resolve(cacheDirName)

function toFilename(url) {
  return path.join(cachePath, `${encodeURIComponent(url)}.json`)
}

function readCache(filename) {
  const cacheItem = JSON.parse(fs.readFileSync(filename, 'utf-8'))
  if (cacheItem.version !== 1) throw new Error('Unsupported storage format. Expected version 1')
  return cacheItem.body
}

async function updateCache(url) {
  if (!enableNetwork) throw new Error('Network requests are disabled. To enable them set the environment variable ENABLE_NETWORK=1')
  console.log('Fetching', target, url)
  const jsonBody = await rootGet(url)
  const cacheItem = {
    version: 1,
    body: jsonBody,
    // etag: 
  }
  fs.writeFileSync(toFilename(url), JSON.stringify(cacheItem, null, 2))
  return jsonBody
}

function asyncWrapper(fn) {
  return (req, res, next) => {
    return Promise.resolve(fn(req))
      .then((result) => res.send(result))
      .catch((err) => next(err))
  }
}

server.get('/.*', asyncWrapper(async (req) => {
  const url = req.url
  const filename = toFilename(url)
  let jsonBody
  if (fs.existsSync(filename)) {
    jsonBody = readCache(filename)
    // Async update the cache
    if (enableNetwork) void updateCache(url)
  } else {
    jsonBody = await updateCache(url)
  }
  return jsonBody
}))

server.listen(port, function() {
  console.log(`Proxy server started up on port ${port}`);
});

@philschatz philschatz force-pushed the validate-exercises branch from 45596cc to 87ecc9c Compare May 24, 2022 23:52
@philschatz philschatz changed the title validate exercises by fetching them and checking the JSON 1899 validate exercises by fetching them and checking the JSON May 25, 2022
So pushing a book is not affected
@philschatz
Copy link
Copy Markdown
Member Author

(copied from https://github.com/openstax/ce/issues/1899)

Closing for now but should reopen when work resumes.

About 8,000 exercises are missing cnxmod tags. Dantemss is working on getting them back. So I'll pause on this until the tags are restored.

Ryan mentioned that fixing Exercises is usually done at a different time than editing content because exercises are stored in a different service. So, having validation errors pop up in POET is not very useful and preventing [Push Content] would be very undesirable. Also, having Messages in the "Problems" area that don't get resolved will lead to 👁️ fatigue, ignoring the warnings all the time.

My plan is to:

  1. wait until the tags have been restored
  2. Open the books that use exercises to see how many errors show up
  3. If only a couple show up I'll address the remaining issues on the PR (in-mem cache, debounce fetching from exercises)

Slack Chat

@philschatz philschatz closed this May 31, 2022
@philschatz philschatz deleted the validate-exercises branch May 31, 2022 20:26
@philschatz philschatz restored the validate-exercises branch June 28, 2022 17:29
@philschatz
Copy link
Copy Markdown
Member Author

philschatz commented Jun 28, 2022

Added some fixes: @04ba826e

@philschatz philschatz reopened this Jun 28, 2022
@ne-newton
Copy link
Copy Markdown

Where does this stand now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants