Frequently Asked Questions

Do you allow submissions that aren’t in Python?

We maintain primary support for Python given its compatibility with our pipeline. However, components written in other languages are allowed, provided that they are called through wrapper functions from the submitted ‘’. We recommend running local tests with our starter kit, available for download from CodaLab, before uploading a submission with non-Python components.
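
As an illustration of the wrapper idea above, here is a minimal sketch of a Python function that hands work to an external program through the standard library's subprocess module. The function name call_external, its parameters, and the text-over-stdin exchange are assumptions made for this example, not part of the starter kit. The Python interpreter itself stands in for a compiled non-Python binary so that the sketch runs anywhere.

```python
import subprocess
import sys


def call_external(cmd, payload, timeout_s=60.0):
    """Run an external program, send `payload` on stdin, return its stdout.

    Hypothetical helper: the name and the text-based exchange are
    assumptions for illustration only.
    """
    result = subprocess.run(
        cmd,
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if exceeded
        check=True,         # raises CalledProcessError on a non-zero exit
    )
    return result.stdout


# Stand-in for a non-Python executable: a child process that upper-cases stdin.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write(sys.stdin.read().upper())"]
echoed = call_external(child, "hello")
```

In a real submission, the command list would point at your own compiled component, and the exchange format would be whatever that component expects.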

How will you handle violations of the time constraints?

We will enforce a pre-selected time limit for each task and terminate runs that exceed it. For the purposes of computing the aggregate AUP metric, if a submission times out on a task, that task will be flagged as incomplete and artificially assigned a task score worse than that of the worst-performing submission that completed the task.
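
The penalty rule above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of None to mark a timed-out run, the higher_is_better flag, and the fixed margin are all hypothetical, and the organizers' actual penalty value and AUP computation may differ.

```python
from typing import Dict, Optional


def fill_incomplete_scores(
    task_scores: Dict[str, Optional[float]],
    higher_is_better: bool = True,
    margin: float = 1.0,  # hypothetical gap below/above the worst completed score
) -> Dict[str, float]:
    """Replace None (timed-out) entries with a score worse than the worst
    score among submissions that completed the task."""
    completed = [s for s in task_scores.values() if s is not None]
    if not completed:
        raise ValueError("no submission completed this task")
    if higher_is_better:
        penalty = min(completed) - margin
    else:
        penalty = max(completed) + margin
    return {
        name: (penalty if score is None else score)
        for name, score in task_scores.items()
    }


# Submission "c" timed out on this task, so it receives the penalty score.
scores = {"a": 0.5, "b": 0.25, "c": None}
filled = fill_incomplete_scores(scores)
```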

Can we submit methods to run on only a certain subset of the tasks?

Yes. For efficiency during development, you may want to check your performance on only a few tasks at a time instead of the entire development set. To do so, include a .yaml file in your submission zip archive that specifies the chosen subset. See the starter kit for detailed instructions. The tasks that are left out will be considered incomplete and scored as such (see above).

How can we specify the preferred submission to be promoted to the evaluation phase?

By default, we automatically select your most recent submission to be evaluated for the final competition rankings. CodaLab maintains a submission history with logs, scores, errors, etc., where participants can easily view and download their old submissions. If an earlier submission is your preferred solution, please resubmit it so that it becomes your most recent one.

How much customization is allowed for our methods?

At their core, your methods should be able to handle the axes of variation described for the development and evaluation tasks. Hybrid methods that use an ‘if-else’ statement to choose between different approaches depending on, for example, the task’s type or dimensional characteristics are also acceptable under this guideline. Ultimately, the use of separate, unseen tasks for the final evaluation naturally discourages and penalizes overfitting to the development tasks.
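
A hybrid method of the kind described above might look like the following minimal sketch. The function name, the task-type strings, the shape convention (channels first), and the returned approach labels are all hypothetical and chosen only for illustration. The starter kit's submission skeleton defines the real interface.

```python
from typing import Sequence


def choose_approach(task_type: str, input_shape: Sequence[int]) -> str:
    """Pick an approach from general task characteristics only.

    Assumes `input_shape` is (channels, *spatial_dims); this layout is an
    assumption for the example, not a documented convention.
    """
    n_spatial_dims = len(input_shape) - 1
    if task_type == "classification" and n_spatial_dims == 2:
        return "2d_conv_classifier"
    elif n_spatial_dims == 1:
        return "1d_sequence_model"
    else:
        # Generic fallback so that unseen evaluation tasks are still handled.
        return "generic_fallback"
```

The branches here depend on the task's type and dimensionality rather than on the identity of a specific development task, which keeps the method applicable to the unseen evaluation tasks.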

For details on how to structure the code of your methods, we recommend looking at the submission skeleton and open-source baseline methods provided in the starter kit.