Abstract:
|
The verdicts of most online programming judges are essentially binary: the submitted code is either “good enough” or not. While this policy is appropriate for competitive or recruitment platforms, it can hinder the adoption of online judges in educational settings, where it would be preferable to give better feedback to a student (or instructor) who has submitted incorrect code. An obvious option would be simply to show the user an instance where the code fails. However, that particular instance might not be very significant, and so could encourage patching the code without reflection. The approach considered in this paper is to data-mine all the past incorrect submissions by all the users of the judge, so as to extract a small subset of private test cases that may be relevant to most future users. Our solution is based on parsing the test files, building a bipartite graph, and solving a Set Cover problem by means of Integer Linear Programming. We have tested our solution on a hundred problems from Jutge.org. These experiments suggest that our approach is general, efficient, and provides high-quality results. |
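To make the Set Cover idea concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes that the outcome of every past incorrect submission on every private test case is already known, builds the bipartite relation between test cases and the submissions they reject, and then looks for a smallest set of test cases that rejects every incorrect submission. For self-containment it uses exhaustive search in place of Integer Linear Programming, which is only practical for a handful of test cases. The function names and the data layout are hypothetical.

```python
from itertools import combinations


def build_failure_graph(results):
    """Build the bipartite relation test case -> submissions that fail on it.

    `results` is an assumed layout: {submission_id: {test_id: passed_bool}}.
    """
    fails = {}
    for submission, outcomes in results.items():
        for test, passed in outcomes.items():
            if not passed:
                fails.setdefault(test, set()).add(submission)
    return fails


def minimum_test_cover(fails):
    """Return a smallest list of tests that together reject every submission.

    Exhaustive search over subsets of increasing size (exponential time);
    an ILP solver would replace this step on realistic inputs.
    """
    universe = set().union(*fails.values())
    tests = sorted(fails)
    for size in range(len(tests) + 1):
        for chosen in combinations(tests, size):
            covered = set().union(*(fails[t] for t in chosen))
            if covered == universe:
                return list(chosen)
    return tests  # unreachable: choosing all tests always covers the universe


# Toy data: four incorrect submissions, three private test cases.
results = {
    "s1": {"t1": False, "t2": True, "t3": True},
    "s2": {"t1": False, "t2": False, "t3": True},
    "s3": {"t1": True, "t2": False, "t3": False},
    "s4": {"t1": True, "t2": True, "t3": False},
}
fails = build_failure_graph(results)
cover = minimum_test_cover(fails)
print(cover)
```

In this toy instance no single test case rejects all four submissions, but two of them together do, so showing only those two would already give every past failing submission a counterexample.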