This might be sketchy, since we can't be fully certain that lenient_parse can handle the string form of every value that Python has already parsed into a number and then converted back to a string, but I think those strings will always be in a safe form. This would let us remove the expected_python_output field from all valid tests, as we'd parse the string to an Int or Float and compare that against our expected_program_output.
The sketchy bit is that we could do all the work, have every test pass, and then hit one case in the future where Python returns a string that lenient_parse fails to parse back into a number.
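One way to reduce that risk would be a small round-trip check over the string forms Python's `str()` actually emits (plain integers, negative zero, long decimals, exponent notation like `1e+16` and `1e-05`, `inf`, `nan`). Here's a rough sketch of what that could look like. The names `SAMPLES`, `round_trips` and `python_parse` are made up for illustration, and `python_parse` is only a stand-in: in the real suite the parser under test would be lenient_parse, which isn't called here.

```python
import math

# Values chosen to exercise the different string forms str() can produce.
SAMPLES = [0, -7, 10**20, 1.5, -0.0, 0.1 + 0.2, 1e16, 1e-5,
           float("inf"), float("nan")]


def round_trips(value, parse):
    """Return True if parse(str(value)) recovers the value and its type."""
    parsed = parse(str(value))
    if isinstance(value, float) and math.isnan(value):
        # NaN never compares equal to itself, so check it explicitly.
        return isinstance(parsed, float) and math.isnan(parsed)
    return type(parsed) is type(value) and parsed == value


def python_parse(text):
    # Hypothetical stand-in for the parser under test: try int, then float.
    try:
        return int(text)
    except ValueError:
        return float(text)


# Collect the string forms that fail to round-trip through the parser.
failures = [str(v) for v in SAMPLES if not round_trips(v, python_parse)]
print(failures)
```

Any string that lands in `failures` when the real parser is swapped in is a form we'd have to special-case before dropping expected_python_output.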