To manually encode/decode a particular .txt file:
1. Make sure the `.txt` file is in the project directory; if it is not, copy it there
2. Run the command `python3 manual_test.py` in the project directory
3. When prompted, enter the name of the text file to encode (without quotes,
e.g. `data.txt`) and the name of the file to decode into (e.g. `output.txt`).
The decoded file will then appear in the specified location.
To run the unit tests, run the command `python3 test.py` in the root directory.
For each `filename.txt`, this produces a `filename-out.txt` file in the root directory
containing the result of encoding and then decoding `filename.txt`.
ASCII art tends to contain long runs of the same character, which is what makes the images look uniform and aesthetically pleasing, so I decided on an encoding method (a form of run-length encoding) that takes advantage of this. Condensing each run of repeated characters reduces the amount of data that has to be transported while preserving the file's content.
For example, the first line of `data.txt` contains 41 space characters followed by
two commas. My encoding replaces every character that repeats 5 or more times in a row
with a shorter string of the form `[character]♥[number of times repeated]×`. The first
line therefore becomes ` ♥41×,,` (note the leading space), which contains 7 characters as
opposed to 43. If a character (say `@`) repeats fewer than 5 times in a row, we
simply write that character out that many times rather than using `@♥1×`,
`@♥2×`, `@♥3×`, or `@♥4×`, because these would take up at least as much space as
just writing `@`, `@@`, `@@@`, or `@@@@` respectively.
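The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the names `encode_line`, `SEP`, `END`, and `MIN_RUN` are hypothetical, and the sketch assumes each line is encoded independently.

```python
from itertools import groupby

SEP = "♥"    # separates the character from its repeat count
END = "×"    # marks the end of the repeat count
MIN_RUN = 5  # runs shorter than this are written out literally


def encode_line(line: str) -> str:
    """Encode one line using the [character]♥[count]× scheme (hypothetical helper)."""
    parts = []
    # groupby yields each maximal run of identical consecutive characters.
    for char, group in groupby(line):
        run = len(list(group))
        if run >= MIN_RUN:
            parts.append(f"{char}{SEP}{run}{END}")
        else:
            parts.append(char * run)
    return "".join(parts)


# 41 spaces followed by two commas, as in the first line of data.txt.
example = encode_line(" " * 41 + ",,")
assert example == " ♥41×,,"
assert len(example) == 7
```

A run of exactly 5 is the break-even point in favor of encoding: `@♥5×` is 4 characters, one fewer than `@@@@@`.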
I chose `♥` as the separator because it's a cute non-ASCII character that should never appear
in our 100x100 `.txt` files, and I end each run with a `×` character to mark the end of the
integer that represents the number of times a character is repeated, which is always in the
range [5, 100]. With these, I am able to compress the ASCII bitmaps into a shorter string on
average. In the worst case, where no character repeats more than four times in a row, the
encoding is the same length as the input.
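Because `♥` never appears in the input and `×` closes every count, decoding can be sketched as a single pattern substitution. Again, this is an assumption-laden illustration rather than the project's implementation: `decode_line` and `RUN` are hypothetical names, and the sketch relies on `♥` never occurring in the original text.

```python
import re

# One encoded run: any single character, then ♥, the repeat count, then ×.
# re.DOTALL lets "." match every character; lines are assumed to be decoded one at a time.
RUN = re.compile(r"(.)♥(\d+)×", re.DOTALL)


def decode_line(encoded: str) -> str:
    """Expand every [character]♥[count]× run; other characters pass through unchanged."""
    return RUN.sub(lambda m: m.group(1) * int(m.group(2)), encoded)


# The encoded first line of data.txt expands back to 41 spaces and two commas.
assert decode_line(" ♥41×,,") == " " * 41 + ",,"
```

The `×` terminator is what makes a repeated digit unambiguous: in `7♥12×3`, the count is 12 and the trailing `3` is a literal character.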