Hi Kalebu, the script is very useful. I have several thousand files, some of which are duplicates, but the script exits with an error when it encounters a file whose name is not UTF-8 encoded.
I am running this on an Ubuntu MATE 18.04.5 LTS (Bionic Beaver) computer.
I renamed the script to remove-duplicate-files.py3, and I am calling it like so:
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
S=$(date) ; python3 ./remove-duplicate-files.py3 ; E=$(date) ; echo -e "start = $S ..... \n end = $E"
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is the output I get (I have re-run it, so the earlier duplicates have already been cleaned):
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
**************** DUPLYTHON ****************************
---------------- WELCOME ----------------------------
---------------- WELCOME ----------------------------
Cleaning .................
Traceback (most recent call last):
  File "./remove-duplicate-files.py3", line 69, in <module>
    App.main()
  File "./remove-duplicate-files.py3", line 65, in main
    self.welcome();self.clean();self.cleaning_summary()
  File "./remove-duplicate-files.py3", line 53, in clean
    print(raw_string, '.. cleaned ')
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-5: surrogates not allowed
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Can you suggest a change to the script so it does not fail on a filename that contains non-UTF-8 characters?
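The traceback shows the failure is in the `print` call, not in the hashing or deletion. On Linux, Python decodes filename bytes that are not valid UTF-8 with the `surrogateescape` error handler (PEP 383), so the name comes back as a `str` containing lone surrogates. `open()` and `os.remove()` accept such names and map them back to the original bytes, but printing them to a UTF-8 terminal raises `UnicodeEncodeError`. A minimal sketch of one fix, assuming the filesystem encoding is UTF-8 (the default on Ubuntu); `printable_name` is a hypothetical helper, not part of the script:

```python
import os


def printable_name(name: str) -> str:
    """Return a form of a filename that is always safe to print.

    os.fsencode() reverses the surrogateescape decoding and gives back the
    original bytes; decoding them with 'backslashreplace' keeps valid UTF-8
    unchanged and shows each invalid byte as an escape such as \\xe9.
    """
    raw = os.fsencode(name)
    return raw.decode("utf-8", errors="backslashreplace")


if __name__ == "__main__":
    # Byte 0xE9 (Latin-1 'e' with acute accent) is not valid UTF-8 on its own.
    bad = os.fsdecode(b"caf\xe9.txt")
    print(printable_name(bad), ".. cleaned ")
```

Line 53 of the script could then become `print(printable_name(raw_string), '.. cleaned ')`. As a workaround without editing the code, the standard `PYTHONIOENCODING` environment variable also accepts an error handler, e.g. `PYTHONIOENCODING=utf-8:backslashreplace python3 ./remove-duplicate-files.py3`, which makes `print` escape the surrogates instead of failing.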
And can it be made to print the name of the file it exited on?
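To report which file the script stopped on, the per-file work could be wrapped in `try`/`except` so the name is printed before the exception is re-raised. This is only a sketch: `process_all` and the `handle` callback are hypothetical stand-ins for whatever loop the script actually uses. It prints the name with the built-in `ascii()`, which escapes every non-ASCII character, so the report itself cannot trigger another `UnicodeEncodeError`:

```python
import sys
from typing import Callable, Iterable


def process_all(paths: Iterable[str], handle: Callable[[str], None]) -> None:
    """Call handle(path) for each path, naming the path if handle fails."""
    for path in paths:
        try:
            handle(path)
        except Exception:
            # ascii() turns lone surrogates (and any other non-ASCII
            # character) into \u or \x escapes, so this line is safe to
            # print on any terminal.
            print("Stopped on file:", ascii(path), file=sys.stderr)
            raise  # keep the original traceback
```

Re-raising preserves the original traceback; replacing `raise` with `continue` would instead skip the problem file and carry on with the rest.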
Also, it would be useful if the script could be placed in a different directory from the one I want to clean and would ask me for the name of the directory to clean.
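For running the script from another directory, one option is to take the target directory as a command-line argument, fall back to prompting for it with `input()`, and walk it with `os.walk()` using full paths rather than changing into each directory. A minimal sketch; `ask_directory` and `iter_files` are hypothetical names, and the duplicate detection itself is left out:

```python
import os
import sys


def ask_directory() -> str:
    """Return the directory to clean, from argv[1] or an interactive prompt."""
    if len(sys.argv) > 1:
        target = sys.argv[1]
    else:
        target = input("Directory to clean: ").strip()
    # Expand "~" (typed at the prompt) and make the path absolute.
    target = os.path.abspath(os.path.expanduser(target))
    if not os.path.isdir(target):
        sys.exit("Not a directory: " + target)
    return target


def iter_files(top: str):
    """Yield the full path of every regular file under top, recursively."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip broken symlinks, sockets, FIFOs and similar entries.
            if os.path.isfile(path):
                yield path
```

Usage would then be, for example, `python3 ./remove-duplicate-files.py3 ~/Documents`, or the script with no argument to be prompted. Because every yielded path starts from an absolute directory, the script no longer depends on the current working directory.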
Thanks,
bradw2002