Oh my. Back in early 2022, when I built this, I thought I had issues with some non-ASCII file names.
So now I added a test that puts 🤗 into a name and voilà, it worked! I thought it was all good until I ran a scan on an FTP space other than the test area.
Turns out you can set the encoding on the ftplib FTP object and these emojis and stuff work no problem!
But when you have an é in a name:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xeb in position 868: invalid continuation byte
But all good! This can be remedied by changing the encoding to latin-1. Yes? :D
Yes. But then emojis no longer work! 😫
https://stackoverflow.com/q/77089678/469322
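To make the dilemma concrete, here is a minimal sketch that needs no server. It assumes the server sends an accented name as latin-1 bytes, which is what the 0xeb in the traceback suggests. The helper names `decodes` and `encodes` are made up for illustration and are not part of our code.

```python
from ftplib import FTP

# Unconnected FTP object: ftplib uses its `encoding` attribute to encode
# commands and decode replies, including directory listings.
ftp = FTP()
ftp.encoding = "latin-1"


def decodes(raw: bytes, encoding: str) -> bool:
    """Return True if the raw bytes can be decoded with the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def encodes(name: str, encoding: str) -> bool:
    """Return True if the name can be encoded with the given encoding."""
    try:
        name.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Assumption: a server that stores names as latin-1 sends "ë" as byte 0xeb.
latin1_bytes = "tëst.txt".encode("latin-1")

accent_as_utf8 = decodes(latin1_bytes, "utf-8")      # fails like the traceback
accent_as_latin1 = decodes(latin1_bytes, "latin-1")  # works
emoji_as_utf8 = encodes("🤗.txt", "utf-8")           # works
emoji_as_latin1 = encodes("🤗.txt", "latin-1")       # fails
```

So neither single encoding covers both kinds of names, which is the whole problem.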
solution
Honestly I don't know how we would solve this completely now.
A 1st step: Less directory listing!
I already modified our mkdirs function so that it no longer lists the parent dir of each part of the path to be created to check whether that part already exists.
Now it just fires the mkd and catches ftplib.error_perm with error code 550. This code is not specifically for "already exists" but close enough. And it's faster as well!
Listing for each part of the path is rather expensive.
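Roughly, the idea looks like the sketch below. This is not our actual mkdirs, just a minimal illustration under the assumption that the server answers 550 when the directory already exists. Since 550 is a generic "action not taken" reply, a real permission problem gets swallowed here too and would only show up later, on upload.

```python
import ftplib


def mkdirs(ftp, path):
    """Create every part of `path` without listing any parent directory."""
    prefix = "/" if path.startswith("/") else ""
    parts = [part for part in path.split("/") if part]
    for depth in range(1, len(parts) + 1):
        current = prefix + "/".join(parts[:depth])
        try:
            ftp.mkd(current)
        except ftplib.error_perm as exc:
            # The exception text is the server reply, e.g. "550 File exists".
            # Treat 550 as "already there"; re-raise everything else.
            if not str(exc).startswith("550"):
                raise
```

With an `ftplib.FTP` connection this would be called as `mkdirs(ftp, "/some/nested/dir")`: one MKD per path part and no listing at all.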
So under normal circumstances we no longer do directory listing at all on the FTP! 🙌
But we can! Maybe we should drop the option entirely. But then we'd still need a unittest that verifies that update still works with weird file names.
For kicks I just created a file named tëstfilé🤗.txt to trip up ANY encoding :D
When uploading it with WinSCP it turned into tëstfilé??.txt on the server, and when copying it back, into tëstfilé%3F%3F.txt
So not even THEY have it solved!