add correct mecab installation instructions#132
stet-stet wants to merge 1 commit into facebookresearch:main
Hi @stet-stet! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
I just noticed that #97 will install MeCab correctly and automatically. However, since we do not know why the auto-installation of MeCab was dropped, I will still leave this PR open.
TL;DR
I present this PR to prevent people from running into issues like #54 and #18.
Update: I also strongly suspect that issues such as #111 were caused by an incorrect configuration of MeCab (with the default encodings), which may cause an assertion in fastBPE.hpp (line 480) to fail, so that no output files are produced after fastBPE.
At least in my system locale, failing to set any one of these UTF-8-enabling flags (see install_external_tools.sh) led to empty outputs in the embed task, encoding errors (at $LASER/source/lib/romanize_lc.py), and much confusion. Regrettably, it is quite hard to know this before you run into the problem.
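Since the misconfiguration is hard to spot in advance, a quick check of the dictionary encoding may help. The following is only a minimal sketch and not part of this PR: the helper names are hypothetical, and it assumes that `mecab -D` (dictionary info) prints a `charset:` line and that the `mecab` binary is on your `PATH` (pass another path if yours lives elsewhere).

```python
import shutil
import subprocess
from typing import Optional


def parse_dictionary_charset(dictionary_info: str) -> Optional[str]:
    """Return the value of the first 'charset:' line, or None if there is none."""
    for line in dictionary_info.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "charset":
            return value.strip()
    return None


def is_utf8(charset: Optional[str]) -> bool:
    """Accept common spellings such as 'utf8', 'UTF-8' or 'utf_8'."""
    if charset is None:
        return False
    return charset.lower().replace("-", "").replace("_", "") == "utf8"


def check_mecab(binary: str = "mecab") -> bool:
    """Report whether the MeCab dictionary appears to be UTF-8 (hypothetical helper)."""
    path = shutil.which(binary)
    if path is None:
        print(f"{binary} not found on PATH")
        return False
    # Assumption: '-D' prints dictionary info including a 'charset:' line.
    result = subprocess.run(
        [path, "-D"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=False,
    )
    charset = parse_dictionary_charset(result.stdout)
    print(f"MeCab dictionary charset: {charset}")
    return is_utf8(charset)
```

Calling `check_mecab()` after installation returns `False` when the binary is missing or the reported charset is not UTF-8, which would be a hint to rebuild MeCab and its dictionary with the UTF-8 flags.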
Also, I changed README.md a bit, so that hopefully MeCab feels a bit more optional for people not dealing with the Japanese language.
Additional question: Why was the auto-installation of MeCab dropped?