Cross-linguistic Analysis of Euphemism Usage and Grammatical Structure in Turkish, English, and Spanish
The goal of our project is to identify the parts of speech of certain euphemisms in Turkish, English, and Spanish. This can give insight into both the grammatical structure of each language and cultural influences, such as the average politeness of a sentence.
The scripts and commands we plan to use:
- Writing a script to eliminate [PET_BOUNDARY] from the data
- Using shell tools to grep for certain words
- Using tr and sed to convert data to uppercase/lowercase if needed
- Deleting the @@@@@@@@@@ sequences, as seen in the screenshot above
- Using wc to get word counts
- Using wc -l to get line counts for the data points with a specific property
- Outputting our results to a CSV file
- Removing extraneous punctuation
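
The steps above could be chained roughly as in the sketch below. It is written in Python rather than with the shell tools named in the list, purely so that the whole pipeline fits in one self-contained example; it is not our final script. The function names, the sample sentences, and the CSV column names are all hypothetical, and we assume that [PET_BOUNDARY] and the @ runs can be dropped wherever they occur.

```python
import csv
import io
import re
import unicodedata

BOUNDARY = "[PET_BOUNDARY]"  # marker named in the plan above


def clean_line(line: str) -> str:
    """Strip the boundary marker, @ runs, case, and punctuation from one line."""
    line = line.replace(BOUNDARY, " ")   # must happen before punctuation removal
    line = re.sub(r"@{2,}", " ", line)   # runs such as @@@@@@@@@@
    line = line.lower()                  # rough equivalent of tr '[:upper:]' '[:lower:]'
    # Drop every character in a Unicode punctuation category (P*), so that
    # Spanish marks like the inverted question mark are removed as well.
    line = "".join(ch for ch in line if not unicodedata.category(ch).startswith("P"))
    return " ".join(line.split())        # collapse leftover whitespace


def word_count(lines) -> int:
    """Total number of whitespace-separated words (like wc -w)."""
    return sum(len(line.split()) for line in lines)


def count_lines_with(lines, word: str) -> int:
    """Number of lines containing `word` as a whole word (like grep -w | wc -l)."""
    return sum(1 for line in lines if word in line.split())


def write_counts_csv(handle, rows) -> None:
    """Write (term, matching_lines, total_words) rows; column names are assumptions."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["term", "matching_lines", "total_words"])
    writer.writerows(rows)


# Hypothetical sample data, not taken from our corpus.
sample = [
    "He passed away last year. [PET_BOUNDARY]",
    "@@@@@@@@@@ ¿Dónde está el baño?",
    "She PASSED on the offer.",
]
cleaned = [clean_line(line) for line in sample]

buffer = io.StringIO()  # stands in for an open CSV file
write_counts_csv(
    buffer,
    [("passed", count_lines_with(cleaned, "passed"), word_count(cleaned))],
)
print(buffer.getvalue())
```

One caveat for the Turkish data: Python's str.lower() is not locale-aware, so "I" becomes "i" rather than "ı", and "İ" becomes "i" followed by a combining dot. Case conversion for Turkish may therefore need special handling before the counts are trusted.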