Android Malware Detection With NLP

©️license & copyright©️:

📧 nivk99 - Niv Kotek

📧 oriazadok - Oria Zadok

❓What is Natural Language Processing (NLP)❓

Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to process and analyze large amounts of natural language data. The goal is a computer capable of "understanding" the contents of documents, including the contextual nuances of the language within them. The technology can then accurately extract information and insights contained in the documents as well as categorize and organize the documents themselves.

Challenges in natural language processing frequently involve speech recognition, natural-language understanding, and natural-language generation.

💡facts💡

The name of the paper: Malware Detection With NLP.

The feature: API calls.

The type of classifier: static.

💡 The Attack 💡

The name of the attack:

Problem-space attack.

Type of weak spots:

The number of features.

The way to find the weakness:

A. First, we opened the classifier and checked which files it works on.

B. We printed the names of the applications that the classifier used for "testing" and those it used for "training". (From this step onward we work only on the applications used for testing.)

C. We found that the features are XML files that describe the API of the applications.

D. We looked for weak spots in the features and changed the values of the features in the XML files.

E. After step D proved unsuccessful, we took several features from an application classified as "benign" and inserted them into an application classified as "malicious".

F. By rerunning the classifier after each change, a considerable number of times, we were able to narrow the search range.

G. We found that adding a number of features taken from an application classified as "benign" (more than 5000 lines) to an application classified as "malicious" lowers the accuracy of the classifier.

H. Next, we added more than 5000 lines of empty XML tags to the beginning of the features of the application classified as "malicious", and found that this also lowers the accuracy of the classifier.

I. Next, we added more than 5000 lines of empty XML tags to the end of the features of the apps classified as "malicious", and found that the classifier still classified them as malicious.

J. Next, we placed the more than 5000 lines of empty XML tags inside a comment block at the beginning of the features of the application classified as "malicious", and found that this also lowers the accuracy of the classifier.

K. Next, we added the same empty (meaningless) XML tags to other apps classified as "malicious", but this time the classifier still found them to be malicious.

L. Finally, we added various empty (meaningless) XML tags to other apps classified as "malicious", and found that the accuracy of the classifier decreased.

In conclusion: we discovered that the weak point of the classifier is the number of features. That is, if we add more than 5000 lines to the beginning of an application's features, either empty (meaningless) tags or any features from an application classified as "benign", we lower the accuracy of the classifier.
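The padding idea can be sketched in Python. This is a minimal illustration and not the project's actual code: the function names, the default `stam` tag, the example package name, and the assumption that a feature file is plain text with one tag per line are all hypothetical.

```python
def make_padding(n_lines=5001, tag="stam", name="android.nlp"):
    """Return n_lines meaningless XML-style tags, one per line (hypothetical format)."""
    line = f'<{tag} name="{name}"></{tag}>'
    return [line] * n_lines


def prepend_padding(feature_lines, padding):
    """Put the padding before the app's real features.

    The findings above report that padding at the beginning of the
    features affects the classifier, while padding at the end does not.
    """
    return list(padding) + list(feature_lines)


# Hypothetical feature lines of an app classified as malicious.
original = ['<package name="com.example.app">', "</package>"]
padded = prepend_padding(original, make_padding())
print(len(padded), "lines after padding")
```

The default of 5001 lines reflects the "more than 5000 lines" threshold reported above; the original features are left untouched at the end of the list.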

Examples of features that can be added to the application:

A. <stam name="android.nlp">

B. <package name="android.support.v4.app">

How you intend to exploit it:

A. We will add code that checks which malicious apps the classifier uses for "testing".

B. We will add code that opens (decodes) the "malicious" application with Apktool.

C. We will add to the smali folder a file whose name starts with the letter "A" (so that the features we add appear first in the XML).

D. We will add to the file that starts with the letter "A" any features from an application classified as "benign" (at least 5000 lines of code).

E. We will add code that closes (rebuilds) the "malicious" app with Apktool after the changes.

F. We will run the classifier on the application after the changes.
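The decode, inject, and rebuild steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not the repository's code: the helper names, the `AAAPadding` file name, and the `smali` subfolder layout are hypothetical, and the content of `lines` is left to the caller because the source does not specify the exact format of the added features. Only the standard `apktool d` and `apktool b` subcommands are used.

```python
import subprocess
from pathlib import Path


def decode_cmd(apk_path, out_dir):
    # "apktool d" decodes an APK into a folder; -f overwrites an existing folder.
    return ["apktool", "d", "-f", str(apk_path), "-o", str(out_dir)]


def build_cmd(app_dir, out_apk):
    # "apktool b" rebuilds an APK from a decoded folder.
    return ["apktool", "b", str(app_dir), "-o", str(out_apk)]


def write_padding_file(app_dir, lines, class_name="AAAPadding"):
    """Write a smali file whose name starts with "A" so it sorts first.

    The file name and the "smali" subfolder are assumptions for illustration.
    """
    smali_dir = Path(app_dir) / "smali"
    smali_dir.mkdir(parents=True, exist_ok=True)
    target = smali_dir / f"{class_name}.smali"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


def repackage(apk_path, work_dir, out_apk, lines):
    """Decode the APK, add the extra file, and rebuild (needs apktool on PATH)."""
    subprocess.run(decode_cmd(apk_path, work_dir), check=True)
    write_padding_file(work_dir, lines)
    subprocess.run(build_cmd(work_dir, out_apk), check=True)
```

Keeping the command construction in separate functions makes it possible to inspect or log the exact Apktool invocation without running it.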

important comments:

A. We ran a test with 18 benign apps and 18 malicious apps, of which about 4 benign and 4 malicious apps were used for testing. We ran the classifier and the accuracy was 1.0. We then opened the 4 malicious test applications with Apktool and added several features, chosen at random, from a benign application. After that, we rebuilt the applications, ran the classifier again, and got an accuracy of 0.5. (The addition was done in the same way for all 4 apps.)

B. Any tags can be added as features, as can any features from a benign application.

C. More than 5000 lines of added features are required.

D. The features must be added at the beginning of the XML.

E. The testing and the addition of features were done on randomly chosen apps (which indicates that the weak point of the classifier is the number of features).
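The accuracy figures in comment A can be reproduced with a small worked example. It assumes exactly 4 benign and 4 malicious test apps, and it assumes that after the change all 4 modified malicious apps are predicted benign while the benign apps are still classified correctly. That is one reading consistent with the reported 1.0 and 0.5; the per-app predictions themselves are not given in the source.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# 0 = benign, 1 = malicious; 4 benign and 4 malicious test apps (assumed split).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]

# Before the change: every test app is classified correctly.
pred_before = [0, 0, 0, 0, 1, 1, 1, 1]

# After the change (assumption): the 4 modified malicious apps are predicted benign.
pred_after = [0, 0, 0, 0, 0, 0, 0, 0]

print(accuracy(y_true, pred_before))  # 1.0
print(accuracy(y_true, pred_after))   # 0.5
```

With a balanced test set, an accuracy of 0.5 is what a classifier that labels every app benign would score.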

💡video💡:

💡 Add code 💡

💡Results💡

💡Droidbot💡

DroidBot is a lightweight test input generator for Android. It can send random or scripted input events to an Android app, achieve higher test coverage more quickly, and generate a UI transition graph (UTG) after testing.

We use DroidBot to verify that the attack did not damage the functionality of the application.

The test was run on 100 apps before and after the change. As the results show, the attack did not damage the functionality of the applications.
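A before-and-after check of this kind could be scripted along the following lines. This is only a sketch: the helper names and the output folder layout are hypothetical, and it assumes DroidBot is installed and invoked with its `-a` (APK path) and `-o` (output folder) options. The project's own test procedure may differ.

```python
def droidbot_cmd(apk_path, output_dir):
    # -a: the APK under test, -o: folder where DroidBot writes its output.
    return ["droidbot", "-a", str(apk_path), "-o", str(output_dir)]


def before_after_cmds(original_apk, modified_apk, out_root="droidbot_out"):
    """Build one DroidBot command for the original app and one for the modified app."""
    return {
        "before": droidbot_cmd(original_apk, f"{out_root}/before"),
        "after": droidbot_cmd(modified_apk, f"{out_root}/after"),
    }


cmds = before_after_cmds("app.apk", "app_modified.apk")
for label, cmd in cmds.items():
    print(label, " ".join(cmd))
```

Each command can be passed to `subprocess.run`, and the two output folders then compared to see whether the modified app still behaves like the original.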

Droidbot results can be seen here

💡The Android Malware Detection With NLP Wiki! 💡

For more explanation

🔗Links🔗:

An Android malware detection system implemented in Python using the NLP technique of document vectors, based on the work of Tomas Mikolov and Quoc Le. Their paper can be found at: https://arxiv.org/abs/1405.4053

Our article can be found at the link here

✨To clone this project:✨

https://github.com/nivk99/androidMalwareDetectionWithNLP.git
