This repository sets up an environment for analyzing the usage of assertions in open source projects. The analysis is done using the Piglet source code query engine.
The aggregated results of the study are available in the OpenDocument
spreadsheet named Results.ods. The experiments can also be re-run
independently by following the instructions below.
- A Java runtime environment (version 17 or later)
- Apache Ant
- Git, to clone the repositories considered in this analysis
- A high-performance computer, as the analysis takes considerable time
- Clone the project in the folder of your choice.
- At the command line, type `ant setup`. This will download the code analysis tool and all the public repositories on which it was applied.
- At the command line, type `ant analyze -Dproject=xxx`, where `xxx` is the name of one of the project profiles included in the repository (e.g. `guava`).
- A summary of the findings is printed directly to stdout, and a more detailed report is written to an HTML file with the same name as the project. This file can be opened in a browser to examine the exact locations where the patterns have been detected.
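To analyze all projects in one go, the per-project invocation above can be scripted. The following sketch (not part of the repository) enumerates the profile files and builds one `ant analyze` command per project; the helper name and `dry_run` flag are illustrative assumptions:

```python
# Hypothetical batch driver (not part of this repository): run
# `ant analyze -Dproject=xxx` for every profile found in the Profiles folder.
import subprocess
from pathlib import Path

def analyze_all(profiles_dir="Profiles", dry_run=False):
    """Build (and optionally run) one `ant analyze` command per profile."""
    commands = []
    for profile in sorted(Path(profiles_dir).glob("*.profile")):
        cmd = ["ant", "analyze", f"-Dproject={profile.stem}"]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # requires Ant on the PATH
    return commands
```

With `dry_run=True` the function only returns the command lines it would execute, which is convenient for checking the profile list before launching a long batch.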
- Type `java -jar lib/piglet-y.y.jar xxx.profile`, where `xxx` is the name of one of the project profiles included in the repository (profiles are stored in the `Profiles` subfolder) and `y.y` is the version number.
- Step 2 is as above.
A report in the form of an HTML file is produced, normally saved in the
`Reports` folder and bearing the same name as the project being analyzed.
It can be opened locally in a web browser for a detailed analysis of the
tokens found and the source code they correspond to. A text browser such as
Lynx is recommended for viewing results, especially for projects yielding a
large number of tokens, as it is much faster than GUI browsers.
The results of an analysis are also stored in machine-readable
JSON files, which serialize the tokens collected by each finder. These
files serve a double purpose: first, they allow further processing
of the tool’s results by automated means (such as auxiliary user-
defined scripts). Second, they are used by Piglet as a cache: when a
finder is asked to analyze a project for which a corresponding JSON
file exists, analysis is skipped and the finder merely deserializes
the previously computed results. These files, when they exist, are stored
in the folder `.cache/xxx`, where `xxx` is the corresponding
project name. There is one JSON file per token finder.
NOTE: the Piglet tool uses multiple threads and sets a timeout for some of its operations. This may cause slight variations in the total count of some patterns, depending on the speed of the host machine and the exact order in which tasks are executed by the thread manager.
The SPARQL queries evaluated by the engine are located in the `Patterns` folder.
- Apache Hadoop (1.9M LOC)
- Bootique (25K LOC)
- ElasticSearch (3.7M LOC)
- Google Guava (30K LOC)
- GraalVM (1.8M LOC)
- IntelliJ (5.0M LOC)
- JabRef (222K LOC)
- Jenkins (199K LOC)
- JMars (202K LOC)
- JSR 166 (291K LOC)
- LibreOffice (255K LOC, Java only)
- MidPoint (943K LOC)
- Neo4j (990K LOC)
- Synthia (12K LOC)
- TeXtidote (7K LOC)
- Thunderbird (42K LOC)
- Ziggy (82K LOC)
The `debug` project is used only to test each of the SPARQL queries.
The hash of the latest commit in each downloaded repository can be displayed by typing `ant -S hashes`.