Writing an IFDS Analysis
To solve a data-flow analysis using IFDS, you have to create an instance of the IFDSTabulationProblem interface.
To do so, create a new class that inherits from IFDSTabulationProblem and implements all its pure-virtual functions.
A guide on how to implement these functions follows below.
First, though, we have to supply an AnalysisDomain as type parameter to the IFDSTabulationProblem template.
The AnalysisDomain is a struct that aggregates a number of type aliases that are used by the interface, including the instruction-type n_t and the data-flow fact type d_t.
In most cases, it is a good idea to use the pre-defined LLVMIFDSAnalysisDomainDefault, which defines the type aliases based on the most common uses on LLVM IR (n_t = const llvm::Instruction * and d_t = const llvm::Value *).
If you need to customize some of these types, just create a new struct inheriting from LLVMIFDSAnalysisDomainDefault that shadows the customized types.
So, for example, to customize d_t:
```cpp
struct MyAnalysisDomain : psr::LLVMIFDSAnalysisDomainDefault {
  using d_t = MyDataflowFact;
};
class MyIFDSAnalysis : public psr::IFDSTabulationProblem<MyAnalysisDomain> {
  ...
};
```

In IFDS, data-flow facts are propagated through the program in a point-wise manner.
That is, the IFDSSolver repeatedly invokes the flow-functions per instruction for each incoming data-flow fact in isolation.
Thus, a FlowFunction in PhASAR has the signature d_t -> set<d_t>, computing the data-flow facts that should hold after the current statement if the analysis has already inferred that the incoming "source" data-flow fact holds before the current statement.
Unconditionally generating data-flow facts is done by artificially generating them from the special zero fact.
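
To make the d_t -> set<d_t> shape concrete, here is a minimal, self-contained sketch that does not use PhASAR at all. The names Fact, FlowFn, Zero, makeAssignFlow and makeGenFromZeroFlow are hypothetical stand-ins invented for this illustration; the real FlowFunction class and its helpers live in PhASAR's headers and look different.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins for d_t and for a flow function (not PhASAR types).
using Fact = std::string;
using FlowFn = std::function<std::set<Fact>(const Fact &)>;

// Stand-in for the special zero fact that is assumed to hold everywhere.
const Fact Zero = "<zero>";

// Flow function for an assignment `To = From`, applied to ONE incoming fact:
//  - if From holds before the statement, From and To hold afterwards
//  - To is overwritten, so an incoming To is killed
//  - every other fact (including zero) is passed through unchanged
FlowFn makeAssignFlow(Fact From, Fact To) {
  return [From = std::move(From),
          To = std::move(To)](const Fact &Source) -> std::set<Fact> {
    if (Source == From) {
      return {From, To};
    }
    if (Source == To) {
      return {};
    }
    return {Source};
  };
}

// Flow function that unconditionally generates Gen: the new fact is only
// produced when the incoming fact is the zero fact.
FlowFn makeGenFromZeroFlow(Fact Gen) {
  assert(Gen != Zero && "the zero fact itself must not be generated");
  return [Gen = std::move(Gen)](const Fact &Source) -> std::set<Fact> {
    if (Source == Zero) {
      return {Zero, Gen};
    }
    return {Source};
  };
}
```

Because each call sees exactly one incoming fact, the only way for such a function to produce a fact "out of nothing" is to react to the zero fact, which is why the zero fact is kept alive in the result.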
For the IFDSSolver to know which flow function should be invoked at a certain instruction, we have to override the so-called flow-function factories of the IFDSTabulationProblem.
For each reached instruction exactly one flow-function factory is called exactly once (except for the summary flow-function factory -- see below).
The resulting flow function is cached within the solver.
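
The interplay between factory and cache can be pictured with the following toy model. It is only a sketch under simplifying assumptions: ToyProblem, ToySolver and the integer instruction type are hypothetical, and the real IFDSSolver caches flow functions differently and handles far more cases.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <set>
#include <string>

using Inst = int;         // hypothetical stand-in for n_t
using Fact = std::string; // hypothetical stand-in for d_t
using FlowFn = std::function<std::set<Fact>(const Fact &)>;
using FlowFnPtr = std::shared_ptr<FlowFn>;

// Toy problem with one factory that counts how often it is invoked.
struct ToyProblem {
  int FactoryCalls = 0;

  FlowFnPtr getNormalFlowFunction(Inst /*Curr*/) {
    ++FactoryCalls;
    // Identity flow: every incoming fact survives the instruction.
    return std::make_shared<FlowFn>(
        [](const Fact &Source) { return std::set<Fact>{Source}; });
  }
};

// Toy solver: asks the factory once per instruction and reuses the cached
// flow function for every further fact that reaches the same instruction.
class ToySolver {
public:
  explicit ToySolver(ToyProblem &P) : Problem(P) {}

  std::set<Fact> propagate(Inst Curr, const Fact &Source) {
    auto It = Cache.find(Curr);
    if (It == Cache.end()) {
      It = Cache.emplace(Curr, Problem.getNormalFlowFunction(Curr)).first;
    }
    assert(It->second && "the factory must not return a null flow function");
    return (*It->second)(Source);
  }

private:
  ToyProblem &Problem;
  std::map<Inst, FlowFnPtr> Cache;
};
```

A practical consequence of this design is that a factory should put per-instruction work (inspecting operands, looking up types) into the factory body, and keep the returned flow function cheap, since only the latter runs once per incoming fact.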
There are the following flow-function factories:
- getNormalFlowFunction:
  - Intra-procedural data flow. Gets invoked for each instruction that is neither a function call nor a return/resume.
  - Use the flow-function templates from FlowFunctions.h to simplify the process. If none of the pre-defined helper functions fits your needs, you may want to use lambdaFlow and provide a C++ lambda expression as flow function.
- getCallToRetFlowFunction:
  - Intra-procedural data flow at function calls. Gets invoked for each function call instruction and models all data flows that are not affected by the call.
  - Consider using the helper function mapFactsAlongsideCallSite
- getCallFlowFunction:
  - Inter-procedural data flow at function calls. Gets invoked for each callsite-callee pair and is responsible for mapping arguments at the callsite to the corresponding parameters within the callee. Additionally, globals are mapped to the callee context.
  - Consider using the helper function mapFactsToCallee
- getRetFlowFunction:
  - Inter-procedural data flow at function returns. Gets invoked for each ret and resume instruction and is responsible for mapping the return value to the callsite and optionally mapping back parameters to their corresponding arguments at the callsite.
  - Consider using the helper function mapFactsToCaller
- getSummaryFlowFunction [optional]:
  - For special functions where you want to customize their effects on the analysis. Prevents the function's body from being analyzed by the IFDS analysis.
  - If this function returns non-nullptr, the getCallFlowFunction function will not be called.
  - Useful for modeling the effects of compiler intrinsics or standard-library functions
  - Defaults to unconditionally returning nullptr
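
The relationship between getCallFlowFunction and getSummaryFlowFunction can be sketched with a small toy model. Everything here is hypothetical (ToyFunction, ToyCall, ToyProblem, selectFlowAtCall and the special function named "copy"); it does not use the PhASAR helpers such as mapFactsToCallee and only mirrors the rule stated above that a non-null summary suppresses the call flow function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <set>
#include <string>
#include <vector>

using Fact = std::string; // hypothetical stand-in for d_t
using FlowFn = std::function<std::set<Fact>(const Fact &)>;
using FlowFnPtr = std::shared_ptr<FlowFn>;

struct ToyFunction {
  std::string Name;
  std::vector<Fact> Params; // formal parameters
};
struct ToyCall {
  std::vector<Fact> Args; // actual arguments at the call site
  const ToyFunction *Callee;
};

struct ToyProblem {
  int CallFactoryCalls = 0;

  // Summary for a hypothetical function copy(dst, src): if src holds, dst
  // holds as well. All other callees have no summary (nullptr).
  FlowFnPtr getSummaryFlowFunction(const ToyCall &Call) {
    if (Call.Callee->Name != "copy") {
      return nullptr;
    }
    assert(Call.Args.size() == 2 && "copy(dst, src) takes two arguments");
    Fact Dst = Call.Args[0];
    Fact Src = Call.Args[1];
    return std::make_shared<FlowFn>(
        [Dst, Src](const Fact &Source) -> std::set<Fact> {
          if (Source == Src) {
            return {Src, Dst};
          }
          return {Source};
        });
  }

  // Maps each actual argument to the formal parameter at the same index.
  // Facts that are not passed as an argument do not enter the callee.
  FlowFnPtr getCallFlowFunction(const ToyCall &Call) {
    ++CallFactoryCalls;
    std::vector<Fact> Args = Call.Args;
    std::vector<Fact> Params = Call.Callee->Params;
    return std::make_shared<FlowFn>([Args, Params](const Fact &Source) {
      std::set<Fact> Out;
      for (std::size_t I = 0; I < Args.size() && I < Params.size(); ++I) {
        if (Args[I] == Source) {
          Out.insert(Params[I]);
        }
      }
      return Out;
    });
  }
};

// A non-null summary wins; only otherwise is the call flow function built.
FlowFnPtr selectFlowAtCall(ToyProblem &P, const ToyCall &Call) {
  if (FlowFnPtr Summary = P.getSummaryFlowFunction(Call)) {
    return Summary;
  }
  return P.getCallFlowFunction(Call);
}
```

In this sketch a call to an ordinary function maps the fact for the second argument to the second parameter, while a call to copy never reaches getCallFlowFunction at all.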
In addition to the flow-function factories, you have to implement a number of miscellaneous functions:
- ctor: To be able to create an instance of your IFDS analysis class, you have to provide a constructor. As the base class IFDSTabulationProblem does not provide a default constructor, you have to explicitly initialize the base as well. The base requires the following arguments:
  - IRDB: An instance of LLVMProjectIRDB that owns the LLVM IR to analyze
  - EntryPoints: A vector of (mangled) function names that identify the functions within the IRDB where the analysis should start
  - ZeroValue [optional]: A dummy data-flow fact that is assumed to hold at each instruction unconditionally. Usually LLVMZeroValue::getInstance().
    Important: If not provided as ctor argument, you have to explicitly initialize the inherited ZeroValue data member to a non-nullopt value in your ctor.
- psr::NToString(n_t) [optional]: Customizes how an instruction (n_t) should be printed; for LLVM values it makes use of llvmIRToString. For custom types it defaults to .str(), to_string(...) or operator<<, in decreasing priority.
- psr::DToString(d_t) [optional]: Customizes how a data-flow fact (d_t) should be printed; for LLVM values it makes use of llvmIRToString or llvmIRToShortString. For custom types it defaults to .str(), to_string(...) or operator<<, in decreasing priority.
- psr::FToString(f_t) [optional]: Customizes how a function (f_t) should be printed; for LLVM functions it only prints the name. For custom types it defaults to .str(), to_string(...) or operator<<, in decreasing priority.
- isZeroValue [optional]: Checks whether the given data-flow fact is the special zero value. Defaults to == comparison with getZeroValue().
- initialSeeds: Creates a map from instruction (n_t) to data-flow fact (d_t) that defines which facts should hold unconditionally at the points where the analysis should start. Should be based on the EntryPoints ctor parameter. You may want to use the utilities addSeedsForStartingPoints or forallStartingPoints for this.
- emitTextReport [optional]: Customizes how the analysis results should be printed to the terminal.
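
As an illustration of what initialSeeds conceptually computes, the following sketch seeds the zero fact at the first instruction of every entry point. It is a simplified model, not PhASAR code: ToyProgram, Seeds and the free function initialSeeds are hypothetical, and the choice to seed only the zero fact at the first instruction is an assumption made for this example. In a real analysis the utilities addSeedsForStartingPoints or forallStartingPoints mentioned above are the intended route.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Inst = std::string; // hypothetical stand-in for n_t
using Fact = std::string; // hypothetical stand-in for d_t
using Seeds = std::map<Inst, std::set<Fact>>;

// Hypothetical program model: function name -> its instructions in order.
// A function without a body (a declaration) has an empty instruction list.
using ToyProgram = std::map<std::string, std::vector<Inst>>;

// Builds the seed map from the entry points: the zero fact is assumed to
// hold at the first instruction of each entry function that has a body.
Seeds initialSeeds(const ToyProgram &Prog,
                   const std::vector<std::string> &EntryPoints,
                   const Fact &ZeroValue) {
  assert(!EntryPoints.empty() && "an analysis needs at least one entry point");
  Seeds Result;
  for (const std::string &Name : EntryPoints) {
    auto It = Prog.find(Name);
    if (It == Prog.end() || It->second.empty()) {
      continue; // unknown function or declaration: nothing to seed
    }
    Result[It->second.front()].insert(ZeroValue);
  }
  return Result;
}
```

An analysis that wants additional facts to hold at program start (for example, facts for the parameters of main) would insert them into the same set next to the zero fact.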
The flow-function factories use the custom type FlowFunctionPtrType as a return type.
This type hides the memory management mechanism used from the actual analysis interface.
Currently, FlowFunctionPtrType maps to std::shared_ptr<FlowFunction<D>>, but this may change in the future.
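
The benefit of such an alias can be shown with a small stand-alone model. The types below (ToyFlowFunction, ToyFlowFunctionPtr, Identity, makeIdentity, applyFlow) are hypothetical and only mimic the idea; they are not the PhASAR definitions. The point is that factory code names the alias, so swapping the underlying smart pointer later would not require touching the factories.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for a flow-function interface over fact type D.
template <typename D> class ToyFlowFunction {
public:
  virtual ~ToyFlowFunction() = default;
  virtual std::set<D> computeTargets(D Source) = 0;
};

// The alias is the only place that mentions the ownership mechanism.
template <typename D>
using ToyFlowFunctionPtr = std::shared_ptr<ToyFlowFunction<D>>;

// Identity flow: the incoming fact is passed through unchanged.
template <typename D> class Identity final : public ToyFlowFunction<D> {
public:
  std::set<D> computeTargets(D Source) override {
    return {std::move(Source)};
  }
};

// Factory code returns the alias and never spells out std::shared_ptr.
template <typename D> ToyFlowFunctionPtr<D> makeIdentity() {
  return std::make_shared<Identity<D>>();
}

// Applies a flow function to a single incoming fact.
template <typename D>
std::set<D> applyFlow(const ToyFlowFunctionPtr<D> &Flow, D Source) {
  assert(Flow && "flow function pointer must not be null");
  return Flow->computeTargets(std::move(Source));
}
```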