In this lecture, we will learn about some mathematical, formal and theoretical foundations about data flow analysis. To start with, recalling the following concepts is necessary:
Partial Order
A poset is defined as a pair \((P, \sqsubseteq)\), where \(P\) is a set, and \(\sqsubseteq\) is a binary relation that defines a partial ordering over \(P\) which has the properties of Reflexivity, Antisymmetry and Transitivity.
Upper and Lower Bounds
Given a poset \((P, \sqsubseteq)\) and its subset \(S, S \subseteq P\), we can say that:
Note that not every poset has the lub or glb, and if a poset has, it will be unique.
The proof can be simple. Firstly for example for a poset \((P, \subseteq)\), where \(P = \left\{ \left\{a, b,c\right\}, \left\{a, c\right\}, \left\{a\right\}, \left\{c\right\} \right\}\), the poset do not have the glb. Secondly, for a poset \(P\) assume \(\exist \sqcap P_1 \not= \sqcap P_2\), then we have \(\sqcap P_1 \sqsubseteq \sqcap P_2\) and \(\sqcap P_1 \sqsubseteq \sqcap P_2\), which indicating a contradiction that \(\sqcap P_1 = \sqcap P_2\).
What‘s more, if \(S = \{a, b\}\), then \(\sqcap S\) can be written as \(a \sqcap b\), \(\sqcup S\) can be written as \(a \sqcup b\).
Lattice
Given a poset \((P, \sqsubseteq)\), \(\forall~ a, b \in P, \exist~ a \sqcup b, a \sqcap b\), we call \(P\) a lattice.
Semilattice
Given a poset \((P, \sqsubseteq)\), \(\forall~a,b \in P\),
Complete Lattice
If a lattice \((P, \sqsubseteq)\) satisfy that \(\forall S \subseteq P, \exist~\sqcap S, \sqcup S\), it is called a complete lattice, in which \(\top = \sqcup P\) called top and \(\bot = \sqcap P\) called bottom.
\((\Z, \subseteq)\) is not a complete lattice, considering \(\N \subseteq \Z, \not \exist \sqcup \N (+ \infty)\). Moreover, every finite lattice is a complete lattice, but not every complete lattice is finite.
For most of our data flow analysis, we focus on the finite complete lattices.
Product Lattice
Given lattices \(L_1 = (P_1, \sqsubseteq), L_2 = (P_2, \sqsubseteq), \dots, L_n = (P_n, \sqsubseteq)\), if \(\forall i, \exist \sqcup L_i, \sqcap L_i\), then we can have a product lattice \(L^n = (P,\sqsubseteq)\), where \(P = P_1 \times P_2 \times \dots \times P_n\).
Moreover, there are
If every \(L_i\) is complete, then \(L^n\) is complete.
On the abstraction basis of lattice, a data flow analysis can be viewed as a framework \((D,L,F)\) consisting:
Add some details to the framework, we can get a formal expression of our iterative algorithm:
Given a CFG with \(k\) nodes, the iterative algorithm updates \({\rm OUT}[n]\) for every node \(n\) in each iteration.
Assume the domain of the values in data flow analysis is \(V\), then we can define a k-tuple
as an element of set \((V_1 \times V_2 \times \dots \times V_k)\) denoted as \(V^k\), to hold the values of the analysis after each iteration.
Each iteration can be considered as taking an action to map an element of \(V^k\) to a new element of \(V^k\), through applying the transfer functions and control-flow handing, abstracted as a function \(F:V^k \to V^k\)
Then the algorithm outputs a series of k-tuples iteratively until a k-tuple is the same as the last one in two consecutive iterations.
In this context, the algorithm would begin with the state \(X_0 = (\bot, \bot, \dots, \bot)\) and iteratively apply a function \(F\) to the current state. Then we got \(X_1 =(v_1^1, v_2^1, \dots, v_k^1) \to X_2 = (v_1^2, v_2^2, \dots, v_k^2) \to \dots \to X_s\), where for each \(i\), \(X_{i+1} = F(X_i)\). And for the final state, there is \(F(X_s) = X_s\), that is to say, \(X_s\) is a fixed point of the function \(F\).
After all the work done previously, we can eventually start to use mathematical tricks to answer a few important questions:
Is the algorithm guaranteed to terminate or reach the fixed point, or does it always have a solution?
To answer this question, we should inspect the monotonicity of the function \(F\), as we do in Lecture 03.
A function \(F: L \to L\) is monotonic if \(\forall~x, y \in L, x \sqsubseteq y \Rightarrow f(x) \sqsubseteq f(y)\). (\(L\) is a lattice)
If so, is there only one solution or only one fixed point? If more than one, is our solution the best one(most precise)?
We have a Fixed-Point Theorem that:
Given a complete lattice \((L, \sqsubseteq)\), if \(F: L \to L\) is monotonic and \(L\) is finite, then
We can give a proof for the least fixed point, that of the greatest is similar.
\(P.f.\)
We firstly prove the existence of a fixed point.
By the definition of \(\bot\) and \(F: L \to L\), we have:
As \(F\) is monotonic, we have:
As \(L\) is finite, for some \(k\), we have:
Thus we found the fixed point.
Secondly, we prove the fixed point is the least one.
Assume we have another fixed point \(x\). By the definition of \(\bot\), we have \(\bot \sqsubseteq x\).
As \(F\) is monotonic, using induction, we can easily get
Thus \(\exist~k,F^k(\bot) \sqsubseteq F^k(x) = x\), which means
So that the fixed point is uniquely the least one.
Using the fixed point theorem, we cal easily describe that the iterative algorithm will finally arrive at a best fixed point on the lattice, if \(F\) is monotonic.
NJU Static Program Analysis 05: Data Flow Analysis III
原文:https://www.cnblogs.com/Shimarin/p/15013546.html