Finding and Analyzing the Minimum Set of Driver Nodes in control of Boolean Networks

We study the minimum number of driver nodes control of which leads a Boolean network (BN) from an initial state to a target state in a specified number of time steps. We show that the problem is NP-hard and present an integer linear programming-based method that solves the problem exactly. We mathematically analyze the average size of the minimum set of driver nodes for random Boolean networks with bounded in-degree and with a small number of time steps. The results of computational experiments using randomly generated BNs show good agreements with theoretical analyses. A further examination in realistic BNs demonstrates the efficiency and generality of our theoretical analyses.


Introduction
Development of a control theory for complex biological systems is one of the primary goals of systems biology because of its potential applications to drug discovery and treatment of intractable diseases [40,41]. While a lot of useful results have been obtained for control of linear systems, not so many practical results were obtained for nonlinear systems [12]. It does not seem to be useful to apply existing theories and methods to control of biological systems because biological systems are complex and contain many nonlinear subsystems. Therefore, it is necessary to develop useful control strategies for complex biological systems.
Among various mathematical models proposed for modeling complex and nonlinear biological systems, Boolean network (BN) [37,38] attracts considerable attentions as a nonlinear model of gene regulatory networks [5,7,10,11,24,76]. Gene regulatory networks are models of sets of interactions among genes, the analysis of which is one of the key topics in systems biology and theoretical biology. BNs have been widely used to model the regulatory networks in mammalian cortical area [31], T cell large granular lymphocyte leukemia [75], mammalian cell cycle [26], and to reproduce the wild-type gene expression patterns [8]. Of course, BNs cannot model all aspects of biological systems. However, it is suggested that BNs provide good approximations to the nonlinear functions in many biological systems [11] and BNs can be used to elucidate how perturbations may alter normal behavior and thus lead to testable predictions [9]. In fact, many BN-based models have been proposed for describing various regulatory behaviors of biological systems as mentioned above (see also [9]). There are two types of BNs: synchronous BNs and asynchronous BNs. The difference between these two types lies on whether or not the states of nodes are updated synchronously. Asynchronous BNs may be more appropriate to model certain behaviors of biological systems. However, there are various subtypes in asynchronous BNs [9], and there is no widely-used control model of asynchronous BNs. On the other hand, many studies have been done on controllability of synchronous BNs [2,17,28,49,50,60,84]. Therefore, in order to make use of the established control models, we focus on synchronous BNs in this paper.
Many studies have been done on the analyses of BNs. For example, distribution of attractors [7,24,38,76], singleton attractors detection and the average case complexity analyses [30,81,83], relationship between network topology and chaotic behavior [10,11], and inference of BNs from gene expression data [5,48,56] have been extensively studied. Stimulated by works on control of the probabilistic Boolean network (PBN) [22,71], Akutsu et al. studied the problem of finding control strategies for BNs and showed it is NP-hard even in considerably restricted cases [2].
Recently, extensive studies have also been done on the controllability and observability of BNs [16, 27, 28, 50, 53-55, 60, 84], based on the concept of Semi-tensor product matrices that was proposed by Cheng and his colleagues [17][18][19]. Zhao et al. and Fornasini et al. considered the optimal control of BNs over infinite are highly nonlinear. Therefore, we study in this paper the minimum set of driver nodes control of which leads a given Boolean network from a specified initial state to a specified target state in a specified number of time steps.
The purpose of this study is to develop a practical method using Integer Linear Programming (ILP) to select the minimum number of driver nodes and to perform theoretical analyses on the average size of the minimum set of driver nodes, both for BNs. We focus on cases where the maximum in-degree is bounded by some constant K. It is to be noted that most nodes in many real networks are considered to have small in-degree. For example, gene regulatory networks are generally thought to be made up of only a few nodes with high degree (knowns as hubs) and a large majority of poorly connected nodes with low degree. Thus, gene regulatory networks are assumed to have a hierarchical scale free network topology [13]. Many of existing studies on BNs also focused on the cases of bounded in-degree [2,3,6,21].
The paper is organized as follows. In Sec. 2, after reviewing BNs, we define the problem to be discussed and show that it is NP-hard. In Sec. 3, we elucidate an ILP-based method for finding the minimum set of driver nodes for given BNs and initial/target states. In Sec. 4, we mathematically analyze boundary values of the minimum number of control nodes for random BNs: upper bounds for cases of time steps M = 1 and M = 2. We also analyze the expected number of control nodes needed in final time step. In Sec. 5, we present results of computational experiments. Finally, we conclude with future work.

Problem
In this section, we briefly review BNs [38] and then introduce the problem.

Boolean network
A Boolean network G(V, F ) consists of a set of N nodes V = {v 1 , . . . , v N } and a list of Boolean functions F = (f 1 , . . . , f N ). The state of v i at time t is denoted by v i (t). The global state (or simply the state, or Gene Activity Profile (GAP)) of a BN at time t is denoted by the vector v(t) = (v 1 (t), . . . , v N (t)). For each node v i , the Boolean function is in the form of The Boolean function f i is a logical combination of k i variables. Here, we denote the logical AND of x and y, logical OR of x and y, logical NOT of x, and exclusive OR of x and y by x ∧ y, x ∨ y,x, and x ⊕ y, respectively. Then the state of node v i at time t + 1 is determined by . We also write the transition rule for v i as v i (t + 1) = f i (v(t)) and the transition rule for the whole BN as v(t + 1) = f (v(t)).

Minimum set of control nodes in BN control
Akutsu et al. defined a control problem for BNs called BN CONTROL [2]. In BN CONTROL, nodes are classified into two classes: internal nodes and external nodes where internal nodes are original usual nodes in the BN and external nodes are control nodes. Control nodes (also called driver nodes) are nodes by controlling the values of which we can lead the BN to a desired state. There are two ways to obtain control nodes: select from original nodes [6] or add new nodes to the BN [33]. Here, we consider selecting n nodes from the original BN as control nodes for the reason that gene activation has been shown to be critical in control of cells such as iPS cells (induced pluripotent stem cells) generation [80].
Let V = {v 1 , . . . , v n , v n+1 , . . . , v N } be the set of all nodes in a BN, where v 1 , . . . , v n are control nodes and v n+1 , . . . , v N are internal nodes. For convenience, we denote u(t) = (u 1 (t), . . . , u n (t)) = (v 1 (t), . . . , v n (t)) and v (t) = (v n+1 (t), . . . , v N (t)). The number n is the number of control nodes selected to drive the BN to a desired state. The transition rule for the whole BN becomes v(t + 1) = f (u(t), v (t)). For a global state v, a global state w is called a predecessor of v if v = f (w). That is, there exists an edge from w to v in the state transition diagram of a given BN if w is a predecessor of v. Then the problem of Minimum Set of Driver Nodes is defined as follows.

Definition 1 (Minimum Set of Driver Nodes).
Instance: a BN with N nodes, an initial state of the network v 0 , and a desired state of the network v M at the M -th time step.
Problem: find the minimum set of driver nodes. By controlling the sequence u(0), . . . , u(M ) we have v(0) = v 0 and v(M ) = v M . The minimum number of driver nodes is denoted as N .

Example 1.
Here follows an example to illustrate this problem. Recall that the problem is to find the minimum number of control nodes for a given BN, v 0 , v M , and M . For example, we consider a BN with 5 nodes (see Fig. 1(a)) defined by transition rules shown in Fig. 1 Note that we can freely assign 0/1 values to v 4 (t) (for t = 1, 2, . . .) regardless of the value assigned to v 1 (t). For convenience, we consider a node is selected as a control node at time t if the original Boolean function is not applied at the first time. For example, v 1 is selected as a control node at t = 1, and v 4 is selected as a control node at t = 2 in Table 1(a). Note that it is possible that multiple nodes are selected as control nodes at the same time step.
Therefore, in this case, the number of control nodes is 2 and the set of control nodes is {v 1 , v 4 }.
Therefore, in this case, the minimum set of driver nodes for this instance is {v 1 } and the minimum number of driver nodes is 1. It is to be noted that in our model, the minimum set of driver nodes depends on v 0 , v M , and M although it is determined independent of these factors in the case of structural controllability of linear systems [58,61]. If we need to determine the minimum set of driver nodes independently of these factors, a large number of driver nodes would be required in most cases because there exist many states not having predecessors and, to be discussed in Sec. 4, not a small number of driver nodes are required in order to cope with even one such state.
It has been proved that the problem of finding control strategies for BNs is NP-hard even if the number of control nodes is 1 (Theorem 4 in [2]). We can use the same reduction to prove that the problem of finding the minimum set of driver nodes is NP-hard. To this end, it is enough to let x 1 in the proof to be a constant node that always outputs 0. Then, there exists a satisfiable assignment if and only if {x 1 } is the minimum set of driver nodes. Therefore, we have: It is to be noted that although the minimum driver set problem is NP-hard for BNs, the problem for structural controllability of linear systems can be solved in polynomial time because the problem can be reduced to the maximum matching problem for bipartite graphs [58]. It should also be noted that the Boolean values of the driver nodes are updated synchronously. Therefore, there is no ordering on the driver nodes, which also holds in the case of structural controllability of linear systems [58,61].
Here, we briefly explain the differences between our proposed model and the attractor-based control methods [4,63,70], using the example given in the above. Recall that {v 1 } is the set of driver nodes in our model. {v 1 } is also the set of driver nodes in each of the attractor-based control methods, regardless of the initial and target states. However, if v 0 = (0, 0, 0, 1, 1), v M = (0, 0, 0, 0, 0), and M = 1, we do not need any driver node (i.e., the minimum number of driver nodes is 0). On the other hand, if v 0 = (0, 0, 0, 0, 0), v M = (1, 1, 1, 1, 1), and M = 1, we need to control all nodes (i.e., the minimum number of driver nodes is 5). It is to be noted that we cannot specify u(0) but can only specify u(1) because v(0) = v 0 must be satisfied in our definition. These examples show that the minimum set of driver nodes depends on the initial and target states and the number of time steps, whereas the set of driver nodes is uniquely determined in each of the attractor-based control methods. These also show that there exist cases in which we need more nodes than those given by the attractor-based control methods if the number of time steps to reach the target state is restricted. Therefore, we cannot directly compare our model with the attractor-based control methods. Although the attractor-based control methods have many practical merits, our model might be useful if there exist some restrictions on the number of time steps (e.g., duration of therapy) and/or the target state is not an attractor (but can be a new stable state due to driver nodes) of a given BN.

Integer Linear Programming-Based Method
In this section, we elucidate an integer programming-based method for our proposed problem. Integer programming [77] is a mathematical optimization program in which some or all variables are restricted to be integers. When it refers to ILP [72], all of the objective function and constraints are linear. Since ILP has been shown to be very powerful in solving a variety of NP-hard problems [36] and the minimum driver set problem for BNs is NP-hard, it is reasonable to apply ILP.
We employ the framework introduced in [6] although considerable modifications are needed in order to select the minimum set of driver nodes. To give the ILP formalization, we need some definitions, many of which have been introduced in [6].
We define two functions σ b (x) and τ b (x) by where b ∈ {0, 1}. Note that any Boolean function with k inputs can be written as . , x i k ) holds for every node v i , we need to do two things: (1) We need constraints which ensure any obtained feasible binary solution equals to the value of v i . If (2) as follows.
(2) Ensure that where τ is defined at the beginning of this section.
In this representation, w i = 1 corresponds to the case that x i is selected as a control node, to which z i,t gives 0-1 assignment for x i,t . These constraints can be represented by Note that we aim at minimizing N i=1 w i which gives the number of control nodes.
By putting all the constraints together, we have the following ILP-formulation for Minimum Driver Nodes Set problem.

Theoretical Analysis
The purpose of this section is to analyze the distribution of the minimum number of driver nodes for random Boolean networks (with in-degree bound) and for random initial states. However, it is quite difficult to obtain a general result because no simple property is known for characterizing the minimum set of driver nodes, different from the case of linear structural controllability studies [58,61]. Therefore, we focus on cases with a small number (e.g., 1 and 2) of time steps between the initial and target states.
Recall that we denote the number of all nodes as N , the number of control nodes as n, and the number of time steps needed from the initial state v 0 to the target state v M as M . Although the set of driver nodes depends on v 0 , v M , and M , we assume here for simplicity that it is independent of v 0 and v M . Fix the initial state. Then, the number of possible control sequences is (2 n ) M because the number of possible 0-1 assignments to the driver nodes is 2 n at each time step. Since there exist 2 N possible target states, in order to drive the BN to any specified target state, we have However, it just gives a lower bound on n with respect to N and M , and we may have a smaller bound if the set of driver nodes depends on v 0 and v M . By analyzing Eq. (2) we can get: for a fixed N , n decreases as M increases; for a fixed M , n increases as N increases.
In this section, we show that the minimum number of control nodes n has an upper bound for fixed M = 1 and M = 2, and a lower bound for a fixed N , where the set of control nodes depends on v 0 and v M . We find two functions u(N, L) (Eq. (16)) and l(N ) (Eq. (18)) as the upper and lower bounds estimation of the minimum number of control nodes.

Upper bound of n
In the following, we will show upper bounds of the number of control nodes n for cases M = 1 and M = 2.
When M = 1 which means the desired state is required to be achieved in one time step, we need to compare f (v 0 ) and v 1 .
i is 1 2 for a random BN and a random initial state (including states of control nodes), the expected number of nodes selected as control nodes is upper bounded by When M = 2 which means the desired state is required to be achieved at t = 2, we need to consider two possibilities: the target state has or does not have predecessors in the state transition diagram. Figure 2 shows how we select and assign values to control nodes. Here we assume: (1) predecessors are distributed uniformly at  random; (2) once a state has predecessors, the number of its predecessors takes the expected number, i.e., we will have m 1 = m 2 in the following discussion; (3) there are m 3 states having predecessors in a BN. As shown in Fig. 2, Routes 1 and 2 indicate the ways how we select and assign values to control nodes when the target state has and does not have predecessors, respectively.
In Route 1, which occurs when the target state has predecessors, the number of predecessors is denoted as m 1 . Denote the global state this BN achieves automatically at t = 1 as X 1 . We shall select a state among m 1 predecessors from X 1 to which we need the least number of control nodes n 1 .
However, if the target state does not have predecessors, we execute Route 2. In this case, we should select control nodes in two steps. Firstly, we shall select one state that has predecessors. According to [3], the number of such states m 3 is at most 2 N +1−L where L = N 2 2 K +1 . We shall select the one from which to the target state we need the least control nodes, denoted as X 2 . The least number of control nodes selected in this step is denoted as n 2 . Secondly, suppose X 2 has m 2 predecessors, we shall select the predecessors from X 1 to which we need the least control nodes. The least number of control nodes selected in this step is denoted as n 2 . The total least control nodes is the sum of both parts, i.e., n 2 = n 2 + n 2 .
Here follows two examples to illustrate the above two routes (see Fig. 3). Figure 3(a) shows the example of Route 1, i.e., the route when the target state V 2 has predecessors (we use V instead of v). As we assumed, V 2 has m 1 predecessors. The initial state V 0 goes to X 1 by Boolean functions. The case when we need to control is that X 1 cannot go to any of those m 1 predecessors. We can control X 1 to become any state among a 1 , . . . , a m1 , but finally we will select X from a 1 , . . . , a m1  so that we need the least number of control nodes n 1 to control X 1 to X (by inverting the states of n 1 nodes). Figure 3(b) shows the example of Route 2, i.e., the route when V 2 does not have predecessors. In this case, at the final time t = 2 we should control some node to V 2 . According to our assumption, there are m 3 states in a BN that have predecessors, namely, X 2 1 , . . . , X 2 m3 . We can control any of them to become V 2 . We shall select X 2 from these states so that the least number of control nodes are needed to control X 2 to V 2 (by inverting the states of n 2 nodes). Analogously, X 2 is supposed to have m 2 predecessors as we assumed. Since the state is X 1 at t = 1, we shall control X 1 to become one of the predecessors of X 2 , namely, b 1 , . . . , b m2 . Then X will be selected from these states so that the least number of control nodes are needed to control X 1 to be X (by inverting the states of n 2 nodes).

Proposition 2. For a random BN of N nodes, in-degree K, if we want to change a state to be one of m states at the same time by control nodes, the minimum number of control nodes is expected to be
Proof. If we want to change a state to be another state immediately, nodes on which values are different will be selected as control nodes. The number of control nodes X is a binomially distributed random variable and X ∼ B(N, 1 2 ). Note that if we denote the number of control nodes as X i for the ith predecessor i ∈ [1, m], i ∈ N, then X 1 , . . . , X m are i.i.d. random variables from a discrete distribution with cumulative distribution function F (x) and probability mass function The first-order statistics X (1) is always the minimum of the samples, namely Here, we give the derivation of Pr(X (1) = x). Let Note that Pr(X (1) ≤ x) = Pr(there are at most m − 1 trails greater than x) Thus, the probability mass function of X (1) can be computed by Then, we can obtain the expectation value of X (1) by computing Proof. Since there are two possibilities to achieve V 2 from V 0 , we need to divide the problem into two cases.

Case 1.
The target state has predecessors. We need to select a global state X from m 1 predecessors by changing which to X 1 we need the least control nodes. By Proposition 2, the minimum number of control nodes is expected to be

Case 2.
The target state has no predecessors. In this case, the selection and value assignment of control nodes include two steps: First, select a global state X 2 changing which to V 2 we need the least control nodes among m 3 states having predecessors. The minimum number of control nodes is expected to be Second, select a global state X among m 2 predecessors of X 2 changing which to X 1 we need the least control nodes. The minimum number of control nodes is expected to be

Minimum Set of Driver Nodes in Control of BNS
Note that the probabilities of Cases 1 and 2 occur are P 1 and P 2 , respectively. Then, overall the minimum number of control nodes is n = P 1 n 1 + P 2 (n 2 + n 2 ) The range of P 1 , P 2 , m 1 , m 2 , and m 3 can be obtained from [3].

Proposition 4. Suppose that for a random BN of N nodes, and for each node K input nodes are randomly selected and then a Boolean function is randomly selected from 2 2 K possible Boolean functions (including constant Boolean functions). Assume that m 1 and m 2 equal to the expectation value of number of predecessors and m 3 equals to the expectation value of number of states having predecessors. Then the minimum number of control nodes for time step M = 2 is expected to be at most
Then g(m) is decreasing since Eq. (8) can be written as Note that in Eq. (7), based on the assumption we have and m 2 · m 3 = 2 N .
We also know P 1 + P 2 = 1. Substituting Eqs. (9)-(11) to Eq. (7) we obtain Since g(m 1 ) is decreasing much faster than g( 2 N m1 ) where 2 N m1 is big enough to make the function value stable at comparatively much smaller values for m 1 ≥ 2 L−1 , Eq. (12) achieves its maximum when m 1 equals to its minimum. Thus, we have That is to say, the minimum number of control nodes is expected to be Note that the second term of Eq. (14) becomes very small and can be ignored when N > 3, and thus we can approximate Eq. (14) by Therefore, the function approximately gives an upper bound for the number of control nodes n.   control nodes n will ultimately approach to a positive constant (See Figs. 4 and 5). Hence, we try to find out the relationship between this constant and the number of all nodes N . In the following, we called this constant as stable n.

Lower bound function of n
Here we only consider the case K = 2. In this case, there are 2 2 2 = 16 Boolean functions to each node with 2 inputs. We shall discuss the following two cases.

Case 1. Boolean functions are only constant Boolean functions.
Constant Boolean functions are 1 or 0 which means the corresponding node is assigned values 1 or 0 always. Recall that our problem is to control a network from a random initial state to the global state (0, . . . , 0), without loss of generality. In this case, only when a node (assumed as y) is assigned the constant Boolean function 1 does it need to be controlled. Then the probability of one node needs to be controlled is 1 2 . Note that this case happens with probability 2 16 (two out of 16 Boolean functions are constant) and the BN has N nodes, the expected number of control nodes is

Case 2.
Only non-constant Boolean functions are assigned to nodes.
In this case, there are 14 nonconstant Boolean functions that can be assigned to nodes. Here, we shall consider pairs of nodes because nodes might influence each other when they have same input nodes. We analyze the following two cases.

Subcase 1. Any two nodes have two same input nodes.
Assume nodes y and z, they have two same input nodes x j and x k , then totally there are 14 2 = 196 possible combinations of Boolean functions assigned to y and z because we only consider nonconstant Boolean functions here. All possibilities are shown in the following Tables A.1 and A.2.
As we can see from Tables A.1 and A.2, we need to control either the node y or z in 50 out of all 196 possible combinations. Denote the possibility that a node needs to be controlled as p 1 in this case, then where 2 2 / N 2 is the probability that two nodes have two same input nodes and 50 196 is the probability that a pair of such nodes with only nonconstant Boolean functions needs to be controlled.

Subcase 2.
Any two nodes have only one same input node.
If any two nodes y and z have only one same input node, then only in this circumstances we have to control some node(s): y(t + 1) = x k and z(t + 1) =x k where x k is the same input node, and can be empty, x j ,x j (j = k), or; y(t + 1) = x k and z(t + 1) = x k where x k is the same input node, and can be empty, x j ,x j (j = k). Analogously, we can learn from Tables A.1 and A.2 that there are 9 + 9 = 18 combinations of Boolean functions where we need to control either y or z. Denote the possibility that a node needs to be controlled as p 2 , we have is the probability that two nodes have only one same input node and 18 196 is the probability that such a pair of nodes with only nonconstant Boolean functions needs to be controlled.
Since there are N 2 pairs of nodes and this case occurs with probability 14 16 , the expected number of control nodes is To sum up, the expected number of control nodes needed is at least n 1 + n 2 , denoted as a new function l(N ). Therefore, Since the case that constant functions and nonconstant functions can all be assigned to nodes is not included here, the l(N ) can only provide an estimate of the lower bound of minimum number of control nodes (See Table 2).

How n changes by N (fixed M, K)
We implement the ILP problem with different in-degree upper bound K, and different time steps M . We start with random initial states and the target state is (0, 0, . . . , 0). The curve of Eq. (16) is called as Upper Bound Curve. Figure A.1 shows the comparison among the upper bound curves and ILP implementation results for the case of M = 2 with different K. We can see that Eq. (16) gives a very good bound of the number of control nodes n for the case of K = 2.
However, the upper bound curve is less tight when K increases. The reason is as follows. On one hand, note that in Eq. (12) is a discrete function, it has a convex shape as shown in Fig. A.2). Here, ∆ is defined as: ∆g(m) = g(m) − g(m + ∆m), m ∈ R. Therefore, the upper bound estimate of n will increase when K ↑ (for fixed N , M = 2). On the other hand, the computer experiment results show that n ↓ when K ↑ (for fixed N , M = 2). As a result, the upper bound curve is less tight for K = 3, 4, . . . . However, things will not get worse. The upper bound curve is almost the same for K = 4, 5, . . . because the power 2 L−1 (L = N 2 2 K +1 ) in Eq. (16) almost stays the same (at around 0.5000) when K ≥ 4. Furthermore, we fit the dots of (N, n) for fixed K and M with curves, as shown in Figs. A.3 and A.4. We find that n approximately follows a linear relationship with N for K = 2 and K = 3. The slopes of these fitting lines decrease as the number of time steps M increases.

How n changes by M (fixed N, K)
From Figs. A.3 and A.4, we can find that the slopes of those fitting lines decrease as the number of time steps M increases, for fixed K. This result is in accordance with our theoretical analysis in Sec. 4.2. Figures 4 and 5 show the relationship between n and M with a fixed N more specifically. Here we post further explanations of the ILP implementation results. We catch global states at each time t and the corresponding number of control nodes newly selected at that time. We find some rules of the selection of control nodes, e.g., the selection of control nodes finishes in rather early several time steps; and in most of cases some control nodes are selected at final step, which means the target state has no predecessor.
We set the times of repetition as 100. We apply the average of the results of 100 randomly generated networks. We try BNs with 10, 20 nodes. Then we get some conclusions from computer simulation results as follows: (1) Control nodes are selected at very limited time steps. Take K = 2, N = 10, M = 10 as an example. Control nodes are selected at up to three time step. That is, if we denote n i as the number of control nodes newly selected at time t = i, then we have only up to three n i = 0. Note that this n i is different from the number of control nodes needed in the ith time step.
(3) After finishing the selection of control nodes (note that the control nodes selected may be in use for more than one time step), it takes only 3 to 5 more time steps for the Boolean network to achieve the target state. Therefore, the target state can be achieved within around 8 time steps. In other time steps, the networks states are just repetitive (the same or in a loop). This explains why curves stabilize from around M = 8 in Figs. 4 and 5.

Simulation for realistic BNs
For further evaluation of our algorithm as well as our analysis on the size of the minimum set of driver nodes, we examined six realistic BNs from literatures [15,23,25,26,43,52,62]. We applied our algorithm to control the above realistic BNs from a random initial state to any one of attractor states. The number (#) of control nodes describes the nodes needed to drive the realistic BN from a random initial state to an attractor state for the case M = 2. We took the average over 10 repetitions since the set of driver nodes depends on the initial state. The upper bound function u(N ) is from Eq. (16) (note that L = N 2 2 K +1 , and u(N, L)| K=2 < u(N, L)| K=3 , therefore we apply u(N ) = u(N, L)| K=2 here). The lower bound function is from Eq. (18). The simulation results are shown in Tables 3-8.
As can been seen from Tables 3-8, our algorithm and the analysis in the minimum set of control nodes work very well even for realistic BNs. The upper bound function and lower bound function give very good boundaries for the minimum number of driver nodes. More importantly, the genes most frequently identified as driver nodes (shown in table captions) are biologically meaningful. In flower morphogenesis in arabidopsis thaliana network, genes UFO, WUS and AP1 are most frequently selected. Since UFO is related to ectopic activation, and WUS and AP1 are related to meristem, the obtained control genes seem to have important control  [15]. N = 15. There are 10 single attractors. Nodes most frequently identified as driver nodes: v 2 = UFO: when expressed constitutively, Unusual Floral Organs (UFO) causes ectopic activation of AP3 within flowers; v 9 = WUS: WUSCHEL gene (WUS) controls meristem function by direct regulation of cytokinin-inducible response regulators [51]; v 5 = AP1: plays central roles in determining floral meristem identity [47].  Notes: BN model of the T-cell receptor signaling pathway from [43]. N = 40. There are 1 attractor of length 6 and 8 single attractors. Nodes most frequently identified as driver nodes: v 14 = Gads: GRB2-related adaptor protein (Gads) is related to growth factor receptor signaling, where it couples activated receptors with downstream effector pathways [59]; v 30 = Calcin: a key signaling enzyme in T-lympocyte activation [20]; v 39 = NFkB: nuclear factor kappa B (NF-κB) is an inducible and ubiquitously expressed transcription factor for genes involved in cell survival, cell adhesion, inflammation, differentiation, and growth [14]. Notes: BN model of the control of the mammalian cell cycle from [26]. N = 10. There are two attractors: one of length 7 and another one is a single attractor. Nodes most frequently identified as driver nodes: v 4 = E2F: E2F transcription factor can regulate expression of numerous cellular genes controlling proliferation, including porto-oncogenes and genes regulating cell cycle progression [82]; v 5 = CycA: cell cycle regulation of the cyclin A gene promoter is mediated by a variant E2F site [78]; v 8 = UbcH10: UbcH10 gene codes for a protein that belongs to the ubiquitin-conjugating enzyme family [79]. Notes: BN model of the control of the budding yeast cell cycle regulation from [52]. N = 12. There are 7 attractors of length 1. Nodes most frequently identified as driver nodes: v 2 = Cln3: the cell-cycle sequence starts when the cell commits to division by activating Cln3 [52]. v 3 = SBF: when the cell grows large enough, transcription factors including SBF is activated to trigger cyclin genes [52]; v 11 = Cdc20/cdc14: budding yeast Cdc20 is a target of the spindle checkpoint [35]. Notes: BN model of the control of T-helper cell differentiation from [62]. N = 23. There are 3 single-point attractors. Nodes most frequently identified as driver nodes: v 16 = STAT1: one of the firstly identified (Signal Transducer and Activator of Transcription) STAT genes in the interferon (IFN) signal transduction pathways in mammalian cells. Identified as key components linking cytokine signals to transcriptional events in cells [32], STAT proteins possess the ability to transduce signals from the cell membrane to the nucleus to activate gene transcription [57]; v 15 = JAK1: activation of the Janus kinase/signal transducers and activators of transcription (JAK/STAT) pathway by numerous cytokines and growth factors is one of the most well-studied intracellular signaling networks [42]; v 9 = IL-12: Interleukin-12 (IL-12) is an important immune regulatory cytokine that exerts potent anti-tumor activity and a member of a small family of heterodimeric cytokines [44].  Notes: BN model of the control of the fission yeast cell cycle regulation from [23]. N = 10.
roles since morphogenesis should be related to ectopic activation and meristem. In T-cell receptor signaling pathway network, since NF-κB is known to play a central role in various types of cell activity, it is appropriate that NF-κB is included by the control process. In mammalian cell cycle network, since cyclin A is involved in the control of S phase and mitosis in mammalian cells [78] and mediated by E2F, it is likely that both E2F and CycA work as the control genes. As for T-helper cell differentiation network, since the JAK/STAT pathway plays an important role in control of cells in vivo, using STAT1 and JAK1 as control nodes is considered to be reasonable.

Conclusions
In this paper, we have studied the minimum set of driver nodes for control of BNs. We showed that the problem of computing the minimum driver set is NP-hard. In order to cope with this difficulty, we developed an ILP-based method that works efficiently for medium scale BNs (e.g., BNs with more than a hundred nodes). We also analyzed the average size of the minimum driver set for random BNs with bounded in-degree. Since it is quite hard to perform theoretical analysis for general cases, we focused on the cases where the number of time steps is one and two (i.e., M = 1 and M = 2). Although the case of M = 1 can be analyzed in a straightforward manner, analysis is hard even for the case of M = 2. Based on some assumptions, we derived an upper bound of the average number of the minimum driver set for M = 2. The derived bound shows a very good agreement with the results of computational experiments using the developed ILP-based method for the case of K = 2 (i.e., the maximum in-degree is 2) and a good agreement for the case of K = 3. It was observed from the results of computational experiments that the number of driver nodes approaches not to 0 but to some positive constant as the number of time steps increases. In order to explain this phenomena, we gave an estimate of the lower bound of the number of minimum driver set, which showed a good agreement with the result of a computational experiment. Although we could analyze restricted cases only, interesting findings were obtained, which include (almost) the linear increase of the minimum number of driver nodes to the total number of nodes, and the existence of a lower limit of the number of driver nodes against the increase of the number of time steps. Although we have focused on random BNs in this paper, it is also important to study controllability of realistic BNs. To this end, we further test our algorithm as well as the analysis on the minimum set of driver nodes with 6 realistic BNs. Simulation results have shown that the upper bound and lower bound function derived in Sec. 4 not only works for mathematically constructed BNs but also works very well for realistic BNs, in which cases they gave a very good bounds for the minimum number of control nodes to drive a BN from a random initial state to a target state.
An improved computation method should be developed because the computation time of the proposed ILP-based method increases rapidly as the maximum in-degree increases. As mentioned in Sec. 2, there is no ordering among the selected driver nodes. However, giving a way to prioritize them may be useful in medical applications because it is practically impossible to control many nodes simultaneously. Therefore, prioritization methods should be studied. In this paper, the theoretical analysis of the minimum number of driver nodes was limited up to M = 2. Therefore, the theoretical analysis of more general cases, even for the case of M = 3, is left as an open problem.