Introduction

Nowadays, the usefulness of *Bayesian networks* [58] for representing uncertain knowledge and reasoning with it efficiently is widely accepted. A Bayesian network consists of a qualitative part, a *directed acyclic graph* (DAG), and a quantitative part, a collection of numerical parameters, usually conditional probability tables.
The knowledge represented in the graphical component is expressed in terms of dependence and independence relationships between variables. These relationships are encoded using the presence or absence of links between nodes in the graph. The knowledge represented in the numerical part quantifies the dependences encoded in the graph, and allows us to introduce uncertainty into the model. All in all, Bayesian networks provide a very intuitive graphical tool for representing available knowledge.
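As a concrete illustration of these two components (a minimal sketch; the variable names and probability values are assumptions for exposition, not an example from this paper), a Bayesian network can be represented directly as two data structures: a parent map encoding the DAG, and conditional probability tables indexed by parent configurations.

```python
# Qualitative part: a DAG over binary variables, given by each node's
# parent tuple (illustrative chain Cloudy -> Rain -> WetGrass).
parents = {
    "Cloudy": (),
    "Rain": ("Cloudy",),
    "WetGrass": ("Rain",),
}

# Quantitative part: P(node = True | parent configuration).
cpt = {
    "Cloudy": {(): 0.5},
    "Rain": {(True,): 0.8, (False,): 0.2},
    "WetGrass": {(True,): 0.9, (False,): 0.1},
}

def prob(node, value, parent_values):
    """Return P(node = value | parents = parent_values)."""
    p_true = cpt[node][parent_values]
    return p_true if value else 1.0 - p_true
```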

Another attraction of Bayesian networks is their ability to perform reasoning tasks efficiently [46,58]. The independences represented in the DAG are the key to this ability, since they reduce changes in the knowledge state to local computations. In addition, important savings in storage requirements are possible, since the independences allow the global numerical representation (the joint probability distribution) to be factorized.
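The storage savings can be quantified with a small computation (an assumed example, not taken from the paper): an unrestricted joint distribution over n binary variables requires 2^n - 1 free parameters, whereas the factorization induced by the DAG only requires, for each node, one parameter per configuration of its parents.

```python
# Parameter counts for the full joint vs. the DAG factorization
# (binary variables; illustrative example).

def full_joint_size(n):
    # An unrestricted joint over n binary variables has 2^n - 1 free parameters.
    return 2 ** n - 1

def factorized_size(parents):
    # A binary node with k parents needs 2^k free parameters,
    # one per parent configuration.
    return sum(2 ** len(pa) for pa in parents.values())

# A chain X1 -> X2 -> ... -> X10: each node has at most one parent.
chain = {f"X{i}": ((f"X{i-1}",) if i > 1 else ()) for i in range(1, 11)}

print(full_joint_size(10))    # 1023 parameters without independences
print(factorized_size(chain)) # 19 parameters using the factorization
```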

There has been a great deal of work in recent years on the automatic learning of Bayesian networks from data. Consequently, there are a great many learning algorithms, which may be divided into two general approaches: methods based on *conditional independence tests* (also called *constraint-based* methods), and methods based on a *scoring function*^{1} and a *search* procedure.

The algorithms based on independence tests perform a qualitative study of the dependence and independence relationships among the variables in the domain, and attempt to find a network that represents these relationships as far as possible. They take as input a list of conditional independence relationships (obtained from the data by means of conditional independence tests) and generate a network that represents most of these relationships. The computational cost of these algorithms is mainly determined by the number and complexity of such tests, which may also produce unreliable results. Some of the algorithms based on this approach obtain simplified models [22,27,37,38,45], whereas others are designed for general DAGs [28,13,55,66,72,73].
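The statistical tests underlying this approach can be sketched as follows (a hedged illustration; the counts and the fixed critical value, 3.84 for one degree of freedom at the 5% level, are textbook assumptions rather than values from this paper): Pearson's chi-squared statistic applied to a contingency table of observed counts, with independence accepted when the statistic falls below the critical value.

```python
# Pearson's chi-squared test of (marginal) independence of two binary
# variables from a 2x2 contingency table; illustrative sketch only.

def chi2_statistic(table):
    """table[i][j] = count of (X=i, Y=j); returns Pearson's chi-squared."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    total = sum(row)
    stat = 0.0
    for i in range(len(table)):
        for j in range(len(table[0])):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

def independent(table, critical=3.84):
    # Accept independence when the statistic is below the critical value
    # (3.84 corresponds to 1 d.f. at significance level 0.05).
    return chi2_statistic(table) < critical
```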

The algorithms based on a scoring function attempt to find a graph that maximizes the selected score, where the scoring function is usually defined as a measure of fit between the graph and the data. These algorithms combine the scoring function with a search method: during the exploration of the space of feasible solutions, the scoring function is applied to evaluate how well each candidate structure fits the data. Each algorithm is therefore characterized by the specific scoring function and search procedure it uses. The scoring functions are based on different principles, such as entropy [44,19,22,62], Bayesian approaches [11,12,20,34,35,36,41,42,54,61,68], or the Minimum Description Length [8,33,50,69,70,71].
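Many of these scoring functions are *decomposable*: the global score is a sum of local terms, one per variable given its parents, so a local change to the graph only requires re-scoring the affected node. The following sketch illustrates this with a BIC-style score (the tiny dataset and the exact penalty form are illustrative assumptions, not the paper's experimental setup).

```python
# A decomposable, BIC-style scoring function: the global score is the
# sum of one local term per node. Data rows are dicts mapping variable
# names to values; a DAG maps each node to its tuple of parents.
import math
from collections import Counter

def local_bic(data, node, parents):
    """Local BIC term: maximized log-likelihood minus a complexity penalty."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    ll = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    states = {row[node] for row in data}
    n_params = (len(states) - 1) * max(len(marg), 1)
    return ll - 0.5 * math.log(n) * n_params

def bic(data, dag):
    # Decomposability: the global score is a sum of local terms.
    return sum(local_bic(data, x, pa) for x, pa in dag.items())
```

With data in which two variables always agree, the structure containing the arc between them obtains the higher score, as expected.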

There are also hybrid algorithms that combine constraint-based and scoring-based methods. In several works [64,65,67,21,25], the independence-based and scoring-based components are maintained as separate processes that are combined in some way. In contrast, the hybridization proposed in [2,3] develops a scoring function that quantifies the discrepancies between the independences displayed by the candidate network and those found in the database, and restricts the search process using the results of some independence tests.

In this paper, we focus on the scoring+search approach. Because of the exponentially large size of the search space, algorithms in this category have commonly used local search methods [10,20,18,25,42]; however, there is a growing interest in other heuristic search methods, e.g. simulated annealing [18], tabu search [9,56], branch and bound [71], genetic algorithms and evolutionary programming [51,57,74], Markov chain Monte Carlo [47,57], variable neighborhood search [30,60], ant colony optimization [23,60], greedy randomized adaptive search procedures [24], and estimation of distribution algorithms [7].
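The basic local search scheme shared by many of these methods can be sketched as greedy hill-climbing over DAGs, with arc addition, deletion and reversal as the neighborhood operators (a minimal illustration; the function names `hill_climb`, `neighbors` and the score callback are assumptions for exposition, not the paper's algorithm).

```python
# Greedy hill-climbing in the space of DAGs with the classical operators:
# arc addition, arc deletion and arc reversal. A DAG is a dict mapping
# each node to its set of parents; `score` is any graph scoring function.

def has_cycle(dag):
    """Detect a directed cycle by depth-first search over parent links."""
    state = {}
    def visit(x):
        if state.get(x) == "active":
            return True
        if state.get(x) == "done":
            return False
        state[x] = "active"
        if any(visit(p) for p in dag[x]):
            return True
        state[x] = "done"
        return False
    return any(visit(x) for x in dag)

def neighbors(dag):
    """Yield all acyclic graphs one operator application away from dag."""
    nodes = list(dag)
    for x in nodes:
        for y in nodes:
            if x == y:
                continue
            cand = {n: set(ps) for n, ps in dag.items()}
            if x in dag[y]:
                cand[y].discard(x)          # arc deletion: drop x -> y
                yield cand
                cand2 = {n: set(ps) for n, ps in dag.items()}
                cand2[y].discard(x)
                cand2[x].add(y)             # arc reversal: x -> y becomes y -> x
                if not has_cycle(cand2):
                    yield cand2
            else:
                cand[y].add(x)              # arc addition: x -> y
                if not has_cycle(cand):
                    yield cand

def hill_climb(dag, score):
    """Repeatedly move to a better neighbor until no improvement is found."""
    best, best_score = dag, score(dag)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(best):
            s = score(cand)
            if s > best_score:
                best, best_score, improved = cand, s, True
    return best
```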

All of these employ different search methods but the same search space: the space of DAGs. A possible alternative is the space of the orderings of the variables [26,29,31,34,52]. In this paper, however, we are more interested in the space of equivalence classes of DAGs [59], i.e. classes grouping together the DAGs that represent the same set of probability distributions. There are also a number of learning algorithms that carry out the search in this space [4,16,21,53,67]. This feature reduces the size of the search space, although recent results [39] confirm that this reduction is not as substantial with respect to the DAG space as previously hoped (the ratio of the number of DAGs to the number of equivalence classes is lower than four). The price we have to pay for this reduction is that the evaluation of the candidate structures does not take advantage of an important property of many scoring functions, namely decomposability, and therefore the corresponding algorithms are less efficient.
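The membership test for an equivalence class has a classical graphical characterization: two DAGs are equivalent if and only if they have the same skeleton (underlying adjacencies) and the same v-structures (converging arcs x -> z <- y with x and y non-adjacent). A minimal sketch of this check follows (the dict-of-parent-sets representation is an assumption for exposition).

```python
# Checking Markov equivalence of two DAGs via the skeleton/v-structure
# criterion. A DAG is a dict mapping each node to its set of parents.

def skeleton(dag):
    """The set of undirected adjacencies of the DAG."""
    return {frozenset((x, p)) for x in dag for p in dag[x]}

def v_structures(dag):
    """All patterns x -> z <- y where x and y are non-adjacent."""
    adj = skeleton(dag)
    vs = set()
    for z in dag:
        for x in dag[z]:
            for y in dag[z]:
                if x != y and frozenset((x, y)) not in adj:
                    vs.add((frozenset((x, y)), z))
    return vs

def equivalent(d1, d2):
    # Same skeleton and same v-structures <=> same equivalence class.
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)
```

For instance, the chains A -> B -> C and A <- B <- C are equivalent (same skeleton, no v-structures), whereas the collider A -> B <- C is not equivalent to either.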

In this paper we propose a new search space which is closely related to the space of equivalence classes of DAGs, and which we have called the space of *restricted acyclic partially directed graphs* (RPDAGs). We define a local search algorithm in this space, and show that by using a decomposable scoring function we can locally evaluate the score of the structures in the neighborhood of the current RPDAG, thus obtaining an efficient algorithm while retaining many of the advantages of using equivalence classes of DAGs. After the original submission of this paper, [17] proposed another learning algorithm that searches in the space of equivalence classes of DAGs and can also score the candidate structures locally, using a canonical representation scheme for equivalence classes called *completed acyclic partially directed graphs* (CPDAGs).

The rest of the paper is organized as follows: Section 2 discusses some preliminaries and the advantages and disadvantages of carrying out the search process in the spaces of DAGs and equivalence classes of DAGs. Section 3 describes the graphical objects, RPDAGs, that make up the proposed search space. Section 4 provides a detailed description of the local search method used to explore this space. Section 5 shows how RPDAGs can be evaluated efficiently using a decomposable scoring function. Section 6 contains the experimental results of the evaluation of the proposed algorithm on the Alarm [5], Insurance [6] and Hailfinder [1] networks, as well as on databases from the UCI Machine Learning Repository; we also include an empirical comparison with other state-of-the-art learning algorithms. Finally, Section 7 contains the concluding remarks and some proposals for future work.