15-381/681 Homework 4

Part I: Resolution in Propositional Logic

Download and the file logic-files.zip and extract its contents. It contains a copy of utils.py that you will be familiar with from Homework 1, plus a new file logic2.py that contains code for manipulating formulae in propositional and predicate logic, and a file wumpus.py that we'll get to later.

Skim the contents of logic2.py so you understand the basics of what it provides.

Try the following expressions by doing "python -i logic2.py" and then typing the expressions into the Python read-eval-print loop:

a = expr('p>>q')
a
a.op
a.args
b = to_cnf(a)
b
b.op
b.args
Resolution establishes the validity of a sentence by assuming its negation and deriving a contradiction:
kb = PropKB()
pl_resolution(kb, 'p | q | ~p')
Some sentences are neither always true nor always false. If a sentence is not valid (always true), resolution will not be able to derive a contradiction, so it fails and returns False.
pl_resolution(kb, a)
pl_resolution(kb, ~a)
Resolution requires the sentence to be expressed in CNF. Note that a sentence and its negation can have very different representations in CNF:
c = expr('(p<=>q) <=> (q<=>p)')
to_cnf(c)
to_cnf(~c)
This version of the resolution algorithm is not very clever. To prove sentence c above requires 1949 calls to the pl_resolve function:
pl_resolution(kb, c)
Examine the expressions created during the resolution procedure and you will see several problems that you can fix to create a smarter resolution theorem prover.

Q1. (4 pts) A literal should never appear twice in a clause. For example, the expression to_cnf('p | q | p') returns (p | q | p), but this is equivalent to (p | q). Modify the dissociate function to ensure uniqueness of all terms. (Look in utils.py for help with this.)

Q2. (4 pts) If any argument to a conjunction is the constant TRUE, it should be removed from the argument list. If any argument is FALSE, the entire conjunction should be replaced by FALSE. Modify the associate function to accomplish this. Make the corresponding modification for disjunctions. Note that FALSE and TRUE are instances defined in logic2.py; they are not the Python constants False and True.

Q3. (4 pts) If a conjunction's (or disjunction's) arguments include both a literal and its negation, such as p and ~p, the entire conjunction (disjunction) should be replaced by the constant FALSE (or TRUE, respectively). Modify the associate function to enforce this.

Q4. (2 pts) If your modifications are correct, to_cnf(c), where c is the biconditional expression defined above, should return TRUE. What does to_cnf(~c) return?

Q5. (4 pts) The modifications you made to associate and dissociate also affect the resolution algorithm. You should find that pl_resolution(kb, c) now finds the proof after 35 calls to pl_resolve instead of 1949. But a look at the actual resolutions shows there is still a lot of redundancy. This is because on each round the algorithm tries to resolve every clause ever derived with every other clause ever derived, yielding N*(N-1)/2 resolutions. We can do better. Hint: going forward, you may want to modify the print statement in the inner loop to only show pl_resolve calls where the result list was non-empty.

Modify the pl_resolution function to maintain a set of all derived literals. A literal is either a proposition like p or its negation, ~p. The initial set of derived literals comes from the clauses supplied as input. For example, if asked to prove 'p>>q' we would convert this to CNF as ~p|q and then negate it to get p&~q, which yields two clauses, both literals: p and ~q. At the beginning of each round of resolution you should print the set of literals derived so far.

Q6. (30 pts) If we've already derived a literal p, then any clause of form (p | q) is useless for resolution because we know the clause is true even if q is false. If the clause (p | q) unifies with (~p | r) then we could just as easily unify p with (~p | r). On the other hand, if the clause (p | q) unifies with (~q | r) it will yield (p | r), which cannot lead us to infer r because p is alread proven. This is why (p | q) is useless.

The pl_resolution function maintains a list called clauses containing all the clauses derived so far. Modify the algorithm so that if any clause contains a literal we already know to be true (because it appears in the derived literals set), that clause is removed from clauses before we form pairs for resolution. (See cautionary note below about removing things from lists we're iterating over.) You will also need to revise the termination condition for the algorith since the subset test will no longer suffice. Since we're removing things from clauses, in order to prevent the algorithm from generating the same clauses repeatedly you will need to keep a separate list of every clause ever generated, and only add new resolvents to clauses if they have not been generated before. Your algorithm should terminate if a round produces no new clauses. Verify that your theorem prover still works correctly.

If you implement this refinement correctly, you should be able to prove the biconditional expression c in roughly 9 calls to pl_resolve.

Here are some additional expressions you could use as test cases. Only one of these is valid:

d = expr('(p>>q>>r) <=> (r<<q<<p)')

e = expr('((p>>q)>>r) <=> (r<<(q<<p))')

Q7. (20 pts) We saw in lecture that if we derive a new sentence in round i, that sentence must involve at least one clause that was derived in round i-1, because otherwise the new sentence would have been derived earlier. Modify pl_resolution to maintain a list of clauses newly-derived in the current round. In order to tell if a clause is newly-derived, you must keep track of all clauses ever derived, even if those clauses have been removed from the list of resolution candidates by the code you wrote for Q6. In this new version of pl_resolution, when you choose pairs of clauses to resolve, at least one must come from the newly-derived list.

Q8. (10 pts) The file wumpus.py contains an axiomatization of a tiny bit of the wumpus world, and a call to the theorem prover to prove that square [1,2] does not contain a pit, i.e., ~P12. Running this file with the original version of logic2.py takes 3139 calls to pl_resolve before a contradiction is found, and produces 106 clauses. Running it on your modified version should take far fewer calls. In our implementation it takes 654, with just 38 clauses in the last round. But what happens if we ask the theorem prover to prove P12 instead of ~P12? Since this conclusion is not valid, the only way for the resolution algorithm to return False is by exhausting all possible derivations without finding a contradiction. But the number of combinations is enormous, and the combinatorial explosion kills us. Most of these sentences are useless for reaching the kinds of conclusions we care about in the wumpus world, namely, Pij or ~Pij.

Invent a wumpus-specific heuristic for filtering useless clauses from the list. (Hint: are clauses containing more than one "breezy" literal of any use?) Modify pl_resolution to take an optional third argument wump that defaults to False. When this argument is True it should apply your wumpus filtering heuristic to all newly-derived clauses; when False it should skip the heuristic and just do normal resolution.

With this wumpus heuristic, asking pl_resolution to prove P12 should fail in under 1000 calls to pl_resolve. In our implementation it takes under 500. Asking it to prove ~P12 finds a contradiction in 435 calls instead of 645.

Q9. (4 pts) Write a function pl_nature(expression) that takes an expression as input and returns one of the following: "tautology" if the expression is always true, "contradiction" if the expression is always false, or "meh" if the expression is neither a tautology nor a contradiction. Your function should make use of the pl_resolution function.

A note about removing items from a list while iterating over the list: in Python this doesn't work. Try the following example:


  a = [1,1,2,1,1,1,3,4,1,1,5,1,6]

  for x in a:
    if x == 1:
      a.remove(x)

  a

You will note that a still has some 1 elements in it. This is because remove destructively modifies the list and therefore messes up the iterator set up by the for loop. To avoid this error you need to iterate over a copy of the list, not the original. You can make a copy by writing []+a.

Part II: Predicate Logic

Q10. (12 pts) Convert each of the following sentences into predicate logic using reasonably named predicates, functions, and constants. If you feel a sentence is ambiguous, clarify which meaning you're representing in logic.

Someone in Pittsburgh likes pierogies.
Everyone who likes ham also likes cheese.
Mary is tall and Bill is not.
All of the houses near Sue's house are either large or old (but not both).
Between any pair of real numbers there is at least one real number.
There exists a dog that is cuter than all other dogs.

Q11. (6 pts) What is the most general unifier θ of the following two literals?

R(x, y, x, f(y)) R(f(z), z, f(g(2)), w)

Hand-in Instructions

Create a zip archive containing exactly these files: (1) your modified logic2.py file, (2) an answers.pdf file containing your answers to the written questions Q4, Q10, and Q11, and (3) a README.txt file with your name and Andrew id. Note: do not use other archive formats such as tar, rar, or bz2; you must submit a zip file.

Submit your zip file via Autolab.

Back to class home page.