Exercise 1.11
How should the boolean query x AND NOT y be handled? Why is naive evaluation of this query normally very expensive? Write out a postings merge algorithm that evaluates this query efficiently.
Answer:
Find any terms of x that not matched with terms of y. The algorithm may be like this:
INVERTED_INDEX *px = x
INVERTED_INDEX *py = y
INVERTED_INDEX *z = allocate()
while px != null:
if (px).docId < (py).docId:
(z).docId = (px).docId
px = (*px).next
else:
py = (*py).next
(*z).next = allocate()
z = (*z).next
&z = null