Proving correctness for greedy algorithm in string removal problem

Question

Problem Statement: You are given a string s and two integers x and y. You can perform two types of operations any number of times. Remove substring "ab" and gain x points. For example, when removing "ab" from "cabxbae" it becomes "cxbae". Remove substring "ba" and gain y points. For example, when removing "ba" from "cabxbae" it becomes "cabxe". Return the maximum points you can gain after applying the above operations on s.
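To make the problem concrete, here is a brute-force reference in Python. It tries every removal order, using memoization over intermediate strings, and also allows stopping at any point. It is exponential and only meant for checking small inputs. The name `max_points_brute` and the scores 4 and 5 in the example are hypothetical choices for illustration.

```python
from functools import lru_cache


def max_points_brute(s: str, x: int, y: int) -> int:
    """Optimal score found by trying every removal order (small inputs only)."""

    @lru_cache(maxsize=None)
    def best(t: str) -> int:
        result = 0  # performing no further operations is always allowed
        for i in range(len(t) - 1):
            pair = t[i:i + 2]
            if pair == "ab":
                result = max(result, x + best(t[:i] + t[i + 2:]))
            elif pair == "ba":
                result = max(result, y + best(t[:i] + t[i + 2:]))
        return result

    return best(s)


# "cabxbae": remove "ab" and "ba" in either order, ending at "cxe".
print(max_points_brute("cabxbae", 4, 5))  # prints 9
```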

A greedy algorithm solves this problem in O(n): it removes all pairs with the higher scoring value before removing any pair with the lower scoring value. Can anyone present a proof of correctness for this greedy algorithm? I have been trying, but I cannot convince myself.
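A minimal sketch of that greedy in Python, assuming the common stack-based implementation (the helper names `remove_pairs` and `max_points_greedy` are hypothetical). Each pass deletes every occurrence of one pair, including occurrences created by earlier deletions, in a single left-to-right scan, so the two passes together run in O(n).

```python
def remove_pairs(s: str, first: str, second: str, points: int) -> tuple[str, int]:
    """Delete every occurrence of first+second with a stack; return (rest, gained)."""
    stack: list[str] = []
    gained = 0
    for ch in s:
        if stack and stack[-1] == first and ch == second:
            stack.pop()  # the stack top and ch form the pair; drop both
            gained += points
        else:
            stack.append(ch)
    return "".join(stack), gained


def max_points_greedy(s: str, x: int, y: int) -> int:
    """Remove all higher-value pairs first, then all lower-value pairs."""
    if x >= y:
        high, low = ("a", "b", x), ("b", "a", y)
    else:
        high, low = ("b", "a", y), ("a", "b", x)
    rest, total = remove_pairs(s, *high)
    _, more = remove_pairs(rest, *low)
    return total + more


print(max_points_greedy("cdbcbbaaabab", 4, 5))  # prints 19
```

Characters other than a and b stay on the stack, so they naturally block pairs from forming across them.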

We require you to credit the original source of all copied material: https://cs.stackexchange.com/help/referencing. Copying without attribution constitutes plagiarism, and plagiarism is not ok. — D.W., Aug 13 '21 at 17:12
Please show us what you've tried, for trying to prove correctness. We have a guide on how to approach such proofs: https://cs.stackexchange.com/q/59964/755. — D.W., Aug 13 '21 at 17:13

score 1 · Answer 1 · answered Aug 13 '21 at 10:12

(For the greedy algorithm described to be optimal, it is also necessary to assume that $x$ and $y$ are nonnegative.)

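Without loss of generality, we can assume $y \ge x$; otherwise, reverse $s$ (which turns every $\mathrm{ab}$ into $\mathrm{ba}$ and vice versa) and swap $x$ and $y$.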

Suppose, to the contrary, that the greedy algorithm is not optimal. Then there exists some string (possibly an intermediate state) containing a consecutive pair $\mathrm{ba}$ whose removal is suboptimal. Consider what happens to those two letters in the optimal solution.

If neither letter is removed in the optimal solution, the two stay adjacent throughout, so additionally removing the pair gives a solution that is at least as good; this is a contradiction.
If exactly one of those letters is removed in the optimal solution, modify that solution by first removing the $\mathrm{ba}$ pair and then omitting the later removal that involved that letter. This gains $y$ points and loses at most $\max(x, y) = y$, so the result is at least as good, contradicting the assumed suboptimality of removing the $\mathrm{ba}$ pair. (Because the other member of the pair was never removed in the optimal solution, this change works, in that it cannot break any later removals.)
If both of those letters are removed in the optimal solution, the $\mathrm{b}$ is paired up with an $\mathrm{a}$ on its left and the $\mathrm{a}$ with a $\mathrm{b}$ on its right, so those two removals of $\mathrm{ab}$ earn $2x$. In this case, we can modify the solution by first removing the $\mathrm{ba}$ pair and later pairing up and removing those other $\mathrm{a}$ and $\mathrm{b}$, which earns $y + x \ge 2x$. This is again a contradiction.
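The conclusion can also be sanity-checked empirically. The Python sketch below (all names hypothetical) compares a stack-based version of the greedy against an exhaustive search over every string of length at most 6 over the alphabet {a, b, c}, with $x$ and $y$ ranging over 0..3. This is evidence on small cases only, not a substitute for the proof.

```python
from functools import lru_cache
from itertools import product


def brute(s: str, x: int, y: int) -> int:
    """Optimal score by trying every removal order (small inputs only)."""
    @lru_cache(maxsize=None)
    def best(t: str) -> int:
        options = [0]  # performing no further removals is allowed
        for i in range(len(t) - 1):
            if t[i:i + 2] == "ab":
                options.append(x + best(t[:i] + t[i + 2:]))
            elif t[i:i + 2] == "ba":
                options.append(y + best(t[:i] + t[i + 2:]))
        return max(options)
    return best(s)


def greedy(s: str, x: int, y: int) -> int:
    """Remove every higher-value pair first, then every lower-value pair."""
    total = 0
    passes = [("ab", x), ("ba", y)] if x >= y else [("ba", y), ("ab", x)]
    for pair, points in passes:
        stack: list[str] = []
        for ch in s:
            if stack and stack[-1] + ch == pair:
                stack.pop()  # stack top and ch form the pair
                total += points
            else:
                stack.append(ch)
        s = "".join(stack)
    return total


def find_counterexample(max_len: int, scores: range):
    """Return the first (s, x, y) where greedy and brute disagree, or None."""
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            s = "".join(letters)
            for x in scores:
                for y in scores:
                    if greedy(s, x, y) != brute(s, x, y):
                        return s, x, y
    return None


print(find_counterexample(6, range(4)))  # prints None
```

With a negative score allowed, `find_counterexample(2, range(-1, 2))` instead returns `('ab', -1, -1)`: the greedy removes the pair and ends at -1, while doing nothing scores 0, which is why the answer assumes nonnegative $x$ and $y$.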
