
I came across this problem that asks you to implement a regular expression matcher with support for '.' and '*', where

'.' Matches any single character.

'*' Matches zero or more of the preceding element.

The match must cover the entire input string, not just part of it.

isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true
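For reference, here is a plain backtracking matcher that reproduces the examples above. It is a sketch and was not part of the original post; the class `BacktrackingMatcher` and its method `matches` are hypothetical names. It follows the same case split as the DP solution below but caches nothing, so it can revisit the same (i, j) pair many times. On patterns with many `x*` groups that can become exponential.

```java
// Hypothetical class name; plain backtracking with no memoization.
public class BacktrackingMatcher {

    // Returns true if text[i..] is fully matched by pattern[j..].
    static boolean matches(String text, int i, String pattern, int j) {
        // An exhausted pattern matches only an exhausted text.
        if (j == pattern.length()) {
            return i == text.length();
        }
        // Does the current pattern symbol match the current text character?
        boolean firstMatch = i < text.length()
                && (pattern.charAt(j) == text.charAt(i) || pattern.charAt(j) == '.');
        if (j + 1 < pattern.length() && pattern.charAt(j + 1) == '*') {
            // "x*": either skip it (match empty), or consume one character and keep "x*".
            return matches(text, i, pattern, j + 2)
                    || (firstMatch && matches(text, i + 1, pattern, j));
        }
        // Plain symbol: consume one character on each side.
        return firstMatch && matches(text, i + 1, pattern, j + 1);
    }

    static boolean isMatch(String text, String pattern) {
        return matches(text, 0, pattern, 0);
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"aa", "a"}, {"aa", "aa"}, {"aaa", "aa"}, {"aa", "a*"},
            {"aa", ".*"}, {"ab", ".*"}, {"aab", "c*a*b"}
        };
        for (String[] c : cases) {
            System.out.println("isMatch(\"" + c[0] + "\", \"" + c[1] + "\") -> "
                    + isMatch(c[0], c[1]));
        }
    }
}
```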

While I was able to solve this in a linear fashion, I came across many solutions that use dynamic programming (DP), like the one below:

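class Solution {
    public boolean isMatch(String text, String pattern) {
        // dp[i][j] is true when text.substring(i) matches pattern.substring(j).
        boolean[][] dp = new boolean[text.length() + 1][pattern.length() + 1];
        // Empty text matches empty pattern; all other cells start out false (Java's default).
        dp[text.length()][pattern.length()] = true;

        // Fill from the end so dp[i][j+2], dp[i+1][j] and dp[i+1][j+1] are ready when needed.
        for (int i = text.length(); i >= 0; i--){
            for (int j = pattern.length() - 1; j >= 0; j--){
                // Does pattern[j] match text[i]? Always false once the text is exhausted.
                boolean first_match = (i < text.length() && 
                                       (pattern.charAt(j) == text.charAt(i) ||
                                        pattern.charAt(j) == '.'));
                if (j + 1 < pattern.length() && pattern.charAt(j+1) == '*'){
                    // "x*": skip it (match empty), or consume text[i] and keep "x*".
                    dp[i][j] = dp[i][j+2] || first_match && dp[i+1][j];
                } else {
                    // Plain symbol: consume one character on each side.
                    dp[i][j] = first_match && dp[i+1][j+1];
                }
            }
        }
        return dp[0][0];
    }
}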
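It may help to look at the table the solution above actually fills. The sketch below fills the table with the same loops and prints it for `text = "aab"` and `pattern = "c*a*b"`. The class `DpTableDemo` and method `fillTable` are hypothetical names. Row i stands for the suffix `text[i..]` and column j for the suffix `pattern[j..]`, with T marking a match. This example also shows why a mismatch inside the loop cannot end the search. At cell (0, 0) the comparison of 'c' with 'a' fails, yet dp[0][0] is still true, because `c*` may match the empty string and dp[0][2] is true. A single cell's mismatch only rules out one branch, not the whole answer.

```java
// Hypothetical class name; fills the same table as Solution.isMatch and prints it.
public class DpTableDemo {

    static boolean[][] fillTable(String text, String pattern) {
        boolean[][] dp = new boolean[text.length() + 1][pattern.length() + 1];
        // Empty text suffix vs empty pattern suffix: match.
        dp[text.length()][pattern.length()] = true;
        for (int i = text.length(); i >= 0; i--) {
            for (int j = pattern.length() - 1; j >= 0; j--) {
                boolean firstMatch = i < text.length()
                        && (pattern.charAt(j) == text.charAt(i) || pattern.charAt(j) == '.');
                if (j + 1 < pattern.length() && pattern.charAt(j + 1) == '*') {
                    dp[i][j] = dp[i][j + 2] || (firstMatch && dp[i + 1][j]);
                } else {
                    dp[i][j] = firstMatch && dp[i + 1][j + 1];
                }
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        String text = "aab";
        String pattern = "c*a*b";
        boolean[][] dp = fillTable(text, pattern);
        // Each printed row is one text suffix; each column is one pattern suffix.
        for (int i = 0; i <= text.length(); i++) {
            StringBuilder row = new StringBuilder();
            row.append(String.format("%-6s", "\"" + text.substring(i) + "\""));
            for (int j = 0; j <= pattern.length(); j++) {
                row.append(dp[i][j] ? " T" : " .");
            }
            System.out.println(row);
        }
        System.out.println("answer = " + dp[0][0]);
    }
}
```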

I'm having a hard time understanding this. I've solved a few DP problems that involved grids (shortest path in a 2D grid, largest square in a binary 2D grid), and using a DP table there made perfect sense to me. Here, however, I'm completely lost: I don't see how traversing a 2D table helps solve this problem. Furthermore, it appears that we know inside the loop when the characters don't match, so I don't understand why we don't terminate the search there (this is probably also due to my lack of understanding of how a table traversal leads to a solution). Is there a clear, intuitive explanation for problems like these?

Rnet
    Some related questions, maybe even duplicates: https://cs.stackexchange.com/q/645/98, https://cs.stackexchange.com/q/47216/98, https://cs.stackexchange.com/q/2057/98. Tl;dr: don't think about the table; think about the recurrence underneath. Coding with a table is the simple part. – Raphael Mar 30 '18 at 10:55

1 Answer


dp[i][j] represents whether text[i..] matches pattern[j..].
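Following the comment above ("think about the recurrence underneath"), one way to see the table is as a cache for a recursive function dp(i, j). The sketch below is a top-down version with memoization; the class `MemoMatcher` is a hypothetical name and not from the post. Each (i, j) pair is computed at most once, which is exactly what a cell of the 2D table stores. The bottom-up loops in the question compute the same values. They run i and j downward so that every cell a cell depends on (dp[i][j+2], dp[i+1][j], dp[i+1][j+1]) has already been filled.

```java
// Hypothetical class name; top-down (memoized) form of the same recurrence.
public class MemoMatcher {
    private final String text;
    private final String pattern;
    // memo[i][j] caches the answer for text[i..] vs pattern[j..]; null means "not computed yet".
    private final Boolean[][] memo;

    MemoMatcher(String text, String pattern) {
        this.text = text;
        this.pattern = pattern;
        this.memo = new Boolean[text.length() + 1][pattern.length() + 1];
    }

    // Does text[i..] match pattern[j..]? This is the value stored in dp[i][j].
    boolean dp(int i, int j) {
        if (memo[i][j] != null) {
            return memo[i][j];
        }
        boolean result;
        if (j == pattern.length()) {
            // Empty pattern suffix matches only the empty text suffix.
            result = (i == text.length());
        } else {
            boolean firstMatch = i < text.length()
                    && (pattern.charAt(j) == text.charAt(i) || pattern.charAt(j) == '.');
            if (j + 1 < pattern.length() && pattern.charAt(j + 1) == '*') {
                result = dp(i, j + 2) || (firstMatch && dp(i + 1, j));
            } else {
                result = firstMatch && dp(i + 1, j + 1);
            }
        }
        memo[i][j] = result;
        return result;
    }

    static boolean isMatch(String text, String pattern) {
        return new MemoMatcher(text, pattern).dp(0, 0);
    }

    public static void main(String[] args) {
        System.out.println(isMatch("aab", "c*a*b"));
        System.out.println(isMatch("aaa", "aa"));
    }
}
```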

If pattern[j..] starts with x* for some symbol x (including .), so that pattern[j..] is x*pattern[(j+2)..], then text[i..] matches pattern[j..] if and only if

  • text[i..] matches pattern[(j+2)..] (i.e. x* matches an empty string), or

  • text[i] exists and matches x (it equals x, or x is .), and text[(i+1)..] matches x*pattern[(j+2)..] (i.e. x* matches a non-empty string).

This is implemented by this part of the code:

if (j + 1 < pattern.length() && pattern.charAt(j+1) == '*'){
    dp[i][j] = dp[i][j+2] || first_match && dp[i+1][j];
}

On the other hand, if pattern[j..] does not start with x*, text[i..] matches pattern[j..] if and only if text[i] matches pattern[j] and text[(i+1)..] matches pattern[(j+1)..]. This is implemented by this part of the code:

else {
    dp[i][j] = first_match && dp[i+1][j+1];
}
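The two cases can be combined with the boundary values the Java code relies on. The table starts out all false, only dp[n][m] is set to true, and first_match is false once i = n. With n = text.length() and m = pattern.length() (notation introduced here), the full recurrence can be written roughly as:

```latex
\[
\mathrm{first}(i,j) = (i < n) \wedge \bigl(\mathit{pattern}[j] = \mathit{text}[i] \vee \mathit{pattern}[j] = \texttt{.}\bigr)
\]
\[
dp[i][j] =
\begin{cases}
\text{true} & \text{if } i = n \text{ and } j = m,\\
\text{false} & \text{if } i < n \text{ and } j = m,\\
dp[i][j+2] \vee \bigl(\mathrm{first}(i,j) \wedge dp[i+1][j]\bigr) & \text{if } j + 1 < m \text{ and } \mathit{pattern}[j+1] = \texttt{*},\\
\mathrm{first}(i,j) \wedge dp[i+1][j+1] & \text{otherwise.}
\end{cases}
\]
```

The answer to the original question is dp[0][0].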
xskxzr