10. Regular Expression Matching

October 05, 2022

10. Regular Expression Matching

Regular Expression Matching

Given an input string s and a pattern p, implement regular expression matching with support for '.' and '*' where:

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Example 1:

Input: s = "aa", p = "a"
Output: false

Explanation: "a" does not match the entire string "aa".

Example 2:

Input: s = "aa", p = "a*"
Output: true

Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

Example 3:

Input: s = "ab", p = ".*"
Output: true

Explanation: ".*" means "zero or more (*) of any character (.)".

Constraints:

1 <= s.length <= 20
1 <= p.length <= 30
s contains only lowercase English letters.
p contains only lowercase English letters, '.', and '*'.
It is guaranteed for each appearance of the character '*', there will be a previous valid character to match.

A brief explanation

There are three basic types of pattern entry required, and they are a character (a, b, c, etc.), an asterisk (*), and a full-stop (.).

A character (E.G. “a”) will match exactly itself only one time. For example, a string of “a” and a pattern of “a” will match, but a string of “aa” and a pattern of “a” will not.

An asterisk will match the preceding pattern character 0 or more times. For example, a string of “a” and a pattern of “a*” will match, as will a string of “aaaa” and a pattern of “a*”. In a demonstration of a zero-match, a string of “ac” and a pattern of “ab*c” will also match, because “a” was matched once, “b” was matched 0 times, and “c” was matched once.

Finally, a full stop will match any character only one time. For example, a string of “a” will match “.”, as will a string of “ab” against a pattern of “..”, but a string of “aa” will not match a pattern of “.”.

One additional complication is that asterisks can follow full stops. This means that a string of “abc” will match “.*”, because the any-character allowing full-stop was matched 3 times.

The “0 or more” complication

What makes this challenge tricky, is that a part of the pattern might match 10 times, or it might match 0 times.

A good example is the string “abc” and the pattern “a.*c”. We know that “a” will match “a”, and we know that “.” will match “b”. But because “.” has a trailing asterisk, it will also match “c”, however, if it does match “c”, then the “c” in the pattern is never used, and the match fails.

How do we solve this? Well, we have to run multiple checks in parallel. We have to check both paths, as well as any other paths that occur when traversing through both paths. Two diverging paths quickly become 4, and then 8, and before you know it you have hundreds of possible routes to match the pattern against the string.

Approach 1) Recursion

function isMatch(str, pat) {
    return recursiveIsMatch(0, 0, str, pat);
}
function recursiveIsMatch(i, j, str, pat) {
    const inputStringLength = str.length;
    const patternLength = pat.length;

    // Reached the end of the pattern
    if (j == patternLength) {
        // Return whether or not we've also reached the end of the string (entire string has passed)
        return i == inputStringLength;
    }

    // If the current pattern character is followed by a * (is a wildcard)
    if (j + 1 < patternLength && pat.charAt(j + 1) == '*') {
        // Assume 0 matches of the current pattern character, move on to the next point in the pattern (after the asterisk)
        if (recursiveIsMatch(i, j + 2, str, pat)) return true;

        // Loop through the remaining characters, so long as they match by character (or .)
        while (
            i < inputStringLength &&
            (pat.charAt(j) == '.' || str.charAt(i) == pat.charAt(j))
        ) {
            // Check the rest of the string (1 character forward), against the next point in the pattern (after the asterisk)
            if (recursiveIsMatch(++i, j + 2, str, pat)) return true;
        }
    }
    // If the current pattern character is not a wildcard, and matches the current string character
    else if (
        i < inputStringLength &&
        (pat.charAt(j) == '.' || str.charAt(i) == pat.charAt(j))
    ) {
        // Move onto the next character, and the next character of the pattern
        return recursiveIsMatch(i + 1, j + 1, str, pat);
    }

    // String does not match current point in pattern
    return false;
}

This runs pretty much exactly as stated in the above complication. We traverse both the string and the pattern, branching off at every point to conduct separate checks until a path finally succeeds.
If no paths have succeeded by the time we reach the end of the string, the pattern was not a match, and we fail out.

Approach 2) Cheating

I’d be remiss if I didn’t point out that you can actually just use JavaScript’s built-in RegEx functionality for this challenge. But obviously, that’s not the point of this exercise:

var isMatch = function (s, p) {
    const regex = new RegExp('^' + p + '$', 'm');
    return regex.test(s);
};

Approach 3) Dynamic programming

To understand the approach, let’s take some examples —

s = "aa" and p = "aa", since all the characters in both s and p are same, hence it’s a match.

Now, what about s = "aabb" and p = "aab*" 🤔? We know that substrings bb and b* are match because * can be replaced by one b. Since, we already know that remaining substrings aa and aa are match, hence the whole strings also a match.

What can we infer from this? Right, if we have solution of part of a problem, we can use that partial result and can go forward. Also, we can use the already calculated result without calculating it again.

Does this ring a bell 🔔? Yes, this problem satisfies the following two properties -

Optimal Substructure — Any problem has optimal substructure property if its overall optimal solution can be constructed from the optimal solutions of its subproblems.

Overlapping Subproblems — Any problem has overlapping sub-problems if finding its solution involves solving the same subproblem multiple times.

It is now evident that we can use good old Dynamic Programming to solve this problem. Below are the steps —

Create a boolean 2D DP array with sizes as boolean[][] dp = new boolean[s.length() + 1][p.length() + 1]. We are adding extra 1 to incorporate the case in case either or both of the strings are empty.

If both strings are empty, then it’s a match, thus, dp[0][0] = true.

Let’s take an example s = "aab" and p = "c*a*b" and create a DP table.

The first column — it means p is empty and it will match to s only if s is also empty which we have stored in dp[0][0]. Thus, the remaining values of dp[0][i] will be false.
The first row — this is not so easy. It means which p matches empty s. The answer is either an empty pattern or a pattern that represents an empty string such as "a*", "x*y*", "l*m*n*" and so on. In the above example, if s = "" and p = "c*", then due to *, c can be replaced by 0 cs which gives us an empty string. Hence, dp[0][2] = true.
For non-empty strings, let’s say that s[i - 1] == p[j - 1] this means the (i - 1)th and (j - 1)th characters are the same. This means we have to check if the remaining strings are a match or not. If they are a match, then the current substrings will be a match, otherwise, they won’t be a match i.e., dp[i][j] = dp[i - 1][j - 1]. We’re taking (i - 1)th and (j - 1)`th characters to offset empty strings as we’re assuming our strings start from index 1.
If p[j - 1] == ".", then it means any single character can be matched. Therefore, here also, we will have to check if the remaining string is a match or not. Thus, dp[i][j] = dp[i - 1][j - 1].
If p[j - 1] == "*", then it means either it represents an empty string (0 characters), thus dp[i][j] = dp[i][j - 2] or s[i - 1] == p[j - 2] || p[j - 2] == ".", the then-current character of the string equals the char preceding * in the pattern so the result is dp[i-1][j].

💡 Try to evaluate the table by yourself to make it more clear.

var isMatch = function (s, p) {
    const rows = s.length;
    const columns = p.length;
    /// Base conditions
    if (rows == 0 && columns == 0) {
        return true;
    }
    if (columns == 0) {
        return false;
    }
    // DP array
    const dp = Array.from({ length: s.length + 1 }, () => [false]);
    // Empty string and empty pattern are a match
    dp[0][0] = true;
    // Deals with patterns with *
    for (let i = 1; i < columns + 1; i++) {
        if (p[i - 1] === '*') {
            dp[0][i] = dp[0][i - 2];
        }
        else {
            dp[0][i] = false;
        };
    }
    // For remaining characters
    for (let i = 1; i < rows + 1; i++) {
        for (let j = 1; j < columns + 1; j++) {
            if (p[j - 1] === '*') {
                if (p[j - 2] === s[i - 1] || p[j - 2] === '.') {
                    dp[i][j] = dp[i][j - 2] || dp[i - 1][j];
                } else {
                    dp[i][j] = dp[i][j - 2];
                }
            } else if (p[j - 1] === s[i - 1] || p[j - 1] === '.') {
                dp[i][j] = dp[i - 1][j - 1];
            } else {
                dp[i][j] = false;
            }
        }
    }
    return dp[rows][columns];
};

Time Complexity (Regular Expression Matching)

Since we are dealing with each character of both s and p the time complexity will be O(m × n) where m and n are the lengths of s and p respectively.

Space Complexity

We need a DP array for our intermediate operations of dimensions m × n, hence the space complexity will also be O(m × n).

Approach 4) Backtracking / Depth-first search (DFS)

There are two normal cases without consideration of *:

1. s === p
2. s !== p, but p === '.' and s is any character

And then let’s consider how to handle the pattern *:

If we have a x* in the pattern(x stands for any character), we may ignore this part of
the pattern, or delete a matching character in the text

// no match: aa , c*  || match: aaa , a*

(isMatch(s, p.substring(2)) 
|| 
(first_match && isMatch(s.substring(1), p)));

According to these analyses, we can construct a depth-first search algorithm, it’s a recursive algorithm

var isMatch = function(s, p) {
    let sLen = s.length;
    let pLen = p.length;
    if (pLen === 0) return sLen === 0;
    let first_match = (sLen !== 0 && (p[0] === s[0] || p[0] === '.'));
    
    // If a star is present in the pattern, it will be in the 
    // second position p[1]. Then, we may ignore this part of 
    // the pattern, or delete a matching character in the text. 
    // If we have a match on the remaining strings after any of 
    // these operations, then the initial inputs matched.
    if (pLen >= 2 && p[1] === '*') {
        // no match:  aa , c*       ||      match:       aaa , a*
        return (isMatch(s, p.substring(2)) || (first_match && isMatch(s.substring(1), p)));
    } else {
        return first_match && isMatch(s.substring(1), p.substring(1));
    }
};

Conclusion

That’s all folks! In this post, we solved LeetCode problem 10. Regular Expression Matching

I hope you have enjoyed this post. Feel free to share your thoughts on this.

You can find the complete source code on my GitHub repository. If you like what you learn. feel free to fork 🔪 and star ⭐ it.

In this blog, I have tried to solve leetcode questions & present the most important points to consider when improving Data structure and logic, feel free to add, edit, comment, or ask. For more information please reach me here
Happy coding!

Search This Blog

Leetcode Solutions | Platform for learning to code

10. Regular Expression Matching

Regular Expression Matching

Approach 1) Recursion

Approach 2) Cheating

Approach 3) Dynamic programming

Approach 4) Backtracking / Depth-first search (DFS)

That’s all folks! In this post, we solved LeetCode problem 10. Regular Expression Matching

Comments

Post a Comment

Popular Post

1578. Minimum Time to Make Rope Colorful

446. Arithmetic Slices II - Subsequence

31. Next Permutation

268. Missing Number

220. Contains Duplicate III

622. Design Circular Queue