LC 1032. 字符流

题目描述

这是 LeetCode 上的 1032. 字符流 ，难度为困难。

设计一个算法：接收一个字符流，并检查这些字符的后缀是否是字符串数组 words 中的一个字符串。

例如，words = ["abc", "xyz"] 且字符流中逐个依次加入 $4$ 个字符 'a'、'x'、'y' 和 'z' ，你所设计的算法应当可以检测到 "axyz" 的后缀 "xyz" 与 words 中的字符串 "xyz" 匹配。

按下述要求实现 StreamChecker 类：

StreamChecker(String[] words) ：构造函数，用字符串数组 words 初始化数据结构。
boolean query(char letter)：从字符流中接收一个新字符，如果字符流中的任一非空后缀能匹配 words 中的某一字符串，返回 true；否则，返回 false。

示例：

输入：
["StreamChecker", "query", "query", "query", "query", "query", "query", "query", "query", "query", "query", "query", "query"]
[[["cd", "f", "kl"]], ["a"], ["b"], ["c"], ["d"], ["e"], ["f"], ["g"], ["h"], ["i"], ["j"], ["k"], ["l"]]

输出：
[null, false, false, false, true, false, true, false, false, false, false, false, true]

解释：
StreamChecker streamChecker = new StreamChecker(["cd", "f", "kl"]);
streamChecker.query("a"); // 返回 False
streamChecker.query("b"); // 返回 False
streamChecker.query("c"); // 返回n False
streamChecker.query("d"); // 返回 True ，因为 'cd' 在 words 中
streamChecker.query("e"); // 返回 False
streamChecker.query("f"); // 返回 True ，因为 'f' 在 words 中
streamChecker.query("g"); // 返回 False
streamChecker.query("h"); // 返回 False
streamChecker.query("i"); // 返回 False
streamChecker.query("j"); // 返回 False
streamChecker.query("k"); // 返回 False
streamChecker.query("l"); // 返回 True ，因为 'kl' 在 words 中

提示：

$1 <= words.length <= 2000$
$1 <= words[i].length <= 200$
words[i] 由小写英文字母组成
letter 是一个小写英文字母
最多调用查询 $4 \times 10^4$ 次

Trie + 枚举

先考虑最为简单的做法：将给定的所有 $words[i]$ 顺序插入字典树，根据数据范围可知这一步计算量为 $2000 \times 200$，其中最大的 $words[i]$ 长度只有 $200$。

然后利用$words[i]$ 长度只有 $200$ 这一条件，直接使用「枚举」的方式来实现 query。

具体的，我们可以先使用一个字符串 s 来记录 query 操作产生的数据流，然后实现一个 boolean query(int start, int end) 方法，该方法会检查字典树中是否存在 $s[i…j]$ 子串。

由于 $words[i]$ 长度只有 $200$（假设当前 s 的长度为 $n$），因此我们只需要枚举「$\max(0, n - 200)$ 作为子串左端点，$n - 1$ 作为子串右端点」是否存在字典树中（是否存在 $words[i]$ 中）即可，最坏情况下，单次 query 操作计算量为 $200 \times 200$。

一些细节：为了避免每个样例都 new 大数组，我们可以使用 static 优化。

Java 代码：

class StreamChecker {
    static int N = 2010 * 200, idx = 0;
    static int[][] tr = new int[N][26];
    static boolean[] isEnd = new boolean[N * 26];
    StringBuilder sb = new StringBuilder();
    void add(String s) {
        int p = 0;
        for (int i = 0; i < s.length(); i++) {
            int u = s.charAt(i) - 'a';
            if (tr[p][u] == 0) tr[p][u] = ++idx;
            p = tr[p][u];
        }
        isEnd[p] = true;
    }
    boolean query(int start, int end) {
        int p = 0;
        for (int i = start; i <= end; i++) {
            int u = sb.charAt(i) - 'a';
            if (tr[p][u] == 0) return false;
            p = tr[p][u];
        }
        return isEnd[p];
    }
    public StreamChecker(String[] words) {
        for (int i = 0; i <= idx; i++) {
            Arrays.fill(tr[i], 0);
            isEnd[i] = false;
        }
        idx = 0;
        for (String s : words) add(s);
    }
    public boolean query(char c) {
        sb.append(c);
        int n = sb.length(), min = Math.max(0, n - 200);
        for (int i = n - 1; i >= min; i--) {
            if (query(i, n - 1)) return true;
        }
        return false;
    }
}

C++ 代码：

class StreamChecker {
public:
    static const int N = 2010 * 200;
    int tr[N][26];
    bool isEnd[N];
    string sb;
    int idx = 0;
    void add(const string &s) {
        int p = 0;
        for (char c : s) {
            int u = c - 'a';
            if (tr[p][u] == 0) tr[p][u] = ++idx;
            p = tr[p][u];
        }
        isEnd[p] = true;
    }
    bool query(int start, int end) {
        int p = 0;
        for (int i = start; i <= end; i++) {
            int u = sb[i] - 'a';
            if (tr[p][u] == 0) return false;
            p = tr[p][u];
        }
        return isEnd[p];
    }
    StreamChecker(const vector<string>& words) {
        memset(tr, 0, sizeof(tr));
        memset(isEnd, 0, sizeof(isEnd));
        for (const string &s : words) add(s);
    }
    bool query(char c) {
        sb.push_back(c);
        int n = sb.length(), min = max(0, n - 200);
        for (int i = n - 1; i >= min; i--) {
            if (query(i, n - 1)) return true;
        }
        return false;
    }
};

时间复杂度：StreamChecker 初始化复杂度为 $O(n)$，其中 $n$ 为 words 字符总数；query 操作复杂度为 $O(m^2)$，其中 $m = 200$ 为最大 words[i] 长度
空间复杂度：$O(n \times C)$，其中 $n$ 为 words 字符总数，$C = 26$ 为字符集大小

Trie（优化）

初始化将所有的 $words[i]$ 存入 Trie 是必然的，我们只能考虑如何优化 query 操作。

在解法一中，我们需要对新数据流对应的字符串的每个后缀进行搜索，同时每次搜索是相互独立的，即本次匹配不会对下一次匹配产生贡献。

实际上，我们可以通过「倒序建 Trie」的方式，将「枚举检查多个后缀」的操作变为「匹配一次后缀」操作。

具体的，我们可以在初始化 StreamChecker 时，将每个 $words[i]$ 翻转（倒序）加入 Trie 中；然后在 query 操作时（假设当前数据流对应的字符串为 s，长度为 $n$），从 s 的尾部开始在 Trie 中进行检索（即从 $s[n - 1]$ 开始往回找）。

若在某个位置 idx 时匹配成功，意味着 $s[idx … (n-1)]$ 的翻转子串在字典树中，同时我们又是将每个 words[i] 进行倒序插入，即意味着 $s[idx … (n - 1)]$ 的正向子串在 words 中，即满足 s 的某个后缀出现在 words 中。

同理，我们可以利用最大的 words[i] 长度为 $200$ 来控制从 $s[n - 1]$ 开始往回找的最远距离，同时利用当某个短后缀不在 Trie 中，则其余长度更大的后缀必然不在 Trie 中进行剪枝操作。

Java 代码：

class StreamChecker {
    static int N = 2010 * 200, idx = 0;
    static int[][] tr = new int[N][26];
    static boolean[] isEnd = new boolean[N * 26];
    StringBuilder sb = new StringBuilder();
    void add(String s) {
        int p = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            int u = s.charAt(i) - 'a';
            if (tr[p][u] == 0) tr[p][u] = ++idx;
            p = tr[p][u];
        }
        isEnd[p] = true;
    }
    public StreamChecker(String[] words) {
        for (int i = 0; i <= idx; i++) {
            Arrays.fill(tr[i], 0);
            isEnd[i] = false;
        }
        idx = 0;
        for (String s : words) add(s);
    }
    public boolean query(char c) {
        sb.append(c);
        int n = sb.length(), min = Math.max(0, n - 200), p = 0;
        for (int i = n - 1; i >= min; i--) {
            if (isEnd[p]) return true;
            int u = sb.charAt(i) - 'a';
            if (tr[p][u] == 0) return false;
            p = tr[p][u];
        }
        return isEnd[p];
    }
}

C++ 代码：

class StreamChecker {
public:
    static const int N = 2010 * 200;
    static const int ALPHABET_SIZE = 26;
    vector<vector<int>> tr;
    vector<bool> isEnd;
    string sb;
    int idx = 0;
    void add(const string &s) {
        int p = 0;
        for (int i = s.length() - 1; i >= 0; i--) {
            int u = s[i] - 'a';
            if (tr[p].size() <= u) tr[p].resize(u + 1, 0);
            if (tr[p][u] == 0) tr[p][u] = ++idx;
            p = tr[p][u];
        }
        isEnd[p] = true;
    }
    StreamChecker(const vector<string>& words) {
        tr.resize(N);
        isEnd.resize(N);
        fill(isEnd.begin(), isEnd.end(), false);
        for (const auto &s : words) {
            add(s);
        }
    }
    bool query(char c) {
        sb.push_back(c);
        int n = sb.length(), min = max(0, n - 200), p = 0;
        for (int i = n - 1; i >= min; i--) {
            if (isEnd[p]) return true;
            int u = sb[i] - 'a';
            if (tr[p].size() <= u || tr[p][u] == 0) return false;
            p = tr[p][u];
        }
        return isEnd[p];
    }
};

时间复杂度：StreamChecker 初始化复杂度为 $O(n)$，其中 $n$ 为 words 字符总数；query 操作复杂度为 $O(m)$，其中 $m = 200$ 为最大 words[i] 长度
空间复杂度：$O(n \times C)$，其中 $n$ 为 words 字符总数，$C = 26$ 为字符集大小

最后

这是我们「刷穿 LeetCode」系列文章的第 No.1032 篇，系列开始于 2021/01/01，截止于起始日 LeetCode 上共有 1916 道题目，部分是有锁题，我们将先把所有不带锁的题目刷完。

在这个系列文章里面，除了讲解解题思路以外，还会尽可能给出最为简洁的代码。如果涉及通解还会相应的代码模板。

为了方便各位同学能够电脑上进行调试和提交代码，我建立了相关的仓库：https://github.com/SharingSource/LogicStack-LeetCode 。

在仓库地址里，你可以看到系列文章的题解链接、系列文章的相应代码、LeetCode 原题链接和其他优选题解。

字典树枚举剪枝

本博客所有文章除特别声明外，均采用 CC BY-SA 4.0 协议，转载请注明出处！

LC 1475. 商品折扣后的最终价格上一篇

LC 946. 验证栈序列下一篇