简单整理一下 CS2955 这门四周小学期课的笔记,不是很细致。
Intro
Why Randomness?
- Simplicity
- Efficiency
- Better in adversarial situation: 由于随机性很难造数据卡掉
Toy Example
array \(A: A[i]=0\) for \(\frac{n}{3}\) indices \(i\). Output some \(i\) s.t. \(A[i]=0\).
- Deterministic: check \(\frac{2n}{3}-1\) positions. Runtime \(\Omega(n)\).
- Randomized: choose a uniformly random \(i\). \(Pr\{A[i]=0\} = \frac{1}{3}\). Repeat 10 times.
- Failure Prob \((\frac{1}{3})^{10}\) Runtime \(O(1)\).
Monte-Carlo & Las Vegas
- Monte-Carlo: finite time, but the output could be WRONG.
- Las Vegas: output is always CORRECT. (Runtime could be infinite, but the \(\mathbb{E}(\text{runtime})\) must be finite).
可以被 Monte-Carlo 解决的复杂性类: BPP (Bounded error Probabilistic Polynomial time).
可以被 Las Vegas 解决的复杂性类: ZPP (Zero-error Probabilistic Polynomial time).
$$ZPP \subseteq BPP$$ (Las Vegas 无限长运行时间,可以设一个 Threshold 截断. )
e.g. 一个 Las Vegas 算法有期望运行时间 \(\mathbb{E}[T]\), 设置一个 Threshold \(T_0 = 3\mathbb{E}[T]\) ,那么有
$$
Pr(T > T_0) \le \frac{1}{3}
$$
在此 Threshold 截断得到的新算法,w.p. \(\frac{1}{3}\) 没有输出或者输出错误.
Polynomial Identity Testing (P.I.T)
Problem Description
Alice has
$$
f(x) = \sum_{i=0}^{n-1} a_i x^i
$$
Bob has
$$
g(x) = \sum_{i=0}^{n-1} b_i x^i
$$
Alice send sth to Bob, and Bob answer the question "whether f(x) and g(x) are identical".
A Simple Deterministic Algorithm
Alice send all the coefficients \(a_i\) to Bob.
Assumption: \(a_i\), \(b_i\) are of size poly(\(n\)), so we can use \(\log(n)\) bits to represent them. Totally it takes
\(O(n \log n)\).
Randomized Algorithm
- Idea: Select a random \(r\). check if \(f(r) = g(r)\).
- Analysis:
- Depends on \(r\) (trade off): large range of \(r\) \(\Longrightarrow\) smaller failure prob; large communication complexity.
- If \(f(x)=g(x)\), this algorithm is always correct.
- If \(f(x) \neq g(x)\), the algorithm fails only when \(r\) is a root of \(f(x)-g(x)\).\(f(x)-g(x)\) has at most \(n\) roots hence if we choose \(r\) from \(\{1,2, \dots, 100n\}\), the failure prob is \(\frac{1}{100}\).
- Communication Complexity:
- \(r\) : \(O(\log n)\)
- \(f(r)\): \(O(n \log n)\)
Fingerprinting
Choose a random prime \(p\), if \(a \equiv b \mod p\) then there is a good chance \(a = b\).
Fingerprints: \(a\ \text{mod}\ p\), \(b\ \text{mod}\ p\)
New Algorithm (Random with Fingerprinting)
Check if
$$
f(r) \equiv g(r) \mod p
$$
Only need to pass \(f(r)\ \text{mod}\ p\).
- Analysis: Failure probability comes from
- Case1: \(r\) is a root of \(f(x)-g(x)\)
- Case 2: \(p\) is a factor of \(f(r)-g(r)\)
- \(f(r)-g(r)\) is a number with size \(n^n\), has \(\le n \log n\) prime factors.
- There are \(O(\frac{n^2}{\ln n})\) primes \(\le n^2\).Then choose a prime \(p \le n^2\).
$$
Pr{p \mid f(r)-g(r)} = \frac{n \ln n}{\frac{n^2}{\ln n}} = O(\frac{\ln^2 n}{n})
$$
- Facts used in Analysis:
- (Prime Number Thm) 设 \(\pi(x)\) 为不超过 \(x\) 的素数个数,
$$
\pi(x) \sim \frac{x}{\ln x}
$$ - \(N\) has \(\le \ln N\) prime factors. 考虑前 \(m\) 个质数连乘, 数量级为
$$
e^{(1+o(1))m \ln m} = \#m
$$
\(m \sim \frac{\ln N}{\ln m} \le \ln N\).
- (Prime Number Thm) 设 \(\pi(x)\) 为不超过 \(x\) 的素数个数,
- 上面的部分简单推导 设 \(\theta(x) := \sum_{k=1}^{\pi(x)} \ln p_k\) (Chebyshev Function), 素数定理等价于
$$
\theta(n) \sim n
$$
又有
$$
p_m \sim m \ln m
$$
那么 \(\theta(p_m) \sim p_m\), \(\ln (\#m) \sim p_m \sim m \ln m\) - Communication Complexity
- \(r\) : \(O(\log n)\)
- \(p\): \(O(\log n)\)
- \(f(r)\ \text{mod}\ p\): \(O(\log n)\)
Repeat \(O(\log (\frac{1}{\delta}))\) times to achieve suc prob \(\ge 1-\delta\).
Multiple Variables
Input: \(f, g \in \mathbb{F}[x_1, x_2, \dots, x_n]\) of total degree \(d\).
$$
f(x_1, \dots, x_n) = \sum_{i_1, \dots, i_n \ge 0,\ i_1+ \dots+i_n \le d} a_{i_1, \dots, i_n} x_1^{i_1} x_2^{i_2} \dots x_n^{i_n}
$$
Output: \(f\) and \(g\) are identical?
这个问题等价于输入 \(f \in \mathbb{F}[x_1, x_2, \dots, x_n]\), 输出 \(f\) 是否等于 \(0\).
输入 \(f\) 方式:Blackbox Model: Given \(x_1, \dots, x_n\), return \(f(x_1, \dots, x_n)\).
Schwartz-Zippel Lemma
Let \(P \in \mathbb{F}[x_1, x_2, \dots, x_n\), \(P \neq 0\), total degree \(d\) over field \(\mathbb{F}\). Let \(S\) be a finite subset of \(\mathbb{F}\) and let \(r_1, r_2, \dots, r_n\) be selected at random independently and uniformly from \(S\). Then
$$
Pr[P(r_1, r_2, \dots, r_n) = 0] \le \frac{d}{|S|}
$$
证明略。(其实我笔记上也有,但是为了内容简洁就略掉吧。)
Application: Bipartite Matching
check whether G has a perfect matching (algebra method)
\(A: n \times n\) Tutte matrix:
$$
A_{ij} = \begin{cases}
x_{ij} & (i,j) \in E \\
0 & (i, j) \not\in E
\end{cases}
$$
Claim: \(\text{det} A = 0\) iff G has no perfect matching. 因为把行列式写开(对排列 \(\pi\) sigma),每个排列描述了一个完美匹配。
Finding a Perfect Matching
Trivial Alg
爆搜。每次找一个配对,然后把这条边以及这两个点都删了,查找删掉的图是否有完美匹配。
Isolation Lemma
Let \(S\) be a family of subsets on a ground set \(U\) with \(|U|=n\). For each \(x \in U\), assign a weight
\(w(x)\) from \(\{1, 2, \dots, 100n\}\) uniformly.
Then there exists a unique minimum weight subset in \(S\) w.p. \(\ge 0.99\).
这一事实说明了最小权完美匹配大概率唯一。
Counterintuitive:
一共有 \(2^n\) 个 subsets, 权值 \(\le 100n^2\),那么每个权值感觉上对应了 \(\frac{2^n}{100n^2}\) 个子集,为什么最小权只对应一个子集?
Parallel Alg
给每条边 \((i, j) \in E\) 随机赋一个权重 \(2^{w_{ij}}\), 那么最小完美匹配的权重 \(W\) 即为整除行列式的最大的 \(2^W\).
算法:先计算出 \(W\). 我们的目标是找一组完美匹配的边。怎样判断一条边 \((i, j)\) 是否是完美匹配呢?答案是只要
$$
\text{det}(A') \frac{2^{w_{ij}}}{2^W}
$$
是 odd (奇)的即可。\(A'\) 是去掉这条边(以及两端点)后的图的 Tutte 矩阵。(回忆行列式写开来,被除剩奇数说明有一项是 \(2^W/2^W=1\),即最小权刚好是这个)
Balls and Bins
TODO
Derandomization
TODO
Concentration Inequalities
TODO
Random Vectors
TODO
Comments | NOTHING