Algorithm Design and Analysis - Overview

Jiawen Zhang Sep 14, 2018

今天上了卜东波老师的第一次算法课，对算法的概况有了一个大体的了解，在这里对今天的内容做一些总结。【Three solutions：Induction, Improvemrnt and Enumeration】

一、Mind Map

二、三种基本的算法策略

基本上，所有的算法题都可以用三种策略解决。

Divide-and-conquer: Let's start from the "smallest" problem first, and investigate whether a large problem can reduce to smaller subproblems.

分治法，将问题分解为小的问题，从最小问题出发解决问题。

See: Algorithm_Design_and Analysis-Divide-and-Conquer

"Intelligent" Enumeration: Consider an optimization problem. If the solution can be constructed step by step, we might enumerate all possible complete solutions by constructing a partial solution tree. Due to the huge size of the search tree, some techniques should be employed to prune it.

动态规划就是一种枚举所有可能性的算法，see：Algorithm Design and Analysis - Dynamic programming

枚举法，枚举所有的解。可使用trick(如设定下界)来加快。

Improvement: Let's start from an initial complete solution, and try to improve it step by step.

若无法将问题分解，就从质量不太好的完整解出发，逐步优化

三、例题

EX1.Calculating the greatest common divisor (gcd) 求最小公约数

The greatest common divisor of two integers a and b, when at least one of them is not zero, is the largest positive integer that divides the numbers without a remainder.

INPUT: two n-bits numbers a, and b (a >= b)
OUTPUT: gcd(a; b)

已知gcd(1; 0) = 1;如何求解gcd(1949; 101)?

可将复杂问题分解成小的问题(辗转相除法)：gcd(1949; 101) = gcd(101; 1949 mod 101) = gcd(101; 30)= gcd(30; 101 mod 30) = gcd(30; 11)= gcd(11; 30 mod 11) = gcd(11; 8)= gcd(8; 3) = gcd(3; 2) = gcd(2; 1) = gcd(1; 0) = 1

function Euclid(a; b)
	if b = 0 then
		return a;
	end if
	return Euclid(b; a mod b);

EX2.traveling salesman problem (TSP) 周游城市的最短距离

INPUT: n cities V = {1; 2;...; n}, and a distance matrix D, where dij (1 <= i; j <= n) denotes the distance between city i and j.
OUTPUT: the shortest tour that visits each city exactly once and returns to the origin city.

Trial 1: Divide and conquer

我们可通过计算 M(S,e) 来计算最小距离。S为要拜访的节点的集合，e为结束节点。

如，最短的距离我们可通过以下式子来计算：

min{ d2;1 +M({3; 4}; 2); d3;1 +M({2; 4}; 3); d4;1 +M({2; 3}; 4)}

//算是一种动态规划算法
function TSP(V;D)
	return min{ M(V - {e}; e) + de1};  //e属于V且e不为1

function M(S; e)
	if S = {v} then
		M(S; e) = d1v + dve;
		return M(S; e);
	end if
	return min{ M(S - {i}; i) + dei};  //i属于S，i不为e

Trial 2: Improvement strategy

从一个完整的解中开始，一步一步去完善它。Newton法，随机梯度下降和MC都用到了这个策略。但这种方法不一定能够获得最优解。可得到最优解的情况：线性规划，二次规划，网络流等。

function GenericImprovement(G;D)  //通用，逐步迭代完整解
	Let s be an initial tour;   //初始解的选择很重要
	while TRUE do
		Select a new tour s′ from the neighbourhood of s; //扰动越小越好
		if s′ is shorter than s then
			s = s′;
		end if
		if stopping(s) then  //s满足退出条件
			return s;
		end if
	end while

Trial 3: Intelligent" enumeration strategy

完整的解可以表示为n条边的排列。将边缘按照一定顺序排列，一个完整的解可以被表示为：X = [x1, x2,..., xm]。例如：a -> b -> c -> d -> e -> a 可以被表示为 X = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1]。所有的环游都可以写成这种形式，我们就能枚举出所有的解。

子节点：表示一个完整的解

内部解：表示一个部分解，是一个已知item的子集

function GenericBacktrack(P0)  //枚举所有的解
	Let A = fP0g. //Start with the original problem P0. Here, A
denotes the active subproblems that are unexplored.
	best_so_far = 1;   //当前知道的最短路程
	while A ̸= NULL do  //当还有节点要扩展时
		Choose and remove a subproblem P from A;
		Expand P into smaller subproblems P1, P2,..., Pk;
		for i = 1 to k do
			if Pi corresponds to a complete solution then
				Update best_so_far if the corresponding objective function value is better;  //对应一个完整解
			else
				Insert Pi into A;   //对应一个部分解
			end if
		end for
	end while
	return best_so_far;

然而，这种方式计算量巨大，需要花费指数倍的时间，且我们无需在内存中存储整个树。因此，我们可以从中剔除一些低质量的或者不符合条件的部分解（贪心算法）。我们可以使用heuristic functions 和下界(lower bound functions)来评估部分解的质量。【这是EM算法的核心】

function IntelligentBacktrack(P0)
	Let A = fP0g. // Start with the original problem P0. Here A denotes the active subproblems that are unexplored.
	best_so_far = infinite;
	while A ̸= NULL do
		Choose a subproblem P in A with lower bound less than best_so_far;
		Remove P from A;
		Expand P into smaller subproblems P1, P2, ..., Pk,
		for i = 1 to k do
			if Pi corresponds to a complete solution then
				Update best so far;
			else
				if lowerbound(Pi) <= best so far then 	Insert Pi into A;
				end if
			end if
		end for
	end while
	return best_so_far;

四、小结

先观察问题的结构，解的形式，再设计算法
能分解成子问题是一个非常有效的信号
在优化问题中，下界信息非常重要