####################################
Tutorial 3: Search Algorithms
####################################

.. include:: ../links.ref
.. include:: ../tags.ref
.. include:: ../abbrs.ref

What Is a Search Problem?
=========================

Many AI problems can be modeled as **search problems**: starting from an initial state, reach a goal state through a sequence of actions.

A search problem consists of:

- **Initial state**: the starting point
- **Actions**: the operations that can be taken
- **Transition model**: how an action changes the state
- **Goal test**: whether a state is the goal
- **Path cost**: the cost of a sequence of actions

.. code-block:: python

    class SearchProblem:
        """Abstract definition of a search problem."""

        def __init__(self, initial_state, goal_state):
            self.initial_state = initial_state
            self.goal_state = goal_state

        def get_actions(self, state):
            """Return the actions available in the given state."""
            raise NotImplementedError

        def get_result(self, state, action):
            """Return the state that results from applying the action."""
            raise NotImplementedError

        def is_goal(self, state):
            """Check whether the state is the goal."""
            return state == self.goal_state

        def get_cost(self, state, action, next_state):
            """Return the cost of the action."""
            return 1  # unit cost by default

A Classic Example: The 8-Puzzle
-------------------------------

.. code-block:: text

    Initial state:         Goal state:
    ┌───┬───┬───┐         ┌───┬───┬───┐
    │ 1 │ 2 │ 3 │         │ 1 │ 2 │ 3 │
    ├───┼───┼───┤         ├───┼───┼───┤
    │ 4 │   │ 5 │    →    │ 4 │ 5 │ 6 │
    ├───┼───┼───┤         ├───┼───┼───┤
    │ 6 │ 7 │ 8 │         │ 7 │ 8 │   │
    └───┴───┴───┘         └───┴───┴───┘

Uninformed (Blind) Search
=========================

1. Breadth-First Search (BFS)
-----------------------------

**Strategy**: expand the shallowest nodes first, level by level.

.. code-block:: python

    from collections import deque

    def bfs(problem):
        """Breadth-first search."""
        # Frontier: queue of (state, path) pairs waiting to be expanded
        frontier = deque([(problem.initial_state, [])])
        # States already seen (enqueued), to avoid duplicates
        explored = {problem.initial_state}

        while frontier:
            state, path = frontier.popleft()

            # Goal test
            if problem.is_goal(state):
                return path

            # Expand the node
            for action in problem.get_actions(state):
                next_state = problem.get_result(state, action)
                if next_state not in explored:
                    explored.add(next_state)
                    frontier.append((next_state, path + [action]))

        return None  # no solution

    # Example: a maze problem
    class MazeProblem(SearchProblem):
        def __init__(self, maze, start, goal):
            super().__init__(start, goal)
            self.maze = maze
            self.rows = len(maze)
            self.cols = len(maze[0])

        def get_actions(self, state):
            actions = []
            x, y = state
            moves = [('up', -1, 0), ('down', 1, 0),
                     ('left', 0, -1), ('right', 0, 1)]
            for name, dx, dy in moves:
                nx, ny = x + dx, y + dy
                if 0 <= nx < self.rows and 0 <= ny < self.cols:
                    if self.maze[nx][ny] != 1:  # 1 marks a wall
                        actions.append(name)
            return actions

        def get_result(self, state, action):
            x, y = state
            moves = {'up': (-1, 0), 'down': (1, 0),
                     'left': (0, -1), 'right': (0, 1)}
            dx, dy = moves[action]
            return (x + dx, y + dy)

    # Test
    maze = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0]
    ]

    problem = MazeProblem(maze, (0, 0), (4, 4))
    path = bfs(problem)
    print(f"Path found by BFS: {path}")

**Properties**:

- ✅ Complete: if a solution exists, BFS finds it
- ✅ Optimal: returns a shortest path (when all step costs are equal)
- ❌ High space complexity: O(b^d), where b is the branching factor and d is the depth of the shallowest solution

2. Depth-First Search (DFS)
---------------------------

**Strategy**: always expand the deepest node first.

.. code-block:: python

    def dfs(problem, max_depth=float('inf')):
        """Depth-first search with an optional depth limit."""
        def recursive_dfs(state, path, visited, depth):
            if problem.is_goal(state):
                return path
            if depth >= max_depth:
                return None

            for action in problem.get_actions(state):
                next_state = problem.get_result(state, action)
                if next_state not in visited:  # avoid cycles on the current path
                    result = recursive_dfs(
                        next_state,
                        path + [action],
                        visited | {next_state},
                        depth + 1
                    )
                    if result is not None:
                        return result
            return None

        return recursive_dfs(problem.initial_state, [],
                             {problem.initial_state}, 0)

    # Iterative deepening: combines the strengths of BFS and DFS
    def iterative_deepening_dfs(problem, max_depth=100):
        """Iterative deepening search."""
        for depth in range(max_depth):
            result = dfs(problem, depth)
            if result is not None:
                return result
        return None

**Properties**:

- ✅ Low space complexity: O(bm), where m is the maximum depth
- ❌ Not complete: can descend forever in infinite state spaces
- ❌ Not optimal: may return a longer-than-necessary path
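The three uninformed strategies can be compared side by side on the maze above. A minimal sketch, assuming the ``maze``, ``MazeProblem``, ``bfs``, ``dfs``, and ``iterative_deepening_dfs`` definitions from this section are in scope:

.. code-block:: python

    # Run BFS, DFS and iterative deepening on the same maze and
    # compare the lengths of the paths they return.
    problem = MazeProblem(maze, (0, 0), (4, 4))

    for name, search in [("BFS", bfs),
                         ("DFS", dfs),
                         ("IDDFS", iterative_deepening_dfs)]:
        path = search(problem)
        length = len(path) if path is not None else None
        print(f"{name}: path length = {length}")

BFS and iterative deepening should both return a shortest path (8 moves on this maze), while plain DFS may return a longer one, since it commits to the first branch that reaches the goal.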
Informed (Heuristic) Search
===========================

Informed search uses a **heuristic function h(n)** to estimate the cost from node n to the goal.

1. Greedy Best-First Search
---------------------------

**Strategy**: always expand the node with the smallest h(n).

.. code-block:: python

    import heapq

    def greedy_best_first(problem, heuristic):
        """Greedy best-first search."""
        frontier = [(heuristic(problem.initial_state),
                     problem.initial_state, [])]
        explored = set()

        while frontier:
            _, state, path = heapq.heappop(frontier)

            if problem.is_goal(state):
                return path

            if state in explored:
                continue
            explored.add(state)

            for action in problem.get_actions(state):
                next_state = problem.get_result(state, action)
                if next_state not in explored:
                    h = heuristic(next_state)
                    heapq.heappush(frontier,
                                   (h, next_state, path + [action]))

        return None

2. A* Search
------------

**Strategy**: expand by f(n) = g(n) + h(n)

- g(n): the actual cost from the start to n
- h(n): the estimated cost from n to the goal
- f(n): the estimated total cost of a path through n

.. code-block:: python

    def astar(problem, heuristic):
        """A* search."""
        # Entries: (f value, g value, state, path)
        frontier = [(heuristic(problem.initial_state), 0,
                     problem.initial_state, [])]
        explored = {}  # state -> smallest g value seen so far

        while frontier:
            f, g, state, path = heapq.heappop(frontier)

            if problem.is_goal(state):
                return path, g  # return both the path and its cost

            # Skip if this state was already reached at no greater cost
            if state in explored and explored[state] <= g:
                continue
            explored[state] = g

            for action in problem.get_actions(state):
                next_state = problem.get_result(state, action)
                next_g = g + problem.get_cost(state, action, next_state)

                if next_state not in explored or explored[next_state] > next_g:
                    h = heuristic(next_state)
                    f = next_g + h
                    heapq.heappush(frontier,
                                   (f, next_g, next_state, path + [action]))

        return None, float('inf')

    # Manhattan-distance heuristic
    def manhattan_distance(state, goal):
        """Manhattan distance: suitable for grid worlds."""
        return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

    # Test A*
    problem = MazeProblem(maze, (0, 0), (4, 4))
    goal = (4, 4)
    heuristic = lambda s: manhattan_distance(s, goal)

    path, cost = astar(problem, heuristic)
    print(f"Path found by A*: {path}")
    print(f"Path cost: {cost}")

Properties of Heuristic Functions
---------------------------------

**Admissibility**: h(n) ≤ h*(n)

- h(n) never overestimates the true cost to the goal
- Guarantees that A* finds an optimal solution

**Consistency**: h(n) ≤ c(n, a, n') + h(n')

- A triangle inequality over successor states
- Consistency implies admissibility
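On a problem this small, admissibility can be checked exhaustively by comparing h(n) against the true shortest-path cost h*(n) computed with BFS. A minimal sketch, assuming the ``maze``, ``MazeProblem``, ``bfs``, and ``manhattan_distance`` definitions above:

.. code-block:: python

    # Verify empirically that the Manhattan heuristic never
    # overestimates the true cost h*(n) on this maze.
    goal = (4, 4)
    admissible = True

    for x in range(len(maze)):
        for y in range(len(maze[0])):
            if maze[x][y] == 1:
                continue  # skip walls
            path = bfs(MazeProblem(maze, (x, y), goal))
            if path is None:
                continue  # cell cannot reach the goal
            h = manhattan_distance((x, y), goal)
            h_star = len(path)  # true cost: one unit per move
            if h > h_star:
                admissible = False
                print(f"Overestimate at {(x, y)}: h={h} > h*={h_star}")

    print(f"Manhattan heuristic admissible on this maze: {admissible}")

With unit step costs and 4-connected moves, the Manhattan distance ignores walls and therefore never exceeds the true cost, so the check should print ``True``.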
Hands-On: Learning a Heuristic with PyTorch
===========================================

We can also learn a heuristic function with a neural network:

.. code-block:: python

    import torch
    import torch.nn as nn
    import random

    class LearnedHeuristic(nn.Module):
        """A learned heuristic function."""

        def __init__(self, state_size):
            super().__init__()
            self.network = nn.Sequential(
                nn.Linear(state_size, 64),
                nn.ReLU(),
                nn.Linear(64, 32),
                nn.ReLU(),
                nn.Linear(32, 1),
                nn.ReLU()  # heuristic values should be non-negative
            )

        def forward(self, state):
            return self.network(state)

    def generate_training_data(maze, num_samples=1000):
        """Generate training pairs: (state, true distance to the goal)."""
        rows, cols = len(maze), len(maze[0])
        goal = (rows - 1, cols - 1)
        data = []

        for _ in range(num_samples):
            # Pick a random non-wall cell
            while True:
                x = random.randint(0, rows - 1)
                y = random.randint(0, cols - 1)
                if maze[x][y] != 1:
                    break

            # Compute the true distance with BFS
            problem = MazeProblem(maze, (x, y), goal)
            path = bfs(problem)

            if path is not None:
                true_distance = len(path)
                state = torch.tensor(
                    [x / rows, y / cols, goal[0] / rows, goal[1] / cols],
                    dtype=torch.float32
                )
                data.append((state, true_distance))

        return data

    def train_heuristic(maze, epochs=100):
        """Train the heuristic network."""
        model = LearnedHeuristic(state_size=4)
        optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
        criterion = nn.MSELoss()

        # Generate training data
        print("Generating training data...")
        data = generate_training_data(maze, num_samples=500)
        print(f"Training samples: {len(data)}")

        # Training loop (one sample at a time, for simplicity)
        for epoch in range(epochs):
            total_loss = 0
            random.shuffle(data)

            for state, true_dist in data:
                predicted = model(state)
                target = torch.tensor([float(true_dist)])
                loss = criterion(predicted, target)

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

                total_loss += loss.item()

            if (epoch + 1) % 20 == 0:
                print(f"Epoch {epoch + 1}, Loss: {total_loss / len(data):.4f}")

        return model

    # Train and try the learned heuristic
    if __name__ == "__main__":
        maze = [
            [0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 1, 0],
            [0, 1, 0, 0, 0],
            [0, 0, 0, 1, 0]
        ]

        print("Training the heuristic...")
        learned_h = train_heuristic(maze)

        # Evaluate on a few states
        print("\nTesting the learned heuristic:")
        test_states = [(0, 0), (2, 2), (3, 3)]
        goal = (4, 4)

        for state in test_states:
            x, y = state
            state_tensor = torch.tensor(
                [x / 5, y / 5, goal[0] / 5, goal[1] / 5],
                dtype=torch.float32
            )
            learned_h_value = learned_h(state_tensor).item()
            true_h_value = manhattan_distance(state, goal)
            print(f"State {state}: learned h={learned_h_value:.2f}, "
                  f"Manhattan distance={true_h_value}")

Note that a learned heuristic carries no admissibility guarantee: the network may overestimate the true cost, in which case A* loses its optimality guarantee.

Comparison of Search Algorithms
===============================

.. csv-table::
   :header: "Algorithm", "Complete", "Optimal", "Time complexity", "Space complexity"
   :widths: 20, 15, 15, 25, 25

   "BFS", "Yes", "Yes*", "O(b^d)", "O(b^d)"
   "DFS", "No", "No", "O(b^m)", "O(bm)"
   "Iterative deepening", "Yes", "Yes*", "O(b^d)", "O(bd)"
   "Greedy best-first", "No", "No", "O(b^m)", "O(b^m)"
   "A*", "Yes", "Yes**", "O(b^d)", "O(b^d)"

\* When all step costs are equal. \*\* With an admissible heuristic.

Summary of Key Concepts
=======================

.. csv-table::
   :header: "Concept", "Meaning"
   :widths: 25, 75

   "Search problem", "Reaching a goal state from an initial state through actions"
   "State space", "The set of all possible states"
   "Search tree", "The tree of nodes generated during search"
   "Heuristic function", "A function estimating the cost to the goal"
   "Admissibility", "The heuristic never overestimates the true cost"
   "A* algorithm", "Optimal search using f(n) = g(n) + h(n)"

Next Steps
==========

In the next tutorial, we will look at knowledge representation and reasoning.

:doc:`tutorial_04_knowledge_representation`