####################################
Tutorial 2: Intelligent Agents
####################################

.. include:: ../links.ref
.. include:: ../tags.ref
.. include:: ../abbrs.ref

What Is an Agent?
=================

An **agent** is a system that perceives its environment and takes actions within it.

.. code-block:: text

   ┌───────────────────────────────────────────────────────────┐
   │                        Environment                        │
   │                                                           │
   │   ┌──────────────┐                    ┌───────────────┐   │
   │   │   Sensors    │◄──── percepts ─────│               │   │
   │   └──────┬───────┘                    │  Environment  │   │
   │          │                            │     state     │   │
   │          ▼                            │               │   │
   │   ┌──────────────┐                    │               │   │
   │   │    Agent     │                    │               │   │
   │   └──────┬───────┘                    │               │   │
   │          │                            │               │   │
   │          ▼                            │               │   │
   │   ┌──────────────┐                    │               │   │
   │   │  Actuators   │───── actions ─────►│               │   │
   │   └──────────────┘                    └───────────────┘   │
   │                                                           │
   └───────────────────────────────────────────────────────────┘

Some everyday examples:

- **Self-driving car**: perceive (cameras, radar) → decide (AI) → act (steer, accelerate)
- **Customer-service bot**: perceive (user input) → decide (NLP) → act (reply)
- **Robot vacuum**: perceive (sensors) → decide (path planning) → act (move, clean)

Anatomy of an Agent
===================

.. code-block:: python

   class Agent:
       """Basic structure of an agent."""

       def __init__(self):
           self.state = None  # internal state

       def perceive(self, environment):
           """Sense the environment."""
           return environment.get_percept()

       def think(self, percept):
           """Decide on an action given the percept."""
           # `decide` is a hook supplied by concrete subclasses
           action = self.decide(percept)
           return action

       def act(self, action, environment):
           """Carry out the chosen action."""
           environment.execute(action)

       def run(self, environment):
           """Main perceive-think-act loop."""
           while True:
               percept = self.perceive(environment)
               action = self.think(percept)
               self.act(action, environment)

Rational Agents
===============

A **rational agent** is one that, given its percept sequence, chooses the action that maximizes its expected utility as judged by its performance measure.

Key concepts:

- **Performance measure**: the criterion for judging how well the agent behaves
- **Prior knowledge**: what the agent already knows about the environment
- **Percept sequence**: everything the agent has perceived so far
- **Available actions**: the set of actions the agent can take

.. code-block:: python

   class RationalAgent(Agent):
       """Rational agent: picks the action with the highest estimated utility."""

       def __init__(self, performance_measure):
           super().__init__()
           self.performance_measure = performance_measure
           self.percept_history = []

       def think(self, percept):
           # Record the percept history
           self.percept_history.append(percept)

           # Choose the action that maximizes the estimated expected utility
           # (get_possible_actions / estimate_utility are supplied by subclasses)
           best_action = None
           best_utility = float('-inf')
           for action in self.get_possible_actions():
               expected_utility = self.estimate_utility(action, percept)
               if expected_utility > best_utility:
                   best_utility = expected_utility
                   best_action = action

           return best_action

Types of Agents
===============

1. Simple Reflex Agents
-----------------------

**Key idea**: decisions are based only on the current percept; no history is kept.

.. code-block:: python

   class SimpleReflexAgent(Agent):
       """Simple reflex agent driven by condition-action rules."""

       def __init__(self, rules):
           super().__init__()
           self.rules = rules  # list of (condition, action) rules

       def think(self, percept):
           """React directly according to the rules."""
           for condition, action in self.rules:
               if condition(percept):
                   return action
           return None

   # Example: a simple thermostat
   thermostat_rules = [
       (lambda p: p['temperature'] < 20, 'turn_on_heater'),
       (lambda p: p['temperature'] > 25, 'turn_off_heater'),
       (lambda p: True, 'do_nothing')
   ]

   thermostat = SimpleReflexAgent(thermostat_rules)
   action = thermostat.think({'temperature': 18})
   print(f"Action: {action}")  # turn_on_heater

**Pros**: simple and fast

**Cons**: cannot handle situations that require memory
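
To see how the condition-action rules plug into the perceive-think-act loop of the ``Agent`` base class, here is a small demo that drives the thermostat against a toy environment. It reuses ``SimpleReflexAgent`` and ``thermostat_rules`` from above; the ``Room`` class and its temperature dynamics are invented purely for illustration.

.. code-block:: python

   class Room:
       """Toy environment: the room cools down unless the heater is running."""

       def __init__(self, temperature=18.0):
           self.temperature = temperature
           self.heater_on = False

       def get_percept(self):
           return {'temperature': self.temperature}

       def execute(self, action):
           if action == 'turn_on_heater':
               self.heater_on = True
           elif action == 'turn_off_heater':
               self.heater_on = False
           # the temperature drifts depending on the heater
           self.temperature += 1.5 if self.heater_on else -0.5

   room = Room()
   agent = SimpleReflexAgent(thermostat_rules)

   for step in range(8):
       percept = agent.perceive(room)   # sense
       action = agent.think(percept)    # decide
       agent.act(action, room)          # act
       print(f"step {step}: {percept['temperature']:.1f} °C -> {action}")

Because the agent keeps no memory, it simply reacts step by step: the heater switches on below 20 °C, the room warms up past 25 °C, and the heater switches off again.
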

2. Model-Based Reflex Agents
----------------------------

**Key idea**: maintains an internal model of the environment, so past states inform current decisions.

.. code-block:: python

   class ModelBasedAgent(Agent):
       """Model-based reflex agent."""

       def __init__(self, transition_model, sensor_model, rules):
           super().__init__()
           self.transition_model = transition_model  # state-transition model
           self.sensor_model = sensor_model          # sensor model
           self.rules = rules
           self.state = None
           self.last_action = None

       def think(self, percept):
           # Update the internal state estimate
           self.state = self.update_state(
               self.state, self.last_action, percept
           )

           # Choose an action based on the estimated state
           for condition, action in self.rules:
               if condition(self.state):
                   self.last_action = action
                   return action
           return None

       def update_state(self, state, action, percept):
           """Update the state estimate from the transition model and the percept."""
           if state is None:
               return self.sensor_model(percept)
           predicted_state = self.transition_model(state, action)
           return self.sensor_model(percept, predicted_state)

3. Goal-Based Agents
--------------------

**Key idea**: has an explicit goal and chooses actions that move it towards that goal.

.. code-block:: python

   class GoalBasedAgent(Agent):
       """Goal-based agent."""

       def __init__(self, goal):
           super().__init__()
           self.goal = goal
           self.state = None

       def think(self, percept):
           # update_state is a hook supplied by concrete subclasses
           self.state = self.update_state(percept)

           # Search for an action sequence that reaches the goal
           plan = self.search_for_goal(self.state, self.goal)
           if plan:
               return plan[0]  # return the first action of the plan
           return None

       def search_for_goal(self, state, goal):
           """Search for a path that achieves the goal."""
           # Uses a search algorithm (covered in detail in the next tutorial)
           pass

   # Example: a maze-navigation agent
   class MazeAgent(GoalBasedAgent):
       def __init__(self, goal_position):
           super().__init__(goal_position)

       def search_for_goal(self, current_pos, goal_pos):
           # Simplified: greedily pick the directions that close the gap
           dx = goal_pos[0] - current_pos[0]
           dy = goal_pos[1] - current_pos[1]

           actions = []
           if dx > 0:
               actions.append('right')
           if dx < 0:
               actions.append('left')
           if dy > 0:
               actions.append('down')
           if dy < 0:
               actions.append('up')

           return actions if actions else ['stay']

4. Utility-Based Agents
-----------------------

**Key idea**: considers not only whether a goal is reached but also *how good* the outcome is.

.. code-block:: python

   class UtilityBasedAgent(Agent):
       """Utility-based agent."""

       def __init__(self, utility_function):
           super().__init__()
           self.utility = utility_function
           self.state = None

       def think(self, percept):
           self.state = self.update_state(percept)

           # Evaluate the expected utility of every available action
           # (update_state, get_possible_actions and get_outcomes are
           #  hooks supplied by concrete subclasses)
           best_action = None
           best_expected_utility = float('-inf')

           for action in self.get_possible_actions():
               # Expected utility of taking this action
               expected_utility = self.expected_utility(action)
               if expected_utility > best_expected_utility:
                   best_expected_utility = expected_utility
                   best_action = action

           return best_action

       def expected_utility(self, action):
           """Expected utility: sum of probability * utility over possible outcomes."""
           total = 0
           for outcome, probability in self.get_outcomes(action):
               total += probability * self.utility(outcome)
           return total

5. Learning Agents
------------------

**Key idea**: improves its behaviour over time by learning from experience.

.. code-block:: python

   class LearningAgent(Agent):
       """Learning agent."""

       def __init__(self):
           super().__init__()
           self.performance_element = None  # chooses actions
           self.learning_element = None     # improves the performance element
           self.critic = None               # evaluates how well the agent is doing
           self.problem_generator = None    # suggests exploratory actions

       def think(self, percept):
           # 1. Choose an action using current knowledge
           action = self.performance_element.select_action(percept)

           # 2. Evaluate how well that action works
           feedback = self.critic.evaluate(percept, action)

           # 3. Learn from the feedback
           self.learning_element.learn(feedback)

           # 4. Occasionally explore new possibilities
           if self.should_explore():
               action = self.problem_generator.suggest_exploration()

           return action
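
The four components in ``LearningAgent`` above are only placeholders. Before moving on to the full PyTorch example, here is a minimal, self-contained sketch of how they could be wired together for a toy problem. All class names (``GreedyPerformanceElement``, ``RewardCritic``, ``AveragingLearner``, ``RandomExplorer``) are invented for illustration, and the critic and learner are applied to the action actually taken, so the ordering differs slightly from the abstract ``think()`` above.

.. code-block:: python

   import random

   class GreedyPerformanceElement:
       """Performance element: keeps value estimates and picks the best-looking action."""

       def __init__(self, actions):
           self.values = {a: 0.0 for a in actions}
           self.counts = {a: 0 for a in actions}

       def select_action(self, percept):
           return max(self.values, key=self.values.get)

   class RewardCritic:
       """Critic: judges an action by the reward the environment hands back."""

       def __init__(self, reward_fn):
           self.reward_fn = reward_fn

       def evaluate(self, percept, action):
           return action, self.reward_fn(action)

   class AveragingLearner:
       """Learning element: running-average update of the chosen action's value."""

       def __init__(self, performance_element):
           self.pe = performance_element

       def learn(self, feedback):
           action, reward = feedback
           self.pe.counts[action] += 1
           self.pe.values[action] += (reward - self.pe.values[action]) / self.pe.counts[action]

   class RandomExplorer:
       """Problem generator: occasionally proposes a random action to try."""

       def __init__(self, actions):
           self.actions = actions

       def suggest_exploration(self):
           return random.choice(self.actions)

   class SimpleLearningAgent:
       """Wires the four components together, mirroring LearningAgent.think() above."""

       def __init__(self, actions, reward_fn, explore_prob=0.2):
           self.performance_element = GreedyPerformanceElement(actions)
           self.critic = RewardCritic(reward_fn)
           self.learning_element = AveragingLearner(self.performance_element)
           self.problem_generator = RandomExplorer(actions)
           self.explore_prob = explore_prob

       def think(self, percept=None):
           action = self.performance_element.select_action(percept)
           if random.random() < self.explore_prob:           # explore now and then
               action = self.problem_generator.suggest_exploration()
           feedback = self.critic.evaluate(percept, action)  # judge the action taken
           self.learning_element.learn(feedback)             # improve future choices
           return action

   # Toy problem: three heater settings with different (noisy) comfort scores
   scores = {'low': 0.2, 'medium': 1.0, 'high': 0.5}
   agent = SimpleLearningAgent(list(scores), lambda a: scores[a] + random.gauss(0, 0.1))
   for _ in range(300):
       agent.think()
   print(agent.performance_element.values)  # 'medium' should end up rated highest

After a few hundred steps the value table settles on the action with the highest average reward. The Q-learning agent below follows the same estimate-then-improve loop, but with a neural network in place of the table and a grid world in place of the toy reward function.
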

Hands-On: A Learning Agent with PyTorch
=======================================

Let's implement a simple learning agent that learns to find the treasure in a grid world:

.. code-block:: python

   import torch
   import torch.nn as nn
   import random

   class GridWorld:
       """A simple grid-world environment."""

       def __init__(self, size=5):
           self.size = size
           self.agent_pos = [0, 0]
           self.goal_pos = [size-1, size-1]

       def reset(self):
           self.agent_pos = [0, 0]
           return self.get_state()

       def get_state(self):
           """Return the state as a vector."""
           return torch.tensor([
               self.agent_pos[0] / self.size,
               self.agent_pos[1] / self.size,
               self.goal_pos[0] / self.size,
               self.goal_pos[1] / self.size
           ], dtype=torch.float32)

       def step(self, action):
           """Execute an action; return the new state, the reward and a done flag."""
           # actions: 0=up, 1=down, 2=left, 3=right
           moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
           dx, dy = moves[action]

           # Update the position (with boundary checks)
           new_x = max(0, min(self.size-1, self.agent_pos[0] + dx))
           new_y = max(0, min(self.size-1, self.agent_pos[1] + dy))
           self.agent_pos = [new_x, new_y]

           # Compute the reward
           if self.agent_pos == self.goal_pos:
               reward = 10.0  # reached the goal
               done = True
           else:
               reward = -0.1  # small per-step penalty to encourage short paths
               done = False

           return self.get_state(), reward, done

   class SimpleQNetwork(nn.Module):
       """A small Q-network."""

       def __init__(self, state_size=4, action_size=4):
           super().__init__()
           self.fc1 = nn.Linear(state_size, 32)
           self.fc2 = nn.Linear(32, 32)
           self.fc3 = nn.Linear(32, action_size)
           self.relu = nn.ReLU()

       def forward(self, x):
           x = self.relu(self.fc1(x))
           x = self.relu(self.fc2(x))
           return self.fc3(x)

   class QLearningAgent:
       """Q-learning agent."""

       def __init__(self, state_size=4, action_size=4):
           self.action_size = action_size
           self.q_network = SimpleQNetwork(state_size, action_size)
           self.optimizer = torch.optim.Adam(self.q_network.parameters(), lr=0.01)
           self.epsilon = 1.0           # exploration rate
           self.epsilon_decay = 0.995
           self.epsilon_min = 0.01
           self.gamma = 0.99            # discount factor

       def select_action(self, state):
           """Select an action (epsilon-greedy policy)."""
           if random.random() < self.epsilon:
               return random.randint(0, self.action_size - 1)
           with torch.no_grad():
               q_values = self.q_network(state)
           return q_values.argmax().item()

       def learn(self, state, action, reward, next_state, done):
           """One learning update."""
           # Compute the target Q-value
           with torch.no_grad():
               if done:
                   target = torch.tensor(reward, dtype=torch.float32)
               else:
                   next_q = self.q_network(next_state).max()
                   target = reward + self.gamma * next_q

           # Current Q-value of the chosen action
           current_q = self.q_network(state)[action]

           # Compute the loss and update the network
           loss = nn.MSELoss()(current_q, target)
           self.optimizer.zero_grad()
           loss.backward()
           self.optimizer.step()

           # Decay the exploration rate
           self.epsilon = max(self.epsilon_min, self.epsilon * self.epsilon_decay)

           return loss.item()

   # Train the agent
   def train_agent(episodes=500):
       env = GridWorld(size=5)
       agent = QLearningAgent()
       rewards_history = []

       for episode in range(episodes):
           state = env.reset()
           total_reward = 0

           for step in range(100):  # at most 100 steps per episode
               action = agent.select_action(state)
               next_state, reward, done = env.step(action)
               agent.learn(state, action, reward, next_state, done)

               state = next_state
               total_reward += reward
               if done:
                   break

           rewards_history.append(total_reward)

           if (episode + 1) % 100 == 0:
               avg_reward = sum(rewards_history[-100:]) / 100
               print(f"Episode {episode+1}, Avg Reward: {avg_reward:.2f}, Epsilon: {agent.epsilon:.3f}")

       return agent, rewards_history

   # Run the training
   if __name__ == "__main__":
       print("Training the agent...")
       agent, rewards = train_agent(500)

       print("\nTesting the trained agent:")
       env = GridWorld(size=5)
       state = env.reset()
       print(f"Start: {env.agent_pos}, Goal: {env.goal_pos}")

       for step in range(20):
           action = agent.select_action(state)
           action_names = ['up', 'down', 'left', 'right']
           state, reward, done = env.step(action)
           print(f"Step {step+1}: {action_names[action]} -> position {env.agent_pos}")
           if done:
               print("Reached the goal!")
               break
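
The test run above still uses the ε-greedy policy (``epsilon`` never decays below 0.01), so the occasional random move can show up. To inspect what the network itself has learned, it helps to evaluate the purely greedy policy with exploration switched off. Below is a small sketch that reuses ``GridWorld`` and the trained ``QLearningAgent``; the ``evaluate_greedy`` helper is not part of the tutorial code:

.. code-block:: python

   def evaluate_greedy(agent, episodes=20, size=5, max_steps=50):
       """Run the trained agent with epsilon = 0 and report its success rate."""
       saved_epsilon = agent.epsilon
       agent.epsilon = 0.0            # purely greedy: always take the argmax-Q action
       successes, total_steps = 0, 0

       for _ in range(episodes):
           env = GridWorld(size=size)
           state = env.reset()
           for step in range(max_steps):
               action = agent.select_action(state)
               state, reward, done = env.step(action)
               if done:
                   successes += 1
                   total_steps += step + 1
                   break

       agent.epsilon = saved_epsilon  # restore exploration for further training
       avg = total_steps / successes if successes else float('nan')
       print(f"Greedy policy: {successes}/{episodes} episodes solved, "
             f"avg steps to goal: {avg:.1f}")

   # Usage, after training:
   # agent, rewards = train_agent(500)
   # evaluate_greedy(agent)

A well-trained agent on the 5×5 grid should solve every episode in the minimum of 8 steps; if it does not, increasing the number of training episodes is the first thing to try.
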

Classifying Environments
========================

.. csv-table::
   :header: "Dimension", "Type", "Description", "Example"
   :widths: 15, 20, 35, 30

   "Observability", "Fully observable", "The agent can perceive the complete environment state", "Chess"
   "", "Partially observable", "Only part of the state can be perceived", "Poker"
   "Determinism", "Deterministic", "Every action has a fully determined outcome", "Chess"
   "", "Stochastic", "Action outcomes are uncertain", "Dice games"
   "Continuity", "Discrete", "Finitely many states and actions", "Board games"
   "", "Continuous", "Infinitely many states or actions", "Autonomous driving"
   "Number of agents", "Single-agent", "Only one agent acts", "Sudoku"
   "", "Multi-agent", "Several agents interact", "Go"

Key Concepts
============

.. csv-table::
   :header: "Concept", "Meaning"
   :widths: 25, 75

   "Agent", "A system that perceives its environment and takes actions"
   "Percept", "The process by which an agent obtains information about its environment"
   "Action", "The process by which an agent affects its environment"
   "Rationality", "Choosing the action that maximizes expected utility"
   "Utility", "A numerical measure of how good a state or outcome is"
   "Learning", "The ability to improve behaviour from experience"

Next Steps
==========

In the next tutorial we will look at search algorithms, the core method agents use to solve problems.

:doc:`tutorial_03_search_algorithms`