2.1. The GIL in Depth

The Global Interpreter Lock (GIL) is one of the most controversial features of CPython. Understanding it is essential for writing efficient concurrent Python programs.

2.1.1. What Is the GIL

The GIL is a mutex inside the CPython interpreter that ensures only one thread executes Python bytecode at any given moment.

import threading
import time

counter = 0

def increment():
    global counter
    for _ in range(1000000):
        counter += 1  # NOT thread-safe, even with the GIL!

# Create several threads
threads = [threading.Thread(target=increment) for _ in range(5)]

for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Expected: 5000000, Got: {counter}")
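# Possible output: Expected: 5000000, Got: 3847291
# (varies between runs and CPython versions; on 3.10+ thread switches happen at
#  fewer points, so this loop may even print 5000000, but that is not guaranteed)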

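Warning

Common misconception: the GIL does NOT guarantee thread safety!

At the bytecode level, counter += 1 is several separate operations:

Load the current value of counter
Add 1
Store the result back into counter

The interpreter may switch threads (handing over the GIL) between any two of these operations. If another thread updates counter in the meantime, one of the two updates is lost.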
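The standard library's dis module makes this visible. A minimal sketch; exact opcode names differ between CPython versions (3.11+ uses BINARY_OP where older versions used INPLACE_ADD), so only the separate load and store instructions are relied on here:

```python
import dis

counter = 0

def increment_once():
    global counter
    counter += 1

# Print the bytecode: the load, the add and the store are separate instructions
dis.dis(increment_once)

# Collect the opcode names so they can be inspected programmatically
opnames = [ins.opname for ins in dis.get_instructions(increment_once)]
```

A thread switch between LOAD_GLOBAL and STORE_GLOBAL is exactly the window in which another thread's update can be overwritten.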

2.1.2. How the GIL Works

sequenceDiagram
    participant T1 as Thread 1
    participant GIL as GIL
    participant T2 as Thread 2

    T1->>GIL: Acquire GIL
    Note over T1: Execute Python code
    T1->>GIL: Release GIL (blocking I/O or switch interval elapsed)
    T2->>GIL: Acquire GIL
    Note over T2: Execute Python code
    T2->>GIL: Release GIL
    T1->>GIL: Acquire GIL
    Note over T1: Continue execution

2.1.2.1. When the GIL Is Released

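# 1. Released automatically during blocking I/O
def io_operation():
    with open('file.txt') as f:
        data = f.read()  # GIL released while waiting on the OS read
    return data

# 2. May be released inside C extensions
import numpy as np
arr = np.array([1, 2, 3])
result = np.sum(arr)  # NumPy may release the GIL while its C loops run

# 3. Released during time.sleep
import time
time.sleep(1)  # GIL released

# 4. Forced switch: once the switch interval elapses, a waiting thread
#    asks the running thread to drop the GIL
import sys
print(sys.getswitchinterval())  # default: 0.005 seconds
# sys.setswitchinterval(0.001)  # adjustable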
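A minimal sketch of changing the switch interval temporarily and restoring it afterwards. A shorter interval makes CPU-bound threads hand over the GIL more often (more responsive, more switching overhead); it does not add any parallelism:

```python
import sys

original = sys.getswitchinterval()
try:
    # Ask CPython to force a switch roughly every 1 ms instead of every 5 ms
    sys.setswitchinterval(0.001)
    current = sys.getswitchinterval()
finally:
    # Restore the previous value so the rest of the program is unaffected
    sys.setswitchinterval(original)
```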

2.1.3. Impact of the GIL

2.1.3.1. CPU-Bound Tasks

import threading
import time

def cpu_intensive(n):
    """CPU-bound task"""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Single thread
start = time.perf_counter()
for _ in range(4):
    cpu_intensive(5_000_000)
single_thread_time = time.perf_counter() - start
print(f"Single thread: {single_thread_time:.2f}s")

# Multiple threads
start = time.perf_counter()
threads = [
    threading.Thread(target=cpu_intensive, args=(5_000_000,))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
multi_thread_time = time.perf_counter() - start
print(f"Multi thread: {multi_thread_time:.2f}s")

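# Result: no speedup from threads on a standard (GIL) build; often slightly
# slower because of GIL contention and thread-switching overhead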
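The same experiment written with concurrent.futures.ThreadPoolExecutor, a sketch that also collects the return values (the raw threading.Thread version discards them). On a standard GIL build it shows the same lack of speedup; it only simplifies the bookkeeping. N is reduced here so the sketch runs quickly:

```python
from concurrent.futures import ThreadPoolExecutor

def cpu_intensive(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 200_000  # smaller than above, for a quick run

with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order
    results = list(pool.map(cpu_intensive, [N] * 4))

# Closed form for the sum of i*i over range(N): (N-1) * N * (2N-1) / 6
expected = (N - 1) * N * (2 * N - 1) // 6
```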

2.1.3.2. I/O-Bound Tasks

import threading
import time
import urllib.request

def fetch_url(url):
    """I/O-bound task"""
    with urllib.request.urlopen(url) as response:
        return len(response.read())

urls = ['https://www.python.org'] * 10

# Single thread
start = time.perf_counter()
for url in urls:
    fetch_url(url)
single_time = time.perf_counter() - start
print(f"Single thread: {single_time:.2f}s")

# Multiple threads
start = time.perf_counter()
threads = [threading.Thread(target=fetch_url, args=(url,)) for url in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
multi_time = time.perf_counter() - start
print(f"Multi thread: {multi_time:.2f}s")

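# Result: multithreading is significantly faster, because each thread
# releases the GIL while it waits on the network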
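To reproduce the effect without network access, a sketch with a hypothetical fake_fetch that simulates I/O with time.sleep (which also releases the GIL, like a thread blocked on a socket read). The 0.2 s delay is an arbitrary choice:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_fetch(i):
    # Hypothetical stand-in for a network call; sleeping releases the GIL
    time.sleep(0.2)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fake_fetch, range(10)))
elapsed = time.perf_counter() - start
print(f"10 simulated requests: {elapsed:.2f}s")  # sequentially this would take ~2s
```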

2.1.4. Ways Around the GIL

2.1.4.1. 1. Multiprocessing

from multiprocessing import Pool
import time

def cpu_intensive(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    # Use a process pool
    start = time.perf_counter()
    with Pool(4) as pool:
        results = pool.map(cpu_intensive, [5_000_000] * 4)
    print(f"Multi process: {time.perf_counter() - start:.2f}s")
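    # True parallelism: close to a 4x speedup on a machine with 4+ cores,
    # minus process startup and data-transfer (pickling) overhead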

2.1.4.2. 2. Using Libraries That Release the GIL

import numpy as np
import time

# NumPy releases the GIL during many of its computations
arr = np.random.rand(10000000)

start = time.perf_counter()
result = np.sum(arr ** 2)  # the GIL can be released inside these C loops
print(f"NumPy: {time.perf_counter() - start:.4f}s")

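# Pure-Python comparison. The gap comes mostly from NumPy's vectorized C loops;
# releasing the GIL pays off when such calls run in several threads at once
lst = arr.tolist()  # plain Python floats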
start = time.perf_counter()
result = sum(x ** 2 for x in lst)
print(f"Pure Python: {time.perf_counter() - start:.4f}s")
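NumPy is not the only example. As a standard-library sketch of the same idea: the hashlib documentation states that the GIL is released while hashing data larger than 2047 bytes, so hashing several large buffers from a thread pool can use more than one core. The four random 8 MiB payloads are made-up test data:

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor

# Hypothetical payloads: four 8 MiB blocks of random bytes
payloads = [os.urandom(8 * 1024 * 1024) for _ in range(4)]

def digest(data: bytes) -> str:
    # hashlib releases the GIL for inputs larger than 2047 bytes,
    # so these calls can run on several cores at once
    return hashlib.sha256(data).hexdigest()

with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(digest, payloads))

# Same work on one thread, to confirm the results match
sequential = [digest(p) for p in payloads]
```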

2.1.4.3. 3. Releasing the GIL in Cython

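# cython_example.pyx
# Compile with OpenMP enabled (e.g. -fopenmp); otherwise prange runs serially
from cython.parallel import prange

def parallel_sum(double[:] arr):
    cdef double total = 0
    cdef Py_ssize_t i
    cdef Py_ssize_t n = arr.shape[0]

    # The GIL is released inside the nogil block
    with nogil:
        for i in prange(n):
            total += arr[i] * arr[i]  # in-place += makes total a reduction variable

    return total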

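2.1.4.4. 4. Using Other Python Implementations

# PyPy: the standard PyPy still has a GIL; the experimental STM branch
#       (pypy-stm) aimed to remove it but is no longer maintained
# Jython: runs on the JVM, no GIL (supports Python 2.7 only)
# IronPython: runs on .NET, no GIL

# Note: these implementations may not support some C extensions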
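Before relying on implementation-specific behavior, a program can check which interpreter it is running on. A minimal sketch using the standard platform and sys modules:

```python
import platform
import sys

# e.g. 'CPython', 'PyPy', 'Jython' or 'IronPython'
impl = platform.python_implementation()
print(impl, sys.version_info[:2])
```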

2.1.5. Thread-Safe Programming

2.1.5.1. Using Locks

import threading

counter = 0
lock = threading.Lock()

def safe_increment():
    global counter
    for _ in range(1000000):
        with lock:  # acquire the lock; released automatically when the block exits
            counter += 1

threads = [threading.Thread(target=safe_increment) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Expected: 5000000, Got: {counter}")
# Now always correct: 5000000
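Acquiring the lock a million times per thread is correct but slow. When the work allows it, a common variant is to accumulate in a local variable and take the lock only once per thread. A sketch (batched_increment is a hypothetical name):

```python
import threading

counter = 0
lock = threading.Lock()

def batched_increment(n):
    global counter
    local_total = 0
    for _ in range(n):
        local_total += 1  # local variable: not shared, no lock needed
    with lock:            # one acquisition per thread instead of per increment
        counter += local_total

threads = [threading.Thread(target=batched_increment, args=(1_000_000,)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 5000000
```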

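2.1.5.2. Using Thread-Safe Data Structures

import threading
from queue import Queue

# queue.Queue is thread-safe: put() and get() synchronize internally
task_queue = Queue()

def producer():
    for i in range(100):
        task_queue.put(i)
    task_queue.put(None)  # sentinel: tells the consumer to stop

def consumer():
    while True:
        item = task_queue.get()
        if item is None:
            task_queue.task_done()
            break
        print(f"Processing {item}")
        task_queue.task_done()

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Thread-safe data structures avoid the need for explicit locks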
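The same pattern scales to several consumers: send one sentinel per consumer, and use Queue.join() to wait until every item has been marked done. A sketch assuming a hypothetical pool of three consumers that square each item and report through a second Queue:

```python
import threading
from queue import Queue

NUM_CONSUMERS = 3  # hypothetical worker count
tasks = Queue()
results = Queue()  # a second Queue collects output without an explicit lock

def consumer():
    while True:
        item = tasks.get()
        try:
            if item is None:  # sentinel: stop this consumer
                return
            results.put(item * item)
        finally:
            tasks.task_done()  # runs for real items and for the sentinel

workers = [threading.Thread(target=consumer) for _ in range(NUM_CONSUMERS)]
for w in workers:
    w.start()

for i in range(100):
    tasks.put(i)
for _ in range(NUM_CONSUMERS):
    tasks.put(None)  # one sentinel per consumer

tasks.join()  # blocks until every put() has a matching task_done()
for w in workers:
    w.join()

squares = sorted(results.get() for _ in range(results.qsize()))
```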

2.1.5.3. threading.local

import threading

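# Thread-local storage: each thread sees its own attributes
thread_local = threading.local()

def process_request(request_id):
    thread_local.request_id = request_id
    # Accessible anywhere in the same thread, without passing it around
    do_work()

def do_work():
    # Each thread has its own request_id
    print(f"Working on request {thread_local.request_id}")

threads = [threading.Thread(target=process_request, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()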
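A sketch that checks the isolation directly. A threading.Barrier (added only for this demonstration) forces all three threads to write thread_local.request_id before any of them reads it back, yet each thread still sees only its own value:

```python
import threading

thread_local = threading.local()
barrier = threading.Barrier(3)  # makes the three threads overlap
seen = {}
seen_lock = threading.Lock()

def handle(request_id):
    thread_local.request_id = request_id
    barrier.wait()  # every thread has written before any thread reads
    with seen_lock:
        seen[request_id] = thread_local.request_id

threads = [threading.Thread(target=handle, args=(rid,)) for rid in ("a", "b", "c")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen)
```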

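2.1.6. Best Practices

When the GIL is not a problem

I/O-bound tasks: the GIL is released during blocking I/O
NumPy/Pandas: much of the heavy computation runs in C and can release the GIL
C extension computation: extensions can release the GIL while they work
Single-threaded applications: no effect

When the GIL is a problem

Pure-Python CPU-bound work: use multiprocessing
Need true parallelism: consider multiprocessing or another language
High-performance computing: use Cython or Numba, both of which can release the GIL

Key principles

The GIL does not guarantee thread safety: you still need synchronization
Measure before optimizing: don't assume where the bottleneck is
Choose the right concurrency model: asyncio and threading suit I/O-bound work, multiprocessing suits CPU-bound pure-Python work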
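For I/O-bound work, asyncio runs many concurrent waits on a single thread, so there is no contention for the GIL between tasks. A minimal sketch with a hypothetical fake_request coroutine that simulates I/O with asyncio.sleep:

```python
import asyncio
import time

async def fake_request(i):
    # Hypothetical stand-in for network I/O; await yields to the event loop
    await asyncio.sleep(0.2)
    return i

async def main():
    # One thread, ten concurrent waits; gather() keeps results in input order
    return await asyncio.gather(*(fake_request(i) for i in range(10)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
```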

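2.1.7. Changes in Python 3.12+

Work is under way to make the GIL optional in CPython (PEP 703, accepted in 2023):

# CPython 3.13+ can be built without the GIL:
#   ./configure --disable-gil
# (the resulting interpreter is typically named with a "t" suffix, e.g. python3.13t)

# This allows true multi-threaded parallelism,
# but reference counting has to be made thread-safe
# (PEP 703 uses biased reference counting for this)

Note

Python 3.12 has no free-threaded build. Python 3.13 introduced one as an experimental, separate build, and Python 3.14 made it officially supported (PEP 779) without making it the default. In production, keep writing code for the standard GIL behavior unless you deliberately target the free-threaded build.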
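To find out which behavior you get at runtime, a sketch using two checks: the Py_GIL_DISABLED build variable reported by sysconfig, and sys._is_gil_enabled(), a private helper added in Python 3.13 (falling back to True on older versions is an assumption of this sketch). Even on a free-threaded build the GIL can be switched back on at runtime, for example through the PYTHON_GIL environment variable or by importing an extension module that doesn't declare free-threading support, which is why both values are reported:

```python
import sys
import sysconfig

# 1 on free-threaded builds; 0 (or None on older versions) on standard builds
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() exists from Python 3.13; assume the GIL is on if absent
_check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = _check() if _check is not None else True

print(f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```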