Python Concurrent Programming: Multithreading (the threading Module)

Author: 我再BUG界嘎嘎乱杀

2024-08-27
Hunan

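Python offers several approaches to concurrent programming, and multithreading is one of the most widely used. This article walks through the threading module: basic usage, thread synchronization, and thread pools. It closes with two larger worked examples and their output.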

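1. Multithreading Overview

Multithreading is a form of concurrency in which several threads run within a single process. Each thread has its own stack but shares the process's memory with the other threads, which makes threads cheaper to create than processes. In the default CPython build, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so threads mainly speed up I/O-bound work (network requests, file access) rather than CPU-bound computation.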
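As a rough illustration of where threads pay off, the sketch below compares running a few blocking waits one after another with running them in separate threads. The names `fake_io`, `run_sequential` and `run_threaded` are hypothetical, and `time.sleep` is only a stand-in for real blocking I/O such as a network call.

```python
import threading
import time


def fake_io(delay):
    # Stand-in for a blocking I/O call; sleeping releases the GIL.
    time.sleep(delay)


def run_sequential(n, delay):
    # Run n blocking calls back to back and return the elapsed time.
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    # Run the same n blocking calls in n threads and return the elapsed time.
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(4, 0.2):.2f}s")
    print(f"threaded:   {run_threaded(4, 0.2):.2f}s")
```

The threaded run should finish in roughly the time of a single wait, because the waits overlap. A CPU-bound function would not show the same gain under the GIL.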

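2. The threading Module

The threading module is part of the Python standard library and provides tools for creating and managing threads.

2.1 Creating Threads

You can create a thread either by subclassing threading.Thread and overriding run(), or by instantiating threading.Thread directly with a target function.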

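Example: subclassing threading.Thread

import threading


class MyThread(threading.Thread):
    def run(self):
        # run() is the thread's body; start() arranges for it to be called.
        for _ in range(5):
            # self.name is assigned automatically, e.g. "Thread-1".
            print(f'{self.name} is running')


if __name__ == "__main__":
    threads = [MyThread() for _ in range(3)]
    for thread in threads:
        thread.start()
    # Wait for every thread to finish before the program exits.
    for thread in threads:
        thread.join()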


Example: using threading.Thread directly

import threading


def thread_function(name):
    for _ in range(5):
        print(f'Thread {name} is running')


if __name__ == "__main__":
    # target is the function to run; args is the tuple of its arguments.
    threads = [threading.Thread(target=thread_function, args=(i,)) for i in range(3)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


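2.2 Thread Synchronization

Threads often need to read and modify the same data, and uncoordinated access can corrupt it. The threading module provides synchronization primitives for this, including locks (Lock), condition variables (Condition), and semaphores (Semaphore).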

Example: using a Lock

import threading

counter = 0
lock = threading.Lock()


def increment_counter():
    global counter
    for _ in range(1000):
        # "counter += 1" is a read-modify-write, so guard it with the lock.
        with lock:
            counter += 1


if __name__ == "__main__":
    threads = [threading.Thread(target=increment_counter) for _ in range(5)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # 5 threads x 1000 increments each = 5000
    print(f'Final counter value: {counter}')

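Semaphores are mentioned above but not shown, so here is a minimal sketch of one. A Semaphore created with a count of N lets at most N threads into a section at the same time. The function `run_limited` and its bookkeeping dictionary are hypothetical names used only for this illustration.

```python
import threading
import time


def run_limited(num_threads=6, max_concurrent=2):
    # At most max_concurrent threads may hold the semaphore at once.
    semaphore = threading.Semaphore(max_concurrent)
    state_lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def limited_task():
        with semaphore:
            # Track how many threads are inside the guarded section.
            with state_lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.05)  # stand-in for real work
            with state_lock:
                state["active"] -= 1

    threads = [threading.Thread(target=limited_task) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print(f"peak concurrency: {run_limited()}")
```

The reported peak never exceeds `max_concurrent`, whatever the number of threads. This is a common way to cap simultaneous connections to an external service.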

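2.3 Thread Pools

The concurrent.futures module in the standard library provides ThreadPoolExecutor, a thread pool that makes it easier to manage a fixed set of worker threads.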

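Example: using a thread pool

from concurrent.futures import ThreadPoolExecutor


def task(name):
    for _ in range(5):
        print(f'Task {name} is running')


if __name__ == "__main__":
    # The with block shuts the pool down and waits for its threads on exit.
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(task, i) for i in range(3)]
        for future in futures:
            # result() blocks until the task finishes and re-raises its exception, if any.
            future.result()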

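The tasks above only print, but pool tasks usually return values. The following sketch collects return values as tasks finish, using `as_completed` from the same module. The functions `square` and `square_all` are hypothetical examples.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(n):
    return n * n


def square_all(numbers, max_workers=3):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which input each future belongs to.
        future_to_input = {executor.submit(square, n): n for n in numbers}
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(future_to_input):
            results[future_to_input[future]] = future.result()
    return results


if __name__ == "__main__":
    print(square_all([1, 2, 3, 4]))
```

Keying the results by input keeps them usable even though the completion order is not deterministic.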

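3. A Complete Example

The following example simulates a simple web crawler. It uses several threads to fetch pages concurrently and a lock to keep the shared set of known URLs consistent.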

import queue
import threading
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


class WebCrawler:
    def __init__(self, base_url, num_threads):
        self.base_url = base_url
        self.num_threads = num_threads
        self.urls_to_crawl = queue.Queue()
        self.crawled_urls = set()  # URLs already queued or crawled
        self.data_lock = threading.Lock()

    def crawl_page(self, url):
        try:
            response = requests.get(url, timeout=10)
            soup = BeautifulSoup(response.text, 'html.parser')
            for link in soup.find_all('a', href=True):
                # urljoin resolves relative links and keeps absolute ones intact.
                full_url = urljoin(url, link['href'])
                if not full_url.startswith(self.base_url):
                    continue  # stay on the starting site
                # Check and record under the lock so each URL is queued only once.
                with self.data_lock:
                    if full_url in self.crawled_urls:
                        continue
                    self.crawled_urls.add(full_url)
                self.urls_to_crawl.put(full_url)
            print(f'Crawled: {url}')
        except Exception as e:
            print(f'Failed to crawl {url}: {e}')

    def worker(self):
        while True:
            url = self.urls_to_crawl.get()
            try:
                if url is None:  # sentinel: no more work
                    return
                self.crawl_page(url)
            finally:
                self.urls_to_crawl.task_done()

    def start_crawling(self, start_url):
        self.crawled_urls.add(start_url)
        self.urls_to_crawl.put(start_url)
        threads = [threading.Thread(target=self.worker) for _ in range(self.num_threads)]
        for thread in threads:
            thread.start()
        # join() returns once every queued URL has been processed,
        # including URLs discovered along the way.
        self.urls_to_crawl.join()
        # Send one sentinel per worker so each loop exits.
        for _ in threads:
            self.urls_to_crawl.put(None)
        for thread in threads:
            thread.join()


if __name__ == "__main__":
    crawler = WebCrawler(base_url='https://example.com', num_threads=5)
    crawler.start_crawling('https://example.com')


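Output

(Illustrative only; the actual lines and their order depend on the target site and on thread scheduling.)

Crawled: https://example.com
Crawled: https://example.com/about
Crawled: https://example.com/contact
...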

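The queue-and-workers structure of a threaded crawler can be tried without network access. The sketch below walks a small in-memory link graph instead of real pages. `FAKE_SITE` and `crawl` are hypothetical names, and shutting the workers down with a `None` sentinel is one common convention, not the only option.

```python
import queue
import threading

# Hypothetical in-memory "site": page -> links found on that page.
FAKE_SITE = {
    "/": ["/about", "/contact"],
    "/about": ["/", "/team"],
    "/contact": ["/"],
    "/team": [],
}


def crawl(start, num_threads=3):
    to_visit = queue.Queue()
    seen = {start}
    seen_lock = threading.Lock()

    def worker():
        while True:
            page = to_visit.get()
            try:
                if page is None:  # sentinel: time to stop
                    return
                for link in FAKE_SITE.get(page, []):
                    # Check and record under the lock so each page is queued once.
                    with seen_lock:
                        if link in seen:
                            continue
                        seen.add(link)
                    to_visit.put(link)
            finally:
                to_visit.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    to_visit.put(start)
    to_visit.join()  # returns when every queued page has been processed
    for _ in threads:
        to_visit.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(sorted(crawl("/")))
```

Waiting on `Queue.join()` rather than polling `Queue.empty()` matters here: the queue can be momentarily empty while another worker is still about to add links.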

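4. Things to Watch Out For

Multithreading can improve a program's concurrency, but it also introduces new problems. Keep the following points in mind.

4.1 Avoiding Deadlock

A deadlock occurs when two or more threads each wait for a resource the other holds, so none of them can proceed. The most reliable way to prevent it is to have every thread acquire locks in the same order, which rules out circular waiting. Holding locks for as short a time as possible also reduces the risk.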

Example: avoiding deadlock

import threading

lock1 = threading.Lock()
lock2 = threading.Lock()


def thread1():
    with lock1:
        print("Thread 1 acquired lock1")
        with lock2:
            print("Thread 1 acquired lock2")


def thread2():
    # Same order as thread1: lock1 first, then lock2. If this function took
    # lock2 first, each thread could end up holding one lock while waiting
    # for the other, and the program would hang.
    with lock1:
        print("Thread 2 acquired lock1")
        with lock2:
            print("Thread 2 acquired lock2")


if __name__ == "__main__":
    t1 = threading.Thread(target=thread1)
    t2 = threading.Thread(target=thread2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

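When the locks involved are not known in advance, a consistent order can still be imposed at run time. The sketch below is one way to do that, sorting the two objects by `id()` before locking. The `Account` class and the `transfer` and `run_transfers` functions are hypothetical names for illustration.

```python
import threading


class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Lock the two accounts in one global order (by id), whichever
    # direction the money flows, so no circular wait can form.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_transfers(rounds=1000):
    a, b = Account(1000), Account(1000)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    # The two threads transfer in opposite directions at the same time.
    t1 = threading.Thread(target=a_to_b)
    t2 = threading.Thread(target=b_to_a)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_transfers())
```

Any stable total order works as the sort key. `id()` is convenient because it is unique for the lifetime of each object.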

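4.2 Controlling Access to Shared Resources

Uncoordinated access to shared mutable data from several threads leads to race conditions. Use synchronization primitives such as locks (Lock) and condition variables (Condition) to control who may touch the shared data and when.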

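Example: using a condition variable

import threading

NUM_ITEMS = 5
condition = threading.Condition()
items = []


def producer():
    for i in range(NUM_ITEMS):
        with condition:
            items.append(i)
            print(f"Produced {i}")
            # Wake one thread waiting on the condition.
            condition.notify()


def consumer():
    # Consume exactly as many items as are produced, so the thread
    # terminates and join() in the main block can return.
    for _ in range(NUM_ITEMS):
        with condition:
            # Re-check in a loop: wait() can return before an item is available.
            while not items:
                condition.wait()
            item = items.pop(0)
            print(f"Consumed {item}")


if __name__ == "__main__":
    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start()
    t2.start()
    t1.join()
    t2.join()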

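For plain producer/consumer hand-offs, the standard library's `queue.Queue` already does the locking and waiting internally, so it can often replace a hand-written Condition. The sketch below is a minimal version under that assumption. `SENTINEL`, `producer`, `consumer` and `run_pipeline` are hypothetical names.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def producer(q, n):
    for i in range(n):
        q.put(i)
    q.put(SENTINEL)  # tell the consumer there is nothing more to come


def consumer(q, out):
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        out.append(item)


def run_pipeline(n=5):
    q = queue.Queue()
    consumed = []
    t1 = threading.Thread(target=producer, args=(q, n))
    t2 = threading.Thread(target=consumer, args=(q, consumed))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return consumed


if __name__ == "__main__":
    print(run_pipeline())
```

With one producer and one consumer the items arrive in the order they were put, because Queue is first-in, first-out.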

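4.3 Using Thread Pools

A thread pool reuses a fixed set of worker threads, which avoids the overhead of repeatedly creating and destroying threads and makes them easier to manage. The concurrent.futures module provides a simple thread pool interface.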

Example: using a thread pool

from concurrent.futures import ThreadPoolExecutor


def task(name):
    print(f'Task {name} is running')


if __name__ == "__main__":
    # Three workers are created once and reused for every submitted task.
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(task, i) for i in range(3)]
        for future in futures:
            # Wait for each task to finish.
            future.result()

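An exception raised inside a pool task does not show up until the future is inspected, so it is worth handling explicitly. The sketch below separates successes from failures with `Future.exception()`. The functions `risky` and `run_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def risky(n):
    # Fails for odd inputs, to simulate a task that sometimes raises.
    if n % 2:
        raise ValueError(f"odd input: {n}")
    return n // 2


def run_all(numbers):
    successes, failures = {}, {}
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {n: executor.submit(risky, n) for n in numbers}
    # Leaving the with block waits for all tasks, so every future is done here.
    for n, future in futures.items():
        error = future.exception()  # None if the task succeeded
        if error is None:
            successes[n] = future.result()
        else:
            failures[n] = str(error)
    return successes, failures


if __name__ == "__main__":
    ok, failed = run_all([1, 2, 3, 4])
    print("ok:", ok)
    print("failed:", failed)
```

Calling `future.result()` on a failed task would re-raise the original exception instead, which suits cases where one failure should abort the whole batch.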

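5. A Second Complete Example

The following example simulates a multithreaded file downloader. It uses a thread pool to manage the download workers and a lock to protect the state they share.

File downloader example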

import threading
from concurrent.futures import ThreadPoolExecutor

import requests


class FileDownloader:
    def __init__(self, urls, num_threads):
        self.urls = urls
        self.num_threads = num_threads
        self.download_lock = threading.Lock()
        self.downloaded_files = []

    def download_file(self, url):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()  # treat HTTP error statuses as failures
            filename = url.split('/')[-1]
            # Assuming the URLs end in distinct file names, each worker writes
            # its own file, so the write itself needs no lock.
            with open(filename, 'wb') as f:
                f.write(response.content)
            # The list is shared between workers, so guard the append.
            with self.download_lock:
                self.downloaded_files.append(filename)
            print(f'Downloaded: {filename}')
        except Exception as e:
            print(f'Failed to download {url}: {e}')

    def start_downloading(self):
        # Leaving the with block waits for all downloads to finish.
        with ThreadPoolExecutor(max_workers=self.num_threads) as executor:
            executor.map(self.download_file, self.urls)


if __name__ == "__main__":
    urls = [
        'https://example.com/file1.txt',
        'https://example.com/file2.txt',
        'https://example.com/file3.txt'
    ]
    downloader = FileDownloader(urls, num_threads=3)
    downloader.start_downloading()
    print("Downloaded files:", downloader.downloaded_files)


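Output

(Illustrative only; the example.com URLs are placeholders, and the order of the lines and of the final list may vary between runs.)

Downloaded: file1.txt
Downloaded: file2.txt
Downloaded: file3.txt
Downloaded files: ['file1.txt', 'file2.txt', 'file3.txt']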

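A design alternative worth knowing: if each worker returns its result instead of appending to a shared list, no lock is needed at all, and `executor.map` hands the results back in input order. The sketch below shows that shape with a hypothetical `fake_download` function standing in for the real HTTP request and file write.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    # Hypothetical stand-in for fetching the URL and saving it to disk;
    # it only derives the file name the download would use.
    return url.rsplit('/', 1)[-1]


def download_all(urls, num_threads=3):
    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # map() yields results in the same order as the inputs,
        # regardless of which task finishes first.
        return list(executor.map(fake_download, urls))


if __name__ == "__main__":
    urls = [
        'https://example.com/file1.txt',
        'https://example.com/file2.txt',
        'https://example.com/file3.txt',
    ]
    print("Downloaded files:", download_all(urls))
```

Returning values keeps the workers free of shared state, at the cost of learning about each result only when the caller iterates over the map.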

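6. Summary

This article covered Python's threading module: creating threads, synchronizing them, and using thread pools, with examples showing how these techniques apply in practice. Working through them should give you a solid foundation for writing multithreaded Python programs.

Multithreading can improve a program's concurrency, but it also brings new problems. When using threads, take care to avoid deadlock, control access to shared resources, and prefer a thread pool for managing worker threads.

I hope this article helps you better understand multithreaded programming in Python. If you have questions or suggestions, feel free to leave a comment.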
