
Python Concurrent Programming: Multithreading (the threading Module)

  • 2024-08-27
    Hunan

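Python offers several approaches to concurrent programming, and multithreading is one of the most important. This article walks through Python's threading module, covering basic usage, thread synchronization, and thread pools, and ends with worked examples and their output.

1. Multithreading Overview

Multithreading is a form of concurrent programming in which several threads run within a single process, which can make a program more efficient. A thread is sometimes described as a lightweight process: each thread has its own stack, but all threads in a process share that process's memory. In CPython, the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so threads mainly help I/O-bound work rather than CPU-bound work.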
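To make the efficiency claim concrete, here is a minimal sketch that runs four simulated I/O waits first one after another and then in four threads. The helper names (fake_io, run_sequential, run_threaded) and the 0.2-second delay are illustrative assumptions, and time.sleep merely stands in for a blocking network or disk call.

```python
import threading
import time


def fake_io(delay):
    """Stand-in for a blocking I/O call (hypothetical helper)."""
    time.sleep(delay)


def run_sequential(n, delay):
    """Run n simulated I/O calls back to back and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    """Run n simulated I/O calls in n threads and return the elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"sequential: {run_sequential(4, 0.2):.2f}s")
    print(f"threaded:   {run_threaded(4, 0.2):.2f}s")
```

The threaded run should finish in roughly the time of a single wait, because time.sleep releases the interpreter lock while it waits, as real blocking I/O does. A CPU-bound loop would not be expected to show the same gain on a standard CPython build.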

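2. The threading Module

The threading module is part of the Python standard library and provides tools for creating and managing threads.

2.1 Creating Threads

A thread can be created either by subclassing threading.Thread and overriding its run() method, or by passing a target callable directly to threading.Thread.

Example: subclassing threading.Thread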

    import threading

    class MyThread(threading.Thread):
        def run(self):
            for i in range(5):
                print(f'Thread {self.name} is running')

    if __name__ == "__main__":
        threads = [MyThread() for _ in range(3)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

Example: using threading.Thread directly

    import threading

    def thread_function(name):
        for i in range(5):
            print(f'Thread {name} is running')

    if __name__ == "__main__":
        threads = [threading.Thread(target=thread_function, args=(i,)) for i in range(3)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
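Both styles above only print from inside the thread. threading.Thread discards the return value of its target, so one common way to get results back is to have each thread write into a shared container. The sketch below assumes a hypothetical square worker and a results dict keyed by the input value; neither name comes from the original article.

```python
import threading


def square(n, results, lock):
    """Compute n * n and store it under key n (hypothetical worker)."""
    value = n * n
    with lock:  # guard the shared dict explicitly
        results[n] = value


def run_workers(numbers):
    """Start one thread per number and return the collected results."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=square, args=(n, results, lock))
        for n in numbers
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # results are complete only after every join returns
    return results


if __name__ == "__main__":
    print(run_workers([1, 2, 3]))
```

The dict is filled in whatever order the threads finish, so the printed key order may differ between runs even though the contents are the same.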

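2.2 Thread Synchronization

In multithreaded code, several threads often access the same shared resource, and their accesses must not conflict. Synchronization primitives such as locks (Lock), condition variables (Condition), and semaphores (Semaphore) serve this purpose.

Example: using a lock (Lock)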

    import threading

    counter = 0
    lock = threading.Lock()

    def increment_counter():
        global counter
        for _ in range(1000):
            with lock:
                counter += 1

    if __name__ == "__main__":
        threads = [threading.Thread(target=increment_counter) for _ in range(5)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        print(f'Final counter value: {counter}')
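The section also names semaphores, which differ from a lock in that they admit up to a fixed number of threads at once. The following is a minimal sketch of that idea; the function name run_limited, the 0.05-second sleep, and the peak counter are assumptions made for illustration.

```python
import threading
import time


def run_limited(num_threads, max_concurrent):
    """Run num_threads tasks, letting at most max_concurrent run at a time.

    Returns the highest number of tasks observed inside the guarded section.
    """
    semaphore = threading.Semaphore(max_concurrent)
    state_lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def task():
        with semaphore:  # blocks while max_concurrent tasks are already inside
            with state_lock:
                state["active"] += 1
                state["peak"] = max(state["peak"], state["active"])
            time.sleep(0.05)  # simulated work
            with state_lock:
                state["active"] -= 1

    threads = [threading.Thread(target=task) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    print("peak concurrency:", run_limited(num_threads=6, max_concurrent=2))
```

The reported peak should never exceed max_concurrent. A pattern like this is often used to cap the number of simultaneous connections to an external service.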

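2.3 Thread Pools

Python's concurrent.futures module provides a thread pool, ThreadPoolExecutor, which makes threads easier to manage and control.

Example: using a thread pool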

    from concurrent.futures import ThreadPoolExecutor

    def task(name):
        for i in range(5):
            print(f'Task {name} is running')

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=3) as executor:
            futures = [executor.submit(task, i) for i in range(3)]
            for future in futures:
                future.result()
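Unlike a bare thread, a pool task can hand its return value back to the caller. As a small sketch of that, the hypothetical square and squares functions below use executor.map, which yields results in the order of the inputs.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Hypothetical task that returns a value instead of printing."""
    return n * n


def squares(numbers, max_workers=3):
    """Apply square to every number on a thread pool and return the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, regardless of completion order.
        return list(executor.map(square, numbers))


if __name__ == "__main__":
    print(squares(range(5)))  # [0, 1, 4, 9, 16]
```

With executor.submit, the same value is available from future.result(), which also re-raises any exception the task raised.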

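3. A Comprehensive Example

The following example simulates a simple web crawler. It uses multiple threads to speed up crawling and a lock to keep the shared data consistent.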

    import threading
    from queue import Queue, Empty
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    class WebCrawler:
        def __init__(self, base_url, num_threads):
            self.base_url = base_url
            self.num_threads = num_threads
            self.urls_to_crawl = Queue()
            self.crawled_urls = set()          # URLs already queued or crawled
            self.data_lock = threading.Lock()  # protects crawled_urls

        def crawl_page(self, url):
            try:
                response = requests.get(url, timeout=10)
                soup = BeautifulSoup(response.text, 'html.parser')
                for link in soup.find_all('a', href=True):
                    # Resolve relative links against the current page.
                    full_url = urljoin(url, link['href'])
                    if not full_url.startswith(self.base_url):
                        continue  # stay on the same site
                    with self.data_lock:
                        if full_url in self.crawled_urls:
                            continue
                        # Mark the URL when it is queued so it is queued only once.
                        self.crawled_urls.add(full_url)
                    self.urls_to_crawl.put(full_url)
                print(f'Crawled: {url}')
            except Exception as e:
                print(f'Failed to crawl {url}: {e}')

        def worker(self):
            while True:
                try:
                    # Checking empty() and then calling a blocking get() can hang
                    # forever, so wait with a timeout and exit if nothing arrives.
                    url = self.urls_to_crawl.get(timeout=3)
                except Empty:
                    return
                self.crawl_page(url)
                self.urls_to_crawl.task_done()

        def start_crawling(self, start_url):
            self.crawled_urls.add(start_url)
            self.urls_to_crawl.put(start_url)
            threads = [threading.Thread(target=self.worker) for _ in range(self.num_threads)]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()

    if __name__ == "__main__":
        crawler = WebCrawler(base_url='https://example.com', num_threads=5)
        crawler.start_crawling('https://example.com')

Output (illustrative; the actual URLs and their order depend on the site and on thread scheduling)

    Crawled: https://example.com
    Crawled: https://example.com/about
    Crawled: https://example.com/contact
    ...

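4. Multithreading Caveats

Multithreading can improve a program's concurrency considerably, but it also introduces new problems. Keep the following points in mind.

4.1 Avoiding Deadlock

A deadlock occurs when two or more threads each wait for a resource that another one holds, so none of them can proceed. To reduce the risk, hold locks for as short a time as possible, and have every thread acquire multiple locks in the same order so that a circular wait cannot form.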


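Example: avoiding deadlock by acquiring locks in a fixed order

    import threading

    lock1 = threading.Lock()
    lock2 = threading.Lock()

    def thread1():
        with lock1:
            print("Thread 1 acquired lock1")
            with lock2:
                print("Thread 1 acquired lock2")

    def thread2():
        # Take the locks in the same order as thread1 (lock1, then lock2).
        # Taking lock2 first here could deadlock: each thread would hold one
        # lock while waiting for the other.
        with lock1:
            print("Thread 2 acquired lock1")
            with lock2:
                print("Thread 2 acquired lock2")

    if __name__ == "__main__":
        t1 = threading.Thread(target=thread1)
        t2 = threading.Thread(target=thread2)
        t1.start()
        t2.start()
        t1.join()
        t2.join()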
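When a fixed lock order is hard to guarantee, another option is to give up instead of waiting forever. This is a different technique from the ordering approach described above: the sketch uses the timeout argument of Lock.acquire, and the function name try_both and the 0.1-second default are assumptions for illustration.

```python
import threading


def try_both(first, second, timeout=0.1):
    """Hold `first`, then try `second` for at most `timeout` seconds.

    Returns True if both locks were held, False if the attempt was abandoned.
    """
    with first:
        if not second.acquire(timeout=timeout):
            return False  # give up; leaving the with block releases `first`
        try:
            # A critical section that needs both locks would go here.
            return True
        finally:
            second.release()


if __name__ == "__main__":
    lock_a, lock_b = threading.Lock(), threading.Lock()
    print(try_both(lock_a, lock_b))  # True: both locks are free
    lock_b.acquire()                 # simulate another thread holding lock_b
    print(try_both(lock_a, lock_b))  # False: times out instead of blocking
    lock_b.release()
```

A caller that gets False would typically wait briefly and retry. Because the first lock is released on failure, the other thread gets a chance to finish its work.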

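4.2 Restricting Access to Shared Resources

Unsynchronized concurrent access to a shared resource is a common source of bugs in multithreaded programs. Synchronization primitives such as locks (Lock) and condition variables (Condition) can be used to restrict access to shared resources.

Example: using a condition variable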

    import threading

    condition = threading.Condition()
    items = []

    def producer():
        for i in range(5):
            with condition:
                items.append(i)
                print(f"Produced {i}")
                condition.notify()
        with condition:
            # Sentinel: tells the consumer to stop. Without it the consumer
            # loops forever and t2.join() never returns.
            items.append(None)
            condition.notify()

    def consumer():
        while True:
            with condition:
                while not items:
                    condition.wait()
                item = items.pop(0)
            if item is None:
                break
            print(f"Consumed {item}")

    if __name__ == "__main__":
        t1 = threading.Thread(target=producer)
        t2 = threading.Thread(target=consumer)
        t1.start()
        t2.start()
        t1.join()
        t2.join()
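For a plain producer and consumer hand-off, the standard library's queue.Queue does the locking and waiting internally, so it can replace the hand-written condition variable. The sketch below is one possible arrangement; the names run_pipeline and SENTINEL are assumptions, and None is used as the stop marker.

```python
import queue
import threading

SENTINEL = None  # marker the producer sends when it is finished


def producer(q, n):
    """Put the numbers 0..n-1 on the queue, then the stop marker."""
    for i in range(n):
        q.put(i)
    q.put(SENTINEL)


def consumer(q, out):
    """Move items from the queue into `out` until the stop marker arrives."""
    while True:
        item = q.get()  # blocks until an item is available
        if item is SENTINEL:
            break
        out.append(item)


def run_pipeline(n=5):
    """Run one producer and one consumer thread and return what was consumed."""
    q = queue.Queue()
    consumed = []
    threads = [
        threading.Thread(target=producer, args=(q, n)),
        threading.Thread(target=consumer, args=(q, consumed)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


if __name__ == "__main__":
    print(run_pipeline())  # [0, 1, 2, 3, 4]
```

With a single producer and a single consumer the items arrive in FIFO order. With several consumers, each one would need its own stop marker.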

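4.3 Using a Thread Pool

A thread pool makes threads easier to manage and control, and it avoids the overhead of repeatedly creating and destroying them. Python's concurrent.futures module provides a simple thread pool interface.

Example: using a thread pool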

    from concurrent.futures import ThreadPoolExecutor

    def task(name):
        print(f'Task {name} is running')

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=3) as executor:
            futures = [executor.submit(task, i) for i in range(3)]
            for future in futures:
                future.result()
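Pool tasks can also fail, and the exception only surfaces when future.result() is called. The following sketch shows one way to handle results as they finish with as_completed; the risky task and the run_all helper are hypothetical, and the failure on input 2 is contrived for the demonstration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def risky(n):
    """Hypothetical task that fails for one particular input."""
    if n == 2:
        raise ValueError(f"bad input: {n}")
    return n * 10


def run_all(numbers, max_workers=3):
    """Run risky on every number and return (results, errors) keyed by input."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_input = {executor.submit(risky, n): n for n in numbers}
        # as_completed yields each future as soon as it finishes.
        for future in as_completed(future_to_input):
            n = future_to_input[future]
            try:
                results[n] = future.result()  # re-raises the task's exception
            except ValueError as exc:
                errors[n] = str(exc)
    return results, errors


if __name__ == "__main__":
    results, errors = run_all(range(4))
    print("results:", results)
    print("errors:", errors)
```

Keeping a mapping from each future to its input makes it possible to report which task failed, since futures complete in no particular order.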

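5. Another Comprehensive Example

The following example simulates a multithreaded file downloader. It uses a thread pool to manage the download threads and a lock so that concurrent downloads do not corrupt shared state.

File downloader example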

    import threading
    from concurrent.futures import ThreadPoolExecutor

    import requests

    class FileDownloader:
        def __init__(self, urls, num_threads):
            self.urls = urls
            self.num_threads = num_threads
            self.download_lock = threading.Lock()  # protects file writes and the list
            self.downloaded_files = []

        def download_file(self, url):
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()  # treat HTTP errors as failed downloads
                filename = url.split('/')[-1]
                with self.download_lock:
                    with open(filename, 'wb') as f:
                        f.write(response.content)
                    self.downloaded_files.append(filename)
                    print(f'Downloaded: {filename}')
            except Exception as e:
                print(f'Failed to download {url}: {e}')

        def start_downloading(self):
            with ThreadPoolExecutor(max_workers=self.num_threads) as executor:
                # Leaving the with block waits for every download to finish.
                executor.map(self.download_file, self.urls)

    if __name__ == "__main__":
        urls = [
            'https://example.com/file1.txt',
            'https://example.com/file2.txt',
            'https://example.com/file3.txt'
        ]
        downloader = FileDownloader(urls, num_threads=3)
        downloader.start_downloading()
        print("Downloaded files:", downloader.downloaded_files)

Output (assuming all three placeholder URLs resolve to real files; the download order may vary)

    Downloaded: file1.txt
    Downloaded: file2.txt
    Downloaded: file3.txt
    Downloaded files: ['file1.txt', 'file2.txt', 'file3.txt']

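6. Summary

This article covered Python's threading module, including creating threads, synchronizing them, and using thread pools, and showed through several examples how these techniques apply in practice. With this material you should be able to write multithreaded Python programs with more confidence.

Multithreading can improve a program's concurrency considerably, but it also introduces new problems. When using threads, take care to avoid deadlock, restrict access to shared resources, and prefer a thread pool for managing and controlling threads.

I hope this article helps you understand multithreaded programming in Python. If you have questions or suggestions, feel free to leave a comment.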

Author: 我再BUG界嘎嘎乱杀. Published in the Python category of the InfoQ writing community (InfoQ 写作社区).