Processing list data with threading in Python

Posted: 2019-07-18 22:19:48

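Requirement: after pulling several hundred thousand rows from a bank database, each row needed some processing. Doing this through a pandas DataFrame turned out to be too slow, so the data was split into segments and each segment was processed in its own thread.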

Below is a test version of the code, which uses slices of a list to simulate a pandas DataFrame.

# -*- coding: utf-8 -*-
# (C) Guangcai Ren <renguangcai@jiaaocap.com>
# All rights reserved
# create time '2019/6/26 14:41'
import math
import random
import time
from threading import Thread

_result_list = []


def split_df():
    # List of started threads
    thread_list = []
    # Data to process
    _l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    # Number of items handled by each thread
    split_count = 2
    # Number of threads needed
    times = math.ceil(len(_l) / split_count)
    count = 0
    for item in range(times):
        _list = _l[count: count + split_count]
        # Create the thread; the function and its arguments are passed separately
        thread = Thread(target=work, args=(item, _list,))
        thread_list.append(thread)
        # Run the task in a child thread
        thread.start()
        count += split_count

    # Synchronize: wait for every child thread to finish before the main thread continues
    for _item in thread_list:
        _item.join()


def work(df, _list):
    """Task run by each thread; makes the program sleep for a random number of seconds.

    :param df: index of the segment handled by this thread
    :param _list: the segment of data to process
    :return: None; the segment index is appended to _result_list
    """
    sleep_time = random.randint(1, 5)
    print(f'count is {df},sleep {sleep_time},list is {_list}')
    time.sleep(sleep_time)
    _result_list.append(df)


def use():
    split_df()


if __name__ == '__main__':
    use()
    print(len(_result_list), _result_list)

The output looks like this:

[Image: screenshot of the program output]

Points to note:

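In a real project, the _result_list from this script should be created inside a function rather than directly in the route class; otherwise the data from multiple requests will contaminate each other.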
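A minimal sketch of keeping the result list local to each call, so that one request can never see another request's data. The names process_chunks and _work are hypothetical, and summing each segment is only a stand-in for the real per-row processing.

```python
from threading import Thread


def _work(index, chunk, results):
    # Stand-in for the real per-segment processing (hypothetical).
    results.append((index, sum(chunk)))


def process_chunks(data, chunk_size):
    # A fresh list is created on every call, so nothing is shared
    # between calls (or between web requests that trigger them).
    results = []
    threads = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        thread = Thread(target=_work, args=(index, chunk, results))
        threads.append(thread)
        thread.start()
    for thread in threads:
        thread.join()
    # Threads may finish in any order, so sort by segment index.
    return sorted(results)
```

Because the list is passed to the worker as an argument, calling process_chunks twice gives two independent results instead of one ever-growing module-level list.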

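When defining the thread task, as in thread = Thread(target=work, args=(item, _list,)), pass the work function and its arguments separately. Writing target=work(item, _list) would call work immediately in the current thread, so nothing would actually run concurrently.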
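The difference can be observed by recording which thread runs the task. This is only an illustrative sketch; demo and the tag strings are hypothetical names, not part of the original script.

```python
import threading
from threading import Thread


def demo():
    caller = threading.current_thread().name
    ran_in = {}

    def work(tag):
        # Record the name of the thread that actually executed the task.
        ran_in[tag] = threading.current_thread().name

    # Correct: the function object and its arguments are passed separately,
    # so work() runs inside the new thread.
    t1 = Thread(target=work, args=("separate",))
    t1.start()
    t1.join()

    # Mistake: work("called") runs right here in the calling thread, and the
    # Thread receives its return value (None) as target, so it does nothing.
    t2 = Thread(target=work("called"))
    t2.start()
    t2.join()

    return caller, ran_in
```

After calling demo(), the entry for "separate" holds the name of a worker thread, while the entry for "called" holds the name of the thread that called demo().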

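Take care not to start too many threads: the code above starts one thread per segment, so the thread count grows as the data gets larger or split_count gets smaller.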
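One way to cap the thread count is to hand the segments to a fixed-size pool instead of starting one Thread per segment. The sketch below swaps in the standard library's concurrent.futures.ThreadPoolExecutor, which the original script does not use; the process and work names, the doubling step, and the default of 4 workers are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def work(chunk):
    # Stand-in for the real per-segment processing (hypothetical).
    return [value * 2 for value in chunk]


def process(data, chunk_size, max_workers=4):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # At most max_workers threads exist, however many segments there are.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order the segments were submitted.
        return list(pool.map(work, chunks))
```

The pool also returns each task's result directly, so no shared result list is needed in this variant.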


Source: https://www.cnblogs.com/rgcLOVEyaya/p/RGC_LOVE_YAYA_1103_3days.html
