A Small Python Trick for Splitting Data

2024-03-20 11:07:00 by wst

Python tips

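When processing data, we often need to split a large dataset into shards so that the pieces can be processed in multiple processes.

A couple of days ago I came across a neat data-splitting trick. I'm sharing it here, with a detailed explanation below.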

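```python
# Samples
samples = list(range(10820))
# Number of shards
num_shards = 8
# Split the samples: shard i takes every num_shards-th sample, starting at index i
shards = [samples[i::num_shards] for i in range(num_shards)]
for idx, shard in enumerate(shards):
    print(f"shard[{idx}] count:{len(shard)}")
# Prints 1353 for shard[0]..shard[3] and 1352 for shard[4]..shard[7]:
# 10820 = 8 * 1352 + 4, so the 4 leftover samples go to the first 4 shards.
```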
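The whole point of sharding is to run the shards in parallel. Below is a minimal sketch of handing each shard to its own worker process with the standard library's multiprocessing.Pool. The functions split_into_shards and process_shard are hypothetical names I made up for this example. process_shard only sums its shard, standing in for real work such as loading images.

```python
from multiprocessing import Pool


def split_into_shards(samples, num_shards):
    """Strided split: shard i gets samples[i], samples[i + num_shards], ..."""
    return [samples[i::num_shards] for i in range(num_shards)]


def process_shard(shard):
    # Hypothetical placeholder for real per-shard work (e.g. loading images).
    return sum(shard)


if __name__ == "__main__":
    samples = list(range(10820))
    num_shards = 8
    shards = split_into_shards(samples, num_shards)
    # One worker process per shard; pool.map returns results in shard order.
    with Pool(processes=num_shards) as pool:
        results = pool.map(process_shard, shards)
    # Every sample lands in exactly one shard, so the partial sums add up.
    print(sum(results) == sum(samples))  # True
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module (the "spawn" start method). Without it, each worker would try to create its own pool.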

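Explanation:

1. samples holds the loaded samples. In a real scenario it might be the paths of 1 million images.

2. num_shards is the number of shards, i.e. how many parts the data is split into.

3. samples[i::num_shards] is list slicing with start index i and step num_shards. Shard i gets samples[i], samples[i + num_shards], samples[i + 2 * num_shards], and so on to the end of the list. Because i runs from 0 to num_shards - 1, every index belongs to exactly one shard.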
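To see what the slice in step 3 selects, here is a small sketch with 10 samples and 3 shards. It checks that the sample at index j goes to shard j % num_shards, and that together the shards contain every sample exactly once. The helper name strided_shards is my own; it does not appear in the original code.

```python
def strided_shards(samples, num_shards):
    # Shard i starts at index i and takes every num_shards-th element.
    return [samples[i::num_shards] for i in range(num_shards)]


samples = list(range(10))
shards = strided_shards(samples, 3)
print(shards)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

# The sample at index j lands in shard j % num_shards
# (here each value equals its own index, so we can check the values directly).
for idx, shard in enumerate(shards):
    assert all(j % 3 == idx for j in shard)

# Together the shards contain every sample exactly once.
assert sorted(x for shard in shards for x in shard) == samples
```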

Q: Why split the list this way? Anyone know?
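One possible answer, offered as my own reading rather than a confirmed one: the strided slice needs no boundary arithmetic, and it always produces shards whose sizes differ by at most one. The obvious alternative is to cut contiguous blocks of ceil(n / num_shards) samples, which can leave the last shards short or even empty. The strided split also interleaves the data, so if the list is sorted (for example by directory), each shard still gets a mix from the whole list. The sketch below compares the two approaches. contiguous_shards is a hypothetical helper written only for this comparison.

```python
import math


def strided_shards(samples, num_shards):
    # Interleaved split: shard i = samples[i], samples[i + num_shards], ...
    return [samples[i::num_shards] for i in range(num_shards)]


def contiguous_shards(samples, num_shards):
    # Hypothetical naive alternative: consecutive blocks of ceil(n / num_shards) items.
    size = math.ceil(len(samples) / num_shards)
    return [samples[i * size:(i + 1) * size] for i in range(num_shards)]


samples = list(range(9))
print([len(s) for s in contiguous_shards(samples, 4)])  # [3, 3, 3, 0]
print([len(s) for s in strided_shards(samples, 4)])     # [3, 2, 2, 2]
print(strided_shards(samples, 4))  # [[0, 4, 8], [1, 5], [2, 6], [3, 7]]
```

You may need contiguous shards instead, for example to keep neighbouring samples together. In that case, a balanced variant gives the first n % num_shards shards one extra item each. numpy.array_split behaves this way for arrays.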
