Question



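I have a list of arbitrary length, and I need to split it up into equal-size chunks and operate on it. There are some obvious ways to do this, like keeping a counter and two lists: when the second list fills up, add it to the first list and empty the second list for the next round of data. But this is potentially extremely expensive.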


I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.


I was looking for something useful in itertools, but I couldn't find anything obviously useful. I might have missed it, though.


Related question: What is the most "pythonic" way to iterate over a list in chunks?

Best answer


Here's a generator that yields the chunks you want:


def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]





import pprint
pprint.pprint(list(chunks(list(range(10, 75)), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]





If you're using Python 2, you should use xrange() instead of range():


def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in xrange(0, len(l), n):
        yield l[i:i + n]





You can also simply use a list comprehension instead of writing a function. Python 3:


[l[i:i + n] for i in range(0, len(l), n)]


Python 2 version:


[l[i:i + n] for i in xrange(0, len(l), n)]
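These versions rely only on slicing, so each chunk has the same type as the input: lists give lists, strings give strings, tuples give tuples. In Python 3 a `range` gives `range` chunks, which is why the example above wraps the range in `list()`. Here is a quick Python 3 check using the same `chunks` generator. The sample inputs are only illustrative:

```python
def chunks(l, n):
    """Yield successive n-sized chunks from l (any sliceable sequence)."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

# Slicing preserves the sequence type of the input.
print(list(chunks("abcdefg", 3)))        # ['abc', 'def', 'g']
print(list(chunks((1, 2, 3, 4, 5), 2)))  # [(1, 2), (3, 4), (5,)]
print(list(chunks(range(5), 2)))         # [range(0, 2), range(2, 4), range(4, 5)]
```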

Other answer 1


If you want something super simple:


def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in xrange(0, len(l), n))
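The `n = max(1, n)` line guards against a chunk size that is not positive. `range()` raises `ValueError` for a step of 0, and a negative step would silently produce no chunks at all. Below is a Python 3 sketch of the same function, with `range` in place of `xrange`, showing the clamp in action:

```python
def chunks(l, n):
    n = max(1, n)  # clamp so range() never gets a step of 0 or a negative step
    return (l[i:i + n] for i in range(0, len(l), n))

print(list(chunks([1, 2, 3], 0)))  # [[1], [2], [3]]

# Without the clamp, range() itself rejects a step of 0:
try:
    range(0, 3, 0)
except ValueError as exc:
    print("ValueError:", exc)
```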

Other answer 2


Directly from the (old) Python documentation (recipes for itertools):


from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)


The current version, as suggested by J.F.Sebastian:


#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)


I guess Guido's time machine works - worked - will work - will have worked - was working again.


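These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin over "each" iterator. Because this is the same iterator, every such call advances it, so each zip round-robin produces one tuple of n items.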
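Here is a small Python 3 demonstration of that point. The list holds n references to one iterator object, so each tuple that `zip_longest` builds pulls n consecutive items from it. The variable names are only illustrative:

```python
from itertools import zip_longest

it = iter("abcdefg")
iterators = [it] * 3

# All three list entries are the very same iterator object.
print(all(x is it for x in iterators))  # True

# For each tuple, zip_longest calls next() on entries 0, 1, 2 in turn,
# so consecutive items land in the same tuple; the tail is padded.
print(list(zip_longest(*iterators, fillvalue="x")))
# [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'x', 'x')]
```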

Other answer 3


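I know this is kind of old, but I don't know why nobody mentioned numpy.array_split. Note that its second argument is the number of pieces, not the chunk size: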


In [24]: import numpy as np
In [25]: lst = range(50)
In [26]: np.array_split(lst, 5)
Out[26]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
 array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
 array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
 array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]

Other answer 4


Here's a generator that works on arbitrary iterables:


import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))


Example:


>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]
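`split_seq` only calls `iter()` and `islice()`. It never needs `len()` or slicing, so it also works on one-shot iterators such as generators. Here is a Python 3 sketch, using a generator expression as sample input:

```python
import itertools

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:  # an empty list means the source is exhausted
        yield item
        item = list(itertools.islice(it, size))

squares = (x * x for x in range(7))  # a generator: no len(), no slicing
print(list(split_seq(squares, 3)))   # [[0, 1, 4], [9, 16, 25], [36]]
```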

Other answer 5


I'm surprised nobody has thought of using iter's two-argument form:[88]


from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())


Demo:


>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
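The two-argument form `iter(callable, sentinel)` calls `callable()` repeatedly and stops as soon as the return value equals `sentinel`. In `chunk`, the callable returns the next `size` items as a tuple, and an empty tuple signals exhaustion. This minimal sketch shows the mechanism on its own first. The `numbers` list is only illustrative:

```python
from itertools import islice

# iter(callable, sentinel): keep calling until the result == sentinel.
numbers = iter([3, 1, 4, 0, 5])
print(list(iter(lambda: next(numbers), 0)))  # [3, 1, 4]  (stops at the 0)

def chunk(it, size):
    it = iter(it)
    # each call returns the next `size` items; () means nothing is left
    return iter(lambda: tuple(islice(it, size)), ())

print(list(chunk("abcdefgh", 3)))  # [('a', 'b', 'c'), ('d', 'e', 'f'), ('g', 'h')]
```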


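This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn't pad. If you want padding, a simple variation on the above will suffice: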


from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)


Demo:


>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]


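Like the izip_longest-based solutions, the above always pads. As far as I know, there's no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close: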


_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval is _no_padding:  # identity check: _no_padding is a unique object
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)


Demo:


>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]


I believe this is the shortest chunker proposed that offers optional padding.
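Check one caveat before relying on the padded variants. The sentinel `(padval,) * size` can also occur in the real data. If a full chunk consists only of pad values, iteration stops early and the remaining items are silently dropped. Here is a short Python 3 demonstration using `chunk_pad` from above, copied so the snippet runs on its own:

```python
from itertools import chain, islice, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

print(list(chunk_pad([1, 2, 3, 4], 2)))        # [(1, 2), (3, 4)]
# The first chunk (None, None) equals the sentinel, so nothing is produced:
print(list(chunk_pad([None, None, 5, 6], 2)))  # []
```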

Other answer 6


def chunk(input, size):
    # Python 2 only: map(None, ...) zips like izip_longest, padding with None
    return map(None, *([iter(input)] * size))

Other answer 7


Simple yet elegant:


l = range(1, 1000)
print [l[x:x+10] for x in xrange(0, len(l), 10)]


Or if you prefer:


chunks = lambda l, n: [l[x: x+n] for x in xrange(0, len(l), n)]
chunks(l, 10)

Other answer 8


I saw the most awesome Python-ish answer in a duplicate of this question:


from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]


You can create an n-tuple for any n. If a = range(1, 15), then the result will be:


[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]


If the list is divided evenly, you can replace zip_longest with zip; otherwise the triplet (13, 14, None) would be lost. The code above uses Python 3. For Python 2, use izip_longest.
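This Python 3 example shows the difference concretely. Plain `zip` stops as soon as the shared iterator runs out partway through a tuple, so the incomplete last group disappears. `zip_longest` keeps it and pads with `None`:

```python
from itertools import zip_longest

a = range(1, 15)  # 14 items: not a multiple of 3

i = iter(a)
print(list(zip(i, i, i)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]  -> 13 and 14 are lost

i = iter(a)
print(list(zip_longest(i, i, i)))
# [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]
```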

Other answer 9


Critique of other answers:



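None of these answers produce evenly sized chunks. They all leave a runt chunk at the end, so they're not completely balanced. If you were using these functions to distribute work, you've built in the prospect of one worker likely finishing well before the others, so it would sit around doing nothing while the others kept working hard.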


For example, the current top answer ends with:


[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]


I just hate that runt at the end!


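Others, like list(grouper(3, xrange(7))) and chunk(xrange(7), 3), both return [(0, 1, 2), (3, 4, 5), (6, None, None)]. The Nones are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.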


Why can't we divide these better?


My solution(s)



Here's a balanced solution, adapted from a function I've used in production (note: in Python 3, replace xrange with range):


def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in xrange(maxbaskets)] # in Python 3 use range
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return list(filter(None, baskets))  # drop empty baskets; list() is needed in Python 3


I created a generator that does the same thing if you put it into a list:


def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in xrange(baskets):
        yield [items[y_i] for y_i in xrange(x_i, item_count, baskets)]


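And finally, since the two functions above distribute elements round-robin rather than keeping them in the contiguous order they were given, here is a version that keeps contiguous runs: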


def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets 
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in xrange(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [next(items) for _ in xrange(length)]


Output



To test them:


print(baskets_from(xrange(6), 8))
print(list(iter_baskets_from(xrange(6), 8)))
print(list(iter_baskets_contiguous(xrange(6), 8)))
print(baskets_from(xrange(22), 8))
print(list(iter_baskets_from(xrange(22), 8)))
print(list(iter_baskets_contiguous(xrange(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(xrange(26), 5))
print(list(iter_baskets_from(xrange(26), 5)))
print(list(iter_baskets_contiguous(xrange(26), 5)))


Which prints out:


[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]


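Notice that the contiguous generator provides chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as a list of discrete elements can be.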
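In Python 3, the same contiguous, balanced split can also be written with `divmod`. With `q, r = divmod(len(seq), k)`, the first `r` pieces get `q + 1` items and the rest get `q`. This is a minimal sketch for sequences that support `len()` and slicing. The name `balanced_chunks` is hypothetical and does not come from the answer above:

```python
def balanced_chunks(seq, k):
    """Split seq into at most k contiguous pieces whose lengths differ by at most 1."""
    k = min(k, len(seq))  # never create empty pieces
    if k <= 0:
        return
    q, r = divmod(len(seq), k)
    start = 0
    for i in range(k):
        size = q + 1 if i < r else q  # the first r pieces take one extra item
        yield seq[start:start + size]
        start += size

print(list(balanced_chunks(list(range(22)), 8)))
# [[0, 1, 2], [3, 4, 5], ..., [15, 16, 17], [18, 19], [20, 21]]
print(list(balanced_chunks("ABCDEFG", 3)))  # ['ABC', 'DE', 'FG']
```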

Other answer 10


more-itertools has a chunking iterator (more_itertools.chunked).[90]


It also has a lot more, including all the recipes in the itertools documentation.

Other answer 11


If your chunk size is 3, for example, you could do:


zip(*[iterable[i::3] for i in range(3)]) 


Source:
http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/[91]


I would use this when my chunk size is a fixed number I can type, e.g. 3, and it would never change.
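The same idea generalizes to any fixed size `n`. `seq[i::n]` takes every n-th item starting at offset `i`, and `zip` transposes those strided slices back into consecutive n-tuples. Because `zip` stops at the shortest slice, an incomplete last group is dropped. This Python 3 sketch assumes `seq` supports slicing, and the helper name is hypothetical:

```python
def strided_groups(seq, n):
    # seq[0::n], seq[1::n], ... act as "columns"; zip turns them back into rows
    return list(zip(*[seq[i::n] for i in range(n)]))

print(strided_groups(list(range(9)), 3))   # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(strided_groups(list(range(10)), 3))  # same result: the trailing 9 is dropped
```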

Other answer 12


If you know the list size:


def SplitList(list, chunk_size):
    return [list[offs:offs+chunk_size] for offs in range(0, len(list), chunk_size)]


If you don't (an iterator):


def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion


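In the latter case, if you can be sure that the sequence always contains a whole number of chunks of the given size (i.e. there is no incomplete last chunk), it can be rephrased in a more beautiful way.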
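For example, when the length is known to be a multiple of the chunk size, the shared-iterator `zip` trick from the other answers does the whole job in one line. This sketch checks that assumption up front instead of silently dropping a partial chunk. It needs `len()`, so it assumes a sequence rather than a bare iterator, and the function name is hypothetical:

```python
def exact_chunks(sequence, chunk_size):
    """Return an iterator of chunk_size-tuples; len(sequence) must be a multiple of chunk_size."""
    if len(sequence) % chunk_size:
        raise ValueError("length is not a multiple of chunk_size")
    it = iter(sequence)
    return zip(*[it] * chunk_size)  # one iterator, repeated chunk_size times

print(list(exact_chunks([1, 2, 3, 4, 5, 6], 3)))  # [(1, 2, 3), (4, 5, 6)]
```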

Other answer 13


A generator expression:


def chunks(seq, n):
    return (seq[i:i+n] for i in xrange(0, len(seq), n))


For example:


print list(chunks(range(1, 1000), 10))

Other answer 14


I like the Python doc version proposed by tzot and J.F.Sebastian a lot, but it has two shortcomings:



  • it is not very explicit

  • I usually don't want a fill value in the last chunk



I use this one a lot in my code:


from itertools import islice

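def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        chunk = tuple(islice(iterable, n))
        if not chunk:  # exhausted: return explicitly, because since Python 3.7
            return     # (PEP 479) a StopIteration escaping a generator is a RuntimeError
        yield chunk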


UPDATE: A lazy chunks version:


from itertools import chain, islice

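def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        try:
            first = next(iterable)
        except StopIteration:  # PEP 479 (Python 3.7+): return instead of leaking StopIteration
            return
        yield chain([first], islice(iterable, n-1))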
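One property of the lazy version is worth knowing. Every chunk is a view onto the same underlying iterator, so each chunk must be fully consumed before the next one is requested. The sketch below shows both the in-order case and what happens when chunks are read out of order. `lazy_chunks` is just a renamed copy of the function above, with the explicit `StopIteration` handling needed on Python 3.7+:

```python
from itertools import chain, islice

def lazy_chunks(n, iterable):
    it = iter(iterable)
    while True:
        try:
            first = next(it)
        except StopIteration:  # PEP 479: return instead of letting StopIteration escape
            return
        yield chain([first], islice(it, n - 1))

# Consumed in order, the chunks come out as expected:
in_order = [list(c) for c in lazy_chunks(3, range(7))]
print(in_order)  # [[0, 1, 2], [3, 4, 5], [6]]

# All chunks share one iterator, so reading them out of order mixes them up:
g = lazy_chunks(3, range(7))
a = next(g)        # holds 0, then "the next 2 items" of the shared iterator
b = next(g)        # pulls 1 immediately, because a was never read
b_items = list(b)  # [1, 2, 3]
a_items = list(a)  # [0, 4, 5]
print(b_items, a_items)
```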

Other answer 15


The toolz library has the partition function for this:[92]


from toolz.itertoolz.core import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]

Other answer 16


At this point, I think we need a recursive generator, just in case...


In Python 2:


def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e


In Python 3:


def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)


Also, in case of a massive alien invasion, a decorated recursive generator might come in handy:


def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

Other answer 17


You can also use the get_chunks function of the utilspie library:[93] [94]


>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]


You can install utilspie via pip:[95]


sudo pip install utilspie


Disclaimer: I am the creator of the utilspie library.[96]

Other answer 18


[AA[i:i+SS] for i in range(len(AA))[::SS]]


AA is the array and SS is the chunk size. For example:


>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3

Other answer 19


I was curious about the performance of the different approaches, and here it is:


Tested on Python 3.5.1


import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:]  # copies the remainder on every pass, hence quadratic time
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, (len(arr) + batch_size - 1) // batch_size):  # ceiling division
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

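from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            first = next(batchiter)
        except StopIteration:  # PEP 479 (Python 3.7+): stop explicitly
            return
        yield chain([first], batchiter)


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x  # note: x is never consumed, so each batch advances the source by only one item
print(time.time() - start)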

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)


Results:


slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844
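Single `time.time()` measurements like these can be noisy. If you want to rerun the comparison, the standard-library `timeit` module repeats each measurement. This minimal sketch times just two of the approaches above, with the sizes copied from the benchmark. Absolute numbers will vary by machine and Python version:

```python
import timeit
from itertools import zip_longest

batch_size = 7
arr = list(range(298937))

def chunks(l, n):
    for i in range(0, len(l), n):
        yield l[i:i + n]

def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)

# Run each full pass 10 times, repeat that 3 times, and keep the best total (seconds).
for name, fn in [("chunks", chunks), ("grouper", grouper)]:
    best = min(timeit.repeat(lambda: sum(1 for _ in fn(arr, batch_size)), number=10, repeat=3))
    print(name, best)
```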

Other answer 20


Code:


def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)


Result:


[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

Other answer 21


def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop


Usage:


seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq

Other answer 22


Hey, a one-line version:


In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]: 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

Other answer 23


Another, more explicit version:


def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists 
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3)) 
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList

Other answer 24


Consider using matplotlib.cbook's pieces function:[97]


For example:


import numpy as np
import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
     print s

Other answer 25


a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4
[a[i*CHUNK:(i+1)*CHUNK] for i in xrange((len(a) + CHUNK - 1) / CHUNK )]
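The expression `(len(a) + CHUNK - 1) / CHUNK` is ceiling division: it counts a partial last chunk as a whole chunk. It relies on Python 2's integer `/`. In Python 3, `/` returns a float, which `range` rejects, so use `//` instead. A Python 3 sketch:

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
CHUNK = 4

# Ceiling division: (9 + 4 - 1) // 4 == 3 chunks, the last one partial.
n_chunks = (len(a) + CHUNK - 1) // CHUNK
print(n_chunks)  # 3
print([a[i * CHUNK:(i + 1) * CHUNK] for i in range(n_chunks)])
# [[1, 2, 3, 4], [5, 6, 7, 8], [9]]
```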

Other answer 26


At this point, I think we need the obligatory anonymous recursive function.


Y = lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args)))
chunks = Y(lambda f: lambda n: [n[0][:n[1]]] + f((n[0][n[1]:], n[1])) if len(n[0]) > 0 else [])

Other answer 27


Without calling len(), which is good for large lists:


def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]


And this is for iterables:


from itertools import islice

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))


A functional flavour of the above:


from itertools import islice, repeat, takewhile

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                            for start in repeat(iter(l))))


Or:


from itertools import imap, islice, repeat  # Python 2; in Python 3 use map and .__next__

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next,())


Or:


from itertools import imap, islice, repeat, takewhile  # Python 2; in Python 3 use map

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool,imap(tuple, continuous_slices))

Other answer 28


def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'

Other answer 29


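I realise this question is old (I stumbled over it on Google), but surely something like the following is far simpler and clearer than any of the huge, complex suggestions, and it only uses slicing: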


def chunker(iterable, chunksize):
    for i,c in enumerate(iterable[::chunksize]):
        yield iterable[i*chunksize:(i+1)*chunksize]

>>> for chunk in chunker(range(0,100), 10):
...     print list(chunk)
... 
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
... etc ...