Iterating over Arrays in NumPy

Source: jb51    Date: 2023/2/20 15:16:30

Iterating over arrays

NumPy provides the nditer object as a flexible way to visit the elements of one or more arrays.

I. Single-array iteration

1. Using nditer to visit every element of an array

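>>> import numpy as np
>>> a = np.arange(12).reshape(3, 4)
>>> for x in np.nditer(a):
...     print(x, end=' ')
...
0 1 2 3 4 5 6 7 8 9 10 11

# The iteration above does not follow a fixed C or Fortran order. By default
# (order='K'), nditer visits the elements in the order that matches the array's
# memory layout, which makes access more efficient. For a C-contiguous array
# such as a, that happens to be row-major order (C order).
# This reflects the idea that, by default, you only need to visit every element
# and do not care about a particular order.
# Iterating over the transpose of the array shows this, especially when compared
# with iterating over a C-order copy of that transpose:
>>> for x in np.nditer(a.T):
...     print(x, end=' ')
...
0 1 2 3 4 5 6 7 8 9 10 11

>>> for x in np.nditer(a.T.copy(order='C')):
...     print(x, end=' ')
...
0 4 8 1 5 9 2 6 10 3 7 11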
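
As a minimal sketch of the point above, the memory-layout flags of the transpose can be inspected directly, and order='C' appears to give the same traversal as the C-order copy without copying anything. The names t, default_order and c_order are illustrative and not part of the original article.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
t = a.T  # a view on the same memory as a

# The transpose shares a's buffer: it is F-contiguous, not C-contiguous.
print(t.flags['C_CONTIGUOUS'], t.flags['F_CONTIGUOUS'])

# Default order ('K') follows memory, so it matches iterating over a itself.
default_order = [int(x) for x in np.nditer(t)]

# Forcing C order walks t row by row, as the C-order copy does.
c_order = [int(x) for x in np.nditer(t, order='C')]

print(default_order)
print(c_order)
```

Forcing the order on the view avoids the extra allocation made by a.T.copy(order='C'), at the cost of a less cache-friendly traversal.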

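2. Controlling the iteration order

The order parameter controls the order in which elements are visited. Its possible values are:

'C': C order, i.e. row-major;
'F': Fortran order, i.e. column-major;
'K': as close as possible to the order in which the elements are stored in memory (the default);
'A': 'F' order if all the arrays are Fortran-contiguous, 'C' order otherwise.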

>>> a = np.arange(12).reshape(3, 4)
>>> for x in np.nditer(a, order='C'):
...     print(x, end=' ')
...
0 1 2 3 4 5 6 7 8 9 10 11

>>> a = np.arange(12).reshape(3, 4)
>>> for x in np.nditer(a, order='F'):
...     print(x, end=' ')
...
0 4 8 1 5 9 2 6 10 3 7 11

>>> a = np.arange(12).reshape(3, 4)
>>> for x in np.nditer(a, order='K'):
...     print(x, end=' ')
...
0 1 2 3 4 5 6 7 8 9 10 11

>>> a = np.arange(12).reshape(3, 4)
>>> for x in np.nditer(a, order='A'):
...     print(x, end=' ')
...
0 1 2 3 4 5 6 7 8 9 10 11
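
The examples above all use a C-contiguous array, so 'K' and 'A' look identical to 'C'. The following sketch repeats the comparison on a Fortran-contiguous copy, where the difference should become visible. The helper visit and the array f are hypothetical names introduced only for this illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # C-contiguous
f = np.asfortranarray(a)          # same values, Fortran-contiguous storage


def visit(arr, order):
    """Return the elements of arr in the order nditer visits them."""
    return [int(x) for x in np.nditer(arr, order=order)]


# Print the visiting order for both layouts under every order option.
for order in ('C', 'F', 'K', 'A'):
    print(order, visit(a, order), visit(f, order))
```

For f, 'K' and 'A' both follow the column-major storage, while 'C' and 'F' force the same order regardless of layout.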

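3. Modifying array values

When an array is iterated with nditer, its elements are read-only by default. To modify the array, pass op_flags=['readwrite'] or op_flags=['writeonly'] to mark the operand as read-write or write-only.

The iterator then yields writable arrays that can be modified in place. Because a writable operand may be backed by a temporary buffer, the iterator must be told when iteration has finished so that the modified data can be written back to the original location. There are two ways to do this:

Use the iterator as a context manager in a with statement;
Call the iterator's close method once iteration has finished.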

>>> a = np.arange(12).reshape(3, 4)
>>> print(a)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
>>> with np.nditer(a, op_flags=['readwrite']) as it:
...     for x in it:
...         x += 10  # in-place update of the 0-d array x
...
>>> print(a)
[[10 11 12 13]
 [14 15 16 17]
 [18 19 20 21]]
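
The second option, calling close explicitly, can be sketched as follows. The example also tries the same assignment on a default (read-only) iterator to show that it is rejected. The flag read_only_failed is an illustrative name, and the exception type caught here is the one NumPy is expected to raise for writes to a read-only array.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Without a context manager, close the iterator explicitly when done.
it = np.nditer(a, op_flags=['readwrite'])
for x in it:
    # Write through the 0-d view; "x = 2 * x" would only rebind the name x.
    x[...] = 2 * x
it.close()
print(a)

# With the default read-only flag, the same kind of assignment is rejected.
read_only_failed = False
for x in np.nditer(a):
    try:
        x[...] = 0
    except ValueError:
        read_only_failed = True
    break
print(read_only_failed)
```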

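4. Using an external loop and tracking an index or multi-index

All the iterations so far proceed one element at a time, which is simple but inefficient. The flags parameter can make nditer hand out larger chunks instead, and forcing C or Fortran order changes the size of those chunks.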

# By default the native memory order is kept, so the iterator provides a single one-dimensional array.
# With 'external_loop', the values yielded are one-dimensional arrays holding several elements rather than zero-dimensional arrays.
>>> a = np.arange(12).reshape(3, 4)
>>> print(a)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
>>> for x in np.nditer(a, flags=['external_loop']):
...     print(x, end=' ')
...
[ 0  1  2  3  4  5  6  7  8  9 10 11]

# Force 'F' order
>>> a = np.arange(12).reshape(3, 4)
>>> print(a)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
>>> for x in np.nditer(a, flags=['external_loop'], order='F'):
...     print(x, end=' ')
...
[0 4 8] [1 5 9] [ 2  6 10] [ 3  7 11]
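
To make the chunk sizes explicit, the sketch below records the shape of every chunk and then sums the array chunk by chunk, so that the vectorized sum does the inner loop. The names k_shapes, f_shapes and total are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Default order on a C-contiguous array: one chunk holding all 12 elements.
k_shapes = [chunk.shape for chunk in np.nditer(a, flags=['external_loop'])]

# Forced Fortran order: one chunk per column, each holding 3 elements.
f_shapes = [chunk.shape
            for chunk in np.nditer(a, flags=['external_loop'], order='F')]

# Sum the array one chunk at a time instead of one element at a time.
total = 0
for chunk in np.nditer(a, flags=['external_loop'], order='F'):
    total += int(chunk.sum())

print(k_shapes, f_shapes, total)
```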
 
# With 'c_index', it.index tracks the index in 'C' order
>>> a = np.arange(12).reshape(3, 4)
>>> print(a)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
>>> it = np.nditer(a, flags=['c_index'])
>>> for x in it:
...     print("{}: ({})".format(x, it.index))
...
0: (0)
1: (1)
2: (2)
3: (3)
4: (4)
5: (5)
6: (6)
7: (7)
8: (8)
9: (9)
10: (10)
11: (11)
 
# With 'f_index', it.index tracks the index in 'F' order
>>> a = np.arange(12).reshape(3, 4)
>>> print(a)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
>>> it = np.nditer(a, flags=['f_index'])
>>> for x in it:
...     print("{}: ({})".format(x, it.index))
...
0: (0)
1: (3)
2: (6)
3: (9)
4: (1)
5: (4)
6: (7)
7: (10)
8: (2)
9: (5)
10: (8)
11: (11)
 
# With 'multi_index', it.multi_index tracks the array index
>>> a = np.arange(12).reshape(3, 4)
>>> print(a)
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
>>> it = np.nditer(a, flags=['multi_index'])
>>> for x in it:
...     print("{}: {}".format(x, it.multi_index))
...
0: (0, 0)
1: (0, 1)
2: (0, 2)
3: (0, 3)
4: (1, 0)
5: (1, 1)
6: (1, 2)
7: (1, 3)
8: (2, 0)
9: (2, 1)
10: (2, 2)
11: (2, 3)

external_loop cannot be combined with multi_index, c_index, or f_index; doing so raises ValueError: Iterator flag EXTERNAL_LOOP cannot be used if an index or multi-index is being tracked
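
A tracked multi-index is mostly useful when the value to write depends on the position. The sketch below, which assumes an illustrative 2x3 array and the made-up encoding 10 * row + column, fills an array from its indices and then checks that combining the flags is rejected as described.

```python
import numpy as np

a = np.zeros((2, 3), dtype=np.int64)

# Write a value derived from the position of each element.
with np.nditer(a, flags=['multi_index'], op_flags=['writeonly']) as it:
    for x in it:
        i, j = it.multi_index
        x[...] = 10 * i + j
print(a)

# Combining external_loop with index tracking is not allowed.
conflict_rejected = False
try:
    np.nditer(a, flags=['external_loop', 'multi_index'])
except ValueError as exc:
    conflict_rejected = True
    print(exc)
```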

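5. Iterating with a specific data type

There are two ways to iterate over an array as a different data type:

Temporary copy: a copy of the array is created with the new data type and the iteration runs over that copy. This can consume a large amount of memory.
Buffered mode: the iterator converts the data through a buffer, which supports flexible inputs with minimal memory overhead.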

# Temporary copy
# (a.dtype is platform-dependent: int32 here, typically int64 on 64-bit Linux and macOS)
>>> a = np.arange(12).reshape(3, 4)
>>> print(a.dtype)
int32
>>> it = np.nditer(a, op_flags=['readonly', 'copy'], op_dtypes=[np.float64])
>>> for x in it:
...     print("{}".format(x), end=', ')
...
0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,

# Buffered mode
>>> a = np.arange(12).reshape(3, 4)
>>> print(a.dtype)
int32
>>> it = np.nditer(a, flags=['buffered'], op_dtypes=[np.float64])
>>> for x in it:
...     print("{}".format(x), end=', ')
...
0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0,

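Note
By default, conversions follow the 'safe' casting rule. A conversion that NumPy's casting rules do not permit raises an exception, for example: TypeError: Iterator operand 0 dtype could not be cast from dtype('float64') to dtype('float32') according to the rule 'safe'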
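
The following sketch reproduces that situation with a float64 array and then relaxes the rule through nditer's casting parameter, which the article does not cover. The names safe_ok, values and seen_dtypes are illustrative.

```python
import numpy as np

a = np.arange(6.0)  # float64

# float64 -> float32 can lose precision, so the default 'safe' rule rejects it.
try:
    np.nditer(a, flags=['buffered'], op_dtypes=[np.float32])
    safe_ok = True
except TypeError:
    safe_ok = False

# 'same_kind' permits casts within one kind, such as float64 -> float32.
values = []
seen_dtypes = set()
it = np.nditer(a, flags=['buffered'], op_dtypes=[np.float32],
               casting='same_kind')
for x in it:
    values.append(float(x))
    seen_dtypes.add(x.dtype)

print(safe_ok, values, seen_dtypes)
```

Relaxing the rule is a deliberate trade-off: it accepts a possible loss of precision in exchange for the requested data type.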

II. Iterating over broadcast arrays

If arrays of different shapes are broadcastable, nditer can iterate over them together.

>>> a = np.arange(3)
>>> b = np.arange(6).reshape(2, 3)
>>> for x, y in np.nditer([a, b]):
...     print("%d:%d" % (x, y), end=' ')
...
0:0 1:1 2:2 0:3 1:4 2:5
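
Broadcast iteration becomes more useful when a result is written as well as read. As a rough sketch, passing None as a third operand asks nditer to allocate an output array with the broadcast shape; this feature is not covered in the article. The example also shows what happens when the shapes cannot be broadcast. The names result and broadcast_failed are illustrative.

```python
import numpy as np

a = np.arange(3)
b = np.arange(6).reshape(2, 3)

# None requests an output operand allocated with the broadcast shape (2, 3).
with np.nditer([a, b, None]) as it:
    for x, y, out in it:
        out[...] = x + y
    result = it.operands[2]
print(result)

# Shapes (2,) and (2, 3) are not broadcastable, so construction fails.
broadcast_failed = False
try:
    np.nditer([np.arange(2), b])
except ValueError:
    broadcast_failed = True
print(broadcast_failed)
```

In practice a + b computes the same result far more simply; the explicit iterator is only worth using when the per-element work cannot be written as a vectorized expression.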