torch.cuda.comm.reduce_add

torch.cuda.comm.reduce_add(inputs, destination=None)

Sum tensors from multiple GPUs.

All inputs should have matching shapes, dtype, and layout. The output tensor will be of the same shape, dtype, and layout.

Parameters
    inputs (Iterable[Tensor]) – an iterable of tensors to add.
    destination (int, optional) – a device on which the output will be placed (default: current device).

Returns
    A tensor containing an elementwise sum of all inputs, placed on the destination device.
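The following is a minimal usage sketch, not an official example. It assumes a machine with at least two visible CUDA devices and does nothing otherwise. The helper name `sum_across_gpus`, the tensor shape, and the fill values are hypothetical and chosen only for illustration. The sketch places the result on GPU 0, which also holds one of the inputs.

```python
import torch
import torch.cuda.comm as comm


def sum_across_gpus(shape=(2, 3)):
    """Put one tensor on each visible GPU and sum them onto GPU 0.

    Hypothetical helper for illustration. Returns None when fewer than
    two CUDA devices are visible, otherwise (sum_tensor, number_of_gpus).
    """
    n_gpus = torch.cuda.device_count()
    if n_gpus < 2:
        return None

    # One tensor per device, all with the same shape and dtype.
    # The tensor on GPU i is filled with the value i + 1.
    inputs = [
        torch.full(shape, float(i + 1), device=f"cuda:{i}")
        for i in range(n_gpus)
    ]

    # Ask for the elementwise sum to be placed on GPU 0.
    total = comm.reduce_add(inputs, destination=0)
    return total, n_gpus


result = sum_across_gpus()
if result is None:
    print("Fewer than two CUDA devices visible; skipping the demo.")
else:
    total, n_gpus = result
    print("summed", n_gpus, "tensors; result lives on", total.device)
    print(total)
```

With n GPUs, every element of the result should equal 1 + 2 + ... + n, since each input is a constant tensor. Passing `destination` explicitly avoids depending on whichever device happens to be current at the call site.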