PyTorch documentation¶
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
torch¶
The torch package contains data structures for multi-dimensional tensors and defines mathematical operations over these tensors. Additionally, it provides many utilities for efficient serialization of Tensors and arbitrary types, and other useful utilities.
It has a CUDA counterpart that enables you to run your tensor computations on an NVIDIA GPU with compute capability >= 3.0.
Tensors¶
torch.is_tensor(obj)¶
Returns True if obj is a PyTorch tensor.

Parameters
- obj (Object) – Object to test
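An illustrative check (a minimal sketch, not part of the original reference):

>>> torch.is_tensor(torch.zeros(2, 3))
True
>>> torch.is_tensor([1, 2, 3])  # a plain Python list is not a tensor
False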
 
torch.is_storage(obj)¶
Returns True if obj is a PyTorch storage object.

Parameters
- obj (Object) – Object to test
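A quick sketch of the distinction between a tensor and its storage (illustrative, not part of the original reference):

>>> t = torch.zeros(3)
>>> torch.is_storage(t)
False
>>> torch.is_storage(t.storage())  # the underlying storage object
True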
 
torch.is_floating_point(tensor) -> (bool)¶
Returns True if the data type of tensor is a floating point data type, i.e., one of torch.float64, torch.float32 and torch.float16.

Parameters
- tensor (Tensor) – the PyTorch tensor to test
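For illustration (a minimal sketch, not part of the original reference):

>>> torch.is_floating_point(torch.zeros(2))  # default dtype is torch.float32
True
>>> torch.is_floating_point(torch.zeros(2, dtype=torch.int64))
False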
 
torch.set_default_dtype(d)¶
Sets the default floating point dtype to d. This type will be used as the default floating point type for type inference in torch.tensor().

The default floating point dtype is initially torch.float32.

Parameters
- d (torch.dtype) – the floating point dtype to make the default

Example:

>>> torch.tensor([1.2, 3]).dtype           # initial default for floating point is torch.float32
torch.float32
>>> torch.set_default_dtype(torch.float64)
>>> torch.tensor([1.2, 3]).dtype           # a new floating point tensor
torch.float64
torch.get_default_dtype() → torch.dtype¶
Get the current default floating point torch.dtype.

Example:

>>> torch.get_default_dtype()  # initial default for floating point is torch.float32
torch.float32
>>> torch.set_default_dtype(torch.float64)
>>> torch.get_default_dtype()  # default is now changed to torch.float64
torch.float64
>>> torch.set_default_tensor_type(torch.FloatTensor)  # setting tensor type also affects this
>>> torch.get_default_dtype()  # changed to torch.float32, the dtype for torch.FloatTensor
torch.float32
torch.set_default_tensor_type(t)¶
Sets the default torch.Tensor type to floating point tensor type t. This type will also be used as the default floating point type for type inference in torch.tensor().

The default floating point tensor type is initially torch.FloatTensor.

Parameters
- t (type or string) – the floating point tensor type or its name

Example:

>>> torch.tensor([1.2, 3]).dtype    # initial default for floating point is torch.float32
torch.float32
>>> torch.set_default_tensor_type(torch.DoubleTensor)
>>> torch.tensor([1.2, 3]).dtype    # a new floating point tensor
torch.float64
torch.numel(input) → int¶
Returns the total number of elements in the input tensor.

Parameters
- input (Tensor) – the input tensor

Example:

>>> a = torch.randn(1, 2, 3, 4, 5)
>>> torch.numel(a)
120
>>> a = torch.zeros(4, 4)
>>> torch.numel(a)
16
torch.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, profile=None, sci_mode=None)¶
Set options for printing. Items shamelessly taken from NumPy.

Parameters
- precision – Number of digits of precision for floating point output (default = 4).
- threshold – Total number of array elements which trigger summarization rather than full repr (default = 1000).
- edgeitems – Number of array items in summary at beginning and end of each dimension (default = 3).
- linewidth – The number of characters per line for the purpose of inserting line breaks (default = 80). Thresholded matrices will ignore this parameter.
- profile – Sane defaults for pretty printing. Can override with any of the above options. (any one of default, short, full)
- sci_mode – Enable (True) or disable (False) scientific notation. If None (default) is specified, the value is defined by _Formatter.
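A short usage sketch (illustrative, not part of the original reference):

>>> torch.set_printoptions(precision=2)
>>> torch.tensor([1.23456789])
tensor([1.23])
>>> torch.set_printoptions(profile='default')  # restore the defaults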
 
 
torch.set_flush_denormal(mode) → bool¶
Enables or disables flushing of denormal floating point numbers on the CPU.

Returns True if your system supports flushing denormal numbers and it successfully configures flush denormal mode. set_flush_denormal() is only supported on x86 architectures supporting SSE3.

Parameters
- mode (bool) – Controls whether to enable flush denormal mode or not

Example:

>>> torch.set_flush_denormal(True)
True
>>> torch.tensor([1e-323], dtype=torch.float64)
tensor([ 0.], dtype=torch.float64)
>>> torch.set_flush_denormal(False)
True
>>> torch.tensor([1e-323], dtype=torch.float64)
tensor(9.88131e-324 *
       [ 1.0000], dtype=torch.float64)
Creation Ops¶
Note
Random sampling creation ops are listed under Random sampling and
include:
torch.rand()
torch.rand_like()
torch.randn()
torch.randn_like()
torch.randint()
torch.randint_like()
torch.randperm()
You may also use torch.empty() with the In-place random sampling
methods to create torch.Tensor objects with values sampled from a broader
range of distributions; a brief sketch follows.
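For instance (an illustrative sketch, not part of the original reference; the printed values are representative):

>>> torch.empty(3).uniform_(0, 10)  # fill an uninitialized tensor in place from U(0, 10)
tensor([ 7.9394,  8.3031,  3.4016])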
torch.tensor(data, dtype=None, device=None, requires_grad=False) → Tensor¶
Constructs a tensor with data.

Warning
torch.tensor() always copies data. If you have a Tensor data and want to avoid a copy, use torch.Tensor.requires_grad_() or torch.Tensor.detach(). If you have a NumPy ndarray and want to avoid a copy, use torch.as_tensor().

Warning
When data is a tensor x, torch.tensor() reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore torch.tensor(x) is equivalent to x.clone().detach() and torch.tensor(x, requires_grad=True) is equivalent to x.clone().detach().requires_grad_(True). The equivalents using clone() and detach() are recommended.

Parameters
- data (array_like) – Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, infers data type from data.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]])
tensor([[ 0.1000,  1.2000],
        [ 2.2000,  3.1000],
        [ 4.9000,  5.2000]])

>>> torch.tensor([0, 1])  # Type inference on data
tensor([ 0,  1])

>>> torch.tensor([[0.11111, 0.222222, 0.3333333]],
...              dtype=torch.float64,
...              device=torch.device('cuda:0'))  # creates a torch.cuda.DoubleTensor
tensor([[ 0.1111,  0.2222,  0.3333]], dtype=torch.float64, device='cuda:0')

>>> torch.tensor(3.14159)  # Create a scalar (zero-dimensional tensor)
tensor(3.1416)

>>> torch.tensor([])  # Create an empty tensor (of size (0,))
tensor([])
torch.sparse_coo_tensor(indices, values, size=None, dtype=None, device=None, requires_grad=False) → Tensor¶
Constructs a sparse tensor in COO(rdinate) format with non-zero elements at the given indices with the given values. A sparse tensor can be uncoalesced, in which case there are duplicate coordinates in the indices, and the value at that index is the sum of all duplicate value entries (see torch.sparse).

Parameters
- indices (array_like) – Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types. Will be cast to a torch.LongTensor internally. The indices are the coordinates of the non-zero values in the matrix, and thus should be two-dimensional where the first dimension is the number of tensor dimensions and the second dimension is the number of non-zero values.
- values (array_like) – Initial values for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.
- size (list, tuple, or torch.Size, optional) – Size of the sparse tensor. If not provided the size will be inferred as the minimum size big enough to hold all non-zero elements.
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, infers data type from values.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> i = torch.tensor([[0, 1, 1],
...                   [2, 0, 2]])
>>> v = torch.tensor([3, 4, 5], dtype=torch.float32)
>>> torch.sparse_coo_tensor(i, v, [2, 4])
tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([3., 4., 5.]),
       size=(2, 4), nnz=3, layout=torch.sparse_coo)

>>> torch.sparse_coo_tensor(i, v)  # Shape inference
tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([3., 4., 5.]),
       size=(2, 3), nnz=3, layout=torch.sparse_coo)

>>> torch.sparse_coo_tensor(i, v, [2, 4],
...                         dtype=torch.float64,
...                         device=torch.device('cuda:0'))
tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([3., 4., 5.]),
       device='cuda:0', size=(2, 4), nnz=3, dtype=torch.float64,
       layout=torch.sparse_coo)

# Create an empty sparse tensor with the following invariants:
#   1. sparse_dim + dense_dim = len(SparseTensor.shape)
#   2. SparseTensor._indices().shape = (sparse_dim, nnz)
#   3. SparseTensor._values().shape = (nnz, SparseTensor.shape[sparse_dim:])
#
# For instance, to create an empty sparse tensor with nnz = 0, dense_dim = 0 and
# sparse_dim = 1 (hence indices is a 2D tensor of shape = (1, 0))
>>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), [], [1])
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0,)),
       size=(1,), nnz=0, layout=torch.sparse_coo)

# and to create an empty sparse tensor with nnz = 0, dense_dim = 1 and
# sparse_dim = 1
>>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), torch.empty([0, 2]), [1, 2])
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0, 2)),
       size=(1, 2), nnz=0, layout=torch.sparse_coo)
torch.as_tensor(data, dtype=None, device=None) → Tensor¶
Converts the data into a torch.Tensor. If the data is already a Tensor with the same dtype and device, no copy is performed; otherwise a new Tensor is returned, with the computational graph retained if the data Tensor has requires_grad=True. Similarly, if the data is an ndarray of the corresponding dtype and the device is the CPU, no copy is performed.

Parameters
- data (array_like) – Initial data for the tensor. Can be a list, tuple, NumPy ndarray, scalar, and other types.
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, infers data type from data.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.

Example:

>>> a = numpy.array([1, 2, 3])
>>> t = torch.as_tensor(a)
>>> t
tensor([ 1,  2,  3])
>>> t[0] = -1
>>> a
array([-1,  2,  3])

>>> a = numpy.array([1, 2, 3])
>>> t = torch.as_tensor(a, device=torch.device('cuda'))
>>> t
tensor([ 1,  2,  3])
>>> t[0] = -1
>>> a
array([1,  2,  3])
torch.from_numpy(ndarray) → Tensor¶
Creates a Tensor from a numpy.ndarray.

The returned tensor and ndarray share the same memory. Modifications to the tensor will be reflected in the ndarray and vice versa. The returned tensor is not resizable.

Example:

>>> a = numpy.array([1, 2, 3])
>>> t = torch.from_numpy(a)
>>> t
tensor([ 1,  2,  3])
>>> t[0] = -1
>>> a
array([-1,  2,  3])
torch.zeros(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument sizes.

Parameters
- sizes (int...) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.zeros(2, 3)
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])
>>> torch.zeros(5)
tensor([ 0.,  0.,  0.,  0.,  0.])
torch.zeros_like(input, dtype=None, layout=None, device=None, requires_grad=False) → Tensor¶
Returns a tensor filled with the scalar value 0, with the same size as input. torch.zeros_like(input) is equivalent to torch.zeros(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).

Warning
As of 0.4, this function does not support an out keyword. As an alternative, the old torch.zeros_like(input, out=output) is equivalent to torch.zeros(input.size(), out=output).

Parameters
- input (Tensor) – the size of input will determine size of the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned Tensor. Default: if None, defaults to the dtype of input.
- layout (torch.layout, optional) – the desired layout of returned tensor. Default: if None, defaults to the layout of input.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, defaults to the device of input.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> input = torch.empty(2, 3)
>>> torch.zeros_like(input)
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])
torch.ones(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument sizes.

Parameters
- sizes (int...) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.ones(2, 3)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])
>>> torch.ones(5)
tensor([ 1.,  1.,  1.,  1.,  1.])
torch.ones_like(input, dtype=None, layout=None, device=None, requires_grad=False) → Tensor¶
Returns a tensor filled with the scalar value 1, with the same size as input. torch.ones_like(input) is equivalent to torch.ones(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).

Warning
As of 0.4, this function does not support an out keyword. As an alternative, the old torch.ones_like(input, out=output) is equivalent to torch.ones(input.size(), out=output).

Parameters
- input (Tensor) – the size of input will determine size of the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned Tensor. Default: if None, defaults to the dtype of input.
- layout (torch.layout, optional) – the desired layout of returned tensor. Default: if None, defaults to the layout of input.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, defaults to the device of input.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> input = torch.empty(2, 3)
>>> torch.ones_like(input)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])
torch.arange(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a 1-D tensor of size \(\left\lfloor \frac{\text{end} - \text{start}}{\text{step}} \right\rfloor\) with values from the interval [start, end) taken with common difference step beginning from start.

Note that non-integer step is subject to floating point rounding errors when comparing against end; to avoid inconsistency, we advise adding a small epsilon to end in such cases.

\[\text{out}_{i+1} = \text{out}_{i} + \text{step}\]

Parameters
- start (Number) – the starting value for the set of points. Default: 0.
- end (Number) – the ending value for the set of points
- step (Number) – the gap between each pair of adjacent points. Default: 1.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()). If dtype is not given, infer the data type from the other input arguments. If any of start, end, or step are floating-point, the dtype is inferred to be the default dtype, see get_default_dtype(). Otherwise, the dtype is inferred to be torch.int64.
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.arange(5)
tensor([ 0,  1,  2,  3,  4])
>>> torch.arange(1, 4)
tensor([ 1,  2,  3])
>>> torch.arange(1, 2.5, 0.5)
tensor([ 1.0000,  1.5000,  2.0000])
torch.range(start=0, end, step=1, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a 1-D tensor of size \(\left\lfloor \frac{\text{end} - \text{start}}{\text{step}} \right\rfloor + 1\) with values from start to end with step step. Step is the gap between two values in the tensor.

\[\text{out}_{i+1} = \text{out}_i + \text{step}.\]

Warning
This function is deprecated in favor of torch.arange().

Parameters
- start (float) – the starting value for the set of points. Default: 0.
- end (float) – the ending value for the set of points
- step (float) – the gap between each pair of adjacent points. Default: 1.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.range(1, 4)
tensor([ 1.,  2.,  3.,  4.])
>>> torch.range(1, 4, 0.5)
tensor([ 1.0000,  1.5000,  2.0000,  2.5000,  3.0000,  3.5000,  4.0000])
torch.linspace(start, end, steps=100, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a one-dimensional tensor of steps equally spaced points between start and end.

The output tensor is 1-D of size steps.

Parameters
- start (float) – the starting value for the set of points
- end (float) – the ending value for the set of points
- steps (int) – number of points to sample between start and end. Default: 100.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.linspace(3, 10, steps=5)
tensor([  3.0000,   4.7500,   6.5000,   8.2500,  10.0000])
>>> torch.linspace(-10, 10, steps=5)
tensor([-10.,  -5.,   0.,   5.,  10.])
>>> torch.linspace(start=-10, end=10, steps=5)
tensor([-10.,  -5.,   0.,   5.,  10.])
>>> torch.linspace(start=-10, end=10, steps=1)
tensor([-10.])
torch.logspace(start, end, steps=100, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a one-dimensional tensor of steps points logarithmically spaced between \(10^{\text{start}}\) and \(10^{\text{end}}\).

The output tensor is 1-D of size steps.

Parameters
- start (float) – the starting value for the set of points
- end (float) – the ending value for the set of points
- steps (int) – number of points to sample between start and end. Default: 100.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.logspace(start=-10, end=10, steps=5)
tensor([ 1.0000e-10,  1.0000e-05,  1.0000e+00,  1.0000e+05,  1.0000e+10])
>>> torch.logspace(start=0.1, end=1.0, steps=5)
tensor([  1.2589,   2.1135,   3.5481,   5.9566,  10.0000])
>>> torch.logspace(start=0.1, end=1.0, steps=1)
tensor([1.2589])
torch.eye(n, m=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.

Parameters
- n (int) – the number of rows
- m (int, optional) – the number of columns, with default being n
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Returns
A 2-D tensor with ones on the diagonal and zeros elsewhere

Return type
Tensor

Example:

>>> torch.eye(3)
tensor([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])
torch.empty(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a tensor filled with uninitialized data. The shape of the tensor is defined by the variable argument sizes.

Parameters
- sizes (int...) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.empty(2, 3)
tensor(1.00000e-08 *
       [[ 6.3984,  0.0000,  0.0000],
        [ 0.0000,  0.0000,  0.0000]])
torch.empty_like(input, dtype=None, layout=None, device=None, requires_grad=False) → Tensor¶
Returns an uninitialized tensor with the same size as input. torch.empty_like(input) is equivalent to torch.empty(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).

Parameters
- input (Tensor) – the size of input will determine size of the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned Tensor. Default: if None, defaults to the dtype of input.
- layout (torch.layout, optional) – the desired layout of returned tensor. Default: if None, defaults to the layout of input.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, defaults to the device of input.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.empty((2,3), dtype=torch.int64)
tensor([[ 9.4064e+13,  2.8000e+01,  9.3493e+13],
        [ 7.5751e+18,  7.1428e+18,  7.5955e+18]])
torch.full(size, fill_value, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a tensor of size size filled with fill_value.

Parameters
- size (int...) – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
- fill_value – the number to fill the output tensor with.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.full((2, 3), 3.141592)
tensor([[ 3.1416,  3.1416,  3.1416],
        [ 3.1416,  3.1416,  3.1416]])
torch.full_like(input, fill_value, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a tensor with the same size as input filled with fill_value. torch.full_like(input, fill_value) is equivalent to torch.full(input.size(), fill_value, dtype=input.dtype, layout=input.layout, device=input.device).

Parameters
- input (Tensor) – the size of input will determine size of the output tensor
- fill_value – the number to fill the output tensor with.
- dtype (torch.dtype, optional) – the desired data type of returned Tensor. Default: if None, defaults to the dtype of input.
- layout (torch.layout, optional) – the desired layout of returned tensor. Default: if None, defaults to the layout of input.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, defaults to the device of input.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
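An illustrative sketch (not part of the original reference):

>>> input = torch.empty(2, 3)
>>> torch.full_like(input, 5.)
tensor([[ 5.,  5.,  5.],
        [ 5.,  5.,  5.]])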
 
 
Indexing, Slicing, Joining, Mutating Ops¶
torch.cat(tensors, dim=0, out=None) → Tensor¶
Concatenates the given sequence of tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty.

torch.cat() can be seen as an inverse operation for torch.split() and torch.chunk().

torch.cat() can be best understood via examples.

Parameters
- tensors (sequence of Tensors) – any python sequence of tensors of the same type. Non-empty tensors provided must have the same shape, except in the cat dimension.
- dim (int, optional) – the dimension over which the tensors are concatenated
- out (Tensor, optional) – the output tensor

Example:

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 0)
tensor([[ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497],
        [ 0.6580, -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497]])
>>> torch.cat((x, x, x), 1)
tensor([[ 0.6580, -1.0969, -0.4614,  0.6580, -1.0969, -0.4614,  0.6580,
         -1.0969, -0.4614],
        [-0.1034, -0.5790,  0.1497, -0.1034, -0.5790,  0.1497, -0.1034,
         -0.5790,  0.1497]])
torch.chunk(tensor, chunks, dim=0) → List of Tensors¶
Splits a tensor into a specific number of chunks.

Last chunk will be smaller if the tensor size along the given dimension dim is not divisible by chunks.
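An illustrative sketch (not part of the original reference):

>>> x = torch.arange(7)
>>> torch.chunk(x, 3)  # last chunk is smaller: 7 is not divisible by 3
(tensor([0, 1, 2]), tensor([3, 4, 5]), tensor([6]))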
torch.gather(input, dim, index, out=None, sparse_grad=False) → Tensor¶
Gathers values along an axis specified by dim.

For a 3-D tensor the output is specified by:

out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2

If input is an n-dimensional tensor with size \((x_0, x_1, ..., x_{i-1}, x_i, x_{i+1}, ..., x_{n-1})\) and dim = i, then index must be an \(n\)-dimensional tensor with size \((x_0, x_1, ..., x_{i-1}, y, x_{i+1}, ..., x_{n-1})\) where \(y \geq 1\) and out will have the same size as index.

Parameters
- input (Tensor) – the source tensor
- dim (int) – the axis along which to index
- index (LongTensor) – the indices of elements to gather
- out (Tensor, optional) – the destination tensor
- sparse_grad (bool, optional) – If True, gradient w.r.t. input will be a sparse tensor.

Example:

>>> t = torch.tensor([[1, 2], [3, 4]])
>>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
tensor([[ 1,  1],
        [ 4,  3]])
torch.index_select(input, dim, index, out=None) → Tensor¶
Returns a new tensor which indexes the input tensor along dimension dim using the entries in index, which is a LongTensor.

The returned tensor has the same number of dimensions as the original tensor (input). The dim-th dimension has the same size as the length of index; other dimensions have the same size as in the original tensor.

Note
The returned tensor does not use the same storage as the original tensor. If out has a different shape than expected, we silently change it to the correct shape, reallocating the underlying storage if necessary.

Parameters
- input (Tensor) – the input tensor
- dim (int) – the dimension in which we index
- index (LongTensor) – the 1-D tensor containing the indices to index
- out (Tensor, optional) – the output tensor

Example:

>>> x = torch.randn(3, 4)
>>> x
tensor([[ 0.1427,  0.0231, -0.5414, -1.0009],
        [-0.4664,  0.2647, -0.1228, -1.1068],
        [-1.1734, -0.6571,  0.7230, -0.6004]])
>>> indices = torch.tensor([0, 2])
>>> torch.index_select(x, 0, indices)
tensor([[ 0.1427,  0.0231, -0.5414, -1.0009],
        [-1.1734, -0.6571,  0.7230, -0.6004]])
>>> torch.index_select(x, 1, indices)
tensor([[ 0.1427, -0.5414],
        [-0.4664, -0.1228],
        [-1.1734,  0.7230]])
torch.masked_select(input, mask, out=None) → Tensor¶
Returns a new 1-D tensor which indexes the input tensor according to the binary mask mask, which is a ByteTensor.

The shapes of the mask tensor and the input tensor don’t need to match, but they must be broadcastable.

Note
The returned tensor does not use the same storage as the original tensor.

Parameters
- input (Tensor) – the input data
- mask (ByteTensor) – the tensor containing the binary mask to index with
- out (Tensor, optional) – the output tensor

Example:

>>> x = torch.randn(3, 4)
>>> x
tensor([[ 0.3552, -2.3825, -0.8297,  0.3477],
        [-1.2035,  1.2252,  0.5002,  0.6248],
        [ 0.1307, -2.0608,  0.1244,  2.0139]])
>>> mask = x.ge(0.5)
>>> mask
tensor([[ 0,  0,  0,  0],
        [ 0,  1,  1,  1],
        [ 0,  0,  0,  1]], dtype=torch.uint8)
>>> torch.masked_select(x, mask)
tensor([ 1.2252,  0.5002,  0.6248,  2.0139])
torch.narrow(input, dimension, start, length) → Tensor¶
Returns a new tensor that is a narrowed version of the input tensor. The dimension dim spans the range from start to start + length. The returned tensor and the input tensor share the same underlying storage.

Parameters
- input (Tensor) – the tensor to narrow
- dimension (int) – the dimension along which to narrow
- start (int) – the starting index
- length (int) – the length of the narrowed dimension

Example:

>>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> torch.narrow(x, 0, 0, 2)
tensor([[ 1,  2,  3],
        [ 4,  5,  6]])
>>> torch.narrow(x, 1, 1, 2)
tensor([[ 2,  3],
        [ 5,  6],
        [ 8,  9]])
torch.nonzero(input, out=None) → LongTensor¶
Returns a tensor containing the indices of all non-zero elements of input. Each row in the result contains the indices of a non-zero element in input.

If input has n dimensions, then the resulting indices tensor out is of size \((z \times n)\), where \(z\) is the total number of non-zero elements in the input tensor.

Parameters
- input (Tensor) – the input tensor
- out (LongTensor, optional) – the output tensor containing indices

Example:

>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]))
tensor([[ 0],
        [ 1],
        [ 2],
        [ 4]])
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
...                             [0.0, 0.4, 0.0, 0.0],
...                             [0.0, 0.0, 1.2, 0.0],
...                             [0.0, 0.0, 0.0,-0.4]]))
tensor([[ 0,  0],
        [ 1,  1],
        [ 2,  2],
        [ 3,  3]])
torch.reshape(input, shape) → Tensor¶
Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.

See torch.Tensor.view() on when it is possible to return a view.

A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input.

Parameters
- input (Tensor) – the tensor to be reshaped
- shape (tuple of ints) – the new shape

Example:

>>> a = torch.arange(4.)
>>> torch.reshape(a, (2, 2))
tensor([[ 0.,  1.],
        [ 2.,  3.]])
>>> b = torch.tensor([[0, 1], [2, 3]])
>>> torch.reshape(b, (-1,))
tensor([ 0,  1,  2,  3])
torch.split(tensor, split_size_or_sections, dim=0)¶
Splits the tensor into chunks.

If split_size_or_sections is an integer type, then tensor will be split into equally sized chunks (if possible). Last chunk will be smaller if the tensor size along the given dimension dim is not divisible by split_size.

If split_size_or_sections is a list, then tensor will be split into len(split_size_or_sections) chunks with sizes in dim according to split_size_or_sections.
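An illustrative sketch of both modes (not part of the original reference):

>>> x = torch.arange(5)
>>> torch.split(x, 2)        # equal chunks of size 2; the last one is smaller
(tensor([0, 1]), tensor([2, 3]), tensor([4]))
>>> torch.split(x, [1, 4])   # explicit section sizes
(tensor([0]), tensor([1, 2, 3, 4]))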
torch.squeeze(input, dim=None, out=None) → Tensor¶
Returns a tensor with all the dimensions of input of size 1 removed.

For example, if input is of shape \((A \times 1 \times B \times C \times 1 \times D)\) then the out tensor will be of shape \((A \times B \times C \times D)\).

When dim is given, a squeeze operation is done only in the given dimension. If input is of shape \((A \times 1 \times B)\), squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) will squeeze the tensor to the shape \((A \times B)\).

Note
The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other.

Parameters
- input (Tensor) – the input tensor
- dim (int, optional) – if given, the input will be squeezed only in this dimension
- out (Tensor, optional) – the output tensor

Example:

>>> x = torch.zeros(2, 1, 2, 1, 2)
>>> x.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x)
>>> y.size()
torch.Size([2, 2, 2])
>>> y = torch.squeeze(x, 0)
>>> y.size()
torch.Size([2, 1, 2, 1, 2])
>>> y = torch.squeeze(x, 1)
>>> y.size()
torch.Size([2, 2, 1, 2])
torch.stack(seq, dim=0, out=None) → Tensor¶
Concatenates a sequence of tensors along a new dimension.

All tensors need to be of the same size.
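An illustrative sketch (not part of the original reference):

>>> a = torch.tensor([1, 2])
>>> b = torch.tensor([3, 4])
>>> torch.stack((a, b))         # new leading dimension
tensor([[1, 2],
        [3, 4]])
>>> torch.stack((a, b), dim=1)  # new trailing dimension
tensor([[1, 3],
        [2, 4]])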
torch.t(input) → Tensor¶
Expects input to be a tensor of at most 2 dimensions, and transposes dimensions 0 and 1.

0-D and 1-D tensors are returned as is, and a 2-D call can be seen as a short-hand function for transpose(input, 0, 1).

Parameters
- input (Tensor) – the input tensor

Example:

>>> x = torch.randn(())
>>> x
tensor(0.1995)
>>> torch.t(x)
tensor(0.1995)
>>> x = torch.randn(3)
>>> x
tensor([ 2.4320, -0.4608,  0.7702])
>>> torch.t(x)
tensor([ 2.4320, -0.4608,  0.7702])
>>> x = torch.randn(2, 3)
>>> x
tensor([[ 0.4875,  0.9158, -0.5872],
        [ 0.3938, -0.6929,  0.6932]])
>>> torch.t(x)
tensor([[ 0.4875,  0.3938],
        [ 0.9158, -0.6929],
        [-0.5872,  0.6932]])
torch.take(input, indices) → Tensor¶
Returns a new tensor with the elements of input at the given indices. The input tensor is treated as if it were viewed as a 1-D tensor. The result takes the same shape as the indices.

Parameters
- input (Tensor) – the input tensor
- indices (LongTensor) – the indices into tensor

Example:

>>> src = torch.tensor([[4, 3, 5],
...                     [6, 7, 8]])
>>> torch.take(src, torch.tensor([0, 2, 5]))
tensor([ 4,  5,  8])
torch.transpose(input, dim0, dim1) → Tensor¶
Returns a tensor that is a transposed version of input. The given dimensions dim0 and dim1 are swapped.

The resulting out tensor shares its underlying storage with the input tensor, so changing the content of one would change the content of the other.

Parameters
- input (Tensor) – the input tensor
- dim0 (int) – the first dimension to be transposed
- dim1 (int) – the second dimension to be transposed

Example:

>>> x = torch.randn(2, 3)
>>> x
tensor([[ 1.0028, -0.9893,  0.5809],
        [-0.1669,  0.7299,  0.4942]])
>>> torch.transpose(x, 0, 1)
tensor([[ 1.0028, -0.1669],
        [-0.9893,  0.7299],
        [ 0.5809,  0.4942]])
torch.unbind(tensor, dim=0) → seq¶
Removes a tensor dimension.

Returns a tuple of all slices along a given dimension, already without it.

Example:

>>> torch.unbind(torch.tensor([[1, 2, 3],
>>>                            [4, 5, 6],
>>>                            [7, 8, 9]]))
(tensor([1, 2, 3]), tensor([4, 5, 6]), tensor([7, 8, 9]))
torch.unsqueeze(input, dim, out=None) → Tensor¶
Returns a new tensor with a dimension of size one inserted at the specified position.

The returned tensor shares the same underlying data with this tensor.

A dim value within the range [-input.dim() - 1, input.dim() + 1) can be used. Negative dim will correspond to unsqueeze() applied at dim = dim + input.dim() + 1.

Parameters
- input (Tensor) – the input tensor
- dim (int) – the index at which to insert the singleton dimension
- out (Tensor, optional) – the output tensor

Example:

>>> x = torch.tensor([1, 2, 3, 4])
>>> torch.unsqueeze(x, 0)
tensor([[ 1,  2,  3,  4]])
>>> torch.unsqueeze(x, 1)
tensor([[ 1],
        [ 2],
        [ 3],
        [ 4]])
torch.where(condition, x, y) → Tensor¶
Returns a tensor of elements selected from either x or y, depending on condition.

The operation is defined as:

\[\text{out}_i = \begin{cases} x_i & \text{if } \text{condition}_i \\ y_i & \text{otherwise} \end{cases}\]

Note
The tensors condition, x, y must be broadcastable.

Parameters
- condition (ByteTensor) – When True (nonzero), yield x, otherwise yield y
- x (Tensor) – values selected at indices where condition is True
- y (Tensor) – values selected at indices where condition is False

Returns
A tensor of shape equal to the broadcasted shape of condition, x, y

Return type
Tensor

Example:

>>> x = torch.randn(3, 2)
>>> y = torch.ones(3, 2)
>>> x
tensor([[-0.4620,  0.3139],
        [ 0.3898, -0.7197],
        [ 0.0478, -0.1657]])
>>> torch.where(x > 0, x, y)
tensor([[ 1.0000,  0.3139],
        [ 0.3898,  1.0000],
        [ 0.0478,  1.0000]])
Random sampling¶
torch.manual_seed(seed)¶
Sets the seed for generating random numbers. Returns a torch._C.Generator object.

Parameters
- seed (int) – The desired seed.
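An illustrative sketch of reproducible draws (not part of the original reference):

>>> g = torch.manual_seed(0)
>>> a = torch.rand(2)
>>> g = torch.manual_seed(0)  # re-seeding repeats the sequence
>>> b = torch.rand(2)
>>> torch.equal(a, b)
True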
 
torch.initial_seed()¶
Returns the initial seed for generating random numbers as a Python long.
torch.get_rng_state()¶
Returns the random number generator state as a torch.ByteTensor.

torch.set_rng_state(new_state)¶
Sets the random number generator state.

Parameters
- new_state (torch.ByteTensor) – The desired state
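An illustrative save/restore sketch (not part of the original reference):

>>> state = torch.get_rng_state()
>>> a = torch.rand(2)
>>> torch.set_rng_state(state)  # rewind the generator
>>> b = torch.rand(2)
>>> torch.equal(a, b)
True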
 
torch.default_generator = <torch._C.Generator object>¶
torch.bernoulli(input, *, generator=None, out=None) → Tensor¶
Draws binary random numbers (0 or 1) from a Bernoulli distribution.

The input tensor should be a tensor containing probabilities to be used for drawing the binary random number. Hence, all values in input have to be in the range \(0 \leq \text{input}_i \leq 1\).

The \(\text{i}^{th}\) element of the output tensor will draw a value \(1\) according to the \(\text{i}^{th}\) probability value given in input.

\[\text{out}_{i} \sim \mathrm{Bernoulli}(p = \text{input}_{i})\]

The returned out tensor only has values 0 or 1 and is of the same shape as input. out can have integral dtype, but input must have floating point dtype.

Parameters
- input (Tensor) – the input tensor of probability values for the Bernoulli distribution
- generator (torch.Generator, optional) – a pseudorandom number generator for sampling
- out (Tensor, optional) – the output tensor

Example:

>>> a = torch.empty(3, 3).uniform_(0, 1)  # generate a uniform random matrix with range [0, 1]
>>> a
tensor([[ 0.1737,  0.0950,  0.3609],
        [ 0.7148,  0.0289,  0.2676],
        [ 0.9456,  0.8937,  0.7202]])
>>> torch.bernoulli(a)
tensor([[ 1.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 1.,  1.,  1.]])

>>> a = torch.ones(3, 3)  # probability of drawing "1" is 1
>>> torch.bernoulli(a)
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])
>>> a = torch.zeros(3, 3)  # probability of drawing "1" is 0
>>> torch.bernoulli(a)
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])
torch.multinomial(input, num_samples, replacement=False, out=None) → LongTensor¶
Returns a tensor where each row contains num_samples indices sampled from the multinomial probability distribution located in the corresponding row of tensor input.

Note
The rows of input do not need to sum to one (in which case we use the values as weights), but must be non-negative, finite and have a non-zero sum.

Indices are ordered from left to right according to when each was sampled (first samples are placed in first column).

If input is a vector, out is a vector of size num_samples.

If input is a matrix with m rows, out is a matrix of shape \((m \times \text{num\_samples})\).

If replacement is True, samples are drawn with replacement.

If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row.

Note
When drawn without replacement, num_samples must be lower than the number of non-zero elements in input (or the min number of non-zero elements in each row of input if it is a matrix).

Parameters
- input (Tensor) – the input tensor containing probabilities
- num_samples (int) – number of samples to draw
- replacement (bool, optional) – whether to draw with replacement or not
- out (Tensor, optional) – the output tensor

Example:

>>> weights = torch.tensor([0, 10, 3, 0], dtype=torch.float)  # create a tensor of weights
>>> torch.multinomial(weights, 2)
tensor([1, 2])
>>> torch.multinomial(weights, 4)  # ERROR!
RuntimeError: invalid argument 2: invalid multinomial distribution (with replacement=False,
not enough non-negative category to sample) at ../aten/src/TH/generic/THTensorRandom.cpp:320
>>> torch.multinomial(weights, 4, replacement=True)
tensor([ 2,  1,  1,  1])
torch.normal()¶

torch.normal(mean, std, out=None) → Tensor
Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given.

The mean is a tensor with the mean of each output element’s normal distribution.

The std is a tensor with the standard deviation of each output element’s normal distribution.

The shapes of mean and std don’t need to match, but the total number of elements in each tensor need to be the same.

Note
When the shapes do not match, the shape of mean is used as the shape for the returned output tensor.

Parameters
- mean (Tensor) – the tensor of per-element means
- std (Tensor) – the tensor of per-element standard deviations
- out (Tensor, optional) – the output tensor

Example:

>>> torch.normal(mean=torch.arange(1., 11.), std=torch.arange(1, 0, -0.1))
tensor([  1.0425,   3.5672,   2.7969,   4.2925,   4.7229,   6.2134,
          8.0505,   8.1408,   9.0563,  10.0566])

torch.normal(mean=0.0, std, out=None) → Tensor
Similar to the function above, but the means are shared among all drawn elements.

Parameters
- mean (float, optional) – the mean for all distributions
- std (Tensor) – the tensor of per-element standard deviations
- out (Tensor, optional) – the output tensor

Example:

>>> torch.normal(mean=0.5, std=torch.arange(1., 6.))
tensor([-1.2793, -1.0732, -2.0687,  5.1177, -1.2303])

torch.normal(mean, std=1.0, out=None) → Tensor
Similar to the function above, but the standard deviations are shared among all drawn elements.

Parameters
- mean (Tensor) – the tensor of per-element means
- std (float, optional) – the standard deviation for all distributions
- out (Tensor, optional) – the output tensor

Example:

>>> torch.normal(mean=torch.arange(1., 6.))
tensor([ 1.1552,  2.6148,  2.6535,  5.8318,  4.2361])
torch.rand(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a tensor filled with random numbers from a uniform distribution on the interval \([0, 1)\).

The shape of the tensor is defined by the variable argument sizes.

Parameters
- sizes (int...) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.rand(4)
tensor([ 0.5204,  0.2503,  0.3525,  0.5673])
>>> torch.rand(2, 3)
tensor([[ 0.8237,  0.5781,  0.6879],
        [ 0.3816,  0.7249,  0.0998]])
torch.rand_like(input, dtype=None, layout=None, device=None, requires_grad=False) → Tensor¶
Returns a tensor with the same size as input that is filled with random numbers from a uniform distribution on the interval \([0, 1)\). torch.rand_like(input) is equivalent to torch.rand(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).

Parameters
- input (Tensor) – the size of input will determine size of the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned Tensor. Default: if None, defaults to the dtype of input.
- layout (torch.layout, optional) – the desired layout of returned tensor. Default: if None, defaults to the layout of input.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, defaults to the device of input.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
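An illustrative sketch (not part of the original reference):

>>> base = torch.zeros(2, 2, dtype=torch.float64)
>>> r = torch.rand_like(base)  # same shape, dtype and device as base
>>> r.shape, r.dtype
(torch.Size([2, 2]), torch.float64)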
 
 
torch.randint(low=0, high, size, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a tensor filled with random integers generated uniformly between low (inclusive) and high (exclusive).

The shape of the tensor is defined by the variable argument size.

Parameters
- low (int, optional) – Lowest integer to be drawn from the distribution. Default: 0.
- high (int) – One above the highest integer to be drawn from the distribution.
- size (tuple) – a tuple defining the shape of the output tensor.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.randint(3, 5, (3,))
tensor([4, 3, 4])
>>> torch.randint(10, (2, 2))
tensor([[0, 2],
        [5, 5]])
>>> torch.randint(3, 10, (2, 2))
tensor([[4, 5],
        [6, 7]])
torch.randint_like(input, low=0, high, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a tensor with the same shape as Tensor input filled with random integers generated uniformly between low (inclusive) and high (exclusive).

Parameters
- input (Tensor) – the size of input will determine size of the output tensor
- low (int, optional) – Lowest integer to be drawn from the distribution. Default: 0.
- high (int) – One above the highest integer to be drawn from the distribution.
- dtype (torch.dtype, optional) – the desired data type of returned Tensor. Default: if None, defaults to the dtype of input.
- layout (torch.layout, optional) – the desired layout of returned tensor. Default: if None, defaults to the layout of input.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, defaults to the device of input.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
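An illustrative sketch (not part of the original reference; the printed values are representative):

>>> base = torch.zeros(2, 3, dtype=torch.int64)
>>> torch.randint_like(base, 10)  # same shape as base, values in [0, 10)
tensor([[3, 7, 1],
        [0, 9, 4]])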
 
 
torch.randn(*sizes, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (also called the standard normal distribution).

\[\text{out}_{i} \sim \mathcal{N}(0, 1)\]

The shape of the tensor is defined by the variable argument sizes.

Parameters
- sizes (int...) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple.
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: if None, uses a global default (see torch.set_default_tensor_type()).
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.randn(4)
tensor([-2.1436,  0.9966,  2.3426, -0.6366])
>>> torch.randn(2, 3)
tensor([[ 1.5954,  2.8929, -1.0923],
        [ 1.1719, -0.4709, -0.1996]])
torch.randn_like(input, dtype=None, layout=None, device=None, requires_grad=False) → Tensor¶
Returns a tensor with the same size as input that is filled with random numbers from a normal distribution with mean 0 and variance 1. torch.randn_like(input) is equivalent to torch.randn(input.size(), dtype=input.dtype, layout=input.layout, device=input.device).

Parameters
- input (Tensor) – the size of input will determine size of the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned Tensor. Default: if None, defaults to the dtype of input.
- layout (torch.layout, optional) – the desired layout of returned tensor. Default: if None, defaults to the layout of input.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, defaults to the device of input.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
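An illustrative sketch (not part of the original reference):

>>> base = torch.empty(2, 2)
>>> noise = torch.randn_like(base)  # standard-normal samples, shaped like base
>>> noise.shape
torch.Size([2, 2])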
 
 
torch.randperm(n, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False) → LongTensor¶
Returns a random permutation of integers from 0 to n - 1.

Parameters
- n (int) – the upper bound (exclusive)
- out (Tensor, optional) – the output tensor
- dtype (torch.dtype, optional) – the desired data type of returned tensor. Default: torch.int64.
- layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
- device (torch.device, optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see torch.set_default_tensor_type()). device will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.

Example:

>>> torch.randperm(4)
tensor([2, 1, 0, 3])
In-place random sampling¶
There are a few more in-place random sampling functions defined on Tensors as well. Click through to refer to their documentation:
- torch.Tensor.bernoulli_() – in-place version of torch.bernoulli()
- torch.Tensor.cauchy_() – numbers drawn from the Cauchy distribution
- torch.Tensor.exponential_() – numbers drawn from the exponential distribution
- torch.Tensor.geometric_() – elements drawn from the geometric distribution
- torch.Tensor.log_normal_() – samples from the log-normal distribution
- torch.Tensor.normal_() – in-place version of torch.normal()
- torch.Tensor.random_() – numbers sampled from the discrete uniform distribution
- torch.Tensor.uniform_() – numbers sampled from the continuous uniform distribution
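A brief sketch of the in-place pattern (illustrative, not part of the original reference; the printed values are representative):

>>> t = torch.empty(3)
>>> t.normal_(mean=0., std=1.)  # fill in place with N(0, 1) samples
tensor([ 0.5410, -0.2934, -2.1788])
>>> t.random_(0, 10)            # overwrite with integers drawn from [0, 10)
tensor([ 4.,  9.,  3.])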
Serialization¶
torch.save(obj, f, pickle_module=<module 'pickle'>, pickle_protocol=2)¶
Saves an object to a disk file.

See also: the recommended approach for saving models.

Parameters
- obj – saved object
- f – a file-like object (has to implement write and flush) or a string containing a file name
- pickle_module – module used for pickling metadata and objects
- pickle_protocol – can be specified to override the default protocol

Warning
If you are using Python 2, torch.save does NOT support StringIO.StringIO as a valid file-like object. This is because the write method should return the number of bytes written; StringIO.write() does not do this.

Please use something like io.BytesIO instead.

Example:

>>> # Save to file
>>> x = torch.tensor([0, 1, 2, 3, 4])
>>> torch.save(x, 'tensor.pt')
>>> # Save to io.BytesIO buffer
>>> buffer = io.BytesIO()
>>> torch.save(x, buffer)
torch.load(f, map_location=None, pickle_module=<module 'pickle'>, **pickle_load_args)¶
Loads an object saved with torch.save() from a file.

torch.load() uses Python’s unpickling facilities but treats storages, which underlie tensors, specially. They are first deserialized on the CPU and are then moved to the device they were saved from. If this fails (e.g. because the run time system doesn’t have certain devices), an exception is raised. However, storages can be dynamically remapped to an alternative set of devices using the map_location argument.

If map_location is a callable, it will be called once for each serialized storage with two arguments: storage and location. The storage argument will be the initial deserialization of the storage, residing on the CPU. Each serialized storage has a location tag associated with it which identifies the device it was saved from, and this tag is the second argument passed to map_location. The builtin location tags are 'cpu' for CPU tensors and 'cuda:device_id' (e.g. 'cuda:2') for CUDA tensors. map_location should return either None or a storage. If map_location returns a storage, it will be used as the final deserialized object, already moved to the right device. Otherwise, torch.load() will fall back to the default behavior, as if map_location wasn’t specified.

If map_location is a string, it should be a device tag, where all tensors should be loaded.

Otherwise, if map_location is a dict, it will be used to remap location tags appearing in the file (keys), to ones that specify where to put the storages (values).

User extensions can register their own location tags and tagging and deserialization methods using register_package.

Parameters
- f – a file-like object (has to implement read, readline, tell, and seek), or a string containing a file name
- map_location – a function, torch.device, string or a dict specifying how to remap storage locations
- pickle_module – module used for unpickling metadata and objects (has to match the pickle_module used to serialize file)
- pickle_load_args – optional keyword arguments passed over to pickle_module.load and pickle_module.Unpickler, e.g., encoding=....

Note
When you call torch.load() on a file which contains GPU tensors, those tensors will be loaded to GPU by default. You can call torch.load(.., map_location='cpu') and then load_state_dict() to avoid GPU RAM surge when loading a model checkpoint.

Note
In Python 3, when loading files saved by Python 2, you may encounter UnicodeDecodeError: 'ascii' codec can't decode byte 0x.... This is caused by the difference in handling of byte strings between Python 2 and Python 3. You may use the extra encoding keyword argument to specify how these objects should be loaded, e.g., encoding='latin1' decodes them to strings using latin1 encoding, and encoding='bytes' keeps them as byte arrays which can be decoded later with byte_array.decode(...).

Example:

>>> torch.load('tensors.pt')
# Load all tensors onto the CPU
>>> torch.load('tensors.pt', map_location=torch.device('cpu'))
# Load all tensors onto the CPU, using a function
>>> torch.load('tensors.pt', map_location=lambda storage, loc: storage)
# Load all tensors onto GPU 1
>>> torch.load('tensors.pt', map_location=lambda storage, loc: storage.cuda(1))
# Map tensors from GPU 1 to GPU 0
>>> torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'})
# Load tensor from io.BytesIO object
>>> with open('tensor.pt', 'rb') as f:
...     buffer = io.BytesIO(f.read())
>>> torch.load(buffer)
Parallelism¶
- 
torch.get_num_threads() → int¶
- Gets the number of OpenMP threads used for parallelizing CPU operations 
- 
torch.set_num_threads(int)¶
- Sets the number of OpenMP threads used for parallelizing CPU operations 
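A quick illustration of querying and setting the thread count (a sketch; the initial value is machine-dependent):

>>> import torch
>>> torch.get_num_threads()  # machine-dependent
8
>>> torch.set_num_threads(4)
>>> torch.get_num_threads()
4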
Locally disabling gradient computation¶
The context managers torch.no_grad(), torch.enable_grad(), and
torch.set_grad_enabled() are helpful for locally disabling and enabling
gradient computation. See Locally disabling gradient computation for more details on
their usage.
Examples:
>>> x = torch.zeros(1, requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>> is_train = False
>>> with torch.set_grad_enabled(is_train):
...     y = x * 2
>>> y.requires_grad
False
>>> torch.set_grad_enabled(True)  # this can also be used as a function
>>> y = x * 2
>>> y.requires_grad
True
>>> torch.set_grad_enabled(False)
>>> y = x * 2
>>> y.requires_grad
False
Math operations¶
Pointwise Ops¶
- 
torch.abs(input, out=None) → Tensor¶
- Computes the element-wise absolute value of the given - inputtensor.\[\text{out}_{i} = |\text{input}_{i}| \]- Example: - >>> torch.abs(torch.tensor([-1, -2, 3])) tensor([ 1, 2, 3]) 
- 
torch.acos(input, out=None) → Tensor¶
- Returns a new tensor with the arccosine of the elements of - input.\[\text{out}_{i} = \cos^{-1}(\text{input}_{i}) \]- Example: - >>> a = torch.randn(4) >>> a tensor([ 0.3348, -0.5889, 0.2005, -0.1584]) >>> torch.acos(a) tensor([ 1.2294, 2.2004, 1.3690, 1.7298]) 
- 
torch.add()¶
- 
torch.add(input, value, out=None)
 - Adds the scalar - valueto each element of the input tensor - inputand returns a new resulting tensor.\[\text{out} = \text{input} + \text{value} \]- If - inputis of type FloatTensor or DoubleTensor, - valuemust be a real number, otherwise it should be an integer.- Parameters
- input (Tensor) – the input tensor 
- value (Number) – the number to be added to each element of - input
 
- Keyword Arguments
- out (Tensor, optional) – the output tensor 
 - Example: - >>> a = torch.randn(4) >>> a tensor([ 0.0202, 1.0985, 1.3506, -0.6056]) >>> torch.add(a, 20) tensor([ 20.0202, 21.0985, 21.3506, 19.3944]) - 
torch.add(input, value=1, other, out=None)
 - Each element of the tensor - otheris multiplied by the scalar- valueand added to each element of the tensor- input. The resulting tensor is returned.- The shapes of - inputand- othermust be broadcastable.\[\text{out} = \text{input} + \text{value} \times \text{other} \]- If - otheris of type FloatTensor or DoubleTensor,- valuemust be a real number, otherwise it should be an integer.- Parameters
- Keyword Arguments
- out (Tensor, optional) – the output tensor 
 - Example: - >>> a = torch.randn(4) >>> a tensor([-0.9732, -0.3497, 0.6245, 0.4022]) >>> b = torch.randn(4, 1) >>> b tensor([[ 0.3743], [-1.7724], [-0.5811], [-0.8017]]) >>> torch.add(a, 10, b) tensor([[ 2.7695, 3.3930, 4.3672, 4.1450], [-18.6971, -18.0736, -17.0994, -17.3216], [ -6.7845, -6.1610, -5.1868, -5.4090], [ -8.9902, -8.3667, -7.3925, -7.6147]]) 
- 
- 
torch.addcdiv(tensor, value=1, tensor1, tensor2, out=None) → Tensor¶
- Performs the element-wise division of - tensor1by - tensor2, multiplies the result by the scalar - valueand adds it to - tensor.\[\text{out}_i = \text{tensor}_i + \text{value} \times \frac{\text{tensor1}_i}{\text{tensor2}_i} \]- The shapes of - tensor, - tensor1, and - tensor2must be broadcastable.- For inputs of type FloatTensor or DoubleTensor, - valuemust be a real number, otherwise an integer.- Parameters
 - Example: - >>> t = torch.randn(1, 3) >>> t1 = torch.randn(3, 1) >>> t2 = torch.randn(1, 3) >>> torch.addcdiv(t, 0.1, t1, t2) tensor([[-0.2312, -3.6496, 0.1312], [-1.0428, 3.4292, -0.1030], [-0.5369, -0.9829, 0.0430]]) 
- 
torch.addcmul(tensor, value=1, tensor1, tensor2, out=None) → Tensor¶
- Performs the element-wise multiplication of - tensor1by - tensor2, multiplies the result by the scalar - valueand adds it to - tensor.\[\text{out}_i = \text{tensor}_i + \text{value} \times \text{tensor1}_i \times \text{tensor2}_i \]- The shapes of - tensor, - tensor1, and - tensor2must be broadcastable.- For inputs of type FloatTensor or DoubleTensor, - valuemust be a real number, otherwise an integer.- Parameters
 - Example: - >>> t = torch.randn(1, 3) >>> t1 = torch.randn(3, 1) >>> t2 = torch.randn(1, 3) >>> torch.addcmul(t, 0.1, t1, t2) tensor([[-0.8635, -0.6391, 1.6174], [-0.7617, -0.5879, 1.7388], [-0.8353, -0.6249, 1.6511]]) 
- 
torch.asin(input, out=None) → Tensor¶
- Returns a new tensor with the arcsine of the elements of - input.\[\text{out}_{i} = \sin^{-1}(\text{input}_{i}) \]- Example: - >>> a = torch.randn(4) >>> a tensor([-0.5962, 1.4985, -0.4396, 1.4525]) >>> torch.asin(a) tensor([-0.6387, nan, -0.4552, nan]) 
- 
torch.atan(input, out=None) → Tensor¶
- Returns a new tensor with the arctangent of the elements of - input.\[\text{out}_{i} = \tan^{-1}(\text{input}_{i}) \]- Example: - >>> a = torch.randn(4) >>> a tensor([ 0.2341, 0.2539, -0.6256, -0.6448]) >>> torch.atan(a) tensor([ 0.2299, 0.2487, -0.5591, -0.5727]) 
- 
torch.atan2(input1, input2, out=None) → Tensor¶
- Returns a new tensor with the arctangent of the elements of - input1divided by the elements of - input2, with consideration of the quadrant.- The shapes of - input1and - input2must be broadcastable.- Parameters
 - Example: - >>> a = torch.randn(4) >>> a tensor([ 0.9041, 0.0196, -0.3108, -2.4423]) >>> torch.atan2(a, torch.randn(4)) tensor([ 0.9833, 0.0811, -1.9743, -1.4151]) 
- 
torch.ceil(input, out=None) → Tensor¶
- Returns a new tensor with the ceil of the elements of - input, the smallest integer greater than or equal to each element.\[\text{out}_{i} = \left\lceil \text{input}_{i} \right\rceil \]- Example: - >>> a = torch.randn(4) >>> a tensor([-0.6341, -1.4208, -1.0900, 0.5826]) >>> torch.ceil(a) tensor([-0., -1., -1., 1.]) 
- 
torch.clamp(input, min, max, out=None) → Tensor¶
- Clamp all elements in - inputinto the range [- min,- max] and return a resulting tensor:\[y_i = \begin{cases} \text{min} & \text{if } x_i < \text{min} \\ x_i & \text{if } \text{min} \leq x_i \leq \text{max} \\ \text{max} & \text{if } x_i > \text{max} \end{cases} \]- If - inputis of type FloatTensor or DoubleTensor, args- minand- maxmust be real numbers, otherwise they should be integers.- Parameters
 - Example: - >>> a = torch.randn(4) >>> a tensor([-1.7120, 0.1734, -0.0478, -0.0922]) >>> torch.clamp(a, min=-0.5, max=0.5) tensor([-0.5000, 0.1734, -0.0478, -0.0922]) - 
torch.clamp(input, *, min, out=None) → Tensor
 - Clamps all elements in - inputto be larger than or equal to - min.- If - inputis of type FloatTensor or DoubleTensor, - minshould be a real number, otherwise it should be an integer.- Parameters
 - Example: - >>> a = torch.randn(4) >>> a tensor([-0.0299, -2.3184, 2.1593, -0.8883]) >>> torch.clamp(a, min=0.5) tensor([ 0.5000, 0.5000, 2.1593, 0.5000]) - 
torch.clamp(input, *, max, out=None) → Tensor
 - Clamps all elements in - inputto be smaller than or equal to - max.- If - inputis of type FloatTensor or DoubleTensor, - maxshould be a real number, otherwise it should be an integer.- Parameters
 - Example: - >>> a = torch.randn(4) >>> a tensor([ 0.7753, -0.4702, -0.4599, 1.1899]) >>> torch.clamp(a, max=0.5) tensor([ 0.5000, -0.4702, -0.4599, 0.5000]) 
- 
torch.cos(input, out=None) → Tensor¶
- Returns a new tensor with the cosine of the elements of - input.\[\text{out}_{i} = \cos(\text{input}_{i}) \]- Example: - >>> a = torch.randn(4) >>> a tensor([ 1.4309, 1.2706, -0.8562, 0.9796]) >>> torch.cos(a) tensor([ 0.1395, 0.2957, 0.6553, 0.5574]) 
- 
torch.cosh(input, out=None) → Tensor¶
- Returns a new tensor with the hyperbolic cosine of the elements of - input.\[\text{out}_{i} = \cosh(\text{input}_{i}) \]- Example: - >>> a = torch.randn(4) >>> a tensor([ 0.1632, 1.1835, -0.6979, -0.7325]) >>> torch.cosh(a) tensor([ 1.0133, 1.7860, 1.2536, 1.2805]) 
- 
torch.div()¶
- 
torch.div(input, value, out=None) → Tensor
 - Divides each element of the input tensor - inputby the scalar - valueand returns a new resulting tensor.\[\text{out}_i = \frac{\text{input}_i}{\text{value}} \]- If - inputis of type FloatTensor or DoubleTensor, - valueshould be a real number, otherwise it should be an integer.- Parameters
 - Example: - >>> a = torch.randn(5) >>> a tensor([ 0.3810, 1.2774, -0.2972, -0.3719, 0.4637]) >>> torch.div(a, 0.5) tensor([ 0.7620, 2.5548, -0.5944, -0.7439, 0.9275]) - 
torch.div(input, other, out=None) → Tensor
 - Each element of the tensor - inputis divided by each element of the tensor- other. The resulting tensor is returned. The shapes of- inputand- othermust be broadcastable.\[\text{out}_i = \frac{\text{input}_i}{\text{other}_i} \]- Parameters
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[-0.3711, -1.9353, -0.4605, -0.2917], [ 0.1815, -1.0111, 0.9805, -1.5923], [ 0.1062, 1.4581, 0.7759, -1.2344], [-0.1830, -0.0313, 1.1908, -1.4757]]) >>> b = torch.randn(4) >>> b tensor([ 0.8032, 0.2930, -0.8113, -0.2308]) >>> torch.div(a, b) tensor([[-0.4620, -6.6051, 0.5676, 1.2637], [ 0.2260, -3.4507, -1.2086, 6.8988], [ 0.1322, 4.9764, -0.9564, 5.3480], [-0.2278, -0.1068, -1.4678, 6.3936]]) 
- 
- 
torch.digamma(input, out=None) → Tensor¶
- Computes the logarithmic derivative of the gamma function on input. \[\psi(x) = \frac{d}{dx} \ln\left(\Gamma\left(x\right)\right) = \frac{\Gamma'(x)}{\Gamma(x)} \]- Parameters
- input (Tensor) – the tensor to compute the digamma function on 
 - Example: - >>> a = torch.tensor([1, 0.5]) >>> torch.digamma(a) tensor([-0.5772, -1.9635]) 
- 
torch.erf(tensor, out=None) → Tensor¶
- Computes the error function of each element. The error function is defined as follows: \[\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt \]- Example: - >>> torch.erf(torch.tensor([0, -1., 10.])) tensor([ 0.0000, -0.8427, 1.0000]) 
- 
torch.erfc(input, out=None) → Tensor¶
- Computes the complementary error function of each element of - input. The complementary error function is defined as follows:\[\mathrm{erfc}(x) = 1 - \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt \]- Example: - >>> torch.erfc(torch.tensor([0, -1., 10.])) tensor([ 1.0000, 1.8427, 0.0000]) 
- 
torch.erfinv(input, out=None) → Tensor¶
- Computes the inverse error function of each element of - input. The inverse error function is defined in the range \((-1, 1)\) as:\[\mathrm{erfinv}(\mathrm{erf}(x)) = x \]- Example: - >>> torch.erfinv(torch.tensor([0, 0.5, -1.])) tensor([ 0.0000, 0.4769, -inf]) 
- 
torch.exp(input, out=None) → Tensor¶
- Returns a new tensor with the exponential of the elements of the input tensor - input.\[y_{i} = e^{x_{i}} \]- Example: - >>> torch.exp(torch.tensor([0, math.log(2.)])) tensor([ 1., 2.]) 
- 
torch.expm1(input, out=None) → Tensor¶
- Returns a new tensor with the exponential of the elements of - inputminus 1.\[y_{i} = e^{x_{i}} - 1 \]- Example: - >>> torch.expm1(torch.tensor([0, math.log(2.)])) tensor([ 0., 1.]) 
- 
torch.floor(input, out=None) → Tensor¶
- Returns a new tensor with the floor of the elements of - input, the largest integer less than or equal to each element.\[\text{out}_{i} = \left\lfloor \text{input}_{i} \right\rfloor \]- Example: - >>> a = torch.randn(4) >>> a tensor([-0.8166, 1.5308, -0.2530, -0.2091]) >>> torch.floor(a) tensor([-1., 1., -1., -1.]) 
- 
torch.fmod(input, divisor, out=None) → Tensor¶
- Computes the element-wise remainder of division. - The dividend and divisor may contain both integer and floating point numbers. The remainder has the same sign as the dividend - input.- When - divisoris a tensor, the shapes of - inputand - divisormust be broadcastable.- Parameters
 - Example: - >>> torch.fmod(torch.tensor([-3., -2, -1, 1, 2, 3]), 2) tensor([-1., -0., -1., 1., 0., 1.]) >>> torch.fmod(torch.tensor([1., 2, 3, 4, 5]), 1.5) tensor([ 1.0000, 0.5000, 0.0000, 1.0000, 0.5000]) 
- 
torch.frac(input, out=None) → Tensor¶
- Computes the fractional portion of each element in - input.\[\text{out}_{i} = \text{input}_{i} - \left\lfloor \text{input}_{i} \right\rfloor \]- Example: - >>> torch.frac(torch.tensor([1, 2.5, -3.2])) tensor([ 0.0000, 0.5000, -0.2000]) 
- 
torch.lerp(start, end, weight, out=None)¶
- Does a linear interpolation of two tensors - startand - endbased on a scalar or tensor - weightand returns the resulting - outtensor.\[\text{out}_i = \text{start}_i + \text{weight}_i \times (\text{end}_i - \text{start}_i) \]- The shapes of - startand - endmust be broadcastable. If - weightis a tensor, then the shapes of - weight, - start, and - endmust be broadcastable.- Parameters
 - Example: - >>> start = torch.arange(1., 5.) >>> end = torch.empty(4).fill_(10) >>> start tensor([ 1., 2., 3., 4.]) >>> end tensor([ 10., 10., 10., 10.]) >>> torch.lerp(start, end, 0.5) tensor([ 5.5000, 6.0000, 6.5000, 7.0000]) >>> torch.lerp(start, end, torch.full_like(start, 0.5)) tensor([ 5.5000, 6.0000, 6.5000, 7.0000]) 
- 
torch.log(input, out=None) → Tensor¶
- Returns a new tensor with the natural logarithm of the elements of - input.\[y_{i} = \log_{e} (x_{i}) \]- Example: - >>> a = torch.randn(5) >>> a tensor([-0.7168, -0.5471, -0.8933, -1.4428, -0.1190]) >>> torch.log(a) tensor([ nan, nan, nan, nan, nan]) 
- 
torch.log10(input, out=None) → Tensor¶
- Returns a new tensor with the logarithm to the base 10 of the elements of - input.\[y_{i} = \log_{10} (x_{i}) \]- Example: - >>> a = torch.rand(5) >>> a tensor([ 0.5224, 0.9354, 0.7257, 0.1301, 0.2251]) >>> torch.log10(a) tensor([-0.2820, -0.0290, -0.1392, -0.8857, -0.6476]) 
- 
torch.log1p(input, out=None) → Tensor¶
- Returns a new tensor with the natural logarithm of (1 + - input).\[y_i = \log_{e} (x_i + 1) \]- Note - This function is more accurate than - torch.log()for small values of- input- Example: - >>> a = torch.randn(5) >>> a tensor([-1.0090, -0.9923, 1.0249, -0.5372, 0.2492]) >>> torch.log1p(a) tensor([ nan, -4.8653, 0.7055, -0.7705, 0.2225]) 
- 
torch.log2(input, out=None) → Tensor¶
- Returns a new tensor with the logarithm to the base 2 of the elements of - input.\[y_{i} = \log_{2} (x_{i}) \]- Example: - >>> a = torch.rand(5) >>> a tensor([ 0.8419, 0.8003, 0.9971, 0.5287, 0.0490]) >>> torch.log2(a) tensor([-0.2483, -0.3213, -0.0042, -0.9196, -4.3504]) 
- 
torch.mul()¶
- 
torch.mul(input, value, out=None)
 - Multiplies each element of the input tensor - inputby the scalar - valueand returns a new resulting tensor.\[\text{out}_i = \text{value} \times \text{input}_i \]- If - inputis of type FloatTensor or DoubleTensor, - valueshould be a real number, otherwise it should be an integer.- Parameters
 - Example: - >>> a = torch.randn(3) >>> a tensor([ 0.2015, -0.4255, 2.6087]) >>> torch.mul(a, 100) tensor([ 20.1494, -42.5491, 260.8663]) - 
torch.mul(input, other, out=None)
 - Each element of the tensor - inputis multiplied by the corresponding element of the Tensor- other. The resulting tensor is returned.- The shapes of - inputand- othermust be broadcastable.\[\text{out}_i = \text{input}_i \times \text{other}_i \]- Parameters
 - Example: - >>> a = torch.randn(4, 1) >>> a tensor([[ 1.1207], [-0.3137], [ 0.0700], [ 0.8378]]) >>> b = torch.randn(1, 4) >>> b tensor([[ 0.5146, 0.1216, -0.5244, 2.2382]]) >>> torch.mul(a, b) tensor([[ 0.5767, 0.1363, -0.5877, 2.5083], [-0.1614, -0.0382, 0.1645, -0.7021], [ 0.0360, 0.0085, -0.0367, 0.1567], [ 0.4312, 0.1019, -0.4394, 1.8753]]) 
- 
- 
torch.mvlgamma(input, p) → Tensor¶
- Computes the multivariate log-gamma function with dimension \(p\) element-wise, given by \[\log(\Gamma_{p}(a)) = C + \displaystyle \sum_{i=1}^{p} \log\left(\Gamma\left(a - \frac{i - 1}{2}\right)\right) \]- where \(C = \log(\pi) \times \frac{p (p - 1)}{4}\) and \(\Gamma(\cdot)\) is the Gamma function. - If any of the elements are less than or equal to \(\frac{p - 1}{2}\), then an error is thrown. - Parameters
 - Example: - >>> a = torch.empty(2, 3).uniform_(1, 2) >>> a tensor([[1.6835, 1.8474, 1.1929], [1.0475, 1.7162, 1.4180]]) >>> torch.mvlgamma(a, 2) tensor([[0.3928, 0.4007, 0.7586], [1.0311, 0.3901, 0.5049]]) 
- 
torch.neg(input, out=None) → Tensor¶
- Returns a new tensor with the negative of the elements of - input.\[\text{out} = -1 \times \text{input} \]- Example: - >>> a = torch.randn(5) >>> a tensor([ 0.0090, -0.2262, -0.0682, -0.2866, 0.3940]) >>> torch.neg(a) tensor([-0.0090, 0.2262, 0.0682, 0.2866, -0.3940]) 
- 
torch.pow()¶
- 
torch.pow(input, exponent, out=None) → Tensor
 - Takes the power of each element in - inputwith- exponentand returns a tensor with the result.- exponentcan be either a single- floatnumber or a Tensor with the same number of elements as- input.- When - exponentis a scalar value, the operation applied is:\[\text{out}_i = x_i ^ \text{exponent} \]- When - exponentis a tensor, the operation applied is:\[\text{out}_i = x_i ^ {\text{exponent}_i} \]- When - exponentis a tensor, the shapes of- inputand- exponentmust be broadcastable.- Parameters
 - Example: - >>> a = torch.randn(4) >>> a tensor([ 0.4331, 1.2475, 0.6834, -0.2791]) >>> torch.pow(a, 2) tensor([ 0.1875, 1.5561, 0.4670, 0.0779]) >>> exp = torch.arange(1., 5.) >>> a = torch.arange(1., 5.) >>> a tensor([ 1., 2., 3., 4.]) >>> exp tensor([ 1., 2., 3., 4.]) >>> torch.pow(a, exp) tensor([ 1., 4., 27., 256.]) - 
torch.pow(base, input, out=None) → Tensor
 - baseis a scalar - floatvalue, and - inputis a tensor. The returned tensor - outis of the same shape as - input.- The operation applied is: \[\text{out}_i = \text{base}^{\text{input}_i} \]- Parameters
 - Example: - >>> exp = torch.arange(1., 5.) >>> base = 2 >>> torch.pow(base, exp) tensor([ 2., 4., 8., 16.]) 
- 
- 
torch.reciprocal(input, out=None) → Tensor¶
- Returns a new tensor with the reciprocal of the elements of - input\[\text{out}_{i} = \frac{1}{\text{input}_{i}} \]- Example: - >>> a = torch.randn(4) >>> a tensor([-0.4595, -2.1219, -1.4314, 0.7298]) >>> torch.reciprocal(a) tensor([-2.1763, -0.4713, -0.6986, 1.3702]) 
- 
torch.remainder(input, divisor, out=None) → Tensor¶
- Computes the element-wise remainder of division. - The dividend and divisor may contain both integer and floating point numbers. The remainder has the same sign as the divisor. - When - divisoris a tensor, the shapes of - inputand - divisormust be broadcastable.- Parameters
 - Example: - >>> torch.remainder(torch.tensor([-3., -2, -1, 1, 2, 3]), 2) tensor([ 1., 0., 1., 1., 0., 1.]) >>> torch.remainder(torch.tensor([1., 2, 3, 4, 5]), 1.5) tensor([ 1.0000, 0.5000, 0.0000, 1.0000, 0.5000]) - See also - torch.fmod(), which computes the element-wise remainder of division equivalently to the C library function- fmod().
- 
torch.round(input, out=None) → Tensor¶
- Returns a new tensor with each of the elements of - inputrounded to the closest integer.- Example: - >>> a = torch.randn(4) >>> a tensor([ 0.9920, 0.6077, 0.9734, -1.0362]) >>> torch.round(a) tensor([ 1., 1., 1., -1.]) 
- 
torch.rsqrt(input, out=None) → Tensor¶
- Returns a new tensor with the reciprocal of the square-root of each of the elements of - input.\[\text{out}_{i} = \frac{1}{\sqrt{\text{input}_{i}}} \]- Example: - >>> a = torch.randn(4) >>> a tensor([-0.0370, 0.2970, 1.5420, -0.9105]) >>> torch.rsqrt(a) tensor([ nan, 1.8351, 0.8053, nan]) 
- 
torch.sigmoid(input, out=None) → Tensor¶
- Returns a new tensor with the sigmoid of the elements of - input.\[\text{out}_{i} = \frac{1}{1 + e^{-\text{input}_{i}}} \]- Example: - >>> a = torch.randn(4) >>> a tensor([ 0.9213, 1.0887, -0.8858, -1.7683]) >>> torch.sigmoid(a) tensor([ 0.7153, 0.7481, 0.2920, 0.1458]) 
- 
torch.sign(input, out=None) → Tensor¶
- Returns a new tensor with the sign of the elements of - input.- Example: - >>> a = torch.tensor([0.7, -1.2, 0., 2.3]) >>> a tensor([ 0.7000, -1.2000, 0.0000, 2.3000]) >>> torch.sign(a) tensor([ 1., -1., 0., 1.]) 
- 
torch.sin(input, out=None) → Tensor¶
- Returns a new tensor with the sine of the elements of - input.\[\text{out}_{i} = \sin(\text{input}_{i}) \]- Example: - >>> a = torch.randn(4) >>> a tensor([-0.5461, 0.1347, -2.7266, -0.2746]) >>> torch.sin(a) tensor([-0.5194, 0.1343, -0.4032, -0.2711]) 
- 
torch.sinh(input, out=None) → Tensor¶
- Returns a new tensor with the hyperbolic sine of the elements of - input.\[\text{out}_{i} = \sinh(\text{input}_{i}) \]- Example: - >>> a = torch.randn(4) >>> a tensor([ 0.5380, -0.8632, -0.1265, 0.9399]) >>> torch.sinh(a) tensor([ 0.5644, -0.9744, -0.1268, 1.0845]) 
- 
torch.sqrt(input, out=None) → Tensor¶
- Returns a new tensor with the square-root of the elements of - input.\[\text{out}_{i} = \sqrt{\text{input}_{i}} \]- Example: - >>> a = torch.randn(4) >>> a tensor([-2.0755, 1.0226, 0.0831, 0.4806]) >>> torch.sqrt(a) tensor([ nan, 1.0112, 0.2883, 0.6933]) 
- 
torch.tan(input, out=None) → Tensor¶
- Returns a new tensor with the tangent of the elements of - input.\[\text{out}_{i} = \tan(\text{input}_{i}) \]- Example: - >>> a = torch.randn(4) >>> a tensor([-1.2027, -1.7687, 0.4412, -1.3856]) >>> torch.tan(a) tensor([-2.5930, 4.9859, 0.4722, -5.3366]) 
- 
torch.tanh(input, out=None) → Tensor¶
- Returns a new tensor with the hyperbolic tangent of the elements of - input.\[\text{out}_{i} = \tanh(\text{input}_{i}) \]- Example: - >>> a = torch.randn(4) >>> a tensor([ 0.8986, -0.7279, 1.1745, 0.2611]) >>> torch.tanh(a) tensor([ 0.7156, -0.6218, 0.8257, 0.2553]) 
- 
torch.trunc(input, out=None) → Tensor¶
- Returns a new tensor with the truncated integer values of the elements of - input.- Example: - >>> a = torch.randn(4) >>> a tensor([ 3.4742, 0.5466, -0.8008, -0.9079]) >>> torch.trunc(a) tensor([ 3., 0., -0., -0.]) 
Reduction Ops¶
- 
torch.argmax(input, dim=None, keepdim=False)¶
- Returns the indices of the maximum values of a tensor across a dimension. - This is the second value returned by - torch.max(). See its documentation for the exact semantics of this method.- Parameters
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[ 1.3398, 0.2663, -0.2686, 0.2450], [-0.7401, -0.8805, -0.3402, -1.1936], [ 0.4907, -1.3948, -1.0691, -0.3132], [-1.6092, 0.5419, -0.2993, 0.3195]]) >>> torch.argmax(a, dim=1) tensor([ 0, 2, 0, 1]) 
- 
torch.argmin(input, dim=None, keepdim=False)¶
- Returns the indices of the minimum values of a tensor across a dimension. - This is the second value returned by - torch.min(). See its documentation for the exact semantics of this method.- Parameters
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[ 0.1139, 0.2254, -0.1381, 0.3687], [ 1.0100, -1.1975, -0.0102, -0.4732], [-0.9240, 0.1207, -0.7506, -1.0213], [ 1.7809, -1.2960, 0.9384, 0.1438]]) >>> torch.argmin(a, dim=1) tensor([ 2, 1, 3, 1]) 
- 
torch.cumprod(input, dim, dtype=None) → Tensor¶
- Returns the cumulative product of elements of - inputin the dimension- dim.- For example, if - inputis a vector of size N, the result will also be a vector of size N, with elements.\[y_i = x_1 \times x_2\times x_3\times \dots \times x_i \]- Parameters
- input (Tensor) – the input tensor 
- dim (int) – the dimension to do the operation over 
- dtype ( - torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to - dtypebefore the operation is performed. This is useful for preventing data type overflows. Default: None.
 
 - Example: - >>> a = torch.randn(10) >>> a tensor([ 0.6001, 0.2069, -0.1919, 0.9792, 0.6727, 1.0062, 0.4126, -0.2129, -0.4206, 0.1968]) >>> torch.cumprod(a, dim=0) tensor([ 0.6001, 0.1241, -0.0238, -0.0233, -0.0157, -0.0158, -0.0065, 0.0014, -0.0006, -0.0001]) >>> a[5] = 0.0 >>> torch.cumprod(a, dim=0) tensor([ 0.6001, 0.1241, -0.0238, -0.0233, -0.0157, -0.0000, -0.0000, 0.0000, -0.0000, -0.0000]) 
- 
torch.cumsum(input, dim, out=None, dtype=None) → Tensor¶
- Returns the cumulative sum of elements of - inputin the dimension- dim.- For example, if - inputis a vector of size N, the result will also be a vector of size N, with elements.\[y_i = x_1 + x_2 + x_3 + \dots + x_i \]- Parameters
- input (Tensor) – the input tensor 
- dim (int) – the dimension to do the operation over 
- dtype ( - torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to - dtypebefore the operation is performed. This is useful for preventing data type overflows. Default: None.
 
 - Example: - >>> a = torch.randn(10) >>> a tensor([-0.8286, -0.4890, 0.5155, 0.8443, 0.1865, -0.1752, -2.0595, 0.1850, -1.1571, -0.4243]) >>> torch.cumsum(a, dim=0) tensor([-0.8286, -1.3175, -0.8020, 0.0423, 0.2289, 0.0537, -2.0058, -1.8209, -2.9780, -3.4022]) 
- 
torch.dist(input, other, p=2) → Tensor¶
- Returns the p-norm of ( - input-- other)- The shapes of - inputand- othermust be broadcastable.- Parameters
 - Example: - >>> x = torch.randn(4) >>> x tensor([-1.5393, -0.8675, 0.5916, 1.6321]) >>> y = torch.randn(4) >>> y tensor([ 0.0967, -1.0511, 0.6295, 0.8360]) >>> torch.dist(x, y, 3.5) tensor(1.6727) >>> torch.dist(x, y, 3) tensor(1.6973) >>> torch.dist(x, y, 0) tensor(inf) >>> torch.dist(x, y, 1) tensor(2.6537) 
- 
torch.logsumexp(input, dim, keepdim=False, out=None)¶
- Returns the log of summed exponentials of each row of the - inputtensor in the given dimension- dim. The computation is numerically stabilized.- For summation index \(j\) given by dim and other indices \(i\), the result is \[\text{logsumexp}(x)_{i} = \log \sum_j \exp(x_{ij}) \]- If - keepdimis- True, the output tensor is of the same size as- inputexcept in the dimension(s)- dimwhere it is of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensor having 1 (or- len(dim)) fewer dimension(s).- Parameters
- Example:
- >>> a = torch.randn(3, 3) >>> torch.logsumexp(a, 1) tensor([ 0.8442, 1.4322, 0.8711]) 
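As a sanity check of the stabilization claim above, the result matches the naive formula whenever the naive computation does not overflow (a sketch; values depend on the random input):

>>> a = torch.randn(3, 3)
>>> naive = torch.log(torch.sum(torch.exp(a), dim=1))
>>> torch.allclose(torch.logsumexp(a, dim=1), naive)
True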
 
- 
torch.mean()¶
- 
torch.mean(input) → Tensor
 - Returns the mean value of all elements in the - inputtensor.- Parameters
- input (Tensor) – the input tensor 
 - Example: - >>> a = torch.randn(1, 3) >>> a tensor([[ 0.2294, -0.5481, 1.3288]]) >>> torch.mean(a) tensor(0.3367) - 
torch.mean(input, dim, keepdim=False, out=None) → Tensor
 - Returns the mean value of each row of the - inputtensor in the given dimension- dim. If- dimis a list of dimensions, reduce over all of them.- If - keepdimis- True, the output tensor is of the same size as- inputexcept in the dimension(s)- dimwhere it is of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensor having 1 (or- len(dim)) fewer dimension(s).- Parameters
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[-0.3841, 0.6320, 0.4254, -0.7384], [-0.9644, 1.0131, -0.6549, -1.4279], [-0.2951, -1.3350, -0.7694, 0.5600], [ 1.0842, -0.9580, 0.3623, 0.2343]]) >>> torch.mean(a, 1) tensor([-0.0163, -0.5085, -0.4599, 0.1807]) >>> torch.mean(a, 1, True) tensor([[-0.0163], [-0.5085], [-0.4599], [ 0.1807]]) 
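Since the description above allows dim to be a list of dimensions, a minimal sketch of reducing over several at once (only the shape is shown; values depend on the random input):

>>> b = torch.randn(2, 3, 4)
>>> torch.mean(b, (0, 2)).shape
torch.Size([3])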
- 
- 
torch.median()¶
- 
torch.median(input) → Tensor
 - Returns the median value of all elements in the - inputtensor.- Parameters
- input (Tensor) – the input tensor 
 - Example: - >>> a = torch.randn(1, 3) >>> a tensor([[ 1.5219, -1.5212, 0.2202]]) >>> torch.median(a) tensor(0.2202) - 
torch.median(input, dim=-1, keepdim=False, values=None, indices=None) -> (Tensor, LongTensor)
 - Returns a namedtuple - (values, indices)where- valuesis the median value of each row of the- inputtensor in the given dimension- dim. And- indicesis the index location of each median value found.- By default, - dimis the last dimension of the- inputtensor.- If - keepdimis- True, the output tensors are of the same size as- inputexcept in the dimension- dimwhere they are of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the outputs tensor having 1 fewer dimension than- input.- Parameters
 - Example: - >>> a = torch.randn(4, 5) >>> a tensor([[ 0.2505, -0.3982, -0.9948, 0.3518, -1.3131], [ 0.3180, -0.6993, 1.0436, 0.0438, 0.2270], [-0.2751, 0.7303, 0.2192, 0.3321, 0.2488], [ 1.0778, -1.9510, 0.7048, 0.4742, -0.7125]]) >>> torch.median(a, 1) torch.return_types.median(values=tensor([-0.3982, 0.2270, 0.2488, 0.4742]), indices=tensor([1, 4, 4, 3])) 
- 
- 
torch.mode(input, dim=-1, keepdim=False, values=None, indices=None) -> (Tensor, LongTensor)¶
- Returns a namedtuple - (values, indices)where- valuesis the mode value of each row of the- inputtensor in the given dimension- dim, i.e. a value which appears most often in that row, and- indicesis the index location of each mode value found.- By default, - dimis the last dimension of the- inputtensor.- If - keepdimis- True, the output tensors are of the same size as- inputexcept in the dimension- dimwhere they are of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensors having 1 fewer dimension than- input.- Note - This function is not defined for - torch.cuda.Tensoryet.- Parameters
 - Example: - >>> a = torch.randint(10, (5,)) >>> a tensor([6, 5, 1, 0, 2]) >>> b = a + (torch.randn(50, 1) * 5).long() >>> torch.mode(b, 0) torch.return_types.mode(values=tensor([6, 5, 1, 0, 2]), indices=tensor([2, 2, 2, 2, 2])) 
- 
torch.norm(input, p='fro', dim=None, keepdim=False, out=None, dtype=None)¶
- Returns the matrix norm or vector norm of a given tensor. - Parameters
- input (Tensor) – the input tensor 
- p (int, float, inf, -inf, 'fro', 'nuc', optional) – the order of norm. Default: 'fro'. The following norms can be calculated:

  ord     matrix norm                     vector norm
  None    Frobenius norm                  2-norm
  'fro'   Frobenius norm                  –
  'nuc'   nuclear norm                    –
  other   as vec norm when dim is None    sum(abs(x)**ord)**(1./ord)
- dim (int, 2-tuple of ints, 2-list of ints, optional) – If it is an int, vector norm will be calculated; if it is a 2-tuple of ints, matrix norm will be calculated. If the value is None, matrix norm will be calculated when the input tensor has two dimensions, and vector norm will be calculated when the input tensor has one dimension. If the input tensor has more than two dimensions, the vector norm will be applied to the last dimension. 
- keepdim (bool, optional) – whether the output tensors have - dimretained or not. Ignored if- dim=- Noneand- out=- None. Default:- False
- out (Tensor, optional) – the output tensor. Ignored if - dim=- Noneand- out=- None.
- dtype ( - torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to - dtypewhile performing the operation. Default: None.
 
 - Example: - >>> import torch >>> a = torch.arange(9, dtype= torch.float) - 4 >>> b = a.reshape((3, 3)) >>> torch.norm(a) tensor(7.7460) >>> torch.norm(b) tensor(7.7460) >>> torch.norm(a, float('inf')) tensor(4.) >>> torch.norm(b, float('inf')) tensor(4.) >>> c = torch.tensor([[ 1, 2, 3],[-1, 1, 4]] , dtype= torch.float) >>> torch.norm(c, dim=0) tensor([1.4142, 2.2361, 5.0000]) >>> torch.norm(c, dim=1) tensor([3.7417, 4.2426]) >>> torch.norm(c, p=1, dim=1) tensor([6., 6.]) >>> d = torch.arange(8, dtype= torch.float).reshape(2,2,2) >>> torch.norm(d, dim=(1,2)) tensor([ 3.7417, 11.2250]) >>> torch.norm(d[0, :, :]), torch.norm(d[1, :, :]) (tensor(3.7417), tensor(11.2250)) 
- 
torch.prod()¶
- 
torch.prod(input, dtype=None) → Tensor
 - Returns the product of all elements in the - inputtensor.- Parameters
- input (Tensor) – the input tensor 
- dtype ( - torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to - dtypebefore the operation is performed. This is useful for preventing data type overflows. Default: None.
 
 - Example: - >>> a = torch.randn(1, 3) >>> a tensor([[-0.8020, 0.5428, -1.5854]]) >>> torch.prod(a) tensor(0.6902) - 
torch.prod(input, dim, keepdim=False, dtype=None) → Tensor
 - Returns the product of each row of the - inputtensor in the given dimension- dim.- If - keepdimis- True, the output tensor is of the same size as- inputexcept in the dimension- dimwhere it is of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensor having 1 fewer dimension than- input.- Parameters
- input (Tensor) – the input tensor 
- dim (int) – the dimension to reduce 
- keepdim (bool) – whether the output tensor has - dimretained or not
- dtype ( - torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to - dtypebefore the operation is performed. This is useful for preventing data type overflows. Default: None.
 
 - Example: - >>> a = torch.randn(4, 2) >>> a tensor([[ 0.5261, -0.3837], [ 1.1857, -0.2498], [-1.1646, 0.0705], [ 1.1131, -1.0629]]) >>> torch.prod(a, 1) tensor([-0.2018, -0.2962, -0.0821, -1.1831]) 
- 
- 
torch.std()¶
- 
torch.std(input, unbiased=True) → Tensor
 - Returns the standard-deviation of all elements in the - inputtensor.- If - unbiasedis- False, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used.- Parameters
 - Example: - >>> a = torch.randn(1, 3) >>> a tensor([[-0.8166, -1.3802, -0.3560]]) >>> torch.std(a) tensor(0.5130) - 
torch.std(input, dim, keepdim=False, unbiased=True, out=None) → Tensor
 - Returns the standard-deviation of each row of the - inputtensor in the dimension- dim. If- dimis a list of dimensions, reduce over all of them.- If - keepdimis- True, the output tensor is of the same size as- inputexcept in the dimension(s)- dimwhere it is of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensor having 1 (or- len(dim)) fewer dimension(s).- If - unbiasedis- False, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used.- Parameters
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[ 0.2035, 1.2959, 1.8101, -0.4644], [ 1.5027, -0.3270, 0.5905, 0.6538], [-1.5745, 1.3330, -0.5596, -0.6548], [ 0.1264, -0.5080, 1.6420, 0.1992]]) >>> torch.std(a, dim=1) tensor([ 1.0311, 0.7477, 1.2204, 0.9087]) 
- 
- 
torch.sum()¶
- 
torch.sum(input, dtype=None) → Tensor
 - Returns the sum of all elements in the - inputtensor.- Parameters
- input (Tensor) – the input tensor 
- dtype ( - torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to - dtypebefore the operation is performed. This is useful for preventing data type overflows. Default: None.
 
 - Example: - >>> a = torch.randn(1, 3) >>> a tensor([[ 0.1133, -0.9567, 0.2958]]) >>> torch.sum(a) tensor(-0.5475) - 
torch.sum(input, dim, keepdim=False, dtype=None) → Tensor
 - Returns the sum of each row of the - inputtensor in the given dimension- dim. If- dimis a list of dimensions, reduce over all of them.- If - keepdimis- True, the output tensor is of the same size as- inputexcept in the dimension(s)- dimwhere it is of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensor having 1 (or- len(dim)) fewer dimension(s).- Parameters
- input (Tensor) – the input tensor 
- dim (int or tuple of ints) – the dimension or dimensions to reduce 
- keepdim (bool) – whether the output tensor has - dimretained or not
- dtype ( - torch.dtype, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to - dtypebefore the operation is performed. This is useful for preventing data type overflows. Default: None.
 
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[ 0.0569, -0.2475, 0.0737, -0.3429], [-0.2993, 0.9138, 0.9337, -1.6864], [ 0.1132, 0.7892, -0.1003, 0.5688], [ 0.3637, -0.9906, -0.4752, -1.5197]]) >>> torch.sum(a, 1) tensor([-0.4598, -0.1381, 1.3708, -2.6217]) >>> b = torch.arange(4 * 5 * 6).view(4, 5, 6) >>> torch.sum(b, (2, 1)) tensor([ 435., 1335., 2235., 3135.]) 
- 
- 
torch.unique(input, sorted=True, return_inverse=False, dim=None)¶
- Returns the unique scalar elements of the input tensor as a 1-D tensor. - Parameters
- input (Tensor) – the input tensor 
- sorted (bool) – Whether to sort the unique elements in ascending order before returning as output. 
- return_inverse (bool) – Whether to also return the indices for where elements in the original input ended up in the returned unique list. 
- dim (int) – the dimension to apply unique. If - None, the unique of the flattened input is returned. Default: - None
 
- Returns
- A tensor or a tuple of tensors containing - output (Tensor): the output list of unique scalar elements. 
- inverse_indices (Tensor): (optional) if - return_inverseis True, there will be a 2nd returned tensor (same shape as input) representing the indices for where elements in the original input map to in the output; otherwise, this function will only return a single tensor.
 
- Return type
 - Example: - >>> output = torch.unique(torch.tensor([1, 3, 2, 3], dtype=torch.long)) >>> output tensor([ 2, 3, 1]) >>> output, inverse_indices = torch.unique( torch.tensor([1, 3, 2, 3], dtype=torch.long), sorted=True, return_inverse=True) >>> output tensor([ 1, 2, 3]) >>> inverse_indices tensor([ 0, 2, 1, 2]) >>> output, inverse_indices = torch.unique( torch.tensor([[1, 3], [2, 3]], dtype=torch.long), sorted=True, return_inverse=True) >>> output tensor([ 1, 2, 3]) >>> inverse_indices tensor([[ 0, 2], [ 1, 2]]) 
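A sketch of the dim argument, which deduplicates whole slices along that dimension rather than individual scalars (here, unique rows):

>>> x = torch.tensor([[1, 2], [1, 2], [3, 4]])
>>> torch.unique(x, dim=0)
tensor([[1, 2],
        [3, 4]])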
- 
torch.var()¶
- 
torch.var(input, unbiased=True) → Tensor
 - Returns the variance of all elements in the - inputtensor.- If - unbiasedis- False, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used.- Parameters
 - Example: - >>> a = torch.randn(1, 3) >>> a tensor([[-0.3425, -1.2636, -0.4864]]) >>> torch.var(a) tensor(0.2455) - 
torch.var(input, dim, keepdim=False, unbiased=True, out=None) → Tensor
 - Returns the variance of each row of the - inputtensor in the given dimension- dim.- If - keepdimis- True, the output tensor is of the same size as- inputexcept in the dimension(s)- dimwhere it is of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensor having 1 (or- len(dim)) fewer dimension(s).- If - unbiasedis- False, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used.- Parameters
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[-0.3567, 1.7385, -1.3042, 0.7423], [ 1.3436, -0.1015, -0.9834, -0.8438], [ 0.6056, 0.1089, -0.3112, -1.4085], [-0.7700, 0.6074, -0.1469, 0.7777]]) >>> torch.var(a, 1) tensor([ 1.7444, 1.1363, 0.7356, 0.5112]) 
- 
Comparison Ops¶
- 
torch.allclose(self, other, rtol=1e-05, atol=1e-08, equal_nan=False) → bool¶
- This function checks whether - selfand - othersatisfy the condition:\[\lvert \text{self} - \text{other} \rvert \leq \texttt{atol} + \texttt{rtol} \times \lvert \text{other} \rvert \]- elementwise, for all elements of - selfand - other. The behaviour of this function is analogous to numpy.allclose- Parameters
 - Example: - >>> torch.allclose(torch.tensor([10000., 1e-07]), torch.tensor([10000.1, 1e-08])) False >>> torch.allclose(torch.tensor([10000., 1e-08]), torch.tensor([10000.1, 1e-09])) True >>> torch.allclose(torch.tensor([1.0, float('nan')]), torch.tensor([1.0, float('nan')])) False >>> torch.allclose(torch.tensor([1.0, float('nan')]), torch.tensor([1.0, float('nan')]), equal_nan=True) True 
- 
torch.argsort(input, dim=-1, descending=False, out=None) → LongTensor¶
- Returns the indices that sort a tensor along a given dimension in ascending order by value. - This is the second value returned by - torch.sort(). See its documentation for the exact semantics of this method.- Parameters
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[ 0.0785, 1.5267, -0.8521, 0.4065], [ 0.1598, 0.0788, -0.0745, -1.2700], [ 1.2208, 1.0722, -0.7064, 1.2564], [ 0.0669, -0.2318, -0.8229, -0.9280]]) >>> torch.argsort(a, dim=1) tensor([[2, 0, 3, 1], [3, 2, 1, 0], [2, 1, 0, 3], [3, 2, 1, 0]]) 
- 
torch.eq(input, other, out=None) → Tensor¶
- Computes element-wise equality - The second argument can be a number or a tensor whose shape is broadcastable with the first argument. - Parameters
- Returns
- A - torch.ByteTensorcontaining a 1 at each location where comparison is true
- Return type
 - Example: - >>> torch.eq(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[ 1, 0], [ 0, 1]], dtype=torch.uint8) 
- 
torch.equal(tensor1, tensor2) → bool¶
- Trueif two tensors have the same size and elements,- Falseotherwise.- Example: - >>> torch.equal(torch.tensor([1, 2]), torch.tensor([1, 2])) True 
- 
torch.ge(input, other, out=None) → Tensor¶
- Computes \(\text{input} \geq \text{other}\) element-wise. - The second argument can be a number or a tensor whose shape is broadcastable with the first argument. - Parameters
- Returns
- A - torch.ByteTensorcontaining a 1 at each location where comparison is true
- Return type
 - Example: - >>> torch.ge(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[ 1, 1], [ 0, 1]], dtype=torch.uint8) 
- 
torch.gt(input, other, out=None) → Tensor¶
- Computes \(\text{input} > \text{other}\) element-wise. - The second argument can be a number or a tensor whose shape is broadcastable with the first argument. - Parameters
- Returns
- A - torch.ByteTensorcontaining a 1 at each location where comparison is true
- Return type
 - Example: - >>> torch.gt(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[ 0, 1], [ 0, 0]], dtype=torch.uint8) 
- 
torch.isfinite(tensor)¶
- Returns a new tensor with boolean elements representing if each element is finite or not. - Parameters
- tensor (Tensor) – A tensor to check 
- Returns
- A - torch.ByteTensorcontaining a 1 at each location of finite elements and 0 otherwise
- Return type
 - Example: - >>> torch.isfinite(torch.tensor([1, float('inf'), 2, float('-inf'), float('nan')])) tensor([ 1, 0, 1, 0, 0], dtype=torch.uint8) 
- 
torch.isinf(tensor)¶
- Returns a new tensor with boolean elements representing if each element is +/-INF or not. - Parameters
- tensor (Tensor) – A tensor to check 
- Returns
- A - torch.ByteTensorcontaining a 1 at each location of +/-INF elements and 0 otherwise
- Return type
 - Example: - >>> torch.isinf(torch.tensor([1, float('inf'), 2, float('-inf'), float('nan')])) tensor([ 0, 1, 0, 1, 0], dtype=torch.uint8) 
- 
torch.isnan(tensor)¶
- Returns a new tensor with boolean elements representing if each element is NaN or not. - Parameters
- tensor (Tensor) – A tensor to check 
- Returns
- A - torch.ByteTensorcontaining a 1 at each location of NaN elements.
- Return type
 - Example: - >>> torch.isnan(torch.tensor([1, float('nan'), 2])) tensor([ 0, 1, 0], dtype=torch.uint8) 
- 
torch.kthvalue(input, k, dim=None, keepdim=False, out=None) -> (Tensor, LongTensor)¶
- Returns a namedtuple - (values, indices)where- valuesis the- kth smallest element of each row of the- inputtensor in the given dimension- dim. And- indicesis the index location of each element found.- If - dimis not given, the last dimension of the input is chosen.- If - keepdimis- True, both the- valuesand- indicestensors are the same size as- input, except in the dimension- dimwhere they are of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in both the- valuesand- indicestensors having 1 fewer dimension than the- inputtensor.- Parameters
- input (Tensor) – the input tensor 
- k (int) – k for the k-th smallest element 
- dim (int, optional) – the dimension to find the kth value along 
- keepdim (bool) – whether the output tensors have - dimretained or not
- out (tuple, optional) – the output tuple of (Tensor, LongTensor) can be optionally given to be used as output buffers 
 
 - Example: - >>> x = torch.arange(1., 6.) >>> x tensor([ 1., 2., 3., 4., 5.]) >>> torch.kthvalue(x, 4) torch.return_types.kthvalue(values=tensor(4.), indices=tensor(3)) >>> x=torch.arange(1.,7.).resize_(2,3) >>> x tensor([[ 1., 2., 3.], [ 4., 5., 6.]]) >>> torch.kthvalue(x, 2, 0, True) torch.return_types.kthvalue(values=tensor([[4., 5., 6.]]), indices=tensor([[1, 1, 1]])) 
- 
torch.le(input, other, out=None) → Tensor¶
- Computes \(\text{input} \leq \text{other}\) element-wise. - The second argument can be a number or a tensor whose shape is broadcastable with the first argument. - Parameters
- Returns
- A - torch.ByteTensorcontaining a 1 at each location where comparison is true
- Return type
 - Example: - >>> torch.le(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[ 1, 0], [ 1, 1]], dtype=torch.uint8) 
- 
torch.lt(input, other, out=None) → Tensor¶
- Computes \(\text{input} < \text{other}\) element-wise. - The second argument can be a number or a tensor whose shape is broadcastable with the first argument. - Parameters
- Returns
- A torch.ByteTensor containing a 1 at each location where comparison is true 
- Return type
 - Example: - >>> torch.lt(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[ 0, 0], [ 1, 0]], dtype=torch.uint8) 
- 
torch.max()¶
- 
torch.max(input) → Tensor
 - Returns the maximum value of all elements in the - inputtensor.- Parameters
- input (Tensor) – the input tensor 
 - Example: - >>> a = torch.randn(1, 3) >>> a tensor([[ 0.6763, 0.7445, -2.2369]]) >>> torch.max(a) tensor(0.7445) - 
torch.max(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor)
 - Returns a namedtuple - (values, indices)where- valuesis the maximum value of each row of the- inputtensor in the given dimension- dim. And- indicesis the index location of each maximum value found (argmax).- If - keepdimis- True, the output tensors are of the same size as- inputexcept in the dimension- dimwhere they are of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensors having 1 fewer dimension than- input.- Parameters
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[-1.2360, -0.2942, -0.1222, 0.8475], [ 1.1949, -1.1127, -2.2379, -0.6702], [ 1.5717, -0.9207, 0.1297, -1.8768], [-0.6172, 1.0036, -0.6060, -0.2432]]) >>> torch.max(a, 1) torch.return_types.max(values=tensor([0.8475, 1.1949, 1.5717, 1.0036]), indices=tensor([3, 0, 0, 1])) - 
torch.max(input, other, out=None) → Tensor
 - Each element of the tensor - inputis compared with the corresponding element of the tensor - otherand an element-wise maximum is taken.- The shapes of - inputand - otherdon’t need to match, but they must be broadcastable.\[\text{out}_i = \max(\text{input}_i, \text{other}_i) \]- Note - When the shapes do not match, the shape of the returned output tensor follows the broadcasting rules. - Parameters
 - Example: - >>> a = torch.randn(4) >>> a tensor([ 0.2942, -0.7416, 0.2653, -0.1584]) >>> b = torch.randn(4) >>> b tensor([ 0.8722, -1.7421, -0.4141, -0.5055]) >>> torch.max(a, b) tensor([ 0.8722, -0.7416, 0.2653, -0.1584]) 
- 
- 
torch.min()¶
- 
torch.min(input) → Tensor
 - Returns the minimum value of all elements in the - inputtensor.- Parameters
- input (Tensor) – the input tensor 
 - Example: - >>> a = torch.randn(1, 3) >>> a tensor([[ 0.6750, 1.0857, 1.7197]]) >>> torch.min(a) tensor(0.6750) - 
torch.min(input, dim, keepdim=False, out=None) -> (Tensor, LongTensor)
 - Returns a namedtuple - (values, indices)where- valuesis the minimum value of each row of the- inputtensor in the given dimension- dim. And- indicesis the index location of each minimum value found (argmin).- If - keepdimis- True, the output tensors are of the same size as- inputexcept in the dimension- dimwhere they are of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensors having 1 fewer dimension than- input.- Parameters
 - Example: - >>> a = torch.randn(4, 4) >>> a tensor([[-0.6248, 1.1334, -1.1899, -0.2803], [-1.4644, -0.2635, -0.3651, 0.6134], [ 0.2457, 0.0384, 1.0128, 0.7015], [-0.1153, 2.9849, 2.1458, 0.5788]]) >>> torch.min(a, 1) torch.return_types.min(values=tensor([-1.1899, -1.4644, 0.0384, -0.1153]), indices=tensor([2, 0, 1, 0])) - 
torch.min(input, other, out=None) → Tensor
 - Each element of the tensor - inputis compared with the corresponding element of the tensor - otherand an element-wise minimum is taken. The resulting tensor is returned.- The shapes of - inputand - otherdon’t need to match, but they must be broadcastable.\[\text{out}_i = \min(\text{input}_i, \text{other}_i) \]- Note - When the shapes do not match, the shape of the returned output tensor follows the broadcasting rules. - Parameters
 - Example: - >>> a = torch.randn(4) >>> a tensor([ 0.8137, -1.1740, -0.6460, 0.6308]) >>> b = torch.randn(4) >>> b tensor([-0.1369, 0.1555, 0.4019, -0.1929]) >>> torch.min(a, b) tensor([-0.1369, -1.1740, -0.6460, -0.1929]) 
- 
- 
torch.ne(input, other, out=None) → Tensor¶
- Computes \(input \neq other\) element-wise. - The second argument can be a number or a tensor whose shape is broadcastable with the first argument. - Parameters
- Returns
- A - torch.ByteTensorcontaining a 1 at each location where comparison is true.
- Return type
 - Example: - >>> torch.ne(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[ 0, 1], [ 1, 0]], dtype=torch.uint8) 
- 
torch.sort(input, dim=-1, descending=False, out=None) -> (Tensor, LongTensor)¶
- Sorts the elements of the - inputtensor along a given dimension in ascending order by value.- If - dimis not given, the last dimension of the input is chosen.- If - descendingis- Truethen the elements are sorted in descending order by value.- A tuple of (sorted_tensor, sorted_indices) is returned, where the sorted_indices are the indices of the elements in the original input tensor. - Parameters
 - Example: - >>> x = torch.randn(3, 4) >>> sorted, indices = torch.sort(x) >>> sorted tensor([[-0.2162, 0.0608, 0.6719, 2.3332], [-0.5793, 0.0061, 0.6058, 0.9497], [-0.5071, 0.3343, 0.9553, 1.0960]]) >>> indices tensor([[ 1, 0, 2, 3], [ 3, 1, 0, 2], [ 0, 3, 1, 2]]) >>> sorted, indices = torch.sort(x, 0) >>> sorted tensor([[-0.5071, -0.2162, 0.6719, -0.5793], [ 0.0608, 0.0061, 0.9497, 0.3343], [ 0.6058, 0.9553, 1.0960, 2.3332]]) >>> indices tensor([[ 2, 0, 0, 1], [ 0, 1, 1, 2], [ 1, 2, 2, 0]]) 
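Passing descending=True reverses the order; a minimal sketch:

>>> x = torch.tensor([1., 3., 2.])
>>> sorted, indices = torch.sort(x, descending=True)
>>> sorted
tensor([3., 2., 1.])
>>> indices
tensor([1, 2, 0])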
- 
torch.topk(input, k, dim=None, largest=True, sorted=True, out=None) -> (Tensor, LongTensor)¶
- Returns the - klargest elements of the given - inputtensor along a given dimension.- If - dimis not given, the last dimension of the input is chosen.- If - largestis - False, then the k smallest elements are returned.- A tuple of (values, indices) is returned, where the indices are the indices of the elements in the original input tensor. - If the boolean option - sortedis - True, the returned k elements are themselves sorted.- Parameters
- input (Tensor) – the input tensor 
- k (int) – the k in “top-k” 
- dim (int, optional) – the dimension to sort along 
- largest (bool, optional) – controls whether to return largest or smallest elements 
- sorted (bool, optional) – controls whether to return the elements in sorted order 
- out (tuple, optional) – the output tuple of (Tensor, LongTensor) that can be optionally given to be used as output buffers 
 
 - Example: - >>> x = torch.arange(1., 6.) >>> x tensor([ 1., 2., 3., 4., 5.]) >>> torch.topk(x, 3) (tensor([ 5., 4., 3.]), tensor([ 4, 3, 2])) 
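Setting largest=False returns the k smallest elements instead; a minimal sketch reusing x from the example above:

>>> torch.topk(x, 3, largest=False)
(tensor([1., 2., 3.]), tensor([0, 1, 2]))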
Spectral Ops¶
- 
torch.fft(input, signal_ndim, normalized=False) → Tensor¶
- Complex-to-complex Discrete Fourier Transform - This method computes the complex-to-complex discrete Fourier transform. Ignoring the batch dimensions, it computes the following expression: \[X[\omega_1, \dots, \omega_d] = \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x[n_1, \dots, n_d] e^{-j\ 2 \pi \sum_{i=0}^d \frac{\omega_i n_i}{N_i}}, \]- where \(d\) = - signal_ndimis the number of dimensions for the signal, and \(N_i\) is the size of signal dimension \(i\).- This method supports 1D, 2D and 3D complex-to-complex transforms, indicated by - signal_ndim.- inputmust be a tensor with last dimension of size 2, representing the real and imaginary components of complex numbers, and should have at least - signal_ndim + 1dimensions with an optionally arbitrary number of leading batch dimensions. If - normalizedis set to - True, this normalizes the result by dividing it with \(\sqrt{\prod_{i=1}^d N_i}\) so that the operator is unitary.- Returns the real and the imaginary parts together as one tensor of the same shape as - input.- The inverse of this function is - ifft().- Note - For CUDA tensors, an LRU cache is used for cuFFT plans to speed up repeatedly running FFT methods on tensors of the same geometry with the same configuration. - Changing - torch.backends.cuda.cufft_plan_cache.max_size(default is 4096 on CUDA 10 and newer, and 1023 on older CUDA versions) controls the capacity of this cache. Some cuFFT plans may allocate GPU memory. You can use - torch.backends.cuda.cufft_plan_cache.sizeto query the number of plans currently in cache, and - torch.backends.cuda.cufft_plan_cache.clear()to clear the cache.- Warning - For CPU tensors, this method is currently only available with MKL. Use - torch.backends.mkl.is_available()to check if MKL is installed.- Parameters
- Returns
- A tensor containing the complex-to-complex Fourier transform result 
- Return type
 - Example: - >>> # unbatched 2D FFT >>> x = torch.randn(4, 3, 2) >>> torch.fft(x, 2) tensor([[[-0.0876, 1.7835], [-2.0399, -2.9754], [ 4.4773, -5.0119]], [[-1.5716, 2.7631], [-3.8846, 5.2652], [ 0.2046, -0.7088]], [[ 1.9938, -0.5901], [ 6.5637, 6.4556], [ 2.9865, 4.9318]], [[ 7.0193, 1.1742], [-1.3717, -2.1084], [ 2.0289, 2.9357]]]) >>> # batched 1D FFT >>> torch.fft(x, 1) tensor([[[ 1.8385, 1.2827], [-0.1831, 1.6593], [ 2.4243, 0.5367]], [[-0.9176, -1.5543], [-3.9943, -2.9860], [ 1.2838, -2.9420]], [[-0.8854, -0.6860], [ 2.4450, 0.0808], [ 1.3076, -0.5768]], [[-0.1231, 2.7411], [-0.3075, -1.7295], [-0.5384, -2.0299]]]) >>> # arbitrary number of batch dimensions, 2D FFT >>> x = torch.randn(3, 3, 5, 5, 2) >>> y = torch.fft(x, 2) >>> y.shape torch.Size([3, 3, 5, 5, 2]) 
- 
torch.ifft(input, signal_ndim, normalized=False) → Tensor¶
- Complex-to-complex Inverse Discrete Fourier Transform - This method computes the complex-to-complex inverse discrete Fourier transform. Ignoring the batch dimensions, it computes the following expression: \[X[\omega_1, \dots, \omega_d] = \frac{1}{\prod_{i=1}^d N_i} \sum_{n_1=0}^{N_1-1} \dots \sum_{n_d=0}^{N_d-1} x[n_1, \dots, n_d] e^{\ j\ 2 \pi \sum_{i=0}^d \frac{\omega_i n_i}{N_i}}, \]- where \(d\) = - signal_ndimis the number of dimensions for the signal, and \(N_i\) is the size of signal dimension \(i\).- The argument specifications are almost identical to - fft(). However, if - normalizedis set to - True, this instead returns the results multiplied by \(\sqrt{\prod_{i=1}^d N_i}\), to become a unitary operator. Therefore, to invert a - fft(), the - normalizedargument should be set identically for - fft().- Returns the real and the imaginary parts together as one tensor of the same shape as - input.- The inverse of this function is - fft().- Note - For CUDA tensors, an LRU cache is used for cuFFT plans to speed up repeatedly running FFT methods on tensors of the same geometry with the same configuration. - Changing - torch.backends.cuda.cufft_plan_cache.max_size(default is 4096 on CUDA 10 and newer, and 1023 on older CUDA versions) controls the capacity of this cache. Some cuFFT plans may allocate GPU memory. You can use - torch.backends.cuda.cufft_plan_cache.sizeto query the number of plans currently in cache, and - torch.backends.cuda.cufft_plan_cache.clear()to clear the cache.- Warning - For CPU tensors, this method is currently only available with MKL. Use - torch.backends.mkl.is_available()to check if MKL is installed.- Parameters
- Returns
- A tensor containing the complex-to-complex inverse Fourier transform result 
- Return type
 - Example: - >>> x = torch.randn(3, 3, 2) >>> x tensor([[[ 1.2766, 1.3680], [-0.8337, 2.0251], [ 0.9465, -1.4390]], [[-0.1890, 1.6010], [ 1.1034, -1.9230], [-0.9482, 1.0775]], [[-0.7708, -0.8176], [-0.1843, -0.2287], [-1.9034, -0.2196]]]) >>> y = torch.fft(x, 2) >>> torch.ifft(y, 2) # recover x tensor([[[ 1.2766, 1.3680], [-0.8337, 2.0251], [ 0.9465, -1.4390]], [[-0.1890, 1.6010], [ 1.1034, -1.9230], [-0.9482, 1.0775]], [[-0.7708, -0.8176], [-0.1843, -0.2287], [-1.9034, -0.2196]]]) 
- 
torch.rfft(input, signal_ndim, normalized=False, onesided=True) → Tensor¶
- Real-to-complex Discrete Fourier Transform - This method computes the real-to-complex discrete Fourier transform. It is mathematically equivalent to - fft()with differences only in formats of the input and output.- This method supports 1D, 2D and 3D real-to-complex transforms, indicated by - signal_ndim.- inputmust be a tensor with at least - signal_ndimdimensions with an optionally arbitrary number of leading batch dimensions. If - normalizedis set to - True, this normalizes the result by dividing it with \(\sqrt{\prod_{i=1}^d N_i}\) so that the operator is unitary, where \(N_i\) is the size of signal dimension \(i\).- The real-to-complex Fourier transform results follow conjugate symmetry: \[X[\omega_1, \dots, \omega_d] = X^*[N_1 - \omega_1, \dots, N_d - \omega_d], \]- where the index arithmetic is computed modulo the size of the corresponding dimension, \(\ ^*\) is the conjugate operator, and \(d\) = - signal_ndim. The - onesidedflag controls whether to avoid redundancy in the output results. If set to - True(default), the output will not be the full complex result of shape \((*, 2)\), where \(*\) is the shape of - input, but instead the last dimension will be halved to size \(\lfloor \frac{N_d}{2} \rfloor + 1\).- The inverse of this function is - irfft().- Note - For CUDA tensors, an LRU cache is used for cuFFT plans to speed up repeatedly running FFT methods on tensors of the same geometry with the same configuration. - Changing - torch.backends.cuda.cufft_plan_cache.max_size(default is 4096 on CUDA 10 and newer, and 1023 on older CUDA versions) controls the capacity of this cache. Some cuFFT plans may allocate GPU memory. You can use - torch.backends.cuda.cufft_plan_cache.sizeto query the number of plans currently in cache, and - torch.backends.cuda.cufft_plan_cache.clear()to clear the cache.- Warning - For CPU tensors, this method is currently only available with MKL. Use - torch.backends.mkl.is_available()to check if MKL is installed.- Parameters
- input (Tensor) – the input tensor of at least - signal_ndimdimensions
- signal_ndim (int) – the number of dimensions in each signal. - signal_ndimcan only be 1, 2 or 3
- normalized (bool, optional) – controls whether to return normalized results. Default: - False
- onesided (bool, optional) – controls whether to return half of results to avoid redundancy. Default: - True
 
- Returns
- A tensor containing the real-to-complex Fourier transform result 
- Return type
- Tensor
 - Example: - >>> x = torch.randn(5, 5) >>> torch.rfft(x, 2).shape torch.Size([5, 3, 2]) >>> torch.rfft(x, 2, onesided=False).shape torch.Size([5, 5, 2]) 
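- A quick round-trip sketch (not part of the original reference, but following the rules above): normalized and onesided must match between rfft() and irfft(), and signal_sizes pins down the original shape:
>>> x = torch.randn(4, 4)
>>> y = torch.rfft(x, 2, normalized=True)
>>> z = torch.irfft(y, 2, normalized=True, onesided=True, signal_sizes=x.shape)
>>> torch.allclose(x, z, atol=1e-6)
True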
- 
torch.irfft(input, signal_ndim, normalized=False, onesided=True, signal_sizes=None) → Tensor¶
- Complex-to-real Inverse Discrete Fourier Transform - This method computes the complex-to-real inverse discrete Fourier transform. It is mathematically equivalent to ifft() with differences only in the formats of the input and output. - The argument specifications are almost identical to ifft(). Similar to ifft(), if normalized is set to True, this normalizes the result by multiplying it with \(\sqrt{\prod_{i=1}^d N_i}\) so that the operator is unitary, where \(N_i\) is the size of signal dimension \(i\). - Due to the conjugate symmetry, input does not need to contain the full complex frequency values. Roughly half of the values will be sufficient, as is the case when input is given by rfft() with rfft(signal, onesided=True). In such a case, set the onesided argument of this method to True. Moreover, the original signal shape information can sometimes be lost; optionally set signal_sizes to the size of the original signal (without the batch dimensions if in batched mode) to recover it with the correct shape. - Therefore, to invert an rfft(), the normalized and onesided arguments should be set identically for irfft(), and preferably signal_sizes should be given to avoid size mismatch. See the example below for a case of size mismatch. - See rfft() for details on conjugate symmetry. - The inverse of this function is rfft(). - Warning - Generally speaking, the input of this function should contain values following conjugate symmetry. Note that even if onesided is True, often symmetry on some part is still needed. When this requirement is not satisfied, the behavior of irfft() is undefined. Since torch.autograd.gradcheck() estimates numerical Jacobian with point perturbations, irfft() will almost certainly fail the check. - Note - For CUDA tensors, an LRU cache is used for cuFFT plans to speed up repeatedly running FFT methods on tensors of the same geometry with the same configuration. - Changing torch.backends.cuda.cufft_plan_cache.max_size (default is 4096 on CUDA 10 and newer, and 1023 on older CUDA versions) controls the capacity of this cache. Some cuFFT plans may allocate GPU memory. You can use torch.backends.cuda.cufft_plan_cache.size to query the number of plans currently in cache, and torch.backends.cuda.cufft_plan_cache.clear() to clear the cache. - Warning - For CPU tensors, this method is currently only available with MKL. Use torch.backends.mkl.is_available() to check if MKL is installed. - Parameters
- input (Tensor) – the input tensor of at least - signal_ndim- + 1dimensions
- signal_ndim (int) – the number of dimensions in each signal. - signal_ndimcan only be 1, 2 or 3
- normalized (bool, optional) – controls whether to return normalized results. Default: - False
- onesided (bool, optional) – controls whether input was halved to avoid redundancy, e.g., by rfft(). Default: True
- signal_sizes (list or - torch.Size, optional) – the size of the original signal (without batch dimension). Default:- None
 
- Returns
- A tensor containing the complex-to-real inverse Fourier transform result 
- Return type
- Tensor
 - Example: - >>> x = torch.randn(4, 4) >>> torch.rfft(x, 2, onesided=True).shape torch.Size([4, 3, 2]) >>> >>> # notice that with onesided=True, output size does not determine the original signal size >>> x = torch.randn(4, 5) >>> torch.rfft(x, 2, onesided=True).shape torch.Size([4, 3, 2]) >>> >>> # now we use the original shape to recover x >>> x tensor([[-0.8992, 0.6117, -1.6091, -0.4155, -0.8346], [-2.1596, -0.0853, 0.7232, 0.1941, -0.0789], [-2.0329, 1.1031, 0.6869, -0.5042, 0.9895], [-0.1884, 0.2858, -1.5831, 0.9917, -0.8356]]) >>> y = torch.rfft(x, 2, onesided=True) >>> torch.irfft(y, 2, onesided=True, signal_sizes=x.shape) # recover x tensor([[-0.8992, 0.6117, -1.6091, -0.4155, -0.8346], [-2.1596, -0.0853, 0.7232, 0.1941, -0.0789], [-2.0329, 1.1031, 0.6869, -0.5042, 0.9895], [-0.1884, 0.2858, -1.5831, 0.9917, -0.8356]]) 
- 
torch.stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=True)¶
- Short-time Fourier transform (STFT). - Ignoring the optional batch dimension, this method computes the following expression: \[X[m, \omega] = \sum_{k = 0}^{\text{win\_length}-1} \text{window}[k]\ \text{input}[m \times \text{hop\_length} + k]\ \exp\left(- j \frac{2 \pi \cdot \omega k}{\text{win\_length}}\right), \]- where \(m\) is the index of the sliding window, and \(\omega\) is the frequency, with \(0 \leq \omega < \text{n\_fft}\). input must be either a 1-D time sequence or a 2-D batch of time sequences.
- If - hop_lengthis- None(default), it is treated as equal to- floor(n_fft / 4).
- If - win_lengthis- None(default), it is treated as equal to- n_fft.
- windowcan be a 1-D tensor of size- win_length, e.g., from- torch.hann_window(). If- windowis- None(default), it is treated as if having \(1\) everywhere in the window. If \(\text{win\_length} < \text{n\_fft}\),- windowwill be padded on both sides to length- n_fftbefore being applied.
- If - centeris- True(default),- inputwill be padded on both sides so that the \(t\)-th frame is centered at time \(t \times \text{hop\_length}\). Otherwise, the \(t\)-th frame begins at time \(t \times \text{hop\_length}\).
- pad_modedetermines the padding method used on- inputwhen- centeris- True. See- torch.nn.functional.pad()for all available options. Default is- "reflect".
- If - onesidedis- True(default), only values for \(\omega\) in \(\left[0, 1, 2, \dots, \left\lfloor \frac{\text{n\_fft}}{2} \right\rfloor + 1\right]\) are returned because the real-to-complex Fourier transform satisfies the conjugate symmetry, i.e., \(X[m, \omega] = X[m, \text{n\_fft} - \omega]^*\).
- If - normalizedis- True(default is- False), the function returns the normalized STFT results, i.e., multiplied by \((\text{frame\_length})^{-0.5}\).
- Returns the real and the imaginary parts together as one tensor of size \((* \times N \times T \times 2)\), where \(*\) is the optional batch size of input, \(N\) is the number of frequencies at which the STFT is applied, \(T\) is the total number of frames used, and each pair in the last dimension represents a complex number as the real part and the imaginary part. - Warning - This function changed signature at version 0.4.1. Calling with the previous signature may cause an error or return an incorrect result. - Parameters
- input (Tensor) – the input tensor 
- n_fft (int) – size of Fourier transform 
- hop_length (int, optional) – the distance between neighboring sliding window frames. Default: - None(treated as equal to- floor(n_fft / 4))
- win_length (int, optional) – the size of window frame and STFT filter. Default: - None(treated as equal to- n_fft)
- window (Tensor, optional) – the optional window function. Default: - None(treated as window of all \(1\) s)
- center (bool, optional) – whether to pad - inputon both sides so that the \(t\)-th frame is centered at time \(t \times \text{hop\_length}\). Default:- True
- pad_mode (string, optional) – controls the padding method used when - centeris- True. Default:- "reflect"
- normalized (bool, optional) – controls whether to return the normalized STFT results. Default: False
- onesided (bool, optional) – controls whether to return half of the results to avoid redundancy. Default: True
 
- Returns
- A tensor containing the STFT result with shape described above 
- Return type
- Tensor
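- A minimal usage sketch (illustrative; the input length 1000, hop_length=100 and n_fft=400 are assumptions chosen for readability). With the defaults center=True and onesided=True, the output has \(\lfloor \text{n\_fft}/2 \rfloor + 1 = 201\) frequency rows and 11 frames:
>>> signal = torch.randn(1000)
>>> window = torch.hann_window(400)
>>> spec = torch.stft(signal, n_fft=400, hop_length=100, window=window)
>>> spec.shape
torch.Size([201, 11, 2])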
 
- 
torch.bartlett_window(window_length, periodic=True, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
- Bartlett window function. \[w[n] = 1 - \left| \frac{2n}{N-1} - 1 \right| = \begin{cases} \frac{2n}{N - 1} & \text{if } 0 \leq n \leq \frac{N - 1}{2} \\ 2 - \frac{2n}{N - 1} & \text{if } \frac{N - 1}{2} < n < N \\ \end{cases}, \]- where \(N\) is the full window size. - The input window_length is a positive integer controlling the returned window size. The periodic flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like torch.stft(). Therefore, if periodic is True, the \(N\) in the above formula is in fact \(\text{window\_length} + 1\). Also, we always have torch.bartlett_window(L, periodic=True) equal to torch.bartlett_window(L + 1, periodic=False)[:-1]. - Note - If window_length \(=1\), the returned window contains a single value 1. - Parameters
- window_length (int) – the size of returned window 
- periodic (bool, optional) – If True, returns a window to be used as a periodic function. If False, returns a symmetric window.
- dtype ( - torch.dtype, optional) – the desired data type of returned tensor. Default: if- None, uses a global default (see- torch.set_default_tensor_type()). Only floating point types are supported.
- layout ( - torch.layout, optional) – the desired layout of returned window tensor. Only- torch.strided(dense layout) is supported.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if- None, uses the current device for the default tensor type (see- torch.set_default_tensor_type()).- devicewill be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: - False.
 
- Returns
- A 1-D tensor of size \((\text{window\_length},)\) containing the window 
- Return type
- Tensor
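- Example (a small sketch of the periodic/symmetric identity stated above):
>>> torch.bartlett_window(4, periodic=True)
tensor([0.0000, 0.5000, 1.0000, 0.5000])
>>> torch.bartlett_window(5, periodic=False)[:-1]
tensor([0.0000, 0.5000, 1.0000, 0.5000])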
 
- 
torch.blackman_window(window_length, periodic=True, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
- Blackman window function. \[w[n] = 0.42 - 0.5 \cos \left( \frac{2 \pi n}{N - 1} \right) + 0.08 \cos \left( \frac{4 \pi n}{N - 1} \right) \]- where \(N\) is the full window size. - The input window_length is a positive integer controlling the returned window size. The periodic flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like torch.stft(). Therefore, if periodic is True, the \(N\) in the above formula is in fact \(\text{window\_length} + 1\). Also, we always have torch.blackman_window(L, periodic=True) equal to torch.blackman_window(L + 1, periodic=False)[:-1]. - Note - If window_length \(=1\), the returned window contains a single value 1. - Parameters
- window_length (int) – the size of returned window 
- periodic (bool, optional) – If True, returns a window to be used as a periodic function. If False, returns a symmetric window.
- dtype ( - torch.dtype, optional) – the desired data type of returned tensor. Default: if- None, uses a global default (see- torch.set_default_tensor_type()). Only floating point types are supported.
- layout ( - torch.layout, optional) – the desired layout of returned window tensor. Only- torch.strided(dense layout) is supported.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if- None, uses the current device for the default tensor type (see- torch.set_default_tensor_type()).- devicewill be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: - False.
 
- Returns
- A 1-D tensor of size \((\text{window\_length},)\) containing the window 
- Return type
- Tensor
 
- 
torch.hamming_window(window_length, periodic=True, alpha=0.54, beta=0.46, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
- Hamming window function. \[w[n] = \alpha - \beta\ \cos \left( \frac{2 \pi n}{N - 1} \right), \]- where \(N\) is the full window size. - The input window_length is a positive integer controlling the returned window size. The periodic flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like torch.stft(). Therefore, if periodic is True, the \(N\) in the above formula is in fact \(\text{window\_length} + 1\). Also, we always have torch.hamming_window(L, periodic=True) equal to torch.hamming_window(L + 1, periodic=False)[:-1]. - Note - If window_length \(=1\), the returned window contains a single value 1. - Note - This is a generalized version of torch.hann_window(). - Parameters
- window_length (int) – the size of returned window 
- periodic (bool, optional) – If True, returns a window to be used as a periodic function. If False, returns a symmetric window.
- alpha (float, optional) – the coefficient \(\alpha\) in the equation above
- beta (float, optional) – the coefficient \(\beta\) in the equation above
- dtype ( - torch.dtype, optional) – the desired data type of returned tensor. Default: if- None, uses a global default (see- torch.set_default_tensor_type()). Only floating point types are supported.
- layout ( - torch.layout, optional) – the desired layout of returned window tensor. Only- torch.strided(dense layout) is supported.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if- None, uses the current device for the default tensor type (see- torch.set_default_tensor_type()).- devicewill be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: - False.
 
- Returns
- A 1-D tensor of size \((\text{window\_length},)\) containing the window 
- Return type
- Tensor
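- Example (a sketch of the generalization noted above: with \(\alpha = \beta = 0.5\) the Hamming window reduces to the Hann window):
>>> torch.hamming_window(4, periodic=True, alpha=0.5, beta=0.5)
tensor([0.0000, 0.5000, 1.0000, 0.5000])
>>> torch.hann_window(4, periodic=True)
tensor([0.0000, 0.5000, 1.0000, 0.5000])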
 
- 
torch.hann_window(window_length, periodic=True, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor¶
- Hann window function. \[w[n] = \frac{1}{2}\ \left[1 - \cos \left( \frac{2 \pi n}{N - 1} \right)\right] = \sin^2 \left( \frac{\pi n}{N - 1} \right), \]- where \(N\) is the full window size. - The input window_length is a positive integer controlling the returned window size. The periodic flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like torch.stft(). Therefore, if periodic is True, the \(N\) in the above formula is in fact \(\text{window\_length} + 1\). Also, we always have torch.hann_window(L, periodic=True) equal to torch.hann_window(L + 1, periodic=False)[:-1]. - Note - If window_length \(=1\), the returned window contains a single value 1. - Parameters
- window_length (int) – the size of returned window 
- periodic (bool, optional) – If True, returns a window to be used as a periodic function. If False, returns a symmetric window.
- dtype ( - torch.dtype, optional) – the desired data type of returned tensor. Default: if- None, uses a global default (see- torch.set_default_tensor_type()). Only floating point types are supported.
- layout ( - torch.layout, optional) – the desired layout of returned window tensor. Only- torch.strided(dense layout) is supported.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if- None, uses the current device for the default tensor type (see- torch.set_default_tensor_type()).- devicewill be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: - False.
 
- Returns
- A 1-D tensor of size \((\text{window\_length},)\) containing the window 
- Return type
- Tensor
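- Example (a sketch evaluating the formula above; values shown at the default print precision):
>>> torch.hann_window(6, periodic=False)
tensor([0.0000, 0.3455, 0.9045, 0.9045, 0.3455, 0.0000])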
 
Other Operations¶
- 
torch.bincount(self, weights=None, minlength=0) → Tensor¶
- Count the frequency of each value in an array of non-negative ints. - The number of bins (each of size 1) is one larger than the largest value in input unless input is empty, in which case the result is a tensor of size 0. If minlength is specified, the number of bins is at least minlength and if input is empty, then the result is a tensor of size minlength filled with zeros. If n is the value at position i, out[n] += weights[i] if weights is specified else out[n] += 1. - Note - When using the CUDA backend, this operation may induce nondeterministic behaviour that is not easily switched off. Please see the notes on randomness for background. - Parameters
- input (Tensor) – 1-d int tensor
- weights (Tensor) – optional, weight for each value in the input tensor. Should be of same size as input tensor.
- minlength (int) – optional, minimum number of bins. Should be non-negative.
- Returns
- a tensor of shape - Size([max(input) + 1])if- inputis non-empty, else- Size(0)
- Return type
- output (Tensor) 
- Example: - >>> input = torch.randint(0, 8, (5,), dtype=torch.int64) >>> weights = torch.linspace(0, 1, steps=5) >>> input, weights (tensor([4, 3, 6, 3, 4]), tensor([ 0.0000, 0.2500, 0.5000, 0.7500, 1.0000])) >>> torch.bincount(input) tensor([0, 0, 0, 2, 2, 0, 1]) >>> input.bincount(weights) tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000])
- 
torch.broadcast_tensors(*tensors) → List of Tensors¶
- Broadcasts the given tensors according to broadcasting semantics. - Parameters
- *tensors – any number of tensors of the same type 
 - Warning - More than one element of a broadcasted tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first. - Example: - >>> x = torch.arange(3).view(1, 3) >>> y = torch.arange(2).view(2, 1) >>> a, b = torch.broadcast_tensors(x, y) >>> a.size() torch.Size([2, 3]) >>> a tensor([[0, 1, 2], [0, 1, 2]]) 
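- Following the warning above, a sketch of the safe pattern (clone a broadcast view before writing to it in place):
>>> x = torch.arange(3).view(1, 3)
>>> y = torch.arange(2).view(2, 1)
>>> a, b = torch.broadcast_tensors(x, y)
>>> a = a.clone()  # broadcast outputs may alias memory; clone before in-place writes
>>> a.add_(1)
tensor([[1, 2, 3],
        [1, 2, 3]])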
- 
torch.cartesian_prod(*tensors)¶
- Computes the Cartesian product of the given sequence of tensors. The behavior is similar to Python’s itertools.product. - Parameters
- *tensors – any number of 1 dimensional tensors. 
- Returns
- A tensor equivalent to converting all the input tensors into lists, doing itertools.product on these lists, and finally converting the resulting list into a tensor.
 
- Return type
- Tensor
 - Example: - >>> a = [1, 2, 3] >>> b = [4, 5] >>> list(itertools.product(a, b)) [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)] >>> tensor_a = torch.tensor(a) >>> tensor_b = torch.tensor(b) >>> torch.cartesian_prod(tensor_a, tensor_b) tensor([[1, 4], [1, 5], [2, 4], [2, 5], [3, 4], [3, 5]]) 
- 
torch.combinations(tensor, r=2, with_replacement=False) → seq¶
- Computes combinations of length \(r\) of the given tensor. The behavior is similar to Python’s itertools.combinations when with_replacement is set to False, and itertools.combinations_with_replacement when with_replacement is set to True. - Parameters
- tensor (Tensor) – 1D vector.
- r (int, optional) – number of elements to combine. Default: 2.
- with_replacement (boolean, optional) – whether to allow duplication in combination. Default: False.
- Returns
- A tensor equivalent to converting all the input tensors into lists, doing itertools.combinations or itertools.combinations_with_replacement on these lists, and finally converting the resulting list into a tensor.
- Return type
- Tensor
 - Example: - >>> a = [1, 2, 3] >>> list(itertools.combinations(a, r=2)) [(1, 2), (1, 3), (2, 3)] >>> list(itertools.combinations(a, r=3)) [(1, 2, 3)] >>> list(itertools.combinations_with_replacement(a, r=2)) [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)] >>> tensor_a = torch.tensor(a) >>> torch.combinations(tensor_a) tensor([[1, 2], [1, 3], [2, 3]]) >>> torch.combinations(tensor_a, r=3) tensor([[1, 2, 3]]) >>> torch.combinations(tensor_a, with_replacement=True) tensor([[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 3]]) 
- 
torch.cross(input, other, dim=-1, out=None) → Tensor¶
- Returns the cross product of vectors in dimension dim of input and other. - input and other must have the same size, and the size of their dim dimension should be 3. - If dim is not given, it defaults to the first dimension found with the size 3. - Parameters
- input (Tensor) – the input tensor
- other (Tensor) – the second input tensor
- dim (int, optional) – the dimension to take the cross-product in. Default: -1
- out (Tensor, optional) – the output tensor
 - Example: - >>> a = torch.randn(4, 3) >>> a tensor([[-0.3956, 1.1455, 1.6895], [-0.5849, 1.3672, 0.3599], [-1.1626, 0.7180, -0.0521], [-0.1339, 0.9902, -2.0225]]) >>> b = torch.randn(4, 3) >>> b tensor([[-0.0257, -1.4725, -1.2251], [-1.1479, -0.7005, -1.9757], [-1.3904, 0.3726, -1.1836], [-0.9688, -0.7153, 0.2159]]) >>> torch.cross(a, b, dim=1) tensor([[ 1.0844, -0.5281, 0.6120], [-2.4490, -1.5687, 1.9792], [-0.8304, -1.3037, 0.5650], [-1.2329, 1.9883, 1.0551]]) >>> torch.cross(a, b) tensor([[ 1.0844, -0.5281, 0.6120], [-2.4490, -1.5687, 1.9792], [-0.8304, -1.3037, 0.5650], [-1.2329, 1.9883, 1.0551]]) 
- 
torch.diag(input, diagonal=0, out=None) → Tensor¶
- If - inputis a vector (1-D tensor), then returns a 2-D square tensor with the elements of- inputas the diagonal.
- If - inputis a matrix (2-D tensor), then returns a 1-D tensor with the diagonal elements of- input.
 - The argument - diagonalcontrols which diagonal to consider:- If - diagonal= 0, it is the main diagonal.
- If - diagonal> 0, it is above the main diagonal.
- If - diagonal< 0, it is below the main diagonal.
- Parameters
- input (Tensor) – the input tensor
- diagonal (int, optional) – the diagonal to consider
- out (Tensor, optional) – the output tensor
 - See also - torch.diagonal()always returns the diagonal of its input.- torch.diagflat()always constructs a tensor with diagonal elements specified by the input.- Examples: - Get the square matrix where the input vector is the diagonal: - >>> a = torch.randn(3) >>> a tensor([ 0.5950,-0.0872, 2.3298]) >>> torch.diag(a) tensor([[ 0.5950, 0.0000, 0.0000], [ 0.0000,-0.0872, 0.0000], [ 0.0000, 0.0000, 2.3298]]) >>> torch.diag(a, 1) tensor([[ 0.0000, 0.5950, 0.0000, 0.0000], [ 0.0000, 0.0000,-0.0872, 0.0000], [ 0.0000, 0.0000, 0.0000, 2.3298], [ 0.0000, 0.0000, 0.0000, 0.0000]]) - Get the k-th diagonal of a given matrix: - >>> a = torch.randn(3, 3) >>> a tensor([[-0.4264, 0.0255,-0.1064], [ 0.8795,-0.2429, 0.1374], [ 0.1029,-0.6482,-1.6300]]) >>> torch.diag(a, 0) tensor([-0.4264,-0.2429,-1.6300]) >>> torch.diag(a, 1) tensor([ 0.0255, 0.1374]) 
- 
torch.diag_embed(input, offset=0, dim1=-2, dim2=-1) → Tensor¶
- Creates a tensor whose diagonals of certain 2D planes (specified by - dim1and- dim2) are filled by- input. To facilitate creating batched diagonal matrices, the 2D planes formed by the last two dimensions of the returned tensor are chosen by default.- The argument - offsetcontrols which diagonal to consider:- If - offset= 0, it is the main diagonal.
- If - offset> 0, it is above the main diagonal.
- If - offset< 0, it is below the main diagonal.
- The size of the new matrix will be calculated to make the specified diagonal have the length of the last input dimension. Note that for offset other than \(0\), the order of dim1 and dim2 matters. Exchanging them is equivalent to changing the sign of offset. - Applying torch.diagonal() to the output of this function with the same arguments yields a matrix identical to input. However, torch.diagonal() has different default dimensions, so those need to be explicitly specified. - Parameters
- input (Tensor) – the input tensor. Must be at least 1-dimensional. 
- offset (int, optional) – which diagonal to consider. Default: 0 (main diagonal). 
- dim1 (int, optional) – first dimension with respect to which to take diagonal. Default: -2. 
- dim2 (int, optional) – second dimension with respect to which to take diagonal. Default: -1. 
 
 - Example: - >>> a = torch.randn(2, 3) >>> torch.diag_embed(a) tensor([[[ 1.5410, 0.0000, 0.0000], [ 0.0000, -0.2934, 0.0000], [ 0.0000, 0.0000, -2.1788]], [[ 0.5684, 0.0000, 0.0000], [ 0.0000, -1.0845, 0.0000], [ 0.0000, 0.0000, -1.3986]]]) >>> torch.diag_embed(a, offset=1, dim1=0, dim2=2) tensor([[[ 0.0000, 1.5410, 0.0000, 0.0000], [ 0.0000, 0.5684, 0.0000, 0.0000]], [[ 0.0000, 0.0000, -0.2934, 0.0000], [ 0.0000, 0.0000, -1.0845, 0.0000]], [[ 0.0000, 0.0000, 0.0000, -2.1788], [ 0.0000, 0.0000, 0.0000, -1.3986]], [[ 0.0000, 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000]]]) 
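- To illustrate the inverse relationship stated above (a sketch): reading the planted diagonal back with torch.diagonal() and the same offset/dim arguments recovers the input:
>>> a = torch.randn(2, 3)
>>> d = torch.diag_embed(a, offset=1, dim1=0, dim2=2)
>>> torch.equal(torch.diagonal(d, offset=1, dim1=0, dim2=2), a)
True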
- 
torch.diagflat(input, offset=0) → Tensor¶
- If - inputis a vector (1-D tensor), then returns a 2-D square tensor with the elements of- inputas the diagonal.
- If - inputis a tensor with more than one dimension, then returns a 2-D tensor with diagonal elements equal to a flattened- input.
 - The argument - offsetcontrols which diagonal to consider:- If - offset= 0, it is the main diagonal.
- If - offset> 0, it is above the main diagonal.
- If - offset< 0, it is below the main diagonal.
- Parameters
- input (Tensor) – the input tensor
- offset (int, optional) – the diagonal to consider. Default: 0 (main diagonal).
 - Examples: - >>> a = torch.randn(3) >>> a tensor([-0.2956, -0.9068, 0.1695]) >>> torch.diagflat(a) tensor([[-0.2956, 0.0000, 0.0000], [ 0.0000, -0.9068, 0.0000], [ 0.0000, 0.0000, 0.1695]]) >>> torch.diagflat(a, 1) tensor([[ 0.0000, -0.2956, 0.0000, 0.0000], [ 0.0000, 0.0000, -0.9068, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.1695], [ 0.0000, 0.0000, 0.0000, 0.0000]]) >>> a = torch.randn(2, 2) >>> a tensor([[ 0.2094, -0.3018], [-0.1516, 1.9342]]) >>> torch.diagflat(a) tensor([[ 0.2094, 0.0000, 0.0000, 0.0000], [ 0.0000, -0.3018, 0.0000, 0.0000], [ 0.0000, 0.0000, -0.1516, 0.0000], [ 0.0000, 0.0000, 0.0000, 1.9342]]) 
- 
torch.diagonal(input, offset=0, dim1=0, dim2=1) → Tensor¶
- Returns a partial view of input with its diagonal elements with respect to dim1 and dim2 appended as a dimension at the end of the shape. - The argument offset controls which diagonal to consider: - If offset = 0, it is the main diagonal.
- If - offset> 0, it is above the main diagonal.
- If - offset< 0, it is below the main diagonal.
 - Applying - torch.diag_embed()to the output of this function with the same arguments yields a diagonal matrix with the diagonal entries of the input. However,- torch.diag_embed()has different default dimensions, so those need to be explicitly specified.- Parameters
- input (Tensor) – the input tensor. Must be at least 2-dimensional. 
- offset (int, optional) – which diagonal to consider. Default: 0 (main diagonal). 
- dim1 (int, optional) – first dimension with respect to which to take diagonal. Default: 0. 
- dim2 (int, optional) – second dimension with respect to which to take diagonal. Default: 1. 
 
 - Note - To take a batch diagonal, pass in dim1=-2, dim2=-1. - Examples: - >>> a = torch.randn(3, 3) >>> a tensor([[-1.0854, 1.1431, -0.1752], [ 0.8536, -0.0905, 0.0360], [ 0.6927, -0.3735, -0.4945]]) >>> torch.diagonal(a, 0) tensor([-1.0854, -0.0905, -0.4945]) >>> torch.diagonal(a, 1) tensor([ 1.1431, 0.0360]) >>> x = torch.randn(2, 5, 4, 2) >>> torch.diagonal(x, offset=-1, dim1=1, dim2=2) tensor([[[-1.2631, 0.3755, -1.5977, -1.8172], [-1.1065, 1.0401, -0.2235, -0.7938]], [[-1.7325, -0.3081, 0.6166, 0.2335], [ 1.0500, 0.7336, -0.3836, -1.1015]]]) 
- 
torch.einsum(equation, *operands) → Tensor¶
- This function provides a way of computing multilinear expressions (i.e. sums of products) using the Einstein summation convention. - Parameters
- equation (string) – The equation is given in terms of lower case letters (indices) to be associated with each dimension of the operands and result. The left hand side lists the operands’ dimensions, separated by commas. There should be one index letter per tensor dimension. The right hand side follows after -> and gives the indices for the output. If the -> and right hand side are omitted, it is implicitly defined as the alphabetically sorted list of all indices appearing exactly once in the left hand side. The indices not appearing in the output are summed over after multiplying the operands’ entries. If an index appears several times for the same operand, a diagonal is taken. Ellipses … represent a fixed number of dimensions. If the right hand side is inferred, the ellipsis dimensions are at the beginning of the output.
- operands (list of Tensors) – The operands to compute the Einstein sum of. 
 
 - Examples: - >>> x = torch.randn(5) >>> y = torch.randn(4) >>> torch.einsum('i,j->ij', x, y) # outer product tensor([[-0.0570, -0.0286, -0.0231, 0.0197], [ 1.2616, 0.6335, 0.5113, -0.4351], [ 1.4452, 0.7257, 0.5857, -0.4984], [-0.4647, -0.2333, -0.1883, 0.1603], [-1.1130, -0.5588, -0.4510, 0.3838]]) >>> A = torch.randn(3,5,4) >>> l = torch.randn(2,5) >>> r = torch.randn(2,4) >>> torch.einsum('bn,anm,bm->ba', l, A, r) # compare torch.nn.functional.bilinear tensor([[-0.3430, -5.2405, 0.4494], [ 0.3311, 5.5201, -3.0356]]) >>> As = torch.randn(3,2,5) >>> Bs = torch.randn(3,5,4) >>> torch.einsum('bij,bjk->bik', As, Bs) # batch matrix multiplication tensor([[[-1.0564, -1.5904, 3.2023, 3.1271], [-1.6706, -0.8097, -0.8025, -2.1183]], [[ 4.2239, 0.3107, -0.5756, -0.2354], [-1.4558, -0.3460, 1.5087, -0.8530]], [[ 2.8153, 1.8787, -4.3839, -1.2112], [ 0.3728, -2.1131, 0.0921, 0.8305]]]) >>> A = torch.randn(3, 3) >>> torch.einsum('ii->i', A) # diagonal tensor([-0.7825, 0.8291, -0.1936]) >>> A = torch.randn(4, 3, 3) >>> torch.einsum('...ii->...i', A) # batch diagonal tensor([[-1.0864, 0.7292, 0.0569], [-0.9725, -1.0270, 0.6493], [ 0.5832, -1.1716, -1.5084], [ 0.4041, -1.1690, 0.8570]]) >>> A = torch.randn(2, 3, 4, 5) >>> torch.einsum('...ij->...ji', A).shape # batch permute torch.Size([2, 3, 5, 4]) 
- 
torch.flatten(input, start_dim=0, end_dim=-1) → Tensor¶
- Flattens a contiguous range of dims in a tensor. - Parameters
- input (Tensor) – the input tensor
- start_dim (int) – the first dim to flatten
- end_dim (int) – the last dim to flatten
 - Example: - >>> t = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]) >>> torch.flatten(t) tensor([1, 2, 3, 4, 5, 6, 7, 8]) >>> torch.flatten(t, start_dim=1) tensor([[1, 2, 3, 4], [5, 6, 7, 8]]) 
- 
torch.flip(input, dims) → Tensor¶
- Reverses the order of an n-D tensor along the given axes in dims. - Parameters
- input (Tensor) – the input tensor
- dims (a list or tuple) – axes to flip on
- Example: - >>> x = torch.arange(8).view(2, 2, 2) >>> x tensor([[[ 0, 1], [ 2, 3]], [[ 4, 5], [ 6, 7]]]) >>> torch.flip(x, [0, 1]) tensor([[[ 6, 7], [ 4, 5]], [[ 2, 3], [ 0, 1]]])
- 
torch.rot90(input, k, dims) → Tensor¶
- Rotates an n-D tensor by 90 degrees in the plane specified by the dims axes. Rotation direction is from the first towards the second axis if k > 0, and from the second towards the first for k < 0. - Parameters
- input (Tensor) – the input tensor
- k (int) – number of times to rotate
- dims (a list or tuple) – axes to rotate in
 - Example: - >>> x = torch.arange(4).view(2, 2) >>> x tensor([[0, 1], [2, 3]]) >>> torch.rot90(x, 1, [0, 1]) tensor([[1, 3], [0, 2]]) >>> x = torch.arange(8).view(2, 2, 2) >>> x tensor([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> torch.rot90(x, 1, [1, 2]) tensor([[[1, 3], [0, 2]], [[5, 7], [4, 6]]]) 
- 
torch.histc(input, bins=100, min=0, max=0, out=None) → Tensor¶
- Computes the histogram of a tensor. - The elements are sorted into equal-width bins between min and max. If min and max are both zero, the minimum and maximum values of the data are used. - Parameters
- input (Tensor) – the input tensor
- bins (int) – number of histogram bins
- min (int) – lower end of the range (inclusive)
- max (int) – upper end of the range (inclusive)
- out (Tensor, optional) – the output tensor
- Returns
- Histogram represented as a tensor 
- Return type
- Tensor
 - Example: - >>> torch.histc(torch.tensor([1., 2, 1]), bins=4, min=0, max=3) tensor([ 0., 2., 1., 0.]) 
- 
torch.meshgrid(*tensors, **kwargs)¶
- Take \(N\) tensors, each of which can be either a scalar or a 1-dimensional vector, and create \(N\) N-dimensional grids, where the \(i\) th grid is defined by expanding the \(i\) th input over dimensions defined by the other inputs. - Args:
- tensors (list of Tensor): list of scalars or 1 dimensional tensors. Scalars will be treated as tensors of size \((1,)\) automatically 
- Returns:
- seq (sequence of Tensors): If the input has \(k\) tensors of size \((N_1,), (N_2,), \ldots , (N_k,)\), then the output would also have \(k\) tensors, where all tensors are of size \((N_1, N_2, \ldots , N_k)\).
 - Example: - >>> x = torch.tensor([1, 2, 3]) >>> y = torch.tensor([4, 5, 6]) >>> grid_x, grid_y = torch.meshgrid(x, y) >>> grid_x tensor([[1, 1, 1], [2, 2, 2], [3, 3, 3]]) >>> grid_y tensor([[4, 5, 6], [4, 5, 6], [4, 5, 6]]) 
- 
torch.renorm(input, p, dim, maxnorm, out=None) → Tensor¶
- Returns a tensor where each sub-tensor of input along dimension dim is normalized such that the p-norm of the sub-tensor is lower than the value maxnorm. - Note - If the norm of a row is lower than maxnorm, the row is unchanged. - Parameters
- input (Tensor) – the input tensor
- p (float) – the power for the norm computation
- dim (int) – the dimension to slice over to get the sub-tensors
- maxnorm (float) – the maximum norm to keep each sub-tensor under
- out (Tensor, optional) – the output tensor
 - Example: - >>> x = torch.ones(3, 3) >>> x[1].fill_(2) tensor([ 2., 2., 2.]) >>> x[2].fill_(3) tensor([ 3., 3., 3.]) >>> x tensor([[ 1., 1., 1.], [ 2., 2., 2.], [ 3., 3., 3.]]) >>> torch.renorm(x, 1, 0, 5) tensor([[ 1.0000, 1.0000, 1.0000], [ 1.6667, 1.6667, 1.6667], [ 1.6667, 1.6667, 1.6667]]) 
- 
torch.roll(input, shifts, dims=None) → Tensor¶
- Roll the tensor along the given dimension(s). Elements that are shifted beyond the last position are re-introduced at the first position. If a dimension is not specified, the tensor will be flattened before rolling and then restored to the original shape. - Parameters
- input (Tensor) – the input tensor 
- shifts (int or tuple of ints) – The number of places by which the elements of the tensor are shifted. If shifts is a tuple, dims must be a tuple of the same size, and each dimension will be rolled by the corresponding value
- dims (int or tuple of ints) – Axis along which to roll
 
 - Example: - >>> x = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(4, 2) >>> x tensor([[1, 2], [3, 4], [5, 6], [7, 8]]) >>> torch.roll(x, 1, 0) tensor([[7, 8], [1, 2], [3, 4], [5, 6]]) >>> torch.roll(x, -1, 0) tensor([[3, 4], [5, 6], [7, 8], [1, 2]]) >>> torch.roll(x, shifts=(2, 1), dims=(0, 1)) tensor([[6, 5], [8, 7], [2, 1], [4, 3]]) 
- 
torch.tensordot(a, b, dims=2)¶
- Returns a contraction of a and b over multiple dimensions. - tensordot implements a generalized matrix product. - Parameters
- a (Tensor) – left tensor to contract
- b (Tensor) – right tensor to contract
- dims (int or tuple of two lists of ints) – number of dimensions to contract, or explicit lists of dimensions for a and b respectively
 - When called with an integer argument - dims= \(d\), and the number of dimensions of- aand- bis \(m\) and \(n\), respectively, it computes\[r_{i_0,...,i_{m-d}, i_d,...,i_n} = \sum_{k_0,...,k_{d-1}} a_{i_0,...,i_{m-d},k_0,...,k_{d-1}} \times b_{k_0,...,k_{d-1}, i_d,...,i_n}. \]- When called with - dimsof the list form, the given dimensions will be contracted in place of the last \(d\) of- aand the first \(d\) of \(b\). The sizes in these dimensions must match, but- tensordotwill deal with broadcasted dimensions.- Examples: - >>> a = torch.arange(60.).reshape(3, 4, 5) >>> b = torch.arange(24.).reshape(4, 3, 2) >>> torch.tensordot(a, b, dims=([1, 0], [0, 1])) tensor([[4400., 4730.], [4532., 4874.], [4664., 5018.], [4796., 5162.], [4928., 5306.]]) >>> a = torch.randn(3, 4, 5, device='cuda') >>> b = torch.randn(4, 5, 6, device='cuda') >>> c = torch.tensordot(a, b, dims=2).cpu() tensor([[ 8.3504, -2.5436, 6.2922, 2.7556, -1.0732, 3.2741], [ 3.3161, 0.0704, 5.0187, -0.4079, -4.3126, 4.8744], [ 0.8223, 3.9445, 3.2168, -0.2400, 3.4117, 1.7780]]) 
- 
torch.trace(input) → Tensor¶
- Returns the sum of the elements of the diagonal of the input 2-D matrix. - Example: - >>> x = torch.arange(1., 10.).view(3, 3) >>> x tensor([[ 1., 2., 3.], [ 4., 5., 6.], [ 7., 8., 9.]]) >>> torch.trace(x) tensor(15.) 
- 
torch.tril(input, diagonal=0, out=None) → Tensor¶
- Returns the lower triangular part of the matrix (2-D tensor) or batch of matrices input; the other elements of the result tensor out are set to 0. - The lower triangular part of the matrix is defined as the elements on and below the diagonal. - The argument diagonal controls which diagonal to consider. If diagonal = 0, all elements on and below the main diagonal are retained. A positive value includes just as many diagonals above the main diagonal, and similarly a negative value excludes just as many diagonals below the main diagonal. The main diagonal is the set of indices \(\lbrace (i, i) \rbrace\) for \(i \in [0, \min\{d_{1}, d_{2}\} - 1]\) where \(d_{1}, d_{2}\) are the dimensions of the matrix. - Parameters
- input (Tensor) – the input tensor
- diagonal (int, optional) – the diagonal to consider
- out (Tensor, optional) – the output tensor
 - Example: - >>> a = torch.randn(3, 3) >>> a tensor([[-1.0813, -0.8619, 0.7105], [ 0.0935, 0.1380, 2.2112], [-0.3409, -0.9828, 0.0289]]) >>> torch.tril(a) tensor([[-1.0813, 0.0000, 0.0000], [ 0.0935, 0.1380, 0.0000], [-0.3409, -0.9828, 0.0289]]) >>> b = torch.randn(4, 6) >>> b tensor([[ 1.2219, 0.5653, -0.2521, -0.2345, 1.2544, 0.3461], [ 0.4785, -0.4477, 0.6049, 0.6368, 0.8775, 0.7145], [ 1.1502, 3.2716, -1.1243, -0.5413, 0.3615, 0.6864], [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0978]]) >>> torch.tril(b, diagonal=1) tensor([[ 1.2219, 0.5653, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.4785, -0.4477, 0.6049, 0.0000, 0.0000, 0.0000], [ 1.1502, 3.2716, -1.1243, -0.5413, 0.0000, 0.0000], [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0000]]) >>> torch.tril(b, diagonal=-1) tensor([[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.4785, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 1.1502, 3.2716, 0.0000, 0.0000, 0.0000, 0.0000], [-0.0614, -0.7344, -1.3164, 0.0000, 0.0000, 0.0000]]) 
- 
torch.tril_indices(row, column, offset=0, dtype=torch.long, device='cpu', layout=torch.strided) → Tensor¶
- Returns the indices of the lower triangular part of a row-by-column matrix in a 2-by-N Tensor, where the first row contains row coordinates of all indices and the second row contains column coordinates. Indices are ordered based on rows and then columns. - The lower triangular part of the matrix is defined as the elements on and below the diagonal. - The argument offset controls which diagonal to consider. If offset = 0, all elements on and below the main diagonal are retained. A positive value includes just as many diagonals above the main diagonal, and similarly a negative value excludes just as many diagonals below the main diagonal. The main diagonal is the set of indices \(\lbrace (i, i) \rbrace\) for \(i \in [0, \min\{d_{1}, d_{2}\} - 1]\) where \(d_{1}, d_{2}\) are the dimensions of the matrix. - NOTE: when running on ‘cuda’, row * col must be less than \(2^{59}\) to prevent overflow during calculation. - Parameters
- row ( - int) – number of rows in the 2-D matrix.
- column ( - int) – number of columns in the 2-D matrix.
- offset ( - int) – diagonal offset from the main diagonal. Default: if not provided, 0.
- dtype ( - torch.dtype, optional) – the desired data type of returned tensor. Default: if- None,- torch.long.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if- None, uses the current device for the default tensor type (see- torch.set_default_tensor_type()).- devicewill be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- layout ( - torch.layout, optional) – currently only support- torch.strided.
 
- Example:
- >>> a = torch.tril_indices(3, 3) >>> a tensor([[0, 1, 1, 2, 2, 2], [0, 0, 1, 0, 1, 2]]) - >>> a = torch.tril_indices(4, 3, -1) >>> a tensor([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]]) - >>> a = torch.tril_indices(4, 3, 1) >>> a tensor([[0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2]]) 
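- The two index rows can be used directly for advanced indexing (a usage sketch):
>>> x = torch.arange(9).view(3, 3)
>>> idx = torch.tril_indices(3, 3)
>>> x[idx[0], idx[1]]
tensor([0, 3, 4, 6, 7, 8])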
 
- 
torch.triu(input, diagonal=0, out=None) → Tensor¶
- Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices input; the other elements of the result tensor out are set to 0. - The upper triangular part of the matrix is defined as the elements on and above the diagonal. - The argument diagonal controls which diagonal to consider. If diagonal = 0, all elements on and above the main diagonal are retained. A positive value excludes just as many diagonals above the main diagonal, and similarly a negative value includes just as many diagonals below the main diagonal. The main diagonal is the set of indices \(\lbrace (i, i) \rbrace\) for \(i \in [0, \min\{d_{1}, d_{2}\} - 1]\) where \(d_{1}, d_{2}\) are the dimensions of the matrix. - Parameters
- input (Tensor) – the input tensor
- diagonal (int, optional) – the diagonal to consider
- out (Tensor, optional) – the output tensor
 - Example: - >>> a = torch.randn(3, 3) >>> a tensor([[ 0.2309, 0.5207, 2.0049], [ 0.2072, -1.0680, 0.6602], [ 0.3480, -0.5211, -0.4573]]) >>> torch.triu(a) tensor([[ 0.2309, 0.5207, 2.0049], [ 0.0000, -1.0680, 0.6602], [ 0.0000, 0.0000, -0.4573]]) >>> torch.triu(a, diagonal=1) tensor([[ 0.0000, 0.5207, 2.0049], [ 0.0000, 0.0000, 0.6602], [ 0.0000, 0.0000, 0.0000]]) >>> torch.triu(a, diagonal=-1) tensor([[ 0.2309, 0.5207, 2.0049], [ 0.2072, -1.0680, 0.6602], [ 0.0000, -0.5211, -0.4573]]) >>> b = torch.randn(4, 6) >>> b tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235], [-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857], [ 0.4333, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410], [-0.9888, 1.0679, -1.3337, -1.6556, 0.4798, 0.2830]]) >>> torch.triu(b, diagonal=1) tensor([[ 0.0000, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235], [ 0.0000, 0.0000, -1.2919, 1.3378, -0.1768, -1.0857], [ 0.0000, 0.0000, 0.0000, -1.0432, 0.9348, -0.4410], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.4798, 0.2830]]) >>> torch.triu(b, diagonal=-1) tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235], [-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857], [ 0.0000, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410], [ 0.0000, 0.0000, -1.3337, -1.6556, 0.4798, 0.2830]]) 
- 
torch.triu_indices(row, column, offset=0, dtype=torch.long, device='cpu', layout=torch.strided) → Tensor¶
- Returns the indices of the upper triangular part of a row-by-column matrix in a 2-by-N Tensor, where the first row contains row coordinates of all indices and the second row contains column coordinates. Indices are ordered based on rows and then columns. - The upper triangular part of the matrix is defined as the elements on and above the diagonal. - The argument offset controls which diagonal to consider. If offset = 0, all elements on and above the main diagonal are retained. A positive value excludes just as many diagonals above the main diagonal, and similarly a negative value includes just as many diagonals below the main diagonal. The main diagonal is the set of indices \(\lbrace (i, i) \rbrace\) for \(i \in [0, \min\{d_{1}, d_{2}\} - 1]\) where \(d_{1}, d_{2}\) are the dimensions of the matrix. - NOTE: when running on ‘cuda’, row * col must be less than \(2^{59}\) to prevent overflow during calculation. - Parameters
- row ( - int) – number of rows in the 2-D matrix.
- column ( - int) – number of columns in the 2-D matrix.
- offset ( - int) – diagonal offset from the main diagonal. Default: if not provided, 0.
- dtype ( - torch.dtype, optional) – the desired data type of returned tensor. Default: if- None,- torch.long.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if- None, uses the current device for the default tensor type (see- torch.set_default_tensor_type()).- devicewill be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
- layout ( - torch.layout, optional) – currently only support- torch.strided.
 
- Example:
- >>> a = torch.triu_indices(3, 3) >>> a tensor([[0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2]]) - >>> a = torch.triu_indices(4, 3, -1) >>> a tensor([[0, 0, 0, 1, 1, 1, 2, 2, 3], [0, 1, 2, 0, 1, 2, 1, 2, 2]]) - >>> a = torch.triu_indices(4, 3, 1) >>> a tensor([[0, 0, 1], [1, 2, 2]]) 
 
BLAS and LAPACK Operations¶
- 
torch.addbmm(beta=1, mat, alpha=1, batch1, batch2, out=None) → Tensor¶
- Performs a batch matrix-matrix product of matrices stored in - batch1and- batch2, with a reduced add step (all matrix multiplications get accumulated along the first dimension).- matis added to the final result.- batch1and- batch2must be 3-D tensors each containing the same number of matrices.- If - batch1is a \((b \times n \times m)\) tensor,- batch2is a \((b \times m \times p)\) tensor,- matmust be broadcastable with a \((n \times p)\) tensor and- outwill be a \((n \times p)\) tensor.\[out = \beta\ \text{mat} + \alpha\ (\sum_{i=0}^{b-1} \text{batch1}_i \mathbin{@} \text{batch2}_i) \]- For inputs of type FloatTensor or DoubleTensor, arguments - betaand- alphamust be real numbers, otherwise they should be integers.- Parameters
- beta (Number, optional) – multiplier for - mat(\(\beta\))
- mat (Tensor) – matrix to be added 
- alpha (Number, optional) – multiplier for batch1 @ batch2 (\(\alpha\)) 
- batch1 (Tensor) – the first batch of matrices to be multiplied 
- batch2 (Tensor) – the second batch of matrices to be multiplied 
- out (Tensor, optional) – the output tensor 
 
 - Example: - >>> M = torch.randn(3, 5) >>> batch1 = torch.randn(10, 3, 4) >>> batch2 = torch.randn(10, 4, 5) >>> torch.addbmm(M, batch1, batch2) tensor([[ 6.6311, 0.0503, 6.9768, -12.0362, -2.1653], [ -4.8185, -1.4255, -6.6760, 8.9453, 2.5743], [ -3.8202, 4.3691, 1.0943, -1.1109, 5.4730]]) 
- 
torch.addmm(beta=1, mat, alpha=1, mat1, mat2, out=None) → Tensor¶
- Performs a matrix multiplication of the matrices mat1 and mat2. The matrix mat is added to the final result. - If mat1 is a \((n \times m)\) tensor, mat2 is a \((m \times p)\) tensor, then mat must be broadcastable with a \((n \times p)\) tensor and out will be a \((n \times p)\) tensor. - alpha and beta are scaling factors on the matrix-matrix product between mat1 and mat2 and the added matrix mat respectively. \[\text{out} = \beta\ \text{mat} + \alpha\ (\text{mat1} \mathbin{@} \text{mat2}) \]- For inputs of type FloatTensor or DoubleTensor, arguments beta and alpha must be real numbers, otherwise they should be integers. - Parameters
- beta (Number, optional) – multiplier for - mat(\(\beta\))
- mat (Tensor) – matrix to be added 
- alpha (Number, optional) – multiplier for \(mat1 @ mat2\) (\(\alpha\)) 
- mat1 (Tensor) – the first matrix to be multiplied 
- mat2 (Tensor) – the second matrix to be multiplied 
- out (Tensor, optional) – the output tensor 
 
 - Example: - >>> M = torch.randn(2, 3) >>> mat1 = torch.randn(2, 3) >>> mat2 = torch.randn(3, 3) >>> torch.addmm(M, mat1, mat2) tensor([[-4.8716, 1.4671, -1.3746], [ 0.7573, -3.9555, -2.8681]]) 
- 
torch.addmv(beta=1, tensor, alpha=1, mat, vec, out=None) → Tensor¶
- Performs a matrix-vector product of the matrix mat and the vector vec. The vector tensor is added to the final result. - If mat is a \((n \times m)\) tensor, vec is a 1-D tensor of size m, then tensor must be broadcastable with a 1-D tensor of size n and out will be a 1-D tensor of size n. - alpha and beta are scaling factors on the matrix-vector product between mat and vec and the added tensor tensor respectively. \[\text{out} = \beta\ \text{tensor} + \alpha\ (\text{mat} \mathbin{@} \text{vec}) \]- For inputs of type FloatTensor or DoubleTensor, arguments beta and alpha must be real numbers, otherwise they should be integers. - Parameters
- beta (Number, optional) – multiplier for tensor (\(\beta\))
- tensor (Tensor) – vector to be added
- alpha (Number, optional) – multiplier for \(\text{mat} \mathbin{@} \text{vec}\) (\(\alpha\))
- mat (Tensor) – matrix to be multiplied
- vec (Tensor) – vector to be multiplied
- out (Tensor, optional) – the output tensor
 - Example: - >>> M = torch.randn(2) >>> mat = torch.randn(2, 3) >>> vec = torch.randn(3) >>> torch.addmv(M, mat, vec) tensor([-0.3768, -5.5565]) 
- 
torch.addr(beta=1, mat, alpha=1, vec1, vec2, out=None) → Tensor¶
- Performs the outer-product of vectors - vec1and- vec2and adds it to the matrix- mat.- Optional values - betaand- alphaare scaling factors on the outer product between- vec1and- vec2and the added matrix- matrespectively.\[\text{out} = \beta\ \text{mat} + \alpha\ (\text{vec1} \otimes \text{vec2}) \]- If - vec1is a vector of size n and- vec2is a vector of size m, then- matmust be broadcastable with a matrix of size \((n \times m)\) and- outwill be a matrix of size \((n \times m)\).- For inputs of type FloatTensor or DoubleTensor, arguments - betaand- alphamust be real numbers, otherwise they should be integers- Parameters
- beta (Number, optional) – multiplier for - mat(\(\beta\))
- mat (Tensor) – matrix to be added 
- alpha (Number, optional) – multiplier for \(\text{vec1} \otimes \text{vec2}\) (\(\alpha\)) 
- vec1 (Tensor) – the first vector of the outer product 
- vec2 (Tensor) – the second vector of the outer product 
- out (Tensor, optional) – the output tensor 
 
 - Example: - >>> vec1 = torch.arange(1., 4.) >>> vec2 = torch.arange(1., 3.) >>> M = torch.zeros(3, 2) >>> torch.addr(M, vec1, vec2) tensor([[ 1., 2.], [ 2., 4.], [ 3., 6.]]) 
- 
torch.baddbmm(beta=1, mat, alpha=1, batch1, batch2, out=None) → Tensor¶
- Performs a batch matrix-matrix product of matrices in - batch1and- batch2.- matis added to the final result.- batch1and- batch2must be 3-D tensors each containing the same number of matrices.- If - batch1is a \((b \times n \times m)\) tensor,- batch2is a \((b \times m \times p)\) tensor, then- matmust be broadcastable with a \((b \times n \times p)\) tensor and- outwill be a \((b \times n \times p)\) tensor. Both- alphaand- betamean the same as the scaling factors used in- torch.addbmm().\[\text{out}_i = \beta\ \text{mat}_i + \alpha\ (\text{batch1}_i \mathbin{@} \text{batch2}_i) \]- For inputs of type FloatTensor or DoubleTensor, arguments - betaand- alphamust be real numbers, otherwise they should be integers.- Parameters
- beta (Number, optional) – multiplier for - mat(\(\beta\))
- mat (Tensor) – the tensor to be added 
- alpha (Number, optional) – multiplier for \(\text{batch1} \mathbin{@} \text{batch2}\) (\(\alpha\)) 
- batch1 (Tensor) – the first batch of matrices to be multiplied 
- batch2 (Tensor) – the second batch of matrices to be multiplied 
- out (Tensor, optional) – the output tensor 
 
 - Example: - >>> M = torch.randn(10, 3, 5) >>> batch1 = torch.randn(10, 3, 4) >>> batch2 = torch.randn(10, 4, 5) >>> torch.baddbmm(M, batch1, batch2).size() torch.Size([10, 3, 5]) 
- 
torch.bmm(batch1, batch2, out=None) → Tensor¶
- Performs a batch matrix-matrix product of matrices stored in batch1 and batch2. - batch1 and batch2 must be 3-D tensors each containing the same number of matrices. - If batch1 is a \((b \times n \times m)\) tensor, batch2 is a \((b \times m \times p)\) tensor, out will be a \((b \times n \times p)\) tensor. \[\text{out}_i = \text{batch1}_i \mathbin{@} \text{batch2}_i \]- Note - This function does not broadcast. For broadcasting matrix products, see torch.matmul(). - Parameters
- batch1 (Tensor) – the first batch of matrices to be multiplied
- batch2 (Tensor) – the second batch of matrices to be multiplied
- out (Tensor, optional) – the output tensor
 - Example: - >>> batch1 = torch.randn(10, 3, 4) >>> batch2 = torch.randn(10, 4, 5) >>> res = torch.bmm(batch1, batch2) >>> res.size() torch.Size([10, 3, 5]) 
- 
torch.btrifact(A, pivot=True) -> (Tensor, IntTensor)¶
- Batch LU factorization. - Returns a tuple containing the LU factorization and pivots. Pivoting is done if pivot is set. - Note - LU factorization with pivot = False is not available for CPU, and attempting to do so will throw an error. However, LU factorization with pivot = False is available for CUDA. - Parameters
- A (Tensor) – the tensor to factor
- pivot (bool, optional) – controls whether pivoting is done
- Returns
- A tuple containing factorization and pivots. 
 - Example: - >>> A = torch.randn(2, 3, 3) >>> A_LU, pivots = torch.btrifact(A) >>> A_LU tensor([[[ 1.3506, 2.5558, -0.0816], [ 0.1684, 1.1551, 0.1940], [ 0.1193, 0.6189, -0.5497]], [[ 0.4526, 1.2526, -0.3285], [-0.7988, 0.7175, -0.9701], [ 0.2634, -0.9255, -0.3459]]]) >>> pivots tensor([[ 3, 3, 3], [ 3, 3, 3]], dtype=torch.int32) 
- 
torch.btrifact_with_info(A, pivot=True) -> (Tensor, IntTensor, IntTensor)¶
- Batch LU factorization with additional error information. - This is a version of torch.btrifact() that always creates an info IntTensor, and returns it as the third return value. - Parameters
- A (Tensor) – the tensor to factor
- pivot (bool, optional) – controls whether pivoting is done
- Returns
- A tuple containing factorization, pivots, and an IntTensor where non-zero values indicate whether factorization for each minibatch sample succeeds. 
 - Example: - >>> A = torch.randn(2, 3, 3) >>> A_LU, pivots, info = A.btrifact_with_info() >>> if info.nonzero().size(0) == 0: >>> print('LU factorization succeeded for all samples!') LU factorization succeeded for all samples! 
- 
torch.btrisolve(b, LU_data, LU_pivots) → Tensor¶
- Batch LU solve. - Returns the LU solve of the linear system \(Ax = b\). - Parameters
- b (Tensor) – the RHS tensor 
- LU_data (Tensor) – the pivoted LU factorization of A from - btrifact().
- LU_pivots (IntTensor) – the pivots of the LU factorization 
 
 - Example: - >>> A = torch.randn(2, 3, 3) >>> b = torch.randn(2, 3) >>> A_LU = torch.btrifact(A) >>> x = torch.btrisolve(b, *A_LU) >>> torch.norm(torch.bmm(A, x.unsqueeze(2)) - b.unsqueeze(2)) tensor(1.00000e-07 * 2.8312) 
- 
torch.btriunpack(LU_data, LU_pivots, unpack_data=True, unpack_pivots=True)¶
- Unpacks the data and pivots from a batched LU factorization (btrifact) of a tensor. - Returns a tuple of tensors as (the pivots, the L tensor, the U tensor). - Parameters
- LU_data (Tensor) – the packed LU factorization data
- LU_pivots (Tensor) – the packed LU factorization pivots
- unpack_data (bool) – flag indicating if the data should be unpacked
- unpack_pivots (bool) – flag indicating if the pivots should be unpacked
 - Example: - >>> A = torch.randn(2, 3, 3) >>> A_LU, pivots = A.btrifact() >>> P, A_L, A_U = torch.btriunpack(A_LU, pivots) >>> >>> # can recover A from factorization >>> A_ = torch.bmm(P, torch.bmm(A_L, A_U)) 
- 
torch.chain_matmul(*matrices)¶
- Returns the matrix product of the \(N\) 2-D tensors. This product is efficiently computed using the matrix chain order algorithm, which selects the order that incurs the lowest cost in terms of arithmetic operations ([CLRS]). Note that since this is a function to compute the product, \(N\) needs to be greater than or equal to 2; if equal to 2 then a trivial matrix-matrix product is returned. If \(N\) is 1, then this is a no-op - the original matrix is returned as is. - Parameters
- matrices (Tensors...) – a sequence of 2 or more 2-D tensors whose product is to be determined. 
- Returns
- if the \(i^{th}\) tensor was of dimensions \(p_{i} \times p_{i + 1}\), then the product would be of dimensions \(p_{1} \times p_{N + 1}\). 
- Return type
- Tensor
 - Example: - >>> a = torch.randn(3, 4) >>> b = torch.randn(4, 5) >>> c = torch.randn(5, 6) >>> d = torch.randn(6, 7) >>> torch.chain_matmul(a, b, c, d) tensor([[ -2.3375, -3.9790, -4.1119, -6.6577, 9.5609, -11.5095, -3.2614], [ 21.4038, 3.3378, -8.4982, -5.2457, -10.2561, -2.4684, 2.7163], [ -0.9647, -5.8917, -2.3213, -5.2284, 12.8615, -12.2816, -2.5095]]) 
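- As a sanity check (a sketch): the result matches composing torch.mm() pairwise; chain_matmul() only changes the association order to lower the arithmetic cost:
>>> a, b, c, d = torch.randn(3, 4), torch.randn(4, 5), torch.randn(5, 6), torch.randn(6, 7)
>>> torch.allclose(torch.chain_matmul(a, b, c, d), a.mm(b).mm(c).mm(d), atol=1e-6)
True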
- 
torch.cholesky(A, upper=False, out=None) → Tensor¶
- Computes the Cholesky decomposition of a symmetric positive-definite matrix \(A\) or of batches of symmetric positive-definite matrices. - If upper is True, the returned matrix U is upper-triangular, and the decomposition has the form: \[A = U^TU\]- If upper is False, the returned matrix L is lower-triangular, and the decomposition has the form: \[A = LL^T\]- If upper is True, and A is a batch of symmetric positive-definite matrices, then the returned tensor will be composed of upper-triangular Cholesky factors of each of the individual matrices. Similarly, when upper is False, the returned tensor will be composed of lower-triangular Cholesky factors of each of the individual matrices. - Parameters
- A (Tensor) – the input tensor of size \((*, n, n)\) where \(*\) is zero or more batch dimensions consisting of symmetric positive-definite matrices
- upper (bool, optional) – flag that indicates whether to return the upper- or lower-triangular factor. Default: False
- out (Tensor, optional) – the output matrix
 - Example: - >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) # make symmetric positive-definite >>> l = torch.cholesky(a) >>> a tensor([[ 2.4112, -0.7486, 1.4551], [-0.7486, 1.3544, 0.1294], [ 1.4551, 0.1294, 1.6724]]) >>> l tensor([[ 1.5528, 0.0000, 0.0000], [-0.4821, 1.0592, 0.0000], [ 0.9371, 0.5487, 0.7023]]) >>> torch.mm(l, l.t()) tensor([[ 2.4112, -0.7486, 1.4551], [-0.7486, 1.3544, 0.1294], [ 1.4551, 0.1294, 1.6724]]) >>> a = torch.randn(3, 2, 2) >>> a = torch.matmul(a, a.transpose(-1, -2)) + 1e-03 # make symmetric positive-definite >>> l = torch.cholesky(a) >>> z = torch.matmul(l, l.transpose(-1, -2)) >>> torch.max(torch.abs(z - a)) # Max non-zero tensor(2.3842e-07) 
- 
torch.cholesky_solve(b, u, upper=False, out=None) → Tensor¶
- Solves a linear system of equations with a positive semidefinite matrix to be inverted given its Cholesky factor matrix u. - If upper is False, u is lower triangular and c is returned such that: \[c = (u u^T)^{-1} b \]- If upper is True, u is upper triangular and c is returned such that: \[c = (u^T u)^{-1} b \]- torch.cholesky_solve(b, u) can take in 2D inputs b, u or inputs that are batches of 2D matrices. If the inputs are batches, then it returns batched outputs c. - Note - The out keyword only supports 2D matrix inputs, that is, b, u must be 2D matrices. - Parameters
- b (Tensor) – input matrix of size \((*, m, k)\), where \(*\) is zero or more batch dimensions 
- u (Tensor) – input matrix of size \((*, m, m)\), where \(*\) is zero or more batch dimensions, composed of upper or lower triangular Cholesky factors
- upper (bool, optional) – whether to consider the Cholesky factor as a lower or upper triangular matrix. Default: - False.
- out (Tensor, optional) – the output tensor for c 
 
 - Example: - >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) # make symmetric positive definite >>> u = torch.cholesky(a) >>> a tensor([[ 0.7747, -1.9549, 1.3086], [-1.9549, 6.7546, -5.4114], [ 1.3086, -5.4114, 4.8733]]) >>> b = torch.randn(3, 2) >>> b tensor([[-0.6355, 0.9891], [ 0.1974, 1.4706], [-0.4115, -0.6225]]) >>> torch.cholesky_solve(b, u) tensor([[ -8.1625, 19.6097], [ -5.8398, 14.2387], [ -4.3771, 10.4173]]) >>> torch.mm(a.inverse(), b) tensor([[ -8.1626, 19.6097], [ -5.8398, 14.2387], [ -4.3771, 10.4173]]) 
- 
torch.dot(tensor1, tensor2) → Tensor¶
- Computes the dot product (inner product) of two tensors. - Note - This function does not broadcast. - Example: - >>> torch.dot(torch.tensor([2, 3]), torch.tensor([2, 1])) tensor(7) 
- 
torch.eig(a, eigenvectors=False, out=None) -> (Tensor, Tensor)¶
- Computes the eigenvalues and eigenvectors of a real square matrix. - Note - Since eigenvalues and eigenvectors might be complex, the backward pass is supported only for torch.symeig(). - Parameters
- a (Tensor) – the square matrix of shape \((n \times n)\) for which the eigenvalues and eigenvectors will be computed
- eigenvectors (bool) – True to compute both eigenvalues and eigenvectors; otherwise, only eigenvalues will be computed
- out (tuple, optional) – the output tensors
- Returns
- A namedtuple (eigenvalues, eigenvectors) containing - eigenvalues (Tensor): Shape \((n \times 2)\). Each row is an eigenvalue of - a, where the first element is the real part and the second element is the imaginary part. The eigenvalues are not necessarily ordered.
- eigenvectors (Tensor): If - eigenvectors=False, it’s an empty tensor. Otherwise, this tensor of shape \((n \times n)\) can be used to compute normalized (unit length) eigenvectors of corresponding eigenvalues as follows. If the corresponding eigenvalues[j] is a real number, column eigenvectors[:, j] is the eigenvector corresponding to eigenvalues[j]. If the corresponding eigenvalues[j] and eigenvalues[j + 1] form a complex conjugate pair, then the true eigenvectors can be computed as \(\text{true eigenvector}[j] = eigenvectors[:, j] + i \times eigenvectors[:, j + 1]\), \(\text{true eigenvector}[j + 1] = eigenvectors[:, j] - i \times eigenvectors[:, j + 1]\).
 
- Return type
 
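- Example (an illustrative sketch, not from the original reference; the eigenvalue order is not guaranteed). For a real matrix with a real spectrum, the imaginary column of eigenvalues is zero:
>>> a = torch.tensor([[2., 0.], [0., 3.]])
>>> e, v = torch.eig(a, eigenvectors=True)
>>> e        # one row per eigenvalue: (real part, imaginary part)
tensor([[2., 0.],
        [3., 0.]])
>>> v        # columns are the corresponding (real) eigenvectors
tensor([[1., 0.],
        [0., 1.]])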
- 
torch.gels(B, A, out=None) → Tensor¶
- Computes the solution to the least squares and least norm problems for a full rank matrix \(A\) of size \((m \times n)\) and a matrix \(B\) of size \((m \times k)\). - If \(m \geq n\), gels() solves the least-squares problem:\[\begin{array}{ll} \min_X & \|AX-B\|_2. \end{array}\]- If \(m < n\), gels() solves the least-norm problem:\[\begin{array}{ll} \min_X & \|X\|_2 & \text{subject to} & AX = B. \end{array}\]- The returned tensor \(X\) has shape \((\max(m, n) \times k)\). The first \(n\) rows of \(X\) contain the solution. If \(m \geq n\), the residual sum of squares for the solution in each column is given by the sum of squares of elements in the remaining \(m - n\) rows of that column. - Parameters
- B (Tensor) – the matrix \(B\)
- A (Tensor) – the \(m\) by \(n\) matrix \(A\)
- out (tuple, optional) – the optional destination tensor
- Returns
- A tuple containing: - X (Tensor): the least squares solution 
- qr (Tensor): the details of the QR factorization 
 
- Return type
 - Note - The returned matrices will always be transposed, irrespective of the strides of the input matrices. That is, they will have stride (1, m) instead of (m, 1). - Example: - >>> A = torch.tensor([[1., 1, 1], [2, 3, 4], [3, 5, 2], [4, 2, 5], [5, 4, 3]]) >>> B = torch.tensor([[-10., -3], [ 12, 14], [ 14, 12], [ 16, 16], [ 18, 16]]) >>> X, _ = torch.gels(B, A) >>> X tensor([[ 2.0000, 1.0000], [ 1.0000, 1.0000], [ 1.0000, 2.0000], [ 10.9635, 4.8501], [ 8.9332, 5.2418]]) 
- 
torch.geqrf(input, out=None) -> (Tensor, Tensor)¶
- This is a low-level function for calling LAPACK directly. This function returns a namedtuple (a, tau) as defined in LAPACK documentation for geqrf . - You’ll generally want to use - torch.qr()instead.- Computes a QR decomposition of - input, but without constructing \(Q\) and \(R\) as explicit separate matrices.- Rather, this directly calls the underlying LAPACK function ?geqrf which produces a sequence of ‘elementary reflectors’. - See LAPACK documentation for geqrf for further details. 
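- Example (a hedged sketch, not from the original reference): since torch.qr() is built on this routine, the \(R\) factor can be read off the upper triangle of the first output of geqrf(); row signs may differ between LAPACK implementations, so the comparison below is up to sign:
>>> a = torch.randn(4, 3)
>>> h, tau = torch.geqrf(a)
>>> r = torch.triu(h[:3])                    # R lives in the upper triangle
>>> torch.allclose(r.abs(), torch.qr(a)[1].abs(), atol=1e-6)
True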
- 
torch.ger(vec1, vec2, out=None) → Tensor¶
- Outer product of - vec1and- vec2. If- vec1is a vector of size \(n\) and- vec2is a vector of size \(m\), then- outmust be a matrix of size \((n \times m)\).- Note - This function does not broadcast. - Parameters
 - Example: - >>> v1 = torch.arange(1., 5.) >>> v2 = torch.arange(1., 4.) >>> torch.ger(v1, v2) tensor([[ 1., 2., 3.], [ 2., 4., 6.], [ 3., 6., 9.], [ 4., 8., 12.]]) 
- 
torch.gesv(b, A, out=None)¶
- This function returns the solution to the system of linear equations represented by \(AX = B\) and the LU factorization of A, in order as a tuple X, LU. - For more information regarding - torch.gesv(), please check- torch.solve().- Warning - torch.gesv()is deprecated in favour of- torch.solve()and will be removed in the next release. Please use- torch.solve()instead.
- 
torch.inverse(input, out=None) → Tensor¶
- Takes the inverse of the square matrix - input.- inputcan be batches of 2D square tensors, in which case this function would return a tensor composed of individual inverses.- Note - Irrespective of the original strides, the returned tensors will be transposed, i.e. with strides like input.contiguous().transpose(-2, -1).strides() - Parameters
 - Example: - >>> x = torch.rand(4, 4) >>> y = torch.inverse(x) >>> z = torch.mm(x, y) >>> z tensor([[ 1.0000, -0.0000, -0.0000, 0.0000], [ 0.0000, 1.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 1.0000, 0.0000], [ 0.0000, -0.0000, -0.0000, 1.0000]]) >>> torch.max(torch.abs(z - torch.eye(4))) # Max non-zero tensor(1.1921e-07) >>> # Batched inverse example >>> x = torch.randn(2, 3, 4, 4) >>> y = torch.inverse(x) >>> z = torch.matmul(x, y) >>> torch.max(torch.abs(z - torch.eye(4).expand_as(x))) # Max non-zero tensor(1.9073e-06) 
- 
torch.det(A) → Tensor¶
- Calculates the determinant of a 2D square tensor. - Note - Backward through det() internally uses SVD results when A is not invertible. In this case, double backward through det() will be unstable when A doesn't have distinct singular values. See svd() for details. - Parameters
- A (Tensor) – The input 2D square tensor 
 - Example: - >>> A = torch.randn(3, 3) >>> torch.det(A) tensor(3.7641) 
- 
torch.logdet(A) → Tensor¶
- Calculates the log determinant of a 2D square tensor. - Note - The result is -inf if A has zero determinant, and is nan if A has a negative determinant. - Note - Backward through logdet() internally uses SVD results when A is not invertible. In this case, double backward through logdet() will be unstable when A doesn't have distinct singular values. See svd() for details. - Parameters
- A (Tensor) – The input 2D square tensor 
 - Example: - >>> A = torch.randn(3, 3) >>> torch.det(A) tensor(0.2611) >>> torch.logdet(A) tensor(-1.3430) 
- 
torch.slogdet(A) -> (Tensor, Tensor)¶
- Calculates the sign and log value of a 2D square tensor's determinant. - Note - If A has zero determinant, this returns (0, -inf). - Note - Backward through slogdet() internally uses SVD results when A is not invertible. In this case, double backward through slogdet() will be unstable when A doesn't have distinct singular values. See svd() for details. - Parameters
- A (Tensor) – The input 2D square tensor 
- Returns
- A tuple containing the sign of the determinant, and the log value of the absolute determinant. 
 - Example: - >>> A = torch.randn(3, 3) >>> torch.det(A) tensor(-4.8215) >>> torch.logdet(A) tensor(nan) >>> torch.slogdet(A) (tensor(-1.), tensor(1.5731)) 
- 
torch.matmul(tensor1, tensor2, out=None) → Tensor¶
- Matrix product of two tensors. - The behavior depends on the dimensionality of the tensors as follows: - If both tensors are 1-dimensional, the dot product (scalar) is returned. 
- If both arguments are 2-dimensional, the matrix-matrix product is returned. 
- If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed. 
- If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned. 
- If both arguments are at least 1-dimensional and at least one argument is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first argument is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second argument is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiply and removed after. The non-matrix (i.e. batch) dimensions are broadcasted (and thus must be broadcastable). For example, if tensor1 is a \((j \times 1 \times n \times m)\) tensor and tensor2 is a \((k \times m \times p)\) tensor, out will be a \((j \times k \times n \times p)\) tensor.
 - Note - The 1-dimensional dot product version of this function does not support an - outparameter.- Parameters
 - Example: - >>> # vector x vector >>> tensor1 = torch.randn(3) >>> tensor2 = torch.randn(3) >>> torch.matmul(tensor1, tensor2).size() torch.Size([]) >>> # matrix x vector >>> tensor1 = torch.randn(3, 4) >>> tensor2 = torch.randn(4) >>> torch.matmul(tensor1, tensor2).size() torch.Size([3]) >>> # batched matrix x broadcasted vector >>> tensor1 = torch.randn(10, 3, 4) >>> tensor2 = torch.randn(4) >>> torch.matmul(tensor1, tensor2).size() torch.Size([10, 3]) >>> # batched matrix x batched matrix >>> tensor1 = torch.randn(10, 3, 4) >>> tensor2 = torch.randn(10, 4, 5) >>> torch.matmul(tensor1, tensor2).size() torch.Size([10, 3, 5]) >>> # batched matrix x broadcasted matrix >>> tensor1 = torch.randn(10, 3, 4) >>> tensor2 = torch.randn(4, 5) >>> torch.matmul(tensor1, tensor2).size() torch.Size([10, 3, 5]) 
- 
torch.matrix_power(input, n) → Tensor¶
- Returns the matrix raised to the power n for square matrices. For a batch of matrices, each individual matrix is raised to the power n. - If n is negative, then the inverse of the matrix (if invertible) is raised to the power \(|n|\). For a batch of matrices, the batched inverse (if invertible) is raised to the power \(|n|\). If n is 0, then an identity matrix is returned. - Example: - >>> a = torch.randn(2, 2, 2) >>> a tensor([[[-1.9975, -1.9610], [ 0.9592, -2.3364]], [[-1.2534, -1.3429], [ 0.4153, -1.4664]]]) >>> torch.matrix_power(a, 3) tensor([[[ 3.9392, -23.9916], [ 11.7357, -0.2070]], [[ 0.2468, -6.7168], [ 2.0774, -0.8187]]]) 
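- Example for negative powers (an illustrative sketch, not from the original reference; assumes the random matrix is invertible and reasonably well-conditioned):
>>> a = torch.randn(3, 3)
>>> torch.allclose(torch.matrix_power(a, -1), torch.inverse(a), atol=1e-5)
True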
- 
torch.matrix_rank(input, tol=None, symmetric=False) → Tensor¶
- Returns the numerical rank of a 2-D tensor. The matrix rank is computed using SVD by default. If symmetric is True, then input is assumed to be symmetric, and the rank is computed from the eigenvalues instead. tol is the threshold below which the singular values (or the eigenvalues when symmetric is True) are considered to be 0. If tol is not specified, tol is set to S.max() * max(S.size()) * eps, where S is the singular values (or the eigenvalues when symmetric is True), and eps is the epsilon value for the datatype of input. - Parameters
 - Example: - >>> a = torch.eye(10) >>> torch.matrix_rank(a) tensor(10) >>> b = torch.eye(10) >>> b[0, 0] = 0 >>> torch.matrix_rank(b) tensor(9) 
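- A further hedged sketch (not from the original reference) showing a rank-deficient Gram matrix and the eigenvalue-based path:
>>> a = torch.randn(4, 2)
>>> m = torch.mm(a, a.t())            # 4 x 4 but at most rank 2
>>> torch.matrix_rank(m)
tensor(2)
>>> torch.matrix_rank(m, symmetric=True)
tensor(2)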
- 
torch.mm(mat1, mat2, out=None) → Tensor¶
- Performs a matrix multiplication of the matrices - mat1and- mat2.- If - mat1is a \((n \times m)\) tensor,- mat2is a \((m \times p)\) tensor,- outwill be a \((n \times p)\) tensor.- Note - This function does not broadcast. For broadcasting matrix products, see - torch.matmul().- Parameters
 - Example: - >>> mat1 = torch.randn(2, 3) >>> mat2 = torch.randn(3, 3) >>> torch.mm(mat1, mat2) tensor([[ 0.4851, 0.5037, -0.3633], [-0.0760, -3.6705, 2.4784]]) 
- 
torch.mv(mat, vec, out=None) → Tensor¶
- Performs a matrix-vector product of the matrix - matand the vector- vec.- If - matis a \((n \times m)\) tensor,- vecis a 1-D tensor of size \(m\),- outwill be 1-D of size \(n\).- Note - This function does not broadcast. - Parameters
 - Example: - >>> mat = torch.randn(2, 3) >>> vec = torch.randn(3) >>> torch.mv(mat, vec) tensor([ 1.0404, -0.6361]) 
- 
torch.orgqr(a, tau) → Tensor¶
- Computes the orthogonal matrix Q of a QR factorization, from the (a, tau) tuple returned by - torch.geqrf().- This directly calls the underlying LAPACK function ?orgqr. See LAPACK documentation for orgqr for further details. - Parameters
- a (Tensor) – the a from - torch.geqrf().
- tau (Tensor) – the tau from - torch.geqrf().
 
 
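- Example (a hedged sketch, not from the original reference): rebuilding \(Q\) from the reflectors produced by torch.geqrf() and checking that \(QR\) reconstructs the input:
>>> a = torch.randn(4, 3)
>>> h, tau = torch.geqrf(a)
>>> q = torch.orgqr(h, tau)
>>> torch.allclose(torch.mm(q, torch.triu(h[:3])), a, atol=1e-5)    # Q R == a
True
>>> torch.allclose(torch.mm(q.t(), q), torch.eye(3), atol=1e-5)     # orthonormal columns
True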
- 
torch.ormqr(a, tau, mat, left=True, transpose=False) → Tensor¶
- Multiplies mat by the orthogonal Q matrix of the QR factorization formed by - torch.geqrf()that is represented by (a, tau).- This directly calls the underlying LAPACK function ?ormqr. See LAPACK documentation for ormqr for further details. - Parameters
- a (Tensor) – the a from - torch.geqrf().
- tau (Tensor) – the tau from - torch.geqrf().
- mat (Tensor) – the matrix to be multiplied. 
 
 
- 
torch.pinverse(input, rcond=1e-15) → Tensor¶
- Calculates the pseudo-inverse (also known as the Moore-Penrose inverse) of a 2D tensor. Please look at Moore-Penrose inverse for more details. - Note - This method is implemented using the Singular Value Decomposition. - Note - The pseudo-inverse is not necessarily a continuous function in the elements of the matrix [1]. Therefore, the derivative does not always exist, and exists for a constant rank only [2]. However, this method is backpropagation-friendly because it is implemented using SVD results, and it could be unstable. Double-backward will also be unstable due to the internal use of SVD. See svd() for more details. - Parameters
- Returns
- The pseudo-inverse of - inputof dimensions \(n \times m\)
 - Example: - >>> input = torch.randn(3, 5) >>> input tensor([[ 0.5495, 0.0979, -1.4092, -0.1128, 0.4132], [-1.1143, -0.3662, 0.3042, 1.6374, -0.9294], [-0.3269, -0.5745, -0.0382, -0.5922, -0.6759]]) >>> torch.pinverse(input) tensor([[ 0.0600, -0.1933, -0.2090], [-0.0903, -0.0817, -0.4752], [-0.7124, -0.1631, -0.2272], [ 0.1356, 0.3933, -0.5023], [-0.0308, -0.1725, -0.5216]]) 
- 
torch.potrf(a, upper=True, out=None)¶
- Computes the Cholesky decomposition of a symmetric positive-definite matrix \(A\). - For more information regarding - torch.potrf(), please check- torch.cholesky().- Warning - torch.potrf()is deprecated in favour of- torch.cholesky()and will be removed in the next release. Please use- torch.cholesky()instead and note that the- upperargument in- torch.cholesky()defaults to- False.
- 
torch.potri(u, upper=True, out=None) → Tensor¶
- Computes the inverse of a positive semidefinite matrix given its Cholesky factor - u: returns matrix inv- If - upperis- Trueor not provided,- uis upper triangular such that the returned tensor is\[inv = (u^T u)^{-1} \]- If - upperis- False,- uis lower triangular such that the returned tensor is\[inv = (uu^{T})^{-1} \]- Parameters
 - Example: - >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) # make symmetric positive definite >>> u = torch.cholesky(a) >>> a tensor([[ 0.9935, -0.6353, 1.5806], [ -0.6353, 0.8769, -1.7183], [ 1.5806, -1.7183, 10.6618]]) >>> torch.potri(u) tensor([[ 1.9314, 1.2251, -0.0889], [ 1.2251, 2.4439, 0.2122], [-0.0889, 0.2122, 0.1412]]) >>> a.inverse() tensor([[ 1.9314, 1.2251, -0.0889], [ 1.2251, 2.4439, 0.2122], [-0.0889, 0.2122, 0.1412]]) 
- 
torch.potrs(b, u, upper=True, out=None)¶
- Solves a linear system of equations with a positive semidefinite matrix to be inverted given its Cholesky factor matrix - u.- For more information regarding - torch.potrs(), please check- torch.cholesky_solve().- Warning - torch.potrs()is deprecated in favour of- torch.cholesky_solve()and will be removed in the next release. Please use- torch.cholesky_solve()instead and note that the- upperargument in- torch.cholesky_solve()defaults to- False.
- 
torch.pstrf(a, upper=True, out=None)¶
- Computes the pivoted Cholesky decomposition of a symmetric positive-definite matrix - a. returns a namedtuple (u, pivot) of matrice.- If - upperis- Trueor not provided, u is upper triangular such that \(a = p^T u^T u p\), with p the permutation given by pivot.- If - upperis- False, u is lower triangular such that \(a = p^T u u^T p\).- Warning - torch.pstrf()is deprecated in favour of- torch.cholesky()and will be removed in the next release.- Parameters
 - Example: - >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) # make symmetric positive definite >>> a tensor([[ 3.5405, -0.4577, 0.8342], [-0.4577, 1.8244, -0.1996], [ 0.8342, -0.1996, 3.7493]]) >>> u,piv = torch.pstrf(a) >>> u tensor([[ 1.9363, 0.4308, -0.1031], [ 0.0000, 1.8316, -0.2256], [ 0.0000, 0.0000, 1.3277]]) >>> piv tensor([ 2, 0, 1], dtype=torch.int32) >>> p = torch.eye(3).index_select(0,piv.long()).index_select(0,piv.long()).t() # make pivot permutation >>> torch.mm(torch.mm(p.t(),torch.mm(u.t(),u)),p) # reconstruct tensor([[ 3.5405, -0.4577, 0.8342], [-0.4577, 1.8244, -0.1996], [ 0.8342, -0.1996, 3.7493]]) 
- 
torch.qr(input, out=None) -> (Tensor, Tensor)¶
- Computes the QR decomposition of a matrix - input, and returns a namedtuple (Q, R) of matrices such that \(\text{input} = Q R\), with \(Q\) being an orthogonal matrix and \(R\) being an upper triangular matrix.- This returns the thin (reduced) QR factorization. - Note - precision may be lost if the magnitudes of the elements of - inputare large- Note - While it should always give you a valid decomposition, it may not give you the same one across platforms - it will depend on your LAPACK implementation. - Note - Irrespective of the original strides, the returned matrix \(Q\) will be transposed, i.e. with strides (1, m) instead of (m, 1). - Example: - >>> a = torch.tensor([[12., -51, 4], [6, 167, -68], [-4, 24, -41]]) >>> q, r = torch.qr(a) >>> q tensor([[-0.8571, 0.3943, 0.3314], [-0.4286, -0.9029, -0.0343], [ 0.2857, -0.1714, 0.9429]]) >>> r tensor([[ -14.0000, -21.0000, 14.0000], [ 0.0000, -175.0000, 70.0000], [ 0.0000, 0.0000, -35.0000]]) >>> torch.mm(q, r).round() tensor([[ 12., -51., 4.], [ 6., 167., -68.], [ -4., 24., -41.]]) >>> torch.mm(q.t(), q).round() tensor([[ 1., 0., 0.], [ 0., 1., -0.], [ 0., -0., 1.]]) 
- 
torch.solve(B, A, out=None) -> (Tensor, Tensor)¶
- This function returns the solution to the system of linear equations represented by \(AX = B\) and the LU factorization of A, in order as a tuple X, LU. - LU contains L and U factors for LU factorization of A. - torch.solve(B, A) can take in 2D inputs B, A or inputs that are batches of 2D matrices. If the inputs are batches, then returns batched outputs X, LU. - Note - Irrespective of the original strides, the returned matrices X and LU will be transposed, i.e. with strides like B.contiguous().transpose(-1, -2).strides() and A.contiguous().transpose(-1, -2).strides() respectively. - Parameters
 - Example: - >>> A = torch.tensor([[6.80, -2.11, 5.66, 5.97, 8.23], [-6.05, -3.30, 5.36, -4.44, 1.08], [-0.45, 2.58, -2.70, 0.27, 9.04], [8.32, 2.71, 4.35, -7.17, 2.14], [-9.67, -5.14, -7.26, 6.08, -6.87]]).t() >>> B = torch.tensor([[4.02, 6.19, -8.22, -7.57, -3.03], [-1.56, 4.00, -8.67, 1.75, 2.86], [9.81, -4.09, -4.57, -8.61, 8.99]]).t() >>> X, LU = torch.solve(B, A) >>> torch.dist(B, torch.mm(A, X)) tensor(1.00000e-06 * 7.0977) >>> # Batched solver example >>> A = torch.randn(2, 3, 1, 4, 4) >>> B = torch.randn(2, 3, 1, 4, 6) >>> X, LU = torch.solve(B, A) >>> torch.dist(B, A.matmul(X)) tensor(1.00000e-06 * 3.6386) 
- 
torch.svd(input, some=True, compute_uv=True, out=None) -> (Tensor, Tensor, Tensor)¶
- svd(A) returns a namedtuple (U, S, V), which is the singular value decomposition of an input real matrix A of size (n x m) such that \(A = USV^T\). - U is of shape \((n \times n)\). - S is a diagonal matrix of shape \((n \times m)\), represented as a vector of size \(\min(n, m)\) containing the non-negative diagonal entries. - V is of shape \((m \times m)\). - If some is True (default), the returned U and V matrices will contain only \(\min(n, m)\) orthonormal columns. - If compute_uv is False, the returned U and V matrices will be zero matrices of shape \((n \times n)\) and \((m \times m)\) respectively. some will be ignored here. - Note - The implementation of SVD on CPU uses the LAPACK routine ?gesdd (a divide-and-conquer algorithm) instead of ?gesvd for speed. Analogously, the SVD on GPU uses the MAGMA routine gesdd as well. - Note - Irrespective of the original strides, the returned matrix U will be transposed, i.e. with strides (1, n) instead of (n, 1). - Note - Extra care needs to be taken when backpropagating through U and V outputs. Such an operation is really only stable when input is full rank with all distinct singular values. Otherwise, NaN can appear as the gradients are not properly defined. Also, notice that double backward will usually do an additional backward through U and V even if the original backward is only on S. - Note - When some = False, the gradients on U[:, min(n, m):] and V[:, min(n, m):] will be ignored in backward as those vectors can be arbitrary bases of the subspaces. - Note - When compute_uv = False, backward cannot be performed since U and V from the forward pass are required for the backward operation. - Parameters
 - Example: - >>> a = torch.tensor([[8.79, 6.11, -9.15, 9.57, -3.49, 9.84], [9.93, 6.91, -7.93, 1.64, 4.02, 0.15], [9.83, 5.04, 4.86, 8.83, 9.80, -8.99], [5.45, -0.27, 4.85, 0.74, 10.00, -6.02], [3.16, 7.98, 3.01, 5.80, 4.27, -5.31]]).t() >>> torch.svd(a).__class__ <class 'torch.return_types.svd'> >>> u, s, v = torch.svd(a) >>> u tensor([[-0.5911, 0.2632, 0.3554, 0.3143, 0.2299], [-0.3976, 0.2438, -0.2224, -0.7535, -0.3636], [-0.0335, -0.6003, -0.4508, 0.2334, -0.3055], [-0.4297, 0.2362, -0.6859, 0.3319, 0.1649], [-0.4697, -0.3509, 0.3874, 0.1587, -0.5183], [ 0.2934, 0.5763, -0.0209, 0.3791, -0.6526]]) >>> s tensor([ 27.4687, 22.6432, 8.5584, 5.9857, 2.0149]) >>> v tensor([[-0.2514, 0.8148, -0.2606, 0.3967, -0.2180], [-0.3968, 0.3587, 0.7008, -0.4507, 0.1402], [-0.6922, -0.2489, -0.2208, 0.2513, 0.5891], [-0.3662, -0.3686, 0.3859, 0.4342, -0.6265], [-0.4076, -0.0980, -0.4933, -0.6227, -0.4396]]) >>> torch.dist(a, torch.mm(torch.mm(u, torch.diag(s)), v.t())) tensor(1.00000e-06 * 9.3738) 
- 
torch.symeig(input, eigenvectors=False, upper=True, out=None) -> (Tensor, Tensor)¶
- This function returns eigenvalues and eigenvectors of a real symmetric matrix input, represented by a namedtuple (eigenvalues, eigenvectors). - input and \(V\) are \((m \times m)\) matrices and \(e\) is an \(m\) dimensional vector. - This function calculates all eigenvalues (and vectors) of input such that \(\text{input} = V \text{diag}(e) V^T\). - The boolean argument eigenvectors controls whether eigenvectors are computed in addition to eigenvalues. - If it is False, only eigenvalues are computed. If it is True, both eigenvalues and eigenvectors are computed. - Since the input matrix input is supposed to be symmetric, only the upper triangular portion is used by default. - If upper is False, then the lower triangular portion is used. - Note - Irrespective of the original strides, the returned matrix V will be transposed, i.e. with strides (1, m) instead of (m, 1). - Note - Extra care needs to be taken when backpropagating through the outputs. Such an operation is really only stable when all eigenvalues are distinct. Otherwise, NaN can appear as the gradients are not properly defined. - Parameters
- Returns
- A namedtuple (eigenvalues, eigenvectors) containing - eigenvalues (Tensor): Shape \((m)\). Each element is an eigenvalue of - input, The eigenvalues are in ascending order.
- eigenvectors (Tensor): Shape \((m \times m)\). If - eigenvectors=False, it’s a tensor filled with zeros. Otherwise, this tensor contains the orthonormal eigenvectors of the- input.
 
- Return type
 - Examples: - >>> a = torch.tensor([[ 1.96, 0.00, 0.00, 0.00, 0.00], [-6.49, 3.80, 0.00, 0.00, 0.00], [-0.47, -6.39, 4.17, 0.00, 0.00], [-7.20, 1.50, -1.51, 5.70, 0.00], [-0.65, -6.34, 2.67, 1.80, -7.10]]).t() >>> e, v = torch.symeig(a, eigenvectors=True) >>> e tensor([-11.0656, -6.2287, 0.8640, 8.8655, 16.0948]) >>> v tensor([[-0.2981, -0.6075, 0.4026, -0.3745, 0.4896], [-0.5078, -0.2880, -0.4066, -0.3572, -0.6053], [-0.0816, -0.3843, -0.6600, 0.5008, 0.3991], [-0.0036, -0.4467, 0.4553, 0.6204, -0.4564], [-0.8041, 0.4480, 0.1725, 0.3108, 0.1622]]) 
- 
torch.trtrs(b, A, upper=True, transpose=False, unitriangular=False) -> (Tensor, Tensor)¶
- Solves a system of equations with a triangular coefficient matrix \(A\) and multiple right-hand sides - b.- In particular, solves \(AX = b\) and assumes \(A\) is upper-triangular with the default keyword arguments. - Parameters
- A (Tensor) – the input triangular coefficient matrix 
- b (Tensor) – multiple right-hand sides. Each column of \(b\) is a right-hand side for the system of equations. 
- upper (bool, optional) – whether to solve the upper-triangular system of equations (default) or the lower-triangular system of equations. Default: True. 
- transpose (bool, optional) – whether \(A\) should be transposed before being sent into the solver. Default: False. 
- unitriangular (bool, optional) – whether \(A\) is unit triangular. If True, the diagonal elements of \(A\) are assumed to be 1 and not referenced from \(A\). Default: False. 
 
- Returns
- A tuple \((X, M)\) where \(M\) is a clone of \(A\) and \(X\) is the solution to \(AX = b\) (or whatever variant of the system of equations, depending on the keyword arguments.) 
 - Shape:
- A: \((N, N)\) 
- b: \((N, C)\) 
- output[0]: \((N, C)\) 
- output[1]: \((N, N)\) 
 
 - Examples: - >>> A = torch.randn(2, 2).triu() >>> A tensor([[ 1.1527, -1.0753], [ 0.0000, 0.7986]]) >>> b = torch.randn(2, 3) >>> b tensor([[-0.0210, 2.3513, -1.5492], [ 1.5429, 0.7403, -1.0243]]) >>> torch.trtrs(b, A) (tensor([[ 1.7840, 2.9045, -2.5405], [ 1.9319, 0.9269, -1.2826]]), tensor([[ 1.1527, -1.0753], [ 0.0000, 0.7986]])) 
torch.Tensor¶
A torch.Tensor is a multi-dimensional matrix containing elements of
a single data type.
Torch defines eight CPU tensor types and eight GPU tensor types:
| Data type | dtype | CPU tensor | GPU tensor |
|---|---|---|---|
| 32-bit floating point | torch.float32 or torch.float | torch.FloatTensor | torch.cuda.FloatTensor |
| 64-bit floating point | torch.float64 or torch.double | torch.DoubleTensor | torch.cuda.DoubleTensor |
| 16-bit floating point | torch.float16 or torch.half | torch.HalfTensor | torch.cuda.HalfTensor |
| 8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor |
| 8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor |
| 16-bit integer (signed) | torch.int16 or torch.short | torch.ShortTensor | torch.cuda.ShortTensor |
| 32-bit integer (signed) | torch.int32 or torch.int | torch.IntTensor | torch.cuda.IntTensor |
| 64-bit integer (signed) | torch.int64 or torch.long | torch.LongTensor | torch.cuda.LongTensor |
torch.Tensor is an alias for the default tensor type (torch.FloatTensor).
A tensor can be constructed from a Python list or sequence using the
torch.tensor() constructor:
>>> torch.tensor([[1., -1.], [1., -1.]])
tensor([[ 1.0000, -1.0000],
        [ 1.0000, -1.0000]])
>>> torch.tensor(np.array([[1, 2, 3], [4, 5, 6]]))
tensor([[ 1,  2,  3],
        [ 4,  5,  6]])
Warning
torch.tensor() always copies data. If you have a Tensor
data and just want to change its requires_grad flag, use
requires_grad_() or
detach() to avoid a copy.
If you have a numpy array and want to avoid a copy, use
torch.as_tensor().
A tensor of specific data type can be constructed by passing a
torch.dtype and/or a torch.device to a
constructor or tensor creation op:
>>> torch.zeros([2, 4], dtype=torch.int32)
tensor([[ 0,  0,  0,  0],
        [ 0,  0,  0,  0]], dtype=torch.int32)
>>> cuda0 = torch.device('cuda:0')
>>> torch.ones([2, 4], dtype=torch.float64, device=cuda0)
tensor([[ 1.0000,  1.0000,  1.0000,  1.0000],
        [ 1.0000,  1.0000,  1.0000,  1.0000]], dtype=torch.float64, device='cuda:0')
The contents of a tensor can be accessed and modified using Python’s indexing and slicing notation:
>>> x = torch.tensor([[1, 2, 3], [4, 5, 6]])
>>> print(x[1][2])
tensor(6)
>>> x[0][1] = 8
>>> print(x)
tensor([[ 1,  8,  3],
        [ 4,  5,  6]])
Use torch.Tensor.item() to get a Python number from a tensor containing a
single value:
>>> x = torch.tensor([[1]])
>>> x
tensor([[ 1]])
>>> x.item()
1
>>> x = torch.tensor(2.5)
>>> x
tensor(2.5000)
>>> x.item()
2.5
A tensor can be created with requires_grad=True so that
torch.autograd records operations on it for automatic differentiation.
>>> x = torch.tensor([[1., -1.], [1., 1.]], requires_grad=True)
>>> out = x.pow(2).sum()
>>> out.backward()
>>> x.grad
tensor([[ 2.0000, -2.0000],
        [ 2.0000,  2.0000]])
Each tensor has an associated torch.Storage, which holds its data.
The tensor class provides a multi-dimensional, strided
view of a storage and defines numeric operations on it.
Note
For more information on the torch.dtype, torch.device, and
torch.layout attributes of a torch.Tensor, see
Tensor Attributes.
Note
Methods which mutate a tensor are marked with an underscore suffix.
For example, torch.FloatTensor.abs_() computes the absolute value
in-place and returns the modified tensor, while torch.FloatTensor.abs()
computes the result in a new tensor.
Note
To change an existing tensor’s torch.device and/or torch.dtype, consider using
to() method on the tensor.
- 
class torch.Tensor¶
- There are a few main ways to create a tensor, depending on your use case. - To create a tensor with pre-existing data, use - torch.tensor().
- To create a tensor with specific size, use - torch.*tensor creation ops (see Creation Ops).
- To create a tensor with the same size (and similar types) as another tensor, use - torch.*_liketensor creation ops (see Creation Ops).
- To create a tensor with similar type but different size as another tensor, use - tensor.new_*creation ops.
 - 
new_tensor(data, dtype=None, device=None, requires_grad=False) → Tensor¶
- Returns a new Tensor with - dataas the tensor data. By default, the returned Tensor has the same- torch.dtypeand- torch.deviceas this tensor.- Warning - new_tensor()always copies- data. If you have a Tensor- dataand want to avoid a copy, use- torch.Tensor.requires_grad_()or- torch.Tensor.detach(). If you have a numpy array and want to avoid a copy, use- torch.from_numpy().- Warning - When data is a tensor x, - new_tensor()reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore- tensor.new_tensor(x)is equivalent to- x.clone().detach()and- tensor.new_tensor(x, requires_grad=True)is equivalent to- x.clone().detach().requires_grad_(True). The equivalents using- clone()and- detach()are recommended.- Parameters
- data (array_like) – The returned Tensor copies - data.
- dtype ( - torch.dtype, optional) – the desired type of returned tensor. Default: if None, same- torch.dtypeas this tensor.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if None, same- torch.deviceas this tensor.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: - False.
 
 - Example: - >>> tensor = torch.ones((2,), dtype=torch.int8) >>> data = [[0, 1], [2, 3]] >>> tensor.new_tensor(data) tensor([[ 0, 1], [ 2, 3]], dtype=torch.int8) 
 - 
new_full(size, fill_value, dtype=None, device=None, requires_grad=False) → Tensor¶
- Returns a Tensor of size - sizefilled with- fill_value. By default, the returned Tensor has the same- torch.dtypeand- torch.deviceas this tensor.- Parameters
- size (int...) – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
- fill_value (scalar) – the number to fill the output tensor with. 
- dtype ( - torch.dtype, optional) – the desired type of returned tensor. Default: if None, same- torch.dtypeas this tensor.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if None, same- torch.deviceas this tensor.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: - False.
 
 - Example: - >>> tensor = torch.ones((2,), dtype=torch.float64) >>> tensor.new_full((3, 4), 3.141592) tensor([[ 3.1416, 3.1416, 3.1416, 3.1416], [ 3.1416, 3.1416, 3.1416, 3.1416], [ 3.1416, 3.1416, 3.1416, 3.1416]], dtype=torch.float64) 
 - 
new_empty(size, dtype=None, device=None, requires_grad=False) → Tensor¶
- Returns a Tensor of size - sizefilled with uninitialized data. By default, the returned Tensor has the same- torch.dtypeand- torch.deviceas this tensor.- Parameters
- size (int...) – a list, tuple, or torch.Size of integers defining the shape of the output tensor.
- dtype (torch.dtype, optional) – the desired type of returned tensor. Default: if None, same torch.dtype as this tensor. 
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if None, same- torch.deviceas this tensor.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: - False.
 
 - Example: - >>> tensor = torch.ones(()) >>> tensor.new_empty((2, 3)) tensor([[ 5.8182e-18, 4.5765e-41, -1.0545e+30], [ 3.0949e-41, 4.4842e-44, 0.0000e+00]]) 
 - 
new_ones(size, dtype=None, device=None, requires_grad=False) → Tensor¶
- Returns a Tensor of size - sizefilled with- 1. By default, the returned Tensor has the same- torch.dtypeand- torch.deviceas this tensor.- Parameters
- size (int...) – a list, tuple, or - torch.Sizeof integers defining the shape of the output tensor.
- dtype ( - torch.dtype, optional) – the desired type of returned tensor. Default: if None, same- torch.dtypeas this tensor.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if None, same- torch.deviceas this tensor.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: - False.
 
 - Example: - >>> tensor = torch.tensor((), dtype=torch.int32) >>> tensor.new_ones((2, 3)) tensor([[ 1, 1, 1], [ 1, 1, 1]], dtype=torch.int32) 
 - 
new_zeros(size, dtype=None, device=None, requires_grad=False) → Tensor¶
- Returns a Tensor of size - sizefilled with- 0. By default, the returned Tensor has the same- torch.dtypeand- torch.deviceas this tensor.- Parameters
- size (int...) – a list, tuple, or - torch.Sizeof integers defining the shape of the output tensor.
- dtype ( - torch.dtype, optional) – the desired type of returned tensor. Default: if None, same- torch.dtypeas this tensor.
- device ( - torch.device, optional) – the desired device of returned tensor. Default: if None, same- torch.deviceas this tensor.
- requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: - False.
 
 - Example: - >>> tensor = torch.tensor((), dtype=torch.float64) >>> tensor.new_zeros((2, 3)) tensor([[ 0., 0., 0.], [ 0., 0., 0.]], dtype=torch.float64) 
 - 
is_cuda¶
- Is - Trueif the Tensor is stored on the GPU,- Falseotherwise.
 - 
device¶
- Is the - torch.devicewhere this Tensor is.
 - 
abs() → Tensor¶
- See - torch.abs()
 - 
acos() → Tensor¶
- See - torch.acos()
 - 
add(value) → Tensor¶
- add(value=1, other) -> Tensor - See - torch.add()
 - 
addbmm(beta=1, alpha=1, batch1, batch2) → Tensor¶
- See - torch.addbmm()
 - 
addcdiv(value=1, tensor1, tensor2) → Tensor¶
- See - torch.addcdiv()
 - 
addcmul(value=1, tensor1, tensor2) → Tensor¶
- See - torch.addcmul()
 - 
addmm(beta=1, alpha=1, mat1, mat2) → Tensor¶
- See - torch.addmm()
 - 
addmv(beta=1, alpha=1, mat, vec) → Tensor¶
- See - torch.addmv()
 - 
addr(beta=1, alpha=1, vec1, vec2) → Tensor¶
- See - torch.addr()
 - 
allclose(other, rtol=1e-05, atol=1e-08, equal_nan=False) → Tensor¶
- See - torch.allclose()
 - 
apply_(callable) → Tensor¶
- Applies the function - callableto each element in the tensor, replacing each element with the value returned by- callable.- Note - This function only works with CPU tensors and should not be used in code sections that require high performance. 
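- Example (an illustrative sketch, not from the original reference):
>>> t = torch.tensor([1., 2., 3.])
>>> t.apply_(lambda v: v * 2)        # in-place, element by element, CPU only
tensor([2., 4., 6.])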
 - 
argmax(dim=None, keepdim=False)¶
- See - torch.argmax()
 - 
argmin(dim=None, keepdim=False)¶
- See - torch.argmin()
 - 
asin() → Tensor¶
- See - torch.asin()
 - 
atan() → Tensor¶
- See - torch.atan()
 - 
atan2(other) → Tensor¶
- See - torch.atan2()
 - 
baddbmm(beta=1, alpha=1, batch1, batch2) → Tensor¶
- See - torch.baddbmm()
 - 
bernoulli(*, generator=None) → Tensor¶
- Returns a result tensor where each \(\texttt{result[i]}\) is independently sampled from \(\text{Bernoulli}(\texttt{self[i]})\). - selfmust have floating point- dtype, and the result will have the same- dtype.
 - 
bernoulli_()¶
- 
bernoulli_(p=0.5, *, generator=None) → Tensor
- Fills each location of - selfwith an independent sample from \(\text{Bernoulli}(\texttt{p})\).- selfcan have integral- dtype.
 - 
bernoulli_(p_tensor, *, generator=None) → Tensor
- p_tensorshould be a tensor containing probabilities to be used for drawing the binary random number.- The \(\text{i}^{th}\) element of - selftensor will be set to a value sampled from \(\text{Bernoulli}(\texttt{p\_tensor[i]})\).- selfcan have integral- dtype, but- p_tensormust have floating point- dtype.
 - See also - bernoulli()and- torch.bernoulli()
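- Example (an illustrative sketch, not from the original reference; degenerate probabilities 0 and 1 are used so the output is deterministic):
>>> torch.ones(3).bernoulli_(0.)                             # scalar probability
tensor([0., 0., 0.])
>>> torch.empty(3).bernoulli_(torch.tensor([0., 1., 1.]))    # per-element probabilities
tensor([0., 1., 1.])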
- 
 - 
bmm(batch2) → Tensor¶
- See - torch.bmm()
 - 
btrifact(pivot=True) -> (Tensor, Tensor)¶
- See - torch.btrifact()
 - 
btrifact_with_info(pivot=True) -> (Tensor, Tensor, Tensor)¶
- See torch.btrifact_with_info()
 - 
btrisolve(LU_data, LU_pivots) → Tensor¶
- See torch.btrisolve()
 - 
cauchy_(median=0, sigma=1, *, generator=None) → Tensor¶
- Fills the tensor with numbers drawn from the Cauchy distribution: \[f(x) = \dfrac{1}{\pi} \dfrac{\sigma}{(x - \text{median})^2 + \sigma^2}\]
 - 
ceil() → Tensor¶
- See - torch.ceil()
 - 
cholesky(upper=False) → Tensor¶
- See - torch.cholesky()
 - 
cholesky_solve(input2, upper=False) → Tensor¶
- See torch.cholesky_solve()
 - 
chunk(chunks, dim=0) → List of Tensors¶
- See - torch.chunk()
 - 
clamp(min, max) → Tensor¶
- See - torch.clamp()
 - 
clone() → Tensor¶
- Returns a copy of the - selftensor. The copy has the same size and data type as- self.- Note - Unlike copy_(), this function is recorded in the computation graph. Gradients propagating to the cloned tensor will propagate to the original tensor. 
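- Example (an illustrative sketch, not from the original reference, demonstrating the note above):
>>> a = torch.tensor([1., 2.], requires_grad=True)
>>> b = a.clone()
>>> b.sum().backward()
>>> a.grad                           # gradients flow back to the original tensor
tensor([1., 1.])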
 - 
contiguous() → Tensor¶
- Returns a contiguous tensor containing the same data as - selftensor. If- selftensor is contiguous, this function returns the- selftensor.
 - 
copy_(src, non_blocking=False) → Tensor¶
- Copies the elements from - srcinto- selftensor and returns- self.- The - srctensor must be broadcastable with the- selftensor. It may be of a different data type or reside on a different device.
 - 
cos() → Tensor¶
- See - torch.cos()
 - 
cosh() → Tensor¶
- See - torch.cosh()
 - 
cpu() → Tensor¶
- Returns a copy of this object in CPU memory. - If this object is already in CPU memory and on the correct device, then no copy is performed and the original object is returned. 
 - 
cross(other, dim=-1) → Tensor¶
- See - torch.cross()
 - 
cuda(device=None, non_blocking=False) → Tensor¶
- Returns a copy of this object in CUDA memory. - If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned. - Parameters
- device ( - torch.device) – The destination GPU device. Defaults to the current CUDA device.
- non_blocking (bool) – If - Trueand the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. Default:- False.
 
 
 - 
cumprod(dim, dtype=None) → Tensor¶
- See - torch.cumprod()
 - 
cumsum(dim, dtype=None) → Tensor¶
- See - torch.cumsum()
 - 
data_ptr() → int¶
- Returns the address of the first element of - selftensor.
 - 
det() → Tensor¶
- See - torch.det()
 - 
diag(diagonal=0) → Tensor¶
- See - torch.diag()
 - 
diag_embed(offset=0, dim1=-2, dim2=-1) → Tensor¶
- See torch.diag_embed()
 - 
dim() → int¶
- Returns the number of dimensions of - selftensor.
 - 
dist(other, p=2) → Tensor¶
- See - torch.dist()
 - 
div(value) → Tensor¶
- See - torch.div()
 - 
dot(tensor2) → Tensor¶
- See - torch.dot()
 - 
eig(eigenvectors=False) -> (Tensor, Tensor)¶
- See - torch.eig()
 - 
element_size() → int¶
- Returns the size in bytes of an individual element. - Example: - >>> torch.tensor([]).element_size() 4 >>> torch.tensor([], dtype=torch.uint8).element_size() 1 
 - 
eq(other) → Tensor¶
- See - torch.eq()
 - 
equal(other) → bool¶
- See - torch.equal()
 - 
erf() → Tensor¶
- See - torch.erf()
 - 
erfc() → Tensor¶
- See - torch.erfc()
 - 
erfinv() → Tensor¶
- See - torch.erfinv()
 - 
exp() → Tensor¶
- See - torch.exp()
 - 
expm1() → Tensor¶
- See - torch.expm1()
 - 
expand(*sizes) → Tensor¶
- Returns a new view of the self tensor with singleton dimensions expanded to a larger size. - Passing -1 as the size for a dimension means not changing the size of that dimension. - A tensor can also be expanded to a larger number of dimensions, and the new ones will be added at the front. For the new dimensions, the size cannot be set to -1. - Expanding a tensor does not allocate new memory, but only creates a new view on the existing tensor where a dimension of size one is expanded to a larger size by setting the stride to 0. Any dimension of size 1 can be expanded to an arbitrary value without allocating new memory. - Parameters
- *sizes (torch.Size or int...) – the desired expanded size 
 - Warning - More than one element of an expanded tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first. - Example: - >>> x = torch.tensor([[1], [2], [3]]) >>> x.size() torch.Size([3, 1]) >>> x.expand(3, 4) tensor([[ 1, 1, 1, 1], [ 2, 2, 2, 2], [ 3, 3, 3, 3]]) >>> x.expand(-1, 4) # -1 means not changing the size of that dimension tensor([[ 1, 1, 1, 1], [ 2, 2, 2, 2], [ 3, 3, 3, 3]]) 
 - 
expand_as(other) → Tensor¶
- Expand this tensor to the same size as - other.- self.expand_as(other)is equivalent to- self.expand(other.size()).- Please see - expand()for more information about- expand.- Parameters
- other ( - torch.Tensor) – The result tensor has the same size as- other.
 
 - 
exponential_(lambd=1, *, generator=None) → Tensor¶
- Fills - selftensor with elements drawn from the exponential distribution:\[f(x) = \lambda e^{-\lambda x}\]
 - 
fill_(value) → Tensor¶
- Fills - selftensor with the specified value.
 - 
flatten(input, start_dim=0, end_dim=-1) → Tensor¶
- See torch.flatten()
 - 
flip(dims) → Tensor¶
- See - torch.flip()
 - 
floor() → Tensor¶
- See - torch.floor()
 - 
fmod(divisor) → Tensor¶
- See - torch.fmod()
 - 
frac() → Tensor¶
- See - torch.frac()
 - 
gather(dim, index) → Tensor¶
- See - torch.gather()
 - 
ge(other) → Tensor¶
- See - torch.ge()
 - 
gels(A) → Tensor¶
- See - torch.gels()
 - 
geometric_(p, *, generator=None) → Tensor¶
- Fills - selftensor with elements drawn from the geometric distribution:\[f(X=k) = (1 - p)^{k - 1} p\]
 - 
geqrf() -> (Tensor, Tensor)¶
- See - torch.geqrf()
 - 
ger(vec2) → Tensor¶
- See - torch.ger()
 - 
gesv(A)¶
- See - torch.solve()
 - 
get_device() -> Device ordinal (Integer)¶
- For CUDA tensors, this function returns the device ordinal of the GPU on which the tensor resides. For CPU tensors, an error is thrown. - Example: - >>> x = torch.randn(3, 4, 5, device='cuda:0') >>> x.get_device() 0 >>> x.cpu().get_device() # RuntimeError: get_device is not implemented for type torch.FloatTensor 
 - 
gt(other) → Tensor¶
- See - torch.gt()
 - 
histc(bins=100, min=0, max=0) → Tensor¶
- See - torch.histc()
 - 
index_add_(dim, index, tensor) → Tensor¶
- Accumulates the elements of tensor into the self tensor by adding to the indices in the order given in index. For example, if dim == 0 and index[i] == j, then the ith row of tensor is added to the jth row of self. - The dimth dimension of tensor must have the same size as the length of index (which must be a vector), and all other dimensions must match self, or an error will be raised. - Note - When using the CUDA backend, this operation may induce nondeterministic behaviour that is not easily switched off. Please see the notes on randomness for background. - Parameters
 - Example: - >>> x = torch.ones(5, 3) >>> t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float) >>> index = torch.tensor([0, 4, 2]) >>> x.index_add_(0, index, t) tensor([[ 2., 3., 4.], [ 1., 1., 1.], [ 8., 9., 10.], [ 1., 1., 1.], [ 5., 6., 7.]]) 
 - 
index_add(dim, index, tensor) → Tensor¶
- Out-of-place version of - torch.Tensor.index_add_()
 - 
index_copy_(dim, index, tensor) → Tensor¶
- Copies the elements of - tensorinto the- selftensor by selecting the indices in the order given in- index. For example, if- dim == 0and- index[i] == j, then the- ith row of- tensoris copied to the- jth row of- self.- The - dimth dimension of- tensormust have the same size as the length of- index(which must be a vector), and all other dimensions must match- self, or an error will be raised.- Parameters
 - Example: - >>> x = torch.zeros(5, 3) >>> t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float) >>> index = torch.tensor([0, 4, 2]) >>> x.index_copy_(0, index, t) tensor([[ 1., 2., 3.], [ 0., 0., 0.], [ 7., 8., 9.], [ 0., 0., 0.], [ 4., 5., 6.]]) 
 - 
index_copy(dim, index, tensor) → Tensor¶
- Out-of-place version of - torch.Tensor.index_copy_()
 - 
index_fill_(dim, index, val) → Tensor¶
- Fills the elements of the - selftensor with value- valby selecting the indices in the order given in- index.- Parameters
 - Example::
- >>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float) >>> index = torch.tensor([0, 2]) >>> x.index_fill_(1, index, -1) tensor([[-1., 2., -1.], [-1., 5., -1.], [-1., 8., -1.]]) 
 
 - 
index_fill(dim, index, value) → Tensor¶
- Out-of-place version of - torch.Tensor.index_fill_()
 - 
index_put_(indices, value, accumulate=False) → Tensor¶
- Puts values from the tensor value into the tensor self using the indices specified in indices (which is a tuple of Tensors). The expression tensor.index_put_(indices, value) is equivalent to tensor[indices] = value. Returns self. - If accumulate is True, the elements in value are added to self. If accumulate is False, the behavior is undefined if indices contain duplicate elements.
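- Example (an illustrative sketch, not from the original reference):
>>> t = torch.zeros(3, 3)
>>> rows = torch.tensor([0, 2])
>>> cols = torch.tensor([1, 2])
>>> t.index_put_((rows, cols), torch.tensor([1., 2.]))   # same as t[rows, cols] = torch.tensor([1., 2.])
tensor([[0., 1., 0.],
        [0., 0., 0.],
        [0., 0., 2.]])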
 - 
index_put()¶
- Out-of-place version of torch.Tensor.index_put_()
 - 
index_select(dim, index) → Tensor¶
- See torch.index_select()
 - 
inverse() → Tensor¶
- See - torch.inverse()
 - 
is_contiguous() → bool¶
- Returns True if - selftensor is contiguous in memory in C order.
 - 
is_floating_point() → bool¶
- Returns True if the data type of - selfis a floating point data type.
 - 
is_pinned()¶
- Returns True if this tensor resides in pinned memory. 
 - 
is_set_to(tensor) → bool¶
- Returns True if this object refers to the same - THTensorobject from the Torch C API as the given tensor.
 - 
is_signed()¶
- Returns True if the data type of self is a signed data type.
 - 
item() → number¶
- Returns the value of this tensor as a standard Python number. This only works for tensors with one element. For other cases, see - tolist().- This operation is not differentiable. - Example: - >>> x = torch.tensor([1.0]) >>> x.item() 1.0 
 - 
kthvalue(k, dim=None, keepdim=False) -> (Tensor, LongTensor)¶
- See - torch.kthvalue()
 - 
le(other) → Tensor¶
- See - torch.le()
 - 
lerp(end, weight) → Tensor¶
- See - torch.lerp()
 - 
log() → Tensor¶
- See - torch.log()
 - 
logdet() → Tensor¶
- See - torch.logdet()
 - 
log10() → Tensor¶
- See - torch.log10()
 - 
log1p() → Tensor¶
- See - torch.log1p()
 - 
log2() → Tensor¶
- See - torch.log2()
 - 
log_normal_(mean=1, std=2, *, generator=None)¶
- Fills self tensor with numbers sampled from the log-normal distribution parameterized by the given mean \(\mu\) and standard deviation \(\sigma\). Note that mean and std are the mean and standard deviation of the underlying normal distribution, and not of the returned distribution:\[f(x) = \dfrac{1}{x \sigma \sqrt{2\pi}}\ e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}\]
 - 
logsumexp(dim, keepdim=False) → Tensor¶
- See torch.logsumexp()
 - 
lt(other) → Tensor¶
- See - torch.lt()
 - 
map_(tensor, callable)¶
- Applies - callablefor each element in- selftensor and the given- tensorand stores the results in- selftensor.- selftensor and the given- tensormust be broadcastable.- The - callableshould have the signature:- def callable(a, b) -> number 
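- Example (an illustrative sketch, not from the original reference; like apply_(), this assumes CPU tensors):
>>> a = torch.tensor([1., 2., 3.])
>>> b = torch.tensor([10., 20., 30.])
>>> a.map_(b, lambda x, y: x + y)    # stores x + y back into a
tensor([11., 22., 33.])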
 - 
masked_scatter_(mask, source)¶
- Copies elements from - sourceinto- selftensor at positions where the- maskis one. The shape of- maskmust be broadcastable with the shape of the underlying tensor. The- sourceshould have at least as many elements as the number of ones in- mask- Parameters
- mask (ByteTensor) – the binary mask 
- source (Tensor) – the tensor to copy from 
 
 - Note - The - maskoperates on the- selftensor, not on the given- sourcetensor.
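- Example (an illustrative sketch, not from the original reference): source elements are consumed in order at the masked positions:
>>> t = torch.zeros(5)
>>> mask = torch.tensor([1, 0, 1, 0, 1], dtype=torch.uint8)
>>> t.masked_scatter_(mask, torch.tensor([10., 20., 30.]))
tensor([10.,  0., 20.,  0., 30.])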
 - 
masked_scatter(mask, tensor) → Tensor¶
- Out-of-place version of - torch.Tensor.masked_scatter_()
 - 
masked_fill_(mask, value)¶
- Fills elements of - selftensor with- valuewhere- maskis one. The shape of- maskmust be broadcastable with the shape of the underlying tensor.- Parameters
- mask (ByteTensor) – the binary mask 
- value (float) – the value to fill in with 
 
 
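- Example (an illustrative sketch, not from the original reference):
>>> t = torch.tensor([1., 2., 3.])
>>> t.masked_fill_(torch.tensor([1, 0, 1], dtype=torch.uint8), -1.)
tensor([-1.,  2., -1.])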
 - 
masked_fill(mask, value) → Tensor¶
- Out-of-place version of - torch.Tensor.masked_fill_()
 - 
masked_select(mask) → Tensor¶
- See torch.masked_select()
 - 
matmul(tensor2) → Tensor¶
- See - torch.matmul()
 - 
matrix_power(n) → Tensor¶
- See torch.matrix_power()
 - 
max(dim=None, keepdim=False) -> Tensor or (Tensor, Tensor)¶
- See - torch.max()
 - 
mean(dim=None, keepdim=False) -> Tensor or (Tensor, Tensor)¶
- See - torch.mean()
 - 
median(dim=None, keepdim=False) -> (Tensor, LongTensor)¶
- See - torch.median()
 - 
min(dim=None, keepdim=False) -> Tensor or (Tensor, Tensor)¶
- See - torch.min()
 - 
mm(mat2) → Tensor¶
- See - torch.mm()
 - 
mode(dim=None, keepdim=False) -> (Tensor, LongTensor)¶
- See - torch.mode()
 - 
mul(value) → Tensor¶
- See - torch.mul()
 - 
multinomial(num_samples, replacement=False, *, generator=None) → Tensor¶
- See torch.multinomial()
 - 
mv(vec) → Tensor¶
- See - torch.mv()
 - 
mvlgamma(p) → Tensor¶
- See - torch.mvlgamma()
 - 
mvlgamma_(p) → Tensor¶
- In-place version of - mvlgamma()
 - 
narrow(dimension, start, length) → Tensor¶
- See - torch.narrow()- Example: - >>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> x.narrow(0, 0, 2) tensor([[ 1, 2, 3], [ 4, 5, 6]]) >>> x.narrow(1, 1, 2) tensor([[ 2, 3], [ 5, 6], [ 8, 9]]) 
 - 
ne(other) → Tensor¶
- See - torch.ne()
 - 
neg() → Tensor¶
- See - torch.neg()
 - 
nonzero() → LongTensor¶
- See - torch.nonzero()
 - 
norm(p='fro', dim=None, keepdim=False, dtype=None)¶
- See - torch.norm()
 - 
normal_(mean=0, std=1, *, generator=None) → Tensor¶
- Fills self tensor with elements sampled from the normal distribution parameterized by mean and std.
 - 
numel() → int¶
- See - torch.numel()
 - 
numpy() → numpy.ndarray¶
- Returns - selftensor as a NumPy- ndarray. This tensor and the returned- ndarrayshare the same underlying storage. Changes to- selftensor will be reflected in the- ndarrayand vice versa.
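- Example (an illustrative sketch, not from the original reference, demonstrating the shared storage):
>>> t = torch.ones(3)
>>> n = t.numpy()
>>> n[0] = 5.
>>> t                                # the change is visible through the tensor
tensor([5., 1., 1.])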
 - 
orgqr(input2) → Tensor¶
- See - torch.orgqr()
 - 
ormqr(input2, input3, left=True, transpose=False) → Tensor¶
- See - torch.ormqr()
 - 
permute(*dims) → Tensor¶
- Permute the dimensions of this tensor. - Parameters
- *dims (int...) – The desired ordering of dimensions 
 - Example - >>> x = torch.randn(2, 3, 5) >>> x.size() torch.Size([2, 3, 5]) >>> x.permute(2, 0, 1).size() torch.Size([5, 2, 3]) 
 - 
pin_memory()¶
- Copies the tensor to pinned memory, if it's not already pinned.
 - 
pinverse() → Tensor¶
- See - torch.pinverse()
 - 
potrf(upper=True)¶
- See - torch.cholesky()
 - 
potri(upper=True) → Tensor¶
- See - torch.potri()
 - 
potrs(u, upper=True)¶
- See torch.potrs()
 - 
pow(exponent) → Tensor¶
- See - torch.pow()
 - 
prod(dim=None, keepdim=False, dtype=None) → Tensor¶
- See - torch.prod()
 - 
pstrf(upper=True)¶
- See - torch.pstrf()
 - 
put_(indices, tensor, accumulate=False) → Tensor¶
- Copies the elements from - tensorinto the positions specified by indices. For the purpose of indexing, the- selftensor is treated as if it were a 1-D tensor.- If - accumulateis- True, the elements in- tensorare added to- self. If accumulate is- False, the behavior is undefined if indices contain duplicate elements.- Parameters
 - Example: - >>> src = torch.tensor([[4, 3, 5], [6, 7, 8]]) >>> src.put_(torch.tensor([1, 3]), torch.tensor([9, 10])) tensor([[ 4, 9, 5], [ 10, 7, 8]]) 
 - 
qr() -> (Tensor, Tensor)¶
- See - torch.qr()
 - 
random_(from=0, to=None, *, generator=None) → Tensor¶
- Fills - selftensor with numbers sampled from the discrete uniform distribution over- [from, to - 1]. If not specified, the values are usually only bounded by- selftensor’s data type. However, for floating point types, if unspecified, range will be- [0, 2^mantissa]to ensure that every value is representable. For example, torch.tensor(1, dtype=torch.double).random_() will be uniform in- [0, 2^53].
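- Example (an illustrative sketch, not from the original reference; the sampled values will differ from run to run):
>>> torch.empty(4, dtype=torch.long).random_(0, 10)   # values drawn from [0, 9]
tensor([5, 0, 7, 3])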
 - 
reciprocal() → Tensor¶
- See torch.reciprocal()
 - 
reciprocal_() → Tensor¶
- In-place version of - reciprocal()
 - 
remainder(divisor) → Tensor¶
- See torch.remainder()
 - 
remainder_(divisor) → Tensor¶
- In-place version of - remainder()
 - 
renorm(p, dim, maxnorm) → Tensor¶
- See - torch.renorm()
 - 
repeat(*sizes) → Tensor¶
- Repeats this tensor along the specified dimensions. - Unlike - expand(), this function copies the tensor’s data.- Warning - torch.repeat()behaves differently from numpy.repeat, but is more similar to numpy.tile.- Parameters
- sizes (torch.Size or int...) – The number of times to repeat this tensor along each dimension 
 - Example: - >>> x = torch.tensor([1, 2, 3]) >>> x.repeat(4, 2) tensor([[ 1, 2, 3, 1, 2, 3], [ 1, 2, 3, 1, 2, 3], [ 1, 2, 3, 1, 2, 3], [ 1, 2, 3, 1, 2, 3]]) >>> x.repeat(4, 2, 1).size() torch.Size([4, 2, 3]) 
 - 
requires_grad_(requires_grad=True) → Tensor¶
- Changes if autograd should record operations on this tensor: sets this tensor's requires_grad attribute in-place. Returns this tensor. - requires_grad_()'s main use case is to tell autograd to begin recording operations on a Tensor tensor. If tensor has requires_grad=False (because it was obtained through a DataLoader, or required preprocessing or initialization), tensor.requires_grad_() makes it so that autograd will begin to record operations on tensor. - Parameters
- requires_grad (bool) – If autograd should record operations on this tensor. Default: - True.
 - Example: - >>> # Let's say we want to preprocess some saved weights and use >>> # the result as new weights. >>> saved_weights = [0.1, 0.2, 0.3, 0.25] >>> loaded_weights = torch.tensor(saved_weights) >>> weights = preprocess(loaded_weights) # some function >>> weights tensor([-0.5503, 0.4926, -2.1158, -0.8303]) >>> # Now, start to record operations done to weights >>> weights.requires_grad_() >>> out = weights.pow(2).sum() >>> out.backward() >>> weights.grad tensor([-1.1007, 0.9853, -4.2316, -1.6606]) 
 - 
reshape(*shape) → Tensor¶
- Returns a tensor with the same data and number of elements as - selfbut with the specified shape. This method returns a view if- shapeis compatible with the current shape. See- torch.Tensor.view()on when it is possible to return a view.- See - torch.reshape()- Parameters
- shape (tuple of python:ints or int...) – the desired shape 
 
 - 
reshape_as(other) → Tensor¶
- Returns this tensor with the same shape as other. self.reshape_as(other) is equivalent to self.reshape(other.sizes()). This method returns a view if other.sizes() is compatible with the current shape. See torch.Tensor.view() on when it is possible to return a view. - Please see reshape() for more information about reshape. - Parameters
- other ( - torch.Tensor) – The result tensor has the same shape as- other.
 
 - 
resize_(*sizes) → Tensor¶
- Resizes - selftensor to the specified size. If the number of elements is larger than the current storage size, then the underlying storage is resized to fit the new number of elements. If the number of elements is smaller, the underlying storage is not changed. Existing elements are preserved but any new memory is uninitialized.- Warning - This is a low-level method. The storage is reinterpreted as C-contiguous, ignoring the current strides (unless the target size equals the current size, in which case the tensor is left unchanged). For most purposes, you will instead want to use - view(), which checks for contiguity, or- reshape(), which copies data if needed. To change the size in-place with custom strides, see- set_().- Parameters
- sizes (torch.Size or int...) – the desired size 
 - Example: - >>> x = torch.tensor([[1, 2], [3, 4], [5, 6]]) >>> x.resize_(2, 2) tensor([[ 1, 2], [ 3, 4]]) 
 - 
resize_as_(tensor) → Tensor¶
- Resizes the - selftensor to be the same size as the specified- tensor. This is equivalent to- self.resize_(tensor.size()).
 - 
roll(shifts, dims) → Tensor¶
- See - torch.roll()
 - 
round() → Tensor¶
- See - torch.round()
 - 
rsqrt() → Tensor¶
- See - torch.rsqrt()
 - 
scatter_(dim, index, src) → Tensor¶
- Writes all values from the tensor src into self at the indices specified in the index tensor. For each value in src, its output index is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim. - For a 3-D tensor, self is updated as: - self[index[i][j][k]][j][k] = src[i][j][k] # if dim == 0 self[i][index[i][j][k]][k] = src[i][j][k] # if dim == 1 self[i][j][index[i][j][k]] = src[i][j][k] # if dim == 2 - This is the reverse of the operation described in gather(). - self, index and src (if it is a Tensor) should have the same number of dimensions. It is also required that index.size(d) <= src.size(d) for all dimensions d, and that index.size(d) <= self.size(d) for all dimensions d != dim. - Moreover, as for gather(), the values of index must be between 0 and self.size(dim) - 1 inclusive, and all values in a row along the specified dimension dim must be unique. - Parameters
- dim (int) – the axis along which to index 
- index (LongTensor) – the indices of elements to scatter, can be either empty or the same size of src. When empty, the operation returns identity 
- src (Tensor) – the source element(s) to scatter, in case value is not specified 
- value (float) – the source element(s) to scatter, in case src is not specified 
 
 - Example: - >>> x = torch.rand(2, 5) >>> x tensor([[ 0.3992, 0.2908, 0.9044, 0.4850, 0.6004], [ 0.5735, 0.9006, 0.6797, 0.4152, 0.1732]]) >>> torch.zeros(3, 5).scatter_(0, torch.tensor([[0, 1, 2, 0, 0], [2, 0, 0, 1, 2]]), x) tensor([[ 0.3992, 0.9006, 0.6797, 0.4850, 0.6004], [ 0.0000, 0.2908, 0.0000, 0.4152, 0.0000], [ 0.5735, 0.0000, 0.9044, 0.0000, 0.1732]]) >>> z = torch.zeros(2, 4).scatter_(1, torch.tensor([[2], [3]]), 1.23) >>> z tensor([[ 0.0000, 0.0000, 1.2300, 0.0000], [ 0.0000, 0.0000, 0.0000, 1.2300]]) 
 - 
scatter(dim, index, source) → Tensor¶
- Out-of-place version of - torch.Tensor.scatter_()
 - 
scatter_add_(dim, index, other) → Tensor¶
- Adds all values from the tensor other into self at the indices specified in the index tensor in a similar fashion as scatter_(). For each value in other, it is added to an index in self which is specified by its index in other for dimension != dim and by the corresponding value in index for dimension = dim. - For a 3-D tensor, self is updated as: - self[index[i][j][k]][j][k] += other[i][j][k] # if dim == 0 self[i][index[i][j][k]][k] += other[i][j][k] # if dim == 1 self[i][j][index[i][j][k]] += other[i][j][k] # if dim == 2 - self, index and other should have the same number of dimensions. It is also required that index.size(d) <= other.size(d) for all dimensions d, and that index.size(d) <= self.size(d) for all dimensions d != dim. - Moreover, as for gather(), the values of index must be between 0 and self.size(dim) - 1 inclusive, and all values in a row along the specified dimension dim must be unique. - Note - When using the CUDA backend, this operation may induce nondeterministic behaviour that is not easily switched off. Please see the notes on randomness for background. - Parameters
 - Example: - >>> x = torch.rand(2, 5) >>> x tensor([[0.7404, 0.0427, 0.6480, 0.3806, 0.8328], [0.7953, 0.2009, 0.9154, 0.6782, 0.9620]]) >>> torch.ones(3, 5).scatter_add_(0, torch.tensor([[0, 1, 2, 0, 0], [2, 0, 0, 1, 2]]), x) tensor([[1.7404, 1.2009, 1.9154, 1.3806, 1.8328], [1.0000, 1.0427, 1.0000, 1.6782, 1.0000], [1.7953, 1.0000, 1.6480, 1.0000, 1.9620]]) 
 - 
scatter_add(dim, index, source) → Tensor¶
- Out-of-place version of - torch.Tensor.scatter_add_()
 - 
select(dim, index) → Tensor¶
- Slices the - selftensor along the selected dimension at the given index. This function returns a tensor with the given dimension removed.- Note - select()is equivalent to slicing. For example,- tensor.select(0, index)is equivalent to- tensor[index]and- tensor.select(2, index)is equivalent to- tensor[:,:,index].
 - 
set_(source=None, storage_offset=0, size=None, stride=None) → Tensor¶
- Sets the underlying storage, size, and strides. If - sourceis a tensor,- selftensor will share the same storage and have the same size and strides as- source. Changes to elements in one tensor will be reflected in the other.- If - sourceis a- Storage, the method sets the underlying storage, offset, size, and stride.
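A minimal sketch of the storage-sharing behavior (the tensor names here are illustrative):
>>> a = torch.tensor([1., 2., 3.])
>>> b = torch.empty(0)
>>> b.set_(a)          # b now shares a's storage, size, and strides
tensor([1., 2., 3.])
>>> b[0] = 10.
>>> a                  # the change is visible through a as well
tensor([10.,  2.,  3.])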
 - 
share_memory_()¶
- Moves the underlying storage to shared memory. - This is a no-op if the underlying storage is already in shared memory and for CUDA tensors. Tensors in shared memory cannot be resized. 
 - 
sigmoid() → Tensor¶
- See - torch.sigmoid()
 - 
sign() → Tensor¶
- See - torch.sign()
 - 
sin() → Tensor¶
- See - torch.sin()
 - 
sinh() → Tensor¶
- See - torch.sinh()
 - 
size() → torch.Size¶
- Returns the size of the - selftensor. The returned value is a subclass of- tuple.- Example: - >>> torch.empty(3, 4, 5).size() torch.Size([3, 4, 5]) 
 - 
slogdet() -> (Tensor, Tensor)¶
- See - torch.slogdet()
 - 
solve(A) -> (Tensor, Tensor)¶
- See - torch.solve()
 - 
sort(dim=-1, descending=False) -> (Tensor, LongTensor)¶
- See - torch.sort()
 - 
split(split_size, dim=0)¶
- See - torch.split()
 - 
sparse_mask(input, mask) → Tensor¶
- Returns a new SparseTensor whose values are taken from - input at the indices of - mask; the values of - mask themselves are ignored. - input and - mask must have the same shape. - Parameters
- input (Tensor) – an input Tensor 
- mask (SparseTensor) – a SparseTensor which we filter - inputbased on its indices
 
 - Example: - >>> nnz = 5 >>> dims = [5, 5, 2, 2] >>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)), torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz) >>> V = torch.randn(nnz, dims[2], dims[3]) >>> size = torch.Size(dims) >>> S = torch.sparse_coo_tensor(I, V, size).coalesce() >>> D = torch.randn(dims) >>> D.sparse_mask(S) tensor(indices=tensor([[0, 0, 0, 2], [0, 1, 4, 3]]), values=tensor([[[ 1.6550, 0.2397], [-0.1611, -0.0779]], [[ 0.2326, -1.0558], [ 1.4711, 1.9678]], [[-0.5138, -0.0411], [ 1.9417, 0.5158]], [[ 0.0793, 0.0036], [-0.2569, -0.1055]]]), size=(5, 5, 2, 2), nnz=4, layout=torch.sparse_coo) 
 - 
sqrt() → Tensor¶
- See - torch.sqrt()
 - 
squeeze(dim=None) → Tensor¶
- See - torch.squeeze()
 - 
std(dim=None, unbiased=True, keepdim=False) → Tensor¶
- See - torch.std()
 - 
storage() → torch.Storage¶
- Returns the underlying storage 
 - 
storage_offset() → int¶
- Returns - selftensor’s offset in the underlying storage in terms of number of storage elements (not bytes).- Example: - >>> x = torch.tensor([1, 2, 3, 4, 5]) >>> x.storage_offset() 0 >>> x[3:].storage_offset() 3 
 - 
storage_type()¶
 - 
stride(dim) → tuple or int¶
- Returns the stride of - selftensor.- Stride is the jump necessary to go from one element to the next one in the specified dimension - dim. A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension- dim.- Parameters
- dim (int, optional) – the desired dimension in which stride is required 
- Example: - >>> x = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) >>> x.stride() (5, 1) >>> x.stride(0) 5 >>> x.stride(-1) 1 
 - 
sub(value, other) → Tensor¶
- Subtracts a scalar or tensor from - selftensor. If both- valueand- otherare specified, each element of- otheris scaled by- valuebefore being used.- When - otheris a tensor, the shape of- othermust be broadcastable with the shape of the underlying tensor.
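A short illustration of both call forms (a sketch, assuming the value-scaling behavior described above):
>>> a = torch.tensor([4., 5., 6.])
>>> a.sub(1)                                # subtract a scalar
tensor([3., 4., 5.])
>>> a.sub(2, torch.tensor([1., 1., 1.]))    # computes a - 2 * other
tensor([2., 3., 4.])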
 - 
sum(dim=None, keepdim=False, dtype=None) → Tensor¶
- See - torch.sum()
 - 
svd(some=True, compute_uv=True) -> (Tensor, Tensor, Tensor)¶
- See - torch.svd()
 - 
symeig(eigenvectors=False, upper=True) -> (Tensor, Tensor)¶
- See - torch.symeig()
 - 
to(*args, **kwargs) → Tensor¶
- Performs Tensor dtype and/or device conversion. A - torch.dtypeand- torch.deviceare inferred from the arguments of- self.to(*args, **kwargs).- Note - If the - selfTensor already has the correct- torch.dtypeand- torch.device, then- selfis returned. Otherwise, the returned tensor is a copy of- selfwith the desired- torch.dtypeand- torch.device.- Here are the ways to call - to:- 
to(dtype, non_blocking=False, copy=False) → Tensor
- Returns a Tensor with the specified - dtype
 - 
to(device=None, dtype=None, non_blocking=False, copy=False) → Tensor
- Returns a Tensor with the specified - deviceand (optional)- dtype. If- dtypeis- Noneit is inferred to be- self.dtype. When- non_blocking, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor. When- copyis set, a new Tensor is created even when the Tensor already matches the desired conversion.
 - 
to(other, non_blocking=False, copy=False) → Tensor
- Returns a Tensor with same - torch.dtypeand- torch.deviceas the Tensor- other. When- non_blocking, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor. When- copyis set, a new Tensor is created even when the Tensor already matches the desired conversion.
 - Example: - >>> tensor = torch.randn(2, 2) # Initially dtype=float32, device=cpu >>> tensor.to(torch.float64) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], dtype=torch.float64) >>> cuda0 = torch.device('cuda:0') >>> tensor.to(cuda0) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], device='cuda:0') >>> tensor.to(cuda0, dtype=torch.float64) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0') >>> other = torch.randn((), dtype=torch.float64, device=cuda0) >>> tensor.to(other, non_blocking=True) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0') 
- 
 - 
take(indices) → Tensor¶
- See - torch.take()
 - 
tan() → Tensor¶
- See - torch.tan()
 - 
tanh() → Tensor¶
- See - torch.tanh()
 - 
tolist() → list or number¶
- Returns the tensor as a (nested) list. For scalars, a standard Python number is returned, just like with - item(). Tensors are automatically moved to the CPU first if necessary. - This operation is not differentiable. - Examples: - >>> a = torch.randn(2, 2) >>> a.tolist() [[0.012766935862600803, 0.5415473580360413], [-0.08909505605697632, 0.7729271650314331]] >>> a[0,0].tolist() 0.012766935862600803 
 - 
topk(k, dim=None, largest=True, sorted=True) -> (Tensor, LongTensor)¶
- See - torch.topk()
 - 
to_sparse(sparseDims) → Tensor¶
- Returns a sparse copy of the tensor. PyTorch supports sparse tensors in coordinate format. - Parameters
- sparseDims (int, optional) – the number of sparse dimensions to include in the new sparse tensor 
 - Example: - >>> d = torch.tensor([[0, 0, 0], [9, 0, 10], [0, 0, 0]]) >>> d tensor([[ 0, 0, 0], [ 9, 0, 10], [ 0, 0, 0]]) >>> d.to_sparse() tensor(indices=tensor([[1, 1], [0, 2]]), values=tensor([ 9, 10]), size=(3, 3), nnz=2, layout=torch.sparse_coo) >>> d.to_sparse(1) tensor(indices=tensor([[1]]), values=tensor([[ 9, 0, 10]]), size=(3, 3), nnz=1, layout=torch.sparse_coo) 
 - 
trace() → Tensor¶
- See - torch.trace()
 - 
transpose(dim0, dim1) → Tensor¶
- See - torch.transpose()
 - 
transpose_(dim0, dim1) → Tensor¶
- In-place version of - transpose()
 - 
tril(k=0) → Tensor¶
- See - torch.tril()
 - 
triu(k=0) → Tensor¶
- See - torch.triu()
 - 
trtrs(A, upper=True, transpose=False, unitriangular=False) -> (Tensor, Tensor)¶
- See - torch.trtrs()
 - 
trunc() → Tensor¶
- See - torch.trunc()
 - 
type(dtype=None, non_blocking=False, **kwargs) → str or Tensor¶
- Returns the type if dtype is not provided, else casts this object to the specified type. - If this is already of the correct type, no copy is performed and the original object is returned. - Parameters
- dtype (type or string) – The desired type 
- non_blocking (bool) – If - True, and the source is in pinned memory and destination is on the GPU or vice versa, the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect.
- **kwargs – For compatibility, may contain the key - asyncin place of the- non_blockingargument. The- asyncarg is deprecated.
 
 
 - 
type_as(tensor) → Tensor¶
- Returns this tensor cast to the type of the given tensor. - This is a no-op if the tensor is already of the correct type. This is equivalent to - self.type(tensor.type())- Parameters
- tensor (Tensor) – the tensor which has the desired type 
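A quick illustration of the cast:
>>> a = torch.tensor([1, 2, 3])      # torch.int64
>>> b = torch.tensor([1.5, 2.5])     # torch.float32
>>> a.type_as(b)
tensor([1., 2., 3.])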
 
 - 
unfold(dim, size, step) → Tensor¶
- Returns a tensor which contains all slices of size - size from the - self tensor in the dimension - dim. - The step between two slices is given by - step. - If \(\text{size}_{dim}\) is the size of dimension - dim for - self, the size of dimension - dim in the returned tensor will be \((\text{size}_{dim} - \text{size}) / \text{step} + 1\). - An additional dimension of size - size is appended in the returned tensor. - Parameters
 - Example: - >>> x = torch.arange(1., 8) >>> x tensor([ 1., 2., 3., 4., 5., 6., 7.]) >>> x.unfold(0, 2, 1) tensor([[ 1., 2.], [ 2., 3.], [ 3., 4.], [ 4., 5.], [ 5., 6.], [ 6., 7.]]) >>> x.unfold(0, 2, 2) tensor([[ 1., 2.], [ 3., 4.], [ 5., 6.]]) 
 - 
uniform_(from=0, to=1) → Tensor¶
- Fills - selftensor with numbers sampled from the continuous uniform distribution:\[P(x) = \dfrac{1}{\text{to} - \text{from}} \]
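Example (the values drawn are random; those shown are illustrative only):
>>> torch.empty(3).uniform_(0, 10)
tensor([2.1188, 7.0313, 0.5482])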
 - 
unique(sorted=True, return_inverse=False, dim=None)¶
- Returns the unique scalar elements of the tensor as a 1-D tensor. - See - torch.unique()
 - 
unsqueeze(dim) → Tensor¶
- See - torch.unsqueeze()
 - 
unsqueeze_(dim) → Tensor¶
- In-place version of - unsqueeze()
 - 
var(dim=None, unbiased=True, keepdim=False) → Tensor¶
- See - torch.var()
 - 
view(*shape) → Tensor¶
- Returns a new tensor with the same data as the - selftensor but of a different- shape.- The returned tensor shares the same data and must have the same number of elements, but may have a different size. For a tensor to be viewed, the new view size must be compatible with its original size and stride, i.e., each new view dimension must either be a subspace of an original dimension, or only span across original dimensions \(d, d+1, \dots, d+k\) that satisfy the following contiguity-like condition that \(\forall i = 0, \dots, k-1\), \[\text{stride}[i] = \text{stride}[i+1] \times \text{size}[i+1]\]- Otherwise, - contiguous()needs to be called before the tensor can be viewed. See also:- reshape(), which returns a view if the shapes are compatible, and copies (equivalent to calling- contiguous()) otherwise.- Parameters
- shape (torch.Size or int...) – the desired size 
 - Example: - >>> x = torch.randn(4, 4) >>> x.size() torch.Size([4, 4]) >>> y = x.view(16) >>> y.size() torch.Size([16]) >>> z = x.view(-1, 8) # the size -1 is inferred from other dimensions >>> z.size() torch.Size([2, 8]) >>> a = torch.randn(1, 2, 3, 4) >>> a.size() torch.Size([1, 2, 3, 4]) >>> b = a.transpose(1, 2) # Swaps 2nd and 3rd dimension >>> b.size() torch.Size([1, 3, 2, 4]) >>> c = a.view(1, 3, 2, 4) # Does not change tensor layout in memory >>> c.size() torch.Size([1, 3, 2, 4]) >>> torch.equal(b, c) False 
 - 
view_as(other) → Tensor¶
- View this tensor as the same size as - other.- self.view_as(other)is equivalent to- self.view(other.size()).- Please see - view()for more information about- view.- Parameters
- other ( - torch.Tensor) – The result tensor has the same size as- other.
 
 - 
zero_() → Tensor¶
- Fills - selftensor with zeros.
 
- 
class torch.ByteTensor¶
- The following methods are unique to - torch.ByteTensor.- 
all()¶
- 
all() → bool
 - Returns True if all elements in the tensor are non-zero, False otherwise. - Example: - >>> a = torch.randn(1, 3).byte() % 2 >>> a tensor([[1, 0, 0]], dtype=torch.uint8) >>> a.all() tensor(0, dtype=torch.uint8) - 
all(dim, keepdim=False, out=None) → Tensor
 - Returns True if all elements in each row of the tensor in the given dimension - dimare non-zero, False otherwise.- If - keepdimis- True, the output tensor is of the same size as- inputexcept in the dimension- dimwhere it is of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensor having 1 fewer dimension than- input.- Parameters
 - Example: - >>> a = torch.randn(4, 2).byte() % 2 >>> a tensor([[0, 0], [0, 0], [0, 1], [1, 1]], dtype=torch.uint8) >>> a.all(dim=1) tensor([0, 0, 0, 1], dtype=torch.uint8) 
- 
 - 
any()¶
- 
any() → bool
 - Returns True if any elements in the tensor are non-zero, False otherwise. - Example: - >>> a = torch.randn(1, 3).byte() % 2 >>> a tensor([[0, 0, 1]], dtype=torch.uint8) >>> a.any() tensor(1, dtype=torch.uint8) - 
any(dim, keepdim=False, out=None) → Tensor
 - Returns True if any elements in each row of the tensor in the given dimension - dimare non-zero, False otherwise.- If - keepdimis- True, the output tensor is of the same size as- inputexcept in the dimension- dimwhere it is of size 1. Otherwise,- dimis squeezed (see- torch.squeeze()), resulting in the output tensor having 1 fewer dimension than- input.- Parameters
 - Example: - >>> a = torch.randn(4, 2).byte() % 2 >>> a tensor([[1, 0], [0, 0], [0, 1], [0, 0]], dtype=torch.uint8) >>> a.any(dim=1) tensor([1, 0, 1, 0], dtype=torch.uint8) 
- 
 
- 
Tensor Attributes¶
Each torch.Tensor has a torch.dtype, torch.device, and torch.layout.
torch.dtype¶
- 
class torch.dtype¶
A torch.dtype is an object that represents the data type of a
torch.Tensor. PyTorch has eight different data types:
| Data type | dtype | Tensor types | 
|---|---|---|
| 32-bit floating point | torch.float32 or torch.float | torch.*.FloatTensor | 
| 64-bit floating point | torch.float64 or torch.double | torch.*.DoubleTensor | 
| 16-bit floating point | torch.float16 or torch.half | torch.*.HalfTensor | 
| 8-bit integer (unsigned) | torch.uint8 | torch.*.ByteTensor | 
| 8-bit integer (signed) | torch.int8 | torch.*.CharTensor | 
| 16-bit integer (signed) | torch.int16 or torch.short | torch.*.ShortTensor | 
| 32-bit integer (signed) | torch.int32 or torch.int | torch.*.IntTensor | 
| 64-bit integer (signed) | torch.int64 or torch.long | torch.*.LongTensor | 
To find out if a torch.dtype is a floating point data type, use the property is_floating_point, which returns True for floating point data types.
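For example:
>>> torch.float32.is_floating_point
True
>>> torch.int64.is_floating_point
False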
torch.device¶
- 
class torch.device¶
A torch.device is an object representing the device on which a torch.Tensor is
or will be allocated.
The torch.device contains a device type ('cpu' or 'cuda') and an optional device ordinal for the
device type.  If the device ordinal is not present, this represents the current device for the device type;
e.g. a torch.Tensor constructed with device 'cuda' is equivalent to 'cuda:X' where X is the result of
torch.cuda.current_device().
A torch.Tensor’s device can be accessed via the Tensor.device property.
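For example:
>>> torch.zeros(2).device
device(type='cpu')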
A torch.device can be constructed via a string, or via a string and a device ordinal.
Via a string:
>>> torch.device('cuda:0')
device(type='cuda', index=0)
>>> torch.device('cpu')
device(type='cpu')
>>> torch.device('cuda')  # current cuda device
device(type='cuda')
Via a string and device ordinal:
>>> torch.device('cuda', 0)
device(type='cuda', index=0)
>>> torch.device('cpu', 0)
device(type='cpu', index=0)
Note
The torch.device argument in functions can generally be substituted with a string.
This allows for fast prototyping of code.
>>> # Example of a function that takes in a torch.device
>>> cuda1 = torch.device('cuda:1')
>>> torch.randn((2,3), device=cuda1)
>>> # You can substitute the torch.device with a string
>>> torch.randn((2,3), device='cuda:1')
Note
For legacy reasons, a device can be constructed via a single device ordinal, which is treated
as a cuda device.  This matches Tensor.get_device(), which returns an ordinal for cuda
tensors and is not supported for cpu tensors.
>>> torch.device(1)
device(type='cuda', index=1)
Note
Methods which take a device will generally accept a (properly formatted) string or (legacy) integer device ordinal, i.e. the following are all equivalent:
>>> torch.randn((2,3), device=torch.device('cuda:1'))
>>> torch.randn((2,3), device='cuda:1')
>>> torch.randn((2,3), device=1)  # legacy
torch.layout¶
- 
class torch.layout¶
A torch.layout is an object that represents the memory layout of a
torch.Tensor. Currently, we support torch.strided (dense Tensors)
and have experimental support for torch.sparse_coo (sparse COO Tensors).
torch.strided represents dense Tensors and is the memory layout that
is most commonly used. Each strided tensor has an associated
torch.Storage, which holds its data. These tensors provide a
multi-dimensional, strided
view of a storage. Strides are a list of integers: the k-th stride
represents the jump in the memory necessary to go from one element to the
next one in the k-th dimension of the Tensor. This concept makes it possible
to perform many tensor operations efficiently.
Example:
>>> x = torch.Tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
>>> x.stride()
(5, 1)
>>> x.t().stride()
(1, 5)
For more information on torch.sparse_coo tensors, see torch.sparse.
Type Info¶
The numerical properties of a torch.dtype can be accessed through either the torch.finfo or the torch.iinfo.
torch.finfo¶
- 
class torch.finfo¶
A torch.finfo is an object that represents the numerical properties of a floating point
torch.dtype (i.e., torch.float32, torch.float64, and torch.float16). This is similar to numpy.finfo.
A torch.finfo provides the following attributes:
| Name | Type | Description | 
|---|---|---|
| bits | int | The number of bits occupied by the type. | 
| eps | float | The smallest representable number such that 1.0 + eps != 1.0. | 
| max | float | The largest representable number. | 
| min | float | The smallest representable number (typically -max). | 
| tiny | float | The smallest positive representable number. | 
Note
The constructor of torch.finfo can be called without argument, in which case the class is created for the PyTorch default dtype (as returned by torch.get_default_dtype()).
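For example (the values shown are those of IEEE single precision):
>>> torch.finfo(torch.float32).eps
1.1920928955078125e-07
>>> torch.finfo(torch.float32).bits
32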
torch.iinfo¶
- 
class torch.iinfo¶
A torch.iinfo is an object that represents the numerical properties of an integer
torch.dtype (i.e., torch.uint8, torch.int8, torch.int16, torch.int32, and torch.int64). This is similar to numpy.iinfo.
A torch.iinfo provides the following attributes:
| Name | Type | Description | 
|---|---|---|
| bits | int | The number of bits occupied by the type. | 
| max | int | The largest representable number. | 
| min | int | The smallest representable number. | 
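For example:
>>> torch.iinfo(torch.int8).min, torch.iinfo(torch.int8).max
(-128, 127)
>>> torch.iinfo(torch.int32).bits
32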
torch.sparse¶
Warning
This API is currently experimental and may change in the near future.
Torch supports sparse tensors in COO(rdinate) format, which can efficiently store and process tensors for which the majority of elements are zeros.
A sparse tensor is represented as a pair of dense tensors: a tensor of values and a 2D tensor of indices. A sparse tensor can be constructed by providing these two tensors, as well as the size of the sparse tensor (which cannot be inferred from these tensors!). Suppose we want to define a sparse tensor with the entry 3 at location (0, 2), entry 4 at location (1, 0), and entry 5 at location (1, 2). We would then write:
>>> i = torch.LongTensor([[0, 1, 1],
                          [2, 0, 2]])
>>> v = torch.FloatTensor([3, 4, 5])
>>> torch.sparse.FloatTensor(i, v, torch.Size([2,3])).to_dense()
 0  0  3
 4  0  5
[torch.FloatTensor of size 2x3]
Note that the input to LongTensor is NOT a list of index tuples. If you want to write your indices this way, you should transpose before passing them to the sparse constructor:
>>> i = torch.LongTensor([[0, 2], [1, 0], [1, 2]])
>>> v = torch.FloatTensor([3,      4,      5    ])
>>> torch.sparse.FloatTensor(i.t(), v, torch.Size([2,3])).to_dense()
 0  0  3
 4  0  5
[torch.FloatTensor of size 2x3]
You can also construct hybrid sparse tensors, where only the first n dimensions are sparse, and the rest of the dimensions are dense.
>>> i = torch.LongTensor([[2, 4]])
>>> v = torch.FloatTensor([[1, 3], [5, 7]])
>>> torch.sparse.FloatTensor(i, v).to_dense()
 0  0
 0  0
 1  3
 0  0
 5  7
[torch.FloatTensor of size 5x2]
An empty sparse tensor can be constructed by specifying its size:
>>> torch.sparse.FloatTensor(2, 3)
SparseFloatTensor of size 2x3 with indices:
[torch.LongTensor with no dimension]
and values:
[torch.FloatTensor with no dimension]
- SparseTensor has the following invariants:
- sparse_dim + dense_dim = len(SparseTensor.shape) 
- SparseTensor._indices().shape = (sparse_dim, nnz) 
- SparseTensor._values().shape = (nnz, SparseTensor.shape[sparse_dim:]) 
 
Since SparseTensor._indices() is always a 2D tensor, the smallest possible sparse_dim is 1. Therefore, the representation of a SparseTensor with sparse_dim = 0 is simply a dense tensor.
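These invariants can be checked on the hybrid tensor constructed above (a sketch using the internal accessors):
>>> i = torch.LongTensor([[2, 4]])
>>> v = torch.FloatTensor([[1, 3], [5, 7]])
>>> s = torch.sparse.FloatTensor(i, v)
>>> s._indices().shape    # (sparse_dim, nnz) = (1, 2)
torch.Size([1, 2])
>>> s._values().shape     # (nnz,) + dense dims = (2, 2)
torch.Size([2, 2])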
Note
Our sparse tensor format permits uncoalesced sparse tensors, where there may be duplicate coordinates in the indices; in this case, the interpretation is that the value at that index is the sum of all duplicate value entries. Uncoalesced tensors permit us to implement certain operators more efficiently.
For the most part, you shouldn’t have to care whether a sparse tensor is coalesced or not, as most operations will work identically given a coalesced or uncoalesced sparse tensor. However, there are two cases in which you may need to care.
First, if you repeatedly perform an operation that can produce
duplicate entries (e.g., torch.sparse.FloatTensor.add()), you
should occasionally coalesce your sparse tensors to prevent
them from growing too large.
Second, some operators will produce different values depending on
whether or not they are coalesced (e.g.,
torch.sparse.FloatTensor._values() and
torch.sparse.FloatTensor._indices(), as well as
torch.Tensor.sparse_mask()).  These operators are
prefixed by an underscore to indicate that they reveal internal
implementation details and should be used with care, since code
that works with coalesced sparse tensors may not work with
uncoalesced sparse tensors; generally speaking, it is safest
to explicitly coalesce before working with these operators.
For example, suppose that we wanted to implement an operator
by operating directly on torch.sparse.FloatTensor._values().
Multiplication by a scalar can be implemented in the obvious way,
as multiplication distributes over addition; however, square root
cannot be implemented directly, since sqrt(a + b) != sqrt(a) +
sqrt(b) (which is what would be computed if you were given an
uncoalesced tensor.)
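A minimal sketch of coalescing a tensor with duplicate coordinates (entries at the same index are summed):
>>> i = torch.LongTensor([[0, 0], [2, 2]])   # the coordinate (0, 2) appears twice
>>> v = torch.FloatTensor([3, 4])
>>> s = torch.sparse.FloatTensor(i, v, torch.Size([2, 3]))
>>> s.is_coalesced()
False
>>> s.coalesce()._values()                   # duplicates summed: 3 + 4
tensor([7.])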
- 
class torch.sparse.FloatTensor¶
- 
add()¶
 - 
add_()¶
 - 
clone()¶
 - 
dim()¶
 - 
div()¶
 - 
div_()¶
 - 
get_device()¶
 - 
hspmm()¶
 - 
mm()¶
 - 
mul()¶
 - 
mul_()¶
 - 
narrow_copy()¶
 - 
resizeAs_()¶
 - 
size()¶
 - 
spadd()¶
 - 
spmm()¶
 - 
sspaddmm()¶
 - 
sspmm()¶
 - 
sub()¶
 - 
sub_()¶
 - 
t_()¶
 - 
toDense()¶
 - 
transpose()¶
 - 
transpose_()¶
 - 
zero_()¶
 - 
coalesce()¶
 - 
is_coalesced()¶
 - 
_indices()¶
 - 
_values()¶
 - 
_nnz()¶
 
- 
Functions¶
- 
torch.sparse.addmm(mat, mat1, mat2, beta=1, alpha=1)¶
- This function does the exact same thing as - torch.addmm() in the forward, except that it supports backward for sparse matrix - mat1. - mat1 needs to have sparse_dim = 2. Note that the gradient of - mat1 is a coalesced sparse tensor.
- 
torch.sparse.mm(mat1, mat2)¶
- Performs a matrix multiplication of the sparse matrix - mat1 and the dense matrix - mat2. Similar to - torch.mm(), if - mat1 is a \((n \times m)\) tensor and - mat2 is a \((m \times p)\) tensor, the output will be a \((n \times p)\) dense tensor. - mat1 needs to have sparse_dim = 2. This function also supports backward for both matrices. Note that the gradient of - mat1 is a coalesced sparse tensor. - Parameters
- mat1 (SparseTensor) – the first sparse matrix to be multiplied 
- mat2 (Tensor) – the second dense matrix to be multiplied 
 
 - Example: - >>> a = torch.randn(2, 3).to_sparse().requires_grad_(True) >>> a tensor(indices=tensor([[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]), values=tensor([ 1.5901, 0.0183, -0.6146, 1.8061, -0.0112, 0.6302]), size=(2, 3), nnz=6, layout=torch.sparse_coo, requires_grad=True) >>> b = torch.randn(3, 2, requires_grad=True) >>> b tensor([[-0.6479, 0.7874], [-1.2056, 0.5641], [-1.1716, -0.9923]], requires_grad=True) >>> y = torch.sparse.mm(a, b) >>> y tensor([[-0.3323, 1.8723], [-1.8951, 0.7904]], grad_fn=<SparseAddmmBackward>) >>> y.sum().backward() >>> a.grad tensor(indices=tensor([[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]]), values=tensor([ 0.1394, -0.6415, -2.1639, 0.1394, -0.6415, -2.1639]), size=(2, 3), nnz=6, layout=torch.sparse_coo) 
- 
torch.sparse.sum(input, dim=None, dtype=None)¶
- Returns the sum of each row of the SparseTensor - input in the given dimensions - dim. If - dim is a list of dimensions, reduce over all of them. When summing over all of - sparse_dim, this method returns a dense Tensor instead of a SparseTensor. - All summed - dim are squeezed (see - torch.squeeze()), resulting in an output tensor having - dim fewer dimensions than - input. - During backward, only gradients at the - nnz locations of - input will propagate back. Note that the gradient of - input is coalesced. - Parameters
 - Example: - >>> nnz = 3 >>> dims = [5, 5, 2, 3] >>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)), torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz) >>> V = torch.randn(nnz, dims[2], dims[3]) >>> size = torch.Size(dims) >>> S = torch.sparse_coo_tensor(I, V, size) >>> S tensor(indices=tensor([[2, 0, 3], [2, 4, 1]]), values=tensor([[[-0.6438, -1.6467, 1.4004], [ 0.3411, 0.0918, -0.2312]], [[ 0.5348, 0.0634, -2.0494], [-0.7125, -1.0646, 2.1844]], [[ 0.1276, 0.1874, -0.6334], [-1.9682, -0.5340, 0.7483]]]), size=(5, 5, 2, 3), nnz=3, layout=torch.sparse_coo) # when sum over only part of sparse_dims, return a SparseTensor >>> torch.sparse.sum(S, [1, 3]) tensor(indices=tensor([[0, 2, 3]]), values=tensor([[-1.4512, 0.4073], [-0.8901, 0.2017], [-0.3183, -1.7539]]), size=(5, 2), nnz=3, layout=torch.sparse_coo) # when sum over all sparse dim, return a dense Tensor # with summed dims squeezed >>> torch.sparse.sum(S, [0, 1, 3]) tensor([-2.6596, -1.1450]) 
torch.cuda¶
This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but utilize GPUs for computation.
It is lazily initialized, so you can always import it, and use
is_available() to determine if your system supports CUDA.
cuda-semantics has more details about working with CUDA.
- 
torch.cuda.current_blas_handle()¶
- Returns a cublasHandle_t pointer to the current cuBLAS handle. 
- 
torch.cuda.current_device()¶
- Returns the index of the currently selected device. 
- 
torch.cuda.current_stream(device=None)¶
- Returns the currently selected - Streamfor a given device.- Parameters
- device (torch.device or int, optional) – selected device. Returns the currently selected - Streamfor the current device, given by- current_device(), if- deviceis- None(default).
 
- 
torch.cuda.default_stream(device=None)¶
- Returns the default - Streamfor a given device.- Parameters
- device (torch.device or int, optional) – selected device. Returns the default - Streamfor the current device, given by- current_device(), if- deviceis- None(default).
 
- 
class torch.cuda.device(device)¶
- Context-manager that changes the selected device. - Parameters
- device (torch.device or int) – device index to select. It’s a no-op if this argument is a negative integer or - None.
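A sketch of typical usage (assumes at least two CUDA devices are present):
>>> with torch.cuda.device(1):
...     x = torch.randn(2, 3, device='cuda')   # 'cuda' resolves to the selected device
...
>>> x.device
device(type='cuda', index=1)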
 
- 
torch.cuda.device_count()¶
- Returns the number of GPUs available. 
- 
class torch.cuda.device_of(obj)¶
- Context-manager that changes the current device to that of the given object. - You can use both tensors and storages as arguments. If a given object is not allocated on a GPU, this is a no-op. - Parameters
- obj (Tensor or Storage) – object allocated on the selected device. 
 
- 
torch.cuda.empty_cache()¶
- Releases all unoccupied cached memory currently held by the caching allocator so that it can be used by other GPU applications and is visible in nvidia-smi. - Note - empty_cache() doesn’t increase the amount of GPU memory available for PyTorch. See cuda-memory-management for more details about GPU memory management.
- 
torch.cuda.get_device_capability(device=None)¶
- Gets the cuda capability of a device. - Parameters
- device (torch.device or int, optional) – device for which to return the device capability. This function is a no-op if this argument is a negative integer. Uses the current device, given by - current_device(), if- deviceis- None(default).
- Returns
- the major and minor cuda capability of the device 
- Return type
 
- 
torch.cuda.get_device_name(device=None)¶
- Gets the name of a device. - Parameters
- device (torch.device or int, optional) – device for which to return the name. This function is a no-op if this argument is a negative integer. Uses the current device, given by - current_device(), if- deviceis- None(default).
 
- 
torch.cuda.init()¶
- Initialize PyTorch’s CUDA state. You may need to call this explicitly if you are interacting with PyTorch via its C API, as Python bindings for CUDA functionality will not be available until this initialization takes place. Ordinary users should not need this, as all of PyTorch’s CUDA methods automatically initialize CUDA state on demand. - Does nothing if the CUDA state is already initialized. 
- 
torch.cuda.is_available()¶
- Returns a bool indicating if CUDA is currently available. 
- 
torch.cuda.max_memory_allocated(device=None)¶
- Returns the maximum GPU memory occupied by tensors in bytes for a given device. - By default, this returns the peak allocated memory since the beginning of this program. - reset_max_memory_allocated()can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak allocated memory usage of each iteration in a training loop.- Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
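A sketch of the per-iteration measurement mentioned above (num_steps and train_one_step are hypothetical placeholders):
>>> for step in range(num_steps):
...     torch.cuda.reset_max_memory_allocated()
...     train_one_step()                            # hypothetical training step
...     peak = torch.cuda.max_memory_allocated()    # peak bytes for this iteration
...     print(step, peak)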
- 
torch.cuda.max_memory_cached(device=None)¶
- Returns the maximum GPU memory managed by the caching allocator in bytes for a given device. - By default, this returns the peak cached memory since the beginning of this program. - reset_max_memory_cached()can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak cached memory amount of each iteration in a training loop.- Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.memory_allocated(device=None)¶
- Returns the current GPU memory occupied by tensors in bytes for a given device. - Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - This is likely less than the amount shown in nvidia-smi since some unused memory can be held by the caching allocator and some context needs to be created on GPU. See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.memory_cached(device=None)¶
- Returns the current GPU memory managed by the caching allocator in bytes for a given device. - Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.reset_max_memory_allocated(device=None)¶
- Resets the starting point in tracking maximum GPU memory occupied by tensors for a given device. - See - max_memory_allocated()for details.- Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.reset_max_memory_cached(device=None)¶
- Resets the starting point in tracking maximum GPU memory managed by the caching allocator for a given device. - See - max_memory_cached()for details.- Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.set_device(device)¶
- Sets the current device. - Usage of this function is discouraged in favor of - device. In most cases it’s better to use the - CUDA_VISIBLE_DEVICES environment variable. - Parameters
- device (torch.device or int) – selected device. This function is a no-op if this argument is negative. 
 
- 
torch.cuda.stream(stream)¶
- Context-manager that selects a given stream. - All CUDA kernels queued within its context will be enqueued on a selected stream. - Parameters
- stream (Stream) – selected stream. This manager is a no-op if it’s - None.
 - Note - Streams are per-device. If the selected stream is not on the current device, this function will also change the current device to match the stream. 
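A minimal sketch (assumes CUDA is available):
>>> s = torch.cuda.Stream()                       # new stream on the current device
>>> with torch.cuda.stream(s):
...     y = torch.ones(5, device='cuda') * 2      # kernel enqueued on s
...
>>> torch.cuda.current_stream().wait_stream(s)    # make the default stream wait for s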
- 
torch.cuda.synchronize()¶
- Waits for all kernels in all streams on the current device to complete. 
Random Number Generator¶
- 
torch.cuda.get_rng_state(device=device(type='cuda'))¶
- Returns the random number generator state of the current GPU as a ByteTensor. - Parameters
- device (torch.device or int, optional) – The device to return the RNG state of. Default: - torch.device('cuda')(i.e., the current CUDA device).
 - Warning - This function eagerly initializes CUDA. 
- 
torch.cuda.get_rng_state_all()¶
- Returns a tuple of ByteTensor representing the random number states of all devices. 
- 
torch.cuda.set_rng_state(new_state, device=device(type='cuda'))¶
- Sets the random number generator state of the current GPU. - Parameters
- new_state (torch.ByteTensor) – The desired state 
- device (torch.device or int, optional) – The device to set the RNG state. Default: - torch.device('cuda')(i.e., the current CUDA device).
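A sketch of snapshotting and restoring the RNG state to reproduce a draw (assumes CUDA is available):
>>> state = torch.cuda.get_rng_state()
>>> a = torch.randn(3, device='cuda')
>>> torch.cuda.set_rng_state(state)    # rewind the generator
>>> b = torch.randn(3, device='cuda')
>>> torch.equal(a, b)
True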
 
 
- 
torch.cuda.set_rng_state_all(new_states)¶
- Sets the random number generator state of all devices. - Parameters
- new_state (tuple of torch.ByteTensor) – The desired state for each device 
 
- 
torch.cuda.manual_seed(seed)¶
- Sets the seed for generating random numbers for the current GPU. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. - Parameters
- seed (int) – The desired seed. 
 - Warning - If you are working with a multi-GPU model, this function is insufficient to get determinism. To seed all GPUs, use - manual_seed_all().
- 
torch.cuda.manual_seed_all(seed)¶
- Sets the seed for generating random numbers on all GPUs. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. - Parameters
- seed (int) – The desired seed. 
 
- 
torch.cuda.seed()¶
- Sets the seed for generating random numbers to a random number for the current GPU. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. - Warning - If you are working with a multi-GPU model, this function will only initialize the seed on one GPU. To initialize all GPUs, use - seed_all().
- 
torch.cuda.seed_all()¶
- Sets the seed for generating random numbers to a random number on all GPUs. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. 
- 
torch.cuda.initial_seed()¶
- Returns the current random seed of the current GPU. - Warning - This function eagerly initializes CUDA. 
Communication collectives¶
- 
torch.cuda.comm.broadcast(tensor, devices)¶
- Broadcasts a tensor to a number of GPUs. - Parameters
- tensor (Tensor) – tensor to broadcast. 
- devices (Iterable) – an iterable of devices among which to broadcast. Note that it should be like (src, dst1, dst2, …), the first element of which is the source device to broadcast from. 
 
- Returns
- A tuple containing copies of the - tensor, placed on devices corresponding to indices from- devices.
 
- 
torch.cuda.comm.broadcast_coalesced(tensors, devices, buffer_size=10485760)¶
- Broadcasts a sequence of tensors to the specified GPUs. Small tensors are first coalesced into a buffer to reduce the number of synchronizations. - Parameters
- tensors (sequence) – tensors to broadcast. 
- devices (Iterable) – an iterable of devices among which to broadcast. Note that it should be like (src, dst1, dst2, …), the first element of which is the source device to broadcast from. 
- buffer_size (int) – maximum size of the buffer used for coalescing 
 
- Returns
- A tuple containing copies of the - tensor, placed on devices corresponding to indices from- devices.
 
- 
torch.cuda.comm.reduce_add(inputs, destination=None)¶
- Sums tensors from multiple GPUs. - All inputs should have matching shapes. 
- 
torch.cuda.comm.scatter(tensor, devices, chunk_sizes=None, dim=0, streams=None)¶
- Scatters tensor across multiple GPUs. - Parameters
- tensor (Tensor) – tensor to scatter. 
- devices (Iterable[int]) – iterable of ints, specifying among which devices the tensor should be scattered. 
- chunk_sizes (Iterable[int], optional) – sizes of chunks to be placed on each device. It should match - devicesin length and sum to- tensor.size(dim). If not specified, the tensor will be divided into equal chunks.
- dim (int, optional) – A dimension along which to chunk the tensor. 
 
- Returns
- A tuple containing chunks of the - tensor, spread across given- devices.
 
- 
torch.cuda.comm.gather(tensors, dim=0, destination=None)¶
- Gathers tensors from multiple GPUs. - Tensor sizes in all dimensions other than - dim have to match. - Parameters
- Returns
- A tensor located on - destinationdevice, that is a result of concatenating- tensorsalong- dim.
 
Streams and events¶
- 
class torch.cuda.Stream¶
- Wrapper around a CUDA stream. - A CUDA stream is a linear sequence of execution that belongs to a specific device, independent from other streams. See cuda-semantics for details. - Parameters
- device (torch.device or int, optional) – a device on which to allocate the stream. If - deviceis- None(default) or a negative integer, this will use the current device.
- priority (int, optional) – priority of the stream. Lower numbers represent higher priorities. 
 
 - 
query()¶
- Checks if all the work submitted has been completed. - Returns
- A boolean indicating if all kernels in this stream are completed. 
 
 - 
record_event(event=None)¶
- Records an event. - Parameters
- event (Event, optional) – event to record. If not given, a new one will be allocated. 
- Returns
- Recorded event. 
 
 - 
synchronize()¶
- Waits for all the kernels in this stream to complete. - Note - This is a wrapper around - cudaStreamSynchronize(): see the CUDA documentation for more info.
 - 
wait_event(event)¶
- Makes all future work submitted to the stream wait for an event. - Parameters
- event (Event) – an event to wait for. 
- Note - This is a wrapper around - cudaStreamWaitEvent(): see the CUDA documentation for more info. - This function returns without waiting for - event: only future operations are affected.
 - 
wait_stream(stream)¶
- Synchronizes with another stream. - All future work submitted to this stream will wait until all kernels submitted to a given stream at the time of call complete. - Parameters
- stream (Stream) – a stream to synchronize. 
 - Note - This function returns without waiting for currently enqueued kernels in - stream: only future operations are affected.
 
- 
class torch.cuda.Event¶
- Wrapper around a CUDA event. - CUDA events are synchronization markers that can be used to monitor the device’s progress, to accurately measure timing, and to synchronize CUDA streams. - The underlying CUDA events are lazily initialized when the event is first recorded or exported to another process. After creation, only streams on the same device may record the event. However, streams on any device can wait on the event. - Parameters
- enable_timing (bool, optional) – indicates if the event should measure time (default: False) 
- blocking (bool, optional) – if True, wait() will be blocking (default: False) 
- interprocess (bool) – if True, the event can be shared between processes (default: False) 
 - 
elapsed_time(end_event)¶
- Returns the time elapsed in milliseconds after the event was recorded and before the end_event was recorded. 
 - 
classmethod from_ipc_handle(device, handle)¶
- Reconstruct an event from an IPC handle on the given device. 
 - 
ipc_handle()¶
- Returns an IPC handle of this event. If not recorded yet, the event will use the current device. 
 - 
query()¶
- Checks if all work currently captured by event has completed. - Returns
- A boolean indicating if all work currently captured by event has completed. 
 
 - 
record(stream=None)¶
- Records the event in a given stream. - Uses - torch.cuda.current_stream()if no stream is specified. The stream’s device must match the event’s device.
 - 
synchronize()¶
- Waits for the event to complete. - Waits until the completion of all work currently captured in this event. This prevents the CPU thread from proceeding until the event completes. - Note - This is a wrapper around - cudaEventSynchronize(): see the CUDA documentation for more info.
 - 
wait(stream=None)¶
- Makes all future work submitted to the given stream wait for this event. - Uses - torch.cuda.current_stream() if no stream is specified.
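A sketch of timing GPU work with a pair of events, using the record(), synchronize(), and elapsed_time() methods described above (assumes CUDA is available; enable_timing must be set at construction):
>>> start = torch.cuda.Event(enable_timing=True)
>>> end = torch.cuda.Event(enable_timing=True)
>>> start.record()
>>> y = torch.randn(1024, 1024, device='cuda').mm(torch.randn(1024, 1024, device='cuda'))
>>> end.record()
>>> torch.cuda.synchronize()                 # make sure both events have completed
>>> ms = start.elapsed_time(end)             # milliseconds; the value varies by hardware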
 
Memory management¶
- 
torch.cuda.empty_cache()
- Releases all unoccupied cached memory currently held by the caching allocator so that it can be used by other GPU applications and is visible in nvidia-smi. - Note - empty_cache() doesn’t increase the amount of GPU memory available for PyTorch. See cuda-memory-management for more details about GPU memory management.
- 
torch.cuda.memory_allocated(device=None)
- Returns the current GPU memory occupied by tensors in bytes for a given device. - Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - This is likely less than the amount shown in nvidia-smi since some unused memory can be held by the caching allocator and some context needs to be created on GPU. See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.max_memory_allocated(device=None)
- Returns the maximum GPU memory occupied by tensors in bytes for a given device. - By default, this returns the peak allocated memory since the beginning of this program. - reset_max_memory_allocated()can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak allocated memory usage of each iteration in a training loop.- Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.reset_max_memory_allocated(device=None)
- Resets the starting point in tracking maximum GPU memory occupied by tensors for a given device. - See - max_memory_allocated()for details.- Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.memory_cached(device=None)
- Returns the current GPU memory managed by the caching allocator in bytes for a given device. - Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.max_memory_cached(device=None)
- Returns the maximum GPU memory managed by the caching allocator in bytes for a given device. - By default, this returns the peak cached memory since the beginning of this program. - reset_max_memory_cached()can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak cached memory amount of each iteration in a training loop.- Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
- 
torch.cuda.reset_max_memory_cached(device=None)
- Resets the starting point in tracking maximum GPU memory managed by the caching allocator for a given device. - See - max_memory_cached()for details.- Parameters
- device (torch.device or int, optional) – selected device. Returns statistic for the current device, given by - current_device(), if- deviceis- None(default).
 - Note - See cuda-memory-management for more details about GPU memory management. 
NVIDIA Tools Extension (NVTX)¶
- 
torch.cuda.nvtx.mark(msg)¶
- Describe an instantaneous event that occurred at some point. - Parameters
- msg (string) – ASCII message to associate with the event. 
 
- 
torch.cuda.nvtx.range_push(msg)¶
- Pushes a range onto a stack of nested range spans. Returns the zero-based depth of the range that is started. - Parameters
- msg (string) – ASCII message to associate with range 
 
- 
torch.cuda.nvtx.range_pop()¶
- Pops a range off of a stack of nested range spans. Returns the zero-based depth of the range that is ended. 
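A small sketch of nesting ranges around a region of interest (the labels are illustrative; return values are discarded for brevity):
>>> _ = torch.cuda.nvtx.range_push('forward')    # opens a range at depth 0
>>> torch.cuda.nvtx.mark('inputs ready')          # instantaneous marker
>>> _ = torch.cuda.nvtx.range_push('layer1')      # nested range at depth 1
>>> _ = torch.cuda.nvtx.range_pop()               # closes 'layer1'
>>> _ = torch.cuda.nvtx.range_pop()               # closes 'forward'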
torch.Storage¶
A torch.Storage is a contiguous, one-dimensional array of a single
data type.
Every torch.Tensor has a corresponding storage of the same data type.
- 
class torch.FloatStorage¶
- 
bool()¶
- Casts this storage to bool type 
 - 
byte()¶
- Casts this storage to byte type 
 - 
char()¶
- Casts this storage to char type 
 - 
clone()¶
- Returns a copy of this storage 
 - 
copy_()¶
 - 
cpu()¶
- Returns a CPU copy of this storage if it’s not already on the CPU 
 - 
cuda(device=None, non_blocking=False, **kwargs)¶
- Returns a copy of this object in CUDA memory. - If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned. - Parameters
- device (int) – The destination GPU id. Defaults to the current device. 
- non_blocking (bool) – If - Trueand the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect.
- **kwargs – For compatibility, may contain the key - asyncin place of the- non_blockingargument.
 
 
 - 
data_ptr()¶
 - 
double()¶
- Casts this storage to double type 
 - 
element_size()¶
 - 
fill_()¶
 - 
float()¶
- Casts this storage to float type 
 - 
static from_buffer()¶
 - 
static from_file(filename, shared=False, size=0) → Storage¶
- If shared is True, then memory is shared between all processes. All changes are written to the file. If shared is False, then the changes on the storage do not affect the file. - size is the number of elements in the storage. If shared is False, then the file must contain at least size * sizeof(Type) bytes (Type is the type of storage). If shared is True the file will be created if needed. 
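Example (a sketch; 'tensors.dat' is a hypothetical file name):
>>> s = torch.FloatStorage.from_file('tensors.dat', shared=True, size=100)
>>> s.size()
100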
 - 
half()¶
- Casts this storage to half type 
 - 
int()¶
- Casts this storage to int type 
 - 
is_cuda= False¶
 - 
is_pinned()¶
 - 
is_sparse= False¶
 - 
long()¶
- Casts this storage to long type 
 - 
new()¶
 - 
pin_memory()¶
- Copies the storage to pinned memory, if it’s not already pinned. 
 - 
resize_()¶
 - 
share_memory_()¶
- Moves the storage to shared memory. - This is a no-op for storages already in shared memory and for CUDA storages, which do not need to be moved for sharing across processes. Storages in shared memory cannot be resized. - Returns: self 
 - 
short()¶
- Casts this storage to short type 
 - 
size()¶
 - 
tolist()¶
- Returns a list containing the elements of this storage 
 - 
type(dtype=None, non_blocking=False, **kwargs)¶
- Returns the type if dtype is not provided, else casts this object to the specified type. - If this is already of the correct type, no copy is performed and the original object is returned. - Parameters
- dtype (type or string) – The desired type 
- non_blocking (bool) – If - True, and the source is in pinned memory and destination is on the GPU or vice versa, the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect.
- **kwargs – For compatibility, may contain the key - asyncin place of the- non_blockingargument. The- asyncarg is deprecated.
 
 
 
- 
torch.nn¶
Parameters¶
- 
class torch.nn.Parameter¶
- A kind of Tensor that is to be considered a module parameter. - Parameters are - Tensor subclasses that have a very special property when used with - Modules: when they’re assigned as Module attributes, they are automatically added to the list of its parameters and will appear, e.g., in the - parameters() iterator. Assigning a Tensor doesn’t have such an effect. This is because one might want to cache some temporary state, like the last hidden state of the RNN, in the model. If there was no such class as - Parameter, these temporaries would get registered too.
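A minimal sketch of the auto-registration behavior (MyModule is an illustrative name):
>>> import torch
>>> import torch.nn as nn
>>> class MyModule(nn.Module):
...     def __init__(self):
...         super(MyModule, self).__init__()
...         self.weight = nn.Parameter(torch.randn(3))   # registered as a parameter
...         self.cache = torch.randn(3)                  # plain Tensor: not registered
...
>>> [name for name, p in MyModule().named_parameters()]
['weight']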
Containers¶
Module¶
- 
class torch.nn.Module¶
- Base class for all neural network modules. - Your models should also subclass this class. - Modules can also contain other Modules, allowing them to be nested in a tree structure. You can assign the submodules as regular attributes: - import torch.nn as nn import torch.nn.functional as F class Model(nn.Module): def __init__(self): super(Model, self).__init__() self.conv1 = nn.Conv2d(1, 20, 5) self.conv2 = nn.Conv2d(20, 20, 5) def forward(self, x): x = F.relu(self.conv1(x)) return F.relu(self.conv2(x)) - Submodules assigned in this way will be registered, and will have their parameters converted too when you call - to(), etc.
add_module(name, module)¶
- Adds a child module to the current module. - The module can be accessed as an attribute using the given name. - Parameters
- name (string) – name of the child module. The child module can be accessed from this module using the given name 
- module (Module) – child module to be added to the module. 
 
 
 - 
apply(fn)¶
- Applies - fnrecursively to every submodule (as returned by- .children()) as well as self. Typical use includes initializing the parameters of a model (see also torch-nn-init).- Parameters
- fn ( - Module-> None) – function to be applied to each submodule
- Returns
- self 
- Return type
 - Example: - >>> def init_weights(m): print(m) if type(m) == nn.Linear: m.weight.data.fill_(1.0) print(m.weight) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> net.apply(init_weights) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 
 - 
buffers(recurse=True)¶
- Returns an iterator over module buffers. - Parameters
- recurse (bool) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. 
- Yields
- torch.Tensor – module buffer 
 - Example: - >>> for buf in model.buffers(): >>> print(type(buf.data), buf.size()) <class 'torch.FloatTensor'> (20L,) <class 'torch.FloatTensor'> (20L, 1L, 5L, 5L) 
 - 
children()¶
- Returns an iterator over immediate children modules. - Yields
- Module – a child module 
 
 - 
cuda(device=None)¶
- Moves all model parameters and buffers to the GPU. - This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on the GPU while being optimized. 
 - 
double()¶
- Casts all floating point parameters and buffers to - doubledatatype.- Returns
- self 
- Return type
 
 - 
dump_patches= False¶
- This allows better BC support for - load_state_dict(). In - state_dict(), the version number will be saved in the attribute _metadata of the returned state dict, and thus pickled. _metadata is a dictionary with keys that follow the naming convention of the state dict. See - _load_from_state_dict on how to use this information in loading. - If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s _load_from_state_dict method can compare the version number and make appropriate changes if the state dict is from before the change. 
 - 
eval()¶
- Sets the module in evaluation mode. - This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. - Dropout, - BatchNorm, etc.
 - 
extra_repr()¶
- Set the extra representation of the module - To print customized extra information, you should reimplement this method in your own modules. Both single-line and multi-line strings are acceptable. 
 - 
float()¶
- Casts all floating point parameters and buffers to float datatype. - Returns
- self 
- Return type
 
 - 
forward(*input)¶
- Defines the computation performed at every call. - Should be overridden by all subclasses. - Note - Although the recipe for the forward pass needs to be defined within this function, one should call the - Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
 - 
half()¶
- Casts all floating point parameters and buffers to - halfdatatype.- Returns
- self 
- Return type
 
 - 
load_state_dict(state_dict, strict=True)¶
- Copies parameters and buffers from - state_dictinto this module and its descendants. If- strictis- True, then the keys of- state_dictmust exactly match the keys returned by this module’s- state_dict()function.- Parameters
- state_dict (dict) – a dict containing parameters and persistent buffers. 
- strict (bool, optional) – whether to strictly enforce that the keys in - state_dictmatch the keys returned by this module’s- state_dict()function. Default:- True
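A short sketch of a round trip through state_dict()/load_state_dict():
>>> net = nn.Linear(2, 2)
>>> sd = net.state_dict()            # keys: 'weight', 'bias'
>>> net2 = nn.Linear(2, 2)
>>> net2.load_state_dict(sd)         # strict=True: the key sets must match exactly
>>> torch.equal(net.weight, net2.weight)
True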
 
 
 - 
modules()¶
- Returns an iterator over all modules in the network. - Yields
- Module – a module in the network 
 - Note - Duplicate modules are returned only once. In the following example, - lwill be returned only once.- Example: - >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.modules()): print(idx, '->', m) 0 -> Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 1 -> Linear(in_features=2, out_features=2, bias=True) 
 - 
named_buffers(prefix='', recurse=True)¶
- Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. - Parameters
- Yields
- (string, torch.Tensor) – Tuple containing the name and buffer 
 - Example: - >>> for name, buf in self.named_buffers(): >>> if name in ['running_var']: >>> print(buf.size()) 
 - 
named_children()¶
- Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. - Yields
- (string, Module) – Tuple containing a name and child module 
 - Example: - >>> for name, module in model.named_children(): >>> if name in ['conv4', 'conv5']: >>> print(module) 
 - 
named_modules(memo=None, prefix='')¶
- Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. - Yields
- (string, Module) – Tuple of name and module 
 - Note - Duplicate modules are returned only once. In the following example, - lwill be returned only once.- Example: - >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.named_modules()): print(idx, '->', m) 0 -> ('', Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) )) 1 -> ('0', Linear(in_features=2, out_features=2, bias=True)) 
 - 
named_parameters(prefix='', recurse=True)¶
- Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. - Parameters
- Yields
- (string, Parameter) – Tuple containing the name and parameter 
 - Example: - >>> for name, param in self.named_parameters(): >>> if name in ['bias']: >>> print(param.size()) 
 - 
parameters(recurse=True)¶
- Returns an iterator over module parameters. - This is typically passed to an optimizer. - Parameters
- recurse (bool) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. 
- Yields
- Parameter – module parameter 
 - Example: - >>> for param in model.parameters(): >>> print(type(param.data), param.size()) <class 'torch.FloatTensor'> (20,) <class 'torch.FloatTensor'> (20, 1, 5, 5) 
 - 
register_backward_hook(hook)¶
- Registers a backward hook on the module. - The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: - hook(module, grad_input, grad_output) -> Tensor or None - The - grad_inputand- grad_outputmay be tuples if the module has multiple inputs or outputs. The hook should not modify its arguments, but it can optionally return a new gradient with respect to input that will be used in place of- grad_inputin subsequent computations.- Returns
- a handle that can be used to remove the added hook by calling - handle.remove()
- Return type
- torch.utils.hooks.RemovableHandle
 - Warning - The current implementation will not have the presented behavior for complex Modules that perform many operations. In some failure cases, grad_input and grad_output will only contain the gradients for a subset of the inputs and outputs. For such Modules, you should use torch.Tensor.register_hook() directly on a specific input or output to get the required gradients.
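A short sketch of registering and removing a backward hook (the hook body here is illustrative):

>>> m = nn.Linear(2, 2)
>>> def hook(module, grad_input, grad_output):
...     print('grad_output[0] shape:', grad_output[0].shape)
>>> handle = m.register_backward_hook(hook)
>>> m(torch.randn(1, 2)).sum().backward()
grad_output[0] shape: torch.Size([1, 2])
>>> handle.remove()  # the hook no longer fires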
 - 
register_buffer(name, tensor)¶
- Adds a persistent buffer to the module. - This is typically used to register a buffer that should not be considered a model parameter. For example, BatchNorm’s running_mean is not a parameter, but is part of the persistent state. - Buffers can be accessed as attributes using the given names. - Parameters
- name (string) – name of the buffer. The buffer can be accessed from this module using the given name 
- tensor (Tensor) – buffer to be registered. 
 
 - Example: - >>> self.register_buffer('running_mean', torch.zeros(num_features)) 
 - 
register_forward_hook(hook)¶
- Registers a forward hook on the module. - The hook will be called every time after forward() has computed an output. It should have the following signature: - hook(module, input, output) -> None - The hook should not modify the input or output. - Returns
- a handle that can be used to remove the added hook by calling - handle.remove()
- Return type
- torch.utils.hooks.RemovableHandle
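For instance, a forward hook can capture intermediate activations (the dictionary and key below are illustrative):

>>> activations = {}
>>> def hook(module, input, output):
...     activations['linear'] = output.detach()
>>> m = nn.Linear(2, 2)
>>> handle = m.register_forward_hook(hook)
>>> _ = m(torch.randn(1, 2))
>>> activations['linear'].shape
torch.Size([1, 2])
>>> handle.remove()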
 
 - 
register_forward_pre_hook(hook)¶
- Registers a forward pre-hook on the module. - The hook will be called every time before forward() is invoked. It should have the following signature: - hook(module, input) -> None - The hook should not modify the input. - Returns
- a handle that can be used to remove the added hook by calling - handle.remove()
- Return type
- torch.utils.hooks.RemovableHandle
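A minimal sketch of inspecting inputs with a pre-hook (the hook body is illustrative):

>>> def pre_hook(module, input):
...     print('input[0] shape:', input[0].shape)
>>> m = nn.Linear(2, 2)
>>> handle = m.register_forward_pre_hook(pre_hook)
>>> _ = m(torch.randn(1, 2))
input[0] shape: torch.Size([1, 2])
>>> handle.remove()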
 
 - 
register_parameter(name, param)¶
- Adds a parameter to the module. - The parameter can be accessed as an attribute using given name. - Parameters
- name (string) – name of the parameter. The parameter can be accessed from this module using the given name 
- param (Parameter) – parameter to be added to the module. 
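Typical use is inside a custom module's __init__; the Scaler module below is a hypothetical example, not part of torch.nn:

>>> class Scaler(nn.Module):
...     def __init__(self):
...         super(Scaler, self).__init__()
...         self.register_parameter('scale', nn.Parameter(torch.ones(1)))
...     def forward(self, x):
...         return self.scale * x
>>> [name for name, _ in Scaler().named_parameters()]
['scale']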
 
 
 - 
state_dict(destination=None, prefix='', keep_vars=False)¶
- Returns a dictionary containing the whole state of the module. - Both parameters and persistent buffers (e.g. running averages) are included. Keys are the corresponding parameter and buffer names. - Returns
- a dictionary containing the whole state of the module 
- Return type
- dict 
 - Example: - >>> module.state_dict().keys() ['bias', 'weight'] 
 - 
to(*args, **kwargs)¶
- Moves and/or casts the parameters and buffers. - This can be called as - 
to(device=None, dtype=None, non_blocking=False)
 - 
to(dtype, non_blocking=False)
 - 
to(tensor, non_blocking=False)
 - Its signature is similar to torch.Tensor.to(), but only accepts floating point desired dtypes. In addition, this method will only cast the floating point parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. - See below for examples. - Note - This method modifies the module in-place. - Parameters
- device ( - torch.device) – the desired device of the parameters and buffers in this module
- dtype ( - torch.dtype) – the desired floating point type of the floating point parameters and buffers in this module
- tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module 
 
- Returns
- self 
- Return type
 - Example: - >>> linear = nn.Linear(2, 2) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]]) >>> linear.to(torch.double) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]], dtype=torch.float64) >>> gpu1 = torch.device("cuda:1") >>> linear.to(gpu1, dtype=torch.half, non_blocking=True) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1') >>> cpu = torch.device("cpu") >>> linear.to(cpu) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16) 
- 
 - 
train(mode=True)¶
- Sets the module in training mode. - This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc. - Returns
- self 
- Return type
- Module 
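For example, toggling the mode changes Dropout behavior (a minimal sketch; train() and eval() both return self):

>>> m = nn.Dropout(p=0.5)
>>> _ = m.train()   # dropout is applied during forward passes
>>> _ = m.eval()    # shorthand for m.train(mode=False); dropout is disabled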
 
 - 
type(dst_type)¶
- Casts all parameters and buffers to dst_type. - Parameters
- dst_type (type or string) – the desired type 
 - Returns
- self 
- Return type
- Module 
 - 
zero_grad()¶
- Sets gradients of all model parameters to zero. 
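A sketch of where zero_grad() usually sits in a training step (names are illustrative):

>>> model = nn.Linear(2, 1)
>>> loss = model(torch.randn(4, 2)).sum()
>>> loss.backward()      # gradients accumulate into the .grad attributes
>>> model.zero_grad()    # clear them before the next backward pass
>>> model.weight.grad
tensor([[0., 0.]])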
 
- 
Sequential¶
- 
class torch.nn.Sequential(*args)¶
- A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in. - To make it easier to understand, here is a small example: - # Example of using Sequential model = nn.Sequential( nn.Conv2d(1,20,5), nn.ReLU(), nn.Conv2d(20,64,5), nn.ReLU() ) # Example of using Sequential with OrderedDict model = nn.Sequential(OrderedDict([ ('conv1', nn.Conv2d(1,20,5)), ('relu1', nn.ReLU()), ('conv2', nn.Conv2d(20,64,5)), ('relu2', nn.ReLU()) ])) - Shape:
- Input: \((*)\) where * means any number of additional dimensions 
- Output: \((*)\), same shape as the input 
 
 
ModuleList¶
- 
class torch.nn.ModuleList(modules=None)¶
- Holds submodules in a list. - ModuleList can be indexed like a regular Python list, but modules it contains are properly registered, and will be visible by all Module methods. - Parameters
- modules (iterable, optional) – an iterable of modules to add 
 - Example: - class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)]) def forward(self, x): # ModuleList can act as an iterable, or be indexed using ints for i, l in enumerate(self.linears): x = self.linears[i // 2](x) + l(x) return x - 
append(module)¶
- Appends a given module to the end of the list. - Parameters
- module (nn.Module) – module to append 
 
 - 
extend(modules)¶
- Appends modules from a Python iterable to the end of the list. - Parameters
- modules (iterable) – iterable of modules to append 
 
 
ModuleDict¶
- 
class torch.nn.ModuleDict(modules=None)¶
- Holds submodules in a dictionary. - ModuleDict can be indexed like a regular Python dictionary, but modules it contains are properly registered, and will be visible by all Module methods. - ModuleDict is an ordered dictionary that respects - the order of insertion, and 
- in update(), the order of the merged OrderedDict or another ModuleDict (the argument to update()). 
 - Note that update() with other unordered mapping types (e.g., Python’s plain dict) does not preserve the order of the merged mapping. - Parameters
- modules (iterable, optional) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module) 
 - Example: - class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.choices = nn.ModuleDict({ 'conv': nn.Conv2d(10, 10, 3), 'pool': nn.MaxPool2d(3) }) self.activations = nn.ModuleDict([ ['lrelu', nn.LeakyReLU()], ['prelu', nn.PReLU()] ]) def forward(self, x, choice, act): x = self.choices[choice](x) x = self.activations[act](x) return x - 
clear()¶
- Remove all items from the ModuleDict. 
 - 
items()¶
- Return an iterable of the ModuleDict key/value pairs. 
 - 
keys()¶
- Return an iterable of the ModuleDict keys. 
 - 
pop(key)¶
- Remove key from the ModuleDict and return its module. - Parameters
- key (string) – key to pop from the ModuleDict 
 
 - 
update(modules)¶
- Update the ModuleDict with the key-value pairs from a mapping or an iterable, overwriting existing keys. - Note - If modules is an OrderedDict, a ModuleDict, or an iterable of key-value pairs, the order of new elements in it is preserved.
 - 
values()¶
- Return an iterable of the ModuleDict values. 
 
ParameterList¶
- 
class torch.nn.ParameterList(parameters=None)¶
- Holds parameters in a list. - ParameterList can be indexed like a regular Python list, but parameters it contains are properly registered, and will be visible by all Module methods. - Parameters
- parameters (iterable, optional) – an iterable of - Parameterto add
 - Example: - class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.params = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(10)]) def forward(self, x): # ParameterList can act as an iterable, or be indexed using ints for i, p in enumerate(self.params): x = self.params[i // 2].mm(x) + p.mm(x) return x - 
append(parameter)¶
- Appends a given parameter to the end of the list. - Parameters
- parameter (nn.Parameter) – parameter to append 
 
 - 
extend(parameters)¶
- Appends parameters from a Python iterable to the end of the list. - Parameters
- parameters (iterable) – iterable of parameters to append 
 
 
ParameterDict¶
- 
class torch.nn.ParameterDict(parameters=None)¶
- Holds parameters in a dictionary. - ParameterDict can be indexed like a regular Python dictionary, but parameters it contains are properly registered, and will be visible by all Module methods. - ParameterDict is an ordered dictionary that respects - the order of insertion, and 
- in update(), the order of the merged OrderedDict or another ParameterDict (the argument to update()). 
 - Note that update() with other unordered mapping types (e.g., Python’s plain dict) does not preserve the order of the merged mapping. - Parameters
- parameters (iterable, optional) – a mapping (dictionary) of (string : - Parameter) or an iterable of key-value pairs of type (string,- Parameter)
 - Example: - class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.params = nn.ParameterDict({ 'left': nn.Parameter(torch.randn(5, 10)), 'right': nn.Parameter(torch.randn(5, 10)) }) def forward(self, x, choice): x = self.params[choice].mm(x) return x - 
clear()¶
- Remove all items from the ParameterDict. 
 - 
items()¶
- Return an iterable of the ParameterDict key/value pairs. 
 - 
keys()¶
- Return an iterable of the ParameterDict keys. 
 - 
pop(key)¶
- Remove key from the ParameterDict and return its parameter. - Parameters
- key (string) – key to pop from the ParameterDict 
 
 - 
update(parameters)¶
- Update the ParameterDict with the key-value pairs from a mapping or an iterable, overwriting existing keys. - Note - If parameters is an OrderedDict, a ParameterDict, or an iterable of key-value pairs, the order of new elements in it is preserved.
 - 
values()¶
- Return an iterable of the ParameterDict values. 
 
Convolution layers¶
Conv1d¶
- 
class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')¶
- Applies a 1D convolution over an input signal composed of several input planes. - In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, L)\) and output \((N, C_{\text{out}}, L_{\text{out}})\) can be precisely described as: \[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) \]- where \(\star\) is the valid cross-correlation operator, \(N\) is a batch size, \(C\) denotes a number of channels, \(L\) is a length of signal sequence. - stridecontrols the stride for the cross-correlation, a single number or a one-element tuple.
- paddingcontrols the amount of implicit zero-paddings on both sides for- paddingnumber of points.
- dilationcontrols the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what- dilationdoes.
- groupscontrols the connections between inputs and outputs.- in_channelsand- out_channelsmust both be divisible by- groups. For example,- At groups=1, all inputs are convolved to all outputs. 
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated. 
- At groups= - in_channels, each input channel is convolved with its own set of filters, of size \(\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor\).
 
 - Note - Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding. - Note - When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also termed in the literature as a depthwise convolution. - In other words, for an input of size \((N, C_{in}, L_{in})\), a depthwise convolution with a depthwise multiplier K can be constructed by the arguments \((C_\text{in}=C_{in}, C_\text{out}=C_{in} \times K, ..., \text{groups}=C_{in})\). - Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background. - Parameters
- in_channels (int) – Number of channels in the input image 
- out_channels (int) – Number of channels produced by the convolution 
- kernel_size (int or tuple) – Size of the convolving kernel 
- stride (int or tuple, optional) – Stride of the convolution. Default: 1 
- padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0 
- padding_mode (string, optional) – the padding mode to use. Default: 'zeros' 
- dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1 
- groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1 
- bias (bool, optional) – If - True, adds a learnable bias to the output. Default:- True
 
 - Shape:
- Input: \((N, C_{in}, L_{in})\) 
- Output: \((N, C_{out}, L_{out})\) where \[L_{out} = \left\lfloor\frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor \]
 
 - Variables
- ~Conv1d.weight (Tensor) – the learnable weights of the module of shape \((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \text{kernel\_size}}\) 
- ~Conv1d.bias (Tensor) – the learnable bias of the module of shape (out_channels). If - biasis- True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \text{kernel\_size}}\)
 
 - Examples: - >>> m = nn.Conv1d(16, 33, 3, stride=2) >>> input = torch.randn(20, 16, 50) >>> output = m(input) 
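As a quick check of the \(L_{out}\) formula above (a hedged sketch; the values are random, only the shape is asserted):

>>> m = nn.Conv1d(16, 33, kernel_size=3, stride=2)
>>> input = torch.randn(20, 16, 50)
>>> m(input).shape  # floor((50 + 0 - 1*(3 - 1) - 1) / 2 + 1) = 24
torch.Size([20, 33, 24])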
Conv2d¶
- 
class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')¶
- Applies a 2D convolution over an input signal composed of several input planes. - In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, H, W)\) and output \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as: \[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) \]- where \(\star\) is the valid 2D cross-correlation operator, \(N\) is a batch size, \(C\) denotes a number of channels, \(H\) is a height of input planes in pixels, and \(W\) is width in pixels. - stridecontrols the stride for the cross-correlation, a single number or a tuple.
- paddingcontrols the amount of implicit zero-paddings on both sides for- paddingnumber of points for each dimension.
- dilationcontrols the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what- dilationdoes.
- groupscontrols the connections between inputs and outputs.- in_channelsand- out_channelsmust both be divisible by- groups. For example,- At groups=1, all inputs are convolved to all outputs. 
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated. 
- At groups= - in_channels, each input channel is convolved with its own set of filters, of size: \(\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor\).
 
 - The parameters - kernel_size,- stride,- padding,- dilationcan either be:- a single - int– in which case the same value is used for the height and width dimension
- a - tupleof two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
 - Note - Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding. - Note - When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also termed in the literature as a depthwise convolution. - In other words, for an input of size \((N, C_{in}, H_{in}, W_{in})\), a depthwise convolution with a depthwise multiplier K can be constructed by the arguments \((in\_channels=C_{in}, out\_channels=C_{in} \times K, ..., groups=C_{in})\). - Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background. - Parameters
- in_channels (int) – Number of channels in the input image 
- out_channels (int) – Number of channels produced by the convolution 
- kernel_size (int or tuple) – Size of the convolving kernel 
- stride (int or tuple, optional) – Stride of the convolution. Default: 1 
- padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0 
- padding_mode (string, optional) – the padding mode to use. Default: 'zeros' 
- dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1 
- groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1 
- bias (bool, optional) – If - True, adds a learnable bias to the output. Default:- True
 
 - Shape:
- Input: \((N, C_{in}, H_{in}, W_{in})\) 
- Output: \((N, C_{out}, H_{out}, W_{out})\) where \[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor \]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor \]
 
 - Variables
- ~Conv2d.weight (Tensor) – the learnable weights of the module of shape \((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1])\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\) 
- ~Conv2d.bias (Tensor) – the learnable bias of the module of shape (out_channels). If - biasis- True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)
 
 - Examples: - >>> # With square kernels and equal stride >>> m = nn.Conv2d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) >>> # non-square kernels and unequal stride and with padding and dilation >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)) >>> input = torch.randn(20, 16, 50, 100) >>> output = m(input) 
Conv3d¶
- 
class torch.nn.Conv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')¶
- Applies a 3D convolution over an input signal composed of several input planes. - In the simplest case, the output value of the layer with input size \((N, C_{in}, D, H, W)\) and output \((N, C_{out}, D_{out}, H_{out}, W_{out})\) can be precisely described as: \[out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k = 0}^{C_{in} - 1} weight(C_{out_j}, k) \star input(N_i, k) \]- where \(\star\) is the valid 3D cross-correlation operator - stridecontrols the stride for the cross-correlation.
- paddingcontrols the amount of implicit zero-paddings on both sides for- paddingnumber of points for each dimension.
- dilationcontrols the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what- dilationdoes.
- groupscontrols the connections between inputs and outputs.- in_channelsand- out_channelsmust both be divisible by- groups. For example,- At groups=1, all inputs are convolved to all outputs. 
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated. 
- At groups= - in_channels, each input channel is convolved with its own set of filters, of size \(\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor\).
 
 - The parameters - kernel_size,- stride,- padding,- dilationcan either be:- a single - int– in which case the same value is used for the depth, height and width dimension
- a - tupleof three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
 - Note - Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding. - Note - When groups == in_channels and out_channels == K * in_channels, where K is a positive integer, this operation is also termed in the literature as a depthwise convolution. - In other words, for an input of size \((N, C_{in}, D_{in}, H_{in}, W_{in})\), a depthwise convolution with a depthwise multiplier K can be constructed by the arguments \((in\_channels=C_{in}, out\_channels=C_{in} \times K, ..., groups=C_{in})\). - Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background. - Parameters
- in_channels (int) – Number of channels in the input image 
- out_channels (int) – Number of channels produced by the convolution 
- kernel_size (int or tuple) – Size of the convolving kernel 
- stride (int or tuple, optional) – Stride of the convolution. Default: 1 
- padding (int or tuple, optional) – Zero-padding added to all three sides of the input. Default: 0 
- padding_mode (string, optional) – the padding mode to use. Default: 'zeros' 
- dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1 
- groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1 
- bias (bool, optional) – If - True, adds a learnable bias to the output. Default:- True
 
 - Shape:
- Input: \((N, C_{in}, D_{in}, H_{in}, W_{in})\) 
- Output: \((N, C_{out}, D_{out}, H_{out}, W_{out})\) where \[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor \]\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor \]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor \]
 
 - Variables
- ~Conv3d.weight (Tensor) – the learnable weights of the module of shape \((\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1], \text{kernel\_size}[2])\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\) 
- ~Conv3d.bias (Tensor) – the learnable bias of the module of shape (out_channels). If - biasis- True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\)
 
 - Examples: - >>> # With square kernels and equal stride >>> m = nn.Conv3d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0)) >>> input = torch.randn(20, 16, 10, 50, 100) >>> output = m(input) 
ConvTranspose1d¶
- 
class torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')¶
- Applies a 1D transposed convolution operator over an input image composed of several input planes. - This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). - stridecontrols the stride for the cross-correlation.
- paddingcontrols the amount of implicit zero-paddings on both sides for- dilation * (kernel_size - 1) - paddingnumber of points. See note below for details.
- output_paddingcontrols the additional size added to one side of the output shape. See note below for details.
- dilationcontrols the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what- dilationdoes.
- groupscontrols the connections between inputs and outputs.- in_channelsand- out_channelsmust both be divisible by- groups. For example,- At groups=1, all inputs are convolved to all outputs. 
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated. 
- At groups= - in_channels, each input channel is convolved with its own set of filters (of size \(\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor\)).
 
 - Note - Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding. - Note - The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv1d and a ConvTranspose1d are initialized with the same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, Conv1d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output. - Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background. - Parameters
- in_channels (int) – Number of channels in the input image 
- out_channels (int) – Number of channels produced by the convolution 
- kernel_size (int or tuple) – Size of the convolving kernel 
- stride (int or tuple, optional) – Stride of the convolution. Default: 1 
- padding (int or tuple, optional) – - dilation * (kernel_size - 1) - paddingzero-padding will be added to both sides of the input. Default: 0
- output_padding (int or tuple, optional) – Additional size added to one side of the output shape. Default: 0 
- groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1 
- bias (bool, optional) – If - True, adds a learnable bias to the output. Default:- True
- dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1 
 
 - Shape:
- Input: \((N, C_{in}, L_{in})\) 
- Output: \((N, C_{out}, L_{out})\) where \[L_{out} = (L_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{kernel\_size} - 1) + \text{output\_padding} + 1 \]
 
 - Variables
- ~ConvTranspose1d.weight (Tensor) – the learnable weights of the module of shape \((\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}}, \text{kernel\_size})\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \text{kernel\_size}}\) 
- ~ConvTranspose1d.bias (Tensor) – the learnable bias of the module of shape (out_channels). If - biasis- True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \text{kernel\_size}}\)
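Example (a minimal sketch; the output length follows the \(L_{out}\) formula above):

>>> m = nn.ConvTranspose1d(16, 33, kernel_size=3, stride=2)
>>> input = torch.randn(20, 16, 50)
>>> m(input).shape  # (50 - 1)*2 - 0 + 1*(3 - 1) + 0 + 1 = 101
torch.Size([20, 33, 101])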
 
 
ConvTranspose2d¶
- 
class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')¶
- Applies a 2D transposed convolution operator over an input image composed of several input planes. - This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). - stridecontrols the stride for the cross-correlation.
- paddingcontrols the amount of implicit zero-paddings on both sides for- dilation * (kernel_size - 1) - paddingnumber of points. See note below for details.
- output_paddingcontrols the additional size added to one side of the output shape. See note below for details.
- dilationcontrols the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what- dilationdoes.
- groupscontrols the connections between inputs and outputs.- in_channelsand- out_channelsmust both be divisible by- groups. For example,- At groups=1, all inputs are convolved to all outputs. 
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated. 
- At groups= - in_channels, each input channel is convolved with its own set of filters (of size \(\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor\)).
 
 - The parameters - kernel_size,- stride,- padding,- output_paddingcan either be:- a single - int– in which case the same value is used for the height and width dimensions
- a - tupleof two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
 - Note - Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding. - Note - The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv2d and a ConvTranspose2d are initialized with the same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, Conv2d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output. - Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background. - Parameters
- in_channels (int) – Number of channels in the input image 
- out_channels (int) – Number of channels produced by the convolution 
- kernel_size (int or tuple) – Size of the convolving kernel 
- stride (int or tuple, optional) – Stride of the convolution. Default: 1 
- padding (int or tuple, optional) – - dilation * (kernel_size - 1) - paddingzero-padding will be added to both sides of each dimension in the input. Default: 0
- output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0 
- groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1 
- bias (bool, optional) – If - True, adds a learnable bias to the output. Default:- True
- dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1 
 
 - Shape:
- Input: \((N, C_{in}, H_{in}, W_{in})\) 
- Output: \((N, C_{out}, H_{out}, W_{out})\) where 
 \[H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1 \]\[W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1 \]
 - Variables
- ~ConvTranspose2d.weight (Tensor) – the learnable weights of the module of shape \((\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1])\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\) 
- ~ConvTranspose2d.bias (Tensor) – the learnable bias of the module of shape (out_channels) If - biasis- True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}\)
 
 - Examples: - >>> # With square kernels and equal stride >>> m = nn.ConvTranspose2d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) >>> input = torch.randn(20, 16, 50, 100) >>> output = m(input) >>> # exact output size can be also specified as an argument >>> input = torch.randn(1, 16, 12, 12) >>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1) >>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1) >>> h = downsample(input) >>> h.size() torch.Size([1, 16, 6, 6]) >>> output = upsample(h, output_size=input.size()) >>> output.size() torch.Size([1, 16, 12, 12]) 
ConvTranspose3d¶
- 
class torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')¶
- Applies a 3D transposed convolution operator over an input image composed of several input planes. The transposed convolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes. - This module can be seen as the gradient of Conv3d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). - stridecontrols the stride for the cross-correlation.
- paddingcontrols the amount of implicit zero-paddings on both sides for- dilation * (kernel_size - 1) - paddingnumber of points. See note below for details.
- output_paddingcontrols the additional size added to one side of the output shape. See note below for details.
- dilationcontrols the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what- dilationdoes.
- groupscontrols the connections between inputs and outputs.- in_channelsand- out_channelsmust both be divisible by- groups. For example,- At groups=1, all inputs are convolved to all outputs. 
- At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated. 
- At groups= - in_channels, each input channel is convolved with its own set of filters (of size \(\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor\)).
 
 - The parameters - kernel_size,- stride,- padding,- output_paddingcan either be:- a single - int– in which case the same value is used for the depth, height and width dimensions
- a - tupleof three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
 - Note - Depending on the size of your kernel, several (of the last) columns of the input might be lost, because it is a valid cross-correlation, and not a full cross-correlation. It is up to the user to add proper padding. - Note - The padding argument effectively adds dilation * (kernel_size - 1) - padding amount of zero padding to both sides of the input. This is set so that when a Conv3d and a ConvTranspose3d are initialized with the same parameters, they are inverses of each other in regard to the input and output shapes. However, when stride > 1, Conv3d maps multiple input shapes to the same output shape. output_padding is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that output_padding is only used to find the output shape, but does not actually add zero-padding to the output. - Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background. - Parameters
- in_channels (int) – Number of channels in the input image 
- out_channels (int) – Number of channels produced by the convolution 
- kernel_size (int or tuple) – Size of the convolving kernel 
- stride (int or tuple, optional) – Stride of the convolution. Default: 1 
- padding (int or tuple, optional) – - dilation * (kernel_size - 1) - paddingzero-padding will be added to both sides of each dimension in the input. Default: 0
- output_padding (int or tuple, optional) – Additional size added to one side of each dimension in the output shape. Default: 0 
- groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1 
- bias (bool, optional) – If - True, adds a learnable bias to the output. Default:- True
- dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1 
 
 - Shape:
- Input: \((N, C_{in}, D_{in}, H_{in}, W_{in})\) 
- Output: \((N, C_{out}, D_{out}, H_{out}, W_{out})\) where 
 \[D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1 \]\[H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1 \]\[W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) + \text{output\_padding}[2] + 1 \]
 - Variables
- ~ConvTranspose3d.weight (Tensor) – the learnable weights of the module of shape \((\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}}, \text{kernel\_size}[0], \text{kernel\_size}[1], \text{kernel\_size}[2])\). The values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\) 
- ~ConvTranspose3d.bias (Tensor) – the learnable bias of the module of shape (out_channels) If - biasis- True, then the values of these weights are sampled from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\_size}[i]}\)
 
 - Examples: - >>> # With square kernels and equal stride >>> m = nn.ConvTranspose3d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.ConvTranspose3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(0, 4, 2)) >>> input = torch.randn(20, 16, 10, 50, 100) >>> output = m(input) 
Unfold¶
- 
class torch.nn.Unfold(kernel_size, dilation=1, padding=0, stride=1)¶
- Extracts sliding local blocks from a batched input tensor. - Consider a batched input tensor of shape \((N, C, *)\), where \(N\) is the batch dimension, \(C\) is the channel dimension, and \(*\) represents arbitrary spatial dimensions. This operation flattens each sliding kernel_size-sized block within the spatial dimensions of input into a column (i.e., the last dimension) of a 3-D output tensor of shape \((N, C \times \prod(\text{kernel\_size}), L)\), where \(C \times \prod(\text{kernel\_size})\) is the total number of values within each block (a block has \(\prod(\text{kernel\_size})\) spatial locations, each containing a \(C\)-channeled vector), and \(L\) is the total number of such blocks:\[L = \prod_d \left\lfloor\frac{\text{spatial\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor, \]- where \(\text{spatial\_size}\) is formed by the spatial dimensions of input (\(*\) above), and \(d\) ranges over all spatial dimensions. - Therefore, indexing output at the last dimension (column dimension) gives all values within a certain block. - The padding, stride and dilation arguments specify how the sliding blocks are retrieved. - stride controls the stride for the sliding blocks.
- padding controls the amount of implicit zero-padding on both sides for padding number of points for each dimension before reshaping. 
- dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does. 
 - Parameters
- kernel_size (int or tuple) – the size of the sliding blocks 
- stride (int or tuple, optional) – the stride of the sliding blocks in the input spatial dimensions. Default: 1 
- padding (int or tuple, optional) – implicit zero padding to be added on both sides of input. Default: 0 
- dilation (int or tuple, optional) – a parameter that controls the stride of elements within the neighborhood. Default: 1 
 
 - If - kernel_size,- dilation,- paddingor- strideis an int or a tuple of length 1, their values will be replicated across all spatial dimensions.
- For the case of two input spatial dimensions this operation is sometimes called - im2col.
 - Note - Foldcalculates each combined value in the resulting large tensor by summing all values from all containing blocks.- Unfoldextracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other.- Warning - Currently, only 4-D input tensors (batched image-like tensors) are supported. - Shape:
- Input: \((N, C, *)\) 
- Output: \((N, C \times \prod(\text{kernel\_size}), L)\) as described above 
 
 - Examples: - >>> unfold = nn.Unfold(kernel_size=(2, 3)) >>> input = torch.randn(2, 5, 3, 4) >>> output = unfold(input) >>> # each patch contains 30 values (2x3=6 vectors, each of 5 channels) >>> # 4 blocks (2x3 kernels) in total in the 3x4 input >>> output.size() torch.Size([2, 30, 4]) >>> # Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape) >>> inp = torch.randn(1, 3, 10, 12) >>> w = torch.randn(2, 3, 4, 5) >>> inp_unf = torch.nn.functional.unfold(inp, (4, 5)) >>> out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2) >>> out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1)) >>> # or equivalently (and avoiding a copy), >>> # out = out_unf.view(1, 2, 7, 8) >>> (torch.nn.functional.conv2d(inp, w) - out).abs().max() tensor(1.9073e-06) 
Fold¶
- 
class torch.nn.Fold(output_size, kernel_size, dilation=1, padding=0, stride=1)¶
- Combines an array of sliding local blocks into a large containing tensor. - Consider a batched input tensor containing sliding local blocks, e.g., patches of images, of shape \((N, C \times \prod(\text{kernel\_size}), L)\), where \(N\) is the batch dimension, \(C \times \prod(\text{kernel\_size})\) is the number of values within a block (a block has \(\prod(\text{kernel\_size})\) spatial locations, each containing a \(C\)-channeled vector), and \(L\) is the total number of blocks. (This is exactly the same specification as the output shape of Unfold.) This operation combines these local blocks into the large output tensor of shape \((N, C, \text{output\_size}[0], \text{output\_size}[1], \dots)\) by summing the overlapping values. Similar to Unfold, the arguments must satisfy\[L = \prod_d \left\lfloor\frac{\text{output\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor, \]- where \(d\) ranges over all spatial dimensions. - output_size describes the spatial shape of the large containing tensor of the sliding local blocks. It is useful to resolve the ambiguity when multiple input shapes map to the same number of sliding blocks, e.g., with stride > 0.
 - The padding, stride and dilation arguments specify how the sliding blocks are retrieved. - stride controls the stride for the sliding blocks.
- padding controls the amount of implicit zero-padding on both sides for padding number of points for each dimension before reshaping. 
- dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does. 
 - Parameters
- output_size (int or tuple) – the shape of the spatial dimensions of the output (i.e., - output.sizes()[2:])
- kernel_size (int or tuple) – the size of the sliding blocks 
- stride (int or tuple) – the stride of the sliding blocks in the input spatial dimensions. Default: 1 
- padding (int or tuple, optional) – implicit zero padding to be added on both sides of input. Default: 0 
- dilation (int or tuple, optional) – a parameter that controls the stride of elements within the neighborhood. Default: 1 
 
 - If - output_size,- kernel_size,- dilation,- paddingor- strideis an int or a tuple of length 1 then their values will be replicated across all spatial dimensions.
- For the case of two output spatial dimensions this operation is sometimes called - col2im.
 - Note - Foldcalculates each combined value in the resulting large tensor by summing all values from all containing blocks.- Unfoldextracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other.- Warning - Currently, only 4-D output tensors (batched image-like tensors) are supported. - Shape:
- Input: \((N, C \times \prod(\text{kernel\_size}), L)\) 
- Output: \((N, C, \text{output\_size}[0], \text{output\_size}[1], \dots)\) as described above 
 
 - Examples: - >>> fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2)) >>> input = torch.randn(1, 3 * 2 * 2, 12) >>> output = fold(input) >>> output.size() torch.Size([1, 3, 4, 5]) 
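The overlap-summing behavior described in the Note can be seen directly: folding the unfolding of an all-ones tensor yields each position's overlap count rather than the original values (a small sketch):

>>> inp = torch.ones(1, 1, 3, 3)
>>> unfold = nn.Unfold(kernel_size=(2, 2))
>>> fold = nn.Fold(output_size=(3, 3), kernel_size=(2, 2))
>>> fold(unfold(inp))
tensor([[[[1., 2., 1.],
          [2., 4., 2.],
          [1., 2., 1.]]]])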
Pooling layers¶
MaxPool1d¶
- 
class torch.nn.MaxPool1d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)¶
- Applies a 1D max pooling over an input signal composed of several input planes. - In the simplest case, the output value of the layer with input size \((N, C, L)\) and output \((N, C, L_{out})\) can be precisely described as: \[out(N_i, C_j, k) = \max_{m=0, \ldots, \text{kernel\_size} - 1} input(N_i, C_j, stride \times k + m) \]- If - paddingis non-zero, then the input is implicitly zero-padded on both sides for- paddingnumber of points.- dilationcontrols the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what- dilationdoes.- Parameters
- kernel_size – the size of the window to take a max over 
- stride – the stride of the window. Default value is - kernel_size
- padding – implicit zero padding to be added on both sides 
- dilation – a parameter that controls the stride of elements in the window 
- return_indices – if - True, will return the max indices along with the outputs. Useful for- torch.nn.MaxUnpool1dlater
- ceil_mode – when True, will use ceil instead of floor to compute the output shape 
 
 - Shape:
- Input: \((N, C, L_{in})\) 
- Output: \((N, C, L_{out})\), where \[L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor \]
 
 - Examples: - >>> # pool of size=3, stride=2 >>> m = nn.MaxPool1d(3, stride=2) >>> input = torch.randn(20, 16, 50) >>> output = m(input) 
MaxPool2d¶
- 
class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)¶
- Applies a 2D max pooling over an input signal composed of several input planes. - In the simplest case, the output value of the layer with input size \((N, C, H, W)\), output \((N, C, H_{out}, W_{out})\) and - kernel_size\((kH, kW)\) can be precisely described as:\[\begin{aligned} out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\ & \text{input}(N_i, C_j, \text{stride[0]} \times h + m, \text{stride[1]} \times w + n) \end{aligned} \]- If - paddingis non-zero, then the input is implicitly zero-padded on both sides for- paddingnumber of points.- dilationcontrols the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what- dilationdoes.- The parameters - kernel_size,- stride,- padding,- dilationcan either be:- a single - int– in which case the same value is used for the height and width dimension
- a - tupleof two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
 - Parameters
- kernel_size – the size of the window to take a max over 
- stride – the stride of the window. Default value is - kernel_size
- padding – implicit zero padding to be added on both sides 
- dilation – a parameter that controls the stride of elements in the window 
- return_indices – if - True, will return the max indices along with the outputs. Useful for- torch.nn.MaxUnpool2dlater
- ceil_mode – when True, will use ceil instead of floor to compute the output shape 
 
 - Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\), where \[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor \]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor \]
 
 - Examples: - >>> # pool of square window of size=3, stride=2 >>> m = nn.MaxPool2d(3, stride=2) >>> # pool of non-square window >>> m = nn.MaxPool2d((3, 2), stride=(2, 1)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) 
MaxPool3d¶
- 
class torch.nn.MaxPool3d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)¶
- Applies a 3D max pooling over an input signal composed of several input planes. - In the simplest case, the output value of the layer with input size \((N, C, D, H, W)\), output \((N, C, D_{out}, H_{out}, W_{out})\) and - kernel_size\((kD, kH, kW)\) can be precisely described as:\[\begin{aligned} \text{out}(N_i, C_j, d, h, w) ={} & \max_{k=0, \ldots, kD-1} \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\ & \text{input}(N_i, C_j, \text{stride[0]} \times d + k, \text{stride[1]} \times h + m, \text{stride[2]} \times w + n) \end{aligned} \]- If - paddingis non-zero, then the input is implicitly zero-padded on both sides for- paddingnumber of points.- dilationcontrols the spacing between the kernel points. It is harder to describe, but this link has a nice visualization of what- dilationdoes.- The parameters - kernel_size,- stride,- padding,- dilationcan either be:- a single - int– in which case the same value is used for the depth, height and width dimension
- a - tupleof three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
 - Parameters
- kernel_size – the size of the window to take a max over 
- stride – the stride of the window. Default value is - kernel_size
- padding – implicit zero padding to be added on all three sides 
- dilation – a parameter that controls the stride of elements in the window 
- return_indices – if - True, will return the max indices along with the outputs. Useful for- torch.nn.MaxUnpool3dlater
- ceil_mode – when True, will use ceil instead of floor to compute the output shape 
 
 - Shape:
- Input: \((N, C, D_{in}, H_{in}, W_{in})\) 
- Output: \((N, C, D_{out}, H_{out}, W_{out})\), where \[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor \]\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor \]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor \]
 
 - Examples: - >>> # pool of square window of size=3, stride=2 >>> m = nn.MaxPool3d(3, stride=2) >>> # pool of non-square window >>> m = nn.MaxPool3d((3, 2, 2), stride=(2, 1, 2)) >>> input = torch.randn(20, 16, 50, 44, 31) >>> output = m(input) 
MaxUnpool1d¶
- 
class torch.nn.MaxUnpool1d(kernel_size, stride=None, padding=0)¶
- Computes a partial inverse of - MaxPool1d.- MaxPool1dis not fully invertible, since the non-maximal values are lost.- MaxUnpool1dtakes in as input the output of- MaxPool1dincluding the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero.- Note - MaxPool1dcan map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument- output_sizein the forward call. See the Inputs and Example below.- Parameters
- kernel_size (int or tuple) – Size of the max pooling window. 
- stride (int or tuple) – Stride of the max pooling window. It is set to kernel_size by default. 
- padding (int or tuple) – Padding that was added to the input 
 - Inputs:
- input: the input Tensor to invert 
- indices: the indices given out by - MaxPool1d
- output_size (optional): the targeted output size 
 
- Shape:
- Input: \((N, C, L_{in})\) 
- Output: \((N, C, L_{out})\), where \[L_{out} = (L_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0] \]- or as given by output_size in the call operator 
 
 - Example: - >>> pool = nn.MaxPool1d(2, stride=2, return_indices=True) >>> unpool = nn.MaxUnpool1d(2, stride=2) >>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8]]]) >>> output, indices = pool(input) >>> unpool(output, indices) tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]]) >>> # Example showcasing the use of output_size >>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8, 9]]]) >>> output, indices = pool(input) >>> unpool(output, indices, output_size=input.size()) tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8., 0.]]]) >>> unpool(output, indices) tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]]) 
MaxUnpool2d¶
- 
class torch.nn.MaxUnpool2d(kernel_size, stride=None, padding=0)¶
- Computes a partial inverse of - MaxPool2d. - MaxPool2d is not fully invertible, since the non-maximal values are lost. - MaxUnpool2d takes in as input the output of - MaxPool2d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero. - Note - MaxPool2d can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument - output_size in the forward call. See the Inputs and Example below. - Parameters
- kernel_size (int or tuple) – Size of the max pooling window.
- stride (int or tuple) – Stride of the max pooling window. It is set to - kernel_size by default.
- padding (int or tuple) – Padding that was added to the input
 - Inputs:
- input: the input Tensor to invert 
- indices: the indices given out by - MaxPool2d
- output_size (optional): the targeted output size 
 
- Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\), where \[H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0] \]\[W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1] \]- or as given by - output_size in the call operator
 
 - Example: - >>> pool = nn.MaxPool2d(2, stride=2, return_indices=True) >>> unpool = nn.MaxUnpool2d(2, stride=2) >>> input = torch.tensor([[[[ 1., 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12], [13, 14, 15, 16]]]]) >>> output, indices = pool(input) >>> unpool(output, indices) tensor([[[[ 0., 0., 0., 0.], [ 0., 6., 0., 8.], [ 0., 0., 0., 0.], [ 0., 14., 0., 16.]]]]) >>> # specify a different output size than input size >>> unpool(output, indices, output_size=torch.Size([1, 1, 5, 5])) tensor([[[[ 0., 0., 0., 0., 0.], [ 6., 0., 8., 0., 0.], [ 0., 0., 0., 14., 0.], [ 16., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.]]]]) 
MaxUnpool3d¶
- 
class torch.nn.MaxUnpool3d(kernel_size, stride=None, padding=0)¶
- Computes a partial inverse of - MaxPool3d. - MaxPool3d is not fully invertible, since the non-maximal values are lost. - MaxUnpool3d takes in as input the output of - MaxPool3d including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero. - Note - MaxPool3d can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument - output_size in the forward call. See the Inputs section below. - Parameters
- kernel_size (int or tuple) – Size of the max pooling window.
- stride (int or tuple) – Stride of the max pooling window. It is set to - kernel_size by default.
- padding (int or tuple) – Padding that was added to the input
 - Inputs:
- input: the input Tensor to invert 
- indices: the indices given out by - MaxPool3d
- output_size (optional): the targeted output size 
 
- Shape:
- Input: \((N, C, D_{in}, H_{in}, W_{in})\) 
- Output: \((N, C, D_{out}, H_{out}, W_{out})\), where \[D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\_size}[0] \]\[H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{kernel\_size}[1] \]\[W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{kernel\_size}[2] \]- or as given by - output_size in the call operator
 
 - Example: - >>> # pool of square window of size=3, stride=2 >>> pool = nn.MaxPool3d(3, stride=2, return_indices=True) >>> unpool = nn.MaxUnpool3d(3, stride=2) >>> output, indices = pool(torch.randn(20, 16, 51, 33, 15)) >>> unpooled_output = unpool(output, indices) >>> unpooled_output.size() torch.Size([20, 16, 51, 33, 15]) 
AvgPool1d¶
- 
class torch.nn.AvgPool1d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)¶
- Applies a 1D average pooling over an input signal composed of several input planes. - In the simplest case, the output value of the layer with input size \((N, C, L)\), output \((N, C, L_{out})\) and - kernel_size \(k\) can be precisely described as:\[\text{out}(N_i, C_j, l) = \frac{1}{k} \sum_{m=0}^{k-1} \text{input}(N_i, C_j, \text{stride} \times l + m)\]- If - padding is non-zero, then the input is implicitly zero-padded on both sides for - padding number of points. - The parameters - kernel_size, - stride, - padding can each be an - int or a one-element tuple. - Parameters
- kernel_size – the size of the window 
- stride – the stride of the window. Default value is - kernel_size
- padding – implicit zero padding to be added on both sides 
- ceil_mode – when True, will use ceil instead of floor to compute the output shape 
- count_include_pad – when True, will include the zero-padding in the averaging calculation 
 
 - Shape:
- Input: \((N, C, L_{in})\) 
- Output: \((N, C, L_{out})\), where \[L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{kernel\_size}}{\text{stride}} + 1\right\rfloor \]
 
 - Examples: - >>> # pool with window of size=3, stride=2 >>> m = nn.AvgPool1d(3, stride=2) >>> m(torch.tensor([[[1.,2,3,4,5,6,7]]])) tensor([[[ 2., 4., 6.]]]) 
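The effect of count_include_pad is easiest to see on a concrete tensor; the following is a sketch (not part of the original reference) contrasting the two settings with one point of zero padding:

>>> x = torch.tensor([[[1., 2., 3., 4.]]])
>>> # padded sequence is [0, 1, 2, 3, 4, 0]; the windows are [0, 1, 2] and [3, 4, 0]
>>> nn.AvgPool1d(3, stride=3, padding=1)(x)                           # zeros count toward the divisor
tensor([[[1.0000, 2.3333]]])
>>> nn.AvgPool1d(3, stride=3, padding=1, count_include_pad=False)(x)  # zeros excluded from the divisor
tensor([[[1.5000, 3.5000]]])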
AvgPool2d¶
- 
class torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)¶
- Applies a 2D average pooling over an input signal composed of several input planes. - In the simplest case, the output value of the layer with input size \((N, C, H, W)\), output \((N, C, H_{out}, W_{out})\) and - kernel_size \((kH, kW)\) can be precisely described as:\[\text{out}(N_i, C_j, h, w) = \frac{1}{kH \times kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} \text{input}(N_i, C_j, \text{stride}[0] \times h + m, \text{stride}[1] \times w + n)\]- If - padding is non-zero, then the input is implicitly zero-padded on both sides for - padding number of points. - The parameters - kernel_size, - stride, - padding can either be: - a single - int – in which case the same value is used for the height and width dimension
- a - tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
 - Parameters
- kernel_size – the size of the window 
- stride – the stride of the window. Default value is - kernel_size
- padding – implicit zero padding to be added on both sides 
- ceil_mode – when True, will use ceil instead of floor to compute the output shape 
- count_include_pad – when True, will include the zero-padding in the averaging calculation 
 
 - Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\), where \[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1\right\rfloor \]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1\right\rfloor \]
 
 - Examples: - >>> # pool of square window of size=3, stride=2 >>> m = nn.AvgPool2d(3, stride=2) >>> # pool of non-square window >>> m = nn.AvgPool2d((3, 2), stride=(2, 1)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) 
AvgPool3d¶
- 
class torch.nn.AvgPool3d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)¶
- Applies a 3D average pooling over an input signal composed of several input planes. - In the simplest case, the output value of the layer with input size \((N, C, D, H, W)\), output \((N, C, D_{out}, H_{out}, W_{out})\) and - kernel_size \((kD, kH, kW)\) can be precisely described as:\[\begin{aligned} \text{out}(N_i, C_j, d, h, w) ={} & \sum_{k=0}^{kD-1} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} \\ & \frac{\text{input}(N_i, C_j, \text{stride}[0] \times d + k, \text{stride}[1] \times h + m, \text{stride}[2] \times w + n)} {kD \times kH \times kW} \end{aligned} \]- If - padding is non-zero, then the input is implicitly zero-padded on all three sides for - padding number of points. - The parameters - kernel_size, - stride can either be: - a single - int – in which case the same value is used for the depth, height and width dimension
- a - tuple of three ints – in which case, the first int is used for the depth dimension, the second int for the height dimension and the third int for the width dimension
 - Parameters
- kernel_size – the size of the window 
- stride – the stride of the window. Default value is - kernel_size
- padding – implicit zero padding to be added on all three sides 
- ceil_mode – when True, will use ceil instead of floor to compute the output shape 
- count_include_pad – when True, will include the zero-padding in the averaging calculation 
 
 - Shape:
- Input: \((N, C, D_{in}, H_{in}, W_{in})\) 
- Output: \((N, C, D_{out}, H_{out}, W_{out})\), where \[D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1\right\rfloor \]\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1\right\rfloor \]\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{kernel\_size}[2]}{\text{stride}[2]} + 1\right\rfloor \]
 
 - Examples: - >>> # pool of cubic window of size=3, stride=2 >>> m = nn.AvgPool3d(3, stride=2) >>> # pool of non-cubic window >>> m = nn.AvgPool3d((3, 2, 2), stride=(2, 1, 2)) >>> input = torch.randn(20, 16, 50, 44, 31) >>> output = m(input)
FractionalMaxPool2d¶
- 
class torch.nn.FractionalMaxPool2d(kernel_size, output_size=None, output_ratio=None, return_indices=False, _random_samples=None)¶
- Applies a 2D fractional max pooling over an input signal composed of several input planes. - Fractional MaxPooling is described in detail in the paper Fractional MaxPooling by Ben Graham - The max-pooling operation is applied in \(kH \times kW\) regions by a stochastic step size determined by the target output size. The number of output features is equal to the number of input planes. - Parameters
- kernel_size – the size of the window to take a max over. Can be a single number k (for a square kernel of k x k) or a tuple (kh, kw) 
- output_size – the target output size of the image of the form oH x oW. Can be a tuple (oH, oW) or a single number oH for a square image oH x oH 
- output_ratio – If one wants to have an output size as a ratio of the input size, this option can be given. This has to be a number or tuple in the range (0, 1) 
- return_indices – if - True, will return the indices along with the outputs. Useful to pass to- nn.MaxUnpool2d(). Default:- False
 
 - Examples - >>> # pool of square window of size=3, and target output size 13x12 >>> m = nn.FractionalMaxPool2d(3, output_size=(13, 12)) >>> # pool of square window and target output size being half of input image size >>> m = nn.FractionalMaxPool2d(3, output_ratio=(0.5, 0.5)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) 
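As an illustration (a sketch, not part of the original reference), output_ratio is resolved to a concrete output size by scaling the input's spatial size, so the two constructions below produce the same output shape:

>>> input = torch.randn(20, 16, 50, 32)
>>> # output_ratio=(0.5, 0.5) on a 50x32 input resolves to a 25x16 output
>>> nn.FractionalMaxPool2d(3, output_ratio=(0.5, 0.5))(input).shape
torch.Size([20, 16, 25, 16])
>>> nn.FractionalMaxPool2d(3, output_size=(25, 16))(input).shape
torch.Size([20, 16, 25, 16])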
LPPool1d¶
- 
class torch.nn.LPPool1d(norm_type, kernel_size, stride=None, ceil_mode=False)¶
- Applies a 1D power-average pooling over an input signal composed of several input planes. - On each window, the function computed is: \[f(X) = \sqrt[p]{\sum_{x \in X} x^{p}} \]- At p = \(\infty\), one gets Max Pooling 
- At p = 1, one gets Sum Pooling (which is proportional to Average Pooling) 
 - Note - If the sum to the power of p is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case. - Parameters
- kernel_size – a single int, the size of the window 
- stride – a single int, the stride of the window. Default value is - kernel_size
- ceil_mode – when True, will use ceil instead of floor to compute the output shape 
 
 - Shape:
- Input: \((N, C, L_{in})\) 
- Output: \((N, C, L_{out})\), where \[L_{out} = \left\lfloor\frac{L_{in} - \text{kernel\_size}}{\text{stride}} + 1\right\rfloor \]
 
- Examples:
- >>> # power-2 pool of window of length 3, with stride 2. >>> m = nn.LPPool1d(2, 3, stride=2) >>> input = torch.randn(20, 16, 50) >>> output = m(input) 
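To connect the module to the formula above, a small hand check (a sketch, not part of the original reference) for \(p = 2\) with a single window covering three positive values:

>>> x = torch.tensor([[[1., 2., 3.]]])
>>> m = nn.LPPool1d(2, 3)       # norm_type=2, kernel_size=3, stride defaults to kernel_size
>>> m(x)
tensor([[[3.7417]]])
>>> x.pow(2).sum().sqrt()       # (1 + 4 + 9) ** 0.5
tensor(3.7417)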
 
LPPool2d¶
- 
class torch.nn.LPPool2d(norm_type, kernel_size, stride=None, ceil_mode=False)¶
- Applies a 2D power-average pooling over an input signal composed of several input planes. - On each window, the function computed is: \[f(X) = \sqrt[p]{\sum_{x \in X} x^{p}} \]- At p = \(\infty\), one gets Max Pooling 
- At p = 1, one gets Sum Pooling (which is proportional to average pooling) 
 - The parameters - kernel_size, - stride can either be: - a single - int – in which case the same value is used for the height and width dimension
- a - tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
 - Note - If the sum to the power of p is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case. - Parameters
- kernel_size – the size of the window 
- stride – the stride of the window. Default value is - kernel_size
- ceil_mode – when True, will use ceil instead of floor to compute the output shape 
 
 - Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\), where \[H_{out} = \left\lfloor\frac{H_{in} - \text{kernel\_size}[0]}{\text{stride}[0]} + 1\right\rfloor \]\[W_{out} = \left\lfloor\frac{W_{in} - \text{kernel\_size}[1]}{\text{stride}[1]} + 1\right\rfloor \]
 
 - Examples: - >>> # power-2 pool of square window of size=3, stride=2 >>> m = nn.LPPool2d(2, 3, stride=2) >>> # pool of non-square window of power 1.2 >>> m = nn.LPPool2d(1.2, (3, 2), stride=(2, 1)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) 
AdaptiveMaxPool1d¶
- 
class torch.nn.AdaptiveMaxPool1d(output_size, return_indices=False)¶
- Applies a 1D adaptive max pooling over an input signal composed of several input planes. - The output size is H, for any input size. The number of output features is equal to the number of input planes. - Parameters
- output_size – the target output size H 
- return_indices – if - True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool1d. Default:- False
 
 - Examples - >>> # target output size of 5 >>> m = nn.AdaptiveMaxPool1d(5) >>> input = torch.randn(1, 64, 8) >>> output = m(input) 
AdaptiveMaxPool2d¶
- 
class torch.nn.AdaptiveMaxPool2d(output_size, return_indices=False)¶
- Applies a 2D adaptive max pooling over an input signal composed of several input planes. - The output is of size H x W, for any input size. The number of output features is equal to the number of input planes. - Parameters
- output_size – the target output size of the image of the form H x W. Can be a tuple (H, W) or a single H for a square image H x H. H and W can be either an - int, or - None which means the size will be the same as that of the input.
- return_indices – if - True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool2d. Default:- False
 
 - Examples - >>> # target output size of 5x7 >>> m = nn.AdaptiveMaxPool2d((5,7)) >>> input = torch.randn(1, 64, 8, 9) >>> output = m(input) >>> # target output size of 7x7 (square) >>> m = nn.AdaptiveMaxPool2d(7) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) >>> # target output size of 10x7 >>> m = nn.AdaptiveMaxPool2d((None, 7)) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) 
AdaptiveMaxPool3d¶
- 
class torch.nn.AdaptiveMaxPool3d(output_size, return_indices=False)¶
- Applies a 3D adaptive max pooling over an input signal composed of several input planes. - The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes. - Parameters
- output_size – the target output size of the image of the form D x H x W. Can be a tuple (D, H, W) or a single D for a cube D x D x D. D, H and W can be either an - int, or - None which means the size will be the same as that of the input.
- return_indices – if - True, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool3d. Default:- False
 
 - Examples - >>> # target output size of 5x7x9 >>> m = nn.AdaptiveMaxPool3d((5,7,9)) >>> input = torch.randn(1, 64, 8, 9, 10) >>> output = m(input) >>> # target output size of 7x7x7 (cube) >>> m = nn.AdaptiveMaxPool3d(7) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) >>> # target output size of 7x9x8 >>> m = nn.AdaptiveMaxPool3d((7, None, None)) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) 
AdaptiveAvgPool1d¶
- 
class torch.nn.AdaptiveAvgPool1d(output_size)¶
- Applies a 1D adaptive average pooling over an input signal composed of several input planes. - The output size is H, for any input size. The number of output features is equal to the number of input planes. - Parameters
- output_size – the target output size H 
 - Examples - >>> # target output size of 5 >>> m = nn.AdaptiveAvgPool1d(5) >>> input = torch.randn(1, 64, 8) >>> output = m(input) 
AdaptiveAvgPool2d¶
- 
class torch.nn.AdaptiveAvgPool2d(output_size)¶
- Applies a 2D adaptive average pooling over an input signal composed of several input planes. - The output is of size H x W, for any input size. The number of output features is equal to the number of input planes. - Parameters
- output_size – the target output size of the image of the form H x W. Can be a tuple (H, W) or a single H for a square image H x H. H and W can be either an - int, or - None which means the size will be the same as that of the input.
 - Examples - >>> # target output size of 5x7 >>> m = nn.AdaptiveAvgPool2d((5,7)) >>> input = torch.randn(1, 64, 8, 9) >>> output = m(input) >>> # target output size of 7x7 (square) >>> m = nn.AdaptiveAvgPool2d(7) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) >>> # target output size of 10x7 >>> m = nn.AdaptiveAvgPool2d((None, 7)) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input)
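A common use of adaptive average pooling is global average pooling with output size 1; as a sketch (not part of the original reference), this is equivalent to averaging over the spatial dimensions:

>>> input = torch.randn(1, 64, 10, 9)
>>> gap = nn.AdaptiveAvgPool2d(1)        # output size 1x1, regardless of input size
>>> torch.allclose(gap(input), input.view(1, 64, -1).mean(-1).view(1, 64, 1, 1))
True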
AdaptiveAvgPool3d¶
- 
class torch.nn.AdaptiveAvgPool3d(output_size)¶
- Applies a 3D adaptive average pooling over an input signal composed of several input planes. - The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes. - Parameters
- output_size – the target output size of the form D x H x W. Can be a tuple (D, H, W) or a single number D for a cube D x D x D. D, H and W can be either an - int, or - None which means the size will be the same as that of the input.
 - Examples - >>> # target output size of 5x7x9 >>> m = nn.AdaptiveAvgPool3d((5,7,9)) >>> input = torch.randn(1, 64, 8, 9, 10) >>> output = m(input) >>> # target output size of 7x7x7 (cube) >>> m = nn.AdaptiveAvgPool3d(7) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) >>> # target output size of 7x9x8 >>> m = nn.AdaptiveAvgPool3d((7, None, None)) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input)
Padding layers¶
ReflectionPad1d¶
- 
class torch.nn.ReflectionPad1d(padding)¶
- Pads the input tensor using the reflection of the input boundary. - For N-dimensional padding, use - torch.nn.functional.pad().- Parameters
- padding (int, tuple) – the size of the padding. If an int, uses the same padding in all boundaries. If a 2-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\))
 - Shape:
- Input: \((N, C, W_{in})\) 
- Output: \((N, C, W_{out})\) where - \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\) 
 
 - Examples: - >>> m = nn.ReflectionPad1d(2) >>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4) >>> input tensor([[[0., 1., 2., 3.], [4., 5., 6., 7.]]]) >>> m(input) tensor([[[2., 1., 0., 1., 2., 3., 2., 1.], [6., 5., 4., 5., 6., 7., 6., 5.]]]) >>> # using different paddings for different sides >>> m = nn.ReflectionPad1d((3, 1)) >>> m(input) tensor([[[3., 2., 1., 0., 1., 2., 3., 2.], [7., 6., 5., 4., 5., 6., 7., 6.]]]) 
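The functional form mentioned above gives identical results; as a sketch (assuming torch.nn.functional is imported as F), reflection padding is also available through F.pad with mode='reflect':

>>> import torch.nn.functional as F
>>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4)
>>> m = nn.ReflectionPad1d((3, 1))
>>> torch.equal(m(input), F.pad(input, (3, 1), mode='reflect'))
True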
ReflectionPad2d¶
- 
class torch.nn.ReflectionPad2d(padding)¶
- Pads the input tensor using the reflection of the input boundary. - For N-dimensional padding, use - torch.nn.functional.pad().- Parameters
- padding (int, tuple) – the size of the padding. If an int, uses the same padding in all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\))
 - Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\) where - \(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\) - \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\) 
 
 - Examples: - >>> m = nn.ReflectionPad2d(2) >>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3) >>> input tensor([[[[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]]]]) >>> m(input) tensor([[[[8., 7., 6., 7., 8., 7., 6.], [5., 4., 3., 4., 5., 4., 3.], [2., 1., 0., 1., 2., 1., 0.], [5., 4., 3., 4., 5., 4., 3.], [8., 7., 6., 7., 8., 7., 6.], [5., 4., 3., 4., 5., 4., 3.], [2., 1., 0., 1., 2., 1., 0.]]]]) >>> # using different paddings for different sides >>> m = nn.ReflectionPad2d((1, 1, 2, 0)) >>> m(input) tensor([[[[7., 6., 7., 8., 7.], [4., 3., 4., 5., 4.], [1., 0., 1., 2., 1.], [4., 3., 4., 5., 4.], [7., 6., 7., 8., 7.]]]]) 
ReplicationPad1d¶
- 
class torch.nn.ReplicationPad1d(padding)¶
- Pads the input tensor using replication of the input boundary. - For N-dimensional padding, use - torch.nn.functional.pad().- Parameters
- padding (int, tuple) – the size of the padding. If an int, uses the same padding in all boundaries. If a 2-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\))
 - Shape:
- Input: \((N, C, W_{in})\) 
- Output: \((N, C, W_{out})\) where - \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\) 
 
 - Examples: - >>> m = nn.ReplicationPad1d(2) >>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4) >>> input tensor([[[0., 1., 2., 3.], [4., 5., 6., 7.]]]) >>> m(input) tensor([[[0., 0., 0., 1., 2., 3., 3., 3.], [4., 4., 4., 5., 6., 7., 7., 7.]]]) >>> # using different paddings for different sides >>> m = nn.ReplicationPad1d((3, 1)) >>> m(input) tensor([[[0., 0., 0., 0., 1., 2., 3., 3.], [4., 4., 4., 4., 5., 6., 7., 7.]]]) 
ReplicationPad2d¶
- 
class torch.nn.ReplicationPad2d(padding)¶
- Pads the input tensor using replication of the input boundary. - For N-dimensional padding, use - torch.nn.functional.pad().- Parameters
- padding (int, tuple) – the size of the padding. If an int, uses the same padding in all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\))
 - Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\) where - \(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\) - \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\) 
 
 - Examples: - >>> m = nn.ReplicationPad2d(2) >>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3) >>> input tensor([[[[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]]]]) >>> m(input) tensor([[[[0., 0., 0., 1., 2., 2., 2.], [0., 0., 0., 1., 2., 2., 2.], [0., 0., 0., 1., 2., 2., 2.], [3., 3., 3., 4., 5., 5., 5.], [6., 6., 6., 7., 8., 8., 8.], [6., 6., 6., 7., 8., 8., 8.], [6., 6., 6., 7., 8., 8., 8.]]]]) >>> # using different paddings for different sides >>> m = nn.ReplicationPad2d((1, 1, 2, 0)) >>> m(input) tensor([[[[0., 0., 1., 2., 2.], [0., 0., 1., 2., 2.], [0., 0., 1., 2., 2.], [3., 3., 4., 5., 5.], [6., 6., 7., 8., 8.]]]]) 
ReplicationPad3d¶
- 
class torch.nn.ReplicationPad3d(padding)¶
- Pads the input tensor using replication of the input boundary. - For N-dimensional padding, use - torch.nn.functional.pad().- Parameters
- padding (int, tuple) – the size of the padding. If an int, uses the same padding in all boundaries. If a 6-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\), \(\text{padding\_front}\), \(\text{padding\_back}\))
 - Shape:
- Input: \((N, C, D_{in}, H_{in}, W_{in})\) 
- Output: \((N, C, D_{out}, H_{out}, W_{out})\) where - \(D_{out} = D_{in} + \text{padding\_front} + \text{padding\_back}\) - \(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\) - \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\) 
 
 - Examples: - >>> m = nn.ReplicationPad3d(3) >>> input = torch.randn(16, 3, 8, 320, 480) >>> output = m(input) >>> # using different paddings for different sides >>> m = nn.ReplicationPad3d((3, 3, 6, 6, 1, 1)) >>> output = m(input) 
ZeroPad2d¶
- 
class torch.nn.ZeroPad2d(padding)¶
- Pads the input tensor boundaries with zero. - For N-dimensional padding, use - torch.nn.functional.pad().- Parameters
- padding (int, tuple) – the size of the padding. If an int, uses the same padding in all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\))
 - Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\) where - \(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\) - \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\) 
 
 - Examples: - >>> m = nn.ZeroPad2d(2) >>> input = torch.randn(1, 1, 3, 3) >>> input tensor([[[[-0.1678, -0.4418, 1.9466], [ 0.9604, -0.4219, -0.5241], [-0.9162, -0.5436, -0.6446]]]]) >>> m(input) tensor([[[[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, -0.1678, -0.4418, 1.9466, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.9604, -0.4219, -0.5241, 0.0000, 0.0000], [ 0.0000, 0.0000, -0.9162, -0.5436, -0.6446, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]]) >>> # using different paddings for different sides >>> m = nn.ZeroPad2d((1, 1, 2, 0)) >>> m(input) tensor([[[[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.0000, -0.1678, -0.4418, 1.9466, 0.0000], [ 0.0000, 0.9604, -0.4219, -0.5241, 0.0000], [ 0.0000, -0.9162, -0.5436, -0.6446, 0.0000]]]]) 
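ZeroPad2d is constant padding with value 0; a minimal sketch (not part of the original reference) of the equivalence with torch.nn.functional.pad(), whose default mode is 'constant' with value 0:

>>> import torch.nn.functional as F
>>> input = torch.randn(1, 1, 3, 3)
>>> torch.equal(nn.ZeroPad2d((1, 1, 2, 0))(input), F.pad(input, (1, 1, 2, 0)))
True
>>> torch.equal(nn.ZeroPad2d(2)(input), nn.ConstantPad2d(2, 0.)(input))
True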
ConstantPad1d¶
- 
class torch.nn.ConstantPad1d(padding, value)¶
- Pads the input tensor boundaries with a constant value. - For N-dimensional padding, use - torch.nn.functional.pad().- Parameters
- padding (int, tuple) – the size of the padding. If an int, uses the same padding in both boundaries. If a 2-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\))
 - Shape:
- Input: \((N, C, W_{in})\) 
- Output: \((N, C, W_{out})\) where - \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\) 
 
 - Examples: - >>> m = nn.ConstantPad1d(2, 3.5) >>> input = torch.randn(1, 2, 4) >>> input tensor([[[-1.0491, -0.7152, -0.0749, 0.8530], [-1.3287, 1.8966, 0.1466, -0.2771]]]) >>> m(input) tensor([[[ 3.5000, 3.5000, -1.0491, -0.7152, -0.0749, 0.8530, 3.5000, 3.5000], [ 3.5000, 3.5000, -1.3287, 1.8966, 0.1466, -0.2771, 3.5000, 3.5000]]]) >>> m = nn.ConstantPad1d(2, 3.5) >>> input = torch.randn(1, 2, 3) >>> input tensor([[[ 1.6616, 1.4523, -1.1255], [-3.6372, 0.1182, -1.8652]]]) >>> m(input) tensor([[[ 3.5000, 3.5000, 1.6616, 1.4523, -1.1255, 3.5000, 3.5000], [ 3.5000, 3.5000, -3.6372, 0.1182, -1.8652, 3.5000, 3.5000]]]) >>> # using different paddings for different sides >>> m = nn.ConstantPad1d((3, 1), 3.5) >>> m(input) tensor([[[ 3.5000, 3.5000, 3.5000, 1.6616, 1.4523, -1.1255, 3.5000], [ 3.5000, 3.5000, 3.5000, -3.6372, 0.1182, -1.8652, 3.5000]]]) 
ConstantPad2d¶
- 
class torch.nn.ConstantPad2d(padding, value)¶
- Pads the input tensor boundaries with a constant value. - For N-dimensional padding, use - torch.nn.functional.pad().- Parameters
- padding (int, tuple) – the size of the padding. If an int, uses the same padding in all boundaries. If a 4-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\))
 - Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\) where - \(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\) - \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\) 
 
 - Examples: - >>> m = nn.ConstantPad2d(2, 3.5) >>> input = torch.randn(1, 2, 2) >>> input tensor([[[ 1.6585, 0.4320], [-0.8701, -0.4649]]]) >>> m(input) tensor([[[ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 1.6585, 0.4320, 3.5000, 3.5000], [ 3.5000, 3.5000, -0.8701, -0.4649, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000]]]) >>> # using different paddings for different sides >>> m = nn.ConstantPad2d((3, 0, 2, 1), 3.5) >>> m(input) tensor([[[ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 1.6585, 0.4320], [ 3.5000, 3.5000, 3.5000, -0.8701, -0.4649], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000]]]) 
ConstantPad3d¶
- 
class torch.nn.ConstantPad3d(padding, value)¶
- Pads the input tensor boundaries with a constant value. - For N-dimensional padding, use - torch.nn.functional.pad().- Parameters
- padding (int, tuple) – the size of the padding. If an int, uses the same padding in all boundaries. If a 6-tuple, uses (\(\text{padding\_left}\), \(\text{padding\_right}\), \(\text{padding\_top}\), \(\text{padding\_bottom}\), \(\text{padding\_front}\), \(\text{padding\_back}\))
 - Shape:
- Input: \((N, C, D_{in}, H_{in}, W_{in})\) 
- Output: \((N, C, D_{out}, H_{out}, W_{out})\) where - \(D_{out} = D_{in} + \text{padding\_front} + \text{padding\_back}\) - \(H_{out} = H_{in} + \text{padding\_top} + \text{padding\_bottom}\) - \(W_{out} = W_{in} + \text{padding\_left} + \text{padding\_right}\) 
 
 - Examples: - >>> m = nn.ConstantPad3d(3, 3.5) >>> input = torch.randn(16, 3, 10, 20, 30) >>> output = m(input) >>> # using different paddings for different sides >>> m = nn.ConstantPad3d((3, 3, 6, 6, 0, 1), 3.5) >>> output = m(input) 
Non-linear activations (weighted sum, nonlinearity)¶
ELU¶
- 
class torch.nn.ELU(alpha=1.0, inplace=False)¶
- Applies the element-wise function: \[\text{ELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x) - 1)) \]- Parameters
- alpha – the \(\alpha\) value for the ELU formulation. Default: 1.0 
- inplace – can optionally do the operation in-place. Default: - False
 
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.ELU() >>> input = torch.randn(2) >>> output = m(input) 
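To tie the module to the formula, a hand check (a sketch, not part of the original reference) on a fixed input with the default \(\alpha = 1.0\):

>>> x = torch.tensor([-1.0, 0.0, 1.0])
>>> nn.ELU()(x)
tensor([-0.6321,  0.0000,  1.0000])
>>> torch.where(x > 0, x, x.exp() - 1)   # max(0, x) + min(0, exp(x) - 1) for alpha = 1
tensor([-0.6321,  0.0000,  1.0000])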
Hardshrink¶
- 
class torch.nn.Hardshrink(lambd=0.5)¶
- Applies the hard shrinkage function element-wise: \[\text{HardShrink}(x) = \begin{cases} x, & \text{ if } x > \lambda \\ x, & \text{ if } x < -\lambda \\ 0, & \text{ otherwise } \end{cases} \]- Parameters
- lambd – the \(\lambda\) value for the Hardshrink formulation. Default: 0.5 
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.Hardshrink() >>> input = torch.randn(2) >>> output = m(input) 
Hardtanh¶
- 
class torch.nn.Hardtanh(min_val=-1.0, max_val=1.0, inplace=False, min_value=None, max_value=None)¶
- Applies the HardTanh function element-wise - HardTanh is defined as: \[\text{HardTanh}(x) = \begin{cases} 1 & \text{ if } x > 1 \\ -1 & \text{ if } x < -1 \\ x & \text{ otherwise } \\ \end{cases} \]- The range of the linear region \([-1, 1]\) can be adjusted using - min_val and - max_val. - Parameters
- min_val – minimum value of the linear region range. Default: -1 
- max_val – maximum value of the linear region range. Default: 1 
- inplace – can optionally do the operation in-place. Default: - False
 
 - Keyword arguments - min_value and - max_value have been deprecated in favor of - min_val and - max_val. - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.Hardtanh(-2, 2) >>> input = torch.randn(2) >>> output = m(input) 
LeakyReLU¶
- 
class torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)¶
- Applies the element-wise function: \[\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x) \]- or \[\text{LeakyRELU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ \text{negative\_slope} \times x, & \text{ otherwise } \end{cases} \]- Parameters
- negative_slope – Controls the angle of the negative slope. Default: 1e-2 
- inplace – can optionally do the operation in-place. Default: - False
 
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.LeakyReLU(0.1) >>> input = torch.randn(2) >>> output = m(input) 
LogSigmoid¶
- 
class torch.nn.LogSigmoid¶
- Applies the element-wise function: \[\text{LogSigmoid}(x) = \log\left(\frac{ 1 }{ 1 + \exp(-x)}\right) \]- Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.LogSigmoid() >>> input = torch.randn(2) >>> output = m(input) 
PReLU¶
- 
class torch.nn.PReLU(num_parameters=1, init=0.25)¶
- Applies the element-wise function: \[\text{PReLU}(x) = \max(0,x) + a * \min(0,x) \]- or \[\text{PReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\ ax, & \text{ otherwise } \end{cases} \]- Here \(a\) is a learnable parameter. When called without arguments, nn.PReLU() uses a single parameter \(a\) across all input channels. If called with nn.PReLU(nChannels), a separate \(a\) is used for each input channel. - Note - weight decay should not be used when learning \(a\) for good performance. - Note - Channel dim is the 2nd dim of input. When input has dims < 2, then there is no channel dim and the number of channels = 1. - Parameters
- num_parameters (int) – number of \(a\) to learn. Although it takes an int as input, there are only two legitimate values: 1, or the number of channels of the input. Default: 1
- init (float) – the initial value of \(a\). Default: 0.25
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
 - Variables
- weight (Tensor) – the learnable weights of shape ( - num_parameters).
   - Examples: - >>> m = nn.PReLU() >>> input = torch.randn(2) >>> output = m(input) 
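Since \(a\) is learnable, it appears among the module's parameters; a small sketch (not part of the original reference) of the shared and per-channel variants:

>>> m = nn.PReLU()            # a single a shared by all channels, initialized to 0.25
>>> m.weight
Parameter containing:
tensor([0.2500], requires_grad=True)
>>> m = nn.PReLU(64)          # a separate a per input channel
>>> m.weight.shape
torch.Size([64])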
ReLU¶
- 
class torch.nn.ReLU(inplace=False)¶
- Applies the rectified linear unit function element-wise: - \(\text{ReLU}(x)= \max(0, x)\) - Parameters
- inplace – can optionally do the operation in-place. Default: - False
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.ReLU() >>> input = torch.randn(2) >>> output = m(input) An implementation of CReLU - https://arxiv.org/abs/1603.05201 >>> m = nn.ReLU() >>> input = torch.randn(2).unsqueeze(0) >>> output = torch.cat((m(input),m(-input))) 
ReLU6¶
- 
class torch.nn.ReLU6(inplace=False)¶
- Applies the element-wise function: \[\text{ReLU6}(x) = \min(\max(0,x), 6) \]- Parameters
- inplace – can optionally do the operation in-place. Default: - False
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.ReLU6() >>> input = torch.randn(2) >>> output = m(input) 
RReLU¶
- 
class torch.nn.RReLU(lower=0.125, upper=0.3333333333333333, inplace=False)¶
- Applies the randomized leaky rectified linear unit function, element-wise, as described in the paper: - Empirical Evaluation of Rectified Activations in Convolutional Network. - The function is defined as: \[\text{RReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ ax & \text{ otherwise } \end{cases} \]- where \(a\) is randomly sampled from the uniform distribution \(\mathcal{U}(\text{lower}, \text{upper})\). - Parameters
- lower – lower bound of the uniform distribution. Default: \(\frac{1}{8}\) 
- upper – upper bound of the uniform distribution. Default: \(\frac{1}{3}\) 
- inplace – can optionally do the operation in-place. Default: - False
 
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
 - Examples: - >>> m = nn.RReLU(0.1, 0.3) >>> input = torch.randn(2) >>> output = m(input) 
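The randomness is only active in training mode; in evaluation mode the fixed slope \((\text{lower} + \text{upper}) / 2\) is used instead. A sketch (not part of the original reference; the sampled training-mode value below is illustrative):

>>> m = nn.RReLU(0.1, 0.3)     # modules start in training mode
>>> x = torch.tensor([-1.0, 1.0])
>>> m(x)                       # slope of the negative element sampled from U(0.1, 0.3)
tensor([-0.2057,  1.0000])
>>> m = m.eval()
>>> m(x)                       # deterministic: slope is (0.1 + 0.3) / 2 = 0.2
tensor([-0.2000,  1.0000])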
SELU¶
- 
class torch.nn.SELU(inplace=False)¶
- Applied element-wise, as: \[\text{SELU}(x) = \text{scale} * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1))) \]- with \(\alpha = 1.6732632423543772848170429916717\) and \(\text{scale} = 1.0507009873554804934193349852946\). - More details can be found in the paper Self-Normalizing Neural Networks . - Parameters
- inplace (bool, optional) – can optionally do the operation in-place. Default: - False
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.SELU() >>> input = torch.randn(2) >>> output = m(input) 
CELU¶
- 
class torch.nn.CELU(alpha=1.0, inplace=False)¶
- Applies the element-wise function: \[\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1)) \]- More details can be found in the paper Continuously Differentiable Exponential Linear Units . - Parameters
- alpha – the \(\alpha\) value for the CELU formulation. Default: 1.0 
- inplace – can optionally do the operation in-place. Default: - False
 
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.CELU() >>> input = torch.randn(2) >>> output = m(input) 
Sigmoid¶
- 
class torch.nn.Sigmoid¶
- Applies the element-wise function: \[\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)} \]- Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.Sigmoid() >>> input = torch.randn(2) >>> output = m(input) 
Softplus¶
- 
class torch.nn.Softplus(beta=1, threshold=20)¶
- Applies the element-wise function: \[\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x)) \]- SoftPlus is a smooth approximation to the ReLU function and can be used to constrain the output of a machine to always be positive. - For numerical stability the implementation reverts to the linear function for inputs above a certain value. - Parameters
- beta – the \(\beta\) value for the Softplus formulation. Default: 1 
- threshold – values above this revert to a linear function. Default: 20 
 
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.Softplus() >>> input = torch.randn(2) >>> output = m(input) 
Softshrink¶
- 
class torch.nn.Softshrink(lambd=0.5)¶
- Applies the soft shrinkage function elementwise: \[\text{SoftShrinkage}(x) = \begin{cases} x - \lambda, & \text{ if } x > \lambda \\ x + \lambda, & \text{ if } x < -\lambda \\ 0, & \text{ otherwise } \end{cases} \]- Parameters
- lambd – the \(\lambda\) value for the Softshrink formulation. Default: 0.5 
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.Softshrink() >>> input = torch.randn(2) >>> output = m(input) 
Softsign¶
- 
class torch.nn.Softsign¶
- Applies the element-wise function: \[\text{SoftSign}(x) = \frac{x}{ 1 + |x|} \]- Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.Softsign() >>> input = torch.randn(2) >>> output = m(input) 
Tanh¶
- 
class torch.nn.Tanh¶
- Applies the element-wise function: \[\text{Tanh}(x) = \tanh(x) = \frac{e^x - e^{-x}} {e^x + e^{-x}} \]- Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.Tanh() >>> input = torch.randn(2) >>> output = m(input) 
Tanhshrink¶
- 
class torch.nn.Tanhshrink¶
- Applies the element-wise function: \[\text{Tanhshrink}(x) = x - \text{Tanh}(x) \]- Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
   - Examples: - >>> m = nn.Tanhshrink() >>> input = torch.randn(2) >>> output = m(input) 
Threshold¶
- 
class torch.nn.Threshold(threshold, value, inplace=False)¶
- Thresholds each element of the input Tensor. - Threshold is defined as: \[y = \begin{cases} x, &\text{ if } x > \text{threshold} \\ \text{value}, &\text{ otherwise } \end{cases} \]- Parameters
- threshold – The value to threshold at 
- value – The value to replace with 
- inplace – can optionally do the operation in-place. Default: - False
 
 - Shape:
- Input: \((N, *)\) where * means, any number of additional dimensions 
- Output: \((N, *)\), same shape as the input 
 
 - Examples: - >>> m = nn.Threshold(0.1, 20) >>> input = torch.randn(2) >>> output = m(input) 
Non-linear activations (other)¶
Softmin¶
- 
class torch.nn.Softmin(dim=None)¶
- Applies the Softmin function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0, 1] and sum to 1. - Softmin is defined as: \[\text{Softmin}(x_{i}) = \frac{\exp(-x_i)}{\sum_j \exp(-x_j)} \]- Shape:
- Input: \((*)\) where * means, any number of additional dimensions 
- Output: \((*)\), same shape as the input 
 
 - Parameters
- dim (int) – A dimension along which Softmin will be computed (so every slice along dim will sum to 1). 
- Returns
- a Tensor of the same dimension and shape as the input, with values in the range [0, 1] 
 - Examples: - >>> m = nn.Softmin() >>> input = torch.randn(2, 3) >>> output = m(input) 
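Softmin relates to Softmax by negating the input; a quick sketch (not part of the original reference):

>>> x = torch.randn(2, 3)
>>> torch.allclose(nn.Softmin(dim=1)(x), nn.Softmax(dim=1)(-x))
True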
Softmax¶
- 
class torch.nn.Softmax(dim=None)¶
- Applies the Softmax function to an n-dimensional input Tensor, rescaling it so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1. - Softmax is defined as: \[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)} \]- Shape:
- Input: \((*)\) where * means, any number of additional dimensions 
- Output: \((*)\), same shape as the input 
 
 - Returns
- a Tensor of the same dimension and shape as the input with values in the range [0, 1] 
- Parameters
- dim (int) – A dimension along which Softmax will be computed (so every slice along dim will sum to 1). 
 - Note - This module doesn’t work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use LogSoftmax instead (it’s faster and has better numerical properties). - Examples: - >>> m = nn.Softmax() >>> input = torch.randn(2, 3) >>> output = m(input) 
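The pairing recommended in the note above is exactly what nn.CrossEntropyLoss fuses internally; as a sketch (not part of the original reference):

>>> logits = torch.randn(4, 10)
>>> target = torch.randint(10, (4,))
>>> log_probs = nn.LogSoftmax(dim=1)(logits)
>>> torch.allclose(nn.NLLLoss()(log_probs, target), nn.CrossEntropyLoss()(logits, target))
True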
Softmax2d¶
- 
class torch.nn.Softmax2d¶
- Applies SoftMax over features to each spatial location. - When given an image of - Channels x Height x Width, it will apply Softmax to each location \((Channels, h_i, w_j)\)- Shape:
- Input: \((N, C, H, W)\) 
- Output: \((N, C, H, W)\) (same shape as input) 
 
 - Returns
- a Tensor of the same dimension and shape as the input with values in the range [0, 1] 
 - Examples: - >>> m = nn.Softmax2d() >>> # you softmax over the 2nd dimension >>> input = torch.randn(2, 3, 12, 13) >>> output = m(input) 
LogSoftmax¶
- 
class torch.nn.LogSoftmax(dim=None)¶
- Applies the \(\log(\text{Softmax}(x))\) function to an n-dimensional input Tensor. The LogSoftmax formulation can be simplified as: \[\text{LogSoftmax}(x_{i}) = \log\left(\frac{\exp(x_i) }{ \sum_j \exp(x_j)} \right) \]- Shape:
- Input: \((*)\) where * means, any number of additional dimensions 
- Output: \((*)\), same shape as the input 
 
 - Parameters
- dim (int) – A dimension along which LogSoftmax will be computed. 
- Returns
- a Tensor of the same dimension and shape as the input with values in the range [-inf, 0) 
 - Examples: - >>> m = nn.LogSoftmax() >>> input = torch.randn(2, 3) >>> output = m(input) 
AdaptiveLogSoftmaxWithLoss¶
- 
class torch.nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs, div_value=4.0, head_bias=False)¶
- Efficient softmax approximation as described in Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou. - Adaptive softmax is an approximate strategy for training models with large output spaces. It is most effective when the label distribution is highly imbalanced, for example in natural language modelling, where the word frequency distribution approximately follows Zipf’s law. - Adaptive softmax partitions the labels into several clusters, according to their frequency. These clusters may each contain a different number of targets. Additionally, clusters containing less frequent labels assign lower-dimensional embeddings to those labels, which speeds up the computation. For each minibatch, only the clusters for which at least one target is present are evaluated. - The idea is that the clusters which are accessed frequently (like the first one, containing the most frequent labels) should also be cheap to compute – that is, contain a small number of assigned labels. - We highly recommend taking a look at the original paper for more details. - cutoffs should be an ordered Sequence of integers sorted in increasing order. It controls the number of clusters and the partitioning of targets into clusters. For example, setting - cutoffs = [10, 100, 1000] means that the first 10 targets will be assigned to the ‘head’ of the adaptive softmax, targets 11, 12, …, 100 will be assigned to the first cluster, and targets 101, 102, …, 1000 will be assigned to the second cluster, while targets 1001, 1002, …, n_classes - 1 will be assigned to the last, third cluster.
- div_value is used to compute the size of each additional cluster, which is given as \(\left\lfloor\frac{in\_features}{div\_value^{idx}}\right\rfloor\), where \(idx\) is the cluster index (with clusters for less frequent words having larger indices, and indices starting from \(1\)).
- head_bias, if set to True, adds a bias term to the ‘head’ of the adaptive softmax. See paper for details. Set to False in the official implementation.
 - Warning - Labels passed as inputs to this module should be sorted according to their frequency. This means that the most frequent label should be represented by the index 0, and the least frequent label should be represented by the index n_classes - 1. - Note - This module returns a - NamedTuple with - output and - loss fields. See further documentation for details. - Note - To compute log-probabilities for all classes, the - log_prob method can be used. - Parameters
- in_features (int) – Number of features in the input tensor 
- n_classes (int) – Number of classes in the dataset 
- cutoffs (Sequence) – Cutoffs used to assign targets to their buckets 
- div_value (float, optional) – value used as an exponent to compute sizes of the clusters. Default: 4.0 
- head_bias (bool, optional) – If - True, adds a bias term to the ‘head’ of the adaptive softmax. Default:- False
 
- Returns
- output is a Tensor of size - N containing computed target log probabilities for each example
- loss is a Scalar representing the computed negative log likelihood loss 
 
- Return type
- NamedTuple with - output and - loss fields
 - Shape:
- input: \((N, in\_features)\) 
- target: \((N)\) where each value satisfies \(0 \leq target[i] < n\_classes\)
- output1: \((N)\) 
- output2: - Scalar
 
 - 
log_prob(input)¶
- Computes log probabilities for all \(n\_classes\) - Parameters
- input (Tensor) – a minibatch of examples 
- Returns
- log-probabilities for each class \(c\) in the range \(0 \leq c < n\_classes\), where \(n\_classes\) is a parameter passed to the - AdaptiveLogSoftmaxWithLoss constructor.
 - Shape:
- Input: \((N, in\_features)\) 
- Output: \((N, n\_classes)\) 
 
 
 - 
predict(input)¶
- This is equivalent to self.log_prob(input).argmax(dim=1), but is more efficient in some cases.
- input (Tensor) – a minibatch of examples 
- Returns
- a class with the highest probability for each example 
- Return type
- output (Tensor) 
 - Shape:
- Input: \((N, in\_features)\) 
- Output: \((N)\) 
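Putting the pieces together, a minimal usage sketch (the sizes and cutoffs below are arbitrary, not from the original reference):

>>> asm = nn.AdaptiveLogSoftmaxWithLoss(64, 1000, cutoffs=[10, 100, 500])
>>> input = torch.randn(128, 64)
>>> target = torch.randint(1000, (128,))
>>> out, loss = asm(input, target)   # NamedTuple with output and loss fields
>>> out.shape, loss.shape
(torch.Size([128]), torch.Size([]))
>>> asm.log_prob(input).shape        # log-probabilities over all 1000 classes
torch.Size([128, 1000])
>>> asm.predict(input).shape         # most likely class per example
torch.Size([128])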
 
 
 
Normalization layers¶
BatchNorm1d¶
- 
class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)¶
- Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. \[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]- The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are sampled from \(\mathcal{U}(0, 1)\) and the elements of \(\beta\) are set to 0. - Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default - momentum of 0.1. - If - track_running_stats is set to - False, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. - Note - This - momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value. - Because the Batch Normalization is done over the C dimension, computing statistics on (N, L) slices, it’s common terminology to call this Temporal Batch Normalization. - Parameters
- num_features – \(C\) from an expected input of size \((N, C, L)\) or \(L\) from input of size \((N, L)\) 
- eps – a value added to the denominator for numerical stability. Default: 1e-5 
- momentum – the value used for the running_mean and running_var computation. Can be set to - None for cumulative moving average (i.e. simple average). Default: 0.1
- affine – a boolean value that when set to - True, this module has learnable affine parameters. Default:- True
- track_running_stats – a boolean value that when set to - True, this module tracks the running mean and variance, and when set to- False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default:- True
 
 - Shape:
- Input: \((N, C)\) or \((N, C, L)\) 
- Output: \((N, C)\) or \((N, C, L)\) (same shape as input) 
 
 - Examples: - >>> # With Learnable Parameters >>> m = nn.BatchNorm1d(100) >>> # Without Learnable Parameters >>> m = nn.BatchNorm1d(100, affine=False) >>> input = torch.randn(20, 100) >>> output = m(input) 
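The momentum note above can be checked directly on the running_mean buffer, which starts at zero; a sketch (not part of the original reference):

>>> m = nn.BatchNorm1d(100)
>>> m.running_mean.abs().sum()     # running mean starts at zero
tensor(0.)
>>> input = torch.randn(20, 100)
>>> output = m(input)              # one training-mode forward pass
>>> # new estimate = (1 - 0.1) * 0 + 0.1 * batch mean
>>> torch.allclose(m.running_mean, 0.1 * input.mean(0))
True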
BatchNorm2d¶
- 
class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)¶
- Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. \[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]- The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are sampled from \(\mathcal{U}(0, 1)\) and the elements of \(\beta\) are set to 0. - Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default - momentum of 0.1. - If - track_running_stats is set to - False, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. - Note - This - momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value. - Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it’s common terminology to call this Spatial Batch Normalization. - Parameters
- num_features – \(C\) from an expected input of size \((N, C, H, W)\) 
- eps – a value added to the denominator for numerical stability. Default: 1e-5 
- momentum – the value used for the running_mean and running_var computation. Can be set to - None for cumulative moving average (i.e. simple average). Default: 0.1
- affine – a boolean value that when set to - True, this module has learnable affine parameters. Default:- True
- track_running_stats – a boolean value that when set to - True, this module tracks the running mean and variance, and when set to- False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default:- True
 
 - Shape:
- Input: \((N, C, H, W)\) 
- Output: \((N, C, H, W)\) (same shape as input) 
 
 - Examples: - >>> # With Learnable Parameters >>> m = nn.BatchNorm2d(100) >>> # Without Learnable Parameters >>> m = nn.BatchNorm2d(100, affine=False) >>> input = torch.randn(20, 100, 35, 45) >>> output = m(input) 
BatchNorm3d¶
- 
class torch.nn.BatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)¶
- Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. \[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]- The mean and standard-deviation are calculated per-dimension over the mini-batches and \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size). By default, the elements of \(\gamma\) are sampled from \(\mathcal{U}(0, 1)\) and the elements of \(\beta\) are set to 0. - Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default - momentum of 0.1. - If - track_running_stats is set to - False, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. - Note - This - momentum argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value. - Because the Batch Normalization is done over the C dimension, computing statistics on (N, D, H, W) slices, it’s common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization. - Parameters
- num_features – \(C\) from an expected input of size \((N, C, D, H, W)\) 
- eps – a value added to the denominator for numerical stability. Default: 1e-5 
- momentum – the value used for the running_mean and running_var computation. Can be set to - None for cumulative moving average (i.e. simple average). Default: 0.1
- affine – a boolean value that when set to - True, this module has learnable affine parameters. Default:- True
- track_running_stats – a boolean value that when set to - True, this module tracks the running mean and variance, and when set to- False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default:- True
 
 - Shape:
- Input: \((N, C, D, H, W)\) 
- Output: \((N, C, D, H, W)\) (same shape as input) 
 
 - Examples: - >>> # With Learnable Parameters >>> m = nn.BatchNorm3d(100) >>> # Without Learnable Parameters >>> m = nn.BatchNorm3d(100, affine=False) >>> input = torch.randn(20, 100, 35, 45, 10) >>> output = m(input) 
GroupNorm¶
- 
class torch.nn.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True)¶
- Applies Group Normalization over a mini-batch of inputs as described in the paper Group Normalization. \[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta \]- The input channels are separated into - num_groups groups, each containing - num_channels / num_groups channels. The mean and standard-deviation are calculated separately over each group. \(\gamma\) and \(\beta\) are learnable per-channel affine transform parameter vectors of size - num_channels if - affine is - True. - This layer uses statistics computed from input data in both training and evaluation modes. - Parameters
- num_groups (int) – number of groups to separate the channels into 
- num_channels (int) – number of channels expected in input 
- eps – a value added to the denominator for numerical stability. Default: 1e-5 
- affine – a boolean value that when set to - True, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases). Default:- True.
 
 - Shape:
- Input: \((N, C, *)\) where \(C=\text{num\_channels}\) 
- Output: \((N, C, *)\) (same shape as input) 
 
 - Examples: - >>> input = torch.randn(20, 6, 10, 10) >>> # Separate 6 channels into 3 groups >>> m = nn.GroupNorm(3, 6) >>> # Separate 6 channels into 6 groups (equivalent with InstanceNorm) >>> m = nn.GroupNorm(6, 6) >>> # Put all 6 channels into a single group (equivalent with LayerNorm) >>> m = nn.GroupNorm(1, 6) >>> # Activating the module >>> output = m(input) 
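The equivalences noted in the comments above can be verified directly once the affine transforms are disabled; with affine parameters enabled the modules still differ, since GroupNorm learns per-channel parameters while LayerNorm learns per-element ones. A minimal sketch, not from the original docs:

>>> x = torch.randn(20, 6, 10, 10)
>>> gn = nn.GroupNorm(1, 6, affine=False)
>>> ln = nn.LayerNorm([6, 10, 10], elementwise_affine=False)
>>> torch.allclose(gn(x), ln(x), atol=1e-5)  # a single group uses LayerNorm-style statistics
True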
InstanceNorm1d¶
- 
class torch.nn.InstanceNorm1d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)¶
- Applies Instance Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization . \[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]- The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size) if - affineis- True.- By default, this layer uses instance statistics computed from input data in both training and evaluation modes. - If - track_running_statsis set to- True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default- momentumof 0.1.- Note - This - momentumargument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.- Note - InstanceNorm1dand- LayerNormare very similar, but have some subtle differences.- InstanceNorm1dis applied on each channel of channeled data like multidimensional time series, but- LayerNormis usually applied to the entire sample and often in NLP tasks. Additionally,- LayerNormapplies an elementwise affine transform, while- InstanceNorm1dusually does not apply an affine transform.- Parameters
- num_features – \(C\) from an expected input of size \((N, C, L)\) or \(L\) from input of size \((N, L)\) 
- eps – a value added to the denominator for numerical stability. Default: 1e-5 
- momentum – the value used for the running_mean and running_var computation. Default: 0.1 
- affine – a boolean value that when set to - True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default:- False.
- track_running_stats – a boolean value that when set to - True, this module tracks the running mean and variance, and when set to- False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default:- False
 
 - Shape:
- Input: \((N, C, L)\) 
- Output: \((N, C, L)\) (same shape as input) 
 
 - Examples: - >>> # Without Learnable Parameters >>> m = nn.InstanceNorm1d(100) >>> # With Learnable Parameters >>> m = nn.InstanceNorm1d(100, affine=True) >>> input = torch.randn(20, 100, 40) >>> output = m(input) 
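Because the statistics are computed separately for each object and channel, every (sample, channel) slice of the output is normalized on its own. A quick check, not part of the original docs:

>>> m = nn.InstanceNorm1d(100)
>>> x = torch.randn(20, 100, 40)
>>> y = m(x)
>>> torch.allclose(y.mean(dim=2), torch.zeros(20, 100), atol=1e-6)  # ~zero mean per (N, C)
True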
InstanceNorm2d¶
- 
class torch.nn.InstanceNorm2d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)¶
- Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization . \[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]- The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size) if - affineis- True.- By default, this layer uses instance statistics computed from input data in both training and evaluation modes. - If - track_running_statsis set to- True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default- momentumof 0.1.- Note - This - momentumargument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.- Note - InstanceNorm2dand- LayerNormare very similar, but have some subtle differences.- InstanceNorm2dis applied on each channel of channeled data like RGB images, but- LayerNormis usually applied to the entire sample and often in NLP tasks. Additionally,- LayerNormapplies an elementwise affine transform, while- InstanceNorm2dusually does not apply an affine transform.- Parameters
- num_features – \(C\) from an expected input of size \((N, C, H, W)\) 
- eps – a value added to the denominator for numerical stability. Default: 1e-5 
- momentum – the value used for the running_mean and running_var computation. Default: 0.1 
- affine – a boolean value that when set to - True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default:- False.
- track_running_stats – a boolean value that when set to - True, this module tracks the running mean and variance, and when set to- False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default:- False
 
 - Shape:
- Input: \((N, C, H, W)\) 
- Output: \((N, C, H, W)\) (same shape as input) 
 
 - Examples: - >>> # Without Learnable Parameters >>> m = nn.InstanceNorm2d(100) >>> # With Learnable Parameters >>> m = nn.InstanceNorm2d(100, affine=True) >>> input = torch.randn(20, 100, 35, 45) >>> output = m(input) 
InstanceNorm3d¶
- 
class torch.nn.InstanceNorm3d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)¶
- Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper Instance Normalization: The Missing Ingredient for Fast Stylization . \[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]- The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. \(\gamma\) and \(\beta\) are learnable parameter vectors of size C (where C is the input size) if - affineis- True.- By default, this layer uses instance statistics computed from input data in both training and evaluation modes. - If - track_running_statsis set to- True, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default- momentumof 0.1.- Note - This - momentumargument is different from the one used in optimizer classes and from the conventional notion of momentum. Mathematically, the update rule for running statistics here is \(\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t\), where \(\hat{x}\) is the estimated statistic and \(x_t\) is the new observed value.- Note - InstanceNorm3dand- LayerNormare very similar, but have some subtle differences.- InstanceNorm3dis applied on each channel of channeled data like 3D models with RGB color, but- LayerNormis usually applied to the entire sample and often in NLP tasks. Additionally,- LayerNormapplies an elementwise affine transform, while- InstanceNorm3dusually does not apply an affine transform.- Parameters
- num_features – \(C\) from an expected input of size \((N, C, D, H, W)\) 
- eps – a value added to the denominator for numerical stability. Default: 1e-5 
- momentum – the value used for the running_mean and running_var computation. Default: 0.1 
- affine – a boolean value that when set to - True, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default:- False.
- track_running_stats – a boolean value that when set to - True, this module tracks the running mean and variance, and when set to- False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default:- False
 
 - Shape:
- Input: \((N, C, D, H, W)\) 
- Output: \((N, C, D, H, W)\) (same shape as input) 
 
 - Examples: - >>> # Without Learnable Parameters >>> m = nn.InstanceNorm3d(100) >>> # With Learnable Parameters >>> m = nn.InstanceNorm3d(100, affine=True) >>> input = torch.randn(20, 100, 35, 45, 10) >>> output = m(input) 
LayerNorm¶
- 
class torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)¶
- Applies Layer Normalization over a mini-batch of inputs as described in the paper Layer Normalization . \[y = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta \]- The mean and standard-deviation are calculated separately over the last certain number of dimensions, which have to be of the shape specified by - normalized_shape. \(\gamma\) and \(\beta\) are learnable affine transform parameters of- normalized_shapeif- elementwise_affineis- True.- Note - Unlike Batch Normalization and Instance Normalization, which apply scalar scale and bias for each entire channel/plane with the - affineoption, Layer Normalization applies per-element scale and bias with- elementwise_affine.- This layer uses statistics computed from input data in both training and evaluation modes. - Parameters
- normalized_shape (int or list or torch.Size) – - input shape from an expected input of size \[[* \times \text{normalized\_shape}[0] \times \text{normalized\_shape}[1] \times \ldots \times \text{normalized\_shape}[-1]] \]- If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension which is expected to be of that specific size. 
- eps – a value added to the denominator for numerical stability. Default: 1e-5 
- elementwise_affine – a boolean value that when set to - True, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default:- True.
 
 - Shape:
- Input: \((N, *)\) 
- Output: \((N, *)\) (same shape as input) 
 
 - Examples: - >>> input = torch.randn(20, 5, 10, 10) >>> # With Learnable Parameters >>> m = nn.LayerNorm(input.size()[1:]) >>> # Without Learnable Parameters >>> m = nn.LayerNorm(input.size()[1:], elementwise_affine=False) >>> # Normalize over last two dimensions >>> m = nn.LayerNorm([10, 10]) >>> # Normalize over last dimension of size 10 >>> m = nn.LayerNorm(10) >>> # Activating the module >>> output = m(input) 
LocalResponseNorm¶
- 
class torch.nn.LocalResponseNorm(size, alpha=0.0001, beta=0.75, k=1.0)¶
- Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. Applies normalization across channels. \[b_{c} = a_{c}\left(k + \frac{\alpha}{n} \sum_{c'=\max(0, c-n/2)}^{\min(N-1,c+n/2)}a_{c'}^2\right)^{-\beta} \]- Parameters
- size – amount of neighbouring channels used for normalization 
- alpha – multiplicative factor. Default: 0.0001 
- beta – exponent. Default: 0.75 
- k – additive factor. Default: 1 
 
 - Shape:
- Input: \((N, C, *)\) 
- Output: \((N, C, *)\) (same shape as input) 
 
 - Examples: - >>> lrn = nn.LocalResponseNorm(2) >>> signal_2d = torch.randn(32, 5, 24, 24) >>> signal_4d = torch.randn(16, 5, 7, 7, 7, 7) >>> output_2d = lrn(signal_2d) >>> output_4d = lrn(signal_4d) 
Recurrent layers¶
RNN¶
- 
class torch.nn.RNN(*args, **kwargs)¶
- Applies a multi-layer Elman RNN with \(tanh\) or \(ReLU\) non-linearity to an input sequence. - For each element in the input sequence, each layer computes the following function: \[h_t = \text{tanh}(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) \]- where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, and \(h_{(t-1)}\) is the hidden state of the previous layer at time t-1 or the initial hidden state at time 0. If - nonlinearityis- 'relu', then ReLU is used instead of tanh.- Parameters
- input_size – The number of expected features in the input x 
- hidden_size – The number of features in the hidden state h 
- num_layers – Number of recurrent layers. E.g., setting - num_layers=2would mean stacking two RNNs together to form a stacked RNN, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1
- nonlinearity – The non-linearity to use. Can be either - 'tanh'or- 'relu'. Default:- 'tanh'
- bias – If - False, then the layer does not use bias weights b_ih and b_hh. Default:- True
- batch_first – If - True, then the input and output tensors are provided as (batch, seq, feature). Default:- False
- dropout – If non-zero, introduces a Dropout layer on the outputs of each RNN layer except the last layer, with dropout probability equal to - dropout. Default: 0
- bidirectional – If - True, becomes a bidirectional RNN. Default:- False
 
 - Inputs: input, h_0
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See - torch.nn.utils.rnn.pack_padded_sequence()or- torch.nn.utils.rnn.pack_sequence()for details.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1. 
 
- Outputs: output, h_n
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the RNN, for each t. If a - torch.nn.utils.rnn.PackedSequencehas been given as the input, the output will also be a packed sequence.- For the unpacked case, the directions can be separated using - output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. - Like output, the layers can be separated using - h_n.view(num_layers, num_directions, batch, hidden_size).
 
- Shape:
- Input1: \((L, N, H_{in})\) tensor containing input features where \(H_{in}=\text{input\_size}\) and L represents a sequence length. 
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\text{num\_layers} * \text{num\_directions}\) and \(H_{out}=\text{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1. 
- Output1: \((L, N, H_{all})\) where \(H_{all}=\text{num\_directions} * \text{hidden\_size}\) 
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch 
 
 - Variables
- ~RNN.weight_ih_l[k] – the learnable input-hidden weights of the k-th layer, of shape (hidden_size, input_size) for k = 0. Otherwise, the shape is (hidden_size, num_directions * hidden_size) 
- ~RNN.weight_hh_l[k] – the learnable hidden-hidden weights of the k-th layer, of shape (hidden_size, hidden_size) 
- ~RNN.bias_ih_l[k] – the learnable input-hidden bias of the k-th layer, of shape (hidden_size) 
- ~RNN.bias_hh_l[k] – the learnable hidden-hidden bias of the k-th layer, of shape (hidden_size) 
 
- Note - All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\) - Note - If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype - torch.float16, 4) a V100 GPU is used, and 5) input data is not in- PackedSequenceformat, then a persistent algorithm can be selected to improve performance.- Examples: - >>> rnn = nn.RNN(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> output, hn = rnn(input, h0) 
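The recurrence above maps directly onto the weight_ih_l[k] / weight_hh_l[k] parameters. A minimal sketch, not part of the original docs, reproducing a single step of a single layer by hand:

>>> rnn = nn.RNN(10, 20, num_layers=1)
>>> x = torch.randn(1, 3, 10)   # (seq_len, batch, input_size)
>>> h0 = torch.randn(1, 3, 20)  # (num_layers * num_directions, batch, hidden_size)
>>> output, hn = rnn(x, h0)
>>> manual = torch.tanh(x[0] @ rnn.weight_ih_l0.t() + rnn.bias_ih_l0
...                     + h0[0] @ rnn.weight_hh_l0.t() + rnn.bias_hh_l0)
>>> torch.allclose(output[0], manual, atol=1e-6)
True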
LSTM¶
- 
class torch.nn.LSTM(*args, **kwargs)¶
- Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. - For each element in the input sequence, each layer computes the following function: \[\begin{array}{ll} \\ i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\ f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\ o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\ c_t = f_t * c_{(t-1)} + i_t * g_t \\ h_t = o_t * \tanh(c_t) \\ \end{array} \]- where \(h_t\) is the hidden state at time t, \(c_t\) is the cell state at time t, \(x_t\) is the input at time t, \(h_{(t-1)}\) is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and \(i_t\), \(f_t\), \(g_t\), \(o_t\) are the input, forget, cell, and output gates, respectively. \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. - In a multilayer LSTM, the input \(x^{(l)}_t\) of the \(l\) -th layer (\(l >= 2\)) is the hidden state \(h^{(l-1)}_t\) of the previous layer multiplied by dropout \(\delta^{(l-1)}_t\) where each \(\delta^{(l-1)}_t\) is a Bernoulli random variable which is \(0\) with probability - dropout.- Parameters
- input_size – The number of expected features in the input x 
- hidden_size – The number of features in the hidden state h 
- num_layers – Number of recurrent layers. E.g., setting - num_layers=2would mean stacking two LSTMs together to form a stacked LSTM, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1
- bias – If - False, then the layer does not use bias weights b_ih and b_hh. Default:- True
- batch_first – If - True, then the input and output tensors are provided as (batch, seq, feature). Default:- False
- dropout – If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to - dropout. Default: 0
- bidirectional – If - True, becomes a bidirectional LSTM. Default:- False
 
 - Inputs: input, (h_0, c_0)
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See - torch.nn.utils.rnn.pack_padded_sequence()or- torch.nn.utils.rnn.pack_sequence()for details.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1. 
- c_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial cell state for each element in the batch. - If (h_0, c_0) is not provided, both h_0 and c_0 default to zero. 
 
- Outputs: output, (h_n, c_n)
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features (h_t) from the last layer of the LSTM, for each t. If a - torch.nn.utils.rnn.PackedSequencehas been given as the input, the output will also be a packed sequence.- For the unpacked case, the directions can be separated using - output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively. Similarly, the directions can be separated in the packed case.
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. - Like output, the layers can be separated using - h_n.view(num_layers, num_directions, batch, hidden_size)and similarly for c_n.
- c_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the cell state for t = seq_len. 
 
 - Variables
- ~LSTM.weight_ih_l[k] – the learnable input-hidden weights of the \(\text{k}^{th}\) layer (W_ii|W_if|W_ig|W_io), of shape (4*hidden_size, input_size) for k = 0. Otherwise, the shape is (4*hidden_size, num_directions * hidden_size) 
- ~LSTM.weight_hh_l[k] – the learnable hidden-hidden weights of the \(\text{k}^{th}\) layer (W_hi|W_hf|W_hg|W_ho), of shape (4*hidden_size, hidden_size) 
- ~LSTM.bias_ih_l[k] – the learnable input-hidden bias of the \(\text{k}^{th}\) layer (b_ii|b_if|b_ig|b_io), of shape (4*hidden_size) 
- ~LSTM.bias_hh_l[k] – the learnable hidden-hidden bias of the \(\text{k}^{th}\) layer (b_hi|b_hf|b_hg|b_ho), of shape (4*hidden_size) 
 
- Note - All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\) - Note - If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype - torch.float16, 4) a V100 GPU is used, and 5) input data is not in- PackedSequenceformat, then a persistent algorithm can be selected to improve performance.- Examples: - >>> rnn = nn.LSTM(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> c0 = torch.randn(2, 3, 20) >>> output, (hn, cn) = rnn(input, (h0, c0)) 
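The direction separation described under Outputs can be written out as follows; a usage sketch, not part of the original docs:

>>> rnn = nn.LSTM(10, 20, num_layers=2, bidirectional=True)
>>> input = torch.randn(5, 3, 10)
>>> output, (hn, cn) = rnn(input)          # h_0 and c_0 default to zero
>>> output.shape                           # (seq_len, batch, num_directions * hidden_size)
torch.Size([5, 3, 40])
>>> directions = output.view(5, 3, 2, 20)  # forward is direction 0, backward is direction 1
>>> layers = hn.view(2, 2, 3, 20)          # (num_layers, num_directions, batch, hidden_size)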
GRU¶
- 
class torch.nn.GRU(*args, **kwargs)¶
- Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. - For each element in the input sequence, each layer computes the following function: \[\begin{array}{ll} r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\ h_t = (1 - z_t) * n_t + z_t * h_{(t-1)} \end{array} \]- where \(h_t\) is the hidden state at time t, \(x_t\) is the input at time t, \(h_{(t-1)}\) is the hidden state of the layer at time t-1 or the initial hidden state at time 0, and \(r_t\), \(z_t\), \(n_t\) are the reset, update, and new gates, respectively. \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. - In a multilayer GRU, the input \(x^{(l)}_t\) of the \(l\) -th layer (\(l >= 2\)) is the hidden state \(h^{(l-1)}_t\) of the previous layer multiplied by dropout \(\delta^{(l-1)}_t\) where each \(\delta^{(l-1)}_t\) is a Bernoulli random variable which is \(0\) with probability - dropout.- Parameters
- input_size – The number of expected features in the input x 
- hidden_size – The number of features in the hidden state h 
- num_layers – Number of recurrent layers. E.g., setting - num_layers=2would mean stacking two GRUs together to form a stacked GRU, with the second GRU taking in outputs of the first GRU and computing the final results. Default: 1
- bias – If - False, then the layer does not use bias weights b_ih and b_hh. Default:- True
- batch_first – If - True, then the input and output tensors are provided as (batch, seq, feature). Default:- False
- dropout – If non-zero, introduces a Dropout layer on the outputs of each GRU layer except the last layer, with dropout probability equal to - dropout. Default: 0
- bidirectional – If - True, becomes a bidirectional GRU. Default:- False
 
 - Inputs: input, h_0
- input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See - torch.nn.utils.rnn.pack_padded_sequence()for details.
- h_0 of shape (num_layers * num_directions, batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1. 
 
- Outputs: output, h_n
- output of shape (seq_len, batch, num_directions * hidden_size): tensor containing the output features h_t from the last layer of the GRU, for each t. If a - torch.nn.utils.rnn.PackedSequencehas been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using- output.view(seq_len, batch, num_directions, hidden_size), with forward and backward being direction 0 and 1 respectively.- Similarly, the directions can be separated in the packed case. 
- h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len - Like output, the layers can be separated using - h_n.view(num_layers, num_directions, batch, hidden_size).
 
- Shape:
- Input1: \((L, N, H_{in})\) tensor containing input features where \(H_{in}=\text{input\_size}\) and L represents a sequence length. 
- Input2: \((S, N, H_{out})\) tensor containing the initial hidden state for each element in the batch, where \(S=\text{num\_layers} * \text{num\_directions}\) and \(H_{out}=\text{hidden\_size}\). Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1. 
- Output1: \((L, N, H_{all})\) where \(H_{all}=\text{num\_directions} * \text{hidden\_size}\) 
- Output2: \((S, N, H_{out})\) tensor containing the next hidden state for each element in the batch 
 
 - Variables
- ~GRU.weight_ih_l[k] – the learnable input-hidden weights of the \(\text{k}^{th}\) layer (W_ir|W_iz|W_in), of shape (3*hidden_size, input_size) for k = 0. Otherwise, the shape is (3*hidden_size, num_directions * hidden_size) 
- ~GRU.weight_hh_l[k] – the learnable hidden-hidden weights of the \(\text{k}^{th}\) layer (W_hr|W_hz|W_hn), of shape (3*hidden_size, hidden_size) 
- ~GRU.bias_ih_l[k] – the learnable input-hidden bias of the \(\text{k}^{th}\) layer (b_ir|b_iz|b_in), of shape (3*hidden_size) 
- ~GRU.bias_hh_l[k] – the learnable hidden-hidden bias of the \(\text{k}^{th}\) layer (b_hr|b_hz|b_hn), of shape (3*hidden_size) 
 
- Note - All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\) - Note - If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype - torch.float16, 4) a V100 GPU is used, and 5) input data is not in- PackedSequenceformat, then a persistent algorithm can be selected to improve performance.- Examples: - >>> rnn = nn.GRU(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> output, hn = rnn(input, h0) 
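Variable-length batches can be fed through the packing utilities referenced under Inputs. A minimal sketch, not part of the original docs, assuming the sequences are already sorted by decreasing length as pack_padded_sequence() requires by default:

>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> rnn = nn.GRU(10, 20, 2)
>>> padded = torch.randn(5, 3, 10)     # (max_seq_len, batch, input_size)
>>> lengths = torch.tensor([5, 3, 2])  # true length of each sequence
>>> packed = pack_padded_sequence(padded, lengths)
>>> packed_output, hn = rnn(packed)    # the output is also a PackedSequence
>>> output, output_lengths = pad_packed_sequence(packed_output)
>>> output.shape
torch.Size([5, 3, 20])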
RNNCell¶
- 
class torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')¶
- An Elman RNN cell with tanh or ReLU non-linearity. \[h' = \tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh})\]- If - nonlinearityis ‘relu’, then ReLU is used in place of tanh.- Parameters
- input_size – The number of expected features in the input x 
- hidden_size – The number of features in the hidden state h 
- bias – If - False, then the layer does not use bias weights b_ih and b_hh. Default:- True
- nonlinearity – The non-linearity to use. Can be either - 'tanh'or- 'relu'. Default:- 'tanh'
 
 - Inputs: input, hidden
- input of shape (batch, input_size): tensor containing input features 
- hidden of shape (batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. 
 
- Outputs: h’
- h’ of shape (batch, hidden_size): tensor containing the next hidden state for each element in the batch 
 
- Shape:
- Input1: \((N, H_{in})\) tensor containing input features where \(H_{in}\) = input_size 
- Input2: \((N, H_{out})\) tensor containing the initial hidden state for each element in the batch where \(H_{out}\) = hidden_size Defaults to zero if not provided. 
- Output: \((N, H_{out})\) tensor containing the next hidden state for each element in the batch 
 
 - Variables
- ~RNNCell.weight_ih – the learnable input-hidden weights, of shape (hidden_size, input_size) 
- ~RNNCell.weight_hh – the learnable hidden-hidden weights, of shape (hidden_size, hidden_size) 
- ~RNNCell.bias_ih – the learnable input-hidden bias, of shape (hidden_size) 
- ~RNNCell.bias_hh – the learnable hidden-hidden bias, of shape (hidden_size) 
 
 - Note - All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\) - Examples: - >>> rnn = nn.RNNCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) 
LSTMCell¶
- 
class torch.nn.LSTMCell(input_size, hidden_size, bias=True)¶
- A long short-term memory (LSTM) cell. \[\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f * c + i * g \\ h' = o * \tanh(c') \\ \end{array}\]- where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. - Parameters
- input_size – The number of expected features in the input x 
- hidden_size – The number of features in the hidden state h 
- bias – If - False, then the layer does not use bias weights b_ih and b_hh. Default:- True
 
 - Inputs: input, (h_0, c_0)
- input of shape (batch, input_size): tensor containing input features 
- h_0 of shape (batch, hidden_size): tensor containing the initial hidden state for each element in the batch. 
- c_0 of shape (batch, hidden_size): tensor containing the initial cell state for each element in the batch. - If (h_0, c_0) is not provided, both h_0 and c_0 default to zero. 
 
- Outputs: (h_1, c_1)
- h_1 of shape (batch, hidden_size): tensor containing the next hidden state for each element in the batch 
- c_1 of shape (batch, hidden_size): tensor containing the next cell state for each element in the batch 
 
 - Variables
- ~LSTMCell.weight_ih – the learnable input-hidden weights, of shape (4*hidden_size, input_size) 
- ~LSTMCell.weight_hh – the learnable hidden-hidden weights, of shape (4*hidden_size, hidden_size) 
- ~LSTMCell.bias_ih – the learnable input-hidden bias, of shape (4*hidden_size) 
- ~LSTMCell.bias_hh – the learnable hidden-hidden bias, of shape (4*hidden_size) 
 
 - Note - All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\) - Examples: - >>> rnn = nn.LSTMCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> cx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx, cx = rnn(input[i], (hx, cx)) output.append(hx) 
GRUCell¶
- 
class torch.nn.GRUCell(input_size, hidden_size, bias=True)¶
- A gated recurrent unit (GRU) cell \[\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\ h' = (1 - z) * n + z * h \end{array}\]- where \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. - Parameters
- input_size – The number of expected features in the input x 
- hidden_size – The number of features in the hidden state h 
- bias – If - False, then the layer does not use bias weights b_ih and b_hh. Default:- True
 
 - Inputs: input, hidden
- input of shape (batch, input_size): tensor containing input features 
- hidden of shape (batch, hidden_size): tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. 
 
- Outputs: h’
- h’ of shape (batch, hidden_size): tensor containing the next hidden state for each element in the batch 
 
- Shape:
- Input1: \((N, H_{in})\) tensor containing input features where \(H_{in}\) = input_size 
- Input2: \((N, H_{out})\) tensor containing the initial hidden state for each element in the batch where \(H_{out}\) = hidden_size Defaults to zero if not provided. 
- Output: \((N, H_{out})\) tensor containing the next hidden state for each element in the batch 
 
 - Variables
- ~GRUCell.weight_ih – the learnable input-hidden weights, of shape (3*hidden_size, input_size) 
- ~GRUCell.weight_hh – the learnable hidden-hidden weights, of shape (3*hidden_size, hidden_size) 
- ~GRUCell.bias_ih – the learnable input-hidden bias, of shape (3*hidden_size) 
- ~GRUCell.bias_hh – the learnable hidden-hidden bias, of shape (3*hidden_size) 
 
 - Note - All the weights and biases are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{hidden\_size}}\) - Examples: - >>> rnn = nn.GRUCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) 
Linear layers¶
Linear¶
- 
class torch.nn.Linear(in_features, out_features, bias=True)¶
- Applies a linear transformation to the incoming data: \(y = xA^T + b\) - Parameters
- in_features – size of each input sample 
- out_features – size of each output sample 
- bias – If set to - False, the layer will not learn an additive bias. Default:- True
 
 - Shape:
- Input: \((N, *, H_{in})\) where \(*\) means any number of additional dimensions and \(H_{in} = \text{in\_features}\) 
- Output: \((N, *, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{out\_features}\). 
 
 - Variables
- ~Linear.weight – the learnable weights of the module of shape \((\text{out\_features}, \text{in\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in\_features}}\) 
- ~Linear.bias – the learnable bias of the module of shape \((\text{out\_features})\). If - biasis- True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\) where \(k = \frac{1}{\text{in\_features}}\)
 
 - Examples: - >>> m = nn.Linear(20, 30) >>> input = torch.randn(128, 20) >>> output = m(input) >>> print(output.size()) torch.Size([128, 30]) 
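The transformation \(y = xA^T + b\) can be checked against the module's parameters; a sketch, not part of the original docs:

>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> torch.allclose(m(input), input @ m.weight.t() + m.bias, atol=1e-5)
True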
Bilinear¶
- 
class torch.nn.Bilinear(in1_features, in2_features, out_features, bias=True)¶
- Applies a bilinear transformation to the incoming data: \(y = x_1 A x_2 + b\) - Parameters
- in1_features – size of each first input sample 
- in2_features – size of each second input sample 
- out_features – size of each output sample 
- bias – If set to False, the layer will not learn an additive bias. Default: - True
 
 - Shape:
- Input1: \((N, *, H_{in1})\) where \(H_{in1}=\text{in1\_features}\) and \(*\) means any number of additional dimensions. All but the last dimension of the inputs should be the same. 
- Input2: \((N, *, H_{in2})\) where \(H_{in2}=\text{in2\_features}\). 
- Output: \((N, *, H_{out})\) where \(H_{out}=\text{out\_features}\) and all but the last dimension are the same shape as the input. 
 
 - Variables
- ~Bilinear.weight – the learnable weights of the module of shape \((\text{out\_features}, \text{in1\_features}, \text{in2\_features})\). The values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in1\_features}}\) 
- ~Bilinear.bias – the learnable bias of the module of shape \((\text{out\_features})\). If - biasis- True, the values are initialized from \(\mathcal{U}(-\sqrt{k}, \sqrt{k})\), where \(k = \frac{1}{\text{in1\_features}}\)
 
 - Examples: - >>> m = nn.Bilinear(20, 30, 40) >>> input1 = torch.randn(128, 20) >>> input2 = torch.randn(128, 30) >>> output = m(input1, input2) >>> print(output.size()) torch.Size([128, 40]) 
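The bilinear form \(y = x_1 A x_2 + b\) can be spelled out with einsum over the weight of shape (out_features, in1_features, in2_features); a sketch, not part of the original docs:

>>> m = nn.Bilinear(20, 30, 40)
>>> input1 = torch.randn(128, 20)
>>> input2 = torch.randn(128, 30)
>>> manual = torch.einsum('bi,oij,bj->bo', input1, m.weight, input2) + m.bias
>>> torch.allclose(m(input1, input2), manual, atol=1e-4)
True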
Dropout layers¶
Dropout¶
- 
class torch.nn.Dropout(p=0.5, inplace=False)¶
- During training, randomly zeroes some of the elements of the input tensor with probability - pusing samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call.- This has proven to be an effective technique for regularization and preventing the co-adaptation of neurons as described in the paper Improving neural networks by preventing co-adaptation of feature detectors . - Furthermore, the outputs are scaled by a factor of \(\frac{1}{1-p}\) during training. This means that during evaluation the module simply computes an identity function. - Parameters
- p – probability of an element to be zeroed. Default: 0.5 
- inplace – If set to - True, will do this operation in-place. Default:- False
 
 - Shape:
- Input: \((*)\). Input can be of any shape 
- Output: \((*)\). Output is of the same shape as input 
 
 - Examples: - >>> m = nn.Dropout(p=0.2) >>> input = torch.randn(20, 16) >>> output = m(input) 
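The \(\frac{1}{1-p}\) scaling during training and the identity behaviour during evaluation can be seen directly. A sketch, not part of the original docs; which positions are zeroed is random, so the first output shown is only illustrative:

>>> m = nn.Dropout(p=0.5)
>>> x = torch.ones(8)
>>> m(x)  # training mode (default): surviving elements are scaled by 1 / (1 - p) = 2.0
tensor([2., 0., 2., 2., 0., 2., 0., 2.])
>>> m = m.eval()
>>> m(x)  # evaluation mode: identity
tensor([1., 1., 1., 1., 1., 1., 1., 1.])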
Dropout2d¶
- 
class torch.nn.Dropout2d(p=0.5, inplace=False)¶
- Randomly zero out entire channels (a channel is a 2D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 2D tensor \(\text{input}[i, j]\)). Each channel will be zeroed out independently on every forward call with probability - pusing samples from a Bernoulli distribution.- Usually the input comes from - nn.Conv2dmodules.- As described in the paper Efficient Object Localization Using Convolutional Networks , if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. - In this case, - nn.Dropout2d()will help promote independence between feature maps and should be used instead.- Parameters
- p – probability of an element to be zeroed. Default: 0.5 
- inplace – If set to - True, will do this operation in-place. Default:- False
 - Shape:
- Input: \((N, C, H, W)\) 
- Output: \((N, C, H, W)\) (same shape as input) 
 
 - Examples: - >>> m = nn.Dropout2d(p=0.2) >>> input = torch.randn(20, 16, 32, 32) >>> output = m(input) 
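In contrast to element-wise Dropout, entire channels are zeroed together. A quick sketch, not part of the original docs; which channels are dropped is random:

>>> m = nn.Dropout2d(p=0.5)
>>> x = torch.ones(1, 4, 2, 2)
>>> y = m(x)
>>> [bool((y[0, c] == y[0, c, 0, 0]).all()) for c in range(4)]  # each channel is constant
[True, True, True, True]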
Dropout3d¶
- 
class torch.nn.Dropout3d(p=0.5, inplace=False)¶
- Randomly zero out entire channels (a channel is a 3D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 3D tensor \(\text{input}[i, j]\)). Each channel will be zeroed out independently on every forward call with probability - pusing samples from a Bernoulli distribution.- Usually the input comes from - nn.Conv3dmodules.- As described in the paper Efficient Object Localization Using Convolutional Networks , if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. - In this case, - nn.Dropout3d()will help promote independence between feature maps and should be used instead.- Parameters
- p – probability of an element to be zeroed. Default: 0.5 
- inplace – If set to - True, will do this operation in-place. Default:- False
 - Shape:
- Input: \((N, C, D, H, W)\) 
- Output: \((N, C, D, H, W)\) (same shape as input) 
 
 - Examples: - >>> m = nn.Dropout3d(p=0.2) >>> input = torch.randn(20, 16, 4, 32, 32) >>> output = m(input) 
AlphaDropout¶
- 
class torch.nn.AlphaDropout(p=0.5, inplace=False)¶
- Applies Alpha Dropout over the input. - Alpha Dropout is a type of Dropout that maintains the self-normalizing property. For an input with zero mean and unit standard deviation, the output of Alpha Dropout maintains the original mean and standard deviation of the input. Alpha Dropout goes hand-in-hand with the SELU activation function, which ensures that the outputs have zero mean and unit standard deviation. - During training, it randomly masks some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. The elements to be masked are randomized on every forward call, and scaled and shifted to maintain zero mean and unit standard deviation. - During evaluation the module simply computes an identity function. - More details can be found in the paper Self-Normalizing Neural Networks . - Parameters
- p – probability of an element to be dropped. Default: 0.5 
- inplace – If set to - True, will do this operation in-place. Default:- False
 - Shape:
- Input: \((*)\). Input can be of any shape 
- Output: \((*)\). Output is of the same shape as input 
 
 - Examples: - >>> m = nn.AlphaDropout(p=0.2) >>> input = torch.randn(20, 16) >>> output = m(input) 
Sparse layers¶
Embedding¶
- 
class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)¶
- A simple lookup table that stores embeddings of a fixed dictionary and size. - This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings. - Parameters
- num_embeddings (int) – size of the dictionary of embeddings 
- embedding_dim (int) – the size of each embedding vector 
- padding_idx (int, optional) – If given, pads the output with the embedding vector at - padding_idx(initialized to zeros) whenever it encounters the index.
- max_norm (float, optional) – If given, each embedding vector with norm larger than - max_normis renormalized to have norm- max_norm.
- norm_type (float, optional) – The p of the p-norm to compute for the - max_normoption. Default- 2.
- scale_grad_by_freq (boolean, optional) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default - False.
- sparse (bool, optional) – If - True, gradient w.r.t.- weightmatrix will be a sparse tensor. See Notes for more details regarding sparse gradients.
 
- Variables
- ~Embedding.weight (Tensor) – the learnable weights of the module of shape (num_embeddings, embedding_dim) initialized from \(\mathcal{N}(0, 1)\) 
 - Shape:
- Input: \((*)\), LongTensor of arbitrary shape containing the indices to extract 
- Output: \((*, H)\), where * is the input shape and \(H=\text{embedding\_dim}\) 
 
 - Note - Keep in mind that only a limited number of optimizers support sparse gradients: currently it’s - optim.SGD(CUDA and CPU),- optim.SparseAdam(CUDA and CPU) and- optim.Adagrad(CPU)- Note - With - padding_idxset, the embedding vector at- padding_idxis initialized to all zeros. However, note that this vector can be modified afterwards, e.g., using a customized initialization method, and thus changing the vector used to pad the output. The gradient for this vector from- Embeddingis always zero.- Examples: - >>> # an Embedding module containing 10 tensors of size 3 >>> embedding = nn.Embedding(10, 3) >>> # a batch of 2 samples of 4 indices each >>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]]) >>> embedding(input) tensor([[[-0.0251, -1.6902, 0.7172], [-0.6431, 0.0748, 0.6969], [ 1.4970, 1.3448, -0.9685], [-0.3677, -2.7265, -0.1685]], [[ 1.4970, 1.3448, -0.9685], [ 0.4362, -0.4004, 0.9400], [-0.6431, 0.0748, 0.6969], [ 0.9124, -2.3616, 1.1151]]]) >>> # example with padding_idx >>> embedding = nn.Embedding(10, 3, padding_idx=0) >>> input = torch.LongTensor([[0,2,0,5]]) >>> embedding(input) tensor([[[ 0.0000, 0.0000, 0.0000], [ 0.1535, -2.0309, 0.9315], [ 0.0000, 0.0000, 0.0000], [-0.1655, 0.9897, 0.0635]]]) - 
classmethod from_pretrained(embeddings, freeze=True, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False)¶
- Creates Embedding instance from given 2-dimensional FloatTensor. - Parameters
- embeddings (Tensor) – FloatTensor containing weights for the Embedding. First dimension is being passed to Embedding as - num_embeddings, second as- embedding_dim.
- freeze (boolean, optional) – If - True, the tensor does not get updated in the learning process. Equivalent to- embedding.weight.requires_grad = False. Default:- True
- padding_idx (int, optional) – See module initialization documentation. 
- max_norm (float, optional) – See module initialization documentation. 
- norm_type (float, optional) – See module initialization documentation. Default - 2.
- scale_grad_by_freq (boolean, optional) – See module initialization documentation. Default - False.
- sparse (bool, optional) – See module initialization documentation. 
 
 - Examples: - >>> # FloatTensor containing pretrained weights >>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]]) >>> embedding = nn.Embedding.from_pretrained(weight) >>> # Get embeddings for index 1 >>> input = torch.LongTensor([1]) >>> embedding(input) tensor([[ 4.0000, 5.1000, 6.3000]]) 
 
EmbeddingBag¶
- 
class torch.nn.EmbeddingBag(num_embeddings, embedding_dim, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='mean', sparse=False, _weight=None)¶
- Computes sums or means of ‘bags’ of embeddings, without instantiating the intermediate embeddings. - For bags of constant length, this class - with - mode="sum"is equivalent to- Embeddingfollowed by- torch.sum(dim=1),- with - mode="mean"is equivalent to- Embeddingfollowed by- torch.mean(dim=1),- with - mode="max"is equivalent to- Embeddingfollowed by- torch.max(dim=1).- However, - EmbeddingBagis much more time and memory efficient than using a chain of these operations (a sketch verifying the- mode="sum"equivalence appears at the end of this entry).- Parameters
- num_embeddings (int) – size of the dictionary of embeddings 
- embedding_dim (int) – the size of each embedding vector 
- max_norm (float, optional) – If given, each embedding vector with norm larger than - max_normis renormalized to have norm- max_norm.
- norm_type (float, optional) – The p of the p-norm to compute for the - max_normoption. Default- 2.
- scale_grad_by_freq (boolean, optional) – if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default - False. Note: this option is not supported when- mode="max".
- mode (string, optional) – - "sum",- "mean"or- "max". Specifies the way to reduce the bag. Default:- "mean"
- sparse (bool, optional) – if - True, gradient w.r.t.- weightmatrix will be a sparse tensor. See Notes for more details regarding sparse gradients. Note: this option is not supported when- mode="max".
 
- Variables
- ~EmbeddingBag.weight (Tensor) – the learnable weights of the module of shape (num_embeddings, embedding_dim) initialized from \(\mathcal{N}(0, 1)\). 
 - Inputs: - input(LongTensor) and- offsets(LongTensor, optional)- If - inputis 2D of shape (B, N),- it will be treated as - Bbags (sequences) each of fixed length- N, and this will return- Bvalues aggregated in a way depending on the- mode.- offsetsis ignored and required to be- Nonein this case.
- If - inputis 1D of shape (N),- it will be treated as a concatenation of multiple bags (sequences). - offsetsis required to be a 1D tensor containing the starting index positions of each bag in- input. Therefore, for- offsetsof shape (B),- inputwill be viewed as having- Bbags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
 - Output shape: (B, embedding_dim) - Examples: - >>> # an Embedding module containing 10 tensors of size 3 >>> embedding_sum = nn.EmbeddingBag(10, 3, mode='sum') >>> # a batch of 2 samples of 4 indices each >>> input = torch.LongTensor([1,2,4,5,4,3,2,9]) >>> offsets = torch.LongTensor([0,4]) >>> embedding_sum(input, offsets) tensor([[-0.8861, -5.4350, -0.0523], [ 1.1306, -2.5798, -1.0044]]) - 
classmethod from_pretrained(embeddings, freeze=True, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='mean', sparse=False)¶
- Creates EmbeddingBag instance from given 2-dimensional FloatTensor. - Parameters
- embeddings (Tensor) – FloatTensor containing weights for the EmbeddingBag. First dimension is being passed to EmbeddingBag as ‘num_embeddings’, second as ‘embedding_dim’. 
- freeze (boolean, optional) – If - True, the tensor does not get updated in the learning process. Equivalent to- embeddingbag.weight.requires_grad = False. Default:- True
- max_norm (float, optional) – See module initialization documentation. Default: - None
- norm_type (float, optional) – See module initialization documentation. Default - 2.
- scale_grad_by_freq (boolean, optional) – See module initialization documentation. Default - False.
- mode (string, optional) – See module initialization documentation. Default: - "mean"
- sparse (bool, optional) – See module initialization documentation. Default: - False.
 
 - Examples: - >>> # FloatTensor containing pretrained weights >>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]]) >>> embeddingbag = nn.EmbeddingBag.from_pretrained(weight) >>> # Get embeddings for index 1 >>> input = torch.LongTensor([[1, 0]]) >>> embeddingbag(input) tensor([[ 2.5000, 3.7000, 4.6500]]) 
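The fixed-length equivalence stated at the top of this entry can be verified by sharing the weight between an Embedding and an EmbeddingBag. A sketch, not part of the original docs:

>>> emb = nn.Embedding(10, 3)
>>> bag = nn.EmbeddingBag(10, 3, mode='sum')
>>> bag.weight = emb.weight  # share the same Parameter
>>> input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
>>> torch.allclose(bag(input), emb(input).sum(dim=1))
True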
 
Distance functions¶
CosineSimilarity¶
- 
class torch.nn.CosineSimilarity(dim=1, eps=1e-08)¶
- Returns cosine similarity between \(x_1\) and \(x_2\), computed along dim. \[\text{similarity} = \dfrac{x_1 \cdot x_2}{\max(\Vert x_1 \Vert _2 \cdot \Vert x_2 \Vert _2, \epsilon)} \]- Parameters
- dim (int, optional) – Dimension where cosine similarity is computed. Default: 1 
- eps (float, optional) – Small value to avoid division by zero. Default: 1e-8 
 - Shape:
- Input1: \((\ast_1, D, \ast_2)\) where D is at position dim 
- Input2: \((\ast_1, D, \ast_2)\), same shape as the Input1 
- Output: \((\ast_1, \ast_2)\) 
 
 - Examples: - >>> input1 = torch.randn(100, 128) >>> input2 = torch.randn(100, 128) >>> cos = nn.CosineSimilarity(dim=1, eps=1e-6) >>> output = cos(input1, input2) 
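The formula maps directly onto tensor operations, with the denominator clamped at eps. A sketch, not part of the original docs:

>>> input1 = torch.randn(100, 128)
>>> input2 = torch.randn(100, 128)
>>> cos = nn.CosineSimilarity(dim=1, eps=1e-8)
>>> manual = (input1 * input2).sum(dim=1) / (input1.norm(dim=1) * input2.norm(dim=1)).clamp(min=1e-8)
>>> torch.allclose(cos(input1, input2), manual, atol=1e-6)
True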
PairwiseDistance¶
- 
class torch.nn.PairwiseDistance(p=2.0, eps=1e-06, keepdim=False)¶
- Computes the batchwise pairwise distance between vectors \(v_1\), \(v_2\) using the p-norm: \[\Vert x \Vert _p = \left( \sum_{i=1}^n \vert x_i \vert ^ p \right) ^ {1/p} \]- Parameters
- p (real) – the norm degree. Default: 2 
- eps (float, optional) – Small value to avoid division by zero. Default: 1e-6 
- keepdim (bool, optional) – Determines whether or not to keep the vector dimension. Default: False 
 - Shape:
- Input1: \((N, D)\) where D = vector dimension 
- Input2: \((N, D)\), same shape as the Input1 
- Output: \((N)\). If - keepdimis- True, then \((N, 1)\).
 
 - Examples: - >>> pdist = nn.PairwiseDistance(p=2) >>> input1 = torch.randn(100, 128) >>> input2 = torch.randn(100, 128) >>> output = pdist(input1, input2) 
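Up to the small eps term added for numerical stability, the result matches the p-norm of the element-wise difference. A sketch, not part of the original docs; the loose tolerance absorbs the eps term:

>>> pdist = nn.PairwiseDistance(p=2)
>>> input1 = torch.randn(100, 128)
>>> input2 = torch.randn(100, 128)
>>> torch.allclose(pdist(input1, input2), (input1 - input2).norm(p=2, dim=1), atol=1e-4)
True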
Loss functions¶
L1Loss¶
- 
class torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that measures the mean absolute error (MAE) between each element in the input \(x\) and target \(y\). - The unreduced (i.e. with - reductionset to- 'none') loss can be described as:\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = \left| x_n - y_n \right|, \]- where \(N\) is the batch size. If - reductionis not- 'none'(default- 'mean'), then:\[\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases} \]- \(x\) and \(y\) are tensors of arbitrary shapes with a total of \(n\) elements each. - The sum operation still operates over all the elements, and divides by \(n\). - The division by \(n\) can be avoided if one sets - reduction = 'sum'.- Parameters
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((N, *)\) where \(*\) means, any number of additional dimensions 
- Target: \((N, *)\), same shape as the input 
- Output: scalar. If - reductionis- 'none', then \((N, *)\), same shape as the input
 
 - Examples: - >>> loss = nn.L1Loss() >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.randn(3, 5) >>> output = loss(input, target) >>> output.backward() 
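With reduction='none' the unreduced loss is exactly the element-wise absolute difference; a sketch, not part of the original docs:

>>> loss = nn.L1Loss(reduction='none')
>>> input = torch.randn(3, 5)
>>> target = torch.randn(3, 5)
>>> torch.allclose(loss(input, target), (input - target).abs())
True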
MSELoss¶
- 
class torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input \(x\) and target \(y\). - The unreduced (i.e. with - reductionset to- 'none') loss can be described as:\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = \left( x_n - y_n \right)^2, \]- where \(N\) is the batch size. If - reductionis not- 'none'(default- 'mean'), then:\[\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases} \]- \(x\) and \(y\) are tensors of arbitrary shapes with a total of \(n\) elements each. - The sum operation still operates over all the elements, and divides by \(n\). - The division by \(n\) can be avoided if one sets - reduction = 'sum'.- Parameters
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((N, *)\) where \(*\) means, any number of additional dimensions 
- Target: \((N, *)\), same shape as the input 
 
 - Examples: - >>> loss = nn.MSELoss() >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.randn(3, 5) >>> output = loss(input, target) >>> output.backward() 
CrossEntropyLoss¶
- 
class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')¶
- This criterion combines - nn.LogSoftmax()and- nn.NLLLoss()in one single class.- It is useful when training a classification problem with C classes. If provided, the optional argument - weightshould be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.- The input is expected to contain raw, unnormalized scores for each class. - input has to be a Tensor of size either \((minibatch, C)\) or \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) for the K-dimensional case (described later). - This criterion expects a class index in the range \([0, C-1]\) as the target for each value of a 1D tensor of size minibatch. - The loss can be described as: \[\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right) \]- or in the case of the - weightargument being specified:\[\text{loss}(x, class) = weight[class] \left(-x[class] + \log\left(\sum_j \exp(x[j])\right)\right) \]- The losses are averaged across observations for each minibatch. - Can also be used for higher dimension inputs, such as 2D images, by providing an input of size \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\), where \(K\) is the number of dimensions, and a target of appropriate shape (see below). - Parameters
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, has to be a Tensor of size C 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When - size_averageis- True, the loss is averaged over non-ignored targets.
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((N, C)\) where C = number of classes, or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss. 
- Target: \((N)\) where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss. 
- Output: scalar. If - reductionis- 'none', then the same size as the target: \((N)\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
 
 - Examples: - >>> loss = nn.CrossEntropyLoss() >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.empty(3, dtype=torch.long).random_(5) >>> output = loss(input, target) >>> output.backward() 
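The LogSoftmax + NLLLoss combination described in the first sentence can be checked explicitly; a sketch, not part of the original docs:

>>> input = torch.randn(3, 5)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> ce = nn.CrossEntropyLoss()(input, target)
>>> nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(input), target)
>>> torch.allclose(ce, nll)
True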
CTCLoss¶
- 
class torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)¶
- The Connectionist Temporal Classification loss. - Parameters
- blank (int, optional) – blank label. Default \(0\). 
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken,- 'sum': the output losses will be summed. Default:- 'mean'
- zero_infinity (bool, optional) – Whether to zero infinite losses and the associated gradients. Default: False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.
 
 - Inputs:
- log_probs: Tensor of size \((T, N, C)\), where C = number of characters in alphabet (including blank), T = input length, and N = batch size. The logarithmized probabilities of the outputs (e.g. obtained with torch.nn.functional.log_softmax()).
- targets: Tensor of size \((N, S)\) or \((\operatorname{sum}(\text{target\_lengths}))\).
- Targets (cannot be blank). In the second form, the targets are assumed to be concatenated. 
- input_lengths: Tuple or tensor of size \((N)\).
- Lengths of the inputs (must each be \(\leq T\)) 
- target_lengths: Tuple or tensor of size \((N)\).
- Lengths of the targets 
 
 - Example: - >>> ctc_loss = nn.CTCLoss() >>> log_probs = torch.randn(50, 16, 20).log_softmax(2).detach().requires_grad_() >>> targets = torch.randint(1, 20, (16, 30), dtype=torch.long) >>> input_lengths = torch.full((16,), 50, dtype=torch.long) >>> target_lengths = torch.randint(10,30,(16,), dtype=torch.long) >>> loss = ctc_loss(log_probs, targets, input_lengths, target_lengths) >>> loss.backward() - Reference:
- A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: https://www.cs.toronto.edu/~graves/icml_2006.pdf 
- Note - In order to use CuDNN, the following must be satisfied: targets must be in concatenated format, all input_lengths must be T, \(blank=0\), target_lengths \(\leq 256\), and the integer arguments must be of dtype torch.int32. - The regular implementation uses the (more common in PyTorch) torch.long dtype. - Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting torch.backends.cudnn.deterministic = True. Please see the notes on randomness for background.
NLLLoss¶
- 
class torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')¶
- The negative log likelihood loss. It is useful when training a classification problem with C classes.

If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set.

The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either \((minibatch, C)\) or \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) for the K-dimensional case (described later).

Obtaining log-probabilities in a neural network is easily achieved by adding a LogSoftmax layer as the last layer of your network. You may use CrossEntropyLoss instead, if you prefer not to add an extra layer.

The target that this loss expects is a class index in the range \([0, C-1]\) where C = number of classes.

The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_{y_n} x_{n,y_n}, \quad w_{c} = \text{weight}[c] \cdot \mathbb{1}\{c \not= \text{ignore\_index}\},\]

where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then
\[\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n}} l_n, & \text{if reduction} = \text{'mean';}\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{'sum'.} \end{cases}\]

Can also be used for higher dimension inputs, such as 2D images, by providing an input of size \((minibatch, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\), where \(K\) is the number of dimensions, and a target of appropriate shape (see below). In the case of images, it computes NLL loss per-pixel.

- Parameters
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones. 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When - size_averageis- True, the loss is averaged over non-ignored targets.
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((N, C)\) where C = number of classes, or \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss. 
- Target: \((N)\) where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss. 
- Output: scalar. If - reductionis- 'none', then the same size as the target: \((N)\), or \((N, d_1, d_2, ..., d_K)\) with \(K \geq 1\) in the case of K-dimensional loss.
 
 - Examples: - >>> m = nn.LogSoftmax(dim=1) >>> loss = nn.NLLLoss() >>> # input is of size N x C = 3 x 5 >>> input = torch.randn(3, 5, requires_grad=True) >>> # each element in target has to have 0 <= value < C >>> target = torch.tensor([1, 0, 4]) >>> output = loss(m(input), target) >>> output.backward() >>> >>> >>> # 2D loss example (used, for example, with image inputs) >>> N, C = 5, 4 >>> loss = nn.NLLLoss() >>> # input is of size N x C x height x width >>> data = torch.randn(N, 16, 10, 10) >>> conv = nn.Conv2d(16, C, (3, 3)) >>> m = nn.LogSoftmax(dim=1) >>> # each element in target has to have 0 <= value < C >>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C) >>> output = loss(m(conv(data)), target) >>> output.backward() 
PoissonNLLLoss¶
- 
class torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')¶
- Negative log likelihood loss with Poisson distribution of target.

The loss can be described as:
\[\text{target} \sim \mathrm{Poisson}(\text{input})\]
\[\text{loss}(\text{input}, \text{target}) = \text{input} - \text{target} * \log(\text{input}) + \log(\text{target!})\]

The last term can be omitted or approximated with Stirling's formula. The approximation is used for target values greater than 1; for targets less than or equal to 1, zeros are added to the loss.

- Parameters
- log_input (bool, optional) – if - Truethe loss is computed as \(\exp(\text{input}) - \text{target}*\text{input}\), if- Falsethe loss is \(\text{input} - \text{target}*\log(\text{input}+\text{eps})\).
- full (bool, optional) – - whether to compute full loss, i. e. to add the Stirling approximation term \[\text{target}*\log(\text{target}) - \text{target} + 0.5 * \log(2\pi\text{target}). \]
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- eps (float, optional) – Small value to avoid evaluation of \(\log(0)\) when - log_input = False. Default: 1e-8
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Examples: - >>> loss = nn.PoissonNLLLoss() >>> log_input = torch.randn(5, 2, requires_grad=True) >>> target = torch.randn(5, 2) >>> output = loss(log_input, target) >>> output.backward() - Shape:
- Input: \((N, *)\) where \(*\) means, any number of additional dimensions 
- Target: \((N, *)\), same shape as the input 
- Output: scalar by default. If - reductionis- 'none', then \((N, *)\), the same shape as the input
 
 
KLDivLoss¶
- 
class torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean')¶
- The Kullback-Leibler divergence loss.

KL divergence is a useful measure of how one continuous distribution differs from another, and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions.

As with NLLLoss, the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).

This criterion expects a target Tensor of the same size as the input Tensor.

The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[l(x,y) = L = \{ l_1,\dots,l_N \}, \quad l_n = y_n \cdot \left( \log y_n - x_n \right)\]

where the index \(N\) spans all dimensions of input and \(L\) has the same shape as input. If reduction is not 'none' (default 'mean'), then:
\[\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}\]

In the default reduction mode 'mean', the losses are averaged for each minibatch over observations as well as over dimensions. 'batchmean' mode gives the correct KL divergence, where losses are averaged over the batch dimension only. 'mean' mode's behavior will be changed to the same as 'batchmean' in the next major release.

- Parameters
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'batchmean'|- 'sum'|- 'mean'.- 'none': no reduction will be applied.- 'batchmean': the sum of the output will be divided by batchsize.- 'sum': the output will be summed.- 'mean': the output will be divided by the number of elements in the output. Default:- 'mean'
 
- Note - size_average and reduce are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction. - Note - reduction = 'mean' doesn't return the true KL divergence value; please use reduction = 'batchmean', which aligns with the mathematical definition of KL divergence. In the next major release, 'mean' will be changed to behave the same as 'batchmean'. - Shape:
- Input: \((N, *)\) where \(*\) means, any number of additional dimensions 
- Target: \((N, *)\), same shape as the input 
- Output: scalar by default. If reduction is 'none', then \((N, *)\), the same shape as the input
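 - Example:

A minimal usage sketch, using the 'batchmean' reduction recommended in the note above (the sizes and random data are illustrative assumptions):

>>> kl_loss = nn.KLDivLoss(reduction='batchmean')
>>> # input must hold log-probabilities, target plain probabilities
>>> input = torch.randn(3, 5).log_softmax(dim=1).detach().requires_grad_()
>>> target = torch.randn(3, 5).softmax(dim=1)
>>> output = kl_loss(input, target)
>>> output.backward()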
 
 
BCELoss¶
- 
class torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that measures the Binary Cross Entropy between the target and the output.

The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right],\]

where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then
\[\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}\]

This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets \(y\) should be numbers between 0 and 1.

- Parameters
- weight (Tensor, optional) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size nbatch. 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((N, *)\) where \(*\) means, any number of additional dimensions 
- Target: \((N, *)\), same shape as the input 
- Output: scalar. If - reductionis- 'none', then \((N, *)\), same shape as input.
 
 - Examples: - >>> m = nn.Sigmoid() >>> loss = nn.BCELoss() >>> input = torch.randn(3, requires_grad=True) >>> target = torch.empty(3).random_(2) >>> output = loss(m(input), target) >>> output.backward() 
BCEWithLogitsLoss¶
- 
class torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)¶
- This loss combines a Sigmoid layer and the BCELoss in a single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss because, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.

The unreduced (i.e. with reduction set to 'none') loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right],\]

where \(N\) is the batch size. If reduction is not 'none' (default 'mean'), then
\[\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}\]

This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets \(t[i]\) should be numbers between 0 and 1.

It's possible to trade off recall and precision by adding weights to positive examples. In this case the loss can be described as:
\[\ell(x, y) = L = \{l_1,\dots,l_N\}^\top, \quad l_n = - w_n \left[ p_n y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right],\]

where \(p_n\) is the weight of the positive class for sample \(n\) in the batch. \(p_n > 1\) increases the recall, \(p_n < 1\) increases the precision.

For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for the class should be equal to \(\frac{300}{100}=3\). The loss would then act as if the dataset contained \(3\times 100=300\) positive examples.

- Parameters
- weight (Tensor, optional) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size nbatch. 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
- pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes. 
 
 - Shape:
- Input: \((N, *)\) where \(*\) means, any number of additional dimensions 
- Target: \((N, *)\), same shape as the input 
- Output: scalar. If - reductionis- 'none', then \((N, *)\), same shape as input.
 - Examples: - >>> loss = nn.BCEWithLogitsLoss() >>> input = torch.randn(3, requires_grad=True) >>> target = torch.empty(3).random_(2) >>> output = loss(input, target) >>> output.backward() 
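A sketch of the pos_weight mechanism described above, for the 100-positive/300-negative single-class case; the batch size and random data are illustrative assumptions:

>>> # one class with 3x more negatives than positives => pos_weight = 3
>>> criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.]))
>>> input = torch.randn(8, 1, requires_grad=True)
>>> target = torch.empty(8, 1).random_(2)
>>> output = criterion(input, target)
>>> output.backward()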
 
MarginRankingLoss¶
- 
class torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that measures the loss given inputs \(x1\), \(x2\), two 1D mini-batch Tensors, and a label 1D mini-batch tensor \(y\) (containing 1 or -1).

If \(y = 1\) then it is assumed the first input should be ranked higher (have a larger value) than the second input, and vice-versa for \(y = -1\).

The loss function for each sample in the mini-batch is:
\[\text{loss}(x, y) = \max(0, -y * (x1 - x2) + \text{margin})\]

- Parameters
- margin (float, optional) – Has a default value of \(0\). 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((N, D)\) where N is the batch size and D is the size of a sample. 
- Target: \((N)\) 
- Output: scalar. If - reductionis- 'none', then \((N)\).
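 - Example:

A minimal usage sketch (the sizes and random data are illustrative assumptions):

>>> loss = nn.MarginRankingLoss()
>>> input1 = torch.randn(3, requires_grad=True)
>>> input2 = torch.randn(3, requires_grad=True)
>>> # y = 1: input1 should rank higher; y = -1: input2 should rank higher
>>> target = torch.tensor([1., -1., 1.])
>>> output = loss(input1, input2, target)
>>> output.backward()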
 
 
HingeEmbeddingLoss¶
- 
class torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')¶
- Measures the loss given an input tensor \(x\) and a labels tensor \(y\) (containing 1 or -1). This is usually used for measuring whether two inputs are similar or dissimilar, e.g. using the L1 pairwise distance as \(x\), and is typically used for learning nonlinear embeddings or semi-supervised learning.

The loss function for the \(n\)-th sample in the mini-batch is
\[l_n = \begin{cases} x_n, & \text{if}\; y_n = 1,\\ \max \{0, \Delta - x_n\}, & \text{if}\; y_n = -1, \end{cases}\]

and the total loss function is
\[\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}\]

where \(L = \{l_1,\dots,l_N\}^\top\) and \(\Delta\) is the margin.

- Parameters
- margin (float, optional) – Has a default value of 1. 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((*)\) where \(*\) means, any number of dimensions. The sum operation operates over all the elements. 
- Target: \((*)\), same shape as the input 
- Output: scalar. If reduction is 'none', then same shape as the input
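 - Example:

A minimal usage sketch (here the input stands in for, e.g., pairwise distances; the size and data are illustrative assumptions):

>>> loss = nn.HingeEmbeddingLoss(margin=1.0)
>>> # x holds per-pair scores; y marks similar (1) / dissimilar (-1) pairs
>>> input = torch.randn(4, requires_grad=True)
>>> target = torch.tensor([1., -1., 1., -1.])
>>> output = loss(input, target)
>>> output.backward()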
 
 
MultiLabelMarginLoss¶
- 
class torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input \(x\) (a 2D mini-batch Tensor) and target \(y\) (a 2D Tensor of target class indices). For each sample in the mini-batch:
\[\text{loss}(x, y) = \sum_{ij}\frac{\max(0, 1 - (x[y[j]] - x[i]))}{\text{x.size}(0)}\]

where \(x \in \left\{0, \; \cdots , \; \text{x.size}(0) - 1\right\}\), \(y \in \left\{0, \; \cdots , \; \text{y.size}(0) - 1\right\}\), \(0 \leq y[j] \leq \text{x.size}(0)-1\), and \(i \neq y[j]\) for all \(i\) and \(j\).

\(y\) and \(x\) must have the same size.

The criterion only considers a contiguous block of non-negative targets that starts at the front. This allows different samples to have variable numbers of target classes.

- Parameters
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((C)\) or \((N, C)\) where N is the batch size and C is the number of classes. 
- Target: \((C)\) or \((N, C)\), label targets padded by -1 ensuring same shape as the input. 
- Output: scalar. If reduction is 'none', then \((N)\).
 
 - Examples: - >>> loss = nn.MultiLabelMarginLoss() >>> x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]]) >>> # for target y, only consider labels 3 and 0, not after label -1 >>> y = torch.LongTensor([[3, 0, -1, 1]]) >>> loss(x, y) >>> # 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4))) tensor(0.8500) 
SmoothL1Loss¶
- 
class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise. It is less sensitive to outliers than the MSELoss and in some cases prevents exploding gradients (e.g. see the "Fast R-CNN" paper by Ross Girshick). Also known as the Huber loss:
\[\text{loss}(x, y) = \frac{1}{n} \sum_{i} z_{i}\]

where \(z_{i}\) is given by:
\[z_{i} = \begin{cases} 0.5 (x_i - y_i)^2, & \text{if } |x_i - y_i| < 1 \\ |x_i - y_i| - 0.5, & \text{otherwise } \end{cases}\]

\(x\) and \(y\) can have arbitrary shapes with a total of \(n\) elements each; the sum operation still operates over all the elements and divides by \(n\).

The division by \(n\) can be avoided by setting reduction = 'sum'.

- Parameters
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((N, *)\) where \(*\) means, any number of additional dimensions 
- Target: \((N, *)\), same shape as the input 
- Output: scalar. If - reductionis- 'none', then \((N, *)\), same shape as the input
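 - Example:

A minimal usage sketch (the sizes and random data are illustrative assumptions):

>>> loss = nn.SmoothL1Loss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5)
>>> output = loss(input, target)
>>> output.backward()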
 
 
SoftMarginLoss¶
- 
class torch.nn.SoftMarginLoss(size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that optimizes a two-class classification logistic loss between input tensor \(x\) and target tensor \(y\) (containing 1 or -1). \[\text{loss}(x, y) = \sum_i \frac{\log(1 + \exp(-y[i]*x[i]))}{\text{x.nelement}()} \]- Parameters
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((*)\) where \(*\) means, any number of additional dimensions 
- Target: \((*)\), same shape as the input 
- Output: scalar. If - reductionis- 'none', then same shape as the input
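 - Example:

A minimal usage sketch (the sizes and random data are illustrative assumptions):

>>> loss = nn.SoftMarginLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # targets are +1/-1 labels with the same shape as the input
>>> target = torch.empty(3, 5).random_(2) * 2 - 1
>>> output = loss(input, target)
>>> output.backward()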
 
 
MultiLabelSoftMarginLoss¶
- 
class torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input \(x\) and target \(y\) of size \((N, C)\). For each sample in the minibatch:
\[loss(x, y) = - \frac{1}{C} \sum_i \left[ y[i] \cdot \log\left(\frac{1}{1 + \exp(-x[i])}\right) + (1-y[i]) \cdot \log\left(\frac{\exp(-x[i])}{1 + \exp(-x[i])}\right) \right]\]

where \(i \in \left\{0, \; \cdots , \; \text{x.nElement}() - 1\right\}\) and \(y[i] \in \left\{0, \; 1\right\}\).

- Parameters
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones. 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((N, C)\) where N is the batch size and C is the number of classes. 
- Target: \((N, C)\), same shape as the input, with binary \(\{0, 1\}\) label targets.
- Output: scalar. If - reductionis- 'none', then \((N)\).
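 - Example:

A minimal usage sketch (the sizes and random data are illustrative assumptions):

>>> loss = nn.MultiLabelSoftMarginLoss()
>>> input = torch.randn(3, 4, requires_grad=True)
>>> # multi-hot 0/1 targets: each sample may belong to several of the C classes
>>> target = torch.empty(3, 4).random_(2)
>>> output = loss(input, target)
>>> output.backward()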
 
 
CosineEmbeddingLoss¶
- 
class torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that measures the loss given input tensors \(x_1\), \(x_2\) and a Tensor label \(y\) with values 1 or -1. This is used for measuring whether two inputs are similar or dissimilar, using the cosine distance, and is typically used for learning nonlinear embeddings or semi-supervised learning. - The loss function for each sample is: \[\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\ \max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1 \end{cases} \]- Parameters
- margin (float, optional) – Should be a number from \(-1\) to \(1\); \(0\) to \(0.5\) is suggested. If margin is missing, the default value is \(0\).
- size_average (bool, optional) – Deprecated (see reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
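 - Example:

A minimal usage sketch (the embedding size, margin and random data are illustrative assumptions):

>>> loss = nn.CosineEmbeddingLoss(margin=0.5)
>>> input1 = torch.randn(3, 128, requires_grad=True)
>>> input2 = torch.randn(3, 128, requires_grad=True)
>>> # y = 1: the pair should be similar; y = -1: dissimilar
>>> target = torch.tensor([1., -1., 1.])
>>> output = loss(input1, input2, target)
>>> output.backward()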
 
 
MultiMarginLoss¶
- 
class torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input \(x\) (a 2D mini-batch Tensor) and output \(y\) (a 1D tensor of target class indices, \(0 \leq y \leq \text{x.size}(1)-1\)):

For each mini-batch sample, the loss in terms of the 1D input \(x\) and scalar output \(y\) is:
\[\text{loss}(x, y) = \frac{\sum_i \max(0, \text{margin} - x[y] + x[i])^p}{\text{x.size}(0)}\]

where \(i \in \left\{0, \; \cdots , \; \text{x.size}(0) - 1\right\}\) and \(i \neq y\).

Optionally, you can give non-equal weighting to the classes by passing a 1D weight tensor into the constructor.

The loss function then becomes:
\[\text{loss}(x, y) = \frac{\sum_i \max(0, w[y] \cdot (\text{margin} - x[y] + x[i]))^p}{\text{x.size}(0)}\]

- Parameters
- p (int, optional) – Has a default value of \(1\). \(1\) and \(2\) are the only supported values. 
- margin (float, optional) – Has a default value of \(1\). 
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones. 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
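 - Example:

A minimal usage sketch (the sizes and random data are illustrative assumptions):

>>> loss = nn.MultiMarginLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> # target holds one class index per sample, 0 <= value < C
>>> target = torch.tensor([1, 0, 4])
>>> output = loss(input, target)
>>> output.backward()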
 
 
TripletMarginLoss¶
- 
class torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')¶
- Creates a criterion that measures the triplet loss given input tensors \(x1\), \(x2\), \(x3\) and a margin with a value greater than \(0\). This is used for measuring a relative similarity between samples. A triplet is composed of a, p and n: an anchor, a positive example and a negative example, respectively. The shapes of all input tensors should be \((N, D)\).

The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al.

The loss function for each sample in the mini-batch is:
\[L(a, p, n) = \max \{d(a_i, p_i) - d(a_i, n_i) + {\rm margin}, 0\}\]

where
\[d(x_i, y_i) = \left\lVert {\bf x}_i - {\bf y}_i \right\rVert_p\]

- Parameters
- margin (float, optional) – Default: \(1\). 
- p (int, optional) – The norm degree for pairwise distance. Default: \(2\). 
- swap (bool, optional) – The distance swap is described in detail in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al. Default: False.
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Shape:
- Input: \((N, D)\) where \(D\) is the vector dimension. 
- Output: scalar. If - reductionis- 'none', then \((N)\).
 
- Example: - >>> triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2) >>> input1 = torch.randn(100, 128, requires_grad=True) >>> input2 = torch.randn(100, 128, requires_grad=True) >>> input3 = torch.randn(100, 128, requires_grad=True) >>> output = triplet_loss(input1, input2, input3) >>> output.backward()
Vision layers¶
PixelShuffle¶
- 
class torch.nn.PixelShuffle(upscale_factor)¶
- Rearranges elements in a tensor of shape \((*, C \times r^2, H, W)\) to a tensor of shape \((*, C, H \times r, W \times r)\).

This is useful for implementing efficient sub-pixel convolution with a stride of \(1/r\).

See the paper Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network by Shi et al. (2016) for more details.

- Parameters
- upscale_factor (int) – factor to increase spatial resolution by 
 - Shape:
- Input: \((N, L, H_{in}, W_{in})\) where \(L=C \times \text{upscale\_factor}^2\) 
- Output: \((N, C, H_{out}, W_{out})\) where \(H_{out} = H_{in} \times \text{upscale\_factor}\) and \(W_{out} = W_{in} \times \text{upscale\_factor}\) 
 
 - Examples: - >>> pixel_shuffle = nn.PixelShuffle(3) >>> input = torch.randn(1, 9, 4, 4) >>> output = pixel_shuffle(input) >>> print(output.size()) torch.Size([1, 1, 12, 12]) 
Upsample¶
- 
class torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)¶
- Upsamples given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data.

The input data is assumed to be of the form minibatch x channels x [optional depth] x [optional height] x width. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor.

The algorithms available for upsampling are nearest neighbor and linear, bilinear, bicubic and trilinear for 3D, 4D and 5D input Tensors, respectively.

One can either give a scale_factor or the target output size to calculate the output size. (You cannot give both, as it is ambiguous.)

- Parameters
- size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int], optional) – output spatial sizes
- scale_factor (float or Tuple[float] or Tuple[float, float] or Tuple[float, float, float], optional) – multiplier for spatial size. Has to match input size if it is a tuple.
- mode (str, optional) – the upsampling algorithm: one of - 'nearest',- 'linear',- 'bilinear',- 'bicubic'and- 'trilinear'. Default:- 'nearest'
- align_corners (bool, optional) – if - True, the corner pixels of the input and output tensors are aligned, and thus preserving the values at those pixels. This only has effect when- modeis- 'linear',- 'bilinear', or- 'trilinear'. Default:- False
 
 - Shape:
- Input: \((N, C, W_{in})\), \((N, C, H_{in}, W_{in})\) or \((N, C, D_{in}, H_{in}, W_{in})\) 
- Output: \((N, C, W_{out})\), \((N, C, H_{out}, W_{out})\) or \((N, C, D_{out}, H_{out}, W_{out})\), where 
 
 \[D_{out} = \left\lfloor D_{in} \times \text{scale\_factor} \right\rfloor \]\[H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor \]\[W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor \]- Warning - With - align_corners = True, the linearly interpolating modes (linear, bilinear, bicubic, and trilinear) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is- align_corners = False. See below for concrete examples on how this affects the outputs.- Note - If you want downsampling/general resizing, you should use - interpolate().- Examples: - >>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2) >>> input tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> m = nn.Upsample(scale_factor=2, mode='nearest') >>> m(input) tensor([[[[ 1., 1., 2., 2.], [ 1., 1., 2., 2.], [ 3., 3., 4., 4.], [ 3., 3., 4., 4.]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear') # align_corners=False >>> m(input) tensor([[[[ 1.0000, 1.2500, 1.7500, 2.0000], [ 1.5000, 1.7500, 2.2500, 2.5000], [ 2.5000, 2.7500, 3.2500, 3.5000], [ 3.0000, 3.2500, 3.7500, 4.0000]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True) >>> m(input) tensor([[[[ 1.0000, 1.3333, 1.6667, 2.0000], [ 1.6667, 2.0000, 2.3333, 2.6667], [ 2.3333, 2.6667, 3.0000, 3.3333], [ 3.0000, 3.3333, 3.6667, 4.0000]]]]) >>> # Try scaling the same data in a larger tensor >>> >>> input_3x3 = torch.zeros(3, 3).view(1, 1, 3, 3) >>> input_3x3[:, :, :2, :2].copy_(input) tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> input_3x3 tensor([[[[ 1., 2., 0.], [ 3., 4., 0.], [ 0., 0., 0.]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear') # align_corners=False >>> # Notice that values in top left corner are the same with the small input (except at boundary) >>> m(input_3x3) tensor([[[[ 1.0000, 1.2500, 1.7500, 1.5000, 0.5000, 0.0000], [ 1.5000, 1.7500, 2.2500, 1.8750, 0.6250, 0.0000], [ 2.5000, 2.7500, 3.2500, 2.6250, 0.8750, 0.0000], [ 2.2500, 2.4375, 2.8125, 2.2500, 0.7500, 0.0000], [ 0.7500, 0.8125, 0.9375, 0.7500, 0.2500, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True) >>> # Notice that values in top left corner are now changed >>> m(input_3x3) tensor([[[[ 1.0000, 1.4000, 1.8000, 1.6000, 0.8000, 0.0000], [ 1.8000, 2.2000, 2.6000, 2.2400, 1.1200, 0.0000], [ 2.6000, 3.0000, 3.4000, 2.8800, 1.4400, 0.0000], [ 2.4000, 2.7200, 3.0400, 2.5600, 1.2800, 0.0000], [ 1.2000, 1.3600, 1.5200, 1.2800, 0.6400, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]]) 
UpsamplingNearest2d¶
- 
class torch.nn.UpsamplingNearest2d(size=None, scale_factor=None)¶
- Applies 2D nearest neighbor upsampling to an input signal composed of several input channels.

To specify the scale, it takes either the size or the scale_factor as its constructor argument.

When size is given, it is the output size of the image (h, w).

- Parameters
 - Warning - This class is deprecated in favor of - interpolate().- Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\) where 
 
 \[H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor \]\[W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor \]- Examples: - >>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2) >>> input tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> m = nn.UpsamplingNearest2d(scale_factor=2) >>> m(input) tensor([[[[ 1., 1., 2., 2.], [ 1., 1., 2., 2.], [ 3., 3., 4., 4.], [ 3., 3., 4., 4.]]]]) 
UpsamplingBilinear2d¶
- 
class torch.nn.UpsamplingBilinear2d(size=None, scale_factor=None)¶
- Applies 2D bilinear upsampling to an input signal composed of several input channels.

To specify the scale, it takes either the size or the scale_factor as its constructor argument.

When size is given, it is the output size of the image (h, w).

- Parameters
 - Warning - This class is deprecated in favor of - interpolate(). It is equivalent to- nn.functional.interpolate(..., mode='bilinear', align_corners=True).- Shape:
- Input: \((N, C, H_{in}, W_{in})\) 
- Output: \((N, C, H_{out}, W_{out})\) where 
 
 \[H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor \]\[W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor \]- Examples: - >>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2) >>> input tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> m = nn.UpsamplingBilinear2d(scale_factor=2) >>> m(input) tensor([[[[ 1.0000, 1.3333, 1.6667, 2.0000], [ 1.6667, 2.0000, 2.3333, 2.6667], [ 2.3333, 2.6667, 3.0000, 3.3333], [ 3.0000, 3.3333, 3.6667, 4.0000]]]]) 
DataParallel layers (multi-GPU, distributed)¶
DataParallel¶
- 
class torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)¶
- Implements data parallelism at the module level.

This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension (other objects will be copied once per device). In the forward pass, the module is replicated on each device, and each replica handles a portion of the input. During the backwards pass, gradients from each replica are summed into the original module.

The batch size should be larger than the number of GPUs used.

See also: the CUDA semantics note on using nn.DataParallel instead of multiprocessing.

Arbitrary positional and keyword inputs are allowed to be passed into DataParallel, but some types are specially handled. Tensors will be scattered on the dim specified (default 0). The tuple, list and dict types will be shallow copied. The other types will be shared among different threads and can be corrupted if written to in the model's forward pass.

The parallelized module must have its parameters and buffers on device_ids[0] before running this DataParallel module.

Warning

In each forward, module is replicated on each device, so any updates to the running module in forward will be lost. For example, if module has a counter attribute that is incremented in each forward, it will always stay at the initial value because the update is done on the replicas, which are destroyed after forward. However, DataParallel guarantees that the replica on device[0] will have its parameters and buffers sharing storage with the base parallelized module. So in-place updates to the parameters or buffers on device[0] will be recorded. E.g., BatchNorm2d and spectral_norm() rely on this behavior to update the buffers.

Warning

Forward and backward hooks defined on module and its submodules will be invoked len(device_ids) times, each with inputs located on a particular device. In particular, the hooks are only guaranteed to be executed in correct order with respect to operations on corresponding devices. For example, it is not guaranteed that hooks set via register_forward_pre_hook() are executed before all len(device_ids) forward() calls, but it is guaranteed that each such hook is executed before the corresponding forward() call of that device.

Warning

When module returns a scalar (i.e., 0-dimensional tensor) in forward(), this wrapper will return a vector of length equal to the number of devices used in data parallelism, containing the result from each device.

Note

There is a subtlety in using the pack sequence -> recurrent network -> unpack sequence pattern in a Module wrapped in DataParallel. See the section on pack/unpack with data parallelism in the FAQ for details.

- Parameters
- module (Module) – module to be parallelized 
- device_ids (list of int or torch.device) – CUDA devices (default: all devices)
- output_device (int or torch.device) – device location of output (default: device_ids[0]) 
 
- Variables
- module (Module) – the module to be parallelized
 - Example: - >>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2]) >>> output = net(input_var) # input_var can be on any device, including CPU 
DistributedDataParallel¶
- 
class torch.nn.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, dim=0, broadcast_buffers=True, process_group=None, bucket_cap_mb=25, check_reduction=False)¶
- Implements distributed data parallelism, based on the torch.distributed package, at the module level.

This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension. The module is replicated on each machine and each device, and each such replica handles a portion of the input. During the backwards pass, gradients from each node are averaged.

The batch size should be larger than the number of GPUs used locally.

See also: the distributed Basics and the CUDA semantics note on using nn.DataParallel instead of multiprocessing. The same constraints on input as in torch.nn.DataParallel apply.

Creation of this class requires torch.distributed to be already initialized, by calling torch.distributed.init_process_group().

DistributedDataParallel can be used in the following two ways:

- Single-Process Multi-GPU
 - In this case, a single process will be spawned on each host/node and each process will operate on all the GPUs of the node where it’s running. To use - DistributedDataParallelin this way, you can simply construct the model as the following:- >>> torch.distributed.init_process_group(backend="nccl") >>> model = DistributedDataParallel(model) # device_ids will include all GPU devices by default - Multi-Process Single-GPU 
- This is the highly recommended way to use DistributedDataParallel, with multiple processes, each of which operates on a single GPU. This is currently the fastest approach to do data parallel training using PyTorch and applies to both single-node (multi-GPU) and multi-node data parallel training. It is proven to be significantly faster than torch.nn.DataParallel for single-node multi-GPU data parallel training.

Here is how to use it: on each host with N GPUs, you should spawn N processes, while ensuring that each process individually works on a single GPU from 0 to N-1. Therefore, it is your job to ensure that your training script operates on a single given GPU by calling:

>>> torch.cuda.set_device(i)

where i is from 0 to N-1. In each process, you should refer to the following to construct this module:

>>> torch.distributed.init_process_group(backend='nccl', world_size=4, init_method='...')
>>> model = DistributedDataParallel(model, device_ids=[i], output_device=i)

In order to spawn multiple processes per node, you can use either torch.distributed.launch or torch.multiprocessing.spawn.

Note

The nccl backend is currently the fastest and most highly recommended backend to be used with Multi-Process Single-GPU distributed training, and this applies to both single-node and multi-node distributed training.

Note

This module also supports mixed-precision distributed training. This means that your model can have different types of parameters, such as mixed types of fp16 and fp32, and the gradient reduction on these mixed types of parameters will just work fine. Also note that the nccl backend is currently the fastest and most highly recommended backend for fp16/fp32 mixed-precision training.

Warning

This module works only with the gloo and nccl backends.

Warning

The constructor, the forward method, and differentiation of the output (or a function of the output of this module) are distributed synchronization points. Take that into account in case different processes might be executing different code.

Warning

This module assumes all parameters are registered in the model by the time it is created. No parameters should be added nor removed later. The same applies to buffers.

Warning

This module assumes that the parameters registered in the model of each distributed process are in the same order. The module itself will conduct gradient all-reduction following the reverse order of the registered parameters of the model. In other words, it is the user's responsibility to ensure that each distributed process has the exact same model and thus the exact same parameter registration order.

Warning

This module assumes all buffers and gradients are dense.

Warning

This module doesn't work with torch.autograd.grad() (i.e. it will only work if gradients are to be accumulated in .grad attributes of parameters).

Warning

If you plan on using this module with a nccl backend or a gloo backend (that uses Infiniband), together with a DataLoader that uses multiple workers, please change the multiprocessing start method to forkserver (Python 3 only) or spawn. Unfortunately Gloo (that uses Infiniband) and NCCL2 are not fork safe, and you will likely experience deadlocks if you don't change this setting.

Warning

Forward and backward hooks defined on module and its submodules won't be invoked anymore, unless the hooks are initialized in the forward() method.

Warning

You should never try to change your model's parameters after wrapping your model with DistributedDataParallel.
In other words, when wrapping your model with DistributedDataParallel, the constructor of DistributedDataParallel will register additional gradient reduction functions on all the parameters of the model itself at the time of construction. If you change the model's parameters after the DistributedDataParallel construction, this is not supported and unexpected behaviors can happen, since some parameters' gradient reduction functions might not get called.

Note

Parameters are never broadcast between processes. The module performs an all-reduce step on gradients and assumes that they will be modified by the optimizer in all processes in the same way. Buffers (e.g. BatchNorm stats) are broadcast from the module in the process of rank 0 to all other replicas in the system in every iteration.

- Parameters
- module (Module) – module to be parallelized 
- device_ids (list of int or torch.device) – CUDA devices (default: all devices)
- output_device (int or torch.device) – device location of output (default: device_ids[0]) 
- broadcast_buffers (bool) – flag that enables syncing (broadcasting) buffers of the module at beginning of the forward function. (default: - True)
- process_group – the process group to be used for distributed data all-reduction. If None, the default process group, which is created by torch.distributed.init_process_group(), will be used. (default: None)
- bucket_cap_mb – DistributedDataParallel will bucket parameters into multiple buckets so that gradient reduction of each bucket can potentially overlap with backward computation. - bucket_cap_mbcontrols the bucket size in MegaBytes (MB) (default: 25)
- check_reduction – when set to True, it enables DistributedDataParallel to automatically check whether the previous iteration's backward reductions were successfully issued at the beginning of every iteration's forward function. You normally don't need this option enabled unless you are observing weird behaviors such as different ranks getting different gradients, which should not happen if DistributedDataParallel is used correctly. (default: False)
 
- Variables
- module (Module) – the module to be parallelized
- Example: - >>> torch.distributed.init_process_group(backend='nccl', world_size=4, init_method='...') >>> net = torch.nn.parallel.DistributedDataParallel(model)
DistributedDataParallelCPU¶
- 
class torch.nn.parallel.DistributedDataParallelCPU(module)¶
- Implements distributed data parallelism for the CPU at the module level.

This module supports the mpi and gloo backends.

This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension. The module is replicated on each machine, and each such replica handles a portion of the input. During the backwards pass, gradients from each node are averaged.

This module could be used in conjunction with DistributedSampler (see DistributedSampler), which will load a subset of the original dataset for each node with the same batch size. So strong scaling should be configured like this:

- n = 1, batch size = 128
- n = 2, batch size = 64
- n = 4, batch size = 32
- n = 8, batch size = 16

Creation of this class requires the distributed package to be already initialized in the process group mode (see torch.distributed.init_process_group()).

Warning

The constructor, the forward method, and differentiation of the output (or a function of the output of this module) are distributed synchronization points. Take that into account in case different nodes might be executing different code.

Warning

This module assumes all parameters are registered in the model by the time it is created. No parameters should be added nor removed later.

Warning

This module assumes all gradients are dense.

Warning

This module doesn't work with torch.autograd.grad() (i.e. it will only work if gradients are to be accumulated in .grad attributes of parameters).

Warning

Forward and backward hooks defined on module and its submodules won't be invoked anymore, unless the hooks are initialized in the forward() method.

Note

Parameters are broadcast between nodes in the __init__() function. The module performs an all-reduce step on gradients and assumes that they will be modified by the optimizer in all nodes in the same way.

- Parameters
- module – module to be parallelized 
 - Example: - >>> torch.distributed.init_process_group(world_size=4, init_method='...') >>> net = torch.nn.DistributedDataParallelCPU(model) 
Utilities¶
clip_grad_norm_¶
- 
torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2)¶
- Clips gradient norm of an iterable of parameters. - The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place. - Parameters
- parameters (Iterable[Tensor] or Tensor) – an iterable of Tensors or a single Tensor that will have gradients normalized 
- max_norm (float or int) – max norm of the gradients 
- norm_type (float or int) – type of the used p-norm. Can be 'inf' for infinity norm. 
- Returns
- Total norm of the parameters (viewed as a single vector). 
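A minimal sketch of typical usage inside a training step (the model and loss here are illustrative stand-ins, not part of this API):
>>> model = torch.nn.Linear(10, 2)          # hypothetical model
>>> loss = model(torch.randn(4, 10)).sum()  # hypothetical loss
>>> loss.backward()
>>> # rescale all gradients in-place so their combined norm is at most 1.0
>>> total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)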
 
clip_grad_value_¶
- 
torch.nn.utils.clip_grad_value_(parameters, clip_value)¶
- Clips gradient of an iterable of parameters at specified value. - Gradients are modified in-place. 
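A minimal sketch, assuming a toy model whose gradients have already been computed; after the call every gradient entry lies in [-0.5, 0.5]:
>>> model = torch.nn.Linear(10, 2)   # hypothetical model
>>> model(torch.randn(4, 10)).sum().backward()
>>> torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)
>>> bool(model.weight.grad.abs().max() <= 0.5)
True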
parameters_to_vector¶
vector_to_parameters¶
weight_norm¶
- 
torch.nn.utils.weight_norm(module, name='weight', dim=0)¶
- Applies weight normalization to a parameter in the given module. \[\mathbf{w} = g \dfrac{\mathbf{v}}{\|\mathbf{v}\|} \]- Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This replaces the parameter specified by - name(e.g.- 'weight') with two parameters: one specifying the magnitude (e.g.- 'weight_g') and one specifying the direction (e.g.- 'weight_v'). Weight normalization is implemented via a hook that recomputes the weight tensor from the magnitude and direction before every- forward()call.- By default, with - dim=0, the norm is computed independently per output channel/plane. To compute a norm over the entire weight tensor, use- dim=None.- See https://arxiv.org/abs/1602.07868 - Parameters
- module (Module) – containing module 
- name (str, optional) – name of weight parameter 
- dim (int, optional) – dimension over which to compute the norm 
- Returns
- The original module with the weight norm hook 
 - Example: - >>> m = weight_norm(nn.Linear(20, 40), name='weight') >>> m Linear(in_features=20, out_features=40, bias=True) >>> m.weight_g.size() torch.Size([40, 1]) >>> m.weight_v.size() torch.Size([40, 20]) 
remove_weight_norm¶
spectral_norm¶
- 
torch.nn.utils.spectral_norm(module, name='weight', n_power_iterations=1, eps=1e-12, dim=None)¶
- Applies spectral normalization to a parameter in the given module. \[\mathbf{W}_{SN} = \dfrac{\mathbf{W}}{\sigma(\mathbf{W})}, \sigma(\mathbf{W}) = \max_{\mathbf{h}: \mathbf{h} \ne 0} \dfrac{\|\mathbf{W} \mathbf{h}\|_2}{\|\mathbf{h}\|_2} \]- Spectral normalization stabilizes the training of discriminators (critics) in Generative Adversarial Networks (GANs) by rescaling the weight tensor with spectral norm \(\sigma\) of the weight matrix calculated using power iteration method. If the dimension of the weight tensor is greater than 2, it is reshaped to 2D in power iteration method to get spectral norm. This is implemented via a hook that calculates spectral norm and rescales weight before every - forward()call.- See Spectral Normalization for Generative Adversarial Networks . - Parameters
- module (nn.Module) – containing module 
- name (str, optional) – name of weight parameter 
- n_power_iterations (int, optional) – number of power iterations to calculate spectral norm 
- eps (float, optional) – epsilon for numerical stability in calculating norms 
- dim (int, optional) – dimension corresponding to number of outputs, the default is - 0, except for modules that are instances of ConvTranspose{1,2,3}d, when it is- 1
 
- Returns
- The original module with the spectral norm hook 
 - Example: - >>> m = spectral_norm(nn.Linear(20, 40)) >>> m Linear(in_features=20, out_features=40, bias=True) >>> m.weight_u.size() torch.Size([40]) 
remove_spectral_norm¶
- 
torch.nn.utils.remove_spectral_norm(module, name='weight')¶
- Removes the spectral normalization reparameterization from a module. - Example - >>> m = spectral_norm(nn.Linear(40, 10)) >>> remove_spectral_norm(m) 
PackedSequence¶
- 
torch.nn.utils.rnn.PackedSequence(data, batch_sizes=None, sorted_indices=None, unsorted_indices=None)¶
- Holds the data and list of - batch_sizesof a packed sequence.- All RNN modules accept packed sequences as inputs. - Note - Instances of this class should never be created manually. They are meant to be instantiated by functions like - pack_padded_sequence().- Batch sizes represent the number of elements at each sequence step in the batch, not the varying sequence lengths passed to - pack_padded_sequence(). For instance, given data- abcand- xthe- PackedSequencewould contain data- axbcwith- batch_sizes=[2,1,1].
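The abc/x example above can be reproduced with pack_padded_sequence(); the integer tokens below merely stand in for the letters:
>>> from torch.nn.utils.rnn import pack_padded_sequence
>>> # columns are the sequences "abc" (1, 2, 3) and "x" (4), zero-padded, in T x B layout
>>> padded = torch.tensor([[1, 4], [2, 0], [3, 0]])
>>> packed = pack_padded_sequence(padded, torch.tensor([3, 1]))
>>> packed.data
tensor([1, 4, 2, 3])
>>> packed.batch_sizes
tensor([2, 1, 1])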
pack_padded_sequence¶
- 
torch.nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=False, enforce_sorted=True)¶
- Packs a Tensor containing padded sequences of variable length. - Input can be of size - T x B x *where T is the length of the longest sequence (equal to- lengths[0]), B is the batch size, and * is any number of dimensions (including 0). If- batch_firstis True,- B x T x *inputs are expected.- For unsorted sequences, use enforce_sorted = False. If - enforce_sortedis- True, the sequences should be sorted by length in a decreasing order, i.e.- input[:,0]should be the longest sequence, and- input[:,B-1]the shortest one. enforce_sorted = True is only necessary for ONNX export.- Note - This function accepts any input that has at least two dimensions. You can apply it to pack the labels, and use the output of the RNN with them to compute the loss directly. A Tensor can be retrieved from a - PackedSequenceobject by accessing its- .dataattribute.- Parameters
- input (Tensor) – padded batch of variable length sequences. 
- lengths (Tensor) – list of sequences lengths of each batch element. 
- batch_first (bool, optional) – if - True, the input is expected in- B x T x *format.
- enforce_sorted (bool, optional) – if - True, the input is expected to contain sequences sorted by length in a decreasing order. If- False, this condition is not checked. Default:- True.
 
- Returns
- a - PackedSequenceobject
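A small sketch of packing an unsorted, zero-padded batch with enforce_sorted=False (token values are illustrative):
>>> from torch.nn.utils.rnn import pack_padded_sequence
>>> seqs = torch.tensor([[1, 2, 0], [3, 4, 5]])   # B x T, zero-padded
>>> packed = pack_padded_sequence(seqs, torch.tensor([2, 3]), batch_first=True, enforce_sorted=False)
>>> packed.data   # the length-3 sequence is ordered first internally
tensor([3, 1, 4, 2, 5])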
 
pad_packed_sequence¶
- 
torch.nn.utils.rnn.pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None)¶
- Pads a packed batch of variable length sequences. - It is an inverse operation to - pack_padded_sequence().- The returned Tensor’s data will be of size - T x B x *, where T is the length of the longest sequence and B is the batch size. If- batch_firstis True, the data will be transposed into- B x T x *format.- Batch elements will be ordered decreasingly by their length. - Note - total_lengthis useful to implement the- pack sequence -> recurrent network -> unpack sequencepattern in a- Modulewrapped in- DataParallel. See this FAQ section for details.- Parameters
- sequence (PackedSequence) – batch to pad 
- batch_first (bool, optional) – if - True, the output will be in- B x T x *format.
- padding_value (float, optional) – values for padded elements. 
- total_length (int, optional) – if not - None, the output will be padded to have length- total_length. This method will throw- ValueErrorif- total_lengthis less than the max sequence length in- sequence.
 
- Returns
- Tuple of Tensor containing the padded sequence, and a Tensor containing the list of lengths of each sequence in the batch. 
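A round-trip sketch: packing a padded batch and then unpacking it recovers the original tensor together with the lengths:
>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> seqs = torch.tensor([[1, 2, 3], [4, 5, 0]])   # B x T, zero-padded
>>> packed = pack_padded_sequence(seqs, torch.tensor([3, 2]), batch_first=True)
>>> padded, lengths = pad_packed_sequence(packed, batch_first=True)
>>> padded
tensor([[1, 2, 3],
        [4, 5, 0]])
>>> lengths
tensor([3, 2])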
 
pad_sequence¶
- 
torch.nn.utils.rnn.pad_sequence(sequences, batch_first=False, padding_value=0)¶
- Pad a list of variable length Tensors with - padding_value- pad_sequencestacks a list of Tensors along a new dimension, and pads them to equal length. For example, if the input is a list of sequences each of size- L x *, the output is of size- T x B x *if- batch_firstis- False, and- B x T x *otherwise.- B is batch size. It is equal to the number of elements in - sequences. T is length of the longest sequence. L is length of the sequence. * is any number of trailing dimensions, including none.- Example - >>> from torch.nn.utils.rnn import pad_sequence >>> a = torch.ones(25, 300) >>> b = torch.ones(22, 300) >>> c = torch.ones(15, 300) >>> pad_sequence([a, b, c]).size() torch.Size([25, 3, 300]) - Note - This function returns a Tensor of size - T x B x *or- B x T x *where T is the length of the longest sequence. This function assumes trailing dimensions and type of all the Tensors in sequences are the same.- Parameters
- sequences (list[Tensor]) – list of variable length sequences 
- batch_first (bool, optional) – output in - B x T x *format if- True, in- T x B x *format otherwise
- padding_value (float, optional) – value for padded elements. Default: 0. 
- Returns
- Tensor of size - T x B x *if- batch_firstis- False. Tensor of size- B x T x *otherwise
 
pack_sequence¶
- 
torch.nn.utils.rnn.pack_sequence(sequences, enforce_sorted=True)¶
- Packs a list of variable length Tensors - sequencesshould be a list of Tensors of size- L x *, where L is the length of a sequence and * is any number of trailing dimensions, including zero.- For unsorted sequences, use enforce_sorted = False. If - enforce_sortedis- True, the sequences should be sorted in the order of decreasing length.- enforce_sorted = Trueis only necessary for ONNX export.- Example - >>> from torch.nn.utils.rnn import pack_sequence >>> a = torch.tensor([1,2,3]) >>> b = torch.tensor([4,5]) >>> c = torch.tensor([6]) >>> pack_sequence([a, b, c]) PackedSequence(data=tensor([ 1, 4, 6, 2, 5, 3]), batch_sizes=tensor([ 3, 2, 1])) - Parameters
- sequences (list[Tensor]) – A list of sequences of decreasing length. 
- enforce_sorted (bool, optional) – if - True, checks that the input contains sequences sorted by length in a decreasing order. If- False, this condition is not checked. Default:- True.
- Returns
- a - PackedSequenceobject
 
torch.nn.functional¶
Convolution functions¶
conv1d¶
- 
torch.nn.functional.conv1d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros') → Tensor¶
- Applies a 1D convolution over an input signal composed of several input planes. - See - Conv1dfor details and output shape.- Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting - torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background.- Parameters
- input – input tensor of shape \((\text{minibatch} , \text{in\_channels} , iW)\) 
- weight – filters of shape \((\text{out\_channels} , \frac{\text{in\_channels}}{\text{groups}} , kW)\) 
- bias – optional bias of shape \((\text{out\_channels})\). Default: - None
- stride – the stride of the convolving kernel. Can be a single number or a one-element tuple (sW,). Default: 1 
- padding – implicit paddings on both sides of the input. Can be a single number or a one-element tuple (padW,). Default: 0 
- dilation – the spacing between kernel elements. Can be a single number or a one-element tuple (dW,). Default: 1 
- groups – split input into groups, \(\text{in\_channels}\) should be divisible by the number of groups. Default: 1 
- padding_mode – the type of padding applied to both sides; can be: zeros or circular. Default: zeros 
 
 - Examples: - >>> filters = torch.randn(33, 16, 3) >>> inputs = torch.randn(20, 16, 50) >>> F.conv1d(inputs, filters) 
conv2d¶
- 
torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros') → Tensor¶
- Applies a 2D convolution over an input image composed of several input planes. - See - Conv2dfor details and output shape.- Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting - torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background.- Parameters
- input – input tensor of shape \((\text{minibatch} , \text{in\_channels} , iH , iW)\) 
- weight – filters of shape \((\text{out\_channels} , \frac{\text{in\_channels}}{\text{groups}} , kH , kW)\) 
- bias – optional bias tensor of shape \((\text{out\_channels})\). Default: - None
- stride – the stride of the convolving kernel. Can be a single number or a tuple (sH, sW). Default: 1 
- padding – implicit paddings on both sides of the input. Can be a single number or a tuple (padH, padW). Default: 0 
- dilation – the spacing between kernel elements. Can be a single number or a tuple (dH, dW). Default: 1 
- groups – split input into groups, \(\text{in\_channels}\) should be divisible by the number of groups. Default: 1 
- padding_mode – the type of padding applied to both sides; can be: zeros or circular. Default: zeros 
 
 - Examples: - >>> # With square kernels and equal stride >>> filters = torch.randn(8,4,3,3) >>> inputs = torch.randn(1,4,5,5) >>> F.conv2d(inputs, filters, padding=1) 
conv3d¶
- 
torch.nn.functional.conv3d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros') → Tensor¶
- Applies a 3D convolution over an input image composed of several input planes. - See - Conv3dfor details and output shape.- Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting - torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background.- Parameters
- input – input tensor of shape \((\text{minibatch} , \text{in\_channels} , iT , iH , iW)\) 
- weight – filters of shape \((\text{out\_channels} , \frac{\text{in\_channels}}{\text{groups}} , kT , kH , kW)\) 
- bias – optional bias tensor of shape \((\text{out\_channels})\). Default: None 
- stride – the stride of the convolving kernel. Can be a single number or a tuple (sT, sH, sW). Default: 1 
- padding – implicit paddings on both sides of the input. Can be a single number or a tuple (padT, padH, padW). Default: 0 
- dilation – the spacing between kernel elements. Can be a single number or a tuple (dT, dH, dW). Default: 1 
- groups – split input into groups, \(\text{in\_channels}\) should be divisible by the number of groups. Default: 1 
- padding_mode – the type of padding applied to both sides; can be: zeros or circular. Default: zeros 
 
 - Examples: - >>> filters = torch.randn(33, 16, 3, 3, 3) >>> inputs = torch.randn(20, 16, 50, 10, 20) >>> F.conv3d(inputs, filters) 
conv_transpose1d¶
- 
torch.nn.functional.conv_transpose1d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor¶
- Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called “deconvolution”. - See - ConvTranspose1dfor details and output shape.- Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting - torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background.- Parameters
- input – input tensor of shape \((\text{minibatch} , \text{in\_channels} , iW)\) 
- weight – filters of shape \((\text{in\_channels} , \frac{\text{out\_channels}}{\text{groups}} , kW)\) 
- bias – optional bias of shape \((\text{out\_channels})\). Default: None 
- stride – the stride of the convolving kernel. Can be a single number or a tuple - (sW,). Default: 1
- padding – - dilation * (kernel_size - 1) - paddingzero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple- (padW,). Default: 0
- output_padding – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple - (out_padW). Default: 0
- groups – split input into groups, \(\text{in\_channels}\) should be divisible by the number of groups. Default: 1 
- dilation – the spacing between kernel elements. Can be a single number or a tuple - (dW,). Default: 1
 
 - Examples: - >>> inputs = torch.randn(20, 16, 50) >>> weights = torch.randn(16, 33, 5) >>> F.conv_transpose1d(inputs, weights) 
conv_transpose2d¶
- 
torch.nn.functional.conv_transpose2d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor¶
- Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution”. - See - ConvTranspose2dfor details and output shape.- Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting - torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background.- Parameters
- input – input tensor of shape \((\text{minibatch} , \text{in\_channels} , iH , iW)\) 
- weight – filters of shape \((\text{in\_channels} , \frac{\text{out\_channels}}{\text{groups}} , kH , kW)\) 
- bias – optional bias of shape \((\text{out\_channels})\). Default: None 
- stride – the stride of the convolving kernel. Can be a single number or a tuple - (sH, sW). Default: 1
- padding – - dilation * (kernel_size - 1) - paddingzero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple- (padH, padW). Default: 0
- output_padding – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple - (out_padH, out_padW). Default: 0
- groups – split input into groups, \(\text{in\_channels}\) should be divisible by the number of groups. Default: 1 
- dilation – the spacing between kernel elements. Can be a single number or a tuple - (dH, dW). Default: 1
 
 - Examples: - >>> # With square kernels and equal stride >>> inputs = torch.randn(1, 4, 5, 5) >>> weights = torch.randn(4, 8, 3, 3) >>> F.conv_transpose2d(inputs, weights, padding=1) 
conv_transpose3d¶
- 
torch.nn.functional.conv_transpose3d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor¶
- Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution” - See - ConvTranspose3dfor details and output shape.- Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting - torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background.- Parameters
- input – input tensor of shape \((\text{minibatch} , \text{in\_channels} , iT , iH , iW)\) 
- weight – filters of shape \((\text{in\_channels} , \frac{\text{out\_channels}}{\text{groups}} , kT , kH , kW)\) 
- bias – optional bias of shape \((\text{out\_channels})\). Default: None 
- stride – the stride of the convolving kernel. Can be a single number or a tuple - (sT, sH, sW). Default: 1
- padding – - dilation * (kernel_size - 1) - paddingzero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple- (padT, padH, padW). Default: 0
- output_padding – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple - (out_padT, out_padH, out_padW). Default: 0
- groups – split input into groups, \(\text{in\_channels}\) should be divisible by the number of groups. Default: 1 
- dilation – the spacing between kernel elements. Can be a single number or a tuple (dT, dH, dW). Default: 1 
 
 - Examples: - >>> inputs = torch.randn(20, 16, 50, 10, 20) >>> weights = torch.randn(16, 33, 3, 3, 3) >>> F.conv_transpose3d(inputs, weights) 
unfold¶
- 
torch.nn.functional.unfold(input, kernel_size, dilation=1, padding=0, stride=1)¶
- Extracts sliding local blocks from a batched input tensor. - Warning - Currently, only 4-D input tensors (batched image-like tensors) are supported. - Warning - More than one element of the unfolded tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensor, please clone it first. - See - torch.nn.Unfoldfor details
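A shape-only sketch (sizes are illustrative): each output column holds one flattened 3 x 4 x 5 block, and there are 7 * 8 = 56 block positions:
>>> inp = torch.randn(1, 3, 10, 12)
>>> blocks = F.unfold(inp, kernel_size=(4, 5))
>>> blocks.size()   # (N, C * kH * kW, L) = (1, 3*4*5, 7*8)
torch.Size([1, 60, 56])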
fold¶
- 
torch.nn.functional.fold(input, output_size, kernel_size, dilation=1, padding=0, stride=1)¶
- Combines an array of sliding local blocks into a large containing tensor. - Warning - Currently, only 4-D output tensors (batched image-like tensors) are supported. - See - torch.nn.Foldfor details
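A complementary shape sketch (sizes illustrative): 12 columns of flattened 2 x 2 blocks are summed back into a 4 x 5 output:
>>> blocks = torch.randn(1, 3 * 2 * 2, 12)   # L = (4-2+1) * (5-2+1) = 12
>>> out = F.fold(blocks, output_size=(4, 5), kernel_size=(2, 2))
>>> out.size()
torch.Size([1, 3, 4, 5])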
Pooling functions¶
avg_pool1d¶
- 
torch.nn.functional.avg_pool1d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True) → Tensor¶
- Applies a 1D average pooling over an input signal composed of several input planes. - See - AvgPool1dfor details and output shape.- Parameters
- input – input tensor of shape \((\text{minibatch} , \text{in\_channels} , iW)\) 
- kernel_size – the size of the window. Can be a single number or a tuple (kW,) 
- stride – the stride of the window. Can be a single number or a tuple (sW,). Default: - kernel_size
- padding – implicit zero paddings on both sides of the input. Can be a single number or a tuple (padW,). Default: 0 
- ceil_mode – when True, will use ceil instead of floor to compute the output shape. Default: - False
- count_include_pad – when True, will include the zero-padding in the averaging calculation. Default: - True
 
 - Examples: - >>> # pool of square window of size=3, stride=2 >>> input = torch.tensor([[[1, 2, 3, 4, 5, 6, 7]]], dtype=torch.float32) >>> F.avg_pool1d(input, kernel_size=3, stride=2) tensor([[[ 2., 4., 6.]]]) 
avg_pool2d¶
- 
torch.nn.functional.avg_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True) → Tensor¶
- Applies 2D average-pooling operation in \(kH \times kW\) regions by step size \(sH \times sW\) steps. The number of output features is equal to the number of input planes. - See - AvgPool2dfor details and output shape.- Parameters
- input – input tensor \((\text{minibatch} , \text{in\_channels} , iH , iW)\) 
- kernel_size – size of the pooling region. Can be a single number or a tuple (kH, kW) 
- stride – stride of the pooling operation. Can be a single number or a tuple (sH, sW). Default: - kernel_size
- padding – implicit zero paddings on both sides of the input. Can be a single number or a tuple (padH, padW). Default: 0 
- ceil_mode – when True, will use ceil instead of floor in the formula to compute the output shape. Default: - False
- count_include_pad – when True, will include the zero-padding in the averaging calculation. Default: - True
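Example (shapes are illustrative):
>>> # pool of square window of size=2, stride=2
>>> input = torch.randn(1, 3, 8, 8)
>>> F.avg_pool2d(input, kernel_size=2, stride=2).size()
torch.Size([1, 3, 4, 4])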
 
 
avg_pool3d¶
- 
torch.nn.functional.avg_pool3d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True) → Tensor¶
- Applies 3D average-pooling operation in \(kT \times kH \times kW\) regions by step size \(sT \times sH \times sW\) steps. The number of output features is equal to \(\lfloor\frac{\text{input planes}}{sT}\rfloor\). - See - AvgPool3dfor details and output shape.- Parameters
- input – input tensor \((\text{minibatch} , \text{in\_channels} , iT , iH , iW)\) 
- kernel_size – size of the pooling region. Can be a single number or a tuple (kT, kH, kW) 
- stride – stride of the pooling operation. Can be a single number or a tuple (sT, sH, sW). Default: - kernel_size
- padding – implicit zero paddings on both sides of the input. Can be a single number or a tuple (padT, padH, padW), Default: 0 
- ceil_mode – when True, will use ceil instead of floor in the formula to compute the output shape 
- count_include_pad – when True, will include the zero-padding in the averaging calculation 
 
 
max_pool1d¶
max_pool2d¶
max_pool3d¶
max_unpool1d¶
- 
torch.nn.functional.max_unpool1d(input, indices, kernel_size, stride=None, padding=0, output_size=None)¶
- Computes a partial inverse of - MaxPool1d.- See - MaxUnpool1dfor details.
max_unpool2d¶
- 
torch.nn.functional.max_unpool2d(input, indices, kernel_size, stride=None, padding=0, output_size=None)¶
- Computes a partial inverse of - MaxPool2d.- See - MaxUnpool2dfor details.
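A small sketch pairing max_pool2d (with return_indices=True) and max_unpool2d; non-maximal positions come back as zeros:
>>> input = torch.randn(1, 1, 4, 4)
>>> pooled, indices = F.max_pool2d(input, kernel_size=2, return_indices=True)
>>> F.max_unpool2d(pooled, indices, kernel_size=2).size()
torch.Size([1, 1, 4, 4])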
max_unpool3d¶
- 
torch.nn.functional.max_unpool3d(input, indices, kernel_size, stride=None, padding=0, output_size=None)¶
- Computes a partial inverse of - MaxPool3d.- See - MaxUnpool3dfor details.
lp_pool1d¶
lp_pool2d¶
adaptive_max_pool1d¶
- 
torch.nn.functional.adaptive_max_pool1d(*args, **kwargs)¶
- Applies a 1D adaptive max pooling over an input signal composed of several input planes. - See - AdaptiveMaxPool1dfor details and output shape.- Parameters
- output_size – the target output size (single integer) 
- return_indices – whether to return pooling indices. Default: - False
 
 
adaptive_max_pool2d¶
- 
torch.nn.functional.adaptive_max_pool2d(*args, **kwargs)¶
- Applies a 2D adaptive max pooling over an input signal composed of several input planes. - See - AdaptiveMaxPool2dfor details and output shape.- Parameters
- output_size – the target output size (single integer or double-integer tuple) 
- return_indices – whether to return pooling indices. Default: - False
 
 
adaptive_max_pool3d¶
- 
torch.nn.functional.adaptive_max_pool3d(*args, **kwargs)¶
- Applies a 3D adaptive max pooling over an input signal composed of several input planes. - See - AdaptiveMaxPool3dfor details and output shape.- Parameters
- output_size – the target output size (single integer or triple-integer tuple) 
- return_indices – whether to return pooling indices. Default: - False
 
 
adaptive_avg_pool1d¶
- 
torch.nn.functional.adaptive_avg_pool1d(input, output_size) → Tensor¶
- Applies a 1D adaptive average pooling over an input signal composed of several input planes. - See - AdaptiveAvgPool1dfor details and output shape.- Parameters
- output_size – the target output size (single integer) 
 
adaptive_avg_pool2d¶
- 
torch.nn.functional.adaptive_avg_pool2d(input, output_size)¶
- Applies a 2D adaptive average pooling over an input signal composed of several input planes. - See - AdaptiveAvgPool2dfor details and output shape.- Parameters
- output_size – the target output size (single integer or double-integer tuple) 
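Example (illustrative): the output spatial size is fixed regardless of the input resolution:
>>> input = torch.randn(1, 64, 10, 9)
>>> F.adaptive_avg_pool2d(input, (5, 7)).size()
torch.Size([1, 64, 5, 7])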
 
adaptive_avg_pool3d¶
- 
torch.nn.functional.adaptive_avg_pool3d(input, output_size)¶
- Applies a 3D adaptive average pooling over an input signal composed of several input planes. - See - AdaptiveAvgPool3dfor details and output shape.- Parameters
- output_size – the target output size (single integer or triple-integer tuple) 
 
Non-linear activation functions¶
threshold¶
- 
torch.nn.functional.threshold(input, threshold, value, inplace=False)¶
- Thresholds each element of the input Tensor. - See - Thresholdfor more details.
- 
torch.nn.functional.threshold_(input, threshold, value) → Tensor¶
- In-place version of - threshold().
relu¶
hardtanh¶
- 
torch.nn.functional.hardtanh(input, min_val=-1., max_val=1., inplace=False) → Tensor¶
- Applies the HardTanh function element-wise. See - Hardtanhfor more details.
- 
torch.nn.functional.hardtanh_(input, min_val=-1., max_val=1.) → Tensor¶
- In-place version of - hardtanh().
relu6¶
elu¶
selu¶
celu¶
leaky_relu¶
- 
torch.nn.functional.leaky_relu(input, negative_slope=0.01, inplace=False) → Tensor¶
- Applies element-wise, \(\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)\) - See - LeakyReLUfor more details.
- 
torch.nn.functional.leaky_relu_(input, negative_slope=0.01) → Tensor¶
- In-place version of - leaky_relu().
prelu¶
rrelu¶
glu¶
- 
torch.nn.functional.glu(input, dim=-1) → Tensor¶
- The gated linear unit. Computes: \[\text{GLU}(a, b) = a \otimes \sigma(b) \]- where input is split in half along dim to form a and b, \(\sigma\) is the sigmoid function and \(\otimes\) is the element-wise product between matrices. 
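A shape sketch: dim 1 is split in half, so 6 input features yield 3 gated outputs:
>>> input = torch.randn(4, 6)
>>> F.glu(input, dim=1).size()
torch.Size([4, 3])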
logsigmoid¶
- 
torch.nn.functional.logsigmoid(input) → Tensor¶
- Applies element-wise \(\text{LogSigmoid}(x_i) = \log \left(\frac{1}{1 + \exp(-x_i)}\right)\) - See - LogSigmoidfor more details.
hardshrink¶
- 
torch.nn.functional.hardshrink(input, lambd=0.5) → Tensor¶
- Applies the hard shrinkage function element-wise - See - Hardshrinkfor more details.
tanhshrink¶
- 
torch.nn.functional.tanhshrink(input) → Tensor¶
- Applies element-wise, \(\text{Tanhshrink}(x) = x - \text{Tanh}(x)\) - See - Tanhshrinkfor more details.
softsign¶
softmin¶
- 
torch.nn.functional.softmin(input, dim=None, _stacklevel=3, dtype=None)¶
- Applies a softmin function. - Note that \(\text{Softmin}(x) = \text{Softmax}(-x)\). See softmax definition for mathematical formula. - See - Softminfor more details.- Parameters
- input (Tensor) – input 
- dim (int) – A dimension along which softmin will be computed (so every slice along dim will sum to 1). 
- dtype ( - torch.dtype, optional) – the desired data type of returned tensor. If specified, the input tensor is casted to- dtypebefore the operation is performed. This is useful for preventing data type overflows. Default: None.
 
 
softmax¶
- 
torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)¶
- Applies a softmax function. - Softmax is defined as: - \(\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\) - It is applied to all slices along dim, and will re-scale them so that the elements lie in the range [0, 1] and sum to 1. - See - Softmaxfor more details.- Parameters
- input (Tensor) – input 
- dim (int) – A dimension along which softmax will be computed. 
- dtype ( - torch.dtype, optional) – the desired data type of returned tensor. If specified, the input tensor is casted to- dtypebefore the operation is performed. This is useful for preventing data type overflows. Default: None.
 
 - Note - This function doesn’t work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use log_softmax instead (it’s faster and has better numerical properties). 
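Example (the input values are illustrative; each row of the output sums to 1):
>>> x = torch.tensor([[1.0, 2.0, 3.0]])
>>> F.softmax(x, dim=1)
tensor([[0.0900, 0.2447, 0.6652]])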
softshrink¶
- 
torch.nn.functional.softshrink(input, lambd=0.5) → Tensor¶
- Applies the soft shrinkage function elementwise - See - Softshrinkfor more details.
gumbel_softmax¶
- 
torch.nn.functional.gumbel_softmax(logits, tau=1, hard=False, eps=1e-10, dim=-1)¶
- Samples from the Gumbel-Softmax distribution and optionally discretizes. - Parameters
- logits – […, num_features] unnormalized log probabilities 
- tau – non-negative scalar temperature 
- hard – if - True, the returned samples will be discretized as one-hot vectors, but will be differentiated as if it is the soft sample in autograd
- dim (int) – A dimension along which softmax will be computed. Default: -1. 
 
- Returns
- Sampled tensor of same shape as logits from the Gumbel-Softmax distribution. If - hard=True, the returned samples will be one-hot, otherwise they will be probability distributions that sum to 1 across dim.
 - Note - This function is here for legacy reasons and may be removed from nn.functional in the future. - Note - The main trick for - hardis to do y_hard - y_soft.detach() + y_soft. It achieves two things: - makes the output value exactly one-hot (since we add then subtract the y_soft value) - makes the gradient equal to the y_soft gradient (since we strip all other gradients) - Examples:
- >>> logits = torch.randn(20, 32) >>> # Sample soft categorical using reparametrization trick: >>> F.gumbel_softmax(logits, tau=1, hard=False) >>> # Sample hard categorical using "Straight-through" trick: >>> F.gumbel_softmax(logits, tau=1, hard=True) 
 
log_softmax¶
- 
torch.nn.functional.log_softmax(input, dim=None, _stacklevel=3, dtype=None)¶
- Applies a softmax followed by a logarithm. - While mathematically equivalent to log(softmax(x)), doing these two operations separately is slower, and numerically unstable. This function uses an alternative formulation to compute the output and gradient correctly. - See - LogSoftmaxfor more details.- Parameters
- input (Tensor) – input 
- dim (int) – A dimension along which log_softmax will be computed. 
- dtype ( - torch.dtype, optional) – the desired data type of returned tensor. If specified, the input tensor is casted to- dtypebefore the operation is performed. This is useful for preventing data type overflows. Default: None.
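A small sketch: exponentiating the result recovers the softmax, so each row sums to 1:
>>> x = torch.randn(2, 5)
>>> out = F.log_softmax(x, dim=1)
>>> out.exp().sum(dim=1)   # approximately 1.0 per row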
 
 
tanh¶
Normalization functions¶
batch_norm¶
- 
torch.nn.functional.batch_norm(input, running_mean, running_var, weight=None, bias=None, training=False, momentum=0.1, eps=1e-05)¶
- Applies Batch Normalization for each channel across a batch of data. - See - BatchNorm1d,- BatchNorm2d,- BatchNorm3dfor details.
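A minimal functional sketch (shapes illustrative); with training=True the supplied running statistics are updated in-place:
>>> x = torch.randn(4, 3, 8, 8)
>>> running_mean = torch.zeros(3)
>>> running_var = torch.ones(3)
>>> out = F.batch_norm(x, running_mean, running_var, training=True)
>>> out.size()
torch.Size([4, 3, 8, 8])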
instance_norm¶
- 
torch.nn.functional.instance_norm(input, running_mean=None, running_var=None, weight=None, bias=None, use_input_stats=True, momentum=0.1, eps=1e-05)¶
- Applies Instance Normalization for each channel in each data sample in a batch. - See - InstanceNorm1d,- InstanceNorm2d,- InstanceNorm3dfor details.
layer_norm¶
local_response_norm¶
- 
torch.nn.functional.local_response_norm(input, size, alpha=0.0001, beta=0.75, k=1.0)¶
- Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. Applies normalization across channels. - See - LocalResponseNormfor details.
normalize¶
- 
torch.nn.functional.normalize(input, p=2, dim=1, eps=1e-12, out=None)¶
- Performs \(L_p\) normalization of inputs over specified dimension. - For a tensor - inputof sizes \((n_0, ..., n_{dim}, ..., n_k)\), each \(n_{dim}\) -element vector \(v\) along dimension- dimis transformed as\[v = \frac{v}{\max(\lVert v \rVert_p, \epsilon)}. \]- With the default arguments it uses the Euclidean norm over vectors along dimension \(1\) for normalization. - Parameters
- input – input tensor of any shape 
- p (float) – the exponent value in the norm formulation. Default: 2 
- dim (int) – the dimension to reduce. Default: 1 
- eps (float) – small value to avoid division by zero. Default: 1e-12 
- out (Tensor, optional) – the output tensor. If - outis used, this operation won’t be differentiable.
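Example: the vector (3, 4) has Euclidean norm 5, so normalizing it yields (0.6, 0.8):
>>> v = torch.tensor([[3.0, 4.0]])
>>> F.normalize(v, p=2, dim=1)
tensor([[0.6000, 0.8000]])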
 
 
Linear functions¶
linear¶
- 
torch.nn.functional.linear(input, weight, bias=None)¶
- Applies a linear transformation to the incoming data: \(y = xA^T + b\). - Shape: - Input: \((N, *, in\_features)\) where * means any number of additional dimensions 
- Weight: \((out\_features, in\_features)\) 
- Bias: \((out\_features)\) 
- Output: \((N, *, out\_features)\) 
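A shape sketch matching the shapes listed above (sizes illustrative):
>>> x = torch.randn(128, 20)   # (N, in_features)
>>> w = torch.randn(30, 20)    # (out_features, in_features)
>>> b = torch.randn(30)
>>> F.linear(x, w, b).size()
torch.Size([128, 30])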
 
Dropout functions¶
dropout¶
- 
torch.nn.functional.dropout(input, p=0.5, training=True, inplace=False)¶
- During training, randomly zeroes some of the elements of the input tensor with probability - pusing samples from a Bernoulli distribution.- See - Dropoutfor details.- Parameters
- p – probability of an element to be zeroed. Default: 0.5 
- training – apply dropout if is - True. Default:- True
- inplace – If set to - True, will do this operation in-place. Default:- False
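A behavioral sketch: in training mode surviving elements are scaled by 1/(1-p), and with training=False the input passes through unchanged:
>>> x = torch.ones(10)
>>> F.dropout(x, p=0.5, training=True)    # roughly half zeroed, survivors become 2.0
>>> F.dropout(x, p=0.5, training=False)   # identity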
 
 
alpha_dropout¶
- 
torch.nn.functional.alpha_dropout(input, p=0.5, training=False, inplace=False)¶
- Applies alpha dropout to the input. - See - AlphaDropoutfor details.
dropout2d¶
- 
torch.nn.functional.dropout2d(input, p=0.5, training=True, inplace=False)¶
- Randomly zero out entire channels of the input tensor (a channel is a 2D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 2D tensor \(\text{input}[i, j]\)). Each channel will be zeroed out independently on every forward call with probability - pusing samples from a Bernoulli distribution.- See - Dropout2dfor details.- Parameters
- p – probability of a channel to be zeroed. Default: 0.5 
- training – apply dropout if is - True. Default:- True
- inplace – If set to - True, will do this operation in-place. Default:- False
 
 
dropout3d¶
- 
torch.nn.functional.dropout3d(input, p=0.5, training=True, inplace=False)¶
- Randomly zero out entire channels of the input tensor (a channel is a 3D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 3D tensor \(\text{input}[i, j]\)). Each channel will be zeroed out independently on every forward call with probability - pusing samples from a Bernoulli distribution.- See - Dropout3dfor details.- Parameters
- p – probability of a channel to be zeroed. Default: 0.5 
- training – apply dropout if is - True. Default:- True
- inplace – If set to - True, will do this operation in-place. Default:- False
 
 
Sparse functions¶
embedding¶
- 
torch.nn.functional.embedding(input, weight, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False)¶
- A simple lookup table that looks up embeddings in a fixed dictionary of a fixed size. - This module is often used to retrieve word embeddings using indices. The input to the module is a list of indices and the embedding matrix, and the output is the corresponding word embeddings. - See - torch.nn.Embeddingfor more details.- Parameters
- input (LongTensor) – Tensor containing indices into the embedding matrix 
- weight (Tensor) – The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size 
- padding_idx (int, optional) – If given, pads the output with the embedding vector at - padding_idx(initialized to zeros) whenever it encounters the index.
- max_norm (float, optional) – If given, each embedding vector with norm larger than - max_normis renormalized to have norm- max_norm. Note: this will modify- weightin-place.
- norm_type (float, optional) – The p of the p-norm to compute for the - max_normoption. Default- 2.
- scale_grad_by_freq (boolean, optional) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default - False.
- sparse (bool, optional) – If - True, gradient w.r.t.- weightwill be a sparse tensor. See Notes under- torch.nn.Embeddingfor more details regarding sparse gradients.
 
 - Shape:
- Input: LongTensor of arbitrary shape containing the indices to extract 
- Weight: Embedding matrix of floating point type with shape (V, embedding_dim), where V = maximum index + 1 and embedding_dim = the embedding size 
 
- Output: (*, embedding_dim), where * is the input shape 
 
 - Examples: - >>> # a batch of 2 samples of 4 indices each >>> input = torch.tensor([[1,2,4,5],[4,3,2,9]]) >>> # an embedding matrix containing 10 tensors of size 3 >>> embedding_matrix = torch.rand(10, 3) >>> F.embedding(input, embedding_matrix) tensor([[[ 0.8490, 0.9625, 0.6753], [ 0.9666, 0.7761, 0.6108], [ 0.6246, 0.9751, 0.3618], [ 0.4161, 0.2419, 0.7383]], [[ 0.6246, 0.9751, 0.3618], [ 0.0237, 0.7794, 0.0528], [ 0.9666, 0.7761, 0.6108], [ 0.3385, 0.8612, 0.1867]]]) >>> # example with padding_idx >>> weights = torch.rand(10, 3) >>> weights[0, :].zero_() >>> embedding_matrix = weights >>> input = torch.tensor([[0,2,0,5]]) >>> F.embedding(input, embedding_matrix, padding_idx=0) tensor([[[ 0.0000, 0.0000, 0.0000], [ 0.5609, 0.5384, 0.8720], [ 0.0000, 0.0000, 0.0000], [ 0.6262, 0.2438, 0.7471]]]) 
embedding_bag¶
- 
torch.nn.functional.embedding_bag(input, weight, offsets=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, mode='mean', sparse=False)¶
- Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings. - See - torch.nn.EmbeddingBagfor more details.- Note - When using the CUDA backend, this operation may induce nondeterministic behaviour in the backward pass that is not easily switched off. Please see the notes on /notes/randomness for background. - Parameters
- input (LongTensor) – Tensor containing bags of indices into the embedding matrix 
- weight (Tensor) – The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size 
- offsets (LongTensor, optional) – Only used when - inputis 1D.- offsetsdetermines the starting index position of each bag (sequence) in- input.
- max_norm (float, optional) – If given, each embedding vector with norm larger than - max_normis renormalized to have norm- max_norm. Note: this will modify- weightin-place.
- norm_type (float, optional) – The - pin the- p-norm to compute for the- max_normoption. Default- 2.
- scale_grad_by_freq (boolean, optional) – if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default - False. Note: this option is not supported when- mode="max".
- mode (string, optional) – - "sum",- "mean"or- "max". Specifies the way to reduce the bag. Default:- "mean"
- sparse (bool, optional) – if - True, gradient w.r.t.- weightwill be a sparse tensor. See Notes under- torch.nn.Embeddingfor more details regarding sparse gradients. Note: this option is not supported when- mode="max".
 
 - Shape: - input(LongTensor) and- offsets(LongTensor, optional)- If - inputis 2D of shape (B, N),- it will be treated as - Bbags (sequences) each of fixed length- N, and this will return- Bvalues aggregated in a way depending on the- mode.- offsetsis ignored and required to be- Nonein this case.
- If - inputis 1D of shape (N),- it will be treated as a concatenation of multiple bags (sequences). - offsetsis required to be a 1D tensor containing the starting index positions of each bag in- input. Therefore, for- offsetsof shape (B),- inputwill be viewed as having- Bbags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
 
- weight(Tensor): the learnable weights of the module of shape (num_embeddings, embedding_dim)
- output: aggregated embedding values of shape (B, embedding_dim)
 - Examples: - >>> # an Embedding module containing 10 tensors of size 3 >>> embedding_matrix = torch.rand(10, 3) >>> # a batch of 2 samples of 4 indices each >>> input = torch.tensor([1,2,4,5,4,3,2,9]) >>> offsets = torch.tensor([0,4]) >>> F.embedding_bag(input, embedding_matrix, offsets) tensor([[ 0.3397, 0.3552, 0.5545], [ 0.5893, 0.4386, 0.5882]]) 
one_hot¶
- 
torch.nn.functional.one_hot(tensor, num_classes=-1) → LongTensor¶
- Takes LongTensor with index values of shape - (*)and returns a tensor of shape- (*, num_classes)that have zeros everywhere except where the index of last dimension matches the corresponding value of the input tensor, in which case it will be 1.- See also One-hot on Wikipedia . - Parameters
- tensor (LongTensor) – class values of any shape. 
- num_classes (int) – Total number of classes. If set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor. 
 
- Returns
- LongTensor that has one more dimension with 1 values at the index of last dimension indicated by the input, and 0 everywhere else. 
 - Examples - >>> F.one_hot(torch.arange(0, 5) % 3) tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]) >>> F.one_hot(torch.arange(0, 5) % 3, num_classes=5) tensor([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]) >>> F.one_hot(torch.arange(0, 6).view(3,2) % 3) tensor([[[1, 0, 0], [0, 1, 0]], [[0, 0, 1], [1, 0, 0]], [[0, 1, 0], [0, 0, 1]]]) 
Distance functions¶
pairwise_distance¶
- 
torch.nn.functional.pairwise_distance(x1, x2, p=2.0, eps=1e-06, keepdim=False)¶
- See - torch.nn.PairwiseDistancefor details
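Example (shapes illustrative): one distance is returned per row pair:
>>> x1 = torch.randn(100, 128)
>>> x2 = torch.randn(100, 128)
>>> F.pairwise_distance(x1, x2).size()
torch.Size([100])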
cosine_similarity¶
- 
torch.nn.functional.cosine_similarity(x1, x2, dim=1, eps=1e-8) → Tensor¶
- Returns cosine similarity between x1 and x2, computed along dim. \[\text{similarity} = \dfrac{x_1 \cdot x_2}{\max(\Vert x_1 \Vert _2 \cdot \Vert x_2 \Vert _2, \epsilon)} \]- Parameters
- x1 (Tensor) – First input. 
- x2 (Tensor) – Second input (with the same number of dimensions as x1, matching x1 size at dimension dim). 
- dim (int, optional) – Dimension of vectors. Default: 1 
- eps (float, optional) – Small value to avoid division by zero. Default: 1e-8 
 - Shape:
- Input: \((\ast_1, D, \ast_2)\) where D is at position dim. 
- Output: \((\ast_1, \ast_2)\) where 1 is at position dim. 
 
 - Example: - >>> input1 = torch.randn(100, 128) >>> input2 = torch.randn(100, 128) >>> output = F.cosine_similarity(input1, input2) >>> print(output) 
pdist¶
- 
torch.nn.functional.pdist(input, p=2) → Tensor¶
- Computes the p-norm distance between every pair of row vectors in the input. This is identical to the upper triangular portion, excluding the diagonal, of torch.norm(input[:, None] - input, dim=2, p=p). This function will be faster if the rows are contiguous. - If input has shape \(N \times M\) then the output will have shape \(\frac{1}{2} N (N - 1)\). - This function is equivalent to scipy.spatial.distance.pdist(input, ‘minkowski’, p=p) if \(p \in (0, \infty)\). When \(p = 0\) it is equivalent to scipy.spatial.distance.pdist(input, ‘hamming’) * M. When \(p = \infty\), the closest scipy function is scipy.spatial.distance.pdist(xn, lambda x, y: np.abs(x - y).max()). - Parameters
- input – input tensor of shape \(N \times M\). 
- p – p value for the p-norm distance to calculate between each vector pair \(\in [0, \infty]\). 
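A worked sketch built from 3-4-5 triangles; the output lists the pair distances (0,1), (0,2), (1,2) in order:
>>> x = torch.tensor([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
>>> F.pdist(x)
tensor([ 5., 10.,  5.])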
 
 
Loss functions¶
binary_cross_entropy¶
- 
torch.nn.functional.binary_cross_entropy(input, target, weight=None, size_average=None, reduce=None, reduction='mean')¶
- Function that measures the Binary Cross Entropy between the target and the output. - See - BCELossfor details.- Parameters
- input – Tensor of arbitrary shape 
- target – Tensor of the same shape as input 
- weight (Tensor, optional) – a manual rescaling weight if provided it’s repeated to match input tensor shape 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Examples: - >>> input = torch.randn((3, 2), requires_grad=True) >>> target = torch.rand((3, 2), requires_grad=False) >>> loss = F.binary_cross_entropy(F.sigmoid(input), target) >>> loss.backward() 
binary_cross_entropy_with_logits¶
- 
torch.nn.functional.binary_cross_entropy_with_logits(input, target, weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)¶
- Function that measures Binary Cross Entropy between target and output logits. - See - BCEWithLogitsLossfor details.- Parameters
- input – Tensor of arbitrary shape 
- target – Tensor of the same shape as input 
- weight (Tensor, optional) – a manual rescaling weight if provided it’s repeated to match input tensor shape 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
- pos_weight (Tensor, optional) – a weight of positive examples. Must be a vector with length equal to the number of classes. 
 
 - Examples: - >>> input = torch.randn(3, requires_grad=True) >>> target = torch.empty(3).random_(2) >>> loss = F.binary_cross_entropy_with_logits(input, target) >>> loss.backward() 
poisson_nll_loss¶
- 
torch.nn.functional.poisson_nll_loss(input, target, log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')¶
- Poisson negative log likelihood loss. - See - PoissonNLLLossfor details.- Parameters
- input – expectation of underlying Poisson distribution. 
- target – random sample \(target \sim \text{Poisson}(input)\). 
- log_input – if - Truethe loss is computed as \(\exp(\text{input}) - \text{target} * \text{input}\), if- Falsethen loss is \(\text{input} - \text{target} * \log(\text{input}+\text{eps})\). Default:- True
- full – whether to compute full loss, i.e. to add the Stirling approximation term \(\text{target} * \log(\text{target}) - \text{target} + 0.5 * \log(2 * \pi * \text{target})\). Default: - False
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- eps (float, optional) – Small value to avoid evaluation of \(\log(0)\) when - log_input = False. Default: 1e-8
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
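Example (mirroring the PoissonNLLLoss module; the random target here is only illustrative):
>>> input = torch.randn(5, 2, requires_grad=True)
>>> target = torch.randn(5, 2)
>>> loss = F.poisson_nll_loss(input, target)
>>> loss.backward()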
 
 
cosine_embedding_loss¶
- 
torch.nn.functional.cosine_embedding_loss(input1, input2, target, margin=0, size_average=None, reduce=None, reduction='mean') → Tensor¶
- See - CosineEmbeddingLossfor details.
cross_entropy¶
- 
torch.nn.functional.cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')¶
- This criterion combines log_softmax and nll_loss in a single function. - See - CrossEntropyLossfor details.- Parameters
- input (Tensor) – \((N, C)\) where C = number of classes or \((N, C, H, W)\) in case of 2D Loss, or \((N, C, d_1, d_2, ..., d_K)\) where \(K \geq 1\) in the case of K-dimensional loss. 
- target (Tensor) – \((N)\) where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) where \(K \geq 1\) for K-dimensional loss. 
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, has to be a Tensor of size C 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When - size_averageis- True, the loss is averaged over non-ignored targets. Default: -100
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Examples: - >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.randint(5, (3,), dtype=torch.int64) >>> loss = F.cross_entropy(input, target) >>> loss.backward() 
ctc_loss¶
- 
torch.nn.functional.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0, reduction='mean', zero_infinity=False)¶
- The Connectionist Temporal Classification loss. - See - CTCLossfor details.- Note - In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting - torch.backends.cudnn.deterministic = True. Please see the notes on /notes/randomness for background.- Note - When using the CUDA backend, this operation may induce nondeterministic behaviour in the backward pass that is not easily switched off. Please see the notes on /notes/randomness for background. - Parameters
- log_probs – \((T, N, C)\) where C = number of characters in alphabet including blank, T = input length, and N = batch size. The logarithmized probabilities of the outputs (e.g. obtained with - torch.nn.functional.log_softmax()).
- targets – \((N, S)\) or (sum(target_lengths)). Targets cannot be blank. In the second form, the targets are assumed to be concatenated. 
- input_lengths – \((N)\). Lengths of the inputs (must each be \(\leq T\)) 
- target_lengths – \((N)\). Lengths of the targets 
- blank (int, optional) – Blank label. Default \(0\). 
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the output losses will be divided by the target lengths and then the mean over the batch is taken,- 'sum': the output will be summed. Default:- 'mean'
- zero_infinity (bool, optional) – Whether to zero infinite losses and the associated gradients. Default: - False. Infinite losses mainly occur when the inputs are too short to be aligned to the targets.
 
 - Example: - >>> log_probs = torch.randn(50, 16, 20).log_softmax(2).detach().requires_grad_() >>> targets = torch.randint(1, 20, (16, 30), dtype=torch.long) >>> input_lengths = torch.full((16,), 50, dtype=torch.long) >>> target_lengths = torch.randint(10,30,(16,), dtype=torch.long) >>> loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths) >>> loss.backward() 
hinge_embedding_loss¶
- 
torch.nn.functional.hinge_embedding_loss(input, target, margin=1.0, size_average=None, reduce=None, reduction='mean') → Tensor¶
- See - HingeEmbeddingLossfor details.
kl_div¶
- 
torch.nn.functional.kl_div(input, target, size_average=None, reduce=None, reduction='mean')¶
- The Kullback-Leibler divergence Loss. - See - KLDivLossfor details.- Parameters
- input – Tensor of arbitrary shape 
- target – Tensor of the same shape as input 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field- size_averageis set to- False, the losses are instead summed for each minibatch. Ignored when reduce is- False. Default:- True
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'batchmean'|- 'sum'|- 'mean'.- 'none': no reduction will be applied- 'batchmean': the sum of the output will be divided by the batchsize- 'sum': the output will be summed- 'mean': the output will be divided by the number of elements in the output Default:- 'mean'
 
 - Note - size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override - reduction.- Note - reduction='mean'does not return the true KL divergence value; please use- reduction='batchmean'which aligns with the KL math definition. In the next major release,- 'mean'will be changed to be the same as- 'batchmean'.
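A minimal sketch: input must hold log-probabilities and target probabilities; 'batchmean' yields the mathematically correct KL value:
>>> input = F.log_softmax(torch.randn(3, 5), dim=1)   # log-probabilities
>>> target = F.softmax(torch.randn(3, 5), dim=1)      # probabilities
>>> loss = F.kl_div(input, target, reduction='batchmean')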
l1_loss¶
mse_loss¶
margin_ranking_loss¶
- 
torch.nn.functional.margin_ranking_loss(input1, input2, target, margin=0, size_average=None, reduce=None, reduction='mean') → Tensor¶
- See - MarginRankingLossfor details.
multilabel_margin_loss¶
- 
torch.nn.functional.multilabel_margin_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor¶
- See - MultiLabelMarginLossfor details.
multilabel_soft_margin_loss¶
- 
torch.nn.functional.multilabel_soft_margin_loss(input, target, weight=None, size_average=None) → Tensor¶
- See - MultiLabelSoftMarginLossfor details.
multi_margin_loss¶
- 
torch.nn.functional.multi_margin_loss(input, target, p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')¶
 - See - MultiMarginLossfor details.
nll_loss¶
- 
torch.nn.functional.nll_loss(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')¶
- The negative log likelihood loss. - See - NLLLossfor details.- Parameters
- input – \((N, C)\) where C = number of classes or \((N, C, H, W)\) in case of 2D Loss, or \((N, C, d_1, d_2, ..., d_K)\) where \(K \geq 1\) in the case of K-dimensional loss. 
- target – \((N)\) where each value is \(0 \leq \text{targets}[i] \leq C-1\), or \((N, d_1, d_2, ..., d_K)\) where \(K \geq 1\) for K-dimensional loss. 
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, has to be a Tensor of size C 
- size_average (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field - size_average is set to - False, the losses are instead summed for each minibatch. Ignored when reduce is - False. Default: - True
- ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When - size_averageis- True, the loss is averaged over non-ignored targets. Default: -100
- reduce (bool, optional) – Deprecated (see - reduction). By default, the losses are averaged or summed over observations for each minibatch depending on- size_average. When- reduceis- False, returns a loss per batch element instead and ignores- size_average. Default:- True
- reduction (string, optional) – Specifies the reduction to apply to the output: - 'none'|- 'mean'|- 'sum'.- 'none': no reduction will be applied,- 'mean': the sum of the output will be divided by the number of elements in the output,- 'sum': the output will be summed. Note:- size_averageand- reduceare in the process of being deprecated, and in the meantime, specifying either of those two args will override- reduction. Default:- 'mean'
 
 - Example: - >>> # input is of size N x C = 3 x 5 >>> input = torch.randn(3, 5, requires_grad=True) >>> # each element in target has to have 0 <= value < C >>> target = torch.tensor([1, 0, 4]) >>> output = F.nll_loss(F.log_softmax(input, dim=1), target) >>> output.backward()
smooth_l1_loss¶
- 
torch.nn.functional.smooth_l1_loss(input, target, size_average=None, reduce=None, reduction='mean')¶
- Function that uses a squared term if the absolute element-wise error falls below 1 and an L1 term otherwise. - See - SmoothL1Lossfor details.
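A small worked example (illustrative) showing both regimes: for an absolute error of 0.5 the loss is \(0.5 \times 0.5^2 = 0.125\), and for an absolute error of 3 it is \(3 - 0.5 = 2.5\). - >>> input = torch.tensor([0.5, 3.0]) >>> target = torch.zeros(2) >>> F.smooth_l1_loss(input, target, reduction='none') tensor([0.1250, 2.5000])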
soft_margin_loss¶
- 
torch.nn.functional.soft_margin_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor¶
- See - SoftMarginLossfor details.
triplet_margin_loss¶
- 
torch.nn.functional.triplet_margin_loss(anchor, positive, negative, margin=1.0, p=2, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')¶
- See - TripletMarginLoss for details.
Vision functions¶
pixel_shuffle¶
- 
torch.nn.functional.pixel_shuffle(input, upscale_factor)¶
- Rearranges elements in a tensor of shape \((*, C \times r^2, H, W)\) to a tensor of shape \((*, C, H \times r, W \times r)\). - See - PixelShuffle for details. - Parameters
- input (Tensor) – the input tensor
- upscale_factor (int) – factor to increase spatial resolution by
 - Examples: - >>> input = torch.randn(1, 9, 4, 4) >>> output = torch.nn.functional.pixel_shuffle(input, 3) >>> print(output.size()) torch.Size([1, 1, 12, 12]) 
pad¶
- 
torch.nn.functional.pad(input, pad, mode='constant', value=0)¶
- Pads tensor. - Padding size:
- The padding size by which to pad some dimensions of - input is described starting from the last dimension and moving forward. \(\left\lfloor\frac{\text{len(pad)}}{2}\right\rfloor\) dimensions of - input will be padded. For example, to pad only the last dimension of the input tensor, then - pad has the form \((\text{padding\_left}, \text{padding\_right})\); to pad the last 2 dimensions of the input tensor, then use \((\text{padding\_left}, \text{padding\_right},\) \(\text{padding\_top}, \text{padding\_bottom})\); to pad the last 3 dimensions, use \((\text{padding\_left}, \text{padding\_right},\) \(\text{padding\_top}, \text{padding\_bottom},\) \(\text{padding\_front}, \text{padding\_back})\).
- Padding mode:
- See - torch.nn.ConstantPad2d,- torch.nn.ReflectionPad2d, and- torch.nn.ReplicationPad2dfor concrete examples on how each of the padding modes works. Constant padding is implemented for arbitrary dimensions. Replicate padding is implemented for padding the last 3 dimensions of 5D input tensor, or the last 2 dimensions of 4D input tensor, or the last dimension of 3D input tensor. Reflect padding is only implemented for padding the last 2 dimensions of 4D input tensor, or the last dimension of 3D input tensor.
 - Note - When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on randomness for background. - Parameters
- input (Tensor) – N-dimensional tensor
- pad (tuple) – m-elements tuple, where \(\frac{m}{2} \leq\) input dimensions and \(m\) is even
- mode – 'constant', 'reflect' or 'replicate'. Default: 'constant'
- value – fill value for 'constant' padding. Default: 0
 - Examples: - >>> t4d = torch.empty(3, 3, 4, 2) >>> p1d = (1, 1) # pad last dim by 1 on each side >>> out = F.pad(t4d, p1d, "constant", 0) # effectively zero padding >>> print(out.data.size()) torch.Size([3, 3, 4, 4]) >>> p2d = (1, 1, 2, 2) # pad last dim by (1, 1) and 2nd to last by (2, 2) >>> out = F.pad(t4d, p2d, "constant", 0) >>> print(out.data.size()) torch.Size([3, 3, 8, 4]) >>> t4d = torch.empty(3, 3, 4, 2) >>> p3d = (0, 1, 2, 1, 3, 3) # pad by (0, 1), (2, 1), and (3, 3) >>> out = F.pad(t4d, p3d, "constant", 0) >>> print(out.data.size()) torch.Size([3, 9, 7, 3]) 
interpolate¶
- 
torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)¶
- Down/up samples the input to either the given - sizeor the given- scale_factor- The algorithm used for interpolation is determined by - mode.- Currently temporal, spatial and volumetric sampling are supported, i.e. expected inputs are 3-D, 4-D or 5-D in shape. - The input dimensions are interpreted in the form: mini-batch x channels x [optional depth] x [optional height] x width. - The modes available for resizing are: nearest, linear (3D-only), bilinear, bicubic (4D-only), trilinear (5D-only), area - Parameters
- input (Tensor) – the input tensor 
- size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]) – output spatial size. 
- scale_factor (float or Tuple[float]) – multiplier for spatial size. Has to match input size if it is a tuple. 
- mode (str) – algorithm used for upsampling: - 'nearest'|- 'linear'|- 'bilinear'|- 'bicubic'|- 'trilinear'|- 'area'. Default:- 'nearest'
- align_corners (bool, optional) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to - True, the input and output tensors are aligned by the center points of their corner pixels. If set to- False, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values. This only has effect when- modeis- 'linear',- 'bilinear',- 'bicubic', or- 'trilinear'. Default:- False
 
 - Warning - With - align_corners = True, the linearly interpolating modes (linear, bilinear, and trilinear) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is - align_corners = False. See - Upsample for concrete examples on how this affects the outputs. - Note - When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on randomness for background.
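Example (an illustrative sketch of both ways to specify the output size): - >>> x = torch.randn(1, 3, 8, 8) >>> F.interpolate(x, scale_factor=2, mode='nearest').shape torch.Size([1, 3, 16, 16]) >>> F.interpolate(x, size=(4, 4), mode='bilinear', align_corners=False).shape torch.Size([1, 3, 4, 4])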
upsample¶
- 
torch.nn.functional.upsample(input, size=None, scale_factor=None, mode='nearest', align_corners=None)¶
- Upsamples the input to either the given - size or the given - scale_factor - Warning - This function is deprecated in favor of - torch.nn.functional.interpolate(). This is equivalent to - nn.functional.interpolate(...). - Note - When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on randomness for background. - The algorithm used for upsampling is determined by - mode. - Currently temporal, spatial and volumetric upsampling are supported, i.e. expected inputs are 3-D, 4-D or 5-D in shape. - The input dimensions are interpreted in the form: mini-batch x channels x [optional depth] x [optional height] x width. - The modes available for upsampling are: nearest, linear (3D-only), bilinear, bicubic (4D-only), trilinear (5D-only) - Parameters
- input (Tensor) – the input tensor 
- size (int or Tuple[int] or Tuple[int, int] or Tuple[int, int, int]) – output spatial size. 
- scale_factor (float or Tuple[float]) – multiplier for spatial size. Has to be an integer. 
- mode (string) – algorithm used for upsampling: - 'nearest'|- 'linear'|- 'bilinear'|- 'bicubic'|- 'trilinear'. Default:- 'nearest'
- align_corners (bool, optional) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to - True, the input and output tensors are aligned by the center points of their corner pixels. If set to- False, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values. This only has effect when- modeis- 'linear',- 'bilinear',- 'bicubic'or- 'trilinear'. Default:- False
 
 - Warning - With - align_corners = True, the linearly interpolating modes (linear, bilinear, and trilinear) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is- align_corners = False. See- Upsamplefor concrete examples on how this affects the outputs.
upsample_nearest¶
- 
torch.nn.functional.upsample_nearest(input, size=None, scale_factor=None)¶
- Upsamples the input, using nearest neighbours’ pixel values. - Warning - This function is deprecated in favor of - torch.nn.functional.interpolate(). This is equivalent to - nn.functional.interpolate(..., mode='nearest'). - Currently spatial and volumetric upsampling are supported (i.e. expected inputs are 4 or 5 dimensional). - Parameters
- input (Tensor) – input
- size (int or Tuple[int, int] or Tuple[int, int, int]) – output spatial size.
- scale_factor (int) – multiplier for spatial size. Has to be an integer.
 - Note - When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on randomness for background.
upsample_bilinear¶
- 
torch.nn.functional.upsample_bilinear(input, size=None, scale_factor=None)¶
- Upsamples the input, using bilinear upsampling. - Warning - This function is deprecated in favor of - torch.nn.functional.interpolate(). This is equivalent to - nn.functional.interpolate(..., mode='bilinear', align_corners=True). - Expected inputs are spatial (4 dimensional). Use upsample_trilinear for volumetric (5 dimensional) inputs. - Parameters
- input (Tensor) – input
- size (int or Tuple[int, int]) – output spatial size.
- scale_factor (int or Tuple[int, int]) – multiplier for spatial size
 - Note - When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on randomness for background.
grid_sample¶
- 
torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros')¶
- Given an - input and a flow-field - grid, computes the - output using - input values and pixel locations from - grid. - Currently, only spatial (4-D) and volumetric (5-D) - input are supported. - In the spatial (4-D) case, for - input with shape \((N, C, H_\text{in}, W_\text{in})\) and - grid with shape \((N, H_\text{out}, W_\text{out}, 2)\), the output will have shape \((N, C, H_\text{out}, W_\text{out})\). - For each output location - output[n, :, h, w], the size-2 vector - grid[n, h, w] specifies - input pixel locations - x and - y, which are used to interpolate the output value - output[n, :, h, w]. In the case of 5D inputs, - grid[n, d, h, w] specifies the - x, - y, - z pixel locations for interpolating - output[n, :, d, h, w]. - The - mode argument specifies the - nearest or - bilinear interpolation method used to sample the input pixels. - grid should have most values in the range - [-1, 1], because the pixel locations are normalized by the - input spatial dimensions. For example, the values - x = -1, y = -1 correspond to the top-left pixel of - input, and the values - x = 1, y = 1 to the bottom-right pixel of - input. - If - grid has values outside the range - [-1, 1], those locations are handled as defined by - padding_mode. Options are - padding_mode="zeros": use - 0 for out-of-bound values,
- padding_mode="border": use border values for out-of-bound values,
- padding_mode="reflection": use values at locations reflected by the border for out-of-bound values. For location far away from the border, it will keep being reflected until becoming in bound, e.g., (normalized) pixel location- x = -3.5reflects by- -1and becomes- x' = 1.5, then reflects by border- 1and becomes- x'' = -0.5.
 - Note - This function is often used in building Spatial Transformer Networks. - Note - When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on randomness for background. - Parameters
- input (Tensor) – input of shape \((N, C, H_\text{in}, W_\text{in})\) (4-D case) or \((N, C, D_\text{in}, H_\text{in}, W_\text{in})\) (5-D case) 
- grid (Tensor) – flow-field of shape \((N, H_\text{out}, W_\text{out}, 2)\) (4-D case) or \((N, D_\text{out}, H_\text{out}, W_\text{out}, 3)\) (5-D case) 
- mode (str) – interpolation mode to calculate output values - 'bilinear'|- 'nearest'. Default:- 'bilinear'
- padding_mode (str) – padding mode for outside grid values - 'zeros'|- 'border'|- 'reflection'. Default:- 'zeros'
 
- Returns
- output Tensor 
- Return type
- output (Tensor) 
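Example (an illustrative sketch; an identity affine transform built with affine_grid(), documented below, should reproduce the input up to floating point error): - >>> input = torch.arange(16.).view(1, 1, 4, 4) >>> theta = torch.tensor([[[1., 0., 0.], [0., 1., 0.]]]) # identity transform >>> grid = F.affine_grid(theta, torch.Size((1, 1, 4, 4))) >>> out = F.grid_sample(input, grid, mode='bilinear') >>> torch.allclose(out, input) True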
 
affine_grid¶
- 
torch.nn.functional.affine_grid(theta, size)¶
- Generates a 2D flow field, given a batch of affine matrices - theta. Generally used in conjunction with - grid_sample() to implement Spatial Transformer Networks.
DataParallel functions (multi-GPU, distributed)¶
data_parallel¶
- 
torch.nn.parallel.data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None)¶
- Evaluates module(input) in parallel across the GPUs given in device_ids. - This is the functional version of the DataParallel module. - Parameters
- module (Module) – the module to evaluate in parallel 
- inputs (Tensor) – inputs to the module 
- device_ids (list of int or torch.device) – GPU ids on which to replicate module
- output_device (int or torch.device) – GPU location of the output. Use -1 to indicate the CPU. (default: device_ids[0])
 
- Returns
- a Tensor containing the result of module(input) located on output_device 
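Example (an illustrative sketch; assumes at least two visible CUDA devices and that module and input are already defined, with module’s parameters on device 0): - >>> output = torch.nn.parallel.data_parallel(module, input, device_ids=[0, 1])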
 
torch.nn.init¶
- 
torch.nn.init.calculate_gain(nonlinearity, param=None)¶
- Return the recommended gain value for the given nonlinearity function. The values are as follows:
- Linear / Identity: \(1\)
- Conv{1,2,3}D: \(1\)
- Sigmoid: \(1\)
- Tanh: \(\frac{5}{3}\)
- ReLU: \(\sqrt{2}\)
- Leaky ReLU: \(\sqrt{\frac{2}{1 + \text{negative\_slope}^2}}\)
- Parameters
- nonlinearity – the non-linear function (nn.functional name) 
- param – optional parameter for the non-linear function 
 
 - Examples - >>> gain = nn.init.calculate_gain('leaky_relu') 
- 
torch.nn.init.uniform_(tensor, a=0, b=1)¶
- Fills the input Tensor with values drawn from the uniform distribution \(\mathcal{U}(a, b)\). - Parameters
- tensor – an n-dimensional torch.Tensor 
- a – the lower bound of the uniform distribution 
- b – the upper bound of the uniform distribution 
 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.uniform_(w) 
- 
torch.nn.init.normal_(tensor, mean=0, std=1)¶
- Fills the input Tensor with values drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std})\). - Parameters
- tensor – an n-dimensional torch.Tensor 
- mean – the mean of the normal distribution 
- std – the standard deviation of the normal distribution 
 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.normal_(w) 
- 
torch.nn.init.constant_(tensor, val)¶
- Fills the input Tensor with the value \(\text{val}\). - Parameters
- tensor – an n-dimensional torch.Tensor 
- val – the value to fill the tensor with 
 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.constant_(w, 0.3) 
- 
torch.nn.init.eye_(tensor)¶
- Fills the 2-dimensional input Tensor with the identity matrix. Preserves the identity of the inputs in Linear layers, where as many inputs are preserved as possible. - Parameters
- tensor – a 2-dimensional torch.Tensor 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.eye_(w) 
- 
torch.nn.init.dirac_(tensor)¶
- Fills the {3, 4, 5}-dimensional input Tensor with the Dirac delta function. Preserves the identity of the inputs in Convolutional layers, where as many input channels are preserved as possible. - Parameters
- tensor – a {3, 4, 5}-dimensional torch.Tensor 
 - Examples - >>> w = torch.empty(3, 16, 5, 5) >>> nn.init.dirac_(w) 
- 
torch.nn.init.xavier_uniform_(tensor, gain=1)¶
- Fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The resulting tensor will have values sampled from \(\mathcal{U}(-a, a)\) where \[a = \text{gain} \times \sqrt{\frac{6}{\text{fan\_in} + \text{fan\_out}}} \]- Also known as Glorot initialization. - Parameters
- tensor – an n-dimensional torch.Tensor 
- gain – an optional scaling factor 
 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu')) 
- 
torch.nn.init.xavier_normal_(tensor, gain=1)¶
- Fills the input Tensor with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010), using a normal distribution. The resulting tensor will have values sampled from \(\mathcal{N}(0, \text{std})\) where \[\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\_in} + \text{fan\_out}}} \]- Also known as Glorot initialization. - Parameters
- tensor – an n-dimensional torch.Tensor 
- gain – an optional scaling factor 
 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.xavier_normal_(w) 
- 
torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')¶
- Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a uniform distribution. The resulting tensor will have values sampled from \(\mathcal{U}(-\text{bound}, \text{bound})\) where \[\text{bound} = \sqrt{\frac{6}{(1 + a^2) \times \text{fan\_in}}} \]- Also known as He initialization. - Parameters
- tensor – an n-dimensional torch.Tensor 
- a – the negative slope of the rectifier used after this layer (0 for ReLU by default) 
- mode – either - 'fan_in'(default) or- 'fan_out'. Choosing- 'fan_in'preserves the magnitude of the variance of the weights in the forward pass. Choosing- 'fan_out'preserves the magnitudes in the backwards pass.
- nonlinearity – the non-linear function (nn.functional name), recommended to use only with - 'relu'or- 'leaky_relu'(default).
 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu') 
- 
torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')¶
- Fills the input Tensor with values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from \(\mathcal{N}(0, \text{std})\) where \[\text{std} = \sqrt{\frac{2}{(1 + a^2) \times \text{fan\_in}}} \]- Also known as He initialization. - Parameters
- tensor – an n-dimensional torch.Tensor 
- a – the negative slope of the rectifier used after this layer (0 for ReLU by default) 
- mode – either - 'fan_in'(default) or- 'fan_out'. Choosing- 'fan_in'preserves the magnitude of the variance of the weights in the forward pass. Choosing- 'fan_out'preserves the magnitudes in the backwards pass.
- nonlinearity – the non-linear function (nn.functional name), recommended to use only with - 'relu'or- 'leaky_relu'(default).
 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu') 
- 
torch.nn.init.orthogonal_(tensor, gain=1)¶
- Fills the input Tensor with a (semi) orthogonal matrix, as described in Exact solutions to the nonlinear dynamics of learning in deep linear neural networks - Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened. - Parameters
- tensor – an n-dimensional torch.Tensor, where \(n \geq 2\) 
- gain – optional scaling factor 
 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.orthogonal_(w) 
- 
torch.nn.init.sparse_(tensor, sparsity, std=0.01)¶
- Fills the 2D input Tensor as a sparse matrix, where the non-zero elements will be drawn from the normal distribution \(\mathcal{N}(0, 0.01)\), as described in Deep learning via Hessian-free optimization - Martens, J. (2010). - Parameters
- tensor – an n-dimensional torch.Tensor 
- sparsity – The fraction of elements in each column to be set to zero 
- std – the standard deviation of the normal distribution used to generate the non-zero values 
 
 - Examples - >>> w = torch.empty(3, 5) >>> nn.init.sparse_(w, sparsity=0.1) 
torch.optim¶
torch.optim is a package implementing various optimization algorithms.
Most commonly used methods are already supported, and the interface is general
enough that more sophisticated ones can also be easily integrated in the
future.
How to use an optimizer¶
To use torch.optim you have to construct an optimizer object that will hold
the current state and update the parameters based on the computed gradients.
Constructing it¶
To construct an Optimizer you have to give it an iterable containing the
parameters (all should be Variables) to optimize. Then,
you can specify optimizer-specific options such as the learning rate, weight decay, etc.
Note
If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects from those before the call.
In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.
Example:
optimizer = optim.SGD(model.parameters(), lr = 0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr = 0.0001)
Per-parameter options¶
Optimizers also support specifying per-parameter options. To do this, instead
of passing an iterable of Variables, pass in an iterable of
dicts. Each of them will define a separate parameter group, and should contain
a params key, containing a list of parameters belonging to it. Other keys
should match the keyword arguments accepted by the optimizers, and will be used
as optimization options for this group.
Note
You can still pass options as keyword arguments. They will be used as defaults, in the groups that didn’t override them. This is useful when you only want to vary a single option, while keeping all others consistent between parameter groups.
For example, this is very useful when one wants to specify per-layer learning rates:
optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)
This means that model.base’s parameters will use the default learning rate of 1e-2,
model.classifier’s parameters will use a learning rate of 1e-3, and a momentum of
0.9 will be used for all parameters.
Taking an optimization step¶
All optimizers implement a step() method that updates the
parameters. It can be used in two ways:
optimizer.step()¶
This is a simplified version supported by most optimizers. The function can be
called once the gradients are computed using e.g.
backward().
Example:
for input, target in dataset:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()
optimizer.step(closure)¶
Some optimization algorithms such as Conjugate Gradient and LBFGS need to reevaluate the function multiple times, so you have to pass in a closure that allows them to recompute your model. The closure should clear the gradients, compute the loss, and return it.
Example:
for input, target in dataset:
    def closure():
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss
    optimizer.step(closure)
Algorithms¶
- 
class torch.optim.Optimizer(params, defaults)¶
- Base class for all optimizers. - Warning - Parameters need to be specified as collections that have a deterministic ordering that is consistent between runs. Examples of objects that don’t satisfy those properties are sets and iterators over values of dictionaries. - Parameters
- params (iterable) – an iterable of - torch.Tensors or- dicts. Specifies what Tensors should be optimized.
- defaults – (dict): a dict containing default values of optimization options (used when a parameter group doesn’t specify them). 
 
 - 
add_param_group(param_group)¶
- Add a param group to the - Optimizers param_groups.- This can be useful when fine tuning a pre-trained network as frozen layers can be made trainable and added to the - Optimizeras training progresses.- Parameters
- param_group (dict) – Specifies what Tensors should be optimized along with group-specific optimization options.
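Example (an illustrative sketch; model.base and model.classifier are hypothetical submodules, mirroring the per-parameter options example above): - >>> optimizer = optim.SGD(model.base.parameters(), lr=1e-2, momentum=0.9) >>> # later, unfreeze the classifier and give it its own learning rate >>> optimizer.add_param_group({'params': model.classifier.parameters(), 'lr': 1e-3})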
 
 
 - 
load_state_dict(state_dict)¶
- Loads the optimizer state. - Parameters
- state_dict (dict) – optimizer state. Should be an object returned from a call to - state_dict().
 
 - 
state_dict()¶
- Returns the state of the optimizer as a - dict.- It contains two entries: - state - a dict holding current optimization state. Its content
- differs between optimizer classes. 
 
- param_groups - a dict containing all parameter groups 
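An illustrative round-trip of the optimizer state through a checkpoint file (the path is hypothetical): - >>> torch.save(optimizer.state_dict(), 'optimizer.pth') >>> optimizer.load_state_dict(torch.load('optimizer.pth'))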
 
 - 
step(closure)¶
- Performs a single optimization step (parameter update). - Parameters
- closure (callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers. 
 
 - 
zero_grad()¶
- Clears the gradients of all optimized - torch.Tensors.
 
- 
class torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)¶
- Implements Adadelta algorithm. - It has been proposed in ADADELTA: An Adaptive Learning Rate Method. - Parameters
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups 
- rho (float, optional) – coefficient used for computing a running average of squared gradients (default: 0.9) 
- eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-6) 
- lr (float, optional) – coefficient that scales delta before it is applied to the parameters (default: 1.0)
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0) 
 
 - 
step(closure=None)¶
- Performs a single optimization step. - Parameters
- closure (callable, optional) – A closure that reevaluates the model and returns the loss. 
 
 
- 
class torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0)¶
- Implements Adagrad algorithm. - It has been proposed in Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. - Parameters
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
- lr (float, optional) – learning rate (default: 1e-2)
- lr_decay (float, optional) – learning rate decay (default: 0)
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
 - 
step(closure=None)¶
- Performs a single optimization step. - Parameters
- closure (callable, optional) – A closure that reevaluates the model and returns the loss. 
 
 
- 
class torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)¶
- Implements Adam algorithm. - It has been proposed in Adam: A Method for Stochastic Optimization. - Parameters
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups 
- lr (float, optional) – learning rate (default: 1e-3) 
- betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) 
- eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8) 
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0) 
- amsgrad (boolean, optional) – whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond (default: False) 
 
 - 
step(closure=None)¶
- Performs a single optimization step. - Parameters
- closure (callable, optional) – A closure that reevaluates the model and returns the loss. 
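Example (an illustrative sketch; model, input, target, and loss_fn are assumed to be defined): - >>> optimizer = torch.optim.Adam(model.parameters(), lr=1e-3) >>> optimizer.zero_grad() >>> loss_fn(model(input), target).backward() >>> optimizer.step()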
 
 
- 
class torch.optim.SparseAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)¶
- Implements lazy version of Adam algorithm suitable for sparse tensors. - In this variant, only moments that show up in the gradient get updated, and only those portions of the gradient get applied to the parameters. - Parameters
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups 
- lr (float, optional) – learning rate (default: 1e-3) 
- betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) 
- eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8) 
 
 - 
step(closure=None)¶
- Performs a single optimization step. - Parameters
- closure (callable, optional) – A closure that reevaluates the model and returns the loss. 
 
 
- 
class torch.optim.Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)¶
- Implements Adamax algorithm (a variant of Adam based on infinity norm). - It has been proposed in Adam: A Method for Stochastic Optimization. - Parameters
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups 
- lr (float, optional) – learning rate (default: 2e-3) 
- betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square 
- eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8) 
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0) 
 
 - 
step(closure=None)¶
- Performs a single optimization step. - Parameters
- closure (callable, optional) – A closure that reevaluates the model and returns the loss. 
 
 
- 
class torch.optim.ASGD(params, lr=0.01, lambd=0.0001, alpha=0.75, t0=1000000.0, weight_decay=0)¶
- Implements Averaged Stochastic Gradient Descent. - It has been proposed in Acceleration of stochastic approximation by averaging. - Parameters
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups 
- lr (float, optional) – learning rate (default: 1e-2) 
- lambd (float, optional) – decay term (default: 1e-4) 
- alpha (float, optional) – power for eta update (default: 0.75) 
- t0 (float, optional) – point at which to start averaging (default: 1e6) 
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0) 
 
 - 
step(closure=None)¶
- Performs a single optimization step. - Parameters
- closure (callable, optional) – A closure that reevaluates the model and returns the loss. 
 
 
- 
class torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-05, tolerance_change=1e-09, history_size=100, line_search_fn=None)¶
- Implements L-BFGS algorithm. - Warning - This optimizer doesn’t support per-parameter options and parameter groups (there can be only one). - Warning - Right now all parameters have to be on a single device. This will be improved in the future. - Note - This is a very memory intensive optimizer (it requires additional - param_bytes * (history_size + 1)bytes). If it doesn’t fit in memory try reducing the history size, or use a different algorithm.- Parameters
- lr (float) – learning rate (default: 1) 
- max_iter (int) – maximal number of iterations per optimization step (default: 20) 
- max_eval (int) – maximal number of function evaluations per optimization step (default: max_iter * 1.25). 
- tolerance_grad (float) – termination tolerance on first order optimality (default: 1e-5). 
- tolerance_change (float) – termination tolerance on function value/parameter changes (default: 1e-9). 
- history_size (int) – update history size (default: 100). 
 
 - 
step(closure)¶
- Performs a single optimization step. - Parameters
- closure (callable) – A closure that reevaluates the model and returns the loss. 
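Example (an illustrative sketch of the required closure; model, input, target, and loss_fn are assumed to be defined): - >>> optimizer = torch.optim.LBFGS(model.parameters(), history_size=10, max_iter=4) >>> def closure(): ... optimizer.zero_grad() ... loss = loss_fn(model(input), target) ... loss.backward() ... return loss >>> optimizer.step(closure)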
 
 
- 
class torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)¶
- Implements RMSprop algorithm. - Proposed by G. Hinton in his course. - The centered version first appears in Generating Sequences With Recurrent Neural Networks. - Parameters
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups 
- lr (float, optional) – learning rate (default: 1e-2) 
- momentum (float, optional) – momentum factor (default: 0) 
- alpha (float, optional) – smoothing constant (default: 0.99) 
- eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8) 
- centered (bool, optional) – if - True, compute the centered RMSProp; the gradient is normalized by an estimation of its variance
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0) 
 
 - 
step(closure=None)¶
- Performs a single optimization step. - Parameters
- closure (callable, optional) – A closure that reevaluates the model and returns the loss. 
 
 
- 
class torch.optim.Rprop(params, lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))¶
- Implements the resilient backpropagation algorithm. - Parameters
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups 
- lr (float, optional) – learning rate (default: 1e-2) 
- etas (Tuple[float, float], optional) – pair of (etaminus, etaplus), that are multiplicative increase and decrease factors (default: (0.5, 1.2))
- step_sizes (Tuple[float, float], optional) – a pair of minimal and maximal allowed step sizes (default: (1e-6, 50)) 
 
 - 
step(closure=None)¶
- Performs a single optimization step. - Parameters
- closure (callable, optional) – A closure that reevaluates the model and returns the loss. 
 
 
- 
class torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)¶
- Implements stochastic gradient descent (optionally with momentum). - Nesterov momentum is based on the formula from On the importance of initialization and momentum in deep learning. - Parameters
- params (iterable) – iterable of parameters to optimize or dicts defining parameter groups 
- lr (float) – learning rate 
- momentum (float, optional) – momentum factor (default: 0) 
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0) 
- dampening (float, optional) – dampening for momentum (default: 0) 
- nesterov (bool, optional) – enables Nesterov momentum (default: False) 
 
 - Example - >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) >>> optimizer.zero_grad() >>> loss_fn(model(input), target).backward() >>> optimizer.step() - Note - The implementation of SGD with Momentum/Nesterov subtly differs from Sutskever et al. and implementations in some other frameworks. - Considering the specific case of Momentum, the update can be written as \[v = \rho * v + g \\ p = p - lr * v \]- where p, g, v and \(\rho\) denote the parameters, gradient, velocity, and momentum respectively. - This is in contrast to Sutskever et al. and other frameworks which employ an update of the form \[v = \rho * v + lr * g \\ p = p - v \]- The Nesterov version is analogously modified. - 
step(closure=None)¶
- Performs a single optimization step. - Parameters
- closure (callable, optional) – A closure that reevaluates the model and returns the loss. 
 
 
How to adjust Learning Rate¶
torch.optim.lr_scheduler provides several methods to adjust the learning
rate based on the number of epochs. torch.optim.lr_scheduler.ReduceLROnPlateau
allows dynamic learning rate reduction based on some validation measurements.
- 
class torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1)¶
- Sets the learning rate of each parameter group to the initial lr times a given function. When last_epoch=-1, sets initial lr as lr. - Parameters
- optimizer (Optimizer) – Wrapped optimizer.
- lr_lambda (function or list) – A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups.
- last_epoch (int) – The index of last epoch. Default: -1.
 - Example - >>> # Assuming optimizer has two groups. >>> lambda1 = lambda epoch: epoch // 30 >>> lambda2 = lambda epoch: 0.95 ** epoch >>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2]) >>> for epoch in range(100): >>> scheduler.step() >>> train(...) >>> validate(...) - 
load_state_dict(state_dict)¶
- Loads the schedulers state. - Parameters
- state_dict (dict) – scheduler state. Should be an object returned from a call to - state_dict().
 
 
- 
class torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)¶
- Decays the learning rate of each parameter group by gamma every step_size epochs. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr. - Parameters
- optimizer (Optimizer) – Wrapped optimizer.
- step_size (int) – Period of learning rate decay.
- gamma (float) – Multiplicative factor of learning rate decay. Default: 0.1.
- last_epoch (int) – The index of last epoch. Default: -1.
 - Example - >>> # Assuming optimizer uses lr = 0.05 for all groups >>> # lr = 0.05 if epoch < 30 >>> # lr = 0.005 if 30 <= epoch < 60 >>> # lr = 0.0005 if 60 <= epoch < 90 >>> # ... >>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1) >>> for epoch in range(100): >>> scheduler.step() >>> train(...) >>> validate(...) 
- 
class torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1)¶
- Decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr. - Parameters
- optimizer (Optimizer) – Wrapped optimizer.
- milestones (list) – List of epoch indices. Must be increasing.
- gamma (float) – Multiplicative factor of learning rate decay. Default: 0.1.
- last_epoch (int) – The index of last epoch. Default: -1.
 - Example - >>> # Assuming optimizer uses lr = 0.05 for all groups >>> # lr = 0.05 if epoch < 30 >>> # lr = 0.005 if 30 <= epoch < 80 >>> # lr = 0.0005 if epoch >= 80 >>> scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1) >>> for epoch in range(100): >>> scheduler.step() >>> train(...) >>> validate(...) 
- 
class torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1)¶
- Decays the learning rate of each parameter group by gamma every epoch. When last_epoch=-1, sets initial lr as lr. 
- 
class torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)¶
- Set the learning rate of each parameter group using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial lr and \(T_{cur}\) is the number of epochs since the last restart in SGDR: \[\eta_{t+1} = \eta_{min} + (\eta_t - \eta_{min})\frac{1 + \cos\left(\frac{T_{cur}+1}{T_{max}}\pi\right)}{1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)} \]- When last_epoch=-1, sets initial lr as lr. Notice that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators. If the learning rate is set solely by this scheduler, the learning rate at each step becomes: \[\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right) \]- It has been proposed in SGDR: Stochastic Gradient Descent with Warm Restarts. Note that this only implements the cosine annealing part of SGDR, and not the restarts.
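 - Example (an illustrative sketch, following the pattern of the schedulers above) - >>> scheduler = CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-5) >>> for epoch in range(100): >>> scheduler.step() >>> train(...) >>> validate(...)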
- 
class torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)¶
- Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metrics quantity and if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced. - Parameters
- optimizer (Optimizer) – Wrapped optimizer. 
- mode (str) – One of min, max. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing. Default: ‘min’. 
- factor (float) – Factor by which the learning rate will be reduced. new_lr = lr * factor. Default: 0.1. 
- patience (int) – Number of epochs with no improvement after which learning rate will be reduced. For example, if patience = 2, then we will ignore the first 2 epochs with no improvement, and will only decrease the LR after the 3rd epoch if the loss still hasn’t improved then. Default: 10. 
- verbose (bool) – If - True, prints a message to stdout for each update. Default:- False.
- threshold (float) – Threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4. 
- threshold_mode (str) – One of 'rel', 'abs'. In 'rel' mode, dynamic_threshold = best * (1 + threshold) in 'max' mode or best * (1 - threshold) in 'min' mode. In 'abs' mode, dynamic_threshold = best + threshold in 'max' mode or best - threshold in 'min' mode. Default: 'rel'.
- cooldown (int) – Number of epochs to wait before resuming normal operation after lr has been reduced. Default: 0. 
- min_lr (float or list) – A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. Default: 0. 
- eps (float) – Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is ignored. Default: 1e-8. 
 
 - Example - >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) >>> scheduler = ReduceLROnPlateau(optimizer, 'min') >>> for epoch in range(10): >>> train(...) >>> val_loss = validate(...) >>> # Note that step should be called after validate() >>> scheduler.step(val_loss) 
Automatic differentiation package - torch.autograd¶
torch.autograd provides classes and functions implementing automatic
differentiation of arbitrary scalar valued functions. It requires minimal
changes to the existing code - you only need to declare Tensors
for which gradients should be computed with the requires_grad=True keyword.
- 
torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None)¶
- Computes the sum of gradients of given tensors w.r.t. graph leaves. - The graph is differentiated using the chain rule. If any of - tensors are non-scalar (i.e. their data has more than one element) and require gradient, then the Jacobian-vector product will be computed; in this case the function additionally requires specifying - grad_tensors. It should be a sequence of matching length that contains the “vector” in the Jacobian-vector product, usually the gradient of the differentiated function w.r.t. the corresponding tensors ( - None is an acceptable value for all tensors that don’t need gradient tensors). - This function accumulates gradients in the leaves - you might need to zero them before calling it. - Parameters
- tensors (sequence of Tensor) – Tensors of which the derivative will be computed. 
- grad_tensors (sequence of (Tensor or None)) – The “vector” in the Jacobian-vector product, usually gradients w.r.t. each element of corresponding tensors. None values can be specified for scalar Tensors or ones that don’t require grad. If a None value would be acceptable for all grad_tensors, then this argument is optional. 
- retain_graph (bool, optional) – If - False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to- Trueis not needed and often can be worked around in a much more efficient way. Defaults to the value of- create_graph.
- create_graph (bool, optional) – If - True, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to- False.
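 - Example (an illustrative sketch; since y is non-scalar, a grad_tensors “vector” must be supplied) - >>> x = torch.randn(3, requires_grad=True) >>> y = x * 2 >>> torch.autograd.backward([y], grad_tensors=[torch.ones(3)]) >>> x.grad tensor([2., 2., 2.])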
 
 
- 
torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)¶
- Computes and returns the sum of gradients of outputs w.r.t. the inputs. - grad_outputs should be a sequence of length matching - output containing the “vector” in the Jacobian-vector product, usually the pre-computed gradients w.r.t. each of the outputs. If an output doesn’t require_grad, then the gradient can be - None. - If - only_inputs is - True, the function will only return a list of gradients w.r.t the specified inputs. If it’s - False, then gradient w.r.t. all remaining leaves will still be computed, and will be accumulated into their - .grad attribute. - Parameters
- outputs (sequence of Tensor) – outputs of the differentiated function. 
- inputs (sequence of Tensor) – Inputs w.r.t. which the gradient will be returned (and not accumulated into - .grad).
- grad_outputs (sequence of Tensor) – The “vector” in the Jacobian-vector product. Usually gradients w.r.t. each output. None values can be specified for scalar Tensors or ones that don’t require grad. If a None value would be acceptable for all grad_tensors, then this argument is optional. Default: None. 
- retain_graph (bool, optional) – If - False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to- Trueis not needed and often can be worked around in a much more efficient way. Defaults to the value of- create_graph.
- create_graph (bool, optional) – If - True, graph of the derivative will be constructed, allowing to compute higher order derivative products. Default:- False.
- allow_unused (bool, optional) – If - False, specifying inputs that were not used when computing outputs (and therefore their grad is always zero) is an error. Defaults to- False.
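 - Example (an illustrative sketch; the gradient is returned rather than accumulated into x.grad) - >>> x = torch.randn(3, requires_grad=True) >>> y = (x ** 2).sum() >>> g, = torch.autograd.grad(y, x) >>> torch.allclose(g, 2 * x) True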
 
 
Locally disabling gradient computation¶
- 
class torch.autograd.no_grad¶
- Context-manager that disables gradient calculation. - Disabling gradient calculation is useful for inference, when you are sure that you will not call - Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True. In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True. - Also functions as a decorator. - Example: - >>> x = torch.tensor([1.], requires_grad=True) >>> with torch.no_grad(): ... y = x * 2 >>> y.requires_grad False >>> @torch.no_grad() ... def doubler(x): ... return x * 2 >>> z = doubler(x) >>> z.requires_grad False
- 
class torch.autograd.enable_grad¶
- Context-manager that enables gradient calculation. - Enables gradient calculation inside a - no_grad context. This has no effect outside of - no_grad. - Also functions as a decorator. - Example: - >>> x = torch.tensor([1.], requires_grad=True) >>> with torch.no_grad(): ... with torch.enable_grad(): ... y = x * 2 >>> y.requires_grad True >>> y.backward() >>> x.grad >>> @torch.enable_grad() ... def doubler(x): ... return x * 2 >>> with torch.no_grad(): ... z = doubler(x) >>> z.requires_grad True
- 
class torch.autograd.set_grad_enabled(mode)¶
- Context-manager that sets gradient calculation to on or off. - set_grad_enabledwill enable or disable grads based on its argument- mode. It can be used as a context-manager or as a function.- Parameters
- mode (bool) – Flag whether to enable grad ( - True), or disable (- False). This can be used to conditionally enable gradients.
 - Example: - >>> x = torch.tensor([1.], requires_grad=True) >>> is_train = False >>> with torch.set_grad_enabled(is_train): ... y = x * 2 >>> y.requires_grad False >>> torch.set_grad_enabled(True) >>> y = x * 2 >>> y.requires_grad True >>> torch.set_grad_enabled(False) >>> y = x * 2 >>> y.requires_grad False
In-place operations on Tensors¶
Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd’s aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you’re operating under heavy memory pressure, you might never need to use them.
In-place correctness checks¶
All Tensors keep track of in-place operations applied to them, and
if the implementation detects that a tensor was saved for backward in one of
the functions, but it was modified in-place afterwards, an error will be raised
once backward pass is started. This ensures that if you’re using in-place
functions and not seeing any errors, you can be sure that the computed
gradients are correct.
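A minimal sketch of the kind of error this check raises (the error message is abbreviated): - >>> x = torch.randn(3, requires_grad=True) >>> y = x.sigmoid() # sigmoid saves its output for the backward pass >>> y.add_(1) # in-place modification of the saved tensor >>> y.sum().backward() RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation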
Variable (deprecated)¶
Warning
The Variable API has been deprecated: Variables are no longer necessary to
use autograd with tensors. Autograd automatically supports Tensors with
requires_grad set to True. Below please find a quick guide on what
has changed:
- Variable(tensor)and- Variable(tensor, requires_grad)still work as expected, but they return Tensors instead of Variables.
- var.datais the same thing as- tensor.data.
- Methods such as - var.backward(), var.detach(), var.register_hook()now work on tensors with the same method names.
In addition, one can now create tensors with requires_grad=True using factory
methods such as torch.randn(), torch.zeros(), torch.ones(), and others
like the following:
autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)
Tensor autograd functions¶
- 
class torch.Tensor¶
- 
backward(gradient=None, retain_graph=None, create_graph=False)¶
- Computes the gradient of current tensor w.r.t. graph leaves. - The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying - gradient. It should be a tensor of matching type and location, that contains the gradient of the differentiated function w.r.t.- self.- This function accumulates gradients in the leaves - you might need to zero them before calling it. - Parameters
- gradient (Tensor or None) – Gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a Tensor that does not require grad unless - create_graphis True. None values can be specified for scalar Tensors or ones that don’t require grad. If a None value would be acceptable then this argument is optional.
- retain_graph (bool, optional) – If - False, the graph used to compute the grads will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of- create_graph.
- create_graph (bool, optional) – If - True, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to- False.
 
 
 - 
detach()¶
- Returns a new Tensor, detached from the current graph. - The result will never require gradient. - Note - Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks. IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as resize_ / resize_as_ / set_ / transpose_) to the returned tensor also update the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as zero_ / copy_ / add_) to the returned tensor will not update the original tensor anymore, and will instead trigger an error. 
 - 
detach_()¶
- Detaches the Tensor from the graph that created it, making it a leaf. Views cannot be detached in-place. 
 - 
grad¶
- This attribute is - Noneby default and becomes a Tensor the first time a call to- backward()computes gradients for- self. The attribute will then contain the gradients computed and future calls to- backward()will accumulate (add) gradients into it.
 - 
is_leaf¶
- All Tensors that have - requires_grad which is - False will be leaf Tensors by convention. - For Tensors that have - requires_grad which is - True, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so - grad_fn is None. - Only leaf Tensors will have their - grad populated during a call to - backward(). To get - grad populated for non-leaf Tensors, you can use - retain_grad(). - Example: - >>> a = torch.rand(10, requires_grad=True) >>> a.is_leaf True >>> b = torch.rand(10, requires_grad=True).cuda() >>> b.is_leaf False # b was created by the operation that cast a cpu Tensor into a cuda Tensor >>> c = torch.rand(10, requires_grad=True) + 2 >>> c.is_leaf False # c was created by the addition operation >>> d = torch.rand(10).cuda() >>> d.is_leaf True # d does not require gradients and so has no operation creating it (that is tracked by the autograd engine) >>> e = torch.rand(10).cuda().requires_grad_() >>> e.is_leaf True # e requires gradients and has no operations creating it >>> f = torch.rand(10, requires_grad=True, device="cuda") >>> f.is_leaf True # f requires grad and has no operation creating it
 - 
register_hook(hook)¶
- Registers a backward hook. - The hook will be called every time a gradient with respect to the Tensor is computed. The hook should have the following signature: - hook(grad) -> Tensor or None - The hook should not modify its argument, but it can optionally return a new gradient which will be used in place of - grad. - This function returns a handle with a method - handle.remove() that removes the hook from the module. - Example: - >>> v = torch.tensor([0., 0., 0.], requires_grad=True) >>> h = v.register_hook(lambda grad: grad * 2) # double the gradient >>> v.backward(torch.tensor([1., 2., 3.])) >>> v.grad tensor([2., 4., 6.]) >>> h.remove() # removes the hook
 - 
requires_grad¶
- Is - Trueif gradients need to be computed for this Tensor,- Falseotherwise.
 - 
retain_grad()¶
- Enables .grad attribute for non-leaf Tensors. 
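 - Example (an illustrative sketch; y is a non-leaf Tensor whose grad would otherwise not be populated) - >>> x = torch.randn(2, requires_grad=True) >>> y = x * 2 >>> y.retain_grad() >>> y.sum().backward() >>> y.grad tensor([1., 1.])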
 
- 
Function¶
- 
class torch.autograd.Function¶
- Records operation history and defines formulas for differentiating ops. - Every operation performed on - Tensors creates a new function object, that performs the computation, and records that it happened. The history is retained in the form of a DAG of functions, with edges denoting data dependencies (- input <- output). Then, when backward is called, the graph is processed in the topological ordering, by calling- backward()methods of each- Functionobject, and passing returned gradients on to next- Functions.- Normally, the only way users interact with functions is by creating subclasses and defining new operations. This is a recommended way of extending torch.autograd. - Each function object is meant to be used only once (in the forward pass). - Examples: - >>> class Exp(Function): >>> >>> @staticmethod >>> def forward(ctx, i): >>> result = i.exp() >>> ctx.save_for_backward(result) >>> return result >>> >>> @staticmethod >>> def backward(ctx, grad_output): >>> result, = ctx.saved_tensors >>> return grad_output * result - 
static backward(ctx, *grad_outputs)¶
- Defines a formula for differentiating the operation. - This function is to be overridden by all subclasses. - It must accept a context - ctx as the first argument, followed by as many outputs as - forward() returned, and it should return as many tensors as there were inputs to - forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. - The context can be used to retrieve tensors saved during the forward pass. It also has an attribute - ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., - backward() will have - ctx.needs_input_grad[0] = True if the first input to - forward() needs gradient computed w.r.t. the output.
 - 
static forward(ctx, *args, **kwargs)¶
- Performs the operation. - This function is to be overridden by all subclasses. - It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types). - The context can be used to store tensors that can be then retrieved during the backward pass. 
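An illustrative sketch of invoking a custom Function such as the Exp example above, via its apply method: - >>> x = torch.randn(3, requires_grad=True) >>> y = Exp.apply(x) >>> y.sum().backward() >>> torch.allclose(x.grad, x.exp()) True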
 
Numerical gradient checking¶
- 
torch.autograd.gradcheck(func, inputs, eps=1e-06, atol=1e-05, rtol=0.001, raise_exception=True, check_sparse_nnz=False)¶
- Check gradients computed via small finite differences against analytical gradients w.r.t. tensors in - inputsthat are of floating point type and with- requires_grad=True.- The check between numerical and analytical gradients uses - allclose().- Note - The default values are designed for - inputof double precision. This check will likely fail if- inputis of less precision, e.g.,- FloatTensor.- Warning - If any checked tensor in - inputhas overlapping memory, i.e., different indices pointing to the same memory address (e.g., from- torch.expand()), this check will likely fail because the numerical gradients computed by point perturbation at such indices will change values at all other indices that share the same memory address.- Parameters
- func (function) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors 
- inputs (tuple of Tensor or Tensor) – inputs to the function 
- eps (float, optional) – perturbation for finite differences 
- atol (float, optional) – absolute tolerance 
- rtol (float, optional) – relative tolerance 
- raise_exception (bool, optional) – indicating whether to raise an exception if the check fails. The exception gives more information about the exact nature of the failure. This is helpful when debugging gradchecks. 
- check_sparse_nnz (bool, optional) – if True, gradcheck allows for SparseTensor input, and for any SparseTensor at input, gradcheck will perform check at nnz positions only. 
 
- Returns
- True if all differences satisfy allclose condition 
 
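A minimal usage sketch (assuming double-precision inputs, as the defaults expect):

>>> from torch.autograd import gradcheck
>>> inp = torch.randn(4, 4, dtype=torch.double, requires_grad=True)
>>> gradcheck(torch.sigmoid, (inp,))  # compares numerical and analytical gradients
True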
- 
torch.autograd.gradgradcheck(func, inputs, grad_outputs=None, eps=1e-06, atol=1e-05, rtol=0.001, gen_non_contig_grad_outputs=False, raise_exception=True)¶
- Check gradients of gradients computed via small finite differences against analytical gradients w.r.t. tensors in - inputsand- grad_outputsthat are of floating point type and with- requires_grad=True.- This function checks that backpropagating through the gradients computed to the given - grad_outputsare correct.- The check between numerical and analytical gradients uses - allclose().- Note - The default values are designed for - inputand- grad_outputsof double precision. This check will likely fail if they are of less precision, e.g.,- FloatTensor.- Warning - If any checked tensor in - inputand- grad_outputshas overlapping memory, i.e., different indices pointing to the same memory address (e.g., from- torch.expand()), this check will likely fail because the numerical gradients computed by point perturbation at such indices will change values at all other indices that share the same memory address.- Parameters
- func (function) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors 
- inputs (tuple of Tensor or Tensor) – inputs to the function 
- grad_outputs (tuple of Tensor or Tensor, optional) – The gradients with respect to the function’s outputs. 
- eps (float, optional) – perturbation for finite differences 
- atol (float, optional) – absolute tolerance 
- rtol (float, optional) – relative tolerance 
- gen_non_contig_grad_outputs (bool, optional) – if - grad_outputsis- Noneand- gen_non_contig_grad_outputsis- True, the randomly generated gradient outputs are made to be noncontiguous
- raise_exception (bool, optional) – indicating whether to raise an exception if the check fails. The exception gives more information about the exact nature of the failure. This is helpful when debugging gradchecks. 
 
- Returns
- True if all differences satisfy allclose condition 
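A minimal usage sketch, again assuming double-precision inputs:

>>> from torch.autograd import gradgradcheck
>>> inp = torch.randn(3, dtype=torch.double, requires_grad=True)
>>> gradgradcheck(torch.exp, (inp,))  # checks second-order gradients
True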
 
Profiler¶
Autograd includes a profiler that lets you inspect the cost of different
operators inside your model, both on the CPU and GPU. There are two modes
implemented at the moment: CPU-only, using profile,
and nvprof-based (registers both CPU and GPU activity), using
emit_nvtx.
- 
class torch.autograd.profiler.profile(enabled=True, use_cuda=False)¶
- Context manager that manages autograd profiler state and holds a summary of results. - Parameters
- enabled (bool, optional) – Setting this to False makes this context manager a no-op. Default: True.
- use_cuda (bool, optional) – Enables timing of CUDA events as well using the cudaEvent API. Adds approximately 4us of overhead to each tensor operation. Default: False.
 - Example - >>> x = torch.randn((1, 1), requires_grad=True) >>> with torch.autograd.profiler.profile() as prof: ... y = x ** 2 ... y.backward() >>> # NOTE: some columns were removed for brevity ... print(prof) ------------------------------------- --------------- --------------- Name CPU time CUDA time ------------------------------------- --------------- --------------- PowConstant 142.036us 0.000us N5torch8autograd9GraphRootE 63.524us 0.000us PowConstantBackward 184.228us 0.000us MulConstant 50.288us 0.000us PowConstant 28.439us 0.000us Mul 20.154us 0.000us N5torch8autograd14AccumulateGradE 13.790us 0.000us N5torch8autograd5CloneE 4.088us 0.000us - 
export_chrome_trace(path)¶
- Exports an EventList as a Chrome tracing tools file. - The checkpoint can be later loaded and inspected under - chrome://tracingURL.- Parameters
- path (str) – Path where the trace will be written. 
 
 - 
key_averages()¶
- Averages all function events over their keys. - Returns
- An EventList containing FunctionEventAvg objects. 
 
 - 
table(sort_by=None)¶
- Prints an EventList as a nicely formatted table. - Parameters
- sort_by (str, optional) – Attribute used to sort entries. By default they are printed in the same order as they were registered. Valid keys include: - cpu_time,- cuda_time,- cpu_time_total,- cuda_time_total,- count.
- Returns
- A string containing the table. 
 
 - 
total_average()¶
- Averages all events. - Returns
- A FunctionEventAvg object. 
 
 
- 
class torch.autograd.profiler.emit_nvtx(enabled=True)¶
- Context manager that makes every autograd operation emit an NVTX range. - It is useful when running the program under nvprof: - nvprof --profile-from-start off -o trace_name.prof -- <regular command here> - Unfortunately, there’s no way to force nvprof to flush the data it collected to disk, so for CUDA profiling one has to use this context manager to annotate nvprof traces and wait for the process to exit before inspecting them. Then, either NVIDIA Visual Profiler (nvvp) can be used to visualize the timeline, or - torch.autograd.profiler.load_nvprof()can load the results for inspection e.g. in Python REPL.- Parameters
- enabled (bool, optional) – Setting this to False makes this context manager a no-op. Default: - True.
 - Example - >>> with torch.cuda.profiler.profile(): ... model(x) # Warmup CUDA memory allocator and profiler ... with torch.autograd.profiler.emit_nvtx(): ... model(x) - Forward-backward correlation - When viewing a profile created using - emit_nvtxin the Nvidia Visual Profiler, correlating each backward-pass op with the corresponding forward-pass op can be difficult. To ease this task,- emit_nvtxappends sequence number information to the ranges it generates.- During the forward pass, each function range is decorated with - seq=<N>.- seqis a running counter, incremented each time a new backward Function object is created and stashed for backward. Thus, the seq=<N> annotation associated with each forward function range tells you that if a backward Function object is created by this forward function, the backward object will receive sequence number N. During the backward pass, the top-level range wrapping each C++ backward Function’s- apply()call is decorated with- stashed seq=<M>.- Mis the sequence number that the backward object was created with. By comparing- stashed seqnumbers in backward with- seqnumbers in forward, you can track down which forward op created each backward Function.- Any functions executed during the backward pass are also decorated with - seq=<N>. During default backward (with- create_graph=False) this information is irrelevant, and in fact,- Nmay simply be 0 for all such functions. Only the top-level ranges associated with backward Function objects’- apply()methods are useful, as a way to correlate these Function objects with the earlier forward pass.- Double-backward - If, on the other hand, a backward pass with - create_graph=Trueis underway (in other words, if you are setting up for a double-backward), each function’s execution during backward is given a nonzero, useful- seq=<N>. Those functions may themselves create Function objects to be executed later during double-backward, just as the original functions in the forward pass did. The relationship between backward and double-backward is conceptually the same as the relationship between forward and backward: The functions still emit current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and during the eventual double-backward, the Function objects’- apply()ranges are still tagged with- stashed seqnumbers, which can be compared to seq numbers from the backward pass.
Anomaly detection¶
- 
class torch.autograd.detect_anomaly¶
- Context-manager that enables anomaly detection for the autograd engine. - This does two things: - Running the forward pass with detection enabled will allow the backward pass to print the traceback of the forward operation that created the failing backward function. - Any backward computation that generates “nan” values will raise an error. - Example - >>> import torch >>> from torch import autograd >>> class MyFunc(autograd.Function): ... @staticmethod ... def forward(ctx, inp): ... return inp.clone() ... @staticmethod ... def backward(ctx, gO): ... # Error during the backward pass ... raise RuntimeError("Some error in backward") ... return gO.clone() >>> def run_fn(a): ... out = MyFunc.apply(a) ... return out.sum() >>> inp = torch.rand(10, 10, requires_grad=True) >>> out = run_fn(inp) >>> out.backward() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/your/pytorch/install/torch/tensor.py", line 93, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph) File "/your/pytorch/install/torch/autograd/__init__.py", line 90, in backward allow_unreachable=True) # allow_unreachable flag File "/your/pytorch/install/torch/autograd/function.py", line 76, in apply return self._forward_cls.backward(self, *args) File "<stdin>", line 8, in backward RuntimeError: Some error in backward >>> with autograd.detect_anomaly(): ... inp = torch.rand(10, 10, requires_grad=True) ... out = run_fn(inp) ... out.backward() Traceback of forward call that caused the error: File "tmp.py", line 53, in <module> out = run_fn(inp) File "tmp.py", line 44, in run_fn out = MyFunc.apply(a) Traceback (most recent call last): File "<stdin>", line 4, in <module> File "/your/pytorch/install/torch/tensor.py", line 93, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph) File "/your/pytorch/install/torch/autograd/__init__.py", line 90, in backward allow_unreachable=True) # allow_unreachable flag File "/your/pytorch/install/torch/autograd/function.py", line 76, in apply return self._forward_cls.backward(self, *args) File "<stdin>", line 8, in backward RuntimeError: Some error in backward
- 
class torch.autograd.set_detect_anomaly(mode)¶
- Context-manager that sets the anomaly detection for the autograd engine on or off. - set_detect_anomalywill enable or disable the autograd anomaly detection based on its argument- mode. It can be used as a context-manager or as a function.- See - detect_anomalyabove for details of the anomaly detection behaviour.- Parameters
- mode (bool) – Flag whether to enable anomaly detection ( - True), or disable (- False).
 
Distributed communication package - torch.distributed¶
Backends¶
torch.distributed supports three backends, each with
different capabilities. The table below shows which functions are available
for use with CPU / CUDA tensors.
MPI supports CUDA only if the implementation used to build PyTorch supports it.
| Backend | gloo | gloo | mpi | mpi | nccl | nccl |
|---|---|---|---|---|---|---|
| Device | CPU | GPU | CPU | GPU | CPU | GPU |
| send | ✓ | ✘ | ✓ | ? | ✘ | ✘ | 
| recv | ✓ | ✘ | ✓ | ? | ✘ | ✘ | 
| broadcast | ✓ | ✓ | ✓ | ? | ✘ | ✓ | 
| all_reduce | ✓ | ✓ | ✓ | ? | ✘ | ✓ | 
| reduce | ✓ | ✘ | ✓ | ? | ✘ | ✓ | 
| all_gather | ✓ | ✘ | ✓ | ? | ✘ | ✓ | 
| gather | ✓ | ✘ | ✓ | ? | ✘ | ✘ | 
| scatter | ✓ | ✘ | ✓ | ? | ✘ | ✘ | 
| barrier | ✓ | ✘ | ✓ | ? | ✘ | ✓ | 
Backends that come with PyTorch¶
PyTorch distributed currently only supports Linux. By default, the Gloo and NCCL backends are built and included in PyTorch distributed (NCCL only when building with CUDA). MPI is an optional backend that can only be included if you build PyTorch from source. (e.g. building PyTorch on a host that has MPI installed.)
Which backend to use?¶
In the past, we were often asked: “which backend should I use?”.
- Rule of thumb - Use the NCCL backend for distributed GPU training 
- Use the Gloo backend for distributed CPU training. 
 
- GPU hosts with InfiniBand interconnect - Use NCCL, since it’s the only backend that currently supports InfiniBand and GPUDirect. 
 
- GPU hosts with Ethernet interconnect - Use NCCL, since it currently provides the best distributed GPU training performance, especially for multiprocess single-node or multi-node distributed training. If you encounter any problem with NCCL, use Gloo as the fallback option. (Note that Gloo currently runs slower than NCCL for GPUs.) 
 
- CPU hosts with InfiniBand interconnect - If your InfiniBand has enabled IP over IB, use Gloo, otherwise, use MPI instead. We are planning on adding InfiniBand support for Gloo in the upcoming releases. 
 
- CPU hosts with Ethernet interconnect - Use Gloo, unless you have specific reasons to use MPI. 
 
Common environment variables¶
Choosing the network interface to use¶
By default, both the NCCL and Gloo backends will try to find the right network interface to use for communication. However, in our experience this is not always guaranteed to succeed. If either backend fails to find the correct network interface, you can try setting the following environment variables (each one applicable to its respective backend):
- NCCL_SOCKET_IFNAME, for example - export NCCL_SOCKET_IFNAME=eth0
- GLOO_SOCKET_IFNAME, for example - export GLOO_SOCKET_IFNAME=eth0
Other NCCL environment variables¶
NCCL also provides a number of environment variables for fine-tuning.
Commonly used ones include the following, for debugging purposes:
- export NCCL_DEBUG=INFO
- export NCCL_DEBUG_SUBSYS=ALL
For the full list of NCCL environment variables, please refer to NVIDIA NCCL’s official documentation.
Basics¶
The torch.distributed package provides PyTorch support and communication primitives
for multiprocess parallelism across several computation nodes running on one or more
machines. The class torch.nn.parallel.DistributedDataParallel() builds on this
functionality to provide synchronous distributed training as a wrapper around any
PyTorch model. This differs from the kinds of parallelism provided by
Multiprocessing package - torch.multiprocessing and torch.nn.DataParallel() in that it supports
multiple network-connected machines and in that the user must explicitly launch a separate
copy of the main training script for each process.
In the single-machine synchronous case, torch.distributed or the
torch.nn.parallel.DistributedDataParallel() wrapper may still have advantages over other
approaches to data-parallelism, including torch.nn.DataParallel():
- Each process maintains its own optimizer and performs a complete optimization step with each iteration. While this may appear redundant, since the gradients have already been gathered together and averaged across processes and are thus the same for every process, this means that no parameter broadcast step is needed, reducing time spent transferring tensors between nodes. 
- Each process contains an independent Python interpreter, eliminating the extra interpreter overhead and “GIL-thrashing” that comes from driving several execution threads, model replicas, or GPUs from a single Python process. This is especially important for models that make heavy use of the Python runtime, including models with recurrent layers or many small components. 
Initialization¶
The package needs to be initialized using the torch.distributed.init_process_group()
function before calling any other methods. This blocks until all processes have
joined.
Currently three initialization methods are supported:
TCP initialization¶
Initializing using TCP requires a network address reachable from all
processes and a desired world_size. The address must belong to the rank 0
process, and all processes must have manually specified ranks.
Note that multicast addresses are no longer supported in the latest distributed
package; group_name is deprecated as well.
import torch.distributed as dist
# Use address of one of the machines
dist.init_process_group(backend, init_method='tcp://10.1.1.20:23456',
                        rank=args.rank, world_size=4)
Environment variable initialization¶
This method will read the configuration from environment variables, allowing one to fully customize how the information is obtained. The variables to be set are:
- MASTER_PORT- required; has to be a free port on machine with rank 0
- MASTER_ADDR- required (except for rank 0); address of rank 0 node
- WORLD_SIZE- required; can be set either here, or in a call to init function
- RANK- required; can be set either here, or in a call to init function
The machine with rank 0 will be used to set up all connections.
This is the default method, meaning that init_method does not have to be specified (or
can be env://).
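For illustration, a minimal sketch of environment-variable initialization; the address, port, rank, and world size below are placeholder values:

import os
import torch.distributed as dist

# Placeholder values; in practice these are exported by your launcher
os.environ['MASTER_ADDR'] = '10.1.1.20'
os.environ['MASTER_PORT'] = '23456'
os.environ['RANK'] = '0'
os.environ['WORLD_SIZE'] = '4'

# RANK and WORLD_SIZE are read from the environment
dist.init_process_group(backend='gloo', init_method='env://')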
Groups¶
By default collectives operate on the default group (also called the world) and
require all processes to enter the distributed function call. However, some workloads can benefit
from more fine-grained communication. This is where distributed groups come
into play. The new_group() function can be
used to create new groups, with arbitrary subsets of all processes. It returns
an opaque group handle that can be given as a group argument to all collectives
(collectives are distributed functions to exchange information in certain well-known programming patterns).
Currently torch.distributed does not support creating groups with different backends.
In other words, each group being created will use the same backend as you specified in
init_process_group().
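A short sketch of creating and using a subgroup; it assumes a default group of four processes has already been initialized:

import torch
import torch.distributed as dist

# Every process in the default group must call new_group, even non-members
group = dist.new_group(ranks=[0, 1])

tensor = torch.ones(1)
if dist.get_rank() in (0, 1):
    # Only group members participate in collectives issued on the group
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM, group=group)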
Point-to-point communication¶
isend() and irecv()
return distributed request objects when used. In general, the type of these objects is
unspecified, as they should never be created manually, but they are guaranteed to support two methods:
- is_completed()- returns True if the operation has finished
- wait()- will block the process until the operation is finished.- is_completed()is guaranteed to return True once it returns.
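A short sketch of non-blocking point-to-point communication between two already-initialized processes:

import torch
import torch.distributed as dist

tensor = torch.zeros(1)
if dist.get_rank() == 0:
    tensor += 1
    req = dist.isend(tensor=tensor, dst=1)  # returns a request object immediately
else:
    req = dist.irecv(tensor=tensor, src=0)
req.wait()  # blocks until the operation has finished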
Synchronous and asynchronous collective operations¶
Every collective operation function supports the following two kinds of operations:
synchronous operation - the default mode, when async_op is set to False.
When the function returns, it is guaranteed that
the collective operation is performed (though not necessarily completed if it’s a CUDA op, since all
CUDA ops are asynchronous), and any further function calls depending on the data of the
collective operation can be made. In the synchronous mode, the collective function does not
return anything.
asynchronous operation - when async_op is set to True. The collective operation function
returns a distributed request object. In general, you don’t need to create it manually and it
is guaranteed to support two methods:
- is_completed()- returns True if the operation has finished
- wait()- will block the process until the operation is finished.
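As a sketch, an asynchronous all_reduce that overlaps the collective with other work:

import torch
import torch.distributed as dist

tensor = torch.ones(1)
work = dist.all_reduce(tensor, op=dist.ReduceOp.SUM, async_op=True)
# ... independent computation can overlap with the collective here ...
work.wait()  # after this returns, tensor holds the sum across all processes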
Collective functions¶
- 
class torch.distributed.reduce_op¶
- Deprecated enum-like class for reduction operations: SUM, PRODUCT, MIN, and MAX. Use ReduceOp instead.
Multi-GPU collective functions¶
If you have more than one GPU on each node, when using the NCCL and Gloo backend,
broadcast_multigpu()
all_reduce_multigpu()
reduce_multigpu() and
all_gather_multigpu() support distributed collective
operations among multiple GPUs within each node. These functions can potentially
improve the overall distributed training performance and be easily used by
passing a list of tensors. Each Tensor in the passed tensor list needs
to be on a separate GPU device of the host where the function is called. Note
that the length of the tensor list needs to be identical among all the
distributed processes. Also note that currently the multi-GPU collective
functions are only supported by the NCCL backend.
For example, suppose the system we use for distributed training has 2 nodes, each of which has 8 GPUs. On each of the 16 GPUs, there is a tensor that we would like to all-reduce. The following code can serve as a reference:
Code running on Node 0
import torch
import torch.distributed as dist
dist.init_process_group(backend="nccl",
                        init_method="file:///distributed_test",
                        world_size=2,
                        rank=0)
tensor_list = []
for dev_idx in range(torch.cuda.device_count()):
    tensor_list.append(torch.FloatTensor([1]).cuda(dev_idx))
dist.all_reduce_multigpu(tensor_list)
Code running on Node 1
import torch
import torch.distributed as dist
dist.init_process_group(backend="nccl",
                        init_method="file:///distributed_test",
                        world_size=2,
                        rank=1)
tensor_list = []
for dev_idx in range(torch.cuda.device_count()):
    tensor_list.append(torch.FloatTensor([1]).cuda(dev_idx))
dist.all_reduce_multigpu(tensor_list)
After the call, all 16 tensors on the two nodes will have the all-reduced value of 16
Launch utility¶
The torch.distributed package also provides a launch utility in torch.distributed.launch. This helper utility can be used to launch multiple processes per node for distributed training. This utility also supports both python2 and python3.
torch.distributed.launch is a module that spawns multiple distributed training processes on each of the training nodes.
The utility can be used for single-node distributed training, in which one or more processes per node will be spawned. The utility can be used for either CPU training or GPU training. If the utility is used for GPU training, each distributed process will be operating on a single GPU. This can achieve well-improved single-node training performance. It can also be used in multi-node distributed training, by spawning multiple processes on each node, for well-improved multi-node distributed training performance as well. This will be especially beneficial for systems with multiple InfiniBand interfaces that have direct-GPU support, since all of them can be utilized for aggregated communication bandwidth.
In both cases of single-node distributed training or multi-node distributed
training, this utility will launch the given number of processes per node
(--nproc_per_node). If used for GPU training, this number needs to be less
than or equal to the number of GPUs on the current system (nproc_per_node),
and each process will be operating on a single GPU from GPU 0 to
GPU (nproc_per_node - 1).
How to use this module:
- Single-Node multi-process distributed training 
>>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE
           YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other
           arguments of your training script)
- Multi-Node multi-process distributed training: (e.g. two nodes) 
Node 1: (IP: 192.168.1.1, and has a free port: 1234)
>>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE
           --nnodes=2 --node_rank=0 --master_addr="192.168.1.1"
           --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
           and all other arguments of your training script)
Node 2:
>>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE
           --nnodes=2 --node_rank=1 --master_addr="192.168.1.1"
           --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
           and all other arguments of your training script)
- To look up what optional arguments this module offers: 
>>> python -m torch.distributed.launch --help
Important Notices:
1. This utility and multi-process distributed (single-node or multi-node) GPU training currently only achieves the best performance using the NCCL distributed backend. Thus the NCCL backend is the recommended backend to use for GPU training.
2. In your training program, you must parse the command-line argument:
--local_rank=LOCAL_PROCESS_RANK, which will be provided by this module.
If your training program uses GPUs, you should ensure that your code only
runs on the GPU device of LOCAL_PROCESS_RANK. This can be done by:
Parsing the local_rank argument
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument("--local_rank", type=int)
>>> args = parser.parse_args()
Set your device to local rank using either
>>> torch.cuda.set_device(args.local_rank)  # before your code runs
or
>>> with torch.cuda.device(args.local_rank):
>>>    # your code to run
3. In your training program, you are supposed to call the following function
at the beginning to start the distributed backend. You need to make sure that
the init_method uses env://, which is the only init_method supported
by this module.
torch.distributed.init_process_group(backend='YOUR BACKEND',
                                     init_method='env://')
4. In your training program, you can either use regular distributed functions
or use torch.nn.parallel.DistributedDataParallel() module. If your
training program uses GPUs for training and you would like to use
torch.nn.parallel.DistributedDataParallel() module,
here is how to configure it.
model = torch.nn.parallel.DistributedDataParallel(model,
                                                  device_ids=[args.local_rank],
                                                  output_device=args.local_rank)
Please ensure that the device_ids argument is set to the only GPU device id
that your code will be operating on. This is generally the local rank of the
process. In other words, device_ids needs to be [args.local_rank],
and output_device needs to be args.local_rank in order to use this
utility.
5. Another way to pass local_rank to the subprocesses is via the environment variable
LOCAL_RANK. This behavior is enabled when you launch the script with
--use_env=True. You must adjust the subprocess example above to replace
args.local_rank with os.environ['LOCAL_RANK']; the launcher
will not pass --local_rank when you specify this flag.
Warning
local_rank is NOT globally unique: it is only unique per process
on a machine.  Thus, don’t use it to decide if you should, e.g.,
write to a networked filesystem.  See
https://github.com/pytorch/pytorch/issues/12042 for an example of
how things can go wrong if you don’t do this correctly.
Spawn utility¶
The torch.multiprocessing package also provides a spawn
function in torch.multiprocessing.spawn(). This helper function
can be used to spawn multiple processes. It works by passing in the
function that you want to run and spawns N processes to run it. This
can be used for multiprocess distributed training as well.
For references on how to use it, please refer to the PyTorch example - ImageNet implementation.
Note that this function requires Python 3.4 or higher.
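A minimal sketch of driving distributed training with torch.multiprocessing.spawn(); the backend, address, and world size below are placeholder choices:

import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    # spawn passes the process index as the first argument
    dist.init_process_group('gloo', init_method='tcp://127.0.0.1:23456',
                            rank=rank, world_size=world_size)
    # ... training code for this process ...

if __name__ == '__main__':
    world_size = 4
    mp.spawn(run, args=(world_size,), nprocs=world_size)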
Probability distributions - torch.distributions¶
The distributions package contains parameterizable probability distributions
and sampling functions. This allows the construction of stochastic computation
graphs and stochastic gradient estimators for optimization. This package
generally follows the design of the TensorFlow Distributions package.
It is not possible to directly backpropagate through random samples. However, there are two main methods for creating surrogate functions that can be backpropagated through. These are the score function estimator/likelihood ratio estimator/REINFORCE and the pathwise derivative estimator. REINFORCE is commonly seen as the basis for policy gradient methods in reinforcement learning, and the pathwise derivative estimator is commonly seen in the reparameterization trick in variational autoencoders. Whilst the score function only requires the value of samples \(f(x)\), the pathwise derivative requires the derivative \(f'(x)\). The next sections discuss these two in a reinforcement learning example. For more details see Gradient Estimation Using Stochastic Computation Graphs .
Score function¶
When the probability density function is differentiable with respect to its
parameters, we only need sample() and
log_prob() to implement REINFORCE:
\[\Delta\theta = \alpha r \frac{\partial\log p(a|\pi^\theta(s))}{\partial\theta}\]
where \(\theta\) are the parameters, \(\alpha\) is the learning rate, \(r\) is the reward and \(p(a|\pi^\theta(s))\) is the probability of taking action \(a\) in state \(s\) given policy \(\pi^\theta\).
In practice we would sample an action from the output of a network, apply this
action in an environment, and then use log_prob to construct an equivalent
loss function. Note that we use a negative because optimizers use gradient
descent, whilst the rule above assumes gradient ascent. With a categorical
policy, the code for implementing REINFORCE would be as follows:
from torch.distributions import Categorical

probs = policy_network(state)
# Note that this is equivalent to what used to be called multinomial
m = Categorical(probs)
action = m.sample()
next_state, reward = env.step(action)
loss = -m.log_prob(action) * reward
loss.backward()
Pathwise derivative¶
The other way to implement these stochastic/policy gradients would be to use the
reparameterization trick from the
rsample() method, where the
parameterized random variable can be constructed via a parameterized
deterministic function of a parameter-free random variable. The reparameterized
sample therefore becomes differentiable. The code for implementing the pathwise
derivative would be as follows:
from torch.distributions import Normal

params = policy_network(state)
m = Normal(*params)
# Any distribution with .has_rsample == True could work based on the application
action = m.rsample()
next_state, reward = env.step(action)  # Assuming that reward is differentiable
loss = -reward
loss.backward()
Distribution¶
- 
class torch.distributions.distribution.Distribution(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)¶
- Bases: - object- Distribution is the abstract base class for probability distributions. - 
arg_constraints¶
- Returns a dictionary from argument names to - Constraintobjects that should be satisfied by each argument of this distribution. Args that are not tensors need not appear in this dict.
 - 
batch_shape¶
- Returns the shape over which parameters are batched. 
 - 
cdf(value)¶
- Returns the cumulative density/mass function evaluated at value. - Parameters
- value (Tensor) – 
 
 - 
entropy()¶
- Returns entropy of distribution, batched over batch_shape. - Returns
- Tensor of shape batch_shape. 
 
 - 
enumerate_support(expand=True)¶
- Returns tensor containing all values supported by a discrete distribution. The result will enumerate over dimension 0, so the shape of the result will be (cardinality,) + batch_shape + event_shape (where event_shape = () for univariate distributions). - Note that this enumerates over all batched tensors in lock-step [[0, 0], [1, 1], …]. With expand=False, enumeration happens along dim 0, but with the remaining batch dimensions being singleton dimensions, [[0], [1], ... - To iterate over the full Cartesian product use itertools.product(m.enumerate_support()). - Parameters
- expand (bool) – whether to expand the support over the batch dims to match the distribution’s batch_shape. 
- Returns
- Tensor iterating over dimension 0. 
 
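For instance, using the Bernoulli subclass (a sketch matching the lock-step behaviour described above):

>>> m = Bernoulli(torch.tensor([0.1, 0.9]))
>>> m.enumerate_support()
tensor([[ 0.,  0.],
        [ 1.,  1.]])
>>> m.enumerate_support(expand=False)
tensor([[ 0.],
        [ 1.]])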
 - 
event_shape¶
- Returns the shape of a single sample (without batching). 
 - 
expand(batch_shape, _instance=None)¶
- Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to batch_shape. This method calls - expandon the distribution’s parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in __init__.py, when an instance is first created.- Parameters
- batch_shape (torch.Size) – the desired expanded size. 
- _instance – new instance provided by subclasses that need to override .expand. 
 
- Returns
- New distribution instance with batch dimensions expanded to batch_shape.
 
 - 
icdf(value)¶
- Returns the inverse cumulative density/mass function evaluated at value. - Parameters
- value (Tensor) – 
 
 - 
log_prob(value)¶
- Returns the log of the probability density/mass function evaluated at value. - Parameters
- value (Tensor) – 
 
 - 
mean¶
- Returns the mean of the distribution. 
 - 
perplexity()¶
- Returns perplexity of distribution, batched over batch_shape. - Returns
- Tensor of shape batch_shape. 
 
 - 
rsample(sample_shape=torch.Size([]))¶
- Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched. 
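Because reparameterized samples are differentiable, gradients can flow back to the distribution parameters; a minimal sketch using the Normal subclass:

>>> loc = torch.zeros(3, requires_grad=True)
>>> x = Normal(loc, torch.ones(3)).rsample()
>>> x.sum().backward()  # gradient flows through the sample back to loc
>>> loc.grad
tensor([ 1.,  1.,  1.])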
 - 
sample(sample_shape=torch.Size([]))¶
- Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched. 
 - 
sample_n(n)¶
- Generates n samples or n batches of samples if the distribution parameters are batched. 
 - 
stddev¶
- Returns the standard deviation of the distribution. 
 - 
support¶
- Returns a - Constraintobject representing this distribution’s support.
 - 
variance¶
- Returns the variance of the distribution. 
 
- 
ExponentialFamily¶
- 
class torch.distributions.exp_family.ExponentialFamily(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)¶
- Bases: torch.distributions.distribution.Distribution - ExponentialFamily is the abstract base class for probability distributions belonging to an exponential family, whose probability mass/density function has the form defined below \[p_{F}(x; \theta) = \exp(\langle t(x), \theta\rangle - F(\theta) + k(x))\] - where \(\theta\) denotes the natural parameters, \(t(x)\) denotes the sufficient statistic, \(F(\theta)\) is the log normalizer function for a given family and \(k(x)\) is the carrier measure. - Note - This class is an intermediary between the Distribution class and distributions which belong to an exponential family mainly to check the correctness of the .entropy() and analytic KL divergence methods. We use this class to compute the entropy and KL divergence using the AD framework and Bregman divergences (courtesy of: Frank Nielsen and Richard Nock, Entropies and Cross-entropies of Exponential Families). -
entropy()¶
- Method to compute the entropy using Bregman divergence of the log normalizer. 
 
- 
Bernoulli¶
- 
class torch.distributions.bernoulli.Bernoulli(probs=None, logits=None, validate_args=None)¶
- Bases: - torch.distributions.exp_family.ExponentialFamily- Creates a Bernoulli distribution parameterized by - probsor- logits(but not both).- Samples are binary (0 or 1). They take the value 1 with probability p and 0 with probability 1 - p. - Example: - >>> m = Bernoulli(torch.tensor([0.3])) >>> m.sample() # 30% chance 1; 70% chance 0 tensor([ 0.]) - Parameters
 - 
arg_constraints= {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}¶
 - 
entropy()¶
 - 
enumerate_support(expand=True)¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_enumerate_support= True¶
 - 
log_prob(value)¶
 - 
logits¶
 - 
mean¶
 - 
param_shape¶
 - 
probs¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
support= Boolean()¶
 - 
variance¶
 
Beta¶
- 
class torch.distributions.beta.Beta(concentration1, concentration0, validate_args=None)¶
- Bases: - torch.distributions.exp_family.ExponentialFamily- Beta distribution parameterized by - concentration1and- concentration0.- Example: - >>> m = Beta(torch.tensor([0.5]), torch.tensor([0.5])) >>> m.sample() # Beta distributed with concentration concentration1 and concentration0 tensor([ 0.1046]) - Parameters
 - 
arg_constraints= {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0)}¶
 - 
concentration0¶
 - 
concentration1¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=())¶
 - 
support= Interval(lower_bound=0.0, upper_bound=1.0)¶
 - 
variance¶
 
Binomial¶
- 
class torch.distributions.binomial.Binomial(total_count=1, probs=None, logits=None, validate_args=None)¶
- Bases: - torch.distributions.distribution.Distribution- Creates a Binomial distribution parameterized by - total_countand either- probsor- logits(but not both).- total_countmust be broadcastable with- probs/- logits.- Example: - >>> m = Binomial(100, torch.tensor([0 , .2, .8, 1])) >>> x = m.sample() tensor([ 0., 22., 71., 100.]) >>> m = Binomial(torch.tensor([[5.], [10.]]), torch.tensor([0.5, 0.8])) >>> x = m.sample() tensor([[ 4., 5.], [ 7., 6.]]) - Parameters
 - 
arg_constraints= {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0), 'total_count': IntegerGreaterThan(lower_bound=0)}¶
 - 
enumerate_support(expand=True)¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_enumerate_support= True¶
 - 
log_prob(value)¶
 - 
logits¶
 - 
mean¶
 - 
param_shape¶
 - 
probs¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
support¶
 - 
variance¶
 
Categorical¶
- 
class torch.distributions.categorical.Categorical(probs=None, logits=None, validate_args=None)¶
- Bases: - torch.distributions.distribution.Distribution- Creates a categorical distribution parameterized by either - probsor- logits(but not both).- Note - It is equivalent to the distribution that - torch.multinomial()samples from.- Samples are integers from \(\{0, \ldots, K-1\}\) where K is - probs.size(-1).- If - probsis 1D with length-K, each element is the relative probability of sampling the class at that index.- If - probsis 2D, it is treated as a batch of relative probability vectors.- Note - probsmust be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1.- See also: - torch.multinomial()- Example: - >>> m = Categorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ])) >>> m.sample() # equal probability of 0, 1, 2, 3 tensor(3) - 
arg_constraints= {'logits': Real(), 'probs': Simplex()}¶
 - 
entropy()¶
 - 
enumerate_support(expand=True)¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_enumerate_support= True¶
 - 
log_prob(value)¶
 - 
logits¶
 - 
mean¶
 - 
param_shape¶
 - 
probs¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
support¶
 - 
variance¶
 
- 
Cauchy¶
- 
class torch.distributions.cauchy.Cauchy(loc, scale, validate_args=None)¶
- Bases: - torch.distributions.distribution.Distribution- Samples from a Cauchy (Lorentz) distribution. The distribution of the ratio of independent normally distributed random variables with means 0 follows a Cauchy distribution. - Example: - >>> m = Cauchy(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # sample from a Cauchy distribution with loc=0 and scale=1 tensor([ 2.3214]) - Parameters
 - 
arg_constraints= {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶
 - 
cdf(value)¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
icdf(value)¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
support= Real()¶
 - 
variance¶
 
Chi2¶
- 
class torch.distributions.chi2.Chi2(df, validate_args=None)¶
- Bases: - torch.distributions.gamma.Gamma- Creates a Chi2 distribution parameterized by shape parameter - df. This is exactly equivalent to- Gamma(alpha=0.5*df, beta=0.5)- Example: - >>> m = Chi2(torch.tensor([1.0])) >>> m.sample() # Chi2 distributed with shape df=1 tensor([ 0.1046]) - 
arg_constraints= {'df': GreaterThan(lower_bound=0.0)}¶
 - 
df¶
 - 
expand(batch_shape, _instance=None)¶
 
- 
Dirichlet¶
- 
class torch.distributions.dirichlet.Dirichlet(concentration, validate_args=None)¶
- Bases: torch.distributions.exp_family.ExponentialFamily - Creates a Dirichlet distribution parameterized by concentration concentration. - Example: - >>> m = Dirichlet(torch.tensor([0.5, 0.5])) >>> m.sample() # Dirichlet distributed with concentration concentration tensor([ 0.1046, 0.8954]) - Parameters
- concentration (Tensor) – concentration parameter of the distribution (often referred to as alpha) 
 - 
arg_constraints= {'concentration': GreaterThan(lower_bound=0.0)}¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=())¶
 - 
support= Simplex()¶
 - 
variance¶
 
Exponential¶
- 
class torch.distributions.exponential.Exponential(rate, validate_args=None)¶
- Bases: torch.distributions.exp_family.ExponentialFamily - Creates an Exponential distribution parameterized by rate. - Example: - >>> m = Exponential(torch.tensor([1.0])) >>> m.sample() # Exponential distributed with rate=1 tensor([ 0.1046]) -
arg_constraints= {'rate': GreaterThan(lower_bound=0.0)}¶
 - 
cdf(value)¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
icdf(value)¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
stddev¶
 - 
support= GreaterThan(lower_bound=0.0)¶
 - 
variance¶
 
- 
FisherSnedecor¶
- 
class torch.distributions.fishersnedecor.FisherSnedecor(df1, df2, validate_args=None)¶
- Bases: - torch.distributions.distribution.Distribution- Creates a Fisher-Snedecor distribution parameterized by - df1and- df2.- Example: - >>> m = FisherSnedecor(torch.tensor([1.0]), torch.tensor([2.0])) >>> m.sample() # Fisher-Snedecor-distributed with df1=1 and df2=2 tensor([ 0.2453]) - Parameters
 - 
arg_constraints= {'df1': GreaterThan(lower_bound=0.0), 'df2': GreaterThan(lower_bound=0.0)}¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
support= GreaterThan(lower_bound=0.0)¶
 - 
variance¶
 
Gamma¶
- 
class torch.distributions.gamma.Gamma(concentration, rate, validate_args=None)¶
- Bases: - torch.distributions.exp_family.ExponentialFamily- Creates a Gamma distribution parameterized by shape - concentrationand- rate.- Example: - >>> m = Gamma(torch.tensor([1.0]), torch.tensor([1.0])) >>> m.sample() # Gamma distributed with concentration=1 and rate=1 tensor([ 0.1046]) - Parameters
 - 
arg_constraints= {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
support= GreaterThan(lower_bound=0.0)¶
 - 
variance¶
 
Geometric¶
- 
class torch.distributions.geometric.Geometric(probs=None, logits=None, validate_args=None)¶
- Bases: torch.distributions.distribution.Distribution - Creates a Geometric distribution parameterized by probs, where probs is the probability of success of Bernoulli trials. It represents the probability that in \(k + 1\) Bernoulli trials, the first \(k\) trials failed, before seeing a success. - Samples are non-negative integers [0, \(\infty\)). - Example: - >>> m = Geometric(torch.tensor([0.3])) >>> m.sample() # underlying Bernoulli has 30% chance 1; 70% chance 0 tensor([ 2.]) - Parameters
 - 
arg_constraints= {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
log_prob(value)¶
 - 
logits¶
 - 
mean¶
 - 
probs¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
support= IntegerGreaterThan(lower_bound=0)¶
 - 
variance¶
 
Gumbel¶
- 
class torch.distributions.gumbel.Gumbel(loc, scale, validate_args=None)¶
- Bases: - torch.distributions.transformed_distribution.TransformedDistribution- Samples from a Gumbel Distribution. - Examples: - >>> m = Gumbel(torch.tensor([1.0]), torch.tensor([2.0])) >>> m.sample() # sample from Gumbel distribution with loc=1, scale=2 tensor([ 1.0124]) - Parameters
 - 
arg_constraints= {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
stddev¶
 - 
support= Real()¶
 - 
variance¶
 
HalfCauchy¶
- 
class torch.distributions.half_cauchy.HalfCauchy(scale, validate_args=None)¶
- Bases: torch.distributions.transformed_distribution.TransformedDistribution - Creates a half-Cauchy distribution parameterized by scale where: - X ~ Cauchy(0, scale) Y = |X| ~ HalfCauchy(scale) - Example: - >>> m = HalfCauchy(torch.tensor([1.0])) >>> m.sample() # half-cauchy distributed with scale=1 tensor([ 2.3214]) -
arg_constraints= {'scale': GreaterThan(lower_bound=0.0)}¶
 - 
cdf(value)¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
icdf(prob)¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
scale¶
 - 
support= GreaterThan(lower_bound=0.0)¶
 - 
variance¶
 
- 
HalfNormal¶
- 
class torch.distributions.half_normal.HalfNormal(scale, validate_args=None)¶
- Bases: - torch.distributions.transformed_distribution.TransformedDistribution- Creates a half-normal distribution parameterized by scale where: - X ~ Normal(0, scale) Y = |X| ~ HalfNormal(scale) - Example: - >>> m = HalfNormal(torch.tensor([1.0])) >>> m.sample() # half-normal distributed with scale=1 tensor([ 0.1046]) - 
arg_constraints= {'scale': GreaterThan(lower_bound=0.0)}¶
 - 
cdf(value)¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
icdf(prob)¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
scale¶
 - 
support= GreaterThan(lower_bound=0.0)¶
 - 
variance¶
 
- 
Independent¶
- 
class torch.distributions.independent.Independent(base_distribution, reinterpreted_batch_ndims, validate_args=None)¶
- Bases: - torch.distributions.distribution.Distribution- Reinterprets some of the batch dims of a distribution as event dims. - This is mainly useful for changing the shape of the result of - log_prob(). For example to create a diagonal Normal distribution with the same shape as a Multivariate Normal distribution (so they are interchangeable), you can:- >>> loc = torch.zeros(3) >>> scale = torch.ones(3) >>> mvn = MultivariateNormal(loc, scale_tril=torch.diag(scale)) >>> [mvn.batch_shape, mvn.event_shape] [torch.Size(()), torch.Size((3,))] >>> normal = Normal(loc, scale) >>> [normal.batch_shape, normal.event_shape] [torch.Size((3,)), torch.Size(())] >>> diagn = Independent(normal, 1) >>> [diagn.batch_shape, diagn.event_shape] [torch.Size(()), torch.Size((3,))] - Parameters
- base_distribution (torch.distributions.distribution.Distribution) – a base distribution 
- reinterpreted_batch_ndims (int) – the number of batch dims to reinterpret as event dims 
 
 - 
arg_constraints= {}¶
 - 
entropy()¶
 - 
enumerate_support(expand=True)¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_enumerate_support¶
 - 
has_rsample¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
support¶
 - 
variance¶
 
Laplace¶
- 
class torch.distributions.laplace.Laplace(loc, scale, validate_args=None)¶
- Bases: torch.distributions.distribution.Distribution - Creates a Laplace distribution parameterized by loc and scale. - Example: - >>> m = Laplace(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # Laplace distributed with loc=0, scale=1 tensor([ 0.1046]) - Parameters
 - 
arg_constraints= {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶
 - 
cdf(value)¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
icdf(value)¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
stddev¶
 - 
support= Real()¶
 - 
variance¶
 
LogNormal¶
- 
class torch.distributions.log_normal.LogNormal(loc, scale, validate_args=None)¶
- Bases: - torch.distributions.transformed_distribution.TransformedDistribution- Creates a log-normal distribution parameterized by - locand- scalewhere:- X ~ Normal(loc, scale) Y = exp(X) ~ LogNormal(loc, scale) - Example: - >>> m = LogNormal(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # log-normal distributed with mean=0 and stddev=1 tensor([ 0.1046]) - Parameters
 - 
arg_constraints= {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
loc¶
 - 
mean¶
 - 
scale¶
 - 
support= GreaterThan(lower_bound=0.0)¶
 - 
variance¶
 
LowRankMultivariateNormal¶
- 
class torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal(loc, cov_factor, cov_diag, validate_args=None)¶
- Bases: torch.distributions.distribution.Distribution - Creates a multivariate normal distribution with covariance matrix having a low-rank form parameterized by cov_factor and cov_diag: - covariance_matrix = cov_factor @ cov_factor.T + cov_diag - Example - >>> m = LowRankMultivariateNormal(torch.zeros(2), torch.tensor([[1.], [0.]]), torch.tensor([1., 1.])) >>> m.sample() # normally distributed with mean=`[0,0]`, cov_factor=`[[1],[0]]`, cov_diag=`[1,1]` tensor([-0.2102, -0.5429]) - Parameters
- loc (Tensor) – mean of the distribution with shape batch_shape + event_shape 
- cov_factor (Tensor) – factor part of low-rank form of covariance matrix with shape batch_shape + event_shape + (rank,) 
- cov_diag (Tensor) – diagonal part of low-rank form of covariance matrix with shape batch_shape + event_shape 
 
 - Note - The computation for determinant and inverse of covariance matrix is avoided when cov_factor.shape[1] << cov_factor.shape[0] thanks to Woodbury matrix identity and matrix determinant lemma. Thanks to these formulas, we just need to compute the determinant and inverse of the small size “capacitance” matrix: - capacitance = I + cov_factor.T @ inv(cov_diag) @ cov_factor - 
arg_constraints= {'cov_diag': GreaterThan(lower_bound=0.0), 'cov_factor': Real(), 'loc': Real()}¶
 - 
covariance_matrix¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
precision_matrix¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
scale_tril¶
 - 
support= Real()¶
 - 
variance¶
 
Multinomial¶
- 
class torch.distributions.multinomial.Multinomial(total_count=1, probs=None, logits=None, validate_args=None)¶
- Bases: - torch.distributions.distribution.Distribution- Creates a Multinomial distribution parameterized by - total_countand either- probsor- logits(but not both). The innermost dimension of- probsindexes over categories. All other dimensions index over batches.- Note that - total_countneed not be specified if only- log_prob()is called (see example below)- Note - probsmust be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1.- sample()requires a single shared total_count for all parameters and samples.
- log_prob()allows different total_count for each parameter and sample.
 - Example: - >>> m = Multinomial(100, torch.tensor([ 1., 1., 1., 1.])) >>> x = m.sample() # equal probability of 0, 1, 2, 3 tensor([ 21., 24., 30., 25.]) >>> Multinomial(probs=torch.tensor([1., 1., 1., 1.])).log_prob(x) tensor([-4.1338]) - Parameters
 - 
arg_constraints= {'logits': Real(), 'probs': Simplex()}¶
 - 
expand(batch_shape, _instance=None)¶
 - 
log_prob(value)¶
 - 
logits¶
 - 
mean¶
 - 
param_shape¶
 - 
probs¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
support¶
 - 
variance¶
 
MultivariateNormal¶
- 
class torch.distributions.multivariate_normal.MultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)¶
- Bases: - torch.distributions.distribution.Distribution- Creates a multivariate normal (also called Gaussian) distribution parameterized by a mean vector and a covariance matrix. - The multivariate normal distribution can be parameterized either in terms of a positive definite covariance matrix \(\mathbf{\Sigma}\) or a positive definite precision matrix \(\mathbf{\Sigma}^{-1}\) or a lower-triangular matrix \(\mathbf{L}\) with positive-valued diagonal entries, such that \(\mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top\). This triangular matrix can be obtained via e.g. Cholesky decomposition of the covariance. - Example - >>> m = MultivariateNormal(torch.zeros(2), torch.eye(2)) >>> m.sample() # normally distributed with mean=`[0,0]` and covariance_matrix=`I` tensor([-0.2102, -0.5429]) - Parameters
 - Note - Only one of - covariance_matrixor- precision_matrixor- scale_trilcan be specified.- Using - scale_trilwill be more efficient: all computations internally are based on- scale_tril. If- covariance_matrixor- precision_matrixis passed instead, it is only used to compute the corresponding lower triangular matrices using a Cholesky decomposition.- 
arg_constraints= {'covariance_matrix': PositiveDefinite(), 'loc': RealVector(), 'precision_matrix': PositiveDefinite(), 'scale_tril': LowerCholesky()}¶
 - 
covariance_matrix¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
precision_matrix¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
scale_tril¶
 - 
support= Real()¶
 - 
variance¶
 
NegativeBinomial¶
- 
class torch.distributions.negative_binomial.NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)¶
- Bases: torch.distributions.distribution.Distribution - Creates a Negative Binomial distribution, i.e. the distribution of the number of successful independent and identical Bernoulli trials before total_count failures are achieved. The probability of success of each Bernoulli trial is probs. - Parameters
 - 
arg_constraints= {'logits': Real(), 'probs': HalfOpenInterval(lower_bound=0.0, upper_bound=1.0), 'total_count': GreaterThanEq(lower_bound=0)}¶
 - 
expand(batch_shape, _instance=None)¶
 - 
log_prob(value)¶
 - 
logits¶
 - 
mean¶
 - 
param_shape¶
 - 
probs¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
support= IntegerGreaterThan(lower_bound=0)¶
 - 
variance¶
 
Normal¶
- 
class torch.distributions.normal.Normal(loc, scale, validate_args=None)¶
- Bases: - torch.distributions.exp_family.ExponentialFamily- Creates a normal (also called Gaussian) distribution parameterized by - locand- scale.- Example: - >>> m = Normal(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # normally distributed with loc=0 and scale=1 tensor([ 0.1046]) - Parameters
 - 
arg_constraints= {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶
 - 
cdf(value)¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
icdf(value)¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
stddev¶
 - 
support= Real()¶
 - 
variance¶
 
OneHotCategorical¶
- 
class torch.distributions.one_hot_categorical.OneHotCategorical(probs=None, logits=None, validate_args=None)¶
- Bases: - torch.distributions.distribution.Distribution- Creates a one-hot categorical distribution parameterized by - probsor- logits.- Samples are one-hot coded vectors of size - probs.size(-1).- Note - probsmust be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1.- See also: - torch.distributions.Categorical()for specifications of- probsand- logits.- Example: - >>> m = OneHotCategorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ])) >>> m.sample() # equal probability of 0, 1, 2, 3 tensor([ 0., 0., 0., 1.]) - 
arg_constraints= {'logits': Real(), 'probs': Simplex()}¶
 - 
entropy()¶
 - 
enumerate_support(expand=True)¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_enumerate_support= True¶
 - 
log_prob(value)¶
 - 
logits¶
 - 
mean¶
 - 
param_shape¶
 - 
probs¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
support= Simplex()¶
 - 
variance¶
 
- 
Pareto¶
- 
class torch.distributions.pareto.Pareto(scale, alpha, validate_args=None)¶
- Bases: - torch.distributions.transformed_distribution.TransformedDistribution- Samples from a Pareto Type 1 distribution. - Example: - >>> m = Pareto(torch.tensor([1.0]), torch.tensor([1.0])) >>> m.sample() # sample from a Pareto distribution with scale=1 and alpha=1 tensor([ 1.5623]) - Parameters
 - 
arg_constraints= {'alpha': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
mean¶
 - 
support¶
 - 
variance¶
 
Poisson¶
- 
class torch.distributions.poisson.Poisson(rate, validate_args=None)¶
- Bases: - torch.distributions.exp_family.ExponentialFamily- Creates a Poisson distribution parameterized by - rate, the rate parameter.- Samples are nonnegative integers, with a pmf given by \[\mathrm{rate}^k \frac{e^{-\mathrm{rate}}}{k!} \]- Example: - >>> m = Poisson(torch.tensor([4])) >>> m.sample() tensor([ 3.]) - Parameters
- rate (Number, Tensor) – the rate parameter 
 - 
arg_constraints= {'rate': GreaterThan(lower_bound=0.0)}¶
 - 
expand(batch_shape, _instance=None)¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
sample(sample_shape=torch.Size([]))¶
 - 
support= IntegerGreaterThan(lower_bound=0)¶
 - 
variance¶
 
RelaxedBernoulli¶
- 
class torch.distributions.relaxed_bernoulli.RelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)¶
- Bases: torch.distributions.transformed_distribution.TransformedDistribution - Creates a RelaxedBernoulli distribution, parametrized by temperature, and either probs or logits (but not both). This is a relaxed version of the Bernoulli distribution, so the values are in (0, 1), and it has reparametrizable samples. - Example: - >>> m = RelaxedBernoulli(torch.tensor([2.2]), torch.tensor([0.1, 0.2, 0.3, 0.99])) >>> m.sample() tensor([ 0.2951, 0.3442, 0.8918, 0.9021]) - Parameters
 - 
arg_constraints= {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
logits¶
 - 
probs¶
 - 
support= Interval(lower_bound=0.0, upper_bound=1.0)¶
 - 
temperature¶
 
LogitRelaxedBernoulli¶
- 
class torch.distributions.relaxed_bernoulli.LogitRelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)¶
- Bases: torch.distributions.distribution.Distribution - Creates a LogitRelaxedBernoulli distribution parameterized by probs or logits (but not both), which is the logit of a RelaxedBernoulli distribution. - Samples are logits of values in (0, 1). See [1] for more details. - Parameters
- temperature (Tensor) – relaxation temperature
- probs (Number, Tensor) – the probability of sampling 1
- logits (Number, Tensor) – the log-odds of sampling 1

[1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables (Maddison et al, 2017)

[2] Categorical Reparametrization with Gumbel-Softmax (Jang et al, 2017)
 - 
arg_constraints= {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}¶
 - 
expand(batch_shape, _instance=None)¶
 - 
log_prob(value)¶
 - 
logits¶
 - 
param_shape¶
 - 
probs¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
support= Real()¶
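A minimal usage sketch: samples are logits, so applying a sigmoid maps them back to the (0, 1) values a RelaxedBernoulli would produce. The temperature and probabilities below are illustrative:

import torch
from torch.distributions import LogitRelaxedBernoulli

m = LogitRelaxedBernoulli(torch.tensor([0.5]), probs=torch.tensor([0.3, 0.7]))
logit_sample = m.rsample()             # logits of values in (0, 1)
relaxed = torch.sigmoid(logit_sample)  # corresponding relaxed values in (0, 1)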
 
RelaxedOneHotCategorical¶
- 
class torch.distributions.relaxed_categorical.RelaxedOneHotCategorical(temperature, probs=None, logits=None, validate_args=None)¶
- Bases: torch.distributions.transformed_distribution.TransformedDistribution - Creates a RelaxedOneHotCategorical distribution parametrized by temperature, and either probs or logits. This is a relaxed version of the OneHotCategorical distribution, so its samples are on the simplex and are reparametrizable. - Example: - >>> m = RelaxedOneHotCategorical(torch.tensor([2.2]), torch.tensor([0.1, 0.2, 0.3, 0.4])) >>> m.sample() tensor([ 0.1294, 0.2324, 0.3859, 0.2523]) - Parameters
- temperature (Tensor) – relaxation temperature
- probs (Tensor) – event probabilities
- logits (Tensor) – the log probability of each event
 - 
arg_constraints= {'logits': Real(), 'probs': Simplex()}¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
logits¶
 - 
probs¶
 - 
support= Simplex()¶
 - 
temperature¶
 
StudentT¶
- 
class torch.distributions.studentT.StudentT(df, loc=0.0, scale=1.0, validate_args=None)¶
- Bases: torch.distributions.distribution.Distribution - Creates a Student’s t-distribution parameterized by degrees of freedom df, mean loc and scale scale. - Example: - >>> m = StudentT(torch.tensor([2.0])) >>> m.sample() # Student's t-distributed with degrees of freedom=2 tensor([ 0.1046]) - Parameters
- df (float or Tensor) – degrees of freedom
- loc (float or Tensor) – mean of the distribution
- scale (float or Tensor) – scale of the distribution
 - 
arg_constraints= {'df': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
support= Real()¶
 - 
variance¶
 
TransformedDistribution¶
- 
class torch.distributions.transformed_distribution.TransformedDistribution(base_distribution, transforms, validate_args=None)¶
- Bases: torch.distributions.distribution.Distribution - Extension of the Distribution class, which applies a sequence of Transforms to a base distribution. Let f be the composition of transforms applied: - X ~ BaseDistribution Y = f(X) ~ TransformedDistribution(BaseDistribution, f) log p(Y) = log p(X) + log |det (dX/dY)| - Note that the .event_shape of a TransformedDistribution is the maximum shape of its base distribution and its transforms, since transforms can introduce correlations among events. - An example for the usage of TransformedDistribution would be: - # Building a Logistic Distribution # X ~ Uniform(0, 1) # f = a + b * logit(X) # Y ~ f(X) ~ Logistic(a, b) base_distribution = Uniform(0, 1) transforms = [SigmoidTransform().inv, AffineTransform(loc=a, scale=b)] logistic = TransformedDistribution(base_distribution, transforms) - For more examples, please look at the implementations of Gumbel, HalfCauchy, HalfNormal, LogNormal, Pareto, Weibull, RelaxedBernoulli and RelaxedOneHotCategorical
arg_constraints= {}¶
 - 
cdf(value)¶
- Computes the cumulative distribution function by inverting the transform(s) and computing the score of the base distribution. 
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample¶
 - 
icdf(value)¶
- Computes the inverse cumulative distribution function using transform(s) and computing the score of the base distribution. 
 - 
log_prob(value)¶
- Scores the sample by inverting the transform(s) and computing the score using the score of the base distribution and the log abs det jacobian. 
 - 
rsample(sample_shape=torch.Size([]))¶
- Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched. Samples first from base distribution and applies transform() for every transform in the list. 
 - 
sample(sample_shape=torch.Size([]))¶
- Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched. Samples first from base distribution and applies transform() for every transform in the list. 
 - 
support¶
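Continuing the Logistic construction above with concrete (illustrative) values a=0.0, b=1.0, sampling and scoring go through the transform machinery described in rsample() and log_prob():

import torch
from torch.distributions import TransformedDistribution, Uniform
from torch.distributions.transforms import AffineTransform, SigmoidTransform

base_distribution = Uniform(0, 1)
transforms = [SigmoidTransform().inv, AffineTransform(loc=0.0, scale=1.0)]
logistic = TransformedDistribution(base_distribution, transforms)

y = logistic.rsample((5,))   # sample from the base, then apply each transform
print(logistic.log_prob(y))  # inverts the transforms and adds log |det (dX/dY)|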
 
Uniform¶
- 
class torch.distributions.uniform.Uniform(low, high, validate_args=None)¶
- Bases: - torch.distributions.distribution.Distribution- Generates uniformly distributed random samples from the half-open interval - [low, high).- Example: - >>> m = Uniform(torch.tensor([0.0]), torch.tensor([5.0])) >>> m.sample() # uniformly distributed in the range [0.0, 5.0) tensor([ 2.3418]) - Parameters
- low (float or Tensor) – lower range (inclusive)
- high (float or Tensor) – upper range (exclusive)
 - 
arg_constraints= {'high': Dependent(), 'low': Dependent()}¶
 - 
cdf(value)¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
has_rsample= True¶
 - 
icdf(value)¶
 - 
log_prob(value)¶
 - 
mean¶
 - 
rsample(sample_shape=torch.Size([]))¶
 - 
stddev¶
 - 
support¶
 - 
variance¶
 
Weibull¶
- 
class torch.distributions.weibull.Weibull(scale, concentration, validate_args=None)¶
- Bases: - torch.distributions.transformed_distribution.TransformedDistribution- Samples from a two-parameter Weibull distribution. - Example - >>> m = Weibull(torch.tensor([1.0]), torch.tensor([1.0])) >>> m.sample() # sample from a Weibull distribution with scale=1, concentration=1 tensor([ 0.4784]) - Parameters
- scale (float or Tensor) – Scale parameter of the distribution (lambda)
- concentration (float or Tensor) – Concentration parameter of the distribution (k/shape)
 - 
arg_constraints= {'concentration': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}¶
 - 
entropy()¶
 - 
expand(batch_shape, _instance=None)¶
 - 
mean¶
 - 
support= GreaterThan(lower_bound=0.0)¶
 - 
variance¶
 
KL Divergence¶
- 
torch.distributions.kl.kl_divergence(p, q)¶
- Compute Kullback-Leibler divergence \(KL(p \| q)\) between two distributions. \[KL(p \| q) = \int p(x) \log\frac {p(x)} {q(x)} \,dx\]- Parameters
- p (Distribution) – A - Distributionobject.
- q (Distribution) – A - Distributionobject.
 
- Returns
- A batch of KL divergences of shape batch_shape. 
- Return type
- Tensor
- Raises
- NotImplementedError – If the distribution types have not been registered via - register_kl().
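A short sketch of computing a batch of KL divergences between two registered distribution types (the Normal–Normal pair has a closed form):

import torch
from torch.distributions import Normal
from torch.distributions.kl import kl_divergence

p = Normal(torch.zeros(3), torch.ones(3))
q = Normal(torch.ones(3), 2 * torch.ones(3))
print(kl_divergence(p, q))  # shape (3,): one KL value per batch element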
 
- 
torch.distributions.kl.register_kl(type_p, type_q)¶
- Decorator to register a pairwise function with - kl_divergence(). Usage:- @register_kl(Normal, Normal) def kl_normal_normal(p, q): # insert implementation here - Lookup returns the most specific (type,type) match ordered by subclass. If the match is ambiguous, a RuntimeWarning is raised. For example to resolve the ambiguous situation: - @register_kl(BaseP, DerivedQ) def kl_version1(p, q): ... @register_kl(DerivedP, BaseQ) def kl_version2(p, q): ... - you should register a third most-specific implementation, e.g.: - register_kl(DerivedP, DerivedQ)(kl_version1) # Break the tie. 
Transforms¶
- 
class torch.distributions.transforms.Transform(cache_size=0)¶
- Abstract class for invertible transformations with computable log det jacobians. They are primarily used in torch.distributions.TransformedDistribution. - Caching is useful for transforms whose inverses are either expensive or numerically unstable. Note that care must be taken with memoized values since the autograd graph may be reversed. For example, while the following works with or without caching: - y = t(x) t.log_abs_det_jacobian(x, y).backward() # x will receive gradients. - However, the following will error when caching due to dependency reversal: - y = t(x) z = t.inv(y) grad(z.sum(), [y]) # error because z is x - Derived classes should implement one or both of _call() or _inverse(). Derived classes that set bijective=True should also implement log_abs_det_jacobian(). - Parameters
- cache_size (int) – Size of cache. If zero, no caching is done. If one, the latest single value is cached. Only 0 and 1 are supported. 
- Variables
- ~Transform.domain ( - Constraint) – The constraint representing valid inputs to this transform.
- ~Transform.codomain ( - Constraint) – The constraint representing valid outputs to this transform which are inputs to the inverse transform.
- ~Transform.bijective (bool) – Whether this transform is bijective. A transform t is bijective iff t.inv(t(x)) == x and t(t.inv(y)) == y for every x in the domain and y in the codomain. Transforms that are not bijective should at least maintain the weaker pseudoinverse properties t(t.inv(t(x))) == t(x) and t.inv(t(t.inv(y))) == t.inv(y).
- ~Transform.sign (int or Tensor) – For bijective univariate transforms, this should be +1 or -1 depending on whether transform is monotone increasing or decreasing. 
- ~Transform.event_dim (int) – Number of dimensions that are correlated together in the transform - event_shape. This should be 0 for pointwise transforms, 1 for transforms that act jointly on vectors, 2 for transforms that act jointly on matrices, etc.
 
 - 
sign¶
- Returns the sign of the determinant of the Jacobian, if applicable. In general this only makes sense for bijective transforms. 
 - 
log_abs_det_jacobian(x, y)¶
- Computes the log det jacobian log |dy/dx| given input and output. 
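A minimal sketch of this interface using ExpTransform (documented below), whose forward map is y = exp(x), whose inverse is the log, and whose log |dy/dx| equals x:

import torch
from torch.distributions.transforms import ExpTransform

t = ExpTransform()
x = torch.tensor([0.0, 1.0])
y = t(x)                             # forward: exp(x)
x_back = t.inv(y)                    # the inverse transform recovers x
ladj = t.log_abs_det_jacobian(x, y)  # equals x, since d exp(x)/dx = exp(x)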
 
- 
class torch.distributions.transforms.ComposeTransform(parts)¶
- Composes multiple transforms in a chain. The transforms being composed are responsible for caching. - Parameters
- parts (list of - Transform) – A list of transforms to compose.
 
- 
class torch.distributions.transforms.ExpTransform(cache_size=0)¶
- Transform via the mapping \(y = \exp(x)\). 
- 
class torch.distributions.transforms.PowerTransform(exponent, cache_size=0)¶
- Transform via the mapping \(y = x^{\text{exponent}}\). 
- 
class torch.distributions.transforms.SigmoidTransform(cache_size=0)¶
- Transform via the mapping \(y = \frac{1}{1 + \exp(-x)}\) and \(x = \text{logit}(y)\). 
- 
class torch.distributions.transforms.AbsTransform(cache_size=0)¶
- Transform via the mapping \(y = |x|\). 
- 
class torch.distributions.transforms.AffineTransform(loc, scale, event_dim=0, cache_size=0)¶
- Transform via the pointwise affine mapping \(y = \text{loc} + \text{scale} \times x\). 
- 
class torch.distributions.transforms.SoftmaxTransform(cache_size=0)¶
- Transform from unconstrained space to the simplex via \(y = \exp(x)\) then normalizing. - This is not bijective and cannot be used for HMC. However this acts mostly coordinate-wise (except for the final normalization), and thus is appropriate for coordinate-wise optimization algorithms. 
- 
class torch.distributions.transforms.StickBreakingTransform(cache_size=0)¶
- Transform from unconstrained space to the simplex of one additional dimension via a stick-breaking process. - This transform arises as an iterated sigmoid transform in a stick-breaking construction of the Dirichlet distribution: the first logit is transformed via sigmoid to the first probability and the probability of everything else, and then the process recurses. - This is bijective and appropriate for use in HMC; however it mixes coordinates together and is less appropriate for optimization. 
- 
class torch.distributions.transforms.LowerCholeskyTransform(cache_size=0)¶
- Transform from unconstrained matrices to lower-triangular matrices with nonnegative diagonal entries. - This is useful for parameterizing positive definite matrices in terms of their Cholesky factorization. 
Constraints¶
The following constraints are implemented:
- constraints.boolean
- constraints.dependent
- constraints.greater_than(lower_bound)
- constraints.integer_interval(lower_bound, upper_bound)
- constraints.interval(lower_bound, upper_bound)
- constraints.lower_cholesky
- constraints.lower_triangular
- constraints.nonnegative_integer
- constraints.positive
- constraints.positive_definite
- constraints.positive_integer
- constraints.real
- constraints.real_vector
- constraints.simplex
- constraints.unit_interval
- 
class torch.distributions.constraints.Constraint¶
- Abstract base class for constraints. - A constraint object represents a region over which a variable is valid, e.g. within which a variable can be optimized. - 
check(value)¶
- Returns a byte tensor of sample_shape + batch_shape indicating whether each event in value satisfies this constraint. 
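For example, checking values against one of the singleton constraints listed above (a minimal sketch):

import torch
from torch.distributions import constraints

value = torch.tensor([1.0, -2.0, 3.0])
print(constraints.positive.check(value))  # byte tensor [1, 0, 1]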
 
- 
- 
torch.distributions.constraints.dependent_property¶
- alias of - torch.distributions.constraints._DependentProperty
- 
torch.distributions.constraints.integer_interval¶
- alias of - torch.distributions.constraints._IntegerInterval
- 
torch.distributions.constraints.greater_than¶
- alias of - torch.distributions.constraints._GreaterThan
- 
torch.distributions.constraints.greater_than_eq¶
- alias of - torch.distributions.constraints._GreaterThanEq
- 
torch.distributions.constraints.less_than¶
- alias of - torch.distributions.constraints._LessThan
- 
torch.distributions.constraints.interval¶
- alias of - torch.distributions.constraints._Interval
- 
torch.distributions.constraints.half_open_interval¶
- alias of - torch.distributions.constraints._HalfOpenInterval
Constraint Registry¶
PyTorch provides two global ConstraintRegistry objects that link
Constraint objects to
Transform objects. Both registries accept a constraint and return a transform,
but they make different guarantees about bijectivity.
- biject_to(constraint)looks up a bijective- Transformfrom- constraints.realto the given- constraint. The returned transform is guaranteed to have- .bijective = Trueand should implement- .log_abs_det_jacobian().
- transform_to(constraint)looks up a not-necessarily bijective- Transformfrom- constraints.realto the given- constraint. The returned transform is not guaranteed to implement- .log_abs_det_jacobian().
The transform_to() registry is useful for performing unconstrained
optimization on constrained parameters of probability distributions, which are
indicated by each distribution’s .arg_constraints dict. These transforms often
overparameterize a space in order to avoid rotation; they are thus more
suitable for coordinate-wise optimization algorithms like Adam:
from torch.distributions import Normal, transform_to

loc = torch.zeros(100, requires_grad=True)
unconstrained = torch.zeros(100, requires_grad=True)
scale = transform_to(Normal.arg_constraints['scale'])(unconstrained)
# `data` is assumed to be an observed tensor broadcastable with shape (100,)
loss = -Normal(loc, scale).log_prob(data).sum()
The biject_to() registry is useful for Hamiltonian Monte Carlo, where
samples from a probability distribution with constrained .support are
propagated in an unconstrained space, and algorithms are typically rotation
invariant:
from torch.distributions import Exponential, biject_to

# `rate` is assumed to be a positive tensor of shape (100,)
dist = Exponential(rate)
unconstrained = torch.zeros(100, requires_grad=True)
sample = biject_to(dist.support)(unconstrained)
potential_energy = -dist.log_prob(sample).sum()
Note
An example where transform_to and biject_to differ is
constraints.simplex: transform_to(constraints.simplex) returns a
SoftmaxTransform that simply
exponentiates and normalizes its inputs; this is a cheap and mostly
coordinate-wise operation appropriate for algorithms like SVI. In
contrast, biject_to(constraints.simplex) returns a
StickBreakingTransform that
bijects its input down to a one-fewer-dimensional space; this is a more
expensive and less numerically stable transform, but it is needed for algorithms
like HMC.
The biject_to and transform_to objects can be extended by user-defined
constraints and transforms using their .register() method either as a
function on singleton constraints:
transform_to.register(my_constraint, my_transform)
or as a decorator on parameterized constraints:
@transform_to.register(MyConstraintClass)
def my_factory(constraint):
    assert isinstance(constraint, MyConstraintClass)
    return MyTransform(constraint.param1, constraint.param2)
You can create your own registry by creating a new ConstraintRegistry
object.
- 
class torch.distributions.constraint_registry.ConstraintRegistry¶
- Registry to link constraints to transforms. - 
register(constraint, factory=None)¶
- Registers a - Constraintsubclass in this registry. Usage:- @my_registry.register(MyConstraintClass) def construct_transform(constraint): assert isinstance(constraint, MyConstraint) return MyTransform(constraint.arg_constraints) - Parameters
- constraint (subclass of - Constraint) – A subclass of- Constraint, or a singleton object of the desired class.
- factory (callable) – A callable that inputs a constraint object and returns a - Transformobject.
 
 
 
TorchScript¶
TorchScript is a way to create serializable and optimizable models from PyTorch code. Any code written in TorchScript can be saved from your Python process and loaded in a process where there is no Python dependency.
We provide tools to incrementally transition a model from being a pure Python program to a TorchScript program that can be run independently from Python, for instance, in a standalone C++ program. This makes it possible to train models in PyTorch using familiar tools and then export the model to a production environment where it is not a good idea to run models as Python programs for performance and multi-threading reasons.
Creating TorchScript Code¶
- 
class torch.jit.ScriptModule(optimize=True)¶
- The core data structure in TorchScript is the ScriptModule. It is an analogue of torch’s nn.Module and represents an entire model as a tree of submodules. Like normal modules, each individual module in a ScriptModule can have submodules, parameters, and methods. In nn.Modules methods are implemented as Python functions, but in ScriptModules methods are typically implemented as TorchScript functions, a statically-typed subset of Python that contains all of PyTorch’s built-in Tensor operations. This difference allows your ScriptModule’s code to run without the need for a Python interpreter. - ScriptModules and the TorchScript functions inside of them can be created in two ways: - Tracing: - Using torch.jit.trace, you can take an existing module or Python function, provide example inputs, and we run the function, recording the operations performed on all the tensors. We turn the resulting recording into a TorchScript method that is installed as the forward method of a ScriptModule. This module also contains any parameters that the original module had. - Example: - import torch def foo(x, y): return 2*x + y traced_foo = torch.jit.trace(foo, (torch.rand(3), torch.rand(3))) - Note - Tracing a function will produce a ScriptModule with a single forward method that implements that function, and that contains no parameters. - Example: - import torch import torchvision traced_net = torch.jit.trace(torchvision.models.resnet18(), torch.rand(1, 3, 224, 224)) - Note - Tracing only records operations done when the given function is run on the given tensors. Therefore, the returned ScriptModule will always run the same traced graph on any input. This has some important implications when your module is expected to run different sets of operations, depending on the input and/or the module state. For example, - Tracing will not record any control-flow like if statements or loops. When this control-flow is constant across your module, this is fine and it often just inlines configuration decisions. But sometimes the control-flow is actually part of the model itself. For instance, a recurrent network is a loop over the (possibly dynamic) length of an input sequence.
- In the returned - ScriptModule, operations that have different behaviors in- trainingand- evalmodes will always behave as if it is in the mode it was in during tracing, no matter which mode the- ScriptModuleis in.
 - In cases like these, tracing would not be appropriate and scripting is a better choice. - Scripting: - You can write TorchScript code directly using Python syntax. You do this using the - torch.jit.scriptannotation (for functions) or- torch.jit.script_methodannotation (for methods) on subclasses of ScriptModule. With this annotation the body of the annotated function is directly translated into TorchScript. TorchScript itself is a subset of the Python language, so not all features in python work, but we provide enough functionality to compute on tensors and do control-dependent operations.- Example: - import torch @torch.jit.script def foo(x, y): if x.max() > y.max(): r = x else: r = y return r - Note - A script function annotation will construct a ScriptModule with a single - forwardmethod that implements that function, and that contains no parameters.- Example: - import torch class MyModule(torch.jit.ScriptModule): def __init__(self, N, M): super(MyModule, self).__init__() self.weight = torch.nn.Parameter(torch.rand(N, M)) @torch.jit.script_method def forward(self, input): return self.weight.mv(input) - Example: - import torch import torch.nn as nn import torch.nn.functional as F from torch.jit import ScriptModule, script_method, trace class MyScriptModule(ScriptModule): def __init__(self): super(MyScriptModule, self).__init__() # trace produces a ScriptModule's conv1 and conv2 self.conv1 = trace(nn.Conv2d(1, 20, 5), torch.rand(1, 1, 16, 16)) self.conv2 = trace(nn.Conv2d(20, 20, 5), torch.rand(1, 20, 16, 16)) @script_method def forward(self, input): input = F.relu(self.conv1(input)) input = F.relu(self.conv2(input)) return input - 
save(filename)¶
- Save an offline version of this module for use in a separate process. The saved module serializes all of the methods and parameters of this module. It can be loaded into the C++ API using - torch::jit::load(filename)or into the Python API with- torch.jit.load(filename).- To be able to save a module, it must not make any calls to native python functions. This means that all submodules must be subclasses of ScriptModules as well. - Danger - All modules, no matter their device, are always loaded onto the CPU during loading. This is different from - torch.load()’s semantics and may change in the future.
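A minimal save/load round trip, reusing the traced-function pattern from the tracing examples above (the file name is illustrative):

import torch

def foo(x, y):
    return 2 * x + y

traced_foo = torch.jit.trace(foo, (torch.rand(3), torch.rand(3)))
traced_foo.save('traced_foo.pt')          # serializes methods and parameters
loaded = torch.jit.load('traced_foo.pt')  # modules are loaded onto the CPU first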
 
- 
torch.jit.load(f, map_location=None, _extra_files=ExtraFilesMap{})¶
- Load a ScriptModule previously saved with save - All previously saved modules, no matter their device, are first loaded onto CPU, and then are moved to the devices they were saved from. If this fails (e.g. because the run time system doesn’t have certain devices), an exception is raised. However, storages can be dynamically remapped to an alternative set of devices using the map_location argument. Compared to torch.load(), map_location in this function is simplified: it only accepts a string (e.g., ‘cpu’, ‘cuda:0’) or a torch.device (e.g., torch.device(‘cpu’)) - Parameters
- f – a file-like object (has to implement read, readline, tell, and seek), or a string containing a file name 
- map_location – can be a string (e.g., ‘cpu’, ‘cuda:0’) or a device (e.g., torch.device(‘cpu’))
- _extra_files – map from filename to content. The extra filenames given in the map would be loaded and their content would be stored in the provided map. 
 
- Returns
- A - ScriptModuleobject.
 - Example - >>> torch.jit.load('scriptmodule.pt') # Load ScriptModule from io.BytesIO object >>> with open('scriptmodule.pt', 'rb') as f: buffer = io.BytesIO(f.read()) # Load all tensors to the original device >>> torch.jit.load(buffer) # Load all tensors onto CPU, using a device >>> torch.jit.load(buffer, map_location=torch.device('cpu')) # Load all tensors onto CPU, using a string >>> torch.jit.load(buffer, map_location='cpu') # Load with extra files. >>> files = {'metadata.json' : ''} >>> torch.jit.load('scriptmodule.pt', _extra_files = files) >>> print (files['metadata.json']) 
- 
torch.jit.trace(func, example_inputs, optimize=True, check_trace=True, check_inputs=None, check_tolerance=1e-05, _force_outplace=False, _module_class=None)¶
- Trace a function and return an executable trace that will be optimized using just-in-time compilation. - Warning - Tracing only correctly records functions and modules which are not data dependent (e.g., have conditionals on data in tensors) and do not have any untracked external dependencies (e.g., perform input/output or access global variables). If you trace such models, you may silently get incorrect results on subsequent invocations of the model. The tracer will try to emit warnings when doing something that may cause an incorrect trace to be produced. - Parameters
- func (callable or torch.nn.Module) – a python function or torch.nn.Module that will be run with example_inputs. arguments and returns to func must be Tensors or (possibly nested) tuples that contain tensors. 
- example_inputs (tuple) – a tuple of example inputs that will be passed to the function while tracing. The resulting trace can be run with inputs of different types and shapes assuming the traced operations support those types and shapes. example_inputs may also be a single Tensor in which case it is automatically wrapped in a tuple 
 
- Keyword Arguments
- optimize (bool, optional) – whether or not to apply optimizations. Default: - True.
- check_trace (bool, optional) – check if the same inputs run through traced code produce the same outputs. Default: - True. You might want to disable this if, for example, your network contains non- deterministic ops or if you are sure that the network is correct despite a checker failure.
- check_inputs (list of tuples, optional) – A list of tuples of input arguments that should be used to check the trace against what is expected. Each tuple is equivalent to a set of input arguments that would be specified in args. For best results, pass in a set of checking inputs representative of the space of shapes and types of inputs you expect the network to see. If not specified, the original args is used for checking
- check_tolerance (float, optional) – Floating-point comparison tolerance to use in the checker procedure. This can be used to relax the checker strictness in the event that results diverge numerically for a known reason, such as operator fusion. 
 
- Returns
- A - ScriptModuleobject with a single- forward()method containing the traced code. When func is a- torch.nn.Module, the returned- ScriptModulewill have the same set of sub-modules and parameters as func.
 - Example - >>> def f(x): ... return x * 2 >>> traced_f = torch.jit.trace(f, torch.rand(1)) 
Mixing Tracing and Scripting¶
In many cases either tracing or scripting is an easier approach for converting a model. We allow you to compose tracing and scripting to suit the particular requirements of a part of a model.
Scripted functions can call traced ones. This is particularly useful when you need to use control-flow around a simple feed-forward model. For instance the beam search of a sequence to sequence model will typically be written in script but can call an encoder module generated using tracing.
Example:
import torch
def foo(x, y):
    return 2 * x + y
traced_foo = torch.jit.trace(foo, (torch.rand(3), torch.rand(3)))
@torch.jit.script
def bar(x):
    return traced_foo(x, x)
Traced functions can call script functions. This is useful when a small part of a model requires some control-flow even though most of the model is just a feed-forward network. Control-flow inside of a script function called by a traced function is preserved correctly:
Example:
import torch
@torch.jit.script
def foo(x, y):
    if x.max() > y.max():
        r = x
    else:
        r = y
    return r
def bar(x, y, z):
    return foo(x, y) + z
traced_bar = torch.jit.trace(bar, (torch.rand(3), torch.rand(3), torch.rand(3)))
This composition also works for modules, where it can be used to generate a submodule using tracing that can be called from the methods of a script module:
Example:
import torch
import torchvision
class MyScriptModule(torch.jit.ScriptModule):
    def __init__(self):
        super(MyScriptModule, self).__init__()
        self.means = torch.nn.Parameter(torch.tensor([103.939, 116.779, 123.68])
                                        .resize_(1, 3, 1, 1))
        self.resnet = torch.jit.trace(torchvision.models.resnet18(),
                                      torch.rand(1, 3, 224, 224))
    @torch.jit.script_method
    def forward(self, input):
        return self.resnet(input - self.means)
TorchScript Language Reference¶
TorchScript is a subset of Python that can either be written directly (using the @script annotations) or generated automatically from Python code via tracing. When using tracing, code is automatically converted into this subset of Python by recording only the actual operators on tensors and simply executing and discarding the other surrounding Python code.
When writing TorchScript directly using @script annotations, the programmer must only use the subset of Python supported in TorchScript. This section documents what is supported in TorchScript as if it were a language reference for a standalone language. Any features of Python not mentioned in this reference are not part of TorchScript.
As a subset of Python, any valid TorchScript function is also a valid Python function. This makes it possible to remove the @script annotations and debug the function using standard Python tools like pdb. The reverse is not true: there are many valid Python programs that are not valid TorchScript programs. Instead, TorchScript focuses specifically on the features of Python that are needed to represent neural network models in Torch.
- 
PYTORCH_JIT=1¶
- Setting the environment variable PYTORCH_JIT=0 will disable all script and tracing annotations. If there is a hard-to-debug error in one of your ScriptModules, you can use this flag to force everything to run using native Python. This allows the use of tools like pdb to debug code.
Types¶
The largest difference between TorchScript and the full Python language is that TorchScript only supports a small set of types that are needed to express neural net models. In particular, TorchScript supports:
- Tensor
- A PyTorch tensor of any dtype, dimension, or backend. 
- Tuple[T0, T1, ...]
- A tuple containing subtypes - T0,- T1, etc. (e.g.- Tuple[Tensor, Tensor])
- bool
- A boolean value 
- int
- A scalar integer 
- float
- A scalar floating point number 
- List[T]
- A list of which all members are type - T
- Optional[T]
- A value which is either None or type - T
- Dict[K, V]
- A dict with key type - Kand value type- V. Only- str,- int, and- floatare allowed as key types.
Unlike Python, each variable in a TorchScript function must have a single static type. This makes it easier to optimize TorchScript functions.
Example:
@torch.jit.script
def an_error(x):
    if x:
        r = torch.rand(1)
    else:
        r = 4
    return r # Type mismatch: r is set to type Tensor in the true branch
             # and type int in the false branch
There are two scenarios in which you can annotate types:
- Function Argument Type Annotation 
By default, all parameters to a TorchScript function are assumed to be Tensor because this is the most common type used in modules. To specify that an argument to a TorchScript function is another type, it is possible to use MyPy-style type annotations using the types listed above:
Example:
@torch.jit.script
def foo(x, tup):
    # type: (int, Tuple[Tensor, Tensor]) -> Tensor
    t0, t1 = tup
    return t0 + t1 + x
print(foo(3, (torch.rand(3), torch.rand(3))))
Note
It is also possible to annotate types with Python 3 type annotations. In our examples, we use comment-based annotations to ensure Python 2 compatibility as well.
- Variable Type Annotation 
An empty list is assumed by default to be List[Tensor] and an empty dict
Dict[str, Tensor]. To instantiate an empty list or dict of other types,
use torch.jit.annotate.
Example:
import torch
from torch import Tensor
from typing import Dict, List, Tuple
class EmptyDataStructures(torch.jit.ScriptModule):
    def __init__(self):
        super(EmptyDataStructures, self).__init__()
    @torch.jit.script_method
    def forward(self, x):
        # type: (Tensor) -> Tuple[List[Tuple[Tensor, Tensor]], Dict[int, Tensor]]
        # This annotates the list to be a `List[Tuple[Tensor, Tensor]]`
        list_of_tuple = torch.jit.annotate(List[Tuple[Tensor, Tensor]], [])
        for i in range(10):
            list_of_tuple.append((x, x))
        # This annotates the dict to be a `Dict[int, Tensor]`
        int_tensor_dict = torch.jit.annotate(Dict[int, Tensor], {})
        return list_of_tuple, int_tensor_dict
Optional Type Refinement:
TorchScript will refine the type of a variable of type Optional[T] when a comparison to None is made inside the conditional of an if statement. The compiler can reason about multiple None checks that are combined with AND, OR, or NOT. Refinement will also occur for else blocks of if statements that are not explicitly written.
The expression must be emitted within the conditional; assigning a None check to a variable and using it in the conditional will not refine types.
Example:
@torch.jit.script
def opt_unwrap(x, y, z):
  # type: (Optional[int], Optional[int], Optional[int]) -> int
  if x is None:
    x = 1
  x = x + 1
  if y is not None and z is not None:
    x = y + z
  return x
Expressions¶
The following Python Expressions are supported
- Literals
- True, False, None, 'string literals', "string literals", number literals 3 (interpreted as int), 3.4 (interpreted as a float)
- Variables
- a- Note - See Variable Resolution for how variables are resolved. 
- Tuple Construction
- (3, 4),- (3,)
- List Construction
- [3, 4], [], [torch.rand(3), torch.rand(4)] - Note - An empty list is assumed to have type List[Tensor]. The types of other list literals are derived from the type of the members.
- Dict Construction
- {'hello': 3}, {}, {'a': torch.rand(3), 'b': torch.rand(4)} - Note - An empty dict is assumed to have type Dict[str, Tensor]. The types of other dict literals are derived from the type of the members.
- Arithmetic Operators
- a + b- a - b- a * b- a / b- a ^ b- a @ b
- Comparison Operators
- a == b- a != b- a < b- a > b- a <= b- a >= b
- Logical Operators
- a and b- a or b- not b
- Subscripts
- t[0] - t[-1] - t[0:2] - t[1:] - t[:1] - t[:] - t[0, 1] - t[0, 1:2] - t[0, :1] - t[-1, 1:, 0] - t[1:, -1, 0] - t[i:j, i] - Note - TorchScript currently does not support mutating tensors in place, so any tensor indexing can only appear on the right-hand side of an expression.
- Function calls
- Calls to built-in functions: - torch.rand(3, dtype=torch.int)- Calls to other script functions: - import torch @torch.jit.script def foo(x): return x + 1 @torch.jit.script def bar(x): return foo(x) 
- Method calls
- Calls to methods of builtin types like tensor: x.mm(y) - When defining a Script method inside of a ScriptModule, the @script_method annotation is used. Inside of these methods it is possible to call other methods of this class or access methods on the submodules. - Calling a submodule directly (e.g. self.resnet(input)) is equivalent to calling its forward method (e.g. self.resnet.forward(input)) - import torch import torchvision class MyScriptModule(torch.jit.ScriptModule): def __init__(self): super(MyScriptModule, self).__init__() self.means = torch.nn.Parameter(torch.tensor([103.939, 116.779, 123.68]) .resize_(1, 3, 1, 1)) self.resnet = torch.jit.trace(torchvision.models.resnet18(), torch.rand(1, 3, 224, 224)) @torch.jit.script_method def helper(self, input): return self.resnet(input - self.means) @torch.jit.script_method def forward(self, input): return self.helper(input)
- If expressions
- x if x > y else y
- Casts
- float(ten),- int(3.5),- bool(ten)
- Accessing Module Parameters
- self.my_parameter- self.my_submodule.my_parameter
Statements¶
TorchScript supports the following types of statements:
Simple Assignments
a = b
a += b  # short-hand for a = a + b, does not operate in-place on a
a -= b
Pattern Matching Assignments
a, b = tuple_or_list
a, b, *c = a_tuple
Print Statements
print("the result of an add:", a + b)
If Statements
if a < 4:
    r = -a
elif a < 3:
    r = a + a
else:
    r = 3 * a
While Loops
a = 0
while a < 4:
    print(a)
    a += 1
For loops with range
x = 0
for i in range(10):
    x *= i
Note
Script currently does not support iterating over generic iterable objects like lists or tensors. Script currently does not support start or increment parameters to range. These will be added in a future version.
For loops over tuples:
tup = (3, torch.rand(4))
for x in tup:
    print(x)
Note
for loops over tuples will unroll the loop, generating a body for each member of the tuple. The body must type-check correctly for each member.
For loops over constant torch.nn.ModuleList
import torch
import torch.nn as nn

class SubModule(torch.jit.ScriptModule):
    def __init__(self):
        super(SubModule, self).__init__()
        self.weight = nn.Parameter(torch.randn(2))

    @torch.jit.script_method
    def forward(self, input):
        return self.weight + input

class MyModule(torch.jit.ScriptModule):
    __constants__ = ['mods']

    def __init__(self):
        super(MyModule, self).__init__()
        self.mods = torch.nn.ModuleList([SubModule() for i in range(10)])

    @torch.jit.script_method
    def forward(self, v):
        for module in self.mods:
            v = module(v)
        return v
Note
To use a module list inside a @script_method, it must be marked constant by adding the name of the attribute to the __constants__ list for the type. For loops over a ModuleList will unroll the body of the loop at compile time, once for each member of the constant module list.
- Return
- return a, b- Note - TorchScript allows returns in the following circumstances:
- At the end of a function 
- In an if-statement where <true> and <false> both return 
- In an if-statement where <true> returns and <false> is empty (an early return) 
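A minimal sketch of the early-return case from the last bullet:

@torch.jit.script
def early_return(x):
    # type: (int) -> int
    if x > 0:
        return x   # <true> returns and <false> is empty
    return -x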
 
 
Variable Resolution¶
TorchScript supports a subset of Python’s variable resolution (i.e. scoping) rules. Local variables behave the same as in Python, except for the restriction that a variable must have the same type along all paths through a function. If a variable has a different type on different sides of an if statement, it is an error to use it after the end of the if statement.
Similarly, a variable is not allowed to be used if it is only defined along some paths through the function.
Example:
@torch.jit.script
def foo(x):
    if x < 0:
        y = 4
    print(y) # Error: undefined value y
Non-local variables are resolved to Python values at compile time when the function is defined. These values are then converted into TorchScript values using the rules described in Use of Python Values.
Use of Python Values¶
To make writing TorchScript more convenient, we allow script code to refer
to Python values in the surrounding scope. For instance, any time there is a
reference to torch, the TorchScript compiler is actually resolving it to the
torch Python module when the function is declared.  These Python values are
not a first class part of TorchScript. Instead they are desugared at compile-time
into the primitive types that TorchScript supports. This section describes the
rules that are used when accessing Python values in TorchScript. They depend
on the dynamic type of the Python value being referenced.
- Functions
- TorchScript can call Python functions. This functionality is very useful when incrementally converting a model into script. The model can be moved function-by-function to script, leaving calls to Python functions in place. This way you can incrementally check the correctness of the model as you go. - Example: - def foo(x): print("I am called with {}".format(x)) import pdb; pdb.set_trace() return x @torch.jit.script def bar(x): return foo(x + 1) - Note - Attempting to call save on a ScriptModule that contains calls to Python functions will fail. The intention is that this pathway is used for debugging and the calls removed or turned into script functions before saving.
- Attribute Lookup On Python Modules
- TorchScript can look up attributes on modules. Builtin functions like torch.add are accessed this way. This allows TorchScript to call functions defined in other modules.
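A minimal sketch of attribute lookup resolving torch.add when the function is declared:

import torch

@torch.jit.script
def add_one(x):
    # torch.add is found by attribute lookup on the torch module
    return torch.add(x, 1)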
- Python-defined Constants
- TorchScript also provides a way to use constants that are defined in Python. These can be used to hard-code hyper-parameters into the function, or to define universal constants. There are two ways of specifying that a Python value should be treated as a constant. - Values looked up as attributes of a module are assumed to be constant. Example: - math.pi
- Attributes of a ScriptModule can be marked constant by listing them as a member of the __constants__ property of the class: - Example: - class Foo(torch.jit.ScriptModule): __constants__ = ['a'] def __init__(self): super(Foo, self).__init__(False) self.a = 1 + 4 @torch.jit.script_method def forward(self, input): return self.a + input
 - Supported constant Python Values are - int
- bool
- torch.device
- torch.layout
- torch.dtype
- tuples containing supported types 
- torch.nn.ModuleListwhich can be used in a TorchScript for loop
 
Debugging¶
- Disable JIT for Debugging
- If you want to disable all JIT modes (tracing and scripting) so you can debug your program in raw Python, you can use the - PYTORCH_JITenvironment variable.- PYTORCH_JITcan be used to globally disable the JIT by setting its value to- 0. Given an example script:- @torch.jit.script def scripted_fn(x : torch.Tensor): for i in range(12): x = x + x return x def fn(x): x = torch.neg(x) import pdb; pdb.set_trace() return scripted_fn(x) traced_fn = torch.jit.trace(fn, (torch.rand(4, 5),)) traced_fn(torch.rand(3, 4)) - Debugging this script with PDB works except for when we invoke the @script function. We can globally disable JIT, so that we can call the @script function as a normal python function and not compile it. If the above script is called - disable_jit_example.py, we can invoke it like so:- $ PYTORCH_JIT=0 python disable_jit_example.py - and we will be able to step into the @script function as a normal Python function. 
- Interpreting Graphs
- TorchScript uses a static single assignment (SSA) intermediate representation (IR) to represent computation. The instructions in this format consist of ATen (the C++ backend of PyTorch) operators and other primitive operators, including control flow operators for loops and conditionals. As an example: - @torch.jit.script def foo(len): # type: (int) -> torch.Tensor rv = torch.zeros(3, 4) for i in range(len): if i < 10: rv = rv - 1.0 else: rv = rv + 1.0 return rv print(foo.graph) - A ScriptModule with a single forward method will have an attribute graph, which you can use to inspect the IR representing the computation. If the ScriptModule has more than one method, you will need to access .graph on the method itself and not the module. We can inspect the graph of a method named bar on a ScriptModule by accessing .bar.graph. - The example script above produces the graph: - graph(%len : int) { %15 : int = prim::Constant[value=1]() %9 : bool = prim::Constant[value=1]() %7 : Device = prim::Constant[value="cpu"]() %6 : int = prim::Constant[value=0]() %5 : int = prim::Constant[value=6]() %1 : int = prim::Constant[value=3]() %2 : int = prim::Constant[value=4]() %11 : int = prim::Constant[value=10]() %14 : float = prim::Constant[value=1]() %4 : int[] = prim::ListConstruct(%1, %2) %rv.1 : Tensor = aten::zeros(%4, %5, %6, %7) %rv : Tensor = prim::Loop(%len, %9, %rv.1) block0(%i : int, %13 : Tensor) { %12 : bool = aten::lt(%i, %11) %rv.4 : Tensor = prim::If(%12) block0() { %rv.2 : Tensor = aten::sub(%13, %14, %15) -> (%rv.2) } block1() { %rv.3 : Tensor = aten::add(%13, %14, %15) -> (%rv.3) } -> (%9, %rv.4) } return (%rv); } - Take the instruction %rv.1 : Tensor = aten::zeros(%4, %5, %6, %7) for example. %rv.1 : Tensor means we assign the output to a (unique) value named rv.1, and that value is of Tensor type, i.e. we do not know its concrete shape. aten::zeros is the operator (equivalent to torch.zeros) and the input list (%4, %5, %6, %7) specifies which values in scope should be passed as inputs. The schema for built-in functions like aten::zeros can be found at Builtin Functions. - Notice that operators can also have associated blocks, namely the prim::Loop and prim::If operators. In the graph print-out, these operators are formatted to reflect their equivalent source code forms to facilitate easy debugging. - Graphs can be inspected as shown to confirm that the computation described by a ScriptModule is correct, in both automated and manual fashion, as described below.
- Tracing Edge Cases
- There are some edge cases that exist where the trace of a given Python function/module will not be representative of the underlying code. These cases can include: - Tracing of control flow that is dependent on inputs (e.g. tensor shapes) 
- Tracing of in-place operations of tensor views (e.g. indexing on the left-hand side of an assignment) 
 - Note that these cases may in fact be traceable in the future. 
- Automatic Trace Checking
- One way to automatically catch many errors in traces is by using check_inputs on the torch.jit.trace() API. check_inputs takes a list of tuples of inputs that will be used to re-trace the computation and verify the results. For example: - def loop_in_traced_fn(x): result = x[0] for i in range(x.size(0)): result = result * x[i] return result inputs = (torch.rand(3, 4, 5),) check_inputs = [(torch.rand(4, 5, 6),), (torch.rand(2, 3, 4),)] traced = torch.jit.trace(loop_in_traced_fn, inputs, check_inputs=check_inputs) - Gives us the following diagnostic information:
- ERROR: Graphs differed across invocations! Graph diff: - graph(%x : Tensor) { %1 : int = prim::Constant[value=0]() %2 : int = prim::Constant[value=0]() %result.1 : Tensor = aten::select(%x, %1, %2) %4 : int = prim::Constant[value=0]() %5 : int = prim::Constant[value=0]() %6 : Tensor = aten::select(%x, %4, %5) %result.2 : Tensor = aten::mul(%result.1, %6) %8 : int = prim::Constant[value=0]() %9 : int = prim::Constant[value=1]() %10 : Tensor = aten::select(%x, %8, %9) - %result : Tensor = aten::mul(%result.2, %10) + %result.3 : Tensor = aten::mul(%result.2, %10) ? ++ %12 : int = prim::Constant[value=0]() %13 : int = prim::Constant[value=2]() %14 : Tensor = aten::select(%x, %12, %13) + %result : Tensor = aten::mul(%result.3, %14) + %16 : int = prim::Constant[value=0]() + %17 : int = prim::Constant[value=3]() + %18 : Tensor = aten::select(%x, %16, %17) - %15 : Tensor = aten::mul(%result, %14) ? ^ ^ + %19 : Tensor = aten::mul(%result, %18) ? ^ ^ - return (%15); ? ^ + return (%19); ? ^ }
 - This message indicates to us that the computation differed between when we first traced it and when we traced it with the - check_inputs. Indeed, the loop within the body of- loop_in_traced_fndepends on the shape of the input- x, and thus when we try another- xwith a different shape, the trace differs.- In this case, data-dependent control flow like this can be captured using script instead: - def fn(x): result = x[0] for i in range(x.size(0)): result = result * x[i] return result inputs = (torch.rand(3, 4, 5),) check_inputs = [(torch.rand(4, 5, 6),), (torch.rand(2, 3, 4),)] scripted_fn = torch.jit.script(fn) print(scripted_fn.graph) for input_tuple in [inputs] + check_inputs: torch.testing.assert_allclose(fn(*input_tuple), scripted_fn(*input_tuple)) - Which produces: - graph(%x : Tensor) { %5 : bool = prim::Constant[value=1]() %1 : int = prim::Constant[value=0]() %result.1 : Tensor = aten::select(%x, %1, %1) %4 : int = aten::size(%x, %1) %result : Tensor = prim::Loop(%4, %5, %result.1) block0(%i : int, %7 : Tensor) { %10 : Tensor = aten::select(%x, %1, %i) %result.2 : Tensor = aten::mul(%7, %10) -> (%5, %result.2) } return (%result); } 
- Tracer Warnings
- The tracer produces warnings for several problematic patterns in traced computation. As an example, take a trace of a function that contains an in-place assignment on a slice (a view) of a Tensor: - def fill_row_zero(x): x[0] = torch.rand(*x.shape[1:2]) return x traced = torch.jit.trace(fill_row_zero, (torch.rand(3, 4),)) print(traced.graph) - Produces several warnings and a graph which simply returns the input: - fill_row_zero.py:4: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe. x[0] = torch.rand(*x.shape[1:2]) fill_row_zero.py:6: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error: Not within tolerance rtol=1e-05 atol=1e-05 at input[0, 1] (0.09115803241729736 vs. 0.6782537698745728) and 3 other locations (33.00%) traced = torch.jit.trace(fill_row_zero, (torch.rand(3, 4),)) graph(%0 : Float(3, 4)) { return (%0); }- We can fix this by modifying the code to not use the in-place update, but rather build up the result tensor out-of-place with torch.cat: - def fill_row_zero(x): x = torch.cat((torch.rand(1, *x.shape[1:2]), x[1:2]), dim=0) return x traced = torch.jit.trace(fill_row_zero, (torch.rand(3, 4),)) print(traced.graph) 
Builtin Functions¶
TorchScript supports a subset of the builtin tensor and neural network
functions that PyTorch provides. Most methods on Tensor as well as functions in
the torch namespace, all functions in torch.nn.functional and all
modules from torch.nn are supported in TorchScript, excluding those in the
table below. For unsupported modules, we suggest using torch.jit.trace().
Unsupported torch.nn Modules
torch.nn.modules.adaptive.AdaptiveLogSoftmaxWithLoss
torch.nn.modules.normalization.CrossMapLRN2d
torch.nn.modules.fold.Fold
torch.nn.modules.fold.Unfold
torch.nn.modules.rnn.GRU
torch.nn.modules.rnn.LSTM
torch.nn.modules.rnn.RNN
torch.nn.modules.rnn.GRUCell
torch.nn.modules.rnn.LSTMCell
torch.nn.modules.rnn.RNNCell
Supported Functions¶
torch.Size(sizes : List[int]) -> List[int]
torch.abs(self : Tensor) -> Tensor
torch.abs(self : Tensor,
          out : Tensor) -> Tensor
torch.abs_(self : Tensor) -> Tensor
torch.acos(self : Tensor,
           out : Tensor) -> Tensor
torch.acos(self : Tensor) -> Tensor
torch.acos_(self : Tensor) -> Tensor
torch.adaptive_avg_pool1d(self : Tensor,
                          output_size : List[int]) -> Tensor
torch.adaptive_max_pool1d(self : Tensor,
                          output_size : List[int]) -> Tuple[Tensor, Tensor]
torch.add(self : Tensor,
          other : Tensor,
          alpha : number=1) -> Tensor
torch.add(self : Tensor,
          other : number,
          alpha : number=1) -> Tensor
torch.add(self : Tensor,
          other : Tensor,
          alpha : number=1,
          out : Tensor) -> Tensor
torch.add(a : str,
          b : str) -> str
torch.add(a : List[int],
          b : List[int]) -> List[int]
torch.add(a : List[float],
          b : List[float]) -> List[float]
torch.add(a : List[bool],
          b : List[bool]) -> List[bool]
torch.add(a : List[Tensor],
          b : List[Tensor]) -> List[Tensor]
torch.add(a : List[t],
          b : List[t]) -> List[t]
torch.add(a : int,
          b : int) -> int
torch.add(a : float,
          b : float) -> float
torch.add(a : int,
          b : float) -> float
torch.add(a : float,
          b : int) -> float
torch.addbmm(self : Tensor,
             batch1 : Tensor,
             batch2 : Tensor,
             beta : number=1,
             alpha : number=1) -> Tensor
torch.addbmm(self : Tensor,
             batch1 : Tensor,
             batch2 : Tensor,
             beta : number=1,
             alpha : number=1,
             out : Tensor) -> Tensor
torch.addcdiv(self : Tensor,
              tensor1 : Tensor,
              tensor2 : Tensor,
              value : number=1,
              out : Tensor) -> Tensor
torch.addcdiv(self : Tensor,
              tensor1 : Tensor,
              tensor2 : Tensor,
              value : number=1) -> Tensor
torch.addcmul(self : Tensor,
              tensor1 : Tensor,
              tensor2 : Tensor,
              value : number=1) -> Tensor
torch.addcmul(self : Tensor,
              tensor1 : Tensor,
              tensor2 : Tensor,
              value : number=1,
              out : Tensor) -> Tensor
torch.addmm(self : Tensor,
            mat1 : Tensor,
            mat2 : Tensor,
            beta : number=1,
            alpha : number=1,
            out : Tensor) -> Tensor
torch.addmm(self : Tensor,
            mat1 : Tensor,
            mat2 : Tensor,
            beta : number=1,
            alpha : number=1) -> Tensor
torch.addmv(self : Tensor,
            mat : Tensor,
            vec : Tensor,
            beta : number=1,
            alpha : number=1,
            out : Tensor) -> Tensor
torch.addmv(self : Tensor,
            mat : Tensor,
            vec : Tensor,
            beta : number=1,
            alpha : number=1) -> Tensor
torch.addmv_(self : Tensor,
             mat : Tensor,
             vec : Tensor,
             beta : number=1,
             alpha : number=1) -> Tensor
torch.addr(self : Tensor,
           vec1 : Tensor,
           vec2 : Tensor,
           beta : number=1,
           alpha : number=1) -> Tensor
torch.addr(self : Tensor,
           vec1 : Tensor,
           vec2 : Tensor,
           beta : number=1,
           alpha : number=1,
           out : Tensor) -> Tensor
torch.affine_grid_generator(theta : Tensor,
                            size : List[int]) -> Tensor
torch.all(self : Tensor) -> Tensor
torch.all(self : Tensor,
          dim : int,
          keepdim : bool=False) -> Tensor
torch.all(self : Tensor,
          dim : int,
          keepdim : bool=False,
          out : Tensor) -> Tensor
torch.allclose(self : Tensor,
               other : Tensor,
               rtol : float=1e-05,
               atol : float=1e-08,
               equal_nan : bool=False) -> bool
torch.alpha_dropout(input : Tensor,
                    p : float,
                    train : bool) -> Tensor
torch.alpha_dropout_(self : Tensor,
                     p : float,
                     train : bool) -> Tensor
torch.any(self : Tensor) -> Tensor
torch.any(self : Tensor,
          dim : int,
          keepdim : bool=False) -> Tensor
torch.any(self : Tensor,
          dim : int,
          keepdim : bool=False,
          out : Tensor) -> Tensor
torch.arange(end : number,
             dtype : Optional[int],
             layout : Optional[int],
             device : Optional[Device]) -> Tensor
torch.arange(start : number,
             end : number,
             dtype : Optional[int],
             layout : Optional[int],
             device : Optional[Device]) -> Tensor
torch.arange(start : number,
             end : number,
             step : number,
             dtype : Optional[int],
             layout : Optional[int],
             device : Optional[Device]) -> Tensor
torch.arange(end : number,
             out : Tensor) -> Tensor
torch.arange(start : number,
             end : number,
             step : number=1,
             out : Tensor) -> Tensor
torch.argmax(self : Tensor) -> Tensor
torch.argmax(self : Tensor,
             dim : int,
             keepdim : bool=False) -> Tensor
torch.argmin(self : Tensor) -> Tensor
torch.argmin(self : Tensor,
             dim : int,
             keepdim : bool=False) -> Tensor
torch.argsort(self : Tensor,
              dim : int=-1,
              descending : bool=False) -> Tensor
torch.as_strided(self : Tensor,
                 size : List[int],
                 stride : List[int],
                 storage_offset : Optional[int]) -> Tensor
torch.as_strided_(self : Tensor,
                  size : List[int],
                  stride : List[int],
                  storage_offset : Optional[int]) -> Tensor
torch.asin(self : Tensor) -> Tensor
torch.asin(self : Tensor,
           out : Tensor) -> Tensor
torch.asin_(self : Tensor) -> Tensor
torch.atan(self : Tensor) -> Tensor
torch.atan(self : Tensor,
           out : Tensor) -> Tensor
torch.atan2(self : Tensor,
            other : Tensor,
            out : Tensor) -> Tensor
torch.atan2(self : Tensor,
            other : Tensor) -> Tensor
torch.atan_(self : Tensor) -> Tensor
torch.avg_pool1d(self : Tensor,
                 kernel_size : List[int],
                 stride : List[int]=[],
                 padding : List[int]=[0],
                 ceil_mode : bool=False,
                 count_include_pad : bool=True) -> Tensor
torch.baddbmm(self : Tensor,
              batch1 : Tensor,
              batch2 : Tensor,
              beta : number=1,
              alpha : number=1) -> Tensor
torch.baddbmm(self : Tensor,
              batch1 : Tensor,
              batch2 : Tensor,
              beta : number=1,
              alpha : number=1,
              out : Tensor) -> Tensor
torch.bartlett_window(window_length : int,
                      dtype : Optional[int],
                      layout : Optional[int],
                      device : Optional[Device]) -> Tensor
torch.bartlett_window(window_length : int,
                      periodic : bool,
                      dtype : Optional[int],
                      layout : Optional[int],
                      device : Optional[Device]) -> Tensor
torch.batch_norm(input : Tensor,
                 weight : Optional[Tensor],
                 bias : Optional[Tensor],
                 running_mean : Optional[Tensor],
                 running_var : Optional[Tensor],
                 training : bool,
                 momentum : float,
                 eps : float,
                 cudnn_enabled : bool) -> Tensor
torch.batch_norm_backward_elemt(grad_out : Tensor,
                                input : Tensor,
                                mean : Tensor,
                                invstd : Tensor,
                                weight : Optional[Tensor],
                                mean_dy : Tensor,
                                mean_dy_xmu : Tensor) -> Tensor
torch.batch_norm_backward_reduce(grad_out : Tensor,
                                 input : Tensor,
                                 mean : Tensor,
                                 invstd : Tensor,
                                 input_g : bool,
                                 weight_g : bool,
                                 bias_g : bool) -> Tuple[Tensor, Tensor, Tensor, Tensor]
torch.batch_norm_elemt(input : Tensor,
                       weight : Optional[Tensor],
                       bias : Optional[Tensor],
                       mean : Tensor,
                       invstd : Tensor,
                       eps : float) -> Tensor
torch.batch_norm_gather_stats(input : Tensor,
                              mean : Tensor,
                              invstd : Tensor,
                              running_mean : Optional[Tensor],
                              running_var : Optional[Tensor],
                              momentum : float,
                              eps : float,
                              count : int) -> Tuple[Tensor, Tensor]
torch.batch_norm_stats(input : Tensor,
                       eps : float) -> Tuple[Tensor, Tensor]
torch.batch_norm_update_stats(input : Tensor,
                              running_mean : Optional[Tensor],
                              running_var : Optional[Tensor],
                              momentum : float) -> Tuple[Tensor, Tensor]
torch.bernoulli(self : Tensor,
                generator : Optional[Generator]) -> Tensor
torch.bernoulli(self : Tensor,
                p : float,
                generator : Optional[Generator]) -> Tensor
torch.bernoulli(self : Tensor,
                generator : Optional[Generator],
                out : Tensor) -> Tensor
torch.bilinear(input1 : Tensor,
               input2 : Tensor,
               weight : Tensor,
               bias : Optional[Tensor]) -> Tensor
torch.binary_cross_entropy_with_logits(self : Tensor,
                                       target : Tensor,
                                       weight : Optional[Tensor],
                                       pos_weight : Optional[Tensor],
                                       reduction : int) -> Tensor
torch.bincount(self : Tensor,
               weights : Optional[Tensor],
               minlength : int=0) -> Tensor
torch.blackman_window(window_length : int,
                      dtype : Optional[int],
                      layout : Optional[int],
                      device : Optional[Device]) -> Tensor
torch.blackman_window(window_length : int,
                      periodic : bool,
                      dtype : Optional[int],
                      layout : Optional[int],
                      device : Optional[Device]) -> Tensor
torch.bmm(self : Tensor,
          mat2 : Tensor) -> Tensor
torch.bmm(self : Tensor,
          mat2 : Tensor,
          out : Tensor) -> Tensor
torch.broadcast_tensors(tensors : List[Tensor]) -> List[Tensor]
torch.btrifact(self : Tensor,
               pivot : bool=True) -> Tuple[Tensor, Tensor]
torch.btrifact_with_info(self : Tensor,
                         pivot : bool=True) -> Tuple[Tensor, Tensor, Tensor]
torch.btrisolve(self : Tensor,
                LU_data : Tensor,
                LU_pivots : Tensor,
                out : Tensor) -> Tensor
torch.btrisolve(self : Tensor,
                LU_data : Tensor,
                LU_pivots : Tensor) -> Tensor
torch.cartesian_prod(tensors : List[Tensor]) -> Tensor
torch.cat(tensors : List[Tensor],
          dim : int=0) -> Tensor
torch.cat(tensors : List[Tensor],
          dim : int=0,
          out : Tensor) -> Tensor
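cat concatenates along an existing dimension, so the inputs must match in every other dimension (a minimal shape sketch):

>>> x = torch.ones(2, 3)
>>> torch.cat([x, x], dim=0).shape
torch.Size([4, 3])
>>> torch.cat([x, x], dim=1).shape
torch.Size([2, 6])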
torch.cdist(x1 : Tensor,
            x2 : Tensor,
            p : float=2.0) -> Tensor
torch.ceil(self : Tensor,
           out : Tensor) -> Tensor
torch.ceil(self : Tensor) -> Tensor
torch.ceil_(self : Tensor) -> Tensor
torch.celu(self : Tensor,
           alpha : number=1.0) -> Tensor
torch.celu_(self : Tensor,
            alpha : number=1.0) -> Tensor
torch.chain_matmul(matrices : List[Tensor]) -> Tensor
torch.cholesky(self : Tensor,
               upper : bool=False,
               out : Tensor) -> Tensor
torch.cholesky(self : Tensor,
               upper : bool=False) -> Tensor
torch.cholesky_solve(self : Tensor,
                     input2 : Tensor,
                     upper : bool=False,
                     out : Tensor) -> Tensor
torch.cholesky_solve(self : Tensor,
                     input2 : Tensor,
                     upper : bool=False) -> Tensor
torch.chunk(self : Tensor,
            chunks : int,
            dim : int=0) -> List[Tensor]
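chunk splits a tensor into the requested number of pieces along dim; note the schema above types the result as List[Tensor], while the eager API returns a tuple of views (hedged sketch):

>>> [c.tolist() for c in torch.chunk(torch.arange(6), 3)]
[[0, 1], [2, 3], [4, 5]]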
torch.clamp(self : Tensor,
            min : Optional[number],
            max : Optional[number]) -> Tensor
torch.clamp(self : Tensor,
            min : Optional[number],
            max : Optional[number],
            out : Tensor) -> Tensor
torch.clamp_(self : Tensor,
             min : Optional[number],
             max : Optional[number]) -> Tensor
torch.clamp_max(self : Tensor,
                max : number) -> Tensor
torch.clamp_max(self : Tensor,
                max : number,
                out : Tensor) -> Tensor
torch.clamp_max_(self : Tensor,
                 max : number) -> Tensor
torch.clamp_min(self : Tensor,
                min : number,
                out : Tensor) -> Tensor
torch.clamp_min(self : Tensor,
                min : number) -> Tensor
torch.clamp_min_(self : Tensor,
                 min : number) -> Tensor
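The clamp family caps values element-wise: clamp applies optional lower and upper bounds, clamp_min and clamp_max apply a single bound, and the trailing-underscore variants work in place (printed precision may vary with the active print options):

>>> torch.clamp(torch.tensor([-1.0, 0.5, 2.0]), min=0.0, max=1.0)
tensor([0.0000, 0.5000, 1.0000])
>>> torch.clamp_min(torch.tensor([-1.0, 0.5, 2.0]), 0.0)
tensor([0.0000, 0.5000, 2.0000])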
torch.clone(self : Tensor) -> Tensor
torch.combinations(self : Tensor,
                   r : int=2,
                   with_replacement : bool=False) -> Tensor
torch.constant_pad_nd(self : Tensor,
                      pad : List[int],
                      value : number=0) -> Tensor
torch.conv1d(input : Tensor,
             weight : Tensor,
             bias : Optional[Tensor],
             stride : List[int]=[1],
             padding : List[int]=[0],
             dilation : List[int]=[1],
             groups : int=1) -> Tensor
torch.conv2d(input : Tensor,
             weight : Tensor,
             bias : Optional[Tensor],
             stride : List[int]=[1, 1],
             padding : List[int]=[0, 0],
             dilation : List[int]=[1, 1],
             groups : int=1) -> Tensor
torch.conv3d(input : Tensor,
             weight : Tensor,
             bias : Optional[Tensor],
             stride : List[int]=[1, 1, 1],
             padding : List[int]=[0, 0, 0],
             dilation : List[int]=[1, 1, 1],
             groups : int=1) -> Tensor
torch.conv_tbc(self : Tensor,
               weight : Tensor,
               bias : Tensor,
               pad : int=0) -> Tensor
torch.conv_transpose1d(input : Tensor,
                       weight : Tensor,
                       bias : Optional[Tensor],
                       stride : List[int]=[1],
                       padding : List[int]=[0],
                       output_padding : List[int]=[0],
                       groups : int=1,
                       dilation : List[int]=[1]) -> Tensor
torch.conv_transpose2d(input : Tensor,
                       weight : Tensor,
                       bias : Optional[Tensor],
                       stride : List[int]=[1, 1],
                       padding : List[int]=[0, 0],
                       output_padding : List[int]=[0, 0],
                       groups : int=1,
                       dilation : List[int]=[1, 1]) -> Tensor
torch.conv_transpose3d(input : Tensor,
                       weight : Tensor,
                       bias : Optional[Tensor],
                       stride : List[int]=[1, 1, 1],
                       padding : List[int]=[0, 0, 0],
                       output_padding : List[int]=[0, 0, 0],
                       groups : int=1,
                       dilation : List[int]=[1, 1, 1]) -> Tensor
torch.convolution(input : Tensor,
                  weight : Tensor,
                  bias : Optional[Tensor],
                  stride : List[int],
                  padding : List[int],
                  dilation : List[int],
                  transposed : bool,
                  output_padding : List[int],
                  groups : int) -> Tensor
torch.cos(self : Tensor) -> Tensor
torch.cos(self : Tensor,
          out : Tensor) -> Tensor
torch.cos_(self : Tensor) -> Tensor
torch.cosh(self : Tensor) -> Tensor
torch.cosh(self : Tensor,
           out : Tensor) -> Tensor
torch.cosh_(self : Tensor) -> Tensor
torch.cosine_embedding_loss(input1 : Tensor,
                            input2 : Tensor,
                            target : Tensor,
                            margin : float=0.0,
                            reduction : int=1) -> Tensor
torch.cosine_similarity(x1 : Tensor,
                        x2 : Tensor,
                        dim : int=1,
                        eps : float=1e-08) -> Tensor
torch.cross(self : Tensor,
            other : Tensor,
            dim : int=-1,
            out : Tensor) -> Tensor
torch.cross(self : Tensor,
            other : Tensor,
            dim : int=-1) -> Tensor
torch.ctc_loss(log_probs : Tensor,
               targets : Tensor,
               input_lengths : Tensor,
               target_lengths : Tensor,
               blank : int=0,
               reduction : int=1,
               zero_infinity : bool=False) -> Tensor
torch.ctc_loss(log_probs : Tensor,
               targets : Tensor,
               input_lengths : List[int],
               target_lengths : List[int],
               blank : int=0,
               reduction : int=1,
               zero_infinity : bool=False) -> Tensor
torch.cudnn_affine_grid_generator(theta : Tensor,
                                  N : int,
                                  C : int,
                                  H : int,
                                  W : int) -> Tensor
torch.cudnn_batch_norm(input : Tensor,
                       weight : Tensor,
                       bias : Optional[Tensor],
                       running_mean : Optional[Tensor],
                       running_var : Optional[Tensor],
                       training : bool,
                       exponential_average_factor : float,
                       epsilon : float) -> Tuple[Tensor, Tensor, Tensor]
torch.cudnn_convolution(self : Tensor,
                        weight : Tensor,
                        bias : Optional[Tensor],
                        padding : List[int],
                        stride : List[int],
                        dilation : List[int],
                        groups : int,
                        benchmark : bool,
                        deterministic : bool) -> Tensor
torch.cudnn_convolution_transpose(self : Tensor,
                                  weight : Tensor,
                                  bias : Optional[Tensor],
                                  padding : List[int],
                                  output_padding : List[int],
                                  stride : List[int],
                                  dilation : List[int],
                                  groups : int,
                                  benchmark : bool,
                                  deterministic : bool) -> Tensor
torch.cudnn_grid_sampler(self : Tensor,
                         grid : Tensor) -> Tensor
torch.cudnn_is_acceptable(self : Tensor) -> bool
torch.cumprod(self : Tensor,
              dim : int) -> Tensor
torch.cumprod(self : Tensor,
              dim : int,
              dtype : int) -> Tensor
torch.cumprod(self : Tensor,
              dim : int,
              out : Tensor) -> Tensor
torch.cumprod(self : Tensor,
              dim : int,
              dtype : int,
              out : Tensor) -> Tensor
torch.cumsum(self : Tensor,
             dim : int) -> Tensor
torch.cumsum(self : Tensor,
             dim : int,
             dtype : int) -> Tensor
torch.cumsum(self : Tensor,
             dim : int,
             out : Tensor) -> Tensor
torch.cumsum(self : Tensor,
             dim : int,
             dtype : int,
             out : Tensor) -> Tensor
torch.det(self : Tensor) -> Tensor
torch.detach(self : Tensor) -> Tensor
torch.detach_(self : Tensor) -> Tensor
torch.device(a : str) -> Device
torch.diag(self : Tensor,
           diagonal : int=0) -> Tensor
torch.diag(self : Tensor,
           diagonal : int=0,
           out : Tensor) -> Tensor
torch.diag_embed(self : Tensor,
                 offset : int=0,
                 dim1 : int=-2,
                 dim2 : int=-1) -> Tensor
torch.diagflat(self : Tensor,
               offset : int=0) -> Tensor
torch.diagonal(self : Tensor,
               offset : int=0,
               dim1 : int=0,
               dim2 : int=1) -> Tensor
torch.digamma(self : Tensor) -> Tensor
torch.digamma(self : Tensor,
              out : Tensor) -> Tensor
torch.dist(self : Tensor,
           other : Tensor,
           p : number=2) -> Tensor
torch.div(self : Tensor,
          other : Tensor,
          out : Tensor) -> Tensor
torch.div(self : Tensor,
          other : Tensor) -> Tensor
torch.div(self : Tensor,
          other : number) -> Tensor
torch.div(a : int,
          b : int) -> float
torch.div(a : float,
          b : float) -> float
torch.dot(self : Tensor,
          tensor : Tensor) -> Tensor
torch.dot(self : Tensor,
          tensor : Tensor,
          out : Tensor) -> Tensor
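dot takes two 1-D tensors and returns their inner product as a 0-dimensional tensor:

>>> torch.dot(torch.tensor([2., 3.]), torch.tensor([2., 1.]))
tensor(7.)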
torch.dropout(input : Tensor,
              p : float,
              train : bool) -> Tensor
torch.dropout_(self : Tensor,
               p : float,
               train : bool) -> Tensor
torch.eig(self : Tensor,
          eigenvectors : bool=False) -> Tuple[Tensor, Tensor]
torch.einsum(equation : str,
             tensors : List[Tensor]) -> Tensor
torch.embedding(weight : Tensor,
                indices : Tensor,
                padding_idx : int=-1,
                scale_grad_by_freq : bool=False,
                sparse : bool=False) -> Tensor
torch.embedding_bag(weight : Tensor,
                    indices : Tensor,
                    offsets : Tensor,
                    scale_grad_by_freq : bool=False,
                    mode : int=0,
                    sparse : bool=False) -> Tuple[Tensor, Tensor, Tensor, Tensor]
torch.embedding_renorm_(self : Tensor,
                        indices : Tensor,
                        max_norm : float,
                        norm_type : float) -> Tensor
torch.empty(size : List[int],
            dtype : Optional[int],
            layout : Optional[int],
            device : Optional[Device]) -> Tensor
torch.empty(size : List[int],
            out : Tensor) -> Tensor
torch.empty_like(self : Tensor) -> Tensor
torch.empty_like(self : Tensor,
                 dtype : int,
                 layout : int,
                 device : Device) -> Tensor
torch.empty_strided(size : List[int],
                    stride : List[int],
                    dtype : Optional[int],
                    layout : Optional[int],
                    device : Optional[Device]) -> Tensor
torch.eq(self : Tensor,
         other : Tensor) -> Tensor
torch.eq(self : Tensor,
         other : number) -> Tensor
torch.eq(self : Tensor,
         other : Tensor,
         out : Tensor) -> Tensor
torch.eq(self : Tensor,
         other : number,
         out : Tensor) -> Tensor
torch.eq(a : Device,
         b : Device) -> bool
torch.eq(a : str,
         b : str) -> bool
torch.eq(a : List[int],
         b : List[int]) -> bool
torch.eq(a : List[float],
         b : List[float]) -> bool
torch.eq(a : List[Tensor],
         b : List[Tensor]) -> bool
torch.eq(a : List[bool],
         b : List[bool]) -> bool
torch.eq(a : int,
         b : int) -> bool
torch.eq(a : float,
         b : float) -> bool
torch.eq(a : int,
         b : float) -> bool
torch.eq(a : float,
         b : int) -> bool
torch.equal(self : Tensor,
            other : Tensor) -> bool
torch.erf(self : Tensor,
          out : Tensor) -> Tensor
torch.erf(self : Tensor) -> Tensor
torch.erf_(self : Tensor) -> Tensor
torch.erfc(self : Tensor,
           out : Tensor) -> Tensor
torch.erfc(self : Tensor) -> Tensor
torch.erfc_(self : Tensor) -> Tensor
torch.erfinv(self : Tensor,
             out : Tensor) -> Tensor
torch.erfinv(self : Tensor) -> Tensor
torch.exp(self : Tensor) -> Tensor
torch.exp(self : Tensor,
          out : Tensor) -> Tensor
torch.exp_(self : Tensor) -> Tensor
torch.expm1(self : Tensor,
            out : Tensor) -> Tensor
torch.expm1(self : Tensor) -> Tensor
torch.expm1_(self : Tensor) -> Tensor
torch.eye(n : int,
          out : Tensor) -> Tensor
torch.eye(n : int,
          m : int,
          out : Tensor) -> Tensor
torch.eye(n : int,
          dtype : Optional[int],
          layout : Optional[int],
          device : Optional[Device]) -> Tensor
torch.eye(n : int,
          m : int,
          dtype : Optional[int],
          layout : Optional[int],
          device : Optional[Device]) -> Tensor
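eye builds an identity matrix; passing m as well gives an n-by-m matrix with ones on the main diagonal:

>>> torch.eye(3)
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])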
torch.fbgemm_is_cpu_supported() -> bool
torch.fbgemm_linear_int8_weight(input : Tensor,
                                weight : Tensor,
                                packed : Tensor,
                                col_offsets : Tensor,
                                weight_scale : number,
                                weight_zero_point : number,
                                bias : Tensor) -> Tensor
torch.fbgemm_linear_quantize_weight(input : Tensor) -> Tuple[Tensor, Tensor, float, int]
torch.fbgemm_pack_quantized_matrix(input : Tensor,
                                   K : int,
                                   N : int) -> Tensor
torch.feature_alpha_dropout(input : Tensor,
                            p : float,
                            train : bool) -> Tensor
torch.feature_alpha_dropout_(self : Tensor,
                             p : float,
                             train : bool) -> Tensor
torch.feature_dropout(input : Tensor,
                      p : float,
                      train : bool) -> Tensor
torch.feature_dropout_(self : Tensor,
                       p : float,
                       train : bool) -> Tensor
torch.fft(self : Tensor,
          signal_ndim : int,
          normalized : bool=False) -> Tensor
torch.fill_(self : Tensor,
            value : Tensor) -> Tensor
torch.fill_(self : Tensor,
            value : number) -> Tensor
torch.flatten(self : Tensor,
              start_dim : int=0,
              end_dim : int=-1) -> Tensor
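flatten collapses the dimensions from start_dim through end_dim into one; start_dim=1 is the common pattern for flattening per-sample features while keeping the batch dimension (shape sketch):

>>> t = torch.ones(2, 3, 4)
>>> torch.flatten(t).shape
torch.Size([24])
>>> torch.flatten(t, start_dim=1).shape
torch.Size([2, 12])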
torch.flip(self : Tensor,
           dims : List[int]) -> Tensor
torch.floor(self : Tensor) -> Tensor
torch.floor(self : Tensor,
            out : Tensor) -> Tensor
torch.floor(a : float) -> int
torch.floor_(self : Tensor) -> Tensor
torch.fmod(self : Tensor,
           other : Tensor,
           out : Tensor) -> Tensor
torch.fmod(self : Tensor,
           other : number,
           out : Tensor) -> Tensor
torch.fmod(self : Tensor,
           other : Tensor) -> Tensor
torch.fmod(self : Tensor,
           other : number) -> Tensor
torch.frac(self : Tensor) -> Tensor
torch.frac(self : Tensor,
           out : Tensor) -> Tensor
torch.frobenius_norm(self : Tensor) -> Tensor
torch.frobenius_norm(self : Tensor,
                     dim : List[int],
                     keepdim : bool=False) -> Tensor
torch.frobenius_norm(self : Tensor,
                     dim : List[int],
                     keepdim : bool=False,
                     out : Tensor) -> Tensor
torch.full(size : List[int],
           fill_value : number,
           dtype : Optional[int],
           layout : Optional[int],
           device : Optional[Device]) -> Tensor
torch.full(size : List[int],
           fill_value : number,
           out : Tensor) -> Tensor
torch.full_like(self : Tensor,
                fill_value : number) -> Tensor
torch.full_like(self : Tensor,
                fill_value : number,
                dtype : int,
                layout : int,
                device : Device) -> Tensor
torch.gather(self : Tensor,
             dim : int,
             index : Tensor,
             sparse_grad : bool=False,
             out : Tensor) -> Tensor
torch.gather(self : Tensor,
             dim : int,
             index : Tensor,
             sparse_grad : bool=False) -> Tensor
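gather picks values along dim using an index tensor of the same rank, so for dim=1 the result satisfies out[i][j] = self[i][index[i][j]] (illustrative values):

>>> t = torch.tensor([[1, 2], [3, 4]])
>>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]]))
tensor([[1, 1],
        [4, 3]])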
torch.ge(self : Tensor,
         other : Tensor) -> Tensor
torch.ge(self : Tensor,
         other : number) -> Tensor
torch.ge(self : Tensor,
         other : Tensor,
         out : Tensor) -> Tensor
torch.ge(self : Tensor,
         other : number,
         out : Tensor) -> Tensor
torch.ge(a : int,
         b : int) -> bool
torch.ge(a : float,
         b : float) -> bool
torch.ge(a : int,
         b : float) -> bool
torch.ge(a : float,
         b : int) -> bool
torch.gels(self : Tensor,
           A : Tensor) -> Tuple[Tensor, Tensor]
torch.geqrf(self : Tensor) -> Tuple[Tensor, Tensor]
torch.ger(self : Tensor,
          vec2 : Tensor) -> Tensor
torch.ger(self : Tensor,
          vec2 : Tensor,
          out : Tensor) -> Tensor
torch.get_device(self : Tensor) -> int
torch.grid_sampler(input : Tensor,
                   grid : Tensor,
                   interpolation_mode : int,
                   padding_mode : int) -> Tensor
torch.grid_sampler_2d(input : Tensor,
                      grid : Tensor,
                      interpolation_mode : int,
                      padding_mode : int) -> Tensor
torch.grid_sampler_3d(input : Tensor,
                      grid : Tensor,
                      interpolation_mode : int,
                      padding_mode : int) -> Tensor
torch.group_norm(input : Tensor,
                 num_groups : int,
                 weight : Optional[Tensor],
                 bias : Optional[Tensor],
                 eps : float=1e-05,
                 cudnn_enabled : bool=True) -> Tensor
torch.gru(data : Tensor,
          batch_sizes : Tensor,
          hx : Tensor,
          params : List[Tensor],
          has_biases : bool,
          num_layers : int,
          dropout : float,
          train : bool,
          bidirectional : bool) -> Tuple[Tensor, Tensor]
torch.gru(input : Tensor,
          hx : Tensor,
          params : List[Tensor],
          has_biases : bool,
          num_layers : int,
          dropout : float,
          train : bool,
          bidirectional : bool,
          batch_first : bool) -> Tuple[Tensor, Tensor]
torch.gru_cell(input : Tensor,
               hx : Tensor,
               w_ih : Tensor,
               w_hh : Tensor,
               b_ih : Optional[Tensor],
               b_hh : Optional[Tensor]) -> Tensor
torch.gt(self : Tensor,
         other : Tensor) -> Tensor
torch.gt(self : Tensor,
         other : number) -> Tensor
torch.gt(self : Tensor,
         other : Tensor,
         out : Tensor) -> Tensor
torch.gt(self : Tensor,
         other : number,
         out : Tensor) -> Tensor
torch.gt(a : int,
         b : int) -> bool
torch.gt(a : float,
         b : float) -> bool
torch.gt(a : int,
         b : float) -> bool
torch.gt(a : float,
         b : int) -> bool
torch.hamming_window(window_length : int,
                     dtype : Optional[int],
                     layout : Optional[int],
                     device : Optional[Device]) -> Tensor
torch.hamming_window(window_length : int,
                     periodic : bool,
                     dtype : Optional[int],
                     layout : Optional[int],
                     device : Optional[Device]) -> Tensor
torch.hamming_window(window_length : int,
                     periodic : bool,
                     alpha : float,
                     dtype : Optional[int],
                     layout : Optional[int],
                     device : Optional[Device]) -> Tensor
torch.hamming_window(window_length : int,
                     periodic : bool,
                     alpha : float,
                     beta : float,
                     dtype : Optional[int],
                     layout : Optional[int],
                     device : Optional[Device]) -> Tensor
torch.hann_window(window_length : int,
                  dtype : Optional[int],
                  layout : Optional[int],
                  device : Optional[Device]) -> Tensor
torch.hann_window(window_length : int,
                  periodic : bool,
                  dtype : Optional[int],
                  layout : Optional[int],
                  device : Optional[Device]) -> Tensor
torch.hardshrink(self : Tensor,
                 lambd : number=0.5) -> Tensor
torch.hinge_embedding_loss(self : Tensor,
                           target : Tensor,
                           margin : float=1.0,
                           reduction : int=1) -> Tensor
torch.histc(self : Tensor,
            bins : int=100,
            min : number=0,
            max : number=0,
            out : Tensor) -> Tensor
torch.histc(self : Tensor,
            bins : int=100,
            min : number=0,
            max : number=0) -> Tensor
torch.hspmm(mat1 : Tensor,
            mat2 : Tensor) -> Tensor
torch.hspmm(mat1 : Tensor,
            mat2 : Tensor,
            out : Tensor) -> Tensor
torch.ifft(self : Tensor,
           signal_ndim : int,
           normalized : bool=False) -> Tensor
torch.index_add(self : Tensor,
                dim : int,
                index : Tensor,
                source : Tensor) -> Tensor
torch.index_copy(self : Tensor,
                 dim : int,
                 index : Tensor,
                 source : Tensor) -> Tensor
torch.index_fill(self : Tensor,
                 dim : int,
                 index : Tensor,
                 value : Tensor) -> Tensor
torch.index_fill(self : Tensor,
                 dim : int,
                 index : Tensor,
                 value : number) -> Tensor
torch.index_put(self : Tensor,
                indices : List[Optional[Tensor]],
                values : Tensor,
                accumulate : bool=False) -> Tensor
torch.index_put(self : Tensor,
                indices : List[Tensor],
                values : Tensor,
                accumulate : bool=False) -> Tensor
torch.index_put_(self : Tensor,
                 indices : List[Optional[Tensor]],
                 values : Tensor,
                 accumulate : bool=False) -> Tensor
torch.index_put_(self : Tensor,
                 indices : List[Tensor],
                 values : Tensor,
                 accumulate : bool=False) -> Tensor
torch.index_select(self : Tensor,
                   dim : int,
                   index : Tensor,
                   out : Tensor) -> Tensor
torch.index_select(self : Tensor,
                   dim : int,
                   index : Tensor) -> Tensor
torch.instance_norm(input : Tensor,
                    weight : Optional[Tensor],
                    bias : Optional[Tensor],
                    running_mean : Optional[Tensor],
                    running_var : Optional[Tensor],
                    use_input_stats : bool,
                    momentum : float,
                    eps : float,
                    cudnn_enabled : bool) -> Tensor
torch.inverse(self : Tensor,
              out : Tensor) -> Tensor
torch.inverse(self : Tensor) -> Tensor
torch.irfft(self : Tensor,
            signal_ndim : int,
            normalized : bool=False,
            onesided : bool=True,
            signal_sizes : List[int]=[]) -> Tensor
torch.is_complex(self : Tensor) -> bool
torch.is_distributed(self : Tensor) -> bool
torch.is_floating_point(self : Tensor) -> bool
torch.is_nonzero(self : Tensor) -> bool
torch.is_same_size(self : Tensor,
                   other : Tensor) -> bool
torch.is_signed(self : Tensor) -> bool
torch.isclose(self : Tensor,
              other : Tensor,
              rtol : float=1e-05,
              atol : float=1e-08,
              equal_nan : bool=False) -> Tensor
torch.isnan(self : Tensor) -> Tensor
torch.kl_div(self : Tensor,
             target : Tensor,
             reduction : int=1) -> Tensor
torch.kthvalue(self : Tensor,
               k : int,
               dim : int=-1,
               keepdim : bool=False) -> Tuple[Tensor, Tensor]
torch.layer_norm(input : Tensor,
                 normalized_shape : List[int],
                 weight : Optional[Tensor],
                 bias : Optional[Tensor],
                 eps : float=1e-05,
                 cudnn_enable : bool=True) -> Tensor
torch.le(self : Tensor,
         other : Tensor,
         out : Tensor) -> Tensor
torch.le(self : Tensor,
         other : number,
         out : Tensor) -> Tensor
torch.le(self : Tensor,
         other : Tensor) -> Tensor
torch.le(self : Tensor,
         other : number) -> Tensor
torch.le(a : int,
         b : int) -> bool
torch.le(a : float,
         b : float) -> bool
torch.le(a : int,
         b : float) -> bool
torch.le(a : float,
         b : int) -> bool
torch.lerp(self : Tensor,
           end : Tensor,
           weight : Tensor) -> Tensor
torch.lerp(self : Tensor,
           end : Tensor,
           weight : number) -> Tensor
torch.lerp(self : Tensor,
           end : Tensor,
           weight : Tensor,
           out : Tensor) -> Tensor
torch.lerp(self : Tensor,
           end : Tensor,
           weight : number,
           out : Tensor) -> Tensor
torch.lgamma(self : Tensor,
             out : Tensor) -> Tensor
torch.lgamma(self : Tensor) -> Tensor
torch.linspace(start : number,
               end : number,
               steps : int=100,
               dtype : Optional[int],
               layout : Optional[int],
               device : Optional[Device]) -> Tensor
torch.linspace(start : number,
               end : number,
               steps : int=100,
               out : Tensor) -> Tensor
torch.log(self : Tensor) -> Tensor
torch.log(self : Tensor,
          out : Tensor) -> Tensor
torch.log10(self : Tensor,
            out : Tensor) -> Tensor
torch.log10(self : Tensor) -> Tensor
torch.log10_(self : Tensor) -> Tensor
torch.log1p(self : Tensor) -> Tensor
torch.log1p(self : Tensor,
            out : Tensor) -> Tensor
torch.log1p_(self : Tensor) -> Tensor
torch.log2(self : Tensor) -> Tensor
torch.log2(self : Tensor,
           out : Tensor) -> Tensor
torch.log2_(self : Tensor) -> Tensor
torch.log_(self : Tensor) -> Tensor
torch.log_softmax(self : Tensor,
                  dim : int) -> Tensor
torch.log_softmax(self : Tensor,
                  dim : int,
                  dtype : int) -> Tensor
torch.logdet(self : Tensor) -> Tensor
torch.logspace(start : number,
               end : number,
               steps : int=100,
               dtype : Optional[int],
               layout : Optional[int],
               device : Optional[Device]) -> Tensor
torch.logspace(start : number,
               end : number,
               steps : int=100,
               out : Tensor) -> Tensor
torch.logsumexp(self : Tensor,
                dim : List[int],
                keepdim : bool=False) -> Tensor
torch.logsumexp(self : Tensor,
                dim : List[int],
                keepdim : bool=False,
                out : Tensor) -> Tensor
torch.lstm(data : Tensor,
           batch_sizes : Tensor,
           hx : List[Tensor],
           params : List[Tensor],
           has_biases : bool,
           num_layers : int,
           dropout : float,
           train : bool,
           bidirectional : bool) -> Tuple[Tensor, Tensor, Tensor]
torch.lstm(input : Tensor,
           hx : List[Tensor],
           params : List[Tensor],
           has_biases : bool,
           num_layers : int,
           dropout : float,
           train : bool,
           bidirectional : bool,
           batch_first : bool) -> Tuple[Tensor, Tensor, Tensor]
torch.lstm_cell(input : Tensor,
                hx : List[Tensor],
                w_ih : Tensor,
                w_hh : Tensor,
                b_ih : Optional[Tensor],
                b_hh : Optional[Tensor]) -> Tuple[Tensor, Tensor]
torch.lt(self : Tensor,
         other : Tensor,
         out : Tensor) -> Tensor
torch.lt(self : Tensor,
         other : number,
         out : Tensor) -> Tensor
torch.lt(self : Tensor,
         other : Tensor) -> Tensor
torch.lt(self : Tensor,
         other : number) -> Tensor
torch.lt(a : int,
         b : int) -> bool
torch.lt(a : float,
         b : float) -> bool
torch.lt(a : int,
         b : float) -> bool
torch.lt(a : float,
         b : int) -> bool
torch.margin_ranking_loss(input1 : Tensor,
                          input2 : Tensor,
                          target : Tensor,
                          margin : float=0.0,
                          reduction : int=1) -> Tensor
torch.masked_fill(self : Tensor,
                  mask : Tensor,
                  value : Tensor) -> Tensor
torch.masked_fill(self : Tensor,
                  mask : Tensor,
                  value : number) -> Tensor
torch.masked_scatter(self : Tensor,
                     mask : Tensor,
                     source : Tensor) -> Tensor
torch.masked_select(self : Tensor,
                    mask : Tensor,
                    out : Tensor) -> Tensor
torch.masked_select(self : Tensor,
                    mask : Tensor) -> Tensor
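masked_select returns a new 1-D tensor holding the elements where the (broadcastable) boolean mask is true (output formatting here is version-dependent):

>>> x = torch.tensor([0.5, 1.5, -2.0, 3.0])
>>> torch.masked_select(x, x > 1.0)
tensor([1.5000, 3.0000])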
torch.matmul(self : Tensor,
             other : Tensor,
             out : Tensor) -> Tensor
torch.matmul(self : Tensor,
             other : Tensor) -> Tensor
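matmul dispatches on the inputs' dimensionality: 1-D with 1-D is a dot product, 2-D with 2-D a matrix product, and higher-rank inputs are batch-multiplied with broadcasting (shape sketch):

>>> torch.matmul(torch.ones(3), torch.ones(3))
tensor(3.)
>>> torch.matmul(torch.ones(10, 3, 4), torch.ones(4, 5)).shape
torch.Size([10, 3, 5])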
torch.matrix_power(self : Tensor,
                   n : int) -> Tensor
torch.matrix_rank(self : Tensor,
                  symmetric : bool=False) -> Tensor
torch.matrix_rank(self : Tensor,
                  tol : float,
                  symmetric : bool=False) -> Tensor
torch.max(self : Tensor,
          other : Tensor,
          out : Tensor) -> Tensor
torch.max(self : Tensor) -> Tensor
torch.max(self : Tensor,
          other : Tensor) -> Tensor
torch.max(self : Tensor,
          dim : int,
          keepdim : bool=False) -> Tuple[Tensor, Tensor]
torch.max_pool1d_with_indices(self : Tensor,
                              kernel_size : List[int],
                              stride : List[int]=[],
                              padding : List[int]=[0],
                              dilation : List[int]=[1],
                              ceil_mode : bool=False) -> Tuple[Tensor, Tensor]
torch.mean(self : Tensor) -> Tensor
torch.mean(self : Tensor,
           dtype : int) -> Tensor
torch.mean(self : Tensor,
           dim : List[int],
           keepdim : bool=False) -> Tensor
torch.mean(self : Tensor,
           dim : List[int],
           dtype : int) -> Tensor
torch.mean(self : Tensor,
           dim : List[int],
           keepdim : bool,
           dtype : int) -> Tensor
torch.mean(self : Tensor,
           dim : List[int],
           keepdim : bool=False,
           out : Tensor) -> Tensor
torch.mean(self : Tensor,
           dim : List[int],
           dtype : int,
           out : Tensor) -> Tensor
torch.mean(self : Tensor,
           dim : List[int],
           keepdim : bool,
           dtype : int,
           out : Tensor) -> Tensor
torch.median(self : Tensor) -> Tensor
torch.median(self : Tensor,
             dim : int,
             keepdim : bool=False) -> Tuple[Tensor, Tensor]
torch.meshgrid(tensors : List[Tensor]) -> List[Tensor]
torch.min(self : Tensor) -> Tensor
torch.min(self : Tensor,
          other : Tensor) -> Tensor
torch.min(self : Tensor,
          dim : int,
          keepdim : bool=False) -> Tuple[Tensor, Tensor]
torch.min(self : Tensor,
          other : Tensor,
          out : Tensor) -> Tensor
torch.miopen_batch_norm(input : Tensor,
                        weight : Tensor,
                        bias : Optional[Tensor],
                        running_mean : Optional[Tensor],
                        running_var : Optional[Tensor],
                        training : bool,
                        exponential_average_factor : float,
                        epsilon : float) -> Tuple[Tensor, Tensor, Tensor]
torch.miopen_convolution(self : Tensor,
                         weight : Tensor,
                         bias : Optional[Tensor],
                         padding : List[int],
                         stride : List[int],
                         dilation : List[int],
                         groups : int,
                         benchmark : bool,
                         deterministic : bool) -> Tensor
torch.miopen_convolution_transpose(self : Tensor,
                                   weight : Tensor,
                                   bias : Optional[Tensor],
                                   padding : List[int],
                                   output_padding : List[int],
                                   stride : List[int],
                                   dilation : List[int],
                                   groups : int,
                                   benchmark : bool,
                                   deterministic : bool) -> Tensor
torch.miopen_depthwise_convolution(self : Tensor,
                                   weight : Tensor,
                                   bias : Optional[Tensor],
                                   padding : List[int],
                                   stride : List[int],
                                   dilation : List[int],
                                   groups : int,
                                   benchmark : bool,
                                   deterministic : bool) -> Tensor
torch.mkldnn_convolution(self : Tensor,
                         weight : Tensor,
                         bias : Optional[Tensor],
                         padding : List[int],
                         stride : List[int],
                         dilation : List[int],
                         groups : int) -> Tensor
torch.mkldnn_convolution_backward_weights(weight_size : List[int],
                                          grad_output : Tensor,
                                          self : Tensor,
                                          padding : List[int],
                                          stride : List[int],
                                          dilation : List[int],
                                          groups : int,
                                          bias_defined : bool) -> Tuple[Tensor, Tensor]
torch.mm(self : Tensor,
         mat2 : Tensor,
         out : Tensor) -> Tensor
torch.mm(self : Tensor,
         mat2 : Tensor) -> Tensor
torch.mode(self : Tensor,
           dim : int=-1,
           keepdim : bool=False) -> Tuple[Tensor, Tensor]
torch.mul(self : Tensor,
          other : Tensor) -> Tensor
torch.mul(self : Tensor,
          other : number) -> Tensor
torch.mul(self : Tensor,
          other : Tensor,
          out : Tensor) -> Tensor
torch.mul(l : List[int],
          n : int) -> List[int]
torch.mul(n : int,
          l : List[int]) -> List[int]
torch.mul(l : List[float],
          n : int) -> List[float]
torch.mul(n : int,
          l : List[float]) -> List[float]
torch.mul(l : List[bool],
          n : int) -> List[bool]
torch.mul(n : int,
          l : List[bool]) -> List[bool]
torch.mul(l : List[Tensor],
          n : int) -> List[Tensor]
torch.mul(n : int,
          l : List[Tensor]) -> List[Tensor]
torch.mul(l : List[t],
          n : int) -> List[t]
torch.mul(n : int,
          l : List[t]) -> List[t]
torch.mul(a : int,
          b : int) -> int
torch.mul(a : float,
          b : float) -> float
torch.mul(a : int,
          b : float) -> float
torch.mul(a : float,
          b : int) -> float
torch.multinomial(self : Tensor,
                  num_samples : int,
                  replacement : bool=False,
                  generator : Optional[Generator]) -> Tensor
torch.multinomial(self : Tensor,
                  num_samples : int,
                  replacement : bool=False,
                  generator : Optional[Generator],
                  out : Tensor) -> Tensor
torch.mv(self : Tensor,
         vec : Tensor,
         out : Tensor) -> Tensor
torch.mv(self : Tensor,
         vec : Tensor) -> Tensor
torch.mvlgamma(self : Tensor,
               p : int) -> Tensor
torch.narrow(self : Tensor,
             dim : int,
             start : int,
             length : int) -> Tensor
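narrow returns a view of length elements of self along dim, starting at start (illustrative values):

>>> x = torch.arange(9).reshape(3, 3)
>>> torch.narrow(x, 0, 0, 2)
tensor([[0, 1, 2],
        [3, 4, 5]])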
torch.native_batch_norm(input : Tensor,
                        weight : Optional[Tensor],
                        bias : Optional[Tensor],
                        running_mean : Optional[Tensor],
                        running_var : Optional[Tensor],
                        training : bool,
                        momentum : float,
                        eps : float) -> Tuple[Tensor, Tensor, Tensor]
torch.native_clone(self : Tensor) -> Tensor
torch.native_norm(self : Tensor,
                  p : number=2) -> Tensor
torch.native_pow(self : Tensor,
                 exponent : number) -> Tensor
torch.native_pow(self : Tensor,
                 exponent : number,
                 out : Tensor) -> Tensor
torch.native_resize_as_(self : Tensor,
                        the_template : Tensor) -> Tensor
torch.native_zero_(self : Tensor) -> Tensor
torch.ne(self : Tensor,
         other : Tensor) -> Tensor
torch.ne(self : Tensor,
         other : number) -> Tensor
torch.ne(self : Tensor,
         other : Tensor,
         out : Tensor) -> Tensor
torch.ne(self : Tensor,
         other : number,
         out : Tensor) -> Tensor
torch.ne(a : str,
         b : str) -> bool
torch.ne(a : List[int],
         b : List[int]) -> bool
torch.ne(a : List[float],
         b : List[float]) -> bool
torch.ne(a : List[Tensor],
         b : List[Tensor]) -> bool
torch.ne(a : List[bool],
         b : List[bool]) -> bool
torch.ne(a : int,
         b : int) -> bool
torch.ne(a : float,
         b : float) -> bool
torch.ne(a : int,
         b : float) -> bool
torch.ne(a : float,
         b : int) -> bool
torch.neg(self : Tensor,
          out : Tensor) -> Tensor
torch.neg(self : Tensor) -> Tensor
torch.neg(self : int) -> int
torch.neg(self : float) -> float
torch.nonzero(self : Tensor,
              out : Tensor) -> Tensor
torch.nonzero(self : Tensor) -> Tensor
torch.norm(self : Tensor,
           p : number=2) -> Tensor
torch.norm(self : Tensor,
           p : Optional[number],
           dtype : int) -> Tensor
torch.norm(self : Tensor,
           p : Optional[number],
           dim : List[int],
           keepdim : bool=False) -> Tensor
torch.norm(self : Tensor,
           p : Optional[number],
           dim : List[int],
           keepdim : bool,
           dtype : int) -> Tensor
torch.norm(self : Tensor,
           p : Optional[number],
           dim : List[int],
           keepdim : bool=False,
           out : Tensor) -> Tensor
torch.norm(self : Tensor,
           p : Optional[number],
           dim : List[int],
           keepdim : bool,
           dtype : int,
           out : Tensor) -> Tensor
torch.norm_except_dim(v : Tensor,
                      pow : int=2,
                      dim : int=0) -> Tensor
torch.normal(mean : Tensor,
             std : Tensor,
             generator : Optional[Generator]) -> Tensor
torch.normal(mean : float,
             std : Tensor,
             generator : Optional[Generator]) -> Tensor
torch.normal(mean : Tensor,
             std : float=1.0,
             generator : Optional[Generator]) -> Tensor
torch.normal(mean : Tensor,
             std : Tensor,
             generator : Optional[Generator],
             out : Tensor) -> Tensor
torch.normal(mean : float,
             std : Tensor,
             generator : Optional[Generator],
             out : Tensor) -> Tensor
torch.normal(mean : Tensor,
             std : float=1.0,
             generator : Optional[Generator],
             out : Tensor) -> Tensor
torch.nuclear_norm(self : Tensor,
                   keepdim : bool=False) -> Tensor
torch.nuclear_norm(self : Tensor,
                   keepdim : bool=False,
                   out : Tensor) -> Tensor
torch.numel(self : Tensor) -> int
torch.ones(size : List[int],
           out : Tensor) -> Tensor
torch.ones(size : List[int],
           dtype : Optional[int],
           layout : Optional[int],
           device : Optional[Device]) -> Tensor
torch.ones_like(self : Tensor) -> Tensor
torch.ones_like(self : Tensor,
                dtype : int,
                layout : int,
                device : Device) -> Tensor
torch.orgqr(self : Tensor,
            input2 : Tensor) -> Tensor
torch.orgqr(self : Tensor,
            input2 : Tensor,
            out : Tensor) -> Tensor
torch.ormqr(self : Tensor,
            input2 : Tensor,
            input3 : Tensor,
            left : bool=True,
            transpose : bool=False) -> Tensor
torch.ormqr(self : Tensor,
            input2 : Tensor,
            input3 : Tensor,
            left : bool=True,
            transpose : bool=False,
            out : Tensor) -> Tensor
torch.pairwise_distance(x1 : Tensor,
                        x2 : Tensor,
                        p : float=2.0,
                        eps : float=1e-06,
                        keepdim : bool=False) -> Tensor
torch.pdist(self : Tensor,
            p : float=2.0) -> Tensor
torch.pin_memory(self : Tensor) -> Tensor
torch.pinverse(self : Tensor,
               rcond : float=1e-15) -> Tensor
torch.pixel_shuffle(self : Tensor,
                    upscale_factor : int) -> Tensor
torch.poisson(self : Tensor,
              generator : Optional[Generator]) -> Tensor
torch.polygamma(n : int,
                self : Tensor) -> Tensor
torch.polygamma(n : int,
                self : Tensor,
                out : Tensor) -> Tensor
torch.potri(self : Tensor,
            upper : bool=True) -> Tensor
torch.potri(self : Tensor,
            upper : bool=True,
            out : Tensor) -> Tensor
torch.pow(self : Tensor,
          exponent : Tensor) -> Tensor
torch.pow(self : number,
          exponent : Tensor) -> Tensor
torch.pow(self : Tensor,
          exponent : number) -> Tensor
torch.pow(self : Tensor,
          exponent : Tensor,
          out : Tensor) -> Tensor
torch.pow(self : number,
          exponent : Tensor,
          out : Tensor) -> Tensor
torch.pow(self : Tensor,
          exponent : number,
          out : Tensor) -> Tensor
torch.pow(a : int,
          b : int) -> int
torch.pow(a : float,
          b : float) -> float
torch.pow(a : int,
          b : float) -> float
torch.pow(a : float,
          b : int) -> float
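The pow overloads accept a tensor or scalar on either side, broadcasting as needed; the int/float forms are the plain Python scalar cases:

>>> torch.pow(torch.tensor([1., 2., 3.]), 2)
tensor([1., 4., 9.])
>>> torch.pow(2, torch.tensor([1., 2., 3.]))
tensor([2., 4., 8.])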
torch.prelu(self : Tensor,
            weight : Tensor) -> Tensor
torch.prod(self : Tensor,
           dim : int,
           keepdim : bool=False,
           out : Tensor) -> Tensor
torch.prod(self : Tensor,
           dim : int,
           dtype : int,
           out : Tensor) -> Tensor
torch.prod(self : Tensor,
           dim : int,
           keepdim : bool,
           dtype : int,
           out : Tensor) -> Tensor
torch.prod(self : Tensor) -> Tensor
torch.prod(self : Tensor,
           dtype : int) -> Tensor
torch.prod(self : Tensor,
           dim : int,
           keepdim : bool=False) -> Tensor
torch.prod(self : Tensor,
           dim : int,
           dtype : int) -> Tensor
torch.prod(self : Tensor,
           dim : int,
           keepdim : bool,
           dtype : int) -> Tensor
torch.pstrf(self : Tensor,
            upper : bool=True,
            tol : number=-1) -> Tuple[Tensor, Tensor]
torch.qr(self : Tensor) -> Tuple[Tensor, Tensor]
torch.quantized_gru_cell(input : Tensor,
                         hx : Tensor,
                         w_ih : Tensor,
                         w_hh : Tensor,
                         b_ih : Tensor,
                         b_hh : Tensor,
                         packed_ih : Tensor,
                         packed_hh : Tensor,
                         col_offsets_ih : Tensor,
                         col_offsets_hh : Tensor,
                         scale_ih : number,
                         scale_hh : number,
                         zero_point_ih : number,
                         zero_point_hh : number) -> Tensor
torch.quantized_lstm(input : Tensor,
                     hx : List[Tensor],
                     params : List[Tensor],
                     has_biases : bool,
                     num_layers : int,
                     dropout : float,
                     train : bool,
                     bidirectional : bool,
                     batch_first : bool) -> Tuple[Tensor, Tensor, Tensor]
torch.quantized_lstm_cell(input : Tensor,
                          hx : List[Tensor],
                          w_ih : Tensor,
                          w_hh : Tensor,
                          b_ih : Tensor,
                          b_hh : Tensor,
                          packed_ih : Tensor,
                          packed_hh : Tensor,
                          col_offsets_ih : Tensor,
                          col_offsets_hh : Tensor,
                          scale_ih : number,
                          scale_hh : number,
                          zero_point_ih : number,
                          zero_point_hh : number) -> Tuple[Tensor, Tensor]
torch.quantized_rnn_relu_cell(input : Tensor,
                              hx : Tensor,
                              w_ih : Tensor,
                              w_hh : Tensor,
                              b_ih : Tensor,
                              b_hh : Tensor,
                              packed_ih : Tensor,
                              packed_hh : Tensor,
                              col_offsets_ih : Tensor,
                              col_offsets_hh : Tensor,
                              scale_ih : number,
                              scale_hh : number,
                              zero_point_ih : number,
                              zero_point_hh : number) -> Tensor
torch.quantized_rnn_tanh_cell(input : Tensor,
                              hx : Tensor,
                              w_ih : Tensor,
                              w_hh : Tensor,
                              b_ih : Tensor,
                              b_hh : Tensor,
                              packed_ih : Tensor,
                              packed_hh : Tensor,
                              col_offsets_ih : Tensor,
                              col_offsets_hh : Tensor,
                              scale_ih : number,
                              scale_hh : number,
                              zero_point_ih : number,
                              zero_point_hh : number) -> Tensor
torch.rand(size : List[int],
           dtype : Optional[int],
           layout : Optional[int],
           device : Optional[Device]) -> Tensor
torch.rand(size : List[int],
           out : Tensor) -> Tensor
torch.rand_like(self : Tensor) -> Tensor
torch.rand_like(self : Tensor,
                dtype : int,
                layout : int,
                device : Device) -> Tensor
torch.randint(high : int,
              size : List[int],
              out : Tensor) -> Tensor
torch.randint(low : int,
              high : int,
              size : List[int],
              out : Tensor) -> Tensor
torch.randint(high : int,
              size : List[int],
              dtype : Optional[int],
              layout : Optional[int],
              device : Optional[Device]) -> Tensor
torch.randint(low : int,
              high : int,
              size : List[int],
              dtype : Optional[int],
              layout : Optional[int],
              device : Optional[Device]) -> Tensor
torch.randint_like(self : Tensor,
                   high : int) -> Tensor
torch.randint_like(self : Tensor,
                   low : int,
                   high : int) -> Tensor
torch.randint_like(self : Tensor,
                   high : int,
                   dtype : int,
                   layout : int,
                   device : Device) -> Tensor
torch.randint_like(self : Tensor,
                   low : int,
                   high : int,
                   dtype : int,
                   layout : int,
                   device : Device) -> Tensor
torch.randn(size : List[int],
            dtype : Optional[int],
            layout : Optional[int],
            device : Optional[Device]) -> Tensor
torch.randn(size : List[int],
            out : Tensor) -> Tensor
torch.randn_like(self : Tensor) -> Tensor
torch.randn_like(self : Tensor,
                 dtype : int,
                 layout : int,
                 device : Device) -> Tensor
torch.randperm(n : int,
               out : Tensor) -> Tensor
torch.randperm(n : int,
               dtype : Optional[int],
               layout : Optional[int],
               device : Optional[Device]) -> Tensor
torch.range(start : number,
            end : number,
            dtype : Optional[int],
            layout : Optional[int],
            device : Optional[Device]) -> Tensor
torch.range(start : number,
            end : number,
            step : number=1,
            dtype : Optional[int],
            layout : Optional[int],
            device : Optional[Device]) -> Tensor
torch.range(start : number,
            end : number,
            step : number=1,
            out : Tensor) -> Tensor
torch.reciprocal(self : Tensor) -> Tensor
torch.reciprocal(self : Tensor,
                 out : Tensor) -> Tensor
torch.relu(self : Tensor) -> Tensor
torch.relu_(self : Tensor) -> Tensor
torch.remainder(self : Tensor,
                other : Tensor) -> Tensor
torch.remainder(self : Tensor,
                other : number) -> Tensor
torch.remainder(self : Tensor,
                other : Tensor,
                out : Tensor) -> Tensor
torch.remainder(self : Tensor,
                other : number,
                out : Tensor) -> Tensor
torch.remainder(a : int,
                b : int) -> int
torch.remainder(a : float,
                b : float) -> float
torch.remainder(a : int,
                b : float) -> float
torch.remainder(a : float,
                b : int) -> float
torch.renorm(self : Tensor,
             p : number,
             dim : int,
             maxnorm : number,
             out : Tensor) -> Tensor
torch.renorm(self : Tensor,
             p : number,
             dim : int,
             maxnorm : number) -> Tensor
torch.reshape(self : Tensor,
              shape : List[int]) -> Tensor
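reshape reinterprets the data with a new shape, returning a view when the memory layout allows it and a copy otherwise:

>>> torch.reshape(torch.arange(6), (2, 3))
tensor([[0, 1, 2],
        [3, 4, 5]])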
torch.resize_as_(self : Tensor,
                 the_template : Tensor) -> Tensor
torch.rfft(self : Tensor,
           signal_ndim : int,
           normalized : bool=False,
           onesided : bool=True) -> Tensor
torch.rnn_relu(data : Tensor,
               batch_sizes : Tensor,
               hx : Tensor,
               params : List[Tensor],
               has_biases : bool,
               num_layers : int,
               dropout : float,
               train : bool,
               bidirectional : bool) -> Tuple[Tensor, Tensor]
torch.rnn_relu(input : Tensor,
               hx : Tensor,
               params : List[Tensor],
               has_biases : bool,
               num_layers : int,
               dropout : float,
               train : bool,
               bidirectional : bool,
               batch_first : bool) -> Tuple[Tensor, Tensor]
torch.rnn_relu_cell(input : Tensor,
                    hx : Tensor,
                    w_ih : Tensor,
                    w_hh : Tensor,
                    b_ih : Optional[Tensor],
                    b_hh : Optional[Tensor]) -> Tensor
torch.rnn_tanh(data : Tensor,
               batch_sizes : Tensor,
               hx : Tensor,
               params : List[Tensor],
               has_biases : bool,
               num_layers : int,
               dropout : float,
               train : bool,
               bidirectional : bool) -> Tuple[Tensor, Tensor]
torch.rnn_tanh(input : Tensor,
               hx : Tensor,
               params : List[Tensor],
               has_biases : bool,
               num_layers : int,
               dropout : float,
               train : bool,
               bidirectional : bool,
               batch_first : bool) -> Tuple[Tensor, Tensor]
torch.rnn_tanh_cell(input : Tensor,
                    hx : Tensor,
                    w_ih : Tensor,
                    w_hh : Tensor,
                    b_ih : Optional[Tensor],
                    b_hh : Optional[Tensor]) -> Tensor
torch.roll(self : Tensor,
           shifts : List[int],
           dims : List[int]=[]) -> Tensor
torch.rot90(self : Tensor,
            k : int=1,
            dims : List[int]=[0, 1]) -> Tensor
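A short eager-mode sketch of torch.roll and torch.rot90: roll moves elements toward higher indices and wraps around, while rot90 rotates by 90 degrees counterclockwise in the plane given by dims:
>>> x = torch.arange(4)
>>> torch.roll(x, shifts=1, dims=0)
tensor([3, 0, 1, 2])
>>> torch.rot90(torch.arange(4).reshape(2, 2))
tensor([[1, 3],
        [0, 2]])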
torch.round(self : Tensor) -> Tensor
torch.round(self : Tensor,
            out : Tensor) -> Tensor
torch.round_(self : Tensor) -> Tensor
torch.rrelu(self : Tensor,
            lower : number=0.125,
            upper : number=0.3333333333333333,
            training : bool=False,
            generator : Optional[Generator]) -> Tensor
torch.rrelu_(self : Tensor,
             lower : number=0.125,
             upper : number=0.3333333333333333,
             training : bool=False,
             generator : Optional[Generator]) -> Tensor
torch.rsqrt(self : Tensor,
            out : Tensor) -> Tensor
torch.rsqrt(self : Tensor) -> Tensor
torch.rsqrt_(self : Tensor) -> Tensor
torch.rsub(self : Tensor,
           other : Tensor,
           alpha : number=1) -> Tensor
torch.rsub(self : Tensor,
           other : number,
           alpha : number=1) -> Tensor
torch.s_copy_(self : Tensor,
              src : Tensor,
              non_blocking : bool=False) -> Tensor
torch.s_native_addmm(self : Tensor,
                     mat1 : Tensor,
                     mat2 : Tensor,
                     beta : number=1,
                     alpha : number=1) -> Tensor
torch.s_native_addmm(self : Tensor,
                     mat1 : Tensor,
                     mat2 : Tensor,
                     beta : number=1,
                     alpha : number=1,
                     out : Tensor) -> Tensor
torch.s_native_addmm_(self : Tensor,
                      mat1 : Tensor,
                      mat2 : Tensor,
                      beta : number=1,
                      alpha : number=1) -> Tensor
torch.scalar_tensor(s : number,
                    dtype : Optional[int],
                    layout : Optional[int],
                    device : Optional[Device]) -> Tensor
torch.scatter(self : Tensor,
              dim : int,
              index : Tensor,
              src : Tensor) -> Tensor
torch.scatter(self : Tensor,
              dim : int,
              index : Tensor,
              value : number) -> Tensor
torch.scatter_add(self : Tensor,
                  dim : int,
                  index : Tensor,
                  src : Tensor) -> Tensor
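An illustrative example of the scalar-value overload of torch.scatter: with dim=1 it writes value into a copy of self at the column positions given by index, row by row:
>>> torch.scatter(torch.zeros(2, 4), 1, torch.tensor([[2], [3]]), 1.0)
tensor([[0., 0., 1., 0.],
        [0., 0., 0., 1.]])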
torch.select(self : Tensor,
             dim : int,
             index : int) -> Tensor
torch.select(list : List[Tensor],
             idx : int) -> Tensor
torch.select(a : List[int],
             b : int) -> int
torch.select(a : List[float],
             b : int) -> float
torch.select(a : List[bool],
             b : int) -> bool
torch.select(list : List[t],
             idx : int) -> t
torch.selu(self : Tensor) -> Tensor
torch.selu_(self : Tensor) -> Tensor
torch.sigmoid(self : Tensor) -> Tensor
torch.sigmoid(self : Tensor,
              out : Tensor) -> Tensor
torch.sigmoid_(self : Tensor) -> Tensor
torch.sign(self : Tensor) -> Tensor
torch.sign(self : Tensor,
           out : Tensor) -> Tensor
torch.sin(self : Tensor) -> Tensor
torch.sin(self : Tensor,
          out : Tensor) -> Tensor
torch.sin_(self : Tensor) -> Tensor
torch.sinh(self : Tensor,
           out : Tensor) -> Tensor
torch.sinh(self : Tensor) -> Tensor
torch.sinh_(self : Tensor) -> Tensor
torch.slogdet(self : Tensor) -> Tuple[Tensor, Tensor]
torch.smm(self : Tensor,
          mat2 : Tensor) -> Tensor
torch.softmax(self : Tensor,
              dim : int) -> Tensor
torch.softmax(self : Tensor,
              dim : int,
              dtype : int) -> Tensor
torch.solve(self : Tensor,
            A : Tensor) -> Tuple[Tensor, Tensor]
torch.sort(self : Tensor,
           dim : int=-1,
           descending : bool=False) -> Tuple[Tensor, Tensor]
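For example, torch.sort returns both the sorted values and the indices they came from:
>>> values, indices = torch.sort(torch.tensor([3., 1., 2.]))
>>> values
tensor([1., 2., 3.])
>>> indices
tensor([1, 2, 0])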
torch.sparse_coo_tensor(size : List[int],
                        dtype : int,
                        layout : int,
                        device : Device) -> Tensor
torch.sparse_coo_tensor(indices : Tensor,
                        values : Tensor,
                        dtype : Optional[int],
                        layout : Optional[int],
                        device : Optional[Device]) -> Tensor
torch.sparse_coo_tensor(indices : Tensor,
                        values : Tensor,
                        size : List[int],
                        dtype : Optional[int],
                        layout : Optional[int],
                        device : Optional[Device]) -> Tensor
torch.split(self : Tensor,
            split_size : int,
            dim : int=0) -> List[Tensor]
torch.split(self : Tensor,
            split_sizes : List[int],
            dim : int=0) -> List[Tensor]
torch.split_with_sizes(self : Tensor,
                       split_sizes : List[int],
                       dim : int=0) -> List[Tensor]
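Illustrative example: with an int split_size, torch.split yields equal chunks (the last may be smaller); with a list of sizes it behaves like torch.split_with_sizes:
>>> a = torch.arange(10)
>>> torch.split(a, 3)
(tensor([0, 1, 2]), tensor([3, 4, 5]), tensor([6, 7, 8]), tensor([9]))
>>> [t.shape for t in torch.split(a, [4, 6])]
[torch.Size([4]), torch.Size([6])]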
torch.sqrt(self : Tensor,
           out : Tensor) -> Tensor
torch.sqrt(self : Tensor) -> Tensor
torch.sqrt_(self : Tensor) -> Tensor
torch.squeeze(self : Tensor) -> Tensor
torch.squeeze(self : Tensor,
              dim : int) -> Tensor
torch.sspaddmm(self : Tensor,
               mat1 : Tensor,
               mat2 : Tensor,
               beta : number=1,
               alpha : number=1,
               out : Tensor) -> Tensor
torch.sspaddmm(self : Tensor,
               mat1 : Tensor,
               mat2 : Tensor,
               beta : number=1,
               alpha : number=1) -> Tensor
torch.stack(tensors : List[Tensor],
            dim : int=0) -> Tensor
torch.stack(tensors : List[Tensor],
            dim : int=0,
            out : Tensor) -> Tensor
torch.std(self : Tensor,
          unbiased : bool=True) -> Tensor
torch.std(self : Tensor,
          dim : List[int],
          unbiased : bool=True,
          keepdim : bool=False) -> Tensor
torch.std(self : Tensor,
          dim : List[int],
          unbiased : bool=True,
          keepdim : bool=False,
          out : Tensor) -> Tensor
torch.stft(self : Tensor,
           n_fft : int,
           hop_length : Optional[int],
           win_length : Optional[int],
           window : Optional[Tensor],
           normalized : bool=False,
           onesided : bool=True) -> Tensor
torch.sub(self : Tensor,
          other : Tensor,
          alpha : number=1) -> Tensor
torch.sub(self : Tensor,
          other : number,
          alpha : number=1) -> Tensor
torch.sub(self : Tensor,
          other : Tensor,
          alpha : number=1,
          out : Tensor) -> Tensor
torch.sub(a : int,
          b : int) -> int
torch.sub(a : float,
          b : float) -> float
torch.sub(a : int,
          b : float) -> float
torch.sub(a : float,
          b : int) -> float
torch.sum(self : Tensor,
          dim : List[int],
          keepdim : bool=False,
          out : Tensor) -> Tensor
torch.sum(self : Tensor,
          dim : List[int],
          dtype : int,
          out : Tensor) -> Tensor
torch.sum(self : Tensor,
          dim : List[int],
          keepdim : bool,
          dtype : int,
          out : Tensor) -> Tensor
torch.sum(self : Tensor) -> Tensor
torch.sum(self : Tensor,
          dtype : int) -> Tensor
torch.sum(self : Tensor,
          dim : List[int],
          keepdim : bool=False) -> Tensor
torch.sum(self : Tensor,
          dim : List[int],
          dtype : int) -> Tensor
torch.sum(self : Tensor,
          dim : List[int],
          keepdim : bool,
          dtype : int) -> Tensor
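An eager-mode sketch of the dim/keepdim overloads of torch.sum (the dtype argument casts the input to that type before summing):
>>> a = torch.ones(2, 3)
>>> torch.sum(a, dim=0)
tensor([2., 2., 2.])
>>> torch.sum(a, dim=1, keepdim=True)
tensor([[3.],
        [3.]])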
torch.svd(self : Tensor,
          some : bool=True,
          compute_uv : bool=True) -> Tuple[Tensor, Tensor, Tensor]
torch.symeig(self : Tensor,
             eigenvectors : bool=False,
             upper : bool=True) -> Tuple[Tensor, Tensor]
torch.t(self : Tensor) -> Tensor
torch.take(self : Tensor,
           index : Tensor) -> Tensor
torch.take(self : Tensor,
           index : Tensor,
           out : Tensor) -> Tensor
torch.tan(self : Tensor,
          out : Tensor) -> Tensor
torch.tan(self : Tensor) -> Tensor
torch.tan_(self : Tensor) -> Tensor
torch.tanh(self : Tensor) -> Tensor
torch.tanh(self : Tensor,
           out : Tensor) -> Tensor
torch.tanh_(self : Tensor) -> Tensor
torch.tensor(t : float,
             dtype : Optional[int],
             device : Optional[Device]) -> Tensor
torch.tensor(t : int,
             dtype : Optional[int],
             device : Optional[Device]) -> Tensor
torch.tensor(t : bool,
             dtype : Optional[int],
             device : Optional[Device]) -> Tensor
torch.tensor(data : List[t],
             dtype : Optional[int],
             device : Optional[Device]) -> Tensor
torch.tensordot(self : Tensor,
                other : Tensor,
                dims_self : List[int],
                dims_other : List[int]) -> Tensor
torch.threshold(self : Tensor,
                threshold : number,
                value : number) -> Tensor
torch.threshold(self : Tensor,
                threshold : number,
                value : number,
                out : Tensor) -> Tensor
torch.threshold_(self : Tensor,
                 threshold : number,
                 value : number) -> Tensor
torch.topk(self : Tensor,
           k : int,
           dim : int=-1,
           largest : bool=True,
           sorted : bool=True) -> Tuple[Tensor, Tensor]
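For example, torch.topk returns the k largest values along dim together with their indices:
>>> values, indices = torch.topk(torch.tensor([1., 5., 3.]), k=2)
>>> values
tensor([5., 3.])
>>> indices
tensor([1, 2])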
torch.trace(self : Tensor) -> Tensor
torch.transpose(self : Tensor,
                dim0 : int,
                dim1 : int) -> Tensor
torch.tril(self : Tensor,
           diagonal : int=0,
           out : Tensor) -> Tensor
torch.tril(self : Tensor,
           diagonal : int=0) -> Tensor
torch.tril_indices(row : int,
                   col : int,
                   offset : int=0,
                   dtype : Optional[int]=4,
                   layout : Optional[int],
                   device : Optional[Device]) -> Tensor
torch.triplet_margin_loss(anchor : Tensor,
                          positive : Tensor,
                          negative : Tensor,
                          margin : float=1.0,
                          p : float=2.0,
                          eps : float=1e-06,
                          swap : bool=False,
                          reduction : int=1) -> Tensor
torch.triu(self : Tensor,
           diagonal : int=0,
           out : Tensor) -> Tensor
torch.triu(self : Tensor,
           diagonal : int=0) -> Tensor
torch.triu_indices(row : int,
                   col : int,
                   offset : int=0,
                   dtype : Optional[int]=4,
                   layout : Optional[int],
                   device : Optional[Device]) -> Tensor
torch.trtrs(self : Tensor,
            A : Tensor,
            upper : bool=True,
            transpose : bool=False,
            unitriangular : bool=False) -> Tuple[Tensor, Tensor]
torch.trunc(self : Tensor) -> Tensor
torch.trunc(self : Tensor,
            out : Tensor) -> Tensor
torch.trunc_(self : Tensor) -> Tensor
torch.unbind(self : Tensor,
             dim : int=0) -> List[Tensor]
torch.unsqueeze(self : Tensor,
                dim : int) -> Tensor
torch.var(self : Tensor,
          dim : List[int],
          unbiased : bool=True,
          keepdim : bool=False,
          out : Tensor) -> Tensor
torch.var(self : Tensor,
          unbiased : bool=True) -> Tensor
torch.var(self : Tensor,
          dim : List[int],
          unbiased : bool=True,
          keepdim : bool=False) -> Tensor
torch.wait(self : Future[t]) -> t
torch.where(condition : Tensor,
            self : Tensor,
            other : Tensor) -> Tensor
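Illustrative example: torch.where selects from self where condition holds and from other elsewhere, broadcasting all three arguments:
>>> x = torch.tensor([-1., 2., -3.])
>>> torch.where(x > 0, x, torch.zeros_like(x))
tensor([0., 2., 0.])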
torch.zero_(self : Tensor) -> Tensor
torch.zeros(size : List[int],
            out : Tensor) -> Tensor
torch.zeros(size : List[int],
            dtype : Optional[int],
            layout : Optional[int],
            device : Optional[Device]) -> Tensor
torch.zeros_like(self : Tensor) -> Tensor
torch.zeros_like(self : Tensor,
                 dtype : int,
                 layout : int,
                 device : Device) -> Tensor
torch._C._nn.adaptive_avg_pool2d(self : Tensor,
                                 output_size : List[int],
                                 out : Tensor) -> Tensor
torch._C._nn.adaptive_avg_pool2d(self : Tensor,
                                 output_size : List[int]) -> Tensor
torch._C._nn.adaptive_avg_pool3d(self : Tensor,
                                 output_size : List[int]) -> Tensor
torch._C._nn.adaptive_avg_pool3d(self : Tensor,
                                 output_size : List[int],
                                 out : Tensor) -> Tensor
torch._C._nn.adaptive_max_pool2d(self : Tensor,
                                 output_size : List[int]) -> Tuple[Tensor, Tensor]
torch._C._nn.adaptive_max_pool3d(self : Tensor,
                                 output_size : List[int]) -> Tuple[Tensor, Tensor]
torch._C._nn.avg_pool2d(self : Tensor,
                        kernel_size : List[int],
                        stride : List[int]=[],
                        padding : List[int]=[0, 0],
                        ceil_mode : bool=False,
                        count_include_pad : bool=True) -> Tensor
torch._C._nn.avg_pool2d(self : Tensor,
                        kernel_size : List[int],
                        stride : List[int]=[],
                        padding : List[int]=[0, 0],
                        ceil_mode : bool=False,
                        count_include_pad : bool=True,
                        out : Tensor) -> Tensor
torch._C._nn.avg_pool3d(self : Tensor,
                        kernel_size : List[int],
                        stride : List[int]=[],
                        padding : List[int]=[0, 0, 0],
                        ceil_mode : bool=False,
                        count_include_pad : bool=True) -> Tensor
torch._C._nn.avg_pool3d(self : Tensor,
                        kernel_size : List[int],
                        stride : List[int]=[],
                        padding : List[int]=[0, 0, 0],
                        ceil_mode : bool=False,
                        count_include_pad : bool=True,
                        out : Tensor) -> Tensor
torch._C._nn.binary_cross_entropy(self : Tensor,
                                  target : Tensor,
                                  weight : Optional[Tensor],
                                  reduction : int=1,
                                  out : Tensor) -> Tensor
torch._C._nn.binary_cross_entropy(self : Tensor,
                                  target : Tensor,
                                  weight : Optional[Tensor],
                                  reduction : int=1) -> Tensor
torch._C._nn.elu(self : Tensor,
                 alpha : number=1,
                 scale : number=1,
                 input_scale : number=1,
                 out : Tensor) -> Tensor
torch._C._nn.elu(self : Tensor,
                 alpha : number=1,
                 scale : number=1,
                 input_scale : number=1) -> Tensor
torch._C._nn.elu_(self : Tensor,
                  alpha : number=1,
                  scale : number=1,
                  input_scale : number=1) -> Tensor
torch._C._nn.fractional_max_pool2d(self : Tensor,
                                   kernel_size : List[int],
                                   output_size : List[int],
                                   random_samples : Tensor) -> Tuple[Tensor, Tensor]
torch._C._nn.fractional_max_pool3d(self : Tensor,
                                   kernel_size : List[int],
                                   output_size : List[int],
                                   random_samples : Tensor) -> Tuple[Tensor, Tensor]
torch._C._nn.glu(self : Tensor,
                 dim : int=-1) -> Tensor
torch._C._nn.glu(self : Tensor,
                 dim : int=-1,
                 out : Tensor) -> Tensor
torch._C._nn.hardtanh(self : Tensor,
                      min_val : number=-1,
                      max_val : number=1,
                      out : Tensor) -> Tensor
torch._C._nn.hardtanh(self : Tensor,
                      min_val : number=-1,
                      max_val : number=1) -> Tensor
torch._C._nn.hardtanh_(self : Tensor,
                       min_val : number=-1,
                       max_val : number=1) -> Tensor
torch._C._nn.l1_loss(self : Tensor,
                     target : Tensor,
                     reduction : int=1,
                     out : Tensor) -> Tensor
torch._C._nn.l1_loss(self : Tensor,
                     target : Tensor,
                     reduction : int=1) -> Tensor
torch._C._nn.leaky_relu(self : Tensor,
                        negative_slope : number=0.01) -> Tensor
torch._C._nn.leaky_relu(self : Tensor,
                        negative_slope : number=0.01,
                        out : Tensor) -> Tensor
torch._C._nn.leaky_relu_(self : Tensor,
                         negative_slope : number=0.01) -> Tensor
torch._C._nn.log_sigmoid(self : Tensor) -> Tensor
torch._C._nn.log_sigmoid(self : Tensor,
                         out : Tensor) -> Tensor
torch._C._nn.max_pool2d_with_indices(self : Tensor,
                                     kernel_size : List[int],
                                     stride : List[int]=[],
                                     padding : List[int]=[0, 0],
                                     dilation : List[int]=[1, 1],
                                     ceil_mode : bool=False) -> Tuple[Tensor, Tensor]
torch._C._nn.max_pool3d_with_indices(self : Tensor,
                                     kernel_size : List[int],
                                     stride : List[int]=[],
                                     padding : List[int]=[0, 0, 0],
                                     dilation : List[int]=[1, 1, 1],
                                     ceil_mode : bool=False) -> Tuple[Tensor, Tensor]
torch._C._nn.max_unpool2d(self : Tensor,
                          indices : Tensor,
                          output_size : List[int]) -> Tensor
torch._C._nn.max_unpool2d(self : Tensor,
                          indices : Tensor,
                          output_size : List[int],
                          out : Tensor) -> Tensor
torch._C._nn.max_unpool3d(self : Tensor,
                          indices : Tensor,
                          output_size : List[int],
                          stride : List[int],
                          padding : List[int],
                          out : Tensor) -> Tensor
torch._C._nn.max_unpool3d(self : Tensor,
                          indices : Tensor,
                          output_size : List[int],
                          stride : List[int],
                          padding : List[int]) -> Tensor
torch._C._nn.mse_loss(self : Tensor,
                      target : Tensor,
                      reduction : int=1,
                      out : Tensor) -> Tensor
torch._C._nn.mse_loss(self : Tensor,
                      target : Tensor,
                      reduction : int=1) -> Tensor
torch._C._nn.multi_margin_loss(self : Tensor,
                               target : Tensor,
                               p : number=1,
                               margin : number=1,
                               weight : Optional[Tensor],
                               reduction : int=1,
                               out : Tensor) -> Tensor
torch._C._nn.multi_margin_loss(self : Tensor,
                               target : Tensor,
                               p : number=1,
                               margin : number=1,
                               weight : Optional[Tensor],
                               reduction : int=1) -> Tensor
torch._C._nn.multilabel_margin_loss(self : Tensor,
                                    target : Tensor,
                                    reduction : int=1) -> Tensor
torch._C._nn.multilabel_margin_loss(self : Tensor,
                                    target : Tensor,
                                    reduction : int=1,
                                    out : Tensor) -> Tensor
torch._C._nn.nll_loss(self : Tensor,
                      target : Tensor,
                      weight : Optional[Tensor],
                      reduction : int=1,
                      ignore_index : int=-100) -> Tensor
torch._C._nn.nll_loss(self : Tensor,
                      target : Tensor,
                      weight : Optional[Tensor],
                      reduction : int=1,
                      ignore_index : int=-100,
                      out : Tensor) -> Tensor
torch._C._nn.nll_loss2d(self : Tensor,
                        target : Tensor,
                        weight : Optional[Tensor],
                        reduction : int=1,
                        ignore_index : int=-100) -> Tensor
torch._C._nn.nll_loss2d(self : Tensor,
                        target : Tensor,
                        weight : Optional[Tensor],
                        reduction : int=1,
                        ignore_index : int=-100,
                        out : Tensor) -> Tensor
torch._C._nn.one_hot(self : Tensor,
                     num_classes : int=-1) -> Tensor
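The torch._C._nn entries are internal bindings normally reached through torch.nn.functional; for instance, assuming this build exposes torch.nn.functional.one_hot:
>>> torch.nn.functional.one_hot(torch.tensor([0, 2]), num_classes=3)
tensor([[1, 0, 0],
        [0, 0, 1]])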
torch._C._nn.reflection_pad1d(self : Tensor,
                              padding : List[int]) -> Tensor
torch._C._nn.reflection_pad1d(self : Tensor,
                              padding : List[int],
                              out : Tensor) -> Tensor
torch._C._nn.reflection_pad2d(self : Tensor,
                              padding : List[int]) -> Tensor
torch._C._nn.reflection_pad2d(self : Tensor,
                              padding : List[int],
                              out : Tensor) -> Tensor
torch._C._nn.replication_pad1d(self : Tensor,
                               padding : List[int]) -> Tensor
torch._C._nn.replication_pad1d(self : Tensor,
                               padding : List[int],
                               out : Tensor) -> Tensor
torch._C._nn.replication_pad2d(self : Tensor,
                               padding : List[int]) -> Tensor
torch._C._nn.replication_pad2d(self : Tensor,
                               padding : List[int],
                               out : Tensor) -> Tensor
torch._C._nn.replication_pad3d(self : Tensor,
                               padding : List[int],
                               out : Tensor) -> Tensor
torch._C._nn.replication_pad3d(self : Tensor,
                               padding : List[int]) -> Tensor
torch._C._nn.rrelu_with_noise(self : Tensor,
                              noise : Tensor,
                              lower : number=0.125,
                              upper : number=0.3333333333333333,
                              training : bool=False,
                              generator : Optional[Generator],
                              out : Tensor) -> Tensor
torch._C._nn.rrelu_with_noise(self : Tensor,
                              noise : Tensor,
                              lower : number=0.125,
                              upper : number=0.3333333333333333,
                              training : bool=False,
                              generator : Optional[Generator]) -> Tensor
torch._C._nn.rrelu_with_noise_(self : Tensor,
                               noise : Tensor,
                               lower : number=0.125,
                               upper : number=0.3333333333333333,
                               training : bool=False,
                               generator : Optional[Generator]) -> Tensor
torch._C._nn.smooth_l1_loss(self : Tensor,
                            target : Tensor,
                            reduction : int=1) -> Tensor
torch._C._nn.smooth_l1_loss(self : Tensor,
                            target : Tensor,
                            reduction : int=1,
                            out : Tensor) -> Tensor
torch._C._nn.soft_margin_loss(self : Tensor,
                              target : Tensor,
                              reduction : int=1,
                              out : Tensor) -> Tensor
torch._C._nn.soft_margin_loss(self : Tensor,
                              target : Tensor,
                              reduction : int=1) -> Tensor
torch._C._nn.softplus(self : Tensor,
                      beta : number=1,
                      threshold : number=20,
                      out : Tensor) -> Tensor
torch._C._nn.softplus(self : Tensor,
                      beta : number=1,
                      threshold : number=20) -> Tensor
torch._C._nn.softshrink(self : Tensor,
                        lambd : number=0.5) -> Tensor
torch._C._nn.softshrink(self : Tensor,
                        lambd : number=0.5,
                        out : Tensor) -> Tensor
torch._C._nn.thnn_col2im(self : Tensor,
                         output_size : List[int],
                         kernel_size : List[int],
                         dilation : List[int],
                         padding : List[int],
                         stride : List[int]) -> Tensor
torch._C._nn.thnn_conv2d(self : Tensor,
                         weight : Tensor,
                         kernel_size : List[int],
                         bias : Optional[Tensor],
                         stride : List[int]=[1, 1],
                         padding : List[int]=[0, 0]) -> Tensor
torch._C._nn.thnn_conv2d(self : Tensor,
                         weight : Tensor,
                         kernel_size : List[int],
                         bias : Optional[Tensor],
                         stride : List[int]=[1, 1],
                         padding : List[int]=[0, 0],
                         out : Tensor) -> Tensor
torch._C._nn.thnn_conv3d(self : Tensor,
                         weight : Tensor,
                         kernel_size : List[int],
                         bias : Optional[Tensor],
                         stride : List[int]=[1, 1, 1],
                         padding : List[int]=[0, 0, 0],
                         out : Tensor) -> Tensor
torch._C._nn.thnn_conv3d(self : Tensor,
                         weight : Tensor,
                         kernel_size : List[int],
                         bias : Optional[Tensor],
                         stride : List[int]=[1, 1, 1],
                         padding : List[int]=[0, 0, 0]) -> Tensor
torch._C._nn.thnn_conv_depthwise2d(self : Tensor,
                                   weight : Tensor,
                                   kernel_size : List[int],
                                   bias : Optional[Tensor],
                                   stride : List[int]=[1, 1],
                                   padding : List[int]=[0, 0],
                                   dilation : List[int]=[1, 1]) -> Tensor
torch._C._nn.thnn_conv_depthwise2d(self : Tensor,
                                   weight : Tensor,
                                   kernel_size : List[int],
                                   bias : Optional[Tensor],
                                   stride : List[int]=[1, 1],
                                   padding : List[int]=[0, 0],
                                   dilation : List[int]=[1, 1],
                                   out : Tensor) -> Tensor
torch._C._nn.thnn_conv_dilated2d(self : Tensor,
                                 weight : Tensor,
                                 kernel_size : List[int],
                                 bias : Optional[Tensor],
                                 stride : List[int]=[1, 1],
                                 padding : List[int]=[0, 0],
                                 dilation : List[int]=[1, 1]) -> Tensor
torch._C._nn.thnn_conv_dilated2d(self : Tensor,
                                 weight : Tensor,
                                 kernel_size : List[int],
                                 bias : Optional[Tensor],
                                 stride : List[int]=[1, 1],
                                 padding : List[int]=[0, 0],
                                 dilation : List[int]=[1, 1],
                                 out : Tensor) -> Tensor
torch._C._nn.thnn_conv_dilated3d(self : Tensor,
                                 weight : Tensor,
                                 kernel_size : List[int],
                                 bias : Optional[Tensor],
                                 stride : List[int]=[1, 1, 1],
                                 padding : List[int]=[0, 0, 0],
                                 dilation : List[int]=[1, 1, 1],
                                 out : Tensor) -> Tensor
torch._C._nn.thnn_conv_dilated3d(self : Tensor,
                                 weight : Tensor,
                                 kernel_size : List[int],
                                 bias : Optional[Tensor],
                                 stride : List[int]=[1, 1, 1],
                                 padding : List[int]=[0, 0, 0],
                                 dilation : List[int]=[1, 1, 1]) -> Tensor
torch._C._nn.thnn_conv_transpose2d(self : Tensor,
                                   weight : Tensor,
                                   kernel_size : List[int],
                                   bias : Optional[Tensor],
                                   stride : List[int]=[1, 1],
                                   padding : List[int]=[0, 0],
                                   output_padding : List[int]=[0, 0],
                                   dilation : List[int]=[1, 1]) -> Tensor
torch._C._nn.thnn_conv_transpose2d(self : Tensor,
                                   weight : Tensor,
                                   kernel_size : List[int],
                                   bias : Optional[Tensor],
                                   stride : List[int]=[1, 1],
                                   padding : List[int]=[0, 0],
                                   output_padding : List[int]=[0, 0],
                                   dilation : List[int]=[1, 1],
                                   out : Tensor) -> Tensor
torch._C._nn.thnn_conv_transpose3d(self : Tensor,
                                   weight : Tensor,
                                   kernel_size : List[int],
                                   bias : Optional[Tensor],
                                   stride : List[int]=[1, 1, 1],
                                   padding : List[int]=[0, 0, 0],
                                   output_padding : List[int]=[0, 0, 0],
                                   dilation : List[int]=[1, 1, 1],
                                   out : Tensor) -> Tensor
torch._C._nn.thnn_conv_transpose3d(self : Tensor,
                                   weight : Tensor,
                                   kernel_size : List[int],
                                   bias : Optional[Tensor],
                                   stride : List[int]=[1, 1, 1],
                                   padding : List[int]=[0, 0, 0],
                                   output_padding : List[int]=[0, 0, 0],
                                   dilation : List[int]=[1, 1, 1]) -> Tensor
torch._C._nn.thnn_im2col(self : Tensor,
                         kernel_size : List[int],
                         dilation : List[int],
                         padding : List[int],
                         stride : List[int]) -> Tensor
torch._C._nn.upsample_bicubic2d(self : Tensor,
                                output_size : List[int],
                                align_corners : bool) -> Tensor
torch._C._nn.upsample_bicubic2d(self : Tensor,
                                output_size : List[int],
                                align_corners : bool,
                                out : Tensor) -> Tensor
torch._C._nn.upsample_bilinear2d(self : Tensor,
                                 output_size : List[int],
                                 align_corners : bool) -> Tensor
torch._C._nn.upsample_bilinear2d(self : Tensor,
                                 output_size : List[int],
                                 align_corners : bool,
                                 out : Tensor) -> Tensor
torch._C._nn.upsample_linear1d(self : Tensor,
                               output_size : List[int],
                               align_corners : bool) -> Tensor
torch._C._nn.upsample_linear1d(self : Tensor,
                               output_size : List[int],
                               align_corners : bool,
                               out : Tensor) -> Tensor
torch._C._nn.upsample_nearest1d(self : Tensor,
                                output_size : List[int]) -> Tensor
torch._C._nn.upsample_nearest1d(self : Tensor,
                                output_size : List[int],
                                out : Tensor) -> Tensor
torch._C._nn.upsample_nearest2d(self : Tensor,
                                output_size : List[int]) -> Tensor
torch._C._nn.upsample_nearest2d(self : Tensor,
                                output_size : List[int],
                                out : Tensor) -> Tensor
torch._C._nn.upsample_nearest3d(self : Tensor,
                                output_size : List[int],
                                out : Tensor) -> Tensor
torch._C._nn.upsample_nearest3d(self : Tensor,
                                output_size : List[int]) -> Tensor
torch._C._nn.upsample_trilinear3d(self : Tensor,
                                  output_size : List[int],
                                  align_corners : bool,
                                  out : Tensor) -> Tensor
torch._C._nn.upsample_trilinear3d(self : Tensor,
                                  output_size : List[int],
                                  align_corners : bool) -> Tensor
torch.nn.functional.adaptive_avg_pool2d(input : Tensor,
                                        output_size : List[int]) -> Tensor
torch.nn.functional.adaptive_avg_pool3d(input : Tensor,
                                        output_size : List[int]) -> Tensor
torch.nn.functional.adaptive_max_pool1d_with_indices(input : Tensor,
                                                     output_size : List[int],
                                                     return_indices : bool=False) -> Tuple[Tensor, Tensor]
torch.nn.functional.adaptive_max_pool2d_with_indices(input : Tensor,
                                                     output_size : List[int],
                                                     return_indices : bool=False) -> Tuple[Tensor, Tensor]
torch.nn.functional.adaptive_max_pool3d_with_indices(input : Tensor,
                                                     output_size : List[int],
                                                     return_indices : bool=False) -> Tuple[Tensor, Tensor]
torch.nn.functional.affine_grid(theta : Tensor,
                                size : List[int]) -> Tensor
torch.nn.functional.alpha_dropout(input : Tensor,
                                  p : float=0.5,
                                  training : bool=False,
                                  inplace : bool=False) -> Tensor
torch.nn.functional.batch_norm(input : Tensor,
                               running_mean : Optional[Tensor],
                               running_var : Optional[Tensor],
                               weight : Optional[Tensor],
                               bias : Optional[Tensor],
                               training : bool=False,
                               momentum : float=0.1,
                               eps : float=1e-05) -> Tensor
torch.nn.functional.bilinear(input1 : Tensor,
                             input2 : Tensor,
                             weight : Tensor,
                             bias : Optional[Tensor]) -> Tensor
torch.nn.functional.binary_cross_entropy(input : Tensor,
                                         target : Tensor,
                                         weight : Optional[Tensor],
                                         size_average : Optional[bool],
                                         reduce : Optional[bool],
                                         reduction : str=mean) -> Tensor
torch.nn.functional.binary_cross_entropy_with_logits(input : Tensor,
                                                     target : Tensor,
                                                     weight : Optional[Tensor],
                                                     size_average : Optional[bool],
                                                     reduce : Optional[bool],
                                                     reduction : str=mean,
                                                     pos_weight : Optional[Tensor]) -> Tensor
torch.nn.functional.celu(input : Tensor,
                         alpha : float=1.0,
                         inplace : bool=False) -> Tensor
torch.nn.functional.cosine_embedding_loss(input1 : Tensor,
                                          input2 : Tensor,
                                          target : Tensor,
                                          margin : float=0.0,
                                          size_average : Optional[bool],
                                          reduce : Optional[bool],
                                          reduction : str=mean) -> Tensor
torch.nn.functional.cross_entropy(input : Tensor,
                                  target : Tensor,
                                  weight : Optional[Tensor],
                                  size_average : Optional[bool],
                                  ignore_index : int=-100,
                                  reduce : Optional[bool],
                                  reduction : str=mean) -> Tensor
torch.nn.functional.ctc_loss(log_probs : Tensor,
                             targets : Tensor,
                             input_lengths : Tensor,
                             target_lengths : Tensor,
                             blank : int=0,
                             reduction : str=mean,
                             zero_infinity : bool=False) -> Tensor
torch.nn.functional.dropout(input : Tensor,
                            p : float=0.5,
                            training : bool=True,
                            inplace : bool=False) -> Tensor
torch.nn.functional.dropout2d(input : Tensor,
                              p : float=0.5,
                              training : bool=True,
                              inplace : bool=False) -> Tensor
torch.nn.functional.dropout3d(input : Tensor,
                              p : float=0.5,
                              training : bool=True,
                              inplace : bool=False) -> Tensor
torch.nn.functional.elu(input : Tensor,
                        alpha : float=1.0,
                        inplace : bool=False) -> Tensor
torch.nn.functional.embedding(input : Tensor,
                              weight : Tensor,
                              padding_idx : Optional[int],
                              max_norm : Optional[float],
                              norm_type : float=2.0,
                              scale_grad_by_freq : bool=False,
                              sparse : bool=False) -> Tensor
torch.nn.functional.embedding_bag(input : Tensor,
                                  weight : Tensor,
                                  offsets : Optional[Tensor],
                                  max_norm : Optional[float],
                                  norm_type : float=2.0,
                                  scale_grad_by_freq : bool=False,
                                  mode : str=mean,
                                  sparse : bool=False) -> Tensor
torch.nn.functional.feature_alpha_dropout(input : Tensor,
                                          p : float=0.5,
                                          training : bool=False,
                                          inplace : bool=False) -> Tensor
torch.nn.functional.fold(input : Tensor,
                         output_size : List[int],
                         kernel_size : List[int],
                         dilation : List[int]=1,
                         padding : List[int]=0,
                         stride : List[int]=1) -> Tensor
torch.nn.functional.fractional_max_pool2d_with_indices(input : Tensor,
                                                       kernel_size : List[int],
                                                       output_size : Optional[List[int]],
                                                       output_ratio : Optional[List[float]],
                                                       return_indices : bool=False,
                                                       _random_samples : Optional[Tensor]) -> Tuple[Tensor, Tensor]
torch.nn.functional.fractional_max_pool3d_with_indices(input : Tensor,
                                                       kernel_size : List[int],
                                                       output_size : Optional[List[int]],
                                                       output_ratio : Optional[List[float]],
                                                       return_indices : bool=False,
                                                       _random_samples : Optional[Tensor]) -> Tuple[Tensor, Tensor]
torch.nn.functional.glu(input : Tensor,
                        dim : int=-1) -> Tensor
torch.nn.functional.grid_sample(input : Tensor,
                                grid : Tensor,
                                mode : str=bilinear,
                                padding_mode : str=zeros) -> Tensor
torch.nn.functional.group_norm(input : Tensor,
                               num_groups : int,
                               weight : Optional[Tensor],
                               bias : Optional[Tensor],
                               eps : float=1e-05) -> Tensor
torch.nn.functional.gumbel_softmax(logits : Tensor,
                                   tau : float=1.0,
                                   hard : bool=False,
                                   eps : float=1e-10,
                                   dim : int=-1) -> Tensor
torch.nn.functional.hardshrink(input : Tensor,
                               lambd : float=0.5) -> Tensor
torch.nn.functional.hardtanh(input : Tensor,
                             min_val : float=-1.0,
                             max_val : float=1.0,
                             inplace : bool=False) -> Tensor
torch.nn.functional.hinge_embedding_loss(input : Tensor,
                                         target : Tensor,
                                         margin : float=1.0,
                                         size_average : Optional[bool],
                                         reduce : Optional[bool],
                                         reduction : str=mean) -> Tensor
torch.nn.functional.instance_norm(input : Tensor,
                                  running_mean : Optional[Tensor],
                                  running_var : Optional[Tensor],
                                  weight : Optional[Tensor],
                                  bias : Optional[Tensor],
                                  use_input_stats : bool=True,
                                  momentum : float=0.1,
                                  eps : float=1e-05) -> Tensor
torch.nn.functional.kl_div(input : Tensor,
                           target : Tensor,
                           size_average : Optional[bool],
                           reduce : Optional[bool],
                           reduction : str=mean) -> Tensor
torch.nn.functional.l1_loss(input : Tensor,
                            target : Tensor,
                            size_average : Optional[bool],
                            reduce : Optional[bool],
                            reduction : str=mean) -> Tensor
torch.nn.functional.layer_norm(input : Tensor,
                               normalized_shape : List[int],
                               weight : Optional[Tensor],
                               bias : Optional[Tensor],
                               eps : float=1e-05) -> Tensor
torch.nn.functional.leaky_relu(input : Tensor,
                               negative_slope : float=0.01,
                               inplace : bool=False) -> Tensor
torch.nn.functional.linear(input : Tensor,
                           weight : Tensor,
                           bias : Optional[Tensor]) -> Tensor
torch.nn.functional.local_response_norm(input : Tensor,
                                        size : int,
                                        alpha : float=0.0001,
                                        beta : float=0.75,
                                        k : float=1.0) -> Tensor
torch.nn.functional.log_softmax(input : Tensor,
                                dim : Optional[int],
                                _stacklevel : int=3,
                                dtype : Optional[int]) -> Tensor
torch.nn.functional.lp_pool1d(input : Tensor,
                              norm_type : float,
                              kernel_size : int,
                              stride : Optional[List[int]],
                              ceil_mode : bool=False) -> Tensor
torch.nn.functional.lp_pool2d(input : Tensor,
                              norm_type : float,
                              kernel_size : int,
                              stride : Optional[List[int]],
                              ceil_mode : bool=False) -> Tensor
torch.nn.functional.margin_ranking_loss(input1 : Tensor,
                                        input2 : Tensor,
                                        target : Tensor,
                                        margin : float=0.0,
                                        size_average : Optional[bool],
                                        reduce : Optional[bool],
                                        reduction : str=mean) -> Tensor
torch.nn.functional.max_pool1d_with_indices(input : Tensor,
                                            kernel_size : List[int],
                                            stride : Optional[List[int]],
                                            padding : List[int]=0,
                                            dilation : List[int]=1,
                                            ceil_mode : bool=False,
                                            return_indices : bool=False) -> Tuple[Tensor, Tensor]
torch.nn.functional.max_pool2d_with_indices(input : Tensor,
                                            kernel_size : List[int],
                                            stride : Optional[List[int]],
                                            padding : List[int]=0,
                                            dilation : List[int]=1,
                                            ceil_mode : bool=False,
                                            return_indices : bool=False) -> Tuple[Tensor, Tensor]
torch.nn.functional.max_pool3d_with_indices(input : Tensor,
                                            kernel_size : List[int],
                                            stride : Optional[List[int]],
                                            padding : List[int]=0,
                                            dilation : List[int]=1,
                                            ceil_mode : bool=False,
                                            return_indices : bool=False) -> Tuple[Tensor, Tensor]
torch.nn.functional.max_unpool1d(input : Tensor,
                                 indices : Tensor,
                                 kernel_size : List[int],
                                 stride : Optional[List[int]],
                                 padding : List[int]=0,
                                 output_size : Optional[List[int]]) -> Tensor
torch.nn.functional.max_unpool2d(input : Tensor,
                                 indices : Tensor,
                                 kernel_size : List[int],
                                 stride : Optional[List[int]],
                                 padding : List[int]=0,
                                 output_size : Optional[List[int]]) -> Tensor
torch.nn.functional.max_unpool3d(input : Tensor,
                                 indices : Tensor,
                                 kernel_size : List[int],
                                 stride : Optional[List[int]],
                                 padding : List[int]=0,
                                 output_size : Optional[List[int]]) -> Tensor
torch.nn.functional.mse_loss(input : Tensor,
                             target : Tensor,
                             size_average : Optional[bool],
                             reduce : Optional[bool],
                             reduction : str=mean) -> Tensor
torch.nn.functional.multi_margin_loss(input : Tensor,
                                      target : Tensor,
                                      p : int=1,
                                      margin : float=1.0,
                                      weight : Optional[Tensor],
                                      size_average : Optional[bool],
                                      reduce : Optional[bool],
                                      reduction : str=mean) -> Tensor
torch.nn.functional.multilabel_margin_loss(input : Tensor,
                                           target : Tensor,
                                           size_average : Optional[bool],
                                           reduce : Optional[bool],
                                           reduction : str=mean) -> Tensor
torch.nn.functional.multilabel_soft_margin_loss(input : Tensor,
                                                target : Tensor,
                                                weight : Optional[Tensor],
                                                size_average : Optional[bool],
                                                reduce : Optional[bool],
                                                reduction : str=mean) -> Tensor
torch.nn.functional.nll_loss(input : Tensor,
                             target : Tensor,
                             weight : Optional[Tensor],
                             size_average : Optional[bool],
                             ignore_index : int=-100,
                             reduce : Optional[bool],
                             reduction : str=mean) -> Tensor
torch.nn.functional.normalize(input : Tensor,
                              p : float=2.0,
                              dim : int=1,
                              eps : float=1e-12,
                              out : Optional[Tensor]) -> Tensor
torch.nn.functional.pad(input : Tensor,
                        pad : List[int],
                        mode : str=constant,
                        value : float=0.0) -> Tensor
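An illustrative call: the pad list is read from the last dimension backwards in (left, right, top, bottom, ...) pairs, so [1, 1, 0, 0] pads only the width of a 4-d input:
>>> x = torch.ones(1, 1, 2, 2)
>>> torch.nn.functional.pad(x, [1, 1, 0, 0]).shape
torch.Size([1, 1, 2, 4])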
torch.nn.functional.pad_circular(input : Tensor,
                                 padding : List[int]) -> Tensor
torch.nn.functional.pairwise_distance(x1 : Tensor,
                                      x2 : Tensor,
                                      p : float=2.0,
                                      eps : float=1e-06,
                                      keepdim : bool=False) -> Tensor
torch.nn.functional.poisson_nll_loss(input : Tensor,
                                     target : Tensor,
                                     log_input : bool=True,
                                     full : bool=False,
                                     size_average : Optional[bool],
                                     eps : float=1e-08,
                                     reduce : Optional[bool],
                                     reduction : str=mean) -> Tensor
torch.nn.functional.prelu(input : Tensor,
                          weight : Tensor) -> Tensor
torch.nn.functional.relu(input : Tensor,
                         inplace : bool=False) -> Tensor
torch.nn.functional.relu6(input : Tensor,
                          inplace : bool=False) -> Tensor
torch.nn.functional.rrelu(input : Tensor,
                          lower : float=0.125,
                          upper : float=0.3333333333333333,
                          training : bool=False,
                          inplace : bool=False) -> Tensor
torch.nn.functional.selu(input : Tensor,
                         inplace : bool=False) -> Tensor
torch.nn.functional.sigmoid(input : Tensor) -> Tensor
torch.nn.functional.smooth_l1_loss(input : Tensor,
                                   target : Tensor,
                                   size_average : Optional[bool],
                                   reduce : Optional[bool],
                                   reduction : str=mean) -> Tensor
torch.nn.functional.soft_margin_loss(input : Tensor,
                                     target : Tensor,
                                     size_average : Optional[bool],
                                     reduce : Optional[bool],
                                     reduction : str=mean) -> Tensor
torch.nn.functional.softmax(input : Tensor,
                            dim : Optional[int],
                            _stacklevel : int=3,
                            dtype : Optional[int]) -> Tensor
torch.nn.functional.softmin(input : Tensor,
                            dim : Optional[int],
                            _stacklevel : int=3,
                            dtype : Optional[int]) -> Tensor
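A short sketch of softmax and softmin along a dimension; softmin(x) is equivalent to softmax(-x):
>>> x = torch.tensor([1., 2., 3.])
>>> torch.nn.functional.softmax(x, dim=0)
tensor([0.0900, 0.2447, 0.6652])
>>> torch.nn.functional.softmin(x, dim=0)
tensor([0.6652, 0.2447, 0.0900])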
torch.nn.functional.softsign(input : Tensor) -> Tensor
torch.nn.functional.tanh(input : Tensor) -> Tensor
torch.nn.functional.tanhshrink(input : Tensor) -> Tensor
torch.nn.functional.threshold(input : Tensor,
                              threshold : float,
                              value : float,
                              inplace : bool=False) -> Tensor
torch.nn.functional.triplet_margin_loss(anchor : Tensor,
                                        positive : Tensor,
                                        negative : Tensor,
                                        margin : float=1.0,
                                        p : float=2.0,
                                        eps : float=1e-06,
                                        swap : bool=False,
                                        size_average : Optional[bool],
                                        reduce : Optional[bool],
                                        reduction : str=mean) -> Tensor
torch.nn.functional.unfold(input : Tensor,
                           kernel_size : List[int],
                           dilation : List[int]=1,
                           padding : List[int]=0,
                           stride : List[int]=1) -> Tensor
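An illustrative shape check for unfold: each output column holds one flattened kernel_size patch, so a (1, 3, 4, 4) input with kernel_size=2 yields 3*2*2 = 12 rows and (4-2+1)^2 = 9 sliding blocks:
>>> inp = torch.randn(1, 3, 4, 4)
>>> torch.nn.functional.unfold(inp, kernel_size=2).shape
torch.Size([1, 12, 9])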
Supported Methods¶
Tensor.__and__(other : Tensor) -> Tensor
Tensor.__and__(other : number) -> Tensor
Tensor.__iand__(other : Tensor) -> Tensor
Tensor.__iand__(other : number) -> Tensor
Tensor.__ilshift__(other : Tensor) -> Tensor
Tensor.__ilshift__(other : number) -> Tensor
Tensor.__ior__(other : Tensor) -> Tensor
Tensor.__ior__(other : number) -> Tensor
Tensor.__irshift__(other : Tensor) -> Tensor
Tensor.__irshift__(other : number) -> Tensor
Tensor.__ixor__(other : Tensor) -> Tensor
Tensor.__ixor__(other : number) -> Tensor
Tensor.__lshift__(other : Tensor) -> Tensor
Tensor.__lshift__(other : number) -> Tensor
Tensor.__or__(other : Tensor) -> Tensor
Tensor.__or__(other : number) -> Tensor
Tensor.__rshift__(other : Tensor) -> Tensor
Tensor.__rshift__(other : number) -> Tensor
Tensor.__xor__(other : Tensor) -> Tensor
Tensor.__xor__(other : number) -> Tensor
Tensor.abs() -> Tensor
Tensor.abs(out : Tensor) -> Tensor
Tensor.abs_() -> Tensor
Tensor.acos(out : Tensor) -> Tensor
Tensor.acos() -> Tensor
Tensor.acos_() -> Tensor
Tensor.add(other : Tensor,
           alpha : number=1) -> Tensor
Tensor.add(other : number,
           alpha : number=1) -> Tensor
Tensor.add(other : Tensor,
           alpha : number=1,
           out : Tensor) -> Tensor
Tensor.add_(other : Tensor,
            alpha : number=1) -> Tensor
Tensor.add_(other : number,
            alpha : number=1) -> Tensor
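As with the free functions above, methods without a trailing underscore return a new tensor while the underscore variants mutate the tensor in place; an illustrative example:
>>> t = torch.tensor([1., 2.])
>>> t.add(1)    # out of place; t is unchanged
tensor([2., 3.])
>>> t.add_(1)   # in place; t itself is updated
tensor([2., 3.])
>>> t
tensor([2., 3.])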
Tensor.addbmm(batch1 : Tensor,
              batch2 : Tensor,
              beta : number=1,
              alpha : number=1) -> Tensor
Tensor.addbmm(batch1 : Tensor,
              batch2 : Tensor,
              beta : number=1,
              alpha : number=1,
              out : Tensor) -> Tensor
Tensor.addbmm_(batch1 : Tensor,
               batch2 : Tensor,
               beta : number=1,
               alpha : number=1) -> Tensor
Tensor.addcdiv(tensor1 : Tensor,
               tensor2 : Tensor,
               value : number=1,
               out : Tensor) -> Tensor
Tensor.addcdiv(tensor1 : Tensor,
               tensor2 : Tensor,
               value : number=1) -> Tensor
Tensor.addcdiv_(tensor1 : Tensor,
                tensor2 : Tensor,
                value : number=1) -> Tensor
Tensor.addcmul(tensor1 : Tensor,
               tensor2 : Tensor,
               value : number=1) -> Tensor
Tensor.addcmul(tensor1 : Tensor,
               tensor2 : Tensor,
               value : number=1,
               out : Tensor) -> Tensor
Tensor.addcmul_(tensor1 : Tensor,
                tensor2 : Tensor,
                value : number=1) -> Tensor
Tensor.addmm(mat1 : Tensor,
             mat2 : Tensor,
             beta : number=1,
             alpha : number=1,
             out : Tensor) -> Tensor
Tensor.addmm(mat1 : Tensor,
             mat2 : Tensor,
             beta : number=1,
             alpha : number=1) -> Tensor
Tensor.addmm_(mat1 : Tensor,
              mat2 : Tensor,
              beta : number=1,
              alpha : number=1) -> Tensor
Tensor.addmv(mat : Tensor,
             vec : Tensor,
             beta : number=1,
             alpha : number=1,
             out : Tensor) -> Tensor
Tensor.addmv(mat : Tensor,
             vec : Tensor,
             beta : number=1,
             alpha : number=1) -> Tensor
Tensor.addmv_(mat : Tensor,
              vec : Tensor,
              beta : number=1,
              alpha : number=1) -> Tensor
Tensor.addr(vec1 : Tensor,
            vec2 : Tensor,
            beta : number=1,
            alpha : number=1) -> Tensor
Tensor.addr(vec1 : Tensor,
            vec2 : Tensor,
            beta : number=1,
            alpha : number=1,
            out : Tensor) -> Tensor
Tensor.addr_(vec1 : Tensor,
             vec2 : Tensor,
             beta : number=1,
             alpha : number=1) -> Tensor
Tensor.all() -> Tensor
Tensor.all(dim : int,
           keepdim : bool=False) -> Tensor
Tensor.all(dim : int,
           keepdim : bool=False,
           out : Tensor) -> Tensor
Tensor.allclose(other : Tensor,
                rtol : float=1e-05,
                atol : float=1e-08,
                equal_nan : bool=False) -> bool
Tensor.any() -> Tensor
Tensor.any(dim : int,
           keepdim : bool=False) -> Tensor
Tensor.any(dim : int,
           keepdim : bool=False,
           out : Tensor) -> Tensor
Tensor.argmax() -> Tensor
Tensor.argmax(dim : int,
              keepdim : bool=False) -> Tensor
Tensor.argmin() -> Tensor
Tensor.argmin(dim : int,
              keepdim : bool=False) -> Tensor
Tensor.argsort(dim : int=-1,
               descending : bool=False) -> Tensor
Tensor.as_strided(size : List[int],
                  stride : List[int],
                  storage_offset : Optional[int]) -> Tensor
Tensor.as_strided_(size : List[int],
                   stride : List[int],
                   storage_offset : Optional[int]) -> Tensor
Tensor.asin() -> Tensor
Tensor.asin(out : Tensor) -> Tensor
Tensor.asin_() -> Tensor
Tensor.atan() -> Tensor
Tensor.atan(out : Tensor) -> Tensor
Tensor.atan2(other : Tensor,
             out : Tensor) -> Tensor
Tensor.atan2(other : Tensor) -> Tensor
Tensor.atan2_(other : Tensor) -> Tensor
Tensor.atan_() -> Tensor
Tensor.baddbmm(batch1 : Tensor,
               batch2 : Tensor,
               beta : number=1,
               alpha : number=1) -> Tensor
Tensor.baddbmm(batch1 : Tensor,
               batch2 : Tensor,
               beta : number=1,
               alpha : number=1,
               out : Tensor) -> Tensor
Tensor.baddbmm_(batch1 : Tensor,
                batch2 : Tensor,
                beta : number=1,
                alpha : number=1) -> Tensor
Tensor.bernoulli(generator : Optional[Generator]) -> Tensor
Tensor.bernoulli(p : float,
                 generator : Optional[Generator]) -> Tensor
Tensor.bernoulli(generator : Optional[Generator],
                 out : Tensor) -> Tensor
Tensor.bernoulli_(p : Tensor,
                  generator : Optional[Generator]) -> Tensor
Tensor.bernoulli_(p : float=0.5,
                  generator : Optional[Generator]) -> Tensor
Tensor.bincount(weights : Optional[Tensor],
                minlength : int=0) -> Tensor
Tensor.bmm(mat2 : Tensor) -> Tensor
Tensor.bmm(mat2 : Tensor,
           out : Tensor) -> Tensor
Tensor.btrifact(pivot : bool=True) -> Tuple[Tensor, Tensor]
Tensor.btrifact_with_info(pivot : bool=True) -> Tuple[Tensor, Tensor, Tensor]
Tensor.btrisolve(LU_data : Tensor,
                 LU_pivots : Tensor,
                 out : Tensor) -> Tensor
Tensor.btrisolve(LU_data : Tensor,
                 LU_pivots : Tensor) -> Tensor
Tensor.cauchy_(median : float=0.0,
               sigma : float=1.0,
               generator : Optional[Generator]) -> Tensor
Tensor.ceil(out : Tensor) -> Tensor
Tensor.ceil() -> Tensor
Tensor.ceil_() -> Tensor
Tensor.cholesky(upper : bool=False,
                out : Tensor) -> Tensor
Tensor.cholesky(upper : bool=False) -> Tensor
Tensor.cholesky_solve(input2 : Tensor,
                      upper : bool=False,
                      out : Tensor) -> Tensor
Tensor.cholesky_solve(input2 : Tensor,
                      upper : bool=False) -> Tensor
Tensor.chunk(chunks : int,
             dim : int=0) -> List[Tensor]
Tensor.clamp(min : Optional[number],
             max : Optional[number]) -> Tensor
Tensor.clamp(min : Optional[number],
             max : Optional[number],
             out : Tensor) -> Tensor
Tensor.clamp_(min : Optional[number],
              max : Optional[number]) -> Tensor
Tensor.clamp_max(max : number) -> Tensor
Tensor.clamp_max(max : number,
                 out : Tensor) -> Tensor
Tensor.clamp_max_(max : number) -> Tensor
Tensor.clamp_min(min : number,
                 out : Tensor) -> Tensor
Tensor.clamp_min(min : number) -> Tensor
Tensor.clamp_min_(min : number) -> Tensor
Tensor.clone() -> Tensor
Tensor.coalesce() -> Tensor
Tensor.contiguous() -> Tensor
Tensor.copy_(other : Tensor) -> Tensor
Tensor.copy_(other : int) -> Tensor
Tensor.copy_(other : float) -> Tensor
Tensor.cos() -> Tensor
Tensor.cos(out : Tensor) -> Tensor
Tensor.cos_() -> Tensor
Tensor.cosh() -> Tensor
Tensor.cosh(out : Tensor) -> Tensor
Tensor.cosh_() -> Tensor
Tensor.cpu() -> Tensor
Tensor.cross(other : Tensor,
             dim : int=-1,
             out : Tensor) -> Tensor
Tensor.cross(other : Tensor,
             dim : int=-1) -> Tensor
Tensor.cuda() -> Tensor
Tensor.cumprod(dim : int) -> Tensor
Tensor.cumprod(dim : int,
               dtype : int) -> Tensor
Tensor.cumprod(dim : int,
               out : Tensor) -> Tensor
Tensor.cumprod(dim : int,
               dtype : int,
               out : Tensor) -> Tensor
Tensor.cumsum(dim : int) -> Tensor
Tensor.cumsum(dim : int,
              dtype : int) -> Tensor
Tensor.cumsum(dim : int,
              out : Tensor) -> Tensor
Tensor.cumsum(dim : int,
              dtype : int,
              out : Tensor) -> Tensor
Tensor.dense_dim() -> int
Tensor.det() -> Tensor
Tensor.detach() -> Tensor
Tensor.detach_() -> Tensor
Tensor.diag(diagonal : int=0) -> Tensor
Tensor.diag(diagonal : int=0,
            out : Tensor) -> Tensor
Tensor.diag_embed(offset : int=0,
                  dim1 : int=-2,
                  dim2 : int=-1) -> Tensor
Tensor.diagflat(offset : int=0) -> Tensor
Tensor.diagonal(offset : int=0,
                dim1 : int=0,
                dim2 : int=1) -> Tensor
Tensor.digamma() -> Tensor
Tensor.digamma(out : Tensor) -> Tensor
Tensor.digamma_() -> Tensor
Tensor.dim() -> int
Tensor.dist(other : Tensor,
            p : number=2) -> Tensor
Tensor.div(other : Tensor,
           out : Tensor) -> Tensor
Tensor.div(other : Tensor) -> Tensor
Tensor.div(other : number) -> Tensor
Tensor.div_(other : Tensor) -> Tensor
Tensor.div_(other : number) -> Tensor
Tensor.dot(tensor : Tensor) -> Tensor
Tensor.dot(tensor : Tensor,
           out : Tensor) -> Tensor
Tensor.eig(eigenvectors : bool=False) -> Tuple[Tensor, Tensor]
Tensor.eq(other : Tensor) -> Tensor
Tensor.eq(other : number) -> Tensor
Tensor.eq(other : Tensor,
          out : Tensor) -> Tensor
Tensor.eq(other : number,
          out : Tensor) -> Tensor
Tensor.eq_(other : Tensor) -> Tensor
Tensor.eq_(other : number) -> Tensor
Tensor.equal(other : Tensor) -> bool
Tensor.erf(out : Tensor) -> Tensor
Tensor.erf() -> Tensor
Tensor.erf_() -> Tensor
Tensor.erfc(out : Tensor) -> Tensor
Tensor.erfc() -> Tensor
Tensor.erfc_() -> Tensor
Tensor.erfinv(out : Tensor) -> Tensor
Tensor.erfinv() -> Tensor
Tensor.erfinv_() -> Tensor
Tensor.exp() -> Tensor
Tensor.exp(out : Tensor) -> Tensor
Tensor.exp_() -> Tensor
Tensor.expand(size : List[int],
              implicit : bool=False) -> Tensor
Tensor.expand_as(other : Tensor) -> Tensor
Tensor.expm1(out : Tensor) -> Tensor
Tensor.expm1() -> Tensor
Tensor.expm1_() -> Tensor
Tensor.exponential_(lambd : float=1.0,
                    generator : Optional[Generator]) -> Tensor
Tensor.fft(signal_ndim : int,
           normalized : bool=False) -> Tensor
Tensor.fill_(value : Tensor) -> Tensor
Tensor.fill_(value : number) -> Tensor
Tensor.flatten(start_dim : int=0,
               end_dim : int=-1) -> Tensor
Tensor.flip(dims : List[int]) -> Tensor
Tensor.floor() -> Tensor
Tensor.floor(out : Tensor) -> Tensor
Tensor.floor_() -> Tensor
Tensor.fmod(other : Tensor,
            out : Tensor) -> Tensor
Tensor.fmod(other : number,
            out : Tensor) -> Tensor
Tensor.fmod(other : Tensor) -> Tensor
Tensor.fmod(other : number) -> Tensor
Tensor.fmod_(other : Tensor) -> Tensor
Tensor.fmod_(other : number) -> Tensor
Tensor.frac() -> Tensor
Tensor.frac(out : Tensor) -> Tensor
Tensor.frac_() -> Tensor
Tensor.gather(dim : int,
              index : Tensor,
              sparse_grad : bool=False,
              out : Tensor) -> Tensor
Tensor.gather(dim : int,
              index : Tensor,
              sparse_grad : bool=False) -> Tensor
Tensor.ge(other : Tensor) -> Tensor
Tensor.ge(other : number) -> Tensor
Tensor.ge(other : Tensor,
          out : Tensor) -> Tensor
Tensor.ge(other : number,
          out : Tensor) -> Tensor
Tensor.ge_(other : Tensor) -> Tensor
Tensor.ge_(other : number) -> Tensor
Tensor.gels(A : Tensor) -> Tuple[Tensor, Tensor]
Tensor.geometric_(p : float,
                  generator : Optional[Generator]) -> Tensor
Tensor.geqrf() -> Tuple[Tensor, Tensor]
Tensor.ger(vec2 : Tensor) -> Tensor
Tensor.ger(vec2 : Tensor,
           out : Tensor) -> Tensor
Tensor.get_device() -> int
Tensor.gt(other : Tensor) -> Tensor
Tensor.gt(other : number) -> Tensor
Tensor.gt(other : Tensor,
          out : Tensor) -> Tensor
Tensor.gt(other : number,
          out : Tensor) -> Tensor
Tensor.gt_(other : Tensor) -> Tensor
Tensor.gt_(other : number) -> Tensor
Tensor.hardshrink(lambd : number=0.5) -> Tensor
Tensor.histc(bins : int=100,
             min : number=0,
             max : number=0,
             out : Tensor) -> Tensor
Tensor.histc(bins : int=100,
             min : number=0,
             max : number=0) -> Tensor
Tensor.ifft(signal_ndim : int,
            normalized : bool=False) -> Tensor
Tensor.index_add(dim : int,
                 index : Tensor,
                 source : Tensor) -> Tensor
Tensor.index_add_(dim : int,
                  index : Tensor,
                  source : Tensor) -> Tensor
Tensor.index_copy(dim : int,
                  index : Tensor,
                  source : Tensor) -> Tensor
Tensor.index_copy_(dim : int,
                   index : Tensor,
                   source : Tensor) -> Tensor
Tensor.index_fill(dim : int,
                  index : Tensor,
                  value : Tensor) -> Tensor
Tensor.index_fill(dim : int,
                  index : Tensor,
                  value : number) -> Tensor
Tensor.index_fill_(dim : int,
                   index : Tensor,
                   value : Tensor) -> Tensor
Tensor.index_fill_(dim : int,
                   index : Tensor,
                   value : number) -> Tensor
Tensor.index_put(indices : List[Optional[Tensor]],
                 values : Tensor,
                 accumulate : bool=False) -> Tensor
Tensor.index_put(indices : List[Tensor],
                 values : Tensor,
                 accumulate : bool=False) -> Tensor
Tensor.index_put_(indices : List[Optional[Tensor]],
                  values : Tensor,
                  accumulate : bool=False) -> Tensor
Tensor.index_put_(indices : List[Tensor],
                  values : Tensor,
                  accumulate : bool=False) -> Tensor
Tensor.index_select(dim : int,
                    index : Tensor,
                    out : Tensor) -> Tensor
Tensor.index_select(dim : int,
                    index : Tensor) -> Tensor
Tensor.indices() -> Tensor
Tensor.inverse(out : Tensor) -> Tensor
Tensor.inverse() -> Tensor
Tensor.irfft(signal_ndim : int,
             normalized : bool=False,
             onesided : bool=True,
             signal_sizes : List[int]=[]) -> Tensor
Tensor.is_coalesced() -> bool
Tensor.is_complex() -> bool
Tensor.is_distributed() -> bool
Tensor.is_floating_point() -> bool
Tensor.is_nonzero() -> bool
Tensor.is_same_size(other : Tensor) -> bool
Tensor.is_set_to(tensor : Tensor) -> bool
Tensor.is_signed() -> bool
Tensor.isclose(other : Tensor,
               rtol : float=1e-05,
               atol : float=1e-08,
               equal_nan : bool=False) -> Tensor
Tensor.item() -> number
Tensor.kthvalue(k : int,
                dim : int=-1,
                keepdim : bool=False) -> Tuple[Tensor, Tensor]
Tensor.le(other : Tensor,
          out : Tensor) -> Tensor
Tensor.le(other : number,
          out : Tensor) -> Tensor
Tensor.le(other : Tensor) -> Tensor
Tensor.le(other : number) -> Tensor
Tensor.le_(other : Tensor) -> Tensor
Tensor.le_(other : number) -> Tensor
Tensor.lerp(end : Tensor,
            weight : Tensor) -> Tensor
Tensor.lerp(end : Tensor,
            weight : number) -> Tensor
Tensor.lerp(end : Tensor,
            weight : Tensor,
            out : Tensor) -> Tensor
Tensor.lerp(end : Tensor,
            weight : number,
            out : Tensor) -> Tensor
Tensor.lerp_(end : Tensor,
             weight : Tensor) -> Tensor
Tensor.lerp_(end : Tensor,
             weight : number) -> Tensor
Tensor.lgamma(out : Tensor) -> Tensor
Tensor.lgamma() -> Tensor
Tensor.lgamma_() -> Tensor
Tensor.log() -> Tensor
Tensor.log(out : Tensor) -> Tensor
Tensor.log10(out : Tensor) -> Tensor
Tensor.log10() -> Tensor
Tensor.log10_() -> Tensor
Tensor.log1p() -> Tensor
Tensor.log1p(out : Tensor) -> Tensor
Tensor.log1p_() -> Tensor
Tensor.log2() -> Tensor
Tensor.log2(out : Tensor) -> Tensor
Tensor.log2_() -> Tensor
Tensor.log_() -> Tensor
Tensor.log_normal_(mean : float=1.0,
                   std : float=2.0,
                   generator : Optional[Generator]) -> Tensor
Tensor.log_softmax(dim : int) -> Tensor
Tensor.log_softmax(dim : int,
                   dtype : int) -> Tensor
Tensor.logdet() -> Tensor
Tensor.logsumexp(dim : List[int],
                 keepdim : bool=False) -> Tensor
Tensor.logsumexp(dim : List[int],
                 keepdim : bool=False,
                 out : Tensor) -> Tensor
Tensor.lt(other : Tensor,
          out : Tensor) -> Tensor
Tensor.lt(other : number,
          out : Tensor) -> Tensor
Tensor.lt(other : Tensor) -> Tensor
Tensor.lt(other : number) -> Tensor
Tensor.lt_(other : Tensor) -> Tensor
Tensor.lt_(other : number) -> Tensor
Tensor.masked_fill(mask : Tensor,
                   value : Tensor) -> Tensor
Tensor.masked_fill(mask : Tensor,
                   value : number) -> Tensor
Tensor.masked_fill_(mask : Tensor,
                    value : Tensor) -> Tensor
Tensor.masked_fill_(mask : Tensor,
                    value : number) -> Tensor
Tensor.masked_scatter(mask : Tensor,
                      source : Tensor) -> Tensor
Tensor.masked_scatter_(mask : Tensor,
                       source : Tensor) -> Tensor
Tensor.masked_select(mask : Tensor,
                     out : Tensor) -> Tensor
Tensor.masked_select(mask : Tensor) -> Tensor
Tensor.matmul(other : Tensor,
              out : Tensor) -> Tensor
Tensor.matmul(other : Tensor) -> Tensor
Tensor.matrix_power(n : int) -> Tensor
Tensor.max(other : Tensor,
           out : Tensor) -> Tensor
Tensor.max() -> Tensor
Tensor.max(other : Tensor) -> Tensor
Tensor.max(dim : int,
           keepdim : bool=False) -> Tuple[Tensor, Tensor]
Tensor.mean() -> Tensor
Tensor.mean(dtype : int) -> Tensor
Tensor.mean(dim : List[int],
            keepdim : bool=False) -> Tensor
Tensor.mean(dim : List[int],
            dtype : int) -> Tensor
Tensor.mean(dim : List[int],
            keepdim : bool,
            dtype : int) -> Tensor
Tensor.mean(dim : List[int],
            keepdim : bool=False,
            out : Tensor) -> Tensor
Tensor.mean(dim : List[int],
            dtype : int,
            out : Tensor) -> Tensor
Tensor.mean(dim : List[int],
            keepdim : bool,
            dtype : int,
            out : Tensor) -> Tensor
Tensor.median() -> Tensor
Tensor.median(dim : int,
              keepdim : bool=False) -> Tuple[Tensor, Tensor]
Tensor.min() -> Tensor
Tensor.min(other : Tensor) -> Tensor
Tensor.min(dim : int,
           keepdim : bool=False) -> Tuple[Tensor, Tensor]
Tensor.min(other : Tensor,
           out : Tensor) -> Tensor
Tensor.mm(mat2 : Tensor,
          out : Tensor) -> Tensor
Tensor.mm(mat2 : Tensor) -> Tensor
Tensor.mode(dim : int=-1,
            keepdim : bool=False) -> Tuple[Tensor, Tensor]
Tensor.mul(other : Tensor) -> Tensor
Tensor.mul(other : number) -> Tensor
Tensor.mul(other : Tensor,
           out : Tensor) -> Tensor
Tensor.mul_(other : Tensor) -> Tensor
Tensor.mul_(other : number) -> Tensor
Tensor.multinomial(num_samples : int,
                   replacement : bool=False,
                   generator : Optional[Generator]) -> Tensor
Tensor.multinomial(num_samples : int,
                   replacement : bool=False,
                   generator : Optional[Generator],
                   out : Tensor) -> Tensor
Tensor.mv(vec : Tensor,
          out : Tensor) -> Tensor
Tensor.mv(vec : Tensor) -> Tensor
Tensor.mvlgamma(p : int) -> Tensor
Tensor.mvlgamma_(p : int) -> Tensor
Tensor.narrow(dim : int,
              start : int,
              length : int) -> Tensor
Tensor.narrow_copy(dim : int,
                   start : int,
                   length : int) -> Tensor
Tensor.ne(other : Tensor) -> Tensor
Tensor.ne(other : number) -> Tensor
Tensor.ne(other : Tensor,
          out : Tensor) -> Tensor
Tensor.ne(other : number,
          out : Tensor) -> Tensor
Tensor.ne_(other : Tensor) -> Tensor
Tensor.ne_(other : number) -> Tensor
Tensor.neg(out : Tensor) -> Tensor
Tensor.neg() -> Tensor
Tensor.neg_() -> Tensor
Tensor.nonzero(out : Tensor) -> Tensor
Tensor.nonzero() -> Tensor
Tensor.norm(p : number=2) -> Tensor
Tensor.norm(p : Optional[number],
            dtype : int) -> Tensor
Tensor.norm(p : Optional[number],
            dim : List[int],
            keepdim : bool=False) -> Tensor
Tensor.norm(p : Optional[number],
            dim : List[int],
            keepdim : bool,
            dtype : int) -> Tensor
Tensor.norm(p : Optional[number],
            dim : List[int],
            keepdim : bool=False,
            out : Tensor) -> Tensor
Tensor.norm(p : Optional[number],
            dim : List[int],
            keepdim : bool,
            dtype : int,
            out : Tensor) -> Tensor
Tensor.normal_(mean : float=0.0,
               std : float=1.0,
               generator : Optional[Generator]) -> Tensor
Tensor.numel() -> int
Tensor.orgqr(input2 : Tensor) -> Tensor
Tensor.orgqr(input2 : Tensor,
             out : Tensor) -> Tensor
Tensor.ormqr(input2 : Tensor,
             input3 : Tensor,
             left : bool=True,
             transpose : bool=False) -> Tensor
Tensor.ormqr(input2 : Tensor,
             input3 : Tensor,
             left : bool=True,
             transpose : bool=False,
             out : Tensor) -> Tensor
Tensor.permute(dims : List[int]) -> Tensor
Tensor.pin_memory() -> Tensor
Tensor.pinverse(rcond : float=1e-15) -> Tensor
Tensor.polygamma_(n : int) -> Tensor
Tensor.potri(upper : bool=True) -> Tensor
Tensor.potri(upper : bool=True,
             out : Tensor) -> Tensor
Tensor.pow(exponent : Tensor) -> Tensor
Tensor.pow(exponent : number) -> Tensor
Tensor.pow(exponent : Tensor,
           out : Tensor) -> Tensor
Tensor.pow(exponent : number,
           out : Tensor) -> Tensor
Tensor.pow_(exponent : Tensor) -> Tensor
Tensor.pow_(exponent : number) -> Tensor
Tensor.prelu(weight : Tensor) -> Tensor
Tensor.prod(dim : int,
            keepdim : bool=False,
            out : Tensor) -> Tensor
Tensor.prod(dim : int,
            dtype : int,
            out : Tensor) -> Tensor
Tensor.prod(dim : int,
            keepdim : bool,
            dtype : int,
            out : Tensor) -> Tensor
Tensor.prod() -> Tensor
Tensor.prod(dtype : int) -> Tensor
Tensor.prod(dim : int,
            keepdim : bool=False) -> Tensor
Tensor.prod(dim : int,
            dtype : int) -> Tensor
Tensor.prod(dim : int,
            keepdim : bool,
            dtype : int) -> Tensor
Tensor.pstrf(upper : bool=True,
             tol : number=-1) -> Tuple[Tensor, Tensor]
Tensor.put_(index : Tensor,
            source : Tensor,
            accumulate : bool=False) -> Tensor
Tensor.qr() -> Tuple[Tensor, Tensor]
Tensor.random_(generator : Optional[Generator]) -> Tensor
Tensor.random_(to : int,
               generator : Optional[Generator]) -> Tensor
Tensor.random_(from : int,
               to : int,
               generator : Optional[Generator]) -> Tensor
Tensor.reciprocal() -> Tensor
Tensor.reciprocal(out : Tensor) -> Tensor
Tensor.reciprocal_() -> Tensor
Tensor.relu() -> Tensor
Tensor.relu_() -> Tensor
Tensor.remainder(other : Tensor) -> Tensor
Tensor.remainder(other : number) -> Tensor
Tensor.remainder(other : Tensor,
                 out : Tensor) -> Tensor
Tensor.remainder(other : number,
                 out : Tensor) -> Tensor
Tensor.remainder_(other : Tensor) -> Tensor
Tensor.remainder_(other : number) -> Tensor
Tensor.renorm(p : number,
              dim : int,
              maxnorm : number,
              out : Tensor) -> Tensor
Tensor.renorm(p : number,
              dim : int,
              maxnorm : number) -> Tensor
Tensor.renorm_(p : number,
               dim : int,
               maxnorm : number) -> Tensor
Tensor.repeat(repeats : List[int]) -> Tensor
Tensor.reshape(shape : List[int]) -> Tensor
Tensor.reshape_as(other : Tensor) -> Tensor
Tensor.resize_(size : List[int]) -> Tensor
Tensor.resize_as_(the_template : Tensor) -> Tensor
Tensor.rfft(signal_ndim : int,
            normalized : bool=False,
            onesided : bool=True) -> Tensor
Tensor.roll(shifts : List[int],
            dims : List[int]=[]) -> Tensor
Tensor.rot90(k : int=1,
             dims : List[int]=[0, 1]) -> Tensor
Tensor.round() -> Tensor
Tensor.round(out : Tensor) -> Tensor
Tensor.round_() -> Tensor
Tensor.rsqrt(out : Tensor) -> Tensor
Tensor.rsqrt() -> Tensor
Tensor.rsqrt_() -> Tensor
Tensor.scatter(dim : int,
               index : Tensor,
               src : Tensor) -> Tensor
Tensor.scatter(dim : int,
               index : Tensor,
               value : number) -> Tensor
Tensor.scatter_(dim : int,
                index : Tensor,
                src : Tensor) -> Tensor
Tensor.scatter_(dim : int,
                index : Tensor,
                value : number) -> Tensor
Tensor.scatter_add(dim : int,
                   index : Tensor,
                   src : Tensor) -> Tensor
Tensor.scatter_add_(dim : int,
                    index : Tensor,
                    src : Tensor) -> Tensor
Tensor.select(dim : int,
              index : int) -> Tensor
Tensor.set_() -> Tensor
Tensor.set_(source : Tensor) -> Tensor
Tensor.sigmoid() -> Tensor
Tensor.sigmoid(out : Tensor) -> Tensor
Tensor.sigmoid_() -> Tensor
Tensor.sign() -> Tensor
Tensor.sign(out : Tensor) -> Tensor
Tensor.sign_() -> Tensor
Tensor.sin() -> Tensor
Tensor.sin(out : Tensor) -> Tensor
Tensor.sin_() -> Tensor
Tensor.sinh(out : Tensor) -> Tensor
Tensor.sinh() -> Tensor
Tensor.sinh_() -> Tensor
Tensor.size(dim : int) -> int
Tensor.size() -> List[int]
Tensor.slogdet() -> Tuple[Tensor, Tensor]
Tensor.smm(mat2 : Tensor) -> Tensor
Tensor.softmax(dim : int) -> Tensor
Tensor.softmax(dim : int,
               dtype : int) -> Tensor
Tensor.solve(A : Tensor) -> Tuple[Tensor, Tensor]
Tensor.sort(dim : int=-1,
            descending : bool=False) -> Tuple[Tensor, Tensor]
Tensor.sparse_dim() -> int
Tensor.sparse_resize_(size : List[int],
                      sparse_dim : int,
                      dense_dim : int) -> Tensor
Tensor.sparse_resize_and_clear_(size : List[int],
                                sparse_dim : int,
                                dense_dim : int) -> Tensor
Tensor.split(split_size : int,
             dim : int=0) -> List[Tensor]
Tensor.split(split_sizes : List[int],
             dim : int=0) -> List[Tensor]
Tensor.split_with_sizes(split_sizes : List[int],
                        dim : int=0) -> List[Tensor]
Tensor.sqrt(out : Tensor) -> Tensor
Tensor.sqrt() -> Tensor
Tensor.sqrt_() -> Tensor
Tensor.squeeze() -> Tensor
Tensor.squeeze(dim : int) -> Tensor
Tensor.squeeze_() -> Tensor
Tensor.squeeze_(dim : int) -> Tensor
Tensor.sspaddmm(mat1 : Tensor,
                mat2 : Tensor,
                beta : number=1,
                alpha : number=1,
                out : Tensor) -> Tensor
Tensor.sspaddmm(mat1 : Tensor,
                mat2 : Tensor,
                beta : number=1,
                alpha : number=1) -> Tensor
Tensor.std(unbiased : bool=True) -> Tensor
Tensor.std(dim : List[int],
           unbiased : bool=True,
           keepdim : bool=False) -> Tensor
Tensor.std(dim : List[int],
           unbiased : bool=True,
           keepdim : bool=False,
           out : Tensor) -> Tensor
Tensor.stft(n_fft : int,
            hop_length : Optional[int],
            win_length : Optional[int],
            window : Optional[Tensor],
            normalized : bool=False,
            onesided : bool=True) -> Tensor
Tensor.storage_offset() -> int
Tensor.stride(dim : int) -> int
Tensor.sub(other : Tensor,
           alpha : number=1) -> Tensor
Tensor.sub(other : number,
           alpha : number=1) -> Tensor
Tensor.sub(other : Tensor,
           alpha : number=1,
           out : Tensor) -> Tensor
Tensor.sub_(other : Tensor,
            alpha : number=1) -> Tensor
Tensor.sub_(other : number,
            alpha : number=1) -> Tensor
Tensor.sum(dim : List[int],
           keepdim : bool=False,
           out : Tensor) -> Tensor
Tensor.sum(dim : List[int],
           dtype : int,
           out : Tensor) -> Tensor
Tensor.sum(dim : List[int],
           keepdim : bool,
           dtype : int,
           out : Tensor) -> Tensor
Tensor.sum() -> Tensor
Tensor.sum(dtype : int) -> Tensor
Tensor.sum(dim : List[int],
           keepdim : bool=False) -> Tensor
Tensor.sum(dim : List[int],
           dtype : int) -> Tensor
Tensor.sum(dim : List[int],
           keepdim : bool,
           dtype : int) -> Tensor
Tensor.sum_to_size(size : List[int]) -> Tensor
Tensor.svd(some : bool=True,
           compute_uv : bool=True) -> Tuple[Tensor, Tensor, Tensor]
Tensor.symeig(eigenvectors : bool=False,
              upper : bool=True) -> Tuple[Tensor, Tensor]
Tensor.t() -> Tensor
Tensor.t_() -> Tensor
Tensor.take(index : Tensor) -> Tensor
Tensor.take(index : Tensor,
            out : Tensor) -> Tensor
Tensor.tan(out : Tensor) -> Tensor
Tensor.tan() -> Tensor
Tensor.tan_() -> Tensor
Tensor.tanh() -> Tensor
Tensor.tanh(out : Tensor) -> Tensor
Tensor.tanh_() -> Tensor
Tensor.to(other : Tensor,
          non_blocking : bool=False,
          copy : bool=False) -> Tensor
Tensor.to(dtype : int,
          non_blocking : bool=False,
          copy : bool=False) -> Tensor
Tensor.to(device : Device,
          dtype : int,
          non_blocking : bool=False,
          copy : bool=False) -> Tensor
Tensor.to(dtype : int,
          layout : int,
          device : Device,
          non_blocking : bool=False,
          copy : bool=False) -> Tensor
Tensor.to(device : Optional[Device],
          dtype : Optional[int],
          non_blocking : bool=False,
          copy : bool=False) -> Tensor
Tensor.to(dtype : Optional[int],
          non_blocking : bool=False,
          copy : bool=False) -> Tensor
Tensor.to(non_blocking : bool=False,
          copy : bool=False) -> Tensor
Tensor.to_dense() -> Tensor
Tensor.to_sparse() -> Tensor
Tensor.to_sparse(sparse_dim : int) -> Tensor
Tensor.topk(k : int,
            dim : int=-1,
            largest : bool=True,
            sorted : bool=True) -> Tuple[Tensor, Tensor]
Tensor.trace() -> Tensor
Tensor.transpose(dim0 : int,
                 dim1 : int) -> Tensor
Tensor.transpose_(dim0 : int,
                  dim1 : int) -> Tensor
Tensor.tril(diagonal : int=0,
            out : Tensor) -> Tensor
Tensor.tril(diagonal : int=0) -> Tensor
Tensor.tril_(diagonal : int=0) -> Tensor
Tensor.triu(diagonal : int=0,
            out : Tensor) -> Tensor
Tensor.triu(diagonal : int=0) -> Tensor
Tensor.triu_(diagonal : int=0) -> Tensor
Tensor.trtrs(A : Tensor,
             upper : bool=True,
             transpose : bool=False,
             unitriangular : bool=False) -> Tuple[Tensor, Tensor]
Tensor.trunc() -> Tensor
Tensor.trunc(out : Tensor) -> Tensor
Tensor.trunc_() -> Tensor
Tensor.type_as(other : Tensor) -> Tensor
Tensor.unbind(dim : int=0) -> List[Tensor]
Tensor.unfold(dimension : int,
              size : int,
              step : int) -> Tensor
Tensor.uniform_(from : float=0.0,
                to : float=1.0,
                generator : Optional[Generator]) -> Tensor
Tensor.unsqueeze(dim : int) -> Tensor
Tensor.unsqueeze_(dim : int) -> Tensor
Tensor.values() -> Tensor
Tensor.var(dim : List[int],
           unbiased : bool=True,
           keepdim : bool=False,
           out : Tensor) -> Tensor
Tensor.var(unbiased : bool=True) -> Tensor
Tensor.var(dim : List[int],
           unbiased : bool=True,
           keepdim : bool=False) -> Tensor
Tensor.view(size : List[int]) -> Tensor
Tensor.view_as(other : Tensor) -> Tensor
Tensor.zero_() -> Tensor
Frequently Asked Questions¶
Q: I would like to train a model on GPU and do inference on CPU. What are the best practices?
First convert your model from GPU to CPU and then save it, like so:
cpu_model = gpu_model.cpu()
sample_input_cpu = sample_input_gpu.cpu()
traced_cpu = torch.jit.trace(cpu_model, sample_input_cpu)
torch.jit.save(traced_cpu, "cpu.pth")
traced_gpu = torch.jit.trace(gpu_model, sample_input_gpu)
torch.jit.save(traced_gpu, "gpu.pth")
# ... later, when using the model:
if use_gpu:
    model = torch.jit.load("gpu.pth")
else:
    model = torch.jit.load("cpu.pth")
model(input)
This is recommended because the tracer may witness tensor creation on a specific device, so casting an already-loaded model may have unexpected effects. Casting the model before saving it ensures that the tracer has the correct device information.
Multiprocessing package - torch.multiprocessing¶
torch.multiprocessing is a wrapper around the native multiprocessing
module. It registers custom reducers that use shared memory to provide shared
views on the same data in different processes. Once a tensor/storage is moved
to shared memory (see share_memory_()), it can be sent
to other processes without making any copies.
The API is 100% compatible with the original module - it's enough to change
import multiprocessing to import torch.multiprocessing to have all tensors
sent through queues, or shared via other mechanisms, moved to shared
memory.
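For instance, a minimal sketch of sharing a CPU tensor between processes (the helper name worker is illustrative):
import torch
import torch.multiprocessing as mp

def worker(t):
    # The child process sees the same underlying storage, so
    # in-place updates are visible to the parent.
    t.add_(1)

if __name__ == "__main__":
    t = torch.zeros(5)
    t.share_memory_()  # move the storage to shared memory
    p = mp.Process(target=worker, args=(t,))
    p.start()
    p.join()
    print(t)  # tensor([1., 1., 1., 1., 1.])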
Because the APIs are so similar, we do not document most of this package's contents, and we recommend referring to the excellent documentation of the original module.
Warning
If the main process exits abruptly (e.g. because of an incoming signal),
Python’s multiprocessing sometimes fails to clean up its children.
It’s a known caveat, so if you’re seeing any resource leaks after
interrupting the interpreter, it probably means that this has just happened
to you.
Strategy management¶
- 
torch.multiprocessing.get_all_sharing_strategies()¶
- Returns a set of sharing strategies supported on the current system. 
- 
torch.multiprocessing.get_sharing_strategy()¶
- Returns the current strategy for sharing CPU tensors. 
- 
torch.multiprocessing.set_sharing_strategy(new_strategy)¶
- Sets the strategy for sharing CPU tensors. - Parameters
- new_strategy (str) – Name of the selected strategy. Should be one of the values returned by - get_all_sharing_strategies().
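A small usage sketch (the available strategies and the default depend on your platform):
import torch.multiprocessing as mp

print(mp.get_all_sharing_strategies())  # e.g. {'file_descriptor', 'file_system'}
print(mp.get_sharing_strategy())        # e.g. 'file_descriptor'
mp.set_sharing_strategy('file_system')  # switch before sharing any tensors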
 
Sharing CUDA tensors¶
Sharing CUDA tensors between processes is supported only in Python 3, using
the spawn or forkserver start methods. multiprocessing in
Python 2 can only create subprocesses using fork, which is not supported
by the CUDA runtime.
Unlike CPU tensors, the sending process is required to keep the original tensor as long as the receiving process retains a copy of the tensor. This shouldn’t be a problem for sharing model parameters (which stay live for the entire execution of the model), but passing other kinds of data should be done with care.
Here is an example program which handles these requirements correctly:
import torch
import torch.multiprocessing as mp
torch.set_default_tensor_type(torch.cuda.FloatTensor)
def sender(q, e):
    for i in range(10):
        s_sample = [torch.zeros(1), torch.ones(1)]
        q.put(s_sample)
        e.wait()
        del s_sample
        e.clear()
if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    e = ctx.Event()
    p = ctx.Process(target=sender, args=(q, e))
    p.start()
    for i in range(10):
        print('=== ITER {} ==='.format(i))
        r_sample = q.get()
        del r_sample
        e.set()
    p.join()
In the example above, calling e.wait() on the sender side ensures that the tensor s_sample doesn't get deleted while the receiver is still working on it. The receiver signals that it is done with the tensor using e.set(), being careful to del its reference to the received tensor first. It is INSUFFICIENT to promise never to call r_sample again; while r_sample is live, it may be confused with any subsequent tensors allocated by the source process at the same address.
If a receiver wants to save the data of r_sample for future use while letting the source process deallocate the original, it must clone() it.
This behavior is very confusing, and we are tracking a fix for it at https://github.com/pytorch/pytorch/issues/16141
Sharing strategies¶
This section provides a brief overview of how the different sharing strategies work. Note that they apply only to CPU tensors - CUDA tensors will always use the CUDA API, as that's the only way they can be shared.
File descriptor - file_descriptor¶
Note
This is the default strategy (except for macOS, where it's not supported).
This strategy will use file descriptors as shared memory handles. Whenever a
storage is moved to shared memory, a file descriptor obtained from shm_open
is cached with the object, and when it’s going to be sent to other processes,
the file descriptor will be transferred (e.g. via UNIX sockets) to it. The
receiver will also cache the file descriptor and mmap it, to obtain a shared
view onto the storage data.
Note that if many tensors are shared, this strategy will keep a large
number of file descriptors open most of the time. If your system has low
limits on the number of open file descriptors, and you can't raise them, you
should use the file_system strategy.
File system - file_system¶
This strategy will use file names given to shm_open to identify the shared
memory regions. This has the benefit of not requiring the implementation to cache
the file descriptors obtained from it, but at the same time it is prone to shared
memory leaks. A file can't be deleted right after its creation, because other
processes need to access it to open their views. If the processes fatally
crash, or are killed, and don't call the storage destructors, the files will
remain in the system. This is very serious, because they keep using up
memory until the system is restarted or they're freed manually.
To counter the problem of shared memory file leaks, torch.multiprocessing
will spawn a daemon named torch_shm_manager that will isolate itself from
the current process group, and will keep track of all shared memory allocations.
Once all processes connected to it exit, it will wait a moment to ensure there
will be no new connections, and will iterate over all shared memory files
allocated by the group. If it finds that any of them still exist, they will be
deallocated. We’ve tested this method and it proved to be robust to various
failures. Still, if your system has high enough limits, and file_descriptor
is a supported strategy, we do not recommend switching to this one.
Spawning subprocesses¶
Note
Available for Python >= 3.4.
This depends on the spawn start method in Python’s
multiprocessing package.
Spawning a number of subprocesses to perform some function can be done
by creating Process instances and calling join to wait for
their completion. This approach works fine when dealing with a single
subprocess but presents potential issues when dealing with multiple
processes.
Namely, joining processes sequentially implies they will terminate sequentially. If they don't, and the first process does not terminate, the termination of later processes will go unnoticed. Also, there are no native facilities for error propagation.
The spawn function below addresses these concerns: it takes care
of error propagation and out-of-order termination, and actively
terminates the remaining processes when it detects an error in one of them.
- 
torch.multiprocessing.spawn(fn, args=(), nprocs=1, join=True, daemon=False)¶
- Spawns nprocs processes that run fn with args. - If one of the processes exits with a non-zero exit status, the remaining processes are killed and an exception is raised with the cause of termination. If an exception was caught in the child process, it is forwarded and its traceback is included in the exception raised in the parent process. - Parameters
- fn (function) – Function called as the entry point of the spawned process. This function must be defined at the top level of a module so it can be pickled and spawned. This is a requirement imposed by multiprocessing. - The function is called as fn(i, *args), where i is the process index and args is the passed-through tuple of arguments. 
- args (tuple) – Arguments passed to - fn.
- nprocs (int) – Number of processes to spawn. 
- join (bool) – Perform a blocking join on all processes. 
- daemon (bool) – The spawned processes’ daemon flag. If set to True, daemonic processes will be created. 
 
- Returns
- None if join is True, SpawnContext if join is False 
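A minimal usage sketch (the function name train and its extra argument are illustrative; fn must be defined at the top level of a module):
import torch.multiprocessing as mp

def train(i, offset):
    # i is the process index supplied by spawn(); offset comes from args.
    print('process {} got offset {}'.format(i, offset))

if __name__ == '__main__':
    # Spawn 4 processes running train(i, 10) and block until all exit.
    mp.spawn(train, args=(10,), nprocs=4)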
 
- 
class torch.multiprocessing.SpawnContext¶
- Returned by spawn() when called with join=False. - 
join(timeout=None)¶
- Tries to join one or more processes in this spawn context. If one of them exited with a non-zero exit status, this function kills the remaining processes and raises an exception with the cause of the first process exiting. - Returns True if all processes have been joined successfully, False if there are more processes that need to be joined. - Parameters
- timeout (float) – Wait this long before giving up on waiting. 
 
 
- 
torch.utils.bottleneck¶
torch.utils.bottleneck is a tool that can be used as an initial step for debugging bottlenecks in your program. It summarizes runs of your script with the Python profiler and PyTorch’s autograd profiler.
Run it on the command line with
python -m torch.utils.bottleneck /path/to/source/script.py [args]
where [args] are any number of arguments to script.py, or run
python -m torch.utils.bottleneck -h for more usage instructions.
Warning
Because your script will be profiled, please ensure that it exits in a finite amount of time.
Warning
Due to the asynchronous nature of CUDA kernels, when running against CUDA code, the cProfile output and CPU-mode autograd profilers may not show correct timings: the reported CPU time includes the time used to launch the kernels but not the time the kernels spent executing on the GPU, unless the operation synchronizes. Ops that do synchronize appear to be extremely expensive under regular CPU-mode profilers. In cases where timings are incorrect, the CUDA-mode autograd profiler may be helpful.
Note
To decide which (CPU-only-mode or CUDA-mode) autograd profiler output to look at, you should first check if your script is CPU-bound (“CPU total time is much greater than CUDA total time”). If it is CPU-bound, looking at the results of the CPU-mode autograd profiler will help. If on the other hand your script spends most of its time executing on the GPU, then it makes sense to start looking for responsible CUDA operators in the output of the CUDA-mode autograd profiler.
Of course the reality is much more complicated and your script might not be
in one of those two extremes depending on the part of the model you’re
evaluating. If the profiler outputs don’t help, you could try looking at
the result of torch.autograd.profiler.emit_nvtx() with nvprof.
However, please take into account that the NVTX overhead is very high and
often gives a heavily skewed timeline.
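For reference, a sketch of the emit_nvtx() pattern (model and x are illustrative; the script would be run under nvprof):
import torch

# e.g. nvprof --profile-from-start off -o trace.prof python script.py
with torch.cuda.profiler.profile():
    model(x)  # warm-up run, outside the NVTX-annotated region
    with torch.autograd.profiler.emit_nvtx():
        model(x)  # this run is annotated with NVTX ranges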
Warning
If you are profiling CUDA code, the first profiler that bottleneck runs
(cProfile) will include the CUDA startup time (CUDA buffer allocation cost)
in its time reporting. This should not matter if your bottlenecks result
in code much slower than the CUDA startup time.
For more complicated uses of the profilers (like in a multi-GPU case),
please see https://docs.python.org/3/library/profile.html
or torch.autograd.profiler.profile() for more information.
torch.utils.checkpoint¶
Note
Checkpointing is implemented by rerunning a forward-pass segment for
each checkpointed segment during backward.  This can cause persistent
states like the RNG state to be advanced further than they would be without
checkpointing.  By default, checkpointing includes logic to juggle
the RNG state such that checkpointed passes making use of RNG
(through dropout for example) have deterministic output as
compared to non-checkpointed passes.  The logic to stash and restore
RNG states can incur a moderate performance hit depending on the runtime
of checkpointed operations.  If deterministic output compared to
non-checkpointed passes is not required, supply preserve_rng_state=False
to checkpoint or checkpoint_sequential to omit stashing and
restoring the RNG state during each checkpoint.
The stashing logic saves and restores the RNG state for the current device
and the device of all cuda Tensor arguments to the run_fn.
However, the logic has no way to anticipate if the user will move
Tensors to a new device within the run_fn itself.  Therefore, if you move
Tensors to a new device (“new” meaning not belonging to the set of
[current device + devices of Tensor arguments]) within run_fn, deterministic
output compared to non-checkpointed passes is never guaranteed.
- 
torch.utils.checkpoint.checkpoint(function, *args, **kwargs)¶
- Checkpoint a model or part of the model - Checkpointing works by trading compute for memory. Rather than storing all intermediate activations of the entire computation graph for computing backward, the checkpointed part does not save intermediate activations, and instead recomputes them in the backward pass. It can be applied to any part of a model. - Specifically, in the forward pass, function will run in torch.no_grad() manner, i.e., not storing the intermediate activations. Instead, the forward pass saves the inputs tuple and the function parameter. In the backward pass, the saved inputs and function are retrieved, and the forward pass is computed on function again, now tracking the intermediate activations, and then the gradients are calculated using these activation values. - Warning - Checkpointing doesn't work with torch.autograd.grad(), but only with torch.autograd.backward(). - Warning - If the function invocation during backward does anything different from the one during forward, e.g., due to some global variable, the checkpointed version won't be equivalent, and unfortunately it can't be detected. - Parameters
- function – describes what to run in the forward pass of the model or part of the model. It should also know how to handle the inputs passed as the tuple. For example, in an LSTM, if the user passes (activation, hidden), function should correctly use the first input as activation and the second input as hidden
- preserve_rng_state (bool, optional, default=True) – If set to False, omit stashing and restoring the RNG state during each checkpoint. 
- args – tuple containing inputs to the - function
 
- Returns
- Output of running function on *args 
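A minimal usage sketch (the module and shapes are illustrative):
import torch
from torch.utils.checkpoint import checkpoint

model = torch.nn.Sequential(
    torch.nn.Linear(100, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)
x = torch.randn(32, 100, requires_grad=True)
# Activations inside model are recomputed during backward
# instead of being stored during the forward pass.
out = checkpoint(model, x)
out.sum().backward()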
 
- 
torch.utils.checkpoint.checkpoint_sequential(functions, segments, *inputs, **kwargs)¶
- A helper function for checkpointing sequential models. - Sequential models execute a list of modules/functions in order (sequentially). Therefore, we can divide such a model into segments and checkpoint each segment. All segments except the last will run in torch.no_grad() manner, i.e., not storing the intermediate activations. The inputs of each checkpointed segment will be saved for re-running the segment in the backward pass. - See checkpoint() for how checkpointing works. - Warning - Checkpointing doesn't work with torch.autograd.grad(), but only with torch.autograd.backward(). - Parameters
- functions – A torch.nn.Sequential or a list of modules or functions (comprising the model) to run sequentially. 
- segments – Number of chunks to create in the model 
- inputs – tuple of Tensors that are inputs to - functions
- preserve_rng_state (bool, optional, default=True) – If set to False, omit stashing and restoring the RNG state during each checkpoint. 
 
- Returns
- Output of running functions sequentially on *inputs 
 - Example

>>> model = nn.Sequential(...)
>>> input_var = checkpoint_sequential(model, chunks, input_var)
torch.utils.cpp_extension¶
- 
torch.utils.cpp_extension.CppExtension(name, sources, *args, **kwargs)¶
- Creates a setuptools.Extension for C++. - Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a C++ extension. - All arguments are forwarded to the setuptools.Extension constructor. - Example

>>> from setuptools import setup
>>> from torch.utils.cpp_extension import BuildExtension, CppExtension
>>> setup(
        name='extension',
        ext_modules=[
            CppExtension(
                name='extension',
                sources=['extension.cpp'],
                extra_compile_args=['-g']),
        ],
        cmdclass={
            'build_ext': BuildExtension
        })
- 
torch.utils.cpp_extension.CUDAExtension(name, sources, *args, **kwargs)¶
- Creates a setuptools.Extension for CUDA/C++. - Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a CUDA/C++ extension. This includes the CUDA include path, library path and runtime library. - All arguments are forwarded to the setuptools.Extension constructor. - Example

>>> from setuptools import setup
>>> from torch.utils.cpp_extension import BuildExtension, CUDAExtension
>>> setup(
        name='cuda_extension',
        ext_modules=[
            CUDAExtension(
                name='cuda_extension',
                sources=['extension.cpp', 'extension_kernel.cu'],
                extra_compile_args={'cxx': ['-g'],
                                    'nvcc': ['-O2']})
        ],
        cmdclass={
            'build_ext': BuildExtension
        })
- 
torch.utils.cpp_extension.BuildExtension(*args, **kwargs)¶
- A custom setuptools build extension. - This setuptools.build_ext subclass takes care of passing the minimum required compiler flags (e.g. -std=c++11) as well as mixed C++/CUDA compilation (and support for CUDA files in general). - When using BuildExtension, it is allowed to supply a dictionary for extra_compile_args (rather than the usual list) that maps from languages (cxx or nvcc) to a list of additional compiler flags to supply to the compiler. This makes it possible to supply different flags to the C++ and CUDA compiler during mixed compilation.
- 
torch.utils.cpp_extension.load(name, sources, extra_cflags=None, extra_cuda_cflags=None, extra_ldflags=None, extra_include_paths=None, build_directory=None, verbose=False, with_cuda=None, is_python_module=True)¶
- Loads a PyTorch C++ extension just-in-time (JIT). - To load an extension, a Ninja build file is emitted, which is used to compile the given sources into a dynamic library. This library is subsequently loaded into the current Python process as a module and returned from this function, ready for use. - By default, the directory to which the build file is emitted and the resulting library compiled to is <tmp>/torch_extensions/<name>, where <tmp> is the temporary folder on the current platform and <name> the name of the extension. This location can be overridden in two ways. First, if the TORCH_EXTENSIONS_DIR environment variable is set, it replaces <tmp>/torch_extensions and all extensions will be compiled into subfolders of this directory. Second, if the build_directory argument to this function is supplied, it overrides the entire path, i.e. the library will be compiled into that folder directly. - To compile the sources, the default system compiler (c++) is used, which can be overridden by setting the CXX environment variable. To pass additional arguments to the compilation process, extra_cflags or extra_ldflags can be provided. For example, to compile your extension with optimizations, pass extra_cflags=['-O3']. You can also use extra_cflags to pass further include directories. - CUDA support with mixed compilation is provided. Simply pass CUDA source files (.cu or .cuh) along with other sources. Such files will be detected and compiled with nvcc rather than the C++ compiler. This includes passing the CUDA lib64 directory as a library directory, and linking cudart. You can pass additional flags to nvcc via extra_cuda_cflags, just like with extra_cflags for C++. Various heuristics for finding the CUDA install directory are used, which usually work fine. If not, setting the CUDA_HOME environment variable is the safest option. - Parameters
- name – The name of the extension to build. This MUST be the same as the name of the pybind11 module! 
- sources – A list of relative or absolute paths to C++ source files. 
- extra_cflags – optional list of compiler flags to forward to the build. 
- extra_cuda_cflags – optional list of compiler flags to forward to nvcc when building CUDA sources. 
- extra_ldflags – optional list of linker flags to forward to the build. 
- extra_include_paths – optional list of include directories to forward to the build. 
- build_directory – optional path to use as build workspace. 
- verbose – If - True, turns on verbose logging of load steps.
- with_cuda – Determines whether CUDA headers and libraries are added to the build. If set to None (default), this value is automatically determined based on the existence of .cu or .cuh files in sources. Set it to True to force CUDA headers and libraries to be included. 
- is_python_module – If True (default), imports the produced shared library as a Python module. If False, loads it into the process as a plain dynamic library. 
 
- Returns
- If is_python_module is True, returns the loaded PyTorch extension as a Python module. If is_python_module is False, returns nothing (the shared library is loaded into the process as a side effect). 
 - Example

>>> from torch.utils.cpp_extension import load
>>> module = load(
        name='extension',
        sources=['extension.cpp', 'extension_kernel.cu'],
        extra_cflags=['-O2'],
        verbose=True)
- 
torch.utils.cpp_extension.load_inline(name, cpp_sources, cuda_sources=None, functions=None, extra_cflags=None, extra_cuda_cflags=None, extra_ldflags=None, extra_include_paths=None, build_directory=None, verbose=False, with_cuda=None, is_python_module=True)¶
- Loads a PyTorch C++ extension just-in-time (JIT) from string sources. - This function behaves exactly like load(), but takes its sources as strings rather than filenames. These strings are stored to files in the build directory, after which the behavior of load_inline() is identical to load(). - See the tests for good examples of using this function. - Sources may omit two required parts of a typical non-inline C++ extension: the necessary header includes, as well as the (pybind11) binding code. More precisely, strings passed to cpp_sources are first concatenated into a single .cpp file. This file is then prepended with #include <torch/extension.h>. - Furthermore, if the functions argument is supplied, bindings will be automatically generated for each function specified. functions can either be a list of function names, or a dictionary mapping from function names to docstrings. If a list is given, the name of each function is used as its docstring. - The sources in cuda_sources are concatenated into a separate .cu file and prepended with torch/types.h, cuda.h and cuda_runtime.h includes. The .cpp and .cu files are compiled separately, but ultimately linked into a single library. Note that no bindings are generated for functions in cuda_sources per se. To bind to a CUDA kernel, you must create a C++ function that calls it, and either declare or define this C++ function in one of the cpp_sources (and include its name in functions). - See load() for a description of arguments omitted below. - Parameters
- cpp_sources – A string, or list of strings, containing C++ source code. 
- cuda_sources – A string, or list of strings, containing CUDA source code. 
- functions – A list of function names for which to generate function bindings. If a dictionary is given, it should map function names to docstrings (which are otherwise just the function names). 
- with_cuda – Determines whether CUDA headers and libraries are added to the build. If set to None (default), this value is automatically determined based on whether cuda_sources is provided. Set it to True to force CUDA headers and libraries to be included. 
 
 - Example

>>> from torch.utils.cpp_extension import load_inline
>>> source = '''
at::Tensor sin_add(at::Tensor x, at::Tensor y) {
  return x.sin() + y.sin();
}
'''
>>> module = load_inline(name='inline_extension',
                         cpp_sources=[source],
                         functions=['sin_add'])
- 
torch.utils.cpp_extension.include_paths(cuda=False)¶
- Get the include paths required to build a C++ or CUDA extension. - Parameters
- cuda – If True, includes CUDA-specific include paths. 
- Returns
- A list of include path strings. 
 
- 
torch.utils.cpp_extension.check_compiler_abi_compatibility(compiler)¶
- Verifies that the given compiler is ABI-compatible with PyTorch. - Parameters
- compiler (str) – The compiler executable name to check (e.g. - g++). Must be executable in a shell process.
- Returns
- False if the compiler is (likely) ABI-incompatible with PyTorch, else True. 
 
torch.utils.data¶
- 
class torch.utils.data.Dataset¶
- An abstract class representing a Dataset. - All other datasets should subclass it. All subclasses should override __len__, which provides the size of the dataset, and __getitem__, which supports integer indexing in the range from 0 to len(self), exclusive.
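A minimal sketch of a custom dataset (the class is illustrative):
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):
    # Returns (i, i**2) pairs for i in [0, n).
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.tensor(idx), torch.tensor(idx ** 2)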
- 
class torch.utils.data.TensorDataset(*tensors)¶
- Dataset wrapping tensors. - Each sample will be retrieved by indexing tensors along the first dimension. - Parameters
- *tensors (Tensor) – tensors that have the same size in the first dimension. 
 
- 
class torch.utils.data.ConcatDataset(datasets)¶
- Dataset to concatenate multiple datasets. Useful for assembling different existing datasets, possibly large-scale ones, as the concatenation operation is done on the fly. - Parameters
- datasets (sequence) – List of datasets to be concatenated 
 
- 
class torch.utils.data.Subset(dataset, indices)¶
- Subset of a dataset at specified indices. - Parameters
- dataset (Dataset) – The whole Dataset 
- indices (sequence) – Indices in the whole set selected for subset 
 
 
- 
class torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=<function default_collate>, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None)¶
- Data loader. Combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset. - Parameters
- dataset (Dataset) – dataset from which to load the data. 
- batch_size (int, optional) – how many samples per batch to load (default: - 1).
- shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False). 
- sampler (Sampler, optional) – defines the strategy to draw samples from the dataset. If specified, shuffle must be False. 
- batch_sampler (Sampler, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with - batch_size,- shuffle,- sampler, and- drop_last.
- num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: - 0)
- collate_fn (callable, optional) – merges a list of samples to form a mini-batch. 
- pin_memory (bool, optional) – If True, the data loader will copy tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the example below. 
- drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of the dataset is not divisible by the batch size, then the last batch will be smaller. (default: False) 
- timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: - 0)
- worker_init_fn (callable, optional) – If not - None, this will be called on each worker subprocess with the worker id (an int in- [0, num_workers - 1]) as input, after seeding and before data loading. (default:- None)
 
- Note - By default, each worker will have its PyTorch seed set to base_seed + worker_id, where base_seed is a long generated by the main process using its RNG. However, seeds for other libraries may be duplicated upon initializing workers (e.g., NumPy), causing each worker to return identical random numbers. (See the dataloader-workers-random-seed section in the FAQ.) You may use torch.initial_seed() to access the PyTorch seed for each worker in worker_init_fn, and use it to set other seeds before data loading. - Warning - If the spawn start method is used, worker_init_fn cannot be an unpicklable object, e.g., a lambda function. - The default memory pinning logic only recognizes Tensors and maps and iterables containing Tensors. By default, if the pinning logic sees a batch that is a custom type (which will occur if you have a collate_fn that returns a custom batch type), or if each element of your batch is a custom type, the pinning logic will not recognize them, and it will return that batch (or those elements) without pinning the memory. To enable memory pinning for custom batch or data types, define a pin_memory method on your custom type(s). - Example: - class SimpleCustomBatch: def __init__(self, data): transposed_data = list(zip(*data)) self.inp = torch.stack(transposed_data[0], 0) self.tgt = torch.stack(transposed_data[1], 0) def pin_memory(self): self.inp = self.inp.pin_memory() self.tgt = self.tgt.pin_memory() return self def collate_wrapper(batch): return SimpleCustomBatch(batch) inps = torch.arange(10 * 5, dtype=torch.float32).view(10, 5) tgts = torch.arange(10 * 5, dtype=torch.float32).view(10, 5) dataset = TensorDataset(inps, tgts) loader = DataLoader(dataset, batch_size=2, collate_fn=collate_wrapper, pin_memory=True) for batch_ndx, sample in enumerate(loader): print(sample.inp.is_pinned()) print(sample.tgt.is_pinned()) 
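As a sketch of the seeding advice in the note above, a worker_init_fn can derive per-worker seeds for other libraries from the PyTorch seed (the function name is illustrative; dataset is assumed to be defined as above):

import numpy as np
import torch

def seed_numpy_worker(worker_id):
    # torch.initial_seed() already differs per worker (base_seed + worker_id);
    # reduce it to the 32-bit range that NumPy accepts.
    np.random.seed(torch.initial_seed() % 2**32)

loader = DataLoader(dataset, num_workers=4, worker_init_fn=seed_numpy_worker)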
- 
torch.utils.data.random_split(dataset, lengths)¶
- Randomly split a dataset into non-overlapping new datasets of given lengths. - Parameters
- dataset (Dataset) – Dataset to be split 
- lengths (sequence) – lengths of splits to be produced 
 
 
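For example, a sketch of an 80/20 split of a ten-element dataset:

>>> dataset = TensorDataset(torch.arange(10))
>>> train_set, val_set = random_split(dataset, [8, 2])  # lengths must sum to len(dataset)
>>> len(train_set), len(val_set)
(8, 2)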
- 
class torch.utils.data.Sampler(data_source)¶
- Base class for all Samplers. - Every Sampler subclass has to provide an __iter__ method, providing a way to iterate over indices of dataset elements, and a __len__ method that returns the length of the returned iterator. 
- 
class torch.utils.data.SequentialSampler(data_source)¶
- Samples elements sequentially, always in the same order. - Parameters
- data_source (Dataset) – dataset to sample from 
 
- 
class torch.utils.data.RandomSampler(data_source, replacement=False, num_samples=None)¶
- Samples elements randomly. Without replacement, elements are sampled from a shuffled dataset. With replacement, the user can specify num_samples to draw.
- 
class torch.utils.data.SubsetRandomSampler(indices)¶
- Samples elements randomly from a given list of indices, without replacement. - Parameters
- indices (sequence) – a sequence of indices 
 
- 
class torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)¶
- Samples elements from [0,..,len(weights)-1] with given probabilities (weights). - Parameters
- weights (sequence) – a sequence of weights, not necessarily summing to one 
- num_samples (int) – number of samples to draw 
- replacement (bool) – if - True, samples are drawn with replacement. If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row.
 
 - Example - >>> list(WeightedRandomSampler([0.1, 0.9, 0.4, 0.7, 3.0, 0.6], 5, replacement=True)) [0, 0, 0, 1, 0] >>> list(WeightedRandomSampler([0.9, 0.4, 0.05, 0.2, 0.3, 0.1], 5, replacement=False)) [0, 1, 4, 3, 2] 
- 
class torch.utils.data.BatchSampler(sampler, batch_size, drop_last)¶
- Wraps another sampler to yield a mini-batch of indices. - Parameters
- sampler (Sampler) – Base sampler.
- batch_size (int) – Size of mini-batch.
- drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size.
 - Example - >>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=False)) [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]] >>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=True)) [[0, 1, 2], [3, 4, 5], [6, 7, 8]] 
- 
class torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=None, rank=None)¶
- Sampler that restricts data loading to a subset of the dataset. - It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such a case, each process can pass a DistributedSampler instance as a DataLoader sampler, and load a subset of the original dataset that is exclusive to it. - Note - Dataset is assumed to be of constant size. - Parameters
- dataset – Dataset used for sampling. 
- num_replicas (optional) – Number of processes participating in distributed training. 
- rank (optional) – Rank of the current process within num_replicas. 
 
 
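A minimal usage sketch (assuming the distributed process group has already been initialized, and that dataset and num_epochs are defined):

sampler = torch.utils.data.distributed.DistributedSampler(dataset)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)
for epoch in range(num_epochs):
    for batch in loader:
        ...  # each process iterates over a subset exclusive to it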
torch.utils.dlpack¶
- 
torch.utils.dlpack.from_dlpack(dlpack) → Tensor¶
- Decodes a DLPack to a tensor. - Parameters
- dlpack – a PyCapsule object with the dltensor 
 - The tensor will share the memory with the object represented in the dlpack. Note that each dlpack can only be consumed once. 
- 
torch.utils.dlpack.to_dlpack(tensor) → PyCapsule¶
- Returns a DLPack representing the tensor. - Parameters
- tensor – a tensor to be exported 
- The dlpack shares the tensor's memory. Note that each dlpack can only be consumed once. 
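A round-trip sketch showing the shared memory:

>>> import torch
>>> from torch.utils.dlpack import from_dlpack, to_dlpack
>>> t = torch.arange(4)
>>> capsule = to_dlpack(t)     # PyCapsule sharing t's memory
>>> t2 = from_dlpack(capsule)  # consumes the capsule; it cannot be reused
>>> t2[0] = 7                  # the modification is visible through t as well
>>> t[0]
tensor(7)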
torch.hub¶
PyTorch Hub is a pre-trained model repository designed to facilitate research reproducibility.
Publishing models¶
PyTorch Hub supports publishing pre-trained models (model definitions and pre-trained weights)
to a GitHub repository by adding a simple hubconf.py file;
hubconf.py can have multiple entrypoints. Each entrypoint is defined as a Python function with
the following signature:
def entrypoint_name(pretrained=False, *args, **kwargs):
    ...
How to implement an entrypoint?¶
Here is a code snippet from the pytorch/vision repository, which specifies an entrypoint
for the resnet18 model. You can see the full script in the
pytorch/vision repo.
dependencies = ['torch', 'math']
def resnet18(pretrained=False, *args, **kwargs):
    """
    Resnet18 model
    pretrained (bool): a recommended kwarg for all entrypoints
    args & kwargs are arguments for the function
    """
    ######## Call the model in the repo ###############
    from torchvision.models.resnet import resnet18 as _resnet18
    model = _resnet18(*args, **kwargs)
    ######## End of call ##############################
    # The following logic is REQUIRED
    if pretrained:
        # For weights saved in local repo
        # model.load_state_dict(<path_to_saved_file>)
        # For weights saved elsewhere
        from torch.utils import model_zoo
        checkpoint = 'https://download.pytorch.org/models/resnet18-5c106cde.pth'
        model.load_state_dict(model_zoo.load_url(checkpoint, progress=False))
    return model
- dependencies variable is a list of package names required to run the model.
- Pretrained weights can either be stored locally in the GitHub repo, or be loadable by model_zoo.load_url().
- pretrained controls whether to load the pre-trained weights provided by repo owners.
- args and kwargs are passed along to the real callable function.
- The docstring of the function works as a help message, explaining what the model does and what arguments are allowed.
- The entrypoint function should ALWAYS return a model (nn.Module).
Important Notice¶
- Published models must be referenced by at least a branch or tag; they can't be a random commit.
Loading models from Hub¶
Users can load the pre-trained models using torch.hub.load() API.
- 
torch.hub.load(github, model, force_reload=False, *args, **kwargs)¶
- Load a model from a github repo, with pretrained weights. - Parameters
- github – Required, a string with format “repo_owner/repo_name[:tag_name]” with an optional tag/branch. The default branch is master if not specified. Example: ‘pytorch/vision[:hub]’ 
- model – Required, a string of entrypoint name defined in repo’s hubconf.py 
- force_reload – Optional, whether to discard the existing cache and force a fresh download. Default is False. 
- *args – Optional, the corresponding args for callable model. 
- **kwargs – Optional, the corresponding kwargs for callable model. 
 
- Returns
- a single model with corresponding pretrained weights. 
 
Here’s an example loading resnet18 entrypoint from pytorch/vision repo.
hub_model = hub.load(
    'pytorch/vision:master', # repo_owner/repo_name:branch
    'resnet18', # entrypoint
    1234, # args for callable [not applicable to resnet]
    pretrained=True) # kwargs for callable
Where are my downloaded model & weights saved?¶
The locations are searched in the following order:
- hub_dir: a user-specified path. It can be set in the following ways:
  - Setting the environment variable TORCH_HUB_DIR
  - Calling hub.set_dir(<PATH_TO_HUB_DIR>)
- ~/.torch/hub
- 
torch.hub.set_dir(d)¶
- Optionally set hub_dir to a local dir to save downloaded models & weights. - If this is not set, the environment variable TORCH_HUB_DIR will be searched first, and ~/.torch/hub will be created and used as a fallback. - Parameters
- d – path to a local folder to save downloaded models & weights. 
 
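For example (the path is illustrative):

>>> import torch.hub
>>> torch.hub.set_dir('/tmp/torch_hub')  # downloads are now cached under this directory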
Caching logic¶
By default, we don't clean up files after loading them. Hub uses the cache by default if a file already exists in hub_dir.
Users can force a reload by calling hub.load(..., force_reload=True). This will delete
the existing GitHub folder and downloaded weights and start a fresh download, which is useful
when updates are published to the same branch and users want to keep up with the latest release.
torch.utils.model_zoo¶
- 
torch.utils.model_zoo.load_url(url, model_dir=None, map_location=None, progress=True)¶
- Loads the Torch serialized object at the given URL. - If the object is already present in model_dir, it’s deserialized and returned. The filename part of the URL should follow the naming convention - filename-<sha256>.extwhere- <sha256>is the first eight or more digits of the SHA256 hash of the contents of the file. The hash is used to ensure unique names and to verify the contents of the file.- The default value of model_dir is - $TORCH_HOME/modelswhere- $TORCH_HOMEdefaults to- ~/.torch. The default directory can be overridden with the- $TORCH_MODEL_ZOOenvironment variable.- Parameters
- url (string) – URL of the object to download 
- model_dir (string, optional) – directory in which to save the object 
- map_location (optional) – a function or a dict specifying how to remap storage locations (see torch.load) 
- progress (bool, optional) – whether or not to display a progress bar to stderr 
 
 - Example - >>> state_dict = torch.utils.model_zoo.load_url('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth') 
torch.onnx¶
Example: End-to-end AlexNet from PyTorch to Caffe2¶
Here is a simple script which exports a pretrained AlexNet as defined in
torchvision into ONNX.  It runs a single round of inference and then
saves the resulting traced model to alexnet.onnx:
import torch
import torchvision
dummy_input = torch.randn(10, 3, 224, 224, device='cuda')
model = torchvision.models.alexnet(pretrained=True).cuda()
# Providing input and output names sets the display names for values
# within the model's graph. Setting these does not change the semantics
# of the graph; it is only for readability.
#
# The inputs to the network consist of the flat list of inputs (i.e.
# the values you would pass to the forward() method) followed by the
# flat list of parameters. You can partially specify names, i.e. provide
# a list here shorter than the number of inputs to the model, and we will
# only set that subset of names, starting from the beginning.
input_names = [ "actual_input_1" ] + [ "learned_%d" % i for i in range(16) ]
output_names = [ "output1" ]
torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True, input_names=input_names, output_names=output_names)
The resulting alexnet.onnx is a binary protobuf file which contains both
the network structure and parameters of the model you exported
(in this case, AlexNet).  The keyword argument verbose=True causes the
exporter to print out a human-readable representation of the network:
# These are the inputs and parameters to the network, which have taken on
# the names we specified earlier.
graph(%actual_input_1 : Float(10, 3, 224, 224)
      %learned_0 : Float(64, 3, 11, 11)
      %learned_1 : Float(64)
      %learned_2 : Float(192, 64, 5, 5)
      %learned_3 : Float(192)
      # ---- omitted for brevity ----
      %learned_14 : Float(1000, 4096)
      %learned_15 : Float(1000)) {
  # Every statement consists of some output tensors (and their types),
  # the operator to be run (with its attributes, e.g., kernels, strides,
  # etc.), its input tensors (%actual_input_1, %learned_0, %learned_1)
  %17 : Float(10, 64, 55, 55) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[11, 11], pads=[2, 2, 2, 2], strides=[4, 4]](%actual_input_1, %learned_0, %learned_1), scope: AlexNet/Sequential[features]/Conv2d[0]
  %18 : Float(10, 64, 55, 55) = onnx::Relu(%17), scope: AlexNet/Sequential[features]/ReLU[1]
  %19 : Float(10, 64, 27, 27) = onnx::MaxPool[kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2]](%18), scope: AlexNet/Sequential[features]/MaxPool2d[2]
  # ---- omitted for brevity ----
  %29 : Float(10, 256, 6, 6) = onnx::MaxPool[kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2]](%28), scope: AlexNet/Sequential[features]/MaxPool2d[12]
  # Dynamic means that the shape is not known. This may be because of a
  # limitation of our implementation (which we would like to fix in a
  # future release) or shapes which are truly dynamic.
  %30 : Dynamic = onnx::Shape(%29), scope: AlexNet
  %31 : Dynamic = onnx::Slice[axes=[0], ends=[1], starts=[0]](%30), scope: AlexNet
  %32 : Long() = onnx::Squeeze[axes=[0]](%31), scope: AlexNet
  %33 : Long() = onnx::Constant[value={9216}](), scope: AlexNet
  # ---- omitted for brevity ----
  %output1 : Float(10, 1000) = onnx::Gemm[alpha=1, beta=1, broadcast=1, transB=1](%45, %learned_14, %learned_15), scope: AlexNet/Sequential[classifier]/Linear[6]
  return (%output1);
}
You can also verify the protobuf using the onnx library.
You can install onnx with conda:
conda install -c conda-forge onnx
Then, you can run:
import onnx
# Load the ONNX model
model = onnx.load("alexnet.onnx")
# Check that the IR is well formed
onnx.checker.check_model(model)
# Print a human readable representation of the graph
onnx.helper.printable_graph(model.graph)
To run the exported model with Caffe2, you will need to install Caffe2. If you don't have it already, please follow the install instructions.
Once these are installed, you can use the backend for Caffe2:
# ...continuing from above
import caffe2.python.onnx.backend as backend
import numpy as np
rep = backend.prepare(model, device="CUDA:0") # or "CPU"
# For the Caffe2 backend:
#     rep.predict_net is the Caffe2 protobuf for the network
#     rep.workspace is the Caffe2 workspace for the network
#       (see the class caffe2.python.onnx.backend.Workspace)
outputs = rep.run(np.random.randn(10, 3, 224, 224).astype(np.float32))
# To run networks with more than one input, pass a tuple
# rather than a single numpy ndarray.
print(outputs[0])
In the future, there will be backends for other frameworks as well.
Limitations¶
- The ONNX exporter is a trace-based exporter, which means that it operates by executing your model once, and exporting the operators which were actually run during this run. This means that if your model is dynamic, e.g., changes behavior depending on input data, the export won’t be accurate. Similarly, a trace is likely to be valid only for a specific input size (which is one reason why we require explicit inputs on tracing.) We recommend examining the model trace and making sure the traced operators look reasonable. 
- PyTorch and Caffe2 often have implementations of operators with some numeric differences. Depending on model structure, these differences may be negligible, but they can also cause major divergences in behavior (especially on untrained models.) In a future release, we plan to allow Caffe2 to call directly to Torch implementations of operators, to help you smooth over these differences when precision is important, and to also document these differences. 
Supported operators¶
The following operators are supported:
- add (nonzero alpha not supported) 
- sub (nonzero alpha not supported) 
- mul 
- div 
- cat 
- mm 
- addmm 
- neg 
- sqrt 
- tanh 
- sigmoid 
- mean 
- sum 
- prod 
- t 
- expand (only when used before a broadcasting ONNX operator; e.g., add) 
- transpose 
- view 
- split 
- squeeze 
- prelu (single weight shared among input channels not supported) 
- threshold (non-zero threshold/non-zero value not supported) 
- leaky_relu 
- glu 
- softmax (only dim=-1 supported) 
- avg_pool2d (ceil_mode not supported) 
- log_softmax 
- unfold (experimental support with ATen-Caffe2 integration) 
- elu 
- concat 
- abs 
- index_select 
- pow 
- clamp 
- max 
- min 
- eq 
- gt 
- lt 
- ge 
- le 
- exp 
- sin 
- cos 
- tan 
- asin 
- acos 
- atan 
- permute 
- Conv 
- BatchNorm 
- MaxPool1d (ceil_mode not supported) 
- MaxPool2d (ceil_mode not supported) 
- MaxPool3d (ceil_mode not supported) 
- Embedding (no optional arguments supported) 
- RNN 
- ConstantPadNd 
- Dropout 
- FeatureDropout (training mode not supported) 
- Index (constant integer and tuple indices supported) 
The operator set above is sufficient to export the following models:
- AlexNet 
- DCGAN 
- DenseNet 
- Inception (warning: this model is highly sensitive to changes in operator implementation) 
- ResNet 
- SuperResolution 
- VGG 
Adding export support for operators is an advanced use case. To achieve this, developers need to touch the source code of PyTorch. Please follow the instructions for installing PyTorch from source. If the operator you want is standardized in ONNX, it should be easy to add support for exporting it (by adding a symbolic function for the operator). To confirm whether the operator is standardized or not, please check the ONNX operator list.
If the operator is an ATen operator, which means you can find the declaration
of the function in torch/csrc/autograd/generated/VariableType.h
(available in generated code in PyTorch install dir), you should add the symbolic
function in torch/onnx/symbolic.py and follow the instructions below:
- Define the symbolic function in torch/onnx/symbolic.py. Make sure the function has the same name as the ATen operator/function defined in - VariableType.h.
- The first parameter is always the exported ONNX graph. Parameter names must EXACTLY match the names in - VariableType.h, because dispatch is done with keyword arguments.
- Parameter ordering does NOT necessarily match what is in VariableType.h: tensors (inputs) are always first, then non-tensor arguments.
- In the symbolic function, if the operator is already standardized in ONNX, we only need to create a node to represent the ONNX operator in the graph. 
- If the input argument is a tensor, but ONNX asks for a scalar, we have to explicitly do the conversion. The helper function - _scalarcan convert a scalar tensor into a python scalar, and- _if_scalar_type_ascan turn a Python scalar into a PyTorch tensor.
If the operator is a non-ATen operator, the symbolic function has to be added in the corresponding PyTorch Function class. Please read the following instructions:
- Create a symbolic function named - symbolicin the corresponding Function class.
- The first parameter is always the exported ONNX graph. 
- Parameter names except the first must EXACTLY match the names in - forward.
- The output tuple size must match the outputs of - forward.
- In the symbolic function, if the operator is already standardized in ONNX, we just need to create a node to represent the ONNX operator in the graph. 
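As a hedged sketch of these instructions (the Function and the choice of operator are illustrative, not taken from a real PyTorch class):

import torch
from torch.autograd import Function

class MyRelu(Function):
    @staticmethod
    def forward(ctx, input):
        return input.clamp(min=0)

    @staticmethod
    def symbolic(g, input):
        # Relu is already standardized in ONNX, so we only need to emit
        # the corresponding node; the parameter names after `g` match
        # the names used in forward.
        return g.op("Relu", input)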
Symbolic functions should be implemented in Python. All of these functions interact with Python methods which are implemented via C++-Python bindings, but intuitively the interface they provide looks like this:
def operator/symbolic(g, *inputs):
  """
  Modifies Graph (e.g., using "op"), adding the ONNX operations representing
  this PyTorch function, and returning a Value or tuple of Values specifying the
  ONNX outputs whose values correspond to the original PyTorch return values
  of the autograd Function (or None if an output is not supported by ONNX).
  Arguments:
    g (Graph): graph to write the ONNX representation into
    inputs (Value...): list of values representing the variables which contain
        the inputs for this function
  """
class Value(object):
  """Represents an intermediate tensor value computed in ONNX."""
  def type(self):
    """Returns the Type of the value."""
class Type(object):
  def sizes(self):
    """Returns a tuple of ints representing the shape of a tensor this describes."""
class Graph(object):
  def op(self, opname, *inputs, **attrs):
    """
    Create an ONNX operator 'opname', taking 'args' as inputs
    and attributes 'kwargs' and add it as a node to the current graph,
    returning the value representing the single output of this
    operator (see the `outputs` keyword argument for multi-return
    nodes).
    The set of operators and the inputs/attributes they take
    is documented at https://github.com/onnx/onnx/blob/master/docs/Operators.md
    Arguments:
        opname (string): The ONNX operator name, e.g., `Abs` or `Add`.
        args (Value...): The inputs to the operator; usually provided
            as arguments to the `symbolic` definition.
        kwargs: The attributes of the ONNX operator, with keys named
            according to the following convention: `alpha_f` indicates
            the `alpha` attribute with type `f`.  The valid type specifiers are
            `f` (float), `i` (int), `s` (string) or `t` (Tensor).  An attribute
            specified with type float accepts either a single float, or a
            list of floats (e.g., you would say `dims_i` for a `dims` attribute
            that takes a list of integers).
        outputs (int, optional):  The number of outputs this operator returns;
            by default an operator is assumed to return a single output.
            If `outputs` is greater than one, this functions returns a tuple
            of output `Value`, representing each output of the ONNX operator
            in positional order.
    """
The ONNX graph C++ definition is in torch/csrc/jit/ir.h.
Here is an example of handling a missing symbolic function for the elu operator.
We try to export the model and see the error message below:
UserWarning: ONNX export failed on elu because torch.onnx.symbolic.elu does not exist
RuntimeError: ONNX export failed: Couldn't export operator elu
The export fails because PyTorch does not support exporting the elu operator.
We find virtual Tensor elu(const Tensor & input, Scalar alpha, bool inplace) const override;
in VariableType.h. This means elu is an ATen operator.
We check the ONNX operator list,
and confirm that Elu is standardized in ONNX.
We add the following lines to symbolic.py:
def elu(g, input, alpha, inplace=False):
    return g.op("Elu", input, alpha_f=_scalar(alpha))
Now PyTorch is able to export the elu operator.
There are more examples in symbolic.py, tensor.py, padding.py.
The interface for specifying operator definitions is experimental; adventurous users should note that the APIs will probably change in a future release.
Distributed communication package (deprecated) - torch.distributed.deprecated¶
Warning
torch.distributed.deprecated is the older version of torch.distributed and is currently deprecated. It will be removed soon. Please use torch.distributed and refer to its documentation; it is the latest distributed communication package for PyTorch.
torch.distributed.deprecated provides an MPI-like interface for exchanging tensor data across multi-machine networks. It supports a few different backends and initialization methods.
Currently torch.distributed.deprecated supports four backends, each with different capabilities. The table below shows which functions are available for use with CPU / CUDA tensors. MPI supports CUDA only if the implementation used to build PyTorch supports it.
| Function | tcp (CPU) | tcp (GPU) | gloo (CPU) | gloo (GPU) | mpi (CPU) | mpi (GPU) | nccl (CPU) | nccl (GPU) |
|---|---|---|---|---|---|---|---|---|
| send | ✓ | ✘ | ✘ | ✘ | ✓ | ? | ✘ | ✘ |
| recv | ✓ | ✘ | ✘ | ✘ | ✓ | ? | ✘ | ✘ |
| broadcast | ✓ | ✘ | ✓ | ✓ | ✓ | ? | ✘ | ✓ |
| all_reduce | ✓ | ✘ | ✓ | ✓ | ✓ | ? | ✘ | ✓ |
| reduce | ✓ | ✘ | ✘ | ✘ | ✓ | ? | ✘ | ✓ |
| all_gather | ✓ | ✘ | ✘ | ✘ | ✓ | ? | ✘ | ✓ |
| gather | ✓ | ✘ | ✘ | ✘ | ✓ | ? | ✘ | ✘ |
| scatter | ✓ | ✘ | ✘ | ✘ | ✓ | ? | ✘ | ✘ |
| barrier | ✓ | ✘ | ✓ | ✓ | ✓ | ? | ✘ | ✘ |
Basics¶
The torch.distributed.deprecated package provides PyTorch support and communication primitives
for multiprocess parallelism across several computation nodes running on one or more
machines. The class torch.nn.parallel.deprecated.DistributedDataParallel() builds on this
functionality to provide synchronous distributed training as a wrapper around any
PyTorch model. This differs from the kinds of parallelism provided by
Multiprocessing package - torch.multiprocessing and torch.nn.DataParallel() in that it supports
multiple network-connected machines and in that the user must explicitly launch a separate
copy of the main training script for each process.
In the single-machine synchronous case, torch.distributed.deprecated or the
torch.nn.parallel.deprecated.DistributedDataParallel() wrapper may still have advantages over other
approaches to data-parallelism, including torch.nn.DataParallel():
- Each process maintains its own optimizer and performs a complete optimization step with each iteration. While this may appear redundant, since the gradients have already been gathered together and averaged across processes and are thus the same for every process, this means that no parameter broadcast step is needed, reducing time spent transferring tensors between nodes. 
- Each process contains an independent Python interpreter, eliminating the extra interpreter overhead and “GIL-thrashing” that comes from driving several execution threads, model replicas, or GPUs from a single Python process. This is especially important for models that make heavy use of the Python runtime, including models with recurrent layers or many small components. 
Initialization¶
The package needs to be initialized using the torch.distributed.deprecated.init_process_group()
function before calling any other methods. This blocks until all processes have
joined.
- 
torch.distributed.deprecated.init_process_group(backend, init_method='env://', **kwargs)¶
- Initializes the distributed package. - Parameters
- backend (str) – Name of the backend to use. Depending on build-time configuration, valid values include: tcp, mpi, gloo, and nccl.
- init_method (str, optional) – URL specifying how to initialize the package. 
- world_size (int, optional) – Number of processes participating in the job. 
- rank (int, optional) – Rank of the current process. 
- group_name (str, optional) – Group name. See description of init methods. 
 
- To enable backend == mpi, PyTorch needs to be built from source on a system that supports MPI. If you want to use Open MPI with CUDA-aware support, please use Open MPI major version 2 and above. - Note - This method initializes the CUDA context. Therefore, if multiple processes run on a single machine but use different GPUs, make sure to use torch.cuda.set_device() before this method to avoid unnecessarily creating context on the first visible device.
- 
torch.distributed.deprecated.get_rank()¶
- Returns the rank of the current process. - Rank is a unique identifier assigned to each process within a distributed group. Ranks are always consecutive integers ranging from 0 to world_size - 1 (inclusive).
- 
torch.distributed.deprecated.get_world_size()¶
- Returns the number of processes in the distributed group. 
Currently three initialization methods are supported:
TCP initialization¶
There are two ways to initialize using TCP, both requiring a network address
reachable from all processes and a desired world_size. The first way
requires specifying an address that belongs to the rank 0 process. This
initialization method requires that all processes have manually specified ranks.
Alternatively, the address has to be a valid IP multicast address, in which case
ranks can be assigned automatically. Multicast initialization also supports
a group_name argument, which allows you to use the same address for multiple
jobs, as long as they use different group names.
import torch.distributed.deprecated as dist
# Use address of one of the machines
dist.init_process_group(backend, init_method='tcp://10.1.1.20:23456', rank=args.rank, world_size=4)
# or a multicast address - rank will be assigned automatically if unspecified
dist.init_process_group(backend, init_method='tcp://[ff15:1e18:5d4c:4cf0:d02d:b659:53ba:b0a7]:23456',
                        world_size=4)
Environment variable initialization¶
This method will read the configuration from environment variables, allowing one to fully customize how the information is obtained. The variables to be set are:
- MASTER_PORT- required; has to be a free port on machine with rank 0
- MASTER_ADDR- required (except for rank 0); address of rank 0 node
- WORLD_SIZE- required; can be set either here, or in a call to init function
- RANK- required; can be set either here, or in a call to init function
The machine with rank 0 will be used to set up all connections.
This is the default method, meaning that init_method does not have to be specified (or
can be env://).
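A sketch of this method (the address, port, and sizes are illustrative):

import os
import torch.distributed.deprecated as dist

os.environ['MASTER_ADDR'] = '10.1.1.20'  # address of the rank 0 node
os.environ['MASTER_PORT'] = '23456'      # free port on the rank 0 machine

# WORLD_SIZE and RANK could equally be provided as environment variables
dist.init_process_group('gloo', init_method='env://', world_size=4, rank=0)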
Groups¶
By default collectives operate on the default group (also called the world) and
require all processes to enter the distributed function call. However, some workloads can benefit
from more fine-grained communication. This is where distributed groups come
into play. The new_group() function can be
used to create new groups, with arbitrary subsets of all processes. It returns
an opaque group handle that can be given as a group argument to all collectives
(collectives are distributed functions to exchange information in certain well-known programming patterns).
- 
torch.distributed.deprecated.new_group(ranks=None)¶
- Creates a new distributed group. - This function requires that all processes in the main group (i.e., all processes that are part of the distributed job) enter this function, even if they are not going to be members of the group. Additionally, groups should be created in the same order in all processes. 
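A sketch (the choice of ranks is illustrative; as stated above, every process must call new_group):

import torch
import torch.distributed.deprecated as dist

group = dist.new_group(ranks=[0, 1])  # subgroup containing ranks 0 and 1
t = torch.zeros(1)
if dist.get_rank() in (0, 1):
    dist.all_reduce(t, group=group)   # only the subgroup participates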
Point-to-point communication¶
- 
torch.distributed.deprecated.send(tensor, dst)¶
- Sends a tensor synchronously. 
- 
torch.distributed.deprecated.recv(tensor, src=None)¶
- Receives a tensor synchronously. 
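A minimal point-to-point sketch between ranks 0 and 1:

import torch
import torch.distributed.deprecated as dist

t = torch.zeros(4)
if dist.get_rank() == 0:
    t += 1
    dist.send(t, dst=1)  # blocks until the tensor has been sent
elif dist.get_rank() == 1:
    dist.recv(t, src=0)  # blocks until the tensor has been received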
isend() and irecv()
return distributed request objects when used. In general, the type of these objects is
unspecified, as they should never be created manually, but they are guaranteed to support two methods:
- is_completed()- returns True if the operation has finished
- wait()- will block the process until the operation is finished.- is_completed()is guaranteed to return True once it returns.
When using the MPI backend, isend() and irecv()
support the non-overtaking property, which provides some guarantees on message ordering. For more detail, see
http://mpi-forum.org/docs/mpi-2.2/mpi22-report/node54.htm#Node54
- 
torch.distributed.deprecated.isend(tensor, dst)¶
- Sends a tensor asynchronously. 
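A sketch using the request object described above (tensor is assumed to be defined):

req = dist.isend(tensor, dst=1)  # returns immediately with a request object
# ... overlap other work with the communication ...
req.wait()                       # block until the send has completed
assert req.is_completed()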
Collective functions¶
- 
torch.distributed.deprecated.broadcast(tensor, src, group=<object object>)¶
- Broadcasts the tensor to the whole group. - tensormust have the same number of elements in all processes participating in the collective.
- 
torch.distributed.deprecated.all_reduce(tensor, op=<object object>, group=<object object>)¶
- Reduces the tensor data across all machines in such a way that all get the final result. - After the call - tensorwill be bitwise identical in all processes.- Parameters
- tensor (Tensor) – Input and output of the collective. The function operates in-place. 
- op (optional) – One of the values from - torch.distributed.deprecated.reduce_openum. Specifies an operation used for element-wise reductions.
- group (optional) – Group of the collective. 
 
 
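For example, a sketch in which every process contributes its rank (the default reduction is assumed to be a sum):

import torch
import torch.distributed.deprecated as dist

t = torch.FloatTensor([dist.get_rank()])
dist.all_reduce(t)  # in-place; afterwards t is identical on every process
# with world_size processes, t now holds sum(range(world_size))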
- 
torch.distributed.deprecated.reduce(tensor, dst, op=<object object>, group=<object object>)¶
- Reduces the tensor data across all machines. - Only the process with rank - dstis going to receive the final result.- Parameters
- tensor (Tensor) – Input and output of the collective. The function operates in-place.
- dst (int) – Destination rank.
- op (optional) – One of the values from torch.distributed.deprecated.reduce_op enum. Specifies an operation used for element-wise reductions.
- group (optional) – Group of the collective.
- 
torch.distributed.deprecated.all_gather(tensor_list, tensor, group=<object object>)¶
- Gathers tensors from the whole group in a list. 
- 
torch.distributed.deprecated.gather(tensor, **kwargs)¶
- Gathers a list of tensors in a single process. - Parameters
- tensor (Tensor) – Input tensor. 
- dst (int) – Destination rank. Required in all processes except the one that is receiving the data. 
- gather_list (list[Tensor]) – List of appropriately-sized tensors to use for received data. Required only in the receiving process. 
- group (optional) – Group of the collective. 
 
 
- 
torch.distributed.deprecated.scatter(tensor, **kwargs)¶
- Scatters a list of tensors to all processes in a group. - Each process will receive exactly one tensor and store its data in the - tensorargument.
- 
torch.distributed.deprecated.barrier(group=<object object>)¶
- Synchronizes all processes. - This collective blocks processes until the whole group enters this function. - Parameters
- group (optional) – Group of the collective. 
 
Multi-GPU collective functions¶
If you have more than one GPU on each node, then when using the NCCL backend,
broadcast_multigpu(),
all_reduce_multigpu(),
reduce_multigpu(), and
all_gather_multigpu() support distributed collective
operations among multiple GPUs within each node. These functions can potentially
improve the overall distributed training performance and be easily used by
passing a list of tensors. Each Tensor in the passed tensor list needs
to be on a separate GPU device of the host where the function is called. Note
that the length of the tensor list needs to be identical among all the
distributed processes. Also note that currently the multi-GPU collective
functions are only supported by the NCCL backend.
For example, suppose the system we use for distributed training has 2 nodes, each of which has 8 GPUs. On each of the 16 GPUs, there is a tensor that we would like to all-reduce. The following code can serve as a reference:
Code running on Node 0
import torch
import torch.distributed.deprecated as dist
dist.init_process_group(backend="nccl",
                        init_method="file:///distributed_test",
                        world_size=2,
                        rank=0)
tensor_list = []
for dev_idx in range(torch.cuda.device_count()):
    tensor_list.append(torch.FloatTensor([1]).cuda(dev_idx))
dist.all_reduce_multigpu(tensor_list)
Code running on Node 1
import torch
import torch.distributed.deprecated as dist
dist.init_process_group(backend="nccl",
                        init_method="file:///distributed_test",
                        world_size=2,
                        rank=1)
tensor_list = []
for dev_idx in range(torch.cuda.device_count()):
    tensor_list.append(torch.FloatTensor([1]).cuda(dev_idx))
dist.all_reduce_multigpu(tensor_list)
After the call, all 16 tensors on the two nodes will have the all-reduced value of 16.
- 
torch.distributed.deprecated.broadcast_multigpu(tensor_list, src, group=<object object>)¶
- Broadcasts the tensor to the whole group with multiple GPU tensors per node. - tensor must have the same number of elements in all the GPUs from all processes participating in the collective. Each tensor in the list must be on a different GPU. - Note - Only the NCCL backend is currently supported. tensor_list should only contain GPU tensors. - Parameters
- tensor_list (List[Tensor]) – Tensors that participate in the collective operation. if - srcis the rank, then the first element of- tensor_list(- tensor_list[0]) will be broadcasted to all other tensors (on different GPUs) in the src process and all tensors in- tensor_listof other non-src processes. You also need to make sure that- len(tensor_list)is the same for all the distributed processes calling this function.
- src (int) – Source rank. 
- group (optional) – Group of the collective. 
 
 
- 
torch.distributed.deprecated.all_reduce_multigpu(tensor_list, op=<object object>, group=<object object>)¶
- Reduces the tensor data across all machines in such a way that all get the final result. This function reduces a number of tensors on every node, while each tensor resides on a different GPU. Therefore, the input tensors in the tensor list need to be GPU tensors. Also, each tensor in the tensor list needs to reside on a different GPU. - After the call, all tensors in tensor_list will be bitwise identical in all processes. - Note - Only the NCCL backend is currently supported. tensor_list should only contain GPU tensors. - Parameters
- tensor_list (List[Tensor]) – List of input and output tensors of the collective. The function operates in-place and requires that each tensor to be a GPU tensor on different GPUs. You also need to make sure that - len(tensor_list)is the same for all the distributed processes calling this function.
- op (optional) – One of the values from - torch.distributed.deprecated.reduce_openum. Specifies an operation used for element-wise reductions.
- group (optional) – Group of the collective. 
 
 
- 
torch.distributed.deprecated.reduce_multigpu(tensor_list, dst, op=<object object>, group=<object object>)¶
- Reduces the tensor data on multiple GPUs across all machines. Each tensor in - tensor_listshould reside on a separate GPU.- Only the GPU of - tensor_list[0]on the process with rank- dstis going to receive the final result.- Note - Only NCCL backend is currently supported. - tensor_listshould only contain GPU tensors.- Parameters
- tensor_list (List[Tensor]) – Input and output GPU tensors of the collective. The function operates in-place. You also need to make sure that - len(tensor_list)is the same for all the distributed processes calling this function.
- dst (int) – Destination rank 
- op (optional) – One of the values from - torch.distributed.deprecated.reduce_openum. Specifies an operation used for element-wise reductions.
- group (optional) – Group of the collective. 
 
 
- 
torch.distributed.deprecated.all_gather_multigpu(output_tensor_lists, input_tensor_list, group=<object object>)¶
- Gathers tensors from the whole group in a list. Each tensor in - input_tensor_listshould reside on a separate GPU.- Note - Only NCCL backend is currently supported. - output_tensor_listsand- input_tensor_listshould only contain GPU tensors.- Parameters
- output_tensor_lists (List[List[Tensor]]) – Output lists. It should contain correctly-sized tensors on each GPU to be used for output of the collective. e.g. output_tensor_lists[i] contains the all_gather result that resides on the GPU of input_tensor_list[i]. Note that each element of output_tensor_lists[i] has the size of world_size * len(input_tensor_list), since the function all-gathers the result from every single GPU in the group. To interpret each element of output_tensor_lists[i], note that input_tensor_list[j] of rank k will appear in output_tensor_lists[i][rank * world_size + j]. Also note that len(output_tensor_lists), and the size of each element in output_tensor_lists (each element is a list, therefore len(output_tensor_lists[i])) need to be the same for all the distributed processes calling this function.
- input_tensor_list (List[Tensor]) – List of tensors (on different GPUs) to be broadcast from current process. Note that - len(input_tensor_list)needs to be the same for all the distributed processes calling this function.
- group (optional) – Group of the collective. 
 
 
Launch utility¶
The torch.distributed.deprecated package also provides a launch utility in torch.distributed.deprecated.launch.
torch.distributed.launch is a module that spawns multiple distributed training processes on each of the training nodes.
The utility can be used for single-node distributed training, in which one or more processes per node will be spawned. It can be used for either CPU or GPU training; if used for GPU training, each distributed process operates on a single GPU, which can substantially improve single-node training performance. It can also be used for multi-node distributed training by spawning multiple processes on each node, likewise improving multi-node training performance. This is especially beneficial for systems with multiple InfiniBand interfaces that have direct-GPU support, since all of them can be utilized for aggregated communication bandwidth.
In both cases of single-node distributed training or multi-node distributed
training, this utility will launch the given number of processes per node
(--nproc_per_node). If used for GPU training, this number needs to be less
than or equal to the number of GPUs on the current system (nproc_per_node),
and each process will be operating on a single GPU from GPU 0 to
GPU (nproc_per_node - 1).
How to use this module:
- Single-Node multi-process distributed training 
>>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE
           YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other
           arguments of your training script)
- Multi-Node multi-process distributed training: (e.g. two nodes) 
Node 1: (IP: 192.168.1.1, and has a free port: 1234)
>>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE
           --nnodes=2 --node_rank=0 --master_addr="192.168.1.1"
           --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
           and all other arguments of your training script)
Node 2:
>>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE
           --nnodes=2 --node_rank=1 --master_addr="192.168.1.1"
           --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3
           and all other arguments of your training script)
- To look up what optional arguments this module offers: 
>>> python -m torch.distributed.launch --help
Important Notices:
1. This utility and multi-process distributed (single-node or multi-node) GPU training currently only achieve the best performance using the NCCL distributed backend. Thus the NCCL backend is the recommended backend to use for GPU training.
2. In your training program, you must parse the command-line argument:
--local_rank=LOCAL_PROCESS_RANK, which will be provided by this module.
If your training program uses GPUs, you should ensure that your code only
runs on the GPU device of LOCAL_PROCESS_RANK. This can be done by:
Parsing the local_rank argument
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument("--local_rank", type=int)
>>> args = parser.parse_args()
Set your device to local rank using either
>>> torch.cuda.set_device(args.local_rank)  # before your code runs
or
>>> with torch.cuda.device(args.local_rank):
>>>    # your code to run
3. In your training program, you are supposed to call the following function
at the beginning to start the distributed backend. You need to make sure that
the init_method uses env://, which is the only init_method supported
by this module.
torch.distributed.init_process_group(backend='YOUR BACKEND',
                                     init_method='env://')
4. In your training program, you can either use regular distributed functions
or use torch.nn.parallel.DistributedDataParallel() module. If your
training program uses GPUs for training and you would like to use
torch.nn.parallel.DistributedDataParallel() module,
here is how to configure it.
model = torch.nn.parallel.DistributedDataParallel(model,
                                                  device_ids=[args.local_rank],
                                                  output_device=args.local_rank)
Please ensure that the device_ids argument is set to the only GPU device id
that your code will be operating on. This is generally the local rank of the
process. In other words, device_ids needs to be [args.local_rank],
and output_device needs to be args.local_rank in order to use this
utility.
5. Another way to pass local_rank to the subprocesses is via the environment variable
LOCAL_RANK. This behavior is enabled when you launch the script with
--use_env=True. You must adjust the subprocess example above to replace
args.local_rank with os.environ['LOCAL_RANK']; the launcher
will not pass --local_rank when you specify this flag.
Warning
local_rank is NOT globally unique: it is only unique per process
on a machine.  Thus, don’t use it to decide if you should, e.g.,
write to a networked filesystem.  See
https://github.com/pytorch/pytorch/issues/12042 for an example of
how things can go wrong if you don’t do this correctly.