20210516 nestedtensor import

cpuhrsch · facebook-github-bot · commit cf952fe571ad · 2021-05-17T07:58:49.000-07:00
Summary: Import from GH

Reviewed By: datumbox

Differential Revision: D28468749

fbshipit-source-id: 63f20f81a2585910d6a45312a050768b4d373632
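
Note on the `NestedTensor_matmul` change in this diff: when `self` is a contiguous 3-dim nested tensor whose last dimension is the same for every entry and equals `other.size(0)`, and `other` is a contiguous dense 2-dim tensor, the new branch reshapes the flat buffer to `(-1, other.size(0))`, runs a single `at::matmul`, and wraps the result with updated nested sizes and strides. The sketch below illustrates that idea with plain Python lists standing in for tensors. The helper names `matmul` and `nested_matmul_fast` are hypothetical and not part of nestedtensor; this is a minimal illustration, not the library's implementation.

```python
# Sketch only: plain Python lists stand in for tensors.
# `matmul` and `nested_matmul_fast` are hypothetical helper names.

def matmul(a, b):
    """Multiply an (n, k) list-of-rows by a (k, m) list-of-rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nested_matmul_fast(sequences, other):
    """Multiply every (len_i, k) sequence by the same (k, m) matrix in one call."""
    lengths = [len(seq) for seq in sequences]
    # Flatten: all rows back to back, like reshaping the buffer to (-1, k).
    flat = [row for seq in sequences for row in seq]
    flat_result = matmul(flat, other)
    # Split the (sum(len_i), m) result back into per-sequence pieces.
    out, start = [], 0
    for n in lengths:
        out.append(flat_result[start:start + n])
        start += n
    return out


sequences = [
    [[1, 2], [3, 4], [5, 6]],  # 3 x 2
    [[7, 8]],                  # 1 x 2
]
other = [[1, 0, 2], [0, 1, 3]]  # 2 x 3

fast = nested_matmul_fast(sequences, other)
slow = [matmul(seq, other) for seq in sequences]  # per-entry fallback path
assert fast == slow
```

The single flattened multiply replaces one small multiply per entry, which is the case the updated `benchmarks/matmul.py` exercises with many small tensors.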
diff --git a/README.md b/README.md
@@ -6,19 +6,42 @@ If you are here because you ran into a runtime error due to a missing feature or
 
 If you are new to this project, we recommend you take a look at our [whirlwind introduction](https://colab.research.google.com/github/pytorch/nestedtensor/blob/master/tutorials/notebooks/basic.ipynb) to get started.
 
-## Operator support
+## Autograd support
 
-Please see [the list of currently supported operators](https://github.com/pytorch/nestedtensor/blob/master/nestedtensor/csrc/README.md) and [open an issue](https://github.com/pytorch/nestedtensor/issues/new/choose) if you find you need one for your project that's not listed.
+Due to missing extensibility features in PyTorch, nestedtensor currently lacks autograd support. We're actively working on this and recognize that it severely limits the applicability of the project. Please run nestedtensor operations within the [inference mode](https://github.com/ailzhang/rfcs/blob/rfc0011/RFC-0011-InferenceMode.md) context to prevent any adverse interactions with the autograd system.
+
+For example:
+```
+sentences = [torch.randn(10, 5), torch.randn(5, 5), torch.randn(9, 5)]
+with torch.inference_mode():
+    nt = nestedtensor.nested_tensor(sentences)
+    nt.sum(1)
+```
 
 ## Binaries
 
-The nestedtensor project is built on top of a torch fork for improved interoperability and also ships with torchvision binaries that were built against this fork. To use NestedTensors you need to install this version of torch, which is frequently rebased upon PyTorch's [viable/strict](https://github.com/pytorch/pytorch/tree/viable/strict) branch (most recent master where all tests pass).
+Because PyTorch develops quickly, the nestedtensor project is built on top of, and depends on, a fixed, recent PyTorch nightly.
 
 | Version | Python | CUDA | Wheels |
 | --- | ---- | ------ | ---- |
-| 0.1.1 | 3.6 | CPU-only | [torch](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.6/torch-1.8.0_nestedtensor_0.1.1_cpu-cp36-cp36m-linux_x86_64.whl), [nestedtensor](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.6/nestedtensor-0.1.1_cpu-cp36-cp36m-linux_x86_64.whl), [torchvision](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.6/torchvision-0.1.1_cpu-cp36-cp36m-linux_x86_64.whl) |
-| 0.1.1 | 3.7 | CPU-only | [torch](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.7/torch-1.8.0_nestedtensor_0.1.1_cpu-cp37-cp37m-linux_x86_64.whl), [nestedtensor](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.7/nestedtensor-0.1.1_cpu-cp37-cp37m-linux_x86_64.whl), [torchvision](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.7/torchvision-0.1.1_cpu-cp37-cp37m-linux_x86_64.whl) |
-| 0.1.1 | 3.8 | CPU-only | [torch](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.8/torch-1.8.0_nestedtensor_0.1.1_cpu-cp38-cp38m-linux_x86_64.whl), [nestedtensor](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.8/nestedtensor-0.1.1_cpu-cp38-cp38m-linux_x86_64.whl), [torchvision](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.8/torchvision-0.1.1_cpu-cp38-cp38m-linux_x86_64.whl) |
+| 0.1.1 | 3.6 | CPU-only | [nestedtensor](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.6/nestedtensor-0.1.1_cpu-cp36-cp36m-linux_x86_64.whl) |
+| 0.1.1 | 3.7 | CPU-only | [nestedtensor](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.7/nestedtensor-0.1.1_cpu-cp37-cp37m-linux_x86_64.whl) |
+| 0.1.1 | 3.8 | CPU-only | [nestedtensor](https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.8/nestedtensor-0.1.1_cpu-cp38-cp38m-linux_x86_64.whl) |
+| 0.1.1 | 3.6 | CUDA 10.2 | [nestedtensor](https://download.pytorch.org/nestedtensor/whl/nightly/cu102/py3.6/nestedtensor-0.1.1_cu102-cp36-cp36m-linux_x86_64.whl) |
+| 0.1.1 | 3.7 | CUDA 10.2 | [nestedtensor](https://download.pytorch.org/nestedtensor/whl/nightly/cu102/py3.7/nestedtensor-0.1.1_cu102-cp37-cp37m-linux_x86_64.whl) |
+| 0.1.1 | 3.8 | CUDA 10.2 | [nestedtensor](https://download.pytorch.org/nestedtensor/whl/nightly/cu102/py3.8/nestedtensor-0.1.1_cu102-cp38-cp38m-linux_x86_64.whl) |
+
+When installing a binary, pass the matching torch nightly link archive with `-f` so that pip automatically pulls in the correct PyTorch nightly.
+
+CPU
+```
+pip install https://download.pytorch.org/nestedtensor/whl/nightly/cpu/py3.7/nestedtensor-0.1.1_cpu-cp37-cp37m-linux_x86_64.whl -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
+```
+
+CUDA 10.2
+```
+pip install https://download.pytorch.org/nestedtensor/whl/nightly/cu102/py3.7/nestedtensor-0.1.1_cu102-cp37-cp37m-linux_x86_64.whl -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
+```
 
 ## Why consider using this? / Dealing with dynamic shapes
 
@@ -63,52 +86,12 @@ a NestedTensor is still a Tensor. That means it needs to have a single dimension
 
 The nestedtensor package is a prototype intended for early stage feedback and testing. It is on the road to a beta classification, but there is no definitive timeline yet. See [PyTorch feature classification](https://pytorch.org/docs/stable/index.html) for what prototype, beta and stale means.
 
-## Supported platforms
-
-It is developed [against a fork](https://github.com/cpuhrsch/pytorchnestedtensor) of PyTorch to enable cutting-edge features such as improved performance or better `torch.vmap` integration.
-
-Developers will thus need to build from source, but users can use the binary we will start shipping soon ([see the related issue](https://github.com/pytorch/nestedtensor/issues/262)).
-
-If you want to use the binaries you need to run on Linux, use Python 3.8+ and have a CUDA-11 toolkit installed.
-
-If you want to build from source you can probably get it to work on many platforms, but supporting other platforms won't take priority over Linux. We're happy to review community contributions that achieve this however.
-
 ## Dependencies
 
 - pytorch (installed from nestedtensor/third_party/pytorch submodule)
 - torchvision (needed for examples and tests)
 - ipython (needed for examples)
 - notebook (needed for examples)
 
-## Build for development
-
-Get the source
-
-```
-git clone --recursive https://github.com/pytorch/nestedtensor
-cd nestedtensor
-# if you are updating an existing checkout
-git submodule sync
-git submodule update --init --recursive
-```
-
-Install the build tools
-
-```
-conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests
-conda install -c pytorch magma-cuda110
-```
-
-Build from scratch
-```
-./clean_build_with_submodule.sh
-```
-
-Incremental builds
-```
-./build_with_submodule.sh
-```
-
-
 ## Contribution
 The project is under active development. If you have a suggestions or found a bug, please file an issue!
diff --git a/benchmarks/matmul.py b/benchmarks/matmul.py
@@ -5,47 +5,35 @@
 import random
 random.seed(1010)
 
+BDIM = 10
+
 # Performance tanks hard for lots of small Tensors as expected
 RAND_INTS = [random.randint(10, 30) for _ in range(2000)]
-RAND_INTS = [random.randint(1000, 3000) for _ in range(20)]
 
-TENSORS0 = [torch.rand(9, 245, 2560, requires_grad=True).cuda() for i in RAND_INTS]
-TENSORS1 = [torch.rand(9, 2560, 245, requires_grad=True).cuda() for i in RAND_INTS]
+OUTDIM = 256
+
+TENSORS0 = [torch.rand(i, OUTDIM).cuda() for i in RAND_INTS]
 
 def gen_t_matmul():
-    tensor0 = torch.stack(TENSORS0)
-    tensor1 = torch.stack(TENSORS1)
+    nt0 = nestedtensor.nested_tensor(TENSORS0, device=torch.device('cuda'), dtype=torch.float)
+    data, _ = nt0.to_tensor_mask()
+    t1 = torch.randn(OUTDIM, 512).cuda()
 
     def t():
-        tensor0.requires_grad_()
-        tensor1.requires_grad_()
-        torch.matmul(tensor0, tensor1).sum().backward()
-        tensor0.detach_()
-        tensor1.detach_()
+        torch.matmul(data, t1)
     return t
 
 
-def gen_t_loop_matmul():
-    tensors = [torch.rand(i, 2560).cuda() for i in RAND_INTS]
-
-    def t_loop():
-        for (t0, t1) in zip(TENSORS0, TENSORS1):
-            torch.matmul(t0, t1).sum().backward()
-            t0.grad = None
-            t1.grad = None
-    return t_loop
-
-
+@torch.inference_mode()
 def gen_nt_matmul():
-    nt0 = nestedtensor.nested_tensor(TENSORS0, device=torch.device('cuda'), dtype=torch.float, requires_grad=True)
-    nt1 = nestedtensor.nested_tensor(TENSORS1, device=torch.device('cuda'), dtype=torch.float, requires_grad=True)
+    nt0 = nestedtensor.nested_tensor(TENSORS0, device=torch.device('cuda'), dtype=torch.float)
+    t1 = torch.randn(OUTDIM, 512).cuda()
 
     def nt():
-        torch.matmul(nt0, nt1).sum().backward()
+        torch.matmul(nt0, t1)
     return nt
 
 
 if __name__ == "__main__":
-    # print(utils.benchmark_fn(gen_t_matmul()))
-    # print(utils.benchmark_fn(gen_t_loop_matmul()))
+    print(utils.benchmark_fn(gen_t_matmul()))
     print(utils.benchmark_fn(gen_nt_matmul()))
diff --git a/nestedtensor/csrc/matmul.cpp b/nestedtensor/csrc/matmul.cpp
@@ -9,6 +9,40 @@ namespace F = torch::nn::functional;
 namespace at {
 
 Tensor NestedTensor_matmul(const Tensor& self, const Tensor& other) {
+  if (is_nested_tensor_impl(self) && !is_nested_tensor_impl(other)) {
+    if (get_is_contiguous(self) && get_is_contiguous(other)) {
+      if (get_dim(self) == 3 && get_dim(other) == 2) {
+        auto self_opt_sizes = get_opt_sizes(self);
+        if (self_opt_sizes[2]) {
+          if (*self_opt_sizes[2] == other.size(0)) {
+            Tensor self_buffer = get_buffer(self);
+            Tensor result_buffer =
+                at::matmul(self_buffer.reshape({-1, other.size(0)}), other);
+            result_buffer = result_buffer.reshape({-1});
+            int64_t other_size_1 = other.size(1);
+            EfficientSizeNode new_nested_size =
+                get_efficient_nested_size(self).clone();
+            EfficientSizeNode new_nested_stride =
+                get_efficient_nested_stride(self).clone();
+            apply_efficient_size(
+                [other_size_1](
+                    int64_t* size_ptr,
+                    int64_t size_size,
+                    int64_t* stride_ptr,
+                    int64_t stride_size) {
+                  size_ptr[1] = other_size_1;
+                  stride_ptr[1] = 1;
+                  stride_ptr[0] = other_size_1;
+                },
+                new_nested_size,
+                new_nested_stride);
+            return wrap_buffer(
+                std::move(result_buffer), new_nested_size, new_nested_stride);
+          }
+        }
+      }
+    }
+  }
   return map_nested_tensor(
       [](at::Tensor self, at::Tensor other) { return at::matmul(self, other); },
       self,
diff --git a/nestedtensor/csrc/storage/EfficientSizeNode.h b/nestedtensor/csrc/storage/EfficientSizeNode.h
@@ -92,11 +92,7 @@ struct EfficientSizeNode {
         _opt_sizes(impl::construct_efficient_size(
             impl::efficient_deserialize(_structure, _height),
             _sizes)) {
-          // for (size_t i = 0; i < _structure.size(); i++) {
-          //   std::cout << "_structure[" << i << "]: " << _structure[i] << std::endl;
-          // }
-          // std::cout << "---" << std::endl;
-        }
+  }
 
   explicit EfficientSizeNode(
       int64_t height,
@@ -138,6 +134,9 @@ struct EfficientSizeNode {
   const std::vector<int64_t>& structure() const {
     return _structure;
   }
+  EfficientSizeNode clone() const {
+    return EfficientSizeNode(_height, _structure, _sizes.clone(), _opt_sizes);
+  }
 
  private:
   int64_t _height;
@@ -159,5 +158,32 @@ static inline EfficientSizeNode map_efficient_size(
       size_node.height(), size_node.structure(), sizes, size_node.opt_sizes());
 }
 
+template <class F>
+static inline void apply_efficient_size(
+    F&& fn,
+    EfficientSizeNode& size_node0,
+    EfficientSizeNode& size_node1) {
+  at::Tensor sizes0 = size_node0.sizes();
+  at::Tensor sizes1 = size_node1.sizes();
+  int64_t* sizes0_ptr = sizes0.data_ptr<int64_t>();
+  int64_t* sizes1_ptr = sizes1.data_ptr<int64_t>();
+  const std::vector<int64_t>& structure0 = size_node0.structure();
+  const std::vector<int64_t>& structure1 = size_node1.structure();
+  TORCH_CHECK(
+      structure0.size() == structure1.size(),
+      "Tree structure doesn't match. Size.");
+  for (size_t i = 0; i < structure0.size(); i++) {
+    TORCH_CHECK(
+        structure0[i] == structure1[i],
+        "Tree structure doesn't match. Values.");
+  }
+  for (int64_t i = 0; i < sizes0.size(0); i++) {
+    fn(sizes0_ptr + i * sizes0.size(1),
+       sizes0.size(0),
+       sizes1_ptr + i * sizes1.size(1),
+       sizes1.size(0));
+  }
+}
+
 } // namespace nested_tensor
 } // namespace torch
diff --git a/nestedtensor/nested/nested.py b/nestedtensor/nested/nested.py
@@ -512,4 +512,5 @@ def to_padded_tensor(self, mask_dim=None, padding=-1):
         tensor, mask = masking.to_tensor_mask(self, mask_dim)
         while mask.dim() < tensor.dim():
             mask = mask.unsqueeze(-1)
+        mask = mask.to(torch.bool)
         return tensor.masked_fill(~mask, padding)
diff --git a/nestedtensor/version.py b/nestedtensor/version.py
@@ -1,5 +1,5 @@
-__version__ = '0.1.4+291a8a1'
-git_version = '291a8a10d7de34c02ce2616db4eb8cf95ec27df9'
+__version__ = '0.1.4+fbdd335'
+git_version = 'fbdd335e410c7b3cf7970fbd65db181e9302e07d'
 from nestedtensor import _C
 if hasattr(_C, 'CUDA_VERSION'):
     cuda = _C.CUDA_VERSION
diff --git a/setup.py b/setup.py
@@ -63,13 +63,12 @@ def write_version_file():
 
 pytorch_dep = "torch"
 
-requirements = [
-    pytorch_dep,
-]
-
 if os.getenv("PYTORCH_VERSION"):
     pytorch_dep += "==" + os.getenv("PYTORCH_VERSION")
 
+requirements = [
+    pytorch_dep,
+]
 
 def get_extensions():
 
diff --git a/test/test_nested_tensor_functional.py b/test/test_nested_tensor_functional.py
@@ -29,6 +29,14 @@ def test_addmm(self):
             [torch.rand(1, 4), torch.rand(1, 4), torch.rand(4, 4)]
         )
 
+    @torch.inference_mode()
+    def test_conv2d(self):
+        nt = ntnt_nograd(
+            [torch.rand(3, 35, 56), torch.rand(3, 43, 23), torch.rand(3, 24, 52)]
+        )
+        weight = torch.randn(5, 5).repeat(3, 3, 1, 1)
+        torch.conv2d(nt, weight)
+
     def test_contiguousity(self):
         initial_t = torch.rand(2, 5, 10, 15)
         self.assertEqual(True, initial_t.is_contiguous())
diff --git a/test/test_nested_tensor_masking.py b/test/test_nested_tensor_masking.py
@@ -181,7 +181,7 @@ def test_scalar_and_empty_nt_cuda(self):
 
         # TODO: Fix this case together with C++ rewrite.
         self.assertRaisesRegex(
-                RuntimeError, "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda", lambda: a.to_tensor_mask())
+            RuntimeError, "Expected all tensors to be on the same device, but found at least two devices, cpu and cuda", lambda: a.to_tensor_mask())
         # tensor, mask = a.to_tensor_mask()
         # TestCase.assertEqual(self, tensor, torch.tensor([[0], [11]], dtype=torch.long, device='cuda'))
         # TestCase.assertEqual(self, mask, torch.tensor([False,  True], device='cuda'))
@@ -1105,6 +1105,31 @@ def test_ntftm_mask_dim_cuda(self):
             TestCase.assertEqual(self, a, res_nt)
             TestCase.assertEqual(self, res_nt.nested_dim(), a.nested_dim())
 
+    def test_to_padded_tensor(self):
+        data1 = torch.tensor(
+            [[[0.8413, 0.7325, 0.0000, 0.0000],
+              [0.0000, 0.0000, 0.0000, 0.0000],
+              [0.0000, 0.0000, 0.0000, 0.0000]],
+
+             [[0.6334, 0.5473, 0.3273, 0.0564],
+              [0.3023, 0.6826, 0.3519, 0.1804],
+              [0.8431, 0.1645, 0.1821, 0.9185]]])
+        mask1 = torch.tensor(
+            [[[True,  True, False, False],
+              [False, False, False, False],
+              [False, False, False, False]],
+
+             [[True,  True,  True,  True],
+              [True,  True,  True,  True],
+              [True,  True,  True,  True]]])
+        nt2 = nt.nested_tensor_from_tensor_mask(data1, mask1)
+        data2, mask2 = nt2.to_tensor_mask()
+        self.assertEqual(data1, data2)
+        self.assertEqual(mask1, mask2)
+        data3 = nt2.to_padded_tensor(padding=-10)
+        data1 = data1 + (~mask1) * -10
+        self.assertEqual(data1, data3)
+
 
 if __name__ == "__main__":
     unittest.main()
diff --git a/tutorials/notebooks/basic.ipynb b/tutorials/notebooks/basic.ipynb