
Commit d1d2548

Translate beginner_source/dist_overview.rst (#1046)

* Translation: add translation of the beginner_source/dist_overview.rst document
* Translation: revise the beginner_source/dist_overview.rst translation
* Translation: revise the beginner_source/dist_overview.rst translation
* Translation: revise the beginner_source/dist_overview.rst translation
1 parent 0215c63 commit d1d2548

1 file changed: 47 additions, 54 deletions

PyTorch Distributed Overview
============================
**Author**: `Will Constable <https://github.com/wconstab/>`_, `Wei Feng <https://github.com/weifengpy>`_

**Translator**: `강지현 <https://github.com/KJH622>`_

.. note::
   |edit| View and edit this tutorial in `github <https://github.com/pytorchkorea/tutorials-kr/blob/main/beginner_source/dist_overview.rst>`__.

This is the overview page for the ``torch.distributed`` package. The goal of
this page is to categorize documents into different topics and briefly
describe each of them. If this is your first time building distributed training
applications using PyTorch, it is recommended to use this document to navigate
to the technology that can best serve your use case.

Introduction
------------

The PyTorch Distributed library includes a collection of parallelism modules,
a communications layer, and infrastructure for launching and
debugging large training jobs.

Parallelism APIs
****************

These parallelism modules offer high-level functionality and compose with existing models:

- `Distributed Data-Parallel (DDP) <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
- `Fully Sharded Data-Parallel Training (FSDP2) <https://pytorch.org/docs/stable/distributed.fsdp.fully_shard.html>`__
- `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__
- `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__

Sharding primitives
*******************

``DTensor`` and ``DeviceMesh`` are primitives used to build parallelism in terms of sharded or replicated tensors on N-dimensional process groups.

- `DTensor <https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/README.md>`__ represents a tensor that is sharded and/or replicated, and communicates automatically to reshard tensors as needed by operations.
- `DeviceMesh <https://pytorch.org/docs/stable/distributed.html#devicemesh>`__ abstracts the accelerator device communicators into a multi-dimensional array, which manages the underlying ``ProcessGroup`` instances for collective communications in multi-dimensional parallelisms. Try out our `Device Mesh Recipe <https://tutorials.pytorch.kr/recipes/distributed_device_mesh.html>`__ to learn more. A minimal sketch follows this list.

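To give a concrete feel for these primitives, here is a minimal sketch. It assumes a recent PyTorch release where ``init_device_mesh`` and ``distribute_tensor`` are importable from ``torch.distributed.device_mesh`` and ``torch.distributed.tensor`` (older releases expose them under slightly different paths) and a ``torchrun`` launch on 4 GPUs; treat it as illustrative rather than authoritative:

.. code-block:: python

    # Minimal DeviceMesh/DTensor sketch.
    # Launch with: torchrun --nproc_per_node=4 dtensor_demo.py  (filename illustrative)
    import torch
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import Shard, distribute_tensor

    # Arrange the 4 ranks as a 2x2 mesh with named dimensions "dp" and "tp".
    mesh = init_device_mesh("cuda", (2, 2), mesh_dim_names=("dp", "tp"))

    # Shard a weight along dim 0 of the "tp" sub-mesh; each rank stores one slice,
    # and DTensor issues the collectives needed when an operation wants another layout.
    weight = torch.randn(8, 8)
    dweight = distribute_tensor(weight, mesh["tp"], placements=[Shard(0)])
    print(dweight.to_local().shape)  # torch.Size([4, 8]) on each "tp" rank
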
Communications APIs
*******************

The `PyTorch distributed communication layer (C10D) <https://pytorch.org/docs/stable/distributed.html>`__ offers both collective communication APIs (e.g., `all_reduce <https://pytorch.org/docs/stable/distributed.html#torch.distributed.all_reduce>`__
and `all_gather <https://pytorch.org/docs/stable/distributed.html#torch.distributed.all_gather>`__)
and P2P communication APIs (e.g.,
`send <https://pytorch.org/docs/stable/distributed.html#torch.distributed.send>`__
and `isend <https://pytorch.org/docs/stable/distributed.html#torch.distributed.isend>`__),
which are used under the hood in all of the parallelism implementations.
`Writing Distributed Applications with PyTorch <../intermediate/dist_tuto.html>`__
shows examples of using c10d communication APIs.

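As a minimal illustration of the collective APIs (independent of any parallelism module), the sketch below has every rank contribute its rank id and sums the values with ``all_reduce``; the ``gloo`` backend and the script name in the comment are assumptions chosen to keep the example CPU-friendly:

.. code-block:: python

    # Run with: torchrun --nproc_per_node=2 c10d_demo.py  (filename illustrative)
    import torch
    import torch.distributed as dist

    dist.init_process_group(backend="gloo")  # torchrun provides rank/world size via env vars
    rank = dist.get_rank()

    t = torch.tensor([float(rank)])
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # every rank now holds the sum of all ranks
    print(f"rank {rank}: {t.item()}")

    dist.destroy_process_group()
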
Launcher
********

`torchrun <https://pytorch.org/docs/stable/elastic/run.html>`__ is a widely-used launcher script, which spawns processes on the local and remote machines for running distributed PyTorch programs.

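For reference, here is a minimal sketch of what a ``torchrun``-launched process sees: the launcher sets ``RANK``, ``WORLD_SIZE``, and ``LOCAL_RANK`` for each spawned process, which the default ``env://`` initialization then consumes (the script name below is illustrative):

.. code-block:: python

    # Example launch on one machine: torchrun --nproc_per_node=2 hello_dist.py
    import os
    import torch.distributed as dist

    rank = int(os.environ["RANK"])              # global rank of this process
    world_size = int(os.environ["WORLD_SIZE"])  # total number of processes
    local_rank = int(os.environ["LOCAL_RANK"])  # rank within this machine

    dist.init_process_group(backend="gloo")     # picks up the env vars set by torchrun
    print(f"rank {rank}/{world_size} (local rank {local_rank}) initialized")
    dist.destroy_process_group()
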
Applying Parallelism To Scale Your Model
----------------------------------------

Data Parallelism is a widely adopted single-program multiple-data training paradigm
where the model is replicated on every process, every model replica computes local gradients for
a different set of input data samples, and gradients are averaged within the data-parallel communicator group before each optimizer step.

Model Parallelism techniques (or Sharded Data Parallelism) are required when a model doesn't fit on a single GPU, and can be combined together to form multi-dimensional (N-D) parallelism techniques.

When deciding which parallelism techniques to choose for your model, use these common guidelines:

#. Use `DistributedDataParallel (DDP) <https://pytorch.org/docs/stable/notes/ddp.html>`__,
   if your model fits on a single GPU but you want to easily scale up training using multiple GPUs (see the sketch after this list).

   * Use `torchrun <https://pytorch.org/docs/stable/elastic/run.html>`__ to launch multiple PyTorch processes if you are using more than one node.

   * See also: `Getting Started with Distributed Data Parallel <../intermediate/ddp_tutorial.html>`__

#. Use `FullyShardedDataParallel (FSDP2) <https://pytorch.org/docs/stable/distributed.fsdp.fully_shard.html>`__ when your model cannot fit on one GPU (see the sketch after this list).

   * See also: `Getting Started with FSDP2 <https://tutorials.pytorch.kr/intermediate/FSDP_tutorial.html>`__

#. Use `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__ and/or `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__ if you reach scaling limitations with FSDP2.

   * Try our `Tensor Parallelism Tutorial <https://tutorials.pytorch.kr/intermediate/TP_tutorial.html>`__

   * See also: `TorchTitan end to end example of 3D parallelism <https://github.com/pytorch/torchtitan>`__

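To make guidelines 1 and 2 above concrete, here is a minimal sketch of both wrapping styles. The DDP part uses the stable ``torch.nn.parallel.DistributedDataParallel`` API; the FSDP2 part assumes a PyTorch release where ``fully_shard`` is importable from ``torch.distributed.fsdp`` (older releases expose it elsewhere). The model, optimizer, and shapes are placeholders:

.. code-block:: python

    # Sketch: DDP when the model fits on one GPU, fully_shard (FSDP2) when it does not.
    # Launch with torchrun so RANK/WORLD_SIZE/LOCAL_RANK are set for every process.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model

    use_fsdp2 = False  # flip once the model no longer fits on a single GPU
    if not use_fsdp2:
        # Guideline 1: replicate the model; DDP averages gradients across ranks in backward.
        model = DDP(model, device_ids=[local_rank])
    else:
        # Guideline 2: shard parameters across the data-parallel group (assumed import path).
        from torch.distributed.fsdp import fully_shard
        fully_shard(model)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()   # DDP all-reduces gradients / FSDP2 reduce-scatters them here
    optimizer.step()

    dist.destroy_process_group()
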
.. note:: Data-parallel training also works with `Automatic Mixed Precision (AMP) <https://pytorch.org/docs/stable/notes/amp_examples.html#working-with-multiple-gpus>`__.

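A brief sketch of the composition mentioned in the note, assuming a recent PyTorch release (``torch.amp.GradScaler`` accepting a device argument) and a single-node ``torchrun`` launch; the model and shapes are placeholders:

.. code-block:: python

    # DDP forward/backward under autocast with gradient scaling.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(512, 512).cuda(), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.amp.GradScaler("cuda")

    x = torch.randn(16, 512, device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).sum()
    scaler.scale(loss).backward()  # gradients are still averaged across ranks by DDP
    scaler.step(optimizer)
    scaler.update()

    dist.destroy_process_group()
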
PyTorch Distributed Developers
------------------------------

If you'd like to contribute to PyTorch Distributed, refer to our
`Developer Guide <https://github.com/pytorch/pytorch/blob/master/torch/distributed/CONTRIBUTING.md>`_.
