Running Simulation with Chakra
Authors: Joongun Park (Georgia Tech)
Chakra is a framework created to support multiple simulators with the aim of advancing performance benchmarking and co-design through standardized execution traces. It facilitates compatibility and performance evaluation across various machine learning models, software, and hardware, enhancing the co-design ecosystem for AI systems.
Currently, ASTRA-sim supports Chakra traces as its input format.
et_generator can be used to define and generate arbitrary execution traces, functioning as a test-case generator. You can generate execution traces with the following commands (Python 3 is required):
$ cd ${ASTRA_SIM}/extern/graph_frontend/chakra
$ pip3 install .
$ python3 -m chakra.et_generator.et_generator --num_npus 64 --num_dims 1
To run one of the example traces (one_comm_coll_node_allreduce), execute the following commands for your chosen network backend.
# For the analytical network backend
$ cd -
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
--workload-configuration=./extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
--system-configuration=./inputs/system/Switch.json \
--network-configuration=./inputs/network/analytical/Switch.yml \
--remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json
# For the ns3 network backend (Python 2 is required).
# First, edit the configuration files referenced in the following script, then run:
$ ./build/astra_ns3/build.sh -r
# Or, alternatively:
$ cd ./extern/network_backend/ns3/simulation
$ ./waf --run "scratch/AstraSimNetwork \
--workload-configuration=../../../../extern/graph_frontend/chakra/one_comm_coll_node_allreduce \
--system-configuration=../../../../inputs/system/Switch.json \
--network-configuration=mix/config.txt \
--remote-memory-configuration=../../../../inputs/remote_memory/analytical/no_memory_expansion.json \
--logical-topology-configuration=../../../../inputs/network/ns3/sample_64nodes_1D.json \
--comm-group-configuration=\"empty\""
$ cd -
Note that ASTRA-sim's naming rule for execution traces follows the format {path prefix/trace name}.{npu_id}.et. By adding a few lines to any PyTorch workload, you can generate the PyTorch execution trace (ET) and Kineto trace for each GPU (and its corresponding CPU thread). Details on how to modify the PyTorch code to collect PyTorch-ET and Kineto traces can be found here. With these traces for each GPU, the PyTorch-ET and Kineto trace are merged into a single enhanced ET, which is then passed to a converter that produces traces in the Chakra format.
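As a rough illustration, the sketch below shows how a PyTorch training loop might be instrumented to dump both traces for one rank. It is a minimal sketch, assuming a recent PyTorch whose torch.profiler module exposes ExecutionTraceObserver and a CUDA device; the output file names and the train_step function are placeholders, not anything defined by ASTRA-sim or Chakra.

import torch
from torch.profiler import ExecutionTraceObserver, ProfilerActivity, profile

# Placeholder for one iteration of your real workload (assumes a CUDA device).
def train_step():
    x = torch.randn(32, 1024, device="cuda")
    w = torch.randn(1024, 1024, device="cuda", requires_grad=True)
    loss = (x @ w).sum()
    loss.backward()

# Host-side PyTorch execution trace (ET) for this rank.
et = ExecutionTraceObserver()
et.register_callback("pytorch_et_rank0.json")
et.start()

# Device-side Kineto trace for the same iteration.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    train_step()

et.stop()
et.unregister_callback()
prof.export_chrome_trace("kineto_rank0.json")

In recent Chakra releases, the merge and conversion steps are exposed as the chakra_trace_link and chakra_converter command-line tools; consult the Chakra repository for the exact options your version supports.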
Run the following command.
# This is a sample command that runs ASTRA-sim with Chakra trace files matching {path prefix/trace name}.{npu_id}.et
$ ./build/astra_analytical/build/bin/AstraSim_Analytical_Congestion_Unaware \
--workload-configuration=./{path prefix/trace name} \
--system-configuration=./inputs/system/FullyConnected.json \
--network-configuration=./inputs/network/analytical/FullyConnected.yml \
--remote-memory-configuration=./inputs/remote_memory/analytical/no_memory_expansion.json
Upon completion, ASTRA-sim displays the number of cycles each simulated NPU took to run the workload:
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
...
sys[0] finished, 13271344 cycles
sys[1] finished, 14249000 cycles