graph LR
Experiment_Management["Experiment Management"]
MPI_Utilities["MPI Utilities"]
PyTorch_Backend_Utilities["PyTorch Backend Utilities"]
TensorFlow_1_x_Backend_Utilities["TensorFlow 1.x Backend Utilities"]
PyTorch_Algorithms["PyTorch Algorithms"]
TensorFlow_1_x_Algorithms["TensorFlow 1.x Algorithms"]
Policy_Evaluation["Policy Evaluation"]
Experiment_Visualization["Experiment Visualization"]
Reinforcement_Learning_Exercises["Reinforcement Learning Exercises"]
Experiment_Management -- "initiates" --> PyTorch_Algorithms
Experiment_Management -- "manages data for" --> PyTorch_Algorithms
Experiment_Management -- "initiates" --> TensorFlow_1_x_Algorithms
Experiment_Management -- "manages data for" --> TensorFlow_1_x_Algorithms
Experiment_Management -- "leverages" --> MPI_Utilities
PyTorch_Algorithms -- "utilizes" --> PyTorch_Backend_Utilities
PyTorch_Algorithms -- "logs to" --> Experiment_Management
PyTorch_Algorithms -- "communicates via" --> MPI_Utilities
TensorFlow_1_x_Algorithms -- "utilizes" --> TensorFlow_1_x_Backend_Utilities
TensorFlow_1_x_Algorithms -- "logs to" --> Experiment_Management
TensorFlow_1_x_Algorithms -- "communicates via" --> MPI_Utilities
PyTorch_Backend_Utilities -- "depends on" --> MPI_Utilities
TensorFlow_1_x_Backend_Utilities -- "depends on" --> MPI_Utilities
Policy_Evaluation -- "loads from" --> PyTorch_Algorithms
Policy_Evaluation -- "loads from" --> TensorFlow_1_x_Algorithms
Policy_Evaluation -- "reports to" --> Experiment_Management
Experiment_Visualization -- "processes data from" --> Experiment_Management
Reinforcement_Learning_Exercises -- "demonstrates" --> PyTorch_Backend_Utilities
Reinforcement_Learning_Exercises -- "demonstrates" --> TensorFlow_1_x_Backend_Utilities
click Experiment_Management href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//spinningup/Experiment Management.md" "Details"
click MPI_Utilities href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//spinningup/MPI Utilities.md" "Details"
click PyTorch_Backend_Utilities href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//spinningup/PyTorch Backend Utilities.md" "Details"
click TensorFlow_1_x_Backend_Utilities href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//spinningup/TensorFlow 1.x Backend Utilities.md" "Details"
click PyTorch_Algorithms href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//spinningup/PyTorch Algorithms.md" "Details"
click TensorFlow_1_x_Algorithms href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//spinningup/TensorFlow 1.x Algorithms.md" "Details"
click Policy_Evaluation href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//spinningup/Policy Evaluation.md" "Details"
click Experiment_Visualization href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//spinningup/Experiment Visualization.md" "Details"
click Reinforcement_Learning_Exercises href "https://github.com/CodeBoarding/GeneratedOnBoardings/blob/main//spinningup/Reinforcement Learning Exercises.md" "Details"
The spinningup project provides a comprehensive suite for developing and experimenting with reinforcement learning algorithms. Its core functionality revolves around an Experiment Management system that orchestrates training runs, handles configuration, and manages logging and data. This system leverages MPI Utilities for distributed training, enabling parallel execution of various PyTorch Algorithms and TensorFlow 1.x Algorithms. Both algorithm sets rely on their respective Backend Utilities for neural network architectures and framework-specific MPI integration. Post-training, Policy Evaluation allows for testing trained agents, while Experiment Visualization aids in analyzing results. Additionally, Reinforcement Learning Exercises serve as an educational resource, demonstrating the practical application of the backend utilities.
Experiment Management: Manages the overall execution of reinforcement learning experiments, including parsing command-line arguments, setting up experiment grids, initiating training runs, and handling experiment configuration, logging, and data serialization.
Related Classes/Methods:
- `spinup.run.parse_and_execute_grid_search` (48:180)
- `spinup.utils.run_utils.ExperimentGrid` (240:546)
- `spinup.utils.run_utils.call_experiment` (89:211)
- `spinup.utils.logx.Logger` (71:301)
- `spinup.utils.logx.EpochLogger` (303:383)
- `spinup.utils.serialization_utils.convert_json` (3:26)
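The heart of this component is grid expansion: `ExperimentGrid` collects hyperparameter axes and turns them into one configuration per variant. The class below is an illustrative stand-in, not spinup's actual `ExperimentGrid` (which also handles variant naming, seeding, and launching runs); the names mirror the real API, but the implementation is a hypothetical sketch.

```python
from itertools import product

class MiniExperimentGrid:
    """Minimal sketch of an ExperimentGrid-style helper: collect
    hyperparameter axes, then expand them into one dict per variant.
    Illustrative only; not spinup's real ExperimentGrid."""

    def __init__(self, name):
        self.name = name
        self.keys, self.vals = [], []

    def add(self, key, vals):
        # Accept either a single value or a list of values for this axis.
        if not isinstance(vals, list):
            vals = [vals]
        self.keys.append(key)
        self.vals.append(vals)

    def variants(self):
        # Cartesian product over all axes -> one config dict per variant.
        return [dict(zip(self.keys, combo)) for combo in product(*self.vals)]

eg = MiniExperimentGrid('ppo-bench')
eg.add('env_name', 'CartPole-v0')
eg.add('seed', [0, 10])
eg.add('ac_kwargs:hidden_sizes', [(32,), (64, 64)])
configs = eg.variants()  # 1 * 2 * 2 = 4 variants
```

In the real API, `ExperimentGrid.run` then launches one training run per variant through `call_experiment`, which in turn wires up the logger and serializes the config via `convert_json`.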
MPI Utilities: Provides fundamental utilities for inter-process communication in distributed training environments, enabling parallel execution of reinforcement learning algorithms.
Related Classes/Methods:
- `spinup.utils.mpi_tools.proc_id` (42:44)
- `spinup.utils.mpi_tools.num_procs` (49:51)
- `spinup.utils.mpi_tools.mpi_op` (56:61)
- `spinup.utils.mpi_tools.mpi_sum` (63:64)
- `spinup.utils.mpi_tools.mpi_avg` (66:68)
- `spinup.utils.mpi_tools.mpi_statistics_scalar` (70:92)
- `spinup.utils.mpi_tools.broadcast` (53:54)
- `spinup.utils.mpi_tools.mpi_fork` (6:36)
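The aggregation pattern behind `mpi_statistics_scalar` is worth seeing in miniature: each rank contributes a `(sum, count)` pair, an allreduce-style sum combines them into a global mean, and the same trick over squared deviations yields the global standard deviation. The sketch below simulates the allreduce with plain Python lists (one list of values per rank), so it is illustrative rather than actual MPI code.

```python
import math

def mpi_statistics_scalar_sketch(per_rank_values):
    """Sketch of mpi_statistics_scalar's aggregation logic.
    per_rank_values: one list of scalar samples per MPI rank;
    summing across ranks stands in for the MPI allreduce."""
    global_sum = sum(sum(vals) for vals in per_rank_values)
    global_n = sum(len(vals) for vals in per_rank_values)
    mean = global_sum / global_n
    # Second allreduce: sum of squared deviations from the global mean.
    global_sq = sum(sum((x - mean) ** 2 for x in vals)
                    for vals in per_rank_values)
    std = math.sqrt(global_sq / global_n)
    return mean, std

mean, std = mpi_statistics_scalar_sketch([[1.0, 2.0], [3.0, 4.0]])
```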
PyTorch Backend Utilities: Integrates PyTorch with MPI for distributed training and defines the common neural network architectures (actors and critics) used by the PyTorch-based reinforcement learning algorithms.
Related Classes/Methods:
- `spinup.utils.mpi_pytorch.setup_pytorch_for_mpi` (8:17)
- `spinup.utils.mpi_pytorch.mpi_avg_grads` (20:27)
- `spinup.utils.mpi_pytorch.sync_params` (29:35)
- `spinup.algos.pytorch.ddpg.core.MLPActor` (23:33)
- `spinup.algos.pytorch.ddpg.core.MLPQFunction` (35:43)
- `spinup.algos.pytorch.ddpg.core.MLPActorCritic` (45:61)
- `spinup.algos.pytorch.ppo.core.Actor` (47:63)
- `spinup.algos.pytorch.ppo.core.MLPCategoricalActor` (66:77)
- `spinup.algos.pytorch.ppo.core.MLPGaussianActor` (80:94)
- `spinup.algos.pytorch.ppo.core.MLPCritic` (97:104)
- `spinup.algos.pytorch.ppo.core.MLPActorCritic` (108:135)
- `spinup.algos.pytorch.sac.core.SquashedGaussianMLPActor` (29:67)
- `spinup.algos.pytorch.sac.core.MLPQFunction` (70:78)
- `spinup.algos.pytorch.sac.core.MLPActorCritic` (80:98)
- `spinup.algos.pytorch.td3.core.MLPActor` (23:33)
- `spinup.algos.pytorch.td3.core.MLPQFunction` (35:43)
- `spinup.algos.pytorch.td3.core.MLPActorCritic` (45:62)
- `spinup.algos.pytorch.vpg.core.Actor` (47:63)
- `spinup.algos.pytorch.vpg.core.MLPCategoricalActor` (66:77)
- `spinup.algos.pytorch.vpg.core.MLPGaussianActor` (80:94)
- `spinup.algos.pytorch.vpg.core.MLPCritic` (97:104)
- `spinup.algos.pytorch.vpg.core.MLPActorCritic` (108:135)
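Conceptually, `mpi_avg_grads` is what keeps the parallel workers in lockstep: after each backward pass, every rank's gradients are replaced by the average across ranks, so all workers apply identical optimizer steps. Below is a framework-free simulation of that idea; the real function operates on `p.grad` tensors and uses MPI for the averaging, so this is only a sketch of the logic.

```python
def mpi_avg_grads_sketch(per_rank_grads):
    """Sketch of mpi_avg_grads: replace each rank's gradients with the
    cross-rank average. The MPI allreduce is simulated by iterating
    over a list that holds one gradient list per rank."""
    n_ranks = len(per_rank_grads)
    n_params = len(per_rank_grads[0])
    avg = [sum(rank[i] for rank in per_rank_grads) / n_ranks
           for i in range(n_params)]
    # Every rank writes the averaged values back in place,
    # mirroring the in-place update of p.grad on each worker.
    for rank in per_rank_grads:
        rank[:] = avg
    return per_rank_grads

grads = mpi_avg_grads_sketch([[1.0, 2.0], [3.0, 4.0]])
```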
TensorFlow 1.x Backend Utilities: Integrates TensorFlow 1.x with MPI for distributed training and defines the common neural network architectures (actors and critics) used by the TensorFlow 1.x-based reinforcement learning algorithms.
Related Classes/Methods:
- `spinup.utils.mpi_tf.assign_params_from_flat` (10:14)
- `spinup.utils.mpi_tf.sync_params` (16:22)
- `spinup.utils.mpi_tf.sync_all_params` (24:26)
- `spinup.utils.mpi_tf.MpiAdamOptimizer` (29:78)
- `spinup.algos.tf1.ddpg.core.placeholders` (8:9)
- `spinup.algos.tf1.ddpg.core.count_vars` (19:21)
- `spinup.algos.tf1.ddpg.core.mlp_actor_critic` (26:36)
- `spinup.algos.tf1.ppo.core.placeholder` (13:14)
- `spinup.algos.tf1.ppo.core.placeholders` (16:17)
- `spinup.algos.tf1.ppo.core.placeholder_from_space` (19:24)
- `spinup.algos.tf1.ppo.core.placeholders_from_spaces` (26:27)
- `spinup.algos.tf1.ppo.core.count_vars` (37:39)
- `spinup.algos.tf1.ppo.core.mlp_categorical_policy` (67:74)
- `spinup.algos.tf1.ppo.core.mlp_gaussian_policy` (77:85)
- `spinup.algos.tf1.ppo.core.mlp_actor_critic` (91:104)
- `spinup.algos.tf1.sac.core.placeholders` (9:10)
- `spinup.algos.tf1.sac.core.count_vars` (20:22)
- `spinup.algos.tf1.sac.core.mlp_gaussian_policy` (36:46)
- `spinup.algos.tf1.sac.core.mlp_actor_critic` (64:82)
- `spinup.algos.tf1.td3.core.placeholders` (8:9)
- `spinup.algos.tf1.td3.core.count_vars` (19:21)
- `spinup.algos.tf1.td3.core.mlp_actor_critic` (26:38)
- `spinup.algos.tf1.trpo.core.values_as_sorted_list` (16:17)
- `spinup.algos.tf1.trpo.core.placeholder` (19:20)
- `spinup.algos.tf1.trpo.core.placeholders` (22:23)
- `spinup.algos.tf1.trpo.core.placeholder_from_space` (25:30)
- `spinup.algos.tf1.trpo.core.placeholders_from_spaces` (32:33)
- `spinup.algos.tf1.trpo.core.count_vars` (43:45)
- `spinup.algos.tf1.trpo.core.flat_grad` (73:74)
- `spinup.algos.tf1.trpo.core.hessian_vector_product` (76:80)
- `spinup.algos.tf1.trpo.core.assign_params_from_flat` (82:86)
- `spinup.algos.tf1.trpo.core.mlp_categorical_policy` (109:123)
- `spinup.algos.tf1.trpo.core.mlp_gaussian_policy` (126:141)
- `spinup.algos.tf1.trpo.core.mlp_actor_critic` (147:161)
- `spinup.algos.tf1.vpg.core.placeholder` (13:14)
- `spinup.algos.tf1.vpg.core.placeholders` (16:17)
- `spinup.algos.tf1.vpg.core.placeholder_from_space` (19:24)
- `spinup.algos.tf1.vpg.core.placeholders_from_spaces` (26:27)
- `spinup.algos.tf1.vpg.core.count_vars` (37:39)
- `spinup.algos.tf1.vpg.core.mlp_categorical_policy` (67:74)
- `spinup.algos.tf1.vpg.core.mlp_gaussian_policy` (77:85)
- `spinup.algos.tf1.vpg.core.mlp_actor_critic` (91:104)
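The `count_vars` helper that recurs across these TF1 cores computes one simple thing: the total number of trainable parameters in a scope, i.e. the sum over variables of the product of each variable's shape. A framework-free sketch, with shape tuples standing in for the result of `tf.trainable_variables()`:

```python
from math import prod

def count_vars_sketch(shapes):
    """Sketch of the count_vars helpers: total trainable parameter
    count is the sum over variables of the product of shape dims.
    `shapes` stands in for the shapes of tf.trainable_variables()."""
    return sum(prod(shape) for shape in shapes)

# Example: a 4 -> 32 -> 2 MLP has two weight matrices and two biases.
n_params = count_vars_sketch([(4, 32), (32,), (32, 2), (2,)])
```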
PyTorch Algorithms: Implements the PyTorch versions of DDPG, PPO, SAC, TD3, and VPG, including their training loops, replay buffers, and policy/value function updates.
Related Classes/Methods:
- `spinup.algos.pytorch.ddpg.ddpg` (44:307)
- `spinup.algos.pytorch.ddpg.ddpg.ReplayBuffer` (11:40)
- `spinup.algos.pytorch.ppo.ppo` (88:354)
- `spinup.algos.pytorch.ppo.ppo.PPOBuffer` (12:84)
- `spinup.algos.pytorch.sac.sac` (45:348)
- `spinup.algos.pytorch.sac.sac.ReplayBuffer` (12:41)
- `spinup.algos.pytorch.td3.td3` (45:348)
- `spinup.algos.pytorch.td3.td3.ReplayBuffer` (12:41)
- `spinup.algos.pytorch.vpg.vpg` (88:326)
- `spinup.algos.pytorch.vpg.vpg.VPGBuffer` (12:84)
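The `ReplayBuffer` shared in spirit by the off-policy algorithms (DDPG, TD3, SAC) is a fixed-size ring buffer of transitions with uniform random minibatch sampling. The sketch below captures that behavior with plain Python; spinup's real buffers preallocate numpy arrays and return tensors, so treat this as illustrative only.

```python
import random

class MiniReplayBuffer:
    """Sketch of an off-policy ReplayBuffer: a fixed-capacity ring
    buffer of (obs, act, rew, next_obs, done) transitions with
    uniform random sampling. Illustrative, not spinup's real class."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.storage = []
        self.ptr = 0

    def store(self, obs, act, rew, next_obs, done):
        item = (obs, act, rew, next_obs, done)
        if len(self.storage) < self.capacity:
            self.storage.append(item)
        else:
            self.storage[self.ptr] = item  # overwrite the oldest slot
        self.ptr = (self.ptr + 1) % self.capacity

    def sample_batch(self, batch_size):
        # Uniform sampling without replacement within one minibatch.
        return random.sample(self.storage, min(batch_size, len(self.storage)))

buf = MiniReplayBuffer(capacity=3)
for t in range(5):  # wraps around, keeping only the newest 3 transitions
    buf.store(obs=t, act=0, rew=1.0, next_obs=t + 1, done=False)
```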
TensorFlow 1.x Algorithms: Implements the TensorFlow 1.x versions of DDPG, PPO, SAC, TD3, TRPO, and VPG, including their training loops, replay buffers, and policy/value function updates.
Related Classes/Methods:
- `spinup.algos.tf1.ddpg.ddpg` (42:287)
- `spinup.algos.tf1.ddpg.ddpg.ReplayBuffer` (10:38)
- `spinup.algos.tf1.ppo.ppo` (86:301)
- `spinup.algos.tf1.ppo.ppo.PPOBuffer` (11:82)
- `spinup.algos.tf1.sac.sac` (42:313)
- `spinup.algos.tf1.sac.sac.ReplayBuffer` (10:38)
- `spinup.algos.tf1.td3.td3` (42:313)
- `spinup.algos.tf1.td3.td3.ReplayBuffer` (10:38)
- `spinup.algos.tf1.trpo.trpo` (92:379)
- `spinup.algos.tf1.trpo.trpo.GAEBuffer` (13:88)
- `spinup.algos.tf1.vpg.vpg` (86:276)
- `spinup.algos.tf1.vpg.vpg.VPGBuffer` (11:82)
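The on-policy buffers (PPOBuffer, VPGBuffer, GAEBuffer) all rest on one primitive: a reverse-scan discounted cumulative sum, which turns per-step rewards into returns and TD residuals into GAE advantages. A minimal sketch of that primitive (spinup's real helper computes it with `scipy.signal.lfilter`):

```python
def discount_cumsum(rewards, gamma):
    """Reverse-scan discounted cumulative sum:
    out[t] = rewards[t] + gamma * rewards[t+1] + gamma^2 * rewards[t+2] + ...
    Applied to rewards this gives returns-to-go; applied to TD
    residuals it gives GAE advantages."""
    out, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        out[t] = running
    return out

returns = discount_cumsum([1.0, 1.0, 1.0], gamma=0.5)
```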
Policy Evaluation: Provides tools for loading trained reinforcement learning policies and running them in simulated environments, allowing users to observe agent behavior.
Related Classes/Methods:
- `spinup.utils.test_policy.load_policy_and_env` (11:64)
- `spinup.utils.test_policy.load_tf_policy` (67:89)
- `spinup.utils.test_policy.run_policy` (110:137)
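The evaluation loop in `run_policy` is a standard rollout: reset the environment, query the policy for actions until the episode ends or a step cap is hit, and record per-episode return and length. The sketch below follows that shape against the classic gym step API; `DummyEnv` is a hypothetical stand-in for a real environment, and the real function additionally renders and logs through `EpochLogger`.

```python
def run_policy_sketch(env, get_action, num_episodes=2, max_ep_len=100):
    """Sketch of test_policy.run_policy: roll the policy out for a few
    episodes, recording (return, length) per episode. Assumes the
    classic gym API: reset() -> obs, step(a) -> (obs, rew, done, info)."""
    stats = []
    for _ in range(num_episodes):
        obs, ep_ret, ep_len, done = env.reset(), 0.0, 0, False
        while not done and ep_len < max_ep_len:
            obs, rew, done, _ = env.step(get_action(obs))
            ep_ret += rew
            ep_len += 1
        stats.append((ep_ret, ep_len))
    return stats

class DummyEnv:
    """Hypothetical stand-in environment: +1 reward per step,
    terminates after 3 steps."""
    def reset(self):
        self.t = 0
        return 0
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {}

stats = run_policy_sketch(DummyEnv(), get_action=lambda obs: 0)
```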
Experiment Visualization: Generates plots and visualizations from experiment data, aiding in the analysis and understanding of training results.
Related Classes/Methods:
- `spinup.utils.plot.get_all_datasets` (103:151)
- `spinup.utils.plot.make_plots` (154:163)
- `spinup.utils.plot.main` (166:230)
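One option the plotter exposes is smoothing the performance curve with a moving average before drawing it. Here is a simplified sketch of that idea; the real implementation differs in details (it smooths via convolution with its own edge handling), so this is illustrative only.

```python
def smooth_sketch(xs, window=3):
    """Sketch of curve smoothing for training plots: each point is
    replaced by the mean of up to `window` preceding points
    (inclusive), shrinking the window near the start of the curve."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - window + 1)
        out.append(sum(xs[lo:i + 1]) / (i + 1 - lo))
    return out

smoothed = smooth_sketch([0.0, 3.0, 6.0, 9.0], window=2)
```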
Reinforcement Learning Exercises: Contains problem sets and their solutions for learning and practicing reinforcement learning concepts, implemented in both PyTorch and TensorFlow 1.x. This component serves as a tutorial and educational resource.
Related Classes/Methods:
- `spinup.exercises.pytorch.problem_set_1.exercise1_2.MLPGaussianActor` (71:99)
- `spinup.exercises.pytorch.problem_set_1.exercise1_2_auxiliary.MLPCritic` (25:32)
- `spinup.exercises.pytorch.problem_set_1.exercise1_2_auxiliary.ExerciseActorCritic` (35:54)
- `spinup.exercises.pytorch.problem_set_1.exercise1_3.td3` (60:389)
- `spinup.exercises.pytorch.problem_set_1_solutions.exercise1_2_soln.DiagonalGaussianDistribution` (19:32)
- `spinup.exercises.pytorch.problem_set_1_solutions.exercise1_2_soln.MLPGaussianActor` (35:49)
- `spinup.exercises.pytorch.problem_set_2.exercise2_2.BuggedMLPActorCritic` (45:61)
- `spinup.exercises.tf1.problem_set_1.exercise1_3.td3` (58:367)
- `spinup.exercises.tf1.problem_set_1_solutions.exercise1_2_soln.mlp_gaussian_policy` (16:24)
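The `MLPGaussianActor` exercises above center on one piece of math: the log-likelihood of an action under a diagonal Gaussian policy, summed over action dimensions. A framework-free sketch of that formula (the exercise solutions compute the same quantity on tensors):

```python
import math

def gaussian_likelihood_sketch(x, mu, log_std):
    """Sketch of the diagonal-Gaussian log-likelihood from exercise 1.2:
    log p(x) = sum_i [ -0.5 * ( ((x_i - mu_i) / std_i)^2
                               + 2 * log_std_i + log(2*pi) ) ]"""
    total = 0.0
    for xi, mi, ls in zip(x, mu, log_std):
        std = math.exp(ls)
        total += -0.5 * (((xi - mi) / std) ** 2 + 2 * ls + math.log(2 * math.pi))
    return total

# For x = mu with unit std, each dimension contributes -0.5 * log(2*pi).
logp = gaussian_likelihood_sketch([0.0, 0.0], mu=[0.0, 0.0], log_std=[0.0, 0.0])
```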