Scenario Lifecycle: From Creation to Completion

A scenario in NeuroSim is a single instance of a distributed simulation, coordinating multiple plugins to model a specific situation or test case. Every scenario follows a well-defined lifecycle with five distinct states: Created, Initializing, Active, Stopping, and Completed. The core orchestrator manages these state transitions, coordinating with all participating plugins through Kafka messages to ensure synchronized initialization, execution, and graceful shutdown. Understanding the scenario lifecycle is essential for both platform operators (who need to monitor scenario progress) and plugin developers (who must respond correctly to lifecycle messages).

The Five Lifecycle States

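The scenario lifecycle moves forward through five states, each representing a distinct phase of the simulation. A healthy scenario visits all five in order; a scenario whose initialization fails skips Active and goes straight from Initializing to Stopping.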

Created: The scenario exists but has not yet started. Configuration is validated, Kafka topics are created, and the scenario record is persisted in the orchestrator's database. No plugins are engaged yet. This state exists to allow operators to review and modify configuration before committing to initialization.

Initializing: The orchestrator has sent initialization commands to all plugins participating in the scenario. Plugins are loading configuration, allocating resources, establishing connections, and preparing to begin simulation. The scenario remains in this state until all plugins report successful initialization or until a timeout expires.

Active: All plugins have successfully initialized, and the simulation is running. Plugins exchange scenario-specific messages through Kafka topics, processing events and updating state. The scenario remains Active until an operator explicitly requests shutdown or until an error condition triggers automatic termination.

Stopping: The orchestrator has sent shutdown commands to all plugins. Plugins are flushing buffers, closing connections, persisting final state, and releasing resources. The scenario remains in this state until all plugins confirm clean shutdown or until a timeout forces termination.

Completed: All plugins have shut down, scenario-specific Kafka topics are archived or deleted, and final metrics are recorded. The scenario is now immutable—its configuration and message history are preserved for audit purposes, but it can no longer be modified or restarted.

This state machine is enforced by the orchestrator: illegal state transitions (e.g., Created → Stopping) are rejected, ensuring consistent behavior across all scenarios.
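The enforcement rule can be pictured as a small transition table. The following is a minimal sketch rather than NeuroSim's actual implementation: ScenarioState, ALLOWED, IllegalTransition, and transition are hypothetical names. The Initializing to Stopping edge is included because a failed initialization aborts directly to Stopping.

```python
from enum import Enum


class ScenarioState(Enum):
    CREATED = "Created"
    INITIALIZING = "Initializing"
    ACTIVE = "Active"
    STOPPING = "Stopping"
    COMPLETED = "Completed"


# Hypothetical transition table; the real orchestrator's rules may differ.
ALLOWED = {
    ScenarioState.CREATED: {ScenarioState.INITIALIZING},
    # Initializing may skip Active when a plugin fails or times out.
    ScenarioState.INITIALIZING: {ScenarioState.ACTIVE, ScenarioState.STOPPING},
    ScenarioState.ACTIVE: {ScenarioState.STOPPING},
    ScenarioState.STOPPING: {ScenarioState.COMPLETED},
    ScenarioState.COMPLETED: set(),  # terminal: no restart, no modification
}


class IllegalTransition(Exception):
    """Raised when a requested state change is not in the table."""


def transition(current: ScenarioState, target: ScenarioState) -> ScenarioState:
    """Return the new state, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.value} -> {target.value}")
    return target
```

Keeping the rules in one table means a rejected request such as Created to Stopping fails in a single place, which is easier to audit than checks scattered across handlers.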

State Transition: Created → Initializing

When an operator starts a scenario, the orchestrator transitions from Created to Initializing and performs these steps:

  1. Configuration injection: The orchestrator extracts the per-plugin configuration from the scenario's overall configuration and sends each plugin its specific configuration subset via a Kafka message on the control plane.

  2. Initialization command: The orchestrator publishes an InitializeScenario message to the scenario's command topic, addressed to all registered plugins. This message includes the scenario ID, initialization timeout, and any scenario-wide parameters.

  3. Plugin response tracking: Each plugin processes the initialization message, performs its startup logic, and publishes an InitializationComplete message back to the orchestrator. The orchestrator tracks which plugins have responded successfully.

  4. Timeout enforcement: If any plugin fails to respond within the initialization timeout (configurable, typically 30-60 seconds), the orchestrator marks the scenario as failed and begins shutdown of any plugins that did successfully initialize.

This coordinated initialization ensures that all plugins start with consistent configuration and are ready to process scenario messages before the simulation begins. Plugins must not start publishing scenario events until the scenario is Active. A plugin that has confirmed its own initialization still has to wait, because other plugins may not be ready yet; publishing early violates the lifecycle contract.
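Steps 3 and 4 amount to tracking a set of outstanding plugins against a deadline. The sketch below shows one way this could look on the orchestrator side. InitTracker and its method names are hypothetical, the clock is injectable only so the logic can be exercised without waiting, and ignoring late confirmations is an assumption rather than documented behavior.

```python
import time


class InitTracker:
    """Tracks which plugins have confirmed initialization before a deadline."""

    def __init__(self, plugin_ids, timeout_s, clock=time.monotonic):
        self._pending = set(plugin_ids)
        self._clock = clock
        self._deadline = clock() + timeout_s

    def record_complete(self, plugin_id):
        """Call when an InitializationComplete message arrives from a plugin."""
        # Assumption: confirmations that arrive after the deadline are ignored.
        if self._clock() < self._deadline:
            self._pending.discard(plugin_id)

    def status(self):
        """Return 'ready', 'waiting', or 'timed_out'."""
        if not self._pending:
            return "ready"
        if self._clock() >= self._deadline:
            return "timed_out"
        return "waiting"

    def missing(self):
        """Plugins that have not confirmed yet, useful for diagnostics."""
        return sorted(self._pending)
```

A result of timed_out together with the output of missing() gives the orchestrator what it needs to fail the scenario and name the plugin that never answered.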

State Transition: Initializing → Active

The transition to Active happens when the orchestrator has received successful initialization confirmations from all required plugins. At this point:

  • All plugins are ready to process scenario messages
  • Scenario-specific Kafka topics are fully active
  • The orchestrator starts monitoring plugin heartbeats to detect failures
  • Operators can view real-time scenario metrics in the Universal UI

If any plugin reports initialization failure (e.g., invalid configuration, resource unavailable), the orchestrator aborts the scenario and transitions directly to Stopping, ensuring no partial simulation state is left running.

Multi-Plugin Coordination During Initialization

Initializing a scenario with multiple plugins requires careful coordination. Consider a power grid simulation with three plugins:

  • Grid Solver: Performs power flow calculations
  • Load Model: Simulates consumer electricity demand
  • Fault Injector: Injects line faults at specified times

The Load Model needs to know the Grid Solver is ready before publishing load events. The Fault Injector needs to wait until the entire grid is initialized before injecting faults. NeuroSim handles this through the Initializing state:

  1. All three plugins receive initialization commands simultaneously
  2. Each plugin initializes independently (loading data, connecting to external systems)
  3. Each plugin confirms initialization only when it's fully ready
  4. The orchestrator transitions to Active only when all three confirmations are received
  5. Only then do plugins begin exchanging scenario-specific messages

This "barrier synchronization" ensures plugins don't race ahead and publish messages that other plugins aren't ready to process.
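The barrier can be illustrated with a small in-process simulation. This is only a sketch: an asyncio queue stands in for the InitializationComplete messages, an asyncio event stands in for whatever signal tells plugins the scenario is Active, and the delays are arbitrary. None of the function names come from NeuroSim.

```python
import asyncio


async def plugin(name, init_delay_s, confirmations, active, log):
    await asyncio.sleep(init_delay_s)   # stand-in for loading data, connecting
    log.append(f"{name}: initialized")
    await confirmations.put(name)       # stand-in for InitializationComplete
    await active.wait()                 # hold until the scenario is Active
    log.append(f"{name}: publishing")


async def orchestrator(expected, confirmations, active, log):
    ready = set()
    while ready != expected:
        ready.add(await confirmations.get())
    log.append("orchestrator: Active")
    active.set()                        # release every waiting plugin at once


async def run_scenario():
    log = []
    confirmations = asyncio.Queue()
    active = asyncio.Event()
    delays = {"grid-solver": 0.03, "load-model": 0.01, "fault-injector": 0.02}
    await asyncio.gather(
        orchestrator(set(delays), confirmations, active, log),
        *(plugin(n, d, confirmations, active, log) for n, d in delays.items()),
    )
    return log


if __name__ == "__main__":
    for line in asyncio.run(run_scenario()):
        print(line)
```

Whatever order the plugins finish initializing in, no "publishing" entry can appear in the log before the orchestrator's Active entry, which is the property the barrier exists to provide.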

State Transition: Active → Stopping

An Active scenario transitions to Stopping in one of three ways:

Operator-requested shutdown: An operator explicitly clicks "Stop Scenario" in the Universal UI or calls the REST API. This is the normal, graceful shutdown path.

Automatic completion: Some scenarios have defined end conditions (e.g., simulate 24 hours of grid operation). A plugin can publish a ScenarioComplete event when the simulation reaches its natural conclusion, triggering the Stopping transition.

Error-triggered shutdown: If a critical plugin crashes, stops sending heartbeats, or reports an unrecoverable error, the orchestrator can automatically transition to Stopping to prevent partial simulation state from persisting.

When transitioning to Stopping, the orchestrator publishes a StopScenario message to all plugins with a shutdown timeout (configurable, typically 15-30 seconds). This begins the graceful shutdown process.
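The error-triggered path depends on noticing that a critical plugin has gone quiet. A minimal sketch of that check follows. The function names, the idea of a per-scenario set of critical plugins, and the 10-second silence threshold are all assumptions made for illustration.

```python
def stale_plugins(last_heartbeat, now, max_silence_s):
    """Return the plugins whose last heartbeat is older than max_silence_s."""
    return sorted(
        plugin
        for plugin, seen in last_heartbeat.items()
        if now - seen > max_silence_s
    )


def should_stop(last_heartbeat, critical, now, max_silence_s=10.0):
    """True when any critical plugin has been silent for too long."""
    stale = stale_plugins(last_heartbeat, now, max_silence_s)
    return any(plugin in critical for plugin in stale)
```

Separating "which plugins are stale" from "does that justify stopping" leaves room for a policy where non-critical plugins can fail without ending the scenario.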

Graceful Shutdown with Configurable Timeouts

Graceful shutdown is critical for ensuring clean simulation termination:

Flush pending work: Plugins should complete processing any messages currently in flight. For example, a data logger should write all buffered records to disk before shutting down.

Persist state: Plugins that maintain state should save final state to persistent storage so post-simulation analysis tools can access it.

Close connections: Plugins should cleanly disconnect from external systems (databases, APIs, hardware interfaces) rather than leaving orphaned connections.

Report completion: Plugins publish a ShutdownComplete message to confirm clean shutdown.

The orchestrator waits up to the configured shutdown timeout for all plugins to report completion. If any plugin exceeds the timeout, the orchestrator logs a warning and transitions to Completed anyway—but this is considered an abnormal termination and may trigger alerts for operators to investigate.
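On the plugin side, the four shutdown duties can be arranged as an ordered list of cleanup steps followed by a report. The sketch below is one possible shape, not a prescribed API. graceful_shutdown is a hypothetical helper, publish stands in for a Kafka producer call, and the ShutdownError message type is invented here because the name of the error message is not specified. The timeout itself is enforced by the orchestrator and is not modeled.

```python
def graceful_shutdown(steps, publish):
    """Run cleanup steps in order, then report the outcome.

    steps   -- list of (name, callable) pairs, e.g. flush, persist, close
    publish -- callable taking a message dict (stand-in for a Kafka producer)
    Returns True when every step succeeded.
    """
    errors = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Keep going so later cleanup (e.g. closing connections) still runs.
            errors.append(f"{name}: {exc}")
    if errors:
        # Hypothetical message type carrying diagnostics for operators.
        publish({"type": "ShutdownError", "errors": errors})
    else:
        publish({"type": "ShutdownComplete"})
    return not errors
```

Running every step even after a failure is a design choice: a plugin that cannot persist its state should still close its connections rather than leave them orphaned.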

State Transition: Stopping → Completed

Once all plugins have confirmed shutdown (or the timeout expires), the orchestrator transitions to Completed and performs final cleanup:

  • Archives scenario-specific Kafka topics (or deletes them based on retention policy)
  • Records final scenario metrics (duration, message count, plugin statistics)
  • Marks the scenario record as immutable in the database
  • Releases any orchestrator-side resources (goroutines, file handles, etc.)

At this point, the scenario is fully terminated and exists only as a historical record for audit and analysis purposes.
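As a rough in-memory analogy for the immutable record, a frozen dataclass rejects any attempt to change it after creation. The field names and the finalize helper below are hypothetical; the real record lives in the orchestrator's database and its schema is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompletedScenario:
    scenario_id: str
    duration_s: float
    message_count: int
    clean_shutdown: bool  # False when the shutdown timeout expired


def finalize(scenario_id, started_at, ended_at, message_count, unconfirmed_plugins):
    """Build the final record; any unconfirmed plugin marks the run as abnormal."""
    return CompletedScenario(
        scenario_id=scenario_id,
        duration_s=ended_at - started_at,
        message_count=message_count,
        clean_shutdown=not unconfirmed_plugins,
    )
```

Assigning to any field of the returned record raises an error, which mirrors the rule that a Completed scenario can no longer be modified.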

Scenario Configuration and Per-Plugin Injection

Scenarios are configured through a hierarchical JSON structure:

{
  "scenarioID": "power-grid-fault-test-01",
  "description": "Test grid response to line fault",
  "plugins": {
    "grid-solver": {
      "gridTopology": "ieee-118-bus.xml",
      "simulationStepMs": 100
    },
    "load-model": {
      "loadProfile": "residential-weekday.csv",
      "scaling": 1.2
    },
    "fault-injector": {
      "faultTime": "00:05:00",
      "faultLocation": "line-42"
    }
  }
}

During the Created → Initializing transition, the orchestrator extracts each plugin's configuration subset and sends it in the initialization message. Plugins never see the full scenario configuration—only their own configuration block. This enforces configuration isolation and prevents plugins from depending on other plugins' internal settings.
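The extraction step can be sketched as a function that turns one scenario document into one payload per plugin. This is an illustration under stated assumptions: build_init_messages is a hypothetical name, the payload layout and the timeoutSeconds field are invented, and the example uses a trimmed copy of the configuration shown above.

```python
import json


def build_init_messages(scenario_config, timeout_s=60):
    """Split a scenario config into one InitializeScenario payload per plugin."""
    scenario_id = scenario_config["scenarioID"]
    return {
        plugin_id: {
            "type": "InitializeScenario",
            "scenarioID": scenario_id,
            "timeoutSeconds": timeout_s,    # hypothetical field name
            "config": dict(plugin_config),  # copy of this plugin's block only
        }
        for plugin_id, plugin_config in scenario_config["plugins"].items()
    }


scenario = json.loads("""
{
  "scenarioID": "power-grid-fault-test-01",
  "plugins": {
    "grid-solver": {"gridTopology": "ieee-118-bus.xml", "simulationStepMs": 100},
    "load-model": {"loadProfile": "residential-weekday.csv", "scaling": 1.2}
  }
}
""")
messages = build_init_messages(scenario)
```

Here messages["load-model"]["config"] holds only loadProfile and scaling; nothing from the grid-solver block leaks into it, which is the isolation property described above.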

Practical Implications for Plugin Developers

For developers building plugins, understanding the lifecycle means:

  • Respond to initialization: Your plugin must process InitializeScenario messages and confirm when ready
  • Respect the Active state: Only publish scenario events after transitioning to Active
  • Handle shutdown gracefully: Process StopScenario messages and clean up resources within the timeout
  • Report errors clearly: If initialization or shutdown fails, publish error messages with diagnostic information

Plugins that violate lifecycle expectations (e.g., publishing events during Initializing) will cause scenarios to behave unpredictably. The orchestrator's state machine exists to enforce discipline across the distributed system.
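A plugin can protect itself against these violations with a small guard around its publish path. The sketch below is hypothetical throughout: LifecyclePlugin is not a NeuroSim class, publish stands in for a Kafka producer, and the ScenarioActive notification is an assumption, since the mechanism by which a plugin learns that the scenario is Active is not specified.

```python
class LifecyclePlugin:
    """Plugin-side guard that only allows event publishing while Active."""

    def __init__(self, publish):
        self._publish = publish  # stand-in for a Kafka producer call
        self.state = "Idle"
        self.config = {}

    def handle(self, message):
        """Dispatch a lifecycle message from the orchestrator."""
        kind = message["type"]
        if kind == "InitializeScenario":
            self.config = message.get("config", {})
            self.state = "Initialized"
            self._publish({"type": "InitializationComplete"})
        elif kind == "ScenarioActive":  # hypothetical notification
            self.state = "Active"
        elif kind == "StopScenario":
            self.state = "Stopped"
            self._publish({"type": "ShutdownComplete"})

    def emit(self, event):
        """Publish a scenario event, refusing if the scenario is not Active."""
        if self.state != "Active":
            raise RuntimeError(f"cannot publish events while {self.state}")
        self._publish(event)
```

Failing loudly inside the plugin turns a subtle cross-plugin race into an immediate local error that is much easier to debug.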

Practical Implications for Platform Operators

For operators managing NeuroSim deployments, the lifecycle provides:

  • Clear visibility: Each scenario's state is always visible in the Universal UI
  • Predictable behavior: Scenarios progress through well-defined states with known semantics
  • Operational safety: Graceful shutdown gives every plugin a chance to terminate cleanly, and the shutdown timeout guarantees the scenario still reaches Completed if one does not
  • Audit trail: Lifecycle transitions are logged and can be correlated with scenario outcomes

Understanding which state a scenario is in helps operators diagnose issues. For example, a scenario that sits in Initializing until the initialization timeout expires points to a plugin that cannot finish starting up, while a scenario that drops from Active into Stopping without an operator request suggests a plugin crash or lost heartbeats.