Proposed RFC Feature : ScriptCanvas breakpoint support

# RFC ScriptCanvas breakpoint support

# Summary
Improve the ScriptCanvas debugging facilities by **allowing users to toggle breakpoints** in any given graph. At runtime, whenever a node with a breakpoint is executed, gameplay halts and the ScriptCanvas UI shows what occurred. This RFC also covers closely related features.

This relates to the following issue https://github.com/o3de/o3de/issues/9192

# What is the relevance of this feature?
Breakpoints are a **cornerstone of software development**. They make it easy to rule out unexecuted code, check whether branching conditions are correct, and quickly gain in-depth knowledge of the state of your program. They make programming more enjoyable and provide a **significant productivity boost**, both when creating new gameplay mechanics and when fixing issues in them.

The alternatives are to log custom messages or to enable tracing (explained further below). While both methods can be useful depending on the use case, I believe that they are nowhere near as important as breakpoints.

![Image](https://github.com/user-attachments/assets/8a5ad171-e67c-408d-97e8-cdb9bf7f7bbc)

# Feature design description

## How to use

Users are required to enable the [Remote Tools Gem](https://docs.o3de.org/docs/user-guide/gems/reference/debug/remote-tools/) in order to gain access to the debugging facilities in ScriptCanvas. When a debugging panel is open and the gem is not present, the panel shows a message along the lines of "please enable remote tools gem to enable debugging facilities".

![Image](https://github.com/user-attachments/assets/1c0b0d07-bc34-46c8-9e2b-f804f0b1112e)

After that, the user creates or opens a ScriptCanvas file in the ScriptCanvas editor and right-clicks on a node, where the **"Add Breakpoint"** option is available. Once it is clicked, the graph becomes observed if it wasn't already (there is a dedicated panel listing observed graphs) and an icon shows up on the node (see details below).

![Image](https://github.com/user-attachments/assets/019032dd-9b95-4788-8e01-71afbc399e6f)

If the graph is assigned to an entity in a level and the user enters play mode, then as soon as the logic reaches the node, the editor/client executable freezes and the ScriptCanvas UI shows a new state on the node. The user can then resume execution or proceed step by step through the following nodes. The other debugging windows are also filled with the current context (mockups below).

## New UI elements

### Overview

Below is a mock-up of ScriptCanvas with the addition of the new panels and elements to support enhanced debugging. It represents the state of an observed graph while the execution is running (no breakpoint is currently hit). This file is [visible in figma](https://www.figma.com/design/IjF0MAYplkgJ1z8nLk08uz/O3DE-Editor-Mockups?node-id=1003-6141&t=8MRLH8apqYVMbxe6-1).

![Image](https://github.com/user-attachments/assets/26986674-04b1-4808-a00c-9614085df21e)

### 1. Breakpoint node state

An icon at the top right corner of the node shows its current breakpoint status. The exact same icon is used in the breakpoints window. Hovering over the icon shows a tooltip that explains the state.

| State              | Description                                                  | Illustration              |
| ------------------ | ------------------------------------------------------------ | ------------------------- |
| None               | Normal node state                                            |  ![Image](https://github.com/user-attachments/assets/593cbb8e-e65e-42e0-97e0-76c0d9a20f64)  |
| Pending add        | A request to add the breakpoint has been sent to the engine. The pending icon is shown until the engine answers | ![Image](https://github.com/user-attachments/assets/c800ff3b-0b12-48e4-ab92-463e74f24fdb) |
| Unmatched location | Occurs when the breakpoint was added to a new node and the graph has not been saved yet. The breakpoint will be added on save | ![Image](https://github.com/user-attachments/assets/0229b0e6-26b5-43b0-889c-290167c284b1) |
| Enabled            | Breakpoint is registered and enabled                         | ![Image](https://github.com/user-attachments/assets/14410686-431f-4f87-91dd-321d570899f0) |
| Disabled           | Breakpoint has been manually disabled (but not removed) by the user. It is still listed in the breakpoints list and can easily be toggled back on | ![Image](https://github.com/user-attachments/assets/c9215752-8faa-4505-a579-ee4decfb02b7) |
| Hit                | During execution, when a breakpoint is hit the icon changes to something close to an arrow, and the node currently being executed is outlined with a dotted line. <br />We might want more feedback in the global graph view to signal that a breakpoint is hit (in Unreal, for example, there is [a giant arrow](https://dev.epicgames.com/documentation/en-us/unreal-engine/blueprint-debugging-example-in-unreal-engine) above the node) | ![Image](https://github.com/user-attachments/assets/9befae96-eef5-4744-aa28-1c8e4501270a) |
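
The states in the table can be modelled as a small enum with a tooltip lookup. The following is only a minimal sketch: `BreakpointState`, `BreakpointTooltip`, `OnEngineAcknowledged` and `CanHalt` are hypothetical names that do not exist in ScriptCanvas, the tooltip strings are placeholders, and the transition out of "Pending add" is an assumption about how the acknowledgement could be handled.

```cpp
#include <string_view>

// Hypothetical enum mirroring the rows of the table above.
enum class BreakpointState { None, PendingAdd, UnmatchedLocation, Enabled, Disabled, Hit };

// Placeholder tooltip text shown when hovering the node icon.
constexpr std::string_view BreakpointTooltip(BreakpointState state)
{
    switch (state)
    {
    case BreakpointState::None:              return "No breakpoint";
    case BreakpointState::PendingAdd:        return "Waiting for the engine to register the breakpoint";
    case BreakpointState::UnmatchedLocation: return "Node not saved yet, breakpoint will be added on save";
    case BreakpointState::Enabled:           return "Breakpoint enabled";
    case BreakpointState::Disabled:          return "Breakpoint disabled";
    case BreakpointState::Hit:               return "Execution halted on this breakpoint";
    }
    return "";
}

// Assumed transition: once the engine answers a pending request, the
// breakpoint is either registered or flagged as unmatched (unsaved node).
// Any other state is left untouched.
constexpr BreakpointState OnEngineAcknowledged(BreakpointState state, bool nodeExistsInSavedGraph)
{
    if (state != BreakpointState::PendingAdd)
    {
        return state;
    }
    return nodeExistsInSavedGraph ? BreakpointState::Enabled : BreakpointState::UnmatchedLocation;
}

// Only a registered, enabled breakpoint may halt execution.
constexpr bool CanHalt(BreakpointState state)
{
    return state == BreakpointState::Enabled;
}
```

Keeping the tooltip lookup next to the enum would let the node icon and the breakpoints window share a single source for both the icon state and its explanation.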

### 2. Action bar

The action bar visible on the top will be expanded with the following elements :

![Image](https://github.com/user-attachments/assets/4468461a-f475-48e9-a0e9-8b06958d1c6e)

A. **Debug context menu** to start/pause/stop/restart execution. When a breakpoint is hit, it also allows stepping in, stepping out and stepping over. The UI follows the one from [Visual Studio Code](https://code.visualstudio.com/docs/editor/debugging#_debug-actions) (MIT-licensed icons), as I believe its icons are more explicit and better designed than the ones currently used in the LuaEditor (screenshot below).

![Image](https://github.com/user-attachments/assets/deb76f9b-6fe8-4967-ac86-9b647e72dff1)

B. **Target picker** to attach to a remote process. It is a copy of the one currently used in the [Lua Editor](https://docs.o3de.org/docs/user-guide/scripting/lua/debugging-tutorial/). The "context" selector has not been kept because, as far as I am aware, the default Lua context is always the one in use.

C. **Entity picker** to switch between the graph states of multiple entities. A breakpoint is hit for one specific entity only; all other entities using the same graph are in a different state, and this picker is how to switch between them. As their execution is paused as well, it might be possible to see the last node each of them executed and to update the variable watch accordingly.

### 3. Observed Graphs

This view already exists in the current implementation inside the tracing window, but I am moving it to its own panel as it now affects more panels. This window allows the user to toggle graph observation for specific graphs or entities:

- If a **graph is not observed**, the tickbox is off ("CameraTarget" graph in capture below). 

  It won't generate any tracing events or trigger any breakpoints. (Adding a breakpoint automatically ticks the graph as observed. If the user unticks the graph in the Observed Graphs panel, its breakpoint icons will show as unmatched location.)

  

- If a **graph is observed**, the tickbox is on ("Test" graph in capture below). 

  It will generate the events needed to trigger breakpoints. By default it will not generate any tracing events: tracing is a global setting for all observed graphs and is toggled on/off via the "Execution Trace" window.

![Image](https://github.com/user-attachments/assets/c96dba67-2b52-4e18-9d1d-6273d7fd097c)

### 4. Breakpoint list

Lists breakpoints across all graphs (the screenshot is missing a new column stating which graph each breakpoint comes from).

Allows the user to toggle breakpoints via a tickbox or remove them via the right-click menu. Double-clicking jumps to the node. The icon next to the tickbox must reflect the breakpoint state explained above.

![Image](https://github.com/user-attachments/assets/a90b476b-b598-43bd-a4e5-3f763ea7d783)

### 5. Execution Trace

The execution trace window already exists, but will be modified to better match the new feature set.

#### Elements removed

- The observed graphs list is now its own panel, as stated above
- The target picker (to select remote process to debug) is now in the action bar

#### Elements added

- Keep a shortcut to run/stop execution but use the icon set from the Action Bar
- Icons need refinement to better convey what they do (clear on play, trace from the start)
- A checkbox is needed to enable or disable tracing (it is expensive, so it is disabled by default)

#### Elements up to debate / for future RFC

Addition of a **Timeline tab** used to display the same set of data in a scrollable timeline instead of being a list. 

This is inspired by the Ubisoft Snowdrop engine, which relies heavily on it for AI debugging. You can see such a system in [this GDC video](https://gdcvault.com/play/1023382/AI-Behavior-Editing-and-Debugging) at around 13:30 or in [this video from AI and Games](https://youtu.be/6Xv0WguFTFE?si=CM7rLyDLWjK8pCZr&t=10). The timeline lets the user scroll through events, and the nodes light up on the graph to match the state of the timeline. We could rely on the [TrackView code](https://www.docs.o3de.org/docs/user-guide/visualization/cinematics/track-view/editor-toolbars/) to do that.

![Image](https://github.com/user-attachments/assets/063651d3-8530-4a9b-a632-957d9c3c403b)

### 6. Variable watch

The variable watch is very close to the current "Variable Manager" panel. However, it does not allow the user to create variables, and it lists more variables, such as the input pins of the current node or any other global state if there is any. It is only updated when tracing is enabled, or when execution is currently halted by a breakpoint.

![Image](https://github.com/user-attachments/assets/511ea179-877b-4d03-8055-e1cb11d4c150)

### 7. Callstack

It is not present on the mockup but should be quite similar to the "Execution Trace" event list, with less content. It is meant to show any higher-level graph that led to this node being triggered. The last element in the list is "External Code", given that execution happens inside the Lua Virtual Machine. It can be particularly useful to track down who triggered a custom event, for example.

# Technical design description

This feature **builds upon a large body of existing code** which was properly architected but never fully enabled (only tracing is there for now, fixed via https://github.com/o3de/o3de/pull/17879). This section explains that architecture, as a clear understanding of it is crucial to see how breakpoint support will be added. The first two sections below are only there for explanation; the changes needed for this RFC, covered in the third section, are minor.

I will start by explaining the Lua breakpoint code path used by the Lua Editor. As this code path is much easier to understand, it eases into the ScriptCanvas debugging architecture.

## 🪲 Quick overview on the Lua Editor debugging

The LuaIDE is used to edit Lua files and provides debugging support for them. It is a process external to the O3DE editor, which means that the two processes have to **send remote messages** in order to communicate. In the context of breakpoint support, this is done via the [Remote Tools Gem](https://docs.o3de.org/docs/user-guide/gems/reference/debug/remote-tools/). The editor hosts one instance and so does the LuaIDE, so that both sides can send messages to and receive messages from each other.

![Image](https://github.com/user-attachments/assets/7f9543f0-1722-4aaf-a6b0-879d4e26cad5)

1. From the `LuaIDE`, when a user clicks on a line to add a breakpoint, a request is made and reaches the connected client.

2. Then the `ScriptDebugAgent` takes the message and stores the breakpoint (which is simply a filename with a line number) in the `ScriptContext`. Said `ScriptContext` is a wrapper around the Lua Virtual Machine, there is only one ever instantiated so it is the same context used by all scripts.

3. If it is not already the case, a `LuaHook` is bound. It is a method which is triggered at runtime anytime that something occurs on the Lua Virtual Machine (like when a new line is executed).

```mermaid
classDiagram
	direction LR
	
namespace LuaIDE {
	
	class Component {
		<<LuaDebuggerComponent.cpp>>
		AttachDebugger()
		CreateBreakpoint()
	}
    
}
		
namespace ClientOrEditor {
	class RemoteToolsInterface {
		GetReceivedMessages()
	}

	class ScriptDebugAgent {
		<<AzFramework / ScriptRemoteDebugging.cpp>>
		Process()
		Attach()
	}
		
	class ScriptContextDebug {
		<<AzCore>>
		AddBreakpoint()
		Process()
		LuaHook()
		ScriptContext& m_context;
	}
}
	
	Component --> RemoteToolsInterface : SendRemoteToolsMessage
	RemoteToolsInterface --> ScriptDebugAgent
	ScriptDebugAgent --> ScriptContextDebug
```
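
As described in step 2, a Lua breakpoint is simply a filename with a line number. A minimal sketch of such a store is given below; `BreakpointStore` and its methods are hypothetical names used for illustration and are not the actual `ScriptContextDebug` API.

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the breakpoint storage described above:
// each breakpoint is identified by a (file name, line number) pair.
class BreakpointStore
{
public:
    void Add(const std::string& file, int line)
    {
        m_breakpoints.insert(std::make_pair(file, line));
    }

    void Remove(const std::string& file, int line)
    {
        m_breakpoints.erase(std::make_pair(file, line));
    }

    // Called for every executed line: true when that line carries a breakpoint.
    bool Matches(const std::string& file, int line) const
    {
        return m_breakpoints.count(std::make_pair(file, line)) > 0;
    }

    // When the store is empty there is nothing for a line hook to check.
    bool Empty() const
    {
        return m_breakpoints.empty();
    }

private:
    std::set<std::pair<std::string, int>> m_breakpoints;
};
```

An ordered set keeps the lookup cheap relative to the number of breakpoints, which matters because the check runs for every executed line.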

At runtime when the game is running, the `LuaHook` is triggered many times per frame. 

1. It has access to the stored breakpoints and checks whether the line currently being executed matches one of them. This behavior relies on `lua_getinfo`, which only returns something valid if the Lua code was compiled with debug support. This is enabled in `LuaBuilder::LuaDumpToStream` from the `LmbrCentral` gem.

2. If there is a match, then a message is sent over to the LuaIDE, and the client process enters a freeze mode where only the remote tools messages are handled (this while loop is inside of `ScriptDebugAgent::BreakpointCallback()`). 

3. On its side, the `LuaIDE` shows information about the current process, such as which breakpoint has been hit, the callstack and the variable values. The user can then choose to resume execution on the client or proceed step by step.

```mermaid
classDiagram
direction LR

namespace ClientOrEditor {

	class ScriptContextDebug {
		<<AzCore>>
		LuaHook()
		BreakpointCallback()
		ScriptContext& m_context;
	}

	class RemoteToolsInterface {
		SendRemoteToolsMessage()
	}
}	

namespace LuaIDE {
	
	class Component {
		<<LuaDebuggerComponent.cpp>>
		OnSystemTick()
	}
	
	class Context {
		<<LuaEditorContext.cpp>>
		OnBreakpointHit()
	}
    
}
		
	ScriptContextDebug --> RemoteToolsInterface
	RemoteToolsInterface --> Component
	Component --> Context
```
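
The freeze mode from step 2 can be pictured as a loop that keeps answering debugger requests until a command resumes execution. This is a simplified sketch and not the actual `ScriptDebugAgent::BreakpointCallback()` code: `DebugCommand`, `PauseResult`, `WaitWhilePaused` and the `poll` callback are hypothetical names, and the real agent reads its messages from the remote tools connection.

```cpp
#include <functional>
#include <optional>

// Hypothetical subset of the commands a debugger front end could send.
enum class DebugCommand { GetCallstack, GetVariables, Continue, StepOver };

struct PauseResult
{
    DebugCommand resumeWith; // command that ended the pause
    int requestsServed;      // inspection requests answered while frozen
};

// Blocks the calling (game) thread until a resume command arrives.
// `poll` returns the next received command, or std::nullopt if none is pending.
PauseResult WaitWhilePaused(const std::function<std::optional<DebugCommand>()>& poll)
{
    int served = 0;
    for (;;)
    {
        const std::optional<DebugCommand> command = poll();
        if (!command)
        {
            continue; // nothing received yet, keep the game frozen
        }
        if (*command == DebugCommand::Continue || *command == DebugCommand::StepOver)
        {
            return { *command, served };
        }
        ++served; // callstack / variable request handled while still paused
    }
}
```

Because the loop runs on the thread that executes the script, the game logic cannot advance while the debugger is inspecting the callstack or variables.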

## 🏗️ ScriptCanvas Compilation process to support debugging

While we could use the same pipeline to support breakpoints in ScriptCanvas (see https://github.com/o3de/o3de/pull/18708) it would be difficult to support the remaining features such as step-by-step debugging, variable watch and so on. This is simply because **a single node in ScriptCanvas can generate around ten lines of Lua code to be executed**. Knowing which group of lines corresponds to which node or user variable then becomes an issue.

Fortunately this was well known to the authors of ScriptCanvas tracing, and the **build system is able to embed debugging information** within the ScriptCanvas Lua files to identify each node's section within the Lua code.

Let's use the 3 nodes below from this graph as an example.

![Image](https://github.com/user-attachments/assets/2d17c97b-14c3-4d0c-9fc2-a87d3d9123b5)

You can take a peek at the file generated upon save in `ScriptCanvasBuilderWorkerUtility.cpp`, via the `translation.m_text.data()` variable in `ProcessTranslationJob()` (it runs on the Asset Processor side, so you need to attach to that external process). Below is an extract of the generated Lua text file, before it gets compiled to binary (luac).

```lua
if _G.SCRIPT_CANVAS_GLOBAL_RELEASE then

function CameraTarget_VM:OnGraphStart()
	local executionState = self.executionState
	EBusHandlerConnect(self.TickBusHandler_scvm)
	self.m_InputHandlerNodeable_scvm.ConnectEvent(self.m_InputHandlerNodeable_scvm, [[CameraAim]])
end

else

function CameraTarget_VM:OnGraphStart()
	if DebugIsTraced(self.executionState) then
		local executionState = self.executionState
		DEBUG_SIGNAL_OUT(executionState, 0) -- TranslateFunctionBlock begin
		DEBUG_SIGNAL_IN(executionState, 0, self.TickBusHandler_scvm) -- TranslateExecutionTreeFunctionCall begin
		EBusHandlerConnect(self.TickBusHandler_scvm)
		DEBUG_SIGNAL_IN(executionState, 1, self.m_InputHandlerNodeable_scvm, [[CameraAim]]) -- TranslateExecutionTreeFunctionCall begin
		self.m_InputHandlerNodeable_scvm.ConnectEvent(self.m_InputHandlerNodeable_scvm, [[CameraAim]])
	else
		local executionState = self.executionState
		EBusHandlerConnect(self.TickBusHandler_scvm)
		self.m_InputHandlerNodeable_scvm.ConnectEvent(self.m_InputHandlerNodeable_scvm, [[CameraAim]])
	end
end
```

We can see that the file is **split into two sections** by a global "if/else" statement. The release code does not contain any debug information, so we are interested in the second implementation of "CameraTarget_VM:OnGraphStart()".

- It starts with a call to "DebugIsTraced", which makes a C++ call checking whether the graph is ticked on in the Observed Graphs window in ScriptCanvas
- If it is not traced, then the code is similar to the release code
- If it is traced, it makes multiple C++ calls via "DEBUG_SIGNAL_OUT" and similar. Thanks to these calls, we know when a slot has been entered, when a node has been executed and so on

While writing this information is straightforward (via `GraphToLua::WriteDebugInfoIn`), I believe that the **system becomes complex when it comes to resolving these debug lines** to the nodes they correspond to. This is caused by multiple architectural choices:

- Nodes in the ScriptCanvas UI are identified by a unique EntityId (also used by the entity system).
- We differentiate the editor-time ScriptCanvas files from the runtime ScriptCanvas files (this predates the involvement of Lua, was added via https://github.com/o3de/o3de/pull/8623 and is also [explained in this video](https://www.youtube.com/watch?v=PP-sFjByvhA)), as one source ScriptCanvas file might lead to multiple runtime representations of the same graph.

The two points above, taken together, mean that the runtime graph must remap all of its EntityIds before being instantiated (find and replace all EntityId occurrences, and save the from/to information somewhere). This is needed to prevent the same EntityId from being used twice.
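
To illustrate the remapping step, here is a minimal sketch that assigns fresh ids to a graph and keeps the from/to information so that a runtime debug event can be resolved back to its editor node. It is a simplification under stated assumptions: `NodeId` is a plain integer standing in for the EntityId, and `RemapResult` and `RemapNodeIds` are hypothetical names, not the existing ScriptCanvas remapping code.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using NodeId = std::uint64_t; // stand-in for the EntityId used by nodes

struct RemapResult
{
    std::vector<NodeId> runtimeIds;                     // ids used by the runtime graph
    std::unordered_map<NodeId, NodeId> runtimeToEditor; // saved from/to information
};

// Gives every distinct editor id a fresh runtime id, starting at firstFreeId.
// Repeated occurrences of the same editor id map to the same runtime id.
RemapResult RemapNodeIds(const std::vector<NodeId>& editorIds, NodeId firstFreeId)
{
    RemapResult result;
    std::unordered_map<NodeId, NodeId> editorToRuntime;
    for (NodeId editorId : editorIds)
    {
        auto [it, inserted] = editorToRuntime.try_emplace(editorId, firstFreeId);
        if (inserted)
        {
            result.runtimeToEditor.emplace(firstFreeId, editorId);
            ++firstFreeId;
        }
        result.runtimeIds.push_back(it->second);
    }
    return result;
}
```

With the reverse map at hand, a debug event carrying a runtime id can be translated back to the node shown in the ScriptCanvas UI, which is the lookup a breakpoint on an editor node depends on.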

While I am not entirely certain why nodes have to be instantiated outside the ScriptCanvas UI, given that the executed Lua code doesn't need them, we have to abide by these restrictions. Via https://github.com/o3de/o3de/pull/18145 I removed one such remapping step which was not required.

This pipeline can be revisited in the future to be simplified / properly explained, but this is outside the scope of this RFC.

## 📨 Client to ScriptCanvas UI debugging communication

The communication between the client and the editor is still very similar to the LuaIDE, even though the ScriptCanvas UI lives in the editor process. We rely on the `RemoteToolsInterface` to send messages between the game and the editor. The only difference is that we are not starting from the `LuaHook` but from our injected custom events.

1. The Lua code, if the graph is observed, sends debug trace events to the `ServiceComponent` on the client side. The client is able to read each event and translate it to the actual node information.
2. If the event matches a breakpoint set on a node, then the client sends a message to the editor and freezes in an infinite loop that only processes messages (see `NodeSignalled()` in `Debugger.h`). If the event does not match a breakpoint and the user has enabled tracing, then the event is sent over to the editor.
3. On the editor side, we grab the message, and currently we simply log it into the `LiveLoggingDataAggregator`, which backs the Execution Trace window.

```mermaid
classDiagram
direction LR

namespace ClientSide {
	class LuaVM

	class ExecutionDebugInterpretedAPI {
		DebugIsTraced()
		DebugSignalIn()
	}
	
	class ExecutionState {
		GetDebugSymbolIn()
	}
	
	class ServiceComponent {
		<<Debugger.cpp>>
		NodeSignalled()
		VariableChanged()
	}
	
	class RemoteToolsInterface {
		SendRemoteToolsMessage()
	}
}

namespace EditorSide {
	class ClientTransceiver {
		OnReceivedMsg()
	}
	
	class LiveLoggingDataAggregator {
		SignaledInput()
	}
}
	
	LuaVM --> ExecutionDebugInterpretedAPI
	ExecutionDebugInterpretedAPI <-- ExecutionState
	ExecutionDebugInterpretedAPI --> ServiceComponent
	ServiceComponent --> RemoteToolsInterface
	RemoteToolsInterface --> ClientTransceiver
	ClientTransceiver --> LiveLoggingDataAggregator
```
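
The decision taken for each node signal in step 2 can be summarised as follows. This is a hedged sketch of the intended behavior, not the code of `NodeSignalled()`: `DebuggerState`, `SignalAction` and `DecideOnNodeSignal` are hypothetical names, and nodes are identified by a plain integer for simplicity.

```cpp
#include <cstdint>
#include <unordered_set>

using NodeId = std::uint64_t; // stand-in for the node identifier

enum class SignalAction
{
    Ignore,       // nothing is sent to the editor
    SendTrace,    // forward the event to the Execution Trace window
    BreakAndWait  // notify the editor, then freeze until resumed
};

// Hypothetical snapshot of what the client knows about one graph.
struct DebuggerState
{
    bool graphObserved = false;   // ticked in the Observed Graphs panel
    bool tracingEnabled = false;  // global tracing toggle
    std::unordered_set<NodeId> enabledBreakpoints;
};

SignalAction DecideOnNodeSignal(const DebuggerState& state, NodeId node)
{
    if (!state.graphObserved)
    {
        return SignalAction::Ignore;
    }
    if (state.enabledBreakpoints.count(node) > 0)
    {
        return SignalAction::BreakAndWait;
    }
    return state.tracingEnabled ? SignalAction::SendTrace : SignalAction::Ignore;
}
```

With tracing disabled, the only message that crosses the remote tools connection is the breakpoint hit, which is why breakpoint support avoids most of the messaging cost.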

It is important to note two performance costs of this system:

1. It is 4 times more expensive than the `LuaHook` because, instead of only checking the hook in C++, Lua sends multiple events for every action. While the impact is not gigantic, it does add up for every entity (an improvement could be to only send one event for each node when tracing is disabled).
2. The most expensive operation by far is `SendRemoteToolsMessage()`, in particular when the message box overflows, as events are then deleted from dynamic memory. Enabling tracing is currently a poor user experience as a result (for breakpoint support it is good enough, as we only send a message to the editor when a breakpoint is hit). Below is a capture with [Superluminal profiler](https://github.com/o3de/o3de-extras/pull/834) in a simple level with one graph traced; all of the overhead comes from the method listed above.

![Image](https://github.com/user-attachments/assets/2b733195-c92b-477e-acc2-7d622acbbb2a)

# Planning

### 1. Move ScriptCanvas to a standalone process

This is needed as we currently don't have a generic way to pause the game logic, and it will be consistent with the LuaIDE. It also prevents a crash in the ScriptCanvas UI from taking down the whole editor. Initial work is being carried out here: https://github.com/o3de/o3de/pull/18722

### 2. Allow user to add a breakpoint on node

Allowing the user to place a breakpoint on right click, and freeze client execution when this happens is the core of this feature. This is already available via this PR https://github.com/o3de/o3de/pull/18719

### 3. Reflect breakpoint node state via icon

This will extend the message-sending logic between client and editor, as we want an acknowledgement when a breakpoint is set. On the UI front, adding a dynamic icon on the top right of a node might require extending the GraphCanvas API.

### 4. Split Execution Trace window and make tracing an option

As tracing is expensive, we want to disable it by default and leave the option in the UI (we keep generating Lua tracing events, but we only send them to the editor if a breakpoint is hit). The Observed Graphs panel will be separated in the process, which might be an opportunity to fix some of its issues (entities are not currently shown, and graphs are only shown after a first run).

### 5. Action bar improvements (same for LuaEditor for consistency)

Now that we support adding and removing breakpoint in a consistent manner, adding support for step-by-step debugging will be required. It might be interesting to see if the "pause" execution can be supported, and if we can get some contextual information out of it.

### 6. Breakpoint list window

Should be relatively easy to support, it is more a quality of life feature than a requirement.

### ?. Callstack and Variable watch windows

This will need more thought, as it might require additions to multiple systems. We also might not want to have the watch as a separate window.

# What are the advantages of the feature ?

ScriptCanvas is currently the most advertised and documented way to use O3DE in order to create new gameplay logic. It is a cool system, but I believe that it lacks a few central features and the support for breakpoint is likely the most noticeable.

Implemented in the way described above, it would give us an edge over other public engines:

- The Blueprint experience in Unreal Engine is by far the most pleasant and robust, but you can only debug from within the editor. This means that what occurs in a shipping build for Blueprint logic, outside of its calls to C++ and the log, is completely opaque. I know from experience that when behaviour differs between a shipping build and the editor (and yes, this happens), the debugging process can be difficult and long (you have to re-cook the assets, which can take a long time).

- The CryEngine flowgraph system can only be debugged from within the editor as well
- The Unity Visual Scripting solution does not provide breakpoint support

Much of the event handling in ScriptCanvas is currently text-based, which means that if there is a typo in the event name, or if the file defining the event is modified, then the logic breaks. While breakpoints are not the final solution to this issue, being able to break execution or step in / step out around an event is a strong way to know whether it is triggered by the right entity or whether the link is broken.

# What are the disadvantages of the feature ?
Moving the ScriptCanvas UI to a standalone process has advantages (clear code structure, a crash no longer takes down the editor), but also drawbacks such as the boot time of the tool. During the migration it is important to make the tool interactive as fast as possible.

# Are there any alternatives to this feature ?
It is possible to provide a simple way for users to export their ScriptCanvas graphs to a Lua script so that they can use the LuaIDE debugging facilities. Yet this is a higher barrier to entry, as using Lua is more complicated than using ScriptCanvas.

# How will users learn this feature ?
The documentation will have to be updated and some YouTube tutorials will have to be made. In general, once finished, I believe this feature is worth advertising given that it is either not fully supported or non-existent in other engines.

# Are there any open questions ?
There are a few implementation details about some tools, like the callstack, which are not yet explained.
