Proposed RFC Feature : ScriptCanvas breakpoint support #151

@guillaume-haerinck

RFC ScriptCanvas breakpoint support

Summary

Improve the ScriptCanvas debugging facilities by allowing users to toggle breakpoints on any given graph. At runtime, whenever a node with a breakpoint is executed, gameplay halts and the ScriptCanvas UI shows what occurred. This RFC also covers closely related features.

This relates to the following issue o3de/o3de#9192

What is the relevance of this feature?

Breakpoints are a cornerstone of software development. They make it easy to rule out unexecuted code, check whether branching conditions are correct, and more generally get quick, in-depth knowledge of the state of your program. They make programming more enjoyable and provide a significant productivity boost, both when creating new gameplay mechanics and when fixing issues in them.

The alternative is either to log custom messages or to enable tracing (explained further below). While both methods can be useful depending on the use case, I believe that they are nowhere near as important as breakpoints.

Image

Feature design description

How to use

Users are required to enable the Remote Tools Gem in order to gain access to the debugging facilities in ScriptCanvas. When a debugging panel is open and the gem is not present, the panel will show a message along the lines of "please enable remote tools gem to enable debugging facilities".

Image

After that, the user creates or opens a ScriptCanvas file in the ScriptCanvas editor and right-clicks on a node, where an "Add Breakpoint" option is available. Once it is clicked, the graph becomes observed if it wasn't already (there is a dedicated panel listing observed graphs) and an icon shows up on the node (see details below).

Image

If the graph is assigned to an entity in a level and the user enters play mode, then once the logic reaches the node, the editor/client executable freezes and the ScriptCanvas UI updates the node with a new state. The user can then resume execution or proceed step by step through the following nodes. The other debugging windows are also filled with the current context (mockups below).

New UI elements

Overview

Below is a mock-up of ScriptCanvas with the addition of the new panels and elements to support enhanced debugging. It represents the state of an observed graph while the execution is running (no breakpoint is currently hit). This file is visible in figma.

Image

1. Breakpoint node state

An icon at the top right corner of the node shows its current breakpoint status. The exact same icon is used in the breakpoints window. Hovering over the icon should show a tooltip explaining the state.

| State | Description | Illustration |
|---|---|---|
| None | Normal node state | Image |
| Pending add | A request to add a breakpoint has been sent to the engine; this pending icon is shown until the engine answers | Image |
| Unmatched location | Occurs when a breakpoint was added to a new node and the graph has not been saved. The breakpoint will be added on save | Image |
| Enabled | Breakpoint is registered and enabled | Image |
| Disabled | Breakpoint has been manually disabled (but not removed) by the user. It is still listed in the breakpoints list and can easily be toggled back on | Image |
| Hit | During execution, when a breakpoint is hit, the icon changes to something close to an arrow. The current node at the state of execution has a dotted line. We might want more feedback in the global graph view to signal that a breakpoint is hit (in Unreal, for example, there is a giant arrow above the node) | Image |
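
The states in the table above can be read as a small state machine. The sketch below is one possible encoding and not part of the existing code base: the enum names, the list of events, and the exact transitions (for example, that saving the graph moves an unmatched breakpoint back to pending) are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical names: these identifiers are illustrative and do not exist in O3DE.
enum class BreakpointState { None, PendingAdd, UnmatchedLocation, Enabled, Disabled, Hit };

enum class BreakpointEvent
{
    UserAdded,          // user picked "Add Breakpoint" on a node
    EngineAcknowledged, // engine confirmed that the breakpoint is registered
    NodeNotSaved,       // the node only exists in the unsaved editor graph
    GraphSaved,         // graph saved, so the request can be sent again
    UserDisabled,
    UserEnabled,
    ExecutionReached,   // the runtime reached the node
    ExecutionResumed,   // the user resumed or stepped away from the node
    UserRemoved
};

// Returns the state after applying an event. Combinations that are not
// listed keep the current state (assumed behaviour).
inline BreakpointState Next(BreakpointState state, BreakpointEvent event)
{
    if (event == BreakpointEvent::UserRemoved)
    {
        return BreakpointState::None;
    }
    switch (state)
    {
    case BreakpointState::None:
        if (event == BreakpointEvent::UserAdded) return BreakpointState::PendingAdd;
        break;
    case BreakpointState::PendingAdd:
        if (event == BreakpointEvent::EngineAcknowledged) return BreakpointState::Enabled;
        if (event == BreakpointEvent::NodeNotSaved) return BreakpointState::UnmatchedLocation;
        break;
    case BreakpointState::UnmatchedLocation:
        if (event == BreakpointEvent::GraphSaved) return BreakpointState::PendingAdd;
        break;
    case BreakpointState::Enabled:
        if (event == BreakpointEvent::UserDisabled) return BreakpointState::Disabled;
        if (event == BreakpointEvent::ExecutionReached) return BreakpointState::Hit;
        break;
    case BreakpointState::Disabled:
        if (event == BreakpointEvent::UserEnabled) return BreakpointState::Enabled;
        break;
    case BreakpointState::Hit:
        if (event == BreakpointEvent::ExecutionResumed) return BreakpointState::Enabled;
        break;
    }
    return state;
}

// Tooltip shown when hovering the icon (the wording is a placeholder).
inline const char* Tooltip(BreakpointState state)
{
    switch (state)
    {
    case BreakpointState::None: return "No breakpoint";
    case BreakpointState::PendingAdd: return "Waiting for the engine to register the breakpoint";
    case BreakpointState::UnmatchedLocation: return "Save the graph to register this breakpoint";
    case BreakpointState::Enabled: return "Breakpoint enabled";
    case BreakpointState::Disabled: return "Breakpoint disabled";
    case BreakpointState::Hit: return "Execution is paused on this node";
    }
    assert(false && "unhandled BreakpointState");
    return "";
}
```

Keeping the transitions in one function would let the node icon and the breakpoints window share the same source of truth for the state they display.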

2. Action bar

The action bar visible on the top will be expanded with the following elements :

Image

A. Debug context menu to start/pause/stop/restart execution and, when a breakpoint is hit, to step in, step out and step over. The UI follows the one from Visual Studio Code (MIT-licensed icons), as I believe its icons are far more explicit and better designed than the ones currently used in the LuaEditor (screenshot below).

Image

B. Target picker to attach to a remote process. It is a copy of the one currently used in the Lua Editor. The "context" selector has not been kept because, as far as I am aware, the default Lua context is always the one in use.

C. Entity picker to switch between the graph states of multiple entities. A breakpoint is hit for one specific entity only; all other entities using this graph are in a different state, and this picker is how to switch between them (as their execution is paused as well, we might be able to see the last node they executed and update the variable watch).

3. Observed Graphs

This view already exists in the current implementation inside the tracing window, but I am moving it to its own panel as it now affects more panels. This window allows the user to toggle graph observation for specific graphs or entities:

  • If a graph is not observed, the tickbox is off ("CameraTarget" graph in capture below).

    It won't generate any tracing events or trigger any breakpoints. (Adding a breakpoint automatically ticks the graph as observed. If the user unticks the graph in the observed graphs panel, the breakpoint icon will show as unmatched location.)

  • If a graph is observed, the tickbox is on ("Test" graph in capture below).

    It will generate events to trigger breakpoints. By default it will not generate any tracing events: tracing is enabled globally for all observed graphs and is toggled on/off via the "Execution Trace" window.

Image

4. Breakpoint list

Lists breakpoints across all graphs (a column stating which graph each breakpoint comes from is missing from the screenshot).

Allows the user to toggle breakpoints via a tickbox or remove them via the right-click menu. Double-clicking jumps to the node. The icon next to the tickbox must reflect the breakpoint state explained above.

Image

5. Execution Trace

The execution trace window already exists, but will be modified to better match the new feature set.

Elements removed

  • The observed graphs list is now its own panel, as stated above
  • The target picker (to select remote process to debug) is now in the action bar

Elements added

  • Keep a shortcut to run/stop execution but use the icon set from the Action Bar
  • Icons need refinement to better convey what they do (clear on play, trace from the start)
  • Need a checkbox to enable or disable tracing (it is expensive so disabled by default)

Elements up to debate / for future RFC

Addition of a Timeline tab that displays the same set of data in a scrollable timeline instead of a list.

This is inspired by the Ubisoft Snowdrop engine, which relies heavily on this for AI debugging. You can see such a system in this GDC video at around 13:30, or in this video from AI and Games. The timeline allows scrolling through events, and the nodes light up on the graph to match the state of the timeline. We could rely on the TrackView code to do that.

Image

6. Variable watch

The variable watch is very close to the current "Variable Manager" panel. However, it does not allow the user to create variables, and it lists more variables, such as the input pins of the current node or any other global state if there is any. It is only updated when tracing is enabled or when execution is currently halted by a breakpoint.

Image

7. Callstack

It is not present in the mockup but should be quite similar to the "Execution Trace" event list, with less content. It is meant to show any higher-level graph that ultimately led to this node being triggered. The last element in the list is "External Code", given that it is inside the Lua Virtual Machine. It can be particularly useful to track down who triggered a custom event, for example.

Technical design description

This feature builds upon a large body of existing code which was properly architected but never fully enabled (only tracing is there for now, fixed via o3de/o3de#17879). This section explains that architecture, as a clear understanding of it is crucial to see how breakpoint support will be added. The first two sections below are purely explanatory; the changes needed for this RFC, described in the third section, are minor.

I will start by explaining the lua breakpoint codepath used by the Lua Editor. As this codepath is way easier to understand, it eases into the scriptcanvas debugging architecture.

🪲 Quick overview on the Lua Editor debugging

The LuaIDE is used to edit Lua files and provides debugging support for them. It is a separate process from the O3DE editor, which means that the two processes have to exchange remote messages in order to communicate. In the context of breakpoint support, this is done via the Remote Tools Gem. The editor hosts one instance and so does the LuaIDE, so that both are able to send messages to and receive messages from each other.

Image

  1. From the LuaIDE, when a user clicks on a line to add a breakpoint, a request is made and reaches the connected client.

  2. Then the ScriptDebugAgent takes the message and stores the breakpoint (which is simply a filename with a line number) in the ScriptContext. The ScriptContext is a wrapper around the Lua Virtual Machine. Only one is ever instantiated, so the same context is used by all scripts.

  3. If it is not already the case, a LuaHook is bound. It is a method which is triggered at runtime anytime that something occurs on the Lua Virtual Machine (like when a new line is executed).

classDiagram
	direction LR
	
namespace LuaIDE {
	
	class Component {
		<<LuaDebuggerComponent.cpp>>
		AttachDebugger()
		CreateBreakpoint()
	}
    
}
		
namespace ClientOrEditor {
	class RemoteToolsInterface {
		GetReceivedMessages()
	}

	class ScriptDebugAgent {
		<<AzFramework / ScriptRemoteDebugging.cpp>>
		Process()
		Attach()
	}
		
	class ScriptContextDebug {
		<<AzCore>>
		AddBreakpoint()
		Process()
		LuaHook()
		ScriptContext& m_context;
	}
}
	
	Component --> RemoteToolsInterface : SendRemoteToolsMessage
	RemoteToolsInterface --> ScriptDebugAgent
	ScriptDebugAgent --> ScriptContextDebug

At runtime, when the game is running, the LuaHook is triggered many times per frame.

  1. It has access to the stored breakpoints and checks whether the currently executed line matches one of them. This behavior relies on lua_getinfo, which only returns something valid if the Lua script was compiled with debug support. This is enabled in LuaBuilder::LuaDumpToStream from the LmbrCentral gem.

  2. If there is a match, then a message is sent over to the LuaIDE, and the client process enters a freeze mode where only the remote tools messages are handled (this while loop is inside of ScriptDebugAgent::BreakpointCallback()).

  3. On its side, the LuaIDE offers information about the current process, such as which breakpoint has been hit, what is the callstack and the variable values. User can then choose to resume execution on the client or proceed step by step.
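
The matching step described above can be sketched without depending on the Lua headers. The example below is a simplified stand-in, not the AzCore implementation: `LineBreakpoints`, `HitCallback` and `OnLineExecuted` are hypothetical names, and the file name and line are passed in directly, where a real hook would read them from the `lua_Debug` structure filled by `lua_getinfo`.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the breakpoint storage kept on the script context.
class LineBreakpoints
{
public:
    using Key = std::pair<std::string, int>; // (file name, 1-based line number)

    void Add(const std::string& file, int line)
    {
        assert(line > 0 && "Lua line numbers start at 1");
        m_breakpoints.insert(Key(file, line));
    }

    void Remove(const std::string& file, int line)
    {
        m_breakpoints.erase(Key(file, line));
    }

    bool Matches(const std::string& file, int line) const
    {
        return m_breakpoints.count(Key(file, line)) != 0;
    }

private:
    std::set<Key> m_breakpoints;
};

using HitCallback = std::function<void(const std::string& file, int line)>;

// Meant to be called from a Lua line hook with the values that
// lua_getinfo(L, "Sl", &ar) fills in (ar.source and ar.currentline).
// Returns true when the executed line matches a stored breakpoint.
inline bool OnLineExecuted(
    const LineBreakpoints& breakpoints,
    const std::string& file,
    int line,
    const HitCallback& onHit)
{
    if (!breakpoints.Matches(file, line))
    {
        return false; // the common case: this runs many times per frame
    }
    onHit(file, line); // e.g. notify the IDE, then enter the freeze loop
    return true;
}
```

Because this check runs for every executed line, the lookup is kept to a single ordered-set query; the callback is where the message to the LuaIDE and the freeze loop would live.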

classDiagram
direction LR

namespace ClientOrEditor {

	class ScriptContextDebug {
		<<AzCore>>
		LuaHook()
		BreakpointCallback()
		ScriptContext& m_context;
	}

	class RemoteToolsInterface {
		SendRemoteToolsMessage()
	}
}	

namespace LuaIDE {
	
	class Component {
		<<LuaDebuggerComponent.cpp>>
		OnSystemTick()
	}
	
	class Context {
		<<LuaEditorContext.cpp>>
		OnBreakpointHit()
	}
    
}
		
	ScriptContextDebug --> RemoteToolsInterface
	RemoteToolsInterface --> Component
	Component --> Context

🏗️ ScriptCanvas Compilation process to support debugging

While we could use the same pipeline to support breakpoints in ScriptCanvas (see o3de/o3de#18708), it would be difficult to support the remaining features such as step-by-step debugging, variable watch and so on. This is simply because a single node in ScriptCanvas can generate around ten lines of Lua code. Knowing which group of lines corresponds to which node or user variable then becomes an issue.

Fortunately, this was well known to the author of ScriptCanvas tracing, so the build system is able to embed debugging information within the ScriptCanvas Lua files to identify each node's section within the Lua code.

Let's use the 3 nodes below from this graph as an example.

Image

You can take a peek at the generated file upon save in ScriptCanvasBuilderWorkerUtility.cpp via ProcessTranslationJob() for the translation.m_text.data() variable (it is on the asset processor side, so you need to attach to this external process). Below is an extract of the generated lua text file, before it gets compiled in binary to luac.

if _G.SCRIPT_CANVAS_GLOBAL_RELEASE then

function CameraTarget_VM:OnGraphStart()
	local executionState = self.executionState
	EBusHandlerConnect(self.TickBusHandler_scvm)
	self.m_InputHandlerNodeable_scvm.ConnectEvent(self.m_InputHandlerNodeable_scvm, [[CameraAim]])
end

else

function CameraTarget_VM:OnGraphStart()
	if DebugIsTraced(self.executionState) then
		local executionState = self.executionState
		DEBUG_SIGNAL_OUT(executionState, 0) -- TranslateFunctionBlock begin
		DEBUG_SIGNAL_IN(executionState, 0, self.TickBusHandler_scvm) -- TranslateExecutionTreeFunctionCall begin
		EBusHandlerConnect(self.TickBusHandler_scvm)
		DEBUG_SIGNAL_IN(executionState, 1, self.m_InputHandlerNodeable_scvm, [[CameraAim]]) -- TranslateExecutionTreeFunctionCall begin
		self.m_InputHandlerNodeable_scvm.ConnectEvent(self.m_InputHandlerNodeable_scvm, [[CameraAim]])
	else
		local executionState = self.executionState
		EBusHandlerConnect(self.TickBusHandler_scvm)
		self.m_InputHandlerNodeable_scvm.ConnectEvent(self.m_InputHandlerNodeable_scvm, [[CameraAim]])
	end
end

We can see that the file is split into two sections by a global "if/else" statement. The release code does not contain any debug information, so we are interested in the second implementation of "CameraTarget_VM:OnGraphStart()".

  • It starts with a call to "DebugIsTraced", which makes a C++ call checking whether the graph is ticked in the Observed Graphs window in ScriptCanvas
  • If it is not traced, then the code is similar to the release code
  • If it is traced, it makes multiple C++ calls via "DEBUG_SIGNAL_OUT" and similar. Thanks to these calls, we know when a slot has been entered, when a node has been executed and so on

While writing this information is straightforward (via GraphToLua::WriteDebugInfoIn), I believe that the system becomes complex when it comes to resolving these debug lines to the nodes they correspond to. This is caused by multiple architectural choices:

  • Nodes in the ScriptCanvas UI are identified with a unique EntityId (also used by the entity system).
  • We differentiate the editor-time ScriptCanvas files from the runtime ScriptCanvas files (this happens before Lua is involved; added via Run ScriptCanvas everywhere in O3DE o3de#8623 and also explained in this video), as one source ScriptCanvas file might lead to multiple runtime representations of the same graph.

The two points above combined mean that the runtime graph must remap all of its EntityIds before being instantiated (find and replace all EntityId occurrences, and save the from/to information somewhere). This is needed to prevent the same EntityId from being used twice.
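
To make the remapping step concrete, here is a minimal sketch of the idea. It is not the ScriptCanvas code: `NodeId` is a plain integer standing in for the EntityId, new ids come from a simple counter (an assumption chosen to keep the example deterministic), and `RemappedGraph` and `RemapNodeIds` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using NodeId = std::uint64_t; // stand-in for the EntityId used to identify nodes

struct RemappedGraph
{
    std::vector<NodeId> runtimeIds;                     // same order as the editor ids
    std::unordered_map<NodeId, NodeId> runtimeToEditor; // the saved "from/to" information
};

// Gives every node of one runtime instance a fresh id taken from nextFreeId,
// and records how to get back to the editor-time id of that node.
inline RemappedGraph RemapNodeIds(const std::vector<NodeId>& editorIds, NodeId& nextFreeId)
{
    RemappedGraph result;
    result.runtimeIds.reserve(editorIds.size());
    for (NodeId editorId : editorIds)
    {
        const NodeId runtimeId = nextFreeId++;
        const bool inserted = result.runtimeToEditor.emplace(runtimeId, editorId).second;
        assert(inserted && "runtime ids must be unique");
        (void)inserted; // silences the unused warning when asserts are disabled
        result.runtimeIds.push_back(runtimeId);
    }
    return result;
}
```

With this shape, a debugger that receives a runtime id can look it up in `runtimeToEditor` to find the node to highlight in the editor graph, which is the resolution step that makes the pipeline complex.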

While I am not entirely certain why nodes have to be instantiated outside the ScriptCanvas UI, given that the executed Lua code doesn't need them, we have to abide by these restrictions. Via o3de/o3de#18145 I removed one such remapping step which was not required.

This pipeline can be revisited in the future to be simplified / properly explained, but this is outside the scope of this RFC.

📨 Client to ScriptCanvas UI debugging communication

The communication between the client and the editor, even though the ScriptCanvas UI is in the editor process, is still very similar to the LuaIDE. We rely on the RemoteToolsInterface gem to send messages between the game and the editor. The only difference is that we are not starting from the LuaHook but from our injected custom events.

  1. The Lua code, if the graph is observed, sends debug trace events to the ServiceComponent on the client side. The client is able to read each event and translate it to the actual node information.
  2. If the event matches a breakpoint set on a node, the client sends a message to the editor and freezes in an infinite loop that only processes messages (see NodeSignalled() in Debugger.h). If the user has enabled tracing and the event does not match a breakpoint, then the event is sent over to the editor.
  3. On the editor side, we grab the message, and currently we simply log it into the LiveLoggingDataAggregator, which is simply the Execution Trace window.
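
The decision made on the client in step 2 can be sketched as follows. This is a simplified, single-threaded model and not the code in Debugger.cpp: `DebugServiceSketch`, `DebugCommand` and `DebugMessage` are hypothetical, the editor connection is replaced by in-memory containers, and the wait loop stops when the inbox is empty so that the example terminates (a real client would keep polling until it is told to continue).

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <set>
#include <vector>

using NodeId = std::uint64_t; // stand-in for the resolved node identifier

// Simplified editor-to-client messages; the real message set is larger.
enum class DebugCommand { Continue, AddBreakpoint, RemoveBreakpoint };

struct DebugMessage
{
    DebugCommand command;
    NodeId node;
};

// Hypothetical model of the client-side service.
class DebugServiceSketch
{
public:
    std::set<NodeId> breakpoints;
    bool tracingEnabled = false;
    std::deque<DebugMessage> inbox; // messages received from the editor
    std::vector<NodeId> sentHits;   // "breakpoint hit" messages sent to the editor
    std::vector<NodeId> sentTraces; // trace events sent to the editor

    void NodeSignalled(NodeId node)
    {
        assert(!m_paused && "no graph code should run while paused");
        if (breakpoints.count(node) != 0)
        {
            sentHits.push_back(node);
            WaitForContinue();
        }
        else if (tracingEnabled)
        {
            sentTraces.push_back(node);
        }
    }

private:
    void WaitForContinue()
    {
        m_paused = true;
        // Only debugger messages are handled here; gameplay does not advance.
        while (m_paused && !inbox.empty())
        {
            const DebugMessage message = inbox.front();
            inbox.pop_front();
            switch (message.command)
            {
            case DebugCommand::Continue:
                m_paused = false;
                break;
            case DebugCommand::AddBreakpoint:
                breakpoints.insert(message.node);
                break;
            case DebugCommand::RemoveBreakpoint:
                breakpoints.erase(message.node);
                break;
            }
        }
        m_paused = false; // sketch only: give up waiting once nothing is queued
    }

    bool m_paused = false;
};
```

Note that nothing is sent to the editor for a node without a breakpoint unless tracing is enabled, which is why breakpoint support alone stays cheap on the messaging side.
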
classDiagram
direction LR

namespace ClientSide {
	class LuaVM

	class ExecutionDebugInterpretedAPI {
		DebugIsTraced()
		DebugSignalIn()
	}
	
	class ExecutionState {
		GetDebugSymbolIn()
	}
	
	class ServiceComponent {
		<<Debugger.cpp>>
		NodeSignalled()
		VariableChanged()
	}
	
	class RemoteToolsInterface {
		SendRemoteToolsMessage()
	}
}

namespace EditorSide {
	class ClientTransceiver {
		OnReceivedMsg()
	}
	
	class LiveLoggingDataAggregator {
		SignaledInput()
	}
}
	
	LuaVM --> ExecutionDebugInterpretedAPI
	ExecutionDebugInterpretedAPI <-- ExecutionState
	ExecutionDebugInterpretedAPI --> ServiceComponent
	ServiceComponent --> RemoteToolsInterface
	RemoteToolsInterface --> ClientTransceiver
	ClientTransceiver --> LiveLoggingDataAggregator

It is important to note two performance costs of this system:

  1. It is 4 times more expensive than the LuaHook because, instead of only checking the hook in C++, Lua sends multiple events for every action. While the impact is not gigantic, it does add up for every entity (an improvement could be to only send one event for each node when tracing is disabled).
  2. The most expensive operation by far is SendRemoteToolsMessage(), in particular when the message box overflows, as events are then deleted from dynamic memory. Enabling tracing is currently a poor user experience as a result (for breakpoint support it is good enough, as we only send a message to the editor when a breakpoint is hit). Below is a capture with the Superluminal profiler in a simple level with one graph traced; all the extra cost comes from the method listed above.

Image

Planning

1. Move ScriptCanvas to a standalone process

This is needed as we currently don't have a generic way to pause the game logic, and it will be consistent with the LuaIDE. It also prevents a crash in the ScriptCanvas UI from crashing the whole editor. Initial work is being carried out in o3de/o3de#18722.

2. Allow user to add a breakpoint on node

Allowing the user to place a breakpoint via right click, and freezing client execution when it is hit, is the core of this feature. This is already available via PR o3de/o3de#18719.

3. Reflect breakpoint node state via icon

This will deepen the message-sending logic between client and editor, as we want an acknowledgement when a breakpoint is set. On the UI front, we might have to extend the GraphCanvas API in order to add a dynamic icon at the top right of a node.

4. Split Execution Trace window and make tracing an option

As tracing is expensive, we want to disable it by default and leave the option in the UI (we keep generating Lua tracing events, but we only send them to the editor if a breakpoint is hit). The Observed Graphs panel will be separated in the process, which might be an opportunity to fix some of its issues (entities are not currently shown, and graphs are only shown after a first run).
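
One possible reading of "only send them to the editor if a breakpoint is hit" is to keep a bounded history of recent trace events on the client and hand it over when execution halts. The sketch below illustrates that reading only; `RecentTraceBuffer`, its capacity policy and the use of strings as events are assumptions and not part of the existing code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: keeps only the most recent trace events on the client.
class RecentTraceBuffer
{
public:
    explicit RecentTraceBuffer(std::size_t capacity)
        : m_capacity(capacity)
    {
        assert(capacity > 0 && "capacity must be positive");
    }

    // Called for every trace event. Once the buffer is full, the oldest
    // event is dropped, so memory use stays bounded and nothing is sent.
    void Record(std::string event)
    {
        if (m_events.size() == m_capacity)
        {
            m_events.pop_front();
        }
        m_events.push_back(std::move(event));
    }

    // Called when a breakpoint is hit: returns the retained events in
    // order (oldest first) and empties the buffer.
    std::vector<std::string> Flush()
    {
        std::vector<std::string> flushed(m_events.begin(), m_events.end());
        m_events.clear();
        return flushed;
    }

private:
    std::size_t m_capacity;
    std::deque<std::string> m_events;
};
```

The capacity bounds both the memory held on the client and the size of the message sent on a hit, which matters given that sending remote tools messages is the dominant cost measured above.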

5. Action bar improvements (same for LuaEditor for consistency)

Now that we support adding and removing breakpoints in a consistent manner, support for step-by-step debugging will be required. It might be interesting to see whether "pause" execution can be supported, and whether we can get some contextual information out of it.

6. Breakpoint list window

Should be relatively easy to support, it is more a quality of life feature than a requirement.

?. Callstack and Variable watch windows

This will need more thought, as it might require additions to multiple systems. We also might not want the watch to be a separate window.

What are the advantages of the feature ?

ScriptCanvas is currently the most advertised and documented way to create new gameplay logic in O3DE. It is a cool system, but I believe that it lacks a few central features, and breakpoint support is likely the most noticeable.

Implemented in the way described above, it would give us an edge over other public engines:

  • The Blueprint experience in Unreal Engine is by far the most pleasant and robust, but you can only debug from within the editor. This means that what occurs in a shipping build for Blueprint logic, outside of its calls to C++ and the log, is completely opaque. I know from experience that when behaviour in a shipping build differs from the editor (and yes, this happens), the debugging process can be long and difficult (you have to re-cook the assets, which can take a long time).

  • The CryEngine flowgraph system can only be debugged from within the editor as well

  • The Unity Visual Scripting solution does not provide breakpoint support

Much of the event handling in ScriptCanvas is currently text-based, which means that if there is a typo in the event name, or if the file defining this event is modified, then the logic breaks. While breakpoints are not the final solution to this issue, being able to break execution or step in / step out around an event is a strong way to know whether it is triggered by the right entity or whether the link is broken.

What are the disadvantages of the feature ?

Moving the ScriptCanvas UI to a standalone process has advantages (clear code structure, a crash does not bring down the editor), but also drawbacks such as the boot time of the tool. During the migration, it is important to make the tool interactable as fast as possible.

Are there any alternatives to this feature ?

It is possible to provide a simple way for users to export their ScriptCanvas graphs to Lua scripts so that they can use the LuaIDE debugging facilities. Yet this raises the barrier to entry, as using Lua is more complicated than ScriptCanvas.

How will users learn this feature ?

The documentation will have to be updated and some YouTube tutorials will have to be made. In general, once finished, I believe it is worth advertising this feature given that it is not fully supported or non-existent in other engines.

Are there any open questions ?

There are a few implementation details about some tools, like the callstack, which are not yet explained.
