Developed by Yeison Nolberto Cardona Álvarez, MSc.
Andrés Marino Álvarez Meza, PhD.
César Germán Castellanos Dominguez, PhD.
Digital Signal Processing and Control Group | Grupo de Control y Procesamiento Digital de Señales (GCPDS)
Universidad Nacional de Colombia sede Manizales
Chaski-Confluent is an advanced distributed communication framework designed to streamline data exchange between nodes over TCP/IP networks. It features robust node discovery, efficient message handling, dynamic pairing based on subscription topics, and extends functionality with remote interactions, ensuring resilience and flexibility in complex network topologies.
The project aims to provide a reliable and scalable solution for distributed systems, addressing the challenges of latency management, subscription routing, remote method invocation, and connection stability. With Chaski-Confluent, developers can easily build and maintain efficient communication protocols in their distributed applications.
Chaski-Confluent is a comprehensive solution to the challenges of distributed systems. Built with robustness and scalability in mind, it leverages advanced networking techniques to facilitate data exchange between nodes. Its architecture allows for dynamic scaling, maintaining communication efficiency without compromising performance as the network grows. This makes Chaski-Confluent an ideal choice for resilient and scalable distributed applications that need to adapt to changing conditions and workloads.
One of the standout features of Chaski-Confluent is its support for both TCP and UDP protocols. This dual-protocol capability ensures that developers can choose the most appropriate method for their specific use cases. Additionally, the sophisticated node discovery mechanism and intelligent subscription-based message routing enable the creation of dynamic network topologies where nodes can communicate effortlessly. These features, along with effective latency management and remote method invocation, position Chaski-Confluent as a powerful tool for developing modern distributed systems.
The Chaski-Confluent framework provides various powerful features that make it suitable for managing distributed systems. Here are some of the key features:
TCP and UDP Communication: Chaski Confluent supports both TCP and UDP protocols, allowing for reliable and timely message delivery between nodes. The framework ensures efficient data transfer irrespective of the underlying network conditions.
Node Discovery and Pairing: Automatic discovery of nodes based on shared subscription topics is a crucial feature. Chaski Confluent facilitates the pairing of nodes with common interests, making it easy to build dynamic and scalable network topologies.
Ping and Latency Management: The framework includes built-in mechanisms for measuring latency between nodes through ping operations. This helps in maintaining healthy connections and ensures that communication within the network is optimal.
Subscription Management: Nodes can subscribe to specific topics, and messages are routed efficiently based on these subscriptions. This allows for effective communication and data exchange only with relevant nodes.
Keep-alive and Disconnection Handling: Chaski Confluent ensures that connections between nodes remain active by implementing keep-alive checks. If a connection is lost, the framework handles reconnection attempts gracefully to maintain network integrity.
Remote Method Invocation: The Chaski Remote class enables remote method invocation and interaction across distributed nodes. Nodes can communicate transparently, invoking methods and accessing attributes on remote objects as if they were local.
Security: Implement robust security measures to protect data and ensure safe communication between the nodes. Features like encryption and authentication are essential to safeguarding the integrity of the network. For example, you can set up a Certificate Authority (CA) within your network to manage SSL certificates and ensure encrypted communication.
Flexible Configuration: The framework offers a flexible configuration system, allowing users to customize various parameters such as timeouts, retry intervals, and buffer sizes. This adaptability helps in optimizing the performance according to specific requirements.
Logging and Monitoring: Comprehensive logging and monitoring capabilities are integrated into the framework, providing real-time insights into the network activity and performance metrics. This aids in troubleshooting and maintaining the health of the system.
File Streaming and Transfer:
ChaskiStreamer includes helpers to send files in chunks through the network using push_file and to accept incoming files when allow_incoming_files is enabled. This facilitates distributing large payloads without blocking the event loop.
Persistent Storage:
Nodes can store key/value pairs using an SQLite-backed PersistentStorage. Data can be requested or served to peers with the ChaskiStorageRequest message type.
Synchronous Interface:
ChaskiStreamerSync wraps the asynchronous streamer in a dedicated thread, offering a blocking API that integrates with traditional synchronous code bases.
Celery Integration:
The package ships with a custom Kombu transport (chaski.utils.transport) so Chaski can act as a Celery broker or backend.
Message Pool with TTL: Each node keeps a bounded pool of recently processed messages. This avoids processing duplicates and provides a configurable time-to-live for cached entries.
The Chaski_ Node is an essential component of the Chaski-Confluent system. It is responsible for initiating and managing network communication between distributed nodes. This class handles functions such as connection establishment, message passing, node discovery, and pairing based on shared subscriptions. Nodes keep track of their connections as "edges" where latency information and subscription data are stored. Each node can propagate received messages to its peers and caches recent messages in a bounded pool to avoid processing
The Chaski-Streamer extends the functionality of Chaski-Node by introducing asynchronous message streaming capabilities.
It sets up an internal message queue to manage incoming messages, allowing efficient and scalable message processing within a distributed environment.
The ChaskiStreamer can enter an asynchronous context, enabling the user to stream messages using the async with statement.
This allows for handling messages dynamically as they arrive, enhancing the
responsiveness and flexibility of the system. The streamer also supports
chunked file transfer via push_file and can store temporary results in a
PersistentStorage database. When synchronous behaviour is required,
ChaskiStreamerSync exposes the same API from a background thread.
The Chaski-Remote class enhances the Chaski-Node functionality by enabling remote method invocation and interaction across distributed nodes. It equips nodes with the ability to communicate transparently, invoking methods and accessing attributes on remote objects as if they were local. This is achieved by utilizing the Proxy class, which wraps around the remote objects and provides a clean interface for method calls and attribute access. The remote node verifies module availability through a lightweight UDP check before the proxy is returned, ensuring that requested services are reachable.
The core functionalities of Chaski-Confluent revolve around efficient and scalable
communication mechanisms integral to modern distributed systems. Central to its
architecture is the use of the Python asyncio library, which facilitates asynchronous
programming to manage concurrent connections without the overhead of traditional
threading models. This allows for high-performance message handling and real-time
node interactions, optimizing the framework for low-latency and responsive communication.
In implementing Chaski-Confluent, leveraging asyncio ensures that tasks such as
node discovery, subscription management, and remote method invocation are carried
out efficiently. Asynchronous programming enables the framework to handle multiple
network operations simultaneously, maintaining high throughput and scalability even
under heavy network loads. The integration of asyncio thus provides a robust
foundation for building dynamic and resilient distributed systems, ensuring seamless
and efficient data exchange across nodes.
Certification Authority (CA) is crucial for securing communications within the Chaski-Confluent framework. By acting as a trust anchor, CA issues and manages digital certificates, ensuring that nodes in the network can verify each other's identities. This mechanism helps to maintain the integrity and confidentiality of the data being exchanged.
The CA in Chaski-Confluent can generate, sign, and distribute SSL certificates, providing a robust security layer. This ensures that all communication between nodes is encrypted and authenticated, significantly reducing the risk of data breaches or unauthorized access.
Chaski includes a custom Kombu transport so it can be used as a Celery broker.
The ChaskiChannel class relies on ChaskiStreamerSync to publish and consume
tasks through the topic celery_tasks. Several command line scripts under
chaski/scripts/ make it easy to start a streamer root, a remote proxy or a
certificate authority from the shell.