diff --git a/docs/data/how-to-guides/manage-transactional-queue/only_show_transactions.png b/docs/data/how-to-guides/manage-transactional-queue/only_show_transactions.png deleted file mode 100644 index e4c7f74ef02..00000000000 Binary files a/docs/data/how-to-guides/manage-transactional-queue/only_show_transactions.png and /dev/null differ diff --git a/docs/data/how-to-guides/manage-transactional-queue/show_transactions.png b/docs/data/how-to-guides/manage-transactional-queue/show_transactions.png deleted file mode 100644 index 79998f37fb5..00000000000 Binary files a/docs/data/how-to-guides/manage-transactional-queue/show_transactions.png and /dev/null differ diff --git a/docs2/advanced-algorithms/advanced-algorithms.md b/docs2/advanced-algorithms/advanced-algorithms.md new file mode 100644 index 00000000000..471f0b373d8 --- /dev/null +++ b/docs2/advanced-algorithms/advanced-algorithms.md @@ -0,0 +1,80 @@ +# Advanced algorithms + +import MageSpells from '../mage/templates/_mage_spells.mdx'; + +**Memgraph Advanced Graph Extensions**, **MAGE** to friends, is an [**open-source +repository**](https://github.com/memgraph/mage) that contains **graph algorithms** and **modules** written by the +team behind Memgraph and its users in the form of **query modules**. The project +aims to give everyone the tools they need to tackle the most interesting and +challenging **graph analytics** problems. + +[**Query +module**](https://memgraph.com/docs/memgraph/database-functionalities/query-modules/built-in-query-modules) +is a concept introduced by Memgraph and it refers to user-defined procedures, +grouped into modules that extend the **Cypher query language**. Procedures are +implementations of various algorithms in multiple programming languages and they +are all runnable inside Memgraph. + +## Quick start + +Start utilizing the power of MAGE with these simple steps. + +### 1. 
Install MAGE + +If you are using Memgraph Platform and starting Memgraph with the +`memgraph-platform` image, MAGE is already included and you can proceed to +step 2 or 3. + +Install MAGE using a prepared image from the [Docker +Hub](/installation/docker-hub.md) or by [building a Docker +image](/installation/docker-build.md) from the [official MAGE GitHub +repository](https://github.com/memgraph/mage). On Linux, you can also [install +MAGE from source](/installation/source.md) but be aware you will also need to +install additional +dependencies. + +### 2. Load query modules + +To use certain procedures, first, you need to [load the query modules](/usage/loading-modules.md) to the +appropriate directory. + +### 3. Call procedures + +You are ready to [call procedures](/usage/calling-procedures.md) within queries and tackle that graph analytics +problem that's been keeping you awake. + +## What to do next? + +### Browse the spellbook of query modules + +The spellbook has been written to help you utilize all the [currently +available query modules](/mage/query-modules/available-queries). + +
+ Spellbook πŸ“– + + +
+
+### Create query modules
+
+If you need some assistance in creating and running your own Python and C++
+query modules, the [How-to guides](/how-to-guides/create-a-new-module-cpp.md) are here for you.
+
+### Learn about algorithms and their usage
+
+There are so many
+[algorithms](/algorithms/traditional-graph-analytics/betweenness-centrality-algorithm.md)
+to benefit from. Browse through them and see how they can be applied in [use
+cases](/use-cases/bioinformatics.md) from various fields, such as bioinformatics or
+transportation.
+
+### Contribute
+
+Make MAGE even better by [contributing](/contributing.md) your own algorithm implementations and ideas, or by reporting pesky bugs.
+
+### Browse through the Changelog
+
+Want to know what's new in MAGE? Take a look at the [Changelog](/changelog.md)
+to see a list of new features.
+
diff --git a/docs2/advanced-algorithms/available-algorithms/available-algorithms.md b/docs2/advanced-algorithms/available-algorithms/available-algorithms.md
new file mode 100644
index 00000000000..e526d77740a
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/available-algorithms.md
@@ -0,0 +1,5 @@
+# Algorithms available in the MAGE library
+
+import MageSpells from '../../mage/templates/_mage_spells.mdx';
+
+
\ No newline at end of file
diff --git a/docs2/advanced-algorithms/available-algorithms/betweenness_centrality.md b/docs2/advanced-algorithms/available-algorithms/betweenness_centrality.md
new file mode 100644
index 00000000000..a8f554c3021
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/betweenness_centrality.md
@@ -0,0 +1,151 @@
+---
+id: betweenness-centrality
+title: betweenness_centrality
+sidebar_label: betweenness_centrality
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+{children}
+);
+
+Centrality analysis provides information about the node’s
importance for an
+information flow or connectivity of the network. Betweenness centrality is one
+of the most used centrality metrics. It measures the extent
+to which a node lies on paths between other nodes in the graph. Thus, nodes with
+high betweenness may have considerable influence within a network by virtue of
+their control over information passing between others. The calculation of betweenness
+centrality is not standardized, and there are many ways to solve it. It is
+defined as the number of shortest paths in the graph that pass through the
+node divided by the total number of shortest paths. The implemented algorithm is
+described in the paper "[A Faster Algorithm for Betweenness
+Centrality](http://www.uvm.edu/pdodds/research/papers/others/2001/brandes2001a.pdf)"
+[^1].
+
+[^1]: [A Faster Algorithm for Betweenness
+Centrality](http://www.uvm.edu/pdodds/research/papers/others/2001/brandes2001a.pdf),
+Ulrik Brandes
+
+[![docs-source](https://img.shields.io/badge/source-betweenness_centrality-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/betweenness_centrality_module/betweenness_centrality_module.cpp)
+
+| Trait               | Value                       |
+| ------------------- | --------------------------- |
+| **Module type**     | **algorithm**               |
+| **Implementation**  | **C++**                     |
+| **Graph direction** | **directed**/**undirected** |
+| **Edge weights**    | **unweighted**              |
+| **Parallelism**     | **parallel**                |
+
+## Procedures
+
+
+
+### `get(directed, normalized, threads)`
+
+#### Input:
+
+- `directed: boolean (default=True)` ➑ If `False`, the direction of the edges is ignored.
+- `normalized: boolean (default=True)` ➑ If `True`, the betweenness values are normalized by
+  `2/((n-1)(n-2))` for undirected graphs and `1/((n-1)(n-2))` for directed graphs, where
+  `n` is the number of nodes.
+- `threads: integer (default=number of concurrent threads supported by the + implementation)` ➑ The number of threads used to calculate betweenness + centrality. + +#### Output: + +- `betweenness_centrality: float` ➑ Value of betweenness for a given node. + +- `node: Vertex` ➑ Graph vertex for betweenness calculation. + +#### Usage: + +```cypher +CALL betweenness_centrality.get() +YIELD node, betweenness_centrality; +``` + +## Example + + + + + + + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 7}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 9}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 9}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 11}) MERGE (b:Node {id: 9}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 9}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 6}) MERGE (b:Node {id: 11}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +CALL betweenness_centrality.get(TRUE,TRUE) +YIELD node, betweenness_centrality +RETURN node, betweenness_centrality; +``` + + + + + +```plaintext ++-------------------------+-------------------------+ +| node | betweenness_centrality | ++-------------------------+-------------------------+ +| (:Node {id: 0}) | 0 | +| (:Node {id: 1}) | 0.109091 | +| (:Node {id: 2}) | 0.0272727 | +| (:Node {id: 3}) | 0 | +| (:Node {id: 4}) | 
0.0454545 | +| (:Node {id: 5}) | 0.2 | +| (:Node {id: 6}) | 0.0636364 | +| (:Node {id: 7}) | 0 | +| (:Node {id: 8}) | 0.0181818 | +| (:Node {id: 9}) | 0.0909091 | +| (:Node {id: 10}) | 0 | +| (:Node {id: 11}) | 0.0181818 | ++-------------------------+-------------------------+ +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/betweenness_centrality_online.md b/docs2/advanced-algorithms/available-algorithms/betweenness_centrality_online.md new file mode 100644 index 00000000000..6f0252ea724 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/betweenness_centrality_online.md @@ -0,0 +1,226 @@ +--- +id: betweenness-centrality-online +title: betweenness_centrality_online +sidebar_label: betweenness_centrality_online +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + + +export const Highlight = ({children, color}) => ( + +{children} + +); + +Betweenness centrality is among the most common metrics in graph analytics owing +to its utility in identifying critical vertices of graphs. It is one of the +tools in _centrality analysis_, a set of techniques for measuring the importance +of nodes in networks. + +The notion of [Betweenness +centrality](https://en.wikipedia.org/wiki/Betweenness_centrality) is based on +shortest paths: the shortest path between two nodes is the one consisting of the +fewest edges, or in case of weighted graphs, the one with the smallest total +edge weight. A node’s betweenness centrality is defined as the share of all +shortest paths in the graph that run through it. + +This query module delivers a _fully dynamic_ betweenness centrality computation +tool using the +[iCentral](https://repository.kaust.edu.sa/bitstream/handle/10754/625935/08070346.pdf) +[^1] algorithm by Jamour, Skiadopoulos and Kalnis. 
iCentral saves on
+computation in two ways: it singles out the nodes whose centrality scores could
+have changed and then incrementally updates their scores, making use of
+previously calculated data structures where applicable.
+
+This drives down the algorithm’s time complexity to _O_(_mβ€²nβ€²_) and space
+complexity to _O_(_m_ + _n_), where _m_ and _n_ are the counts of edges and
+vertices in the graph, _mβ€²_ is the number of edges in the biconnected component
+affected by the graph update, and _nβ€²_ is the size of a subset of the nodes in
+the biconnected component. Consequently, the algorithm is suitable for mid-scale
+graphs.
+
+Dynamic algorithms such as iCentral are especially suited for graph streaming
+solutions such as Memgraph. As updates arrive in a stream, the algorithm avoids
+redundant work by processing only the portion of the graph affected by the
+update.
+
+[^1]: [Parallel Algorithm for Incremental Betweenness Centrality on Large
+Graphs](https://repository.kaust.edu.sa/bitstream/handle/10754/625935/08070346.pdf)
+(Jamour et al., 2017)
+
+[![docs-source](https://img.shields.io/badge/source-betweenness_centrality_online-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/betweenness_centrality_module/betweenness_centrality_online_module.cpp)
+
+| Trait               | Value          |
+| ------------------- | -------------- |
+| **Module type**     | **algorithm**  |
+| **Implementation**  | **C++**        |
+| **Graph direction** | **undirected** |
+| **Edge weights**    | **unweighted** |
+| **Parallelism**     | **parallel**   |
+
+## Procedures
+
+
+
+### `set(normalize, threads)`
+
+#### Input:
+
+- `normalize: boolean (default=True)` ➑ If `True`, the betweenness values are normalized by
+  `2/((n-1)(n-2))`, where `n` is the number of nodes in the graph.
+- `threads: integer (default=number of concurrent threads supported by the implementation)` ➑ The
+  number of threads used in calculating betweenness centrality.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph vertex.
+- `betweenness_centrality: float` ➑ Betweenness centrality score of the above
+  vertex.
+
+#### Usage:
+
+```cypher
+CALL betweenness_centrality_online.set()
+YIELD node, betweenness_centrality;
+```
+
+### `get(normalize)`
+
+#### Input:
+
+- `normalize: boolean (default=True)` ➑ If `True`, the betweenness values are normalized by
+  `2/((n-1)(n-2))`, where `n` is the number of nodes in the graph.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph vertex.
+- `betweenness_centrality: float` ➑ Betweenness centrality score of the above
+  vertex.
+
+#### Usage:
+
+```cypher
+CALL betweenness_centrality_online.get()
+YIELD node, betweenness_centrality;
+```
+
+### `update(created_vertices, created_edges, deleted_vertices, deleted_edges, normalize, threads)`
+
+#### Input:
+
+- `created_vertices: List[Vertex]` ➑ Vertices created in the latest graph
+  update.
+- `created_edges: List[Edge]` ➑ Edges created in the latest graph update.
+- `deleted_vertices: List[Vertex]` ➑ Vertices deleted in the latest graph
+  update.
+- `deleted_edges: List[Edge]` ➑ Edges deleted in the latest graph update.
+- `normalize: boolean (default=True)` ➑ If `True`, the betweenness values are normalized by
+  `2/((n-1)(n-2))`, where `n` is the number of nodes in the graph.
+- `threads: integer (default=number of concurrent threads supported by the implementation)` ➑ The
+  number of threads used in updating betweenness centrality.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph vertex.
+- `betweenness_centrality: float` ➑ Betweenness centrality score of the above
+  vertex.
+
+#### Usage:
+
+As there are four complex obligatory parameters, setting the
+parameters by hand might be cumbersome. The recommended way to use this method is to
+call it within a [trigger](/memgraph/reference-guide/triggers), making sure
+beforehand that all [predefined
+variables](/memgraph/reference-guide/triggers/#predefined-variables) are
+available:
+
+```cypher
+CREATE TRIGGER sample_trigger BEFORE COMMIT
+EXECUTE CALL betweenness_centrality_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges, normalize, threads) YIELD *;
+```
+
+Betweenness centrality scores calculated by `update()` are accessible by subsequently calling
+`get()`:
+
+```cypher
+CALL betweenness_centrality_online.get()
+YIELD node, betweenness_centrality;
+```
+
+## Example
+
+
+
+```cypher
+CREATE TRIGGER update_bc_trigger
+BEFORE COMMIT EXECUTE
+  CALL betweenness_centrality_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges)
+  YIELD *;
+```
+
+
+```cypher
+MERGE (a: Node {id: 0}) MERGE (b: Node {id: 1}) CREATE (a)-[:RELATION]->(b);
+MERGE (a: Node {id: 0}) MERGE (b: Node {id: 2}) CREATE (a)-[:RELATION]->(b);
+MERGE (a: Node {id: 1}) MERGE (b: Node {id: 2}) CREATE (a)-[:RELATION]->(b);
+MERGE (a: Node {id: 2}) MERGE (b: Node {id: 3}) CREATE (a)-[:RELATION]->(b);
+MERGE (a: Node {id: 3}) MERGE (b: Node {id: 4}) CREATE (a)-[:RELATION]->(b);
+MERGE (a: Node {id: 3}) MERGE (b: Node {id: 5}) CREATE (a)-[:RELATION]->(b);
+MERGE (a: Node {id: 4}) MERGE (b: Node {id: 5}) CREATE (a)-[:RELATION]->(b);
+```
+
+
+```cypher
+CALL betweenness_centrality_online.get(True)
+YIELD node, betweenness_centrality
+RETURN node.id AS node_id, betweenness_centrality
+ORDER BY node_id;
+```
+
+
+```plaintext
+β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
+β”‚ node_id                 β”‚ betweenness_centrality  β”‚
+β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
+β”‚ 0                       β”‚ 0                       β”‚
+β”‚ 1                       β”‚ 0                       β”‚
+β”‚ 2                       β”‚ 0.6                     β”‚
+β”‚ 3                       β”‚ 0.6                     β”‚
+β”‚ 4                       β”‚ 0                       β”‚
+β”‚ 5                       β”‚ 0                       β”‚
+β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
+```
+
+
diff --git a/docs2/advanced-algorithms/available-algorithms/biconnected_components.md b/docs2/advanced-algorithms/available-algorithms/biconnected_components.md
new file mode 100644
index 00000000000..6fb983a3133
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/biconnected_components.md
@@ -0,0 +1,136 @@
+---
+id: biconnected-components
+title: biconnected_components
+sidebar_label: biconnected_components
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+  {children}
+);
+
+Finding biconnected components means finding a maximal biconnected subgraph. A subgraph is biconnected if:
+
+- It is possible to go from each node to any other node within the subgraph
+- The first condition still holds after removing any single vertex from the subgraph
+
+The algorithm works by finding articulation points and then traversing from these articulation points toward other nodes, which all together form one biconnected component.
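The articulation-point traversal described above can be sketched with the classic Hopcroft–Tarjan edge-stack DFS. The sketch below is illustrative only, not the module's C++ source; the `biconnected_components` helper name and its edge-list input format are assumptions made for the example:

```python
def biconnected_components(n, edges):
    """Return biconnected components as lists of edges, via one DFS pass.

    When a DFS child v cannot reach above its parent u (low[v] >= depth[u]),
    u is an articulation point, and the edges accumulated on the stack since
    (u, v) was pushed form one biconnected component.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    depth = [None] * n  # DFS depth; None marks an unvisited node
    low = [0] * n       # lowest depth reachable via the node's subtree
    stack = []          # edge stack
    components = []

    def dfs(u, parent, d):
        depth[u] = low[u] = d
        for v in adj[u]:
            if v == parent:
                continue
            if depth[v] is None:            # tree edge
                stack.append((u, v))
                dfs(v, u, d + 1)
                low[u] = min(low[u], low[v])
                if low[v] >= depth[u]:      # u separates v's subtree
                    comp = []
                    while not comp or comp[-1] != (u, v):
                        comp.append(stack.pop())
                    components.append(comp)
            elif depth[v] < depth[u]:       # back edge
                stack.append((u, v))
                low[u] = min(low[u], depth[v])

    for s in range(n):
        if depth[s] is None:
            dfs(s, None, 0)
    return components
```

For instance, a triangle `0-1-2` with a pendant edge `2-3` yields two components: the triangle's three edges and the single bridge edge.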
+ +[![docs-source](https://img.shields.io/badge/source-biconnected_components-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/biconnected_components_module/biconnected_components_module.cpp) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **algorithm** | +| **Implementation** | **C++** | +| **Graph direction** | **undirected** | +| **Edge weights** | **unweighted** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `get()` + +#### Output: + +* `bcc_id` ➑ Biconnected component identifier. There is no order of nodes within one biconnected component. +* `node_from` ➑ First node of an edge contained in biconnected component. +* `node_to` ➑ Second node of an edge contained in biconnected component + +#### Usage: +```cypher +CALL biconnected_components.get() +YIELD bcc_id, node_from, node_to; +``` + +## Example + + + + + + + + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 7}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 7}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 9}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 10}) 
MERGE (b:Node {id: 11}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +CALL biconnected_components.get() +YIELD bcc_id, node_from, node_to +WITH bcc_id, node_from, node_to +MATCH (node_from)-[edge]-(node_to) +RETURN bcc_id, edge, node_from, node_to; +``` + + + + + + +```plaintext ++------------------+------------------+------------------+------------------+ +| bcc_id | edge | node_from | node_to | ++------------------+------------------+------------------+------------------+ +| 0 | [:RELATION] | (:Node {id: 2}) | (:Node {id: 4}) | +| 0 | [:RELATION] | (:Node {id: 3}) | (:Node {id: 4}) | +| 0 | [:RELATION] | (:Node {id: 1}) | (:Node {id: 3}) | +| 0 | [:RELATION] | (:Node {id: 2}) | (:Node {id: 3}) | +| 0 | [:RELATION] | (:Node {id: 1}) | (:Node {id: 2}) | +| 1 | [:RELATION] | (:Node {id: 8}) | (:Node {id: 9}) | +| 2 | [:RELATION] | (:Node {id: 5}) | (:Node {id: 8}) | +| 2 | [:RELATION] | (:Node {id: 7}) | (:Node {id: 8}) | +| 2 | [:RELATION] | (:Node {id: 5}) | (:Node {id: 7}) | +| 3 | [:RELATION] | (:Node {id: 0}) | (:Node {id: 6}) | +| 3 | [:RELATION] | (:Node {id: 5}) | (:Node {id: 6}) | +| 3 | [:RELATION] | (:Node {id: 1}) | (:Node {id: 5}) | +| 3 | [:RELATION] | (:Node {id: 0}) | (:Node {id: 1}) | +| 4 | [:RELATION] | (:Node {id: 10}) | (:Node {id: 11}) | ++------------------+------------------+------------------+------------------+ +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/bipartite_matching.md b/docs2/advanced-algorithms/available-algorithms/bipartite_matching.md new file mode 100644 index 00000000000..555ea0ce43d --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/bipartite_matching.md @@ -0,0 +1,106 @@ +--- +id: bipartite-matching +title: bipartite_matching +sidebar_label: bipartite_matching +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + 
+);
+
+A bipartite graph is a graph in which the vertices can be divided into two independent sets, such that every edge connects a vertex from one set to a vertex from the other. No edge connects two vertices within the same set. A matching in a bipartite graph (bipartite matching) is a set of edges picked so that no two of them share an endpoint. A maximum matching is a matching of maximum cardinality. The algorithm runs in O(|V|*|E|) time, where V is the set of nodes and E is the set of edges.
+
+[![docs-source](https://img.shields.io/badge/source-bipartite_matching-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/bipartite_matching_module/bipartite_matching_module.cpp)
+
+
+| Trait               | Value          |
+| ------------------- | -------------- |
+| **Module type**     | **algorithm**  |
+| **Implementation**  | **C++**        |
+| **Graph direction** | **undirected** |
+| **Edge weights**    | **unweighted** |
+| **Parallelism**     | **sequential** |
+
+## Procedures
+
+
+
+### `max()`
+
+#### Output:
+
+* `maximum_bipartite_matching` ➑ Maximum bipartite matching, i.e., the cardinality of the maximum matching edge subset. If the graph is not bipartite, zero (0) is returned.
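For intuition, the O(|V|*|E|) bound stated above matches Kuhn's augmenting-path approach: each vertex on one side triggers at most one O(|E|) search for an augmenting path. The sketch below is an illustration of that idea, not the module's C++ source; the `max_bipartite_matching` helper and its input format are assumptions made for the example:

```python
def max_bipartite_matching(left, edges):
    """Kuhn's algorithm: grow the matching one augmenting path at a time."""
    adj = {u: [] for u in left}      # left vertex -> reachable right vertices
    for u, v in edges:
        adj[u].append(v)
    match = {}                       # right vertex -> its matched left vertex

    def try_augment(u, seen):
        # DFS for an augmenting path starting at free left vertex u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # Use v if it is free, or if its current partner can move elsewhere.
            if v not in match or try_augment(match[v], seen):
                match[v] = u
                return True
        return False

    # Each left vertex triggers at most one O(|E|) search: O(|V|*|E|) total.
    return sum(try_augment(u, set()) for u in left)
```

On the example graph below (left set {0, 1, 2}, right set {3, 4, 5}), this sketch also finds a maximum matching of cardinality 3.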
+
+#### Usage:
+```cypher
+CALL bipartite_matching.max()
+YIELD maximum_bipartite_matching;
+```
+
+## Example
+
+
+
+```cypher
+MERGE (a:Node {id: 0}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 0}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 1}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 1}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 2}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b);
+```
+
+
+```cypher
+CALL bipartite_matching.max()
+YIELD maximum_bipartite_matching
+RETURN maximum_bipartite_matching;
+```
+
+
+```plaintext
++----------------------------+
+| maximum_bipartite_matching |
++----------------------------+
+| 3                          |
++----------------------------+
```
+
+
diff --git a/docs2/advanced-algorithms/available-algorithms/bridges.md b/docs2/advanced-algorithms/available-algorithms/bridges.md
new file mode 100644
index 00000000000..299b0ad709f
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/bridges.md
@@ -0,0 +1,108 @@
+---
+id: bridges
+title: bridges
+sidebar_label: bridges
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+  {children}
+);
+
+A bridge in a graph is an edge which, if deleted, splits the graph into two disjoint components. This algorithm finds all bridges within the graph. It has various practical uses, such as road or internet network design planning. A bridge can represent a bottleneck in many scenarios, so it is valuable to have an algorithm that detects it.
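Bridge finding can be sketched with Tarjan's single-DFS criterion: a tree edge (u, v) is a bridge exactly when v's subtree has no back edge reaching u or above. The sketch below is illustrative only, not the module's C++ source; the `find_bridges` helper and its edge-list input format are assumptions made for the example:

```python
def find_bridges(n, edges):
    """Return all bridges of an undirected graph in one DFS pass."""
    adj = [[] for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))   # store the edge index so parallel edges
        adj[v].append((u, i))   # are not mistaken for the parent edge
    depth = [None] * n          # DFS depth; None marks an unvisited node
    low = [0] * n               # lowest depth reachable from the subtree
    bridges = []

    def dfs(u, parent_edge, d):
        depth[u] = low[u] = d
        for v, i in adj[u]:
            if i == parent_edge:
                continue
            if depth[v] is None:            # tree edge
                dfs(v, i, d + 1)
                low[u] = min(low[u], low[v])
                if low[v] > depth[u]:       # no back edge over (u, v)
                    bridges.append((u, v))
            else:                           # back edge
                low[u] = min(low[u], depth[v])

    for s in range(n):
        if depth[s] is None:
            dfs(s, None, 0)
    return bridges
```

On the example graph below (a triangle 0-1-2 plus the chain 0-3-4), this sketch reports the same two bridges, (0, 3) and (3, 4).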
+ +[![docs-source](https://img.shields.io/badge/source-bridges-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/bridges_module/bridges_module.cpp) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **algorithm** | +| **Implementation** | **C++** | +| **Graph direction** | **undirected** | +| **Edge weights** | **unweighted** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `get()` + +#### Output: + +* `node_from` ➑ Represents the first node in bridge edge +* `node_to` ➑ Represents the second node in bridge edge + +#### Usage: +```cypher +CALL bridges.get() +YIELD node_from, node_to; +``` + +## Example + + + + + + + + + + + +```cypher +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +CALL bridges.get() YIELD node_from, node_to +WITH node_from, node_to +MATCH (node_from)-[bridge]-(node_to) +RETURN bridge, node_from, node_to; +``` + + + + + + +```plaintext ++-----------------+-----------------+-----------------+ +| bridge | node_from | node_to | ++-----------------+-----------------+-----------------+ +| [:RELATION] | (:Node {id: 3}) | (:Node {id: 4}) | +| [:RELATION] | (:Node {id: 0}) | (:Node {id: 3}) | ++-----------------+-----------------+-----------------+ +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/community_detection.md b/docs2/advanced-algorithms/available-algorithms/community_detection.md new file mode 100644 index 00000000000..3407b59b802 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/community_detection.md @@ -0,0 +1,176 @@ +--- 
+id: community-detection
+title: community_detection
+sidebar_label: community_detection
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+  {children}
+);
+
+This query module enables using the [Louvain method](https://en.wikipedia.org/wiki/Louvain_method)[^1] for community
+detection, using the [Grappolo](https://github.com/Exa-Graph/grappolo) parallel implementation.
+
+The Louvain algorithm belongs to the *modularity maximization* family of community
+detection algorithms. Each node is initially assigned to its own community, and the
+algorithm uses a *greedy heuristic* to search for the community partition with
+the highest modularity score by merging previously obtained communities.
+
+The algorithm is suitable for large-scale graphs as it runs in *O*(*n* log *n*) time
+on a graph with *n* nodes. Further performance gains are obtained by parallelization using
+a distance-1 graph coloring heuristic and a graph coarsening algorithm that aims to preserve communities.
+
+[^1]: [Fast unfolding of communities in large networks](https://arxiv.org/abs/0803.0476),
+Blondel et al.
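The modularity score that the Louvain method greedily maximizes can be computed directly for any community assignment. A minimal sketch for an undirected, unweighted graph follows; it is illustrative only, not the Grappolo implementation, and the `modularity` helper and its input format are assumptions made for the example:

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    Q = sum over communities c of (e_c / m - (d_c / 2m)^2), where e_c is the
    number of intra-community edges, d_c the total degree of c's nodes, and
    m the total number of edges.
    """
    m = len(edges)
    intra = defaultdict(int)    # edges with both endpoints in the community
    degree = defaultdict(int)   # summed degree of the community's nodes
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For the example graph below (two triangles joined by the edge 2-3), assigning nodes 0-2 and 3-5 to separate communities gives Q = 5/14 β‰ˆ 0.357, the partition the module also finds.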
+ +[![docs-source](https://img.shields.io/badge/source-community_detection-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/community_detection_module/community_detection_module.cpp) + +| Trait | Value | +| ------------------------ | ----------------------------------------------------------------------------------------------------------- | +| **Module type** | **algorithm** | +| **Implementation** | **C++** | +| **Graph direction** | **undirected** | +| **Relationship weights** | **weighted** / **unweighted** | +| **Parallelism** | **parallel** | + +## Procedures + + + +### `get(weight, coloring, min_graph_shrink, community_alg_threshold, coloring_alg_threshold)` + +Computes graph communities using the Louvain method. + +#### Input + +* `weight: string (default=null)` ➑ Specifies the default edge weight. If not set, + the algorithm uses the `weight` edge attribute when present and otherwise + treats the graph as unweighted. +* `coloring: boolean (default=False)` ➑ If set, use the graph coloring heuristic for effective parallelization. +* `min_graph_shrink: integer (default=100000)` ➑ The graph coarsening optimization stops upon shrinking the graph to this many nodes. +* `community_alg_threshold: double (default=0.000001)` ➑ Controls how long the algorithm iterates. When the gain in modularity + goes below the threshold, iteration is over. + Valid values are between 0 and 1 (exclusive). +* `coloring_alg_threshold: double (default=0.01)` ➑ If coloring is enabled, controls how long the algorithm iterates. When the + gain in modularity goes below this threshold, a final iteration is performed using the + `community_alg_threshold` value. + Valid values are between 0 and 1 (exclusive); this parameter's value should be higher than `community_alg_threshold`. + +#### Output + +* `node: Vertex` ➑ Graph node. +* `community_id: integer` ➑ Community ID. Defaults to ***-1*** if the node does not belong to any community. 
+ +#### Usage + +```cypher +CALL community_detection.get() +YIELD node, community_id; +``` + +### `get_subgraph(subgraph_nodes, subgraph_relationships, weight, coloring, min_graph_shrink, community_alg_threshold, coloring_alg_threshold)` + +Computes graph communities over a subgraph using the Louvain method. + +#### Input + +* `subgraph_nodes: List[Node]` ➑ List of nodes in the subgraph. +* `subgraph_relationships: List[Relationship]` ➑ List of relationships in the subgraph. +* `weight: str (default=null)` ➑ Specifies the default relationship weight. If not set, + the algorithm uses the `weight` relationship attribute when present and otherwise + treats the graph as unweighted. +* `coloring: bool (default=False)` ➑ If set, use the graph coloring heuristic for effective parallelization. +* `min_graph_shrink: int (default=100000)` ➑ The graph coarsening optimization stops upon shrinking the graph to this many nodes. +* `community_alg_threshold: double (default=0.000001)` ➑ Controls how long the algorithm iterates. When the gain in modularity + goes below the threshold, iteration is over. + Valid values are between 0 and 1 (exclusive). +* `coloring_alg_threshold: double (default=0.01)` ➑ If coloring is enabled, controls how long the algorithm iterates. When the + gain in modularity goes below this threshold, a final iteration is performed using the + `community_alg_threshold` value. + Valid values are between 0 and 1 (exclusive); this parameter's value should be higher than `community_alg_threshold`. + +#### Output + +* `node: Vertex` ➑ Graph node. +* `community_id: int` ➑ Community ID. Defaults to ***-1*** if the node does not belong to any community. 
+ +#### Usage + +```cypher +MATCH (a)-[e]-(b) +WITH COLLECT(a) AS nodes, COLLECT (e) AS relationships +CALL community_detection.get_subgraph(nodes, relationships) +YIELD node, community_id; +``` + +## Example + + + + + + + + + + +```cypher +MERGE (a: Node {id: 0}) MERGE (b: Node {id: 1}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 0}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 1}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 2}) MERGE (b: Node {id: 3}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 3}) MERGE (b: Node {id: 4}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 3}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 4}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b); +``` + + + + + +```cypher +CALL community_detection.get() +YIELD node, community_id +RETURN node.id AS node_id, community_id +ORDER BY node_id; +``` + + + + +```plaintext +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ node_id β”‚ community_id β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ +β”‚ 0 β”‚ 1 β”‚ +β”‚ 1 β”‚ 1 β”‚ +β”‚ 2 β”‚ 1 β”‚ +β”‚ 3 β”‚ 2 β”‚ +β”‚ 4 β”‚ 2 β”‚ +β”‚ 5 β”‚ 2 β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + + + diff --git a/docs2/advanced-algorithms/available-algorithms/community_detection_online.md b/docs2/advanced-algorithms/available-algorithms/community_detection_online.md new file mode 100644 index 00000000000..704d18215ed --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/community_detection_online.md @@ -0,0 +1,255 @@ +--- +id: community-detection-online +title: 
community_detection_online
+sidebar_label: community_detection_online
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+
+ {children}
+
+);
+
+This query module implements the [LabelRankT](https://arxiv.org/abs/1305.2006)
+dynamic community detection algorithm.
+
+LabelRankT belongs to the *label propagation* family of community detection
+algorithms and thus rests upon the idea that individual nodes learn from their
+neighbors what community they belong to.
+
+Being *dynamic* and *efficient*, the algorithm is suitable for large-scale
+graphs. It runs in *O(m)* time and guarantees *O(mn)* space complexity, where
+*m* and *n* are the counts of edges and vertices in the graph, respectively.
+
+Dynamic algorithms such as LabelRankT are especially suited for graph streaming
+solutions such as Memgraph. As updates arrive in a stream, the algorithm avoids
+redundant work by only processing the portion of the graph modified by the
+update.
+
+Furthermore, the algorithm improves upon earlier label propagation methods by
+being deterministic: its results are replicable. Taking edge weight and
+directedness into account generally yields better community quality than
+similar methods achieve and extends LabelRankT’s applicability to a wider set
+of graphs.
+
+[^1] [LabelRankT: Incremental Community Detection in Dynamic Networks via Label
+Propagation](https://arxiv.org/abs/1305.2006), Xie, Jierui et al.
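The core label-propagation loop that LabelRankT builds on can be sketched in a few lines of Python (an illustrative toy only; LabelRankT itself propagates label probability distributions and adds inflation and cutoff operators, which this sketch omits):

```python
from collections import Counter

def label_propagation(adj, max_iterations=100):
    """adj: node -> list of neighbours (undirected). Returns node -> community label."""
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iterations):
        new_labels = {}
        for v, neighbours in adj.items():
            if not neighbours:
                new_labels[v] = labels[v]
                continue
            counts = Counter(labels[u] for u in neighbours)
            top = max(counts.values())
            # Deterministic tie-break: smallest label among the most frequent ones.
            new_labels[v] = min(l for l, c in counts.items() if c == top)
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels

# Two triangles joined by the bridge edge 2-3 resolve into two communities.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(label_propagation(adj))  # {0: 0, 1: 0, 2: 0, 3: 2, 4: 2, 5: 2}
```

Synchronous updates and the deterministic tie-break make the toy replicable, mirroring the determinism the paragraph above credits to LabelRankT.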
+ +[![docs-source](https://img.shields.io/badge/source-community_detection_online-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/community_detection_module/community_detection_online_module.cpp) + +| Trait | Value | +| ------------------- | ----------------------------------------------------------------------------------------------------------- | +| **Module type** | **algorithm** | +| **Implementation** | **C++** | +| **Graph direction** | **directed** / **undirected** | +| **Edge weights** | **weighted** / **unweighted** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `set(directed, weighted, similarity_threshold, exponent, min_value, weight_property, w_selfloop, max_iterations, max_updates)` + +Performs dynamic community detection using the LabelRankT algorithm. + +The default values of the `similarity_threshold`, `exponent` and `min_value` +parameters are not universally applicable, and the actual values should be +determined experimentally. This is especially pertinent to setting the +`min_value` parameter. For example, with the default ***1/10*** value, vertices +of degree greater than 10 are at risk of not being assigned to any community and +the user should check if that is indeed the case. + +#### Input: + +* `directed: boolean (default=False)` ➑ Specifies whether the graph is directed. If not set, + the graph is treated as undirected. +* `weighted: boolean (default=False)` ➑ Specifies whether the graph is weighted. If not set, + the graph is considered unweighted. +* `similarity_threshold: double (default=0.7)` ➑ Maximum similarity between node’s and + its neighbors’ communities for the node to be updated in the ongoing + iteration. +* `exponent: double (default=4)` ➑ Power which community probability vectors are raised + elementwise to. +* `min_value: double (default=0.1)` ➑ Smallest community probability that is not pruned + between iterations. 
+* `weight_property: string (default="weight")` ➑ For weighted graphs, the values of the given
+  relationship property are used as weights in the community detection algorithm.
+* `w_selfloop: double (default=1)` ➑ Each vertex has a self-loop added to smooth the
+  label propagation. This parameter specifies the weight assigned to the
+  self-loops. If the graph is unweighted, this value is ignored.
+* `max_iterations: integer (default=100)` ➑ Maximum number of iterations to run.
+* `max_updates: integer (default=5)` ➑ Maximum number of updates to any node’s community
+  probabilities.
+
+#### Output:
+
+* `node: Vertex` ➑ Graph node.
+* `community_id: integer` ➑ Community ID. If the node is not associated with any
+  community, defaults to ***-1***.
+
+#### Usage:
+
+```cypher
+CALL community_detection_online.set(False, False, 0.7, 4.0, 0.1, "weight", 1, 100, 5)
+YIELD node, community_id;
+```
+
+### `get()`
+
+Returns the latest previously calculated community detection results. If there
+are none, defaults to calling `set()` with default parameters.
+
+#### Output:
+
+* `node: Vertex` ➑ Graph node.
+* `community_id: integer` ➑ Community ID. Defaults to ***-1*** if the node does not belong to any community.
+
+#### Usage:
+
+```cypher
+CALL community_detection_online.get()
+YIELD node, community_id;
+```
+
+### `update(createdVertices, createdEdges, updatedVertices, updatedEdges, deletedVertices, deletedEdges)`
+
+Dynamically updates previously calculated community detection results based on
+changes applied in the latest graph update and returns the results.
+
+#### Input:
+
+* `createdVertices: mgp.List[mgp.Vertex]` ➑ Vertices created in the latest graph
+  update.
+* `createdEdges: mgp.List[mgp.Edge]` ➑ Edges created in the latest graph update.
+* `updatedVertices: mgp.List[mgp.Vertex]` ➑ Vertices updated in the latest graph
+  update.
+* `updatedEdges: mgp.List[mgp.Edge]` ➑ Edges updated in the latest graph update.
+* `deletedVertices: mgp.List[mgp.Vertex]` ➑ Vertices deleted in the latest graph + update. +* `deletedEdges: mgp.List[mgp.Edge]` ➑ Edges deleted in the latest graph update. + +#### Output: + +* `node: Vertex` ➑ Graph node. +* `community_id: integer` ➑ Community ID. If the node is not associated with any + community, defaults to ***-1***. + +#### Usage: + +As there are a total of six complex obligatory parameters, setting the +parameters by hand might be cumbersome. The recommended use of this method is to +call it within a +[trigger](https://memgraph.com/docs/memgraph/database-functionalities/triggers), +making sure beforehand that all [predefined +variables](https://memgraph.com/docs/memgraph/database-functionalities/triggers/#predefined-variables) +are available: + +```cypher +CREATE TRIGGER sample_trigger BEFORE COMMIT +EXECUTE CALL community_detection_online.update(createdVertices, createdEdges, updatedVertices, updatedEdges, deletedVertices, deletedEdges) YIELD node, community_id; +``` + +Communities calculated by `update()` are also accessible by subsequently calling +`get()`: + +```cypher +CREATE TRIGGER sample_trigger BEFORE COMMIT +EXECUTE CALL community_detection_online.update(createdVertices, createdEdges, updatedVertices, updatedEdges, deletedVertices, deletedEdges) YIELD *; + +CALL community_detection_online.get() +YIELD node, community_id +RETURN node.id AS node_id, community_id +ORDER BY node_id; +``` + +### `reset()` + +Resets the algorithm to its initial state. + +#### Output: + +* `message: string` ➑ Reports whether the algorithm was successfully reset. 
+ +#### Usage: + +```cypher +CALL community_detection_online.reset() YIELD message; +``` + +## Example + + + + + + + + + +```cypher +CREATE TRIGGER community_detection_online_trigger BEFORE COMMIT +EXECUTE CALL community_detection_online.update(createdVertices, createdEdges, updatedVertices, updatedEdges, deletedVertices, deletedEdges) YIELD node, community_id +SET node.community_id = community_id; +``` + + + +```cypher +MERGE (a: Node {id: 0}) MERGE (b: Node {id: 1}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 0}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 1}) MERGE (b: Node {id: 2}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 2}) MERGE (b: Node {id: 3}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 3}) MERGE (b: Node {id: 4}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 3}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b); +MERGE (a: Node {id: 4}) MERGE (b: Node {id: 5}) CREATE (a)-[r: Relation]->(b); +``` + + + + +```cypher +CALL community_detection_online.get() +YIELD node, community_id +RETURN node.id AS node_id, community_id +ORDER BY node_id; +``` + + + + +```plaintext ++-------------------------+-------------------------+ +| node_id | community_id | ++-------------------------+-------------------------+ +| 0 | 1 | +| 1 | 1 | +| 2 | 1 | +| 3 | 2 | +| 4 | 2 | +| 5 | 2 | ++-------------------------+-------------------------+ +``` + + + diff --git a/docs2/advanced-algorithms/available-algorithms/conditional_execution.md b/docs2/advanced-algorithms/available-algorithms/conditional_execution.md new file mode 100644 index 00000000000..3e828f20de9 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/conditional_execution.md @@ -0,0 +1,189 @@ +--- +id: conditional_execution +title: conditional_execution +sidebar_label: conditional_execution +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + 
+export const Highlight = ({children, color}) => (
+
+ {children}
+
+);
+
+Your queries might require conditional execution logic that can’t be adequately
+expressed in Cypher. The `do` module makes it possible to define complex logic
+and use it to control query execution.
+
+[![docs-source](https://img.shields.io/badge/source-conditional_execution-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/do.py)
+
+| Trait              | Value                                                 |
+| ------------------ | ----------------------------------------------------- |
+| **Module type**    | **module**                                            |
+| **Implementation** | **Python**                                            |
+| **Parallelism**    | **sequential**                                        |
+
+## Procedures
+
+:::info
+
+Using the following procedures to run queries that execute global operations is
+currently not supported and returns a warning.
+The operations in question are:
+
+* index creation/deletion
+* constraint creation/deletion
+* changing the isolation level globally
+* setting the storage mode
+
+:::
+
+### `case(conditionals, else_query, params)`
+
+Given a list of condition-query pairs, `do.case` executes the query associated
+with the first condition evaluating to `true` (or the `else_query` if none are
+`true`) with the given parameters.
+
+Parameters are prefixed with `$` like `$param_name`. For examples, see
+[here](https://memgraph.com/docs/cypher-manual/other-features#parameters).
+
+#### Input:
+
+* `conditionals: List[Any]` ➑ Variable-length list of condition-query pairs
+  structured as `[condition, query, condition, query, …]`. Conditions are
+  `boolean` and queries are `string`.
+* `else_query: string (default = "")` ➑ The query to be executed if no
+  condition evaluates to `true`.
+* `params: Map (default = NULL)` ➑ If any of the given queries is parameterized,
+  provide a `{param_name: param_value}` map to be applied to them.
+
+#### Output:
+
+* `value: Map` ➑ Contains the result record of the executed query. Each `value` corresponds to one result record.
+
+:::caution
+Currently, the module supports only result records containing `string`, `integer`, `double` or `boolean` values.
+Queries passed in the `conditionals` or `else_query` arguments that return any other type of data will
+raise an exception when executed.
+:::
+
+#### Usage:
+
+```cypher
+MATCH (n)
+WITH size(collect(n)) as n_nodes
+CALL do.case([n_nodes = 0,
+              "RETURN 'empty' AS graph_status;",
+              n_nodes = 1,
+              "RETURN 'one_node' AS graph_status;"],
+             "RETURN 'multiple nodes' AS graph_status;")
+YIELD value
+RETURN value.graph_status AS graph_status;
+```
+
+### `when(condition, if_query, else_query, params)`
+
+`do.when` evaluates the given condition and executes the `if_query` or the
+`else_query` depending on whether the condition is satisfied.
+
+Parameters are prefixed with `$` like `$param_name`. For examples, see
+[here](https://memgraph.com/docs/cypher-manual/other-features#parameters).
+
+#### Input:
+
+* `condition: boolean` ➑ Determines what query to execute.
+* `if_query: string` ➑ The query to be executed if the condition is satisfied.
+* `else_query: string (default = "")` ➑ The query to be executed if the
+  condition isn’t satisfied.
+* `params: Map (default = NULL)` ➑ If `if_query` or `else_query` are parameterized,
+  provide a `{param_name: param_value}` map to be applied.
+
+#### Output:
+
+* `value: Map` ➑ Contains the result record of the executed query. Each `value` corresponds to one result record.
+
+:::caution
+Currently, the module supports only result records containing `string`, `integer`, `double` or `boolean` values.
+Queries passed in the `if_query` or `else_query` arguments that return any other type of data will
+raise an exception when executed.
+::: + +#### Usage: + +```cypher +MATCH (n) +WITH size(collect(n)) as n_nodes +CALL do.when(n_nodes = 0, + "RETURN 'empty' AS graph_status;", + "RETURN 'not empty' as graph_status;") +YIELD value +RETURN value.graph_status AS graph_status; +``` + +## Example + + + + + + + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +MATCH (n:Node) +WITH size(collect(n)) as n_nodes +CALL do.when(n_nodes = 0, + "RETURN 'empty' AS graph_status;", + "RETURN 'not empty' as graph_status;") +YIELD value +RETURN value.graph_status AS graph_status; +``` + + + + + +```plaintext +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ graph_status β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ +β”‚ not empty β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/cugraph.md b/docs2/advanced-algorithms/available-algorithms/cugraph.md new file mode 100644 index 00000000000..bf37c7b9d19 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/cugraph.md @@ -0,0 +1,395 @@ +--- +id: cugraph +title: cugraph +sidebar_label: cugraph +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + +{children} + +); + +**NVIDIA cuGraph** is a graph analytics library that is part of NVIDIA’s +[**RAPIDS**](https://rapids.ai/) open-source data science suite containing +machine learning tools and libraries for various applications in data 
science;
+it can be used from Memgraph on machines that meet the [**system
+requirements**](https://rapids.ai/start.html#requirements).
+
+This set of modules is built on top of NVIDIA cuGraph and provides a set of
+wrappers for most of the algorithms present in the
+[**cuGraph**](https://github.com/rapidsai/cugraph) repository.
+
+[![docs-source](https://img.shields.io/badge/source-cugraph-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/cugraph_module)
+
+| Trait               | Value                       |
+| ------------------- | --------------------------- |
+| **Module type**     | **module**                  |
+| **Implementation**  | **CUDA**                    |
+| **Graph direction** | **undirected**/**directed** |
+| **Edge weights**    | **unweighted**/**weighted** |
+| **Parallelism**     | **parallelized**            |
+
+## Modules
+
+:::info
+
+The **cugraph** module is a collection of distinct GPU-powered modules with
+their own procedures.
+
+:::
+
+## `cugraph.balanced_cut_clustering`
+
+### Procedures
+
+### `get(num_clusters, num_eigenvectors, ev_tolerance, ev_max_iter, kmean_tolerance, kmean_max_iter, weight_property)`
+
+Find the balanced cut clustering of the graph’s nodes.
+
+#### Input:
+
+- `num_clusters: integer` ➑ Number of clusters.
+- `num_eigenvectors: integer (default=2)` ➑ Number of eigenvectors to be used (must be less
+  than or equal to `num_clusters`).
+- `ev_tolerance: float (default=0.00001)` ➑ Tolerance used by the eigensolver.
+- `ev_max_iter: integer (default=100)` ➑ Maximum number of iterations for the eigensolver.
+- `kmean_tolerance: float (default=0.00001)` ➑ Tolerance used by the k-means solver.
+- `kmean_max_iter: integer (default=100)` ➑ Maximum number of iterations for the k-means
+  solver.
+- `weight_property: string (default="weight")` ➑ The values of the given relationship
+  property are used as weights by the algorithm.
If this property is not set for
+  a relationship, the fallback value is `1.0`.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph node.
+- `cluster: integer` ➑ Cluster of a node.
+
+#### Usage:
+
+```cypher
+CALL cugraph.balanced_cut_clustering.get(3)
+YIELD node, cluster
+RETURN node, cluster;
+```
+
+## `cugraph.betweenness_centrality`
+
+### Procedures
+
+### `get(normalized, directed, weight_property)`
+
+Find betweenness centrality scores for all nodes in the graph.
+
+#### Input:
+
+- `normalized: boolean (default=True)` ➑ Normalize the output.
+- `directed: boolean (default=True)` ➑ Graph directedness.
+- `weight_property: string (default="weight")` ➑ The values of the given relationship
+  property are used as weights by the algorithm. If this property is not set for
+  a relationship, the fallback value is `1.0`.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph node.
+- `betweenness_centrality: float` ➑ Betweenness centrality score of a node.
+
+#### Usage:
+
+```cypher
+CALL cugraph.betweenness_centrality.get()
+YIELD node, betweenness_centrality
+RETURN node, betweenness_centrality;
+```
+
+## `cugraph.generator`
+
+### Procedures
+
+### `rmat(scale, num_edges, node_labels, edge_type, a, b, c, seed, clip_and_flip)`
+
+Generate a graph using a Recursive MATrix (R-MAT) graph generation algorithm and
+load it in Memgraph.
+
+#### Input:
+
+- `scale: integer (default=4)` ➑ Scale factor to set the number of vertices in the graph.
+- `num_edges: integer (default=100)` ➑ Number of edges in the generated graph.
+- `node_labels: mgp.List[string] (default=[])` ➑ Labels on created vertices.
+- `edge_type: string (default="RELATIONSHIP")` ➑ Edge type; defines the name of the
+  relationship.
+- `a: double (default=0.57)` ➑ First partition probability.
+- `b: double (default=0.19)` ➑ Second partition probability.
+- `c: double (default=0.19)` ➑ Third partition probability.
+- `seed: integer (default=0)` ➑ Random number generator (RNG) seed value.
+- `clip_and_flip: boolean (default=False)` ➑ Controls whether to generate edges only in the
+  lower triangular part (including the diagonal) of the graph adjacency matrix
+  (if set to `True`) or not (if set to `False`).
+
+#### Output:
+
+The generated graph is loaded into Memgraph.
+
+- `message: string` ➑ Success message if the graph is loaded.
+
+#### Usage:
+
+```cypher
+CALL cugraph.generator.rmat() YIELD *;
+```
+
+## `cugraph.hits`
+
+### Procedures
+
+### `get(tolerance, max_iterations, normalized, directed)`
+
+Find HITS authority and hub values for all nodes in the graph. The HITS
+algorithm computes two numbers for each node: its _authority_, which estimates
+the value of its content, and its _hub value_, which estimates the value of its
+links to other nodes.
+
+Although the HITS algorithm was designed for directed graphs, this implementation
+does not check whether the input graph is directed and will also execute on
+undirected graphs.
+
+#### Input:
+
+- `tolerance: float (default=1e-5)` ➑ HITS approximation tolerance (custom values not
+  supported by NVIDIA cuGraph).
+- `max_iterations: integer (default=100)` ➑ Maximum number of iterations before returning an
+  answer (custom values not supported by NVIDIA cuGraph).
+- `normalized: boolean (default=True)` ➑ Normalize the output (`False` not supported by
+  NVIDIA cuGraph).
+- `directed: boolean (default=True)` ➑ Graph directedness.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph node.
+- `hubs: float` ➑ Hub value of a node.
+- `authorities: float` ➑ Authority value of a node.
+
+#### Usage:
+
+```cypher
+CALL cugraph.hits.get()
+YIELD node, hubs, authorities
+RETURN node, hubs, authorities;
+```
+
+## `cugraph.katz_centrality`
+
+### Procedures
+
+### `get(alpha, beta, epsilon, max_iterations, normalized, directed)`
+
+Find Katz centrality scores for all nodes in the graph.
+
+#### Input:
+
+- `alpha: float (default=None)` ➑ Attenuation factor defining the walk length importance.
+  If not specified, calculated as `1 / max(out_degree)`.
+- `beta: float (default=1.0)` ➑ Weight scalar (currently not supported by NVIDIA
+  cuGraph).
+- `epsilon: float (default=1e-6)` ➑ Tolerance for the approximation; this
+  parameter should be a value of small magnitude.
+- `max_iterations: integer (default=100)` ➑ Maximum number of iterations before returning an
+  answer.
+- `normalized: boolean (default=True)` ➑ Normalize the output.
+- `directed: boolean (default=True)` ➑ Graph directedness.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph node.
+- `katz_centrality: float` ➑ Katz centrality score of a node.
+
+#### Usage:
+
+```cypher
+CALL cugraph.katz_centrality.get()
+YIELD node, katz_centrality
+RETURN node, katz_centrality;
+```
+
+## `cugraph.leiden`
+
+### Procedures
+
+### `get(max_iterations, resolution)`
+
+Find the partition of the graph into communities using the Leiden method.
+
+#### Input:
+
+- `max_iterations: integer (default=100)` ➑ Maximum number of iterations (levels) of the
+  algorithm.
+- `resolution: float (default=1.0)` ➑ Controls community size (lower values lead to
+  fewer, larger communities and vice versa).
+
+#### Output:
+
+- `node: Vertex` ➑ Graph node.
+- `partition: integer` ➑ Partition of a node.
+
+#### Usage:
+
+```cypher
+CALL cugraph.leiden.get()
+YIELD node, partition
+RETURN node, partition;
+```
+
+## `cugraph.louvain`
+
+### Procedures
+
+### `get(max_iterations, resolution, directed)`
+
+Find the partition of the graph into communities using the Louvain method.
+
+#### Input:
+
+- `max_iterations: integer (default=100)` ➑ Maximum number of iterations (levels) of the
+  algorithm.
+- `resolution: float (default=1.0)` ➑ Controls community size (lower values lead to
+  fewer, larger communities and vice versa).
+- `directed: boolean (default=True)` ➑ Graph directedness.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph node.
+- `partition: integer` ➑ Partition of a node.
+
+#### Usage:
+
+```cypher
+CALL cugraph.louvain.get()
+YIELD node, partition
+RETURN node, partition;
+```
+
+## `cugraph.pagerank`
+
+### Procedures
+
+### `get(max_iterations, damping_factor, stop_epsilon, weight_property)`
+
+Find PageRank scores for all nodes in the graph.
+
+#### Input:
+
+- `max_iterations: integer (default=100)` ➑ The maximum number of iterations before returning
+  an answer. Use it to limit the execution time or do an early exit before the
+  solver reaches the convergence tolerance.
+- `damping_factor: float (default=0.85)` ➑ The damping factor represents the probability
+  of following an outgoing edge.
+- `stop_epsilon: float (default=1e-5)` ➑ The convergence tolerance for PageRank
+  approximation. Lowering the tolerance improves the approximation, but setting this
+  parameter too low can result in non-convergence due to numerical round-off.
+  Values between `0.01` and `0.00001` are usually acceptable.
+- `weight_property: string (default="weight")` ➑ The values of the given relationship
+  property are used as weights by the algorithm. If this property is not set for
+  a relationship, the fallback value is `1.0`.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph node.
+- `pagerank: float` ➑ PageRank score of a node.
+
+#### Usage:
+
+```cypher
+CALL cugraph.pagerank.get()
+YIELD node, pagerank
+RETURN node, pagerank;
+```
+
+## `cugraph.personalized_pagerank`
+
+### Procedures
+
+### `get(personalization_vertices, personalization_values, max_iterations, damping_factor, stop_epsilon, weight_property)`
+
+Find personalized PageRank scores for all nodes in the graph.
+
+#### Input:
+
+- `personalization_vertices: mgp.List[mgp.Vertex]` ➑ Graph nodes with
+  personalization values.
+- `personalization_values: mgp.List[float]` ➑ Personalization values of the
+  above nodes.
+- `weight_property: string (default="weight")` ➑ The values of the given relationship
+  property are used as weights by the algorithm. If this property is not set for
+  a relationship, the fallback value is `1.0`.
+- `damping_factor: float (default=0.85)` ➑ The damping factor represents the probability
+  of following an outgoing edge.
+- `stop_epsilon: float (default=1e-5)` ➑ The convergence tolerance for PageRank
+  approximation. Lowering the tolerance improves the approximation, but setting this
+  parameter too low can result in non-convergence due to numerical round-off.
+  Values between `0.01` and `0.00001` are usually acceptable.
+- `max_iterations: integer (default=100)` ➑ The maximum number of iterations before returning
+  an answer. Use it to limit the execution time or do an early exit before the
+  solver reaches the convergence tolerance.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph node.
+- `pagerank: float` ➑ Personalized PageRank score of a node.
+
+#### Usage:
+
+```cypher
+MATCH (n: Node {id: 1}), (m: Node {id: 2})
+CALL cugraph.personalized_pagerank.get([n, m], [0.2, 0.5])
+YIELD node, pagerank
+RETURN node, pagerank;
+```
+
+## `cugraph.spectral_clustering`
+
+### Procedures
+
+### `get(num_clusters, num_eigenvectors, ev_tolerance, ev_max_iter, kmean_tolerance, kmean_max_iter, weight_property)`
+
+Find the spectral clustering of the graph’s nodes.
+
+#### Input:
+
+- `num_clusters: integer` ➑ Number of clusters.
+- `num_eigenvectors: integer (default=2)` ➑ Number of eigenvectors to be used (must be less
+  than or equal to `num_clusters`).
+- `ev_tolerance: float (default=0.00001)` ➑ Tolerance used by the eigensolver.
+- `ev_max_iter: integer (default=100)` ➑ Maximum number of iterations for the eigensolver.
+- `kmean_tolerance: float (default=0.00001)` ➑ Tolerance used by the k-means solver.
+- `kmean_max_iter: integer (default=100)` ➑ Maximum number of iterations for the k-means
+  solver.
+- `weight_property: string (default="weight")` ➑ The values of the given relationship
+  property are used as weights by the algorithm. If this property is not set for
+  a relationship, the fallback value is `1.0`.
+
+#### Output:
+
+- `node: Vertex` ➑ Graph node.
+- `cluster: integer` ➑ Cluster of a node.
+
+#### Usage:
+
+```cypher
+CALL cugraph.spectral_clustering.get(3)
+YIELD node, cluster
+RETURN node, cluster;
+```
diff --git a/docs2/advanced-algorithms/available-algorithms/cycles.md b/docs2/advanced-algorithms/available-algorithms/cycles.md
new file mode 100644
index 00000000000..f401208de87
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/cycles.md
@@ -0,0 +1,114 @@
+---
+id: cycles
+title: cycles
+sidebar_label: cycles
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+
+ {children}
+
+);
+
+In graph theory, a cycle is a path in which only the starting and ending nodes
+are equal; double-connected links between neighboring nodes and self-loops also
+count as cycles. The cycle detection algorithm implemented in MAGE works on an
+undirected graph and gives **no guarantee** of node order in the output.
+
+The implemented algorithm (Gibb) is described in the 1982 MIT report
+"[Algorithmic approaches to circuit enumeration problems and applications](http://hdl.handle.net/1721.1/68106)" [^1].
+The problem is not solvable in polynomial time: the algorithm enumerates all
+subsets of the fundamental cycles, which takes about O(2^(|E|-|V|+1)) time,
+where E is the set of edges and V the set of vertices of the given graph.
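The exponential factor in the bound corresponds to the size of the cycle space: a graph with |E| edges, |V| vertices and c connected components has |E| - |V| + c fundamental cycles (|E| - |V| + 1 when connected), and every cycle is a combination of a subset of them. A short illustrative Python sketch (not part of the module) that computes this count with a union-find pass over the edge list:

```python
def fundamental_cycle_count(num_vertices, edges):
    """Return |E| - |V| + c, the number of fundamental cycles, where c is the
    number of connected components."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_vertices
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:       # edge joins two components
            parent[ru] = rv
            components -= 1
    return len(edges) - num_vertices + components

# 5 vertices, 6 edges, connected: 6 - 5 + 1 = 2 fundamental cycles, so an
# exhaustive enumeration inspects up to 2**2 subsets of them.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
print(fundamental_cycle_count(5, edges))  # 2
```

Note that a doubled edge between two vertices already contributes one fundamental cycle, matching the "double-connected links" case above.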
+
+[^1] [Algorithmic approaches to circuit enumeration problems and applications](http://hdl.handle.net/1721.1/68106), Boon Chai Lee
+
+[![docs-source](https://img.shields.io/badge/source-cycles-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/cycles_module/cycles_module.cpp)
+
+| Trait               | Value                                                 |
+| ------------------- | ----------------------------------------------------- |
+| **Module type**     | **algorithm**                                         |
+| **Implementation**  | **C++**                                               |
+| **Graph direction** | **undirected**                                        |
+| **Edge weights**    | **unweighted**                                        |
+| **Parallelism**     | **sequential**                                        |
+
+## Procedures
+
+### `get()`
+
+#### Output:
+
+* `cycle_id` ➑ Incremental ID of the cycle a node belongs to. There is no guarantee on how nodes are ordered within a cycle. A cycle represented by a single ID stands for a self-loop.
+* `node` ➑ Vertex object, with all its properties, associated with the cycle ID it belongs to.
+
+#### Usage:
+
+```cypher
+CALL cycles.get()
+YIELD cycle_id, node;
+```
+
+## Example
+
+
+
+```cypher
+MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 0}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 2}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 2}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b);
+```
+
+
+
+```cypher
+CALL cycles.get()
+YIELD cycle_id, node
+RETURN cycle_id, node;
+```
+
+
+
+```plaintext
++-----------------+-----------------+
+| cycle_id        | node            |
++-----------------+-----------------+
+| 0               | (:Node {id: 2}) |
+| 0               | (:Node {id: 0}) |
+| 0               | (:Node {id: 1}) |
+| 1               | (:Node {id: 4}) |
+| 1               | (:Node {id: 2}) |
+| 1               | (:Node {id: 3}) |
++-----------------+-----------------+
+```
+
+
diff --git
a/docs2/advanced-algorithms/available-algorithms/degree_centrality.md b/docs2/advanced-algorithms/available-algorithms/degree_centrality.md
new file mode 100644
index 00000000000..e49aa11f4e6
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/degree_centrality.md
@@ -0,0 +1,150 @@
+---
+id: degree-centrality
+title: degree_centrality
+sidebar_label: degree_centrality
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+
+{children}
+
+);
+
+**Degree Centrality** is the basic centrality measure that counts the number of
+edges adjacent to a node. For directed graphs, we define an in-degree measure,
+the number of incoming edges, and an out-degree measure, the number of outgoing
+edges.
+
+Let $A = (a_{i,j})$ be the adjacency matrix of a directed graph. The in-degree centrality $x_{i}$ of node $i$ is given by:
+
+$$x_{i} = \sum_k a_{k,i}$$
+
+or in matrix form ($1$ is a vector with all components equal to unity):
+
+$$x = 1 A$$
+
+The out-degree centrality $y_{i}$ of node $i$ is given by:
+
+$$y_{i} = \sum_k a_{i,k}$$
+
+or in matrix form:
+
+$$y = A 1$$
+
+[![docs-source](https://img.shields.io/badge/source-degree_centrality-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/degree_centrality_module/algorithm/degree_centrality_module.cpp)
+
+| Trait               | Value                   |
+| ------------------- | ----------------------- |
+| **Module type**     | **algorithm**           |
+| **Implementation**  | **C++**                 |
+| **Graph direction** | **directed/undirected** |
+| **Edge weights**    | **unweighted**          |
+| **Parallelism**     | **sequential**          |
+
+## Procedures
+
+### `get(type)`
+
+#### Input:
+
+- `type: string (default="undirected")` ➑ Whether to use "in", "out", or
+"undirected" edges.
+
+#### Output:
+
+- `node` ➑ Node in the graph, for which Degree Centrality is calculated.
+- `degree` ➑ Calculated degree of a node.
+ +#### Usage: + +```cypher +CALL degree_centrality.get() +YIELD node, degree; +``` + +### `get_subgraph(nodes, relationships, type)` + +#### Input: + +- `nodes: list[node]` ➑ nodes to be used in the algorithm. +- `relationships: list[relationship]` ➑ relationships to be considered for +degree calculation. +- `type: string (default="undirected")` ➑ whether we are using "in", "out", or +"undirected" edges. + +#### Output: + +- `node` ➑ Node in the graph, for which Degree Centrality is calculated. +- `degree` ➑ Calculated degree of a node. + +#### Usage: + +```cypher +CALL degree_centrality.get() +YIELD node, degree; +``` + +## Example + + + + + + + + + +```cypher +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 6}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 7}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 9}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 10}) MERGE (b:Node {id: 9}) CREATE (a)-[:RELATION]->(b); +``` + + + + +```cypher +CALL degree_centrality.get("in") +YIELD node, degree +RETURN node, degree; +``` 
+ + + + +```plaintext ++------------------+------------------+ +| node | degree | ++------------------+------------------+ +| (:Node {id: 9}) | 1 | +| (:Node {id: 7}) | 0 | +| (:Node {id: 6}) | 0 | +| (:Node {id: 5}) | 0 | +| (:Node {id: 4}) | 0 | +| (:Node {id: 3}) | 0 | +| (:Node {id: 8}) | 1 | +| (:Node {id: 2}) | 5 | +| (:Node {id: 10}) | 7 | +| (:Node {id: 0}) | 1 | +| (:Node {id: 1}) | 1 | ++------------------+------------------+ + +``` + + + diff --git a/docs2/advanced-algorithms/available-algorithms/distance_calculator.md b/docs2/advanced-algorithms/available-algorithms/distance_calculator.md new file mode 100644 index 00000000000..bfcd3fde1f8 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/distance_calculator.md @@ -0,0 +1,137 @@ +--- +id: distance_calculator +title: distance_calculator +sidebar_label: distance_calculator +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +The distance calculator is a module for calculating distance between two geographic locations. It measures the distance along the surface of the earth. +Formula takes into consideration the radius of the earth. 
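The module's exact formula lives in the linked source; as an illustration, the standard haversine great-circle formula computes such a distance. A hedged Python sketch, assuming a mean Earth radius of 6371 km (the module may use a slightly different constant):

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption)

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two (lat, lng) points."""
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Zagreb -> Zadar, the same coordinate pair used in the example further below
print(round(haversine_km(45.8150, 15.9819, 44.1194, 15.2314), 3))
```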
For this algorithm, it is necessary to define an object that has longitude and latitude properties, like this:

```cypher
(location:Location {lat: 44.1194, lng: 15.2314})
```

[![docs-source](https://img.shields.io/badge/source-distance_calculator-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/distance_calculator.py)

| Trait | Value |
| ------------------- |-------------------------------------------------------|
| **Module type** | **module** |
| **Implementation** | **Python** |
| **Graph direction** | **undirected** |
| **Edge weights** | **unweighted** |
| **Parallelism** | **sequential** |

## Procedures

### `single(start, end, metrics, decimals)`

#### Input:

* `start: Vertex` ➑ Starting point of the distance measurement. Required to have *lng* and *lat* properties.
* `end: Vertex` ➑ Ending point of the distance measurement. Required to have *lng* and *lat* properties.
* `metrics: string` ➑ Can be either "m" or "km", standing for meters and kilometers respectively.
* `decimals: int` ➑ Number of decimal places to which the result is rounded.

#### Output:

* `distance: double` ➑ The distance (in 'm' or 'km') between the two points, calculated from their latitude and longitude properties.

#### Usage:
```cypher
MATCH (n:Location), (m:Location)
CALL distance_calculator.single(m, n, 'km')
YIELD distance
RETURN distance;
```

### `multiple(start_points, end_points, metrics, decimals)`

#### Input:

* `start_points: List[Vertex]` ➑ Starting points of the distance measurements, collected in a list. Required to have *lng* and *lat* properties. Must be of the same size as *end_points*.
* `end_points: List[Vertex]` ➑ Ending points of the distance measurements, collected in a list. Required to have *lng* and *lat* properties. Must be of the same size as *start_points*.
* `metrics: string` ➑ Can be either "m" or "km", standing for meters and kilometers respectively.
+* `decimals:int` ➑ Number of decimals on which you want to round up number. + +#### Output: + +* `distance: List[double]` ➑ The final result obtained by calculating distance (in meters) between the 2 points who each have its latitude and longitude. + +#### Usage: +```cypher +MATCH (n), (m) +WITH COLLECT(n) AS location_set1, COLLECT(m) AS location_set2 +CALL distance_calculator.multiple(location_set1, location_set2, 'km') YIELD distances +RETURN distances; +``` + +## Example + + + + + + + + + + + +```cypher +CREATE (location:Location {name: 'Zagreb', lat: 45.8150, lng: 15.9819}); +CREATE (location:Location {name: 'Zadar', lat: 44.1194, lng: 15.2314}); +``` + + + + + +```cypher +MATCH (n {name: 'Zagreb'}), (m {name: 'Zadar'}) +CALL distance_calculator.single(n, m, 'km') YIELD distance +RETURN distance; +``` + + + + + + +```plaintext ++----------+ +| distance | ++----------+ +| 197.568 | ++----------+ +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/elasticsearch_synchronization.md b/docs2/advanced-algorithms/available-algorithms/elasticsearch_synchronization.md new file mode 100644 index 00000000000..4c468d6089e --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/elasticsearch_synchronization.md @@ -0,0 +1,342 @@ +--- +id: elasticsearch_synchronization +title: elasticsearch_synchronization +sidebar_label: elasticsearch_synchronization +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + +{children} + +); + +**Elasticsearch** is a **text-processing platform** that can be used to enhance the capabilities of a graph database like Memgraph. It offers many fine-grained features useful when working on a text that is impossible to develop in databases. 
Data residing in Elasticsearch and Memgraph should be **synchronized** because otherwise, the whole system could be in an inconsistent state. Such a feature can be added inside Memgraph by using triggers: every time a new entity is added (node or edge) it gets indexed to the Elasticsearch index. + +[![docs-source](https://img.shields.io/badge/source-Elasticsearch_sync-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/elastic_search_serialization.py) [![Related - Blog +Post](https://img.shields.io/static/v1?label=Related&message=Blog%20post&color=9C59DB&style=for-the-badge)](https://memgraph.com/blog/synchronize-data-between-memgraph-graph-database-and-elasticsearch) + +The module supports the following features: +- creating **Elasticsearch index** from Memgraph clients using Cypher +- **indexing all data** inside Memgraph to Elasticsearch indexes +- managing Elasticsearch **authentication** in a secure way +- indexing entities (nodes and edges) as they are being inserted into the database **without reindexing old data** +- **scanning and searching documents** from Elasticsearch indexes using **Query DSL** +- reindexing existing documents from Elasticsearch + +When using **Elasticsearch synchronization modules**: + +1. start Elasticsearch instance and securely store **username, password, path to the certificate file and instance's URL** +2. connect to the instance by calling the `connect` method +3. use the `create_index` method to create Elasticsearch indexes for nodes and edges +4. index all entities inside the database using the `index_db` method +5. check that documents were indexed correctly using the `scan` or `search` method + +## Procedures + + + +The module for synchronizing Elasticsearch with Memgraph is organized as a **stateful** module where it is expected that the user performs a sequence of operations using a managed secure connection to Elasticsearch. 
The user can **use indexes that already exist** inside Elasticsearch but can also choose **to create new ones with custom schema**. Indexing can be performed in two ways: +1. **index all data** residing inside the database +2. **incrementally index entities** as they get inserted into the database by using triggers. Find more information about triggers in the [reference guide](https://memgraph.com/docs/memgraph/reference-guide/triggers) or check how to [set up triggers](https://memgraph.com/docs/memgraph/how-to-guides/set-up-triggers). Essentially, triggers offer a way of executing a specific procedure upon some event. + + +### `connect()` + +The `connect()` method is used for connecting to the Elasticsearch instance using Memgraph. It uses a **basic authentication scheme with username, password and certificate**. + +#### **Input**: +- `elastic_url: str` -> URL for connecting to the Elasticsearch instance. +- `ca_certs: str` -> Path to the certificate file. +- `elastic_user: str` -> The user trying to connect to Elasticsearch. +- `elastic_password: str` -> User's password for connecting to Elasticsearch. + +#### **Output**: +- `connection_status: Dict[str, str]` -> Connection info + + +An example of how you can use this method to connect to the Elasticsearch instance: +``` +CALL elastic_search_serialization.connect("https://localhost:9200", "~/elasticsearch-8.4.3/config/certs/http_ca.crt", , ) YIELD *; +``` + +### `create_index()` +The method used for creating Elasticsearch indexes. + +#### **Input**: +- `index_name: str` -> Name of the index that needs to be created. +- `schema_path: str` -> Path to the schema from where the index will be loaded. +- `schema_parameters: Dict[str, Any]` + - `number_of_shards: int` -> Number of shards index will use. + - `number_of_replicas: int` -> Number of replicas index will use. + - `analyzer: str` -> Custom analyzer, which can be set to any legal Elasticsearch analyzer. 
#### **Output**:
- `message_status: Dict[str, str]` -> Output from the Elasticsearch instance indicating whether the index was successfully created.

Use the following query to create Elasticsearch indexes:
```
CALL elastic_search_serialization.create_index("edge_index",
"edge_index_path_schema.json", {analyzer: "mem_analyzer", index_type: "edge"}) YIELD *;
```

### `index_db()`
The method serializes all vertices and relationships in Memgraph to an Elasticsearch instance. By tuning the `thread_count`, `chunk_size`, `max_chunk_bytes` and `queue_size` parameters, it is possible to find a performance sweet spot when indexing large quantities of documents.

#### **Input**
- `node_index: str` -> The name of the **node index**. Can be used for both **streaming and parallel bulk**.
- `edge_index: str` -> The name of the **edge index**. Can be used for both **streaming and parallel bulk**.
- `thread_count: int` -> **Size of the threadpool** to use for the bulk requests.
- `chunk_size: int` -> The number of docs in one chunk sent to Elasticsearch (default: 500).
- `max_chunk_bytes: int` -> The maximum size of the request in bytes (default: 100MB).
- `raise_on_error: bool` -> Raise `BulkIndexError` containing errors (as `.errors`) from the execution of the last chunk when some occur. By default, it's raised.
- `raise_on_exception: bool` -> If `False`, don't propagate exceptions from the call to bulk and just report the items that failed as failed.
- `max_retries: int` -> Maximum number of times a document will be retried when a 429 is received; set to 0 (default) for no retries on 429.
- `initial_backoff: float` -> The number of seconds to wait before the first retry. Any subsequent retry waits `initial_backoff * 2**retry_number` seconds.
- `max_backoff: float` -> The maximum number of seconds a retry will wait.
- `yield_ok: bool` -> If set to `False`, successful documents will be skipped in the output.
+- `queue_size: int` -> Size of the **task queue** between the **main thread (producing chunks to send) and the processing threads**. + +The method can be called in a following way: +``` +CALL elastic_search_serialization.index_db("node_index", "edge_index", 5, 256, 104857600, True, False, 2, 2.0, 600.0, True, 2) YIELD *; +``` + +#### **Output** +- `number_of_nodes: int` -> Number of indexed nodes. +- `number_of_edges: int` -> Number of indexed edges. + +### `index()` +The method is meant to be used in combination with triggers for incrementally indexing incoming data and it shouldn't be called by a user explicitly. Check out our [docs](https://memgraph.com/docs/memgraph/reference-guide/triggers) where it is explained how Memgraph handles objects captured by various triggers. + +#### **Input** +- `createdObjects: List[Dict[str, Object]]` -> Objects that are captured by a create trigger. +- `node_index: str` -> The name of the **node index**. Can be used for both **streaming and parallel bulk**. +- `edge_index: str` -> The name of the **edge index**. Can be used for both **streaming and parallel bulk**. +- `thread_count: int` -> **Size of the threadpool** to use for the bulk requests. +- `chunk_size: int` -> The number of docs in one chunk sent to Elasticsearch (default: 500). +- `max_chunk_bytes: int` -> The maximum size of the request in bytes (default: 100MB). +- `raise_on_error: bool` -> Raise `BulkIndexError` containing errors (as .errors) from the execution of the last chunk when some occur. By default, it's raised. +- `raise_on_exception: bool` -> If `False` then don’t propagate exceptions from call to bulk and just report the items that failed as failed. +- `max_retries: int` -> Maximum number of times a document will be retried when 429 is received, set to 0 (default) for no retries on 429. +- `initial_backoff: float` -> The number of seconds we should wait before the first retry. 
Any subsequent retries will be powers of `initial_backoff * 2**retry_number` +- `max_backoff: float` -> The maximum number of seconds a retry will wait. +- `yield_ok: float` -> If set to `False` will skip successful documents in the output. +- `queue_size: int` -> Size of the **task queue** between the **main thread (producing chunks to send) and the processing threads**. + +The method can be used in a following way: +``` +CREATE TRIGGER elastic_search_create +ON CREATE AFTER COMMIT EXECUTE +CALL elastic_search_serialization.index(createdObjects, "docs_nodes", "docs_edges") YIELD * RETURN *; +``` + +#### **Output** +- `number_of_nodes: int` -> Number of indexed nodes. +- `number_of_edges: int` -> Number of indexed edges. + + + +#### **Input** +- `node_index: str` -> The name of the **node index**. Can be used for both **streaming and parallel bulk**. +- `edge_index: str` -> The name of the **edge index**. Can be used for both **streaming and parallel bulk**. +- `chunk_size: int` -> The number of docs in one chunk sent to Elasticsearch (default: 500). +- `max_chunk_bytes: int` -> The maximum size of the request in bytes (default: 100MB). +- `raise_on_error: bool` -> Raise `BulkIndexError` containing errors (as .errors) from the execution of the last chunk when some occur. By default, it's raised. +- `raise_on_exception: bool` -> If `False` then don’t propagate exceptions from call to bulk and just report the items that failed as failed. +- `max_retries: int` -> Maximum number of times a document will be retried when 429 is received, set to 0 (default) for no retries on 429. +- `initial_backoff: float` -> The number of seconds we should wait before the first retry. Any subsequent retries will be powers of `initial_backoff * 2**retry_number` +- `max_backoff: float` -> The maximum number of seconds a retry will wait. +- `yield_ok: float` -> If set to `False` will skip successful documents in the output. 
- `thread_count: int` -> **Size of the threadpool** to use for the bulk requests.
- `queue_size: int` -> Size of the **task queue** between the **main thread (producing chunks to send) and the processing threads**.

The method can be called in the following way:
```
CALL elastic_search_serialization.index_db("node_index", "edge_index", 5, 256, 104857600, True, False, 2, 2.0, 600.0, True, 2) YIELD *;
```

#### **Output**
- `number_of_nodes: int` -> Number of indexed nodes.
- `number_of_edges: int` -> Number of indexed edges.

### `reindex()`
**Reindex all documents** that satisfy a given query from one index to another, potentially (if `target_client` is specified) on a different cluster. If you don't specify a query, all documents are reindexed.

#### **Input**
- `updatedObjects: List[Dict[str, Any]]` -> List of all objects that **were updated and then sent as arguments to this method** with the help of the update trigger.
- `source_index: Union[str, List[str]]` -> Identifies the **source index (or more of them)** from which documents are read.
- `target_index: str` -> Identifies the **target index** to which documents are written.
- `query: str` -> Query written as JSON.
- `chunk_size: int` -> Number of docs in one chunk sent to Elasticsearch (default: 500).
- `scroll: str` -> Specifies how long **a consistent view of the index** should be maintained for scrolled search.
- `op_type: Optional[str]` -> Explicit operation type. Defaults to `_index`. Data streams must be set to `create`. If not specified, the method will auto-detect whether `target_index` is a data stream.

#### **Output**
- `response: str` -> Number of documents matched by the query in the `source_index`.
To reindex all documents from the `source_index` to the `destination_index`, use the following query:
```
CALL elastic_search_serialization.reindex("source_index", "destination_index", "{\"query\": {\"match_all\": {}}} ") YIELD * RETURN *;
```

### `scan()`
Fetches all documents from the index specified by `index_name` that match the query. Supports pagination.

#### **Input**
- `index_name: str` -> Name of the index.
- `query: str` -> Query written as JSON.
- `scroll: str` -> Specifies how long **a consistent view of the index** should be maintained for scrolled search.
- `raise_on_error: bool` -> Raises an exception (`ScanError`) if an error is encountered (some shards fail to execute). By default, it's raised.
- `preserve_order: bool` -> By default, `scan()` does not return results in any pre-determined order. To have a standard order in the returned documents (either by score or explicit sort definition) when scrolling, set `preserve_order=True`. Don't set the `search_type` to `scan` - this will cause the scroll to paginate while preserving the order. Note that this can be an extremely expensive operation and can easily lead to unpredictable results; use it with caution.
- `size: int` -> Size (per shard) of the batch sent at each iteration.
- `from: int` -> Starting document offset. By default, you cannot page through more than 10,000 hits using the `from` and `size` parameters. To page through more hits, use the `search_after` parameter.
- `request_timeout: mgp.Nullable[float]` -> Explicit timeout for each call to scan.
- `clear_scroll: bool` -> Explicitly calls delete on the scroll ID via the clear scroll API at the end of the method on completion or error; defaults to true.

#### **Output**
- `documents: List[Dict[str, str]]` -> List of all items matched by the specific query.
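The `query` arguments accepted by `reindex()`, `scan()` and `search()` are Query DSL documents serialized as JSON strings. A small standalone Python sketch of building such a string with the standard library:

```python
import json

# Build the Query DSL object as a plain dict, then serialize it to the
# JSON string the procedures expect.
query = {"query": {"match_all": {}}}
query_string = json.dumps(query)

print(query_string)  # {"query": {"match_all": {}}}
```

When the string is embedded in a Cypher string literal, the inner double quotes are additionally escaped with backslashes, which produces the form used in these examples.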
Below is an example scan query that makes use of all parameters:
```
CALL elastic_search_serialization.scan("edge_index", "{\"query\": {\"match_all\": {}}}", "5m", false, false, 100, 0, 2.0, False) YIELD *;
```

### `search()`
Searches for documents in the given index that match the query. It is preferred over the `scan()` method when you need aggregations.

#### **Input**
- `index_name: str` -> Name of the index.
- `query: str` -> Query written as JSON.
- `size: int` -> Size (per shard) of the batch sent at each iteration.
- `from_: int` -> Starting document offset. By default, you cannot page through more than 10,000 hits using the `from_` and `size` parameters. To page through more hits, use the `search_after` parameter.
- `aggregations: Optional[Mapping[str, Mapping[str, Any]]]` -> Check out the [docs](https://elasticsearch-py.readthedocs.io/en/v8.5.3/api.html#elasticsearch.Elasticsearch.search).
- `aggs: Optional[Mapping[str, Mapping[str, Any]]]` -> Check out the [docs](https://elasticsearch-py.readthedocs.io/en/v8.5.3/api.html#elasticsearch.Elasticsearch.search).

#### **Output**
- `documents: Dict[str, str]` -> Results matching the query.

A query without aggregations that shows how the search method can be used:
```
CALL elastic_search_serialization.search("node_index", "{\"match_all\": {}}", 1000, 0) YIELD *;
```

## Example

The example shows all of the module's features, from connecting to the Elasticsearch instance to synchronizing Memgraph and Elasticsearch using triggers.
+ + + + + + +```cypher +CALL elastic_search_serialization.connect("https://localhost:9200", "http_ca.crt", "","") YIELD *; +CREATE (n0 {name: "n0"}), (n1 {name: "n1"}), (n2 {name: "n2"}), (n3 {name: "n3"}), (n4 {name: "n4"}), (n5 {name: "n5"}), (n6 {name: "n6"}); +CREATE (n1)-[r1:RELATED]->(n2); +CREATE (n1)-[r2:RELATED]->(n3); +CREATE (n1)-[r3:RELATED]->(n3); +CREATE (n2)-[r4:RELATED]->(n1); +CREATE (n2)-[r5:RELATED]->(n5); +CREATE (n2)-[r6:RELATED]->(n3); +CREATE (n3)-[r7:RELATED]->(n6); +CREATE (n3)-[r8:RELATED]->(n1); +CREATE (n1)-[r9:RELATED]->(n4); +``` + + + + + + +```cypher +CALL elastic_search_serialization.create_index("docs_nodes", "node_index_path_schema.json", +{analyzer: "mem_analyzer", index_type: "vertex"}) YIELD *; +CALL elastic_search_serialization.create_index("docs_edges", "edge_index_path_schema.json", {analyzer: "mem_analyzer", index_type: "edge"}) YIELD *; +``` + + + + + +```cypher +CALL elastic_search_serialization.index_db("docs_nodes", "docs_edges", 4) YIELD *; +``` + + + + + +```cypher +CALL elastic_search_serialization.scan("docs_nodes", "{\"query\": {\"match_all\": {}}}", "5m", false, false, 100, 0, 2.0, False) YIELD *; +``` + + + + + + + + + + + +```cypher +CREATE TRIGGER elastic_search_create +ON CREATE AFTER COMMIT EXECUTE +CALL elastic_search_serialization.index(createdObjects, "docs_nodes", "docs_edges") YIELD * RETURN *; +``` + + + + + +```cypher +CREATE (n7 {name: "n7"}); +MATCH (n6 {name: "n6"}), (n7 {name: "n7"}) +CREATE (n6)-[:NEW_CONNECTION {edge_property: "docs"}]->(n7); +``` + + + + + +```cypher +CALL elastic_search_serialization.search("docs_nodes", "{\"match_all\": {}}", 1000, 0) YIELD *; +``` + + + + + + + + + + \ No newline at end of file diff --git a/docs2/advanced-algorithms/available-algorithms/export_util.md b/docs2/advanced-algorithms/available-algorithms/export_util.md new file mode 100644 index 00000000000..e9dbef3dc5f --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/export_util.md @@ -0,0 
+1,394 @@

---
id: export_util
title: export_util
sidebar_label: export_util
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';

export const Highlight = ({children, color}) => (
{children}
);

Module for exporting a graph database or query results in different formats. Currently, this module supports [**exporting the database to a JSON file format**](#jsonpath) and [**exporting query results in a CSV file format**](#csv_queryquery-file_path-stream).

[![docs-source](https://img.shields.io/badge/source-export_util-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/export_util.py)

| Trait | Value |
| ------------------- | ----------------------------------------------------- |
| **Module type** | **util** |
| **Implementation** | **Python** |
| **Parallelism** | **sequential** |

## Procedures

### `json(path)`

#### Input:

* `path: string` ➑ Path to the JSON file that will contain the exported graph database.

#### Usage:

The `path` you have to provide as a procedure argument depends on how you started Memgraph.

If you ran Memgraph with Docker, the database will be exported to a JSON file inside the Docker container. We recommend exporting the database to a JSON file inside the `/usr/lib/memgraph/query_modules` directory.

You can call the procedure by running the following query:

```cypher
CALL export_util.json(path);
```
where `path` is the path to the JSON file inside the `/usr/lib/memgraph/query_modules` directory in the running Docker container (e.g., `/usr/lib/memgraph/query_modules/export.json`).

:::info
You can [**copy the exported JSON file to your local file system**](/memgraph/how-to-guides/work-with-docker#how-to-copy-files-from-and-to-a-docker-container) using the [`docker cp`](https://docs.docker.com/engine/reference/commandline/cp/) command.
:::

To export the database to a local JSON file, create a new directory (for example, `export_folder`) and run the following command to give the user `memgraph` the necessary permissions:

```
sudo chown memgraph export_folder
```

Then, call the procedure by running the following query:

```cypher
CALL export_util.json(path);
```
where `path` is the path to a local JSON file that will be created inside the `export_folder` (e.g., `/users/my_user/export_folder/export.json`).

### `csv_query(query, file_path, stream)`

#### Input:

* `query: string` ➑ A query whose results will be saved to a CSV file.
* `file_path: string (default="")` ➑ A path to the CSV file where the query results will be exported. Defaults to an empty string.
* `stream: bool (default=False)` ➑ A value which determines whether a stream of query results in CSV format will be returned.

#### Output:

* `file_path: string` ➑ A path to the CSV file where the query results are exported. If `file_path` is not provided, the output will be an empty string.
* `data: string` ➑ A stream of query results in CSV format.

#### Usage:

The `file_path` you have to provide as a procedure argument depends on how you started Memgraph.

If you ran Memgraph with Docker, the query results will be exported to a CSV file inside the Docker container. We recommend exporting the results to a CSV file inside the `/usr/lib/memgraph/query_modules` directory.

You can call the procedure by running the following query:

```cypher
CALL export_util.csv_query(query, path);
```
where `query` is the query whose results will be exported and `path` is the path to a CSV file inside the `/usr/lib/memgraph/query_modules` directory in the running Docker container (e.g., `/usr/lib/memgraph/query_modules/export.csv`).
+ +:::info +You can [**copy the exported CSV file to your local file system**](/memgraph/how-to-guides/work-with-docker#how-to-copy-files-from-and-to-a-docker-container) using the [`docker cp`](https://docs.docker.com/engine/reference/commandline/cp/) command. +::: + + + + +To export query results to a local CSV file create a new directory (for example, +`export_folder`) and run the following command to give the user `memgraph` the +necessary permissions: + +``` +sudo chown memgraph export_folder +``` + +Then, call the procedure by running the following query: + +```cypher +CALL export_util.csv_query(path); +``` +where `path` is the path to a local CSV file that will be created inside the +`export_folder` (e.g., `/users/my_user/export_folder/export.csv`). + + + + + +## Example - Exporting database to a JSON file + + + + + +You can create a simple graph database by running the following queries: + +```cypher +CREATE (n:Person {name:"Anna"}), (m:Person {name:"John"}), (k:Person {name:"Kim"}) +CREATE (n)-[:IS_FRIENDS_WITH]->(m), (n)-[:IS_FRIENDS_WITH]->(k), (m)-[:IS_MARRIED_TO]->(k); +``` + + + + +The image below shows the above data as a graph: + + + + + + + +If you're using **Memgraph with Docker**, the following Cypher query will +export the database to the `export.json` file in the +`/usr/lib/memgraph/query_modules` directory inside the running Docker container. + +```cypher +CALL export_util.json("/usr/lib/memgraph/query_modules/export.json"); +``` + +If you're using **Memgraph on Ubuntu, Debian, RPM package or WSL**, the +following Cypher query will export the database to the `export.json` file in the +`/users/my_user/export_folder`. 
+ +```cypher +CALL export_util.json("/users/my_user/export_folder/export.json"); +``` + + + + + +The `export.json` file should be similar to the one below, except for the +`id` values that depend on the internal database `id` values: + + +```json +[ + { + "id": 6114, + "labels": [ + "Person" + ], + "properties": { + "name": "Anna" + }, + "type": "node" + }, + { + "id": 6115, + "labels": [ + "Person" + ], + "properties": { + "name": "John" + }, + "type": "node" + }, + { + "id": 6116, + "labels": [ + "Person" + ], + "properties": { + "name": "Kim" + }, + "type": "node" + }, + { + "end": 6115, + "id": 21120, + "label": "IS_FRIENDS_WITH", + "properties": {}, + "start": 6114, + "type": "relationship" + }, + { + "end": 6116, + "id": 21121, + "label": "IS_FRIENDS_WITH", + "properties": {}, + "start": 6114, + "type": "relationship" + }, + { + "end": 6116, + "id": 21122, + "label": "IS_MARRIED_TO", + "properties": {}, + "start": 6115, + "type": "relationship" + } +] +``` + + + + + +## Example - Exporting query results to a CSV file + + + + + +You can create a simple graph database by running the following queries: + +```cypher +CREATE (StrangerThings:TVShow {title:'Stranger Things', released:2016, program_creators:['Matt Duffer', 'Ross Duffer']}) +CREATE (Eleven:Character {name:'Eleven', portrayed_by:'Millie Bobby Brown'}) +CREATE (JoyceByers:Character {name:'Joyce Byers', portrayed_by:'Millie Bobby Brown'}) +CREATE (JimHopper:Character {name:'Jim Hopper', portrayed_by:'Millie Bobby Brown'}) +CREATE (MikeWheeler:Character {name:'Mike Wheeler', portrayed_by:'Finn Wolfhard'}) +CREATE (DustinHenderson:Character {name:'Dustin Henderson', portrayed_by:'Gaten Matarazzo'}) +CREATE (LucasSinclair:Character {name:'Lucas Sinclair', portrayed_by:'Caleb McLaughlin'}) +CREATE (NancyWheeler:Character {name:'Nancy Wheeler', portrayed_by:'Natalia Dyer'}) +CREATE (JonathanByers:Character {name:'Jonathan Byers', portrayed_by:'Charlie Heaton'}) +CREATE (WillByers:Character {name:'Will Byers', 
portrayed_by:'Noah Schnapp'}) +CREATE (SteveHarrington:Character {name:'Steve Harrington', portrayed_by:'Joe Keery'}) +CREATE (MaxMayfield:Character {name:'Max Mayfield', portrayed_by:'Sadie Sink'}) +CREATE (RobinBuckley:Character {name:'Robin Buckley', portrayed_by:'Maya Hawke'}) +CREATE (EricaSinclair:Character {name:'Erica Sinclair', portrayed_by:'Priah Ferguson'}) +CREATE +(Eleven)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(JoyceByers)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(JimHopper)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(MikeWheeler)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(DustinHenderson)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(LucasSinclair)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(NancyWheeler)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(JonathanByers)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(WillByers)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(SteveHarrington)-[:ACTED_IN {seasons:[1, 2, 3, 4]}]->(StrangerThings), +(MaxMayfield)-[:ACTED_IN {seasons:[2, 3, 4]}]->(StrangerThings), +(RobinBuckley)-[:ACTED_IN {seasons:[3, 4]}]->(StrangerThings), +(EricaSinclair)-[:ACTED_IN {seasons:[2, 3, 4]}]->(StrangerThings); +``` + + + + +The image below shows the above data as a graph: + + + + + + + +If you're using **Memgraph with Docker**, the following Cypher query will +export the database to the `export.csv` file in the +`/usr/lib/memgraph/query_modules` directory inside the running Docker container. 
+ +```cypher +WITH "MATCH path = (c:Character)-[:ACTED_IN]->(tvshow) RETURN c.name AS name, c.portrayed_by AS portrayed_by, tvshow.title AS title, tvshow.released AS released, tvshow.program_creators AS program_creators" AS query +CALL export_util.csv_query(query, "/usr/lib/memgraph/query_modules/export.csv", True) +YIELD file_path, data +RETURN file_path, data; +``` + +If you're using **Memgraph on Ubuntu, Debian, RPM package or WSL**, then the +following Cypher query will export the database to the `export.csv` file in the +`/users/my_user/export_folder`. + +```cypher +WITH "MATCH path = (c:Character)-[:ACTED_IN]->(tvshow) RETURN c.name AS name, c.portrayed_by AS portrayed_by, tvshow.title AS title, tvshow.released AS released, tvshow.program_creators AS program_creators" AS query +CALL export_util.csv_query(query, "/users/my_user/export_folder/export.csv", True) +YIELD file_path, data +RETURN file_path, data; +``` + + + + + +The output in the `export.csv` file looks like this: + +```csv +name,portrayed_by,title,released,program_creators +Eleven,Millie Bobby Brown,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Joyce Byers,Millie Bobby Brown,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Jim Hopper,Millie Bobby Brown,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Mike Wheeler,Finn Wolfhard,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Dustin Henderson,Gaten Matarazzo,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Lucas Sinclair,Caleb McLaughlin,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Nancy Wheeler,Natalia Dyer,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Jonathan Byers,Charlie Heaton,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Will Byers,Noah Schnapp,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Steve Harrington,Joe Keery,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Max Mayfield,Sadie Sink,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']" +Robin Buckley,Maya 
Hawke,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']"
Erica Sinclair,Priah Ferguson,Stranger Things,2016,"['Matt Duffer', 'Ross Duffer']"
```




diff --git a/docs2/advanced-algorithms/available-algorithms/gnn_link_prediction.md b/docs2/advanced-algorithms/available-algorithms/gnn_link_prediction.md new file mode 100644 index 00000000000..e54702d3937 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/gnn_link_prediction.md @@ -0,0 +1,376 @@
---
id: link_prediction_with_gnn
title: link_prediction_with_gnn
sidebar_label: link_prediction_with_gnn
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';

export const Highlight = ({children, color}) => (

{children}

);

**Link prediction** is the problem of predicting whether there is a link between two nodes in a graph. It can be used for predicting missing or future links in an evolving graph. Using the notation `G = (V, E)` for a graph with nodes `V` and edges `E`, and given two nodes `v1` and `v2`, a link prediction algorithm tries to predict whether those two nodes will be connected, based on the **node features** and the **graph structure**. Lately, **graph neural networks** have often been used for **node classification** and **link prediction** problems. They are extremely useful in interdisciplinary fields where it is important to incorporate **domain-specific** knowledge to capture **fine-grained** relationships in the data. Such fields usually involve working with **heterogeneous** and **large-scale** graphs. **GNNs** iteratively update node representations by aggregating the representations of a node's neighbors together with its own representation from the previous iteration. These properties make **graph neural networks** a great tool for many of the problems we encounter at Memgraph.
If your graph is evolving in time, check out the [TGN model](https://github.com/memgraph/mage/blob/main/python/tgn.py) that Memgraph engineers have already developed.

[![docs-source](https://img.shields.io/badge/source-link_prediction_with_gnn-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/link_prediction.py)

### Blog Posts

The following blog posts explain how we applied link prediction:
- [Node2Vec](https://memgraph.com/blog/link-prediction-with-node2vec-in-physics-collaboration-network)
- [GNN Link prediction](https://memgraph.com/blog/building-a-recommendation-system-for-telecommunication-packages-using-graph-neural-networks)

### About the query module

The module supports the following features:
- both **homogeneous** and **heterogeneous** graphs
- **disconnected** graphs
- applicability as a **recommendation system**
- a **semi-inductive** link prediction setup where a larger, updated graph is used for **inference**
- an **inductive** link prediction setup in which the **training** and **inference** graphs are different
- **transductive** graph splitting (training and validation sets)
- a **graph attention layer** that aggregates information from the first-hop neighbourhood using an attention mechanism. Introduced by [Velickovic et al.](https://arxiv.org/pdf/1710.10903.pdf)
- a **GraphSAGE layer** that extends the usability of graph neural networks to large-scale graphs.
Introduced by [Hamilton et al.](https://arxiv.org/pdf/1706.02216.pdf)
- **mlp** and **dot** predictors used for combining node scores into edge scores
- **ADAM** and **SGD** optimizers used for training the neural networks
- support for **batch training**
- **parallel graph sampling** using multiple threads
- **negative graph sampling**, a sampling method in which the resulting graph consists only of edges that don't exist
- evaluating the model's **training performance** using a variety of metrics such as **AUC, Precision, Recall, Accuracy and Confusion matrix**
- evaluating the model's **recommendation performance** with the **Precision@k, Recall@k, F1@k and Average Precision** metrics

If you want to try out our implementation, head to **[github/memgraph/mage](https://github.com/memgraph/mage)** and find `python/link_prediction.py`. Feel free to give us a :star: if you like the code. The easiest way to test **link prediction** is by downloading [Memgraph Platform](https://memgraph.com/download) and using some of the preloaded datasets in **Memgraph Lab**.

There are a few things to be careful about when using **link prediction**:
- the features of all nodes should be stored under the same name (e.g., saved as the **'features'** property in **Memgraph**)
- the model's performance on the validation set is obtained using the **transductive** splitting mode, while the **inductive** dataset split is not yet supported. You can find more information about graph splitting in the slides of the [Graph Machine Learning course](http://web.stanford.edu/class/cs224w/slides/08-GNN-application.pdf) offered by **Stanford**.
- to improve performance, a **self-loop** is added to each node with the edge type set to `self`
- the user can set a flag to automatically add **reverse edges** to each existing edge and hence convert a **directed** graph to a **bidirected** one. If the source and destination nodes of an edge are the same, the **reverse edge type** will be the same as the original **edge type**.
Otherwise, the prefix **rev_** will be added to the original **edge type**. See the FAQ section to learn why **self-loops** and **reverse edges** are so important in ML training and how you can run into problems if your graph is already **undirected**.

Feel free to open a **[GitHub issue](https://github.com/memgraph/mage/issues)**
or start a discussion on **[Discord](https://discord.gg/memgraph)** if you want
to speed up development.

### Usage

The expected workflow when using the **link prediction module** is as follows:

- set the parameters by calling the `set_model_parameters` function
- train a model by calling the `train` function
- optionally inspect the training results by calling the `get_training_results` function
- predict the relationship between two vertices by calling `predict`, or
- call the `recommend` function to find the most likely relationships

### Implementation details

For the underlying **GNN** training we use the [DGL library](https://github.com/dmlc/dgl/).
> Fast and memory-efficient message passing primitives for training Graph Neural Networks. Scale to giant graphs via multi-GPU acceleration and distributed training infrastructure.
>
> -- DGL team

#### **Splitting the dataset**

If the user specifies `split_ratio: 1.0`, the model will train normally on the whole dataset without validating its performance on a validation set. However, if the user-defined `split_ratio` is a value between 0.0 and 1.0 but the graph is too small to support such a split, an exception will be thrown.

#### **Self-loops**

If specified by the user, a **self-loop edge** is added to every node to improve **link prediction** performance. Self-loop edges are added only with the **edge_type** `self`, not in any other way, and a custom module has been added to enable this.

#### **Batch training**

In heterogeneous graphs, all edges are used for creating a node's neighbourhood, but the model is trained on only one edge type, which can be set by the user.
+

> For each gradient descent step, we select a mini-batch of nodes whose final representations at the L-th layer are to be computed. We then take all or some of their neighbours at the L-1 layer. This process continues until we reach the input. This iterative process builds the dependency graph starting from the output and working backwards to the input, as the figure below shows:



>
> -- DGL docs

The reader is encouraged to take a look at the [DGL mini-batch explanation](https://docs.dgl.ai/guide/minibatch.html) for more details.

## Procedures



The link prediction module is organized as a stateful module in which the user can run several methods one after another without losing the context. The user should start by setting the parameters that will be used in the training. If the graph is **heterogeneous** (more than one **edge type**), the `target_relation` parameter must be set so that the model can distinguish **supervision edges** (edges used in prediction) from **message passing edges** (used for message aggregation). In the case of a **homogeneous graph**, `target_relation` will be inferred automatically. `node_features_property` must also be set by the user to specify where the original node features are saved. Those are needed by **graph neural networks** to compute **node embeddings**. All other parameters are optional.

### `set_model_parameters()`

Here is the description of all parameters supported by **link prediction** that you can set by calling the `set_model_parameters` method:
#### **Input**:

| Name | Type | Default | Description |
| --------------- | ----- | ----------- | ----------- |
| `hidden_features_size` | mgp.List[int] | `[16, 16]` | Defines the size of each hidden layer in the architecture. The input feature size is determined automatically while converting the original graph to the DGL-compatible one. |
| `layer_type` | str | `graph_attn` | Supported values are `graph_sage` and `graph_attn`. 
|
| `num_epochs` | int | `100` | The number of epochs for model training. |
| `optimizer` | str | `ADAM` | Supported values are `ADAM` and `SGD`. |
| `learning_rate` | float | `0.01` | Optimizer's learning rate. |
| `split_ratio` | float | `0.8` | The split ratio between the training and the validation set. There is no test dataset because it's assumed that the user first needs to create new edges in the original dataset to test a model on them. |
| `node_features_property` | str | `features` | Property name where the node features are saved. |
| `device_type` | str | `cpu` | Defines whether the model will be trained on the `CPU` or a `CUDA GPU`. To run on a `CUDA GPU`, check if the system supports it with `torch.cuda.is_available()`, then set this flag to `cuda`. |
| `console_log_freq` | int | `5` | Specifies how often results will be printed. This also directly specifies which results will be returned as training and validation results when calling the training method. |
| `checkpoint_freq` | int | `5` | Specifies the number of epochs between each model save. The model is persisted on disk. |
| `aggregator` | str | `mean` | Aggregator used in the GraphSAGE model. Supported values are `lstm`, `pool`, `mean` and `gcn`. |
| `metrics` | mgp.List[str] | `[loss, accuracy, auc_score, precision, recall, f1, true_positives, true_negatives, false_positives, false_negatives]` | Metrics used to evaluate the training model on the validation set. Additionally, epoch information will always be displayed. |
| `predictor_type` | str | `dot` | Type of the predictor. A predictor is used for combining node scores into edge scores. Supported values are `dot` and `mlp`. |
| `attn_num_heads` | List[int] | `[4, 1]` | `GAT` can support the usage of more than one head in each layer except the last one. The size of the list must be the same as the number of layers specified by the `hidden_features_size` parameter. 
|
| `tr_acc_patience` | int | `8` | Training patience: specifies for how many epochs a drop in accuracy on the validation set is tolerated before the training is stopped. |
| `context_save_dir` | str | `None` | Path where the model and predictor will be saved every `checkpoint_freq` epochs. |
| `target_relation` | str | `None` | Unique edge type used for training. Users can provide only the `edge_type`, or a tuple of the source node type, edge type and destination node type if the same `edge_type` is used with more than one source-destination node combination. |
| `num_neg_per_pos_edge` | int | `1` | The number of negative edges that will be sampled per positive edge in the mini-batch training. |
| `batch_size` | int | `256` | Batch size used in both the training and validation procedures. It specifies the number of indices in each batch. |
| `sampling_workers` | int | `5` | The number of workers that will cooperate in the sampling procedure during training and validation. |
| `last_activation_function` | str | `sigmoid` | Activation function applied after the last layer in the model and before the `predictor_type`. Currently, only `sigmoid` is supported. |
| `add_reverse_edges` | bool | `False` | Whether the module should add a reverse edge for each existing edge in the obtained graph. If the source and destination nodes are of the same type, edges of the same edge type will be created. If the source and destination nodes are different, the prefix `rev_` will be added to the previous edge type. Reverse edges will be excluded as message passing edges for the corresponding supervision edges. |

#### **Output**:
- `status: bool` -> `True` if all parameters were successfully updated, `False` otherwise.
- `message: str` -> `OK` if all parameters were successfully updated, an `Error message` otherwise.
+

Only the parameters that need changing from their default values are sent when calling the procedure:
```
CALL link_prediction.set_model_parameters({num_epochs: 100, node_features_property: "features", tr_acc_patience: 8, target_relation: "CITES", batch_size: 256, last_activation_function: "sigmoid", add_reverse_edges: True})
YIELD status, message
RETURN status, message;
```

### `train()`
The `train` method doesn't take any parameters, so it is very simple to use.

#### **Output**:
- `training_results: List[Dict[str, float]]` -> List of training results through the epochs. The model's performance is evaluated every `console_log_freq` epochs.
- `validation_results: List[Dict[str, float]]` -> List of validation results through the epochs. The model's performance is evaluated every `console_log_freq` epochs.

You can just call
```
CALL link_prediction.train()
YIELD training_results, validation_results
RETURN training_results, validation_results;
```
to get the training and validation results summarized through the epochs.

### `get_training_results()`

The `get_training_results` method is used when the user wants to get the performance data obtained from the last training. It has the same form as the result of calling the `train` method. If there is no loaded model, an exception will be thrown.

```
CALL link_prediction.get_training_results()
YIELD training_results, validation_results
RETURN training_results, validation_results;
```

#### **Output:**
- `training_results: List[Dict[str, float]]` -> List of training results through the epochs. The model's performance is evaluated every `console_log_freq` epochs.
- `validation_results: List[Dict[str, float]]` -> List of validation results through the epochs. The model's performance is evaluated every `console_log_freq` epochs.

### `predict()`

The `predict` method takes two arguments, **src_vertex** and **dest_vertex**, and predicts whether there is an edge between them. 
It supports the "actual" prediction scenario, in which the edge doesn't exist and the user wants to predict whether there will be one, as well as the scenario in which there is an edge between two vertices and the user wants to check the model's evaluation of it.

#### Input
- `src_vertex: mgp.Vertex` -> Source vertex of the edge.
- `dest_vertex: mgp.Vertex` -> Destination vertex of the edge.

#### Output
- `score: mgp.Number` -> Score between 0 and 1 that represents the probability of the two nodes being connected.

```
MATCH (v1:PAPER {id: "ID_1"})
MATCH (v2:PAPER {id: "ID_2"})
CALL link_prediction.predict(v1, v2)
YIELD score
RETURN score;
```

### `recommend()`

The `recommend` method can be used to recommend the best k nodes from `dest_vertices` to `src_vertex`. It is implemented efficiently using the **max heap** data structure. The best nodes are determined based on the edge scores. Metrics specific to recommendation systems (**precision@k, recall@k, f1@k and average precision**) are logged to the **standard output**. **K** is equal to `min(k, length(dest_vertices), length(results))`, where results is the list of all recommendations given by the model (that is, classified as positive examples).

#### Input
- `src_vertex: mgp.Vertex` → Source node.
- `dest_vertices: List[mgp.Vertex]` → Destination nodes. If they are not all of the same type, an exception is thrown.
- `k: int` → Number of edges to recommend.

#### Output
- `score: mgp.Number` → Score between 0 and 1 that represents the probability of the two nodes being connected.
- `recommendation: mgp.Vertex` → A reference to the target node.

```
MATCH (v1:Customer {id: "8779-QRDMV"})
MATCH (p:Plan)
WITH collect(p) AS all_plans, v1
CALL link_prediction.recommend(v1, all_plans, 5)
YIELD score, recommendation
RETURN v1, score, recommendation;
```

### `load_context()`

Loading the context means loading the model and the predictor. 
If the user specifies the path, the method will try to load it from there. Otherwise, context will be loaded from the default parameter specified in the **link_prediction_parameters** module. + +#### Input +- `path: str` β†’ Path to the folder where the model and the predictor are saved. + +#### Output +- `status: mgp.Any` β†’ True to indicate that execution went well. + +``` +CALL link_prediction.load_context() YIELD * RETURN *; +``` + +### `reset_parameters()` + +You can explicitly reset parameters whenever you want. Note, however, that parameters will be reset before the training even if not specified because of implementation reasons. + +#### Output +- `status: mgp.Any` β†’ True to indicate that method is successfully finished. + +``` +CALL link_prediction.reset_parameters() YIELD * RETURN *; +``` + +## Results + +We extensively tested our model on the [**CORA**](https://paperswithcode.com/dataset/cora) dataset and the Telecom recommendation dataset. To show you how the training performance could progress through epochs, here are the results for one of our basic models tried on the Cora dataset: + +| epoch_num | AUC | accuracy | precision | recall | f1 | +| --------- | ----- | -------- | --------- | ------ | ----- | +| 1 | 0.64 | 0.594 | 0.613 | 0.494 | 0.547 | +| 2 | 0.781 | 0.696 | 0.711 | 0.663 | 0.686 | +| 3 | 0.798 | 0.729 | 0.752 | 0.682 | 0.715 | +| 4 | 0.754 | 0.686 | 0.716 | 0.617 | 0.663 | +| 5 | 0.789 | 0.711 | 0.715 | 0.7 | 0.707 | +| 6 | 0.813 | 0.756 | 0.742 | 0.784 | 0.763 | +| 7 | 0.884 | 0.772 | 0.764 | 0.791 | 0.775 | +| 8 | 0.859 | 0.775 | 0.781 | 0.766 | 0.773 | +| 9 | 0.871 | 0.805 | 0.822 | 0.777 | 0.798 | +| 10 | 0.832 | 0.759 | 0.776 | 0.729 | 0.752 + +## Example + + + + + + + + + + + + + + +```cypher +CREATE (v1:PAPER {id: 10, features: [1, 2, 3]}); +CREATE (v2:PAPER {id: 11, features: [1.54, 0.3, 1.78]}); +CREATE (v3:PAPER {id: 12, features: [0.5, 1, 4.5]}); +CREATE (v4:PAPER {id: 13, features: [0.78, 0.234, 1.2]}); +MATCH (v1:PAPER 
{id: 10}), (v2:PAPER {id: 11}) CREATE (v1)-[e:CITES {}]->(v2); +MATCH (v2:PAPER {id: 11}), (v3:PAPER {id: 12}) CREATE (v2)-[e:CITES {}]->(v3); +MATCH (v3:PAPER {id: 12}), (v4:PAPER {id: 13}) CREATE (v3)-[e:CITES {}]->(v4); +MATCH (v4:PAPER {id: 13}), (v1:PAPER {id: 10}) CREATE (v4)-[e:CITES {}]->(v1); +``` + + + + + +```cypher +CALL link_prediction.set_model_parameters({target_relation: ["PAPER", "CITES", "PAPER"], node_features_property: "features", +split_ratio: 1.0, predictor_type: "mlp", num_epochs: 100, hidden_features_size: [256], attn_num_heads: [1]}) YIELD * RETURN *; +``` + + + + + + +```cypher +CALL link_prediction.train() YIELD training_results, validation_results +RETURN training_results, validation_results; +``` + + + + + +```plaintext ++--------------------+--------------------+--------------------+--------------------+--------------------+ +| epoch_num | accuracy | auc_score | loss | precision | ++--------------------+--------------------+--------------------+--------------------+--------------------+ +| 17 | 0.833 | 0.906 | 0.428 | 1.0 | +| 18 | 0.917 | 0.938 | 0.393 | 1.0 | +| 19 | 0.833 | 0.938 | 0.365 | 0.75 | +| 20 | 0.917 | 0.938 | 0.341 | 1.0 | +| 21 | 0.917 | 0.938 | 0.315 | 1.0 | +| 22 | 0.833 | 0.969 | 0.296 | 0.75 | +| 23 | 0.917 | 1.0 | 0.277 | 1.0 | +| 24 | 0.917 | 1.0 | 0.246 | 0.8 | +| 25 | 0.917 | 1.0 | 0.233 | 0.8 | +| 26 | 1.0 | 1.0 | 0.202 | 1.0 | +``` + + + + + +```cypher +MATCH (v1:PAPER {id: 10}) +MATCH (v2:PAPER {id: 12}) +CALL link_prediction.predict(v1, v2) +YIELD score +RETURN score; +``` + + + + + +```plaintext ++-------+ +| score | +| 0.104 | +``` + + + + + +## FAQ + +### **Why can I get into problems with reverse edges?** + +Having a `reverse_edge` in your dataset can be a problem if they are not excluded from `message passing edges` in the prediction of its `opposite edge`(`supervision edge`). 
The best thing you can do is have a `directed` graph; if you specify `add_reverse_edges` in the `set_model_parameters` method, the module will automatically add reverse edges in a way that doesn't cause information leakage.

### **What is a transductive dataset split?**

The transductive dataset split assumes that the entire graph can be observed in all dataset splits. We distinguish four types of edges: `validation`, `training`, `message passing` and `supervision` edges.



The transductive dataset split is described in detail by prof. Jure Leskovec in one of his presentations for the [Graph ML course](http://web.stanford.edu/class/cs224w/slides/08-GNN-application.pdf).

diff --git a/docs2/advanced-algorithms/available-algorithms/gnn_node_classification.md b/docs2/advanced-algorithms/available-algorithms/gnn_node_classification.md new file mode 100644 index 00000000000..e74dd9ae7f6 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/gnn_node_classification.md @@ -0,0 +1,343 @@
---
id: node-classification-with-gnn
title: node_classification_with_gnn
sidebar_label: node_classification_with_gnn
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';

export const Highlight = ({children, color}) => (

{children}

);

**Node classification** is the problem of finding the **right label** for a **node** based on its **neighbors' labels** and **structural similarities**.

[![docs-source](https://img.shields.io/badge/source-node_classification-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/node_classification.py)

### About the query module

This query module contains all the functions you need to train a GNN model in Memgraph. 
+

The `node_classification` module supports the following:
- homogeneous and heterogeneous graphs
- multi-label and multi-edge-type graphs
- datasets of any size
- the following model architectures:
  - Graph Attention with Jumping Knowledge
  - multiple versions of Graph Attention Networks (GAT)
  - GraphSAGE
- early stopping
- calculation of various metrics
- predictions for specified nodes
- model saving and loading
- recommendation system use cases


The easiest way to test **node_classification** is by downloading [Memgraph Platform](https://memgraph.com/download)
and using some of the preloaded datasets in **Memgraph Lab**. If you want to explore our implementation, jump to **[github/memgraph/mage](https://github.com/memgraph/mage)** and find
`python/node_classification.py`. Feel free to give us a :star: if you like the code.


Feel free to open a **[GitHub issue](https://github.com/memgraph/mage/issues)**
or start a discussion on **[Discord](https://discord.gg/memgraph)** if you want
to speed up development.

## Usage
Load a dataset into Memgraph, call `set_model_parameters`, and start training your model. When training is done, the query module will save the model.
Afterwards, you can test the module on other data (for example, data the model has not yet seen) and inspect the results.
The module reports the [mean average precision](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html)
for every batch in each `training` or `evaluation` epoch.


To **summarize**, the basic node classification workflow is as follows:

- load the data into Memgraph
- set the parameters by calling the `set_model_parameters()` function, making sure the **node_features** property is set on the nodes
+
- call the `train()` function
- optionally inspect the training results by calling the `get_training_data()` function
- optionally use `save_model()` and `load_model()`
- predict node classes by calling the `predict()` procedure


:::info

This **MAGE** module is still in its early stage. We intend to use it only for
**exploring or learning** about node classification. If you want it to be production-ready, make sure
to either open a **[GitHub issue](https://github.com/memgraph/mage/issues)** or
drop us a comment on **[Discord](https://discord.gg/memgraph)**.

:::

## Procedures



### `set_model_parameters(params)`

The function initializes all global variables. You can change the global variables via the **params** dictionary. The procedure checks whether the variables in **params** are defined appropriately. If so, the map of default global parameters is overridden with the user-defined **params** dictionary.
After that, the procedure executes the previously defined functions `declare_globals` and
`declare_model_and_data` and sets each global variable to its value.

#### Input:
- `params: (mgp.Map, optional)`: User-defined parameters from the query module. Defaults to {}.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| hidden_features_size | List[Int] | [16, 16] | Embedding dimension for each node in a new layer. |
| layer_type | String | `GATJK` | Type of layer used, supported types: `GATJK`, `GAT`, `GRAPHSAGE`. |
| aggregator | String | `mean` | Type of aggregator used, supported type: `mean`. |
| learning_rate | Float | 0.1 | Optimizer's learning rate. |
| weight_decay | Float | 5e-4 | Optimizer's weight decay. |
| split_ratio | Float | 0.8 | Ratio between training and validation data. 
| +| metrics | List[String] | `["loss","accuracy","f1_score","precision","recall","num_wrong_examples"]` | List of metrics to report, supports any combination of "loss","accuracy","f1_score","precision","recall","num_wrong_examples". | +| node_id_property | String | `id` | Property name of node features. | +| num_epochs | Integer | 100 | The number of epochs for model training. | +| console_log_freq | Integer | 5 | Specifies how often results will be printed. | +| checkpoint_freq | Integer | 5 | Specifies how often the model will be saved. The model is persisted on disc. | +| device_type | String | `cpu` | Defines if the model will be trained using the `cpu` or `cuda`. To run on `Cuda GPU`, check if the system supports it with `torch.cuda.is_available()`, then set this flag to `cuda`. | +| path_to_model | String | "/tmp/torch_models" | Path for loading and storing the model. | + +#### Exceptions: +- `Exception`: Exception is raised if some variable in dictionary params is not correctly defined. + +#### Output +- `mgp.Record( + hidden_features_size=list, + layer_type=str, + aggregator=str, + learning_rate=float, + weight_decay=float, + split_ratio=float, + metrics=mgp.Any, + node_id_property=str, + num_epochs=int, + console_log_freq=int, + checkpoint_freq=int, + device_type=str, + path_to_model=str, +)` ➑ Map of parameters set for training + +#### Usage: + +```cypher + CALL node_classification.set_model_parameters( + {layer_type: "GATJK", learning_rate: 0.001, hidden_features_size: [16,16], class_name: "fraud", features_name: "embedding"} + ) YIELD * RETURN *; +``` + +### `train(num_epochs)` + +This procedure performs model training. Firstly it declares data, model, optimizer, and criterion. Afterward, it performs training. +#### Input +- `num_epochs (int, optional)` ➑ Number of epochs (default:100). + +#### Exceptions +- `Exception`➑ Raised if graph is empty. + +#### Outputs +- `epoch: int` ➑ Epoch number. +- `loss: float`➑ Loss of model on training data. 
+- `val_loss: float`➑ Loss of model on validation data. +- `train_log: list`➑ List of metrics on training data. +- `val_log: list`➑ List of metrics on validation data. + +#### Usage +```cypher + CALL node_classification.train() YIELD * RETURN *; +``` + +### `get_training_data()` +Use following procedure to get logged data from training. + +#### Return values +- `epoch: int` ➑ Epoch number for current record's logged data. +- `loss: float`➑ Loss in epoch. +- `train_log: mgp.Any` ➑ Training parameters for epoch. +- `val_log: mgp.Any`➑ Validation parameters for epoch. + +#### Usage +```cypher + CALL node_classification.get_training_data() YIELD * RETURN *; +``` + +### `save_model()` + +This function saves the model to a specified folder. If there are already **max_models_to_keep** in the folder, +the oldest model is deleted. + +#### Exception +- `Exception`: Raised if model is not initialized or defined. + +#### Return values +- `path (str)`➑ Path to the stored model. +- `status (str)`➑ Status of the stored model. + +#### Usage +```cypher + CALL node_classification.save_model() YIELD * RETURN *; +``` + +### `load_model(num)` + +This function loads the model from the specified folder. + +#### Input + +- `num (int, optional)`: Ordinal number of model to load from the default path on the disc (default: 0, i.e., newest model). + +#### Return values +- `path: str` ➑ Path of loaded model. + +#### Usage + +```cypher + CALL node_classification.load_model() YIELD * RETURN *; +``` + +### `predict(vertex)` + +This function predicts metrics on one node. It is suggested to load the test data (data without labels) as well. Test data +won't be a part of the training or validation process. + +#### Input +- `vertex: mgp.Vertex`➑ Prediction node. + +#### Return values +- `predicted_class: int`➑ Predicted class for specified node. 
+
+#### Usage:
+```cypher
+MATCH (n {id: 1}) CALL node_classification.predict(n) YIELD * RETURN predicted_class;
+```
+
+### `reset()`
+This function resets all variables to their default values.
+
+#### Return values
+- `status (str)`: Status of the reset operation.
+
+#### Usage:
+```cypher
+  CALL node_classification.reset() YIELD * RETURN *;
+```
+
+## Example
+
+
+```cypher
+CREATE (v1:PAPER {id: 10, features: [1, 2, 3], label:0});
+CREATE (v2:PAPER {id: 11, features: [1.54, 0.3, 1.78], label:0});
+CREATE (v3:PAPER {id: 12, features: [0.5, 1, 4.5], label:0});
+CREATE (v4:PAPER {id: 13, features: [0.78, 0.234, 1.2], label:0});
+CREATE (v5:PAPER {id: 14, features: [3, 4, 100], label:0});
+CREATE (v6:PAPER {id: 15, features: [2.1, 2.2, 2.3], label:1});
+CREATE (v7:PAPER {id: 16, features: [2.2, 2.3, 2.4], label:1});
+CREATE (v8:PAPER {id: 17, features: [2.3, 2.4, 2.5], label:1});
+CREATE (v9:PAPER {id: 18, features: [2.4, 2.5, 2.6], label:1});
+MATCH (v1:PAPER {id:10}), (v2:PAPER {id:11}) CREATE (v1)-[e:CITES {}]->(v2);
+MATCH (v2:PAPER {id:11}), (v3:PAPER {id:12}) CREATE (v2)-[e:CITES {}]->(v3);
+MATCH (v3:PAPER {id:12}), (v4:PAPER {id:13}) CREATE (v3)-[e:CITES {}]->(v4);
+MATCH (v4:PAPER {id:13}), (v1:PAPER {id:10}) CREATE (v4)-[e:CITES {}]->(v1);
+MATCH (v4:PAPER {id:13}), (v5:PAPER {id:14}) CREATE (v4)-[e:CITES {}]->(v5);
+MATCH (v5:PAPER {id:14}), (v6:PAPER {id:15}) CREATE (v5)-[e:CITES {}]->(v6);
+MATCH (v6:PAPER {id:15}), (v7:PAPER {id:16}) CREATE (v6)-[e:CITES {}]->(v7);
+MATCH (v7:PAPER {id:16}), (v8:PAPER {id:17}) CREATE (v7)-[e:CITES {}]->(v8);
+MATCH (v8:PAPER {id:17}), (v9:PAPER {id:18}) CREATE (v8)-[e:CITES {}]->(v9);
+MATCH (v9:PAPER {id:18}), (v6:PAPER {id:15}) CREATE (v9)-[e:CITES {}]->(v6);
+```
+
+
+```cypher
+CALL node_classification.set_model_parameters({layer_type: "GAT", learning_rate: 0.001,
+    hidden_features_size: [2,2],
+    class_name: "label", features_name: "features", console_log_freq:1}) YIELD *
+RETURN *;
+```
+
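The `metrics` option in the parameter table above lists standard classification scores. As a point of reference, here is a minimal pure-Python sketch of how accuracy, precision, recall, and F1 relate for binary labels; `binary_metrics` is an illustrative helper, not part of the module's actual implementation:

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Illustrative only; the module computes these metrics internally.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1_score": f1}

print(binary_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```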
+
+
+```cypher
+CALL node_classification.train(5) YIELD epoch, loss RETURN *;
+```
+
+
+```plaintext
++----------+----------+
+| epoch    | loss     |
++----------+----------+
+| 1        | 0.788709 |
+| 2        | 0.765075 |
+| 3        | 0.776351 |
+| 4        | 0.727615 |
+| 5        | 0.727735 |
++----------+----------+
+```
+
+
+```cypher
+  MATCH (v1:PAPER {id: 10})
+  CALL node_classification.predict(v1) YIELD predicted_class RETURN predicted_class, v1.label as correct_class;
+```
+
+
+```plaintext
++-----------------+-----------------+
+| predicted_class | correct_class   |
++-----------------+-----------------+
+| 0               | 0               |
++-----------------+-----------------+
+```
+
+
diff --git a/docs2/advanced-algorithms/available-algorithms/graph_analyzer.md b/docs2/advanced-algorithms/available-algorithms/graph_analyzer.md
new file mode 100644
index 00000000000..fb269072840
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/graph_analyzer.md
@@ -0,0 +1,171 @@
+---
+id: graph_analyzer
+title: graph_analyzer
+sidebar_label: graph_analyzer
+---
+
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+  {children}
+);
+
+When working with graphs, the first thing to focus on is getting a thorough analysis of their current state, and that is exactly what this module provides. Using the power of NetworkX, it extracts a variety of graph properties. The module can also run on a subgraph if a subset of nodes is provided as input.
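A few of these properties are simple to state precisely. Below is a plain-Python sketch over a directed edge list; `basic_analytics` is a toy illustration only, since the module itself delegates these computations to NetworkX:

```python
def basic_analytics(edges):
    """Toy versions of a few analytics, computed from a directed edge list.

    Illustration only; graph_analyzer itself delegates to NetworkX.
    """
    nodes = {n for edge in edges for n in edge}
    self_loops = sum(1 for u, v in edges if u == v)
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        # average degree as edges per node, consistent with the example
        # output shown later in this document (14 edges / 12 nodes)
        "avg_degree": len(edges) / len(nodes) if nodes else 0.0,
        "self_loops": self_loops,
    }

print(basic_analytics([(0, 1), (1, 2), (2, 0), (2, 2)]))
```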
+
+The module can fetch the following analytics:
+
+* **nodes**: Number of nodes
+* **edges**: Number of edges
+* **bridges**: Number of bridges
+* **articulation_points**: Number of articulation points
+* **avg_degree**: Average degree
+* **sorted_nodes_degree**: Sorted nodes degree
+* **self_loops**: Self loops
+* **is_bipartite**: Is bipartite
+* **is_planar**: Is planar
+* **is_biconnected**: Is biconnected
+* **is_weakly_connected**: Is weakly connected
+* **number_of_weakly_components**: Number of weakly connected components
+* **is_strongly_connected**: Is strongly connected
+* **strongly_components**: Number of strongly connected components
+* **is_dag**: Is directed acyclic graph (DAG)
+* **is_eulerian**: Is Eulerian
+* **is_forest**: Is forest
+* **is_tree**: Is tree
+
+[![docs-source](https://img.shields.io/badge/source-graph_analyzer-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/graph_analyzer.py)
+
+| Trait               | Value                                                 |
+| ------------------- | ----------------------------------------------------- |
+| **Module type**     | **module**                                            |
+| **Implementation**  | **Python**                                            |
+| **Graph direction** | **undirected**                                        |
+| **Edge weights**    | **unweighted**                                        |
+| **Parallelism**     | **sequential**                                        |
+
+## Procedures
+
+
+### `analyze(analyses)`
+
+#### Input:
+
+* `analyses: List[string] (default=NULL)` ➑ List of analytics names to be fetched. If NULL is provided, the whole set of analytics will be included.
+
+#### Output:
+
+* `name: string` ➑ The name of the analytic
+* `value: string` ➑ Analytics value, stored as a string
+
+#### Usage:
+```cypher
+CALL graph_analyzer.analyze() YIELD *;
+```
+
+### `analyze_subgraph(vertices, edges, analyses)`
+
+#### Input:
+
+* `vertices: List[Vertex]` ➑ Subset of vertices within a graph.
+* `edges: List[Edge]` ➑ Subset of edges in a graph for which analytics will take place.
+* `analyses: List[string] (default=NULL)` ➑ List of analytics names to be fetched.
If NULL is provided, the whole set of analytics will be included.
+
+#### Output:
+
+* `name: string` ➑ The name of the analytic
+* `value: string` ➑ Analytics value, stored as a string
+
+#### Usage:
+```cypher
+MATCH (n)-[e]-(m)
+WITH COLLECT(n) AS nodes_subset, COLLECT(e) AS edges_subset
+CALL graph_analyzer.analyze_subgraph(nodes_subset, edges_subset) YIELD name, value
+RETURN name, value;
+```
+
+## Example
+
+
+```cypher
+MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 1}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 2}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 2}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 1}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 0}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 5}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 5}) MERGE (b:Node {id: 7}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 5}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 7}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 8}) MERGE (b:Node {id: 9}) CREATE (a)-[:RELATION]->(b);
+MERGE (a:Node {id: 10}) MERGE (b:Node {id: 11}) CREATE (a)-[:RELATION]->(b);
+```
+
+
+```cypher
+CALL graph_analyzer.analyze([
+    "nodes", "edges", "bridges", "articulation_points",
+    "avg_degree", "is_dag", "is_tree", "strongly_components"
+  ])
+YIELD *
+RETURN *;
+```
+
+
+```plaintext
++-------------------------------------------+-------------------------------------------+
+| name                                      | value                                     |
++-------------------------------------------+-------------------------------------------+
+| "Number of nodes"                         | "12"                                      |
+| "Number of edges"                         | "14"                                      |
+| "Number of bridges"                       | "2"                                       |
+| "Number of articulation points"           | "3"                                       |
+| "Average degree"                          | "1.1666666666666667"                      |
+| "Is DAG"                                  | "True"                                    |
+| "Is tree"                                 | "False"                                   |
+| "Number of strongly connected components" | "12"                                      |
++-------------------------------------------+-------------------------------------------+
+```
+
+
\ No newline at end of file
diff --git a/docs2/advanced-algorithms/available-algorithms/graph_coloring.md b/docs2/advanced-algorithms/available-algorithms/graph_coloring.md
new file mode 100644
index 00000000000..f264e21d0c6
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/graph_coloring.md
@@ -0,0 +1,197 @@
+---
+id: graph_coloring
+title: graph_coloring
+sidebar_label: graph_coloring
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+  {children}
+);
+
+
+Graph coloring is the assignment of colors to nodes such that two nodes connected with an edge don’t have the same color. The goal is to minimize the number of colors while correctly coloring a graph.
+
+The algorithm implementation is inspired by "[Quantum Annealing (QA)](https://link.springer.com/chapter/10.1007/978-3-642-22000-5_57)" [^1], a simple metaheuristic frequently used for solving discrete optimization problems.
+
+QA is a simple strategy for exploring a solution space. The main idea is to start a search with several possible solutions. These solutions change through iterations and produce new β€œbetter” solutions. Each solution has a particular error value which depends on the error function that the algorithm optimizes. In the graph coloring scenario, the error function is often defined as the number of edges connecting nodes with the same color.
+
+The algorithm is iterative. It applies several simple rules to change solutions in a loop (the same rules are applied multiple times).
The algorithm terminates when a stop criterion is met, usually when the error becomes zero. One of the rules could be to randomly select a node involved in a conflict and change its color.
+
+Changes made in one iteration may not be permanent if they don’t improve the solution. But, with a certain probability, the new solution is accepted even if its error is not reduced. In that way, the algorithm is prevented from converging to local minima too early.
+
+[^1]: [Graph Coloring with a Distributed Hybrid Quantum Annealing Algorithm](https://link.springer.com/chapter/10.1007/978-3-642-22000-5_57), Olawale Titiloye, Alan Crispin
+
+[![docs-source](https://img.shields.io/badge/source-graph_coloring-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/graph_coloring.py)
+
+| Trait               | Value                                                 |
+| ------------------- | ----------------------------------------------------- |
+| **Module type**     | **algorithm**                                         |
+| **Implementation**  | **Python**                                            |
+| **Graph direction** | **undirected**                                        |
+| **Edge weights**    | **weighted**/**unweighted**                           |
+| **Parallelism**     | **parallel**                                          |
+
+## Procedures
+
+
+### `color_graph(parameters, edge_property)`
+
+#### Input:
+
+* `parameters: Dict[string, Any] (default={})` ➑ A dictionary that specifies the algorithm configuration. Configuration parameters are explained in the table below.
+* `edge_property: string (default=weight)` ➑ Edge property that stores the edge weight. If the property is not present on an edge, its weight defaults to 1.
+
+
+#### Output:
+
+* `node: Vertex` ➑ Represents the node
+
+* `color: int` ➑ Represents the assigned color
+
+#### Usage:
+```cypher
+CALL graph_coloring.color_graph()
+YIELD node, color;
+```
+
+### `color_subgraph(vertices, edges, parameters, edge_property)`
+
+#### Input:
+* `vertices: List[Vertex]` ➑ List of vertices in the subgraph.
+* `edges: List[Edge]` ➑ List of edges in the subgraph.
+* `parameters: Dict[string, Any] (default={})` ➑ A dictionary that specifies the algorithm configuration. Configuration parameters are explained in the table below.
+* `edge_property: string (default=weight)` ➑ Edge property that stores the edge weight. If the property is not present on an edge, its weight defaults to 1.
+
+
+#### Output:
+
+* `node: Vertex` ➑ Represents the node
+
+* `color: int` ➑ Represents the assigned color
+
+#### Usage:
+```cypher
+MATCH (a)-[e]->(b)
+WITH collect(a) AS nodes, collect(e) AS edges
+CALL graph_coloring.color_subgraph(nodes, edges, {no_of_colors: 2})
+YIELD node, color;
+```
+
+### Parameters
+
+| Name | Type | Default | Description |
+|- |- |- |- |
+| algorithm | String | QA | An optimization strategy used to find the graph coloring. |
+| no_of_colors | Integer | 10 | The number of colors used to color the nodes of the graph. |
+| no_of_processes | Integer | 1 | The number of processes used to execute the algorithm in parallel. |
+| population_size | Integer | 15 | The number of different solutions that are improved through iterations. |
+| population_factory | String | ChainChunkFactory | The name of a function that generates an initial population. |
+| init_algorithms | List[String] | [SDO, LDO] | Contains algorithms used to initialize the solutions. |
+| error | String | ConflictError | The name of an error function that is minimized by an optimization strategy. |
+| max_iterations | Integer | 10 | The maximum number of iterations of an algorithm. |
+| iteration_callbacks | List[String] | [] | Contains iteration callbacks. An iteration callback is called after each iteration of the iterative algorithm. It saves certain population information and calls specified actions if certain conditions are met. |
+| communication_delay | Integer | 10 | The number of iterations that must pass for neighboring parts to exchange solutions. |
+| logging_delay | Integer | 10 | The number of iterations after which the algorithm information is logged.
|
+| QA_temperature | Float | 0.035 | The temperature parameter of the quantum annealing algorithm. |
+| QA_max_steps | Float | 10 | The maximum number of steps in one iteration. |
+| conflict_err_alpha | Float | 0.1 | The number that scales the sum of the conflicting edges in the error function formula. |
+| conflict_err_beta | Float | 0.001 | The number that scales the correlation between solutions in the error function formula. |
+| mutation | String | SimpleMutation | The name of a function that changes the solutions. |
+| multiple_mutation_no_of_nodes | Integer | 2 | The number of nodes that will change color. |
+| random_mutation_probability | Float | 0.1 | The probability that the color of a random node (it does not have to be conflicting) will be changed. |
+| simple_tunneling_mutation | String | MultipleMutation | The name of a mutation function. |
+| simple_tunneling_probability | Float | 0.5 | The probability of changing an individual. |
+| simple_tunneling_error_correction | Float | 2 | The mutated individual is accepted only if its error is less than the error of the old individual multiplied by this parameter. |
+| simple_tunneling_max_attempts | Integer | 25 | The maximum number of mutation attempts until the individual is accepted. |
+| convergence_callback_tolerance | Integer | 500 | The maximum number of iterations without finding a better solution; after that, convergence is declared and the defined actions are called.
| +| convergence_callback_actions | String | [SimpleTunneling] | Actions that are called when convergence is detected.| + + +## Example + + + + + + + + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 7}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 9}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 9}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 6}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 10}) MERGE (b:Node {id: 7}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +CALL graph_coloring.color_graph({no_of_colors: 4}) +YIELD node, color +RETURN node, color; +``` + + + + + + +```plaintext ++-------+-------+ +| node | color | ++-------+-------+ +| "130" | "1" | +| "131" | "3" | +| "132" | "0" | +| "133" | "1" | +| "134" | "2" | +| "135" | "1" | +| "136" | "3" | +| "137" | "0" | +| "138" | "0" | +| "139" | "3" | +| "140" | "1" | ++-------+-------+ +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/graph_util.md b/docs2/advanced-algorithms/available-algorithms/graph_util.md new file mode 100644 index 00000000000..10358840d54 --- /dev/null +++ 
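The iterative idea described earlier (start from a bad coloring and repeatedly recolor conflicting nodes until the conflict-edge error drops to zero) can be sketched in a few lines. `color_graph` below is a toy sequential local search for illustration, not the module's parallel QA implementation:

```python
def conflicts(edges, colors):
    """Error function: the number of edges whose endpoints share a color."""
    return sum(1 for u, v in edges if colors[u] == colors[v])

def color_graph(nodes, edges, no_of_colors, max_iterations=100):
    """Toy conflict-repair coloring loosely following the loop described above."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    colors = {n: 0 for n in nodes}  # deliberately bad start: all nodes one color
    for _ in range(max_iterations):
        if conflicts(edges, colors) == 0:
            break  # stop criterion: the error dropped to zero
        for n in sorted(nodes):
            if any(colors[m] == colors[n] for m in adj[n]):
                # rule: recolor a conflicting node with its least-conflicting color
                colors[n] = min(
                    range(no_of_colors),
                    key=lambda c: sum(colors[m] == c for m in adj[n]),
                )
    return colors

edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]  # two triangles
coloring = color_graph(range(6), edges, no_of_colors=3)
print(conflicts(edges, coloring))  # → 0
```

Unlike this greedy repair, the real algorithm keeps a whole population of solutions and occasionally accepts worse ones to escape local minima, as described above.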
b/docs2/advanced-algorithms/available-algorithms/graph_util.md @@ -0,0 +1,167 @@ +--- +id: graph_util +title: graph_util +sidebar_label: graph_util +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + +{children} + +); + +**Graph util** is a collection of Memgraph's utility graph algorithms. The algorithms that are included in this module +are the ones that may suit a developer's day-to-day job while prototyping new +solutions, with various graph manipulation tools to accelerate development. + +[![docs-source](https://img.shields.io/badge/source-graph_util-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/cpp/graph_util_module) + +| Trait | Value | +| ------------------- | --------------------------------------------------------------------------------------------------------- | +| **Module type** | **module** | +| **Implementation** | **C++** | +| **Graph direction** | **directed**/**undirected** | +| **Edge weights** | **unweighted**/**weighted** | +| **Parallelism** | **sequential** | + +### Procedures + + + +### `ancestors(node)` + +Find the ancestor nodes of the input node. Ancestor nodes are all the nodes from which +there exists a path to the input node. + +#### Input: + +- `node: Vertex` ➑ node for which we want to find ancestors + + +#### Output: + +- `ancestors: List[Vertex]` ➑ List of ancestors from which a path to the source node exists + +#### Usage: + +```cypher +MATCH (n {id:1}) +CALL graph_util.ancestors(n) +YIELD ancestors +UNWIND ancestors AS ancestor +RETURN ancestor; +``` + +### `chain_nodes(nodes, edge_type)` + +Creates a relationship between each of the neighboring nodes in the input list, `nodes`. Each of the relationships +gets the edge type from the second input parameter `edge_type`. 
+
+#### Input:
+
+- `nodes: List[Vertex]` ➑ List of nodes to connect with relationships, in the given order
+- `edge_type: String` ➑ The name of the relationship that will be created between the nodes.
+
+
+#### Output:
+
+- `connections: List[Edge]` ➑ List of relationships that connect the nodes. Each node is connected with the node following it in the input list, using the relationship type specified as the second input parameter.
+
+#### Usage:
+
+```cypher
+MATCH (n)
+WITH collect(n) AS nodes
+CALL graph_util.chain_nodes(nodes, "MY_EDGE")
+YIELD connections
+RETURN nodes, connections;
+```
+
+### `connect_nodes(nodes)`
+
+Returns a list of relationships that connect the list of input nodes.
+Typically used to create a subgraph from returned nodes.
+
+#### Input:
+
+- `nodes: List[Vertex]` ➑ List of nodes for which we want to find the corresponding connections, i.e., relationships between them
+
+
+#### Output:
+
+- `connections: List[Edge]` ➑ List of relationships that connect the input nodes in the starting graph
+
+#### Usage:
+
+```cypher
+MATCH (n)
+WITH collect(n) AS nodes
+CALL graph_util.connect_nodes(nodes)
+YIELD connections
+RETURN nodes, connections;
+```
+
+### `descendants(node)`
+
+Find the descendant nodes of the input node. Descendant nodes are all the nodes to which
+there exists a path from the input node.
+
+#### Input:
+
+- `node: Vertex` ➑ node for which we want to find descendants
+
+
+#### Output:
+
+- `descendants: List[Vertex]` ➑ List of descendants to which a path from the source node exists
+
+#### Usage:
+
+```cypher
+MATCH (n {id:1})
+CALL graph_util.descendants(n)
+YIELD descendants
+UNWIND descendants AS descendant
+RETURN descendant;
+```
+
+### `topological_sort()`
+
+The topological sort algorithm takes a directed graph and returns an array of the nodes where each node appears before all the nodes it points to. The ordering of the nodes in the array is called a topological ordering.
+
+#### Input:
+
+- There is no input for this procedure. The sort is done either on the whole graph or on a graph projection.
+
+
+#### Output:
+
+- `sorted_nodes: List[Vertex]` ➑ Node ordering in which each node appears before all nodes to which it points
+
+#### Usage:
+
+Usage on the whole graph:
+```cypher
+CALL graph_util.topological_sort() YIELD sorted_nodes
+UNWIND sorted_nodes AS nodes
+RETURN nodes.name;
+```
+
+Usage on a graph projection:
+```cypher
+MATCH p=(sl:SomeLabel)-[*bfs]->(al:AnotherLabel)
+WITH project(p) AS graph
+CALL graph_util.topological_sort(graph) YIELD sorted_nodes
+UNWIND sorted_nodes AS nodes
+RETURN nodes.name;
+```
+
diff --git a/docs2/advanced-algorithms/available-algorithms/igraphalg.md b/docs2/advanced-algorithms/available-algorithms/igraphalg.md
new file mode 100644
index 00000000000..c264a2b1432
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/igraphalg.md
@@ -0,0 +1,266 @@
+---
+id: igraphalg
+title: igraphalg
+sidebar_label: igraphalg
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+  {children}
+);
+
+The **igraphalg** module provides a comprehensive set of thin wrappers around some of the algorithms in the [igraph](https://igraph.org/) package. The wrapper functions can create an igraph-compatible graph-like object that can stream the native database graph directly, significantly lowering memory usage.
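Several of the wrapped algorithms have compact textbook forms. For instance, the topological sort exposed by both `graph_util` above and `igraphalg` below follows the classic Kahn's-algorithm idea; a minimal pure-Python sketch (an illustration, not either module's actual code):

```python
from collections import deque

def topological_sort(nodes, edges):
    """Kahn's algorithm: repeatedly emit nodes with no remaining incoming edges."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:              # edge u -> v
        successors[u].append(v)
        indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle; no topological ordering exists")
    return order

print(topological_sort(["a", "b", "c", "d"],
                       [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
# → ['a', 'b', 'c', 'd']
```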
+ +[![docs-source](https://img.shields.io/badge/source-igraphalg-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/igraphalg.py) + +| Trait | Value | +| ------------------- | --------------------------------------------------------------------------------------------------------- | +| **Module type** | **module** | +| **Implementation** | **Python** | +| **Graph direction** | **directed**/**undirected** | +| **Edge weights** | **weighted**/**unweighted** | +| **Parallelism** | **sequential** | + +:::tip + +If you are not satisfied with the performance of algorithms from the igraphalg +module, check Memgraph's native implementation of algorithms such as PageRank, +shortest path, and others written in C++ + +::: + +## Procedures + + + +### `get_all_simple_paths(v, to, cutoff)` + +Returns all simple paths in the graph `G` from source to target. A simple path is a path with no repeated nodes. + +#### Input: + +* `v: Vertex` ➑ Path's starting node. +* `to: Vertex` ➑ Path's ending node. +* `cutoff: int (default=-1)` ➑ Maximum length of the considered path. If negative, paths of all lengths are considered. + +#### Output: + +* `path: List[Vertex]` ➑ List of vertices for a certain path. If there are no paths between the source and the target within the given cutoff, there is no output. + +#### Usage: +```cypher +MATCH (n:Label), (m:Label) +CALL igraphalg.get_all_simple_paths(n, m, 5) YIELD * +RETURN path; +``` + +### `spanning_tree(weights, directed)` +Returns a minimum spanning tree on a graph `G`. + A *minimum spanning tree* is a subset of the edges of a connected graph that connects all of the vertices without any cycles. + +#### Input: + +* `weights: string (default=NULL)` ➑ Data key to use for edge weights. +* `directed: bool (default=False)` ➑ If `true` the graph is directed, otherwise it's undirected. + +#### Output: + +* `tree: List[List[Vertex]]` ➑ A minimum spanning tree or forest. 
+ +#### Usage: + +```cypher +CALL igraphalg.spanning_tree() +YIELD * +RETURN tree; +``` + +### `pagerank(damping, weights, directed,implementation)` +Returns the PageRank of the nodes in the graph. + +PageRank computes a ranking of the nodes in graph G based on the structure of the incoming links. It was originally designed as an algorithm to rank web pages. + +#### Input: + +* `damping: double (default=0.85)` ➑ Damping parameter for PageRank. +* `weights: string (default="weight")` ➑ Edge data key to use as a weight. If `None`, weights are set to 1. +* `directed: bool (default=True)` ➑ If `true` the graph is directed, otherwise it's undirected. +* `implementation: string (default="prpack")` ➑ Algorithm used for calculating PageRank values. The algorithm can be either `prpack` or `arpack`. + +#### Output: + +* `node: Vertex` ➑ Vertex for which the PageRank is calculated. +* `rank: double` ➑ Node's PageRank value. + +#### Usage: + +```cypher +CALL igraphalg.pagerank() YIELD * +RETURN node, rank; +``` + +### `get_shortest_path(source, target, weights, directed)` +Compute the shortest path in the graph. + +#### Input: + +* `source: Vertex (default=NULL)` ➑ Path's starting node. +* `target: Vertex (default=NULL)` ➑ Path's ending node. +* `weights: string (default=NULL)` ➑ If `None`, every edge has weight/distance/cost 1. If the value is a property name, use that property as the edge weight. If an edge doesn't have a property, the value defaults to 1. +* `directed: bool (default=True)` ➑ If `true`, the graph is directed, otherwise, it's undirected. + + +#### Output: +* `path: List[Vertex]` ➑ Path between `source` node and `target` node. + +#### Usage: +```cypher +MATCH (n:Label), (m:Label) +CALL igraphalg.get_shortest_path(n, m) YIELD * +RETURN path; +``` + +### `shortest_path_length(source, target, weights, directed)` +Compute the shortest path length in the graph. + +#### Input: + +* `source: Vertex (default=NULL)` ➑ Path's starting node. 
+* `target: Vertex (default=NULL)` ➑ Path's ending node. +* `weights: string (default=NULL)` ➑ If `None`, every edge has weight/distance/cost 1. If the value is a property name, use that property as the edge weight. If an edge doesn't have a property, the value defaults to 1. +* `directed: bool (default=True)` ➑ If `true`, the graph is directed, otherwise, it's undirected. + +#### Output: + +* `length: double` ➑ Shortest path length between the `source` node and `target` node. If there is no path it returns `inf`. + +#### Usage: + +```cypher +MATCH (n:Label), (m:Label) +CALL igraphalg.shortest_path_length(n, m) YIELD length +RETURN length; +``` + +### `topological_sort(mode)` +Returns nodes in topologically sorted order. + A *topological sort* is a non-unique permutation of the nodes such that an edge from `u` to `v` implies that `u` appears before `v` in the topological sort order. + +#### Input: + +* `mode: string (default="out")` ➑ Specifies how to use the direction of the edges. For `out`, the sorting order ensures that each node comes before all nodes to which it has edges, so nodes with no incoming edges go first. For `in`, it is quite the opposite: each node comes before all nodes from which it receives edges. Nodes with no outgoing edges go first. + +#### Output: + +* `nodes: List[Vertex]` ➑ A list of nodes in topological sorted order. + +#### Usage: + +```cypher +CALL igraphalg.topological_sort() YIELD * +RETURN nodes; +``` + +### `maxflow(source, target, capacity)` +The maximum flow problem consists of finding a flow through a graph such that it is the maximum possible flow. + +#### Input: + +* `source: Vertex` ➑ Source node from which the maximum flow is calculated. +* `target: Vertex` ➑ Sink node to which the max flow is calculated. +* `capacity: string (default="weight")` ➑ Edge property which is used as the flow capacity of the edge. 
+
+#### Output:
+
+* `max_flow: Number` ➑ Maximum flow of the network, from source to sink.
+
+#### Usage:
+
+```cypher
+MATCH (source {id: 0}), (sink {id: 5})
+CALL igraphalg.maxflow(source, sink, "weight")
+YIELD max_flow
+RETURN max_flow;
+```
+
+### `mincut(source, target, capacity, directed)`
+Minimum cut calculates the minimum st-cut between two vertices in a graph.
+
+#### Input:
+
+* `source: Vertex` ➑ Source node from which the minimum cut is calculated.
+* `target: Vertex` ➑ Sink node to which the minimum cut is calculated.
+* `capacity: string (default="weight")` ➑ Edge property which is used as the capacity of the edge.
+* `directed: bool (default=True)` ➑ If `true`, the graph is directed, otherwise, it's undirected.
+
+#### Output:
+
+* `node: Vertex` ➑ Vertex in graph.
+* `partition_id: int` ➑ Id of the partition where `node` belongs after the min-cut.
+
+#### Usage:
+
+```cypher
+  MATCH (source {id: 0}), (sink {id: 5})
+  CALL igraphalg.mincut(source, sink)
+  YIELD node, partition_id
+  RETURN node, partition_id;
+```
+
+### `community_leiden(objective_function, weights, resolution_parameter, beta, initial_membership, n_iterations, node_weights)`
+Finds the community structure of a graph using the Leiden algorithm of Traag, van Eck & Waltman.
+
+#### Input:
+
+* `objective_function: string (default="CPM")` ➑ Whether to use the Constant Potts Model (CPM) or modularity. Must be either `CPM` or `modularity`.
+* `weights: string (default=NULL)` ➑ If a property name is provided, that edge property is used as the edge weight; otherwise, edge weights default to 1.
+* `resolution_parameter: float (default=1.0)` ➑ Higher resolutions lead to more, smaller communities, while lower resolutions lead to fewer, larger communities.
+* `beta: float (default=0.01)` ➑ Parameter affecting the randomness in the Leiden algorithm. This affects only the refinement step of the algorithm.
+* `initial_membership: List[int] (default=NULL)` ➑ If provided, the Leiden algorithm will try to improve this provided membership. If no argument is provided, the algorithm simply starts from the singleton partition.
+* `n_iterations: int (default=2)` ➑ The number of iterations of the Leiden algorithm. Each iteration may improve the partition further.
+* `node_weights: List[float] (default=NULL)` ➑ The node weights used in the Leiden algorithm. If not provided, they will be automatically determined based on the `objective_function`.
+
+#### Output:
+
+* `node: Vertex` ➑ Vertex in graph.
+* `community_id: int` ➑ Id of the community where `node` belongs.
+
+#### Usage:
+
+```cypher
+  CALL igraphalg.community_leiden()
+  YIELD node, community_id
+  RETURN node, community_id;
+```
+
+### `all_shortest_path_lengths(weights, directed)`
+Compute all shortest path lengths in the graph.
+
+#### Input:
+
+* `weights: string (default=NULL)` ➑ If `None`, every edge has weight/distance/cost 1. If the value is a property name, use that property as the edge weight. If an edge doesn't have a property, the value defaults to 1.
+* `directed: bool (default=True)` ➑ If `true`, the graph is directed, otherwise, it's undirected.
+
+#### Output:
+
+* `src_node: Vertex` ➑ Source node.
+* `dest_node: Vertex` ➑ Destination node.
+* `length: double` ➑ Shortest path length between `src_node` and `dest_node`.
+
+#### Usage:
+
+```cypher
+CALL igraphalg.all_shortest_path_lengths()
+YIELD src_node, dest_node, length
+RETURN src_node, dest_node, length;
+```
diff --git a/docs2/advanced-algorithms/available-algorithms/import_util.md b/docs2/advanced-algorithms/available-algorithms/import_util.md
new file mode 100644
index 00000000000..626b18aeac8
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/import_util.md
@@ -0,0 +1,252 @@
+---
+id: import_util
+title: import_util
+sidebar_label: import_util
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+  {children}
+);
+
+Module for importing data from different formats.
Currently, this module supports only the import of the JSON file format.
+
+[![docs-source](https://img.shields.io/badge/source-import_util-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/import_util.py)
+
+
+| Trait               | Value                                                 |
+| ------------------- | ----------------------------------------------------- |
+| **Module type**     | **util**                                              |
+| **Implementation**  | **Python**                                            |
+| **Parallelism**     | **sequential**                                        |
+
+## Procedures
+
+
+
+### `json(path)`
+
+#### Input:
+
+* `path: string` ➑ Path to the JSON file that is being imported.
+
+#### Usage:
+The JSON file you're importing needs to be structured the same as the JSON file
+that the
+[`export_util.json()`](/docs/mage/query-modules/python/export-util)
+procedure generates. The generated JSON file is a list of objects representing
+nodes or relationships. If the object is a node, then it looks like this:
+
+```json
+{
+  "id": 4000,
+  "labels": [
+    "City"
+  ],
+  "properties": {
+    "id": 0,
+    "name": "Amsterdam"
+  },
+  "type": "node"
+}
+```
+
+The `id` key holds the value of Memgraph's internal node ID. The `labels` key
+holds the information about node labels in a list. The `properties` are
+key-value pairs representing the properties of that node. Each node needs to
+have the value of `type` set to `"node"`.
+
+On the other hand, if the object is a relationship, then it is structured like this:
+
+```json
+{
+  "end": 4052,
+  "id": 7175,
+  "label": "CloseTo",
+  "properties": {
+    "eu_border": true
+  },
+  "start": 4035,
+  "type": "relationship"
+}
+```
+
+The `end` and `start` keys hold the internal IDs of the start and end nodes of
+the relationship. Each relationship also has its internal ID exported as the
+value of the `id` key. A relationship can only have one label, which is exported
+to the `label` key. Properties are again key-value pairs, and the value of
+`type` needs to be set to `"relationship"`.
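As a quick sanity check before importing, a parsed file can be verified against the object shapes described above. The following is a standalone sketch only — `validate_export` is a hypothetical helper, not part of the module:

```python
import json

def validate_export(objects):
    """Check that a parsed JSON list follows the structure described above:
    node objects carry id/labels/properties and type == "node", relationship
    objects carry start/end/id/label/properties and type == "relationship",
    with start/end referencing node ids present in the same file."""
    node_ids = {o["id"] for o in objects if o["type"] == "node"}
    for o in objects:
        if o["type"] == "node":
            assert isinstance(o["labels"], list) and isinstance(o["properties"], dict)
        elif o["type"] == "relationship":
            assert isinstance(o["label"], str)
            # relationships must point at nodes exported in the same file
            assert o["start"] in node_ids and o["end"] in node_ids
        else:
            raise ValueError(f"unexpected type: {o['type']}")
    return True

data = json.loads("""[
  {"id": 4035, "labels": ["City"], "properties": {"name": "Utrecht"}, "type": "node"},
  {"id": 4052, "labels": ["City"], "properties": {"name": "Amsterdam"}, "type": "node"},
  {"end": 4052, "id": 7175, "label": "CloseTo", "properties": {"eu_border": true},
   "start": 4035, "type": "relationship"}
]""")
print(validate_export(data))  # True
```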
+
+
+The `path` you have to provide as a procedure argument depends on how you
+started Memgraph.
+
+
+
+
+
+
+If you ran Memgraph with Docker, you need to save the JSON file inside the
+Docker container. We recommend saving the JSON file inside the
+`/usr/lib/memgraph/query_modules` directory.
+
+You can call the procedure by running the following query:
+
+```cypher
+CALL import_util.json(path);
+```
+where `path` is the path to the JSON file inside the
+`/usr/lib/memgraph/query_modules` directory in the running Docker container (e.g.,
+`/usr/lib/memgraph/query_modules/import.json`).
+
+:::info
+You can copy the JSON file to the running Docker container with the [`docker cp`](https://docs.docker.com/engine/reference/commandline/cp/) command:
+```
+docker cp /path_to_local_folder/import.json <CONTAINER_ID>:/usr/lib/memgraph/query_modules/import.json
+```
+:::
+
+
+
+
+To import a local JSON file, call the procedure by running the following query:
+
+```cypher
+CALL import_util.json(path);
+```
+where `path` is the path to the local JSON file that will be imported (e.g.,
+`/users/my_user/import_folder/import.json`).
+
+
+
+
+## Example - Importing JSON file to create a database
+
+
+
+
+Below is the content of the `import.json` file.
+
+- If you're using **Memgraph with Docker**, then you have to save the
+  `import.json` file in the `/usr/lib/memgraph/query_modules` directory inside
+  the running Docker container.
+
+- If you're using **Memgraph on Ubuntu, Debian, RPM package or WSL**, then you
+  have to save the `import.json` file in the local
+  `/users/my_user/import_folder` directory.
+ +```json +[ + { + "id": 6114, + "labels": [ + "Person" + ], + "properties": { + "name": "Anna" + }, + "type": "node" + }, + { + "id": 6115, + "labels": [ + "Person" + ], + "properties": { + "name": "John" + }, + "type": "node" + }, + { + "id": 6116, + "labels": [ + "Person" + ], + "properties": { + "name": "Kim" + }, + "type": "node" + }, + { + "end": 6115, + "id": 21120, + "label": "IS_FRIENDS_WITH", + "properties": {}, + "start": 6114, + "type": "relationship" + }, + { + "end": 6116, + "id": 21121, + "label": "IS_FRIENDS_WITH", + "properties": {}, + "start": 6114, + "type": "relationship" + }, + { + "end": 6116, + "id": 21122, + "label": "IS_MARRIED_TO", + "properties": {}, + "start": 6115, + "type": "relationship" + } +] + +``` + + + + +If you're using **Memgraph with Docker**, then the following Cypher query will +create a graph database from the provided JSON file: + +```cypher +CALL import_util.json("/usr/lib/memgraph/query_modules/import.json"); +``` + +If you're using **Memgraph on Ubuntu, Debian, RPM package or WSL**, then the +following Cypher query will create a graph database from the provided JSON file: + +```cypher +CALL import_util.json("/users/my_user/import_folder/import.json"); +``` + + + + + +After you import the `import.json` file, you get the following graph database: + + + + + + \ No newline at end of file diff --git a/docs2/advanced-algorithms/available-algorithms/json_util.md b/docs2/advanced-algorithms/available-algorithms/json_util.md new file mode 100644 index 00000000000..951fd85b550 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/json_util.md @@ -0,0 +1,180 @@ +--- +id: json_util +title: json_util +sidebar_label: json_util +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +A module for loading JSON from a local file or remote address. 
If the JSON that is being loaded is an array, then this module loads it as a stream of values, and if it is a map, the module loads it as a single value. + +[![docs-source](https://img.shields.io/badge/source-json_util-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/json_util.py) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **util** | +| **Implementation** | **Python** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `load_from_path(path)` + +#### Input: + +* `path: string` ➑ Path to the JSON that is being loaded. + +#### Output: + +* `objects: List[object]` ➑ list of JSON objects from the file that is being loaded. + +#### Usage: +```cypher +CALL json_util.load_from_path(path) +YIELD objects +RETURN objects; +``` + +### `load_from_url(url)` + +#### Input: + +* `url: string` ➑ URL to the JSON that is being loaded. + +#### Output: + +* `objects: List[object]` ➑ list of JSON objects from the file that is being loaded. + +#### Usage: +```cypher +CALL json_util.load_from_url(url) +YIELD objects +RETURN objects; +``` + +## Example - Loading JSON from path + + + + + For example, let the input path be `"load-data/data.json"`. There we can find `data.json`: + +```json +{ + "first_name": "Jessica", + "last_name": "Rabbit", + "pets": [ + "dog", + "cat", + "bird" + ] +} +``` + + + + +```cypher +CALL json_util.load_from_path("load-data/data.json") +YIELD objects +UNWIND objects AS o +RETURN o.first_name AS name, o.last_name AS surname; +``` + + + + + + +```plaintext ++------------------+-------------------+ +| name | surname | ++------------------+-------------------+ +| Jessica | Rabbit | ++------------------+-------------------+ + +``` + + + + + + +## Example - Loading JSON from URL + + + + + For example, let the input URL be `"https://download.memgraph.com/asset/mage/data.json"`. 
There we can find `data.json`:
+
+```json
+{
+  "first_name": "James",
+  "last_name": "Bond",
+  "pets": [
+    "dog",
+    "cat",
+    "fish"
+  ]
+}
+```
+
+
+
+
+```cypher
+CALL json_util.load_from_url("https://download.memgraph.com/asset/mage/data.json")
+YIELD objects
+UNWIND objects AS o
+RETURN o.first_name AS name, o.last_name AS surname;
+```
+
+
+
+
+
+
+```plaintext
++------------------+-------------------+
+| name             | surname           |
++------------------+-------------------+
+| James            | Bond              |
++------------------+-------------------+
+
+```
+
+
+
+
diff --git a/docs2/advanced-algorithms/available-algorithms/katz_centrality.md b/docs2/advanced-algorithms/available-algorithms/katz_centrality.md
new file mode 100644
index 00000000000..ce8cac936b3
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/katz_centrality.md
@@ -0,0 +1,156 @@
+---
+id: katz_centrality
+title: katz_centrality
+sidebar_label: katz_centrality
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+
+{children}
+
+);
+
+**Katz Centrality** is a centrality measurement that incorporates the lengths of
+the inbound walks ending in the node of interest. Close connections contribute
+more to a node's score than nodes connected over long distances.
+
+The implemented algorithm is based on the work of Alexander van der Grinten et
+al. called [Scalable Katz Ranking Computation in Large Static and Dynamic
+Graphs](https://arxiv.org/pdf/1807.03847.pdf)[^1]. The authors propose an
+estimation method that preserves rankings for both static and dynamic Katz
+centrality scenarios.
+
+Theoretically speaking, there exists an attenuation factor `(alpha^i)` smaller
If `w_i(v)` is the number of +walks of length `i` starting from node `v`, Katz centrality is defined as: + +``` +Centrality(v) = sum { w_i(v) * alpha ^ i} +``` + +The constructed algorithm computes Katz centrality by iteratively improving the +upper and lower bounds on centrality scores. This guarantees that centrality +rankings will be correct, but it does not guarantee that the corresponding +resulting centralities will be correct. + +[^1] [Scalable Katz Ranking Computation in Large Static and Dynamic +Graphs](https://arxiv.org/pdf/1807.03847.pdf), Alexander van der Grinten et. al. + +[![docs-source](https://img.shields.io/badge/source-katz_centrality-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/katz_centrality_module/katz_centrality_module.cpp) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **algorithm** | +| **Implementation** | **C++** | +| **Graph direction** | **directed** | +| **Edge weights** | **unweighted** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `get(alpha, epsilon)` + +#### Input: + +- `alpha: double (default=0.2)` ➑ Exponential decay factor defining the walk length + importance. +- `epsilon: double (default=1e-2)` ➑ Convergence tolerance. Minimal difference in two + adjacent pairs of nodes in the final ranking. + +#### Output: + +- `node` ➑ Node in the graph, for which Katz Centrality is calculated. +- `rank` ➑ Normalized ranking of a node. 
Expresses the centrality value after + bound convergence + +#### Usage: + +```cypher +CALL katz_centrality.get() +YIELD node, rank; +``` + +## Example + + + + + + + + + +```cypher +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 6}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 7}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 9}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 10}) MERGE (b:Node {id: 9}) CREATE (a)-[:RELATION]->(b); +``` + + + + +```cypher +CALL katz_centrality.get() +YIELD node, rank +RETURN node, rank; +``` + + + + +```plaintext ++------------------+------------------+ +| node | rank | ++------------------+------------------+ +| (:Node {id: 9}) | 0.544 | +| (:Node {id: 7}) | 0 | +| (:Node {id: 6}) | 0 | +| (:Node {id: 5}) | 0 | +| (:Node {id: 4}) | 0 | +| (:Node {id: 3}) | 0 | +| (:Node {id: 8}) | 0.408 | +| (:Node {id: 2}) | 1.08 | +| (:Node {id: 10}) | 1.864 | +| (:Node {id: 0}) | 0.28 | +| (:Node {id: 1}) | 0.408 | ++------------------+------------------+ + +``` + + + diff --git 
a/docs2/advanced-algorithms/available-algorithms/katz_centrality_online.md b/docs2/advanced-algorithms/available-algorithms/katz_centrality_online.md
new file mode 100644
index 00000000000..dd1c5266d5b
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/katz_centrality_online.md
@@ -0,0 +1,223 @@
+---
+id: katz_centrality_online
+title: katz_centrality_online
+sidebar_label: katz_centrality_online
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+
+{children}
+
+);
+
+Because of its simplicity, **Katz Centrality** has become one of the most
+established centrality measurements. It measures a node's influence by summing
+all walks starting from the node of interest, weighting each walk by some
+attenuation factor smaller than 1.
+
+Just as other centrality measures got their dynamic algorithm implementations,
+so did **Katz Centrality**. The dynamic implementation reduces the number of
+computations needed to update already calculated results, offering substantial
+speedups compared to repeated static algorithm runs.
+
+The algorithm is based on the work of Alexander van der Grinten et al. called
+[Scalable Katz Ranking Computation in Large Static and Dynamic
+Graphs](https://arxiv.org/pdf/1807.03847.pdf)[^1]. The authors propose an
+estimation method that computes Katz centrality by iteratively improving upper
+and lower bounds on the centrality scores. The computed scores may differ from
+the real values, but the algorithm is guaranteed to preserve the rankings.
+
+[^1] [Scalable Katz Ranking Computation in Large Static and Dynamic
+Graphs](https://arxiv.org/pdf/1807.03847.pdf), Alexander van der Grinten et al.
+
+### Usage
+
+Online Katz centrality should be used in a specific way.
To set the parameters,
+call the `set()` procedure. This procedure also sets the context of the
+streaming algorithm. The `get()` procedure only returns the resulting values
+stored in a cache. Therefore, if you try to get values before first calling
+`set()`, the run will fail with an appropriate error message.
+
+To enable the incremental flow, set a trigger that calls the `update()`
+function:
+
+```cypher
+CREATE TRIGGER katz_trigger
+(BEFORE | AFTER) COMMIT
+EXECUTE CALL katz_centrality_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD *
+SET node.rank = rank;
+```
+
+Finally, the `reset()` function resets the context and enables the user to start
+new runs.
+
+[![docs-source](https://img.shields.io/badge/source-katz_centrality_online-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/katz_centrality_module/katz_centrality_online_module.cpp)
+
+| Trait               | Value                                                 |
+| ------------------- | ----------------------------------------------------- |
+| **Module type**     | **algorithm**                                         |
+| **Implementation**  | **C++**                                               |
+| **Graph direction** | **directed**                                          |
+| **Edge weights**    | **unweighted**                                        |
+| **Parallelism**     | **sequential**                                        |
+
+## Procedures
+
+
+
+### `set(alpha, epsilon)`
+
+#### Input:
+
+- `alpha: double (default=0.2)` ➑ Exponential decay factor defining the walk length
+  importance.
+- `epsilon: double (default=1e-2)` ➑ Convergence tolerance. Minimal difference in two
+  adjacent pairs of nodes in the final ranking.
+
+#### Output:
+
+- `node` ➑ Node in the graph, for which Katz Centrality is calculated.
+- `rank` ➑ Normalized ranking of a node. Expresses the centrality value after
+  bound convergence.
+
+#### Usage:
+
+```cypher
+CALL katz_centrality_online.set(0.2, 0.01)
+YIELD node, rank;
+```
+
+### `get()`
+
+\* This should be used if the trigger has been set or the `set()` procedure has
+been called before adding changes to the graph.
+ +#### Output: + +- `node` ➑ Node in the graph, for which Katz Centrality is calculated. +- `rank` ➑ Normalized ranking of a node. Expresses the centrality value after + bound convergence. + +#### Usage: + +```cypher +CALL katz_centrality_online.get() +YIELD node, rank; +``` + +### `update(created_vertices, created_edges, deleted_vertices, deleted_edges)` + +#### Input: + +- `created_vertices` ➑ Vertices that were created in the last transaction. +- `created_edges` ➑ Edges created in a period from the last transaction. +- `deleted_vertices` ➑ Vertices deleted from the last transaction. +- `deleted_edges` ➑ Edges deleted from the last transaction. + +#### Output: + +- `node` ➑ Node in the graph, for which Katz Centrality is calculated. +- `rank` ➑ Normalized ranking of a node. Expresses the centrality value after + bound convergence. + +#### Usage: + +```cypher +CREATE TRIGGER katz_trigger +(BEFORE | AFTER) COMMIT +EXECUTE CALL katz_centrality_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD * +SET node.rank = rank; +``` + +## Example + + + + + + + + + +```cypher +CALL katz_centrality_online.set(0.2) YIELD *; + +CREATE TRIGGER katz_trigger +BEFORE COMMIT +EXECUTE CALL katz_centrality_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD * +SET node.rank = rank; +``` + + + + +```cypher +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 8}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 
5}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 6}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 7}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 8}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 9}) MERGE (b:Node {id: 10}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 10}) MERGE (b:Node {id: 9}) CREATE (a)-[:RELATION]->(b); +``` + + + + +```cypher +MATCH (node) +RETURN node.id AS node_id, node.rank AS rank; +``` + + + + +```plaintext ++---------+---------+ +| node_id | rank | ++---------+---------+ +| 1 | 0.408 | +| 0 | 0.28 | +| 10 | 1.864 | +| 2 | 1.08 | +| 8 | 0.408 | +| 3 | 0 | +| 4 | 0 | +| 5 | 0 | +| 6 | 0 | +| 7 | 0 | +| 9 | 0.544 | ++---------+---------+ +``` + + + diff --git a/docs2/advanced-algorithms/available-algorithms/kmeans_clustering.md b/docs2/advanced-algorithms/available-algorithms/kmeans_clustering.md new file mode 100644 index 00000000000..afbe0e0a9c3 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/kmeans_clustering.md @@ -0,0 +1,178 @@ +--- +id: kmeans +title: kmeans +sidebar_label: kmeans +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + +{children} + +); + +The k-means algorithm clusters given data by trying to separate samples in `n` groups of equal variance by minimizing the criterion known as +within-the-cluster sum-of-squares. To learn more about it, jump to the [algorithm](../../algorithms/machine-learning-graph-analytics/k-means-clustering-algorithm) page. 
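The within-cluster sum-of-squares objective can be illustrated with a minimal pure-Python sketch of Lloyd's iteration. This is an illustration of the idea only — `lloyd_kmeans` is a hypothetical helper, and the module's parameters mirror scikit-learn's `KMeans` (an assumption about the underlying implementation):

```python
import random

def lloyd_kmeans(points, k, n_iter=10, seed=1998):
    """Minimal Lloyd's iteration: repeatedly assign every point to its
    nearest centroid, then move each centroid to the mean of its cluster,
    shrinking the within-cluster sum of squares at every step."""
    def nearest(p, centroids):
        return min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))

    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # "random" init: k distinct input points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return [nearest(p, centroids) for p in points]

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)]
labels = lloyd_kmeans(points, k=2)
# the two tight groups of points end up in separate clusters
```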
+
+[![docs-source](https://img.shields.io/badge/source-kmeans-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/kmeans.py)
+
+| Trait               | Value                                                          |
+| ------------------- | -------------------------------------------------------------- |
+| **Module type**     | **module**                                                     |
+| **Implementation**  | **Python**                                                     |
+| **Graph direction** | **directed/undirected**                                        |
+| **Edge weights**    | **weighted/unweighted**                                        |
+| **Parallelism**     | **sequential**                                                 |
+
+:::note Too slow?
+
+If this algorithm implementation is too slow for your use case, [contact us](mailto:tech@memgraph.com) and request a rewrite to C++!
+
+:::
+
+## Procedures
+
+
+
+### `get_clusters(n_clusters, embedding_property, init, n_init, max_iter, tol, algorithm, random_state)`
+For each node, this procedure returns the cluster it belongs to.
+
+#### Input:
+
+- `n_clusters : int` ➑ Number of clusters to be formed.
+- `embedding_property : str (default: "embedding")` ➑ Node property where embeddings are stored.
+- `init : str (default: "k-means++")` ➑ Initialization method. If `k-means++` is selected, initial cluster centroids are sampled per an empirical probability distribution of the points’ contribution to the overall inertia. This technique speeds up convergence and is theoretically proven to be `O(log k)`-optimal.
+If `random`, `n_clusters` observations (rows) are randomly chosen for the initial centroids.
+- `n_init : int (default: 10)` ➑ Number of times the k-means algorithm will be run with different centroid seeds.
+- `max_iter : int (default: 10)` ➑ Maximum number of iterations of the k-means algorithm in a single run.
+- `tol : float (default: 1e-4)` ➑ Relative tolerance of the Frobenius norm of the difference of cluster centers across consecutive iterations. Used in determining convergence.
+- `algorithm : str (default: "auto")` ➑ Options are `lloyd`, `elkan`, `auto`, `full`.
Description [here](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#:~:text=algorithm%7B%E2%80%9Clloyd%E2%80%9D%2C%20%E2%80%9Celkan%E2%80%9D%2C%20%E2%80%9Cauto%E2%80%9D%2C%20%E2%80%9Cfull%E2%80%9D%7D%2C%20default%3D%E2%80%9Dlloyd%E2%80%9D).
+- `random_state : int (default: 1998)` ➑ Random seed for the algorithm.
+
+#### Output:
+
+- `node: mgp.Vertex` ➑ Graph node.
+- `cluster_id: mgp.Number` ➑ Cluster ID of the above node.
+
+#### Usage:
+
+```cypher
+  CALL kmeans.get_clusters(2, "embedding", "k-means++", 10, 10, 0.0001, "auto", 1) YIELD node, cluster_id
+  RETURN node.id as node_id, cluster_id
+  ORDER BY node_id ASC;
+```
+
+### `set_clusters(n_clusters, embedding_property, cluster_property, init, n_init, max_iter, tol, algorithm, random_state)`
+The procedure assigns a cluster to each node by writing the cluster ID to `cluster_property`.
+
+#### Input:
+
+- `n_clusters : int` ➑ Number of clusters to be formed.
+- `embedding_property : str (default: "embedding")` ➑ Node property where embeddings are stored.
+- `cluster_property: str (default: "cluster_id")` ➑ Node property where `cluster_id` will be stored.
+- `init : str (default: "k-means++")` ➑ Initialization method. If `k-means++` is selected, initial cluster centroids are sampled per an empirical probability distribution of the points’ contribution to the overall inertia. This technique speeds up convergence and is theoretically proven to be `O(log k)`-optimal.
+If `random`, `n_clusters` observations (nodes) are randomly chosen for the initial centroids.
+- `n_init : int (default: 10)` ➑ Number of times the k-means algorithm will be run with different centroid seeds.
+- `max_iter : int (default: 10)` ➑ Maximum number of iterations of the k-means algorithm in a single run.
+- `tol : float (default: 1e-4)` ➑ Relative tolerance of the Frobenius norm of the difference of cluster centers across consecutive iterations. Used in determining convergence.
+- `algorithm : str (default: "auto")` ➑ Options are `lloyd`, `elkan`, `auto`, `full`. Description [here](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#:~:text=algorithm%7B%E2%80%9Clloyd%E2%80%9D%2C%20%E2%80%9Celkan%E2%80%9D%2C%20%E2%80%9Cauto%E2%80%9D%2C%20%E2%80%9Cfull%E2%80%9D%7D%2C%20default%3D%E2%80%9Dlloyd%E2%80%9D). +- `random_state : int (default: 1998)` ➑ Random seed for the algorithm. + +#### Output: + +- `node: mgp.Vertex` ➑ Graph node. +- `cluster_id: mgp.Number` ➑ Cluster ID of the above node. + +#### Usage: + +```cypher + CALL kmeans.set_clusters(2, "embedding", "cluster_id", "k-means++", 10, 10, 0.0001, "auto", 1) YIELD node, cluster_id + RETURN node.id as node_id, cluster_id + ORDER BY node_id ASC; +``` + +## Example + + + + + + + + + +```cypher +CREATE (:Node {id:0, embedding: [0.90678340196609497, 0.74690568447113037, -0.65984714031219482]}); +CREATE (:Node {id:1, embedding: [1.2019195556640625, 0.42643040418624878, -0.4709840714931488]}); +CREATE (:Node {id:2, embedding: [1.1005796194076538, 0.67131000757217407, -0.5418705940246582]}); +CREATE (:Node {id:4, embedding: [1.1840434074401855, 0.39269298315048218, -0.5063326358795166]}); +CREATE (:Node {id:5, embedding: [0.83302301168441772, 0.5545622706413269, -0.31265774369239807]}); +CREATE (:Node {id:6, embedding: [0.78877884149551392, 0.5189281702041626, -0.097793936729431152]}); +CREATE (:Node {id:7, embedding: [0.61398810148239136, 0.5255049467086792, -0.3551192581653595]}); +CREATE (:Node {id:8, embedding: [0.83923488855361938, -0.0041203685104846954, -0.51874136924743652]}); +CREATE (:Node {id:9, embedding: [0.60883384943008423, 0.60958302021026611, -0.40317356586456299]}); +MATCH (a:Node {id: 0}) MATCH (b:Node {id: 1}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 1}) MATCH (b:Node {id: 2}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 2}) MATCH (b:Node {id: 0}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 0}) MATCH (b:Node {id: 4}) MERGE 
(a)-[:RELATION]->(b); +MATCH (a:Node {id: 4}) MATCH (b:Node {id: 1}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 4}) MATCH (b:Node {id: 2}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 0}) MATCH (b:Node {id: 5}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 5}) MATCH (b:Node {id: 6}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 6}) MATCH (b:Node {id: 7}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 7}) MATCH (b:Node {id: 8}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 8}) MATCH (b:Node {id: 6}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 6}) MATCH (b:Node {id: 9}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 9}) MATCH (b:Node {id: 7}) MERGE (a)-[:RELATION]->(b); +MATCH (a:Node {id: 9}) MATCH (b:Node {id: 8}) MERGE (a)-[:RELATION]->(b); +``` + + + + +```cypher +CALL kmeans.get_clusters(2, "embedding", "k-means++", 10, 10, 0.0001, "auto", 1) YIELD node, cluster_id + RETURN node.id as node_id, cluster_id + ORDER BY node_id ASC; +``` + + + + + +```plaintext ++-------------------------+-------------------------+ +| node_id | cluster_id | ++-------------------------+-------------------------+ +| 0 | 1 | +| 1 | 1 | +| 2 | 1 | +| 4 | 1 | +| 5 | 0 | +| 6 | 0 | +| 7 | 0 | +| 8 | 0 | +| 9 | 0 | ++-------------------------+-------------------------+ +``` + + + diff --git a/docs2/advanced-algorithms/available-algorithms/llm_util.md b/docs2/advanced-algorithms/available-algorithms/llm_util.md new file mode 100644 index 00000000000..3bb46cbfcd6 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/llm_util.md @@ -0,0 +1,288 @@ +--- +id: llm_util +title: llm_util +sidebar_label: llm_util +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + + 
+A module that generates the graph database schema in a prompt-ready or raw
+format, suitable for use with large language models (LLMs).
+
+[![docs-source](https://img.shields.io/badge/source-llm_util-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/llm_util.py)
+
+| Trait               | Value                                                 |
+| ------------------- | ----------------------------------------------------- |
+| **Module type**     | **util**                                              |
+| **Implementation**  | **Python**                                            |
+| **Parallelism**     | **sequential**                                        |
+
+## Procedures
+
+
+
+### `schema(output_type)`
+
+The `schema()` procedure generates the graph database schema in a **prompt-ready** or **raw** format. The prompt-ready format is optimized to describe the database schema in words best recognized by large language models (LLMs). The raw format offers all the necessary information about the graph schema in a format that can be customized for later use with LLMs.
+
+#### Input:
+
+* `output_type: str (default='prompt_ready')` ➑ By default, the graph schema will include additional context and it will be prompt-ready. If set to 'raw', it will produce a simpler version that can be adjusted for the prompt.
+
+#### Output:
+
+* `schema: mgp.Any` ➑ A `str` containing a prompt-ready graph schema description in a format suitable for large language models (LLMs), or an `mgp.List` containing information on the graph schema in raw format, which can be customized for LLMs.
+
+#### Usage:
+Get the **prompt-ready graph schema**:
+```cypher
+CALL llm_util.schema() YIELD schema RETURN schema;
+```
+or
+```cypher
+CALL llm_util.schema('prompt_ready') YIELD schema RETURN schema;
+```
+
+Get the **raw graph schema**:
+```cypher
+CALL llm_util.schema('raw') YIELD schema RETURN schema;
+```
+
+:::note
+The `output_type` is case-insensitive.
+::: + + +## Example - Get prompt-ready graph schema + + + + + Create a graph by running the following Cypher query: + + +```cypher +CREATE (n:Person {name: "Kate", age: 27})-[:IS_FRIENDS_WITH {since: "2023-06-21"}]->(m:Person:Student {name: "James", age: 30, year: "second"})-[:STUDIES_AT]->(:University {name: "University of Zagreb"}) CREATE (p:Person:Student {name: "Anthony", age: 25})-[:STUDIES_AT]->(:University {name: "University of Vienna"}) +WITH n, m +CREATE (n)-[:LIVES_IN]->(:City {name: "Zagreb"})<-[:LIVES_IN]-(m); +``` + + + +The schema of the created graph can be seen in Memgraph Lab, under the Graph Schema tab: + +
+ +
+ +
+ + + +Once the graph is created, run the following code to call the schema procedure: + + +```cypher +CALL llm_util.schema() YIELD schema RETURN schema; +``` + +or + +```cypher +CALL llm_util.schema('prompt_ready') YIELD schema RETURN schema; +``` + + + + + +Below is the result of running the schema procedure: + + +``` +Node properties are the following: +Node name: 'Person', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'age', 'type': 'int'}, {'property': 'year', 'type': 'str'}] +Node name: 'Student', Node properties: [{'property': 'name', 'type': 'str'}, {'property': 'age', 'type': 'int'}, {'property': 'year', 'type': 'str'}] +Node name: 'University', Node properties: [{'property': 'name', 'type': 'str'}] +Node name: 'City', Node properties: [{'property': 'name', 'type': 'str'}] + +Relationship properties are the following: +Relationship Name: 'IS_FRIENDS_WITH', Relationship Properties: [{'property': 'since', 'type': 'str'}] + +The relationships are the following: +['(:Person)-[:IS_FRIENDS_WITH]->(:Person)'] +['(:Person)-[:IS_FRIENDS_WITH]->(:Student)'] +['(:Person)-[:LIVES_IN]->(:City)'] +['(:Person)-[:STUDIES_AT]->(:University)'] +['(:Student)-[:STUDIES_AT]->(:University)'] +['(:Student)-[:LIVES_IN]->(:City)'] +``` + + + + + +
+ +## Example - Get raw graph schema + + + + + Create a graph by running the following Cypher query: + + +```cypher +CREATE (n:Person {name: "Kate", age: 27})-[:IS_FRIENDS_WITH {since: "2023-06-21"}]->(m:Person:Student {name: "James", age: 30, year: "second"})-[:STUDIES_AT]->(:University {name: "University of Zagreb"}) CREATE (p:Person:Student {name: "Anthony", age: 25})-[:STUDIES_AT]->(:University {name: "University of Vienna"}) +WITH n, m +CREATE (n)-[:LIVES_IN]->(:City {name: "Zagreb"})<-[:LIVES_IN]-(m); +``` + + + +The schema of the created graph can be seen in Memgraph Lab, under the Graph Schema tab: + +
+ +
+ +
+ + + +Once the graph is created, run the following code to call the schema procedure: + + +```cypher +CALL llm_util.schema('raw') YIELD schema RETURN schema; +``` + + + + + +Below is the result of running the schema procedure: + + +``` +{ + "node_props": { + "City": [ + { + "property": "name", + "type": "str" + } + ], + "Person": [ + { + "property": "name", + "type": "str" + }, + { + "property": "age", + "type": "int" + }, + { + "property": "year", + "type": "str" + } + ], + "Student": [ + { + "property": "name", + "type": "str" + }, + { + "property": "age", + "type": "int" + }, + { + "property": "year", + "type": "str" + } + ], + "University": [ + { + "property": "name", + "type": "str" + } + ] + }, + "rel_props": { + "IS_FRIENDS_WITH": [ + { + "property": "since", + "type": "str" + } + ] + }, + "relationships": [ + { + "end": "Person", + "start": "Person", + "type": "IS_FRIENDS_WITH" + }, + { + "end": "Student", + "start": "Person", + "type": "IS_FRIENDS_WITH" + }, + { + "end": "City", + "start": "Person", + "type": "LIVES_IN" + }, + { + "end": "University", + "start": "Person", + "type": "STUDIES_AT" + }, + { + "end": "University", + "start": "Student", + "type": "STUDIES_AT" + }, + { + "end": "City", + "start": "Student", + "type": "LIVES_IN" + } + ] +} +``` + + + + + +
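A schema string produced by either variant above is typically interpolated into an LLM prompt so that the model generates Cypher only against existing labels, properties, and relationship types. A minimal sketch of that step — the prompt template and the shortened `schema_text` are illustrative assumptions, not part of `llm_util`:

```python
# Sketch: interpolate a prompt-ready schema (as returned by
# llm_util.schema('prompt_ready')) into an LLM prompt. The template and
# the shortened schema_text below are illustrative, not part of llm_util.
schema_text = """Node properties are the following:
Node name: 'Person', Node properties: [{'property': 'name', 'type': 'str'}]

The relationships are the following:
['(:Person)-[:IS_FRIENDS_WITH]->(:Person)']"""

PROMPT_TEMPLATE = (
    "You are generating Cypher for a Memgraph database.\n"
    "Use only the labels, properties, and relationship types listed below.\n"
    "\n"
    "{schema}\n"
    "\n"
    "Question: {question}\n"
    "Cypher:"
)

def build_prompt(schema: str, question: str) -> str:
    """Return the final prompt with the schema embedded verbatim."""
    return PROMPT_TEMPLATE.format(schema=schema, question=question)

prompt = build_prompt(schema_text, "Who are Kate's friends?")
print(prompt)
```

The schema text is passed through unchanged, so whatever the procedure returns is exactly what the model sees.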
diff --git a/docs2/advanced-algorithms/available-algorithms/max_flow.md b/docs2/advanced-algorithms/available-algorithms/max_flow.md new file mode 100644 index 00000000000..7fcb82491cc --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/max_flow.md @@ -0,0 +1,166 @@ +--- +id: max_flow +title: max_flow +sidebar_label: max_flow +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +The maximum flow problem consists of finding a flow through a graph such that it +is the maximum possible flow. + +The algorithm implementation is based on the Ford-Fulkerson method with capacity +scaling. Ford-Fulkerson method is not itself an algorithm as it does not specify +the procedure of finding augmenting paths in a residual graph. It is a greedy +method, using augmenting paths as it comes across them. Input is a weighted +graph with a defined source and sink, representing the beginning and end of the +flow network. The algorithm begins with an empty flow and, at each step, finds a +path, called an augmenting path, from the source to the sink that generates more +flow. When flow cannot be increased anymore, the algorithm stops, and the maximum +flow has been found. + +The capacity scaling is a heuristic for finding augmenting paths in such a way +that prioritizes taking edges with larger capacities, maintaining a threshold +value that is only lowered once no larger path can be found. It speeds up the +algorithm noticeably compared to a standard DFS search. + +The algorithm is adapted to work with heterogeneous graphs, meaning not all +edges need to have the defined edge property used for edge flow. When an edge +doesn't have a flow, it is skipped, and when no edges have this property, the +returning max flow value is 0. 
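For intuition, the capacity-scaling Ford-Fulkerson procedure described above can be sketched in a few lines of Python. This is a toy reimplementation run on this page's example graph, not the module's actual source:

```python
from collections import defaultdict

def max_flow(edges, source, sink):
    """Ford-Fulkerson with capacity scaling on a directed graph.

    `edges` is a list of (u, v, capacity) triples. Residual capacities
    live in a nested dict; every edge gets a reverse residual edge
    starting at capacity 0.
    """
    residual = defaultdict(lambda: defaultdict(int))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0  # ensure the reverse edge exists

    def find_path(threshold):
        # DFS that only follows residual edges with capacity >= threshold.
        stack, parent = [source], {source: None}
        while stack:
            u = stack.pop()
            if u == sink:
                path = []
                while parent[u] is not None:
                    p = parent[u]
                    path.append((p, u))
                    u = p
                return path[::-1]
            for v, cap in residual[u].items():
                if v not in parent and cap >= threshold:
                    parent[v] = u
                    stack.append(v)
        return None

    # Start the threshold at the largest power of two <= max capacity.
    max_cap = max((cap for _, _, cap in edges), default=0)
    threshold = 1
    while threshold * 2 <= max_cap:
        threshold *= 2

    flow = 0
    while threshold >= 1:
        path = find_path(threshold)
        if path is None:
            threshold //= 2  # no wide-enough path left: lower the threshold
            continue
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck
    return flow

# This page's example graph.
edges = [("A", "B", 9), ("A", "C", 10), ("B", "E", 8), ("C", "F", 7),
         ("C", "D", 1), ("A", "D", 8), ("E", "D", 2), ("D", "F", 11),
         ("E", "G", 5), ("F", "G", 14)]
print(max_flow(edges, "A", "G"))  # 19, matching the module's result below
```

When `edges` is empty (no edge carries the flow property), the threshold loop finds no augmenting path and the function returns 0, mirroring the behavior described above for heterogeneous graphs.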
+ +[![docs-source](https://img.shields.io/badge/source-max_flow-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/max_flow.py) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **algorithm** | +| **Implementation** | **Python** | +| **Graph direction** | **directed** | +| **Edge weights** | **weighted**| +| **Parallelism** | **sequential** | + +:::note Too slow? + +If this algorithm implementation is too slow for your use case, [contact us](mailto:tech@memgraph.com) and request a rewrite to C++ ! + +::: + +## Procedures + + + +### `get_flow(parameters, edge_property)` + +#### Input: + +* `start_v: Vertex` ➑ Source node from which the maximum flow is calculated +* `end_v: Vertex` ➑ Sink node to which the max flow is calculated +* `edge_property: string (default="weight")` ➑ Edge property which is used as the flow + capacity of the edge + +#### Output: + +* `max_flow: Number` ➑ Maximum flow of the network, from source to sink + +#### Usage: + +```cypher +MATCH (source {id: 0}), (sink {id: 5}) +CALL max_flow.get_flow(source, sink, "weight") +YIELD max_flow +RETURN max_flow; +``` + +### `get_paths(parameters, edge_property)` + +#### Input: + +* `start_v: Vertex` ➑ Source node from which the maximum flow is calculated +* `end_v: Vertex` ➑ Sink node to which the max flow is calculated +* `edge_property: string (default="weight")` ➑ Edge property which is used as the flow + capacity of the edge + +#### Output: + +* `path: Path` ➑ path with a flow in a maximum flow +* `flow: Number` ➑ flow amount corresponding to that path + +#### Usage: + +```cypher +MATCH (source {id: 0}), (sink {id: 5}) +CALL max_flow.get_paths(source, sink, "weight") +YIELD path, flow +RETURN path, flow; +``` + +## Example + + + + + + + + + +```cypher +MERGE (a:Node {id: "A"}) MERGE (b:Node {id: "B"}) CREATE (a)-[:RELATION {weight: 9}]->(b); +MERGE (a:Node {id: "A"}) MERGE (b:Node {id: 
"C"}) CREATE (a)-[:RELATION {weight: 10}]->(b); +MERGE (a:Node {id: "B"}) MERGE (b:Node {id: "E"}) CREATE (a)-[:RELATION {weight: 8}]->(b); +MERGE (a:Node {id: "C"}) MERGE (b:Node {id: "F"}) CREATE (a)-[:RELATION {weight: 7}]->(b); +MERGE (a:Node {id: "C"}) MERGE (b:Node {id: "D"}) CREATE (a)-[:RELATION {weight: 1}]->(b); +MERGE (a:Node {id: "A"}) MERGE (b:Node {id: "D"}) CREATE (a)-[:RELATION {weight: 8}]->(b); +MERGE (a:Node {id: "E"}) MERGE (b:Node {id: "D"}) CREATE (a)-[:RELATION {weight: 2}]->(b); +MERGE (a:Node {id: "D"}) MERGE (b:Node {id: "F"}) CREATE (a)-[:RELATION {weight: 11}]->(b); +MERGE (a:Node {id: "E"}) MERGE (b:Node {id: "G"}) CREATE (a)-[:RELATION {weight: 5}]->(b); +MERGE (a:Node {id: "F"}) MERGE (b:Node {id: "G"}) CREATE (a)-[:RELATION {weight: 14}]->(b); +``` + + + + +```cypher +MATCH (source {id: "A"}), (sink {id: "G"}) +CALL max_flow.get_flow(source, sink) +YIELD max_flow +RETURN max_flow; +``` + + + + +```plaintext ++----------+ +| max_flow | ++----------+ +| 19 | ++----------+ +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/meta_util.md b/docs2/advanced-algorithms/available-algorithms/meta_util.md new file mode 100644 index 00000000000..9eefea27e33 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/meta_util.md @@ -0,0 +1,396 @@ +--- +title: meta_util +sidebar_label: meta_util +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +A module that contains procedures describing graphs on a meta-level. 
+ +[![docs-source](https://img.shields.io/badge/source-meta_util-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/meta_util.py) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **util** | +| **Implementation** | **Python** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `schema(include_properties)` + +Knowing what kind of data, that is, what kind of nodes and relationships, are stored inside the database and how they're connected can be helpful. Besides that, each node or relationship can have a set of properties, and while loading the data in the database, you should be sure that a certain amount of graph objects has a particular property. That's where the number of graph objects with a particular property (property count) might come in handy. + +The `schema()` procedure returns a list of distinct relationships connecting distinct nodes, that is, a graph schema. If `include_properties` is set to `true`, the graph schema will contain additional information about properties. + +#### Input: + +* `include_properties: bool (default=false)` ➑ If set to `true`, the graph schema will include properties count information. + +#### Output: + +* `nodes: List[Map]` ➑ List of distinct node objects with their count. If `include_properties` is set to `true`, the node object contains properties count too. +* `relationships: List[Map]` ➑ List of distinct relationship objects with their count. If `include_properties` is set to `true`, the relationship object contains properties count too. 
+ +#### Usage: +Get graph schema without properties count: +```cypher +CALL meta_util.schema() +YIELD nodes, relationships +RETURN nodes, relationships; +``` + +Get graph schema with properties count: +```cypher +CALL meta_util.schema(true) +YIELD nodes, relationships +RETURN nodes, relationships; +``` + +:::info +The queries above will return results in the graph view only in Memgraph Lab version >= 2.4.0. For earlier versions of the Memgraph Lab, call `UNWIND` on returned object properties nodes and edges. +::: + +## Example - Get graph schema without properties count + + + + + Create a graph by running the following Cypher query: + +```cypher +CREATE (n:Person {name: "Kate", age: 27})-[:IS_FRIENDS_WITH]->(m:Person:Student {name: "James", age: 30, year: "second"})-[:STUDIES_AT]->(:University {name: "University of Vienna"}) +WITH n, m +CREATE (n)-[:LIVES_IN]->(:City {name: "Zagreb"})<-[:LIVES_IN]-(m); +``` + + + + +Once the graph is created, run the following code to call the `schema` procedure: + +```cypher +CALL meta_util.schema() +YIELD nodes, relationships +RETURN nodes, relationships; +``` + + + + + + +The graph result of the `schema` procedure can be seen in Memgraph Lab, and it looks like this: + +
+
+ +
+
+ +
+
+
+ + + +Memgraph Lab can also return data results - a list of nodes and a list of relationships. Here is the obtained list of nodes: + +```json +[ + { + "id": 0, + "labels": [ + "Person" + ], + "properties": { + "count": 1 + }, + "type": "node" + }, + { + "id": 1, + "labels": [ + "Person", + "Student" + ], + "properties": { + "count": 1 + }, + "type": "node" + }, + { + "id": 2, + "labels": [ + "University" + ], + "properties": { + "count": 1 + }, + "type": "node" + }, + { + "id": 3, + "labels": [ + "City" + ], + "properties": { + "count": 1 + }, + "type": "node" + } +] +``` + + + +Here is the obtained list of relationships: + +```json +[ + { + "end": 1, + "id": 0, + "label": "IS_FRIENDS_WITH", + "properties": { + "count": 1 + }, + "start": 0, + "type": "relationship" + }, + { + "end": 3, + "id": 1, + "label": "LIVES_IN", + "properties": { + "count": 1 + }, + "start": 0, + "type": "relationship" + }, + { + "end": 2, + "id": 2, + "label": "STUDIES_AT", + "properties": { + "count": 1 + }, + "start": 1, + "type": "relationship" + }, + { + "end": 3, + "id": 3, + "label": "LIVES_IN", + "properties": { + "count": 1 + }, + "start": 1, + "type": "relationship" + } +] +``` + + +
+ + +## Example - Get graph schema with properties count + + + + + Create a graph by running the following Cypher query: + +```cypher +CREATE (n:Person {name: "Kate", age: 27})-[:IS_FRIENDS_WITH]->(m:Person:Student {name: "James", age: 30, year: "second"})-[:STUDIES_AT]->(:University {name: "University of Vienna"}) +WITH n, m +CREATE (n)-[:LIVES_IN]->(:City {name: "Zagreb"})<-[:LIVES_IN]-(m); +``` + + + + +Once the graph is created, run the following code to call the `schema` procedure: + +```cypher +CALL meta_util.schema(true) +YIELD nodes, relationships +RETURN nodes, relationships; +``` + + + + + + +The graph result of the `schema` procedure can be seen in Memgraph Lab, and it looks like this: + +
+
+ +
+
+ +
+
+
+ + + +Memgraph Lab can also return data results - a list of nodes and a list of relationships. Here is the obtained list of nodes: + +```json +[ + { + "id": 0, + "labels": [ + "Person" + ], + "properties": { + "count": 1, + "properties_count": { + "age": 1, + "name": 1 + } + }, + "type": "node" + }, + { + "id": 1, + "labels": [ + "Person", + "Student" + ], + "properties": { + "count": 1, + "properties_count": { + "age": 1, + "name": 1, + "year": 1 + } + }, + "type": "node" + }, + { + "id": 2, + "labels": [ + "University" + ], + "properties": { + "count": 1, + "properties_count": { + "name": 1 + } + }, + "type": "node" + }, + { + "id": 3, + "labels": [ + "City" + ], + "properties": { + "count": 1, + "properties_count": { + "name": 1 + } + }, + "type": "node" + } +] +``` + + + +Here is the obtained list of relationships: + +```json +[ + { + "end": 1, + "id": 0, + "label": "IS_FRIENDS_WITH", + "properties": { + "count": 1, + "properties_count": {} + }, + "start": 0, + "type": "relationship" + }, + { + "end": 3, + "id": 1, + "label": "LIVES_IN", + "properties": { + "count": 1, + "properties_count": {} + }, + "start": 0, + "type": "relationship" + }, + { + "end": 2, + "id": 2, + "label": "STUDIES_AT", + "properties": { + "count": 1, + "properties_count": {} + }, + "start": 1, + "type": "relationship" + }, + { + "end": 3, + "id": 3, + "label": "LIVES_IN", + "properties": { + "count": 1, + "properties_count": {} + }, + "start": 1, + "type": "relationship" + } +] +``` + + +
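Because `schema` returns plain lists of maps, the results can be post-processed client-side. A small sketch that rebuilds `(:Start)-[:TYPE]->(:End)` triples from the node and relationship maps (the literals below are abbreviated copies of the first example's output):

```python
# Sketch: turn the node/relationship maps returned by meta_util.schema()
# into human-readable "(:Start)-[:TYPE]->(:End)" triples.
nodes = [
    {"id": 0, "labels": ["Person"], "properties": {"count": 1}, "type": "node"},
    {"id": 1, "labels": ["Person", "Student"], "properties": {"count": 1}, "type": "node"},
    {"id": 2, "labels": ["University"], "properties": {"count": 1}, "type": "node"},
    {"id": 3, "labels": ["City"], "properties": {"count": 1}, "type": "node"},
]
relationships = [
    {"start": 0, "end": 1, "label": "IS_FRIENDS_WITH"},
    {"start": 0, "end": 3, "label": "LIVES_IN"},
    {"start": 1, "end": 2, "label": "STUDIES_AT"},
    {"start": 1, "end": 3, "label": "LIVES_IN"},
]

def schema_triples(nodes, relationships):
    """Map node ids to joined label strings, then format each relationship."""
    labels = {n["id"]: ":".join(n["labels"]) for n in nodes}
    return [
        f"(:{labels[r['start']]})-[:{r['label']}]->(:{labels[r['end']]})"
        for r in relationships
    ]

for triple in schema_triples(nodes, relationships):
    print(triple)  # e.g. (:Person)-[:IS_FRIENDS_WITH]->(:Person:Student)
```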
diff --git a/docs2/advanced-algorithms/available-algorithms/migrate.md b/docs2/advanced-algorithms/available-algorithms/migrate.md new file mode 100644 index 00000000000..8f6c5240b18 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/migrate.md @@ -0,0 +1,152 @@ +--- +title: migrate +sidebar_label: migrate +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +A module that contains procedures describing graphs on a meta-level. + +[![docs-source](https://img.shields.io/badge/source-migrate-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/migrate.py) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **util** | +| **Implementation** | **Python** | +| **Parallelism** | **sequential** | + +## Procedures + +### `mysql(table_or_sql, config, config_path, params)` + +With `migrate.mysql` you can access MySQL and execute queries. The result table is converted into a stream, +and returned rows can be used to create graph structures. The value of the `config` parameter must be at least an empty map. If `config_path` is passed, every key,value pair from JSON file will overwrite any values in `config` file. + +#### Input: + +* `table_or_sql: str` ➑ Table name or an SQL query +* `config: mgp.Map` ➑ Connection configuration parameters (as in `mysql.connector.connect`) +* `config_path` ➑ Path to a JSON file containing configuration parameters (as in `mysql.connector.connect`) +* `params: mgp.Nullable[mgp.Any] (default=None)` ➑ Optionally, queries can be parameterized. 
In that case, `params` provides parameter values + + +#### Output: + +* `row: mgp.Map`: The result table as a stream of rows + +#### Usage: +Get count of rows: +```cypher +CALL migrate.mysql('example_table', {user:'memgraph', + password:'password', + host:'localhost', + database:'demo_db'} ) +YIELD row +RETURN count(row); +``` + +### `sql_server(table_or_sql, config, config_path, params)` + +With `migrate.sql_server` you can access SQL Server and execute queries. The result table is converted into a stream, and returned rows can be used to create graph structures. The value of the `config` parameter must be at least an empty map. If `config_path` is passed, every key,value pair from JSON file will overwrite any values in `config` file. + +#### Input: + +* `table_or_sql: str` ➑ Table name or an SQL query +* `config: mgp.Map` ➑ Connection configuration parameters (as in `pyodbc.connect`) +* `config_path` ➑ Path to the JSON file containing configuration parameters (as in `pyodbc.connect`) +* `params: mgp.Nullable[mgp.Any] (default=None)` ➑ Optionally, queries can be parameterized. In that case, `params` provides parameter values + +#### Output: + +* `row: mgp.Map`: The result table as a stream of rows + +#### Usage: +Get all data from database in form of map: +```cypher +CALL migrate.sql_server('example_table', {user:'memgraph', + password:'password', + host:'localhost', + database:'demo_db'} ) +YIELD row +RETURN row; +``` + +### `oracle_db(table_or_sql, config, config_path, params)` + +With `migrate.oracle_db` you can access Oracle DB and execute queries. The result table is converted into a stream, and returned rows can be used to create graph structures. The value of the `config` parameter must be at least an empty map. If `config_path` is passed, every key,value pair from JSON file will overwrite any values in `config` file. 
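All three procedures share the same merge rule: key-value pairs read from the JSON file at `config_path` override matching keys already present in the `config` map. A minimal sketch of that behavior — the file contents and keys below are illustrative, not the module's actual code:

```python
import json
import os
import tempfile

def merged_config(config, config_path=None):
    """Merge connection parameters: keys from the JSON file at
    `config_path` override any matching keys in `config`."""
    merged = dict(config)
    if config_path:
        with open(config_path) as f:
            merged.update(json.load(f))
    return merged

# Hypothetical example: the file overrides `host` and adds `port`.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"host": "db.internal", "port": 3306}, f)
    path = f.name

cfg = merged_config({"user": "memgraph", "host": "localhost"}, path)
os.unlink(path)
print(cfg["host"])  # db.internal -- the file value wins
```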
+ + +#### Input: + +* `table_or_sql: str` ➑ Table name or an SQL query +* `config: mgp.Map` ➑ Connection configuration parameters (as in oracledb.connect), +* `config_path` ➑ Path to the JSON file containing configuration parameters (as in oracledb.connect) +* `params: mgp.Nullable[mgp.Any] (default=None)` ➑ Optionally, queries may be parameterized. In that case, `params` provides parameter values + +#### Output: + +* `row: mgp.Map`: The result table as a stream of rows + +#### Usage: +Get the first 5000 rows from a database: +```cypher +CALL migrate.oracle_db('example_table', {user:'memgraph', + password:'password', + host:'localhost', + database:'demo_db'} ) +YIELD row +RETURN row +LIMIT 5000; +``` + +## Example + + + + + +```cypher +CALL migrate.mysql('example_table', {user:'memgraph', + password:'password', + host:'localhost', + database:'mydemodb'} ) +YIELD row +RETURN count(row) as row_count; +``` + + + + +```plaintext ++------------------+ +| row_count | ++------------------+ +| 4000 | ++------------------+ +``` + + + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/node2vec-online.md b/docs2/advanced-algorithms/available-algorithms/node2vec-online.md new file mode 100644 index 00000000000..5ec765c774e --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/node2vec-online.md @@ -0,0 +1,283 @@ +--- +id: node2vec-online +title: node2vec_online +sidebar_label: node2vec_online +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +The **node2vec_online** algorithm learns and updates temporal node embeddings on +the fly for tracking and measuring node similarity over time in graph streams. +The algorithm creates similar embeddings for two nodes (e.g. `v` and `u`) if there +is an option to reach one node from the other across edges that appeared +recently. 
In other words, the embedding of a node `v` should be more similar to +the embedding of node `u` if we can reach `u` by taking steps backward to node +`v` across edges that appeared before the previous one. These steps backward +from one node to the other form a temporal walk. It is temporal since it depends +on when the edge appeared in the graph. + +To make two nodes more similar and to create these temporal walks, the `Node2Vec +Online` algorithm uses the `StreamWalk updater` and `Word2Vec learner`. + +`StreamWalk updater` is a machine for sampling temporal walks. A sampling of the +walk is done in a backward fashion because we look only at the incoming edges of +the node. Since one node can have multiple incoming edges, when sampling a walk, +`StreamWalk updater` uses probabilities to determine which incoming edge of the +node it will take next, and that way leading to a new node. These probabilities +are computed after the edge arrives and before temporal walk sampling. +Probability represents a sum over all temporal walks `z` ending in node `v` +using edges arriving no later than the latest one of already sampled ones in the +temporal walk. When the algorithm decides which edge to take next for temporal +walk creation, it uses these computed weights (probabilities). Every time a new +edge appears in the graph, these probabilities are updated just for two nodes of +a new edge. + +After walks sampling, `Word2Vec learner` uses these prepared temporal walks to +make node embeddings more similar using the `gensim Word2Vec` module. These +sampled walks are given as sentences to the `gensim Word2Vec` module, which then +optimizes for the similarity of the node embeddings in the walk with stochastic +gradient descent using a skip-gram model or continuous-bag-of-words (CBOW). + +Embeddings capture the graph topology, relationships between nodes, and further +relevant information. How the embeddings capture this inherent information of +the graph is not fixed. 
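The backward temporal-walk sampling described above can be illustrated with a toy sketch: walk from a node over incoming edges no newer than the edge just taken, weighting candidates by a half-life time decay. The decay formula and the graph here are simplifications for intuition, not StreamWalk's exact weight bookkeeping:

```python
import math
import random

def sample_temporal_walk(in_edges, start, now, half_life, max_length, rng):
    """Walk backward from `start` over incoming edges, never taking an
    edge newer than the one we arrived on, and preferring recent edges
    via an exponential half-life decay."""
    walk, node, latest = [start], start, now
    for _ in range(max_length):
        candidates = [(src, t) for src, t in in_edges.get(node, []) if t <= latest]
        if not candidates:
            break
        # Half-life decay: an edge half_life seconds older weighs half as much.
        weights = [math.exp(-math.log(2) * (now - t) / half_life)
                   for _, t in candidates]
        src, t = rng.choices(candidates, weights=weights)[0]
        walk.append(src)
        node, latest = src, t
    return walk[::-1]  # oldest-first temporal walk ending in `start`

# in_edges[v] lists (source, arrival_time) pairs of edges arriving at v.
in_edges = {"v": [("a", 100), ("b", 40)], "a": [("c", 30)]}
print(sample_temporal_walk(in_edges, "v", now=120, half_life=50,
                           max_length=3, rng=random.Random(0)))
```

Because the timestamps must be non-increasing along the backward walk, every sampled walk is a valid temporal walk in the sense defined above.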
+ +Capturing information in networks often shuttles between two kinds of +similarities: **homophily** and **structural equivalence**. Under the +**homophily** hypothesis, nodes that are highly interconnected and belong to +similar network clusters or communities should be embedded closely together. In +contrast, under the **structural equivalence** hypothesis, nodes that have +similar structural roles in networks should be embedded closely together (e.g., +nodes that act as hubs of their corresponding communities). + +Currently, our implementation captures for **homophily** - nodes that are highly +interconnected and belong to similar network clusters or communities. + +[^1] [Node embeddings in dynamic +graphs](https://appliednetsci.springeropen.com/track/pdf/10.1007/s41109-019-0169-5.pdf), +Ferenc BΓ©res, RΓ³bert PΓ‘lovics, Domokos MiklΓ³s Kelen and AndrΓ‘s A. BenczΓΊr + +[![docs-source](https://img.shields.io/badge/source-node2vec_online-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/node2vec_online.py) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **module** | +| **Implementation** | **Python** | +| **Graph direction** | **directed** | +| **Edge weights** | **unweighted** | +| **Parallelism** | **sequential** | + +:::note Too slow? + +If this algorithm implementation is too slow for your use case, [contact us](mailto:tech@memgraph.com) and request a rewrite to C++ ! 
+ +::: + +## Procedures + + + +### `set_streamwalk_updater(half_life, max_length, beta, cutoff, sampled_walks, full_walks)` + +#### Input: + +* `half_life: integer` ➑ half-life [seconds], used in the temporal walk probability + calculation +* `max_length: integer` ➑ Maximum length of the sampled temporal random walks +* `beta: float` ➑ Damping factor for long paths +* `cutoff: integer` ➑ Temporal cutoff in seconds to exclude very distant past +* `sampled_walks: integer` ➑ Number of sampled walks for each edge update +* `full_walks: boolean` ➑ Return every node of the sampled walk for representation + learning (full_walks=True) or only the endpoints of the walk + (full_walks=False) + +#### Output: + +* `message: string` ➑ Whether parameters are set or they need to be reset + +#### Usage: + +```cypher +CALL node2vec_online.set_streamwalk_updater(7200, 3, 0.9, 604800, 4, False); +``` + +### `set_word2vec_learner(embedding_dimension, learning_rate, skip_gram )` + +#### Input: + +* `embedding_dimension: integer` ➑ Number of dimensions in the representation of the + embedding vector +* `learning_rate: float` ➑ Learning rate +* `skip_gram: boolean` ➑ Whether to use skip-gram model (True) or + continuous-bag-of-words (CBOW) +* `negative_rate: integer` ➑ Negative rate for Gensim Word2Vec model +* `threads: integer` ➑ Maximum number of threads for parallelization + +#### Output: + +* `message: string` ➑ Whether parameters are set or they need to be reset + +#### Usage: + +```cypher +CALL node2vec_online.set_word2vec_learner(128, 0.01, True, 10, 1); +``` + +### `get()` + +#### Output: + +* `node: mgp.Vertex` ➑ Node in the graph for which embedding exists +* `embedding: mgp.List[mgp.Number]` ➑ Embedding for the given node + +#### Usage: + +```cypher +CALL node2vec_online.get(); +``` + +### `update(edges)` + +### Input: + +* `edges: mgp.List[mgp.Edge]` ➑ List of edges added to the graph. For those + nodes only `node2vec_online` calculates embeddings. 
+ +#### Usage: + +There are a few options here. The first one is to create a trigger, so every +time an edge is added to graph, the trigger calls a procedure and makes an +update. + +```cypher +CREATE TRIGGER trigger ON --> CREATE BEFORE COMMIT +EXECUTE CALL node2vec_online.update(createdEdges) YIELD *; +``` + +The second option is to add all the edges and then call the algorithm with those +edges: + +```cypher +MATCH (n)-[e]->(m) +WITH COLLECT(e) as edges +CALL node2vec_online.update(edges) YIELD * +WITH 1 as x +RETURN x; +``` + +### `reset()` + +#### Output: + +* `message: string` ➑ Message that parameters are ready to be set again + +#### Usage: + +```cypher +CALL node2vec_online.reset(); +``` + +### `help()` + +#### Output: + +* `name: string` ➑ Name of available functions +* `value: string` ➑ Documentation for every function + +#### Usage: + +```cypher +CALL node2vec_online.help(); +``` + +## Example + + + + + + + + + +```cypher +CALL node2vec_online.set_streamwalk_updater(7200, 2, 0.9, 604800, 2, True) YIELD *; +CALL node2vec_online.set_word2vec_learner(2, 0.01, True, 1, 1) YIELD *; + +CREATE TRIGGER trigger ON --> CREATE BEFORE COMMIT +EXECUTE CALL node2vec_online.update(createdEdges) YIELD *; +``` + + + +```cypher +MERGE (n:Node {id: 1}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 2}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 10}) MERGE (m:Node {id: 5}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 5}) MERGE (m:Node {id: 2}) CREATE (n)-[:RELATION]->(m); + +MERGE (n:Node {id: 9}) MERGE (m:Node {id: 7}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 7}) MERGE (m:Node {id: 3}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 3}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m); + +MERGE (n:Node {id: 9}) MERGE (m:Node {id: 8}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 8}) MERGE (m:Node {id: 4}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 4}) MERGE (m:Node {id: 6}) CREATE 
(n)-[:RELATION]->(m); +``` + + + + +```cypher +CALL node2vec_online.get() YIELD node, embedding +RETURN node, embedding +ORDER BY node.id; +``` + + + + +```plaintext ++-------------------------+-------------------------+ +| node | embedding | ++-------------------------+-------------------------+ +| (:Node {id: 1}) | [0.255167, 0.450464] | +| (:Node {id: 2}) | [-0.465147, -0.35584] | +| (:Node {id: 3}) | [-0.243008, -0.0908009] | +| (:Node {id: 4}) | [-0.414261, -0.472441] | +| (:Node {id: 5}) | [-0.250771, -0.188169] | +| (:Node {id: 6}) | [-0.0268114, 0.0118215] | +| (:Node {id: 7}) | [-0.226831, 0.327703] | +| (:Node {id: 8}) | [0.143829, 0.0495937] | +| (:Node {id: 9}) | [0.369025, -0.0766736] | +| (:Node {id: 10}) | [0.322944, 0.448649] | ++-------------------------+-------------------------+ + +``` + + + \ No newline at end of file diff --git a/docs2/advanced-algorithms/available-algorithms/node2vec.md b/docs2/advanced-algorithms/available-algorithms/node2vec.md new file mode 100644 index 00000000000..6ed7ac15901 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/node2vec.md @@ -0,0 +1,261 @@ +--- +id: node2vec +title: node2vec +sidebar_label: node2vec +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + +{children} + +); + +The **node2vec** is a semi-supervised algorithmic framework for learning +continuous feature representations for nodes in networks. The algorithm +generates a mapping of nodes to a low-dimensional space of features that +maximizes the likelihood of preserving network neighborhoods of nodes. By using +a biased random walk procedure, it enables exploring diverse neighborhoods. In +tasks such as multi-label classification and link prediction, node2vec shows +great results. + +The **node2vec** algorithm was inspired by a similar **NLP** technique. 
Just as a document is an ordered sequence of words, sampling node sequences
from the underlying network turns the network into an ordered sequence of
nodes. Although the idea of sampling is simple, choosing the actual strategy
can be challenging and dependent on the techniques that will be applied
afterward.

Capturing information in networks often shuttles between two kinds of
similarities: **homophily** and **structural equivalence**. Under the
**homophily** hypothesis, nodes that are highly interconnected and belong to
similar network clusters or communities should be embedded closely together. In
contrast, under the **structural equivalence** hypothesis, nodes that have
similar structural roles in networks should be embedded closely together (e.g.,
nodes that act as hubs of their corresponding communities).

The current implementation captures either **homophily** or **structural
equivalence** by changing hyperparameters.

`BFS` and `DFS` strategies play a key role in producing representations that
reflect either of the above equivalences. The neighborhoods sampled by `BFS`
lead to embeddings that correspond closely to structural equivalence. The
opposite is true for `DFS`: it can explore larger parts of the network as it
can move further away from the source node. Therefore, `DFS`-sampled walks
accurately reflect a macro-view of the neighborhood, which is essential for
inferring communities based on homophily.

Two parameters:

- the **return parameter `p`**
- and the **in-out parameter `q`**

decide whether to prioritize the `BFS` or `DFS` strategy when sampling. If `p`
is smaller than 1, the walks are more `BFS`-like and capture more **structural
equivalence**. Conversely, if `q` is smaller than 1, the walks are more
`DFS`-like and capture **homophily**.

[^1] [node2vec: Scalable Feature Learning for Networks](https://arxiv.org/abs/1607.00653),
A. Grover, J.
Leskovec + +[![docs-source](https://img.shields.io/badge/source-node2vec-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/node2vec.py) + +| Trait | Value | +| ------------------- | -------------------------------------------------------------- | +| **Module type** | **module** | +| **Implementation** | **Python** | +| **Graph direction** | **directed/undirected** | +| **Edge weights** | **weighted/unweighted** | +| **Parallelism** | **sequential** | + +:::note Too slow? + +If this algorithm implementation is too slow for your use case, [contact us](mailto:tech@memgraph.com) and request a rewrite to C++ ! + +::: + +## Procedures + + + +### `get_embeddings( is_directed, p, q, num_walks, walk_length, vector_size, alpha, window, min_count, seed, workers, min_alpha, sg, hs, negative, epochs,)` + +#### Input: + +- `is_directed : boolean` ➑ If `True`, graph is treated as directed, else not + directed +- `p : float` ➑ Return hyperparameter for calculating transition probabilities. +- `q : float` ➑ In-out hyperparameter for calculating transition probabilities. +- `num_walks : integer` ➑ Number of walks per node in walk sampling. +- `walk_length : integer` ➑ Length of one walk in walk sampling. +- `vector_size : integer` ➑ Dimensionality of the word vectors. +- `window : integer` ➑ Maximum distance between the current and predicted word + within a sentence. +- `min_count : integer` ➑ Ignores all words with total frequency lower than this. +- `workers : integer` ➑ Use these many worker threads to train the model (=faster + training with multicore machines). +- `sg : {0, 1}` ➑ Training algorithm: 1 for skip-gram; otherwise CBOW. +- `hs : {0, 1}` ➑ If 1, hierarchical softmax will be used for model training. If + 0, and `negative` is non-zero, negative sampling will be used. 
+- `negative : integer` ➑ If > 0, negative sampling will be used, the integer for +  negative specifies how many "noise words" should be drawn (usually +  between 5-20). If set to 0, no negative sampling is used. +- `cbow_mean : {0, 1}` ➑ If 0, use the sum of the context word vectors. If 1, +  use the mean; only applies when CBOW is used. +- `alpha : float` ➑ The initial learning rate. +- `min_alpha : float` ➑ Learning rate will linearly drop to `min_alpha` as +  training progresses. +- `seed : integer` ➑ Seed for the random number generator. Initial vectors for each +  word are seeded with a hash of the concatenation of word + `str(seed)`. + +#### Output: + +- `nodes: mgp.List[mgp.Vertex]` ➑ List of nodes for which embeddings were +  calculated +- `embeddings: mgp.List[mgp.List[mgp.Number]]` ➑ Corresponding list of +  embeddings + +#### Usage: + +```cypher +CALL node2vec.get_embeddings(False, 2.0, 0.5, 4, 5, 100, 0.025, 5, 1, 1, 1, 0.0001, 1, 0, 5, 5); +``` + +### `set_embeddings(is_directed, p, q, num_walks, walk_length, vector_size, alpha, window, min_count, seed, workers, min_alpha, sg, hs, negative, epochs)` + +#### Input: + +- `is_directed : boolean` ➑ If `True`, the graph is treated as directed, +  otherwise as undirected. +- `p : float` ➑ Return hyperparameter for calculating transition probabilities. +- `q : float` ➑ In-out hyperparameter for calculating transition probabilities. +- `num_walks : integer` ➑ Number of walks per node in walk sampling. +- `walk_length : integer` ➑ Length of one walk in walk sampling. +- `vector_size : integer` ➑ Dimensionality of the word vectors. +- `window : integer` ➑ Maximum distance between the current and predicted word +  within a sentence. +- `min_count : integer` ➑ Ignores all words with total frequency lower than this. +- `workers : integer` ➑ Use this many worker threads to train the model (=faster +  training with multicore machines). +- `sg : {0, 1}` ➑ Training algorithm: 1 for skip-gram; otherwise CBOW.
+- `hs : {0, 1}` ➑ If 1, hierarchical softmax will be used for model training. If +  0, and `negative` is non-zero, negative sampling will be used. +- `negative : integer` ➑ If > 0, negative sampling will be used, the integer for +  negative specifies how many "noise words" should be drawn (usually +  between 5-20). If set to 0, no negative sampling is used. +- `cbow_mean : {0, 1}` ➑ If 0, use the sum of the context word vectors. If 1, +  use the mean; only applies when CBOW is used. +- `alpha : float` ➑ The initial learning rate. +- `min_alpha : float` ➑ Learning rate will linearly drop to `min_alpha` as +  training progresses. +- `seed : integer` ➑ Seed for the random number generator. Initial vectors for each +  word are seeded with a hash of the concatenation of word + `str(seed)`. + +#### Output: + +- `nodes: mgp.List[mgp.Vertex]` ➑ List of nodes for which embeddings were +  calculated +- `embeddings: mgp.List[mgp.List[mgp.Number]]` ➑ Corresponding list of +  embeddings + +#### Usage: + +```cypher +CALL node2vec.set_embeddings(False, 2.0, 0.5, 4, 5, 100, 0.025, 5, 1, 1, 1, 0.0001, 1, 0, 5, 5); +``` + +### `help()` + +#### Output: + +- `name: string` ➑ Name of available functions +- `value: string` ➑ Documentation for every function + +#### Usage: + +```cypher +CALL node2vec.help(); +``` + +## Example + + + + + + + + + +```cypher +MERGE (n:Node {id: 1}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 2}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 10}) MERGE (m:Node {id: 5}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 5}) MERGE (m:Node {id: 2}) CREATE (n)-[:RELATION]->(m); + +MERGE (n:Node {id: 9}) MERGE (m:Node {id: 7}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 7}) MERGE (m:Node {id: 3}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 3}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m); + +MERGE (n:Node {id: 9}) MERGE (m:Node {id: 8}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 8})
MERGE (m:Node {id: 4}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 4}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m); +``` + + + + +```cypher +CALL node2vec.set_embeddings(False, 2.0, 0.5, 4, 5, 2) YIELD *; +``` + + + + +```cypher +MATCH (n) +RETURN n.id as node, n.embedding as embedding +ORDER BY n.id; +``` + + + + +```plaintext ++-------------------------+-------------------------+ +| node | embedding | ++-------------------------+-------------------------+ +| 1 | [-0.243723, -0.0916009] | +| 2 | [0.25442, 0.449585] | +| 3 | [0.322331, 0.448404] | +| 4 | [0.143389, 0.0492275] | +| 5 | [-0.465552, -0.35653] | +| 6 | [-0.0272922, 0.0111898] | +| 7 | [0.368725, -0.0773199] | +| 8 | [-0.414683, -0.472285] | +| 9 | [-0.226683, 0.328159] | +| 10 | [-0.251244, -0.189218] | ++-------------------------+-------------------------+ +``` + + + diff --git a/docs2/advanced-algorithms/available-algorithms/node_similarity.md b/docs2/advanced-algorithms/available-algorithms/node_similarity.md new file mode 100644 index 00000000000..c33d007dfab --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/node_similarity.md @@ -0,0 +1,402 @@ +--- +id: node-similarity +title: node_similarity +sidebar_label: node_similarity +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +[![docs-source](https://img.shields.io/badge/source-node_similarity-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/node_similarity_module/node_similarity_module.cpp) + + +## Abstract + +If we're interested in how similar two nodes in a graph are, we'll want to get a numerical value that represents the node similarity between those two nodes. 
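To make these measures concrete before the formal definitions, here is a hedged plain-Python sketch (illustrative only; the module itself is implemented in C++). The neighbor sets in the example mirror the outgoing neighbors of nodes 0 and 5 from the Jaccard/overlap examples further down, and reproduce the documented results:

```python
def jaccard(a, b):
    # |A n B| / |A u B| over two neighbor sets
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def overlap(a, b):
    # |A n B| / min(|A|, |B|) over two neighbor sets
    a, b = set(a), set(b)
    m = min(len(a), len(b))
    return len(a & b) / m if m else 0.0

def cosine(u, v):
    # (u . v) / (|u| |v|) over two numeric property vectors
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sum(x * x for x in u) ** 0.5
    norm_v = sum(y * y for y in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Outgoing neighbors of node 0 ({2, 3, 4, 5}) and node 5 ({2, 3}):
print(jaccard({2, 3, 4, 5}, {2, 3}))  # 0.5
print(overlap({2, 3, 4, 5}, {2, 3}))  # 1.0
```

The cosine helper likewise matches the documented example: `cosine([1.0, 1.0, 1.0], [1.0, 1.0, 0.0])` is approximately 0.816.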
There are many node similarity measures; this module currently contains the following: +* cosine similarity +* Jaccard similarity +* overlap similarity + +**The Jaccard similarity** is computed using the following formula: + +`J(A, B) = |A ∩ B| / |A ∪ B|` + +**The overlap similarity** is computed using the following formula: + +`O(A, B) = |A ∩ B| / min(|A|, |B|)` + +**The cosine similarity** computes the similarity between two nodes based on a node property. This property should be a vector, and the similarity is computed using the following formula: + +`cos(u, v) = (u · v) / (|u| |v|)` + +Set A represents all outgoing neighbors of one node, set B represents all outgoing neighbors of the other node. In the set-based formulas, the numerator is the cardinality of the intersection of set A and set B (in other words, the cardinality of the common-neighbors set); the denominators differ, but each depends on the cardinalities of sets A and B. + +For each similarity measure, there are two procedures: one that calculates the similarity between all pairs of nodes, and a pairwise one that calculates similarities between two given sets of nodes. + + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **algorithm** | +| **Implementation** | **C++** | +| **Graph direction** | **directed** | +| **Edge weights** | **unweighted** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `cosine()` + +#### Output: + +* `node1: Vertex` ➑ The first node. +* `node2: Vertex` ➑ The second node. +* `similarity: float` ➑ The cosine similarity between the first and the second node. + +#### Usage: +```cypher +CALL node_similarity.cosine() YIELD node1, node2, similarity +RETURN node1, node2, similarity; +``` + +### `cosine_pairwise(property, src_nodes, dst_nodes)` + +#### Input: + +* `src_nodes: List[Vertex]` ➑ The first set of nodes. +* `dst_nodes: List[Vertex]` ➑ The second set of nodes. +* `property: str` ➑ The property based on which the cosine similarity will be calculated.
If the property is not a vector, an error is thrown. + +#### Output: + +* `node1: Vertex` ➑ The first node. +* `node2: Vertex` ➑ The second node. +* `similarity: float` ➑ The cosine similarity between the first and the second node. + +#### Usage: +```cypher +MATCH (m) +WHERE m.id > 2 +WITH COLLECT(m) AS nodes1 +MATCH (n) +WHERE n.id < 8 +WITH COLLECT(n) AS nodes2, nodes1 +CALL node_similarity.cosine_pairwise("score", nodes1, nodes2) YIELD node1, node2, similarity +RETURN node1, node2, similarity; +``` + +### `jaccard()` + +#### Output: + +* `node1: Vertex` ➑ The first node. +* `node2: Vertex` ➑ The second node. +* `similarity: float` ➑ The Jaccard similarity between the first and the second node. + +#### Usage: +```cypher +CALL node_similarity.jaccard() YIELD node1, node2, similarity +RETURN node1, node2, similarity; +``` + +### `jaccard_pairwise(src_nodes, dst_nodes)` + +#### Input: + +* `src_nodes: List[Vertex]` ➑ The first set of nodes. +* `dst_nodes: List[Vertex]` ➑ The second set of nodes. + +#### Output: + +* `node1: Vertex` ➑ The first node. +* `node2: Vertex` ➑ The second node. +* `similarity: float` ➑ The Jaccard similarity between the first and the second node. + +#### Usage: + +```cypher +MATCH (m) +WHERE m.id > 2 +WITH COLLECT(m) AS nodes1 +MATCH (n) +WHERE n.id < 8 +WITH COLLECT(n) AS nodes2, nodes1 +CALL node_similarity.jaccard_pairwise(nodes1, nodes2) YIELD node1, node2, similarity +RETURN node1, node2, similarity; +``` + +### `overlap()` + +#### Output: + +* `node1: Vertex` ➑ The first node. +* `node2: Vertex` ➑ The second node. +* `similarity: float` ➑ The overlap similarity between the first and the second node. + +#### Usage: +```cypher +CALL node_similarity.overlap() YIELD node1, node2, similarity +RETURN node1, node2, similarity; +``` + + +### `overlap_pairwise(src_nodes, dst_nodes)` + +#### Input: + +* `src_nodes: List[Vertex]` ➑ The first set of nodes. +* `dst_nodes: List[Vertex]` ➑ The second set of nodes.
+ +#### Output: + +* `node1: Vertex` ➑ The first node. +* `node2: Vertex` ➑ The second node. +* `similarity: float` ➑ The overlap similarity between the first and the second node. + +#### Usage: +```cypher +MATCH (m) +WHERE m.id > 2 +WITH COLLECT(m) AS nodes1 +MATCH (n) +WHERE n.id < 8 +WITH COLLECT(n) AS nodes2, nodes1 +CALL node_similarity.overlap_pairwise(nodes1, nodes2) YIELD node1, node2, similarity +RETURN node1, node2, similarity; +``` + +## Example - cosine pairwise similarity + + + + + + + + + + + +```cypher +CREATE (b:Node {id: 0, score: [1.0, 1.0, 1.0]}); +CREATE (b:Node {id: 1, score: [1.0, 1.0, 1.0]}); +CREATE (b:Node {id: 2, score: [1.0, 1.0, 1.0]}); +CREATE (b:Node {id: 3, score: [1.0, 1.0, 0.0]}); +CREATE (b:Node {id: 4, score: [0.0, 1.0, 0.0]}); +CREATE (b:Node {id: 5, score: [1.0, 0.0, 1.0]}); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +MATCH (m) +WHERE m.id < 3 +WITH COLLECT(m) AS nodes1 +MATCH (n) +WHERE n.id > 2 +WITH COLLECT(n)
AS nodes2, nodes1 +CALL node_similarity.cosine_pairwise("score", nodes1, nodes2) YIELD node1, node2, similarity AS cosine_similarity +RETURN node1, node2, cosine_similarity; +``` + + + + + + +```plaintext ++-------------------+-------------------+-------------------+ +| node1 | node2 | cosine_similarity | ++-------------------+-------------------+-------------------+ +| (:Node {id: 1}) | (:Node {id: 3}) | 0.816 | +| (:Node {id: 2}) | (:Node {id: 4}) | 0.577 | +| (:Node {id: 0}) | (:Node {id: 5}) | 0.816 | ++-------------------+-------------------+-------------------+ + +``` + + + + + +## Example - Jaccard pairwise similarity + + + + + + + + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +MATCH (m) +WHERE m.id < 3 +WITH COLLECT(m) AS nodes1 +MATCH (n) +WHERE n.id > 2 +WITH COLLECT(n) AS nodes2, nodes1 +CALL node_similarity.jaccard_pairwise(nodes1, nodes2) YIELD node1, node2, similarity AS jaccard_similarity +RETURN node1, 
node2, jaccard_similarity; +``` + + + + + + +```plaintext ++-------------------+-------------------+--------------------+ +| node1 | node2 | jaccard_similarity | ++-------------------+-------------------+--------------------+ +| (:Node {id: 1}) | (:Node {id: 3}) | 0.0 | +| (:Node {id: 2}) | (:Node {id: 4}) | 0.25 | +| (:Node {id: 0}) | (:Node {id: 5}) | 0.5 | ++-------------------+-------------------+--------------------+ + +``` + + + + + +## Example - overlap similarity + + + + + + + + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +MATCH (m) +WHERE m.id < 3 +WITH COLLECT(m) AS nodes1 +MATCH (n) +WHERE n.id > 2 +WITH COLLECT(n) AS nodes2, nodes1 +CALL node_similarity.overlap_pairwise(nodes1, nodes2) YIELD node1, node2, similarity AS overlap_similarity +RETURN node1, node2, overlap_similarity; +``` + + + + + + +```plaintext ++-------------------+-------------------+--------------------+ +| node1 | node2 | overlap_similarity | 
++-------------------+-------------------+--------------------+ +| (:Node {id: 1}) | (:Node {id: 3}) | 0.0 | +| (:Node {id: 2}) | (:Node {id: 4}) | 0.5 | +| (:Node {id: 0}) | (:Node {id: 5}) | 1.0 | ++-------------------+-------------------+--------------------+ + +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/nxalg.md b/docs2/advanced-algorithms/available-algorithms/nxalg.md new file mode 100644 index 00000000000..7a1f77dc713 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/nxalg.md @@ -0,0 +1,1514 @@ +--- +id: nxalg +title: nxalg +sidebar_label: nxalg +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + +{children} + +); + +This module, named **nxalg**, provides a comprehensive set of thin wrappers around most of the algorithms present in the [NetworkX](https://networkx.org/) package. The wrapper functions can create a NetworkX-compatible graph-like object that streams the native database graph directly, significantly reducing memory usage.
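Because these procedures are thin wrappers, their behavior can be previewed with plain NetworkX. The following is a minimal sketch (it assumes the `networkx` package and uses an ordinary in-memory `nx.DiGraph`, not the streamed Memgraph graph object):

```python
import networkx as nx

# Build a small directed graph mirroring a Memgraph dataset.
G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (1, 3), (3, 4)])

# nxalg.all_shortest_paths(n, m) exposes the same NetworkX routine:
paths = list(nx.all_shortest_paths(G, source=1, target=3))
print(paths)  # [[1, 3]]

# nxalg.ancestors / nxalg.descendants behave like their NetworkX counterparts:
print(nx.ancestors(G, 4))    # {1, 2, 3}
print(nx.descendants(G, 1))  # {2, 3, 4}
```

Inside Memgraph the same results are obtained by calling the corresponding `nxalg` procedures from Cypher, as shown in the usage examples below.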
+ +[![docs-source](https://img.shields.io/badge/source-nxalg-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/nxalg.py) + +| Trait | Value | +| ------------------- | --------------------------------------------------------------------------------------------------------- | +| **Module type** | **module** | +| **Implementation** | **Python** | +| **Graph direction** | **directed**/**undirected** | +| **Edge weights** | **weighted**/**unweighted** | +| **Parallelism** | **sequential** | + +:::tip + +If you are not satisfied with the performance of algorithms from the nxalg +module, check Memgraph's native implementation of algorithms such as PageRank, +betweenness centrality, and others written in C++. + +::: + +## Procedures + + + +### `all_shortest_paths(source, target, weight, method)` + +Compute all shortest simple paths in the graph. A simple path is a path with no repeated nodes. + +#### Input: + +* `source: Vertex` ➑ Starting node for the path. +* `target: Vertex` ➑ Ending node for the path. +* `weight: string (default=NULL)` ➑ If `NULL`, every edge has weight/distance/cost 1. If a string, use this edge attribute as the edge weight. Any edge attribute not present defaults to 1. +* `method: string (default="dijkstra")` ➑ The algorithm to use to compute the path lengths. Supported options: 'dijkstra', 'bellman-ford'. Other inputs produce a ValueError. If `weight` is `None`, unweighted graph methods are used, and this suggestion is ignored. + +#### Output: + +* `paths: List[Vertex]` ➑ List of vertices for a certain path. + +#### Usage: +```cypher +MATCH (n:Label), (m:Label) +CALL nxalg.all_shortest_paths(n, m) YIELD * +RETURN paths; +``` + +### `all_simple_paths(source, target, cutoff)` + +Returns all simple paths in the graph `G` from source to target. A simple path is a path with no repeated nodes. + +#### Input: + +* `source: Vertex` ➑ Starting node for the path. +* `target: Vertex` ➑ Ending node for the path.
+* `cutoff: integer (default=NULL)` ➑ Depth to stop the search. Only paths of `length <= cutoff` are returned. + +#### Output: + +* `paths: List[Vertex]` ➑ List of vertices for a certain path. If there are no paths between the source and target within the given cutoff, there is no output. + +#### Usage: +```cypher +MATCH (n:Label), (m:Label) +CALL nxalg.all_simple_paths(n, m, 5) YIELD * +RETURN paths; +``` + +### `ancestors(source)` + +Returns all nodes having a path to `source` in `G`. + +#### Input: + +* `source: Vertex` ➑ Starting node. Calculates all nodes that have a path to `source`. + +#### Output: + +* `ancestors: List[Vertex]` ➑ List of vertices that have a path toward the source node + +#### Usage: +```cypher +MATCH (n:Label) +CALL nxalg.ancestors(n) YIELD * +RETURN ancestors; +``` + +### `betweenness_centrality(k, normalized, weight, endpoints, seed)` + +Compute the shortest-path betweenness centrality for nodes. *Betweenness centrality* is a measure of centrality in a graph based on shortest paths. Centrality identifies the most important nodes within a graph. + +#### Input: + +* `k: integer (default=NULL)` ➑ If `k` is not `None`, use `k` node samples to estimate betweenness. The value of `k <= n` where `n` is the number of nodes in the graph. Higher values give a better approximation. +* `normalized: boolean (default=True)` ➑ If `True` the betweenness values are normalized by `2/((n-1)(n-2))` for graphs, and `1/((n-1)(n-2))` for directed graphs where `n` is the number of nodes in `G`. +* `weight: string (default=NULL)` ➑ If `None`, all edge weights are considered equal. Otherwise holds the name of the edge attribute used as weight. +* `endpoints: boolean (default=False)` ➑ If `True`, includes the endpoints in the shortest path counts. +* `seed: integer (default=NULL)` ➑ Indicator of random number generation state. Note that this is only used if `k` is not `None`.
+ +#### Output: + +* `node: Vertex` ➑ Graph vertex for betweenness calculation +* `betweenness: double` ➑ Value of betweenness for a given node + +#### Usage: +```cypher +CALL nxalg.betweenness_centrality(20, True) YIELD * +RETURN node, betweenness; +``` + +### `bfs_edges(source, reverse, depth_limit)` + +Iterate over edges in a breadth-first-search starting at source. + +#### Input: + +* `source: Vertex` ➑ Specify starting node for breadth-first search; this function iterates over only those edges in the component reachable from this node. +* `reverse: boolean (default=False)` ➑ If `True`, traverse a directed graph in the reverse direction. +* `depth_limit: integer (default=NULL)` ➑ Specify the maximum search depth. + +#### Output: + +* `edges: List[Edge]` ➑ List of edges in the breadth-first search. + +#### Usage: +```cypher +MATCH (n:Label) +CALL nxalg.bfs_edges(n, False) YIELD * +RETURN edges; +``` + + +### `bfs_predecessors(source, depth_limit)` + +Returns an iterator of predecessors in breadth-first-search from source. +#### Input: + +* `source: Vertex` ➑ Specify starting node for breadth-first search. +* `depth_limit: integer (default=NULL)` ➑ Specify the maximum search depth. + +#### Output: + +* `node: Vertex` ➑ Node in a graph +* `predecessors: List[Vertex]` ➑ List of predecessors of given node + +#### Usage: +```cypher +MATCH (n:Label) +CALL nxalg.bfs_predecessors(n, 10) YIELD * +RETURN node, predecessors; +``` + +### `bfs_successors(source, depth_limit)` + +Returns an iterator of successors in breadth-first-search from source. + +#### Input: + +* `source: Vertex` ➑ Specify starting node for breadth-first search. +* `depth_limit: integer (default=NULL)` ➑ Specify the maximum search depth. 
+ +#### Output: + +* `node: Vertex` ➑ Node in a graph +* `successors: List[Vertex]` ➑ List of successors of given node + +#### Usage: +```cypher +MATCH (n:Label) +CALL nxalg.bfs_successors(n, 5) YIELD * +RETURN node, successors; +``` + +### `bfs_tree(source, reverse, depth_limit)` +Returns an oriented tree constructed from a breadth-first-search starting at `source`. + + +#### Input: + +* `source: Vertex` ➑ Specify starting node for breadth-first search. +* `reverse: boolean (default=False)` ➑ If `True`, traverse a directed graph in the reverse direction. +* `depth_limit: integer (default=NULL)` ➑ Specify the maximum search depth. + +#### Output: + +* `tree: List[Vertex]` ➑ An oriented tree in a list format. + +#### Usage: +```cypher +MATCH (n:Label) +CALL nxalg.bfs_tree(n, True, 3) YIELD * +RETURN n, tree; +``` + + +### `biconnected_components()` + +Returns a list of sets of nodes, one set for each biconnected +component of the graph. + +*Biconnected components* are maximal subgraphs such that the removal of a +node (and all edges incident on that node) will not disconnect the +subgraph. Note that nodes may be part of more than one biconnected +component. Those nodes are articulation points or cut vertices. The +removal of articulation points will increase the number of connected +components of the graph. + +Notice that by convention a dyad is considered a biconnected component. + +#### Output: + +* `components: List[List[Vertex]]` ➑ A list of sets of nodes, one set for each biconnected component. + +#### Usage: +```cypher +CALL nxalg.biconnected_components() YIELD * +RETURN components; +``` + + +### `bridges(root)` + +Returns all bridges in a graph. + +A *bridge* in a graph is an edge whose removal causes the number of +connected components of the graph to increase. Equivalently, a bridge is an +edge that does not belong to any cycle. + +#### Input: + +* `root: Vertex (default=NULL)` ➑ A node in the graph `G`.
If specified, only the bridges in the connected components containing this node will be returned. + +#### Output: + +* `bridges: List[Edge]` ➑ A list of edges in the graph whose removal disconnects the graph (or causes the number of connected components to increase). + +#### Usage: +```cypher +CALL nxalg.bridges() YIELD * +RETURN bridges; +``` + +### `center()` + +Returns the center of the graph `G`. + +The *center* is the set of nodes with eccentricity equal to the radius. + +#### Output: + +* `center: List[Vertex]` ➑ List of nodes in the center. + +#### Usage: +```cypher +CALL nxalg.center() YIELD * +RETURN center; +``` + +### `chain_decomposition(root)` + +Returns the chain decomposition of a graph. + +The *chain decomposition* of a graph with respect to a depth-first +search tree is a set of cycles or paths derived from the set of +fundamental cycles of the tree in the following manner. Consider +each fundamental cycle with respect to the given tree, represented +as a list of edges beginning with the non-tree edge oriented away +from the root of the tree. For each fundamental cycle, if it +overlaps with any previous fundamental cycle, just take the initial +non-overlapping segment, which is a path instead of a cycle. Each +cycle or path is called a *chain*. + +#### Input: + +* `root: Vertex (default=NULL)` ➑ Optional. A node in the graph `G`. If specified, only the chain decomposition for the connected component containing this node will be returned. This node indicates the root of the depth-first search tree. + +#### Output: + +* `chains: List[List[Edge]]` ➑ A list of edges representing a chain. There is no guarantee on the orientation of the edges in each chain (for example, if a chain includes the edge joining nodes 1 and 2, the chain may include either (1, 2) or (2, 1)). + +#### Usage: +```cypher +MATCH (n:Label) +CALL nxalg.chain_decomposition(n) YIELD * +RETURN chains; +``` +### `check_planarity()` + +Check if a graph is planar.
+ +A graph is planar if it can be drawn in a plane without +any edge intersections. + +#### Output: + +* `is_planar: boolean` ➑ `True` if the graph is planar. + +#### Usage: +```cypher +CALL nxalg.check_planarity() YIELD * +RETURN is_planar; +``` +### `clustering(nodes, weight)` +Compute the clustering coefficient for nodes. + +A *clustering coefficient* is a measure of the degree to which nodes +in a graph tend to cluster together. + + +#### Input: + +* `nodes: List[Vertex] (default=NULL)` ➑ Compute clustering for nodes in this container. +* `weight: string (default=NULL)` ➑ The edge attribute that holds the numerical value used as a weight. If `None`, then each edge has weight 1. + +#### Output: + +* `node: Vertex` ➑ Node in graph for calculation of clustering +* `clustering: double` ➑ Clustering coefficient at specified nodes. + +#### Usage: +```cypher +MATCH (n:SpecificLabel) +WITH COLLECT(n) AS cluster_nodes +CALL nxalg.clustering(cluster_nodes) YIELD * +RETURN node, clustering; +``` +### `communicability()` + +Returns communicability between all pairs of nodes in `G`. + +The *communicability* between pairs of nodes in `G` is the sum of +closed walks of different lengths starting at node `u` and ending at node `v`. + +#### Output: + +* `node1: Vertex` ➑ First value in communicability calculation +* `node2: Vertex` ➑ Second value in communicability calculation +* `communicability: double` ➑ Value of communicability between two values. + +#### Usage: +```cypher +CALL nxalg.communicability() YIELD * +RETURN node1, node2, communicability +ORDER BY communicability DESC; +``` +### `core_number()` +Returns the core number for each vertex. + +A *k-core* is a maximal subgraph that contains nodes of degree `k` or more. + +The core number of a node is the largest value `k` of a k-core containing +that node. 
+ + +#### Output: + +* `node: Vertex` ➑ Node to calculate k-core for +* `core: integer` ➑ Largest value `k` of a k-core + +#### Usage: +```cypher +CALL nxalg.core_number() YIELD * +RETURN node, core +ORDER BY core DESC; +``` +### `degree_assortativity_coefficient(x, y, weight, nodes)` +Compute degree assortativity of a graph. + +*Assortativity* measures the similarity of connections +in the graph with respect to the node degree. + + +#### Input: + +* `x: string (default="out")` ➑ The degree type for source node (directed graphs only). Can be "in" or "out". +* `y: string (default="in")` ➑ The degree type for target node (directed graphs only). Can be "in" or "out". +* `weight: string (default=NULL)` ➑ The edge attribute that holds the numerical value used as a weight. If `None`, then each edge has weight 1. The degree is the sum of the edge weights adjacent to the node. +* `nodes: List[Vertex] (default=NULL)` ➑ Compute degree assortativity only for nodes in a container. The default is all nodes. + +#### Output: + +* `assortativity: double` ➑ Assortativity of graph by degree. + +#### Usage: +```cypher +CALL nxalg.degree_assortativity_coefficient('out', 'in') YIELD * +RETURN assortativity; +``` +### `descendants(source)` + +Returns all nodes reachable from `source` in `G`. + + +#### Input: + +* `source: Vertex` ➑ A node in `G`. + +#### Output: + +* `descendants: List[Vertex]` ➑ The descendants of `source` in `G`. + +#### Usage: +```cypher +MATCH (source:Label) +CALL nxalg.descendants(source) YIELD * +RETURN descendants; +``` +### `dfs_postorder_nodes(source, depth_limit)` + +Returns nodes in a depth-first-search post-ordering starting at source. + +#### Input: + +* `source: Vertex` ➑ Specify starting node for depth-first search and return nodes in the component reachable from this node. +* `depth_limit: integer (default=NULL)` ➑ Specify the maximum search depth. + +#### Output: + +* `nodes: List[Vertex]` ➑ A list of nodes in a depth-first-search post-ordering.
+ +#### Usage: +```cypher +MATCH (source:Label) +CALL nxalg.dfs_postorder_nodes(source, 10) YIELD * +RETURN source, nodes; +``` +### `dfs_predecessors(source, depth_limit)` + +Returns a dictionary of predecessors in depth-first-search from source. + +#### Input: + +* `source: Vertex` ➑ Specify starting node for depth-first search. +* `depth_limit: integer (default=NULL)` ➑ Specify the maximum search depth. + +#### Output: + +* `node: Vertex` ➑ Node whose predecessor is returned. +* `predecessor: Vertex` ➑ Predecessor of a given node. + +#### Usage: +```cypher +MATCH (source:Label) +CALL nxalg.dfs_predecessors(source, 10) YIELD * +RETURN node, predecessor; +``` +### `dfs_preorder_nodes(source, depth_limit)` + +Returns nodes in a depth-first-search pre-ordering starting at source. + +#### Input: + +* `source: Vertex` ➑ Specify starting node for depth-first search and return nodes in the component reachable from this node. +* `depth_limit: integer (default=NULL)` ➑ Specify the maximum search depth. + +#### Output: + +* `nodes: List[Vertex]` ➑ A list of nodes in a depth-first-search pre-ordering. + +#### Usage: +```cypher +MATCH (source:Label) +CALL nxalg.dfs_preorder_nodes(source, 10) YIELD * +RETURN source, nodes AS preorder_nodes; +``` +### `dfs_successors(source, depth_limit)` + +Returns a dictionary of successors in depth-first-search from source. + +#### Input: + +* `source: Vertex` ➑ Specify starting node for depth-first search and return nodes in the component reachable from this node. +* `depth_limit: integer (default=NULL)` ➑ Specify the maximum search depth. + +#### Output: + +* `node: Vertex` ➑ Node to calculate successors for +* `successors: List[Vertex]` ➑ Successors of a given node + +#### Usage: +```cypher +MATCH (source:Label) +CALL nxalg.dfs_successors(source, 5) YIELD * +RETURN node, successors; +``` +### `dfs_tree(source, depth_limit)` + +Returns an oriented tree constructed from a depth-first-search from source.
+ +#### Input: + +* `source: Vertex` ➑ Specify starting node for depth-first search. +* `depth_limit: integer (default=NULL)` ➑ Specify the maximum search depth. + +#### Output: + +* `tree: List[Vertex]` ➑ An oriented tree in a form of a list. + +#### Usage: +```cypher +MATCH (source:Label) +CALL nxalg.dfs_tree(source, 7) YIELD * +RETURN tree; +``` +### `diameter()` +Returns the diameter of the graph `G`. + +The diameter is the maximum eccentricity. + +#### Output: + +* `diameter: integer` ➑ Diameter of graph. + +#### Usage: +```cypher +CALL nxalg.diameter() YIELD * +RETURN diameter; +``` +### `dominance_frontiers(start)` + +Returns the dominance frontiers of all nodes of a directed graph. + +The *dominance frontier* of a node `d` is the set of all +nodes such that `d` dominates an immediate +predecessor of a node, but `d` does not strictly dominate that node. + +#### Input: + +* `start: Vertex` ➑ The start node of dominance computation. + +#### Output: + +* `node: Vertex` ➑ Node to calculate frontier. +* `frontier: List[Vertex]` ➑ Dominance frontier for a given node. + +#### Usage: +```cypher +MATCH (source:Label) +CALL nxalg.dominance_frontiers(source) YIELD * +RETURN node, frontier; +``` +### `dominating_set(start)` +Finds a dominating set for the graph `G`. + +A *dominating set* for a graph with node set `V` is a subset `D` of +`V` such that every node not in `D` is adjacent to at least one +member of `D`. + + +#### Input: + +* `start: Vertex` ➑ Node to use as a starting point for the algorithm. + +#### Output: + +* `dominating_set: List[Vertex]` ➑ A dominating set for `G`. + +#### Usage: +```cypher +MATCH (source:Label) +CALL nxalg.dominating_set(source) YIELD * +RETURN dominating_set; +``` +### `edge_bfs(source, orientation)` +A directed, breadth-first-search of edges in `G`, beginning at `source`. + +Return the edges of `G` in a breadth-first-search order continuing until +all edges are generated. 
+
+
+#### Input:
+
+* `source: Vertex (default=NULL)` ➑ The node from which the traversal begins. If `None`, then a source is chosen arbitrarily and repeatedly until all edges from each node in the graph are searched.
+* `orientation: string (default=NULL)` ➑ For directed graphs and directed multigraphs, edge traversals need not respect the original orientation of the edges. When set to β€˜reverse’, every edge is traversed in the reverse direction. When set to β€˜ignore’, every edge is treated as undirected. When set to β€˜original’, every edge is treated as directed. In all three cases, the returned edge tuples add a last entry to indicate the direction in which that edge was traversed. If `orientation` is `None`, the returned edge has no direction indicated. The direction is respected, but not reported.
+
+#### Output:
+
+* `edges: List[Edge]` ➑ A directed edge indicating the path taken by the breadth-first-search. For graphs, edge is of the form `(u, v)` where `u` and `v` are the tail and head of the edge as determined by the traversal. For multigraphs, edge is of the form `(u, v, key)`, where `key` is the key of the edge. When the graph is directed, then `u` and `v` are always in the order of the actual directed edge. If `orientation` is not `None` then the edge tuple is extended to include the direction of traversal (β€˜forward’ or β€˜reverse’) on that edge.
+
+#### Usage:
+```cypher
+MATCH (source:Label)
+CALL nxalg.edge_bfs(source, 'ignore') YIELD *
+RETURN source, edges;
+```
+### `edge_dfs(source, orientation)`
+
+A directed, depth-first-search of edges in `G`, beginning at `source`.
+
+Return the edges of `G` in a depth-first-search order continuing until
+all edges are generated.
+
+#### Input:
+
+* `source: Vertex (default=NULL)` ➑ The node from which the traversal begins. If `None`, then a source is chosen arbitrarily and repeatedly until all edges from each node in the graph are searched.
+* `orientation: string (default=NULL)` ➑ For directed graphs and directed multigraphs, edge traversals +need not respect the original orientation of the edges. +When set to β€˜reverse’, every edge is traversed in the reverse direction. +When set to β€˜ignore’, every edge is treated as undirected. +When set to β€˜original’, every edge is treated as directed. +In all three cases, the returned edge tuples add a last entry to +indicate the direction in which that edge was traversed. +If `orientation` is `None`, the returned edge has no direction indicated. +The direction is respected, but not reported. + +#### Output: + +* `edges: List[Edge]` ➑ A directed edge indicating the path taken by the depth-first traversal. +For graphs, edge is of the form `(u, v)` where `u` and `v` +are the tail and head of the edge as determined by the traversal. +For multigraphs, edge is of the form `(u, v, key)`, where `key` is +the key of the edge. When the graph is directed, then `u` and `v` +are always in the order of the actual directed edge. +If `orientation` is not `None` then the edge tuple is extended to include +the direction of traversal (β€˜forward’ or β€˜reverse’) on that edge. + +#### Usage: +```cypher +MATCH (source:Label) +CALL nxalg.edge_dfs(source, 'original') YIELD * +RETURN source, edges; +``` +### `find_cliques()` + +Returns all maximal cliques in an undirected graph. + +For each node `v`, a *maximal clique* for `v` is the largest complete +subgraph containing `v`. The largest maximal clique is sometimes +called the *maximum clique*. + +This function returns an iterator over cliques, each of which is a +list of nodes. It is an iterative implementation, so should not +suffer from recursion depth issues. + +#### Output: + +* `cliques: List[List[Vertex]]` ➑ An iterator over maximal cliques, each of which is a list of +nodes in `G`. The order of cliques is arbitrary. 
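The classic way to enumerate maximal cliques is the Bron–Kerbosch recursion. The sketch below is the textbook recursive form in plain Python (the module itself uses an iterative variant, as noted above), shown only to illustrate what a maximal clique is:

```python
def find_cliques(adj):
    """Enumerate maximal cliques of an undirected graph given as
    {node: set_of_neighbors}. Classic Bron-Kerbosch recursion."""
    def expand(r, p, x):
        # r: current clique, p: candidate extensions, x: already-processed nodes
        if not p and not x:
            yield sorted(r)  # nothing can extend r, so it is maximal
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    return list(expand(set(), set(adj), set()))

# Triangle 1-2-3 with a pendant edge 3-4
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(find_cliques(adj)))  # [[1, 2, 3], [3, 4]]
```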
+
+#### Usage:
+```cypher
+CALL nxalg.find_cliques() YIELD *
+RETURN cliques;
+```
+### `find_cycle(source, orientation)`
+
+Returns a cycle found via depth-first traversal.
+
+A *cycle* is a closed path in the graph.
+The orientation of directed edges is determined by `orientation`.
+
+#### Input:
+
+* `source: List[Vertex] (default=NULL)` ➑ The node from which the traversal begins. If `None`, then a source is chosen arbitrarily and repeatedly until all edges from each node in the graph are searched.
+* `orientation: string (default=NULL)` ➑ For directed graphs and directed multigraphs, edge traversals need not respect the original orientation of the edges. When set to β€˜reverse’, every edge is traversed in the reverse direction. When set to β€˜ignore’, every edge is treated as undirected. When set to β€˜original’, every edge is treated as directed. In all three cases, the yielded edge tuples add a last entry to indicate the direction in which that edge was traversed. If `orientation` is `None`, the yielded edge has no direction indicated. The direction is respected, but not reported.
+
+#### Output:
+
+* `edges: List[Edge]` ➑ A list of directed edges indicating the path taken for the loop. If no cycle is found, then an exception is raised. For graphs, an edge is of the form `(u, v)` where `u` and `v` are the tail and the head of the edge as determined by the traversal. For multigraphs, an edge is of the form `(u, v, key)`, where `key` is the key of the edge. When the graph is directed, then `u` and `v` are always in the order of the actual directed edge. If `orientation` is not `None` then the edge tuple is extended to include the direction of traversal (β€˜forward’ or β€˜reverse’) on that edge.
+
+#### Usage:
+```cypher
+MATCH (source:Label)
+CALL nxalg.find_cycle(source) YIELD *
+RETURN source, edges;
+```
+
+### `flow_hierarchy(weight)`
+
+Returns the flow hierarchy of a directed network.
+ +*Flow hierarchy* is defined as the fraction of edges not participating in cycles in a directed graph. + +#### Input: + +* `weight: string (default=NULL)` ➑ Attribute to use for node weights. If `None`, the weight defaults to 1. + +#### Output: + +* `flow_hierarchy: double` ➑ Flow hierarchy value. + +#### Usage: +```cypher +CALL nxalg.flow_hierarchy() YIELD * +RETURN flow_hierarchy; +``` +### `global_efficiency()` + +Returns the average global efficiency of the graph. The *efficiency* of a pair of nodes in a graph is the multiplicative inverse of the shortest path distance between the nodes. The *average global efficiency* of a graph is the average efficiency of all pairs of nodes. + +#### Output: + +* `global_efficiency: double` ➑ The average global efficiency of the graph. + +#### Usage: +```cypher +CALL nxalg.global_efficiency() YIELD * +RETURN global_efficiency; +``` +### `greedy_color(strategy, interchange)` +Color a graph using various strategies of greedy graph coloring. Attempts to color a graph using as few colors as possible, where no neighbors of a node can have the same color as the node itself. The given strategy determines the order in which nodes are colored. + + +#### Input: + +* `strategy` ➑ The parameter `function(G,colors)` is a function (or a string representing a function) that provides the coloring strategy, by returning nodes in the order they should be colored. `G` is the graph, and `colors` is a dictionary of the currently assigned colors, keyed by nodes. The function must return an iterable over all the nodes in `G`. If the strategy function is an iterator generator (a function with +`yield` statements), keep in mind that the `colors` dictionary will be updated after each `yield`, since this function chooses colors greedily. If `strategy` is a string, it must be one of the following, each of which represents one of the built-in strategy functions. 
+`'largest_first'` +`'random_sequential'` +`'smallest_last'` +`'independent_set'` +`'connected_sequential_bfs'` +`'connected_sequential_dfs'` +`'connected_sequential'` (alias for the previous strategy) +`'saturation_largest_first'` +`'DSATUR'` (alias for the previous strategy) +* `interchange: boolean (default=False)` ➑ Will use the color interchange algorithm if set to `True`. Note that `saturation_largest_first` and `independent_set` do not work with interchange. Furthermore, if you use interchange with your own strategy function, you cannot rely on the values in the `colors` argument. +#### Output: + +* `node: Vertex` ➑ Vertex to color. +* `color: integer` ➑ Color index of a certain node. + +#### Usage: +```cypher +CALL nxalg.greedy_color('connected_sequential_bfs') YIELD * +RETURN node, color; +``` +### `has_eulerian_path()` + + An *Eulerian path* is a path in a graph that uses each edge of a graph exactly once. + A directed graph has an Eulerian path if: +* at most one vertex has `out_degree - in_degree = 1`, +* at most one vertex has `in_degree - out_degree = 1`, +* every other vertex has equal in_degree and out_degree, +* and all of its vertices with nonzero degree belong to a single connected component of the underlying undirected graph. + An undirected graph has an Eulerian path if exactly zero or two vertices have an odd degree and all of its vertices with nonzero degrees belong to a single connected component. + +#### Output: + +* `has_eulerian_path: boolean` ➑ `True` if `G` has an eulerian path. + +#### Usage: +```cypher +CALL nxalg.has_eulerian_path() YIELD * +RETURN has_eulerian_path; +``` + + +### `has_path(source, target)` + +Returns `True` if `G` has a path from `source` to `target`. + +#### Input: + +* `source: Vertex` ➑ Starting node for the path. +* `target: Vertex` ➑ Ending node for the path. + +#### Output: + +* `has_path: boolean` ➑ `True` if `G` has a path from `source` to `target`. 
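Conceptually this is just reachability, answerable with a breadth-first search; a minimal self-contained Python sketch (illustrative only, not the module's code):

```python
from collections import deque

def has_path(adj, source, target):
    """BFS from source; True if target is reachable."""
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

adj = {'a': ['b'], 'b': ['c'], 'd': []}
print(has_path(adj, 'a', 'c'))  # True
print(has_path(adj, 'a', 'd'))  # False
```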
+
+#### Usage:
+```cypher
+MATCH (n:Label), (m:Label)
+CALL nxalg.has_path(n, m) YIELD *
+RETURN has_path;
+```
+
+### `immediate_dominators(start)`
+
+Returns the immediate dominators of all nodes of a directed graph. The immediate dominator of a node `n` is the unique node that strictly dominates `n` but does not strictly dominate any other node that dominates `n`.
+
+#### Input:
+
+* `start: Vertex` ➑ The start node of dominance computation.
+
+#### Output:
+
+* `node: Vertex` ➑ Vertex to calculate the dominator for.
+* `dominator: Vertex` ➑ Dominator node for a certain vertex.
+
+#### Usage:
+```cypher
+MATCH (n:Label)
+CALL nxalg.immediate_dominators(n) YIELD *
+RETURN node, dominator;
+```
+
+### `is_arborescence()`
+
+Returns `True` if `G` is an arborescence. An *arborescence* is a directed tree with maximum in-degree equal to 1.
+
+#### Output:
+
+* `is_arborescence: boolean` ➑ A boolean that is `True` if `G` is an arborescence.
+
+#### Usage:
+```cypher
+CALL nxalg.is_arborescence() YIELD *
+RETURN is_arborescence;
+```
+
+### `is_at_free()`
+
+Check if a graph is AT-free. The method uses the `find_asteroidal_triple` method to recognize an AT-free graph. If no asteroidal triple is found, the graph is AT-free and `True` is returned. If at least one asteroidal triple is found, the graph is not AT-free and `False` is returned.
+
+#### Output:
+
+* `is_at_free: boolean` ➑ `True` if `G` is AT-free and `False` otherwise.
+
+#### Usage:
+```cypher
+CALL nxalg.is_at_free() YIELD *
+RETURN is_at_free;
+```
+### `is_bipartite()`
+
+Returns `True` if graph `G` is bipartite, `False` if not. A *bipartite graph* (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets `u` and `v` such that every edge connects a vertex in `u` to one in `v`.
+
+#### Output:
+
+* `is_bipartite: boolean` ➑ `True` if `G` is bipartite and `False` otherwise.
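Bipartiteness is commonly checked by two-coloring the graph with a BFS: adjacent nodes must receive different colors, and a color conflict means an odd cycle. A minimal Python sketch (illustrative only, not the module's code):

```python
from collections import deque

def is_bipartite(adj):
    """Two-color each connected component; fail on an odd cycle."""
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]   # opposite side
                    queue.append(v)
                elif color[v] == color[u]:    # odd cycle found
                    return False
    return True

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(is_bipartite(square), is_bipartite(triangle))  # True False
```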
+
+#### Usage:
+```cypher
+CALL nxalg.is_bipartite() YIELD *
+RETURN is_bipartite;
+```
+
+### `is_branching()`
+
+Returns `True` if `G` is a branching. A *branching* is a directed forest with maximum in-degree equal to 1.
+
+#### Output:
+
+* `is_branching: boolean` ➑ A boolean that is `True` if `G` is a branching.
+
+#### Usage:
+```cypher
+CALL nxalg.is_branching() YIELD *
+RETURN is_branching;
+```
+
+### `is_chordal()`
+
+Checks whether `G` is a chordal graph. A graph is *chordal* if every cycle of length at least 4 has a chord (an edge joining two nodes not adjacent in the cycle).
+
+#### Output:
+
+* `is_chordal: boolean` ➑ `True` if `G` is a chordal graph and `False` otherwise.
+
+#### Usage:
+```cypher
+CALL nxalg.is_chordal() YIELD *
+RETURN is_chordal;
+```
+
+### `is_distance_regular()`
+
+Returns `True` if the graph is distance regular, `False` otherwise. A connected graph `G` is distance-regular if for any nodes `x,y` and any integers `i,j=0,1,...,d` (where `d` is the graph diameter), the number of vertices at distance `i` from `x` and distance `j` from `y` depends only on `i,j` and the graph distance between `x` and `y`, independently of the choice of `x` and `y`.
+
+#### Output:
+
+* `is_distance_regular: boolean` ➑ `True` if the graph is distance-regular, `False` otherwise.
+
+#### Usage:
+```cypher
+CALL nxalg.is_distance_regular() YIELD *
+RETURN is_distance_regular;
+```
+### `is_edge_cover(cover)`
+
+Decides whether a set of edges is a valid edge cover of the graph. Given a set of edges, it can be decided whether the set is an *edge cover* by checking whether every node of the graph has at least one edge from the set incident to it.
+
+#### Input:
+
+* `cover: List[Edge]` ➑ A list of edges to be checked.
+
+#### Output:
+
+* `is_edge_cover: boolean` ➑ Whether the set of edges is a valid edge cover of the graph.
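The check amounts to verifying that every node of the graph is an endpoint of at least one edge in the candidate set; a plain-Python sketch over node and edge-tuple lists (illustrative only):

```python
def is_edge_cover(nodes, cover):
    """True if every node appears as an endpoint of some edge in cover."""
    covered = {u for edge in cover for u in edge}
    return set(nodes) <= covered

nodes = [1, 2, 3, 4]
print(is_edge_cover(nodes, [(1, 2), (3, 4)]))  # True
print(is_edge_cover(nodes, [(1, 2)]))          # False
```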
+ +#### Usage: +```cypher +MATCH (n)-[e]-(m) +WITH COLLECT(e) AS cover +CALL nxalg.is_edge_cover(cover) YIELD * +RETURN is_edge_cover; +``` + +### `is_eulerian()` + +Returns `True` if and only if `G` is Eulerian. A graph is *Eulerian* if it has an Eulerian circuit. An *Eulerian circuit* is a closed walk that includes each edge of a graph exactly once. + +#### Output: + +* `is_eulerian: boolean` ➑ `True` if `G` is Eulerian. + +#### Usage: +```cypher +CALL nxalg.is_eulerian() YIELD * +RETURN is_eulerian; +``` +### `is_forest()` + +Returns `True` if `G` is a forest. A *forest* is a graph with no undirected cycles. + For directed graphs, `G` is a forest if the underlying graph is a forest. The underlying graph is obtained by treating each directed edge as a single undirected edge in a multigraph. +#### Output: + +* `is_forest: boolean` ➑ A boolean that is `True` if `G` is a forest. + +#### Usage: +```cypher +CALL nxalg.is_forest() YIELD * +RETURN is_forest; +``` + +### `is_isolate(n)` + +Determines whether a node is an isolate. + An *isolate* is a node with no neighbors (that is, with degree zero). For directed graphs, this means no in-neighbors and no out-neighbors. + +#### Input: + +* `n: Vertex` ➑ A node in `G`. + +#### Output: + +* `is_isolate: boolean` ➑ `True` if and only if `n` has no neighbors. + +#### Usage: +```cypher +MATCH (n) +CALL nxalg.is_isolate(n) YIELD * +RETURN is_isolate; +``` +### `is_isomorphic(nodes1, edges1, nodes2, edges2)` + +Returns `True` if the graphs `G1` and `G2` are isomorphic and `False` otherwise. The two graphs `G1` and `G2` must be the same type. + +#### Input: + +* `nodes1: List[Vertex]` ➑ Nodes in `G1`. +* `edges1: List[Edge]` ➑ Edges in `G1`. +* `nodes2: List[Vertex]` ➑ Nodes in `G2`. +* `edges2: List[Edge]` ➑ Edges in `G2`. + +#### Output: + +* `is_isomorphic: boolean` ➑ `True` if the graphs `G1` and `G2` are isomorphic and `False` otherwise. 
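A full isomorphism test is expensive. A cheap necessary (but not sufficient) condition, identical sorted degree sequences, is often used to rule graphs out quickly; the helper below is a hypothetical plain-Python sketch, not part of the module:

```python
def degree_sequences_match(adj1, adj2):
    """Necessary (not sufficient) condition for graph isomorphism:
    the sorted degree sequences must be equal."""
    def degrees(adj):
        return sorted(len(neighbors) for neighbors in adj.values())
    return degrees(adj1) == degrees(adj2)

path = {1: [2], 2: [1, 3], 3: [2]}                    # path on 3 nodes
relabeled = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}  # same path, renamed
star = {1: [2, 3, 4], 2: [1], 3: [1], 4: [1]}          # star on 4 nodes
print(degree_sequences_match(path, relabeled))  # True
print(degree_sequences_match(path, star))       # False
```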
+
+#### Usage:
+```cypher
+MATCH (n:Label1)-[e]-(), (r:Label2)-[f]-()
+WITH
+COLLECT(n) AS nodes1,
+COLLECT(e) AS edges1,
+COLLECT(r) AS nodes2,
+COLLECT(f) AS edges2
+CALL nxalg.is_isomorphic(nodes1, edges1, nodes2, edges2) YIELD *
+RETURN is_isomorphic;
+```
+
+### `is_semieulerian()`
+
+Returns `True` if `G` is semi-Eulerian.
+
+`G` is semi-Eulerian if it has an Eulerian path but no Eulerian circuit.
+
+#### Output:
+
+* `is_semieulerian: boolean` ➑ `True` if `G` is semi-Eulerian.
+
+#### Usage:
+```cypher
+CALL nxalg.is_semieulerian() YIELD *
+RETURN is_semieulerian;
+```
+### `is_simple_path(nodes)`
+
+Returns `True` if and only if the given nodes form a simple path in `G`.
+ A *simple path* in a graph is a nonempty sequence of nodes in which no node appears more than once in the sequence and each adjacent pair of nodes in the sequence is adjacent in the graph.
+
+#### Input:
+
+* `nodes: List[Vertex]` ➑ A list of one or more nodes in the graph `G`.
+
+#### Output:
+
+* `is_simple_path: boolean` ➑ Whether the given list of nodes represents a simple path in `G`.
+
+#### Usage:
+```cypher
+MATCH (n:Label)
+WITH COLLECT(n) AS nodes
+CALL nxalg.is_simple_path(nodes) YIELD *
+RETURN is_simple_path;
+```
+
+### `is_strongly_regular()`
+
+Returns `True` if and only if the given graph is strongly regular.
+ An undirected graph is *strongly regular* if:
+
+* it is regular,
+* each pair of adjacent vertices has the same number of neighbors in common,
+* each pair of nonadjacent vertices has the same number of neighbors in common.
+
+ Each strongly regular graph is a distance-regular graph. Conversely, if a distance-regular graph has a diameter of two, then it is a strongly regular graph.
+
+#### Output:
+
+* `is_strongly_regular: boolean` ➑ Whether `G` is strongly regular.
+
+#### Usage:
+```cypher
+CALL nxalg.is_strongly_regular() YIELD *
+RETURN is_strongly_regular;
+```
+
+### `is_tournament()`
+Returns `True` if and only if `G` is a tournament.
+ A *tournament* is a directed graph, with neither self-loops nor multi-edges, in which there is exactly one directed edge joining each pair of distinct nodes.
+
+#### Output:
+
+* `is_tournament: boolean` ➑ Whether the given graph is a tournament graph.
+
+#### Usage:
+```cypher
+CALL nxalg.is_tournament() YIELD *
+RETURN is_tournament;
+```
+### `is_tree()`
+
+Returns `True` if `G` is a tree.
+ A *tree* is a connected graph with no undirected cycles.
+ For directed graphs, `G` is a tree if the underlying graph is a tree. The underlying graph is obtained by treating each directed edge as a single undirected edge in a multigraph.
+
+#### Output:
+
+* `is_tree: boolean` ➑ A boolean that is `True` if `G` is a tree.
+
+#### Usage:
+```cypher
+CALL nxalg.is_tree() YIELD *
+RETURN is_tree;
+```
+### `isolates()`
+Returns a list of isolates in the graph.
+ An *isolate* is a node with no neighbors (that is, with degree zero). For directed graphs, this means no in-neighbors and no out-neighbors.
+
+#### Output:
+
+* `isolates: List[Vertex]` ➑ A list of isolates in `G`.
+
+#### Usage:
+```cypher
+CALL nxalg.isolates() YIELD *
+RETURN isolates;
+```
+### `jaccard_coefficient(ebunch)`
+Compute the Jaccard coefficient of all node pairs in `ebunch`.
+
+The *Jaccard coefficient* compares members of two sets to see which members are shared and which are distinct.
+
+#### Input:
+
+* `ebunch: List[List[Vertex]] (default=NULL)` ➑ The Jaccard coefficient will be computed for each pair of nodes given in the iterable. The pairs must be given as 2-tuples `(u, v)` where `u` and `v` are nodes in the graph. If `ebunch` is `None`, then all non-existent edges in the graph will be used.
+
+#### Output:
+
+* `u: Vertex` ➑ First node in pair.
+* `v: Vertex` ➑ Second node in pair.
+* `coef: double` ➑ Jaccard coefficient.
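For a single pair `(u, v)` the coefficient is `|N(u) ∩ N(v)| / |N(u) βˆͺ N(v)|`, where `N(x)` is the neighbor set of `x`; a minimal Python sketch (illustrative only, not the module's code):

```python
def jaccard(adj, u, v):
    """Jaccard coefficient of the neighbor sets of u and v."""
    nu, nv = set(adj[u]), set(adj[v])
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

adj = {1: [3, 4], 2: [3, 5], 3: [1, 2], 4: [1], 5: [2]}
# Shared neighbor: {3}; union of neighbors: {3, 4, 5} -> 1/3
print(jaccard(adj, 1, 2))
```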
+
+#### Usage:
+```cypher
+CALL nxalg.jaccard_coefficient() YIELD *
+RETURN u, v, coef;
+```
+### `k_clique_communities(k, cliques)`
+Find k-clique communities in a graph using the percolation method.
+ A *k-clique community* is the union of all cliques of size `k` that can be reached through adjacent (sharing `k-1` nodes) k-cliques.
+
+#### Input:
+
+* `k: integer` ➑ Size of the smallest clique.
+* `cliques: List[List[Vertex]] (default=NULL)` ➑ Precomputed cliques (use networkx.find_cliques(G)).
+
+#### Output:
+
+* `communities: List[List[Vertex]]` ➑ Sets of nodes, one for each k-clique community.
+
+#### Usage:
+```cypher
+CALL nxalg.k_clique_communities(3) YIELD *
+RETURN communities;
+```
+### `k_components(min_density)`
+Returns the approximate k-component structure of a graph `G`.
+ A *k-component* is a maximal subgraph of a graph `G` that has, at least, node connectivity `k`: we need to remove at least `k` nodes to break it into more components. k-components have an inherent hierarchical structure because they are nested in terms of connectivity: a connected graph can contain several 2-components, each of which can contain one or more 3-components, and so forth.
+ This implementation is based on the fast heuristics to approximate the k-component structure of a graph. This, in turn, is based on a fast approximation algorithm for finding good lower bounds of the number of node independent paths between two nodes.
+
+#### Input:
+
+* `min_density: double (default=0.95)` ➑ Density relaxation threshold.
+
+#### Output:
+
+* `k: integer` ➑ Connectivity level `k`.
+* `components: List[List[Vertex]]` ➑ List of sets of nodes that form a k-component of level `k` as values.
+
+#### Usage:
+```cypher
+CALL nxalg.k_components(0.8) YIELD *
+RETURN k, components;
+```
+### `k_edge_components(k)`
+Returns nodes in each maximal k-edge-connected component in `G`.
+A connected graph is *k-edge-connected* if it remains connected whenever fewer than k edges are removed.
The edge-connectivity of a graph is the largest k for which the graph is k-edge-connected. +#### Input: + +* `k: integer` ➑ Desired edge connectivity. + +#### Output: + +* `components: List[List[Vertex]]` ➑ A list of k-edge-ccs. Each set of returned nodes will have k-edge-connectivity in the graph `G`. + +#### Usage: +```cypher +CALL nxalg.k_edge_components(3) YIELD * +RETURN components; +``` +### `local_efficiency()` +Returns the average local efficiency of the graph. + The *efficiency* of a pair of nodes in a graph is the multiplicative inverse of the shortest path distance between the nodes. The *local efficiency* of a node in the graph is the average global efficiency of the subgraph induced by the neighbors of the node. The *average local efficiency* is the average of the local efficiencies of each node. +#### Output: + +* `local_efficiency: double` ➑ The average local efficiency of the graph. + +#### Usage: +```cypher +CALL nxalg.local_efficiency() YIELD * +RETURN local_efficiency; +``` +### `lowest_common_ancestor(node1, node2)` +Compute the lowest common ancestor of the given pair of nodes. +#### Input: + +* `node1: Vertex` ➑ A node in the graph. +* `node2: Vertex` ➑ A node in the graph. + +#### Output: + +* `ancestor: Vertex` ➑ The lowest common ancestor of `node1` and `node2`, or default if they have no common ancestors. +#### Usage: +```cypher +MATCH (n), (m) +WHERE n != m +CALL nxalg.lowest_common_ancestor(n, m) YIELD * +RETURN n, m, ancestor; +``` +### `maximal_matching()` + A *matching* is a subset of edges in which no node occurs more than once. A *maximal matching* cannot add more edges and still be a matching. + +#### Output: + +* `edges: List[Edge]` ➑ A maximal matching of the graph. + +#### Usage: +```cypher +CALL nxalg.maximal_matching() YIELD * +RETURN edges; +``` +### `minimum_spanning_tree(weight, algorithm, ignore_nan)` +Returns a minimum spanning tree or forest on an undirected graph `G`. 
+ A *minimum spanning tree* is a subset of the edges of a connected, undirected graph that connects all of the vertices together without any cycles.
+
+#### Input:
+
+* `weight: string (default="weight")` ➑ Data key to use for edge weights.
+* `algorithm: string (default="kruskal")` ➑ The algorithm to use when finding a minimum spanning tree. Valid choices are β€˜kruskal’, β€˜prim’, or β€˜boruvka’.
+* `ignore_nan: boolean (default=False)` ➑ If `NaN` is found as an edge weight normally an exception is raised. If `ignore_nan` is `True` then that edge is ignored.
+
+#### Output:
+
+* `node: List[Vertex]` ➑ Nodes of a minimum spanning tree or forest.
+* `edges: List[Edge]` ➑ Edges of a minimum spanning tree or forest.
+
+#### Usage:
+```cypher
+CALL nxalg.minimum_spanning_tree("weight", "prim", TRUE) YIELD *
+RETURN node, edges;
+```
+### `multi_source_dijkstra_path(sources, cutoff, weight)`
+Find shortest weighted paths in `G` from a given set of source nodes.
+
+Compute the shortest path between any of the source nodes and all other reachable nodes for a weighted graph.
+
+#### Input:
+
+* `sources: List[Vertex]` ➑ Starting nodes for paths. If this is a set containing a single node, then all paths computed by this function will start from that node. If there are two or more nodes in the set, the computed paths may begin from any one of the start nodes.
+* `cutoff: integer (default=NULL)` ➑ Depth to stop the search. Only return paths with `length <= cutoff`.
+* `weight: string` ➑ If this is a string, then edge weights will be accessed via the edge attribute with this key (that is, the weight of the edge joining `u` to `v` will be `G.edges[u, v][weight]`). If no such edge attribute exists, the weight of the edge is assumed to be one. If this is a function, the weight of an edge is the value returned by the function. The function must accept exactly three positional arguments: the two endpoints of an edge and the dictionary of edge attributes for that edge. The function must return a number.
+
+#### Output:
+
+* `target: Vertex` ➑ Target key for the shortest path.
+* `path: List[Vertex]` ➑ Shortest path in a list.
+
+#### Usage:
+```cypher
+MATCH (n:Label)
+WITH COLLECT(n) AS sources
+CALL nxalg.multi_source_dijkstra_path(sources, 7) YIELD *
+RETURN target, path;
+```
+### `multi_source_dijkstra_path_length(sources, cutoff, weight)`
+Find shortest weighted path lengths in `G` from a given set of source nodes.
+
+Compute the shortest path length between any of the source nodes and all other reachable nodes for a weighted graph.
+
+#### Input:
+
+* `sources: List[Vertex]` ➑ Starting nodes for paths. If this is a set containing a single node, then all paths computed by this function will start from that node. If there are two or more nodes in the set, the computed paths may begin from any one of the start nodes.
+* `cutoff: integer (default=NULL)` ➑ Depth to stop the search. Only return paths with `length <= cutoff`.
+* `weight: string` ➑ If this is a string, then edge weights will be accessed via the edge attribute with this key (that is, the weight of the edge joining `u` to `v` will be `G.edges[u, v][weight]`). If no such edge attribute exists, the weight of the edge is assumed to be one. If this is a function, the weight of an edge is the value returned by the function. The function must accept exactly three positional arguments: the two endpoints of an edge and the dictionary of edge attributes for that edge. The function must return a number.
+
+#### Output:
+
+* `target: Vertex` ➑ Target key for the shortest path.
+* `length: double` ➑ Shortest path length.
+
+#### Usage:
+```cypher
+MATCH (n:Label)
+WITH COLLECT(n) AS sources
+CALL nxalg.multi_source_dijkstra_path_length(sources, 5) YIELD *
+RETURN target, length;
+```
+### `node_boundary(nbunch1, nbunch2)`
+Returns the node boundary of `nbunch1`.
+ The *node boundary* of a set `S` with respect to a set `T` is the set of nodes `v` in `T` such that for some `u` in `S`, there is an edge joining `u` to `v`.
If `T` is not specified, it is assumed to be the set of all nodes not in `S`.
+
+#### Input:
+
+* `nbunch1: List[Vertex]` ➑ List of nodes in the graph representing the set of nodes whose node boundary will be returned. (This is the set `S` from the definition above.)
+* `nbunch2: List[Vertex] (default=NULL)` ➑ List of nodes representing the target (or β€œexterior”) set of nodes. (This is the set `T` from the definition above.) If not specified, this is assumed to be the set of all nodes in `G` not in `nbunch1`.
+
+#### Output:
+
+* `boundary: List[Vertex]` ➑ The node boundary of `nbunch1` with respect to `nbunch2`.
+
+#### Usage:
+```cypher
+MATCH (n:Label)
+WITH COLLECT(n) AS nbunch1
+CALL nxalg.node_boundary(nbunch1) YIELD *
+RETURN boundary;
+```
+### `node_connectivity(source, target)`
+Returns an approximation for node connectivity for a graph or digraph `G`.
+
+*Node connectivity* is equal to the minimum number of nodes that must be removed to disconnect `G` or render it trivial. By Menger’s theorem, this is equal to the number of node independent paths (paths that share no nodes other than `source` and `target`).
+ If `source` and `target` nodes are provided, this function returns the local node connectivity: the minimum number of nodes that must be removed to break all paths from `source` to `target` in `G`.
+ This algorithm is based on a fast approximation that gives a strict lower bound on the actual number of node independent paths between two nodes. It works for both directed and undirected graphs.
+
+#### Input:
+
+* `source: Vertex (default=NULL)` ➑ Source node.
+* `target: Vertex (default=NULL)` ➑ Target node.
+
+#### Output:
+
+* `connectivity: integer` ➑ Node connectivity of `G`, or local node connectivity if `source` and `target` are provided.
+
+#### Usage:
+```cypher
+MATCH (n:Label), (m:Label)
+CALL nxalg.node_connectivity(n, m) YIELD *
+RETURN connectivity;
+```
+### `node_expansion(s)`
+Returns the node expansion of the set `S`.
+ The *node expansion* is the quotient of the size of the node boundary of `S` and the cardinality of `S`. +#### Input: + +* `s: List[Vertex]` ➑ A sequence of nodes in `G`. + +#### Output: + +* `node_expansion: double` ➑ The node expansion of the set `S`. + +#### Usage: +```cypher +MATCH (n:Label) +WITH COLLECT(n) AS s +CALL nxalg.node_expansion(s) YIELD * +RETURN node_expansion; +``` +### `non_randomness(k)` +Compute the non-randomness of graph `G`. +The first returned value `non_randomness` is the sum of non-randomness values of all edges within the graph (where the non-randomness of an edge tends to be small when the two nodes linked by that edge are from two different communities). +The second computed value `relative_non_randomness` is a relative measure that indicates to what extent graph `G` is different from random graphs in terms of probability. When it is close to 0, the graph tends to be more likely generated by an Erdos Renyi model. +#### Input: + +* `k: integer (default=NULL)` ➑ The number of communities in `G`. If `k` is not set, the function will use a default community detection algorithm to set it. + +#### Output: + +* `non_randomness: double` ➑ Non-randomness of a graph +* `relative_non_randomness: double` ➑ Relative non-randomness of a graph + +#### Usage: +```cypher +CALL nxalg.non_randomness() YIELD * +RETURN non_randomness, relative_non_randomness; +``` +### `pagerank(alpha, personalization, max_iter, tol, nstart, weight, dangling)` +Returns the PageRank of the nodes in the graph. + +PageRank computes a ranking of the nodes in the graph G based on the structure of the incoming links. It was originally designed as an algorithm to rank web pages. +#### Input: + +* `alpha: double (default=0.85)` ➑ Damping parameter for PageRank. +* `personalization: string (default=NULL)` ➑ The β€œpersonalization vector” consisting of a dictionary with a subset of graph nodes as a key and maps personalization value for each subset. 
At least one personalization value must be non-zero. If the vector is not specified, a uniform distribution over all nodes is used. +* `max_iter: integer (default=100)` ➑ Maximum number of iterations in the power method eigenvalue solver. +* `tol: double (default=1e-06)` ➑ Error tolerance used to check convergence in the power method solver. +* `nstart: string (default=NULL)` ➑ Starting value of the PageRank iteration for each node. +* `weight: string (default="weight")` ➑ Edge data key to use as weight. If `None`, weights are set to 1. +* `dangling: string (default=NULL)` ➑ The outedges to be assigned to any "dangling" nodes, i.e., nodes without any outedges. The dict key is the node the outedge points to and the dict value is the weight of that outedge. By default, dangling nodes are given outedges according to the personalization vector (uniform if not specified). This must be selected to result in an irreducible transition matrix. It is common for the dangling dict to be the same as the personalization dict. + +#### Output: + +* `node: Vertex` ➑ Vertex to calculate PageRank for. +* `rank: double` ➑ Node PageRank. + +#### Usage: +```cypher +CALL nxalg.pagerank() YIELD * +RETURN node, rank; +``` +### `reciprocity(nodes)` +Compute the reciprocity in a directed graph. +The *reciprocity* of a directed graph is defined as the ratio of the number of edges pointing in both directions to the total number of edges in the graph. +The reciprocity of a single node `u` is defined similarly: it is the ratio of the number of edges in both directions to the total number of edges attached to node `u`. +#### Input: + +* `nodes: List[Vertex]` ➑ Compute reciprocity for nodes in this container. + +#### Output: + +* `node: Vertex` ➑ Node for which reciprocity is calculated.
+* `reciprocity: double` ➑ Reciprocity value. + +#### Usage: +```cypher +MATCH (n:Label) +WITH COLLECT(n) AS nodes +CALL nxalg.reciprocity(nodes) YIELD * +RETURN node, reciprocity; +``` +### `shortest_path(source, target, weight, method)` +Compute shortest paths in the graph. +#### Input: + +* `source: Vertex (default=NULL)` ➑ Starting node for the path. If not specified, compute shortest paths using all nodes as source nodes. +* `target: Vertex (default=NULL)` ➑ Ending node for the path. If not specified, compute shortest paths using all nodes as target nodes. +* `weight: string (default=NULL)` ➑ If `None`, every edge has weight/distance/cost 1. If a string, use this edge attribute as the edge weight. Any edge attribute not present defaults to 1. +* `method: string (default="dijkstra")` ➑ The algorithm to use to compute the path length. Supported options: 'dijkstra', 'bellman-ford'. Other inputs produce a `ValueError`. If `weight` is `None`, unweighted graph methods are used, and this suggestion is ignored. + +#### Output: + +* `source: Vertex` ➑ Source node. +* `target: Vertex` ➑ Target node. +* `path: List[Vertex]` ➑ All returned paths include both the `source` and `target` in the path. If the `source` and `target` are both specified, return a single list of nodes in a shortest path from the `source` to the `target`. If only the `source` is specified, return a dictionary keyed by targets with a list of nodes in a shortest path from the `source` to one of the targets. If only the `target` is specified, return a dictionary keyed by sources with a list of nodes in a shortest path from one of the sources to the `target`. If neither the `source` nor `target` is specified, return a dictionary of dictionaries with `path[source][target]=[list of nodes in path]`.
+ +#### Usage: +```cypher +MATCH (n:Label), (m:Label) +CALL nxalg.shortest_path(n, m) YIELD * +RETURN source, target, path; +``` +### `shortest_path_length(source, target, weight, method)` +Compute shortest path lengths in the graph. +#### Input: + +* `source: Vertex (default=NULL)` ➑ Starting node for the path. If not specified, compute shortest path lengths using all nodes as source nodes. +* `target: Vertex (default=NULL)` ➑ Ending node for the path. If not specified, compute shortest path lengths using all nodes as target nodes. +* `weight: string (default=NULL)` ➑ If `None`, every edge has weight/distance/cost 1. If a string, use this edge attribute as the edge weight. Any edge attribute not present defaults to 1. +* `method: string (default="dijkstra")` ➑ The algorithm to use to compute the path length. Supported options: 'dijkstra', 'bellman-ford'. Other inputs produce a `ValueError`. If `weight` is `None`, unweighted graph methods are used, and this suggestion is ignored. + +#### Output: + +* `source: Vertex` ➑ Source node. +* `target: Vertex` ➑ Target node. +* `length: double` ➑ If the `source` and `target` are both specified, return the length of the shortest path from the `source` to the `target`. If only the `source` is specified, return a dict keyed by `target` to the shortest path length from the `source` to that `target`. If only the `target` is specified, return a dict keyed by `source` to the shortest path length from that `source` to the `target`. If neither the `source` nor `target` is specified, return an iterator over (source, dictionary) where dictionary is keyed by `target` to the shortest path length from `source` to that `target`. + +#### Usage: +```cypher +MATCH (n:Label), (m:Label) +CALL nxalg.shortest_path_length(n, m) YIELD * +RETURN source, target, length; +``` +### `simple_cycles()` +Find simple cycles (elementary circuits) of a directed graph. + A *simple cycle*, or *elementary circuit*, is a closed path where no node appears twice.
Two elementary circuits are distinct if they are not cyclic permutations of each other. + This is a nonrecursive, iterator/generator version of Johnson's algorithm. There may be better algorithms for some cases. + +#### Output: + +* `cycles: List[List[Vertex]]` ➑ A list of elementary cycles in the graph. Each cycle is represented by a list of nodes in the cycle. + +#### Usage: +```cypher +CALL nxalg.simple_cycles() YIELD * +RETURN cycles; +``` + +### `strongly_connected_components()` +Returns nodes in strongly connected components of a graph. +#### Output: + +* `components: List[List[Vertex]]` ➑ A list of lists of nodes, one for each strongly connected component of `G`. + +#### Usage: +```cypher +CALL nxalg.strongly_connected_components() YIELD * +RETURN components; +``` +### `topological_sort()` +Returns nodes in topologically sorted order. + A *topological sort* is a non-unique permutation of the nodes such that an edge from `u` to `v` implies that `u` appears before `v` in the topological sort order. +#### Output: + +* `nodes: List[Vertex]` ➑ A list of nodes in topologically sorted order. + +#### Usage: +```cypher +CALL nxalg.topological_sort() YIELD * +RETURN nodes; +``` +### `triadic_census()` +Determines the triadic census of a directed graph. The *triadic census* is a count of how many of the 16 possible types of triads are present in a directed graph. + +#### Output: + +* `triad: string` ➑ Triad name. +* `count: integer` ➑ Number of occurrences of that triad. + +#### Usage: +```cypher +CALL nxalg.triadic_census() YIELD * +RETURN triad, count; +``` +### `voronoi_cells(center_nodes, weight)` +Returns the Voronoi cells centered at `center_nodes` with respect to the shortest-path distance metric. + If `C` is a set of nodes in the graph and `c` is an element of `C`, the *Voronoi cell* centered at a node `c` is the set of all nodes +`v` that are closer to `c` than to any other center node in `C` with respect to the shortest-path distance metric.
+ For directed graphs, this will compute the β€œoutward” Voronoi cells in which distance is measured from the center nodes to the target node. + +#### Input: + +* `center_nodes: List[Vertex]` ➑ A nonempty set of nodes in the graph `G` that represent the centers of the Voronoi cells. +* `weight: string (default=NULL)` ➑ The edge attribute (or an arbitrary function) representing the weight of an edge. This keyword argument is as described in the documentation for `networkx.multi_source_dijkstra_path`, for example. + +#### Output: + +* `center: Vertex` ➑ Vertex value of center_nodes. +* `cell: List[Vertex]` ➑ Partition of `G` closer to that center node. + +#### Usage: +```cypher +MATCH (n) +WITH COLLECT(n) AS center_nodes +CALL nxalg.voronoi_cells(center_nodes) YIELD * +RETURN center, cell; +``` +### `wiener_index(weight)` +Returns the Wiener index of the given graph. + The *Wiener index* of a graph is the sum of the shortest-path distances between each pair of reachable nodes. For pairs of nodes in undirected graphs, only one orientation of the pair is counted. +#### Input: + +* `weight: string (default=NULL)` ➑ The edge attribute to use as distance when computing shortest-path distances. This is passed directly to the +`networkx.shortest_path_length` function. + +#### Output: + +* `wiener_index: double` ➑ The Wiener index of the graph `G`. 
+ +#### Usage: +```cypher +CALL nxalg.wiener_index() YIELD * +RETURN wiener_index; +``` \ No newline at end of file diff --git a/docs2/advanced-algorithms/available-algorithms/pagerank.md b/docs2/advanced-algorithms/available-algorithms/pagerank.md new file mode 100644 index 00000000000..9e4b7b59ccc --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/pagerank.md @@ -0,0 +1,147 @@ +--- +id: pagerank +title: pagerank +sidebar_label: pagerank +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +If we present nodes as pages and directed edges between them as links, the +**PageRank** algorithm outputs a probability distribution used to represent the +likelihood that a person randomly clicking on links will arrive at any +particular page. + +**PageRank** theory holds that an imaginary surfer who is randomly clicking on +links will eventually stop clicking. The probability, at any step, that the +person will continue randomly clicking on links is called the damping factor; +otherwise, the next page is chosen randomly among all pages. + +**PageRank** is computed iteratively using the following formula: + +``` +Rank(n, t + 1) = (1 - d) / number_of_nodes + + d * sum { Rank(in_neighbour_of_n, t) / + out_degree(in_neighbour_of_n)} +``` + +where `Rank(n, t)` is the **PageRank** of node `n` at iteration `t`. In the end, *rank* +values are **normalized** to sum to 1, forming a probability distribution. + +The algorithm is implemented in such a way that all available **threads** are +used to calculate PageRank, mostly for scalability purposes. + +The default arguments are equal to the default arguments of the +[NetworkX](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.link_analysis.pagerank_alg.pagerank.html) +PageRank implementation.
+ +[![docs-source](https://img.shields.io/badge/source-pagerank-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/pagerank_module/pagerank_module.cpp) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **algorithm** | +| **Implementation** | **C++** | +| **Graph direction** | **directed** | +| **Edge weights** | **unweighted** | +| **Parallelism** | **parallel** | + +## Procedures + + + +### `get(max_iterations, damping_factor, stop_epsilon)` + +#### Input: + +* `max_iterations: integer (default=100)` ➑ Maximum number of iterations within the PageRank + algorithm. +* `damping_factor: double (default=0.85)` ➑ PageRank's damping factor. This is the + probability that the random walk continues; otherwise, the walk restarts from a random node within the graph. +* `stop_epsilon: double (default=1e-5)` ➑ Value used to terminate the iterations of + PageRank. If the change from one iteration to the next is lower than + *stop_epsilon*, execution is stopped. + +#### Output: + +* `node` ➑ Node in the graph for which PageRank is calculated. +* `rank` ➑ Normalized ranking of a node. Expresses the probability that a random + surfer will finish in a certain node by a random walk.
+ +#### Usage: + +```cypher +CALL pagerank.get() +YIELD node, rank; +``` + +## Example + + + + + + + + + +```cypher +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 5}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 6}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 7}) CREATE (a)-[:RELATION]->(b); +``` + + + + +```cypher +CALL pagerank.get() +YIELD node, rank +RETURN node, rank; +``` + + + + +```plaintext ++-----------------+-----------------+ +| node | rank | ++-----------------+-----------------+ +| (:Node {id: 1}) | 0.0546896 | +| (:Node {id: 0}) | 0.333607 | +| (:Node {id: 2}) | 0.0546896 | +| (:Node {id: 3}) | 0.0546896 | +| (:Node {id: 4}) | 0.0546896 | +| (:Node {id: 5}) | 0.0546896 | +| (:Node {id: 6}) | 0.0546896 | +| (:Node {id: 7}) | 0.338255 | ++-----------------+-----------------+ +``` + + + diff --git a/docs2/advanced-algorithms/available-algorithms/pagerank_online.md b/docs2/advanced-algorithms/available-algorithms/pagerank_online.md new file mode 100644 index 00000000000..c9d28cb336e --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/pagerank_online.md @@ -0,0 +1,222 @@ +--- +id: pagerank-online +title: pagerank_online +sidebar_label: pagerank_online +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + +{children} + +); + +**Online PageRank** is a streaming algorithm made for calculating +[PageRank](pagerank.md) in a graph streaming scenario. 
Incremental, local +changes are introduced in the algorithm to prevent users from recalculating +PageRank values each time a change occurs in the graph (something is added or +deleted). + +To make it as fast as possible, the online algorithm is only an approximation +of PageRank, but it carries the same information - the likelihood of a random walk +ending in a particular vertex. The work is based on "[Fast Incremental and +Personalized +PageRank](http://snap.stanford.edu/class/cs224w-readings/bahmani10pagerank.pdf)" +[^1], whose authors focus on providing the streaming experience of +a highly popular graph algorithm. + +PageRank is approximated simply by exploring random walks and calculating +the frequency of a node within all walks. `R` walks are sampled by using a +random walk with a stopping probability of `eps`. Therefore, on average, walks +have a length of `1/eps`. The approximate PageRank is based on the formula +below: + +``` +RankApprox(v) = X_v / (n * R / eps) +``` + +where `X_v` is the number of walks in which the node `v` appears. The theorem +in the paper shows that RankApprox(v) is sharply concentrated around +its expectation, which is Rank(v). + +### Usage + +Online PageRank should be used in a specific way. To set the parameters, call the +`set()` procedure. This procedure also sets the context of the +streaming algorithm. The `get()` procedure only returns the resulting values stored +in a cache. Therefore, if you try to get values before first calling `set()`, +the call will fail with an appropriate message. + +To enable the incremental flow, set the appropriate trigger. For that, we'll +use the `update()` procedure: + +```cypher +CREATE TRIGGER pagerank_trigger +(BEFORE | AFTER) COMMIT +EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD * +SET node.rank = rank; +``` + +Finally, the `reset()` procedure resets the context and enables the user to start +new runs.
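The sampling scheme above can be illustrated with a short Python sketch. This is a toy illustration of the idea from the paper, not the module's C++ code; `graph` is assumed to be a dict mapping each node to a list of its out-neighbours.

```python
import random

# Toy sketch of the approximation above: sample R walks from every node with
# stopping probability eps, count appearances X_v of each node v, and compute
# RankApprox(v) = X_v / (n * R / eps).
def rank_approx(graph, R=10, eps=0.2, seed=0):
    rng = random.Random(seed)
    nodes = list(graph)
    n = len(nodes)
    appearances = {v: 0 for v in nodes}
    for start in nodes:
        for _ in range(R):
            v = start
            while True:
                appearances[v] += 1
                # Stop with probability eps, or when v has no out-neighbours.
                if not graph[v] or rng.random() < eps:
                    break
                v = rng.choice(graph[v])
    return {v: x / (n * R / eps) for v, x in appearances.items()}
```

Since every walk has an expected length of `1/eps`, the returned values sum to roughly 1, matching the normalized ranks of the exact algorithm in expectation.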
+ +[^1]: [Fast Incremental and Personalized +PageRank](http://snap.stanford.edu/class/cs224w-readings/bahmani10pagerank.pdf), +Bahman Bahmani et al. + +[![docs-source](https://img.shields.io/badge/source-pagerank_online-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/pagerank_module/pagerank_online_module.cpp) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **algorithm** | +| **Implementation** | **C++** | +| **Graph direction** | **directed** | +| **Edge weights** | **unweighted** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `set(walks_per_node, walk_stop_epsilon)` + +#### Input: + +- `walks_per_node: integer (default=10)` ➑ Number of sampled walks per node. +- `walk_stop_epsilon: double (default=0.1)` ➑ The probability of stopping when deriving + the random walk. On average, it will create walks of length `1 / + walk_stop_epsilon`. + +#### Output: + +- `node` ➑ Node in the graph, for which PageRank is calculated. +- `rank` ➑ Normalized ranking of a node. Expresses the probability that a random + surfer will finish in a certain node by a random walk. + +#### Usage: + +```cypher +CALL pagerank_online.set(100, 0.2) +YIELD node, rank; +``` + +### `get()` + +\* This procedure should be used once the trigger has been set or the `set()` procedure has +been called, before adding changes to the graph. + +#### Output: + +- `node` ➑ Node in the graph, for which PageRank is calculated. +- `rank` ➑ Normalized ranking of a node. Expresses the probability that a random + surfer will finish in a certain node by a random walk. + +#### Usage: + +```cypher +CALL pagerank_online.get() +YIELD node, rank; +``` + +### `update(created_vertices, created_edges, deleted_vertices, deleted_edges)` + +#### Input: + +- `created_vertices` ➑ Vertices that were created in the last transaction.
+- `created_edges` ➑ Edges created in a period from the last transaction. +- `deleted_vertices` ➑ Vertices deleted from the last transaction. +- `deleted_edges` ➑ Edges deleted from the last transaction. + +#### Output: + +- `node` ➑ Node in the graph, for which PageRank is calculated. +- `rank` ➑ Normalized ranking of a node. Expresses the probability that a random + surfer will finish in a certain node by a random walk. + +#### Usage: + +```cypher +CREATE TRIGGER pagerank_trigger +(BEFORE | AFTER) COMMIT +EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD * +SET node.rank = rank; +``` + +## Example + + + + + + + + + +```cypher +CALL pagerank_online.set(100, 0.2) YIELD *; + +CREATE TRIGGER pagerank_trigger +BEFORE COMMIT +EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) YIELD * +SET node.rank = rank; +``` + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 4}) MERGE (b:Node {id: 6}) CREATE (a)-[:RELATION]->(b); +``` + + + + +```cypher +MATCH (node) +RETURN node.id AS node_id, node.rank AS rank; +``` + + + + +```plaintext ++-----------+-----------+ +| node_id | rank | ++-----------+-----------+ +| 0 | 0.225225 | +| 1 | 0.225225 | +| 2 | 0.225225 | +| 3 | 0.0675676 | +| 4 | 0.0765766 | +| 5 | 0.0585586 | +| 6 | 0.121622 | ++-----------+-----------+ +``` + + + diff --git a/docs2/advanced-algorithms/available-algorithms/periodic.md b/docs2/advanced-algorithms/available-algorithms/periodic.md new file mode 100644 index 
00000000000..26f77ae340d --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/periodic.md @@ -0,0 +1,137 @@ +--- +id: periodic +title: periodic +sidebar_label: periodic +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + +{children} + +); + +The **periodic module** enables users to execute a query periodically in +batches. In this case, the name periodic doesn't indicate that the query is +executed after a time interval, but rather that, due to the complexity of the +query, the results of some input source are batched to speed up execution. + +:::caution + +As the results are batched and executed in different transactions, every +executed batch is committed by itself. If an issue occurs while running this +procedure, the already committed batches cannot be rolled back. + +::: + +[![docs-source](https://img.shields.io/badge/source-graph_util-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/cpp/graph_util_module) + +| Trait | Value | +| ------------------- | --------------------------------------------------------------------------------------------------------- | +| **Module type** | **module** | +| **Implementation** | **C++** | +| **Parallelism** | **sequential** | + +### Procedures + +### `iterate(input_query, running_query, params)` + +#### Input: + +- `input_query: string` ➑ the input query which will yield the results that need to be batched +- `running_query: string` ➑ query which will be executed on the batched results +- `params: Map[string, string]` ➑ parameters for the procedure + - `batch_size: Integer` ➑ key specifying how many results should a batch contain + + +#### Output: + +- `success: boolean` ➑ `true` if the procedure executed successfully, `false` otherwise +- `number_of_executed_batches: Integer` ➑ number of executed batches (possibly a fraction of the 
full number if the procedure returned `success: false`) + +#### Usage: + +```cypher +CALL periodic.iterate( + "LOAD CSV FROM '/tmp/file.csv' WITH HEADER AS row RETURN row.node_id AS node_id, row.supernode_id AS supernode_id", + "MATCH (s:SuperNode {supernode_id: supernode_id}), (n:Node {node_id: node_id}) CREATE (s)-[:HAS_REL_TO]->(n)", + {batch_size: 5000}) +YIELD * RETURN *; +``` + +## Example + + + + + +```cypher +CREATE INDEX ON :SuperNode; +CREATE INDEX ON :SuperNode(supernode_id); +CREATE INDEX ON :Node; +CREATE INDEX ON :Node(node_id); + +CREATE (:SuperNode {supernode_id: 1}); +FOREACH (i IN range(1, 1000000) | CREATE (:Node {id: i})); +``` + + + + +```cypher +supernode_id,node_id +1,1 +1,2 +1,3 +1,4 +1,5 +1,6 +... +1,999998 +1,999999 +1,1000000 +``` + + + + +```cypher +CALL periodic.iterate( + "LOAD CSV FROM '/tmp/file.csv' WITH HEADER AS row RETURN row.node_id AS node_id, row.supernode_id AS supernode_id", + "MATCH (s:SuperNode {supernode_id: supernode_id}), (n:Node {node_id: node_id}) CREATE (s)-[:HAS_REL_TO]->(n)", + {batch_size: 5000}) +YIELD * RETURN *; +``` + + + + +```plaintext ++------------------+----------------------------+ +| success | number_of_executed_batches | ++------------------+----------------------------+ +| true | 200 | ++------------------+----------------------------+ + +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/set_cover.md b/docs2/advanced-algorithms/available-algorithms/set_cover.md new file mode 100644 index 00000000000..1f56df7abc9 --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/set_cover.md @@ -0,0 +1,162 @@ +--- +id: set_cover +title: set_cover +sidebar_label: set_cover +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +**The Set Cover** problem is one of the problems in graph theory that tries to solve the 
following problem: given a set of *n* elements and a collection of *m* sets containing them, identify the **smallest sub-collection** of sets whose union equals the set of all elements. +It is *NP-complete*, but solvable with techniques such as constraint programming. The current implementation uses the *GEKKO* optimizer as a constraint programming solver. + +[![docs-source](https://img.shields.io/badge/source-set_cover-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/set_cover.py) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **module** | +| **Implementation** | **Python** | +| **Graph direction** | **undirected** | +| **Edge weights** | **unweighted** | +| **Parallelism** | **sequential** | + +:::note Too slow? + +If this algorithm implementation is too slow for your use case, [contact us](mailto:tech@memgraph.com) and request a rewrite in C++! + +::: + +## Procedures + + + +### `cp_solve(element_vertexes, set_vertexes)` + +#### Input +Each row of the two input lists represents one *element-set* pair.
+* `element_vertexes: List[Vertex]` ➑ List of element nodes in pairs +* `set_vertexes: List[Vertex]` ➑ List of set nodes in pairs + +#### Output + +* `containing_set` ➑ minimal set of sets in which all the elements are contained + +#### Usage + +```cypher +CALL set_cover.cp_solve([(:Point), (:Point)], [(:Set), (:Set)]) +YIELD containing_set; +``` + +## Example + + + + + + + + + + +```cypher +CREATE (e:AnimalSpecies {name: 'Snake'}); +CREATE (e:AnimalSpecies {name: 'Bear'}); +CREATE (e:AnimalSpecies {name: 'Falcon'}); +CREATE (e:AnimalSpecies {name: 'Beaver'}); +CREATE (e:AnimalSpecies {name: 'Fox'}); + +CREATE (s:NationalPark {name: 'Yosemite'}); +CREATE (s:NationalPark {name: 'Grand Canyon'}); +CREATE (s:NationalPark {name: 'Yellowstone'}); +CREATE (s:NationalPark {name: 'Glacier'}); +CREATE (s:NationalPark {name: 'Everglades'}); + +MATCH (e: AnimalSpecies {name: 'Snake'}), (s:NationalPark {name: 'Yosemite'}) +CREATE (e)-[:LIVES_IN]->(s); +MATCH (e: AnimalSpecies {name: 'Bear'}), (s:NationalPark {name: 'Yosemite'}) +CREATE (e)-[:LIVES_IN]->(s); +MATCH (e: AnimalSpecies {name: 'Falcon'}), (s:NationalPark {name: 'Yosemite'}) +CREATE (e)-[:LIVES_IN]->(s); +MATCH (e: AnimalSpecies {name: 'Beaver'}), (s:NationalPark {name: 'Yosemite'}) +CREATE (e)-[:LIVES_IN]->(s); + +MATCH (e: AnimalSpecies {name: 'Fox'}), (s:NationalPark {name: 'Yellowstone'}) +CREATE (e)-[:LIVES_IN]->(s); +MATCH (e: AnimalSpecies {name: 'Beaver'}), (s:NationalPark {name: 'Yellowstone'}) +CREATE (e)-[:LIVES_IN]->(s); + +MATCH (e: AnimalSpecies {name: 'Snake'}), (s:NationalPark {name: 'Glacier'}) +CREATE (e)-[:LIVES_IN]->(s); +MATCH (e: AnimalSpecies {name: 'Bear'}), (s:NationalPark {name: 'Glacier'}) +CREATE (e)-[:LIVES_IN]->(s); + +MATCH (e: AnimalSpecies {name: 'Falcon'}), (s:NationalPark {name: 'Everglades'}) +CREATE (e)-[:LIVES_IN]->(s); + +``` + + + + +```cypher +MATCH (e:AnimalSpecies)-[l:LIVES_IN]-(s:NationalPark) +WITH collect(e) AS animal_list, collect(s) AS park_list +CALL 
set_cover.cp_solve(animal_list, park_list) +YIELD containing_set +WITH containing_set AS national_park +MATCH (animal:AnimalSpecies)-[l:LIVES_IN]->(national_park) +RETURN animal, l, national_park; +``` + + + + + + + + + + + + +## `greedy(context, element_vertexes, set_vertexes)` + +Not bad, not terrible. +#### Input +The input itself represents an *element-set* pair with each row of the lists. +* `element_vertexes: List[Vertex]` ➑ List of element nodes in pairs +* `set_vertexes: List[Vertex]` ➑ List of set nodes in pairs + +#### Output + +* `containing_set` ➑ minimal set of sets in which all the elements are contained + +#### Usage + +```cypher +CALL set_cover.greedy([(:Point), (:Point)], [(:Set), (:Set)]) +YIELD containing_set; +``` diff --git a/docs2/advanced-algorithms/available-algorithms/temporal_graph_networks.md b/docs2/advanced-algorithms/available-algorithms/temporal_graph_networks.md new file mode 100644 index 00000000000..f552603898e --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/temporal_graph_networks.md @@ -0,0 +1,507 @@ +--- +id: temporal_graph_networks +title: temporal_graph_networks +sidebar_label: temporal_graph_networks +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + +{children} + +); + +The **temporal_graph_networks (TGNs)** are a type of [graph neural network +(GNN)](https://distill.pub/2021/gnn-intro/) for dynamic graphs. In recent years, +**GNNs** have become very popular due to their ability to perform a wide variety +of machine learning tasks on graphs, such as link prediction, node +classification, and so on. 
This rise started with [Graph convolutional networks +(GCN)](https://arxiv.org/pdf/1609.02907.pdf) introduced by _Kipf et al._, +followed by [GraphSAGE](https://arxiv.org/pdf/1706.02216.pdf) introduced by +_Hamilton et al._, and recently a new method that introduces the **attention +mechanism** to graphs was presented - [Graph attention networks +(GAT)](https://arxiv.org/pdf/1710.10903.pdf?ref=https://githubhelp.com), by +_VeličkoviΔ‡ et al_. The last two methods offer a great possibility for inductive +learning. But they haven't been specifically developed to handle different +events occurring on graphs, such as **node features updates**, **node +deletion**, **edge deletion** and so on. These events happen regularly in +**real-world** examples such as the [Twitter +network](https://twitter.com/memgraphmage), where users update their profile, +delete their profile or just unfollow another user. + +In their work, Rossi et al. introduce [Temporal graph +networks](https://arxiv.org/abs/2006.10637), an architecture for machine +learning on streamed graphs, a rapidly-growing ML use case. + +### About the query module + +[![docs-source](https://img.shields.io/badge/source-temporal_graph_networks-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/tgn.py) + +What we have got in this module: + +- **link prediction** - train your **TGN** to predict new **links/edges** and + **node classification** - predict labels of nodes from graph structure and + **node/edge** features +- **graph attention layer** embedding calculation and **graph sum layer** + embedding layer calculation +- **mean** and **last** as message aggregators +- **mlp** and **identity(concatenation)** as message functions +- **gru** and **rnn** as memory updater +- **uniform** temporal neighborhood sampler +- **memory** store and **raw message store** + +as introduced by [Rossi et al.](https://emanuelerossi.co.uk/). 
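To give intuition for two of the building blocks listed above, here is a minimal pure-Python sketch of the **mean** and **last** message aggregators. These are simplified stand-ins for illustration only; the actual implementation in `python/tgn.py` operates on PyTorch tensors.

```python
# Simplified sketch of TGN message aggregation: each node accumulates
# (timestamp, message_vector) pairs between batches, and the aggregator
# reduces them to a single message per node before the memory update.

def aggregate_last(messages):
    """Keep only the most recent message for every node."""
    return {node: max(msgs, key=lambda m: m[0])[1]
            for node, msgs in messages.items() if msgs}

def aggregate_mean(messages):
    """Average all stored messages for every node, component-wise."""
    out = {}
    for node, msgs in messages.items():
        if not msgs:
            continue
        vectors = [m[1] for m in msgs]
        out[node] = [sum(c) / len(vectors) for c in zip(*vectors)]
    return out
```

For example, with `{0: [(1.0, [1.0, 2.0]), (3.0, [3.0, 4.0])]}` the *last* aggregator keeps `[3.0, 4.0]`, while the *mean* aggregator produces `[2.0, 3.0]`.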
+ +The following means **you** can use **TGN** to **predict edges** or perform +**node classification** tasks, with a **graph attention layer** or **graph sum +layer**, by using either **mean** or **last** as the message aggregator, **mlp** or +**identity** as the message function, and finally **gru** or **rnn** as the memory +updater. + +In total, this gives _you_ **2✕2✕2✕2✕2 options**, that is, **32** options to +explore on your graph! :smile: + +If you want to explore our implementation, jump to +**[github/memgraph/mage](https://github.com/memgraph/mage)** and find +`python/tgn.py`. You can also jump to the [download +page](https://memgraph.com/download), download **Memgraph Platform** and fire up +**TGN**. We have prepared an **Amazon user-item** dataset on which you can +explore link prediction using a **[Jupyter +Notebook](https://github.com/memgraph/jupyter-memgraph-tutorials)**. + +What is **not** implemented in the module: + +- **node update/deletion events**, since they occur very rarely - although we + have prepared a codebase to easily integrate them. +- **edge deletion** events +- **time projection** embedding calculation and **identity** embedding + calculation, since the authors mention they perform very poorly on all datasets + - although it is trivial to add a new layer + +Feel free to open a **[GitHub issue](https://github.com/memgraph/mage/issues)** +or start a discussion on **[Discord](https://discord.gg/memgraph)** if you want +to speed up development. + +How should **you** use this module? Prepare Cypher queries, split +them into a **train** set and an **eval** set, and don't forget to call the `set_mode` +method. Every result is stored so that you can easily retrieve it with the module. +The module reports the [mean average +precision](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html) +for every _training_ or _evaluation_ batch.
+
+### Usage
+
+The following procedure is expected when using **TGN**:
+
+- set the parameters by calling the `set_params()` procedure
+- set a trigger on the edge create event to call the `update()` procedure
+- start loading your `train` queries
+- when the `train` queries are loaded, switch the **TGN** mode to `eval` by
+  calling the `set_eval()` procedure
+- load the `eval` queries
+- do a few more epochs of training and evaluation to get the best results by
+  calling `train_and_eval()`
+
+One thing is important to mention: by calling the `set_eval()` procedure you
+switch the **temporal graph network** to `eval` mode. Any new edges that arrive
+will **not** be used to `train` the module, but to `eval` it.
+
+### Implementation details
+
+#### Query module
+
+The module is implemented using **[PyTorch](https://pytorch.org/)**. From the
+input (`mgp.Edge` and `mgp.Vertex` labels), `edge features` and `node features`
+are extracted. With a trigger set, the `update` query module procedure will
+parse all new edges and extract the information the **TGN** needs for its
+batch-by-batch processing.
+
+In the following piece of code, _you_ can see what is extracted from edges while
+the **batch** is filling up. When the current batch size reaches the predefined
+`batch_size` (set in `set_params()`), we **forward** the extracted information
+to the **TGN**, which extends `torch.nn.Module`.
+
+```python
+@dataclasses.dataclass
+class QueryModuleTGNBatch:
+    current_batch_size: int
+    sources: np.array
+    destinations: np.array
+    timestamps: np.array
+    edge_idxs: np.array
+    node_features: Dict[int, torch.Tensor]
+    edge_features: Dict[int, torch.Tensor]
+    batch_size: int
+    labels: np.array
+```
+
+#### Processing one batch
+
+```python
+        self._process_previous_batches()
+
+        graph_data = self._get_graph_data(
+            np.concatenate([sources.copy(), destinations.copy()], dtype=int),
+            np.concatenate([timestamps, timestamps]),
+        )
+
+        embeddings = self.tgn_net(graph_data)
+
+        # ... 
process negative edges in a similar way + + self._process_current_batch( + sources, destinations, node_features, edge_features, edge_idxs, timestamps + ) +``` + +Our `torch.nn.Module` is organized as follows: + +- processing previous batches - as in the _[research + paper](https://arxiv.org/abs/2006.10637)_ this will include new computation of + messages collected for each node with a **message function**, aggregation of + messages for each node with a **message aggregator** and finally updating of + each node's memory with a **memory updater** +- afterward, we create a computation graph used by the **graph attention layer** + or **graph sum layer** +- the final step includes processing the current batch, creating new + **interaction** or **node events**, and updating the **raw message store** + with new **events** + +The process repeats: as we get new edges in a batch, the batch fills, and the +new edges are forwarded to the **TGN** and so on. + +:::info + +This **MAGE** module is still in its early stage. We intend to use it only for +**learning** activities. The current state of the module is that you need to +manually switch the TGN mode to `eval`. After the switch, incoming edges will be +used for **evaluation** only. If you wish to make it production-ready, make sure +to either open a **[GitHub issue](https://github.com/memgraph/mage/issues)** or +drop us a comment on **[Discord](https://discord.gg/memgraph)**. Also, consider +throwing us a :star: so we can continue to do even better work. + +::: + +| Trait | Value | +| ------------------- | -------------------------------------------------------------- | +| **Module type** | **module** | +| **Implementation** | **Python** | +| **Graph direction** | **directed** | +| **Edge weights** | **weighted/unweighted** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `set_params(params)` + +We have defined `default` value for each of the parameters. 
If you wish to
+change any of them, call the procedure again with the new values.
+
+#### Input:
+
+- `params: mgp.Map` ➑ a dictionary containing the following parameters:
+
+| Name                       | Type    | Default           | Description |
+| -------------------------- | ------- | ----------------- | ----------- |
+| learning_type              | String  | `self_supervised` | `self_supervised` or `supervised`, depending on whether you want to predict edges or node labels |
+| batch_size                 | Integer | 200               | size of the batch processed by the TGN; the recommended size is **200** |
+| num_of_layers              | Integer | 2                 | number of layers in the graph neural network; **2** is optimal - adding layers increases training time without a significant gain in accuracy |
+| layer_type                 | String  | `graph_attn`      | `graph_attn` or `graph_sum` layer type, as defined in the original paper |
+| memory_dimension           | Integer | 100               | dimension of the memory tensor of each node |
+| time_dimension             | Integer | 100               | dimension of the time vector from the `time2vec` paper |
+| num_edge_features          | Integer | 50                | number of edge features used from each edge |
+| num_node_features          | Integer | 50                | number of expected node features |
+| message_dimension          | Integer | 100               | dimension of the message; only used with `mlp` as the message function type, otherwise ignored |
+| num_neighbors              | Integer | 15                | number of sampled neighbors |
+| edge_message_function_type | String  | `identity`        | message function type, `identity` for concatenation or `mlp` for projection |
+| message_aggregator_type    | String  | `last`            | message aggregator type, `mean` or `last` |
+| memory_updater_type        | String  | `gru`             | memory updater type, `gru` or `rnn` |
+| num_attention_heads        | Integer | 1                 | number of attention heads used if you define `graph_attn` as the layer type |
+| learning_rate              | Float 
| 1e-4              | learning rate for the `adam` optimizer |
+| weight_decay               | Float   | 5e-5              | weight decay used in the `adam` optimizer |
+| device_type                | String  | `cuda`            | type of device used for training - `cuda` or `cpu` |
+| node_features_property     | String  | `features`        | name of the property on nodes from which node features are read |
+| edge_features_property     | String  | `features`        | name of the property on edges from which edge features are read |
+| node_label_property        | String  | `label`           | name of the property on nodes from which node labels are read |
+
+#### Usage:
+
+```cypher
+  CALL tgn.set_params({learning_type:'self_supervised', batch_size:200, num_of_layers:2,
+        layer_type:'graph_attn', memory_dimension:20, time_dimension:50,
+        num_edge_features:20, num_node_features:20, message_dimension:100,
+        num_neighbors:15, edge_message_function_type:'identity',
+        message_aggregator_type:'last', memory_updater_type:'gru', num_attention_heads:1});
+```
+
+### `update(edges)`
+
+This procedure collects data from edges, including `edge_features` and
+`node_features` if they exist, and fills up the batch. When the batch is full,
+the **TGN** processes it and is then ready to accept new incoming edges.
+
+#### Input:
+
+- `edges: mgp.List[mgp.Edge]` ➑ List of edges to preprocess (that arrive in a
+  stream to Memgraph). If a batch is full, `train` or `eval` starts, depending
+  on the mode.
+
+#### Usage:
+
+There are a few options here:
+
+The most convenient one is to create a
+**[trigger](https://memgraph.com/docs/memgraph/reference-guide/triggers)** so
+that every time an edge is added to the graph, the trigger calls the procedure
+and makes an update. 
+ +```cypher +CREATE TRIGGER create_embeddings ON --> CREATE BEFORE COMMIT +EXECUTE CALL tgn.update(createdEdges) RETURN 1; +``` + +The second option is to add all the edges and then call the algorithm with them: + +```cypher +MATCH (n)-[e]->(m) +WITH COLLECT(e) as edges +CALL tgn.update(edges) RETURN 1; +``` + +### `get()` + +Get calculated embeddings for each vertex. + +#### Output: + +- `node: mgp.Vertex` ➑ Vertex (node) in Memgraph. +- `embedding: mgp.List[float]` ➑ Low-dimensional representation of node in form + of graph embedding. + +#### Usage: + +```cypher +CALL tgn.get() YIELD * RETURN *; +``` + +### `set_eval()` + +Change **TGN** mode to `eval`. + +#### Usage: + +```cypher +CALL tgn.set_eval() YIELD *; +``` + +### `get_results()` + +This method will return `results` for every batch you did `train` or `eval` on, +as well as `average_precision`, and `batch_process_time`. Epoch count starts +from 1. + +#### Output: + +- `epoch_num:mgp.Number` ➑ The number of `train` or `eval` epochs. +- `batch_num:mgp.Number` ➑ The number of batches per `train` or `eval` epoch. +- `batch_process_time:mgp.Number` ➑ Time needed to process a batch. +- `average_precision:mgp.Number` ➑ Mean average precision on the current batch. +- `batch_type:string` ➑ A string indicating whether `train` or `eval` is performed + on the batch. + +#### Usage: + +```cypher +CALL tgn.get_results() YIELD * RETURN *; +``` + +### `train_and_eval(num_epochs)` + +The purpose of this method is to do additional training rounds on `train` edges +and `eval` on evaluation edges. + +#### Input: + +- `num_epochs: integer` ➑ Perform additional epoch training and evaluation **after** + the stream is done. + +#### Output: + +- `epoch_num: integer` ➑ The epoch of the batch for which performance statistics + will be returned. +- `batch_num: integer` ➑ The number of the batch for which performance statistics + will be returned. +- `batch_process_time: float` ➑ Processing time in seconds for a batch. 
+- `average_precision:mgp.Number` ➑ Mean average precision on the current batch.
+- `batch_type:string` ➑ Whether `train` or `eval` was performed on the batch.
+
+#### Usage:
+
+```cypher
+CALL tgn.train_and_eval(10) YIELD * RETURN *;
+```
+
+
+### `predict_link_score(src, dest)`
+
+The purpose of this method is to get the link prediction score for two vertices
+in the graph, provided you have been training `TGN` for the link prediction task.
+
+#### Input:
+
+- `src: mgp.Vertex` ➑ Source vertex of the link prediction.
+- `dest: mgp.Vertex` ➑ Destination vertex of the link prediction.
+
+#### Output:
+
+- `prediction: mgp.Number` ➑ Float between 0 and 1; the likelihood of a link
+between the `src` vertex and the `dest` vertex.
+
+#### Usage:
+
+```cypher
+MATCH (n:User)
+WITH n
+LIMIT 1
+MATCH (m:Item)
+OPTIONAL MATCH (n)-[r]->(m)
+WITH n, m, r
+WHERE r IS NULL
+CALL tgn.predict_link_score(n, m) YIELD *
+RETURN n, m, prediction;
+```
+
+## Example
+
+
+
+
+
+
+
+
+
+```cypher
+CALL tgn.set_params({learning_type:'self_supervised', batch_size:2, num_of_layers:1,
+        layer_type:'graph_attn', memory_dimension:100, time_dimension:100,
+        num_edge_features:20, num_node_features:20, message_dimension:100,
+        num_neighbors:10, edge_message_function_type:'identity',
+        message_aggregator_type:'last', memory_updater_type:'gru', num_attention_heads:1});
+```
+
+
+
+
+```cypher
+CREATE TRIGGER create_embeddings ON --> CREATE BEFORE COMMIT
+EXECUTE CALL tgn.update(createdEdges) RETURN 1;
+```
+
+
+
+
+```cypher
+MERGE (n:Node {id: 1}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m);
+MERGE (n:Node {id: 2}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m);
+MERGE (n:Node {id: 10}) MERGE (m:Node {id: 5}) CREATE (n)-[:RELATION]->(m);
+MERGE (n:Node {id: 5}) MERGE (m:Node {id: 2}) CREATE (n)-[:RELATION]->(m);
+MERGE (n:Node {id: 9}) MERGE (m:Node {id: 7}) CREATE (n)-[:RELATION]->(m);
+MERGE (n:Node {id: 7}) MERGE (m:Node {id: 3}) CREATE (n)-[:RELATION]->(m);
+MERGE (n:Node {id: 3}) MERGE (m:Node 
{id: 6}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 9}) MERGE (m:Node {id: 8}) CREATE (n)-[:RELATION]->(m); +``` + + + + +```cypher +CALL tgn.set_eval() YIELD *; + +``` + + + + +```cypher +MERGE (n:Node {id: 8}) MERGE (m:Node {id: 4}) CREATE (n)-[:RELATION]->(m); +MERGE (n:Node {id: 4}) MERGE (m:Node {id: 6}) CREATE (n)-[:RELATION]->(m); +``` + + + + +```cypher + CALL tgn.train_and_eval(5) YIELD * +``` + + + + +```cypher + CALL tgn.get_results() YIELD epoch_num, batch_num, average_precision, batch_process_time, batch_type + RETURN epoch_num, batch_num, average_precision, batch_type, batch_process_time; +``` + + + + +```plaintext ++--------------------+--------------------+--------------------+--------------------+--------------------+ +| epoch_num | batch_num | average_precision | batch_type | batch_process_time | ++--------------------+--------------------+--------------------+--------------------+--------------------+ +| 1 | 1 | 0.5 | "Train" | 0.05 | +| 1 | 2 | 0.42 | "Eval" | 0.02 | +| 2 | 1 | 0.83 | "Train" | 0.03 | +| 2 | 2 | 0.5 | "Train" | 0.04 | +| 2 | 3 | 0.5 | "Train" | 0.04 | +| 2 | 4 | 0.58 | "Train" | 0.04 | +| 2 | 5 | 0.83 | "Eval" | 0.02 | +| 3 | 1 | 0.5 | "Train" | 0.03 | +| 3 | 2 | 0.75 | "Train" | 0.03 | +| 3 | 3 | 0.83 | "Train" | 0.03 | +| 3 | 4 | 1 | "Train" | 0.04 | +| 3 | 5 | 0.83 | "Eval" | 0.02 | +| 4 | 1 | 0.5 | "Train" | 0.03 | +| 4 | 2 | 0.58 | "Train" | 0.03 | +| 4 | 3 | 1 | "Train" | 0.03 | +| 4 | 4 | 1 | "Train" | 0.04 | +| 4 | 5 | 1 | "Eval" | 0.02 | +| 5 | 1 | 0.83 | "Train" | 0.03 | +| 5 | 2 | 0.58 | "Train" | 0.03 | +| 5 | 3 | 1 | "Train" | 0.03 | +| 5 | 4 | 1 | "Train" | 0.03 | +| 5 | 5 | 0.83 | "Eval" | 0.02 | +| 6 | 1 | 0.58 | "Train" | 0.03 | +| 6 | 2 | 0.83 | "Train" | 0.03 | +| 6 | 3 | 1 | "Train" | 0.03 | +| 6 | 4 | 1 | "Train" | 0.03 | +| 6 | 5 | 1 | "Eval" | 0.01 | ++--------------------+--------------------+--------------------+--------------------+--------------------+ +``` + + + diff --git 
a/docs2/advanced-algorithms/available-algorithms/tsp.md b/docs2/advanced-algorithms/available-algorithms/tsp.md
new file mode 100644
index 00000000000..4fd92902295
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/tsp.md
@@ -0,0 +1,132 @@
+---
+id: tsp
+title: tsp
+sidebar_label: tsp
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+
+    {children}
+
+);
+
+TSP, or the "Travelling salesman problem", is one of the best-known problems in graph theory. The goal is to find the shortest route that visits each node exactly once, starting and finishing at the same node, given the distances between the nodes. The problem is NP-hard, so no polynomial-time exact algorithm is known. This module implements *greedy* and *k-approximate* methods that find a solution within a *k-bound* of the optimal one, meaning the returned route is guaranteed to be at most *k* times longer than the best possible one. The algorithm uses the distance calculator to determine the distance between points and works only with geographical locations, meaning each node needs to have its *lat* and *lng* properties.
+
+```cypher
+(location:Location {lat: 44.1194, lng: 15.2314})
+```
+
+[![docs-source](https://img.shields.io/badge/source-tsp-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/tsp.py)
+
+| Trait               | Value                                                 |
+| ------------------- | ----------------------------------------------------- |
+| **Module type**     | **module**                                            |
+| **Implementation**  | **Python**                                            |
+| **Graph direction** | **undirected**                                        |
+| **Edge weights**    | **unweighted**                                        |
+| **Parallelism**     | **sequential**                                        |
+
+:::note Too slow?
+
+If this algorithm implementation is too slow for your use case, [contact us](mailto:tech@memgraph.com) and request a rewrite to C++! 
+
+:::
+
+## Procedures
+
+
+
+### `solve(points, method)`
+
+#### Input:
+
+* `points: List[Vertex]` ➑ List of points to calculate TSP on. Each is required to have *lat* and *lng* properties.
+* `method: string (default=1.5_approx)` ➑ Method used for optimization. Can be either ***1.5_approx***, ***2_approx*** or ***greedy***.
+
+#### Output:
+
+* `sources: List[Vertex]` ➑ List of elements from the 1st to the (n-1)-th element.
+* `destinations: List[Vertex]` ➑ List of elements from the 2nd to the n-th element.
+The pairs of them represent individual edges between 2 nodes in the graph.
+
+#### Usage:
+```cypher
+MATCH (n:Location)
+WITH COLLECT(n) AS locations
+CALL tsp.solve(locations) YIELD sources, destinations;
+```
+
+## Example
+
+
+
+
+
+
+
+
+
+
+```cypher
+CREATE (location:Location {name: 'Zagreb', lat: 45.8150, lng: 15.9819});
+CREATE (location:Location {name: 'Split', lat: 43.5081, lng: 16.4402});
+CREATE (location:Location {name: 'Rijeka', lat: 45.3271, lng: 14.4422});
+CREATE (location:Location {name: 'Osijek', lat: 45.5550, lng: 18.6955});
+CREATE (location:Location {name: 'Zadar', lat: 44.1194, lng: 15.2314});
+```
+
+
+
+
+
+```cypher
+MATCH (n:Location)
+WITH COLLECT(n) AS locations
+CALL tsp.solve(locations, "1.5_approx") YIELD sources, destinations
+WITH EXTRACT(i IN RANGE(0, SIZE(sources) - 1) | [sources[i], destinations[i]]) AS path
+UNWIND path AS edge
+WITH edge[0] AS from, edge[1] AS to
+CREATE (from)-[path:PATH]->(to)
+RETURN from, to, path;
+```
+
+
+
+
+
+
+```plaintext
++----------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+
+| from                                                     | to                                                       | path                                                     |
++----------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+
+| (:Location {lat: 45.815, lng: 15.9819, name: "Zagreb"})  | (:Location {lat: 45.555, lng: 18.6955, name: "Osijek"}) 
| [:PATH] | +| (:Location {lat: 45.555, lng: 18.6955, name: "Osijek"}) | (:Location {lat: 43.5081, lng: 16.4402, name: "Split"}) | [:PATH] | +| (:Location {lat: 43.5081, lng: 16.4402, name: "Split"}) | (:Location {lat: 44.1194, lng: 15.2314, name: "Zadar"}) | [:PATH] | +| (:Location {lat: 44.1194, lng: 15.2314, name: "Zadar"}) | (:Location {lat: 45.3271, lng: 14.4422, name: "Rijeka"}) | [:PATH] | +| (:Location {lat: 45.3271, lng: 14.4422, name: "Rijeka"}) | (:Location {lat: 45.815, lng: 15.9819, name: "Zagreb"}) | [:PATH] | ++----------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+ +``` + + + + \ No newline at end of file diff --git a/docs2/advanced-algorithms/available-algorithms/union_find.md b/docs2/advanced-algorithms/available-algorithms/union_find.md new file mode 100644 index 00000000000..d127b49f5db --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/union_find.md @@ -0,0 +1,138 @@ +--- +id: union_find +title: union_find +sidebar_label: union_find +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +Analysis of connected components is a common task in graph analytics. + +By using a disjoint-set data structure that keeps track of them, the algorithm implemented in this module enables the user to quickly check whether a pair of given nodes is in the same or different connected component. +A check on a pair of nodes is effectively executed in O(1) time. 
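The module's actual implementation lives in `python/union_find.py`; for intuition only, here is a minimal disjoint-set sketch (not the module's code) with the two optimizations the module uses, union by rank and path splitting:

```python
class DisjointSet:
    """Illustrative disjoint-set with union by rank and path splitting."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path splitting: point every visited node to its grandparent.
        while self.parent[x] != x:
            self.parent[x], x = self.parent[self.parent[x]], self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:  # union by rank: keep trees shallow
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1

    def connected(self, x, y):
        return self.find(x) == self.find(y)

ds = DisjointSet(6)
for a, b in [(0, 1), (1, 2), (3, 4), (3, 5)]:
    ds.union(a, b)
print(ds.connected(0, 2), ds.connected(0, 3))  # β†’ True False
```

After the structure is built, each `connected` check costs only a couple of pointer hops, which is where the near-constant query time comes from.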
+
+The implementation of the disjoint-set data structure and its operations uses the *union by rank* and *path splitting* optimizations described in "[Worst-case Analysis of Set Union Algorithms](https://dl.acm.org/doi/10.1145/62.2160)" [^1], and presented with examples [here](https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf).
+
+[^1]: [Worst-case Analysis of Set Union Algorithms](https://dl.acm.org/doi/10.1145/62.2160), Robert E. Tarjan and Jan van Leeuwen
+
+[![docs-source](https://img.shields.io/badge/source-union_find-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/union_find.py)
+
+| Trait               | Value                                                 |
+| ------------------- | ----------------------------------------------------- |
+| **Module type**     | **module**                                            |
+| **Implementation**  | **Python**                                            |
+| **Graph direction** | **undirected**                                        |
+| **Edge weights**    | **unweighted**                                        |
+| **Parallelism**     | **sequential**                                        |
+
+:::note Too slow?
+
+If this algorithm implementation is too slow for your use case, [contact us](mailto:tech@memgraph.com) and request a rewrite to C++!
+
+:::
+
+## Procedures
+
+
+
+### `connected(nodes1, nodes2, mode, update)`
+
+#### Input:
+
+* `nodes1: Union[Vertex, List[Vertex]]` ➑ First value (or list thereof) in the connectedness calculation.
+* `nodes2: Union[Vertex, List[Vertex]]` ➑ Second value (or list thereof) in the connectedness calculation.
+* `mode: string (default="pairwise")` ➑ Mode of combining `nodes1` and `nodes2`. Can be ***p*** or ***pairwise*** for a pairwise product, or ***c*** or ***cartesian*** for a Cartesian product of the arguments.
+* `update: boolean (default=True)` ➑ Specifies whether the disjoint-set data structure should be reinitialized. Enabled by default. If the graph has been modified since the previous call of this procedure, turning `update` off ensures that those changes are not visible in the output.
+
+#### Output:
+
+* `node1: Vertex` ➑ Node in `nodes1`. 
+* `node2: Vertex` ➑ Node in `nodes2`. +* `connected: boolean` ➑ `True` if the above nodes are in the same connected component of the graph. + +#### Usage: +```cypher +MATCH (m:Node) +WITH collect(m) AS nodes1 +MATCH (n:Node) +WITH collect(n) AS nodes2, nodes1 +CALL union_find.connected(nodes1, nodes2) YIELD * +RETURN node1, node2, connected; +``` + +## Example + + + + + + + + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +MATCH (m:Node) +WHERE m.id = 0 OR m.id = 1 +WITH collect(m) AS nodes1 +MATCH (n:Node) +WHERE n.id = 2 OR n.id = 3 +WITH collect(n) AS nodes2, nodes1 +CALL union_find.connected(nodes1, nodes2) YIELD * +RETURN node1, node2, connected; +``` + + + + + + +```plaintext ++-----------------+-----------------+-----------------+ +| node1 | node2 | connected | ++-----------------+-----------------+-----------------+ +| (:Node {id: 0}) | (:Node {id: 2}) | false | +| (:Node {id: 1}) | (:Node {id: 3}) | false | ++-----------------+-----------------+-----------------+ +``` + + + + \ No newline at end of file diff --git a/docs2/advanced-algorithms/available-algorithms/uuid_generator.md b/docs2/advanced-algorithms/available-algorithms/uuid_generator.md new file mode 100644 index 00000000000..4ea53f8a72e --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/uuid_generator.md @@ -0,0 +1,110 @@ +--- +id: uuid-generator +title: uuid_generator +sidebar_label: uuid_generator +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from 
'../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +This module is used to generate string UUIDs which can be stored as properties +on nodes or edges. The underlying implementation makes use of the `uuid-dev` +library. When using the `uuid` module on Linux systems, the library can be +installed by running `sudo apt-get install uuid-dev`. + +[![docs-source](https://img.shields.io/badge/source-uuid-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/uuid_module/uuid_module.cpp) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **util** | +| **Implementation** | **C++** | +| **Parallelism** | **sequential** | + +## Procedures + + + +### `get()` + +#### Output: + +* `uuid` ➑ Returns a UUID string. + + +#### Usage: +```cypher +MATCH (n) +CALL uuid_generator.get() YIELD uuid +SET n.uuid = uuid +RETURN n.uuid AS node_uuid; +``` + +## Example + + + + + + + + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +MATCH (n) +CALL uuid_generator.get() YIELD uuid +SET n.uuid = uuid +RETURN n.uuid AS node_uuid; +``` + + + + + + +```plaintext ++----------------------------------------+ +| node_uuid | ++----------------------------------------+ +| "ef4722b2-628b-4f93-8667-fc91134ed980" | +| "601faade-8c61-4dc3-a68a-693fed4ad40c" | +| "dc4283b8-90d6-402e-8fc0-f37f9959b593" | ++----------------------------------------+ +``` + + + + diff --git a/docs2/advanced-algorithms/available-algorithms/vrp.md b/docs2/advanced-algorithms/available-algorithms/vrp.md new file mode 100644 index 00000000000..b1a00aa018b --- /dev/null +++ b/docs2/advanced-algorithms/available-algorithms/vrp.md @@ -0,0 +1,165 @@ +--- 
+id: vrp +title: vrp +sidebar_label: vrp +--- + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; +import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx'; + +export const Highlight = ({children, color}) => ( + + {children} + +); + +VRP or **Vehicle Routing problem** is a generalization of the *Travelling Salesman Problem*. The goal of the problem is to find the shortest route that visits each node once, starting and finishing from the same node, called a depot, while using a fleet of vehicles. Each vehicle does not need to be at every location, it is enough that every node is visited by at least one vehicle. The problem is *NP-hard* in optimization, and therefore methods such as constraint programming, approximations or heuristics are a good approach for solving. The current implementation of VRP includes constraint programming with *GEKKO* solver which works with 1 depot and an arbitrary number of vehicles. The algorithm uses the distance calculator to determine the distance between driving points, and works only with geographical locations, meaning each node needs to have its *lat* and *lng* property. + +```cypher +(location:Location {lat: 44.1194, lng: 15.2314}) +``` +[![docs-source](https://img.shields.io/badge/source-vrp-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/python/vrp.py) + +| Trait | Value | +| ------------------- | ----------------------------------------------------- | +| **Module type** | **module** | +| **Implementation** | **Python** | +| **Graph direction** | **undirected** | +| **Edge weights** | **unweighted** | +| **Parallelism** | **sequential** | + +:::note Too slow? + +If this algorithm implementation is too slow for your use case, [contact us](mailto:tech@memgraph.com) and request a rewrite to C++ ! 
+ +::: + +## Procedures + + + +### `route(depot_node, number_of_vehicles)` + +#### Input: + +* `depot_node: Vertex` ➑ Depot node with its corresponding *lat* and *lng* coordinate properties. +* `number_of_vehicles: integer = 1` ➑ Designates how many vehicles are used. Set to 1 by default + +#### Output: + +* `from_vertex: Vertex` ➑ Beginning point of one part of the route +* `to_vertex: Vertex` ➑ Ending point of one part of the route +* `vehicle_id: integer` ➑ Vehicle ID that will drive the corresponding path (*from_vertex*)->(*to_vertex*) +All pairs of the route represent the full route with all vehicles used. + +#### Usage: +```cypher +MATCH (d:Depot) +CALL vrp.route(d) YIELD from_vertex, to_vertex, vehicle_id; +``` + +## Example + + + + + + + + + + +```cypher +CREATE (:Location {lat:45.81397494712325, lng:15.977107314009686}); +CREATE (:Location {lat:45.809786288641924, lng:15.969953021143715}); +CREATE (:Location {lat:45.801513169575195, lng:15.979868413090431}); +CREATE (:Location {lat:45.80062044456095, lng:15.971453134506456}); +CREATE (:Location {lat:45.80443233736649, lng:15.993114737391515}); +CREATE (:Location {lat:45.77165828306254, lng:15.943635971437576}); +CREATE (:Location {lat:45.785275159565806, lng:15.947448603375522}); +CREATE (:Location {lat:45.780581597098646, lng:15.935278141510148}); +CREATE (:Location {lat:45.82208303601525, lng:16.019498047049822}); +CREATE (:Depot {lat:45.7872369074369, lng:15.984469921454693}); +``` +Note: all vertices in graph need to be either Location or Depot. 
+
+
+
+
+
+```cypher
+MATCH (d:Depot)
+CALL vrp.route(d) YIELD from_vertex, to_vertex, vehicle_id
+CREATE (from_vertex)-[r:Route]->(to_vertex);
+
+MATCH (n)-[r:Route]->(m)
+RETURN n, r, m;
+```
+
+
+
+
+
+
+
+
+
+
+
+```cypher
+MATCH (d:Depot)
+CALL vrp.route(d, 2) YIELD from_vertex, to_vertex, vehicle_id
+CREATE (from_vertex)-[r:Route]->(to_vertex);
+
+MATCH (n)-[r:Route]->(m)
+RETURN n, r, m;
+```
+
+
+
+
+
+
+
+
+
+
+
+```plaintext
++------------------------------------------+------------------------------------------+------------------------------------------+
+| from_vertex                              | to_vertex                                | vehicle_id                               |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| (:Depot {lat: 45.7872, lng: 15.9845})    | (:Location {lat: 45.7853, lng: 15.9474}) | 1                                        |
+| (:Location {lat: 45.7853, lng: 15.9474}) | (:Location {lat: 45.7806, lng: 15.9353}) | 1                                        |
+| (:Location {lat: 45.7806, lng: 15.9353}) | (:Location {lat: 45.7717, lng: 15.9436}) | 1                                        |
+| (:Location {lat: 45.7717, lng: 15.9436}) | (:Location {lat: 45.814, lng: 15.9771})  | 1                                        |
+| (:Location {lat: 45.814, lng: 15.9771})  | (:Location {lat: 45.8044, lng: 15.9931}) | 1                                        |
+| (:Location {lat: 45.8044, lng: 15.9931}) | (:Location {lat: 45.8015, lng: 15.9799}) | 1                                        |
+| (:Location {lat: 45.8015, lng: 15.9799}) | (:Location {lat: 45.8006, lng: 15.9715}) | 1                                        |
+| (:Location {lat: 45.8006, lng: 15.9715}) | (:Location {lat: 45.8098, lng: 15.97})   | 1                                        |
+| (:Location {lat: 45.8098, lng: 15.97})   | (:Depot {lat: 45.7872, lng: 15.9845})    | 1                                        |
+| (:Depot {lat: 45.7872, lng: 15.9845})    | (:Location {lat: 45.8221, lng: 16.0195}) | 2                                        |
+| (:Location {lat: 45.8221, lng: 16.0195}) | (:Depot {lat: 45.7872, lng: 15.9845})    | 2                                        |
++------------------------------------------+------------------------------------------+------------------------------------------+
+```
+
+
+
\ No newline at end of file
diff --git 
a/docs2/advanced-algorithms/available-algorithms/weakly_connected_components.md b/docs2/advanced-algorithms/available-algorithms/weakly_connected_components.md
new file mode 100644
index 00000000000..301806a6ec8
--- /dev/null
+++ b/docs2/advanced-algorithms/available-algorithms/weakly_connected_components.md
@@ -0,0 +1,115 @@
+---
+id: weakly-connected-components
+title: weakly_connected_components
+sidebar_label: weakly_connected_components
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+import RunOnSubgraph from '../../templates/_run_on_subgraph.mdx';
+
+export const Highlight = ({children, color}) => (
+
+    {children}
+
+);
+
+A search for connected components is usually one of the first analyses run on a graph.
+The algorithm implemented within this module does exactly that: it finds the distinct components of
+the graph. Nodes within a component are connected to each other, while no edge connects nodes
+that belong to separate components.
+
+[![docs-source](https://img.shields.io/badge/source-weakly_connected_components-FB6E00?logo=github&style=for-the-badge)](https://github.com/memgraph/mage/blob/main/cpp/connectivity_module/connectivity_module.cpp)
+
+| Trait               | Value                                                 |
+| ------------------- | ----------------------------------------------------- |
+| **Module type**     | **algorithm**                                         |
+| **Implementation**  | **C++**                                               |
+| **Graph direction** | **undirected**                                        |
+| **Edge weights**    | **unweighted**                                        |
+| **Parallelism**     | **sequential**                                        |
+
+## Procedures
+
+
+
+### `get()`
+
+#### Output:
+
+* `node` ➑ A vertex object with all its properties, accompanied by the ID of the component it belongs to.
+* `component_id` ➑ Component ID of each node in the graph. Components are zero-indexed and there is no rule for how IDs are assigned to nodes. The only guarantee is that separate components will have distinct component IDs. 
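Conceptually, the procedure amounts to a traversal that ignores edge direction. The sketch below shows that idea in plain Python on the small example graph from this page; it is an illustration, not the module's C++ implementation.

```python
from collections import defaultdict, deque

def weakly_connected_components(edges):
    """Assign a zero-indexed component ID to every node,
    treating every directed edge as undirected (BFS sketch)."""
    adjacency = defaultdict(set)
    nodes = set()
    for u, v in edges:
        nodes.update((u, v))
        adjacency[u].add(v)
        adjacency[v].add(u)  # ignore edge direction
    component_id, components = 0, {}
    for start in sorted(nodes):
        if start in components:
            continue
        queue = deque([start])
        components[start] = component_id
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in components:
                    components[neighbor] = component_id
                    queue.append(neighbor)
        component_id += 1
    return components

# The example graph used on this page splits into two components.
edges = [(0, 1), (1, 2), (2, 0), (3, 3), (3, 4), (3, 5)]
print(weakly_connected_components(edges))
```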
+ +#### Usage: +```cypher +CALL weakly_connected_components.get() +YIELD node, component_id; +``` + +## Example + + + + + + + + + + + +```cypher +MERGE (a:Node {id: 0}) MERGE (b:Node {id: 1}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 1}) MERGE (b:Node {id: 2}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 2}) MERGE (b:Node {id: 0}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 3}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 4}) CREATE (a)-[:RELATION]->(b); +MERGE (a:Node {id: 3}) MERGE (b:Node {id: 5}) CREATE (a)-[:RELATION]->(b); +``` + + + + + +```cypher +CALL weakly_connected_components.get() +YIELD node, component_id +RETURN node, component_id; +``` + + + + + + +```plaintext ++-----------------+-----------------+ +| node | component_id | ++-----------------+-----------------+ +| (:Node {id: 5}) | 1 | +| (:Node {id: 4}) | 1 | +| (:Node {id: 3}) | 1 | +| (:Node {id: 2}) | 0 | +| (:Node {id: 0}) | 0 | +| (:Node {id: 1}) | 0 | ++-----------------+-----------------+ +``` + + + + diff --git a/docs2/advanced-algorithms/built-in-graph-algorithms.md b/docs2/advanced-algorithms/built-in-graph-algorithms.md new file mode 100644 index 00000000000..8edf66cc1bf --- /dev/null +++ b/docs2/advanced-algorithms/built-in-graph-algorithms.md @@ -0,0 +1,401 @@ +--- +id: built-in-graph-algorithms +title: Built-in graph algorithms +sidebar_label: Built-in graph algorithms +--- + +Graph algorithms are a set of instructions that traverse (visit the nodes of) a +graph and find specific nodes, paths, or a path between two nodes. 
Some of these +algorithms are built into Memgraph and don't require any additional libraries: + + * [Depth-first search (DFS)](#depth-first-search) + * [Breadth-first search (BFS)](#breadth-first-search) + * [Weighted shortest path (WSP)](#weighted-shortest-path) + * [All shortest paths (ASP)](#all-shortest-paths) + + +Below you can find examples of how to use these algorithms, and you can try them out +in the [Playground +sandbox](https://playground.memgraph.com/sandbox/europe-backpacking) using the +Europe backpacking dataset, or adjust them to the dataset of your choice. + +:::tip + +Memgraph has a lot more graph algorithms to offer besides these four, and they +are all a part of [MAGE](/mage) - Memgraph Advanced Graph Extensions, an +open-source repository that contains graph algorithms and modules written in the +form of query modules that can be used to tackle the most interesting and +challenging graph analytics problems. Check the [full list of algorithms](/mage/algorithms). + +::: + +## Depth-first search + +Depth-first search (DFS) is an algorithm for traversing through the graph. The +algorithm starts at the root node and explores each neighboring node as far as +possible. The moment it reaches a dead-end, it backtracks until it finds a new, +undiscovered node, then traverses from that node to find more undiscovered +nodes. In that way, the algorithm visits each node in the graph. + +DFS in Memgraph has been implemented based on the relationship expansion syntax, +which allows it to find multiple relationships between two nodes if they exist. +Below are several examples of how to use the DFS in Memgraph. 
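If you are not working in the Playground sandbox, you can create a small stand-in graph to try the queries below on. This sketch is illustrative only: the `id`, `drinks_USD`, and `eu_border` values and the `CloseTo` relationships merely mimic the shape of the Europe backpacking dataset that the examples assume:

```cypher
CREATE (a {id: 0, drinks_USD: 5}),
       (b {id: 1, drinks_USD: 10}),
       (c {id: 8, drinks_USD: 12}),
       (a)-[:CloseTo {eu_border: false}]->(b),
       (b)-[:CloseTo {eu_border: false}]->(c),
       (a)-[:CloseTo {eu_border: true}]->(c);
```

With this data, the example queries that match `(n {id: 0})` and `(m {id: 8})` will return the two routes from `a` to `c`.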
+ +### Getting various results + +The following query will show all the paths from node `n` to node `m`: + +```cypher +MATCH path=(n {id: 0})-[*]->(m {id: 8}) +RETURN path; +``` + +To get the list of all relationships, add a variable in the square brackets and +return it as a result: + +```cypher +MATCH (n {id: 0})-[relationships *]->(m {id: 8}) +RETURN relationships; +``` + +To get the list of path nodes, use the `nodes()` function: + +```cypher +MATCH path=(n {id: 0})-[*]->(m {id: 8}) +RETURN path, nodes(path); +``` + +### Filtering by relationship type and direction + +You can filter relationships by type by defining the type after the relationship +list variable, and you decide the direction by adding or removing an arrow from +the dash. + +In the following example, the algorithm will traverse only across `CloseTo` type +of relationships: + +```cypher +MATCH path=(n {id: 0})-[relationships:CloseTo *]->(m {id: 8}) +RETURN path, relationships; +``` + +You can also list multiple relationship types and allow your algorithm to traverse across any of them. + +In the following example, the algorithm will traverse across any of the `CloseTo`, `Borders`, or `Inside` relationship types: + +```cypher +MATCH path=(n {id: 0})-[relationships:CloseTo | :Borders | :Inside *]->(m {id: 8}) +RETURN path, relationships; +``` + +Be careful when using algorithms, especially DFS, without defining a direction. +Depending on the size of the dataset, the execution of the query can cause a +timeout. + +### Constraining the path's length + +The constraints on the path length are defined after the asterisk sign. 
The +following query will return all the results when the path is equal to or shorter +than 5 hops: + +```cypher +MATCH path=(n {id: 0})-[relationships * ..5]->(m {id: 8}) +RETURN path, relationships; +``` + +This query will return all the paths that are equal to or longer than 3, and +equal to or shorter than 5 hops: + +```cypher +MATCH path=(n {id: 0})-[relationships * 3..5]->(m {id: 8}) +RETURN path, relationships; +``` + +### Constraining the expansion based on property values + +Depth-first expansion allows an arbitrary expression filter that determines if +an expansion is allowed over a certain relationship to a certain node. The +filter is defined as a lambda function over `r` and `n`, which denote the +relationship expanded over and node expanded to in the depth-first search. + +In the following example, expansion is allowed over relationships with an `eu_border` +property equal to `false` and to nodes with a `drinks_USD` property less than `15`: + +```cypher +MATCH path=(n {id: 0})-[* (r, n | r.eu_border = false AND n.drinks_USD < 15)]->(m {id: 8}) +RETURN path; +``` + +## Breadth-first search + +In breadth-first search (BFS), traversal starts from a single node, and the order of +visited nodes is decided based on nodes' breadth (distance from the source +node). This means that when a certain node is visited, it can be safely assumed +that all nodes that are fewer relationships away from the source node have +already been visited, resulting in the shortest path from the source node to the +newly visited node. + +BFS in Memgraph has been implemented based on the relationship expansion syntax. +There are a few benefits to the breadth-first expansion approach over a +specialized function. For one, it is possible to inject expressions that +filter nodes and relationships along the path itself, not just the final +destination node. Furthermore, it's possible to find multiple paths to multiple +destination nodes. 
Also, it is possible to simply go through a node's +neighborhood in a breadth-first manner. + +Currently, it isn't possible to get all the shortest paths to a single node using +Memgraph's breadth-first expansion. Below are several examples of how to use the BFS +in Memgraph. + +### Getting various results + +The following query will show the shortest path between nodes `n` and `m` as a +graph result. + +```cypher +MATCH path=(n {id: 0})-[*BFS]->(m {id: 8}) +RETURN path; +``` + +To get the list of relationships, add a variable before the `*BFS` and return +it as a result: + +```cypher +MATCH (n {id: 0})-[relationships *BFS]->(m {id: 8}) +RETURN relationships; +``` + +To get a list of path nodes, use the `nodes()` function. You can then return the +results as a list, or use the `UNWIND` clause to return individual node +properties: + +```cypher +MATCH path=(n {id: 0})-[*BFS]->(m {id: 8}) +RETURN nodes(path); +``` + +### Filtering by relationship type and direction + +You can filter relationships by type by defining the type after the relationship +list variable, and you decide the direction by adding or removing an arrow from +the dash. + +In the following example, the algorithm will traverse only across `CloseTo` type +of relationships regardless of the direction: + +```cypher +MATCH (n {id: 0})-[relationships:CloseTo *BFS]-(m {id: 8}) +RETURN relationships; +``` + +### Constraining the path's length + +The constraints on the path length are defined after `*BFS`. 
The following +query will return a result only if the path is equal to or shorter than 5 hops: + +```cypher +MATCH (n {id: 0})-[relationships:CloseTo *BFS ..5]->(m {id: 8}) +RETURN relationships; +``` + +The result will be returned only if the path is equal to or longer than 3, and +equal to or shorter than 5 hops: + +```cypher +MATCH (n {id: 0})-[relationships:CloseTo *BFS 3..5]-(m {id: 15}) +RETURN relationships; +``` + +### Constraining the expansion based on property values + +Breadth-first expansion allows an arbitrary expression filter that determines if +an expansion is allowed over a certain relationship to a certain node. The +filter is defined as a lambda function over `r` and `n`, which denotes the +relationship expanded over and node expanded to in the breadth-first search. + +In the following example, expansion is allowed over relationships with an `eu_border` +property equal to `false` and to nodes with a `drinks_USD` property less than `15`: + +```cypher +MATCH path=(n {id: 0})-[*BFS (r, n | r.eu_border = false AND n.drinks_USD < 15)]-(m {id: 8}) +RETURN path; +``` + +## Weighted shortest path + +In graph theory, the weighted shortest path problem is the problem of finding a path +between two nodes in a graph such that the sum of the weights of relationships +connecting nodes, or the sum of the weight of some node property on the path, is +minimized. + +One of the most important algorithms for finding weighted shortest paths is +**Dijkstra's algorithm**. In Memgraph it has been implemented based on the +relationship expansion syntax. In the brackets following the `*WSHORTEST` +algorithm definition, you need to define what relationship or node property +carries the weight, for example, `[*WSHORTEST (r, n | r.weight)]`. Below are +several examples of how to use the WSHORTEST in Memgraph. 
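The weighted shortest path examples below read numeric properties such as `r.weight` and `n.total_USD`. If you are not using the Playground dataset, a minimal graph carrying such weights could be sketched like this (the property values and the `Type` relationship are illustrative only):

```cypher
CREATE (a {id: 0, total_USD: 20}),
       (b {id: 1, total_USD: 35}),
       (c {id: 9, total_USD: 25}),
       (a)-[:Type {weight: 2}]->(b),
       (b)-[:Type {weight: 3}]->(c),
       (a)-[:Type {weight: 10}]->(c);
```

On this sketch, `[*WSHORTEST (r, n | r.weight)]` from `a` to `c` would prefer the two-hop route (total weight 5) over the direct relationship (weight 10).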
+ +### Getting various results + +To find the weighted shortest path between nodes based on the value of the +`total_USD` node property, traversing only across `CloseTo` relationships, and +return the result as a graph, use the following query: + +```cypher +MATCH path=(n {id: 0})-[:CloseTo *WSHORTEST (r, n | n.total_USD)]-(m {id: 15}) +RETURN path; +``` + +In the above example, the weight is a property of a node, but you can also +use the weight of a relationship property: + +```cypher +MATCH path=(n {id: 0})-[:Type *WSHORTEST (r, n | r.weight)]-(m {id: 9}) +RETURN path; +``` + +To get the list of relationships, add a variable before the `*WSHORTEST` and +return it as a result: + +```cypher +MATCH (n {id: 0})-[relationships:CloseTo *WSHORTEST (r, n | n.total_USD)]-(m {id: 9}) +RETURN relationships; +``` + +To get the path nodes, use the `nodes()` function. You can then return the +results as a list, or use the `UNWIND` clause to return individual node +properties: + +```cypher +MATCH path=(n {id: 0})-[relationships:CloseTo *WSHORTEST (r, n | n.total_USD)]-(m {id: 9}) +UNWIND (nodes(path)) AS node +RETURN node.id; +``` + +To get the total weight, add a variable at the end of the expansion expression: + +```cypher +MATCH path=(n {id: 0})-[relationships:CloseTo *WSHORTEST (r, n | n.total_USD) total_weight]-(m {id: 9}) +RETURN nodes(path), total_weight; +``` + +Remember that when the weight is taken from a node property, the value +of the last node is not included in the total weight. + +### Filtering by relationship type and direction + +You can filter relationships by type by defining the type after the relationship +list variable, and you decide the direction by adding or removing an arrow from +the dash. 
+ +In the following example, the algorithm will traverse only across `CloseTo` type +of relationships: + +```cypher +MATCH path=(n {id: 0})-[relationships:CloseTo *WSHORTEST (r, n | n.total_USD)]->(m {id: 46}) +RETURN relationships; +``` + +### Constraining the path's length + +Memgraph's implementation of Dijkstra's algorithm uses a modified version of +the algorithm that can handle a length restriction and is based on the relationship +expansion syntax. The length restriction parameter is optional; when it's not +set, the algorithm execution can become more complex and take longer. It is important to note +that the term "length" in this context denotes the number of traversed +relationships and not the sum of their weights. + +The following example will find the shortest path with a maximum length of 4 +relationships between nodes `n` and `m`. + +```cypher +MATCH path=(n {id: 0})-[:CloseTo *WSHORTEST 4 (r, n | n.total_USD) total_weight]-(m {id: 46}) +RETURN path, total_weight; +``` + +### Constraining the expansion based on property values + +Weighted shortest path expansion allows an arbitrary expression filter that +determines if an expansion is allowed over a certain relationship to a certain +node. The filter is defined as a lambda function over `r` and `n`, which denote +the relationship expanded over and node expanded to in finding the weighted shortest path. + +In the following example, expansion is allowed over relationships with an `eu_border` +property equal to `false` and to nodes with a `drinks_USD` property less than `15`: + +```cypher +MATCH path=(n {id: 0})-[*WSHORTEST (r, n | n.total_USD) total_weight (r, n | r.eu_border = false AND n.drinks_USD < 15)]-(m {id: 46}) +RETURN path, total_weight; +``` + +## All shortest paths + +Finding all shortest paths is an expansion of the weighted shortest paths problem. The goal +of finding the shortest path is obtaining any minimum sum of weights on the path from one +node to the other. 
However, there can be multiple equally weighted shortest paths, and this algorithm +fetches them all. + +The weighted shortest path implementation returns only one resulting path from one +node to the other. Commonly, multiple shortest paths flow through different +routes. The syntax for obtaining all shortest paths is similar to the weighted shortest path +and boils down to calling `[*ALLSHORTEST (r, n | r.weight)]`, where `r` and `n` define +the current expansion relationship and node respectively. + +### Getting various results + +To showcase the characteristics of all shortest paths, the following query +searches for all shortest paths with a default weight equal to 1: + +```cypher +MATCH path=(n {id: 0})-[:CloseTo *ALLSHORTEST (r, n | 1)]-(m {id: 15}) +RETURN path; +``` + +The query returns multiple results, all with 4 hops, meaning that there are +multiple paths flowing from the source node to the destination node. + +The following is a weighted query based on the `weight` property on each visited relationship: + +```cypher +MATCH path=(n {id: 0})-[:Type *ALLSHORTEST (r, n | r.weight)]-(m {id: 5}) +RETURN path; +``` + +To obtain all relationships on all shortest paths, use the `relationships()` function, unwind the results, and return distinct relationships so there are no duplicates in the output: + +```cypher +MATCH path=(n {id: 0})-[relationships:CloseTo *ALLSHORTEST (r, n | n.total_USD)]-(m {id: 51}) +UNWIND (relationships(path)) AS edge +RETURN DISTINCT edge; +``` + +To get the total weight, add a variable at the end of the expansion expression: +```cypher +MATCH path=(n {id: 0})-[relationships:CloseTo *ALLSHORTEST (r, n | 1) total_weight]-(m {id: 9}) +RETURN nodes(path), total_weight; +``` + +### Constraining the path's length + +The all shortest paths algorithm allows for an upper-bound path restriction. 
This restriction significantly modifies the outcome of the algorithm if the unrestricted shortest path uses more hops than the set upper bound. Finding all shortest paths with a path restriction +boils down to finding the minimum-weight path with a maximum length of `upper_bound`. The upper bound is set to 4 right after the operator: + +```cypher +MATCH path=(n {id: 0})-[:CloseTo *ALLSHORTEST 4 (r, n | n.total_USD) total_weight]-(m {id: 46}) +RETURN path, total_weight; +``` + +### Constraining the expansion based on property values + +The all shortest paths algorithm supports an expansion filter. To define it, you need to write a lambda function +with a filter expression over `r` (relationship) and `n` (node) variables as parameters. + +In the following example, expansion is allowed over relationships with an `eu_border` +property equal to `false` and to nodes with a `drinks_USD` property less than `20`: + +```cypher +MATCH path=(n {id: 0})-[*ALLSHORTEST (r, n | n.total_USD) total_weight (r, n | r.eu_border = false AND n.drinks_USD < 20)]-(m {id: 46}) +RETURN path, total_weight; +``` diff --git a/docs2/advanced-algorithms/install-mage.md b/docs2/advanced-algorithms/install-mage.md new file mode 100644 index 00000000000..7562beb4704 --- /dev/null +++ b/docs2/advanced-algorithms/install-mage.md @@ -0,0 +1,501 @@ +# Install MAGE graph algorithm library + +:::note + +The **Docker Hub** and **Docker build** installation methods only require you to +[install Docker](https://docs.docker.com/get-docker/), while the **Build from +source on Linux** method requires the installation of additional dependencies. + +::: + +## Memgraph compatibility + +With changes in the Memgraph API, MAGE started to track version numbers. The +table below shows MAGE compatibility with Memgraph versions. 
+ +| MAGE version | Memgraph version | +|--------------|-------------------| +| >= 1.6 | >= 2.5.2 | +| >= 1.4 | >= 2.4.0 | +| >= 1.0 | >= 2.0.0 | +| ^0 | >= 1.4.0 <= 1.6.1 | + +## Docker + +MAGE has prepared a Docker image on [**Docker +Hub**](https://hub.docker.com/r/memgraph/memgraph-mage) :whale: ready to be +pulled from +[memgraph/memgraph-mage](https://hub.docker.com/r/memgraph/memgraph-mage). + +Install MAGE: + +**1.** This is the only command you need to run it in your +environment: + +```shell +docker run -p 7687:7687 memgraph/memgraph-mage:latest +``` + +:::info + +You can download a specific version of MAGE. For example, if you want to +download version `1.1`, you should run the following command: + +```shell +docker run -p 7687:7687 memgraph/memgraph-mage:1.1 +``` + +You can also download a MAGE image equipped for development inside of Docker +containers: + +```shell +docker run -p 7687:7687 memgraph/memgraph-mage:1.1-dev +``` + +By running this command, you will get an image with the following tools +installed: Python3, Rust, Clang, Make, and CMake. This way, you can copy files +to the container, build them inside and import query modules into Memgraph. + +If you want to develop your own query modules, be sure to check the [Development +process for MAGE with +Docker](https://github.com/memgraph/mage#developing-mage-with-docker). + +::: + +## Docker build + +This way, you will create a Docker image directly from the [MAGE GitHub +repository](https://github.com/memgraph/mage) and won't have to pull it from +Docker Hub. You can: + +- download a [specific release](https://github.com/memgraph/mage/releases) from + the MAGE repository or +- clone the [repository](https://github.com/memgraph/mage) for the latest + version. + +If you downloaded a specific release, skip the first step. 
+ +## Installing MAGE + +**1.** Download the MAGE source code from +**[GitHub](https://github.com/memgraph/mage)**: + +```shell +git clone --recurse-submodules https://github.com/memgraph/mage.git && cd mage +``` + +**2.** Build the **MAGE**-tagged Docker image with the following command: + +```shell +docker build -t memgraph-mage . +``` + +**3.** Start Memgraph-MAGE with the following command: + +```shell +docker run --rm -p 7687:7687 --name mage memgraph-mage +``` + +:::info + +Now you can query Memgraph with any of the querying platforms like [Memgraph +Lab](https://memgraph.com/product/lab) or +[mgconsole](https://github.com/memgraph/mgconsole). + +If you made any changes while the **MAGE** Docker container was running, you +need to stop the Docker container and rebuild the whole image. If you +don't want to repeat these steps each time, be sure to check the [Development +process for MAGE with +Docker](https://github.com/memgraph/mage#developing-mage-with-docker). + +::: + +## Developing MAGE with Docker + +When developing your query module, you need to load it inside Memgraph running +inside the Docker container. You can do that by [rebuilding the whole MAGE +image](#1-rebuild-the-whole-mage-image) or by [building it inside the Docker +container](#2-build-inside-the-docker-container). + +### 1. Rebuild the whole MAGE image + +This command will trigger the rebuild of the whole Docker image. Make sure that +you have added Python requirements inside the `python/requirements.txt` file. + +**1.** First, build the **MAGE** image: + +``` +docker build -t memgraph-mage . +``` + +**2.** Now, start the `memgraph-mage` image with the following command and enjoy +**your** own **MAGE**: + +``` +docker run --rm -p 7687:7687 --name mage memgraph-mage +``` + +### 2. Build inside the Docker container + +You can build a **MAGE** Docker image equipped for development. 
`Rust`, `Clang`, +`Python3-pip`, and everything else necessary for development will be available +inside the running container. This means that you can copy the **MAGE** +repository to the container and do the build inside the `mage` container. There +is no need to do the whole Docker image build again. + +**1.** To create the `dev` **MAGE** image, run the following command: + +``` +docker build --target dev -t memgraph-mage:dev . +``` + +**2.** Then run the image with the following command: + +``` +docker run --rm -p 7687:7687 --name mage memgraph-mage:dev +``` + +**3.** Next, copy the files inside the container and do the build: + +**a)** First, you need to copy the files to the container named `mage`: + +``` +docker cp . mage:/mage/ +``` + +**b)** Then, you need to position yourself inside the container as root: + +``` +docker exec -u root -it mage /bin/bash +``` + +:::note + +Note: If you have done the build locally, make sure to delete the directory +`cpp/build` because you might be dealing with different architectures or +problems with `CMakeCache.txt`. To delete it, run: + +`rm -rf cpp/build` + +::: + +**c)** After that, run the build and copy `mage/dist` to +`/usr/lib/memgraph/query_modules`: + +``` +python3 setup build -p /usr/lib/memgraph/query_modules/ +``` + +**d)** Everything should be ready, and you can run the following command to exit +the container: + +``` +exit +``` + +:::note + +Note that query modules are loaded into Memgraph on startup, so if your instance +was already running, you would need to execute the following query inside one of +the [querying platforms](https://memgraph.com/docs/memgraph/connect-to-memgraph) +to load them: + +`CALL mg.load_all();` + +::: + +## MAGE × NVIDIA cuGraph + +Follow this guide to install Memgraph with [**NVIDIA +cuGraph**](https://github.com/rapidsai/cugraph) GPU-powered graph algorithms. + +### Prerequisites + +:::info + +To be able to run cuGraph analytics, make sure you have compatible +infrastructure first. 
The exact system requirements are available at the +[**NVIDIA RAPIDS site**](https://rapids.ai/start.html#requirements), and include +an NVIDIA Pascal (or better) GPU and up-to-date CUDA & NVIDIA drivers. + +::: + +**Docker requirements :whale:** + +If running MAGE × NVIDIA cuGraph in Docker, the following applies: + +- Official [**NVIDIA driver**](https://www.nvidia.com/download/index.aspx) for + your operating system. +- To run on NVIDIA-powered GPUs, RAPIDS requires Docker CE v19.03+ and + [**nvidia-container-toolkit**](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker) + installed. +- Legacy Docker CE v17-18 users require the installation of the + **nvidia-docker2** + package. + +**Local build requirements:** + +If building MAGE × NVIDIA cuGraph locally, these requirements apply (tested on +Ubuntu): + +- Official [**NVIDIA driver**](https://www.nvidia.com/download/index.aspx) for + your operating system. +- [**CMake**](https://cmake.org/) version above 3.20 +- [**NVIDIA CUDA developer toolkit**](https://developer.nvidia.com/cuda-toolkit) + – CUDA version 11.6 +- System dependencies: `libblas-dev`, `liblapack-dev`, `libboost-all-dev` +- [**NVIDIA NCCL communications library**](https://developer.nvidia.com/nccl) + +### Installing the Docker image from Docker Hub + +The simplest way of starting Memgraph with cuGraph GPU analytics is to download +the image from Docker Hub. 
Just pull the image, and get it running with this +simple command: + +```shell +docker run --rm --gpus all -p 7687:7687 -p 7444:7444 memgraph/memgraph-mage:1.3-cugraph-22.02-cuda-11.5 +``` + +Depending on your environment, different versions of MAGE/cuGraph/CUDA can be +installed: + +```shell +docker run --gpus all -p 7687:7687 -p 7444:7444 memgraph/memgraph-mage:${MAGE_VERSION}-cugraph-${CUGRAPH_VERSION}-cuda-${CUDA_VERSION} +``` + +To see the available versions, explore our Docker Hub organization and look for +the images tagged +[**memgraph-mage**](https://hub.docker.com/r/memgraph/memgraph-mage/tags). + +:::info + +The development image with cuGraph support is not available yet. If you want to +develop cuGraph-powered query modules in Docker, do not hesitate to [contact +us](https://memgraph.com/community) about it. + +::: + +### Building MAGE with NVIDIA cuGraph locally with Docker + +1. Download the MAGE source code from + [GitHub](https://github.com/memgraph/mage): + + ```shell + git clone https://github.com/memgraph/mage.git && cd mage + ``` + +2. Build the **MAGE × cuGraph**-tagged Docker image: + + ```shell + docker build -f Dockerfile.cugraph -t memgraph-mage . + ``` + +3. Start Memgraph-MAGE with the following command: + ```shell + docker run --rm --gpus all -p 7687:7687 -p 7444:7444 --name mage memgraph-mage + ``` + +:::info + +You can now query Memgraph from querying platforms such as [Memgraph +Lab](https://memgraph.com/product/lab) or +[mgconsole](https://github.com/memgraph/mgconsole). + +If you made any changes while the Docker container was running, you need to stop +the container and rebuild the image. For a workaround, check [Development +process for MAGE with +Docker](/installation/docker-build.md#developing-mage-with-docker). + +::: + +### Installing MAGE natively from the source + +:::warning + +Make sure you have installed all prerequisites and dependencies before building +MAGE × NVIDIA cuGraph from source. + +::: + +1. 
Download the MAGE source code from + [**GitHub**](https://github.com/memgraph/mage) and run the `setup` script. It + will generate a `dist` directory with all the needed files: + ```shell + python3 setup build --gpu + ``` + + :::info + + The `--gpu` flag enables building the cuGraph dependencies and creating the + shared library with cuGraph algorithms that are loaded into Memgraph. + + ::: + +2. Copy the contents of the newly created `dist` directory to + `/usr/lib/memgraph/query_modules`: + + :::info + + To speed the installation up, you can specify a path for the setup script to + copy the built executables: + + ```shell + python3 setup build -p /usr/lib/memgraph/query_modules --gpu + ``` + + ::: + +3. Start Memgraph and enjoy MAGE × cuGraph! + + :::info + + If your Memgraph instance was already running, execute the following query + inside one of the [**querying + platforms**](https://memgraph.com/docs/memgraph/connect-to-memgraph) to reload + the modules: + + ``` + CALL mg.load_all(); + ``` + + If the modules are still missing, restart the instance by running `systemctl + stop memgraph` and then `systemctl start memgraph`. + + For more about loading query modules, consult [**this + guide**](/usage/loading-modules.md). + + ::: + +## Install MAGE on Linux from source + +This method is only suitable for Linux users, as you need to [download and install +a Linux-based Memgraph package](https://memgraph.com/download). To build from +source, you will need **Python3**, **Make**, **CMake**, **Clang**, **UUID** +and **Rust**. + +:::info + +You should not build MAGE from source and import the modules into Memgraph +running in a Docker container. You would need to build MAGE inside the same +container where Memgraph is running due to the possibility of different +architectures on your local machine and the Docker container. If you need to +work with Docker, we have prepared a Docker image equipped for local +development. 
Make sure to check the [Docker +build](/installation/docker-build.md) or [Docker +Hub](/installation/docker-hub.md) guides. + +::: + +## Installing MAGE + +### Prerequisites + +To install MAGE from source, first [install Rust and Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html). + +Then set up the machine by running the following commands: + +```bash +sudo apt-get update && sudo apt-get install -y \ + libcurl4 `memgraph` \ + libpython${PY_VERSION} `memgraph` \ + libssl-dev `memgraph` \ + openssl `memgraph` \ + build-essential `mage-memgraph` \ + cmake `mage-memgraph` \ + curl `mage-memgraph` \ + g++ `mage-memgraph` \ + python3 `mage-memgraph` \ + python3-pip `mage-memgraph` \ + python3-setuptools `mage-memgraph` \ + python3-dev `mage-memgraph` \ + clang `mage-memgraph` \ + git `mage-memgraph` \ + --no-install-recommends \ + && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* +``` + +### Installation process +**1.** Download the MAGE source code from +**[GitHub](https://github.com/memgraph/mage)** and run the `setup` script. + +The script will generate a `dist` directory with all the needed files: + +```shell +python3 setup build -p /usr/lib/memgraph/query_modules +``` + +The command above will also copy the contents of the newly created `dist` directory to +`/usr/lib/memgraph/query_modules`. Memgraph loads query modules from this directory. + +**If something isn't installed properly, the `setup` script will stop the installation process. If you have any +questions, contact us on [Discord](https://discord.gg/memgraph).** + +:::warning + +Be sure you cloned the `mage` GitHub repository using the `--recurse-submodules` flag since it has Memgraph incorporated inside: + +```shell +git clone --recurse-submodules https://github.com/memgraph/mage.git +``` + +If you didn't, you can run the following command to update submodules: + +```shell +git submodule update --init --recursive +``` +::: + + +**2.** Start Memgraph and enjoy **MAGE**! 
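As a quick sanity check after starting Memgraph, you can list the loaded procedures from any querying platform; if MAGE was installed correctly, its procedures will appear in the output:

```cypher
CALL mg.procedures() YIELD name
RETURN name;
```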
:::warning +Query modules are loaded into Memgraph on startup, so if your instance +was already running, you will need to execute the following query inside one of +the [querying platforms](https://memgraph.com/docs/memgraph/connect-to-memgraph) to +load them: + +``` +CALL mg.load_all(); +``` + +If your changes are not loaded, make sure to restart the instance by running +`systemctl stop memgraph` and `systemctl start memgraph`. + +If you want to find out more about loading query modules, visit [this +guide](/usage/loading-modules.md). + +::: + +### Advanced configuration + +#### 1. Set a different `query_modules` directory + +The `setup` script can set your local `mage/dist` directory or **any** other +directory as the **default** one in the Memgraph configuration file (flag +`--query-modules-directory` defined in `/etc/memgraph/memgraph.conf`). There are +a few options: + +**1.** Set `<your_directory>` as the **default** one: + +``` +python3 setup modules_storage -p <your_directory> +``` + +This way Memgraph will be looking for query modules inside `<your_directory>`. + +:::note + +Don't forget to copy the aforementioned files from `mage/dist` to +`<your_directory>`. + +::: + +**2.** Set `/mage/dist` as the **default** one: + +``` +python3 setup modules_storage +``` + +If the **default** directory is `mage/dist`, then you don't need to copy `*.so` +and `*.py` files from the `mage/dist` directory +to `/usr/lib/memgraph/query_modules` every time you run `build`. diff --git a/docs2/advanced-algorithms/run-algorithm.md b/docs2/advanced-algorithms/run-algorithm.md new file mode 100644 index 00000000000..e27878799a2 --- /dev/null +++ b/docs2/advanced-algorithms/run-algorithm.md @@ -0,0 +1,279 @@ +# Run algorithms + +## Load procedures + +Once you start Memgraph, it will attempt to load query modules from all `*.so` +and `*.py` files from the default (`/usr/lib/memgraph/query_modules` and +`/var/lib/memgraph/internal_modules`) directories. 
MAGE modules are located at `/usr/lib/memgraph/query_modules`, and custom modules developed via Memgraph Lab at `/var/lib/memgraph/internal_modules`.

Memgraph can load query modules from additional directories if their paths are added to the `--query-modules-directory` flag in the main configuration file (`/etc/memgraph/memgraph.conf`) or supplied as a command-line parameter (e.g. when using Docker).

If you are supplying the additional directory as a parameter, do not forget to include the path to `/usr/lib/memgraph/query_modules`, otherwise modules from that directory will not be loaded when Memgraph starts.

:::caution

When working with Docker and the `memgraph-platform` image, you should pass configuration flags inside environment variables, for example:

```terminal
docker run -p 7687:7687 -p 7444:7444 -p 3000:3000 -e MEMGRAPH="--query-modules-directory=/usr/lib/memgraph/query_modules,/usr/lib/memgraph/my_modules" memgraph/memgraph-platform
```

If you are working with the `memgraph` or `memgraph-mage` images, you should pass configuration options like this:

```terminal
docker run -p 7687:7687 -p 7444:7444 memgraph/memgraph --query-modules-directory=/usr/lib/memgraph/query_modules,/usr/lib/memgraph/my_modules
```

:::

If a certain query module was added while Memgraph was already running, you need to load it manually using the `mg.load("module_name")` procedure within a query:

```cypher
CALL mg.load("py_example");
```

If there is no response (no error message), the load was successful.

If you want to reload all existing modules and load any newly added ones, use `mg.load_all()`:

```cypher
CALL mg.load_all();
```

If there is no response (no error message), the load was successful.
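Under the hood, the loading step described above amounts to scanning the configured directories for `*.so` and `*.py` files and registering each file under its module name. A minimal plain-Python sketch of that discovery logic (illustrative only, not Memgraph's actual implementation; the temporary directory stands in for `/usr/lib/memgraph/query_modules`):

```python
from pathlib import Path
import tempfile

def discover_modules(directories):
    """Map module name -> file path, scanning for *.so and *.py files.

    Later directories win on name clashes; that precedence rule is an
    assumption made for this sketch, not documented Memgraph behaviour.
    """
    modules = {}
    for directory in directories:
        for path in Path(directory).glob("*"):
            if path.suffix in (".so", ".py"):
                modules[path.stem] = path
    return modules

# Demo on a throwaway directory with two module files and one ignored file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "py_example.py").touch()
    (Path(tmp) / "pagerank.so").touch()
    (Path(tmp) / "README.md").touch()   # ignored: not *.so / *.py
    found = discover_modules([tmp])
    print(sorted(found))                # → ['pagerank', 'py_example']
```

Only the file stem matters for the module name, which is why `CALL mg.load("py_example")` refers to `py_example.py` without the extension.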
You can check if the query module has been loaded by using the `mg.procedures()` procedure within a query:

```cypher
CALL mg.procedures() YIELD *;
```

Once the MAGE query modules or any custom modules you developed have been loaded into Memgraph, you can call them within queries using the following Cypher syntax:

```cypher
CALL module.procedure([optional parameter], arg1, "string_argument", ...) YIELD res1, res2, ...;
```

## Run procedures

Every procedure has an optional first parameter, and the rest of the parameters depend on the procedure you are trying to call. The optional parameter must be the result of the aggregation function [`project()`](/cypher-manual/functions#aggregation-functions). If such a parameter is provided, **all** operations will be executed on the projected graph. Otherwise, you will work on the whole graph stored inside Memgraph.

Each procedure returns zero or more records, where each record contains named fields. The `YIELD` clause is used to select the fields you are interested in, or all of them (`*`). If you are not interested in any fields, omit the `YIELD` clause. The procedure will still run, but the record fields will not be stored in variables. If you try to `YIELD` fields that are not a part of the produced record, the query will result in an error.

Procedures can be standalone, as in the example above, or a part of a larger query when you want the procedure to work on data the query is producing.

For example:

```cypher
MATCH (node) CALL module.procedure(node) YIELD result RETURN *;
```

When the `CALL` clause is a part of a larger query, results from the query are returned using the `RETURN` clause. If the `CALL` clause is followed by a clause that only updates the data and doesn't read it, `RETURN` is unnecessary. It is the Cypher convention that read-only queries need to end with a `RETURN`, while queries that update something don't need to `RETURN` anything.
Also, if the procedure itself writes into the database, the rest of the clauses in the query can only read from the database, and the `CALL` clause can only be followed by the `YIELD` clause and/or the `RETURN` clause.

If a procedure returns a record with the same field name as a variable already present in the query, that field name can be aliased with some other name using the `AS` sub-clause:

```cypher
MATCH (result) CALL module.procedure(42) YIELD result AS procedure_result RETURN *;
```

## Run on subgraph

The following how-to guide will demonstrate how to run graph analytics on subgraphs. A portion of the graph is projected from the whole network persisted in Memgraph, and algorithms are run on that portion of the graph.

If you need help with running MAGE modules and graph algorithms, check out the [how-to guide on that topic](/mage/how-to-guides/run-a-query-module.md).

[![Related - Blog Post](https://img.shields.io/static/v1?label=Related&message=Blog%20post&color=9C59DB&style=for-the-badge)](https://memgraph.com/blog/how-we-designed-and-implemented-graph-projection-feature)
[![Related - How-to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/docs/gqlalchemy/how-to-guides/query-builder/graph-projection)

### When should you avoid running algorithms on the entire network and use the projection feature instead?

By default, executing a MAGE query module runs the algorithm on the whole network. This is impractical in the following use cases:

- the graph is heterogeneous, and you want to run the module only on nodes with specific labels
- the graph is too large, and you want the analytics to update only a portion of it
- the network contains multiple diverse data models and graphs, and running analytics on mixed graphs at once might yield unexpected results

That is why Memgraph enables module execution on subgraphs and graph projections.
The insights yielded by graph algorithms can then affect only the relevant nodes in your graph, keeping the data consistent with its intended design.

### Available graph projections

The graph projection function in Memgraph is called [project()](/cypher-manual/functions#graph-projection-functions), and it is used in the following way:

```cypher
MATCH p=(n)-[r]->(m)
WITH project(p) AS subgraph
RETURN subgraph;
```

First, a path is specified, denoting the source and target nodes as well as the relationships connecting them. The `project` function then constructs a subgraph out of all the generated paths.

Because the matched pattern includes all the nodes and relationships in the graph, the result of this query is a projection of the whole graph. To isolate a certain part of the graph, constraints need to be added to the labels, edge types, or properties, like in the query below:

```cypher
MATCH p=(n:SpecificLabel)-[r:REL_TYPE]->(m:SpecificLabel)
WITH project(p) AS subgraph
RETURN subgraph;
```

The query above will return a subgraph of `SpecificLabel` nodes connected with relationships of the type `REL_TYPE`.

### Calling query modules on graph projections

If you want to run query modules on subgraphs, specify the projected graph as the first argument of the query module.

```cypher
CALL module.procedure([optional graph parameter], argument1, argument2, ...) YIELD * RETURN *;
```

If the optional graph projection parameter is not included as the first argument, the query module will be executed on the whole graph.

### Practical example with Twitter influencers

In this practical example, the PageRank algorithm will be executed on a fictional Twitter dataset. PageRank execution is grouped by the Twitter hashtag, and each Tweet can have a different number of retweets.

```cypher
CREATE (n:Tweet {id: 1, hashtag: "#WorldCup", text: "Cool world cup! #WorldCup"});
CREATE (n:Tweet {id: 2, hashtag: "#WorldCup", text: "The ball is round #WorldCup!"});

CREATE (n:Tweet {id: 3, hashtag: "#christmas", text: "The town is so shiny during #christmas!"});
CREATE (n:Tweet {id: 4, hashtag: "#christmas", text: "Why didn't I get any presents for #christmas?"});

MATCH (n:Tweet {id: 1}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", text: "Croatia first this year!"});
MATCH (n:Tweet {id: 1}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", text: "Farewell Dani Alves!"});
MATCH (n:Tweet {id: 2}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", text: "This is the best WC ever!"});
MATCH (n:Tweet {id: 2}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", text: "It's not so hot this time of the year in Qatar."});
MATCH (n:Tweet {id: 2}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#WorldCup", text: "What are your predictions?"});
MATCH (n:Tweet {id: 3}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", text: "Don't be a Grinch!"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", text: "I'll give you a present!"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", text: "Santa Claus will visit you tonight!"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", text: "This year save me from tears"});
MATCH (n:Tweet {id: 4}) MERGE (n)<-[:RETWEETED]-(:Tweet {hashtag: "#christmas", text: "All I want for Christmas is youuuu"});
```

#### Running PageRank on the whole network

To run the PageRank algorithm available in the MAGE library, use the following query:

```cypher
CALL pagerank.get() YIELD node, rank
SET node.rank = rank;
```

The PageRank algorithm will take into account all the nodes in the graph. However, it doesn't make much sense to correlate tweets about the World Cup with tweets about Christmas, as they are thematically quite different and should be analyzed separately.
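The "analyze hashtags separately" point can be made concrete with a tiny plain-Python power-iteration PageRank run on a hashtag-filtered edge list. This is only an illustration of what projecting before ranking achieves; the node names below are made up and, in Memgraph, the `pagerank.get()` module does all of this natively:

```python
def pagerank(edges, damping=0.85, iters=50):
    """Tiny power-iteration PageRank over a directed edge list."""
    nodes = {n for e in edges for n in e}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = damping * rank[n] / len(out[n])
                for dst in out[n]:
                    nxt[dst] += share
            else:  # dangling node: spread its rank over all nodes
                for m in nodes:
                    nxt[m] += damping * rank[n] / len(nodes)
        rank = nxt
    return rank

# Retweets point at the tweet they retweet, like the RETWEETED edges above.
edges = [
    ("wc_retweet_1", "wc_tweet"), ("wc_retweet_2", "wc_tweet"),
    ("xmas_retweet_1", "xmas_tweet"),
]
hashtag = {"wc_tweet": "#WorldCup", "wc_retweet_1": "#WorldCup",
           "wc_retweet_2": "#WorldCup",
           "xmas_tweet": "#christmas", "xmas_retweet_1": "#christmas"}

# "Project" the subgraph: keep only #WorldCup nodes and their edges.
wc_edges = [(s, d) for s, d in edges
            if hashtag[s] == "#WorldCup" and hashtag[d] == "#WorldCup"]
ranks = pagerank(wc_edges)
print(max(ranks, key=ranks.get))  # → wc_tweet (the retweeted tweet ranks highest)
```

Because the Christmas nodes were filtered out before the iteration started, they neither receive rank nor dilute the World Cup scores, which is exactly what passing a projected graph to the module achieves.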
#### Running PageRank on a subgraph

To run PageRank on the subset of Christmas tweets only, that portion of the graph is saved as a projection and used as the first argument of the query module:

```cypher
MATCH p=(n:Tweet {hashtag: "#christmas"})-[r]->(m)
WITH project(p) AS christmas_graph
CALL pagerank.get(christmas_graph) YIELD node, rank
SET node.rank = rank
RETURN node.hashtag, node.text, rank
ORDER BY rank DESC;
```

The above query successfully updated the rank of the Christmas tweets only! Let's do the same on the World Cup tweets by changing the value of the hashtag property:

```cypher
MATCH p=(n:Tweet {hashtag: "#WorldCup"})-[r]->(m)
WITH project(p) AS world_cup_graph
CALL pagerank.get(world_cup_graph) YIELD node, rank
SET node.rank = rank
RETURN node.hashtag, node.text, rank
ORDER BY rank DESC;
```

## Managing query modules from Memgraph Lab

You can inspect query modules in Memgraph Lab (v2.0 and newer). Just navigate to **Query Modules**.

There you can see all the loaded query modules, delete them, or see the procedures and transformations they define by clicking on the arrow icon.

By expanding procedures, you can see information about the procedure's signature, input and output variables and their data types, as well as the `CALL` query you can run directly from the **Query Modules** view.

Custom modules developed via Memgraph Lab are located at `/var/lib/memgraph/internal_modules`.

## Control procedure memory usage

When running a procedure, Memgraph controls the maximum memory usage that the procedure may consume during its execution. By default, the upper memory limit when running a procedure is `100 MB`. If your query procedure requires more memory to yield its results, you can increase the memory limit using the following syntax:

```cypher
CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 KB YIELD result;
CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 MB YIELD result;
CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY UNLIMITED YIELD result;
```

The limit can be set to a specific value (in `KB` or `MB`) or to unlimited.

diff --git a/docs2/advanced-algorithms/utilize-networkx.md b/docs2/advanced-algorithms/utilize-networkx.md
new file mode 100644
index 00000000000..525ab5a2b80
--- /dev/null
+++ b/docs2/advanced-algorithms/utilize-networkx.md
@@ -0,0 +1,136 @@

# Utilize the NetworkX library with Memgraph

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. Memgraph provides the [**`nxalg`**](/docs/mage/query-modules/python/nxalg) query module, which is a wrapper around NetworkX graph algorithms, as well as the **[Graph Analyzer](/mage/query-modules/python/graph-analyzer)** query module, which also utilizes the NetworkX library. Besides that, you can create a custom query module that uses the NetworkX library. Through this how-to guide, you can find out:

- [**How to run NetworkX algorithms in Memgraph Lab**](#how-to-run-networkx-algorithms-in-memgraph-lab)
- [**How to implement custom NetworkX module**](#how-to-implement-custom-networkx-module)


## How to run NetworkX algorithms in Memgraph Lab

NetworkX algorithms are integrated into Memgraph as query modules inside Memgraph's open-source graph extension library [MAGE](/docs/mage). Head over to the guide on [how to call MAGE procedures](/docs/mage/usage/calling-procedures) to find out how to call all Memgraph procedures, including those that utilize the NetworkX library.

This how-to guide will show one simple example of calling a NetworkX procedure in Memgraph's visual interface, Memgraph Lab.

### 1. Connect to Memgraph

First, run Memgraph using the Memgraph Platform Docker image, which includes both the MAGE library and Memgraph Lab.
To run the image, open a command-line interpreter and run the following Docker command:

```
docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 memgraph/memgraph-platform:latest
```

[Connect to Memgraph](/docs/memgraph-lab/connect-to-memgraph#connecting-to-memgraph) via Memgraph Lab, which is running at `localhost:3000`.

Check out the [installation guide](/docs/memgraph/installation) for other installation options. If you wish to avoid the installation, you can also use [Memgraph Cloud](/docs/memgraph-cloud/).

### 2. Load the dataset

Head over to the **Datasets** section and load the **Europe backpacking dataset**.

### 3. Run NetworkX algorithm

Once the dataset is loaded, go to the **Query Modules** section and search for the `nxalg` module. Click on the arrow next to the module name to **view module details**.

The goal is to run the [`is_bipartite()`](/docs/mage/query-modules/python/nxalg#is_bipartite) procedure to check whether the graph is bipartite.

Copy the query, go to the **Query Execution** tab and paste the query into the **Cypher Editor**:

```cypher
CALL nxalg.is_bipartite() YIELD is_bipartite;
```

By clicking on the **Run Query** button, you can see that the Europe backpacking graph is not bipartite.

In the same way, you can run other procedures from the `nxalg` module and the procedures from the `graph_analyzer` module, which can be found in the **Query Modules** section.

## How to implement custom NetworkX module in Memgraph Lab

Besides using already implemented modules, you can create your own module which utilizes the NetworkX library. To learn how to implement a custom query module, head over to the [example of a query module in Python](/docs/memgraph/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example#python-api).

Since Memgraph is integrated with NetworkX, you can import the NetworkX library inside Python code.
This guide will show you how to create a new query module that utilizes the NetworkX library within Memgraph's visual interface Memgraph Lab. + +### 1. Connect to Memgraph + +First, run Memgraph using the Memgraph Platform Docker image, which includes both the MAGE library and Memgraph Lab. +To run the image, open a command-line interpreter and run the following Docker command: + +``` +docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 memgraph/memgraph-platform:latest +``` + +[Connect to Memgraph](/docs/memgraph-lab/connect-to-memgraph#connecting-to-memgraph) via Memgraph Lab which is running at `localhost:3000`. + +Check out the [installation guide](/docs/memgraph/installation) for other installation options. If you wish to avoid the installation, you can also use [Memgraph Cloud](/docs/memgraph-cloud/). + +### 2. Load the dataset + +In the **Datasets** section, find and load the Karate club friendship network dataset. + + + +### 3. Implement a custom query module + +Once the dataset is loaded, go to the **Query modules** section. The goal is to create a community detection algorithm that can partition a network into multiple communities with the help of the NetworkX library. Click on the **New Module** and type in the module name, e.g., `communities`. + + + +There is a sample Python code on the next screen, inside the code editor. 
Select it, delete it, and paste the following code:

```python
import mgp
import networkx as nx
from networkx.algorithms import community
from mgp_networkx import MemgraphDiGraph


@mgp.read_proc
def detect(
    ctx: mgp.ProcCtx
    ) -> mgp.Record(communities=mgp.List[mgp.List[mgp.Vertex]]):

    networkxGraph = nx.DiGraph(MemgraphDiGraph(ctx=ctx))
    communities_generator = community.girvan_newman(networkxGraph)

    return mgp.Record(communities=[
        list(s) for s in next(communities_generator)])
```

The code above defines a read procedure that builds a NetworkX `DiGraph` from the `MemgraphDiGraph` object, which wraps the existing graph stored in the database. The procedure then runs the Girvan-Newman community detection algorithm and returns its results.

Here is what the code looks like in the code editor:

Click **Save & close**, and head over to the **Query Execution** tab.

### 4. Run the custom query module

Copy and paste the following query into the **Cypher Editor**:

```cypher
CALL communities.detect()
YIELD communities
UNWIND communities AS community
RETURN community;
```

After you click **Run Query**, you can see the result, which consists of two lists. Each list represents one community.


## Where to next?

If you want to learn more about using Memgraph with NetworkX, check out the [**Memgraph for NetworkX developers resources**](https://memgraph.com/memgraph-for-networkx?utm_source=networkx-guide&utm_medium=referral&utm_campaign=networkx_ppp&utm_term=docs%2Bhowtoutilize&utm_content=resources). If you are using GQLAlchemy to connect to Memgraph, learn [how to import a NetworkX graph into Memgraph](/docs/gqlalchemy/how-to-guides/import-python-graphs#import-networkx-graph-into-memgraph).
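For intuition about what `community.girvan_newman` in the custom module above does, here is a self-contained plain-Python sketch of one Girvan-Newman split on a toy undirected graph: repeatedly remove the edge with the highest betweenness (computed here by brute-force shortest-path counting) until the graph falls into two components. This is illustrative only; the query module above delegates all of this to NetworkX:

```python
from collections import deque
from itertools import combinations

def all_shortest_paths(adj, s, t):
    """All shortest s-t paths via BFS layering plus backtracking."""
    dist, parents, q = {s: 0}, {s: []}, deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], parents[v] = dist[u] + 1, [u]
                q.append(v)
            elif dist[v] == dist[u] + 1:
                parents[v].append(u)
    if t not in dist:
        return []
    def build(v):
        if v == s:
            return [[s]]
        return [p + [v] for u in parents[v] for p in build(u)]
    return build(t)

def edge_betweenness(adj):
    """Credit each edge 1/num_shortest_paths for every path it lies on."""
    score = {}
    for s, t in combinations(adj, 2):
        paths = all_shortest_paths(adj, s, t)
        for p in paths:
            for u, v in zip(p, p[1:]):
                e = frozenset((u, v))
                score[e] = score.get(e, 0.0) + 1.0 / len(paths)
    return score

def components(adj):
    """Connected components via BFS."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, q = {start}, deque([start])
        seen.add(start)
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    q.append(v)
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the graph splits once."""
    adj = {u: set(vs) for u, vs in adj.items()}
    before = len(components(adj))
    while len(components(adj)) == before:
        bet = edge_betweenness(adj)
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Two triangles joined by a single bridge: the bridge carries every
# cross-triangle shortest path, so it is removed first.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
print(sorted(sorted(c) for c in girvan_newman_split(adj)))
# → [['A', 'B', 'C'], ['D', 'E', 'F']]
```

The brute-force betweenness makes this exponential in the worst case, which is fine for a toy graph; NetworkX uses Brandes' algorithm for the same computation.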
diff --git a/docs2/client-libraries/c-sharp.md b/docs2/client-libraries/c-sharp.md
new file mode 100644
index 00000000000..4af0f25aceb
--- /dev/null
+++ b/docs2/client-libraries/c-sharp.md
@@ -0,0 +1,154 @@

---
id: c-sharp
title: C# quick start
sidebar_label: C#
---

At the end of this guide, you will have created a simple .NET console **`Hello, World!`** program that connects to the Memgraph database and executes simple queries.

## Prerequisites

For this guide you will need:

- A **running Memgraph instance**. If you need to set up Memgraph, take a look at the [Installation guide](/installation/overview.mdx).
  :::caution
  In order for the Neo4j driver to work, you need to [modify the configuration setting](/docs/memgraph/how-to-guides/config-logs) `--bolt-server-name-for-init`. When running Memgraph, set `--bolt-server-name-for-init=Neo4j/5.2.0`. If you use a different version of the Neo4j driver, make sure to put in the appropriate version number.
  :::
- A basic understanding of graph databases and the property graph model.

## Driver

Please note that the code samples in this guide utilize the `Neo4j.Driver.Simple` package, which implements a blocking interface around the 'main' driver. It should be used as a tool for getting started quickly. The `Neo4j.Driver` package contains the official and complete driver for real-world projects. The driver documentation can be found here: [Neo4j .NET Driver](https://github.com/neo4j/neo4j-dotnet-driver).

## Basic Setup

We'll be using Visual Studio 2022 on Windows 10 to connect a simple .NET console application to a running Memgraph instance. If you're using a different IDE, the steps might be slightly different, but the code is either the same or very similar.
+ +Let's jump in and connect a simple program to Memgraph. + +**1.** Open **Visual Studio** and create a new project.
**2.** Find and select the **Console App (.NET Core)** template by using the search box.
+**3.** Name your project **_MemgraphApp_**, choose an appropriate location for +it, and click **Create**.
**4.** Select the **Tools > Manage NuGet +Packages** menu command.
**5.** Once the window opens, search for the **Neo4j.Driver.Simple** package.
**6.** Select the appropriate driver and click **Add package**.

Now, you should have the newest version of the driver installed and can proceed to copy the following code into the **Program.cs** file.

```csharp
using Neo4j.Driver;

namespace MemgraphApp
{
    public class Program
    {
        public static void Main()
        {
            string message = "Hello, World!";

            using var _driver = GraphDatabase.Driver("bolt://localhost:7687", AuthTokens.None);
            using var session = _driver.Session();

            var greeting = session.ExecuteWrite(tx =>
            {
                var result = tx.Run("CREATE (n:FirstNode) " +
                                    "SET n.message = $message " +
                                    "RETURN 'Node ' + id(n) + ': ' + n.message",
                    new { message });
                return result.Single()[0].As<string>();
            });
            Console.WriteLine(greeting);
        }
    }
}
```

Once you run the program, you should see an output similar to the following:

```
Node 1: Hello, World!
```

:::caution
To configure indexes and constraints properly, do one operation at a time and use the non-transactional API:

```csharp
await session.RunAsync(query: "CREATE INDEX ON :FirstNode");
await session.RunAsync(query: "CREATE INDEX ON :FirstNode(message)");
```
:::

## Alternative Setup

If you want to try out more complex operations, feel free to use the refactored code below.
```csharp
using Neo4j.Driver;

namespace MemgraphApp
{
    public class Program : IDisposable
    {
        private readonly IDriver _driver;

        public Program(string uri, string user, string password)
        {
            _driver = GraphDatabase.Driver(uri, AuthTokens.Basic(user, password));
        }

        public void PrintGreeting(string message)
        {
            using (var session = _driver.Session())
            {
                var greeting = session.ExecuteWrite(tx =>
                {
                    var result = tx.Run("CREATE (n:FirstNode) " +
                                        "SET n.message = $message " +
                                        "RETURN 'Node ' + id(n) + ': ' + n.message",
                        new { message });
                    return result.Single()[0].As<string>();
                });
                Console.WriteLine(greeting);
            }
        }

        public void Dispose()
        {
            _driver?.Dispose();
        }

        public static void Main()
        {
            using (var greeter = new Program("bolt://localhost:7687", "", ""))
            {
                greeter.PrintGreeting("Hello, World!");
            }
        }
    }
}
```


## Where to next?

For real-world examples of how to use Memgraph, we suggest you take a look at the **[Tutorials](/tutorials/overview.md)** page. You can also browse through the **[How-to guides](/how-to-guides/overview.md)** section to get an overview of all the functionalities Memgraph offers.
diff --git a/docs2/client-libraries/client-libraries.md b/docs2/client-libraries/client-libraries.md
new file mode 100644
index 00000000000..e9fc098f85b
--- /dev/null
+++ b/docs2/client-libraries/client-libraries.md
@@ -0,0 +1,23 @@

---
id: client-libraries
title: Client libraries
sidebar_label: Client libraries
---

Memgraph supports client libraries for the following languages:

- **[C#](/connect-to-memgraph/drivers/c-sharp.md)**
- **[C/C++](https://github.com/memgraph/mgclient)**
- **[Go](/connect-to-memgraph/drivers/go.md)**
- **[Haskell](https://github.com/zmactep/hasbolt)**
- **[Java](/connect-to-memgraph/drivers/java.md)**
- **[JavaScript](/connect-to-memgraph/drivers/javascript.md)**
- **[Node.js](/connect-to-memgraph/drivers/nodejs.md)**
- **[PHP](/connect-to-memgraph/drivers/php.md)**
- **[Python](/connect-to-memgraph/drivers/python.md)**
- **[Ruby](https://github.com/neo4jrb/neo4j)**
- **[Rust](/connect-to-memgraph/drivers/rust.md)**

To query Memgraph programmatically, use the [Bolt protocol](https://7687.org/). The Bolt protocol was designed for efficient communication with graph databases, and **Memgraph supports versions 1, 4 and 5.2** of the protocol.
\ No newline at end of file

diff --git a/docs2/client-libraries/go.md b/docs2/client-libraries/go.md
new file mode 100644
index 00000000000..aa93cef7f79
--- /dev/null
+++ b/docs2/client-libraries/go.md
@@ -0,0 +1,135 @@

---
id: go
title: Go quick start
sidebar_label: Go
---

At the end of this guide, you will have created a simple Go **`Hello, World!`** program that connects to the Memgraph database and executes simple queries.

:::note Go driver

You can find the official Go driver on [GitHub](https://github.com/neo4j/neo4j-go-driver).

:::

:::note Go Object Graph Mapper (OGM)

If you are looking for something similar to an Object Graph Mapper for Go, check out [`gograph`](https://github.com/prahaladd/gograph).
This project aims to provide a mechanism to interact with any graph database using a unified and minimalistic API layer for the core operations on a graph database. It is an open-source project not maintained by the Memgraph team. + +::: + +## Prerequisites + +To follow this guide, you will need: + +- A **running Memgraph instance**. If you need to set up Memgraph, take a look + at the [Installation guide](/installation/overview.mdx). +- A basic understanding of graph databases and the property graph model. +- The newest version of **Go** [installed](https://golang.org/doc/install). + +## Basic Setup + +We'll be using a simple Go application to demonstrate how to connect to a +running Memgraph instance. + +Let's jump in and create our application. + +**1.** Create a new directory for your app, for example `/MyApp` and position +yourself in it.
**2.** Create a `program.go` file and add the following code:

```go
package main

import (
	"fmt"

	"github.com/neo4j/neo4j-go-driver/v5/neo4j"
)

func main() {
	dbUri := "bolt://localhost:7687"
	driver, err := neo4j.NewDriver(dbUri, neo4j.BasicAuth("", "", ""))
	if err != nil {
		panic(err)
	}
	// Handle the driver lifetime based on your application's lifetime requirements.
	// The driver's lifetime is usually bound by the application lifetime, which
	// usually implies one driver instance per application.
	defer driver.Close()
	item, err := insertItem(driver)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%v\n", item.Message)
}

func insertItem(driver neo4j.Driver) (*Item, error) {
	// Sessions are short-lived, cheap to create and NOT thread safe. Typically create one or more sessions
	// per request in your web application. Make sure to call Close on the session when done.
	// For multi-database support, set sessionConfig.DatabaseName to the requested database.
	// Session config will default to write mode; if only reads are to be used, configure the session for
	// read mode.
	session := driver.NewSession(neo4j.SessionConfig{})
	defer session.Close()
	result, err := session.WriteTransaction(createItemFn)
	if err != nil {
		return nil, err
	}
	return result.(*Item), nil
}

func createItemFn(tx neo4j.Transaction) (interface{}, error) {
	records, err := tx.Run(
		"CREATE (a:Greeting) SET a.message = $message RETURN 'Node ' + id(a) + ': ' + a.message",
		map[string]interface{}{"message": "Hello, World!"})
	// In case of driver native errors, make sure to return them directly.
	// Depending on the error, the driver may try to execute the function again.
	if err != nil {
		return nil, err
	}
	record, err := records.Single()
	if err != nil {
		return nil, err
	}
	// You can also retrieve values by name, with e.g. `id, found := record.Get("n.id")`.
	return &Item{
		Message: record.Values[0].(string),
	}, nil
}

type Item struct {
	Message string
}
```

**3.** Create a `go.mod` file by running:

```
go mod init example.com/hello
```

**4.** Add the **Bolt driver** with the command:

```
go get github.com/neo4j/neo4j-go-driver/v5
```

**5.** Run the app with the following command:

```
go run ./program.go
```

You should see an output similar to the following:

```
Node 0: Hello, World!
```

## Where to next?

For real-world examples of how to use Memgraph, we suggest you take a look at the **[Tutorials](/tutorials/overview.md)** page. You can also browse through the **[How-to guides](/how-to-guides/overview.md)** section to get an overview of all the functionalities Memgraph offers.

diff --git a/docs2/client-libraries/java.md b/docs2/client-libraries/java.md
new file mode 100644
index 00000000000..ee7ceca90b0
--- /dev/null
+++ b/docs2/client-libraries/java.md
@@ -0,0 +1,112 @@

---
id: java
title: Java quick start
sidebar_label: Java
---

At the end of this guide, you will have created a simple Java console **`Hello, World!`** program that connects to the Memgraph database and executes simple queries.

## Prerequisites

For this guide you will need:

- A **running Memgraph instance**. If you need to set up Memgraph, take a look at the [Installation guide](/installation/overview.mdx).
  :::caution
  In order for this driver to work, you need to [modify the configuration setting](/docs/memgraph/how-to-guides/config-logs) `--bolt-server-name-for-init`. When running Memgraph, set `--bolt-server-name-for-init=Neo4j/`.
  :::
- Java 8, 11, 17 or 19 installed.

## Basic Setup

We'll be using Eclipse IDE 2022-12 on macOS to connect a simple Java console application to a running Memgraph instance using **Maven**.
If you're +using a different IDE, the steps might be slightly different, but the code is +probably the same or very similar.
+ +Let's jump in and connect a simple program to Memgraph. + +**1.** Open **Eclipse** and create a new **Maven project**.
**2.** Select +the **Create a simple project** option.
**3.** For the **Group Id** field, put `com.memgraph.app`, and for **Artifact Id**, put `my-app`. Afterwards, click the **Finish** button.
**4.** Open the `pom.xml` file and add the dependencies inside your project:

```xml
<dependencies>
  <dependency>
    <groupId>org.neo4j.driver</groupId>
    <artifactId>neo4j-java-driver</artifactId>
    <version>5.4.0</version>
  </dependency>
</dependencies>
```

**5.** Create the `HelloWorld.java` program and copy the following code:

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Query;

import static org.neo4j.driver.Values.parameters;

public class HelloWorld implements AutoCloseable
{
    private final Driver driver;

    public HelloWorld( String uri, String user, String password )
    {
        driver = GraphDatabase.driver( uri, AuthTokens.basic( user, password ) );
    }

    public void close() throws Exception
    {
        driver.close();
    }

    public void printGreeting( final String message )
    {
        try (var session = driver.session()) {
            var hello = session.executeWrite( transaction -> {
                var query = new Query("CREATE (a:Greeting) SET a.message = $message RETURN 'Node ' + id(a) + ': ' + a.message", parameters("message", message));
                var result = transaction.run(query);
                return result.single().get(0).asString();
            });
            System.out.println(hello);
        }
    }

    public static void main( String... args ) throws Exception
    {
        try ( HelloWorld greeter = new HelloWorld( "bolt://localhost:7687", "", "" ) )
        {
            greeter.printGreeting( "Hello, World!" );
        }
    }
}
```

Once you run the program, you should see an output similar to the following:

```
Node 1: Hello, World!
```

:::info
Memgraph created the [Bolt Java Driver](https://github.com/memgraph/bolt-java-driver), which can be used to connect to a running Memgraph instance. We still recommend you use the above-mentioned Neo4j driver.
:::

## Where to next?

For real-world examples of how to use Memgraph, we suggest you take a look at the **[Tutorials](/tutorials/overview.md)** page.
You can also browse through
the **[How-to guides](/how-to-guides/overview.md)** section to get an overview
of all the functionalities Memgraph offers.

diff --git a/docs2/client-libraries/javascript.md b/docs2/client-libraries/javascript.md
new file mode 100644
index 00000000000..1054dc7add1
--- /dev/null
+++ b/docs2/client-libraries/javascript.md
@@ -0,0 +1,160 @@
---
id: javascript
title: JavaScript quick start
sidebar_label: JavaScript
---

At the end of this guide, you will have created a JavaScript program that
connects to the Memgraph database and executes simple queries.

:::note

Running queries directly from a web browser is **not recommended** because of
additional requirements and possible performance issues. In other words, we
encourage you to use server-side libraries and clients for top performance
whenever possible.

:::

## Prerequisites

To follow this guide, you will need:

- A **running Memgraph instance**. If you need to set up Memgraph, take a look
  at the [Installation guide](/installation/overview.mdx).
- A basic understanding of graph databases and the property graph model.

## Basic Setup

Let's jump in and connect a simple program to Memgraph.

**1.** Create a new directory for your application, for example `/MyApp`, and
position yourself in it.

**2.** To make the actual program, create a `program.html` file and add the
following code:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Javascript Browser Example | Memgraph</title>
    <script src="https://unpkg.com/neo4j-driver"></script>
  </head>
  <body>

    <p>Check console for Cypher query outputs...</p>
    <script>
      const driver = neo4j.driver("bolt://localhost:7687");
      const session = driver.session();

      session
        .run("MATCH (n) DETACH DELETE n;")
        .then(() => {
          console.log("Database cleared.");
          return session.run("CREATE (n:Person {name: $name, age: $age});", {
            name: "Alice",
            age: 22,
          });
        })
        .then(() => {
          console.log("Record created.");
          return session.run("MATCH (n) RETURN n;");
        })
        .then((result) => {
          console.log("Record matched.");
          const node = result.records[0].get("n");
          console.log("Label: " + node.labels[0]);
          console.log("Name: " + node.properties.name);
          console.log("Age: " + node.properties.age);
        })
        .catch((error) => {
          console.error(error);
        })
        .then(() => session.close());
    </script>
  </body>
</html>
```

**3.** Open the `program.html` file in your browser and look for the output in
the console.

You should see an output similar to the following:

```
Database cleared.
Record created.
Record matched.
Label: Person
Name: Alice
Age: 22
```

## Transaction timeout

Both automatic transactions and explicit transactions can be provided with a
timeout.

### Automatic transaction

```js
const neo4j = require("neo4j-driver");
const { finalize } = require("rxjs"); // rxjs v7.2+

const driver = neo4j.driver("bolt://localhost:7687");
const session = driver.rxSession({ defaultAccessMode: "READ" });
session
  .run("MATCH (), (), (), () RETURN 42 AS thing;", // NOTE: A long query
    undefined,
    { timeout: 50 } // NOTE: with a short timeout
  )
  .records()
  .pipe(finalize(() => {
    session.close();
    driver.close();
  }))
  .subscribe({
    next: record => { },
    complete: () => { console.info('complete'); process.exit(1); }, // UNEXPECTED
    error: msg => console.error('Error:', msg.message), // NOTE: expected to error with server side timeout
  });
```

### Explicit transaction

```js
const neo4j = require("neo4j-driver");
const { EMPTY, mergeMap, catchError, concatWith, finalize } = require("rxjs"); // rxjs v7.2+

const driver = neo4j.driver("bolt://localhost:7687");
const session = driver.rxSession({ defaultAccessMode: "READ" });
session
  .beginTransaction({ timeout: 50 }) // NOTE: a short timeout
  .pipe(
    mergeMap(tx =>
      tx
        .run('MATCH (),(),(),() RETURN 42 AS thing;') // NOTE: a long query
        .records()
        .pipe(
          catchError(err => { tx.rollback(); throw err; }),
          concatWith(EMPTY.pipe(finalize(() => tx.commit())))
        )
    ),
    finalize(() => { session.close(); driver.close(); })
  )
  .subscribe({
    next: record => { },
    complete: () => { console.info('complete'); process.exit(1); }, // UNEXPECTED
    error: msg => console.error('Error:', msg.message), // NOTE: expected to error with server side timeout
  });
```

## Where to next?

For real-world examples of how to use Memgraph, we suggest you take a look at
the **[Tutorials](/tutorials/overview.md)** page. You can also browse through
the **[How-to guides](/how-to-guides/overview.md)**
section to get an overview of all the functionalities Memgraph offers.
diff --git a/docs2/client-libraries/nodejs.md b/docs2/client-libraries/nodejs.md
new file mode 100644
index 00000000000..32a3ddaf13f
--- /dev/null
+++ b/docs2/client-libraries/nodejs.md
@@ -0,0 +1,104 @@
---
id: nodejs
title: Node.js quick start
sidebar_label: Node.js
---

At the end of this guide, you will have created a simple Node.js **`Hello,
World!`** program that connects to the Memgraph database and executes simple
queries.

## Prerequisites

To follow this guide, you will need:

- A **running Memgraph instance**. If you need to set up Memgraph, take a look
  at the [Installation guide](/installation/overview.mdx).
- A basic understanding of graph databases and the property graph model.
- The newest version of **Node.js** installed. Instructions on how to set up
  Node.js can be found on the [official
  website](https://nodejs.org/en/download/).

## Basic Setup

We'll be using **Express.js** to demonstrate how to connect to a running
Memgraph instance. Express.js is a web application framework that enables us to
create complete Node.js applications. If you don't want to use it, the steps
might be slightly different, but the code is either the same or very similar.

Let's jump in and connect a simple program to Memgraph.

**1.** Create a new directory for your application, for example `/MyApp`, and
position yourself in it.
**2.** Create a `package.json` file with the command:

```
npm init
```

**3.** Install **Express.js** and the **Bolt driver** in the `/MyApp` directory
while adding them to the dependencies list:

```
npm install express --save
npm install neo4j-driver --save
```

**4.** To make the actual program, create a `program.js` file and add the
following code:

```javascript
const express = require("express");
const app = express();
const port = 3000;
const neo4j = require("neo4j-driver");

app.get("/", async (req, res) => {
  const driver = neo4j.driver("bolt://localhost:7687");
  const session = driver.session();

  try {
    const result = await session.writeTransaction((tx) =>
      tx.run(
        'CREATE (a:Greeting) SET a.message = $message RETURN "Node " + id(a) + ": " + a.message',
        {
          message: "Hello, World!",
        }
      )
    );

    const singleRecord = result.records[0];
    const greeting = singleRecord.get(0);

    console.log(greeting);
    res.send(greeting);
  } finally {
    await session.close();
    await driver.close();
  }
});

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`);
});
```

**5.** Run the application with the following command:

```
node program.js
```

Visit `http://localhost:3000` in your browser to trigger the route. You should
then see an output similar to the following in the terminal:

```
Node 1: Hello, World!
```

## Where to next?

For real-world examples of how to use Memgraph, we suggest you take a look at
the **[Tutorials](/tutorials/overview.md)** page. You can also browse through
the **[How-to guides](/how-to-guides/overview.md)**
section to get an overview of all the functionalities Memgraph offers.
diff --git a/docs2/client-libraries/overview.md b/docs2/client-libraries/overview.md
new file mode 100644
index 00000000000..82376f71e50
--- /dev/null
+++ b/docs2/client-libraries/overview.md
@@ -0,0 +1,24 @@
---
id: overview
title: Drivers overview
sidebar_label: Drivers overview
slug: /connect-to-memgraph/drivers
---

Memgraph supports the following languages:

- **[C#](/connect-to-memgraph/drivers/c-sharp.md)**
- **[C/C++](https://github.com/memgraph/mgclient)**
- **[Go](/connect-to-memgraph/drivers/go.md)**
- **[Haskell](https://github.com/zmactep/hasbolt)**
- **[Java](/connect-to-memgraph/drivers/java.md)**
- **[JavaScript](/connect-to-memgraph/drivers/javascript.md)**
- **[Node.js](/connect-to-memgraph/drivers/nodejs.md)**
- **[PHP](/connect-to-memgraph/drivers/php.md)**
- **[Python](/connect-to-memgraph/drivers/python.md)**
- **[Ruby](https://github.com/neo4jrb/neo4j)**
- **[Rust](/connect-to-memgraph/drivers/rust.md)**

To query Memgraph programmatically, use the [Bolt protocol](https://7687.org/).
The Bolt protocol was designed for efficient communication with graph databases,
and **Memgraph supports versions 1 and 4** of the protocol.
\ No newline at end of file
diff --git a/docs2/client-libraries/php.md b/docs2/client-libraries/php.md
new file mode 100644
index 00000000000..df67d65e8f4
--- /dev/null
+++ b/docs2/client-libraries/php.md
@@ -0,0 +1,114 @@
---
id: php
title: PHP quick start
sidebar_label: PHP
---

At the end of this guide, you will have created a simple PHP **`Hello, World!`**
program that connects to the Memgraph database and executes simple queries.

## Prerequisites

To follow this guide, you will need:

- A **running Memgraph instance**. If you need to set up Memgraph, take a look
  at the [Installation guide](/installation/overview.mdx).
- A basic understanding of graph databases and the property graph model.
- **Composer**, a tool for dependency management in PHP.
Instructions on how to + install Composer can be found [here](https://getcomposer.org/doc/00-intro.md). + +:::note + +We recommend using the **[Bolt driver](https://github.com/neo4j-php/Bolt)** for +PHP. + +::: + +## Basic Setup + +We'll be using a very simple **PHP script** in combination with **Composer** to +demonstrate how to connect to a running Memgraph instance. + +Let's jump in and connect a simple program to Memgraph. + +**1.** Create a new directory for your application, for example `/MyApp` and +position yourself in it.
**2.** Create an `index.php` file and add the following code to it:

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

// Create a connection instance pointing at the Bolt port.
$conn = new \Bolt\connection\Socket('127.0.0.1', 7687);
// Create a new Bolt instance and provide the connection object.
$bolt = new \Bolt\Bolt($conn);
// Set the requested protocol versions.
$bolt->setProtocolVersions(4.1, 4, 3);
// Build and get protocol version instance which creates connection and executes handshake.
$protocol = $bolt->build();
// Login to database with credentials.
$protocol->hello(\Bolt\helpers\Auth::basic('username', 'password'));

// Pipeline two messages. One to execute query with parameters and second to pull records.
$protocol
    ->run('CREATE (a:Greeting) SET a.message = $message RETURN id(a) AS nodeId, a.message AS message', ['message' => 'Hello, World!'])
    ->pull();

// Server responses are waiting to be fetched through iterator.
$rows = iterator_to_array($protocol->getResponses(), false);
// Get content from requested record.
$row = $rows[1]->getContent();

echo 'Node ' . $row[0] . ' says: ' . $row[1];
```

If you need an SSL connection, replace the `Socket` instance with
`StreamSocket` and enable SSL with an additional method:

```php
$conn = new \Bolt\connection\StreamSocket('URI or IP', 7687);
$conn->setSslContextOptions([
    'verify_peer' => true
]);
```

If you want to connect to Memgraph Cloud, set these parameters:

```php
$conn = new \Bolt\connection\StreamSocket('URI or IP', 7687);
$conn->setSslContextOptions([
    'peer_name' => 'Memgraph DB',
    'allow_self_signed' => true
]);
```

**3.** Run a composer command to get the required library:

```sh
composer require stefanak-michal/memgraph-bolt-wrapper
```

It will automatically create a `composer.json` file.

**4.** Start the application with the following command:

```
php -S localhost:4000
```

Open your browser, enter `localhost:4000` as the URL, and you should see an
output similar to the following:

```
Node 1 says: Hello, World!
```

## Where to next?

Check out the [PHP Bolt driver repository](https://github.com/neo4j-php/Bolt) to
learn more about using the PHP Bolt library.
You can simplify the usage of this library with the [Memgraph Bolt
wrapper](https://github.com/stefanak-michal/memgraph-bolt-wrapper).

For real-world examples of how to use Memgraph, we suggest you take a look at
the **[Tutorials](/tutorials/overview.md)** page. You can also browse through
the **[How-to guides](/how-to-guides/overview.md)** section to get an overview
of all the functionalities Memgraph offers.

diff --git a/docs2/client-libraries/python.md b/docs2/client-libraries/python.md
new file mode 100644
index 00000000000..b3333d9271c
--- /dev/null
+++ b/docs2/client-libraries/python.md
@@ -0,0 +1,90 @@
---
id: python
title: Python quick start
sidebar_label: Python
---

At the end of this guide, you will have created a simple Python **`Hello,
World!`** program that connects to the Memgraph database and executes simple
queries.

## Prerequisites

To follow this guide, you will need:

- A **running Memgraph instance**. If you need to set up Memgraph, take a look
  at the [Installation guide](/installation/overview.mdx).
- The [**GQLAlchemy client**](https://github.com/memgraph/gqlalchemy), a
  Memgraph OGM (Object Graph Mapper) for the Python programming language.
- A basic understanding of graph databases and the property graph model.

## Basic setup

We'll be using a **Python program** to demonstrate how to connect to a running
Memgraph database instance.
Let's jump in and connect a simple program to Memgraph.

**1.** Create a new directory for your program, for example, `/memgraph_python`,
and position yourself in it.
**2.** Create a new Python script and name it `program.py`. Add the following
code to it:

```python
from gqlalchemy import Memgraph

# Make a connection to the database
memgraph = Memgraph(host='127.0.0.1', port=7687)

# Delete all nodes and relationships
query = "MATCH (n) DETACH DELETE n"

# Execute the query
memgraph.execute(query)

# Create a node with the label FirstNode and message property with the value "Hello, World!"
query = """CREATE (n:FirstNode)
       SET n.message = '{message}'
       RETURN 'Node ' + id(n) + ': ' + n.message AS result""".format(message="Hello, World!")

# Execute the query
results = memgraph.execute_and_fetch(query)

# Print the first member
print(list(results)[0]['result'])
```

:::note Note for Docker users

If the program fails to connect to a Memgraph instance that was started with
Docker, you may need to use a different IP address (not the default `localhost`
/ `127.0.0.1`) to connect to the instance.

You can find the **`CONTAINER_ID`** with `docker ps` and use it in the following
command to retrieve the address:

```
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' CONTAINER_ID
```

:::

**3.** Now, you can run the application with the following command:

```
python ./program.py
```

You should see an output similar to the following:

```
Node 1: Hello, World!
```

## Where to next?

For real-world examples of how to use Memgraph, we suggest you take a look at
the **[Tutorials](/tutorials/overview.md)** page. You can also browse through
the **[How-to guides](/how-to-guides/overview.md)** section to get an overview
of all the functionalities Memgraph offers.
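Because the program above assembles its Cypher text with `str.format`, the exact string sent to Memgraph can be inspected without a running database. A standalone sketch (the `template` variable here is illustrative, mirroring the query from the program above):

```python
# Standalone sketch: reproduce the query text that program.py assembles.
# No Memgraph instance is required; this is plain string formatting.
template = """CREATE (n:FirstNode)
       SET n.message = '{message}'
       RETURN 'Node ' + id(n) + ': ' + n.message AS result"""

query = template.format(message="Hello, World!")

# The value is spliced into the query verbatim, so a message containing a
# single quote would break the statement -- worth checking before formatting
# user-provided input into Cypher text.
print("SET n.message = 'Hello, World!'" in query)  # -> True
```

When a message may contain quotes or other special characters, prefer passing values as driver-side query parameters rather than formatting them into the query string.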
diff --git a/docs2/client-libraries/rust.md b/docs2/client-libraries/rust.md
new file mode 100644
index 00000000000..8539108fe4d
--- /dev/null
+++ b/docs2/client-libraries/rust.md
@@ -0,0 +1,107 @@
---
id: rust
title: Rust quick start
sidebar_label: Rust
---

At the end of this guide, you will have created a Rust program that connects to
the Memgraph database and executes simple queries.

## Prerequisites

To follow this guide, you will need:

- A **running Memgraph instance**. If you need to set up Memgraph, take a look
  at the [Installation guide](/installation/overview.mdx).
- A locally installed [**rsmgclient
  driver**](https://github.com/memgraph/rsmgclient).

## Basic setup

Let's jump in and connect a simple program to Memgraph.

**1.** Create a new Rust project with the name **memgraph_rust** by running the
following command:

```
cargo new memgraph_rust --bin
```

**2.** Add the following line to the `Cargo.toml` file under the line
`[dependencies]`:

```
rsmgclient = "2.0.0"
```

**3.** To create the actual program, add the following code to the `src/main.rs`
file:

```rust
use rsmgclient::{ConnectParams, Connection, MgError, Value, SSLMode};

fn execute_query() -> Result<(), MgError> {
    // Connect to Memgraph.
    let connect_params = ConnectParams {
        host: Some(String::from("localhost")),
        port: 7687,
        sslmode: SSLMode::Disable,
        ..Default::default()
    };
    let mut connection = Connection::connect(&connect_params)?;

    // Create simple graph.
    connection.execute_without_results(
        "CREATE (p1:Person {name: 'Alice'})-[l1:Likes]->(m:Software {name: 'Memgraph'}) \
         CREATE (p2:Person {name: 'John'})-[l2:Likes]->(m);",
    )?;

    // Fetch the graph.
    let columns = connection.execute("MATCH (n)-[r]->(m) RETURN n, r, m;", None)?;
    println!("Columns: {}", columns.join(", "));
    for record in connection.fetchall()?
{
        for value in record.values {
            match value {
                Value::Node(node) => print!("{}", node),
                Value::Relationship(edge) => print!("-{}-", edge),
                value => print!("{}", value),
            }
        }
        println!();
    }
    connection.commit()?;

    Ok(())
}

fn main() {
    if let Err(error) = execute_query() {
        panic!("{}", error)
    }
}
```

**4.** Open a terminal, position yourself in the project root directory
`/memgraph_rust`, and run:

```
cargo build
```

and after that:

```
cargo run
```

You should see an output similar to the following:

```
Columns: n, r, m
(:Person {'name': 'Alice'})-[:Likes {}]-(:Software {'name': 'Memgraph'})
(:Person {'name': 'John'})-[:Likes {}]-(:Software {'name': 'Memgraph'})
```

## Where to next?

For real-world examples of how to use Memgraph, we suggest you take a look at
the **[Tutorials](/tutorials/overview.md)** page. You can also browse through
the **[How-to guides](/how-to-guides/overview.md)**
section to get an overview of all the functionalities Memgraph offers.

diff --git a/docs2/configuration/configuration-settings.md b/docs2/configuration/configuration-settings.md
new file mode 100644
index 00000000000..d0e0a491260
--- /dev/null
+++ b/docs2/configuration/configuration-settings.md
@@ -0,0 +1,366 @@
---
id: configuration-settings
title: Configuration settings
sidebar_label: Configuration settings
---

[![Related - How-to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/config-logs.md)

The main Memgraph configuration file is available at
`/etc/memgraph/memgraph.conf`. You can modify that file to suit your specific
needs. Additional configuration can be specified by including another
configuration file (pointed to by the `MEMGRAPH_CONFIG` environment variable)
or by passing arguments on the command line.

When working with the Memgraph Platform Docker image, you should pass
configuration flags inside environment variables.
For example, you can start the MemgraphDB Docker image with `docker run
memgraph/memgraph --bolt-port=7687 --log-level=TRACE`, but you should start
Memgraph Platform with `docker run -p 7687:7687 -p 7444:7444 -p 3000:3000 -e
MEMGRAPH="--bolt-port=7687 --log-level=TRACE" memgraph/memgraph-platform`.

Each configuration setting is in the form: `--setting-name=value`.

You can check the current configuration by using the following query (privilege
level `CONFIG` is required):

```opencypher
SHOW CONFIG;
```

## Bolt

| Flag | Description | Type |
| -------------- | -------------- | -------------- |
| --bolt-address=0.0.0.0 | IP address on which the Bolt server should listen. | `[string]` |
| --bolt-cert-file= | Certificate file which should be used for the Bolt server. | `[string]` |
| --bolt-key-file= | Key file which should be used for the Bolt server. | `[string]` |
| --bolt-num-workers= | Number of workers used by the Bolt server. By default, this will be the number of processing units available on the machine. | `[int32]` |
| --bolt-port=7687 | Port on which the Bolt server should listen. | `[int32]` |
| --bolt-server-name-for-init= | Server name which the database should send to the client in the Bolt INIT message. | `[string]` |
| --bolt-session-inactivity-timeout=1800 | Time in seconds after which inactive Bolt sessions will be closed. | `[int32]` |

:::note

Memgraph does not limit the maximum amount of simultaneous sessions.
Transactions within all open sessions are served with a limited number of Bolt
workers simultaneously.

:::

## Query

| Flag | Description | Type |
| -------------- | -------------- | -------------- |
| --query-cost-planner=true | Use the cost-estimating query planner. | `[bool]` |
| --query-execution-timeout-sec=180 | Maximum allowed query execution time. Queries exceeding this limit will be aborted. Value of 0 means no limit. | `[uint64]` |
| --query-max-plans=1000 | Maximum number of generated plans for a query. | `[uint64]` |
| --query-modules-directory=/usr/lib/memgraph/query_modules | Directory where modules with custom query procedures are stored. NOTE: Multiple comma-separated directories can be defined. | `[string]` |
| --query-plan-cache-ttl=60 | Time to live for cached query plans, in seconds. | `[int32]` |
| --query-vertex-count-to-expand-existing=10 | Maximum count of indexed vertices which provoke indexed lookup and then expand to existing, instead of a regular expand. Default is 10, to turn off use -1. | `[int64]` |

## Storage

| Flag | Description | Type |
| -------------- | -------------- | -------------- |
| --storage-gc-cycle-sec=30 | Storage garbage collector interval (in seconds). | `[uint64]` |
| --storage-properties-on-edges=true | Controls whether edges have properties. | `[bool]` |
| --storage-recover-on-startup=true | Controls whether the storage recovers persisted data on startup. | `[bool]` |
| --storage-snapshot-interval-sec=300 | Storage snapshot creation interval (in seconds). Set to 0 to disable periodic snapshot creation. | `[uint64]` |
| --storage-snapshot-on-exit=true | Controls whether the storage creates another snapshot on exit. | `[bool]` |
| --storage-snapshot-retention-count=3 | The number of snapshots that should always be kept. | `[uint64]` |
| --storage-wal-enabled=true | Controls whether the storage uses write-ahead-logging. To enable WAL, periodic snapshots must be enabled. | `[bool]` |
| --storage-wal-file-flush-every-n-tx=100000 | Issue a 'fsync' call after this amount of transactions are written to the WAL file. Set to 1 for fully synchronous operation. | `[uint64]` |
| --storage-wal-file-size-kib=20480 | Minimum file size of each WAL file. | `[uint64]` |
| --storage-items-per-batch=1000000 | The number of edges and vertices stored in a batch in a snapshot file. | `[uint64]` |
| --storage-recovery-thread-count= | The number of threads used to recover persisted data from disk. | `[uint64]` |
| --storage-parallel-index-recovery=false | Controls whether the index creation can be done in a multithreaded fashion during recovery. | `[bool]` |

## Streams

| Flag | Description | Type |
| -------------- | -------------- | -------------- |
| --kafka-bootstrap-servers | List of Kafka brokers as a comma-separated list of broker `host` or `host:port`. | `[string]` |
| --pulsar-service-url | The service URL that will allow Memgraph to locate the Pulsar cluster. | `[string]` |
| --stream-transaction-conflict-retries=30 | Number of times to retry a conflicting transaction of a stream. | `[uint32]` |
| --stream-transaction-retry-interval=500 | The interval to wait (measured in milliseconds) before retrying to execute again a conflicting transaction. | `[uint32]` |

## Other

| Flag | Description | Type |
| -------------- | -------------- | -------------- |
| --allow-load-csv=true | Controls whether LOAD CSV clause is allowed in queries. | `[bool]` |
| --also-log-to-stderr=false | Log messages go to stderr in addition to logfiles. | `[bool]` |
| --data-directory=/var/lib/memgraph | Path to directory in which to save all permanent data. | `[string]` |
| --init-file | Path to the CYPHERL file which contains queries that need to be executed before the Bolt server starts, such as creating users. | `[string]` |
| --init-data-file | Path to the CYPHERL file which contains queries that need to be executed after the Bolt server starts. | `[string]` |
| --isolation-level=SNAPSHOT_ISOLATION | Isolation level used for the transactions. Allowed values: SNAPSHOT_ISOLATION, READ_COMMITTED, READ_UNCOMMITTED. | `[string]` |
| --log-file=/var/log/memgraph/memgraph.log | Path to where the log should be stored. | `[string]` |
| --log-level=WARNING | Minimum log level. Allowed values: TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL. | `[string]` |
| --memory-limit=0 | Total memory limit in MiB. Set to 0 to use the default values, which are 100% of the physical memory if the swap is enabled and 90% of the physical memory otherwise. | `[uint64]` |
| --metrics-address | Host for HTTP server for exposing metrics. | `[string]` |
| --metrics-port | Port for HTTP server for exposing metrics. | `[uint64]` |
| --memory-warning-threshold=1024 | Memory warning threshold, in MB. If Memgraph detects there is less available RAM, it will log a warning. Set to 0 to disable. | `[uint64]` |
| --password-encryption-algorithm=bcrypt | Algorithm used for password encryption. Defaults to BCrypt. Allowed values: `bcrypt`, `sha256`, `sha256-multiple` (SHA256 with multiple iterations). | `[string]` |
| --replication-replica-check-delay-sec | The time duration in seconds between two replica checks/pings. If < 1, replicas will not be checked at all. The MAIN instance allocates a new thread for each REPLICA. | `[uint64]` |
| --replication-restore-state-on-startup | Set to `true` when initializing an instance to restore the replication role and configuration upon restart. | `[bool]` |
| --telemetry-enabled=true | Set to true to enable telemetry. We collect information about the running system (CPU and memory information), information about the database runtime (vertex and edge counts and resource usage), and aggregated statistics about some features of the database (e.g. how many times a feature is used) to allow for an easier improvement of the product. | `[bool]` |

## Environment variables

| Variable | Description | Type |
| -------------- | -------------- | -------------- |
| MEMGRAPH_USER | Username | `[string]` |
| MEMGRAPH_PASSWORD | User password | `[string]` |
| MEMGRAPH_PASSFILE | Path to a file that contains the username and password for creating a user. Data in the file should be in the format `username:password`; if your username or password contains `:`, escape it with `\`, for example `us\:ername:password`. | `[string]` |

## Additional configuration inclusion

You can define additional configuration files in the main configuration file or
within a Docker command in the terminal. Additional files are processed after
the main configuration file and they override the main configuration file.
Additional configuration files are specified with the `--flag-file` flag.
Example:

`--flag-file=another.conf`

## Set configuration flags

[![Related - Reference Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/configuration.md)

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

This how-to guide will show you how to change [configuration
settings](/reference-guide/configuration.md) for Memgraph and check the log
files.

Continue reading if you are using [Memgraph with Docker](#docker), or skip to
the [Linux chapter](#linux) if you are using MemgraphDB with **WSL**,
**Ubuntu**, **Debian**, or **RPM package**.

## Docker

Below you will find instructions on configuring Memgraph and [checking
logs](#accessing-logs) if you are using Memgraph with Docker.

### Configuring Memgraph

If you want a custom configuration to be in effect every time you run Memgraph,
[change the main configuration file](#file).

If you want a certain configuration setting to be applied during this run only,
[pass the configuration option within the `docker run` command](#command).

#### Changing the configuration file {#file}

Begin with starting Memgraph and finding out the `CONTAINER ID`:

**1.** Start Memgraph with a `docker run` command, but be sure to include the
following flag: `-v mg_etc:/etc/memgraph`.

**2.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker
container using the following command:

```plaintext
docker ps
```

Now, you can choose to either modify the main configuration file outside of
Docker, or within Docker with a command-line text editor (such as **vim**).

To change the configuration file outside the Docker container, continue with the
following steps:

**3.** Place yourself in the directory where you want to copy the configuration
file.
**4.** Copy the file from the container to your current directory with the
command:

```plaintext
docker cp <CONTAINER ID>:/etc/memgraph/memgraph.conf memgraph.conf
```

Be sure to replace the `<CONTAINER ID>` parameter.

The example below will copy the configuration file to the user's Desktop:

```plaintext
C:\Users\Vlasta\Desktop>docker cp bb3de2634afe:/etc/memgraph/memgraph.conf memgraph.conf
```

**5.** Open the configuration file with a text editor.

**6.** Modify the configuration file and save the changes.

**7.** Copy the file from your current directory to the container with the
command:

```plaintext
docker cp memgraph.conf <CONTAINER ID>:/etc/memgraph/memgraph.conf
```

Be sure to replace the `<CONTAINER ID>` parameter.

The example below will replace the configuration file with the one from the
user's Desktop:

```plaintext
C:\Users\Vlasta\Desktop>docker cp memgraph.conf bb3de2634afe:/etc/memgraph/memgraph.conf
```

**8.** Restart Memgraph.

**9.** You can check the current configuration by running the `SHOW CONFIG;` query.

To change the configuration file inside the Docker container, continue with the
following steps:

**3.** Enter the Docker container with the following command:

```plaintext
docker exec -it <CONTAINER ID> bash
```

**4.** Install the text editor of your choice.

**5.** Edit the configuration file located at `/etc/memgraph/memgraph.conf`.

**6.** Restart Memgraph.

**7.** You can check the current configuration by running the `SHOW CONFIG;` query.

----

#### Passing configuration options within the `docker run` command {#command}

Select the image you are using to find out how to customize the configuration by
passing configuration options within the `docker run` command.

If you are working with the `memgraph-platform` image, you should pass
configuration options with environment variables.
For example, if you want to limit memory usage for the whole instance to 50 MiB
and set the log level to `TRACE`, pass the configuration like this:

```
docker run -it -p 7687:7687 -p 3000:3000 -p 7444:7444 -e MEMGRAPH="--memory-limit=50 --log-level=TRACE" memgraph/memgraph-platform
```

When you are working with `memgraph` or `memgraph-mage` images, you should pass
configuration options as arguments.

For example, if you want to limit memory usage for the whole instance to 50 MiB
and set the log level to `TRACE`, pass the configuration argument like this:

```
docker run -it -p 7687:7687 -p 7444:7444 memgraph/memgraph --memory-limit=50 --log-level=TRACE
```

You can check the current configuration by running the `SHOW CONFIG;` query.

### Accessing logs

To access the logs of a running instance:

**1.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker
container:

```plaintext
docker ps
```

**2.** Run the following command:

```plaintext
docker exec -it <CONTAINER ID> bash
```

Be sure to replace the `<CONTAINER ID>` parameter.

**3.** Position yourself in the `/var/log/memgraph` directory:

```plaintext
cd /var/log/memgraph
```

**4.** List all the log files with `ls`.

**5.** List the content of a log with the `cat <LOG FILE>.log` command.
For example, if the `ls` command returned `memgraph_2022-03-04.log`, you would
list the contents using the following command:

```plaintext
cat memgraph_2022-03-04.log
```

**6.** If you want to save the log to your computer, exit the container with
`CTRL+D`, place yourself in a directory where you want to save the copy, and run
the following command:

```plaintext
docker cp <CONTAINER ID>:/var/log/memgraph/<LOG FILE>.log <LOG FILE>.log
```

For example, the following command will make a copy of the
`memgraph_2022-03-04.log` file on the user's desktop:

```plaintext
C:\Users\Vlasta\Desktop>docker cp bb3de2634afe:/var/log/memgraph/memgraph_2022-03-04.log memgraph_2022-03-04.log
```

## Linux

This section of the how-to guide will explain how to change the configuration
file and access logs if you are using MemgraphDB with WSL, Ubuntu, Debian, or
the RPM package.

### Configuring Memgraph

**1.** Install and run Memgraph.

**2.** Open the configuration file available at `/etc/memgraph/memgraph.conf`.

**3.** Modify the configuration file and save the changes.

**4.** Restart Memgraph.

**5.** You can check the current configuration by running the `SHOW CONFIG;` query.

### Accessing logs

Logs can be found in the `/var/log/memgraph` directory.
\ No newline at end of file
diff --git a/docs2/configuration/data-durability-and-backup.md b/docs2/configuration/data-durability-and-backup.md
new file mode 100644
index 00000000000..137ed9e18aa
--- /dev/null
+++ b/docs2/configuration/data-durability-and-backup.md
@@ -0,0 +1,345 @@
# Data durability and backup

Memgraph uses two mechanisms to ensure the durability of stored data and make
disaster recovery possible:

* write-ahead logging (WAL)
* periodic snapshot creation

These mechanisms generate **durability files** and save them in the respective
`wal` and `snapshots` folders in the **data directory**. The data directory
stores permanent data on disk.
The default data directory path is `/var/lib/memgraph`, but the path can be
changed by [modifying the `data-dir` configuration
flag](/memgraph/reference-guide/configuration#other).

Durability files are deleted when certain events are triggered, for example,
when exceeding the maximum number of snapshots.

To prevent the deletion of durability files, lock the data directory, and unlock
it when deletion should be allowed again:

```cypher
LOCK DATA DIRECTORY;
UNLOCK DATA DIRECTORY;
```

To show the status of the data directory, run:

```cypher
DATA DIRECTORY LOCK STATUS;
```

To encrypt the data directory, use
[LUKS](https://gitlab.com/cryptsetup/cryptsetup/) as it works with Memgraph out
of the box and is undetectable from the application's perspective, so it
shouldn't break any existing applications.

[![Related - How-to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/create-backup.md)

## Durability mechanisms

To configure the durability mechanisms, check their respective configuration
flags in the [configuration reference
guide](/memgraph/reference-guide/configuration#storage).

If you need help configuring Memgraph, check out the configuration [how-to
guide](/how-to-guides/config-logs.md).

### Write-ahead logging

Write-ahead logging (WAL) is a technique that provides **atomicity** and
**durability** to database systems. Each database modification is recorded in a
log file before being written to the DB, so the log file contains all
steps needed to reconstruct the DB's most recent state.

Memgraph has WAL enabled by default. To switch it on and off, use the boolean
`storage-wal-enabled` flag. For other WAL-related flags, check the [configuration
reference guide](/memgraph/reference-guide/configuration#storage).

WAL files are usually located at `/var/lib/memgraph/wal`.

### Snapshots

Snapshots provide a faster way to restore the state of your database. Memgraph
periodically takes snapshots during runtime. 
When a snapshot creation is
triggered, the entire data storage is written to the drive. Nodes and
relationships are divided into groups called batches.

On startup, the database state is recovered from the most recent snapshot file.
Memgraph can read the data and build the indices on multiple threads, using
batches as a parallelization unit: each thread will recover one batch at a time
until there are no unhandled batches.

This means the same batch size might not be suitable for every dataset. A
smaller dataset might require a smaller batch size to utilize a multi-threaded
processor, while bigger datasets might use bigger batches to minimize the
synchronization between the worker threads. Therefore, the size of batches and
the number of used threads [are
configurable](/memgraph/reference-guide/configuration#storage), similarly to
other durability-related settings.

The timestamp of the snapshot is compared with the latest update recorded in the
WAL file and, if the snapshot is less recent, the state of the DB will be
recovered using the WAL file.

Memgraph has snapshot creation enabled by default. You can configure the exact
snapshot creation behavior by [defining the relevant
flags](/memgraph/reference-guide/configuration#storage). Alternatively, you can
create a snapshot manually by running the following query:

```opencypher
CREATE SNAPSHOT;
```

Snapshot files are saved inside the `snapshots` directory located in the data
directory (`/var/lib/memgraph`).

:::caution
Snapshots and WAL files are presently not compatible between Memgraph versions.
:::

## Backup and restore

You can easily back up Memgraph by following a four-step process:

1. Lock the data directory with the `LOCK DATA DIRECTORY;` query.
2. Create a snapshot with the `CREATE SNAPSHOT;` query.
3. Copy the snapshot from the `snapshots` directory to a backup location.
4. Unlock the directory with the `UNLOCK DATA DIRECTORY;` query. 
Locking the data directory ensures that no files are deleted by the system.

To restore from a backup, you have two options:

1. Start an instance by adding a `-v ~/snapshots:/var/lib/memgraph/snapshots`
   flag to the `docker run` command, where `~/snapshots` is the path to the
   local directory with the backed-up snapshot, for example:

   ```
   docker run -p 7687:7687 -p 7444:7444 -v ~/snapshots:/var/lib/memgraph/snapshots memgraph/memgraph
   ```

2. Copy the backed-up snapshot file into the `snapshots` directory after creating the container and start the database. The commands should look like this:

   ```
   docker create -p 7687:7687 -p 7444:7444 -v ~/snapshots:/var/lib/memgraph/snapshots --name memgraphDB memgraph/memgraph
   tar -cf - sample_snapshot_file | docker cp -a - memgraphDB:/var/lib/memgraph/snapshots
   ```
   The `sample_snapshot_file` is the snapshot file you want to use to restore the data. Due to the nature of Docker file ownership, you need to use `tar` to stream the file over STDIN into the non-running container. It will allow you to change the ownership of the file to the `memgraph` user inside the container.

   After that, start the database with:
   ```
   docker start -a memgraphDB
   ```
   The `-a` flag attaches to the container's output so you can see the logs.

   Once Memgraph is started, change the snapshot directory ownership to the `memgraph` user by running the following command:
   ```
   docker exec -it -u 0 memgraphDB bash -c "chown memgraph:memgraph /var/lib/memgraph/snapshots"
   ```
   Otherwise, Memgraph will not be able to write future snapshot files and will fail.

You can easily back up Memgraph by following a four-step process:

1. Lock the data directory with the `LOCK DATA DIRECTORY;` query.
2. Create a snapshot with the `CREATE SNAPSHOT;` query.
3. Copy the snapshot from the `snapshots` directory to a backup location.
4. Unlock the directory with the `UNLOCK DATA DIRECTORY;` query. 
Locking the data directory ensures that no files are deleted by the system.

To restore from a backup:

1. Copy the backed-up snapshot into the `snapshots` directory.
2. Ensure that the snapshot file you want to use to restore the data is the only
   snapshot file in the `snapshots` directory and that the `wal` directory is
   empty.
3. Start the database.

Check out [a detailed guide](/how-to-guides/create-backup.md).

## Database dump

The database dump contains a record of the database state in the form of Cypher
queries. It's equivalent to the SQL dump in relational DBs.

You can run the queries constituting the dump to recreate the state of the DB as
it was at the time of the dump.

To dump the Memgraph DB, run the following query:

```opencypher
DUMP DATABASE;
```

If you are using Memgraph Lab, you can dump the database, that is, the queries
to recreate it, to a CYPHERL file in the `Import & Export` section of the Lab.

## Storage modes

Memgraph has the option to work in `IN_MEMORY_ANALYTICAL`,
`IN_MEMORY_TRANSACTIONAL` or `ON_DISK_TRANSACTIONAL` [storage
modes](/reference-guide/storage-modes.md).

Memgraph always starts in the `IN_MEMORY_TRANSACTIONAL` mode, in which it uses
periodic snapshots and write-ahead logging as durability mechanisms, and also
enables creating manual snapshots.

In the `IN_MEMORY_ANALYTICAL` mode, Memgraph offers no periodic snapshots or
write-ahead logging. Users can create a snapshot with the `CREATE SNAPSHOT;`
Cypher query. During the process of snapshot creation, other transactions will
be prevented from starting until the snapshot creation is completed.

In the `ON_DISK_TRANSACTIONAL` mode, durability is supported by RocksDB since it
keeps its own
[WAL](https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log-%28WAL%29) files.
Memgraph persists the metadata used in the implementation of the on-disk
storage. 
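As an illustration, the current mode can be inspected and switched at runtime with Cypher. The statements below reflect the documented syntax for recent Memgraph versions; verify them against the storage modes reference for your version, as switching modes may carry additional restrictions:

```opencypher
SHOW STORAGE INFO;
STORAGE MODE IN_MEMORY_ANALYTICAL;
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```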
## How to create backup

While running, Memgraph generates various files in its [data
directory](/docs/memgraph/reference-guide/backup), including the **durability
files**, that is, snapshots and WALs that contain Memgraph's data in a
recoverable format and are located in the `wal` and `snapshots` folders in the
data directory. On startup, Memgraph searches for previously saved durability
files and uses them to recreate the most recent database state.

When talking about the data directory in the context of backup and restore, we
are actually talking about two directories, `snapshots` and `wal`, which are
usually located in the `/var/lib/memgraph` directory.

Snapshots are created periodically, based on the value of the
`--storage-snapshot-interval-sec` configuration flag, as well as upon exit,
based on the `--storage-snapshot-on-exit` configuration flag.

You can configure the exact snapshot creation behavior [by defining the
relevant flags](/memgraph/reference-guide/configuration#storage). If you need
help adjusting the configuration, check out the [how-to guide on changing the
configuration](/how-to-guides/config-logs.md).

[![Related - Reference Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/backup.md)

### Create backup

Follow these steps to create a database backup:

1. **Create a snapshot**

   If necessary, create a snapshot of the current database state by running the
   following query in `mgconsole` or Memgraph Lab:

   ```cypher
   CREATE SNAPSHOT;
   ```

   The snapshot is saved in the `snapshots` directory of the data directory
   (`/var/lib/memgraph`).

2. **Lock the data directory**

   Durability files are deleted when an event is triggered, for example, when
   exceeding the maximum number of snapshots. 
   To disable this behavior, run the following query in `mgconsole` or Memgraph
   Lab:

   ```cypher
   LOCK DATA DIRECTORY;
   ```

3. **Copy files**

   Copy snapshot files (from the `snapshots` directory) and any additional WAL
   files (from the `wal` directory) to a backup location.

   If you've just created a snapshot file, there is no need to back up WAL files.

   For help copying the files from the Docker container, check out the [Working
   with Docker
   guide](/how-to-guides/work-with-docker.md#how-to-copy-files-from-and-to-a-docker-container).

4. **Unlock the data directory**

   Run the following query in `mgconsole` or Memgraph Lab to unlock the
   directory:

   ```cypher
   UNLOCK DATA DIRECTORY;
   ```

   Memgraph will delete the files that should have been deleted before locking
   and allow any future deletion of the durability files.

### Restore data

To restore data from a backup:

1. Empty the `wal` directory

   If you want to restore data only from the snapshot file, ensure that the
   `wal` directory is empty:

   - Find the container ID using the `docker ps` command, then enter the
     container using:

     ```
     docker exec -it CONTAINER_ID bash
     ```
   - Position yourself in the `/var/lib/memgraph/wal` directory and delete its
     contents with `rm *`.

2. Stop the instance using `docker stop CONTAINER_ID`.
3. Start the instance by adding a `-v ~/snapshots:/var/lib/memgraph/snapshots`
   flag to the `docker run` command, where `~/snapshots` is the path to the
   directory with the backed-up snapshot, for example:

   ```
   docker run -p 7687:7687 -p 7444:7444 -v ~/snapshots:/var/lib/memgraph/snapshots memgraph/memgraph
   ```
4. 
If you want to copy both WAL and snapshot files, start the instance by adding
   the `-v ~/snapshots:/var/lib/memgraph/snapshots -v ~/wal:/var/lib/memgraph/wal`
   flags to the `docker run` command, where `~/snapshots` is the path to the
   backed-up snapshot directory and `~/wal` is the path to the backed-up `wal`
   directory, for example:

   ```
   docker run -p 7687:7687 -p 7444:7444 -v ~/snapshots:/var/lib/memgraph/snapshots -v ~/wal:/var/lib/memgraph/wal memgraph/memgraph
   ```

1. Before running an instance, copy the backed-up snapshot into the `snapshots`
   directory, and optionally, copy the backed-up WAL files into the `wal`
   directory.
2. If you are restoring data only from the snapshot file, ensure that the file
   you want to use to restore the data is the only snapshot file in the
   `snapshots` directory and that the `wal` directory is empty. If you are
   restoring data from both the snapshot and WAL files, ensure they are the only
   files in the `snapshots` and `wal` directories.
3. Start the database.

\ No newline at end of file diff --git a/docs2/configuration/enabling-memgraph-enterprise.md b/docs2/configuration/enabling-memgraph-enterprise.md new file mode 100644 index 00000000000..e86ee2e87b0 --- /dev/null +++ b/docs2/configuration/enabling-memgraph-enterprise.md @@ -0,0 +1,33 @@
---
id: enabling-memgraph-enterprise
title: Enabling Memgraph Enterprise
sidebar_label: Enabling Memgraph Enterprise
---

Some of Memgraph's features are only available in the Enterprise Edition. They
are present in the same binary but protected by a license key.

If you're interested in Memgraph Enterprise, you need to fill out the following
[form](https://docs.google.com/forms/d/e/1FAIpQLSddH_XV000h58MhwJwwrUu2L3uTkejEDPqvstl6eMou_AW-yw/viewform), where one of the fields is the organization name. 
After getting your license key, set `organization.name` to the same
organization name you used for the license key and `enterprise.license` to the
license key you received, by running the following queries:

```
SET DATABASE SETTING 'organization.name' TO 'Organization';
SET DATABASE SETTING 'enterprise.license' TO 'License';
```

To check the set values, run:

```opencypher
SHOW DATABASE SETTING 'organization.name';
SHOW DATABASE SETTING 'enterprise.license';
```

or:

```opencypher
SHOW DATABASE SETTINGS;
```
\ No newline at end of file diff --git a/docs2/custom-query-modules/c/c-api.md b/docs2/custom-query-modules/c/c-api.md new file mode 100644 index 00000000000..2e1265e5b91 --- /dev/null +++ b/docs2/custom-query-modules/c/c-api.md @@ -0,0 +1,4259 @@
---
id: c-api
title: Query modules C API
sidebar_label: C API
slug: /reference-guide/query-modules/api/c-api
---

This is the API documentation for `mg_procedure.h`, which contains declarations
of all functions that can be used to implement a query module procedure. The
source file can be found in the Memgraph installation directory, under
`/usr/include/memgraph`.

:::tip

For an example of how to implement query modules in C, take a look at [the
example we
provided](/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example.md#c-api).

:::

:::tip

If you install any C modules after running Memgraph, you'll have to [load
them into Memgraph](../load-call-query-modules#loading-query-modules) or restart
Memgraph in order to use them.

:::

## Classes

| Name | Description |
| -------------- | -------------- |
| **[mgp_label](#mgp_label)** | Label of a vertex. |
| **[mgp_edge_type](#mgp_edge_type)** | Type of an edge. |
| **[mgp_property](#mgp_property)** | Reference to a named property value. |
| **[mgp_vertex_id](#mgp_vertex_id)** | ID of a vertex; valid during a single query execution. 
| +| **[mgp_edge_id](#mgp_edge_id)** | ID of an edge; valid during a single query execution. | +| **[mgp_date_parameters](#mgp_date_parameters)** | | +| **[mgp_local_time_parameters](#mgp_local_time_parameters)** | | +| **[mgp_local_date_time_parameters](#mgp_local_date_time_parameters)** | | +| **[mgp_duration_parameters](#mgp_duration_parameters)** | | + +## Types + +| | Name | +| -------------- | -------------- | +| enum| **[mgp_value_type](#enum-mgp-value-type)** { MGP_VALUE_TYPE_NULL, MGP_VALUE_TYPE_BOOL, MGP_VALUE_TYPE_INT, MGP_VALUE_TYPE_DOUBLE, MGP_VALUE_TYPE_STRING, MGP_VALUE_TYPE_LIST, MGP_VALUE_TYPE_MAP, MGP_VALUE_TYPE_VERTEX, MGP_VALUE_TYPE_EDGE, MGP_VALUE_TYPE_PATH, MGP_VALUE_TYPE_DATE, MGP_VALUE_TYPE_LOCAL_TIME, MGP_VALUE_TYPE_LOCAL_DATE_TIME, MGP_VALUE_TYPE_DURATION}
All available types that can be stored in a mgp_value. | +| typedef void(*)(struct mgp_list *, struct mgp_graph *, struct mgp_result *, struct mgp_memory *) | **[mgp_proc_cb](#typedef-mgp-proc-cb)**
Entry-point for a query module read procedure, invoked through openCypher. | +| typedef void(*)(struct mgp_list *, struct mgp_graph *, struct mgp_memory *) | **[mgp_proc_initializer](#typedef-mgp-proc-initializer)**
Initialization point for a query module read procedure, invoked before the procedure. | +| typedef void(*)() | **[mgp_proc_cleanup](#typedef-mgp-proc-cleanup)**
Cleanup for a query module read procedure. | +| typedef void(*)(struct mgp_messages *, struct mgp_graph *, struct mgp_result *, struct mgp_memory *) | **[mgp_trans_cb](#typedef-mgp-trans-cb)**
Entry-point for a module transformation, invoked through a stream transformation. | + +## Functions + +| | Name | +| -------------- | -------------- | +| enum [mgp_error](#variable-mgp-error) | **[mgp_alloc](#function-mgp-alloc)**(struct mgp_memory * memory, size_t size_in_bytes, void ** result)
Allocate a block of memory with given size in bytes. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_aligned_alloc](#function-mgp-aligned-alloc)**(struct mgp_memory * memory, size_t size_in_bytes, size_t alignment, void ** result)
Allocate an aligned block of memory with given size in bytes. | +| void | **[mgp_free](#function-mgp-free)**(struct mgp_memory * memory, void * ptr)
Deallocate an allocation from mgp_alloc or mgp_aligned_alloc. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_global_alloc](#function-mgp-global-alloc)**(size_t size_in_bytes, void ** result)
Allocate a global block of memory with given size in bytes. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_global_aligned_alloc](#function-mgp-global-aligned-alloc)**(size_t size_in_bytes, size_t alignment, void ** result)
Allocate an aligned global block of memory with given size in bytes. | +| void | **[mgp_global_free](#function-mgp-global-free)**(void * p)
Deallocate an allocation from mgp_global_alloc or mgp_global_aligned_alloc. | +| void | **[mgp_value_destroy](#function-mgp-value-destroy)**(struct mgp_value * val)
Free the memory used by the given mgp_value instance. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_null](#function-mgp-value-make-null)**(struct mgp_memory * memory, struct mgp_value ** result)
Construct a value representing `null` in openCypher. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_bool](#function-mgp-value-make-bool)**(int val, struct mgp_memory * memory, struct mgp_value ** result)
Construct a boolean value. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_int](#function-mgp-value-make-int)**(int64_t val, struct mgp_memory * memory, struct mgp_value ** result)
Construct an integer value. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_double](#function-mgp-value-make-double)**(double val, struct mgp_memory * memory, struct mgp_value ** result)
Construct a double floating point value. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_string](#function-mgp-value-make-string)**(const char * val, struct mgp_memory * memory, struct mgp_value ** result)
Construct a character string value from a NULL terminated string. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_list](#function-mgp-value-make-list)**(struct mgp_list * val, struct mgp_value ** result)
Create a mgp_value storing a mgp_list. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_map](#function-mgp-value-make-map)**(struct mgp_map * val, struct mgp_value ** result)
Create a mgp_value storing a mgp_map. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_vertex](#function-mgp-value-make-vertex)**(struct mgp_vertex * val, struct mgp_value ** result)
Create a mgp_value storing a mgp_vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_edge](#function-mgp-value-make-edge)**(struct mgp_edge * val, struct mgp_value ** result)
Create a mgp_value storing a mgp_edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_path](#function-mgp-value-make-path)**(struct mgp_path * val, struct mgp_value ** result)
Create a mgp_value storing a mgp_path. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_date](#function-mgp-value-make-date)**(struct mgp_date * val, struct mgp_value ** result)
Create a mgp_value storing a mgp_date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_local_time](#function-mgp-value-make-local-time)**(struct mgp_local_time * val, struct mgp_value ** result)
Create a mgp_value storing a mgp_local_time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_local_date_time](#function-mgp-value-make-local-date-time)**(struct mgp_local_date_time * val, struct mgp_value ** result)
Create a mgp_value storing a mgp_local_date_time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_make_duration](#function-mgp-value-make-duration)**(struct mgp_duration * val, struct mgp_value ** result)
Create a mgp_value storing a mgp_duration. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_type](#function-mgp-value-get-type)**(struct mgp_value * val, enum [mgp_value_type](#enum-mgp-value-type) * result)
Get the type of the value contained in mgp_value. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_null](#function-mgp-value-is-null)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value represents `null`. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_bool](#function-mgp-value-is-bool)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a boolean. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_int](#function-mgp-value-is-int)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores an integer. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_double](#function-mgp-value-is-double)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a double floating-point. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_string](#function-mgp-value-is-string)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a character string. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_list](#function-mgp-value-is-list)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a list of values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_map](#function-mgp-value-is-map)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a map of values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_vertex](#function-mgp-value-is-vertex)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_edge](#function-mgp-value-is-edge)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores an edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_path](#function-mgp-value-is-path)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a path. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_date](#function-mgp-value-is-date)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_local_time](#function-mgp-value-is-local-time)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_local_date_time](#function-mgp-value-is-local-date-time)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_is_duration](#function-mgp-value-is-duration)**(struct mgp_value * val, int * result)
Result is non-zero if the given mgp_value stores a duration. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_bool](#function-mgp-value-get-bool)**(struct mgp_value * val, int * result)
Get the contained boolean value. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_int](#function-mgp-value-get-int)**(struct mgp_value * val, int64_t * result)
Get the contained integer. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_double](#function-mgp-value-get-double)**(struct mgp_value * val, double * result)
Get the contained double floating-point. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_string](#function-mgp-value-get-string)**(struct mgp_value * val, const char ** result)
Get the contained character string. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_list](#function-mgp-value-get-list)**(struct mgp_value * val, struct mgp_list ** result)
Get the contained list of values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_map](#function-mgp-value-get-map)**(struct mgp_value * val, struct mgp_map ** result)
Get the contained map of values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_vertex](#function-mgp-value-get-vertex)**(struct mgp_value * val, struct mgp_vertex ** result)
Get the contained vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_edge](#function-mgp-value-get-edge)**(struct mgp_value * val, struct mgp_edge ** result)
Get the contained edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_path](#function-mgp-value-get-path)**(struct mgp_value * val, struct mgp_path ** result)
Get the contained path. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_date](#function-mgp-value-get-date)**(struct mgp_value * val, struct mgp_date ** result)
Get the contained date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_local_time](#function-mgp-value-get-local-time)**(struct mgp_value * val, struct mgp_local_time ** result)
Get the contained local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_local_date_time](#function-mgp-value-get-local-date-time)**(struct mgp_value * val, struct mgp_local_date_time ** result)
Get the contained local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_value_get_duration](#function-mgp-value-get-duration)**(struct mgp_value * val, struct mgp_duration ** result)
Get the contained duration. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_list_make_empty](#function-mgp-list-make-empty)**(size_t capacity, struct mgp_memory * memory, struct mgp_list ** result)
Create an empty list with given capacity. | +| void | **[mgp_list_destroy](#function-mgp-list-destroy)**(struct mgp_list * list)
Free the memory used by the given mgp_list and contained elements. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_list_append](#function-mgp-list-append)**(struct mgp_list * list, struct mgp_value * val)
Append a copy of mgp_value to mgp_list if capacity allows. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_list_append_extend](#function-mgp-list-append-extend)**(struct mgp_list * list, struct mgp_value * val)
Append a copy of mgp_value to mgp_list increasing capacity if needed. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_list_size](#function-mgp-list-size)**(struct mgp_list * list, size_t * result)
Get the number of elements stored in mgp_list. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_list_capacity](#function-mgp-list-capacity)**(struct mgp_list * list, size_t * result)
Get the total number of elements for which there's already allocated memory in mgp_list. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_list_at](#function-mgp-list-at)**(struct mgp_list * list, size_t index, struct mgp_value ** result)
Get the element in mgp_list at given position. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_map_make_empty](#function-mgp-map-make-empty)**(struct mgp_memory * memory, struct mgp_map ** result)
Create an empty map of character strings to mgp_value instances. | +| void | **[mgp_map_destroy](#function-mgp-map-destroy)**(struct mgp_map * map)
Free the memory used by the given mgp_map and contained items. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_map_insert](#function-mgp-map-insert)**(struct mgp_map * map, const char * key, struct mgp_value * value)
Insert a new mapping from a NULL terminated character string to a value. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_map_size](#function-mgp-map-size)**(struct mgp_map * map, size_t * result)
Get the number of items stored in mgp_map. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_map_at](#function-mgp-map-at)**(struct mgp_map * map, const char * key, struct mgp_value ** result)
Get the mapped mgp_value to the given character string. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_map_item_key](#function-mgp-map-item-key)**(struct mgp_map_item * item, const char ** result)
Get the key of the mapped item. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_map_item_value](#function-mgp-map-item-value)**(struct mgp_map_item * item, struct mgp_value ** result)
Get the value of the mapped item. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_map_iter_items](#function-mgp-map-iter-items)**(struct mgp_map * map, struct mgp_memory * memory, struct mgp_map_items_iterator ** result)
Start iterating over items contained in the given map. | +| void | **[mgp_map_items_iterator_destroy](#function-mgp-map-items-iterator-destroy)**(struct mgp_map_items_iterator * it)
Deallocate memory used by mgp_map_items_iterator. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_map_items_iterator_get](#function-mgp-map-items-iterator-get)**(struct mgp_map_items_iterator * it, struct mgp_map_item ** result)
Get the current item pointed to by the iterator. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_map_items_iterator_next](#function-mgp-map-items-iterator-next)**(struct mgp_map_items_iterator * it, struct mgp_map_item ** result)
Advance the iterator to the next item stored in map and return it. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_path_make_with_start](#function-mgp-path-make-with-start)**(struct mgp_vertex * vertex, struct mgp_memory * memory, struct mgp_path ** result)
Create a path with the copy of the given starting vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_path_copy](#function-mgp-path-copy)**(struct mgp_path * path, struct mgp_memory * memory, struct mgp_path ** result)
Copy a mgp_path. | +| void | **[mgp_path_destroy](#function-mgp-path-destroy)**(struct mgp_path * path)
Free the memory used by the given mgp_path and contained vertices and edges. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_path_expand](#function-mgp-path-expand)**(struct mgp_path * path, struct mgp_edge * edge)
Append an edge continuing from the last vertex on the path. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_path_size](#function-mgp-path-size)**(struct mgp_path * path, size_t * result)
Get the number of edges in a mgp_path. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_path_vertex_at](#function-mgp-path-vertex-at)**(struct mgp_path * path, size_t index, struct mgp_vertex ** result)
Get the vertex from a path at given index. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_path_edge_at](#function-mgp-path-edge-at)**(struct mgp_path * path, size_t index, struct mgp_edge ** result)
Get the edge from a path at given index. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_path_equal](#function-mgp-path-equal)**(struct mgp_path * p1, struct mgp_path * p2, int * result)
Result is non-zero if given paths are equal, otherwise 0. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_result_set_error_msg](#function-mgp-result-set-error-msg)**(struct mgp_result * res, const char * error_msg)
Set the error as the result of the procedure. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_result_new_record](#function-mgp-result-new-record)**(struct mgp_result * res, struct mgp_result_record ** result)
Create a new record for results. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_result_record_insert](#function-mgp-result-record-insert)**(struct mgp_result_record * record, const char * field_name, struct mgp_value * val)
Assign a value to a field in the given record. | +| void | **[mgp_properties_iterator_destroy](#function-mgp-properties-iterator-destroy)**(struct mgp_properties_iterator * it)
Free the memory used by a mgp_properties_iterator. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_properties_iterator_get](#function-mgp-properties-iterator-get)**(struct mgp_properties_iterator * it, struct [mgp_property](#mgp_property) ** result)
Get the current property pointed to by the iterator. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_properties_iterator_next](#function-mgp-properties-iterator-next)**(struct mgp_properties_iterator * it, struct [mgp_property](#mgp_property) ** result)
Advance the iterator to the next property and return it. | +| void | **[mgp_edges_iterator_destroy](#function-mgp-edges-iterator-destroy)**(struct mgp_edges_iterator * it)
Free the memory used by a mgp_edges_iterator. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_get_id](#function-mgp-vertex-get-id)**(struct mgp_vertex * v, struct [mgp_vertex_id](#mgp_vertex_id) * result)
Get the ID of given vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_underlying_graph_is_mutable](#function-mgp-vertex-underlying-graph-is-mutable)**(struct mgp_vertex * v, int * result)
Result is non-zero if the vertex can be modified. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_set_property](#function-mgp-vertex-set-property)**(struct mgp_vertex * v, const char * property_name, struct mgp_value * property_value)
Set the value of a property on a vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_add_label](#function-mgp-vertex-add-label)**(struct mgp_vertex * v, struct [mgp_label](#mgp_label) label)
Add the label to the vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_remove_label](#function-mgp-vertex-remove-label)**(struct mgp_vertex * v, struct [mgp_label](#mgp_label) label)
Remove the label from the vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_copy](#function-mgp-vertex-copy)**(struct mgp_vertex * v, struct mgp_memory * memory, struct mgp_vertex ** result)
Copy a mgp_vertex. | +| void | **[mgp_vertex_destroy](#function-mgp-vertex-destroy)**(struct mgp_vertex * v)
Free the memory used by a mgp_vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_equal](#function-mgp-vertex-equal)**(struct mgp_vertex * v1, struct mgp_vertex * v2, int * result)
Result is non-zero if given vertices are equal, otherwise 0. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_labels_count](#function-mgp-vertex-labels-count)**(struct mgp_vertex * v, size_t * result)
Get the number of labels a given vertex has. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_label_at](#function-mgp-vertex-label-at)**(struct mgp_vertex * v, size_t index, struct [mgp_label](#mgp_label) * result)
Get [mgp_label](#mgp_label) in mgp_vertex at given index. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_has_label](#function-mgp-vertex-has-label)**(struct mgp_vertex * v, struct [mgp_label](#mgp_label) label, int * result)
Result is non-zero if the given vertex has the given label. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_has_label_named](#function-mgp-vertex-has-label-named)**(struct mgp_vertex * v, const char * label_name, int * result)
Result is non-zero if the given vertex has a label with given name. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_get_property](#function-mgp-vertex-get-property)**(struct mgp_vertex * v, const char * property_name, struct mgp_memory * memory, struct mgp_value ** result)
Get a copy of a vertex property mapped to a given name. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_iter_properties](#function-mgp-vertex-iter-properties)**(struct mgp_vertex * v, struct mgp_memory * memory, struct mgp_properties_iterator ** result)
Start iterating over properties stored in the given vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_iter_in_edges](#function-mgp-vertex-iter-in-edges)**(struct mgp_vertex * v, struct mgp_memory * memory, struct mgp_edges_iterator ** result)
Start iterating over inbound edges of the given vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertex_iter_out_edges](#function-mgp-vertex-iter-out-edges)**(struct mgp_vertex * v, struct mgp_memory * memory, struct mgp_edges_iterator ** result)
Start iterating over outbound edges of the given vertex. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edges_iterator_underlying_graph_is_mutable](#function-mgp-edges-iterator-underlying-graph-is-mutable)**(struct mgp_edges_iterator * it, int * result)
Result is non-zero if the edges returned by this iterator can be modified. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edges_iterator_get](#function-mgp-edges-iterator-get)**(struct mgp_edges_iterator * it, struct mgp_edge ** result)
Get the current edge pointed to by the iterator. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edges_iterator_next](#function-mgp-edges-iterator-next)**(struct mgp_edges_iterator * it, struct mgp_edge ** result)
Advance the iterator to the next edge and return it. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_get_id](#function-mgp-edge-get-id)**(struct mgp_edge * e, struct [mgp_edge_id](#mgp_edge_id) * result)
Get the ID of given edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_underlying_graph_is_mutable](#function-mgp-edge-underlying-graph-is-mutable)**(struct mgp_edge * e, int * result)
Result is non-zero if the edge can be modified. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_copy](#function-mgp-edge-copy)**(struct mgp_edge * e, struct mgp_memory * memory, struct mgp_edge ** result)
Copy a mgp_edge. | +| void | **[mgp_edge_destroy](#function-mgp-edge-destroy)**(struct mgp_edge * e)
Free the memory used by a mgp_edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_equal](#function-mgp-edge-equal)**(struct mgp_edge * e1, struct mgp_edge * e2, int * result)
Result is non-zero if given edges are equal, otherwise 0. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_get_type](#function-mgp-edge-get-type)**(struct mgp_edge * e, struct [mgp_edge_type](#mgp_edge_type) * result)
Get the type of the given edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_get_from](#function-mgp-edge-get-from)**(struct mgp_edge * e, struct mgp_vertex ** result)
Get the source vertex of the given edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_get_to](#function-mgp-edge-get-to)**(struct mgp_edge * e, struct mgp_vertex ** result)
Get the destination vertex of the given edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_get_property](#function-mgp-edge-get-property)**(struct mgp_edge * e, const char * property_name, struct mgp_memory * memory, struct mgp_value ** result)
Get a copy of an edge property mapped to a given name. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_set_property](#function-mgp-edge-set-property)**(struct mgp_edge * e, const char * property_name, struct mgp_value * property_value)
Set the value of a property on an edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_edge_iter_properties](#function-mgp-edge-iter-properties)**(struct mgp_edge * e, struct mgp_memory * memory, struct mgp_properties_iterator ** result)
Start iterating over properties stored in the given edge. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_graph_get_vertex_by_id](#function-mgp-graph-get-vertex-by-id)**(struct mgp_graph * g, struct [mgp_vertex_id](#mgp_vertex_id) id, struct mgp_memory * memory, struct mgp_vertex ** result)
Get the vertex corresponding to given ID, or NULL if no such vertex exists. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_graph_is_mutable](#function-mgp-graph-is-mutable)**(struct mgp_graph * graph, int * result)
Result is non-zero if the graph can be modified. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_graph_create_vertex](#function-mgp-graph-create-vertex)**(struct mgp_graph * graph, struct mgp_memory * memory, struct mgp_vertex ** result)
Add a new vertex to the graph. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_graph_delete_vertex](#function-mgp-graph-delete-vertex)**(struct mgp_graph * graph, struct mgp_vertex * vertex)
Delete a vertex from the graph. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_graph_detach_delete_vertex](#function-mgp-graph-detach-delete-vertex)**(struct mgp_graph * graph, struct mgp_vertex * vertex)
Delete a vertex and all of its edges from the graph. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_graph_create_edge](#function-mgp-graph-create-edge)**(struct mgp_graph * graph, struct mgp_vertex * from, struct mgp_vertex * to, struct [mgp_edge_type](#mgp_edge_type) type, struct mgp_memory * memory, struct mgp_edge ** result)
Add a new directed edge of the specified type between the two vertices. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_graph_delete_edge](#function-mgp-graph-delete-edge)**(struct mgp_graph * graph, struct mgp_edge * edge)
Delete an edge from the graph. | +| void | **[mgp_vertices_iterator_destroy](#function-mgp-vertices-iterator-destroy)**(struct mgp_vertices_iterator * it)
Free the memory used by a mgp_vertices_iterator. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_graph_iter_vertices](#function-mgp-graph-iter-vertices)**(struct mgp_graph * g, struct mgp_memory * memory, struct mgp_vertices_iterator ** result)
Start iterating over vertices of the given graph. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertices_iterator_underlying_graph_is_mutable](#function-mgp-vertices-iterator-underlying-graph-is-mutable)**(struct mgp_vertices_iterator * it, int * result)
Result is non-zero if the vertices returned by this iterator can be modified. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertices_iterator_get](#function-mgp-vertices-iterator-get)**(struct mgp_vertices_iterator * it, struct mgp_vertex ** result)
Get the current vertex pointed to by the iterator. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_from_string](#function-mgp-date-from-string)**(const char * string, struct mgp_memory * memory, struct mgp_date ** date)
Create a date from a string following the ISO 8601 format. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_from_parameters](#function-mgp-date-from-parameters)**(struct [mgp_date_parameters](#mgp_date_parameters) * parameters, struct mgp_memory * memory, struct mgp_date ** date)
Create a date from [mgp_date_parameters](#mgp_date_parameters). | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_copy](#function-mgp-date-copy)**(struct mgp_date * date, struct mgp_memory * memory, struct mgp_date ** result)
Copy a mgp_date. | +| void | **[mgp_date_destroy](#function-mgp-date-destroy)**(struct mgp_date * date)
Free the memory used by a mgp_date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_equal](#function-mgp-date-equal)**(struct mgp_date * first, struct mgp_date * second, int * result)
Result is non-zero if given dates are equal, otherwise 0. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_get_year](#function-mgp-date-get-year)**(struct mgp_date * date, int * year)
Get the year property of the date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_get_month](#function-mgp-date-get-month)**(struct mgp_date * date, int * month)
Get the month property of the date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_get_day](#function-mgp-date-get-day)**(struct mgp_date * date, int * day)
Get the day property of the date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_timestamp](#function-mgp-date-timestamp)**(struct mgp_date * date, int64_t * timestamp)
Get the date as microseconds from Unix epoch. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_now](#function-mgp-date-now)**(struct mgp_memory * memory, struct mgp_date ** date)
Get the date representing current date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_add_duration](#function-mgp-date-add-duration)**(struct mgp_date * date, struct mgp_duration * dur, struct mgp_memory * memory, struct mgp_date ** result)
Add a duration to the date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_sub_duration](#function-mgp-date-sub-duration)**(struct mgp_date * date, struct mgp_duration * dur, struct mgp_memory * memory, struct mgp_date ** result)
Subtract a duration from the date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_date_diff](#function-mgp-date-diff)**(struct mgp_date * first, struct mgp_date * second, struct mgp_memory * memory, struct mgp_duration ** result)
Get a duration between two dates. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_from_string](#function-mgp-local-time-from-string)**(const char * string, struct mgp_memory * memory, struct mgp_local_time ** local_time)
Create a local time from a string following the ISO 8601 format. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_from_parameters](#function-mgp-local-time-from-parameters)**(struct [mgp_local_time_parameters](#mgp_local_time_parameters) * parameters, struct mgp_memory * memory, struct mgp_local_time ** local_time)
Create a local time from [mgp_local_time_parameters](#mgp_local_time_parameters). | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_copy](#function-mgp-local-time-copy)**(struct mgp_local_time * local_time, struct mgp_memory * memory, struct mgp_local_time ** result)
Copy a mgp_local_time. | +| void | **[mgp_local_time_destroy](#function-mgp-local-time-destroy)**(struct mgp_local_time * local_time)
Free the memory used by a mgp_local_time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_equal](#function-mgp-local-time-equal)**(struct mgp_local_time * first, struct mgp_local_time * second, int * result)
Result is non-zero if given local times are equal, otherwise 0. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_get_hour](#function-mgp-local-time-get-hour)**(struct mgp_local_time * local_time, int * hour)
Get the hour property of the local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_get_minute](#function-mgp-local-time-get-minute)**(struct mgp_local_time * local_time, int * minute)
Get the minute property of the local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_get_second](#function-mgp-local-time-get-second)**(struct mgp_local_time * local_time, int * second)
Get the second property of the local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_get_millisecond](#function-mgp-local-time-get-millisecond)**(struct mgp_local_time * local_time, int * millisecond)
Get the millisecond property of the local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_get_microsecond](#function-mgp-local-time-get-microsecond)**(struct mgp_local_time * local_time, int * microsecond)
Get the microsecond property of the local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_timestamp](#function-mgp-local-time-timestamp)**(struct mgp_local_time * local_time, int64_t * timestamp)
Get the local time as microseconds from midnight. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_now](#function-mgp-local-time-now)**(struct mgp_memory * memory, struct mgp_local_time ** local_time)
Get the local time representing current time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_add_duration](#function-mgp-local-time-add-duration)**(struct mgp_local_time * local_time, struct mgp_duration * dur, struct mgp_memory * memory, struct mgp_local_time ** result)
Add a duration to the local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_sub_duration](#function-mgp-local-time-sub-duration)**(struct mgp_local_time * local_time, struct mgp_duration * dur, struct mgp_memory * memory, struct mgp_local_time ** result)
Subtract a duration from the local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_time_diff](#function-mgp-local-time-diff)**(struct mgp_local_time * first, struct mgp_local_time * second, struct mgp_memory * memory, struct mgp_duration ** result)
Get a duration between two local times. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_from_string](#function-mgp-local-date-time-from-string)**(const char * string, struct mgp_memory * memory, struct mgp_local_date_time ** local_date_time)
Create a local date-time from a string following the ISO 8601 format. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_from_parameters](#function-mgp-local-date-time-from-parameters)**(struct [mgp_local_date_time_parameters](#mgp_local_date_time_parameters) * parameters, struct mgp_memory * memory, struct mgp_local_date_time ** local_date_time)
Create a local date-time from [mgp_local_date_time_parameters](#mgp_local_date_time_parameters). | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_copy](#function-mgp-local-date-time-copy)**(struct mgp_local_date_time * local_date_time, struct mgp_memory * memory, struct mgp_local_date_time ** result)
Copy a mgp_local_date_time. | +| void | **[mgp_local_date_time_destroy](#function-mgp-local-date-time-destroy)**(struct mgp_local_date_time * local_date_time)
Free the memory used by a mgp_local_date_time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_equal](#function-mgp-local-date-time-equal)**(struct mgp_local_date_time * first, struct mgp_local_date_time * second, int * result)
Result is non-zero if given local date-times are equal, otherwise 0. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_get_year](#function-mgp-local-date-time-get-year)**(struct mgp_local_date_time * local_date_time, int * year)
Get the year property of the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_get_month](#function-mgp-local-date-time-get-month)**(struct mgp_local_date_time * local_date_time, int * month)
Get the month property of the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_get_day](#function-mgp-local-date-time-get-day)**(struct mgp_local_date_time * local_date_time, int * day)
Get the day property of the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_get_hour](#function-mgp-local-date-time-get-hour)**(struct mgp_local_date_time * local_date_time, int * hour)
Get the hour property of the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_get_minute](#function-mgp-local-date-time-get-minute)**(struct mgp_local_date_time * local_date_time, int * minute)
Get the minute property of the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_get_second](#function-mgp-local-date-time-get-second)**(struct mgp_local_date_time * local_date_time, int * second)
Get the second property of the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_get_millisecond](#function-mgp-local-date-time-get-millisecond)**(struct mgp_local_date_time * local_date_time, int * millisecond)
Get the millisecond property of the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_get_microsecond](#function-mgp-local-date-time-get-microsecond)**(struct mgp_local_date_time * local_date_time, int * microsecond)
Get the microsecond property of the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_timestamp](#function-mgp-local-date-time-timestamp)**(struct mgp_local_date_time * local_date_time, int64_t * timestamp)
Get the local date-time as microseconds from Unix epoch. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_now](#function-mgp-local-date-time-now)**(struct mgp_memory * memory, struct mgp_local_date_time ** local_date_time)
Get the local date-time representing current date and time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_add_duration](#function-mgp-local-date-time-add-duration)**(struct mgp_local_date_time * local_date_time, struct mgp_duration * dur, struct mgp_memory * memory, struct mgp_local_date_time ** result)
Add a duration to the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_sub_duration](#function-mgp-local-date-time-sub-duration)**(struct mgp_local_date_time * local_date_time, struct mgp_duration * dur, struct mgp_memory * memory, struct mgp_local_date_time ** result)
Subtract a duration from the local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_local_date_time_diff](#function-mgp-local-date-time-diff)**(struct mgp_local_date_time * first, struct mgp_local_date_time * second, struct mgp_memory * memory, struct mgp_duration ** result)
Get a duration between two local date-times. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_duration_from_string](#function-mgp-duration-from-string)**(const char * string, struct mgp_memory * memory, struct mgp_duration ** duration)
Create a duration from a string following the ISO 8601 format. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_duration_from_parameters](#function-mgp-duration-from-parameters)**(struct [mgp_duration_parameters](#mgp_duration_parameters) * parameters, struct mgp_memory * memory, struct mgp_duration ** duration)
Create a duration from [mgp_duration_parameters](#mgp_duration_parameters). | +| enum [mgp_error](#variable-mgp-error) | **[mgp_duration_from_microseconds](#function-mgp-duration-from-microseconds)**(int64_t microseconds, struct mgp_memory * memory, struct mgp_duration ** duration)
Create a duration from microseconds. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_duration_copy](#function-mgp-duration-copy)**(struct mgp_duration * duration, struct mgp_memory * memory, struct mgp_duration ** result)
Copy a mgp_duration. | +| void | **[mgp_duration_destroy](#function-mgp-duration-destroy)**(struct mgp_duration * duration)
Free the memory used by a mgp_duration. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_duration_equal](#function-mgp-duration-equal)**(struct mgp_duration * first, struct mgp_duration * second, int * result)
Result is non-zero if given durations are equal, otherwise 0. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_duration_get_microseconds](#function-mgp-duration-get-microseconds)**(struct mgp_duration * duration, int64_t * microseconds)
Get the duration as microseconds. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_duration_neg](#function-mgp-duration-neg)**(struct mgp_duration * dur, struct mgp_memory * memory, struct mgp_duration ** result)
Apply unary minus operator to the duration. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_duration_add](#function-mgp-duration-add)**(struct mgp_duration * first, struct mgp_duration * second, struct mgp_memory * memory, struct mgp_duration ** result)
Add two durations. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_duration_sub](#function-mgp-duration-sub)**(struct mgp_duration * first, struct mgp_duration * second, struct mgp_memory * memory, struct mgp_duration ** result)
Subtract two durations. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_any](#function-mgp-type-any)**(struct mgp_type ** result)
Get the type representing any value that isn't `null`. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_bool](#function-mgp-type-bool)**(struct mgp_type ** result)
Get the type representing boolean values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_string](#function-mgp-type-string)**(struct mgp_type ** result)
Get the type representing character string values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_int](#function-mgp-type-int)**(struct mgp_type ** result)
Get the type representing integer values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_float](#function-mgp-type-float)**(struct mgp_type ** result)
Get the type representing floating-point values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_number](#function-mgp-type-number)**(struct mgp_type ** result)
Get the type representing any number value. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_map](#function-mgp-type-map)**(struct mgp_type ** result)
Get the type representing map values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_node](#function-mgp-type-node)**(struct mgp_type ** result)
Get the type representing graph node values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_relationship](#function-mgp-type-relationship)**(struct mgp_type ** result)
Get the type representing graph relationship values. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_path](#function-mgp-type-path)**(struct mgp_type ** result)
Get the type representing a graph path (walk) from one node to another. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_list](#function-mgp-type-list)**(struct mgp_type * element_type, struct mgp_type ** result)
Build a type representing a list of values of given `element_type`. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_date](#function-mgp-type-date)**(struct mgp_type ** result)
Get the type representing a date. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_local_time](#function-mgp-type-local-time)**(struct mgp_type ** result)
Get the type representing a local time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_local_date_time](#function-mgp-type-local-date-time)**(struct mgp_type ** result)
Get the type representing a local date-time. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_duration](#function-mgp-type-duration)**(struct mgp_type ** result)
Get the type representing a duration. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_type_nullable](#function-mgp-type-nullable)**(struct mgp_type * type, struct mgp_type ** result)
Build a type representing either a `null` value or a value of given `type`. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_module_add_read_procedure](#function-mgp-module-add-read-procedure)**(struct mgp_module * module, const char * name, [mgp_proc_cb](#typedef-mgp-proc-cb) cb, struct mgp_proc ** result)
Register a read-only procedure to a module. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_module_add_write_procedure](#function-mgp-module-add-write-procedure)**(struct mgp_module * module, const char * name, [mgp_proc_cb](#typedef-mgp-proc-cb) cb, struct mgp_proc ** result)
Register a writeable procedure to a module. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_module_add_batch_read_procedure](#function-mgp-module-add-read-procedure)**(struct mgp_module * module, const char * name, [mgp_proc_cb](#typedef-mgp-proc-cb) cb, [mgp_proc_initializer](#typedef-mgp-proc-initializer) initializer, [mgp_proc_cleanup](#typedef-mgp-proc-cleanup) cleanup, struct mgp_proc ** result)
Register a batch read-only procedure to a module. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_module_add_batch_write_procedure](#function-mgp-module-add-write-procedure)**(struct mgp_module * module, const char * name, [mgp_proc_cb](#typedef-mgp-proc-cb) cb, [mgp_proc_initializer](#typedef-mgp-proc-initializer) initializer, [mgp_proc_cleanup](#typedef-mgp-proc-cleanup) cleanup, struct mgp_proc ** result)
Register a batch writeable procedure to a module. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_proc_add_arg](#function-mgp-proc-add-arg)**(struct mgp_proc * proc, const char * name, struct mgp_type * type)
Add a required argument to a procedure. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_proc_add_opt_arg](#function-mgp-proc-add-opt-arg)**(struct mgp_proc * proc, const char * name, struct mgp_type * type, struct mgp_value * default_value)
Add an optional argument with a default value to a procedure. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_proc_add_result](#function-mgp-proc-add-result)**(struct mgp_proc * proc, const char * name, struct mgp_type * type)
Add a result field to a procedure. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_proc_add_deprecated_result](#function-mgp-proc-add-deprecated-result)**(struct mgp_proc * proc, const char * name, struct mgp_type * type)
Add a result field to a procedure and mark it as deprecated. | +| int | **[mgp_must_abort](#function-mgp-must-abort)**(struct mgp_graph * graph)
Return non-zero if the currently executing procedure should abort as soon as possible. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_message_payload](#function-mgp-message-payload)**(struct mgp_message * message, const char ** result)
Get the payload of the message; the payload is a byte array, not a NULL-terminated string. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_message_payload_size](#function-mgp-message-payload-size)**(struct mgp_message * message, size_t * result)
Get the payload size. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_message_topic_name](#function-mgp-message-topic-name)**(struct mgp_message * message, const char ** result)
Get the name of the topic. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_message_key](#function-mgp-message-key)**(struct mgp_message * message, const char ** result)
Get the key of mgp_message as a byte array. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_message_key_size](#function-mgp-message-key-size)**(struct mgp_message * message, size_t * result)
Get the key size of mgp_message. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_message_timestamp](#function-mgp-message-timestamp)**(struct mgp_message * message, int64_t * result)
Get the timestamp of mgp_message. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_messages_size](#function-mgp-messages-size)**(struct mgp_messages * message, size_t * result)
Get the number of messages contained in the mgp_messages list. The current implementation always returns without errors. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_messages_at](#function-mgp-messages-at)**(struct mgp_messages * message, size_t index, struct mgp_message ** result)
Get the message from a messages list at given index. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_module_add_transformation](#function-mgp-module-add-transformation)**(struct mgp_module * module, const char * name, [mgp_trans_cb](#typedef-mgp-trans-cb) cb)
Register a transformation with a module. | +| enum [mgp_error](#variable-mgp-error) | **[mgp_vertices_iterator_next](#function-mgp-vertices-iterator-next)**(struct mgp_vertices_iterator * it, struct mgp_vertex ** result)
Advance the iterator to the next vertex and return it. | +| enum [mgp_error](#variable-mgp-error)| **[mgp_log](#function-mgp-log)**(mgp_log_level log_level, const char *output)
Log a message on a certain level. | + +## Attributes + +| | Name | +| -------------- | -------------- | +| enum MGP_NODISCARD | **[mgp_error](#variable-mgp-error)**
All functions return an error code that can be used to figure out whether the API call was successful or not. | +| | **[MGP_ERROR_NO_ERROR](#variable-mgp-error-no-error)** | +| | **[MGP_ERROR_UNKNOWN_ERROR](#variable-mgp-error-unknown-error)** | +| | **[MGP_ERROR_UNABLE_TO_ALLOCATE](#variable-mgp-error-unable-to-allocate)** | +| | **[MGP_ERROR_INSUFFICIENT_BUFFER](#variable-mgp-error-insufficient-buffer)** | +| | **[MGP_ERROR_OUT_OF_RANGE](#variable-mgp-error-out-of-range)** | +| | **[MGP_ERROR_LOGIC_ERROR](#variable-mgp-error-logic-error)** | +| | **[MGP_ERROR_DELETED_OBJECT](#variable-mgp-error-deleted-object)** | +| | **[MGP_ERROR_INVALID_ARGUMENT](#variable-mgp-error-invalid-argument)** | +| | **[MGP_ERROR_KEY_ALREADY_EXISTS](#variable-mgp-error-key-already-exists)** | +| | **[MGP_ERROR_IMMUTABLE_OBJECT](#variable-mgp-error-immutable-object)** | +| | **[MGP_ERROR_VALUE_CONVERSION](#variable-mgp-error-value-conversion)** | +| | **[MGP_ERROR_SERIALIZATION_ERROR](#variable-mgp-error-serialization-error)** | + +## Macros + +**[MGP_NODISCARD](#define-mgp-nodiscard)** + +## Classes Documentation + +### mgp_label + +Label of a vertex. + +#### Public Attributes + +| Type | Name | +| -------------- | -------------- | +| const char * | **[name](#variable-name)**
Name of the label as a NULL terminated character string. | + +#### variable name {#variable-name} + +```cpp +const char * name; +``` + +Name of the label as a NULL terminated character string. + +### mgp_edge_type + +Type of an edge. + +#### Public Attributes + +| Type | Name | +| -------------- | -------------- | +| const char * | **[name](#variable-name)**
Name of the type as a NULL terminated character string. | + +#### variable name {#variable-name} + +```cpp +const char * name; +``` + +Name of the type as a NULL terminated character string. + +### mgp_property + +Reference to a named property value. + +#### Public Attributes + +| Type | Name | +| -------------- | -------------- | +| const char * | **[name](#variable-name)**
Name (key) of a property as a NULL terminated string. | +| struct mgp_value * | **[value](#variable-value)**
Value of the referenced property. | + +#### variable name {#variable-name} + +```cpp +const char * name; +``` + +Name (key) of a property as a NULL terminated string. + +#### variable value {#variable-value} + +```cpp +struct mgp_value * value; +``` + +Value of the referenced property. + +### mgp_vertex_id + +ID of a vertex valid during a single query execution. + +#### Public Attributes + +| Type | Name | +| -------------- | -------------- | +| int64_t | **[as_int](#variable-as-int)** | + +#### variable as_int {#variable-as-int} + +```cpp +int64_t as_int; +``` + +### mgp_edge_id + +ID of an edge; valid during a single query execution. + +#### Public Attributes + +| Type | Name | +| -------------- | -------------- | +| int64_t | **[as_int](#variable-as-int)** | + +#### variable as_int {#variable-as-int} + +```cpp +int64_t as_int; +``` + +### mgp_date_parameters + +#### Public Attributes + +| Type | Name | +| -------------- | -------------- | +| int | **[year](#variable-year)** | +| int | **[month](#variable-month)** | +| int | **[day](#variable-day)** | + +#### variable year {#variable-year} + +```cpp +int year; +``` + +#### variable month {#variable-month} + +```cpp +int month; +``` + +#### variable day {#variable-day} + +```cpp +int day; +``` + +### mgp_local_time_parameters + +#### Public Attributes + +| Type | Name | +| -------------- | -------------- | +| int | **[hour](#variable-hour)** | +| int | **[minute](#variable-minute)** | +| int | **[second](#variable-second)** | +| int | **[millisecond](#variable-millisecond)** | +| int | **[microsecond](#variable-microsecond)** | + +#### variable hour {#variable-hour} + +```cpp +int hour; +``` + + +#### variable minute {#variable-minute} + +```cpp +int minute; +``` + + +#### variable second {#variable-second} + +```cpp +int second; +``` + + +#### variable millisecond {#variable-millisecond} + +```cpp +int millisecond; +``` + + +#### variable microsecond {#variable-microsecond} + +```cpp +int microsecond; +``` + +### 
mgp_local_date_time_parameters + +#### Public Attributes + +| Type | Name | +| -------------- | -------------- | +| struct [mgp_date_parameters](#mgp_date_parameters) * | **[date_parameters](#variable-date-parameters)** | +| struct [mgp_local_time_parameters](#mgp_local_time_parameters) * | **[local_time_parameters](#variable-local-time-parameters)** | + +#### variable date_parameters {#variable-date-parameters} + +```cpp +struct mgp_date_parameters * date_parameters; +``` + + +#### variable local_time_parameters {#variable-local-time-parameters} + +```cpp +struct mgp_local_time_parameters * local_time_parameters; +``` + +### mgp_duration_parameters + +#### Public Attributes + +| | Name | +| -------------- | -------------- | +| double | **[day](#variable-day)** | +| double | **[hour](#variable-hour)** | +| double | **[minute](#variable-minute)** | +| double | **[second](#variable-second)** | +| double | **[millisecond](#variable-millisecond)** | +| double | **[microsecond](#variable-microsecond)** | + +#### variable day {#variable-day} + +```cpp +double day; +``` + +#### variable hour {#variable-hour} + +```cpp +double hour; +``` + +#### variable minute {#variable-minute} + +```cpp +double minute; +``` + +#### variable second {#variable-second} + +```cpp +double second; +``` + +#### variable millisecond {#variable-millisecond} + +```cpp +double millisecond; +``` + +#### variable microsecond {#variable-microsecond} + +```cpp +double microsecond; +``` + +## Types Documentation + +### enum mgp_value_type {#enum-mgp-value-type} + +All available types that can be stored in a mgp_value. 
+ +| Enumerator | +| ---------- | +| MGP_VALUE_TYPE_NULL | +| MGP_VALUE_TYPE_BOOL | +| MGP_VALUE_TYPE_INT | +| MGP_VALUE_TYPE_DOUBLE | +| MGP_VALUE_TYPE_STRING | +| MGP_VALUE_TYPE_LIST | +| MGP_VALUE_TYPE_MAP | +| MGP_VALUE_TYPE_VERTEX | +| MGP_VALUE_TYPE_EDGE | +| MGP_VALUE_TYPE_PATH | +| MGP_VALUE_TYPE_DATE | +| MGP_VALUE_TYPE_LOCAL_TIME | +| MGP_VALUE_TYPE_LOCAL_DATE_TIME | +| MGP_VALUE_TYPE_DURATION | + +### typedef mgp_proc_cb {#typedef-mgp-proc-cb} +### typedef mgp_proc_initializer {#typedef-mgp-proc-initializer} +### typedef mgp_proc_cleanup {#typedef-mgp-proc-cleanup} + +```cpp +typedef void(* mgp_proc_cb) (struct mgp_list *, struct mgp_graph *, struct mgp_result *, struct mgp_memory *); +``` + +Entry-point for a query module read procedure, invoked through openCypher. + +Passed in arguments will not live longer than the callback's execution. Therefore, you must not store them globally or use the passed in mgp_memory to allocate global resources. + + +### typedef mgp_trans_cb {#typedef-mgp-trans-cb} + +```cpp +typedef void(* mgp_trans_cb) (struct mgp_messages *, struct mgp_graph *, struct mgp_result *, struct mgp_memory *); +``` + +Entry-point for a module transformation, invoked through a stream transformation. + +Passed in arguments will not live longer than the callback's execution. Therefore, you must not store them globally or use the passed in mgp_memory to allocate global resources. + + + +## Functions Documentation + +### mgp_alloc {#function-mgp-alloc} + +```cpp +enum mgp_error mgp_alloc( + struct mgp_memory * memory, + size_t size_in_bytes, + void ** result +) +``` + +Allocate a block of memory with given size in bytes. + +Unlike malloc, this function is not thread-safe. `size_in_bytes` must be greater than 0. The resulting pointer must be freed with mgp_free. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to serve the requested allocation. 
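The allocation pattern above is shared by every fallible call in this API: pass an output pointer and check the returned error code. A minimal sketch, assuming Memgraph's `mg_procedure.h` is included and `memory` is the `mgp_memory` passed to the procedure callback:

```cpp
// Allocate, check the error code, use, and free: the pattern shared by all
// fallible calls. `memory` is the mgp_memory passed to the procedure callback.
int64_t *buffer = NULL;
if (mgp_alloc(memory, 10 * sizeof(int64_t), (void **)&buffer) != MGP_ERROR_NO_ERROR) {
  return;  // MGP_ERROR_UNABLE_TO_ALLOCATE: nothing was allocated.
}
buffer[0] = 42;
mgp_free(memory, buffer);  // Pair every mgp_alloc with mgp_free.
```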
+ + +### mgp_aligned_alloc {#function-mgp-aligned-alloc} + +```cpp +enum mgp_error mgp_aligned_alloc( + struct mgp_memory * memory, + size_t size_in_bytes, + size_t alignment, + void ** result +) +``` + +Allocate an aligned block of memory with given size in bytes. + +Unlike malloc and aligned_alloc, this function is not thread-safe. `size_in_bytes` must be greater than 0. `alignment` must be a power of 2 value. The resulting pointer must be freed with mgp_free. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to serve the requested allocation. + + +### mgp_free {#function-mgp-free} + +```cpp +void mgp_free( + struct mgp_memory * memory, + void * ptr +) +``` + +Deallocate an allocation from mgp_alloc or mgp_aligned_alloc. + +Unlike free, this function is not thread-safe. If `ptr` is NULL, this function does nothing. The behavior is undefined if `ptr` is not a value returned from a prior mgp_alloc or mgp_aligned_alloc call with the corresponding `memory`. + + +### mgp_global_alloc {#function-mgp-global-alloc} + +```cpp +enum mgp_error mgp_global_alloc( + size_t size_in_bytes, + void ** result +) +``` + +Allocate a global block of memory with given size in bytes. + +This function can be used to allocate global memory that persists beyond a single invocation of mgp_main. The resulting pointer must be freed with mgp_global_free. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to serve the requested allocation. + + +### mgp_global_aligned_alloc {#function-mgp-global-aligned-alloc} + +```cpp +enum mgp_error mgp_global_aligned_alloc( + size_t size_in_bytes, + size_t alignment, + void ** result +) +``` + +Allocate an aligned global block of memory with given size in bytes. + +This function can be used to allocate global memory that persists beyond a single invocation of mgp_main. The resulting pointer must be freed with mgp_global_free. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to serve the requested allocation. 
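Global allocations outlive a single callback invocation, which makes them suitable for state shared across calls. A sketch under the same assumptions; the `global_scores`, `init_scores`, and `release_scores` names are illustrative, only `mgp_global_alloc` and `mgp_global_free` are real API calls:

```cpp
// Hypothetical module-level state kept across procedure invocations.
static double *global_scores = NULL;

static enum mgp_error init_scores(size_t count) {
  return mgp_global_alloc(count * sizeof(double), (void **)&global_scores);
}

static void release_scores(void) {
  mgp_global_free(global_scores);  // NULL is a safe no-op.
  global_scores = NULL;
}
```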
+ + +### mgp_global_free {#function-mgp-global-free} + +```cpp +void mgp_global_free( + void * p +) +``` + +Deallocate an allocation from mgp_global_alloc or mgp_global_aligned_alloc. + +If `ptr` is NULL, this function does nothing. The behavior is undefined if `ptr` is not a value returned from a prior [mgp_global_alloc()](#function-mgp-global-alloc) or [mgp_global_aligned_alloc()](#function-mgp-global-aligned-alloc). + + +### mgp_value_destroy {#function-mgp-value-destroy} + +```cpp +void mgp_value_destroy( + struct mgp_value * val +) +``` + +Free the memory used by the given mgp_value instance. + +### mgp_value_make_null {#function-mgp-value-make-null} + +```cpp +enum mgp_error mgp_value_make_null( + struct mgp_memory * memory, + struct mgp_value ** result +) +``` + +Construct a value representing `null` in openCypher. + +You need to free the instance through mgp_value_destroy. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_bool {#function-mgp-value-make-bool} + +```cpp +enum mgp_error mgp_value_make_bool( + int val, + struct mgp_memory * memory, + struct mgp_value ** result +) +``` + +Construct a boolean value. + +Non-zero values represent `true`, while zero represents `false`. You need to free the instance through mgp_value_destroy. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_int {#function-mgp-value-make-int} + +```cpp +enum mgp_error mgp_value_make_int( + int64_t val, + struct mgp_memory * memory, + struct mgp_value ** result +) +``` + +Construct an integer value. + +You need to free the instance through mgp_value_destroy. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_double {#function-mgp-value-make-double} + +```cpp +enum mgp_error mgp_value_make_double( + double val, + struct mgp_memory * memory, + struct mgp_value ** result +) +``` + +Construct a double floating point value. 
+ +You need to free the instance through mgp_value_destroy. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_string {#function-mgp-value-make-string} + +```cpp +enum mgp_error mgp_value_make_string( + const char * val, + struct mgp_memory * memory, + struct mgp_value ** result +) +``` + +Construct a character string value from a NULL terminated string. + +You need to free the instance through mgp_value_destroy. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_list {#function-mgp-value-make-list} + +```cpp +enum mgp_error mgp_value_make_list( + struct mgp_list * val, + struct mgp_value ** result +) +``` + +Create a mgp_value storing a mgp_list. + +You need to free the instance through mgp_value_destroy. The ownership of the list is given to the created mgp_value and destroying the mgp_value will destroy the mgp_list. Therefore, if a mgp_value is successfully created you must not call mgp_list_destroy on the given list. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_map {#function-mgp-value-make-map} + +```cpp +enum mgp_error mgp_value_make_map( + struct mgp_map * val, + struct mgp_value ** result +) +``` + +Create a mgp_value storing a mgp_map. + +You need to free the instance through mgp_value_destroy. The ownership of the map is given to the created mgp_value and destroying the mgp_value will destroy the mgp_map. Therefore, if a mgp_value is successfully created you must not call mgp_map_destroy on the given map. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_vertex {#function-mgp-value-make-vertex} + +```cpp +enum mgp_error mgp_value_make_vertex( + struct mgp_vertex * val, + struct mgp_value ** result +) +``` + +Create a mgp_value storing a mgp_vertex. + +You need to free the instance through mgp_value_destroy. 
The ownership of the vertex is given to the created mgp_value and destroying the mgp_value will destroy the mgp_vertex. Therefore, if a mgp_value is successfully created you must not call mgp_vertex_destroy on the given vertex. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_edge {#function-mgp-value-make-edge} + +```cpp +enum mgp_error mgp_value_make_edge( + struct mgp_edge * val, + struct mgp_value ** result +) +``` + +Create a mgp_value storing a mgp_edge. + +You need to free the instance through mgp_value_destroy. The ownership of the edge is given to the created mgp_value and destroying the mgp_value will destroy the mgp_edge. Therefore, if a mgp_value is successfully created you must not call mgp_edge_destroy on the given edge. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_path {#function-mgp-value-make-path} + +```cpp +enum mgp_error mgp_value_make_path( + struct mgp_path * val, + struct mgp_value ** result +) +``` + +Create a mgp_value storing a mgp_path. + +You need to free the instance through mgp_value_destroy. The ownership of the path is given to the created mgp_value and destroying the mgp_value will destroy the mgp_path. Therefore, if a mgp_value is successfully created you must not call mgp_path_destroy on the given path. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_date {#function-mgp-value-make-date} + +```cpp +enum mgp_error mgp_value_make_date( + struct mgp_date * val, + struct mgp_value ** result +) +``` + +Create a mgp_value storing a mgp_date. + +You need to free the instance through mgp_value_destroy. The ownership of the date is transferred to the created mgp_value and destroying the mgp_value will destroy the mgp_date. Therefore, if a mgp_value is successfully created you must not call mgp_date_destroy on the given date. 
MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_local_time {#function-mgp-value-make-local-time} + +```cpp +enum mgp_error mgp_value_make_local_time( + struct mgp_local_time * val, + struct mgp_value ** result +) +``` + +Create a mgp_value storing a mgp_local_time. + +You need to free the instance through mgp_value_destroy. The ownership of the local time is transferred to the created mgp_value and destroying the mgp_value will destroy the mgp_local_time. Therefore, if a mgp_value is successfully created you must not call mgp_local_time_destroy on the given local time. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_local_date_time {#function-mgp-value-make-local-date-time} + +```cpp +enum mgp_error mgp_value_make_local_date_time( + struct mgp_local_date_time * val, + struct mgp_value ** result +) +``` + +Create a mgp_value storing a mgp_local_date_time. + +You need to free the instance through mgp_value_destroy. The ownership of the local date-time is transferred to the created mgp_value and destroying the mgp_value will destroy the mgp_local_date_time. Therefore, if a mgp_value is successfully created you must not call mgp_local_date_time_destroy on the given local date-time. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. + + +### mgp_value_make_duration {#function-mgp-value-make-duration} + +```cpp +enum mgp_error mgp_value_make_duration( + struct mgp_duration * val, + struct mgp_value ** result +) +``` + +Create a mgp_value storing a mgp_duration. + +You need to free the instance through mgp_value_destroy. The ownership of the duration is transferred to the created mgp_value and destroying the mgp_value will destroy the mgp_duration. Therefore, if a mgp_value is successfully created you must not call mgp_duration_destroy on the given duration. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_value. 
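All the `mgp_value_make_*` constructors above follow the same ownership rules, sketched here inside a procedure callback (assuming `mg_procedure.h` and a valid `memory`): a plain value is freed with `mgp_value_destroy`, while a container constructor such as `mgp_value_make_list` takes ownership of the wrapped container.

```cpp
// Minimal lifecycle of a plain value: construct, use, destroy.
struct mgp_value *answer = NULL;
if (mgp_value_make_int(42, memory, &answer) != MGP_ERROR_NO_ERROR) return;
/* ... use answer ... */
mgp_value_destroy(answer);

// Wrapping a container transfers ownership: destroying the value frees the list.
struct mgp_list *list = NULL;
struct mgp_value *wrapped = NULL;
if (mgp_list_make_empty(4, memory, &list) != MGP_ERROR_NO_ERROR) return;
if (mgp_value_make_list(list, &wrapped) != MGP_ERROR_NO_ERROR) {
  mgp_list_destroy(list);  // Ownership was not transferred on failure.
  return;
}
mgp_value_destroy(wrapped);  // Also destroys the wrapped mgp_list.
```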
+ + +### mgp_value_get_type {#function-mgp-value-get-type} + +```cpp +enum mgp_error mgp_value_get_type( + struct mgp_value * val, + enum mgp_value_type * result +) +``` + +Get the type of the value contained in mgp_value. + +Current implementation always returns without errors. + + +### mgp_value_is_null {#function-mgp-value-is-null} + +```cpp +enum mgp_error mgp_value_is_null( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value represents `null`. + +Current implementation always returns without errors. + + +### mgp_value_is_bool {#function-mgp-value-is-bool} + +```cpp +enum mgp_error mgp_value_is_bool( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a boolean. + +Current implementation always returns without errors. + + +### mgp_value_is_int {#function-mgp-value-is-int} + +```cpp +enum mgp_error mgp_value_is_int( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores an integer. + +Current implementation always returns without errors. + + +### mgp_value_is_double {#function-mgp-value-is-double} + +```cpp +enum mgp_error mgp_value_is_double( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a double floating-point. + +Current implementation always returns without errors. + + +### mgp_value_is_string {#function-mgp-value-is-string} + +```cpp +enum mgp_error mgp_value_is_string( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a character string. + +Current implementation always returns without errors. + + +### mgp_value_is_list {#function-mgp-value-is-list} + +```cpp +enum mgp_error mgp_value_is_list( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a list of values. + +Current implementation always returns without errors. 
+ + +### mgp_value_is_map {#function-mgp-value-is-map} + +```cpp +enum mgp_error mgp_value_is_map( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a map of values. + +Current implementation always returns without errors. + + +### mgp_value_is_vertex {#function-mgp-value-is-vertex} + +```cpp +enum mgp_error mgp_value_is_vertex( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a vertex. + +Current implementation always returns without errors. + + +### mgp_value_is_edge {#function-mgp-value-is-edge} + +```cpp +enum mgp_error mgp_value_is_edge( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores an edge. + +Current implementation always returns without errors. + + +### mgp_value_is_path {#function-mgp-value-is-path} + +```cpp +enum mgp_error mgp_value_is_path( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a path. + +Current implementation always returns without errors. + + +### mgp_value_is_date {#function-mgp-value-is-date} + +```cpp +enum mgp_error mgp_value_is_date( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a date. + +Current implementation always returns without errors. + + +### mgp_value_is_local_time {#function-mgp-value-is-local-time} + +```cpp +enum mgp_error mgp_value_is_local_time( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a local time. + +Current implementation always returns without errors. + + +### mgp_value_is_local_date_time {#function-mgp-value-is-local-date-time} + +```cpp +enum mgp_error mgp_value_is_local_date_time( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a local date-time. + +Current implementation always returns without errors. 
+ + +### mgp_value_is_duration {#function-mgp-value-is-duration} + +```cpp +enum mgp_error mgp_value_is_duration( + struct mgp_value * val, + int * result +) +``` + +Result is non-zero if the given mgp_value stores a duration. + +Current implementation always returns without errors. + + +### mgp_value_get_bool {#function-mgp-value-get-bool} + +```cpp +enum mgp_error mgp_value_get_bool( + struct mgp_value * val, + int * result +) +``` + +Get the contained boolean value. + +Non-zero values represent `true`, while zero represents `false`. Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_int {#function-mgp-value-get-int} + +```cpp +enum mgp_error mgp_value_get_int( + struct mgp_value * val, + int64_t * result +) +``` + +Get the contained integer. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_double {#function-mgp-value-get-double} + +```cpp +enum mgp_error mgp_value_get_double( + struct mgp_value * val, + double * result +) +``` + +Get the contained double floating-point. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_string {#function-mgp-value-get-string} + +```cpp +enum mgp_error mgp_value_get_string( + struct mgp_value * val, + const char ** result +) +``` + +Get the contained character string. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_list {#function-mgp-value-get-list} + +```cpp +enum mgp_error mgp_value_get_list( + struct mgp_value * val, + struct mgp_list ** result +) +``` + +Get the contained list of values. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. 
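Because the `mgp_value_get_*` functions below have undefined results for mismatched types, a type check usually precedes extraction. A sketch, assuming `val` is a valid `mgp_value`:

```cpp
// Defensive extraction: mgp_value_get_int is only meaningful if the value
// actually stores an integer, so check the type first.
int is_int = 0;
int64_t number = 0;
mgp_value_is_int(val, &is_int);  // Always succeeds in the current implementation.
if (is_int) {
  mgp_value_get_int(val, &number);
}
```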
+ + +### mgp_value_get_map {#function-mgp-value-get-map} + +```cpp +enum mgp_error mgp_value_get_map( + struct mgp_value * val, + struct mgp_map ** result +) +``` + +Get the contained map of values. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_vertex {#function-mgp-value-get-vertex} + +```cpp +enum mgp_error mgp_value_get_vertex( + struct mgp_value * val, + struct mgp_vertex ** result +) +``` + +Get the contained vertex. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_edge {#function-mgp-value-get-edge} + +```cpp +enum mgp_error mgp_value_get_edge( + struct mgp_value * val, + struct mgp_edge ** result +) +``` + +Get the contained edge. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_path {#function-mgp-value-get-path} + +```cpp +enum mgp_error mgp_value_get_path( + struct mgp_value * val, + struct mgp_path ** result +) +``` + +Get the contained path. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_date {#function-mgp-value-get-date} + +```cpp +enum mgp_error mgp_value_get_date( + struct mgp_value * val, + struct mgp_date ** result +) +``` + +Get the contained date. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_local_time {#function-mgp-value-get-local-time} + +```cpp +enum mgp_error mgp_value_get_local_time( + struct mgp_value * val, + struct mgp_local_time ** result +) +``` + +Get the contained local time. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. 
+ + +### mgp_value_get_local_date_time {#function-mgp-value-get-local-date-time} + +```cpp +enum mgp_error mgp_value_get_local_date_time( + struct mgp_value * val, + struct mgp_local_date_time ** result +) +``` + +Get the contained local date-time. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_value_get_duration {#function-mgp-value-get-duration} + +```cpp +enum mgp_error mgp_value_get_duration( + struct mgp_value * val, + struct mgp_duration ** result +) +``` + +Get the contained duration. + +Result is undefined if mgp_value does not contain the expected type. Current implementation always returns without errors. + + +### mgp_list_make_empty {#function-mgp-list-make-empty} + +```cpp +enum mgp_error mgp_list_make_empty( + size_t capacity, + struct mgp_memory * memory, + struct mgp_list ** result +) +``` + +Create an empty list with given capacity. + +You need to free the created instance with mgp_list_destroy. The created list will have allocated enough memory for `capacity` elements of mgp_value, but it will not contain any elements. Therefore, mgp_list_size will return 0. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_list. + + +### mgp_list_destroy {#function-mgp-list-destroy} + +```cpp +void mgp_list_destroy( + struct mgp_list * list +) +``` + +Free the memory used by the given mgp_list and contained elements. + +### mgp_list_append {#function-mgp-list-append} + +```cpp +enum mgp_error mgp_list_append( + struct mgp_list * list, + struct mgp_value * val +) +``` + +Append a copy of mgp_value to mgp_list if capacity allows. + +The list copies the given value and therefore does not take ownership of the original value. You still need to call mgp_value_destroy to free the original value. Return MGP_ERROR_INSUFFICIENT_BUFFER if there's no capacity. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_value. 
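A sketch of filling a list within its preallocated capacity, assuming `mg_procedure.h` and a valid `memory`; note that `mgp_list_append` copies the value, so the original must still be destroyed by the caller:

```cpp
// Fill a list within its preallocated capacity; each append copies the value.
struct mgp_list *list = NULL;
if (mgp_list_make_empty(3, memory, &list) != MGP_ERROR_NO_ERROR) return;
for (int64_t i = 0; i < 3; ++i) {
  struct mgp_value *elem = NULL;
  if (mgp_value_make_int(i, memory, &elem) != MGP_ERROR_NO_ERROR) break;
  mgp_list_append(list, elem);  // Within capacity, so no MGP_ERROR_INSUFFICIENT_BUFFER.
  mgp_value_destroy(elem);      // The list stores its own copy.
}
mgp_list_destroy(list);         // Frees the list and the copied elements.
```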
+ + +### mgp_list_append_extend {#function-mgp-list-append-extend} + +```cpp +enum mgp_error mgp_list_append_extend( + struct mgp_list * list, + struct mgp_value * val +) +``` + +Append a copy of mgp_value to mgp_list increasing capacity if needed. + +The list copies the given value and therefore does not take ownership of the original value. You still need to call mgp_value_destroy to free the original value. In case of a capacity change, the previously contained elements will move in memory and any references to them will be invalid. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_value. + + +### mgp_list_size {#function-mgp-list-size} + +```cpp +enum mgp_error mgp_list_size( + struct mgp_list * list, + size_t * result +) +``` + +Get the number of elements stored in mgp_list. + +Current implementation always returns without errors. + + +### mgp_list_capacity {#function-mgp-list-capacity} + +```cpp +enum mgp_error mgp_list_capacity( + struct mgp_list * list, + size_t * result +) +``` + +Get the total number of elements for which there's already allocated memory in mgp_list. + +Current implementation always returns without errors. + + +### mgp_list_at {#function-mgp-list-at} + +```cpp +enum mgp_error mgp_list_at( + struct mgp_list * list, + size_t index, + struct mgp_value ** result +) +``` + +Get the element in mgp_list at given position. + +MGP_ERROR_OUT_OF_RANGE is returned if the index is not within mgp_list_size. + + +### mgp_map_make_empty {#function-mgp-map-make-empty} + +```cpp +enum mgp_error mgp_map_make_empty( + struct mgp_memory * memory, + struct mgp_map ** result +) +``` + +Create an empty map of character strings to mgp_value instances. + +You need to free the created instance with mgp_map_destroy. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_map. 
+ + +### mgp_map_destroy {#function-mgp-map-destroy} + +```cpp +void mgp_map_destroy( + struct mgp_map * map +) +``` + +Free the memory used by the given mgp_map and contained items. + +### mgp_map_insert {#function-mgp-map-insert} + +```cpp +enum mgp_error mgp_map_insert( + struct mgp_map * map, + const char * key, + struct mgp_value * value +) +``` + +Insert a new mapping from a NULL terminated character string to a value. + +If a mapping with the same key already exists, it is _not_ replaced. In case of insertion, both the string and the value are copied into the map. Therefore, the map does not take ownership of the original key nor value, so you still need to free their memory explicitly. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate for insertion. Return MGP_ERROR_KEY_ALREADY_EXISTS if a previous mapping already exists. + + +### mgp_map_size {#function-mgp-map-size} + +```cpp +enum mgp_error mgp_map_size( + struct mgp_map * map, + size_t * result +) +``` + +Get the number of items stored in mgp_map. + +Current implementation always returns without errors. + + +### mgp_map_at {#function-mgp-map-at} + +```cpp +enum mgp_error mgp_map_at( + struct mgp_map * map, + const char * key, + struct mgp_value ** result +) +``` + +Get the mgp_value mapped to the given character string. + +Result is NULL if no mapping exists. + + +### mgp_map_item_key {#function-mgp-map-item-key} + +```cpp +enum mgp_error mgp_map_item_key( + struct mgp_map_item * item, + const char ** result +) +``` + +Get the key of the mapped item. + +### mgp_map_item_value {#function-mgp-map-item-value} + +```cpp +enum mgp_error mgp_map_item_value( + struct mgp_map_item * item, + struct mgp_value ** result +) +``` + +Get the value of the mapped item.
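The map calls above combine as follows. A sketch assuming `mg_procedure.h` and a valid `memory`; the key `"name"` and value are illustrative:

```cpp
// Build a one-entry map; both key and value are copied on insertion.
struct mgp_map *map = NULL;
struct mgp_value *name = NULL;
if (mgp_map_make_empty(memory, &map) != MGP_ERROR_NO_ERROR) return;
if (mgp_value_make_string("Alice", memory, &name) == MGP_ERROR_NO_ERROR) {
  if (mgp_map_insert(map, "name", name) == MGP_ERROR_KEY_ALREADY_EXISTS) {
    // The existing mapping is kept; the map is unchanged.
  }
  mgp_value_destroy(name);  // The map holds its own copy.
}
mgp_map_destroy(map);
```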
+ +### mgp_map_iter_items {#function-mgp-map-iter-items} + +```cpp +enum mgp_error mgp_map_iter_items( + struct mgp_map * map, + struct mgp_memory * memory, + struct mgp_map_items_iterator ** result +) +``` + +Start iterating over items contained in the given map. + +The resulting mgp_map_items_iterator needs to be deallocated with mgp_map_items_iterator_destroy. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_map_items_iterator. + + +### mgp_map_items_iterator_destroy {#function-mgp-map-items-iterator-destroy} + +```cpp +void mgp_map_items_iterator_destroy( + struct mgp_map_items_iterator * it +) +``` + +Deallocate memory used by mgp_map_items_iterator. + +### mgp_map_items_iterator_get {#function-mgp-map-items-iterator-get} + +```cpp +enum mgp_error mgp_map_items_iterator_get( + struct mgp_map_items_iterator * it, + struct mgp_map_item ** result +) +``` + +Get the current item pointed to by the iterator. + +When the mgp_map_items_iterator_next is invoked, the returned pointer to mgp_map_item becomes invalid. On the other hand, pointers obtained with mgp_map_item_key and mgp_map_item_value remain valid throughout the lifetime of a map. Therefore, you can store the key as well as the value before, and use them after invoking mgp_map_items_iterator_next. Result is NULL if the end of the iteration has been reached. + + +### mgp_map_items_iterator_next {#function-mgp-map-items-iterator-next} + +```cpp +enum mgp_error mgp_map_items_iterator_next( + struct mgp_map_items_iterator * it, + struct mgp_map_item ** result +) +``` + +Advance the iterator to the next item stored in map and return it. + +The previous pointer obtained through mgp_map_items_iterator_get will be invalidated, but the pointers to key and value will remain valid. Result is NULL if the end of the iteration has been reached. 
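A typical iteration loop, as a sketch assuming `map` and `memory` are valid: `mgp_map_items_iterator_get` fetches the current item and `mgp_map_items_iterator_next` advances, with NULL signalling the end of the iteration.

```cpp
// Walk all items of `map`; item pointers are invalidated by each *_next call,
// but the key and value pointers remain valid for the lifetime of the map.
struct mgp_map_items_iterator *it = NULL;
if (mgp_map_iter_items(map, memory, &it) != MGP_ERROR_NO_ERROR) return;
struct mgp_map_item *item = NULL;
for (mgp_map_items_iterator_get(it, &item); item != NULL;
     mgp_map_items_iterator_next(it, &item)) {
  const char *key = NULL;
  struct mgp_value *value = NULL;
  mgp_map_item_key(item, &key);
  mgp_map_item_value(item, &value);
  /* ... use key and value ... */
}
mgp_map_items_iterator_destroy(it);  // The iterator must always be freed.
```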
+ + +### mgp_path_make_with_start {#function-mgp-path-make-with-start} + +```cpp +enum mgp_error mgp_path_make_with_start( + struct mgp_vertex * vertex, + struct mgp_memory * memory, + struct mgp_path ** result +) +``` + +Create a path with the copy of the given starting vertex. + +You need to free the created instance with mgp_path_destroy. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_path. + + +### mgp_path_copy {#function-mgp-path-copy} + +```cpp +enum mgp_error mgp_path_copy( + struct mgp_path * path, + struct mgp_memory * memory, + struct mgp_path ** result +) +``` + +Copy a mgp_path. + +Returned pointer must be freed with mgp_path_destroy. MGP_ERROR_UNABLE_TO_ALLOCATE is returned if unable to allocate a mgp_path. + + +### mgp_path_destroy {#function-mgp-path-destroy} + +```cpp +void mgp_path_destroy( + struct mgp_path * path +) +``` + +Free the memory used by the given mgp_path and contained vertices and edges. + +### mgp_path_expand {#function-mgp-path-expand} + +```cpp +enum mgp_error mgp_path_expand( + struct mgp_path * path, + struct mgp_edge * edge +) +``` + +Append an edge continuing from the last vertex on the path. + +The edge is copied into the path. Therefore, the path does not take ownership of the original edge, so you still need to free the edge memory explicitly. The last vertex on the path will become the other endpoint of the given edge, as continued from the current last vertex. Return MGP_ERROR_LOGIC_ERROR if the current last vertex in the path is not part of the given edge. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for path extension. + + +### mgp_path_size {#function-mgp-path-size} + +```cpp +enum mgp_error mgp_path_size( + struct mgp_path * path, + size_t * result +) +``` + +Get the number of edges in a mgp_path. + +Current implementation always returns without errors. 
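A sketch of building a path, assuming `vertex` and `edge` are valid objects obtained from the graph and that `edge` touches `vertex`:

```cpp
// Start a path at `vertex` and try to extend it over `edge`; expansion fails
// with MGP_ERROR_LOGIC_ERROR if `edge` does not touch the current last vertex.
struct mgp_path *path = NULL;
if (mgp_path_make_with_start(vertex, memory, &path) != MGP_ERROR_NO_ERROR) return;
if (mgp_path_expand(path, edge) == MGP_ERROR_NO_ERROR) {
  size_t edge_count = 0;
  mgp_path_size(path, &edge_count);  // 1 after one successful expansion.
}
mgp_path_destroy(path);  // Frees the path and its copied vertices and edges.
```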
+
+
+### mgp_path_vertex_at {#function-mgp-path-vertex-at}
+
+```cpp
+enum mgp_error mgp_path_vertex_at(
+ struct mgp_path * path,
+ size_t index,
+ struct mgp_vertex ** result
+)
+```
+
+Get the vertex from a path at given index.
+
+The valid index range is [0, mgp_path_size]. MGP_ERROR_OUT_OF_RANGE is returned if index is out of range.
+
+
+### mgp_path_edge_at {#function-mgp-path-edge-at}
+
+```cpp
+enum mgp_error mgp_path_edge_at(
+ struct mgp_path * path,
+ size_t index,
+ struct mgp_edge ** result
+)
+```
+
+Get the edge from a path at given index.
+
+The valid index range is [0, mgp_path_size - 1]. MGP_ERROR_OUT_OF_RANGE is returned if index is out of range.
+
+
+### mgp_path_equal {#function-mgp-path-equal}
+
+```cpp
+enum mgp_error mgp_path_equal(
+ struct mgp_path * p1,
+ struct mgp_path * p2,
+ int * result
+)
+```
+
+Result is non-zero if given paths are equal, otherwise 0.
+
+### mgp_result_set_error_msg {#function-mgp-result-set-error-msg}
+
+```cpp
+enum mgp_error mgp_result_set_error_msg(
+ struct mgp_result * res,
+ const char * error_msg
+)
+```
+
+Set the error as the result of the procedure.
+
+Return MGP_ERROR_UNABLE_TO_ALLOCATE if there's no memory for copying the error message.
+
+
+### mgp_result_new_record {#function-mgp-result-new-record}
+
+```cpp
+enum mgp_error mgp_result_new_record(
+ struct mgp_result * res,
+ struct mgp_result_record ** result
+)
+```
+
+Create a new record for results.
+
+The previously obtained mgp_result_record pointer is no longer valid, and you must not use it. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_result_record.
+
+
+### mgp_result_record_insert {#function-mgp-result-record-insert}
+
+```cpp
+enum mgp_error mgp_result_record_insert(
+ struct mgp_result_record * record,
+ const char * field_name,
+ struct mgp_value * val
+)
+```
+
+Assign a value to a field in the given record. 
+
+Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory to copy the mgp_value to mgp_result_record. Return MGP_ERROR_OUT_OF_RANGE if there is no field named `field_name`. Return MGP_ERROR_LOGIC_ERROR if `val` does not satisfy the type of the field named `field_name`.
+
+
+### mgp_properties_iterator_destroy {#function-mgp-properties-iterator-destroy}
+
+```cpp
+void mgp_properties_iterator_destroy(
+ struct mgp_properties_iterator * it
+)
+```
+
+Free the memory used by a mgp_properties_iterator.
+
+### mgp_properties_iterator_get {#function-mgp-properties-iterator-get}
+
+```cpp
+enum mgp_error mgp_properties_iterator_get(
+ struct mgp_properties_iterator * it,
+ struct mgp_property ** result
+)
+```
+
+Get the current property pointed to by the iterator.
+
+When the mgp_properties_iterator_next is invoked, the previous [mgp_property](#mgp_property) is invalidated and its value must not be used. Result is NULL if the end of the iteration has been reached.
+
+
+### mgp_properties_iterator_next {#function-mgp-properties-iterator-next}
+
+```cpp
+enum mgp_error mgp_properties_iterator_next(
+ struct mgp_properties_iterator * it,
+ struct mgp_property ** result
+)
+```
+
+Advance the iterator to the next property and return it.
+
+The previous [mgp_property](#mgp_property) obtained through mgp_properties_iterator_get will be invalidated, and you must not use its value. Result is NULL if the end of the iteration has been reached. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a [mgp_property](#mgp_property).
+
+
+### mgp_edges_iterator_destroy {#function-mgp-edges-iterator-destroy}
+
+```cpp
+void mgp_edges_iterator_destroy(
+ struct mgp_edges_iterator * it
+)
+```
+
+Free the memory used by a mgp_edges_iterator.
+
+### mgp_vertex_get_id {#function-mgp-vertex-get-id}
+
+```cpp
+enum mgp_error mgp_vertex_get_id(
+ struct mgp_vertex * v,
+ struct mgp_vertex_id * result
+)
+```
+
+Get the ID of the given vertex. 
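+
+
+The mgp_result functions are typically used together when streaming rows out of a procedure. A minimal, hypothetical sketch (assuming a `mgp_result *res` and a `mgp_value *value` matching the declared type of a result field named "out"):
+
+```cpp
+struct mgp_result_record *record = NULL;
+if (mgp_result_new_record(res, &record) != MGP_ERROR_NO_ERROR) {
+  mgp_result_set_error_msg(res, "Unable to allocate a result record.");
+  return;
+}
+// Returns MGP_ERROR_OUT_OF_RANGE if no field "out" was declared, and
+// MGP_ERROR_LOGIC_ERROR if `value` does not match the field's type.
+mgp_result_record_insert(record, "out", value);
+```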
+ +### mgp_vertex_underlying_graph_is_mutable {#function-mgp-vertex-underlying-graph-is-mutable} + +```cpp +enum mgp_error mgp_vertex_underlying_graph_is_mutable( + struct mgp_vertex * v, + int * result +) +``` + +Result is non-zero if the vertex can be modified. + +The mutability of the vertex is the same as the graph which it is part of. If a vertex is immutable, then edges cannot be created or deleted, properties and labels cannot be set or removed and all of the returned edges will be immutable also. Current implementation always returns without errors. + + +### mgp_vertex_set_property {#function-mgp-vertex-set-property} + +```cpp +enum mgp_error mgp_vertex_set_property( + struct mgp_vertex * v, + const char * property_name, + struct mgp_value * property_value +) +``` + +Set the value of a property on a vertex. + +When the value is `null`, then the property is removed from the vertex. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for storing the property. Return MGP_ERROR_IMMUTABLE_OBJECT if `v` is immutable. Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. Return MGP_ERROR_SERIALIZATION_ERROR if `v` has been modified by another transaction. Return MGP_ERROR_VALUE_CONVERSION if `property_value` is vertex, edge or path. + + +### mgp_vertex_add_label {#function-mgp-vertex-add-label} + +```cpp +enum mgp_error mgp_vertex_add_label( + struct mgp_vertex * v, + struct mgp_label label +) +``` + +Add the label to the vertex. + +If the vertex already has the label, this function does nothing. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for storing the label. Return MGP_ERROR_IMMUTABLE_OBJECT if `v` is immutable. Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. Return MGP_ERROR_SERIALIZATION_ERROR if `v` has been modified by another transaction. 
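+
+
+Since mutation calls can fail for several distinct reasons, checking the returned mgp_error matters. A hedged sketch (assuming `v`, `label` and a prepared `mgp_value *flag_value` from the surrounding procedure):
+
+```cpp
+int is_mutable = 0;
+mgp_vertex_underlying_graph_is_mutable(v, &is_mutable);
+if (is_mutable) {
+  switch (mgp_vertex_set_property(v, "visited", flag_value)) {
+    case MGP_ERROR_SERIALIZATION_ERROR:
+      // another transaction modified `v`; the caller should retry
+      break;
+    default:
+      break;
+  }
+  mgp_vertex_add_label(v, label);  // a no-op if the label is already present
+}
+```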
+ + +### mgp_vertex_remove_label {#function-mgp-vertex-remove-label} + +```cpp +enum mgp_error mgp_vertex_remove_label( + struct mgp_vertex * v, + struct mgp_label label +) +``` + +Remove the label from the vertex. + +If the vertex doesn't have the label, this function does nothing. Return MGP_ERROR_IMMUTABLE_OBJECT if `v` is immutable. Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. Return MGP_ERROR_SERIALIZATION_ERROR if `v` has been modified by another transaction. + + +### mgp_vertex_copy {#function-mgp-vertex-copy} + +```cpp +enum mgp_error mgp_vertex_copy( + struct mgp_vertex * v, + struct mgp_memory * memory, + struct mgp_vertex ** result +) +``` + +Copy a mgp_vertex. + +Resulting pointer must be freed with mgp_vertex_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_vertex. + + +### mgp_vertex_destroy {#function-mgp-vertex-destroy} + +```cpp +void mgp_vertex_destroy( + struct mgp_vertex * v +) +``` + +Free the memory used by a mgp_vertex. + +### mgp_vertex_equal {#function-mgp-vertex-equal} + +```cpp +enum mgp_error mgp_vertex_equal( + struct mgp_vertex * v1, + struct mgp_vertex * v2, + int * result +) +``` + +Result is non-zero if given vertices are equal, otherwise 0. + +### mgp_vertex_labels_count {#function-mgp-vertex-labels-count} + +```cpp +enum mgp_error mgp_vertex_labels_count( + struct mgp_vertex * v, + size_t * result +) +``` + +Get the number of labels a given vertex has. + +Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. + + +### mgp_vertex_label_at {#function-mgp-vertex-label-at} + +```cpp +enum mgp_error mgp_vertex_label_at( + struct mgp_vertex * v, + size_t index, + struct mgp_label * result +) +``` + +Get [mgp_label](#mgp_label) in mgp_vertex at given index. + +Return MGP_ERROR_OUT_OF_RANGE if the index is out of range. Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. 
+ + +### mgp_vertex_has_label {#function-mgp-vertex-has-label} + +```cpp +enum mgp_error mgp_vertex_has_label( + struct mgp_vertex * v, + struct mgp_label label, + int * result +) +``` + +Result is non-zero if the given vertex has the given label. + +Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. + + +### mgp_vertex_has_label_named {#function-mgp-vertex-has-label-named} + +```cpp +enum mgp_error mgp_vertex_has_label_named( + struct mgp_vertex * v, + const char * label_name, + int * result +) +``` + +Result is non-zero if the given vertex has a label with given name. + +Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. + + +### mgp_vertex_get_property {#function-mgp-vertex-get-property} + +```cpp +enum mgp_error mgp_vertex_get_property( + struct mgp_vertex * v, + const char * property_name, + struct mgp_memory * memory, + struct mgp_value ** result +) +``` + +Get a copy of a vertex property mapped to a given name. + +Resulting value must be freed with mgp_value_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_value. Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. + + +### mgp_vertex_iter_properties {#function-mgp-vertex-iter-properties} + +```cpp +enum mgp_error mgp_vertex_iter_properties( + struct mgp_vertex * v, + struct mgp_memory * memory, + struct mgp_properties_iterator ** result +) +``` + +Start iterating over properties stored in the given vertex. + +The properties of the vertex are copied when the iterator is created, therefore later changes won't affect them. The resulting mgp_properties_iterator needs to be deallocated with mgp_properties_iterator_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_properties_iterator. Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. 
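+
+
+The property iterator works on a snapshot, so the loop below sees the properties as they were when the iterator was created. Illustrative sketch only (assumes `v` and `memory` from the surrounding procedure):
+
+```cpp
+struct mgp_properties_iterator *it = NULL;
+if (mgp_vertex_iter_properties(v, memory, &it) != MGP_ERROR_NO_ERROR) {
+  return;
+}
+struct mgp_property *prop = NULL;
+mgp_properties_iterator_get(it, &prop);
+while (prop != NULL) {
+  // `prop` is invalidated by the next call, so copy anything needed first.
+  mgp_properties_iterator_next(it, &prop);
+}
+mgp_properties_iterator_destroy(it);
+```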
+
+
+### mgp_vertex_iter_in_edges {#function-mgp-vertex-iter-in-edges}
+
+```cpp
+enum mgp_error mgp_vertex_iter_in_edges(
+ struct mgp_vertex * v,
+ struct mgp_memory * memory,
+ struct mgp_edges_iterator ** result
+)
+```
+
+Start iterating over inbound edges of the given vertex. When the first parameter to a procedure is a projected graph, iterating will start over the inbound edges of the given vertex in the projected graph.
+
+The connection information of the vertex is copied when the iterator is created, therefore later creation or deletion of edges won't affect the iterated edges; however, property changes on the edges will be visible. The resulting mgp_edges_iterator needs to be deallocated with mgp_edges_iterator_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_edges_iterator. Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted.
+
+
+### mgp_vertex_iter_out_edges {#function-mgp-vertex-iter-out-edges}
+
+```cpp
+enum mgp_error mgp_vertex_iter_out_edges(
+ struct mgp_vertex * v,
+ struct mgp_memory * memory,
+ struct mgp_edges_iterator ** result
+)
+```
+
+Start iterating over outbound edges of the given vertex. When the first parameter to a procedure is a projected graph, iterating will start over the outbound edges of the given vertex in the projected graph.
+
+The connection information of the vertex is copied when the iterator is created, therefore later creation or deletion of edges won't affect the iterated edges; however, property changes on the edges will be visible. The resulting mgp_edges_iterator needs to be deallocated with mgp_edges_iterator_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_edges_iterator. Return MGP_ERROR_DELETED_OBJECT if `v` has been deleted. 
+ + +### mgp_edges_iterator_underlying_graph_is_mutable {#function-mgp-edges-iterator-underlying-graph-is-mutable} + +```cpp +enum mgp_error mgp_edges_iterator_underlying_graph_is_mutable( + struct mgp_edges_iterator * it, + int * result +) +``` + +Result is non-zero if the edges returned by this iterator can be modified. + +The mutability of the mgp_edges_iterator is the same as the graph which it belongs to. Current implementation always returns without errors. + + +### mgp_edges_iterator_get {#function-mgp-edges-iterator-get} + +```cpp +enum mgp_error mgp_edges_iterator_get( + struct mgp_edges_iterator * it, + struct mgp_edge ** result +) +``` + +Get the current edge pointed to by the iterator. + +When the mgp_edges_iterator_next is invoked, the previous mgp_edge is invalidated and its value must not be used. Result is NULL if the end of the iteration has been reached. + + +### mgp_edges_iterator_next {#function-mgp-edges-iterator-next} + +```cpp +enum mgp_error mgp_edges_iterator_next( + struct mgp_edges_iterator * it, + struct mgp_edge ** result +) +``` + +Advance the iterator to the next edge and return it. + +The previous mgp_edge obtained through mgp_edges_iterator_get will be invalidated, and you must not use its value. Result is NULL if the end of the iteration has been reached. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_edge. + + +### mgp_edge_get_id {#function-mgp-edge-get-id} + +```cpp +enum mgp_error mgp_edge_get_id( + struct mgp_edge * e, + struct mgp_edge_id * result +) +``` + +Get the ID of given edge. + +### mgp_edge_underlying_graph_is_mutable {#function-mgp-edge-underlying-graph-is-mutable} + +```cpp +enum mgp_error mgp_edge_underlying_graph_is_mutable( + struct mgp_edge * e, + int * result +) +``` + +Result is non-zero if the edge can be modified. + +The mutability of the edge is the same as the graph which it is part of. 
If an edge is immutable, properties cannot be set or removed and all of the returned vertices will also be immutable. Current implementation always returns without errors.
+
+
+### mgp_edge_copy {#function-mgp-edge-copy}
+
+```cpp
+enum mgp_error mgp_edge_copy(
+ struct mgp_edge * e,
+ struct mgp_memory * memory,
+ struct mgp_edge ** result
+)
+```
+
+Copy a mgp_edge.
+
+Resulting pointer must be freed with mgp_edge_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_edge.
+
+
+### mgp_edge_destroy {#function-mgp-edge-destroy}
+
+```cpp
+void mgp_edge_destroy(
+ struct mgp_edge * e
+)
+```
+
+Free the memory used by a mgp_edge.
+
+### mgp_edge_equal {#function-mgp-edge-equal}
+
+```cpp
+enum mgp_error mgp_edge_equal(
+ struct mgp_edge * e1,
+ struct mgp_edge * e2,
+ int * result
+)
+```
+
+Result is non-zero if given edges are equal, otherwise 0.
+
+### mgp_edge_get_type {#function-mgp-edge-get-type}
+
+```cpp
+enum mgp_error mgp_edge_get_type(
+ struct mgp_edge * e,
+ struct mgp_edge_type * result
+)
+```
+
+Get the type of the given edge.
+
+### mgp_edge_get_from {#function-mgp-edge-get-from}
+
+```cpp
+enum mgp_error mgp_edge_get_from(
+ struct mgp_edge * e,
+ struct mgp_vertex ** result
+)
+```
+
+Get the source vertex of the given edge.
+
+The resulting vertex is valid only as long as the edge is valid and must not be used afterwards. Current implementation always returns without errors.
+
+
+### mgp_edge_get_to {#function-mgp-edge-get-to}
+
+```cpp
+enum mgp_error mgp_edge_get_to(
+ struct mgp_edge * e,
+ struct mgp_vertex ** result
+)
+```
+
+Get the destination vertex of the given edge.
+
+The resulting vertex is valid only as long as the edge is valid and must not be used afterwards. Current implementation always returns without errors. 
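+
+
+Combining the edge iterator with the accessors above, a sketch of walking the outbound edges of a vertex and reading their endpoints (hypothetical procedure code; assumes `v` and `memory`):
+
+```cpp
+struct mgp_edges_iterator *it = NULL;
+if (mgp_vertex_iter_out_edges(v, memory, &it) != MGP_ERROR_NO_ERROR) {
+  return;
+}
+struct mgp_edge *edge = NULL;
+mgp_edges_iterator_get(it, &edge);
+while (edge != NULL) {
+  struct mgp_vertex *to = NULL;
+  mgp_edge_get_to(edge, &to);  // valid only while `edge` is valid
+  mgp_edges_iterator_next(it, &edge);  // invalidates the previous edge
+}
+mgp_edges_iterator_destroy(it);
+```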
+
+
+### mgp_edge_get_property {#function-mgp-edge-get-property}
+
+```cpp
+enum mgp_error mgp_edge_get_property(
+ struct mgp_edge * e,
+ const char * property_name,
+ struct mgp_memory * memory,
+ struct mgp_value ** result
+)
+```
+
+Get a copy of an edge property mapped to a given name.
+
+Resulting value must be freed with mgp_value_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_value. Return MGP_ERROR_DELETED_OBJECT if `e` has been deleted.
+
+
+### mgp_edge_set_property {#function-mgp-edge-set-property}
+
+```cpp
+enum mgp_error mgp_edge_set_property(
+ struct mgp_edge * e,
+ const char * property_name,
+ struct mgp_value * property_value
+)
+```
+
+Set the value of a property on an edge.
+
+When the value is `null`, then the property is removed from the edge. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for storing the property. Return MGP_ERROR_IMMUTABLE_OBJECT if `e` is immutable. Return MGP_ERROR_DELETED_OBJECT if `e` has been deleted. Return MGP_ERROR_LOGIC_ERROR if properties on edges are disabled. Return MGP_ERROR_SERIALIZATION_ERROR if `e` has been modified by another transaction. Return MGP_ERROR_VALUE_CONVERSION if `property_value` is vertex, edge or path.
+
+
+### mgp_edge_iter_properties {#function-mgp-edge-iter-properties}
+
+```cpp
+enum mgp_error mgp_edge_iter_properties(
+ struct mgp_edge * e,
+ struct mgp_memory * memory,
+ struct mgp_properties_iterator ** result
+)
+```
+
+Start iterating over properties stored in the given edge.
+
+The properties of the edge are copied when the iterator is created, therefore later changes won't affect them. Resulting mgp_properties_iterator needs to be deallocated with mgp_properties_iterator_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_properties_iterator. Return MGP_ERROR_DELETED_OBJECT if `e` has been deleted. 
+ + +### mgp_graph_get_vertex_by_id {#function-mgp-graph-get-vertex-by-id} + +```cpp +enum mgp_error mgp_graph_get_vertex_by_id( + struct mgp_graph * g, + struct mgp_vertex_id id, + struct mgp_memory * memory, + struct mgp_vertex ** result +) +``` + +Get the vertex corresponding to given ID, or NULL if no such vertex exists. When the first parameter to a procedure is a projected graph, the vertex must also exist in the projected graph. + +Resulting vertex must be freed using mgp_vertex_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the vertex. + + +### mgp_graph_is_mutable {#function-mgp-graph-is-mutable} + +```cpp +enum mgp_error mgp_graph_is_mutable( + struct mgp_graph * graph, + int * result +) +``` + +Result is non-zero if the graph can be modified. + +If a graph is immutable, then vertices cannot be created or deleted, and all of the returned vertices will be immutable also. The same applies for edges. Current implementation always returns without errors. + + +### mgp_graph_create_vertex {#function-mgp-graph-create-vertex} + +```cpp +enum mgp_error mgp_graph_create_vertex( + struct mgp_graph * graph, + struct mgp_memory * memory, + struct mgp_vertex ** result +) +``` + +Add a new vertex to the graph. When the first parameter to a procedure is a projected graph, the vertex is also added to the projected graph view. + +Resulting vertex must be freed using mgp_vertex_destroy. Return MGP_ERROR_IMMUTABLE_OBJECT if `graph` is immutable. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_vertex. + + +### mgp_graph_delete_vertex {#function-mgp-graph-delete-vertex} + +```cpp +enum mgp_error mgp_graph_delete_vertex( + struct mgp_graph * graph, + struct mgp_vertex * vertex +) +``` + +Delete a vertex from the graph. When the first parameter to a procedure is a projected graph, the vertex must also exist in the projected graph. + +Return MGP_ERROR_IMMUTABLE_OBJECT if `graph` is immutable. Return MGP_ERROR_LOGIC_ERROR if `vertex` has edges. 
Return MGP_ERROR_SERIALIZATION_ERROR if `vertex` has been modified by another transaction. + + +### mgp_graph_detach_delete_vertex {#function-mgp-graph-detach-delete-vertex} + +```cpp +enum mgp_error mgp_graph_detach_delete_vertex( + struct mgp_graph * graph, + struct mgp_vertex * vertex +) +``` + +Delete a vertex and all of its edges from the graph. When the first parameter to a procedure is a projected graph, such an operation is not possible. + +Return MGP_ERROR_IMMUTABLE_OBJECT if `graph` is immutable. Return MGP_ERROR_SERIALIZATION_ERROR if `vertex` has been modified by another transaction. + + +### mgp_graph_create_edge {#function-mgp-graph-create-edge} + +```cpp +enum mgp_error mgp_graph_create_edge( + struct mgp_graph * graph, + struct mgp_vertex * from, + struct mgp_vertex * to, + struct mgp_edge_type type, + struct mgp_memory * memory, + struct mgp_edge ** result +) +``` + +Add a new directed edge between the two vertices with the specified label. When the first parameter is a projected graph, it will create a new directed edge with the specified label only if both vertices are a part of the projected graph. + +Resulting edge must be freed using mgp_edge_destroy. Return MGP_ERROR_IMMUTABLE_OBJECT if `graph` is immutable. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_edge. Return MGP_ERROR_DELETED_OBJECT if `from` or `to` has been deleted. Return MGP_ERROR_SERIALIZATION_ERROR if `from` or `to` has been modified by another transaction. + + +### mgp_graph_delete_edge {#function-mgp-graph-delete-edge} + +```cpp +enum mgp_error mgp_graph_delete_edge( + struct mgp_graph * graph, + struct mgp_edge * edge +) +``` + +Delete an edge from the graph. When the first parameter to a procedure is a projected graph, the edge must also exist in the projected graph. + +Return MGP_ERROR_IMMUTABLE_OBJECT if `graph` is immutable. Return MGP_ERROR_SERIALIZATION_ERROR if `edge`, its source or destination vertex has been modified by another transaction. 
+ + +### mgp_vertices_iterator_destroy {#function-mgp-vertices-iterator-destroy} + +```cpp +void mgp_vertices_iterator_destroy( + struct mgp_vertices_iterator * it +) +``` + +Free the memory used by a mgp_vertices_iterator. + +### mgp_graph_iter_vertices {#function-mgp-graph-iter-vertices} + +```cpp +enum mgp_error mgp_graph_iter_vertices( + struct mgp_graph * g, + struct mgp_memory * memory, + struct mgp_vertices_iterator ** result +) +``` + +Start iterating over vertices of the given graph. + +Resulting mgp_vertices_iterator needs to be deallocated with mgp_vertices_iterator_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_vertices_iterator. + + +### mgp_vertices_iterator_underlying_graph_is_mutable {#function-mgp-vertices-iterator-underlying-graph-is-mutable} + +```cpp +enum mgp_error mgp_vertices_iterator_underlying_graph_is_mutable( + struct mgp_vertices_iterator * it, + int * result +) +``` + +Result is non-zero if the vertices returned by this iterator can be modified. + +The mutability of the mgp_vertices_iterator is the same as the graph which it belongs to. Current implementation always returns without errors. + + +### mgp_vertices_iterator_get {#function-mgp-vertices-iterator-get} + +```cpp +enum mgp_error mgp_vertices_iterator_get( + struct mgp_vertices_iterator * it, + struct mgp_vertex ** result +) +``` + +Get the current vertex pointed to by the iterator. + +When the mgp_vertices_iterator_next is invoked, the previous mgp_vertex is invalidated and its value must not be used. Result is NULL if the end of the iteration has been reached. + + +### mgp_date_from_string {#function-mgp-date-from-string} + +```cpp +enum mgp_error mgp_date_from_string( + const char * string, + struct mgp_memory * memory, + struct mgp_date ** date +) +``` + +Create a date from a string following the ISO 8601 format. + +Resulting date must be freed with mgp_date_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the string cannot be parsed correctly. 
Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_date. + + +### mgp_date_from_parameters {#function-mgp-date-from-parameters} + +```cpp +enum mgp_error mgp_date_from_parameters( + struct mgp_date_parameters * parameters, + struct mgp_memory * memory, + struct mgp_date ** date +) +``` + +Create a date from mgp_date_parameter. + +Resulting date must be freed with mgp_date_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the parameters cannot be parsed correctly. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_date. + + +### mgp_date_copy {#function-mgp-date-copy} + +```cpp +enum mgp_error mgp_date_copy( + struct mgp_date * date, + struct mgp_memory * memory, + struct mgp_date ** result +) +``` + +Copy a mgp_date. + +Resulting pointer must be freed with mgp_date_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_date. + + +### mgp_date_destroy {#function-mgp-date-destroy} + +```cpp +void mgp_date_destroy( + struct mgp_date * date +) +``` + +Free the memory used by a mgp_date. + +### mgp_date_equal {#function-mgp-date-equal} + +```cpp +enum mgp_error mgp_date_equal( + struct mgp_date * first, + struct mgp_date * second, + int * result +) +``` + +Result is non-zero if given dates are equal, otherwise 0. + +### mgp_date_get_year {#function-mgp-date-get-year} + +```cpp +enum mgp_error mgp_date_get_year( + struct mgp_date * date, + int * year +) +``` + +Get the year property of the date. + +### mgp_date_get_month {#function-mgp-date-get-month} + +```cpp +enum mgp_error mgp_date_get_month( + struct mgp_date * date, + int * month +) +``` + +Get the month property of the date. + +### mgp_date_get_day {#function-mgp-date-get-day} + +```cpp +enum mgp_error mgp_date_get_day( + struct mgp_date * date, + int * day +) +``` + +Get the day property of the date. 
+
+### mgp_date_timestamp {#function-mgp-date-timestamp}
+
+```cpp
+enum mgp_error mgp_date_timestamp(
+ struct mgp_date * date,
+ int64_t * timestamp
+)
+```
+
+Get the date as microseconds from Unix epoch.
+
+### mgp_date_now {#function-mgp-date-now}
+
+```cpp
+enum mgp_error mgp_date_now(
+ struct mgp_memory * memory,
+ struct mgp_date ** date
+)
+```
+
+Get the date representing the current date.
+
+Resulting date must be freed with mgp_date_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_date.
+
+
+### mgp_date_add_duration {#function-mgp-date-add-duration}
+
+```cpp
+enum mgp_error mgp_date_add_duration(
+ struct mgp_date * date,
+ struct mgp_duration * dur,
+ struct mgp_memory * memory,
+ struct mgp_date ** result
+)
+```
+
+Add a duration to the date.
+
+Resulting date must be freed with mgp_date_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the operation results in an invalid date. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_date.
+
+
+### mgp_date_sub_duration {#function-mgp-date-sub-duration}
+
+```cpp
+enum mgp_error mgp_date_sub_duration(
+ struct mgp_date * date,
+ struct mgp_duration * dur,
+ struct mgp_memory * memory,
+ struct mgp_date ** result
+)
+```
+
+Subtract a duration from the date.
+
+Resulting date must be freed with mgp_date_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the operation results in an invalid date. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_date.
+
+
+### mgp_date_diff {#function-mgp-date-diff}
+
+```cpp
+enum mgp_error mgp_date_diff(
+ struct mgp_date * first,
+ struct mgp_date * second,
+ struct mgp_memory * memory,
+ struct mgp_duration ** result
+)
+```
+
+Get a duration between two dates.
+
+Resulting duration must be freed with mgp_duration_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration. 
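+
+
+As a sketch of the date API above (assuming `memory` from the surrounding procedure; error handling omitted for brevity):
+
+```cpp
+struct mgp_date *today = NULL;
+struct mgp_date *deadline = NULL;
+struct mgp_duration *remaining = NULL;
+mgp_date_now(memory, &today);
+mgp_date_from_string("2024-01-31", memory, &deadline);  // ISO 8601
+mgp_date_diff(deadline, today, memory, &remaining);
+// ... use `remaining` ...
+mgp_duration_destroy(remaining);
+mgp_date_destroy(deadline);
+mgp_date_destroy(today);
+```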
+
+
+### mgp_local_time_from_string {#function-mgp-local-time-from-string}
+
+```cpp
+enum mgp_error mgp_local_time_from_string(
+ const char * string,
+ struct mgp_memory * memory,
+ struct mgp_local_time ** local_time
+)
+```
+
+Create a local time from a string following the ISO 8601 format.
+
+Resulting local time must be freed with mgp_local_time_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the string cannot be parsed correctly. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_time.
+
+
+### mgp_local_time_from_parameters {#function-mgp-local-time-from-parameters}
+
+```cpp
+enum mgp_error mgp_local_time_from_parameters(
+ struct mgp_local_time_parameters * parameters,
+ struct mgp_memory * memory,
+ struct mgp_local_time ** local_time
+)
+```
+
+Create a local time from [mgp_local_time_parameters](#mgp_local_time_parameters).
+
+Resulting local time must be freed with mgp_local_time_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the parameters cannot be parsed correctly. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_time.
+
+
+### mgp_local_time_copy {#function-mgp-local-time-copy}
+
+```cpp
+enum mgp_error mgp_local_time_copy(
+ struct mgp_local_time * local_time,
+ struct mgp_memory * memory,
+ struct mgp_local_time ** result
+)
+```
+
+Copy a mgp_local_time.
+
+Resulting pointer must be freed with mgp_local_time_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_time.
+
+
+### mgp_local_time_destroy {#function-mgp-local-time-destroy}
+
+```cpp
+void mgp_local_time_destroy(
+ struct mgp_local_time * local_time
+)
+```
+
+Free the memory used by a mgp_local_time.
+
+### mgp_local_time_equal {#function-mgp-local-time-equal}
+
+```cpp
+enum mgp_error mgp_local_time_equal(
+ struct mgp_local_time * first,
+ struct mgp_local_time * second,
+ int * result
+)
+```
+
+Result is non-zero if given local times are equal, otherwise 0. 
+
+### mgp_local_time_get_hour {#function-mgp-local-time-get-hour}
+
+```cpp
+enum mgp_error mgp_local_time_get_hour(
+ struct mgp_local_time * local_time,
+ int * hour
+)
+```
+
+Get the hour property of the local time.
+
+### mgp_local_time_get_minute {#function-mgp-local-time-get-minute}
+
+```cpp
+enum mgp_error mgp_local_time_get_minute(
+ struct mgp_local_time * local_time,
+ int * minute
+)
+```
+
+Get the minute property of the local time.
+
+### mgp_local_time_get_second {#function-mgp-local-time-get-second}
+
+```cpp
+enum mgp_error mgp_local_time_get_second(
+ struct mgp_local_time * local_time,
+ int * second
+)
+```
+
+Get the second property of the local time.
+
+### mgp_local_time_get_millisecond {#function-mgp-local-time-get-millisecond}
+
+```cpp
+enum mgp_error mgp_local_time_get_millisecond(
+ struct mgp_local_time * local_time,
+ int * millisecond
+)
+```
+
+Get the millisecond property of the local time.
+
+### mgp_local_time_get_microsecond {#function-mgp-local-time-get-microsecond}
+
+```cpp
+enum mgp_error mgp_local_time_get_microsecond(
+ struct mgp_local_time * local_time,
+ int * microsecond
+)
+```
+
+Get the microsecond property of the local time.
+
+### mgp_local_time_timestamp {#function-mgp-local-time-timestamp}
+
+```cpp
+enum mgp_error mgp_local_time_timestamp(
+ struct mgp_local_time * local_time,
+ int64_t * timestamp
+)
+```
+
+Get the local time as microseconds from midnight.
+
+### mgp_local_time_now {#function-mgp-local-time-now}
+
+```cpp
+enum mgp_error mgp_local_time_now(
+ struct mgp_memory * memory,
+ struct mgp_local_time ** local_time
+)
+```
+
+Get the local time representing current time.
+
+Resulting pointer must be freed with mgp_local_time_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_time. 
+
+
+### mgp_local_time_add_duration {#function-mgp-local-time-add-duration}
+
+```cpp
+enum mgp_error mgp_local_time_add_duration(
+ struct mgp_local_time * local_time,
+ struct mgp_duration * dur,
+ struct mgp_memory * memory,
+ struct mgp_local_time ** result
+)
+```
+
+Add a duration to the local time.
+
+Resulting pointer must be freed with mgp_local_time_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the operation results in an invalid local time. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_time.
+
+
+### mgp_local_time_sub_duration {#function-mgp-local-time-sub-duration}
+
+```cpp
+enum mgp_error mgp_local_time_sub_duration(
+ struct mgp_local_time * local_time,
+ struct mgp_duration * dur,
+ struct mgp_memory * memory,
+ struct mgp_local_time ** result
+)
+```
+
+Subtract a duration from the local time.
+
+Resulting pointer must be freed with mgp_local_time_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the operation results in an invalid local time. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_time.
+
+
+### mgp_local_time_diff {#function-mgp-local-time-diff}
+
+```cpp
+enum mgp_error mgp_local_time_diff(
+ struct mgp_local_time * first,
+ struct mgp_local_time * second,
+ struct mgp_memory * memory,
+ struct mgp_duration ** result
+)
+```
+
+Get a duration between two local times.
+
+Resulting pointer must be freed with mgp_duration_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration.
+
+
+### mgp_local_date_time_from_string {#function-mgp-local-date-time-from-string}
+
+```cpp
+enum mgp_error mgp_local_date_time_from_string(
+ const char * string,
+ struct mgp_memory * memory,
+ struct mgp_local_date_time ** local_date_time
+)
+```
+
+Create a local date-time from a string following the ISO 8601 format.
+
+Resulting local date-time must be freed with mgp_local_date_time_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the string cannot be parsed correctly. 
Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_date_time. + + +### mgp_local_date_time_from_parameters {#function-mgp-local-date-time-from-parameters} + +```cpp +enum mgp_error mgp_local_date_time_from_parameters( + struct mgp_local_date_time_parameters * parameters, + struct mgp_memory * memory, + struct mgp_local_date_time ** local_date_time +) +``` + +Create a local date-time from [mgp_local_date_time_parameters](#mgp_local_date_time_parameters). + +Resulting local date-time must be freed with mgp_local_date_time_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the parameters cannot be parsed correctly. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_date_time. + + +### mgp_local_date_time_copy {#function-mgp-local-date-time-copy} + +```cpp +enum mgp_error mgp_local_date_time_copy( + struct mgp_local_date_time * local_date_time, + struct mgp_memory * memory, + struct mgp_local_date_time ** result +) +``` + +Copy a mgp_local_date_time. + +Resulting pointer must be freed with mgp_local_date_time_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_date_time. + + +### mgp_local_date_time_destroy {#function-mgp-local-date-time-destroy} + +```cpp +void mgp_local_date_time_destroy( + struct mgp_local_date_time * local_date_time +) +``` + +Free the memory used by a mgp_local_date_time. + +### mgp_local_date_time_equal {#function-mgp-local-date-time-equal} + +```cpp +enum mgp_error mgp_local_date_time_equal( + struct mgp_local_date_time * first, + struct mgp_local_date_time * second, + int * result +) +``` + +Result is non-zero if given local date-times are equal, otherwise 0. + +### mgp_local_date_time_get_year {#function-mgp-local-date-time-get-year} + +```cpp +enum mgp_error mgp_local_date_time_get_year( + struct mgp_local_date_time * local_date_time, + int * year +) +``` + +Get the year property of the local date-time. 
+ +### mgp_local_date_time_get_month {#function-mgp-local-date-time-get-month} + +```cpp +enum mgp_error mgp_local_date_time_get_month( + struct mgp_local_date_time * local_date_time, + int * month +) +``` + +Get the month property of the local date-time. + +### mgp_local_date_time_get_day {#function-mgp-local-date-time-get-day} + +```cpp +enum mgp_error mgp_local_date_time_get_day( + struct mgp_local_date_time * local_date_time, + int * day +) +``` + +Get the day property of the local date-time. + +### mgp_local_date_time_get_hour {#function-mgp-local-date-time-get-hour} + +```cpp +enum mgp_error mgp_local_date_time_get_hour( + struct mgp_local_date_time * local_date_time, + int * hour +) +``` + +Get the hour property of the local date-time. + +### mgp_local_date_time_get_minute {#function-mgp-local-date-time-get-minute} + +```cpp +enum mgp_error mgp_local_date_time_get_minute( + struct mgp_local_date_time * local_date_time, + int * minute +) +``` + +Get the minute property of the local date-time. + +### mgp_local_date_time_get_second {#function-mgp-local-date-time-get-second} + +```cpp +enum mgp_error mgp_local_date_time_get_second( + struct mgp_local_date_time * local_date_time, + int * second +) +``` + +Get the second property of the local date-time. + +### mgp_local_date_time_get_millisecond {#function-mgp-local-date-time-get-millisecond} + +```cpp +enum mgp_error mgp_local_date_time_get_millisecond( + struct mgp_local_date_time * local_date_time, + int * millisecond +) +``` + +Get the millisecond property of the local date-time. + +### mgp_local_date_time_get_microsecond {#function-mgp-local-date-time-get-microsecond} + +```cpp +enum mgp_error mgp_local_date_time_get_microsecond( + struct mgp_local_date_time * local_date_time, + int * microsecond +) +``` + +Get the microsecond property of the local date-time. 
+ +### mgp_local_date_time_timestamp {#function-mgp-local-date-time-timestamp} + +```cpp +enum mgp_error mgp_local_date_time_timestamp( + struct mgp_local_date_time * local_date_time, + int64_t * timestamp +) +``` + +Get the local date-time as microseconds from the Unix epoch. + +### mgp_local_date_time_now {#function-mgp-local-date-time-now} + +```cpp +enum mgp_error mgp_local_date_time_now( + struct mgp_memory * memory, + struct mgp_local_date_time ** local_date_time +) +``` + +Get the local date-time representing the current date and time. + +Resulting local date-time must be freed with mgp_local_date_time_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_date_time. + + +### mgp_local_date_time_add_duration {#function-mgp-local-date-time-add-duration} + +```cpp +enum mgp_error mgp_local_date_time_add_duration( + struct mgp_local_date_time * local_date_time, + struct mgp_duration * dur, + struct mgp_memory * memory, + struct mgp_local_date_time ** result +) +``` + +Add a duration to the local date-time. + +Resulting local date-time must be freed with mgp_local_date_time_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the operation results in an invalid local date-time. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_date_time. + + +### mgp_local_date_time_sub_duration {#function-mgp-local-date-time-sub-duration} + +```cpp +enum mgp_error mgp_local_date_time_sub_duration( + struct mgp_local_date_time * local_date_time, + struct mgp_duration * dur, + struct mgp_memory * memory, + struct mgp_local_date_time ** result +) +``` + +Subtract a duration from the local date-time. + +Resulting local date-time must be freed with mgp_local_date_time_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the operation results in an invalid local date-time. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_local_date_time. 
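The timestamp returned by mgp_local_date_time_timestamp above is expressed in microseconds from the Unix epoch. As a stand-alone sketch (not part of the mgp API), the arithmetic behind such a value can be reproduced from broken-down civil time with the well-known days-from-civil algorithm:

```cpp
#include <cassert>
#include <cstdint>

// Stand-alone sketch, not part of mg_procedure.h: computes the
// microseconds-from-Unix-epoch value that mgp_local_date_time_timestamp is
// documented to return, starting from broken-down civil time.
inline int64_t days_from_civil(int64_t y, int64_t m, int64_t d) {
  y -= m <= 2;
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const int64_t yoe = y - era * 400;                                   // [0, 399]
  const int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // [0, 365]
  const int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
  return era * 146097 + doe - 719468;  // days since 1970-01-01
}

inline int64_t local_date_time_to_micros(int64_t y, int64_t mo, int64_t d,
                                         int64_t h, int64_t mi, int64_t s,
                                         int64_t ms, int64_t us) {
  const int64_t secs =
      days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + s;
  return secs * 1000000 + ms * 1000 + us;
}
```

For example, `local_date_time_to_micros(2000, 1, 1, 0, 0, 0, 0, 0)` yields `946684800000000`, i.e. 2000-01-01T00:00:00 is 946,684,800 seconds after the epoch.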
+ + +### mgp_local_date_time_diff {#function-mgp-local-date-time-diff} + +```cpp +enum mgp_error mgp_local_date_time_diff( + struct mgp_local_date_time * first, + struct mgp_local_date_time * second, + struct mgp_memory * memory, + struct mgp_duration ** result +) +``` + +Get a duration between two local date-times. + +Resulting duration must be freed with mgp_duration_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration. + + +### mgp_duration_from_string {#function-mgp-duration-from-string} + +```cpp +enum mgp_error mgp_duration_from_string( + const char * string, + struct mgp_memory * memory, + struct mgp_duration ** duration +) +``` + +Create a duration from a string following the ISO 8601 format. + +Resulting duration must be freed with mgp_duration_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the string cannot be parsed correctly. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration. + + +### mgp_duration_from_parameters {#function-mgp-duration-from-parameters} + +```cpp +enum mgp_error mgp_duration_from_parameters( + struct mgp_duration_parameters * parameters, + struct mgp_memory * memory, + struct mgp_duration ** duration +) +``` + +Create a duration from [mgp_duration_parameters](#mgp_duration_parameters). + +Resulting duration must be freed with mgp_duration_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the parameters cannot be parsed correctly. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration. + + +### mgp_duration_from_microseconds {#function-mgp-duration-from-microseconds} + +```cpp +enum mgp_error mgp_duration_from_microseconds( + int64_t microseconds, + struct mgp_memory * memory, + struct mgp_duration ** duration +) +``` + +Create a duration from microseconds. + +Resulting duration must be freed with mgp_duration_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration. 
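mgp_duration_from_string above expects an ISO 8601 duration such as `PT2H30M`, while mgp_duration_from_microseconds takes the raw microsecond count directly. As an illustration only (Memgraph's actual parser also handles date components and fractional seconds), the time-only subset of the format maps onto microseconds like this:

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstddef>
#include <string>

// Illustration only, not Memgraph's parser: converts the "PT<h>H<m>M<s>S"
// subset of ISO 8601 durations into microseconds, the unit mgp_duration
// uses internally. Returns false on anything it does not recognize.
inline bool parse_iso_time_duration(const std::string &in, int64_t *out_us) {
  if (in.size() < 3 || in[0] != 'P' || in[1] != 'T') return false;
  int64_t total_s = 0, num = 0;
  bool have_digits = false;
  for (std::size_t i = 2; i < in.size(); ++i) {
    const char c = in[i];
    if (std::isdigit(static_cast<unsigned char>(c))) {
      num = num * 10 + (c - '0');
      have_digits = true;
    } else if (c == 'H' || c == 'M' || c == 'S') {
      if (!have_digits) return false;  // a unit needs a preceding number
      total_s += num * (c == 'H' ? 3600 : c == 'M' ? 60 : 1);
      num = 0;
      have_digits = false;
    } else {
      return false;
    }
  }
  if (have_digits) return false;  // trailing number without a unit
  *out_us = total_s * 1000000;
  return true;
}
```

For example, `"PT1H30M"` parses to `5400000000` microseconds (90 minutes).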
+ + +### mgp_duration_copy {#function-mgp-duration-copy} + +```cpp +enum mgp_error mgp_duration_copy( + struct mgp_duration * duration, + struct mgp_memory * memory, + struct mgp_duration ** result +) +``` + +Copy a mgp_duration. + +Resulting pointer must be freed with mgp_duration_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration. + + +### mgp_duration_destroy {#function-mgp-duration-destroy} + +```cpp +void mgp_duration_destroy( + struct mgp_duration * duration +) +``` + +Free the memory used by a mgp_duration. + +### mgp_duration_equal {#function-mgp-duration-equal} + +```cpp +enum mgp_error mgp_duration_equal( + struct mgp_duration * first, + struct mgp_duration * second, + int * result +) +``` + +Result is non-zero if given durations are equal, otherwise 0. + +### mgp_duration_get_microseconds {#function-mgp-duration-get-microseconds} + +```cpp +enum mgp_error mgp_duration_get_microseconds( + struct mgp_duration * duration, + int64_t * microseconds +) +``` + +Get the duration as microseconds. + +### mgp_duration_neg {#function-mgp-duration-neg} + +```cpp +enum mgp_error mgp_duration_neg( + struct mgp_duration * dur, + struct mgp_memory * memory, + struct mgp_duration ** result +) +``` + +Apply unary minus operator to the duration. + +Resulting pointer must be freed with mgp_duration_destroy. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration. + + +### mgp_duration_add {#function-mgp-duration-add} + +```cpp +enum mgp_error mgp_duration_add( + struct mgp_duration * first, + struct mgp_duration * second, + struct mgp_memory * memory, + struct mgp_duration ** result +) +``` + +Add two durations. + +Resulting pointer must be freed with mgp_duration_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the operation results in an invalid duration. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration. 
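mgp_duration_add above reports MGP_ERROR_INVALID_ARGUMENT when the operation results in an invalid duration. Since a duration is a signed microsecond count, the failing case is integer overflow; a stand-alone sketch of such a check (not the actual implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-alone sketch, not the actual implementation: adding two microsecond
// counts can overflow int64_t, which is the kind of condition that makes an
// operation such as mgp_duration_add report an invalid result.
inline bool checked_add_us(int64_t a, int64_t b, int64_t *result) {
  if (b > 0 && a > std::numeric_limits<int64_t>::max() - b) return false;
  if (b < 0 && a < std::numeric_limits<int64_t>::min() - b) return false;
  *result = a + b;
  return true;
}
```

The bounds are checked before the addition so the sketch never executes the signed overflow it is guarding against.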
+ + +### mgp_duration_sub {#function-mgp-duration-sub} + +```cpp +enum mgp_error mgp_duration_sub( + struct mgp_duration * first, + struct mgp_duration * second, + struct mgp_memory * memory, + struct mgp_duration ** result +) +``` + +Subtract two durations. + +Resulting pointer must be freed with mgp_duration_destroy. Return MGP_ERROR_INVALID_ARGUMENT if the operation results in an invalid duration. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_duration. + + +### mgp_type_any {#function-mgp-type-any} + +```cpp +enum mgp_error mgp_type_any( + struct mgp_type ** result +) +``` + +Get the type representing any value that isn't `null`. + +The ANY type is the parent type of all types. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_bool {#function-mgp-type-bool} + +```cpp +enum mgp_error mgp_type_bool( + struct mgp_type ** result +) +``` + +Get the type representing boolean values. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_string {#function-mgp-type-string} + +```cpp +enum mgp_error mgp_type_string( + struct mgp_type ** result +) +``` + +Get the type representing character string values. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_int {#function-mgp-type-int} + +```cpp +enum mgp_error mgp_type_int( + struct mgp_type ** result +) +``` + +Get the type representing integer values. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_float {#function-mgp-type-float} + +```cpp +enum mgp_error mgp_type_float( + struct mgp_type ** result +) +``` + +Get the type representing floating-point values. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_number {#function-mgp-type-number} + +```cpp +enum mgp_error mgp_type_number( + struct mgp_type ** result +) +``` + +Get the type representing any number value. 
+ +This is the parent type for numeric types, i.e. INTEGER and FLOAT. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_map {#function-mgp-type-map} + +```cpp +enum mgp_error mgp_type_map( + struct mgp_type ** result +) +``` + +Get the type representing map values. + +**See**: + + * [mgp_type_node](#function-mgp-type-node) + * [mgp_type_relationship](#function-mgp-type-relationship) + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +Map values are those which map string keys to values of any type. For example `{ database: "Memgraph", version: 1.42 }`. Note that graph nodes contain property maps, so a node value will also satisfy the MAP type. The same applies to graph relationship values. + + +### mgp_type_node {#function-mgp-type-node} + +```cpp +enum mgp_error mgp_type_node( + struct mgp_type ** result +) +``` + +Get the type representing graph node values. + +Since a node contains a map of properties, the node itself is also of MAP type. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_relationship {#function-mgp-type-relationship} + +```cpp +enum mgp_error mgp_type_relationship( + struct mgp_type ** result +) +``` + +Get the type representing graph relationship values. + +Since a relationship contains a map of properties, the relationship itself is also of MAP type. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_path {#function-mgp-type-path} + +```cpp +enum mgp_error mgp_type_path( + struct mgp_type ** result +) +``` + +Get the type representing a graph path (walk) from one node to another. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_list {#function-mgp-type-list} + +```cpp +enum mgp_error mgp_type_list( + struct mgp_type * element_type, + struct mgp_type ** result +) +``` + +Build a type representing a list of values of given `element_type`. 
+ +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_date {#function-mgp-type-date} + +```cpp +enum mgp_error mgp_type_date( + struct mgp_type ** result +) +``` + +Get the type representing a date. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_local_time {#function-mgp-type-local-time} + +```cpp +enum mgp_error mgp_type_local_time( + struct mgp_type ** result +) +``` + +Get the type representing a local time. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_local_date_time {#function-mgp-type-local-date-time} + +```cpp +enum mgp_error mgp_type_local_date_time( + struct mgp_type ** result +) +``` + +Get the type representing a local date-time. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_duration {#function-mgp-type-duration} + +```cpp +enum mgp_error mgp_type_duration( + struct mgp_type ** result +) +``` + +Get the type representing a duration. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_type_nullable {#function-mgp-type-nullable} + +```cpp +enum mgp_error mgp_type_nullable( + struct mgp_type * type, + struct mgp_type ** result +) +``` + +Build a type representing either a `null` value or a value of given `type`. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate the new type. + + +### mgp_module_add_read_procedure {#function-mgp-module-add-read-procedure} + +```cpp +enum mgp_error mgp_module_add_read_procedure( + struct mgp_module * module, + const char * name, + mgp_proc_cb cb, + struct mgp_proc ** result +) +``` + +Register a read-only procedure to a module. + +The `name` must be a sequence of digits, underscores, lowercase and uppercase Latin letters. The name must begin with a non-digit character. Note that Unicode characters are not allowed. Additionally, names are case-sensitive. 
+ +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for mgp_proc. Return MGP_ERROR_INVALID_ARGUMENT if `name` is not a valid procedure name. Return MGP_ERROR_LOGIC_ERROR if a procedure with the same name was already registered. + + +### mgp_module_add_write_procedure {#function-mgp-module-add-write-procedure} + +```cpp +enum mgp_error mgp_module_add_write_procedure( + struct mgp_module * module, + const char * name, + mgp_proc_cb cb, + struct mgp_proc ** result +) +``` + +Register a writeable procedure to a module. + +The `name` must be a valid identifier, following the same rules as the procedure `name` in mgp_module_add_read_procedure. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for mgp_proc. Return MGP_ERROR_INVALID_ARGUMENT if `name` is not a valid procedure name. Return MGP_ERROR_LOGIC_ERROR if a procedure with the same name was already registered. + + +### mgp_proc_add_arg {#function-mgp-proc-add-arg} + +```cpp +enum mgp_error mgp_proc_add_arg( + struct mgp_proc * proc, + const char * name, + struct mgp_type * type +) +``` + +Add a required argument to a procedure. + +The order of adding arguments will correspond to the order the procedure must receive them through openCypher. Required arguments will be followed by optional arguments. + +The `name` must be a valid identifier, following the same rules as the procedure `name` in mgp_module_add_read_procedure. + +The passed-in `type` describes what kind of values can be used as the argument. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for an argument. Return MGP_ERROR_INVALID_ARGUMENT if `name` is not a valid argument name. Return MGP_ERROR_LOGIC_ERROR if the procedure already has any optional argument. 
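The naming rule quoted above (digits, underscores and Latin letters, with a non-digit first character) is small enough to state as code. A sketch of an equivalent check, not Memgraph's own validator:

```cpp
#include <cassert>
#include <string>

// Sketch of the documented naming rule, not Memgraph's own validator: a name
// is a non-empty sequence of digits, underscores and Latin letters that does
// not begin with a digit. Unicode is rejected, and names stay case-sensitive.
inline bool is_valid_mgp_name(const std::string &name) {
  const auto allowed = [](char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
  };
  if (name.empty()) return false;
  if (name[0] >= '0' && name[0] <= '9') return false;  // non-digit first
  for (const char c : name)
    if (!allowed(c)) return false;
  return true;
}
```

Under this rule `shortest_path2` is acceptable, while `2fast` (digit first) and `has-hyphen` (disallowed character) would make registration fail with MGP_ERROR_INVALID_ARGUMENT.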
+ + +### mgp_proc_add_opt_arg {#function-mgp-proc-add-opt-arg} + +```cpp +enum mgp_error mgp_proc_add_opt_arg( + struct mgp_proc * proc, + const char * name, + struct mgp_type * type, + struct mgp_value * default_value +) +``` + +Add an optional argument with a default value to a procedure. + +The order of adding arguments will correspond to the order the procedure must receive them through openCypher. Optional arguments must follow the required arguments. + +The `name` must be a valid identifier, following the same rules as the procedure `name` in mgp_module_add_read_procedure. + +The passed-in `type` describes what kind of values can be used as the argument. + +`default_value` is copied and set as the default value for the argument. Don't forget to call mgp_value_destroy when you are done using `default_value`. When the procedure is called, if this argument is not provided, `default_value` will be used instead. `default_value` must not be a graph element (node, relationship, path) and it must satisfy the given `type`. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for an argument. Return MGP_ERROR_INVALID_ARGUMENT if `name` is not a valid argument name. Return MGP_ERROR_VALUE_CONVERSION if `default_value` is a graph element (vertex, edge or path). Return MGP_ERROR_LOGIC_ERROR if `default_value` does not satisfy `type`. + + +### mgp_proc_add_result {#function-mgp-proc-add-result} + +```cpp +enum mgp_error mgp_proc_add_result( + struct mgp_proc * proc, + const char * name, + struct mgp_type * type +) +``` + +Add a result field to a procedure. + +The `name` must be a valid identifier, following the same rules as the procedure `name` in mgp_module_add_read_procedure. + +The passed-in `type` describes what kind of values can be returned through the result field. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for an argument. Return MGP_ERROR_INVALID_ARGUMENT if `name` is not a valid result name. 
Return MGP_ERROR_LOGIC_ERROR if a result field with the same name was already added. + + +### mgp_proc_add_deprecated_result {#function-mgp-proc-add-deprecated-result} + +```cpp +enum mgp_error mgp_proc_add_deprecated_result( + struct mgp_proc * proc, + const char * name, + struct mgp_type * type +) +``` + +Add a result field to a procedure and mark it as deprecated. + +This is the same as mgp_proc_add_result, but the result field will be marked as deprecated. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for an argument. Return MGP_ERROR_INVALID_ARGUMENT if `name` is not a valid result name. Return MGP_ERROR_LOGIC_ERROR if a result field with the same name was already added. + + +### mgp_must_abort {#function-mgp-must-abort} + +```cpp +int mgp_must_abort( + struct mgp_graph * graph +) +``` + +Return non-zero if the currently executing procedure should abort as soon as possible. + +Procedures which perform heavyweight processing run the risk of running too long and going over the query execution time limit. To prevent this, such procedures should periodically call this function at critical points in their code in order to determine whether they should abort or not. Note that this mechanism is purely cooperative and depends on the procedure doing the checking and aborting on its own. + + +### mgp_message_payload {#function-mgp-message-payload} + +```cpp +enum mgp_error mgp_message_payload( + struct mgp_message * message, + const char ** result +) +``` + +The payload is a byte array rather than a null-terminated string. + +Call [mgp_message_payload_size()](#function-mgp-message-payload-size) first to read the size of the payload. + + +### mgp_message_payload_size {#function-mgp-message-payload-size} + +```cpp +enum mgp_error mgp_message_payload_size( + struct mgp_message * message, + size_t * result +) +``` + +Get the payload size. 
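Because the payload is sized rather than null-terminated, the usual pattern is to read the size first and then copy exactly that many bytes. A stand-alone analog of that pattern (the `payload`/`payload_size` pair stands in for the results of the two calls above):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-alone analog of the pattern above: `payload` and `payload_size`
// stand in for the results of mgp_message_payload and
// mgp_message_payload_size. Copying must use the explicit size, because the
// byte array carries no terminating '\0' and may even contain embedded ones.
inline std::string copy_payload(const char *payload, std::size_t payload_size) {
  // std::string(ptr, len) copies exactly len bytes, embedded '\0' included.
  return std::string(payload, payload_size);
}
```

Reading the bytes with a string function such as `strlen` instead would run past the end of the buffer.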
+ +### mgp_message_topic_name {#function-mgp-message-topic-name} + +```cpp +enum mgp_error mgp_message_topic_name( + struct mgp_message * message, + const char ** result +) +``` + +Get the name of the topic. + +### mgp_message_key {#function-mgp-message-key} + +```cpp +enum mgp_error mgp_message_key( + struct mgp_message * message, + const char ** result +) +``` + +Get the key of mgp_message as a byte array. + +### mgp_message_key_size {#function-mgp-message-key-size} + +```cpp +enum mgp_error mgp_message_key_size( + struct mgp_message * message, + size_t * result +) +``` + +Get the key size of mgp_message. + +### mgp_message_timestamp {#function-mgp-message-timestamp} + +```cpp +enum mgp_error mgp_message_timestamp( + struct mgp_message * message, + int64_t * result +) +``` + +Get the timestamp of mgp_message. + +### mgp_messages_size {#function-mgp-messages-size} + +```cpp +enum mgp_error mgp_messages_size( + struct mgp_messages * message, + size_t * result +) +``` + +Get the number of messages contained in the mgp_messages list. The current implementation always returns without errors. + +### mgp_messages_at {#function-mgp-messages-at} + +```cpp +enum mgp_error mgp_messages_at( + struct mgp_messages * message, + size_t index, + struct mgp_message ** result +) +``` + +Get the message from a messages list at the given index. + +### mgp_module_add_transformation {#function-mgp-module-add-transformation} + +```cpp +enum mgp_error mgp_module_add_transformation( + struct mgp_module * module, + const char * name, + mgp_trans_cb cb +) +``` + +Register a transformation with a module. + +The `name` must be a sequence of digits, underscores, lowercase and uppercase Latin letters. The name must begin with a non-digit character. Note that Unicode characters are not allowed. Additionally, names are case-sensitive. + +Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate memory for transformation. 
Return MGP_ERROR_INVALID_ARGUMENT if `name` is not a valid transformation name. Return MGP_ERROR_LOGIC_ERROR if a transformation with the same name was already registered. + + +### mgp_vertices_iterator_next {#function-mgp-vertices-iterator-next} + +```cpp +enum mgp_error mgp_vertices_iterator_next( + struct mgp_vertices_iterator * it, + struct mgp_vertex ** result +) +``` + +Advance the iterator to the next vertex and return it. + +The previous mgp_vertex obtained through mgp_vertices_iterator_get will be invalidated, and you must not use its value. Result is NULL if the end of the iteration has been reached. Return MGP_ERROR_UNABLE_TO_ALLOCATE if unable to allocate a mgp_vertex. + +### mgp_log {#function-mgp-log} + +```cpp +enum mgp_error mgp_log( + enum mgp_log_level log_level, + const char *output +) +``` + +Log a message at the given log level. + + +## Attributes Documentation + +### mgp_error {#variable-mgp-error} + +```cpp +enum MGP_NODISCARD mgp_error; +``` + +All functions return an error code that can be used to figure out whether the API call was successful or not. + +In case of failure, the specific error code can be used to identify the reason for the failure. 
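Since every fallible call reports through this error enum (and MGP_NODISCARD keeps the return code from being silently dropped), caller code typically follows a check-and-bail pattern. A stand-alone sketch with a mock function in place of a real mgp_* call:

```cpp
#include <cassert>

// Stand-alone sketch of the check-and-bail pattern: `sketch_error` and
// `mock_make_value` stand in for mgp_error and a real call such as
// mgp_value_make_int, which reports failure through its return code and
// success through an out-parameter.
enum sketch_error { SKETCH_NO_ERROR = 0, SKETCH_UNABLE_TO_ALLOCATE };

inline sketch_error mock_make_value(bool fail, int *result) {
  if (fail) return SKETCH_UNABLE_TO_ALLOCATE;
  *result = 42;
  return SKETCH_NO_ERROR;
}

// The return code is checked before the out-parameter is ever read.
inline bool try_make_value(bool fail, int *out) {
  int value = 0;
  if (mock_make_value(fail, &value) != SKETCH_NO_ERROR) return false;
  *out = value;
  return true;
}
```

The same shape applies to real calls: inspect the returned mgp_error first, and only touch the out-parameter on MGP_ERROR_NO_ERROR.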
+ + +### MGP_ERROR_NO_ERROR {#variable-mgp-error-no-error} + +```cpp +MGP_ERROR_NO_ERROR = 0; +``` + + +### MGP_ERROR_UNKNOWN_ERROR {#variable-mgp-error-unknown-error} + +```cpp +MGP_ERROR_UNKNOWN_ERROR; +``` + + +### MGP_ERROR_UNABLE_TO_ALLOCATE {#variable-mgp-error-unable-to-allocate} + +```cpp +MGP_ERROR_UNABLE_TO_ALLOCATE; +``` + + +### MGP_ERROR_INSUFFICIENT_BUFFER {#variable-mgp-error-insufficient-buffer} + +```cpp +MGP_ERROR_INSUFFICIENT_BUFFER; +``` + + +### MGP_ERROR_OUT_OF_RANGE {#variable-mgp-error-out-of-range} + +```cpp +MGP_ERROR_OUT_OF_RANGE; +``` + + +### MGP_ERROR_LOGIC_ERROR {#variable-mgp-error-logic-error} + +```cpp +MGP_ERROR_LOGIC_ERROR; +``` + + +### MGP_ERROR_DELETED_OBJECT {#variable-mgp-error-deleted-object} + +```cpp +MGP_ERROR_DELETED_OBJECT; +``` + + +### MGP_ERROR_INVALID_ARGUMENT {#variable-mgp-error-invalid-argument} + +```cpp +MGP_ERROR_INVALID_ARGUMENT; +``` + + +### MGP_ERROR_KEY_ALREADY_EXISTS {#variable-mgp-error-key-already-exists} + +```cpp +MGP_ERROR_KEY_ALREADY_EXISTS; +``` + + +### MGP_ERROR_IMMUTABLE_OBJECT {#variable-mgp-error-immutable-object} + +```cpp +MGP_ERROR_IMMUTABLE_OBJECT; +``` + + +### MGP_ERROR_VALUE_CONVERSION {#variable-mgp-error-value-conversion} + +```cpp +MGP_ERROR_VALUE_CONVERSION; +``` + + +### MGP_ERROR_SERIALIZATION_ERROR {#variable-mgp-error-serialization-error} + +```cpp +MGP_ERROR_SERIALIZATION_ERROR; +``` + + + +## Macros Documentation + +### define MGP_NODISCARD {#define-mgp-nodiscard} + +```cpp +#define MGP_NODISCARD +``` + + +## Source code + +```cpp +// Copyright 2021 Memgraph Ltd. +// +// Use of this software is governed by the Business Source License +// included in the file licenses/BSL.txt; by using this file, you agree to be bound by the terms of the Business Source +// License, and you may not use this file except in compliance with the Business Source License. 
+// +// As of the Change Date specified in that file, in accordance with +// the Business Source License, use of this software will be governed +// by the Apache License, Version 2.0, included in the file +// licenses/APL.txt. + +#ifndef MG_PROCEDURE_H +#define MG_PROCEDURE_H + +#ifdef __cplusplus +extern "C" { +#endif + +#if __cplusplus >= 201703L +#define MGP_NODISCARD [[nodiscard]] +#else +#define MGP_NODISCARD +#endif + +#include <stddef.h> +#include <stdint.h> + + +enum MGP_NODISCARD mgp_error { + MGP_ERROR_NO_ERROR = 0, + MGP_ERROR_UNKNOWN_ERROR, + MGP_ERROR_UNABLE_TO_ALLOCATE, + MGP_ERROR_INSUFFICIENT_BUFFER, + MGP_ERROR_OUT_OF_RANGE, + MGP_ERROR_LOGIC_ERROR, + MGP_ERROR_DELETED_OBJECT, + MGP_ERROR_INVALID_ARGUMENT, + MGP_ERROR_KEY_ALREADY_EXISTS, + MGP_ERROR_IMMUTABLE_OBJECT, + MGP_ERROR_VALUE_CONVERSION, + MGP_ERROR_SERIALIZATION_ERROR, +}; + + +struct mgp_memory; + +enum mgp_error mgp_alloc(struct mgp_memory *memory, size_t size_in_bytes, void **result); + +enum mgp_error mgp_aligned_alloc(struct mgp_memory *memory, size_t size_in_bytes, size_t alignment, void **result); + +void mgp_free(struct mgp_memory *memory, void *ptr); + +enum mgp_error mgp_global_alloc(size_t size_in_bytes, void **result); + +enum mgp_error mgp_global_aligned_alloc(size_t size_in_bytes, size_t alignment, void **result); + +void mgp_global_free(void *p); + + +struct mgp_value; + +struct mgp_list; + +struct mgp_map; + +struct mgp_vertex; + +struct mgp_edge; + +struct mgp_path; + +struct mgp_date; + +struct mgp_local_time; + +struct mgp_local_date_time; + +struct mgp_duration; + +enum mgp_value_type { + // NOTE: New types need to be appended, so as not to break ABI. 
+ MGP_VALUE_TYPE_NULL, + MGP_VALUE_TYPE_BOOL, + MGP_VALUE_TYPE_INT, + MGP_VALUE_TYPE_DOUBLE, + MGP_VALUE_TYPE_STRING, + MGP_VALUE_TYPE_LIST, + MGP_VALUE_TYPE_MAP, + MGP_VALUE_TYPE_VERTEX, + MGP_VALUE_TYPE_EDGE, + MGP_VALUE_TYPE_PATH, + MGP_VALUE_TYPE_DATE, + MGP_VALUE_TYPE_LOCAL_TIME, + MGP_VALUE_TYPE_LOCAL_DATE_TIME, + MGP_VALUE_TYPE_DURATION, +}; + +void mgp_value_destroy(struct mgp_value *val); + +enum mgp_error mgp_value_make_null(struct mgp_memory *memory, struct mgp_value **result); + +enum mgp_error mgp_value_make_bool(int val, struct mgp_memory *memory, struct mgp_value **result); + +enum mgp_error mgp_value_make_int(int64_t val, struct mgp_memory *memory, struct mgp_value **result); + +enum mgp_error mgp_value_make_double(double val, struct mgp_memory *memory, struct mgp_value **result); + +enum mgp_error mgp_value_make_string(const char *val, struct mgp_memory *memory, struct mgp_value **result); + +enum mgp_error mgp_value_make_list(struct mgp_list *val, struct mgp_value **result); + +enum mgp_error mgp_value_make_map(struct mgp_map *val, struct mgp_value **result); + +enum mgp_error mgp_value_make_vertex(struct mgp_vertex *val, struct mgp_value **result); + +enum mgp_error mgp_value_make_edge(struct mgp_edge *val, struct mgp_value **result); + +enum mgp_error mgp_value_make_path(struct mgp_path *val, struct mgp_value **result); + +enum mgp_error mgp_value_make_date(struct mgp_date *val, struct mgp_value **result); + +enum mgp_error mgp_value_make_local_time(struct mgp_local_time *val, struct mgp_value **result); + +enum mgp_error mgp_value_make_local_date_time(struct mgp_local_date_time *val, struct mgp_value **result); + +enum mgp_error mgp_value_make_duration(struct mgp_duration *val, struct mgp_value **result); + +enum mgp_error mgp_value_get_type(struct mgp_value *val, enum mgp_value_type *result); + +enum mgp_error mgp_value_is_null(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_bool(struct mgp_value *val, int *result); + +enum 
mgp_error mgp_value_is_int(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_double(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_string(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_list(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_map(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_vertex(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_edge(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_path(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_date(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_local_time(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_local_date_time(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_is_duration(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_get_bool(struct mgp_value *val, int *result); + +enum mgp_error mgp_value_get_int(struct mgp_value *val, int64_t *result); + +enum mgp_error mgp_value_get_double(struct mgp_value *val, double *result); + +enum mgp_error mgp_value_get_string(struct mgp_value *val, const char **result); + +enum mgp_error mgp_value_get_list(struct mgp_value *val, struct mgp_list **result); + +enum mgp_error mgp_value_get_map(struct mgp_value *val, struct mgp_map **result); + +enum mgp_error mgp_value_get_vertex(struct mgp_value *val, struct mgp_vertex **result); + +enum mgp_error mgp_value_get_edge(struct mgp_value *val, struct mgp_edge **result); + +enum mgp_error mgp_value_get_path(struct mgp_value *val, struct mgp_path **result); + +enum mgp_error mgp_value_get_date(struct mgp_value *val, struct mgp_date **result); + +enum mgp_error mgp_value_get_local_time(struct mgp_value *val, struct mgp_local_time **result); + +enum mgp_error mgp_value_get_local_date_time(struct mgp_value *val, struct mgp_local_date_time **result); + +enum mgp_error mgp_value_get_duration(struct mgp_value 
*val, struct mgp_duration **result); + +enum mgp_error mgp_list_make_empty(size_t capacity, struct mgp_memory *memory, struct mgp_list **result); + +void mgp_list_destroy(struct mgp_list *list); + +enum mgp_error mgp_list_append(struct mgp_list *list, struct mgp_value *val); + +enum mgp_error mgp_list_append_extend(struct mgp_list *list, struct mgp_value *val); + +enum mgp_error mgp_list_size(struct mgp_list *list, size_t *result); + +enum mgp_error mgp_list_capacity(struct mgp_list *list, size_t *result); + +enum mgp_error mgp_list_at(struct mgp_list *list, size_t index, struct mgp_value **result); + +enum mgp_error mgp_map_make_empty(struct mgp_memory *memory, struct mgp_map **result); + +void mgp_map_destroy(struct mgp_map *map); + +enum mgp_error mgp_map_insert(struct mgp_map *map, const char *key, struct mgp_value *value); + +enum mgp_error mgp_map_size(struct mgp_map *map, size_t *result); + +enum mgp_error mgp_map_at(struct mgp_map *map, const char *key, struct mgp_value **result); + +struct mgp_map_item; + +enum mgp_error mgp_map_item_key(struct mgp_map_item *item, const char **result); + +enum mgp_error mgp_map_item_value(struct mgp_map_item *item, struct mgp_value **result); + +struct mgp_map_items_iterator; + +enum mgp_error mgp_map_iter_items(struct mgp_map *map, struct mgp_memory *memory, + struct mgp_map_items_iterator **result); + +void mgp_map_items_iterator_destroy(struct mgp_map_items_iterator *it); + +enum mgp_error mgp_map_items_iterator_get(struct mgp_map_items_iterator *it, struct mgp_map_item **result); + +enum mgp_error mgp_map_items_iterator_next(struct mgp_map_items_iterator *it, struct mgp_map_item **result); + +enum mgp_error mgp_path_make_with_start(struct mgp_vertex *vertex, struct mgp_memory *memory, struct mgp_path **result); + +enum mgp_error mgp_path_copy(struct mgp_path *path, struct mgp_memory *memory, struct mgp_path **result); + +void mgp_path_destroy(struct mgp_path *path); + +enum mgp_error mgp_path_expand(struct mgp_path 
*path, struct mgp_edge *edge); + +enum mgp_error mgp_path_size(struct mgp_path *path, size_t *result); + +enum mgp_error mgp_path_vertex_at(struct mgp_path *path, size_t index, struct mgp_vertex **result); + +enum mgp_error mgp_path_edge_at(struct mgp_path *path, size_t index, struct mgp_edge **result); + +enum mgp_error mgp_path_equal(struct mgp_path *p1, struct mgp_path *p2, int *result); + + + +struct mgp_result; +struct mgp_result_record; + +enum mgp_error mgp_result_set_error_msg(struct mgp_result *res, const char *error_msg); + +enum mgp_error mgp_result_new_record(struct mgp_result *res, struct mgp_result_record **result); + +enum mgp_error mgp_result_record_insert(struct mgp_result_record *record, const char *field_name, + struct mgp_value *val); + + +struct mgp_label { + const char *name; +}; + +struct mgp_edge_type { + const char *name; +}; + +struct mgp_properties_iterator; + +void mgp_properties_iterator_destroy(struct mgp_properties_iterator *it); + +struct mgp_property { + const char *name; + struct mgp_value *value; +}; + +enum mgp_error mgp_properties_iterator_get(struct mgp_properties_iterator *it, struct mgp_property **result); + +enum mgp_error mgp_properties_iterator_next(struct mgp_properties_iterator *it, struct mgp_property **result); + +struct mgp_edges_iterator; + +void mgp_edges_iterator_destroy(struct mgp_edges_iterator *it); + +struct mgp_vertex_id { + int64_t as_int; +}; + +enum mgp_error mgp_vertex_get_id(struct mgp_vertex *v, struct mgp_vertex_id *result); + +enum mgp_error mgp_vertex_underlying_graph_is_mutable(struct mgp_vertex *v, int *result); + +enum mgp_error mgp_vertex_set_property(struct mgp_vertex *v, const char *property_name, + struct mgp_value *property_value); + +enum mgp_error mgp_vertex_add_label(struct mgp_vertex *v, struct mgp_label label); + +enum mgp_error mgp_vertex_remove_label(struct mgp_vertex *v, struct mgp_label label); + +enum mgp_error mgp_vertex_copy(struct mgp_vertex *v, struct mgp_memory *memory, struct 
mgp_vertex **result); + +void mgp_vertex_destroy(struct mgp_vertex *v); + +enum mgp_error mgp_vertex_equal(struct mgp_vertex *v1, struct mgp_vertex *v2, int *result); + +enum mgp_error mgp_vertex_labels_count(struct mgp_vertex *v, size_t *result); + +enum mgp_error mgp_vertex_label_at(struct mgp_vertex *v, size_t index, struct mgp_label *result); + +enum mgp_error mgp_vertex_has_label(struct mgp_vertex *v, struct mgp_label label, int *result); + +enum mgp_error mgp_vertex_has_label_named(struct mgp_vertex *v, const char *label_name, int *result); + +enum mgp_error mgp_vertex_get_property(struct mgp_vertex *v, const char *property_name, struct mgp_memory *memory, + struct mgp_value **result); + +enum mgp_error mgp_vertex_iter_properties(struct mgp_vertex *v, struct mgp_memory *memory, + struct mgp_properties_iterator **result); + +enum mgp_error mgp_vertex_iter_in_edges(struct mgp_vertex *v, struct mgp_memory *memory, + struct mgp_edges_iterator **result); + +enum mgp_error mgp_vertex_iter_out_edges(struct mgp_vertex *v, struct mgp_memory *memory, + struct mgp_edges_iterator **result); + +enum mgp_error mgp_edges_iterator_underlying_graph_is_mutable(struct mgp_edges_iterator *it, int *result); + +enum mgp_error mgp_edges_iterator_get(struct mgp_edges_iterator *it, struct mgp_edge **result); + +enum mgp_error mgp_edges_iterator_next(struct mgp_edges_iterator *it, struct mgp_edge **result); + +struct mgp_edge_id { + int64_t as_int; +}; + +enum mgp_error mgp_edge_get_id(struct mgp_edge *e, struct mgp_edge_id *result); + +enum mgp_error mgp_edge_underlying_graph_is_mutable(struct mgp_edge *e, int *result); + +enum mgp_error mgp_edge_copy(struct mgp_edge *e, struct mgp_memory *memory, struct mgp_edge **result); + +void mgp_edge_destroy(struct mgp_edge *e); + +enum mgp_error mgp_edge_equal(struct mgp_edge *e1, struct mgp_edge *e2, int *result); + +enum mgp_error mgp_edge_get_type(struct mgp_edge *e, struct mgp_edge_type *result); + +enum mgp_error mgp_edge_get_from(struct 
mgp_edge *e, struct mgp_vertex **result); + +enum mgp_error mgp_edge_get_to(struct mgp_edge *e, struct mgp_vertex **result); + +enum mgp_error mgp_edge_get_property(struct mgp_edge *e, const char *property_name, struct mgp_memory *memory, + struct mgp_value **result); + +enum mgp_error mgp_edge_set_property(struct mgp_edge *e, const char *property_name, struct mgp_value *property_value); + +enum mgp_error mgp_edge_iter_properties(struct mgp_edge *e, struct mgp_memory *memory, + struct mgp_properties_iterator **result); + +struct mgp_graph; + +enum mgp_error mgp_graph_get_vertex_by_id(struct mgp_graph *g, struct mgp_vertex_id id, struct mgp_memory *memory, + struct mgp_vertex **result); + +enum mgp_error mgp_graph_is_mutable(struct mgp_graph *graph, int *result); + +enum mgp_error mgp_graph_create_vertex(struct mgp_graph *graph, struct mgp_memory *memory, struct mgp_vertex **result); + +enum mgp_error mgp_graph_delete_vertex(struct mgp_graph *graph, struct mgp_vertex *vertex); + +enum mgp_error mgp_graph_detach_delete_vertex(struct mgp_graph *graph, struct mgp_vertex *vertex); + +enum mgp_error mgp_graph_create_edge(struct mgp_graph *graph, struct mgp_vertex *from, struct mgp_vertex *to, + struct mgp_edge_type type, struct mgp_memory *memory, struct mgp_edge **result); + +enum mgp_error mgp_graph_delete_edge(struct mgp_graph *graph, struct mgp_edge *edge); + +struct mgp_vertices_iterator; + +void mgp_vertices_iterator_destroy(struct mgp_vertices_iterator *it); + +enum mgp_error mgp_graph_iter_vertices(struct mgp_graph *g, struct mgp_memory *memory, + struct mgp_vertices_iterator **result); + +enum mgp_error mgp_vertices_iterator_underlying_graph_is_mutable(struct mgp_vertices_iterator *it, int *result); + +enum mgp_error mgp_vertices_iterator_get(struct mgp_vertices_iterator *it, struct mgp_vertex **result); + + +struct mgp_date_parameters { + int year; + int month; + int day; +}; + +enum mgp_error mgp_date_from_string(const char *string, struct mgp_memory *memory, 
struct mgp_date **date); + +enum mgp_error mgp_date_from_parameters(struct mgp_date_parameters *parameters, struct mgp_memory *memory, + struct mgp_date **date); + +enum mgp_error mgp_date_copy(struct mgp_date *date, struct mgp_memory *memory, struct mgp_date **result); + +void mgp_date_destroy(struct mgp_date *date); + +enum mgp_error mgp_date_equal(struct mgp_date *first, struct mgp_date *second, int *result); + +enum mgp_error mgp_date_get_year(struct mgp_date *date, int *year); + +enum mgp_error mgp_date_get_month(struct mgp_date *date, int *month); + +enum mgp_error mgp_date_get_day(struct mgp_date *date, int *day); + +enum mgp_error mgp_date_timestamp(struct mgp_date *date, int64_t *timestamp); + +enum mgp_error mgp_date_now(struct mgp_memory *memory, struct mgp_date **date); + +enum mgp_error mgp_date_add_duration(struct mgp_date *date, struct mgp_duration *dur, struct mgp_memory *memory, + struct mgp_date **result); + +enum mgp_error mgp_date_sub_duration(struct mgp_date *date, struct mgp_duration *dur, struct mgp_memory *memory, + struct mgp_date **result); + +enum mgp_error mgp_date_diff(struct mgp_date *first, struct mgp_date *second, struct mgp_memory *memory, + struct mgp_duration **result); + +struct mgp_local_time_parameters { + int hour; + int minute; + int second; + int millisecond; + int microsecond; +}; + +enum mgp_error mgp_local_time_from_string(const char *string, struct mgp_memory *memory, + struct mgp_local_time **local_time); + +enum mgp_error mgp_local_time_from_parameters(struct mgp_local_time_parameters *parameters, struct mgp_memory *memory, + struct mgp_local_time **local_time); + +enum mgp_error mgp_local_time_copy(struct mgp_local_time *local_time, struct mgp_memory *memory, + struct mgp_local_time **result); + +void mgp_local_time_destroy(struct mgp_local_time *local_time); + +enum mgp_error mgp_local_time_equal(struct mgp_local_time *first, struct mgp_local_time *second, int *result); + +enum mgp_error 
mgp_local_time_get_hour(struct mgp_local_time *local_time, int *hour); + +enum mgp_error mgp_local_time_get_minute(struct mgp_local_time *local_time, int *minute); + +enum mgp_error mgp_local_time_get_second(struct mgp_local_time *local_time, int *second); + +enum mgp_error mgp_local_time_get_millisecond(struct mgp_local_time *local_time, int *millisecond); + +enum mgp_error mgp_local_time_get_microsecond(struct mgp_local_time *local_time, int *microsecond); + +enum mgp_error mgp_local_time_timestamp(struct mgp_local_time *local_time, int64_t *timestamp); + +enum mgp_error mgp_local_time_now(struct mgp_memory *memory, struct mgp_local_time **local_time); + +enum mgp_error mgp_local_time_add_duration(struct mgp_local_time *local_time, struct mgp_duration *dur, + struct mgp_memory *memory, struct mgp_local_time **result); + +enum mgp_error mgp_local_time_sub_duration(struct mgp_local_time *local_time, struct mgp_duration *dur, + struct mgp_memory *memory, struct mgp_local_time **result); + +enum mgp_error mgp_local_time_diff(struct mgp_local_time *first, struct mgp_local_time *second, + struct mgp_memory *memory, struct mgp_duration **result); + +struct mgp_local_date_time_parameters { + struct mgp_date_parameters *date_parameters; + struct mgp_local_time_parameters *local_time_parameters; +}; + +enum mgp_error mgp_local_date_time_from_string(const char *string, struct mgp_memory *memory, + struct mgp_local_date_time **local_date_time); + +enum mgp_error mgp_local_date_time_from_parameters(struct mgp_local_date_time_parameters *parameters, + struct mgp_memory *memory, + struct mgp_local_date_time **local_date_time); + +enum mgp_error mgp_local_date_time_copy(struct mgp_local_date_time *local_date_time, struct mgp_memory *memory, + struct mgp_local_date_time **result); + +void mgp_local_date_time_destroy(struct mgp_local_date_time *local_date_time); + +enum mgp_error mgp_local_date_time_equal(struct mgp_local_date_time *first, struct mgp_local_date_time *second, + int 
*result); + +enum mgp_error mgp_local_date_time_get_year(struct mgp_local_date_time *local_date_time, int *year); + +enum mgp_error mgp_local_date_time_get_month(struct mgp_local_date_time *local_date_time, int *month); + +enum mgp_error mgp_local_date_time_get_day(struct mgp_local_date_time *local_date_time, int *day); + +enum mgp_error mgp_local_date_time_get_hour(struct mgp_local_date_time *local_date_time, int *hour); + +enum mgp_error mgp_local_date_time_get_minute(struct mgp_local_date_time *local_date_time, int *minute); + +enum mgp_error mgp_local_date_time_get_second(struct mgp_local_date_time *local_date_time, int *second); + +enum mgp_error mgp_local_date_time_get_millisecond(struct mgp_local_date_time *local_date_time, int *millisecond); + +enum mgp_error mgp_local_date_time_get_microsecond(struct mgp_local_date_time *local_date_time, int *microsecond); + +enum mgp_error mgp_local_date_time_timestamp(struct mgp_local_date_time *local_date_time, int64_t *timestamp); + +enum mgp_error mgp_local_date_time_now(struct mgp_memory *memory, struct mgp_local_date_time **local_date_time); + +enum mgp_error mgp_local_date_time_add_duration(struct mgp_local_date_time *local_date_time, struct mgp_duration *dur, + struct mgp_memory *memory, struct mgp_local_date_time **result); + +enum mgp_error mgp_local_date_time_sub_duration(struct mgp_local_date_time *local_date_time, struct mgp_duration *dur, + struct mgp_memory *memory, struct mgp_local_date_time **result); + +enum mgp_error mgp_local_date_time_diff(struct mgp_local_date_time *first, struct mgp_local_date_time *second, + struct mgp_memory *memory, struct mgp_duration **result); + +struct mgp_duration_parameters { + double day; + double hour; + double minute; + double second; + double millisecond; + double microsecond; +}; + +enum mgp_error mgp_duration_from_string(const char *string, struct mgp_memory *memory, struct mgp_duration **duration); + +enum mgp_error mgp_duration_from_parameters(struct 
mgp_duration_parameters *parameters, struct mgp_memory *memory, + struct mgp_duration **duration); + +enum mgp_error mgp_duration_from_microseconds(int64_t microseconds, struct mgp_memory *memory, + struct mgp_duration **duration); + +enum mgp_error mgp_duration_copy(struct mgp_duration *duration, struct mgp_memory *memory, + struct mgp_duration **result); + +void mgp_duration_destroy(struct mgp_duration *duration); + +enum mgp_error mgp_duration_equal(struct mgp_duration *first, struct mgp_duration *second, int *result); + +enum mgp_error mgp_duration_get_microseconds(struct mgp_duration *duration, int64_t *microseconds); + +enum mgp_error mgp_duration_neg(struct mgp_duration *dur, struct mgp_memory *memory, struct mgp_duration **result); + +enum mgp_error mgp_duration_add(struct mgp_duration *first, struct mgp_duration *second, struct mgp_memory *memory, + struct mgp_duration **result); + +enum mgp_error mgp_duration_sub(struct mgp_duration *first, struct mgp_duration *second, struct mgp_memory *memory, + struct mgp_duration **result); + +enum mgp_error mgp_vertices_iterator_next(struct mgp_vertices_iterator *it, struct mgp_vertex **result); + + +struct mgp_type; + +enum mgp_error mgp_type_any(struct mgp_type **result); + +enum mgp_error mgp_type_bool(struct mgp_type **result); + +enum mgp_error mgp_type_string(struct mgp_type **result); + +enum mgp_error mgp_type_int(struct mgp_type **result); + +enum mgp_error mgp_type_float(struct mgp_type **result); + +enum mgp_error mgp_type_number(struct mgp_type **result); + +enum mgp_error mgp_type_map(struct mgp_type **result); + +enum mgp_error mgp_type_node(struct mgp_type **result); + +enum mgp_error mgp_type_relationship(struct mgp_type **result); + +enum mgp_error mgp_type_path(struct mgp_type **result); + +enum mgp_error mgp_type_list(struct mgp_type *element_type, struct mgp_type **result); + +enum mgp_error mgp_type_date(struct mgp_type **result); + +enum mgp_error mgp_type_local_time(struct mgp_type **result); + 
+enum mgp_error mgp_type_local_date_time(struct mgp_type **result); + +enum mgp_error mgp_type_duration(struct mgp_type **result); + +enum mgp_error mgp_type_nullable(struct mgp_type *type, struct mgp_type **result); + + +struct mgp_module; + +struct mgp_proc; + +/// All available log levels that can be used in mgp_log function +MGP_ENUM_CLASS mgp_log_level{ + MGP_LOG_LEVEL_TRACE, MGP_LOG_LEVEL_DEBUG, MGP_LOG_LEVEL_INFO, + MGP_LOG_LEVEL_WARN, MGP_LOG_LEVEL_ERROR, MGP_LOG_LEVEL_CRITICAL, +}; + +typedef void (*mgp_proc_cb)(struct mgp_list *, struct mgp_graph *, struct mgp_result *, struct mgp_memory *); + +enum mgp_error mgp_module_add_read_procedure(struct mgp_module *module, const char *name, mgp_proc_cb cb, + struct mgp_proc **result); + +enum mgp_error mgp_module_add_write_procedure(struct mgp_module *module, const char *name, mgp_proc_cb cb, + struct mgp_proc **result); + +enum mgp_error mgp_proc_add_arg(struct mgp_proc *proc, const char *name, struct mgp_type *type); + +enum mgp_error mgp_proc_add_opt_arg(struct mgp_proc *proc, const char *name, struct mgp_type *type, + struct mgp_value *default_value); + +enum mgp_error mgp_proc_add_result(struct mgp_proc *proc, const char *name, struct mgp_type *type); + +enum mgp_error mgp_proc_add_deprecated_result(struct mgp_proc *proc, const char *name, struct mgp_type *type); + + +int mgp_must_abort(struct mgp_graph *graph); + + + +struct mgp_message; + +struct mgp_messages; + +enum mgp_error mgp_message_payload(struct mgp_message *message, const char **result); + +enum mgp_error mgp_message_payload_size(struct mgp_message *message, size_t *result); + +enum mgp_error mgp_message_topic_name(struct mgp_message *message, const char **result); + +enum mgp_error mgp_message_key(struct mgp_message *message, const char **result); + +enum mgp_error mgp_message_key_size(struct mgp_message *message, size_t *result); + +enum mgp_error mgp_message_timestamp(struct mgp_message *message, int64_t *result); + +enum mgp_error 
mgp_messages_size(struct mgp_messages *message, size_t *result); + +enum mgp_error mgp_messages_at(struct mgp_messages *message, size_t index, struct mgp_message **result); + +typedef void (*mgp_trans_cb)(struct mgp_messages *, struct mgp_graph *, struct mgp_result *, struct mgp_memory *); + +enum mgp_error mgp_module_add_transformation(struct mgp_module *module, const char *name, mgp_trans_cb cb); + +#ifdef __cplusplus +} // extern "C" +#endif + +#endif // MG_PROCEDURE_H +``` diff --git a/docs2/custom-query-modules/c/c-example.md b/docs2/custom-query-modules/c/c-example.md new file mode 100644 index 00000000000..ff00e4b91aa --- /dev/null +++ b/docs2/custom-query-modules/c/c-example.md @@ -0,0 +1,237 @@ +# Example of a query module written in C + +Query modules can be implemented using the [C +API](/reference-guide/query-modules/implement-custom-query-modules/api/c-api.md) +provided by Memgraph. Such modules need to be compiled to a shared library so +that they can be loaded when Memgraph starts. This means that you can write the +procedures in any programming language that can work with C and be compiled to +the ELF shared library format (`.so`). + +:::warning + +If the programming language of your choice throws exceptions, these exceptions +should never leave the scope of your module! You should have a top-level +exception handler that returns an error value and potentially logs the error +message. Exceptions that cross the module boundary will cause unexpected issues. + +::: + +Every single Memgraph installation comes with the `example.so` query module +located in the `/usr/lib/memgraph/query_modules` directory. It is provided as +an example of a query module written with the C API for you to examine and learn +from. The `query_modules` directory also contains a `src` directory with the +`example.c` file. + +Let's take a look at the `example.c` file.
+ +```c +#include "mg_procedure.h" +``` + +In the first line, we include `mg_procedure.h`, which contains declarations of +all functions that can be used to implement a query module procedure. This file +is located in the Memgraph installation directory, under +`/usr/include/memgraph`. To compile the module, you will have to pass the +appropriate flags to the compiler, for example, `clang`: + +```plaintext +clang -Wall -shared -fPIC -I /usr/include/memgraph example.c -o example.so +``` + +### Query procedures + +Next, we have a `procedure` function. This function will serve as the callback +for our `example.procedure` invocation through Cypher. + +```c +static void procedure(const struct mgp_list *args, const struct mgp_graph *graph, + struct mgp_result *result, struct mgp_memory *memory) { + ... +} +``` + +If this were C++, you'd probably write the function like this: + +```cpp +namespace { +void procedure(const mgp_list *args, const mgp_graph *graph, + mgp_result *result, mgp_memory *memory) { + try { + ... + } catch (const std::exception &e) { + // We must not let any exceptions out of our module. + mgp_result_set_error_msg(result, e.what()); + return; + } +} +} +``` + +The `procedure` function receives the list of arguments (`args`) passed in the +query. The parameter `result` is used to fill in the resulting records of the +procedure. Parameters `graph` and `memory` are context parameters of the +procedure, and they are used in some parts of the provided C API. + +For more information on what exactly is possible with the C API, take a look at the +`mg_procedure.h` file or the [C API reference +guide](/reference-guide/query-modules/implement-custom-query-modules/api/c-api.md). + +Next comes the `mgp_init_module` function, which registers procedures +that can be invoked through Cypher. Even though the example has only one +`procedure`, you can register multiple different procedures in a single module. + +Procedures are invoked using the `CALL <module_name>.<procedure_name>
...` syntax. The +`<module_name>` will correspond to the name of the shared library. Since we +compile our example into `example.so`, the module is called `example`. +Procedure names can be different from their corresponding implementation +callbacks because the procedure name is defined when registering a procedure. + +```c +int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) { + // Register our `procedure` as a read procedure with the name "procedure". + struct mgp_proc *proc = + mgp_module_add_read_procedure(module, "procedure", procedure); + // Return non-zero on error. + if (!proc) return 1; + // Additional code for better specifying the procedure (omitted here). + ... + // Return 0 to indicate success. + return 0; +} +``` + +The omitted part specifies the signature of the registered procedure. The +signature specification states what kind of arguments a procedure accepts and +what will be the resulting set of the procedure. For information on the signature +specification API, take a look at the `mg_procedure.h` file and read the +documentation on functions prefixed with `mgp_proc_`. + +The passed-in `memory` argument is only alive throughout the execution of +`mgp_init_module`, so you must not allocate any global resources with it. If you +really need to set up a certain global state, you may do so in +`mgp_init_module` using the standard global allocators. + +Consequently, you may want to reset any global state or release global resources +in the following function. + +```c +int mgp_shutdown_module() { + // Return 0 to indicate success. + return 0; +} +``` + +As mentioned before, no exceptions should leave your module. If you are writing +the module in a language that throws them, use exception handlers +in `mgp_init_module` and `mgp_shutdown_module` as well. + + +### Batched query procedures + +Similar to batched query procedures in Python, you can add batched query procedures in C.
+ +Batched procedures need three functions: one for batching, one for initialization, and one for cleanup. + +```c +static void batch(const struct mgp_list *args, const struct mgp_graph *graph, + struct mgp_result *result, struct mgp_memory *memory) { + ... +} + +static void init(const struct mgp_list *args, const struct mgp_graph *graph, + struct mgp_memory *memory) { + ... +} + +static void cleanup() { + ... +} +``` + +The `batch` function receives a list of arguments (`args`) passed in the +query. The parameter `result` is used to fill in the resulting records of the +procedure. Parameters `graph` and `memory` are context parameters of the +procedure, and they are used in some parts of the provided C API. + +At some point, `batch` needs to return an empty `result` to signal that the `batch` procedure is done executing and `cleanup` can be called. `init` doesn't receive a `result` as it is only used for initialization. The `init` function receives the same arguments that are registered for and passed to the `batch` function. + +Memgraph guarantees that `init` is called before the `batch` function and that `cleanup` is called at the end. The user directly invokes only the `batch` function through openCypher. + +The passed-in `memory` argument is only alive throughout the execution of +`mgp_init_module`, so you must not allocate any global resources with it. Consequently, you may want to reset any global state or release global resources +in the `cleanup` function. + +For more information on what exactly is possible with the C API, take a look at the +`mg_procedure.h` file or the [C API reference +guide](/reference-guide/query-modules/implement-custom-query-modules/api/c-api.md). + +As before, the `mgp_init_module` function registers procedures +that can be invoked through Cypher. Even though the example has only one +`procedure`, you can register multiple different procedures in a single module. + +Batch procedures are invoked using the `CALL <module_name>.<procedure_name> ...` syntax.
The +`<module_name>` will correspond to the name of the shared library. Since the example is compiled into `example.so`, the module is called `example`. +As mentioned, Memgraph guarantees that `init` is called before the batch procedure, and `cleanup` once the batch procedure signals the end with an empty result. + +```c +int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) { + // Register our `batch` callback as a batched read procedure with the name "procedure". + struct mgp_proc *proc = + mgp_module_add_batch_read_procedure(module, "procedure", batch, init, cleanup); + // Return non-zero on error. + if (!proc) return 1; + // Additional code for better specifying the procedure (omitted here). + ... + // Return 0 to indicate success. + return 0; +} +``` + + +### Magic functions + +A major part of defining a magic function is similar to defining query procedures. +The steps of defining a callback and registering arguments are repeated in the +magic functions, only with a different syntax. + +To define a function, the first step is to define a callback. The example only +shows C++ code. + +```cpp +namespace { +void function(const mgp_list *args, mgp_func_context *func_ctx, + mgp_func_result *result, mgp_memory *memory) { + try { + ... + } catch (const std::exception &e) { + // We must not let any exceptions out of our module. + mgp_func_result_set_error_msg(result, e.what(), memory); + return; + } +} +} +``` + +The parameter `args` is used to fetch the required and optional arguments from +the Cypher call. The parameter `result` defines the resulting value. It can +carry either an error or a return value, depending on the runtime execution. +There is no `mgp_graph` argument because the graph is immutable in functions. + +To initialize and register the written function as a magic function, one should +write the initialization in `mgp_init_module`. The registered function can +then be called in a similar fashion to the built-in functions, just with the +syntax specifying the module it is stored in: `<module_name>.<function_name>(...)`.
+ +```cpp +int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) { + // Register our `function` as a Magic function with the name "function". + struct mgp_func *func = + mgp_module_add_function(module, "function", function); // Function pointer defined above + // Return non-zero on error. + if (!func) return 1; + // Additional code for better specifying the function with arguments (omitted here). + ... + // Return 0 to indicate success. + return 0; +} +``` \ No newline at end of file diff --git a/docs2/custom-query-modules/contributing.md b/docs2/custom-query-modules/contributing.md new file mode 100644 index 00000000000..8faee6485f0 --- /dev/null +++ b/docs2/custom-query-modules/contributing.md @@ -0,0 +1,231 @@ +--- +id: contributing +title: How to contribute to MAGE? +sidebar_label: Contributing +--- + +## Contributing + +We encourage everyone to contribute with their own algorithm implementations and +ideas. If you want to contribute or report a bug, please take a look at the +[contributions +guide](https://github.com/memgraph/mage/blob/main/CONTRIBUTING.md). + +Here are links to Memgraph and MAGE, which are both open and ready to receive feedback +and your contributions: + +- :file_folder: [**Memgraph**](https://github.com/memgraph/memgraph) +- :file_folder: [**MAGE**](https://github.com/memgraph/mage) + +## Code of Conduct + +Everyone participating in this project is governed by the [Code of +Conduct](https://github.com/memgraph/mage/blob/main/CODE_OF_CONDUCT.md). By +participating, you are expected to uphold this code. Please report unacceptable +behavior to .
+ +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + +## Prerequisites + +- You have developed a query module by yourself and/or followed our tutorial for + [Python](/mage/how-to-guides/create-a-new-module-python) or + [C++](/mage/how-to-guides/create-a-new-module-cpp) + +:::warning + +The following steps depend on how you installed Memgraph and MAGE as we need +to import the modules. + +::: + +## Importing query modules into Memgraph + + + + +**1.** Start the MAGE container with: + +```shell +docker run --rm -p 7687:7687 --name mage memgraph-mage:version-dev +``` + +Be sure to replace the `version` with the specific version, for example: + +```shell +docker run --rm -p 7687:7687 --name mage memgraph-mage:1.4-dev +``` + +**2.** Copy your local MAGE directory inside the container in order for Memgraph +to import the query modules: + +**a)** First, you need to copy the files to the container named `mage`: + +```shell +docker cp . mage:/mage/ +``` + +**b)** Then, you need to position yourself inside the container as root: + +```shell +docker exec -u root -it mage /bin/bash +``` + +> Note: If you performed the build locally, make sure to delete the `cpp/build` +> directory because you might be dealing with different architectures or +> problems with `CMakeCache.txt`. To delete it, run: +> +> `rm -rf cpp/build` + +**c)** After that, build MAGE with the option to copy executables from +`mage/dist` to `/usr/lib/memgraph/query_modules`: + +```shell +python3 setup build -p /usr/lib/memgraph/query_modules/ +``` + +**d)** Everything should be ready to exit the container and load the query +modules: + +``` +exit +``` + + + + +**1.** To create the `dev` **MAGE** image, run the following command: + +```shell +docker build --target dev -t memgraph-mage:dev . 
+``` + +**2.** Start the container with the following command: + +```shell +docker run --rm -p 7687:7687 --name mage memgraph-mage:dev +``` + +:::info + +If you make any changes to the module, you can stop the container and do a +rebuild. Additionally, if you don't want to rebuild everything, you can: +1. Copy the changes to the container. +2. Perform a build inside the container. +3. Copy the executables to the `/usr/lib/memgraph/query_modules/` directory, + where Memgraph is looking for query modules. + +The process is the same as described in step **2** of the tab `Docker Hub`. + +::: + + + + +**1.** Make sure your Memgraph instance is running: + +``` +sudo systemctl status memgraph.service +``` + +**2.** Now, we need to copy our developed query module `random_walk.py` to +`/usr/lib/memgraph/query_modules`: + +```shell +python3 setup build -p /usr/lib/memgraph/query_modules +``` + + + + + +## Querying + +> Note that query modules are loaded into Memgraph on startup, so if your +> instance was already running, you would need to execute the following query +> inside one of the [querying +> platforms](https://docs.memgraph.com/memgraph/connect-to-memgraph) to load +> them: + +```cypher +CALL mg.load_all(); +``` + +Lastly, run a query and test your module: + +```cypher +MERGE (start:Node {id: 0})-[:RELATION]->(:Node {id: 1})-[:RELATION]->(:Node {id: 2}) +CALL random_walk.get(start, 2) YIELD path +RETURN path +``` + +Since every query module is run as one transaction in Memgraph, the user can stop +the query module by [terminating the corresponding transaction](/memgraph/reference-guide/transactions). The user first needs +to find out the transaction ID using `SHOW TRANSACTIONS` command and then terminate it +using the `TERMINATE TRANSACTIONS ` command. + +## Testing + +Test decoupled parts of your code that don't depend on Memgraph like you would +in any other setting. 
E2e (end to end) tests, on the other hand, depend on +internal Memgraph data structures, like nodes and edges. After running Memgraph, +we need to prepare the testing environment on the host machine. Position +yourself in the mage directory you cloned from GitHub. The expected folder +structure for each module is the following: + +```plaintext +mage +└── e2e + └── random_walk_test + └── test_base + β”œβ”€β”€ input.cyp + └── test.yml +``` + +`input.cyp` represents a Cypher script for entering the data into the database. +To simplify this tutorial, we'll leave the database empty. `test.yml` specifies +which test query should be run by the database and what should be the result or +exception. Create the files following the aforementioned directory structure. + +### input.cyp + +```cypher +MATCH (n) DETACH DELETE n; +``` + +### test.yml + +```shell +query: > + MATCH (start:Node {id: 0}) + CALL random_walk.get(start, 2) YIELD path + RETURN path + +output: [] +``` + +Lastly, run the e2e tests with python: + +```shell +python test_e2e +``` + +## Next steps + +Feel free to create an issue or open a pull request on our [Github +repo](https://github.com/memgraph/mage) to speed up the development.
+Also, don't forget to throw us a star on GitHub. :star: + + +## Feedback +Your feedback is always welcome and valuable to us. Please don't hesitate to +post on our [Discord](https://www.discord.gg/memgraph). diff --git a/docs2/custom-query-modules/cpp/cpp-api.md b/docs2/custom-query-modules/cpp/cpp-api.md new file mode 100644 index 00000000000..0d8910bfd7d --- /dev/null +++ b/docs2/custom-query-modules/cpp/cpp-api.md @@ -0,0 +1,1948 @@ +--- +id: cpp-api +title: Query modules C++ API +sidebar_label: C++ API +slug: /reference-guide/query-modules/api/cpp-api +--- + +This is the API documentation for `mgp.hpp`, which contains declarations of all +functions in the C++ API for implementing query module procedures and functions. +The source file can be found in the Memgraph installation directory, under +`/usr/include/memgraph`. + +:::tip + +To see how to implement query modules in C++, take a look at +[the example we provided](/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example.md#cpp-api). + +::: + +:::tip + +If you install any C++ modules after running Memgraph, you’ll need to [load +them into Memgraph](../load-call-query-modules#loading-query-modules) or restart +Memgraph in order to use them. + +::: + +## Functions and procedures + +This API makes it possible to extend your Cypher queries with **functions** and **procedures**, registered via +`AddProcedure` and `AddFunction`. + +:::tip + +The API needs memory access to add procedures and functions; this can be done with `mgp::memory = memory;`. + +::: + +Functions are simple operations that return a single value and can be used in any expression or predicate. + +Procedures are more complex computations that may modify the graph, and their output is available to +later processing steps in your query. A procedure may only be run from `CALL` clauses. +The output is a stream of **records** that is made accessible with a `YIELD` clause.
+
+### AddProcedure
+
+Add a procedure to your query module. The procedure is registered as `[QUERY_MODULE_NAME].[PROC_NAME]`
+and can be used in Cypher queries.
+
+```cpp
+void AddProcedure(
+    mgp_proc_cb callback,
+    std::string_view name,
+    ProcedureType proc_type,
+    std::vector<Parameter> parameters,
+    std::vector<Return> returns,
+    mgp_module *module,
+    mgp_memory *memory);
+```
+
+#### Input
+
+- `callback`: procedure callback
+- `name`: procedure name
+- `proc_type`: procedure type (read/write)
+- `parameters`: vector (list) of procedure parameters
+- `returns`: vector (list) of procedure return values
+- `module`: the query module that the procedure is added to
+- `memory`: access to memory
+
+#### ProcedureType
+
+Enum class for Cypher procedure types.
+
+- `ProcedureType::Read`: read procedure
+- `ProcedureType::Write`: write procedure
+
+### AddBatchProcedure
+
+Add a batch procedure to your query module. The procedure is registered as `[QUERY_MODULE_NAME].[PROC_NAME]`
+and can be used in Cypher queries.
+
+```cpp
+void AddBatchProcedure(
+    mgp_proc_cb callback,
+    mgp_proc_initializer initializer,
+    mgp_proc_cleanup cleanup,
+    std::string_view name,
+    ProcedureType proc_type,
+    std::vector<Parameter> parameters,
+    std::vector<Return> returns,
+    mgp_module *module,
+    mgp_memory *memory);
+```
+
+#### Input
+
+- `callback`: procedure callback, invoked through OpenCypher
+- `initializer`: procedure initializer, invoked before callback
+- `cleanup`: procedure cleanup, invoked after batching is done
+- `name`: procedure name
+- `proc_type`: procedure type (read/write)
+- `parameters`: vector (list) of procedure parameters
+- `returns`: vector (list) of procedure return values
+- `module`: the query module that the procedure is added to
+- `memory`: access to memory
+
+#### ProcedureType
+
+Enum class for Cypher procedure types.
+
+- `ProcedureType::Read`: read procedure
+- `ProcedureType::Write`: write procedure
+
+### AddFunction
+
+Add a function to your query module.
The function is registered as `[QUERY_MODULE_NAME].[FUNC_NAME]`
+and can be used in Cypher queries.
+
+```cpp
+void AddFunction(
+    mgp_func_cb callback,
+    std::string_view name,
+    std::vector<Parameter> parameters,
+    std::vector<Return> returns,
+    mgp_module *module,
+    mgp_memory *memory);
+```
+
+#### Input
+
+- `callback`: function callback
+- `name`: function name
+- `parameters`: vector (list) of function parameters
+- `returns`: vector (list) of function return values
+- `module`: the query module that the function is added to
+- `memory`: access to memory
+
+### Parameter
+
+Represents a procedure/function parameter. Parameters are defined by their name, type,
+and (if optional) default value.
+
+#### Constructors
+
+Creates a non-optional parameter with the given `name` and `type`.
+```cpp
+Parameter(std::string_view name, Type type)
+```
+
+Creates an optional Boolean parameter with the given `name` and `default_value`.
+```cpp
+Parameter(std::string_view name, Type type, bool default_value)
+```
+
+Creates an optional integer parameter with the given `name` and `default_value`.
+```cpp
+Parameter(std::string_view name, Type type, int default_value)
+```
+
+Creates an optional floating-point parameter with the given `name` and `default_value`.
+```cpp
+Parameter(std::string_view name, Type type, double default_value)
+```
+
+Creates an optional string parameter with the given `name` and `default_value`.
+```cpp
+Parameter(std::string_view name, Type type, std::string_view default_value)
+Parameter(std::string_view name, Type type, const char *default_value)
+```
+
+Creates a non-optional list parameter with the given `name` and `item_type`.
+The `list_type` parameter is organized as follows: `{Type::List, Type::[ITEM_TYPE]}`.
+```cpp
+Parameter(std::string_view name, std::pair<Type, Type> list_type)
+```
+
+Creates an optional list parameter with the given `name`, `item_type`, and `default_value`.
+The `list_type` parameter is organized as follows: `{Type::List, Type::[ITEM_TYPE]}`.
+```cpp
+Parameter(std::string_view name, std::pair<Type, Type> list_type, Value default_value)
+```
+
+#### Member variables
+
+| Name | Type | Description |
+| ----------------- | ------------------ | ----------------------------------- |
+| `name` | `std::string_view` | parameter name |
+| `type_` | `Type` | parameter type |
+| `list_item_type_` | `Type` | (list parameters) item type |
+| `optional` | `bool` | whether the parameter is optional |
+| `default_value` | `Value` | (optional parameters) default value |
+
+### Return
+
+Represents a procedure/function return value. Values are defined by their name and type.
+
+#### Constructors
+
+Creates a return value with the given `name` and `type`.
+```cpp
+Return(std::string_view name, Type type)
+```
+
+Creates a return value with the given `name` and `list_type`.
+The `list_type` parameter is organized as follows: `{Type::List, Type::[ITEM_TYPE]}`.
+```cpp
+Return(std::string_view name, std::pair<Type, Type> list_type)
+```
+
+#### Member variables
+
+| Name | Type | Description |
+| ----------------- | ------------------ | ----------------------- |
+| `name` | `std::string_view` | return name |
+| `type_` | `Type` | return type |
+| `list_item_type_` | `Type` | (list values) item type |
+
+### RecordFactory
+
+Factory class for [`Record`](#record).
+
+#### Constructors
+
+```cpp
+explicit RecordFactory(mgp_result *result)
+```
+
+#### Member functions
+
+| Name | Description |
+| ----------------- | ----------------------------- |
+| `NewRecord` | Adds a new result record. |
+| `SetErrorMessage` | Sets the given error message. |
+
+
+##### NewRecord
+
+Adds a new result record.
+
+```cpp
+  const Record NewRecord() const
+```
+
+##### SetErrorMessage
+
+Sets the given error message.
+
+```cpp
+  void SetErrorMessage(const std::string_view error_msg) const
+```
+
+```cpp
+  void SetErrorMessage(const char *error_msg) const
+```
+
+### Record
+
+Represents a **record** - the building block of Cypher procedure results.
Each result is a stream of records,
+and a procedure’s record is a sequence of (field name: output value) pairs.
+
+#### Constructors
+
+```cpp
+explicit Record(mgp_result_record *record)
+```
+
+#### Member functions
+
+| Name | Description |
+| -------- | ------------------------------------------------------- |
+| `Insert` | Inserts a value of given type under field `field_name`. |
+
+##### Insert
+
+Inserts a value of given type under field `field_name`.
+
+```cpp
+  void Insert(const char *field_name, bool value)
+```
+
+```cpp
+  void Insert(const char *field_name, std::int64_t value)
+```
+
+```cpp
+  void Insert(const char *field_name, double value)
+```
+
+```cpp
+  void Insert(const char *field_name, std::string_view value)
+```
+
+```cpp
+  void Insert(const char *field_name, const char *value)
+```
+
+```cpp
+  void Insert(const char *field_name, const List &value)
+```
+
+```cpp
+  void Insert(const char *field_name, const Map &value)
+```
+
+```cpp
+  void Insert(const char *field_name, const Node &value)
+```
+
+```cpp
+  void Insert(const char *field_name, const Relationship &value)
+```
+
+```cpp
+  void Insert(const char *field_name, const Path &value)
+```
+
+```cpp
+  void Insert(const char *field_name, const Date &value)
+```
+
+```cpp
+  void Insert(const char *field_name, const LocalTime value)
+```
+
+```cpp
+  void Insert(const char *field_name, const LocalDateTime value)
+```
+
+```cpp
+  void Insert(const char *field_name, const Duration value)
+```
+
+### Result
+
+Represents a **result** - the single return value of a Cypher function.
+
+#### Constructors
+
+```cpp
+explicit Result(mgp_func_result *result)
+```
+
+#### Member functions
+
+| Name | Description |
+| ----------------- | ---------------------------------- |
+| `SetValue` | Sets a return value of given type. |
+| `SetErrorMessage` | Sets the given error message. |
+
+##### SetValue
+
+Sets a return value of given type.
+ +```cpp + void SetValue(bool value) +``` + +```cpp + void SetValue(std::int64_t value) +``` + +```cpp + void SetValue(double value) +``` + +```cpp + void SetValue(std::string_view value) +``` + +```cpp + void SetValue(const char *value) +``` + +```cpp + void SetValue(const List &value) +``` + +```cpp + void SetValue(const Map &value) +``` + +```cpp + void SetValue(const Node &value) +``` + +```cpp + void SetValue(const Relationship &value) +``` + +```cpp + void SetValue(const Path &value) +``` + +```cpp + void SetValue(const Date &value) +``` + +```cpp + void SetValue(const LocalTime value) +``` + +```cpp + void SetValue(const LocalDateTime value) +``` + +```cpp + void SetValue(const Duration value) +``` + +##### SetErrorMessage + +Sets the given error message. + +```cpp + void SetErrorMessage(const std::string_view error_msg) const +``` + +```cpp + void SetErrorMessage(const char *error_msg) const +``` + +## Graph API + +This section covers the interface for working with the Memgraph DB graph using the C++ API. +A description of data types is available [here](https://memgraph.com/docs/memgraph/reference-guide/data-types). + +### Graph + +#### Constructors + +```cpp +explicit Graph(mgp_graph *graph) +``` + +#### Member functions + +| Name | Description | +| ---------------------- | --------------------------------------------------------------------------------------------- | +| `Order` | Returns the graph order (number of nodes). | +| `Size` | Returns the graph size (number of relationships). | +| `Nodes` (`GraphNodes`) | Returns an iterable structure of the graph’s nodes. | +| `Relationships` | Returns an iterable structure of the graph’s relationships. | +| `GetNodeById` | Returns the graph node with the given ID. | +| `ContainsNode` | Returns whether the graph contains the given node (accepts node or its ID). | +| `ContainsRelationship` | Returns whether the graph contains the given relationship (accepts relationship or its ID). 
| +| `IsMutable` | Returns whether the graph is mutable. | +| `CreateNode` | Creates a node and adds it to the graph. | +| `DeleteNode` | Deletes a node from the graph. | +| `DetachDeleteNode` | Deletes a node and all its incident edges from the graph. | +| `CreateRelationship` | Creates a relationship of type `type` between nodes `from` and `to` and adds it to the graph. | +| `DeleteRelationship` | Deletes a relationship from the graph. | + +##### Order + +Returns the graph order (number of nodes). + +```cpp +int64_t Order() const +``` + +##### Size + +Returns the graph size (number of relationships). + +```cpp +int64_t Size() const +``` + +##### Nodes (GraphNodes) + +Returns an iterable structure of the graph’s nodes. + +```cpp +GraphNodes Nodes() const +``` + +##### Relationships + +Returns an iterable structure of the graph’s relationships. + +```cpp +GraphRelationships Relationships() const +``` + +##### GetNodeById + +Returns the graph node with the given ID. + +```cpp +Node GetNodeById(const Id node_id) const +``` + +##### ContainsNode + +Returns whether the graph contains a node with the given ID. + +```cpp +bool ContainsNode(const Id node_id) const +``` + +Returns whether the graph contains the given node. + +```cpp +bool ContainsNode(const Node &node) const +``` + +##### ContainsRelationship + +```cpp +bool ContainsRelationship(const Id relationship_id) const +``` + +```cpp +bool ContainsRelationship(const Relationship &relationship) const +``` + +##### IsMutable + +Returns whether the graph is mutable. + +```cpp +bool IsMutable() const +``` + +##### CreateNode + +Creates a node and adds it to the graph. + +```cpp +Node CreateNode(); +``` + +##### DeleteNode + +Deletes a node from the graph. + +```cpp +void DeleteNode(const Node &node) +``` + +##### DetachDeleteNode + +Deletes a node and all its incident edges from the graph. 
+ +```cpp +void DetachDeleteNode(const Node &node) +``` + +##### CreateRelationship + +Creates a relationship of type `type` between nodes `from` and `to` and adds it to the graph. + +```cpp +Relationship CreateRelationship(const Node &from, const Node &to, const std::string_view type) +``` + +##### DeleteRelationship + +Deletes a relationship from the graph. + +```cpp +void DeleteRelationship(const Relationship &relationship) +``` + +#### GraphNodes + +Auxiliary class providing an iterable view of the nodes contained in the graph. +`GraphNodes` values may only be used for iteration to obtain the values stored within. + +##### Constructors + +```cpp +explicit GraphNodes(mgp_vertices_iterator *nodes_iterator) +``` + +##### Member variables + +| Name | Type | Description | +| ---------- | ---------------------- | ---------------------------------------- | +| `Iterator` | `GraphNodes::Iterator` | Const forward iterator for `GraphNodes`. | + +##### Member functions + +| Name | Description | +| ----------------------------------------- | ------------------------------------------------------- | +| `begin`
`end`
`cbegin`
`cend` | Returns the beginning/end of the `GraphNodes` iterator. | + +#### GraphRelationships + +Auxiliary class providing an iterable view of the relationships contained in the graph. +`GraphRelationships` values may only be used for iteration to obtain the values stored within. + +##### Constructors + +```cpp +explicit GraphRelationships(mgp_graph *graph) +``` + +##### Member variables + +| Name | Type | Description | +| ---------- | ------------------------------ | ------------------------------------------------ | +| `Iterator` | `GraphRelationships::Iterator` | Const forward iterator for `GraphRelationships`. | + +##### Member functions + +| Name | Description | +| ----------------------------------------- | -------------------------------------------------------------- | +| `begin`
`end`
`cbegin`
`cend` | Returns the beginning/end of the `GraphRelationships` iterator. |
+
+### Node
+
+Represents a node (vertex) of the Memgraph graph.
+
+#### Constructors
+
+Creates a Node from the copy of the given `mgp_vertex`.
+```cpp
+explicit Node(mgp_vertex *ptr)
+explicit Node(const mgp_vertex *const_ptr)
+```
+
+Copy and move constructors:
+```cpp
+Node(const Node &other) noexcept
+Node(Node &&other) noexcept
+```
+
+#### Member functions
+
+| Name | Description |
+|--------------------|---------------------------------------------------------------------|
+| `Id` | Returns the node’s ID. |
+| `Labels` | Returns an iterable & indexable structure of the node’s labels. |
+| `HasLabel` | Returns whether the node has the given `label`. |
+| `Properties` | Returns an iterable & indexable structure of the node’s properties. |
+| `InRelationships` | Returns an iterable structure of the node’s inbound relationships. |
+| `OutRelationships` | Returns an iterable structure of the node’s outbound relationships. |
+| `AddLabel` | Adds a label to the node. |
+| `SetProperty` | Sets the value of the node’s given property. |
+| `GetProperty` | Gets the value of the node’s given property. |
+
+##### Id
+
+Returns the node’s ID.
+
+```cpp
+mgp::Id Id() const
+```
+
+##### Labels
+
+Returns an iterable & indexable structure of the node’s labels.
+
+```cpp
+class Labels Labels() const
+```
+
+##### HasLabel
+
+Returns whether the node has the given `label`.
+
+```cpp
+bool HasLabel(std::string_view label) const
+```
+
+##### Properties
+
+Returns an iterable & indexable structure of the node’s properties.
+
+```cpp
+std::map<std::string, Value> Properties() const
+```
+
+##### GetProperty
+
+Gets the value of the node’s given property.
+
+```cpp
+mgp::Value GetProperty(const std::string &property) const
+```
+
+##### SetProperty
+
+Sets the value of the node’s given property.
+
+```cpp
+void SetProperty(std::string key, std::string value) const
+```
+
+##### InRelationships
+
+Returns an iterable structure of the node’s inbound relationships.
+ +```cpp +Relationships InRelationships() const +``` + +##### OutRelationships + +Returns an iterable structure of the node’s outbound relationships. + +```cpp +Relationships OutRelationships() const +``` + +##### AddLabel + +Adds a label to the node. + +```cpp +void AddLabel(const std::string_view label) +``` + +#### Operators + +| Name | Description | +| --------------------------------------------- | --------------------------------------------------------- | +| `operator[]` | Returns the value of the node’s `property_name` property. | +| `operator==`
`operator!=`
`operator<` | comparison operators |
+
+##### operator[]
+
+Returns the value of the node’s `property_name` property.
+
+```cpp
+const Value operator[](std::string_view property_name) const
+```
+
+### Relationship
+
+Represents a relationship (edge) of the Memgraph graph.
+
+#### Constructors
+
+Creates a Relationship from the copy of the given `mgp_edge`.
+```cpp
+explicit Relationship(mgp_edge *ptr)
+explicit Relationship(const mgp_edge *const_ptr)
+```
+
+Copy and move constructors:
+```cpp
+Relationship(const Relationship &other) noexcept
+Relationship(Relationship &&other) noexcept
+```
+
+#### Member functions
+
+| Name | Description |
+| ------------------ | --------------------------------------------------------------------------- |
+| `Id` | Returns the relationship’s ID. |
+| `Type` | Returns the relationship’s type. |
+| `Properties` | Returns an iterable & indexable structure of the relationship’s properties. |
+| `SetProperty` | Sets the value of the relationship’s given property. |
+| `GetProperty` | Gets the value of the relationship’s given property. |
+| `From` | Returns the relationship’s source node. |
+| `To` | Returns the relationship’s destination node. |
+
+##### Id
+
+Returns the relationship’s ID.
+
+```cpp
+mgp::Id Id() const
+```
+
+##### Type
+
+Returns the relationship’s type.
+
+```cpp
+std::string_view Type() const
+```
+
+##### Properties
+
+Returns an iterable & indexable structure of the relationship’s properties.
+
+```cpp
+std::map<std::string, Value> Properties() const
+```
+##### GetProperty
+
+Gets the value of the relationship’s given property.
+
+```cpp
+mgp::Value GetProperty(const std::string &property) const
+```
+
+##### SetProperty
+
+Sets the value of the relationship’s given property.
+
+```cpp
+void SetProperty(std::string key, std::string value) const
+```
+
+##### From
+
+Returns the relationship’s source node.
+
+```cpp
+Node From() const
+```
+
+##### To
+
+Returns the relationship’s destination node.
+ +```cpp +Node To() const +``` + +#### Operators + +| Name | Description | +| --------------------------------------------- | ----------------------------------------------------------------- | +| `operator[]` | Returns the value of the relationship’s `property_name` property. | +| `operator==`
`operator!=`
`operator<` | comparison operators | + +##### operator[] + +Returns the value of the relationship’s `property_name` property. + +```cpp +const Value operator[](std::string_view property_name) const +``` + +#### Relationships + +Auxiliary class providing an iterable view of the relationships adjacent to a node. +`Relationships` values may only be used for iteration to obtain the values stored within. + +##### Constructors + +```cpp +explicit Relationships(mgp_edges_iterator *relationships_iterator) +``` + +##### Member variables + +| Name | Type | Description | +| ---------- | ------------------------- | ------------------------------------------- | +| `Iterator` | `Relationships::Iterator` | Const forward iterator for `Relationships`. | + +##### Member functions + +| Name | Description | +| ----------------------------------------- | ---------------------------------------------------------- | +| `begin`
`end`
`cbegin`
`cend` | Returns the beginning/end of the `Relationships` iterator. |
+
+### Id
+
+Represents the unique ID possessed by all Memgraph nodes and relationships.
+
+#### Member functions
+
+| Name | Description |
+| ---------- | ------------------------------------------ |
+| `FromUint` | Constructs an `Id` object from `uint64_t`. |
+| `FromInt` | Constructs an `Id` object from `int64_t`. |
+| `AsUint` | Returns the ID value as `uint64_t`. |
+| `AsInt` | Returns the ID value as `int64_t`. |
+
+##### FromUint
+
+Constructs an `Id` object from `uint64_t`.
+
+```cpp
+static Id FromUint(uint64_t id)
+```
+
+##### FromInt
+
+Constructs an `Id` object from `int64_t`.
+
+```cpp
+static Id FromInt(int64_t id)
+```
+
+##### AsUint
+
+Returns the ID value as `uint64_t`.
+
+```cpp
+uint64_t AsUint() const
+```
+
+##### AsInt
+
+Returns the ID value as `int64_t`.
+
+```cpp
+int64_t AsInt() const
+```
+
+#### Operators
+
+| Name | Description |
+| --------------------------------------------- | -------------------- |
+| `operator==`
`operator!=`
`operator<` | comparison operators | + + +### Labels + +Represents a view of node labels. + +#### Constructors + +```cpp +explicit Labels(mgp_vertex *node_ptr) +``` + +Copy and move constructors: +```cpp +Labels(const Labels &other) noexcept +Labels(Labels &&other) noexcept +``` + +#### Member variables + +| Name | Type | Description | +| ---------- | ------------------ | ------------------------------------ | +| `Iterator` | `Labels::Iterator` | Const forward iterator for `Labels`. | + +#### Member functions + +| Name | Description | +| ----------------------------------------- | -------------------------------------------------------------- | +| `Size` | Returns the number of the labels, i.e. the size of their list. | +| `begin`
`end`
`cbegin`
`cend` | Returns the beginning/end of the `Labels` iterator. | + +##### Size + +Returns the number of the labels, i.e. the size of their list. + +```cpp +size_t Size() const +``` + + +#### Operators + +| Name | Description | +| ------------ | --------------------------------------------- | +| `operator[]` | Returns the node’s label at position `index`. | + +##### operator[] + +Returns the node’s label at position `index`. + +```cpp +std::string_view operator[](size_t index) const +``` + +### Date + +Represents a date with a year, month, and day. + +#### Constructors + +Creates a Date object from the copy of the given `mgp_date`. +```cpp +explicit Date(mgp_date *ptr) +explicit Date(const mgp_date *const_ptr) +``` + +Creates a Date object from the given string representing a date in the ISO 8601 format +(`YYYY-MM-DD`, `YYYYMMDD`, or `YYYY-MM`). +```cpp +explicit Date(std::string_view string) +``` + +Creates a Date object with the given `year`, `month`, and `day` properties. +```cpp +Date(int year, int month, int day) +``` + +Copy and move constructors: +```cpp +Date(const Date &other) noexcept +Date(Date &&other) noexcept +``` + +#### Member functions + +| Name | Description | +| ----------- | ------------------------------------------------------------- | +| `Now` | Returns the current `Date`. | +| `Year` | Returns the date’s `year` property. | +| `Month` | Returns the date’s `month` property. | +| `Day` | Returns the date’s `day` property. | +| `Timestamp` | Returns the date’s timestamp (microseconds since Unix epoch). | + +##### Now + +Returns the current `Date`. + +```cpp +static Date Now() +``` + +##### Year + +Returns the date’s `year` property. + +```cpp +int Year() const +``` + +##### Month + +Returns the date’s `month` property. + +```cpp +int Month() const +``` + +##### Day + +Returns the date’s `day` property. + +```cpp +int Day() const +``` + +##### Timestamp + +Returns the date’s timestamp (microseconds since Unix epoch). 
+ +```cpp +int64_t Timestamp() const +``` + +#### Operators + +| Name | Description | +| ---------------------------- | -------------------- | +| `operator+`
`operator-` | arithmetic operators | +| `operator==`
`operator<` | comparison operators |
+
+##### operator-
+
+```cpp
+Date operator-(const Duration &dur) const
+```
+```cpp
+Duration operator-(const Date &other) const
+```
+
+### LocalTime
+
+Represents a time within the day without timezone information.
+
+#### Constructors
+
+Creates a LocalTime object from the copy of the given `mgp_local_time`.
+```cpp
+explicit LocalTime(mgp_local_time *ptr)
+explicit LocalTime(const mgp_local_time *const_ptr)
+```
+
+Creates a LocalTime object from the given string representing a time in the ISO 8601 format
+(`[T]hh:mm:ss`, `[T]hh:mm`, `[T]hhmmss`, `[T]hhmm`, or `[T]hh`).
+```cpp
+explicit LocalTime(std::string_view string)
+```
+
+Creates a LocalTime object with the given `hour`, `minute`, `second`, `millisecond`, and `microsecond` properties.
+```cpp
+LocalTime(int hour, int minute, int second, int millisecond, int microsecond)
+```
+
+Copy and move constructors:
+```cpp
+LocalTime(const LocalTime &other) noexcept
+LocalTime(LocalTime &&other) noexcept
+```
+
+#### Member functions
+
+| Name | Description |
+| ------------- | --------------------------------------------------------------- |
+| `Now` | Returns the current `LocalTime`. |
+| `Hour` | Returns the object’s `hour` property. |
+| `Minute` | Returns the object’s `minute` property. |
+| `Second` | Returns the object’s `second` property. |
+| `Millisecond` | Returns the object’s `millisecond` property. |
+| `Microsecond` | Returns the object’s `microsecond` property. |
+| `Timestamp` | Returns the object’s timestamp (microseconds since Unix epoch). |
+
+##### Now
+
+Returns the current `LocalTime`.
+
+```cpp
+static LocalTime Now()
+```
+
+##### Hour
+
+Returns the object’s `hour` property.
+
+```cpp
+int Hour() const
+```
+
+##### Minute
+
+Returns the object’s `minute` property.
+ +```cpp +int Minute() const +``` + +##### Second + +Returns the object’s `second` property. + +```cpp +int Second() const +``` + +##### Millisecond + +Returns the object’s `millisecond` property. + +```cpp +int Millisecond() const +``` + +##### Microsecond + +Returns the object’s `microsecond` property. + +```cpp +int Microsecond() const +``` + +##### Timestamp + +Returns the object’s timestamp (microseconds since Unix epoch). + +```cpp +int64_t Timestamp() const +``` + +#### Operators + +| Name | Description | +| ---------------------------- | -------------------- | +| `operator+`
`operator-` | arithmetic operators | +| `operator==`
`operator<` | comparison operators |
+
+##### operator-
+
+```cpp
+LocalTime operator-(const Duration &dur) const
+```
+```cpp
+Duration operator-(const LocalTime &other) const
+```
+
+### LocalDateTime
+
+Temporal type representing a date and a local time.
+
+#### Constructors
+
+Creates a LocalDateTime object from the copy of the given `mgp_local_date_time`.
+```cpp
+explicit LocalDateTime(mgp_local_date_time *ptr)
+explicit LocalDateTime(const mgp_local_date_time *const_ptr)
+```
+
+Creates a LocalDateTime object from the given string representing a date and time in the ISO 8601 format
+(`YYYY-MM-DDThh:mm:ss`, `YYYY-MM-DDThh:mm`, `YYYYMMDDThhmmss`, `YYYYMMDDThhmm`, or `YYYYMMDDThh`).
+```cpp
+explicit LocalDateTime(std::string_view string)
+```
+
+Creates a LocalDateTime object with the given `year`, `month`, `day`, `hour`, `minute`, `second`, `millisecond`,
+and `microsecond` properties.
+```cpp
+LocalDateTime(int year, int month, int day, int hour, int minute, int second, int millisecond, int microsecond)
+```
+
+Copy and move constructors:
+```cpp
+LocalDateTime(const LocalDateTime &other) noexcept
+LocalDateTime(LocalDateTime &&other) noexcept
+```
+
+#### Member functions
+
+| Name | Description |
+| ------------- | --------------------------------------------------------------- |
+| `Now` | Returns the current `LocalDateTime`. |
+| `Year` | Returns the object’s `year` property. |
+| `Month` | Returns the object’s `month` property. |
+| `Day` | Returns the object’s `day` property. |
+| `Hour` | Returns the object’s `hour` property. |
+| `Minute` | Returns the object’s `minute` property. |
+| `Second` | Returns the object’s `second` property. |
+| `Millisecond` | Returns the object’s `millisecond` property. |
+| `Microsecond` | Returns the object’s `microsecond` property. |
+| `Timestamp` | Returns the object’s timestamp (microseconds since Unix epoch). |
+
+##### Now
+
+Returns the current `LocalDateTime`.
+ +```cpp +static LocalDateTime Now() +``` + +##### Year + +Returns the object’s `year` property. + +```cpp +int Year() const +``` + +##### Month + +Returns the object’s `month` property. + +```cpp +int Month() const +``` + +##### Day + +Returns the object’s `day` property. + +```cpp +int Day() const +``` + +##### Hour + +Returns the object’s `hour` property. + +```cpp +int Hour() const +``` + +##### Minute + +Returns the object’s `minute` property. + +```cpp +int Minute() const +``` + +##### Second + +Returns the object’s `second` property. + +```cpp +int Second() const +``` + +##### Millisecond + +Returns the object’s `millisecond` property. + +```cpp +int Millisecond() const +``` + +##### Microsecond + +Returns the object’s `microsecond` property. + +```cpp +int Microsecond() const +``` + +##### Timestamp + +Returns the date’s timestamp (microseconds since Unix epoch). + +```cpp +int64_t Timestamp() const +``` + +#### Operators + +| Name | Description | +| ---------------------------- | -------------------- | +| `operator+`
`operator-` | arithmetic operators | +| `operator==`
`operator<` | comparison operators | + +##### operator- + +```cpp +LocalDateTime operator-(const Duration &dur) const +``` +```cpp +Duration operator-(const LocalDateTime &other) const +``` + +### Duration + +Represents a period of time in Memgraph. + +#### Constructors + +Creates a Duration object from the copy of the given `mgp_duration`. +```cpp +explicit Duration(mgp_duration *ptr) +explicit Duration(const mgp_duration *const_ptr) +``` + +Creates a Duration object from the given string in the following format: `P[nD]T[nH][nM][nS]`, where (1) +`n` stands for a number, (2) capital letters are used as a separator, (3) each field in `[]` is optional, +and (4) only the last field may be a non-integer. +```cpp +explicit Duration(std::string_view string) +``` + +Creates a Duration object from the given number of microseconds. +```cpp +explicit Duration(int64_t microseconds) +``` + +Creates a Duration object with the given `day`, `hour`, `minute`, `second`, `millisecond`, and `microsecond` properties. +```cpp +Duration(double day, double hour, double minute, double second, double millisecond, double microsecond) +``` + +Copy and move constructors: +```cpp +Duration(const Duration &other) noexcept +Duration(Duration &&other) noexcept +``` + +#### Member functions + +| Name | Description | +| -------------- | ------------------------------------- | +| `Microseconds` | Returns the duration as microseconds. | + +##### Microseconds + +Returns the duration as microseconds. + +```cpp +int64_t Microseconds() const +``` + +#### Operators + +| Name | Description | +| ---------------------------- | -------------------- | +| `operator+`
`operator-` | arithmetic operators | +| `operator==`
`operator<` | comparison operators |
+
+##### operator-
+
+```cpp
+Duration operator-(const Duration &other) const
+```
+```cpp
+Duration operator-() const
+```
+
+### Path
+
+A path is a data structure consisting of alternating nodes and relationships, with the start
+and end points of a path necessarily being nodes.
+
+#### Constructors
+
+Creates a Path from the copy of the given `mgp_path`.
+```cpp
+explicit Path(mgp_path *ptr)
+explicit Path(const mgp_path *const_ptr)
+```
+
+Creates a Path starting with the given `start_node`.
+```cpp
+explicit Path(const Node &start_node)
+```
+
+Copy and move constructors:
+```cpp
+Path(const Path &other) noexcept
+Path(Path &&other) noexcept
+```
+
+#### Member functions
+
+| Name | Description |
+| ------------------- | ----------------------------------------------------------------------------------------------------- |
+| `Length` | Returns the path length (number of relationships). |
+| `GetNodeAt` | Returns the node at the given `index`. The `index` must be less than or equal to the length of the path. |
+| `GetRelationshipAt` | Returns the relationship at the given `index`. The `index` must be less than the length of the path. |
+| `Expand` | Adds a relationship continuing from the last node on the path. |
+
+##### Length
+
+Returns the path length (number of relationships).
+
+```cpp
+size_t Length() const
+```
+
+##### GetNodeAt
+
+Returns the node at the given `index`. The `index` must be less than or equal to the length of the path.
+
+```cpp
+Node GetNodeAt(size_t index) const
+```
+
+##### GetRelationshipAt
+
+Returns the relationship at the given `index`. The `index` must be less than the length of the path.
+
+```cpp
+Relationship GetRelationshipAt(size_t index) const
+```
+
+##### Expand
+
+Adds a relationship continuing from the last node on the path.
+ +```cpp +void Expand(const Relationship &relationship) +``` + +#### Operators + +| Name | Description | +| ----------------------------- | -------------------- | +| `operator==`
`operator!=` | comparison operators |

### List

A list containing any number of values of any supported type.

#### Constructors

Creates a List from the copy of the given `mgp_list`.
```cpp
explicit List(mgp_list *ptr)
explicit List(const mgp_list *const_ptr)
```

Creates an empty List.
```cpp
explicit List()
```

Creates a List with the given `capacity`.
```cpp
explicit List(size_t capacity)
```

Creates a List from the given vector.
```cpp
explicit List(const std::vector<Value> &values)
explicit List(std::vector<Value> &&values)
```

Creates a List from the given initializer_list.
```cpp
explicit List(const std::initializer_list<Value> list)
```

Copy and move constructors:
```cpp
List(const List &other) noexcept
List(List &&other) noexcept
```

#### Member variables

| Name       | Type             | Description                                   |
| ---------- | ---------------- | --------------------------------------------- |
| `Iterator` | `List::Iterator` | Const forward iterator for `List` containers. |

#### Member functions

| Name                                      | Description                                           |
| ----------------------------------------- | ----------------------------------------------------- |
| `Size`                                    | Returns the size of the list.                         |
| `Empty`                                   | Returns whether the list is empty.                    |
| `Append`                                  | Appends the given `value` to the list.                |
| `AppendExtend`                            | Extends the list and appends the given `value` to it. |
| `begin`
`end`
`cbegin`
`cend` | Returns the beginning/end of the `List` iterator. | + +##### Size + +Returns the size of the list. + +```cpp +size_t Size() const +``` + +##### Empty + +Returns whether the list is empty. + +```cpp +bool Empty() const +``` + +##### Append + +Appends the given `value` to the list. The `value` is copied. + +```cpp +void Append(const Value &value) +``` + +Appends the given `value` to the list. Takes ownership of `value` by moving it. +The behavior of accessing `value` after performing this operation is undefined. + +```cpp +void Append(Value &&value) +``` + +##### AppendExtend + +Extends the list and appends the given `value` to it. The `value` is copied. + +```cpp +void AppendExtend(const Value &value) +``` + +Extends the list and appends the given `value` to it. Takes ownership of `value` by moving it. +The behavior of accessing `value` after performing this operation is undefined. + +```cpp +void AppendExtend(Value &&value) +``` + +#### Operators + +| Name | Description | +| ----------------------------- | --------------------------------------- | +| `operator[]` | Returns the value at the given `index`. | +| `operator==`
`operator!=` | comparison operators |

##### operator[]

Returns the value at the given `index`.

```cpp
const Value operator[](size_t index) const
```

### Map

A map of key-value pairs where keys are strings, and values can be of any supported type.
The pairs are represented as [MapItems](#MapItem).

#### Constructors

Creates a Map from the copy of the given `mgp_map`.
```cpp
explicit Map(mgp_map *ptr)
explicit Map(const mgp_map *const_ptr)
```

Creates an empty Map.
```cpp
explicit Map()
```

Creates a Map from the given STL map.
```cpp
explicit Map(const std::map<std::string_view, Value> &items)
explicit Map(std::map<std::string_view, Value> &&items)
```

Creates a Map from the given initializer_list (map items correspond to initializer list pairs).
```cpp
Map(const std::initializer_list<std::pair<std::string_view, Value>> items)
```

Copy and move constructors:
```cpp
Map(const Map &other) noexcept
Map(Map &&other) noexcept
```

#### Member variables

| Name       | Type            | Description                                  |
| ---------- | --------------- | -------------------------------------------- |
| `Iterator` | `Map::Iterator` | Const forward iterator for `Map` containers. |

#### Member functions

| Name                                      | Description                                        |
| ----------------------------------------- | -------------------------------------------------- |
| `Size`                                    | Returns the size of the map.                       |
| `Empty`                                   | Returns whether the map is empty.                  |
| `At`                                      | Returns the value at the given `key`.              |
| `Insert`                                  | Inserts the given `key`-`value` pair into the map. |
| `begin`
`end`
`cbegin`
`cend` | Returns the beginning/end of the `Map` iterator. | + +##### Size + +Returns the size of the map. + +```cpp +size_t Size() const +``` + +##### Empty + +Returns whether the map is empty. + +```cpp +bool Empty() const +``` + +##### At + +Returns the value at the given `key`. + +```cpp +Value const At(std::string_view key) const +``` + +##### Insert + +Inserts the given `key`-`value` pair into the map. The `value` is copied. + +```cpp +void Insert(std::string_view key, const Value &value) +``` +Inserts the given `key`-`value` pair into the map. Takes ownership of `value` by moving it. +The behavior of accessing `value` after performing this operation is undefined. + +```cpp +void Insert(std::string_view key, Value &&value) +``` + +#### Operators + +| Name | Description | +| ----------------------------- | ------------------------------------- | +| `operator[]` | Returns the value at the given `key`. | +| `operator==`
`operator!=` | comparison operators | + +##### operator[] + +Returns the value at the given `key`. + +```cpp +const Value operator[](std::string_view key) const +``` + +#### MapItem + +Auxiliary data structure representing key-value pairs where keys are strings, and values can be of any supported type. + +##### Member variables + +| Name | Type | Description | +| ------- | ------------------ | -------------------------------------------------- | +| `key` | `std::string_view` | Key for accessing the value stored in a `MapItem`. | +| `value` | `Value` | The stored value. | + +##### Operators + +| Name | Description | +| --------------------------------------------- | -------------------- | +| `operator==`
`operator!=`
`operator<` | comparison operators |

### Value

Represents a value of any type supported by Memgraph.
The data types are described [here](https://memgraph.com/docs/memgraph/reference-guide/data-types).

#### Constructors

Creates a Value from the copy of the given `mgp_value`.
```cpp
explicit Value(mgp_value *ptr)
```

Creates a null Value.
```cpp
explicit Value()
```

Basic type constructors:
```cpp
explicit Value(const bool value)
explicit Value(const int64_t value)
explicit Value(const double value)
explicit Value(const char *value)
explicit Value(const std::string_view value)
```

Container type constructors:
```cpp
explicit Value(const List &value)
explicit Value(List &&value)
explicit Value(const Map &value)
explicit Value(Map &&value)
```

Graph element type constructors:
```cpp
explicit Value(const Node &value)
explicit Value(Node &&value)
explicit Value(const Relationship &value)
explicit Value(Relationship &&value)
explicit Value(const Path &value)
explicit Value(Path &&value)
```

Temporal type constructors:
```cpp
explicit Value(const Date &value)
explicit Value(Date &&value)
explicit Value(const LocalTime &value)
explicit Value(LocalTime &&value)
explicit Value(const LocalDateTime &value)
explicit Value(LocalDateTime &&value)
explicit Value(const Duration &value)
explicit Value(Duration &&value)
```

Copy and move constructors:
```cpp
Value(const Value &other) noexcept
Value(Value &&other) noexcept
```

#### Member functions

| Name          | Description                                     |
| ------------- | ----------------------------------------------- |
| `ptr`         | Returns the pointer to the stored value.        |
| `Type`        | Returns the type of the value.                  |
| `Value[TYPE]` | Returns a value of the given type.              |
| `Is[TYPE]`    | Returns whether the value is of the given type. |

##### ptr

Returns the C API pointer to the stored value.

```cpp
mgp_value *ptr() const
```

##### Type

Returns the type of the value, i.e.
the type stored in the `Value` object.

```cpp
mgp::Type Type() const
```

##### Value[TYPE]

Depending on the exact function called, returns a typed value of the appropriate type.
Throws an exception if the type stored in the `Value` object is not compatible with the function called.

```cpp
bool ValueBool() const
```

```cpp
int64_t ValueInt() const
```

```cpp
double ValueDouble() const
```

```cpp
double ValueNumeric() const
```

```cpp
std::string_view ValueString() const
```

```cpp
const List ValueList() const
```

```cpp
const Map ValueMap() const
```

```cpp
const Node ValueNode() const
```

```cpp
const Relationship ValueRelationship() const
```

```cpp
const Path ValuePath() const
```

```cpp
const Date ValueDate() const
```

```cpp
const LocalTime ValueLocalTime() const
```

```cpp
const LocalDateTime ValueLocalDateTime() const
```

```cpp
const Duration ValueDuration() const
```

##### Is[TYPE]

Returns whether the value stored in the `Value` object is of the type in the call.

```cpp
bool IsNull() const
```

```cpp
bool IsBool() const
```

```cpp
bool IsInt() const
```

```cpp
bool IsDouble() const
```

```cpp
bool IsNumeric() const
```

```cpp
bool IsString() const
```

```cpp
bool IsList() const
```

```cpp
bool IsMap() const
```

```cpp
bool IsNode() const
```

```cpp
bool IsRelationship() const
```

```cpp
bool IsPath() const
```

```cpp
bool IsDate() const
```

```cpp
bool IsLocalTime() const
```

```cpp
bool IsLocalDateTime() const
```

```cpp
bool IsDuration() const
```

#### Operators

| Name                          | Description          |
| ----------------------------- | -------------------- |
| `operator==`
`operator!=` | comparison operators | + +### Type + +Enumerates the data types supported by Memgraph and its C++ API. +The types are listed and described [on this page](https://memgraph.com/docs/memgraph/reference-guide/data-types). + +- `Type::Null` +- `Type::Any` +- `Type::Bool` +- `Type::Int` +- `Type::Double` +- `Type::String` +- `Type::List` +- `Type::Map` +- `Type::Node` +- `Type::Relationship` +- `Type::Path` +- `Type::Date` +- `Type::LocalTime` +- `Type::LocalDateTime` +- `Type::Duration` + +## Exceptions + +During operation, the following exceptions may be thrown. + +| Exception | Message | +| ----------------------------- | ----------------------------------------------- | +| `ValueException` | various (handles unknown/unexpected types) | +| `NotFoundException` | Node with ID [ID] not found! | +| `NotEnoughMemoryException` | Not enough memory! | +| `UnknownException` | Unknown exception! | +| `AllocationException` | Could not allocate memory! | +| `InsufficientBufferException` | Buffer is not sufficient to process procedure! | +| `IndexException` | Index value out of bounds! | +| `OutOfRangeException` | Index out of range! | +| `LogicException` | Logic exception, check the procedure signature! | +| `DeletedObjectException` | Object is deleted! | +| `InvalidArgumentException` | Invalid argument! | +| `InvalidIDException` | Invalid ID! | +| `KeyAlreadyExistsException` | Key you are trying to set already exists! | +| `ImmutableObjectException` | Object you are trying to change is immutable! | +| `ValueConversionException` | Error in value conversion! | +| `SerializationException` | Error in serialization! 
| diff --git a/docs2/custom-query-modules/cpp/cpp-example.md b/docs2/custom-query-modules/cpp/cpp-example.md new file mode 100644 index 00000000000..3cf5d73f67a --- /dev/null +++ b/docs2/custom-query-modules/cpp/cpp-example.md @@ -0,0 +1,315 @@
# Example of a query module written in C++

Query modules can be implemented using the [C++ API](/reference-guide/query-modules/implement-custom-query-modules/api/cpp-api.md) provided by Memgraph. As with the C API, these modules need to be compiled to a shared library so that they can be loaded when Memgraph starts. Compilation of query modules that use the C++ API works much in the same way as with modules using the C API.

:::warning

Any exceptions thrown should never leave the scope of your module. You may have a top-level exception handler that returns an error value and potentially logs any error messages. Exceptions that cross the module boundary may cause unexpected issues!

:::

Let's now take a look at the architecture of a query module itself. The basic parts of every query module are as follows:

```cpp
#include <mgp.hpp>

// (Query procedure & magic function callbacks)

extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  // Register your procedures & functions here
  return 0;
}

extern "C" int mgp_shutdown_module() {
  // If you need to release any resources at shutdown, do it here
  return 0;
}
```

* The `mgp.hpp` file contains all declarations of the C++ API for implementing query module procedures and functions.
* To make your query procedures and functions available, they need to be registered in `mgp_init_module`.
* Finally, you may use `mgp_shutdown_module` to reset any global states or release global resources at shutdown.

### Readable procedures

We can now examine how query procedures are implemented on the example of the **random walk algorithm**.

As mentioned above, procedures are registered in `mgp_init_module`.
+ +```cpp +extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) { + try { + mgp::memory = memory; + + AddProcedure(RandomWalk, "get", mgp::ProcedureType::Read, + {mgp::Parameter("start", mgp::Type::Node), mgp::Parameter("length", mgp::Type::Int)}, + {mgp::Return("random_walk", mgp::Type::Path)}, module, memory); + } catch (const std::exception &e) { + return 1; + } + return 0; +} +``` + +Here, we defined our procedure’s signature and added it as a readable +(`ProcedureType::Read`) procedure, named `get`, to our random walk module. +The function takes two named parameters: the start node and random walk length, +and it yields the computed random walk as a `Path` (sequence of nodes connected +by relationships) in the `random_walk` result field. + +When the procedure is called, its arguments (& the graph) will be passed to the +`RandomWalk` callback function. + +:::note + +The API needs memory access for registration; you may grant it with +`mgp::memory = memory`. + +As any exceptions should never leave the scope of the module, the procedure was +registered inside a try-catch block. + +::: + +:::warning + +As `mgp::memory` is a global object, that means all of the procedures and +functions in a single shared library will refer to the same `mgp::memory` +object. As a result, calling such callables simultaneously from multiple threads +will lead to incorrect memory usage. This also includes the case when the same +callable is called from different user sessions. This is a constraint of the +current C++ API that we are planning to improve in the future. + +::: + + + +Callbacks for query procedures all share the same signature, as laid out below. +Parameter by parameter, the callback receives the procedure arguments (`args`), +graph context (`memgraph_graph`), result stream (`result`), and memory access. 
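Because exceptions must never escape these callbacks (see the warning above), each one wraps its whole body in a top-level try/catch and reports failures through the result object instead. The pattern can be illustrated with plain standard C++; here `FakeResult` and `RunCallback` are stand-ins invented for the sketch, not part of the mgp API:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>

// Stand-in for mgp_result: collects either records or an error message.
struct FakeResult {
  std::string error;
  bool has_error = false;
  void SetErrorMessage(const std::string &msg) {
    error = msg;
    has_error = true;
  }
};

// Generic wrapper: runs a callback body and converts any thrown exception
// into an error recorded on the result, so nothing crosses the module boundary.
void RunCallback(FakeResult &result, const std::function<void()> &body) {
  try {
    body();
  } catch (const std::exception &e) {
    result.SetErrorMessage(e.what());
  }
}
```

In a real module, the `catch` branch would call the result's error-reporting function and return normally, exactly as the `RandomWalk` callback below does.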
:::tip

In place of working with the raw `mgp_` type arguments, use the C++ API classes that provide familiar standard library-like interfaces and do away with needing manual memory management.

:::

```cpp
void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  try {
    mgp::memory = memory;
    const auto arguments = mgp::List(args);
    const auto record_factory = mgp::RecordFactory(result);

    const auto start_node = arguments[0].ValueNode();
    const auto length = arguments[1].ValueInt();

    auto random_walk = mgp::Path(start_node);

    // (Random walk algorithm logic)

    auto record = record_factory.NewRecord();
    record.Insert("random_walk", random_walk);

  } catch (const std::exception &e) {
    mgp::result_set_error_msg(result, e.what());
    return;
  }
}
```

### Writeable procedures

Writeable procedures differ from readable procedures in that their graph context is **mutable**. With them, you may create or delete nodes and relationships, modify their properties, and add or remove node labels.

They use the same interface as readable procedures; the only difference is that the appropriate procedure type parameter is passed to `AddProcedure`. The code below registers and implements a writeable procedure `add_x_nodes`, which adds a user-specified number of nodes (given by the int parameter `number`) to the graph.
```cpp
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::memory = memory;

    mgp::AddProcedure(AddXNodes, "add_x_nodes", mgp::ProcedureType::Write, {mgp::Parameter("number", mgp::Type::Int)},
                      {}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
```

```cpp
void AddXNodes(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::memory = memory;
  const auto arguments = mgp::List(args);
  auto graph = mgp::Graph(memgraph_graph);

  for (int i = 0; i < arguments[0].ValueInt(); i++) {
    graph.CreateNode();
  }
}
```

### Batched readable and writeable procedures

Batched readable and writeable procedures in C++ are quite similar to batched procedures in C. The procedures work the same way as in the C API; the only difference is procedure registration.

```cpp
void BatchCSVFile(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  ...
}

void InitBatchCsvFile(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  ...
}

void CleanupBatchCsvFile(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  ...
}

extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::memory = memory;

    AddBatchProcedure(BatchCSVFile, InitBatchCsvFile, CleanupBatchCsvFile,
                      "read_csv", mgp::ProcedureType::Read,
                      {mgp::Parameter("file_name", mgp::Type::String)},
                      {mgp::Return("row", mgp::Type::Map)}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
```

### Magic functions

Magic functions are a Memgraph feature that lets the user write and call custom Cypher functions. Unlike procedures, functions are simple operations that can't modify the graph; they return a single value and can be used in any expression or predicate.
Let's examine an example function that multiplies the numbers passed to it. The registration is done by `AddFunction` in the same way as with query procedures, the difference being the absence of a "function type" argument (functions don't modify the graph).

```cpp
extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::memory = memory;

    mgp::AddFunction(Multiply, "multiply",
                     {mgp::Parameter("int", mgp::Type::Int), mgp::Parameter("int", mgp::Type::Int)}, module, memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
```

There are two key differences in the function signature:
* the lack of a `mgp_graph *` parameter (the graph is immutable in functions)
* different result type (functions return single values, while procedures write result records to the result stream)

The difference in result type means that, to work with function results, we use a different C++ API class: `Result`. Our function is implemented as follows:

```cpp
void Multiply(mgp_list *args, mgp_func_context *ctx, mgp_func_result *res, mgp_memory *memory) {
  mgp::memory = memory;
  const auto arguments = mgp::List(args);
  auto result = mgp::Result(res);

  auto first = arguments[0].ValueInt();
  auto second = arguments[1].ValueInt();

  result.SetValue(first * second);
}
```

### Terminate procedure execution

Just as the execution of a Cypher query can be terminated with the [`TERMINATE TRANSACTIONS "id";`](/reference-guide/transactions.md) query, the execution of a procedure can be terminated if it takes too long to yield a response or gets stuck in an infinite loop due to unpredicted input data.

Transaction ID is visible upon calling the `SHOW TRANSACTIONS;` query.
In order to be able to terminate the procedure, it has to call `graph.CheckMustAbort();` before crucial parts of the code, such as long-running loops or similar points where the procedure might become costly.

Consider the following example:

```cpp
#include <cstdint>
#include <iostream>

#include <mgp.hpp>

// Methods
constexpr char const *get = "get";
// Return object names
char const *return_field = "return";


void Test(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::memory = memory;
  const auto record_factory = mgp::RecordFactory(result);
  auto graph = mgp::Graph(memgraph_graph);
  int64_t id_ = 1;
  try {
    while (true) {
      graph.CheckMustAbort();
      ++id_;
    }
  } catch (const mgp::MustAbortException &e) {
    std::cout << e.what() << std::endl;
    auto new_record = record_factory.NewRecord();
    new_record.Insert(return_field, id_);
  }
}


extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::memory = memory;
    mgp::AddProcedure(Test, get, mgp::ProcedureType::Read, {}, {mgp::Return(return_field, mgp::Type::Int)}, module, memory);
  } catch(const std::exception &e) {
    return 1;
  }
  return 0;
}

extern "C" int mgp_shutdown_module() { return 0; }
```
diff --git a/docs2/custom-query-modules/cpp/cpp.md b/docs2/custom-query-modules/cpp/cpp.md new file mode 100644 index 00000000000..07eb2ec99fe --- /dev/null +++ b/docs2/custom-query-modules/cpp/cpp.md @@ -0,0 +1,341 @@
---
id: create-a-new-module-cpp
title: How to create a query module in C++
sidebar_label: Create a C++ query module
---

Query modules can be implemented using the [C++ API](/memgraph/reference-guide/query-modules/api/cpp-api) provided by Memgraph with automatic memory management. In this tutorial, we will learn how to develop a query module in C++ on the example of the **random walk algorithm**.
## Prerequisites

There are three options for installing and working with Memgraph MAGE:

1. **Pulling the `memgraph/memgraph-mage` image**: check the `Docker Hub` [installation guide](/installation/docker-hub.md).
2. **Building a Docker image from the MAGE repository**: check the `Docker build` [installation guide](/installation/docker-build.md).
3. **Building MAGE from source**: check the `Build from source on Linux` [installation guide](/installation/source.md).

## Developing a module

:::note

These instructions are the same for every MAGE installation option: _Docker Hub_, _Docker build_ and _Build from source on Linux_.

:::

Position yourself in the **MAGE repository** you cloned earlier. Once you are there, enter the `cpp` subdirectory and create a new directory called `random_walk_module` with the `random_walk_module.cpp` file inside it.

```plaintext
cpp
└── random_walk_module
    └── random_walk_module.cpp
```

:::info

To make sure the module is linked with the rest of MAGE's code, we need to add a `CMakeLists.txt` script in the new directory and register our module in the `cpp/CMakeLists.txt` script as well. Refer to the existing scripts in MAGE's [query modules](https://github.com/memgraph/mage/tree/main/cpp).

:::

Our `random_walk` module contains a single procedure `get` which implements the algorithm. The procedure takes two input parameters: the starting node and the number of steps (10 by default), and it returns the generated random walk in the form of a list of `step | node` entries, one for each step. All in all, we can define its signature as `get(start: Node, steps: int = 10) -> [step: int | node: Node]`.

Let's take a look at the structure of our query module.
```cpp
#include <mgp.hpp>

void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph,
                mgp_result *result, mgp_memory *memory);

extern "C" int mgp_init_module(struct mgp_module *module,
                               struct mgp_memory *memory);

extern "C" int mgp_shutdown_module() { return 0; }
```

In the first line, we include `mgp.hpp`. This header contains declarations of the public C++ API provided by Memgraph, which we need to connect the algorithm to Memgraph and work with the data stored within.

Next, we are going to implement the random walk algorithm's logic in the `RandomWalk` function, which will be the callback for the invocations of our openCypher procedure. Callback functions such as this one all need to have the same signature, but they can be arbitrarily named (e.g. in query modules containing multiple callback functions).

Query modules using the C++ API must define the `mgp_init_module` and `mgp_shutdown_module` functions. The `mgp_init_module` function's main purpose is to register procedures so that they can be called from the Cypher query language, and with `mgp_shutdown_module` you may reset any global states or release global resources.

:::warning

Exceptions, if thrown, must never leave the scope of your module! You should have a top-level exception handler that returns an error value and potentially logs the error message as well. Exceptions crossing the module boundary may cause all sorts of unexpected issues.

:::


### Main algorithm

The main logic of the `RandomWalk` procedure is implemented in the code snippet below.
```cpp
const char *kReturnStep = "step";
const char *kReturnNode = "node";

void RandomWalk(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::memory = memory;

  const auto arguments = mgp::List(args);
  const auto record_factory = mgp::RecordFactory(result);

  const auto start = arguments[0].ValueNode();
  const auto n_steps = arguments[1].ValueInt();

  srand(time(NULL));

  auto current_nodes = mgp::List();
  current_nodes.AppendExtend(mgp::Value(start));

  std::int64_t step = 0;
  while (step < n_steps) {
    auto current_node = current_nodes[current_nodes.Size() - 1].ValueNode();

    auto neighbours = mgp::List();
    for (const auto relationship : current_node.OutRelationships()) {
      neighbours.AppendExtend(mgp::Value(relationship));
    }

    if (neighbours.Size() == 0) {
      break;
    }

    const auto next_node = neighbours[rand() % neighbours.Size()].ValueRelationship().To();

    current_nodes.AppendExtend(mgp::Value(next_node));
    step++;
  }

  for (std::int64_t i = 0; i < current_nodes.Size(); i++) {
    auto record = record_factory.NewRecord();
    record.Insert(kReturnStep, i);
    record.Insert(kReturnNode, current_nodes[i].ValueNode());
  }
}
```

Upon being called, `RandomWalk` receives the list of arguments (`args`) passed in the query. The parameter `result` is used for recording the results of the procedure, and its context is provided by `memgraph_graph` and `memory`.

With the C++ API, we next retrieve the argument values from `args` by putting them into a list, so we can use the indexing (`[]`) operator. In the code above, the arguments are retrieved in these lines:

```cpp
  const auto start = arguments[0].ValueNode();
  const auto n_steps = arguments[1].ValueInt();
```

The arguments are raw values at the time of their fetching from the list, so types are assigned to them with `ValueNode()` and `ValueInt()` for extra operability and expressiveness within the algorithm.
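Stripped of the mgp types, the walk's core loop boils down to repeatedly picking a random outgoing neighbour until the step budget is spent or a dead end is reached. A self-contained sketch with standard containers (an adjacency list stands in for the graph, and a seeded `std::mt19937` replaces `rand()` so the illustration is reproducible):

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Adjacency list stands in for the Memgraph graph: out_edges[v] lists the
// nodes reachable from v via outgoing relationships.
std::vector<std::size_t> RandomWalkSketch(
    const std::vector<std::vector<std::size_t>> &out_edges,
    std::size_t start, std::size_t n_steps, std::mt19937 &rng) {
  std::vector<std::size_t> walk{start};
  for (std::size_t step = 0; step < n_steps; ++step) {
    const auto &neighbours = out_edges[walk.back()];
    if (neighbours.empty()) break;  // dead end: stop early, like the mgp version
    std::uniform_int_distribution<std::size_t> pick(0, neighbours.size() - 1);
    walk.push_back(neighbours[pick(rng)]);
  }
  return walk;
}
```

A walk of `n_steps` steps therefore contains at most `n_steps + 1` nodes, matching the `step | node` records the procedure emits.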
For managing results during the execution of the algorithm, an instance of `RecordFactory` is used. Insertion of results into the record factory is done like this:

```cpp
  auto record = record_factory.NewRecord();
  record.Insert(kReturnStep, i);
  record.Insert(kReturnNode, current_nodes[i].ValueNode());
```

In this code snippet, the result consists of an integer and the corresponding next node of the random walk algorithm. The types of the results are not arbitrary, as they are registered in the initialization module, further below.

:::tip

Analogous methods for other supported data types are outlined in the [C++ API reference](/memgraph/reference-guide/query-modules/api/cpp-api).

:::

### Initialization of the module

The `mgp_init_module` function's main duty is to register the procedure(s), which can then be invoked from the Cypher query language. With the C++ API, we add our procedure and its inputs and outputs.

```cpp
const char *kProcedureGet = "get";
const char *kParameterStart = "start";
const char *kParameterSteps = "steps";
const char *kReturnStep = "step";
const char *kReturnNode = "node";

extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  mgp::memory = memory;

  std::int64_t default_steps = 10;
  try {
    mgp::AddProcedure(RandomWalk,
                      kProcedureGet,
                      mgp::ProcedureType::Read,
                      {
                        mgp::Parameter(kParameterStart, mgp::Type::Node),
                        mgp::Parameter(kParameterSteps, mgp::Type::Int, default_steps)
                      },
                      {
                        mgp::Return(kReturnStep, mgp::Type::Int),
                        mgp::Return(kReturnNode, mgp::Type::Node)
                      },
                      module,
                      memory);
  } catch (const std::exception &e) {
    return 1;
  }
  return 0;
}
```

We add the procedure to the module by specifying:
- **function callback** used for executing the logic of the procedure (`RandomWalk`)
- **name of the procedure** used in the Cypher query language (`kProcedureGet`)
- **type of the procedure**
  - `mgp::ProcedureType::Read` for
read-only procedures
  - `mgp::ProcedureType::Write` for write procedures
- **vector of input parameters** wrapped in `mgp::Parameter` object with name (string) and type (`mgp::Type`)
- **vector of output results** wrapped in `mgp::Return` object with name (string) and type (`mgp::Type`)
- passed `module` object
- passed `memory` object

Although this example registers a single procedure `get`, you can have multiple different procedures in one module, each of which can be invoked using the `CALL <module>.<procedure> ...` syntax (`<module>` being the name of the shared library). Since we compile our example to `random_walk.so`, the module is called `random_walk`.

:::tip

As the procedure name is defined upon registration, it can differ from its respective callback.

:::

:::note

As the `memory` argument is only alive throughout the execution of `mgp_init_module`, do not allocate any global resources with it. If you still do need to set up a global state, you may do so in `mgp_init_module` using the standard global allocators.

:::

### Shutdown of the module

Finally, you may want to reset any global state or release global resources, which is done in the following function:

```cpp
extern "C" int mgp_shutdown_module() {
  return 0;
}
```

### Terminate procedure execution

Just as the execution of a Cypher query can be terminated with the [`TERMINATE TRANSACTIONS "id";`](/memgraph/reference-guide/transactions) query, the execution of a procedure can be terminated if it takes too long to yield a response or gets stuck in an infinite loop due to unpredicted input data.

Transaction ID is visible upon calling the `SHOW TRANSACTIONS;` query.

In order to be able to terminate the procedure, it has to call `graph.CheckMustAbort();` before crucial parts of the code, such as long-running loops or similar points where the procedure might become costly.
Consider the following example:

```cpp
#include <cstdint>
#include <iostream>

#include <mgp.hpp>

// Methods
constexpr char const *get = "get";
// Return object names
char const *return_field = "return";


void Test(mgp_list *args, mgp_graph *memgraph_graph, mgp_result *result, mgp_memory *memory) {
  mgp::memory = memory;
  const auto record_factory = mgp::RecordFactory(result);
  auto graph = mgp::Graph(memgraph_graph);
  int64_t id_ = 1;
  try {
    while (true) {
      graph.CheckMustAbort();
      ++id_;
    }
  } catch (const mgp::MustAbortException &e) {
    std::cout << e.what() << std::endl;
    auto new_record = record_factory.NewRecord();
    new_record.Insert(return_field, id_);
  }
}


extern "C" int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  try {
    mgp::memory = memory;
    mgp::AddProcedure(Test, get, mgp::ProcedureType::Read, {}, {mgp::Return(return_field, mgp::Type::Int)}, module, memory);
  } catch(const std::exception &e) {
    return 1;
  }
  return 0;
}

extern "C" int mgp_shutdown_module() { return 0; }
```

As mentioned before, no exceptions should leave your module. As done in this example, exception handlers are in `mgp_init_module` and the callback function. Depending on your module's needs, you might want one in `mgp_shutdown_module` as well.

## Importing, querying and testing a module

Now, in order to import, query and test a module, check out the [following page](/mage/how-to-guides/run-a-query-module).

Feel free to create an issue or open a pull request on our [GitHub repo](https://github.com/memgraph/mage) to speed up the development.
+Also, don’t forget to throw us a star on GitHub. :star: diff --git a/docs2/custom-query-modules/custom-query-modules.md b/docs2/custom-query-modules/custom-query-modules.md new file mode 100644 index 00000000000..91339d7273e --- /dev/null +++ b/docs2/custom-query-modules/custom-query-modules.md @@ -0,0 +1,121 @@ +[![Related - How-to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/query-modules.md) + +Memgraph supports extending the query language with user-written procedures in +**C**, **C++**, **Python**, and **Rust**. These procedures are grouped into +modules - **query modules** files (either `*.so` or `*.py` files). + +Some query modules are built-in, and others, like those that can help you solve +complex graph issues, are available as part of the MAGE library you can add to +your Memgraph installation. The library is already included if you are using +Memgraph Platform or Memgraph MAGE Docker images to run Memgraph. + +You can also implement custom query modules. Every single Memgraph installation +comes with the `example.so` and `py_example.py` query modules located in the +`/usr/lib/memgraph/query_modules` directory. They were provided as examples of +query modules for you to examine and learn from. + +Each query module file corresponds to one query module, and file names are +mapped as query module names. For example, `example.so` will be mapped as +`example` module, and `py_example.py` will be mapped as `py_example` module. If +each module file has a procedure called `procedure` defined, those procedures +would be mapped in the Cypher query language as `example.procedure()` and +`py_example.procedure()` respectively. + +Regardless of where they come from and who wrote them, all modules need to be +loaded into Memgraph so that they can be called while querying the database. 
+They are either loaded automatically when Memgraph starts or manually if they +were added while Memgraph was already running. + +You can also inspect and develop query modules in Memgraph Lab (v2.0 and newer). +Just navigate to **Query Modules**. + +
+ Screenshot of Query Modules from Memgraph Lab + +
+ +Once you start Memgraph, it will attempt to load query modules from all `*.so` and +`*.py` files from the default directories. MAGE modules are located at +`/usr/lib/memgraph/query_modules` and custom modules developed via Memgraph Lab at +`/var/lib/memgraph/internal_modules`. + +Memgraph provides public APIs for writing custom query modules in Python, C and +C++. + +## Python API + +The Python API is defined in the `mgp` module that can be found in the Memgraph +installation directory `/usr/lib/memgraph/python_support`. In essence, it is a +wrapper around the C API. If you wish to write your own query modules using the +Python API, you need to have Python version `3.5.0` or above installed. + +For more information, check the [Python API reference +guide](/reference-guide/query-modules/implement-custom-query-modules/api/python-api.md).
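To get a feel for the decorator-based style of the Python API, here is a plain-Python sketch of the registration idea behind `@mgp.read_proc`. The registry and the `hello` procedure below are illustrative only, not Memgraph internals:

```python
# Illustrative registry mimicking how a decorator can collect procedures
# at import time; this is NOT Memgraph's implementation, just the pattern.
PROCEDURES = {}

def read_proc(func):
    """Register a function as a read-only procedure and return it unchanged."""
    PROCEDURES[func.__name__] = {"callback": func, "is_write": False}
    return func

@read_proc
def hello():
    # A real module would receive a graph context and return mgp.Record objects.
    return {"message": "Hello from a Python query module!"}

print(sorted(PROCEDURES))  # -> ['hello']
```

In the real API, importing the module file is what triggers registration, which is why Memgraph (re)imports module files when loading them.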
+We also made [an example +module](/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example.md#python-api) +to help you start developing your own modules. + +You can develop query modules in Python from Memgraph Lab (v2.0 and newer). Just +navigate to **Query Modules** and click on **New Module** to start. + + + +Custom modules developed via Memgraph Lab are located at +`/var/lib/memgraph/internal_modules`. + +:::info +If you need an additional Python library not included with Memgraph, check out +[the guide on how to install +it](/memgraph/how-to-guides/query-modules#how-to-install-external-python-libraries). +::: + +### Mock Python API + +The [mock Python query module API](api/mock-python-api.md) enables you to +develop and test query modules for Memgraph without having to run a Memgraph +instance by simulating its behavior. As the mock API is compatible with the +[Python API](/reference-guide/query-modules/implement-custom-query-modules/api/python-api.md), +you can add modules developed with it to Memgraph as-is, without modifying the +code. + +For more information and examples, check the +[mock Python API reference guide](api/mock-python-api.md). + +## C API + +C API modules need to be compiled to a shared library so that they can be loaded +when Memgraph starts. This means that you can write the procedures in any +programming language that can work with C and be compiled to the ELF shared +library format (`.so`). `mg_procedure.h` that can be found in Memgraph +installation directory `/usr/include/memgraph` contains declarations of all +functions that can be used to implement a query module procedure. 
To compile the +module, you will have to pass the appropriate flags to the compiler, for +example, `clang`: + +```plaintext +clang -Wall -shared -fPIC -I /usr/include/memgraph example.c -o example.so +``` + +For more information, check the [C API reference +guide](/reference-guide/query-modules/implement-custom-query-modules/api/c-api.md).
+We also made [an example +module](/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example.md#c-api) +to help you start developing your own modules. + +## C++ API + +C++ API modules, just like C API modules, need to be compiled to a shared +library so that they can be loaded when Memgraph starts. This is done much in +the same way as with C API modules. + +For more information, check the [C++ API reference +guide](/reference-guide/query-modules/implement-custom-query-modules/api/cpp-api.md).
+We also made [an example +module](/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example.md#cpp-api) +to help you start developing your own modules. + +:::info +If you need an additional Python library not included with Memgraph, check out +[the guide on how to install +it](/memgraph/how-to-guides/query-modules#how-to-install-external-python-libraries). +::: \ No newline at end of file diff --git a/docs2/custom-query-modules/manage-query-modules.md b/docs2/custom-query-modules/manage-query-modules.md new file mode 100644 index 00000000000..320056eb779 --- /dev/null +++ b/docs2/custom-query-modules/manage-query-modules.md @@ -0,0 +1,363 @@ +# Manage query modules + +The following page describes how query modules are loaded into Memgraph and +called within a Cypher query. + +## Loading query modules + +Once you start Memgraph, it will attempt to load query modules from all `*.so` +and `*.py` files from the default (`/usr/lib/memgraph/query_modules` and +`/var/lib/memgraph/internal_modules`) directories. + +MAGE modules are located at +`/usr/lib/memgraph/query_modules` and custom modules developed via Memgraph Lab at +`/var/lib/memgraph/internal_modules`. + +Memgraph can load query modules from additional directories, if their path is +added to the `--query-modules-directory` flag in the main configuration file +(`/etc/memgraph/memgraph.conf`) or supplied as a command-line parameter (e.g. +when using Docker). + +If you are supplying the additional directory as a parameter, do not forget to +include the path to `/usr/lib/memgraph/query_modules`, otherwise queries from +that directory will not be loaded when Memgraph starts. 
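As a rough illustration of how a comma-separated `--query-modules-directory` value expands into candidate module files, consider the following Python sketch. The scanning logic and the `my_modules/hello.py` layout are simplifications for illustration, not Memgraph's actual loader:

```python
import tempfile
from pathlib import Path

def collect_modules(dirs_flag: str) -> list[str]:
    """Return *.so and *.py files found in a comma-separated directory list."""
    found = []
    for d in dirs_flag.split(","):
        for entry in sorted(Path(d).iterdir()):
            if entry.suffix in {".so", ".py"}:
                found.append(str(entry))
    return found

# Hypothetical custom directory holding one Python query module.
root = Path(tempfile.mkdtemp())
custom = root / "my_modules"
custom.mkdir()
(custom / "hello.py").write_text("# a query module\n")

print(collect_modules(str(custom)))  # one entry ending in my_modules/hello.py
```

Files with other extensions in the listed directories are simply ignored, which matches the `*.so`/`*.py` rule described above.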
+ +:::caution + +When working with Docker and `memgraph-platform` image, you should pass +configuration flags inside of environment variables, for example: + +```terminal +docker run -p 7687:7687 -p 7444:7444 -p 3000:3000 -e MEMGRAPH="--query-modules-directory=/usr/lib/memgraph/query_modules,/usr/lib/memgraph/my_modules" memgraph/memgraph-platform` +``` + +If you are working with `memgraph` or `memgraph-mage` images you should pass +configuration options like this: + +```terminal +docker run -p 7687:7687 -p 7444:7444 memgraph/memgraph --query-modules-directory=/usr/lib/memgraph/query_modules,/usr/lib/memgraph/my_modules +``` + +::: + +If a certain query module was added while Memgraph was already running, you need +to load it manually using the `mg.load("module_name")` procedure within a query: + +```cypher +CALL mg.load("py_example"); +``` + +If there is no response (no error message), the load was successful. + +If you want to reload all existing modules and load any newly added ones, use +`mg.load_all()`: + +```cypher +CALL mg.load_all(); +``` + +If there is no response (no error message), the load was successful. + +You can check if the query module has been loaded by using the `mg.procedures()` +procedure within a query: + +```cypher +CALL mg.procedures() YIELD *; +``` + +Built-in utility query module (`mg`) contains procedures that enable you to +manage query modules files. + +## General procedures + +Here is the list of procedures from the `mg` query module that can be used with +all other query module files and their signatures: + +| Procedure | Description | +| ----------------------------------------------------------------- | --------------------------------------------- | +| mg.procedures() -> (name\|STRING, signature\|STRING) | Lists loaded procedures and their signatures. | +| mg.load(module_name\|STRING) -> () | Loads or reloads the given module. | +| `mg.load_all() -> ()` | Loads or reloads all modules. 
| + +### `mg.procedures` + +Lists loaded procedures and their signatures. + +Example of a Cypher query: + +```cypher +CALL mg.procedures() YIELD *; +``` + +Example of a result: + +```nocopy ++-------------+---------------------+-------------------+-----------------------------------------------------------------------------------------------------------------------+ +| is_editable | name | path | signature | ++-------------+---------------------+-------------------+-----------------------------------------------------------------------------------------------------------------------+ +| ... | ... | ... | ... | +| true | graph_analyzer.help | "/path/to/module" | graph_analyzer.help() :: (name :: STRING, value :: STRING) | +| false | mg.load | "builtin" | mg.load(module_name :: STRING) :: () | +| false | mg.load_all | "builtin" | mg.load_all() :: () | +| false | mg.procedures | "builtin" | mg.procedures() :: (name :: STRING, signature :: STRING, is_write :: BOOLEAN, path :: STRING, is_editable :: BOOLEAN) | +| ... | ... | ... | ... | ++-------------+---------------------+-------------------+-----------------------------------------------------------------------------------------------------------------------+ +``` + +### `mg.load_all` + +Loads or reloads all modules. + +Example of a Cypher query: + +```cypher +CALL mg.load_all(); +``` + +If the response is `Empty set (x.x sec)` and there are no error messages, the +update was successful. + +### `mg.load` + +Loads or reloads the given module. + +Example of a Cypher query: + +```cypher +CALL mg.load("py_example"); +``` + +If the response is `Empty set (x.x sec)` and there are no error messages, the +update was successful. + +Upon loading the module, all dependent Python's submodules that are imported will be reloaded too. To support this functionality Memgraph parses module's code into Abstract Syntax Tree and then determines which modules are being imported. 
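The cache-clearing part of reloading can be demonstrated in plain Python. This is a standalone sketch with a throwaway module named `module1`; Memgraph performs the equivalent bookkeeping internally:

```python
import pathlib
import sys
import tempfile

# Keep the demonstration free of __pycache__ side effects.
sys.dont_write_bytecode = True

# Create a throwaway module file on a temporary path.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "module1.py").write_text("VALUE = 1\n")
sys.path.insert(0, str(tmp))

import module1
assert module1.VALUE == 1

# Simulate editing the module on disk.
(tmp / "module1.py").write_text("VALUE = 2\n")

# Importing again without clearing the cache returns the stale module...
import module1
assert module1.VALUE == 1

# ...while dropping the sys.modules entry forces a fresh re-execution.
del sys.modules["module1"]
import module1
assert module1.VALUE == 2
```

Without this cache eviction, a plain re-import would keep serving the old code, which is exactly the staleness `mg.load` guards against.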
For example, let's say that you have the following query modules directory structure: +``` +- query_modules/ + - python/ + - module1.py + - module2.py + - mage/ + - module1/ + - module1_utility.py + - module2/ + - module2_utility.py + - cpp/ + - module3.cpp + - module4.cpp +``` +By calling: +```cypher +CALL mg.load("module1"); +``` +Memgraph will reload `module1.py` and all of the Python packages it imports. It will also detect that the subdirectory `module1` contains Python utility files for `module1.py` and reload them too. Note that if the `module1` directory contains further subdirectories, they will also get reloaded. Reloading means clearing the modules from the `sys` cache and deleting their compiled code from `__pycache__`. The directory containing the subdirectories can also be organized differently, so e.g. the `module1/` and `module2/` folders can be placed directly in the `python/` folder. + +## Procedures for `.py` query modules + +Memgraph includes several built-in procedures for editing and inspecting Python +module files. + +Below is a list of the procedures, their signatures, descriptions and required +privileges.
Privileges can be assigned only in the enterprise edition of +Memgraph.
Click on a procedure to see an example of a Cypher query and +possible result. + +| Procedure | Description | Required privilege | +| ---------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | +| [mg.get_module_files() -> (is_editable\|STRING, path\|STRING)](#mgget_module_files) | Returns the value of a `is_editable` flag and the absolute path of each Python query module file in all the query module directories. | `MODULE_READ` | +| [mg.get_module_file(path\|STRING) -> (path\|STRING)](#mgget_module_file) | Returns the content of a file located at the absolute path in one of the query module directories. | `MODULE_READ` | +| [mg.create_module_file(filename\|STRING, content\|STRING) -> (path\|STRING)](#mgcreate_module_file) | Creates a `filename` Python module with `content` inside the internal query module directory (`/var/lib/memgraph/internal_modules`) and returns the path to the newly created file. The flag `is_editable` should be set to true if the module is located in the internal query module directory.
The `filename` can consist of multiple nested directories (e.g. `subdir1/subdir2/module.py`) which will create all the necessary subdirectories. After successful creation, all the modules are reloaded. | `MODULE_WRITE` | +| [mg.update_module_file(path\|STRING, content\|STRING)](#mgupdate_module_file) | Updates a Python module file at an absolute `path` in one of the query module directories with `content` and reloads all the modules. You can only change the files with `is_editable` flag set to `true`. | `MODULE_WRITE` | +| [mg.delete_module_file(path\|STRING)](#mgdelete_module_file) | Deletes a Python module file at an absolute `path` in one of the query module directories and reloads all the modules. You can only delete the files with `is_editable` flag set to `true`. | `MODULE_WRITE` | + +### `mg.get_module_files` + +Returns the value of an `is_editable` flag and the absolute path of each Python +query module file in all the query module directories. + +Example of a Cypher query: + +```cypher +CALL mg.get_module_files() YIELD *; +``` + +Example of a result: + +```nocopy ++-----------------------------------------------------+-----------------------------------------------------+ +| is_editable | path | ++-----------------------------------------------------+-----------------------------------------------------+ +| false | "/usr/lib/memgraph/query_modules/mgp_networkx.py" | +| false | "/usr/lib/memgraph/query_modules/wcc.py" | +| false | "/usr/lib/memgraph/query_modules/graph_analyzer.py" | +| false | "/usr/lib/memgraph/query_modules/py_example.py" | +| false | "/usr/lib/memgraph/query_modules/nxalg.py" | +| true | "/var/lib/memgraph/internal_modules/module1.py" | ++-----------------------------------------------------+-----------------------------------------------------+ +``` + +### `mg.get_module_file` + +Returns the content of a file located at the absolute path in one of the query +module directories. 
+ +Example of a Cypher query: + +```cypher +CALL mg.get_module_file("/usr/lib/memgraph/query_modules/py_example.py") YIELD *; +``` + +### `mg.create_module_file` + +Creates a `filename` Python module with `content` inside the internal query +module directory (`/var/lib/memgraph/internal_modules`) and returns the path to +the newly created file. The flag `is_editable` should be true if the module is +located in the internal query module directory. The `filename` can consist of +multiple nested directories (e.g., `subdir1/subdir2/module.py`) and all the +necessary subdirectories will be created. After successful creation, all the +modules are reloaded. + +Examples of a Cypher query: + +1. **Without defining the absolute path:** + + ```cypher + CALL mg.create_module_file("my_module.py", "Start of my query module.") YIELD *; + ``` + + Result: + + ```nocopy + +---------------------------------------------------+ + | path | + +---------------------------------------------------+ + | "/var/lib/memgraph/internal_modules/my_module.py" | + +---------------------------------------------------+ + ``` + +2. **With absolute path:** + + ```cypher + CALL mg.create_module_file("my_modules/my_module.py", "Start of my query module.") YIELD *; + ``` + + Result: + + ```nocopy + +--------------------------------------------------------------+ + | path | + +--------------------------------------------------------------+ + | "/var/lib/memgraph/internal_modules/my_modules/my_module.py" | + +--------------------------------------------------------------+ + ``` + +### `mg.update_module_file` + +Updates a Python module file at an absolute `path` in one of the query module +directories with `content`. You can only change the files with `is_editable` +flag set to `true`. + +Example of a Cypher query: + +```cypher +CALL mg.update_module_file("/var/lib/memgraph/internal_modules/my_module.py", "Start of my query module. 
Another line."); +``` + +If the response is `Empty set (x.x sec)` and there are no error messages, the +update was successful. + +### `mg.delete_module_file` + +Deletes a Python module file at an absolute `path` in one of the query module +directories and reloads all the modules. You can only delete the files with +`is_editable` flag set to `true`. + +Example of a Cypher query: + +```cypher +CALL mg.delete_module_file("/var/lib/memgraph/internal_modules/my_module.py"); +``` + +If the response is `Empty set (x.x sec)` and there are no error messages, the +deletion was successful. + +## Calling query modules + +Once the MAGE query modules or any custom modules you developed have been +loaded into Memgraph, you can call them within queries using the following Cypher +syntax: + +```cypher +CALL module.procedure([optional parameter], arg1, "string_argument", ...) YIELD res1, res2, ...; +``` +Every procedure can take an optional first parameter; the rest of the arguments depend on the procedure you are trying to call. The optional parameter must be the result of the aggregation function [`project()`](/cypher-manual/functions#aggregation-functions). If such a parameter is provided, **all** operations will be executed on the projected graph. Otherwise, the procedure works on the whole graph stored inside Memgraph. + +Each procedure returns zero or more records, where each record contains named +fields. The `YIELD` clause is used to select the fields you are interested in, or all +of them (`*`). If you are not interested in any fields, omit the `YIELD` clause. +The procedure will still run, but the record fields will not be stored in +variables. If you try to `YIELD` fields that are not a part of the +produced record, the query will result in an error. + +Procedures can be standalone as in the example above, or a part of a larger +query when we want the procedure to work on data the query is +producing.
+ +For example: + +```cypher +MATCH (node) CALL module.procedure(node) YIELD result RETURN *; +``` + +When the `CALL` clause is a part of a larger query, results from the query are +returned using the `RETURN` clause. If the `CALL` clause is followed by a clause +that only updates the data and doesn't read it, `RETURN` is unnecessary. It is +the Cypher convention that read-only queries need to end with a `RETURN`, while +queries that update something don't need to `RETURN` anything. + +Also, if the procedure itself writes into the database, all the rest of the +clauses in the query can only read from the database, and the `CALL` clause can +only be followed by the `YIELD` clause and/or `RETURN` clause. + +If a procedure returns a record with the same field name as some variable we +already have in the query, that field name can be aliased with some other name +using the `AS` sub-clause: + +```cypher +MATCH (result) CALL module.procedure(42) YIELD result AS procedure_result RETURN *; +``` + +## Managing query modules from Memgraph Lab + +You can inspect query modules in Memgraph Lab (v2.0 and newer). +Just navigate to **Query Modules**. + + + +There you can see all the loaded query modules, delete them, or see procedures +and transformations they define by clicking on the arrow icon. + +By expanding procedures you can receive information about the procedure's +signature, input and output variables and their data type, as well as the `CALL` +query you can run directly from the **Query Modules** view. + +Custom modules developed via Memgraph Lab are located at +`/var/lib/memgraph/internal_modules`. + + + +## Controlling procedure memory usage + +When running a procedure, Memgraph controls the maximum memory usage that the +procedure may consume during its execution. By default, the upper memory limit +when running a procedure is `100 MB`. 
If your query procedure requires more +memory to yield its results, you can increase the memory limit using the +following syntax: + +```cypher +CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 KB YIELD result; +CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 MB YIELD result; +CALL module.procedure(arg1, arg2, ...) PROCEDURE MEMORY UNLIMITED YIELD result; +``` + +The limit can either be set to a specific value (either in `KB` or in +`MB`), or it can be set to unlimited. \ No newline at end of file diff --git a/docs2/custom-query-modules/python/implement-custom-query-module-in-python.md b/docs2/custom-query-modules/python/implement-custom-query-module-in-python.md new file mode 100644 index 00000000000..895864fe419 --- /dev/null +++ b/docs2/custom-query-modules/python/implement-custom-query-module-in-python.md @@ -0,0 +1,647 @@ +--- +id: implement-custom-query-module-in-python +title: Implement a custom query module in Python +sidebar_label: Implement a custom query module in Python +--- + +This tutorial will give you a basic idea of how to develop a custom query module +in Python with Memgraph Lab 2.0 and use it on a dataset. + +[![Related - How +to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/query-modules.md) +[![Related - Reference +Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/query-modules/overview.md) + +In short, query modules allow you to expand the Cypher query language with various +procedures. Procedures can be written in Python or C. Our MAGE library +has various modules dealing with complex graph algorithms, but you can implement +your own procedures gathered in query modules to optimize your queries. If you +need more information about what query modules are, please read our [reference +guide on query modules](/reference-guide/query-modules/overview.md).
+ +## Prerequisites + +In order to start developing a custom query module you will need: + +- [Memgraph Platform](/installation/overview.mdx) or [Memgraph Cloud](https://cloud.memgraph.com) + +## Data model + +For this tutorial, we will use the Europe backpacking data model with the data +from The European Backpacker Index (2018). The data set contains information +about 56 cities from 36 European countries, such as which cities are close to +each other, which countries share borders, various prices and recommended +accommodation. + +![Backpacking](../data/backpacking_metagraph.png)
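Before diving into the details, it can help to picture a single `City` node as a plain Python mapping. The values below are the Barcelona figures from the property descriptions on this page (a subset of the full property list):

```python
# One City node as a plain dictionary, using the Barcelona example values
# from the data-model description.
barcelona = {
    "name": "Barcelona",
    "country": "Spain",
    "cheapest_hostel": "Amistat Beach Hostel Barcelona",
    "rank": 38,
    "local_currency_code": "EUR",
    "total_USD": 80.104,
    "cost_per_night_USD": 23.684,
}

# Daily spend excluding the hostel is the total minus the nightly rate.
non_accommodation = barcelona["total_USD"] - barcelona["cost_per_night_USD"]
print(round(non_accommodation, 3))  # -> 56.42
```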
+ A detailed explanation of the data model + +Nodes: + +- `Country` - country with the following properties (example of the value): + - `id` - country's id (`5`) + - `name` - country's name (`"Spain"`) +- `City` - city with the following properties (example of the value): + - `name` - city's name (`"Barcelona"`) + - `country` - country's name (`"Spain"`) + - `cheapest_hostel` - the cheapest hostel in the city (`"Amistat Beach Hostel + Barcelona"`) + - `hostel_url` - URL of the cheapest hostel in the city + (`"https://www.priceoftravel.com/ABarcelonaHostel"`) + - `rank` - the cheapest hostel's rank according to The European Backpacker + Index (`38`) + - `local_currency` - the name of the local currency (`"Euro"`) + - `local_currency_code` - ISO3 code of the local currency (`"EUR"`) + - `total_USD` - total daily cost including accommodation, attractions, + meals/drinks and transportation in USD (`80.104`) + - `cost_per_night_USD` - daily cost of the cheapest hostel per night in USD + (`23.684`) + - `attractions_USD` - daily cost of the attractions in USD (`16.12`) + - `meals_USD` - daily cost of the meals in USD (`23.808`) + - `drinks_USD` - daily cost of the drinks in USD (`11.16`) + - `transportation_USD` - daily cost of the transportation in USD (`5.332`) + +Relationships: + +- `:Inside` - connects `City` node to the `Country` node if the city is within + the country +- `:CloseTo` - connects two `City` nodes if cities are from the same or + neighboring countries + - `eu_border` - relationship property that indicates whether the EU border + needs to be crossed to reach the other city (`true`) +- `:Borders` - connects two `Country` nodes if they are neighboring countries. + - `eu_border` - relationship property that indicates whether the EU border + needs to be crossed to reach the other country (`true`) + +
+ +In this tutorial, we will mostly focus on the two nodes, `:City` and `:Country`, +and their `:Inside` relationship. + +## Preparing Memgraph + +Let's open Memgraph Lab where we will import the dataset, as well as write and +use the procedures from our query module. + +If you have successfully installed Memgraph Platform, you should be able to open +Memgraph Lab in a browser at [`http://localhost:3000/`](http://localhost:3000/). + +If you are using Memgraph Cloud, open the running instance, and open the +**Connect via Client** tab, then click on **Connect in Browser** to open +Memgraph Lab in a new browser tab. Enter your project password and **Connect Now**. + +In Memgraph Lab, navigate to the **Datasets** menu item, click on the **Europe +backpacking** dataset to import it into Memgraph. You can also check the details +of the dataset by clicking on **Quick View** + + + +Go to the **Query Execution** and try running a test query that will show +the city Vienna and all its relationships: + +```Cypher +MATCH p=(:City {name: "Vienna"})-[]-() +RETURN (p); +``` + + + +You can click on the `:City` nodes to check the nodes' properties and get better +acquainted with the dataset. We will come back to this view every time we want +to test our query modules in making. + + + +Now navigate to **Query Modules**. Here you can see all the query modules +available in Memgraph, such as utility modules or query modules from the MAGE +library. To create a new custom query module, click on the **New Module** +button, give the new module name `backpacking` and create the module. Memgraph +Lab creates sample procedures to kick off your development. But before we start, +let's decide how we will expand the query language. + + + +## Goals + +Before we start to write a query module and procedures within, we need goals. +How do we want to expand the query language? 
+ +**Goal 1:** We want to get a total cost of accommodation expenses for one night +at the cheapest hostel in a given city, based on the number of adults and +children that will be staying in it. + +**Goal 2:** We also want to expand the data model by a given country and city. +The new `City` node should get properties that it shares with the other cities +in that country, such as `country`, `local_currency` and `local_currency_code`. + +## Python API + +Python API is defined in the `mgp` module you can find in the Memgraph +installation directory `/usr/lib/memgraph/python_support`. If you are using +Docker, you can copy the file from the Docker container into your computer for +faster access. + +
Copy the mgp module from a Docker container + +**1.** If it's not running, start your Memgraph instance using Docker. + +**2.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker +container: + +``` +docker ps +``` + +**3.** Position yourself in the local directory where you want to copy the file. + +**4.** Copy the file from the container to your current directory with the +following command: + +``` +docker cp <CONTAINER ID>:/usr/lib/memgraph/python_support/mgp.py mgp.py +``` + +Be sure to enter the correct `CONTAINER ID`. + +Example of a command when copying the `mgp.py` file to the user's desktop: + +``` +C:\Users\Vlasta\Desktop>docker cp 63e35:/usr/lib/memgraph/python_support/mgp.py mgp.py +```
+ +In essence, Python API is a wrapper around the C API. If you look at row 15 of +the new module we've created in Memgraph Lab, you can see you need to import the +`mgp` module at the beginning of every query module. + +Below the `import mgp`, in line 17, you can see a `@read_proc` decorator. Python +API defines `@mgp.read_proc`, `@mgp.write_proc` and `@mgp.transformation` +decorators. `@mgp.read_proc` decorator handles read-only procedures, the +`@mgp.write_proc` decorator handles procedures that also write to the database, +and the `@mgp.transformation` decorator handles data coming from streams. + +If you look at our two goals, to get the total cost of accommodation, Memgraph +only needs to read from the database to get the value of the +`cost_per_night_USD`, while to create new nodes it also needs to write in the +database. + +Feel free to examine the examples and tips available in this template, and when +you are ready to continue with the tutorial, clear the file so we start writing +our code from line 1. + +We'll start with the `@mgp.read_proc` decorator to achieve the first goal, then +we'll dive into a bit more complicated second goal and its `@mgp.write_proc`. + +## Read procedure + +As we established in the previous chapter, first we need to import the `mgp` +module and then use the `@mgp.read_proc` decorator. Then we will define the +procedure by giving it a name and signature, that is, what arguments it needs to +receive and what values it will return. + +The goal of this procedure is to get a total cost of accommodation expenses for +one night at the cheapest hostel in a given city, based on the number of adults +and children that will be staying in it. + +So, let's name the procedure `total_cost`. 
The procedure needs to receive the +following arguments in order to calculate the total cost of accommodation: + +- the whole graph (all the nodes and relationships) +- the name of the city we are interested in +- the number of adults staying at the accommodation +- the number of children. + +The graph is passed to the procedure using the `ProcCtx` instance. The name of +the city should be a string value, the number of adults an integer, and the +number of children also an integer. Because customers can travel with or without +children, we will define the children variable as optional by allowing it to be +`NULL` and setting it to a default value of `None`. + +Output values are defined as arguments of the `Record` class. We want the +function to return the total cost per night as a float value, and we'll enable +the value to be NULL so that the procedure doesn't return an error if the city +doesn't have the cost of accommodation as a property and thus can't calculate +the total cost of accommodation. + +After defining the name and signature, the code should look like this: + +```python +import mgp + +@mgp.read_proc +def total_cost(context: mgp.ProcCtx, + city: str, + adults: int, + children: mgp.Nullable[int] = None + ) -> mgp.Record(Total_cost_per_night = mgp.Nullable[float]): +``` + +Now we want to go through all the nodes (vertices) in our graph and find only +those nodes that have both: + +1. the property `name` with the same value as the variable `city` +2. the property `cost_per_night_USD` with a float value.
+ 

```python
import mgp

@mgp.read_proc
def total_cost(context: mgp.ProcCtx,
               city: str,
               adults: int,
               children: mgp.Nullable[int] = None
               ) -> mgp.Record(Total_cost_per_night = mgp.Nullable[float]):

    for vertex in context.graph.vertices:
        if vertex.properties["name"] == city and isinstance(vertex.properties.get("cost_per_night_USD"),float):
```

When we find those nodes, we will save the cost of accommodation per night in a
variable `cost_per_night` and multiply it by the number of adults to get the
`total_cost`.

```python
import mgp

@mgp.read_proc
def total_cost(context: mgp.ProcCtx,
               city: str,
               adults: int,
               children: mgp.Nullable[int] = None
               ) -> mgp.Record(Total_cost_per_night = mgp.Nullable[float]):

    for vertex in context.graph.vertices:
        if vertex.properties["name"] == city and isinstance(vertex.properties.get("cost_per_night_USD"),float):
            cost_per_night = vertex.properties.get("cost_per_night_USD")
            total_cost = cost_per_night * adults
```

Then we need to check if the number of children was given as an argument when
calling the procedure in a query, and if it was, add half the cost of
accommodation for each child. At the end, we return the total cost of
accommodation per night. 
+ 

```python
import mgp

@mgp.read_proc
def total_cost(context: mgp.ProcCtx,
               city: str,
               adults: int,
               children: mgp.Nullable[int] = None
               ) -> mgp.Record(Total_cost_per_night = mgp.Nullable[float]):

    for vertex in context.graph.vertices:
        if vertex.properties["name"] == city and isinstance(vertex.properties.get("cost_per_night_USD"),float):
            cost_per_night = vertex.properties.get("cost_per_night_USD")
            total_cost = cost_per_night * adults
            if children is not None:
                total_cost += cost_per_night / 2 * children
            return mgp.Record(Total_cost_per_night = total_cost)
```

If no node has both the property `name` with the same value as the variable
`city` and the property `cost_per_night_USD` with a float value, we will set the
value of `Total_cost_per_night` to `None` (that is, `NULL`) in order to prevent
the procedure from generating an error.

The finished procedure now looks like this:

```python
import mgp

@mgp.read_proc
def total_cost(context: mgp.ProcCtx,
               city: str,
               adults: int,
               children: mgp.Nullable[int] = None
               ) -> mgp.Record(
                   Total_cost_per_night = mgp.Nullable[float]):

    for vertex in context.graph.vertices:
        if vertex.properties["name"] == city and isinstance(vertex.properties.get("cost_per_night_USD"),float):
            cost_per_night = vertex.properties.get("cost_per_night_USD")
            total_cost = cost_per_night * adults
            if children is not None:
                total_cost += cost_per_night / 2 * children
            return mgp.Record(Total_cost_per_night = total_cost)

    return mgp.Record(Total_cost_per_night = None)
```

Save and close the query module. You will get an overview of the module that
lists its procedures and their signatures.

### Testing the read procedure

Switch to **Query Execution** and call the procedure using the `CALL` clause
followed by the module and procedure name
(`backpacking.total_cost`). 
List all arguments except the whole graph inside
brackets, and at the end `YIELD` all the results:

```cypher
CALL backpacking.total_cost("Zagreb", 2, 3) YIELD *;
```

Result -> `Total_cost_per_night = 32.129999999999995`

```cypher
CALL backpacking.total_cost("Vienna", 2) YIELD *;
```

Result -> `Total_cost_per_night = 45.012`

```cypher
CALL backpacking.total_cost("Whatever", 2) YIELD *;
```

Result -> `Total_cost_per_night = null`

## Detecting errors

Some errors are reported as you try to call the procedure. Others can be viewed
in the log file.

If you started your Memgraph Platform image by exposing the `7444` port, you can
check the logs from Memgraph Lab. Otherwise, you need to [access the logs in the
Docker container](../how-to-guides/config-logs.md#accessing-logs).

Any remaining errors in the code will result in the procedure not being
detected. That means that if you go to the **Query Modules** menu item and check
the module details by clicking on the arrow on the right, the procedure with an
error will not be listed.

## Write procedure

You can continue writing the write procedure below the read procedure. To edit
the current module, go to **Query Modules** and find the `backpacking` module.
Click on the arrow to view details about the module, such as the names of the
procedures and their signatures. To continue editing the module, click on **Edit
code**. If you are writing the write procedure in a new module, don't forget to
import the `mgp` module. For the write procedure, we will use the
`@mgp.write_proc` decorator.

The goal of this write procedure is to expand the data model with a given
country and city. The new `City` node should get the properties that it shares
with the other cities in that country, such as `country`, `local_currency` and
`local_currency_code`.

Let's name the procedure `new_city`. 
The procedure needs to receive the
following arguments in order to create two new nodes and connect them:

- the whole graph (all the nodes and relationships)
- the name of the city
- the name of the country.

The graph is passed to the procedure using the `ProcCtx` instance. The names of
the city and country should be string values.

Output values are defined as arguments of the `Record` class. We want the
procedure to return the city and country nodes and their relationship.

After defining the name and signature, the code should look like this:

```python
@mgp.write_proc
def new_city(context: mgp.ProcCtx,
             in_city: mgp.Nullable[str],
             in_country: mgp.Nullable[str]
             ) -> mgp.Record(City = mgp.Vertex,
                             Relationship = mgp.Edge,
                             Country = mgp.Vertex):
```

We will gradually expand our code to cover all three cases:

1. the city and country nodes already exist
2. the country node exists but the city node doesn't
3. neither the country node nor the city node exists

### The city and country nodes already exist

We want to check if the city and country passed as arguments already exist in
the database, because if they do, there is no need to create them. We can just
return them as a result. Because the `City` nodes also include the `country`
property, we only need to check `City` nodes to find out if a certain city
inside a certain country already exists.

So let's go through all the nodes (vertices) in the graph to check if there is a
node with the `country` property of the same value as the `in_country` argument.
If there is, let's check if the `name` property of that node has the same value
as the `in_city` argument. If it does, it means there are already `City` and
`Country` nodes with the same `name` properties as the `in_city` and
`in_country` arguments.

To return the relationship between them, we need to go through all the
relationships from the `City` node and find the one with the type `Inside`. 
Then we
will save the destination node in the `country` variable, and return both nodes
and the relationship.

```python
@mgp.write_proc
def new_city(context: mgp.ProcCtx,
             in_city: mgp.Nullable[str],
             in_country: mgp.Nullable[str]
             ) -> mgp.Record(City = mgp.Vertex,
                             Relationship = mgp.Edge,
                             Country = mgp.Vertex):

    for v in context.graph.vertices:
        if v.properties.get("country") == in_country:
            if v.properties.get("name") == in_city:
                for r in v.out_edges:
                    if r.type == "Inside":
                        country = r.to_vertex
                        return mgp.Record(City=v, Relationship=r, Country=country)
```

At this point you can save the module and test the new procedure by running the
following query:

```cypher
CALL backpacking.new_city("Zagreb","Croatia") YIELD *;
```

### The country node exists but the city node doesn't

In the case that a `Country` node with that name exists but the `City` node
doesn't, we should create a new `City` node and connect it to the existing
`Country` node. Because the `City` nodes carry properties about the country they
are connected to, we will copy property values such as `local_currency` and
`local_currency_code` from the existing `City` nodes to the new `City` node.

The new `City` node also has to get a new `id` number, so we will save the
highest existing `id` among the `City` nodes in the `city_id` variable and
increase that number by 1 to get the ID of the new `City` node. Once the new
`City` node is created, we need to create a relationship to connect it with the
existing `Country` node and return both nodes and the relationship. 
+ 

```python
@mgp.write_proc
def new_city(context: mgp.ProcCtx,
             in_city: mgp.Nullable[str],
             in_country: mgp.Nullable[str]
             ) -> mgp.Record(City = mgp.Vertex,
                             Relationship = mgp.Edge,
                             Country = mgp.Vertex):

    city_id = 0
    currency = None
    currency_code = None

    for v in context.graph.vertices:
        label, = v.labels # get node label
        if (label == "City") and (v.properties.get("id") > city_id): # the following 2 lines are getting the highest ID
            city_id = v.properties.get("id")
        if v.properties.get("country") == in_country:
            currency = v.properties.get("local_currency") # the following 2 lines are saving property values
            currency_code = v.properties.get("local_currency_code")
            if v.properties.get("name") == in_city:
                for r in v.out_edges:
                    if r.type == "Inside":
                        country = r.to_vertex
                        return mgp.Record(City=v, Relationship=r, Country=country)

    city = context.graph.create_vertex() # creating a new node with properties
    city.add_label("City")
    city.properties.set("id", city_id + 1)
    city.properties.set("name", in_city)
    city.properties.set("country", in_country)
    city.properties.set("local_currency", currency)
    city.properties.set("local_currency_code", currency_code)

    for v in context.graph.vertices: # creating a new relationship to an existing country
        if v.properties.get("name") == in_country:
            context.graph.create_edge(city, v, mgp.EdgeType("Inside"))
            for r in city.out_edges:
                if r.type == "Inside":
                    return mgp.Record(City=city, Relationship=r, Country=v)
```

At this point you can save the module and test the new additions to the
procedure by running the following query:

```cypher
CALL backpacking.new_city("Makarska","Croatia") YIELD *;
```

### Neither the country node nor the city node exists

Lastly, if neither a `City` node nor a `Country` node with the `name`
properties matching the provided arguments exists, we need to create both. 
+ 

That is why we also need to find the largest ID among the `Country` nodes.
Because we only need to create a new `Country` node if one doesn't exist, we
will introduce an `in_country_exists` variable with a default value of `False`.
The value of that flag will change to `True` only if a `Country` node with the
`name` property equal to the `in_country` argument already exists.

This is the finished procedure:

```python
@mgp.write_proc
def new_city(context: mgp.ProcCtx,
             in_city: mgp.Nullable[str],
             in_country: mgp.Nullable[str]
             ) -> mgp.Record(City = mgp.Vertex,
                             Relationship = mgp.Edge,
                             Country = mgp.Vertex):

    in_country_exists = False
    country_id = 0
    city_id = 0
    currency = None
    currency_code = None

    for v in context.graph.vertices:
        label, = v.labels # get node label
        if (label == "City") and (v.properties.get("id") > city_id): # the following 4 lines are getting the highest IDs
            city_id = v.properties.get("id")
        if (label == "Country") and (v.properties.get("id") > country_id):
            country_id = v.properties.get("id")
        if v.properties.get("country") == in_country:
            in_country_exists = True # flag is changed to `True`
            currency = v.properties.get("local_currency")
            currency_code = v.properties.get("local_currency_code")
            if v.properties.get("name") == in_city:
                for r in v.out_edges:
                    if r.type == "Inside":
                        country = r.to_vertex
                        return mgp.Record(City=v, Relationship=r, Country=country)

    city = context.graph.create_vertex() # creating a new node with properties
    city.add_label("City")
    city.properties.set("id", city_id + 1)
    city.properties.set("name", in_city)
    city.properties.set("country", in_country)
    city.properties.set("local_currency", currency)
    city.properties.set("local_currency_code", currency_code)

    if in_country_exists == True: # creating a relationship if the country node exists
        for v in context.graph.vertices: # creating a new relationship to an existing country
            if v.properties.get("name") == 
in_country: + context.graph.create_edge(city, v, mgp.EdgeType("Inside")) + for r in city.out_edges: + if r.type == "Inside": + return mgp.Record(City=city, Relationship=r, Country=v) + + if in_country_exists == False: # creating a node and relationship if the country node doesn't exist + new_country = context.graph.create_vertex() + new_country.add_label("Country") + new_country.properties.set("id", country_id + 1) + new_country.properties.set("name", in_country) + context.graph.create_edge(city, new_country, mgp.EdgeType("Inside")) + for r in city.out_edges: + if r.type == "Inside": + return mgp.Record(City=city, Relationship=r, Country=new_country) +``` + +### Testing the write procedure + +Save the query module, switch to **Query Execution** and call the procedure +using the clause `CALL`, then calling the right module and procedure within it +(`backpacking.new_city`). List all arguments except the whole graph inside +brackets, and at the end `YIELD` all the results: + +```cypher +CALL backpacking.new_city("Zagreb", "Croatia") YIELD *; +``` + +The query returns existing `City` and `Country` nodes. + +```cypher +CALL backpacking.new_city("Vinkovci", "Croatia") YIELD *; +``` + +The query returns new `City` node connected to an existing `Country` node. + +```cypher +CALL backpacking.new_city("Vinkovci", "Makroland") YIELD *; +``` + +The query returns a new `City` node connected to a new `Country` node. + +## Where to next? + +Congratulations! You've written your first custom query module! Feel free to +play around with the Python API and let us know what you are working on through +our [Discord server](https://discord.gg/memgraph). 
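

A handy habit as you keep experimenting: prototype the core logic of a procedure
as a plain Python function first, so you can check it without a running Memgraph
instance, and only then wrap it in `@mgp.read_proc`. Here is a sketch of that
approach for the pricing rule used in this tutorial (the helper name and the
sample nightly rate are made up for illustration):

```python
def price_per_night(cost_per_night, adults, children=None):
    # Mirrors the tutorial's rule: adults pay the full nightly rate,
    # children (if any) pay half of it.
    total = cost_per_night * adults
    if children is not None:
        total += cost_per_night / 2 * children
    return total

# Quick checks with a made-up nightly rate of 20.0 USD:
print(price_per_night(20.0, 2))     # 40.0
print(price_per_night(20.0, 2, 3))  # 70.0
```

Once the plain function behaves as expected, the `@mgp.read_proc` wrapper only
needs to find the right vertex and call it.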
diff --git a/docs2/custom-query-modules/python/mock-python-api.md b/docs2/custom-query-modules/python/mock-python-api.md new file mode 100644 index 00000000000..b09ea6ec42e --- /dev/null +++ b/docs2/custom-query-modules/python/mock-python-api.md @@ -0,0 +1,92 @@ +--- +id: mock-python-api +title: Mock Python query module API +sidebar_label: Mock Python API +slug: /reference-guide/query-modules/api/mock-python-api +--- + +The mock Python query module API enables you to develop and test query modules +for Memgraph without having to run a Memgraph instance by simulating its +behavior. As the mock API is compatible with the +[Python API](/reference-guide/query-modules/implement-custom-query-modules/api/python-api.md), +you can add modules developed with it to Memgraph as-is, without modifying the +code. + +It is implemented in `mgp_mock.py`, which contains definitions of all +classes and functions provided for developing query module procedures and +functions. The source file is located in the Memgraph installation directory, +inside `/usr/include/memgraph`. + +## API reference + +Because the mock API’s classes and functions are compatible with the corresponding +Python API classes and functions, the +[Python API reference](/reference-guide/query-modules/implement-custom-query-modules/api/python-api.md) +applies, with the following exceptions: + +* Query procedure returns (`Record` class) are printable. +* The mock API doesn’t throw errors having to do with Memgraph-internal + behavior (`UnableToAllocateError`, `InsufficientBufferError`, + `OutOfRangeError`, `KeyAlreadyExistsError`, `SerializationError` and + `AuthorizationError`). +* The mock API doesn’t contain two Python API methods dealing with + Memgraph-internal behavior (`must_abort` and `check_must_abort`). + These methods are used to check whether Memgraph has notified the query + module to abort its execution. 
+* The constructors of the `ProcCtx` and `FuncCtx` classes take a NetworkX + [MultiDiGraph](https://networkx.org/documentation/stable/reference/classes/multidigraph.html) + because that’s the data structure the mock API uses for internal graph + representations. +* Transformation modules are currently not implemented. + +### Graph representation + +The mock Python API uses a graph representation based on the NetworkX +[MultiDiGraph](https://networkx.org/documentation/stable/reference/classes/multidigraph.html), +which is a directed graph that supports parallel edges (relationships) and +custom node/relationship attributes. + +All elements of a Memgraph graph are supported by the mock API, with the +following rules about representing node labels and relationship types: + +* Node labels are stored in the node attribute named `"labels"` as a + `":"`-separated string, e.g. the node `(n:Actor:Director)` has + `{"labels": "Actor:Director"}`. +* Edge types are strings stored in `"type"`. + +## Using the mock API + +### Importing + +Before importing the mock API, you need to make it visible to the query module, +e.g. by adding the path of `mgp_mock.py` to PYTHONPATH or copying `mgp_mock.py` +to the directory containing the module. + +### Running + +The following code block contains an example query procedure and a runner for +query procedures: + +```python +import mgp_mock as mgp +import networkx as nx + +@mgp.read_proc +def example_procedure(context: mgp.ProcCtx) -> mgp.Record(status=str): + return mgp.Record(status="Hello, world!") + +graph = nx.MultiDiGraph() # Empty graph +context = mgp.ProcCtx(graph) # Create a context instance + +result = example_procedure(context) # Run the procedure +print(result) # Hello, world! +``` + +### Running the module with Memgraph + +As the mock Python API is compatible with the Python query module API, adding a +module developed with the mock API to Memgraph is a simple task. + +1. 
Replace the `mgp_mock` import with `import mgp`
+   * This includes refactoring the usages of `mgp_mock` (or alias) to `mgp`.
+2. [Load the query module.](/reference-guide/query-modules/load-call-query-modules.md)
diff --git a/docs2/custom-query-modules/python/python-api.md b/docs2/custom-query-modules/python/python-api.md
new file mode 100644
index 00000000000..c28105c511f
--- /dev/null
+++ b/docs2/custom-query-modules/python/python-api.md
@@ -0,0 +1,1916 @@
+---
+id: python-api
+title: Query modules Python API
+sidebar_label: Python API
+slug: /reference-guide/query-modules/api/python-api
+---
+
+This is the API documentation for `mgp.py` that contains definitions of the
+public Python API provided by Memgraph. In essence, this is a wrapper around the
+**[C API](./c-api)**. This source file can be found in the Memgraph
+installation directory, under `/usr/lib/memgraph/python_support`.
+
+:::tip
+
+For an example of how to implement query modules in Python, take a look at [the
+example we
+provided](/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example.md#python-api).
+
+:::
+
+:::tip
+
+If you install any Python modules after running Memgraph, you'll have to [load
+them into Memgraph](../load-call-query-modules#loading-query-modules) or restart
+Memgraph in order to use them.
+
+You can also develop query modules in Python from Memgraph Lab (v2.0 and newer).
+Just navigate to **Query Modules** and click on **New Module** to start.
+
+:::
+
+:::info
+If you need an additional Python library not included with Memgraph, check out
+[the guide on how to install
+it](/memgraph/how-to-guides/query-modules#how-to-install-external-python-libraries).
+:::
+
+
+## mgp.read_proc(func: Callable[…, mgp.Record])
+
+Register func as a read-only procedure of the current module.
+
+`read_proc` is meant to be used as a decorator function to register module
+procedures. 
The registered func needs to be a callable which optionally takes
`ProcCtx` as the first argument. Other arguments of func will be bound to values
passed in the Cypher query. The full signature of func needs to be annotated with
types. The return type must be `Record(field_name=type, …)` and the procedure must
produce either a complete `Record` or `None`. To mark a field as deprecated, use
`Record(field_name=Deprecated(type), …)`. Multiple records can be produced by
returning an iterable of them. Registering generator functions is currently not
supported.

**Example usage**

```python
import mgp

@mgp.read_proc
def procedure(context: mgp.ProcCtx,
              required_arg: mgp.Nullable[mgp.Any],
              optional_arg: mgp.Nullable[mgp.Any] = None
              ) -> mgp.Record(result=str, args=list):
    args = [required_arg, optional_arg]
    # Multiple rows can be produced by returning an iterable of mgp.Record
    return mgp.Record(args=args, result='Hello World!')
```

The example procedure above returns 2 fields: `args` and `result`.
* `args` is a copy of arguments passed to the procedure.
* `result` is the result of this procedure, a "Hello World!" string.

Any errors can be reported by raising an `Exception`.

The procedure can be invoked in Cypher using the following calls:

```cypher
CALL example.procedure(1, 2) YIELD args, result;
CALL example.procedure(1) YIELD args, result;
```

Naturally, you may pass in different arguments or yield fewer fields.

:::tip
Install the `mgp` Python module so your editor can use typing annotations
properly and suggest methods and classes it contains. You can install the module
by running `pip install mgp`.
:::

## mgp.write_proc(func: Callable[…, mgp.Record])

Register func as a writeable procedure of the current module.

`write_proc` is meant to be used as a decorator function to register module
procedures. The registered func needs to be a callable which optionally takes
`ProcCtx` as the first argument. 
Other arguments of func will be bound to values
passed in the Cypher query. The full signature of func needs to be annotated with
types. The return type must be `Record(field_name=type, …)` and the procedure must
produce either a complete `Record` or `None`. To mark a field as deprecated, use
`Record(field_name=Deprecated(type), …)`. Multiple records can be produced by
returning an iterable of them. Registering generator functions is currently not
supported.

**Example usage**

```python
import mgp

@mgp.write_proc
def procedure(context: mgp.ProcCtx,
              required_arg: str,
              optional_arg: mgp.Nullable[str] = None
              ) -> mgp.Record(result=mgp.Vertex):

    vertex = context.graph.create_vertex()
    vertex_properties = vertex.properties
    vertex_properties["required_arg"] = required_arg

    if optional_arg is not None:
        vertex_properties["optional_arg"] = optional_arg

    return mgp.Record(result=vertex)
```

The example procedure above returns a newly created vertex which has at most 2
properties:
* `required_arg` is always present and its value is the first argument of the
  procedure.
* `optional_arg` is present if the second argument of the procedure is not `null`.

Any errors can be reported by raising an `Exception`.

The procedure can be invoked in Cypher using the following calls:

```cypher
CALL example.procedure("property value", "another one") YIELD result;
CALL example.procedure("single argument") YIELD result;
```

Naturally, you may pass in different arguments.


## mgp.add_batch_read_proc(func: Callable[…, mgp.Record], initializer: typing.Callable, cleanup: typing.Callable)

Register `func` as a read-only batch procedure of the current module.

`func` is the function that the user invokes through openCypher. Memgraph first invokes the `initializer` function; after that, `func` is called until it returns an empty result. 
Afterward, the `cleanup` function is called, which can be used to clean up global resources. Only at that point is garbage collection invoked, so any dangling references to Python objects will be cleaned up.

`initializer` must define the same parameters as the main `func` function and will receive the same values: the position and type of each argument must match.

Otherwise, the same rules apply as in `read_proc`. It's important to keep in mind that no Memgraph resources can be stored in the `initializer` or during batching. After the `initializer` and each `func` call, every Memgraph-related object is invalidated and can't be used later on.

## mgp.add_batch_write_proc(func: Callable[…, mgp.Record], initializer: typing.Callable, cleanup: typing.Callable)

Register `func` as a writeable batch procedure of the current module.

The same rules for parameters and the order of function calls apply to a writeable batch procedure as to the read-only batch procedure.

## mgp.function(func: Callable[[…]])

Register func as a Memgraph function in the current module.

`function` is meant to be used as a decorator function to register module
functions. The registered func needs to be a callable which optionally takes
`FuncCtx` as the first argument. Other arguments of func will be bound to values
passed in the Cypher query. Only the function arguments need to be annotated with
types. The return type doesn't need to be specified, but it has to be supported
by `mgp.Any`. Registering generator functions is currently not supported. 
+ +**Example usage** + +```python +import mgp + +@mgp.function +def func_example(context: mgp.FuncCtx, + required_arg: str, + optional_arg: mgp.Nullable[str] = None + ): + + return_args = [required_arg] + + if optional_arg is not None: + return_args.append(optional_arg) + + # Return any kind of result supported by mgp.Any + return return_args +``` + +The example function above returns a list of provided arguments: +* `required_arg` is always present and its value is the first argument of the + function. +* `optional_arg` is present if the second argument of the function is not + `null`. + +Any errors can be reported by raising an `Exception`. + +The function can be invoked in Cypher using the following calls: + +```cypher +RETURN example.func_example("first argument", "second_argument"); +RETURN example.func_example("first argument"); +``` + +Naturally, you may pass in different arguments. + +This module provides the API for usage in custom openCypher procedures. + +## Label Objects + +```python +class Label() +``` + +Label of a `Vertex`. + +### name + +```python +@property +def name() -> str +``` + +Get the name of the label. + +**Returns**: + + A string that represents the name of the label. + + +**Example**: + + ```label.name``` + +## Properties Objects + +```python +class Properties() +``` + +A collection of properties either on a `Vertex` or an `Edge`. + +### get() + +```python +def get(property_name: str, default=None) -> object +``` + +Get the value of a property with the given name or return default value. + +**Arguments**: + +- `property_name` - String that represents property name. +- `default` - Default value return if there is no property. + + +**Returns**: + + Any object value that property under `property_name` has or default value otherwise. + + +**Raises**: + +- `InvalidContextError` - If `edge` or `vertex` is out of context. +- `UnableToAllocateError` - If unable to allocate a `mgp.Value`. +- `DeletedObjectError` - If the `object` has been deleted. 
+ + +**Examples**: + + ``` + vertex.properties.get(property_name) + edge.properties.get(property_name) + ``` + +### set() + +```python +def set(property_name: str, value: object) -> None +``` + +Set the value of the property. When the value is `None`, then the +property is removed. + +**Arguments**: + +- `property_name` - String that represents property name. +- `value` - Object that represents value to be set. + + +**Raises**: + +- `UnableToAllocateError` - If unable to allocate memory for storing the property. +- `ImmutableObjectError` - If the object is immutable. +- `DeletedObjectError` - If the object has been deleted. +- `SerializationError` - If the object has been modified by another transaction. +- `ValueConversionError` - If `value` is vertex, edge or path. + + +**Examples**: + + ``` + vertex.properties.set(property_name, value) + edge.properties.set(property_name, value) + ``` + +### items() + +```python +def items() -> typing.Iterable[Property] +``` + +Iterate over the properties. Doesn’t return a dynamic view of the properties but copies the +current properties. + +**Returns**: + + Iterable `Property` of names and values. + + +**Raises**: + +- `InvalidContextError` - If edge or vertex is out of context. +- `UnableToAllocateError` - If unable to allocate an iterator. +- `DeletedObjectError` - If the object has been deleted. + + +**Examples**: + + ``` + items = vertex.properties.items() + for it in items: + name = it.name + value = it.value + ``` + ``` + items = edge.properties.items() + for it in items: + name = it.name + value = it.value + ``` + +### keys() + +```python +def keys() -> typing.Iterable[str] +``` + +Iterate over property names. Doesn’t return a dynamic view of the property names but copies the +name of the current properties. + +**Returns**: + + Iterable list of strings that represent names/keys of properties. + + +**Raises**: + +- `InvalidContextError` - If edge or vertex is out of context. 
+- `UnableToAllocateError` - If unable to allocate an iterator. +- `DeletedObjectError` - If the object has been deleted. + + +**Examples**: + + ``` + graph.vertex.properties.keys() + graph.edge.properties.keys() + ``` + +### values() + +```python +def values() -> typing.Iterable[object] +``` + +Iterate over property values. Doesn’t return a dynamic view of the property values but copies the +value of the current properties. + +**Returns**: + + Iterable list of property values. + + +**Raises**: + +- `InvalidContextError` - If edge or vertex is out of context. +- `UnableToAllocateError` - If unable to allocate an iterator. +- `DeletedObjectError` - If the object has been deleted. + + +**Examples**: + + ``` + vertex.properties.values() + edge.properties.values() + ``` + +### \_\_len\_\_ + +```python +def __len__() -> int +``` + +Get the number of properties. + +**Returns**: + + A number of properties on vertex or edge. + + +**Raises**: + +- `InvalidContextError` - If edge or vertex is out of context. +- `UnableToAllocateError` - If unable to allocate an iterator. +- `DeletedObjectError` - If the object has been deleted. + + +**Examples**: + + ``` + len(vertex.properties) + len(edge.properties) + ``` + +### \_\_iter\_\_ + +```python +def __iter__() -> typing.Iterable[str] +``` + +Iterate over property names. + +**Returns**: + + Iterable list of strings that represent names of properties. + + +**Raises**: + +- `InvalidContextError` - If edge or vertex is out of context. +- `UnableToAllocateError` - If unable to allocate an iterator. +- `DeletedObjectError` - If the object has been deleted. + + +**Examples**: + + ``` + iter(vertex.properties) + iter(edge.properties) + ``` + +### \_\_getitem\_\_ + +```python +def __getitem__(property_name: str) -> object +``` + +Get the value of a property with the given name or raise KeyError. + +**Arguments**: + +- `property_name` - String that represents property name. 
+ 

**Returns**:

  Any value that the property under `property_name` has.


**Raises**:

- `InvalidContextError` - If edge or vertex is out of context.
- `UnableToAllocateError` - If unable to allocate a mgp.Value.
- `DeletedObjectError` - If the object has been deleted.


**Examples**:

  ```
  vertex.properties[property_name]
  edge.properties[property_name]
  ```

### \_\_setitem\_\_

```python
def __setitem__(property_name: str, value: object) -> None
```

Set the value of the property. When the value is `None`, then the
property is removed.

**Arguments**:

- `property_name` - String that represents property name.
- `value` - Object that represents value to be set.


**Raises**:

- `UnableToAllocateError` - If unable to allocate memory for storing the property.
- `ImmutableObjectError` - If the object is immutable.
- `DeletedObjectError` - If the object has been deleted.
- `SerializationError` - If the object has been modified by another transaction.
- `ValueConversionError` - If `value` is vertex, edge or path.


**Examples**:

  ```
  vertex.properties[property_name] = value
  edge.properties[property_name] = value
  ```

### \_\_contains\_\_

```python
def __contains__(property_name: str) -> bool
```

Check if there is a property with the given name.

**Arguments**:

- `property_name` - String that represents property name.


**Returns**:

  Bool value indicating whether a property with the given name exists.


**Raises**:

- `InvalidContextError` - If edge or vertex is out of context.
- `UnableToAllocateError` - If unable to allocate a mgp.Value.
- `DeletedObjectError` - If the object has been deleted.


**Examples**:

  ```
  if property_name in vertex.properties:
  ```
  ```
  if property_name in edge.properties:
  ```

## EdgeType Objects

```python
class EdgeType()
```

Type of an Edge.

### name

```python
@property
def name() -> str
```

Get the name of EdgeType. 
+ +**Returns**: + + A string that represents the name of EdgeType. + + +**Example**: + + ```edge.type.name``` + +## Edge Objects + +```python +class Edge() +``` + +Edge in the graph database. + +Access to an Edge is only valid during a single execution of a procedure in +a query. You should not globally store an instance of an Edge. Using an +invalid Edge instance will raise InvalidContextError. + +### is\_valid() + +```python +def is_valid() -> bool +``` + +Check if `edge` is in a valid context and may be used. + +**Returns**: + + A `bool` value depends on if the `edge` is in a valid context. + + +**Examples**: + + ```edge.is_valid()``` + +### underlying\_graph\_is\_mutable() + +```python +def underlying_graph_is_mutable() -> bool +``` + +Check if the `graph` can be modified. + +**Returns**: + + A `bool` value depends on if the `graph` is mutable. + + +**Examples**: + + ```edge.underlying_graph_is_mutable()``` + +### id + +```python +@property +def id() -> EdgeId +``` + +Get the ID of the edge. + +**Returns**: + + `EdgeId` represents ID of the edge. + + +**Raises**: + +- `InvalidContextError` - If edge is out of context. + + +**Examples**: + + ```edge.id``` + +### type + +```python +@property +def type() -> EdgeType +``` + +Get the type of edge. + +**Returns**: + + `EdgeType` describing the type of edge. + + +**Raises**: + +- `InvalidContextError` - If edge is out of context. + + +**Examples**: + + ```edge.type``` + +### from\_vertex() + +```python +@property +def from_vertex() -> "Vertex" +``` + +Get the source vertex. + +**Returns**: + + `Vertex` from where the edge is directed. + + +**Raises**: + +- `InvalidContextError` - If edge is out of context. + + +**Examples**: + + ```edge.from_vertex``` + +### to\_vertex() + +```python +@property +def to_vertex() -> "Vertex" +``` + +Get the destination vertex. + +**Returns**: + + `Vertex` to where the edge is directed. + + +**Raises**: + +- `InvalidContextError` - If edge is out of context. 
+ + +**Examples**: + + ```edge.to_vertex``` + +### properties + +```python +@property +def properties() -> Properties +``` + +Get the properties of the edge. + +**Returns**: + + All `Properties` of edge. + + +**Raises**: + +- `InvalidContextError` - If edge is out of context. + + +**Examples**: + + ```edge.properties``` + +### \_\_eq\_\_ + +```python +def __eq__(other) -> bool +``` + +Raise InvalidContextError. + +## Vertex Objects + +```python +class Vertex() +``` + +Vertex in the graph database. + +Access to a Vertex is only valid during a single execution of a procedure +in a query. You should not globally store an instance of a Vertex. Using an +invalid Vertex instance will raise InvalidContextError. + +### is\_valid() + +```python +def is_valid() -> bool +``` + +Checks if `Vertex` is in valid context and may be used. + +**Returns**: + + A `bool` value depends on if the `Vertex` is in a valid context. + + +**Examples**: + + ```vertex.is_valid()``` + +### underlying\_graph\_is\_mutable() + +```python +def underlying_graph_is_mutable() -> bool +``` + +Check if the `graph` is mutable. + +**Returns**: + + A `bool` value depends on if the `graph` is mutable. + + +**Examples**: + + ```vertex.underlying_graph_is_mutable()``` + +### id + +```python +@property +def id() -> VertexId +``` + +Get the ID of the Vertex. + +**Returns**: + + `VertexId` represents ID of the vertex. + + +**Raises**: + +- `InvalidContextError` - If vertex is out of context. + + +**Examples**: + + ```vertex.id``` + +### labels + +```python +@property +def labels() -> typing.Tuple[Label] +``` + +Get the labels of the vertex. + +**Returns**: + + A tuple of `Label` representing vertex Labels + + +**Raises**: + +- `InvalidContextError` - If vertex is out of context. +- `OutOfRangeError` - If some of the labels are removed while collecting the labels. +- `DeletedObjectError` - If `Vertex` has been deleted. 
+ + +**Examples**: + + ```vertex.labels``` + +### add\_label() + +```python +def add_label(label: str) -> None +``` + +Add the label to the vertex. + +**Arguments**: + +- `label` - String label to be added. + + +**Raises**: + +- `InvalidContextError` - If `Vertex` is out of context. +- `UnableToAllocateError` - If unable to allocate memory for storing the label. +- `ImmutableObjectError` - If `Vertex` is immutable. +- `DeletedObjectError` - If `Vertex` has been deleted. +- `SerializationError` - If `Vertex` has been modified by another transaction. + + +**Examples**: + + ```vertex.add_label(label)``` + +### remove\_label() + +```python +def remove_label(label: str) -> None +``` + +Remove the label from the vertex. + +**Arguments**: + +- `label` - String label to be deleted + +**Raises**: + +- `InvalidContextError` - If `Vertex` is out of context. +- `ImmutableObjectError` - If `Vertex` is immutable. +- `DeletedObjectError` - If `Vertex` has been deleted. +- `SerializationError` - If `Vertex` has been modified by another transaction. + + +**Examples**: + + ```vertex.remove_label(label)``` + +### properties + +```python +@property +def properties() -> Properties +``` + +Get the properties of the vertex. + +**Returns**: + + `Properties` on a current vertex. + + +**Raises**: + +- `InvalidContextError` - If `Vertex` is out of context. + + +**Examples**: + + ```vertex.properties``` + +### in\_edges + +```python +@property +def in_edges() -> typing.Iterable[Edge] +``` + +Iterate over inbound edges of the vertex. When the first parameter to a procedure is a projected graph, iterating will start over the inbound edges of the given vertex in the projected graph. +Doesn’t return a dynamic view of the edges but copies the +current inbound edges. + +**Returns**: + + Iterable list of `Edge` objects that are directed in towards the current vertex. + + +**Raises**: + +- `InvalidContextError` - If `Vertex` is out of context. 
+- `UnableToAllocateError` - If unable to allocate an iterator. +- `DeletedObjectError` - If `Vertex` has been deleted. + + +**Examples**: + + ```for edge in vertex.in_edges:``` + +### out\_edges + +```python +@property +def out_edges() -> typing.Iterable[Edge] +``` + +Iterate over outbound edges of the vertex. When the first parameter to a procedure is a projected graph, iterating will start over the outbound edges of the given vertex in the projected graph. + +Doesn’t return a dynamic view of the edges but copies the +current outbound edges. + +**Returns**: + + Iterable list of `Edge` objects that are directed out of the current vertex. + + +**Raises**: + +- `InvalidContextError` - If `Vertex` is out of context. +- `UnableToAllocateError` - If unable to allocate an iterator. +- `DeletedObjectError` - If `Vertex` has been deleted. + + +**Examples**: + + ```for edge in vertex.out_edges:``` + +### \_\_eq\_\_ + +```python +def __eq__(other) -> bool +``` + +Raise InvalidContextError + +## Path Objects + +```python +class Path() +``` + +Path containing Vertex and Edge instances. + +### \_\_init\_\_ + +```python +def __init__(starting_vertex_or_path: typing.Union[_mgp.Path, Vertex]) +``` + +Initialize with a starting Vertex. + +**Raises**: + +- `InvalidContextError` - If passed in Vertex is invalid. +- `UnableToAllocateError` - If cannot allocate a path. + +### is\_valid() + +```python +def is_valid() -> bool +``` + +Check if `Path` is in valid context and may be used. + +**Returns**: + + A `bool` value depends on if the `Path` is in a valid context. + + +**Examples**: + + ```path.is_valid()``` + +### expand() + +```python +def expand(edge: Edge) +``` + +Append an edge continuing from the last vertex on the path. + +The last vertex on the path will become the other endpoint of the given +edge, as continued from the current last vertex. 
+ +**Arguments**: + +- `edge` - `Edge` that is added to the path + + +**Raises**: + +- `InvalidContextError` - If using an invalid `Path` instance or if passed in `Edge` is invalid. +- `LogicErrorError` - If the current last vertex in the path is not part of the given edge. +- `UnableToAllocateError` - If unable to allocate memory for path extension. + + +**Examples**: + + ```path.expand(edge)``` + +### vertices + +```python +@property +def vertices() -> typing.Tuple[Vertex, ...] +``` + +Vertices are ordered from the start to the end of the path. + +**Returns**: + + A tuple of `Vertex` objects order from start to end of the path. + + +**Raises**: + +- `InvalidContextError` - If using an invalid Path instance. + + +**Examples**: + + ```path.vertices``` + +### edges + +```python +@property +def edges() -> typing.Tuple[Edge, ...] +``` + +Edges are ordered from the start to the end of the path. + +**Returns**: + + A tuple of `Edge` objects order from start to end of the path + +**Raises**: + +- `InvalidContextError` - If using an invalid `Path` instance. + +**Examples**: + + ```path.edges``` + +## Record Objects + +```python +class Record() +``` + +Represents a record of resulting field values. + +### \_\_init\_\_ + +```python +def __init__(**kwargs) +``` + +Initialize with name=value fields in kwargs. + +## Vertices Objects + +```python +class Vertices() +``` + +Iterable over vertices in a graph. + +### is\_valid() + +```python +def is_valid() -> bool +``` + +Check if `Vertices` is in valid context and may be used. + +**Returns**: + + A `bool` value depends on if the `Vertices` is in valid context. + + +**Examples**: + + ```vertices.is_valid()``` + +### \_\_iter\_\_ + +```python +def __iter__() -> typing.Iterable[Vertex] +``` + +Iterate over vertices. + +**Returns**: + + Iterable list of `Vertex` objects. + + +**Raises**: + +- `InvalidContextError` - If context is invalid. +- `UnableToAllocateError` - If unable to allocate an iterator or a vertex. 
+
+
+**Examples**:
+
+   ```for vertex in graph.vertices```
+
+
+### \_\_contains\_\_
+
+```python
+def __contains__(vertex)
+```
+
+Check if `Vertices` contains the given vertex.
+
+**Arguments**:
+
+- `vertex` - `Vertex` to be checked for membership in the graph `Vertices`.
+
+
+**Returns**:
+
+  A `bool` value indicating whether the given `Vertex` is among the graph `Vertices`.
+
+
+**Raises**:
+
+- `UnableToAllocateError` - If unable to allocate the vertex.
+
+
+**Examples**:
+
+   ```if vertex in graph.vertices:```
+
+### \_\_len\_\_
+
+```python
+def __len__()
+```
+
+Get the number of vertices.
+
+**Returns**:
+
+  The number of vertices in the graph.
+
+
+**Raises**:
+
+- `InvalidContextError` - If context is invalid.
+- `UnableToAllocateError` - If unable to allocate an iterator or a vertex.
+
+
+**Examples**:
+
+   ```len(graph.vertices)```
+
+## Graph Objects
+
+```python
+class Graph()
+```
+
+State of the graph database in the current ProcCtx.
+
+### is\_valid()
+
+```python
+def is_valid() -> bool
+```
+
+Check if `graph` is in a valid context and may be used.
+
+**Returns**:
+
+  A `bool` value that depends on whether the `graph` is in a valid context.
+
+
+**Examples**:
+
+   ```graph.is_valid()```
+
+### get\_vertex\_by\_id()
+
+```python
+def get_vertex_by_id(vertex_id: VertexId) -> Vertex
+```
+
+Return the Vertex corresponding to the given vertex_id from the graph. When the first parameter to a procedure is a projected graph, the vertex must also exist in the projected graph.
+
+Access to a Vertex is only valid during a single execution of a
+procedure in a query. You should not globally store the returned
+Vertex.
+
+**Arguments**:
+
+- `vertex_id` - Memgraph Vertex ID.
+
+
+**Returns**:
+
+  `Vertex` corresponding to `vertex_id`.
+
+
+**Raises**:
+
+- `IndexError` - If unable to find the given vertex_id.
+- `InvalidContextError` - If context is invalid.
+
+
+**Examples**:
+
+   ```graph.get_vertex_by_id(vertex_id)```
+
+### vertices
+
+```python
+@property
+def vertices() -> Vertices
+```
+
+Get all vertices in the graph.
+
+Access to a Vertex is only valid during a single execution of a
+procedure in a query. You should not globally store the returned Vertex
+instances.
+
+**Returns**:
+
+  The `Vertices` contained in the graph.
+
+
+**Raises**:
+
+- `InvalidContextError` - If context is invalid.
+
+
+**Examples**:
+
+   Iteration over all graph `Vertices`.
+
+   ```
+   graph = context.graph
+   for vertex in graph.vertices:
+   ```
+
+### is\_mutable()
+
+```python
+def is_mutable() -> bool
+```
+
+Check if the graph is mutable, that is, whether it can be used to modify
+vertices and edges.
+
+**Returns**:
+
+  A `bool` value indicating whether the graph is mutable.
+
+
+**Examples**:
+
+   ```graph.is_mutable()```
+
+### create\_vertex()
+
+```python
+def create_vertex() -> Vertex
+```
+
+Create an empty vertex. When the first parameter to a procedure is a projected graph, the vertex is also added to the projected graph view.
+
+**Returns**:
+
+  Created `Vertex`.
+
+
+**Raises**:
+
+- `ImmutableObjectError` - If `graph` is immutable.
+- `UnableToAllocateError` - If unable to allocate a vertex.
+
+
+**Examples**:
+
+   ```vertex = graph.create_vertex()```
+
+### delete\_vertex()
+
+```python
+def delete_vertex(vertex: Vertex) -> None
+```
+
+Delete a vertex if it has no edges. When the first parameter to a procedure is a projected graph, the vertex must also exist in the projected graph.
+
+**Arguments**:
+
+- `vertex` - `Vertex` to be deleted.
+
+**Raises**:
+
+- `ImmutableObjectError` - If `graph` is immutable.
+- `LogicErrorError` - If `vertex` has edges.
+- `SerializationError` - If `vertex` has been modified by another transaction.
+
+**Examples**:
+
+   ```graph.delete_vertex(vertex)```
+
+### detach\_delete\_vertex()
+
+```python
+def detach_delete_vertex(vertex: Vertex) -> None
+```
+
+Delete a vertex and all of its edges.
When the first parameter to a procedure is a projected graph, such an operation is not possible.
+
+**Arguments**:
+
+- `vertex` - `Vertex` to be deleted with all of its edges.
+
+
+**Raises**:
+
+- `ImmutableObjectError` - If `graph` is immutable.
+- `SerializationError` - If `vertex` has been modified by another transaction.
+
+**Examples**:
+
+   ```graph.detach_delete_vertex(vertex)```
+
+### create\_edge()
+
+```python
+def create_edge(from_vertex: Vertex, to_vertex: Vertex,
+                edge_type: EdgeType) -> Edge
+```
+
+Create an edge. When the first parameter is a projected graph, a new directed edge of the specified edge type is created only if both vertices are a part of the projected graph.
+
+**Arguments**:
+
+- `from_vertex` - `Vertex` from which the edge is directed.
+- `to_vertex` - `Vertex` to which the edge is directed.
+- `edge_type` - `EdgeType` that defines the type of the edge.
+
+
+**Returns**:
+
+  Created `Edge`.
+
+
+**Raises**:
+
+- `ImmutableObjectError` - If `graph` is immutable.
+- `UnableToAllocateError` - If unable to allocate an edge.
+- `DeletedObjectError` - If `from_vertex` or `to_vertex` has been deleted.
+- `SerializationError` - If `from_vertex` or `to_vertex` has been modified by another transaction.
+
+**Examples**:
+
+   ```edge = graph.create_edge(from_vertex, to_vertex, edge_type)```
+
+### delete\_edge()
+
+```python
+def delete_edge(edge: Edge) -> None
+```
+
+Delete an edge. When the first parameter to a procedure is a projected graph, the edge must also exist in the projected graph.
+
+**Arguments**:
+
+- `edge` - `Edge` to be deleted.
+
+
+**Raises**:
+
+- `ImmutableObjectError` - If `graph` is immutable.
+- `SerializationError` - If `edge` or its source or destination vertex has been modified by another transaction.
+
+**Examples**:
+
+   ```graph.delete_edge(edge)```
+
+## AbortError Objects
+
+```python
+class AbortError(Exception)
+```
+
+Signals that the procedure was asked to abort its execution.
+ +## ProcCtx Objects + +```python +class ProcCtx() +``` + +Context of a procedure being executed. + +Access to a ProcCtx is only valid during a single execution of a procedure +in a query. You should not globally store a ProcCtx instance. + +### graph + +```python +@property +def graph() -> Graph +``` + +Access to `Graph` object. + +**Returns**: + + Graph object. + + +**Raises**: + +- `InvalidContextError` - If context is invalid. + + +**Examples**: + + ```context.graph``` + + +## Logger Objects + +```python +class Logger() +``` + +Class for logging. + +### info() + +```python +def info(out: str) -> None +``` +Logs a message `out` on `INFO` log level. + +**Arguments**: + +- `out` - `str` to be logged + + +**Examples** + + ```logger.info("message")``` + +### debug() + +```python +def debug(out: str) -> None +``` +Logs a message `out` on `DEBUG` log level. + +**Arguments**: + +- `out` - `str` to be logged + + +**Examples** + + ```logger.debug("message")``` + + +### error() + +```python +def error(out: str) -> None +``` +Logs a message `out` on `ERROR` log level. + +**Arguments**: + +- `out` - `str` to be logged + + +**Examples** + + ```logger.error("message")``` + +### trace() + +```python +def trace(out: str) -> None +``` +Logs a message `out` on `TRACE` log level. + +**Arguments**: + +- `out` - `str` to be logged + + +**Examples** + + ```logger.trace("message")``` + + +### warning() + +```python +def warning(out: str) -> None +``` +Logs a message `out` on `WARNING` log level. + +**Arguments**: + +- `out` - `str` to be logged + + +**Examples** + + ```logger.warning("message")``` + +### critical() + +```python +def critical(out: str) -> None +``` +Logs a message `out` on `CRITICAL` log level. + +**Arguments**: + +- `out` - `str` to be logged + + +**Examples** + + ```logger.critical("message")``` + +## UnsupportedTypingError Objects + +```python +class UnsupportedTypingError(Exception) +``` + +Signals a typing annotation is not supported as a _mgp.CypherType. 
+ +## Deprecated Objects + +```python +class Deprecated() +``` + +Annotate a resulting Record's field as deprecated. + +### read\_proc() + +```python +def read_proc(func: typing.Callable[..., Record]) +``` + +Register `func` as a read-only procedure of the current module. + +The decorator `read_proc` is meant to be used to register module procedures. +The registered `func` needs to be a callable which optionally takes +`ProcCtx` as its first argument. Other arguments of `func` will be bound to +values passed in the cypherQuery. The full signature of `func` needs to be +annotated with types. The return type must be `Record(field_name=type, ...)` +and the procedure must produce either a complete Record or None. To mark a +field as deprecated, use `Record(field_name=Deprecated(type), ...)`. +Multiple records can be produced by returning an iterable of them. +Registering generator functions is currently not supported. + + +### write\_proc() + +```python +def write_proc(func: typing.Callable[..., Record]) +``` + +Register `func` as a writeable procedure of the current module. + +The decorator `write_proc` is meant to be used to register module +procedures. The registered `func` needs to be a callable which optionally +takes `ProcCtx` as the first argument. Other arguments of `func` will be +bound to values passed in the cypherQuery. The full signature of `func` +needs to be annotated with types. The return type must be +`Record(field_name=type, ...)` and the procedure must produce either a +complete Record or None. To mark a field as deprecated, use +`Record(field_name=Deprecated(type), ...)`. Multiple records can be produced +by returning an iterable of them. Registering generator functions is +currently not supported. + + +## InvalidMessageError Objects + +```python +class InvalidMessageError(Exception) +``` + +Signals using a message instance outside of the registered transformation. 
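The registration contract described above (a fully annotated signature that returns `Record(field_name=type, ...)` instances) can be sketched in plain Python. Note that `mgp` is only importable inside Memgraph, so the decorator and `Record` class below are simplified stand-ins, not the real API:

```python
import typing


def read_proc(func: typing.Callable):
    # Stand-in for mgp.read_proc: the real decorator registers the
    # callable with Memgraph; here we only mark it as registered.
    func.is_read_procedure = True
    return func


class Record:
    # Stand-in for mgp.Record: a bag of named result fields.
    def __init__(self, **kwargs):
        self.fields = kwargs


@read_proc
def node_count(context) -> Record:
    # A real procedure would read `context.graph`; here we fake a count.
    return Record(count=3)


print(node_count(None).fields["count"])  # 3
```

Inside Memgraph, the equivalent procedure would be invoked through Cypher with `CALL <module>.node_count() YIELD count`, where `<module>` is the name of the file it lives in.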
+ +## Message Objects + +```python +class Message() +``` + +Represents a message from a stream. + +### is\_valid() + +```python +def is_valid() -> bool +``` + +Return True if `self` is in valid context and may be used. + +### source\_type() + +```python +def source_type() -> str +``` + +Supported in all stream sources + +Raise InvalidArgumentError if the message is from an unsupported stream source. + +### payload() + +```python +def payload() -> bytes +``` + +Supported stream sources: + - Kafka + - Pulsar + +Raise InvalidArgumentError if the message is from an unsupported stream source. + +### topic\_name() + +```python +def topic_name() -> str +``` + +Supported stream sources: + - Kafka + - Pulsar + +Raise InvalidArgumentError if the message is from an unsupported stream source. + +### key() + +```python +def key() -> bytes +``` + +Supported stream sources: + - Kafka + +Raise InvalidArgumentError if the message is from an unsupported stream source. + +### timestamp() + +```python +def timestamp() -> int +``` + +Supported stream sources: + - Kafka + +Raise InvalidArgumentError if the message is from an unsupported stream source. + +### offset() + +```python +def offset() -> int +``` + +Supported stream sources: + - Kafka + +Raise InvalidArgumentError if the message is from an unsupported stream source. + +## InvalidMessagesError Objects + +```python +class InvalidMessagesError(Exception) +``` + +Signals using a messages instance outside of the registered transformation. + +## Messages Objects + +```python +class Messages() +``` + +Represents a list of messages from a stream. + +### is\_valid() + +```python +def is_valid() -> bool +``` + +Return True if `self` is in valid context and may be used. + +### message\_at() + +```python +def message_at(id: int) -> Message +``` + +Raise InvalidMessagesError if context is invalid. + +### total\_messages() + +```python +def total_messages() -> int +``` + +Raise InvalidContextError if context is invalid. 
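A transformation typically walks the batch, decodes each `payload()` and emits parameterized queries. A plain-Python sketch of that decoding step (the byte strings stand in for real Kafka/Pulsar payloads, and the query template is only an illustration):

```python
import json

# Stand-ins for the bytes that Message.payload() would return.
raw_payloads = [b'{"name": "Alice"}', b'{"name": "Bob"}']

queries = []
for payload in raw_payloads:
    # payload() returns bytes, so decode before parsing.
    fields = json.loads(payload.decode("utf-8"))
    queries.append(("CREATE (:Person {name: $name})", {"name": fields["name"]}))

print(len(queries))  # 2
```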
+ +## TransCtx Objects + +```python +class TransCtx() +``` + +Context of a transformation being executed. + +Access to a TransCtx is only valid during a single execution of a transformation. +You should not globally store a TransCtx instance. + +### graph + +```python +@property +def graph() -> Graph +``` + +Raise InvalidContextError if context is invalid. + +## FuncCtx Objects + +```python +class FuncCtx() +``` + +Context of a function being executed. + +Access to a FuncCtx is only valid during a single execution of a function in +a query. You should not globally store a FuncCtx instance. The graph object +within the FuncCtx is not mutable. + +### function() + +```python +def function(func: typing.Callable) +``` + +Register `func` as a user-defined function in the current module. + +The decorator `function` is meant to be used to register module functions. +The registered `func` needs to be a callable which optionally takes +`FuncCtx` as its first argument. Other arguments of `func` will be bound to +values passed in the Cypher query. Only the function arguments need to be +annotated with types. The return type doesn't need to be specified, but it +has to be supported by `mgp.Any`. Registering generator functions is +currently not supported. + + + + +## InvalidContextError Objects + +```python +class InvalidContextError(Exception) +``` + +Signals using a graph element instance outside of the registered procedure. + +## UnknownError Objects + +```python +class UnknownError(_mgp.UnknownError) +``` + +Signals unspecified failure. + +## UnableToAllocateError Objects + +```python +class UnableToAllocateError(_mgp.UnableToAllocateError) +``` + +Signals failed memory allocation. + +## InsufficientBufferError Objects + +```python +class InsufficientBufferError(_mgp.InsufficientBufferError) +``` + +Signals that some buffer is not big enough. 
+ +## OutOfRangeError Objects + +```python +class OutOfRangeError(_mgp.OutOfRangeError) +``` + +Signals that an index-like parameter has a value that is outside its +possible values. + +## LogicErrorError Objects + +```python +class LogicErrorError(_mgp.LogicErrorError) +``` + +Signals faulty logic within the program such as violating logical +preconditions or class invariants and may be preventable. + +## DeletedObjectError Objects + +```python +class DeletedObjectError(_mgp.DeletedObjectError) +``` + +Signals accessing an already deleted object. + +## InvalidArgumentError Objects + +```python +class InvalidArgumentError(_mgp.InvalidArgumentError) +``` + +Signals that some of the arguments have invalid values. + +## KeyAlreadyExistsError Objects + +```python +class KeyAlreadyExistsError(_mgp.KeyAlreadyExistsError) +``` + +Signals that a key already exists in a container-like object. + +## ImmutableObjectError Objects + +```python +class ImmutableObjectError(_mgp.ImmutableObjectError) +``` + +Signals modification of an immutable object. + +## ValueConversionError Objects + +```python +class ValueConversionError(_mgp.ValueConversionError) +``` + +Signals that the conversion failed between python and cypher values. + +## SerializationError Objects + +```python +class SerializationError(_mgp.SerializationError) +``` + +Signals serialization error caused by concurrent modifications from +different transactions. + +## AuthorizationError Objects + +```python +class AuthorizationError(_mgp.AuthorizationError) +``` + +Signals that the user doesn't have sufficient permissions to perform +procedure call. 
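Of these, `SerializationError` is transient: a write procedure fails because another transaction touched the same objects, and re-running the query usually succeeds. A plain-Python sketch of a client-side retry loop (the exception class here is a local stand-in, since the real one lives in `_mgp`):

```python
class SerializationError(Exception):
    """Local stand-in for mgp.SerializationError."""


def with_retries(operation, max_attempts=3):
    # Re-run `operation` when a conflicting transaction aborts it.
    for attempt in range(max_attempts):
        try:
            return operation()
        except SerializationError:
            if attempt == max_attempts - 1:
                raise


attempts = []


def flaky_write():
    # Fails twice with a serialization conflict, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise SerializationError("modified by another transaction")
    return "ok"


print(with_retries(flaky_write))  # ok
```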
diff --git a/docs2/custom-query-modules/python/python-example.md b/docs2/custom-query-modules/python/python-example.md new file mode 100644 index 00000000000..6945a51f0bb --- /dev/null +++ b/docs2/custom-query-modules/python/python-example.md @@ -0,0 +1,690 @@ +# Example of a query module written in Python + +We will examine how the query module `example` is implemented using the +C API and the Python API. Both query modules can be found in the +`/usr/lib/memgraph/query_modules` directory. + +If you require more information about what query modules are, please +read [the query modules overview page](/reference-guide/query-modules/overview.md) + +## Python API + +Query modules can be implemented using [the Python API](/reference-guide/query-modules/implement-custom-query-modules/api/python-api.md) +provided by Memgraph. If you wish to write your own query modules using the +Python API, you need to have Python version `3.5.0` or above installed. + +Every single Memgraph installation comes with the `py_example.py` query module +located in the `/usr/lib/memgraph/query_modules` directory. It was provided +as an example of a `.py` query module for you to examine and learn from. + +If you are working with Docker and would like to open the file on your computer, +copy it from the Docker container. + +
+ Transferring files from a Docker container + + If you are using Docker to run Memgraph, you can copy the files from the + Docker container to your local directory. + +

+
+   **1.** Start your Memgraph instance using Docker.
+
+   **2.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker
+   container:
+
+   ```
+   docker ps
+   ```
+
+   **3.** Position yourself in the directory where you want to transfer the file.
+
+   **4.** Copy the file from the container to the current directory:
+
+   ```
+   docker cp <CONTAINER_ID>:/usr/lib/memgraph/query_modules/py_example.py py_example.py
+   ```
+
+   Don't forget to replace `<CONTAINER_ID>` with the actual ID.
+</details>
+
+You can develop query modules in Python from Memgraph Lab (v2.0 and newer). Just
+navigate to **Query Modules** and click on **New Module** to start.
+
+<img src={require('../data/memgraph-lab-query-modules.png').default} className={"imgBorder"}/>
+
+:::info
+If you need an additional Python library not included with Memgraph, check out
+[the guide on how to install
+it](/memgraph/how-to-guides/query-modules#how-to-install-external-python-libraries).
+:::
+
+### Readable procedure
+
+Let's take a look at the `py_example.py` file and its first line:
+
+```python
+import mgp
+```
+
+On the first line, we import the `mgp` module, which contains definitions of the
+public Python API provided by Memgraph. In essence, this is a wrapper around the
+C API described in the next section. This file (`mgp.py`) can be found in the
+Memgraph installation directory `/usr/lib/memgraph/python_support`.
+
+Because our procedure will only read from the database, we decorate it with the
+`read_proc` decorator, which is used to register read-only procedures. You can
+also inspect the definition of said decorator in the `mgp.py` file or take a
+look at the [Python API reference
+guide](/reference-guide/query-modules/implement-custom-query-modules/api/python-api.md).
+
+Next, we define the `procedure` that will be used as the callback for our
+`py_example.procedure` invocation through Cypher.
+
+```python
+@mgp.read_proc
+def procedure(context: mgp.ProcCtx,
+              required_arg: mgp.Nullable[mgp.Any],
+              optional_arg: mgp.Nullable[mgp.Any] = None
+              ) -> mgp.Record(args=list,
+                              vertex_count=int,
+                              avg_degree=mgp.Number,
+                              props=mgp.Nullable[mgp.Map]):
+
+    ...
+```
+
+Because we need to access the graph to get results, the first argument takes the
+`ProcCtx` type, which gives us access to the graph. Then we define two more
+arguments, a required and an optional one, which will be bound to the values
+passed in the Cypher query. They can be either null or of any type.
+
+The return type must be `Record(field_name=type, ...)`, and the procedure must
+produce either a complete `Record` or `None`.
+
+In our case, the example procedure returns four fields:
+
+- `args`: a copy of the arguments passed to the procedure.
+- `vertex_count`: the number of vertices in the database.
+- `avg_degree`: the average degree of vertices.
+- `props`: the properties map of the Vertex or Edge object passed as the `required_arg`.
+  In case a Path object is passed, the procedure returns the properties map
+  of the starting vertex.
+
+This procedure can then be invoked in Cypher as follows:
+
+```cypher
+MATCH (n) WITH n LIMIT 1 CALL py_example.procedure(n, 1) YIELD * RETURN *;
+```
+
+To get the `props` result, we first need to check whether the passed argument is
+an Edge, Vertex or Path and create the properties map:
+
+```python
+if isinstance(required_arg, (mgp.Edge, mgp.Vertex)):
+    props = dict(required_arg.properties.items())
+elif isinstance(required_arg, mgp.Path):
+    start_vertex, = required_arg.vertices
+    props = dict(start_vertex.properties.items())
+```
+
+In the case of `mgp.Edge` and `mgp.Vertex`, we obtain an instance of the
+`mgp.Properties` class and invoke the `items()` method, which returns an
+`Iterable` containing the `mgp.Property` objects of our `mgp.Edge` or
+`mgp.Vertex`. Since the type of `mgp.Property` is a simple
+`collections.namedtuple` containing `name` and `value`, we can easily pass it to
+the `dict` constructor, thus creating a map.
+
+To get the `vertex_count` result, we need to count the number of vertices and
+edges in our graph:
+
+```python
+vertex_count = 0
+edge_count = 0
+for v in context.graph.vertices:
+    vertex_count += 1
+    edge_count += sum(1 for e in v.in_edges)
+    edge_count += sum(1 for e in v.out_edges)
+```
+
+First, we set our variables and then access the `mgp.Graph` instance via
+`context.graph`. The `mgp.Graph` instance contains the state of the database at
+the time of execution of the Cypher query that is calling our procedure.
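The arithmetic of this counting loop can be checked on a toy graph in plain Python (dictionaries stand in for the `mgp` objects). Each edge is seen once from its source's `out_edges` and once from its target's `in_edges`, so `edge_count` ends up at twice the number of edges, making `edge_count / vertex_count` the average degree:

```python
# Toy stand-in for the mgp graph: directed edges a->b, a->c, b->c.
out_edges = {"a": ["b", "c"], "b": ["c"], "c": []}
in_edges = {"a": [], "b": ["a"], "c": ["a", "b"]}

vertex_count = 0
edge_count = 0
for v in out_edges:
    vertex_count += 1
    # Mirrors the loop above: count both inbound and outbound edges,
    # so every edge contributes twice in total.
    edge_count += sum(1 for _ in in_edges[v])
    edge_count += sum(1 for _ in out_edges[v])

avg_degree = 0 if vertex_count == 0 else edge_count / vertex_count
print(vertex_count, edge_count, avg_degree)  # 3 6 2.0
```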
The
+`mgp.Graph` instance also has the property `vertices`, which gives us access to
+the `mgp.Vertices` object. This object can be iterated over, incrementing the
+counter for each traversed vertex.
+
+Similarly, each `mgp.Vertex` object has the `in_edges` and `out_edges`
+properties, allowing us to iterate over the corresponding `mgp.Edge` objects
+and increment the counter for each traversed edge.
+
+Lastly, we calculate the `avg_degree` value and obtain a copy of the passed
+arguments:
+
+```python
+avg_degree = 0 if vertex_count == 0 else edge_count / vertex_count
+args_copy = [copy.deepcopy(required_arg), copy.deepcopy(optional_arg)]
+```
+
+At the end, we return an `mgp.Record` with all the calculated values:
+
+```python
+return mgp.Record(args=args_copy, vertex_count=vertex_count,
+                  avg_degree=avg_degree, props=props)
+```
+
+### Writeable procedures
+
+Writeable procedures are implemented similarly to read-only procedures.
+The only difference is that writeable procedures receive mutable objects.
+Therefore, they can create and delete vertices or edges, modify the properties
+of vertices and edges, and add or remove labels of vertices.
+
+We can implement a very simple writeable query module similarly to read-only
+procedures.
The following procedure creates a new vertex with a certain property
+name and value passed as arguments and connects it to all existing vertices
+that have a property with the same name and value:
+
+```python
+@mgp.write_proc
+def write_procedure(context: mgp.ProcCtx,
+                    property_name: str,
+                    property_value: mgp.Nullable[mgp.Any]
+                    ) -> mgp.Record(created_vertex=mgp.Vertex):
+    # Collect all the vertices that have a property with
+    # the same name and value as the passed arguments.
+    # `get` returns None for vertices without the property
+    # instead of raising a KeyError.
+    vertices_to_connect = []
+    for v in context.graph.vertices:
+        if v.properties.get(property_name) == property_value:
+            vertices_to_connect.append(v)
+    # Create the new vertex and set its property
+    vertex = context.graph.create_vertex()
+    vertex.properties.set(property_name, property_value)
+    # Connect the new vertex to the other vertices
+    for v in vertices_to_connect:
+        context.graph.create_edge(vertex, v, mgp.EdgeType("HAS_SAME_VALUE"))
+    # Return a field containing the newly created vertex
+    return mgp.Record(created_vertex=vertex)
+```
+
+### Batched read procedures
+
+Similar to regular `read` procedures, Memgraph also supports batched `read` procedures. The key difference is that batched procedures return their results in batches, mostly to reduce memory consumption. For batched procedures, you need to define **three** functions:
+* `batching` function - similar to the main function in regular procedures
+* `initialization` function - function to initialize a stream, open a source file, etc.
+* `cleanup` function - function to close a stream, source file, etc.

Since there are three functions, the construct works as follows:
- the `initialization` function must be defined so that it receives the same
  parameters, in the same order, as the `batching` function, including
  `mgp.ProcCtx` if it's defined as the first parameter
- when calling the procedure from a query, you call the `batching` function
- Memgraph calls the `initialization` function before the `batching` function
- the `batching` function needs to return an empty result at some point, which
  signals the end of the stream
- the `cleanup` function is called at the end of the stream

There is no decorator for registering a batched read procedure. Instead, call
`mgp.add_batch_read_proc(batching, initialization, cleanup)`, passing the
three functions:

```python
from typing import Any, Dict

import mgp
import mysql.connector as mysql_connector

mysql_dict = {}


def init_migrate(
    table_or_sql: str,
    config: mgp.Map,
):
    global mysql_dict

    query = f"SELECT * FROM {table_or_sql};"
    # Initialize the dict and store the variables for later reference.
    if not mysql_dict:
        connection = mysql_connector.connect(**config)
        cursor = connection.cursor(buffered=True)
        # Executes, but doesn't fetch. Fetching is done in batches
        # in `migrate`.
        cursor.execute(query)

        mysql_dict["connection"] = connection
        mysql_dict["cursor"] = cursor
        mysql_dict["column_names"] = [column[0] for column in cursor.description]


def migrate(
    table_or_sql: str,
    config: mgp.Map,
) -> mgp.Record(row=mgp.Map):
    global mysql_dict
    cursor = mysql_dict["cursor"]
    column_names = mysql_dict["column_names"]
    # An empty list is returned once the cursor is exhausted,
    # which signals the end of the stream.
    rows = cursor.fetchmany(1000)
    return [mgp.Record(row=_name_row_cells(row, column_names)) for row in rows]


def cleanup_migrate():
    global mysql_dict
    mysql_dict["cursor"] = None
    mysql_dict["connection"].commit()
    mysql_dict["connection"].close()
    mysql_dict["connection"] = None
    mysql_dict["column_names"] = None
    mysql_dict = None


mgp.add_batch_read_proc(migrate, init_migrate, cleanup_migrate)


def _name_row_cells(row_cells, column_names) -> Dict[str, Any]:
    return dict(zip(column_names, row_cells))
```

### Batched write procedures

Analogous to batched `read` procedures, you can define batched `write`
procedures. They also return results in batches, mostly to reduce memory
consumption. As for batched `read` procedures, you need to define **three**
functions:
* `batching` function - similar to the main function in regular procedures
* `initialization` function - a function to initialize a stream, open a source
  file, etc.
* `cleanup` function - a function to close a stream, source file, etc. 

Since there are three functions, the construct works as follows:
- the `initialization` function must be defined so that it receives the same
  parameters, in the same order, as the `batching` function, including
  `mgp.ProcCtx` if it's defined as the first parameter
- when calling the procedure from a query, you call the `batching` function
- Memgraph calls the `initialization` function before the `batching` function
- the `batching` function needs to return an empty result at some point, which
  signals the end of the stream
- the `cleanup` function is called at the end of the stream

There is no decorator for registering a batched write procedure. Instead, call
`mgp.add_batch_write_proc(batching, initialization, cleanup)`, passing the
three functions:

```python
from typing import Any, Dict

import mgp
import mysql.connector as mysql_connector

mysql_dict = {}


def _name_row_cells(row_cells, column_names) -> Dict[str, Any]:
    return dict(zip(column_names, row_cells))


def init_migrate(
    ctx: mgp.ProcCtx,
    table_or_sql: str,
    config: mgp.Map,
):
    global mysql_dict

    query = f"SELECT * FROM {table_or_sql};"
    if not mysql_dict:
        connection = mysql_connector.connect(**config)
        cursor = connection.cursor(buffered=True)
        cursor.execute(query)

        mysql_dict["connection"] = connection
        mysql_dict["cursor"] = cursor
        mysql_dict["column_names"] = [column[0] for column in cursor.description]


def migrate(
    ctx: mgp.ProcCtx,
    table_or_sql: str,
    config: mgp.Map,
) -> mgp.Record(vertex=mgp.Vertex):
    global mysql_dict
    cursor = mysql_dict["cursor"]
    column_names = mysql_dict["column_names"]
    rows = cursor.fetchmany(1000)
    results = []
    for row in rows:
        # For every row from the database, create a vertex
        # and add the column values as properties
        v = ctx.graph.create_vertex()
        for key, value in _name_row_cells(row, column_names).items():
            v.properties.set(key, value)
        results.append(mgp.Record(vertex=v))
    return results


def cleanup_migrate():
    global mysql_dict
    mysql_dict["cursor"] = None
    mysql_dict["connection"].commit()
    mysql_dict["connection"].close()
    mysql_dict["connection"] = None
    mysql_dict["column_names"] = None
    mysql_dict = None


mgp.add_batch_write_proc(migrate, init_migrate, cleanup_migrate)
```

### Magic functions

User-defined "magic functions" are implemented similarly to read and write
procedures. The differences lie in the end use case and in graph mutability:
users should not modify (create, delete, or update) any graph objects through
functions.

Semantically, functions should be small fragments of functionality that do not
require long computations or large memory consumption.

The example below shows how to create and run a function. It demonstrates a
trivial use case: returning the passed arguments as a list.

```python
@mgp.function
def func_example(context: mgp.FuncCtx,
                 argument: mgp.Any,
                 opt_argument: mgp.Nullable[mgp.Any] = None):
    return_arguments = [argument]

    if opt_argument is not None:
        return_arguments.append(opt_argument)

    # Note that we do not need to specify the result Record as long as it is a
    # Memgraph-defined value type.
    return return_arguments
```

At first glance, defining a function looks very similar to defining a
procedure, so let's go through the differences. The first difference is the
context type: `FuncCtx` prevents you from modifying the graph and does not
offer an API for communicating with graph entities other than those passed in
as arguments.

The second difference is the result signature. Functions do not require one
because of how the return value is used: a function call can be nested in
Cypher, so the only requirement is that the returned value is of a supported
`mgp.Type`.

The Cypher call for the written custom function can be executed like this:

```cypher
RETURN py_example.func_example("First argument", "Second argument");
```

This call can also be nested and used as preprocessing for some other function
or procedure. 
Here is how to combine a
built-in function with the one we just developed:

```cypher
RETURN head(py_example.func_example("First argument", "Second argument"));
```

The Python API provided by Memgraph can be a very powerful tool for
implementing query modules. We strongly suggest you thoroughly inspect the
`mgp.py` source file located in the Memgraph installation directory
`/usr/lib/memgraph/python_support`.

:::warning

Do not store any graph elements globally when writing custom query modules with
the intent to use them in a different procedure invocation.

:::

### Terminate procedure execution

Just as the execution of a Cypher query can be terminated with the [`TERMINATE
TRANSACTIONS "id";`](/reference-guide/transactions.md) query, so can the
execution of a procedure, if it takes too long to yield a response or gets
stuck in an infinite loop due to unpredicted input data.

The transaction ID is visible upon calling the `SHOW TRANSACTIONS;` query.

To make a procedure terminable, it has to call `ctx.check_must_abort()` before
crucial parts of the code, such as `while` loops or similar points where the
procedure might become costly.

Consider the following example:

```python
import mgp

@mgp.read_proc
def long_query(ctx: mgp.ProcCtx) -> mgp.Record(my_id=int):
    id = 1
    try:
        while True:
            if ctx.check_must_abort():
                break
            id += 1
    except mgp.AbortError:
        return mgp.Record(my_id=id)
```

Handling `mgp.AbortError` ensures that the correct message about the
termination is sent to the session where the procedure was originally run.

## C API

Query modules can be implemented using the [C
API](/reference-guide/query-modules/implement-custom-query-modules/api/c-api.md)
provided by Memgraph. Such modules need to be compiled to a shared library so
that they can be loaded when Memgraph starts. 
This means you can write the
procedures in any programming language that can work with C and be compiled to
the ELF shared library format (`.so`).

:::warning

If the programming language of your choice throws exceptions, these exceptions
must never leave the scope of your module! You should have a top-level
exception handler that returns an error value and potentially logs the error
message. Exceptions that cross the module boundary will cause unexpected
issues.

:::

Every Memgraph installation comes with the `example.so` query module located
in the `/usr/lib/memgraph/query_modules` directory. It is provided as an
example of a query module written with the C API for you to examine and learn
from. The `query_modules` directory also contains a `src` directory with the
`example.c` file.

Let's take a look at the `example.c` file.

```c
#include "mg_procedure.h"
```

In the first line, we include `mg_procedure.h`, which contains declarations of
all functions that can be used to implement a query module procedure. This file
is located in the Memgraph installation directory, under
`/usr/include/memgraph`. To compile the module, you will have to pass the
appropriate flags to the compiler, for example, with `clang`:

```plaintext
clang -Wall -shared -fPIC -I /usr/include/memgraph example.c -o example.so
```

### Query procedures

Next, we have a `procedure` function. This function will serve as the callback
for our `example.procedure` invocation through Cypher.

```c
static void procedure(const struct mgp_list *args, const struct mgp_graph *graph,
                      struct mgp_result *result, struct mgp_memory *memory) {
  ...
}
```

If this was C++, you'd probably write the function like this:

```cpp
namespace {
void procedure(const mgp_list *args, const mgp_graph *graph,
               mgp_result *result, mgp_memory *memory) {
  try {
    ...
  } catch (const std::exception &e) {
    // We must not let any exceptions out of our module.
    mgp_result_set_error_msg(result, e.what());
    return;
  }
}
}
```

The `procedure` function receives the list of arguments (`args`) passed in the
query. The parameter `result` is used to fill in the resulting records of the
procedure. Parameters `graph` and `memory` are context parameters of the
procedure, and they are used in some parts of the provided C API.

For more information on what exactly is possible with the C API, take a look
at the `mg_procedure.h` file or the [C API reference
guide](/reference-guide/query-modules/implement-custom-query-modules/api/c-api.md).

Next comes the `mgp_init_module` function that registers procedures that can
be invoked through Cypher. Even though the example has only one `procedure`,
you can register multiple different procedures in a single module.

Procedures are invoked using the `CALL <module>.<procedure> ...` syntax. The
`<module>` corresponds to the name of the shared library. Since we compile our
example into `example.so`, the module is called `example`. Procedure names can
differ from their corresponding implementation callbacks because the procedure
name is defined when registering the procedure.

```c
int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  // Register our `procedure` as a read procedure with the name "procedure".
  struct mgp_proc *proc =
      mgp_module_add_read_procedure(module, "procedure", procedure);
  // Return non-zero on error.
  if (!proc) return 1;
  // Additional code for better specifying the procedure (omitted here).
  ...
  // Return 0 to indicate success.
  return 0;
}
```

The omitted part specifies the signature of the registered procedure. The
signature specification states what kind of arguments a procedure accepts and
what the resulting set of the procedure will be. For information on the
signature specification API, take a look at the `mg_procedure.h` file and read
the documentation on functions prefixed with `mgp_proc_`. 

The passed-in `memory` argument is alive only throughout the execution of
`mgp_init_module`, so you must not allocate any global resources with it. If
you really need to set up a certain global state, you may do so inside
`mgp_init_module` using the standard global allocators.

Consequently, you may want to reset any global state or release global
resources in the following function.

```c
int mgp_shutdown_module() {
  // Return 0 to indicate success.
  return 0;
}
```

As mentioned before, no exceptions should leave your module. If you are
writing the module in a language that throws them, use exception handlers in
`mgp_init_module` and `mgp_shutdown_module` as well.


### Batched query procedures

Similar to batched query procedures in Python, you can add batched query
procedures in C.

Batched procedures need three functions, one each for batching,
initialization, and cleanup.

```c
static void batch(const struct mgp_list *args, const struct mgp_graph *graph,
                  struct mgp_result *result, struct mgp_memory *memory) {
  ...
}

static void init(const struct mgp_list *args, const struct mgp_graph *graph,
                 struct mgp_memory *memory) {
  ...
}

static void cleanup() {
  ...
}
```

The `batch` function receives a list of arguments (`args`) passed in the
query. The parameter `result` is used to fill in the resulting records of the
procedure. Parameters `graph` and `memory` are context parameters of the
procedure, and they are used in some parts of the provided C API.

At some point, `batch` needs to return an empty `result` to signal that the
`batch` procedure is done executing and `cleanup` can be called. `init`
doesn't receive a `result`, as it is only used for initialization. The `init`
function receives the same arguments that are registered for and passed to the
`batch` function.

Memgraph makes sure to call `init` before the `batch` function and `cleanup`
at the end. 
The user directly invokes the `batch` function through openCypher.

The argument passed in `memory` is alive only throughout the execution of
`mgp_init_module`, so you must not allocate any global resources with it.
Consequently, you may want to reset any global state or release global
resources in the `cleanup` function.

For more information on what exactly is possible with the C API, take a look
at the `mg_procedure.h` file or the [C API reference
guide](/reference-guide/query-modules/implement-custom-query-modules/api/c-api.md).

Next comes the `mgp_init_module` function that registers procedures that can
be invoked through Cypher. Even though the example has only one procedure, you
can register multiple different procedures in a single module.

Batch procedures are invoked using the `CALL <module>.<procedure> ...` syntax.
The `<module>` corresponds to the name of the shared library. Since the
example is compiled into `example.so`, the module is called `example`. As
mentioned, Memgraph makes sure to call `init` before `batch` and `cleanup`
once `batch` signals the end with an empty result.

```c
int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  // Register our `batch` callback as a read procedure named "procedure".
  struct mgp_proc *proc =
      mgp_module_add_batch_read_procedure(module, "procedure", batch, init, cleanup);
  // Return non-zero on error.
  if (!proc) return 1;
  // Additional code for better specifying the procedure (omitted here).
  ...
  // Return 0 to indicate success.
  return 0;
}
```


### Magic functions

A major part of defining a magic function is the same as for query procedures.
The steps of defining a callback and registering arguments are repeated in
magic functions, only with a slightly different syntax.

To define a function, the first step is to define a callback. The example only
shows C++ code. 

```cpp
namespace {
void function(const mgp_list *args, mgp_func_context *func_ctx,
              mgp_func_result *result, mgp_memory *memory) {
  try {
    ...
  } catch (const std::exception &e) {
    // We must not let any exceptions out of our module.
    mgp_func_result_set_error_msg(result, e.what(), memory);
    return;
  }
}
}
```

The parameter `args` is used to fetch the required and optional arguments from
the Cypher call. The parameter `result` defines the resulting value. It can
carry either an error or a return value, depending on the runtime execution.
There is no `mgp_graph` argument because the graph is immutable in functions.

To initialize and register the written function as a magic function, write the
initialization in `mgp_init_module`. The registered function can then be
called in a similar fashion as the built-in functions, just with the syntax
naming the module it is stored in: `<module>.<function>(...)`.

```cpp
int mgp_init_module(struct mgp_module *module, struct mgp_memory *memory) {
  // Register our `function` as a magic function with the name "function".
  struct mgp_func *func =
      mgp_module_add_function(module, "function", function);  // Function pointer defined above
  // Return non-zero on error.
  if (!func) return 1;
  // Additional code for better specifying the function with arguments (omitted here).
  ...
  // Return 0 to indicate success.
  return 0;
}
```
\ No newline at end of file
diff --git a/docs2/custom-query-modules/python/python.md b/docs2/custom-query-modules/python/python.md
new file mode 100644
index 00000000000..431f68aa281
--- /dev/null
+++ b/docs2/custom-query-modules/python/python.md
@@ -0,0 +1,176 @@
---
id: create-a-new-module-python
title: How to create a query module in Python
sidebar_label: Create a Python query module
---

The [Python API](/memgraph/reference-guide/query-modules/api/python-api)
provided by Memgraph lets you develop query modules. 
It is accompanied by the
[mock API](https://memgraph.com/docs/memgraph/reference-guide/query-modules/api/mock-python-api),
which makes it possible to develop and test query modules for Memgraph without
having to run a Memgraph instance.

In this tutorial, we will learn how to develop a query module in Python using
the example of the **random walk algorithm**.

## Prerequisites

There are three options for installing and working with Memgraph MAGE:

1. **Pulling the `memgraph/memgraph-mage` image**: check the `Docker Hub`
   [installation guide](/installation/docker-hub.md).
2. **Building a Docker image from the MAGE repository**: check the `Docker
   build` [installation guide](/installation/docker-build.md).
3. **Building MAGE from source**: check the `Build from source on Linux`
   [installation guide](/installation/source.md).

## Developing a module

:::note

These steps are the same for all MAGE installation options (_Docker Hub_,
_Docker build_ and _Build from source on Linux_).

:::

Position yourself in the **MAGE repository** you cloned earlier. Specifically,
go into the `python` subdirectory and create a new file named `random_walk.py`.

```plaintext
mage
└── python
    └── random_walk.py

```

For coding the query module, we'll use the
[`mgp`](https://github.com/memgraph/mgp) package that contains the Memgraph
Python API, including the key graph data structures:
[**Vertex**](https://github.com/memgraph/mgp/blob/main/mgp.py#L260) and
[**Edge**](https://github.com/memgraph/mgp/blob/main/mgp.py#L182).
To install `mgp`, run `pip install mgp`.

Here's the code for the random walk algorithm:

```python
import mgp
import random


@mgp.read_proc
def get_path(
    start: mgp.Vertex,
    length: int = 10,
) -> mgp.Record(path=mgp.Path):
    """Generates a random path of length `length` or less, starting
    from the `start` vertex.

    :param mgp.Vertex start: The starting node of the walk. 

    :param int length: The number of edges to traverse.
    :return: Random path.
    :rtype: mgp.Record(mgp.Path)
    """
    path = mgp.Path(start)
    vertex = start
    for _ in range(length):
        try:
            edge = random.choice(list(vertex.out_edges))
            path.expand(edge)
            vertex = edge.to_vertex
        except IndexError:
            break

    return mgp.Record(path=path)
```

The `get_path` procedure is decorated with the `@mgp.read_proc` decorator,
which tells Memgraph it's a `read` procedure, meaning it won't make changes to
the graph. The path is created from the `start` node, and edges are appended
to it iteratively.

:::info
If you need an additional Python library that is not already installed with
Memgraph, check out our [guide on how to install
it](/memgraph/how-to-guides/query-modules#how-to-install-external-python-libraries).
:::

### Terminate procedure execution

Just as the execution of a Cypher query can be terminated with the [`TERMINATE
TRANSACTIONS
"id";`](/memgraph/reference-guide/transactions) query, so can the execution of
a procedure, if it takes too long to yield a response or gets stuck in an
infinite loop due to unpredicted input data.

The transaction ID is visible upon calling the `SHOW TRANSACTIONS;` query.

To make a procedure terminable, it has to call `ctx.check_must_abort()` before
crucial parts of the code, such as `while` loops or similar points where the
procedure might become costly.

Consider the following example:

```python
import mgp

@mgp.read_proc
def long_query(ctx: mgp.ProcCtx) -> mgp.Record(my_id=int):
    id = 1
    try:
        while True:
            if ctx.check_must_abort():
                break
            id += 1
    except mgp.AbortError:
        return mgp.Record(my_id=id)
```

Handling `mgp.AbortError` ensures that the correct message about the
termination is sent to the session where the procedure was originally run. 
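
The walk logic of `get_path` can also be sanity-checked outside Memgraph. The
sketch below (the `random_walk` helper and the toy adjacency dict are
illustrative, not part of the `mgp` API) mirrors the loop above: pick a random
outgoing edge, and stop early at a vertex with no outgoing edges.

```python
import random

def random_walk(adjacency, start, length=10):
    """Plain-Python sketch of the `get_path` loop, with a dict of
    outgoing neighbors standing in for Memgraph's graph objects."""
    path = [start]
    vertex = start
    for _ in range(length):
        try:
            # random.choice raises IndexError on an empty list, which
            # ends the walk early at a vertex with no outgoing edges.
            vertex = random.choice(adjacency.get(vertex, []))
        except IndexError:
            break
        path.append(vertex)
    return path

# "a" -> "b" -> "c"; "c" is a sink, so the walk stops after two hops.
walk = random_walk({"a": ["b"], "b": ["c"], "c": []}, "a", length=10)
```

Catching `IndexError` from `random.choice` is the same design choice the
procedure uses to handle sink vertices instead of checking the edge list
explicitly.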

## Importing, querying and testing a module

Now, to import, query, and test the module, check out the [following
page](/mage/how-to-guides/run-a-query-module).

Feel free to create an issue or open a pull request on our [GitHub
repo](https://github.com/memgraph/mage) to speed up development.

Also, don't forget to throw us a star on GitHub. :star:

## Working with the mock API

The
[mock Python API](https://memgraph.com/docs/memgraph/reference-guide/query-modules/api/mock-python-api)
lets you develop and test query modules for Memgraph without having to run a
Memgraph instance. As it's compatible with the Python API, you can add modules
developed with it to Memgraph as-is, without having to refactor your code.

The documentation on importing the mock API and running query modules with it
is available
[here](https://memgraph.com/docs/memgraph/reference-guide/query-modules/api/mock-python-api#using-the-mock-api),
accompanied by examples.

## Managing Memgraph's Python environment

After some time, any production system requires updates for the packages it
uses, for example, when a new query module requires the latest `networkx`
version.

If Memgraph is already deployed somewhere with an installed `networkx`
package, you would probably use a package manager, e.g. pip or conda, to
delete the old `networkx` package and install a new one. You definitely
wouldn't want to redeploy the whole of Memgraph because of a single Python
package.

However, Python caches all modules, packages and the compiled bytecode, so
this procedure doesn't work out of the box. After installing the new package,
you also need to call the utility procedure `mg.load_all()`. 
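
The caching behavior that makes the reload necessary can be demonstrated in a
few lines of plain Python (the module name `demo_mod` is made up for this
illustration): re-importing a changed module returns the stale cached copy,
and only an explicit reload, which is what `mg.load_all()` triggers for query
modules, re-executes the source.

```python
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True  # keep the demo free of stale .pyc files

# Write a tiny module, import it, then change it on disk.
tmp = tempfile.mkdtemp()
(pathlib.Path(tmp) / "demo_mod.py").write_text("VERSION = 1\n")
sys.path.insert(0, tmp)

import demo_mod
first = demo_mod.VERSION

(pathlib.Path(tmp) / "demo_mod.py").write_text("VERSION = 2\n")
import demo_mod  # no-op: Python returns the module cached in sys.modules
cached = demo_mod.VERSION  # still the old value

importlib.reload(demo_mod)  # re-executes the source, like mg.load_all()
reloaded = demo_mod.VERSION
```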

So the whole process looks like this:

Uninstall the old package:

```shell
pip uninstall networkx
```

Install the new package:

```shell
pip install networkx==<version>
```

Reload all query modules:

```cypher
CALL mg.load_all();
```

diff --git a/docs2/custom-query-modules/python/understanding-music-with-modules.md b/docs2/custom-query-modules/python/understanding-music-with-modules.md
new file mode 100644
index 00000000000..85338cc5c41
--- /dev/null
+++ b/docs2/custom-query-modules/python/understanding-music-with-modules.md
@@ -0,0 +1,459 @@
---
id: understanding-music-with-modules
title: Exploring a music social network
sidebar_label: Exploring a music social network
---

This article is part of a series intended to show users how to use Memgraph on
real-world data and, by doing so, retrieve some interesting and useful
information.

We highly recommend checking out the other articles from this series, which
are listed in our [tutorial overview section](/tutorials/overview.md).

## Introduction

Getting useful information from your data is always challenging. With
Memgraph, you can manipulate and extract a huge amount of information by
writing queries. Memgraph even offers a set of built-in graph algorithms. You
can use those algorithms in your queries, extending your power over the data.
But what if you wanted to do more?

At its core, Memgraph "simply" builds a graph from your data. Graphs have
always been interesting to scientists and engineers because their properties
let you represent certain kinds of data in an intuitive way that makes
extracting useful information much easier. A whole field called graph theory
emerged in mathematics, producing a great number of algorithms, metrics, and
other functions that are applied to graphs.

Memgraph allows you to use the underlying graph in your own functions through
`Query modules`; those functions are called procedures. 
In this tutorial,
we'll see how easy it is to extend the capabilities of Memgraph using Python.
It will also show you how to use one of the most popular Python libraries for
graphs, called [NetworkX](https://networkx.github.io/), which contains a vast
number of functions and algorithms for graphs.

To get started, sign up to [Memgraph Cloud](https://memgraph.com/cloud),
create a new instance and connect to it using in-browser Memgraph Lab. If you
require help, check out the [documentation on Memgraph
Cloud](/memgraph-cloud).

You can also install Memgraph using the `memgraph-platform` image by following
the [installation instructions](/installation/overview.mdx) for your OS. Once
Memgraph is up and running, connect to it using **Memgraph Lab**, a visual
user interface that you can also use from your browser at
[`http://localhost:3000`](http://localhost:3000) or [download as an
application](https://memgraph.com/lab).

## Data model

Social graph is a relatively recent term. Simply said, it's a representation
of a social network. Social networks appear on various sites; for example,
YouTube, which is primarily a site for watching videos, can be classified as a
social network. For this tutorial, we'll use data consisting of users of the
music streaming platform [Deezer](https://www.deezer.com/).



The data consists of around 50k Deezer users from Croatia, but we prepared a
subset of the dataset composed of only 2k users. Each user is defined by an id
and a list of genres they love. The edges represent mutual friendships
between the users. You can find a more detailed explanation of the dataset on
[GitHub](https://github.com/benedekrozemberczki/datasets#deezer-social-networks),
alongside many more similar datasets kindly provided by the same authors.

## Importing the dataset

To import the dataset, navigate to the `Datasets` tab in the sidebar. 
From there,
choose the dataset `Music genres social network` and continue with the
tutorial.

## Example queries and procedures

Memgraph comes with several built-in algorithms. The list is expanded by the
MAGE library, but if the algorithm you require is something completely
different, you can add it yourself as a **query module**.

Let's create a custom query module!

Go to the **Query Modules** section in Memgraph Lab and click on the *+ New
Module* button. Give it a name, such as *deezer_example*, and *Create* it. A
new query module will be created with example procedures. Feel free to erase
them and copy the following code into it, defining a procedure called
`genre_count`:

```python
import mgp


@mgp.read_proc
def genre_count(context: mgp.ProcCtx,
                genre: str) -> mgp.Record(genre=str, count=int):
    count = len(
        [v for v in context.graph.vertices if genre in v.properties['genres']])
    return mgp.Record(genre=genre, count=count)
```

Click *Save* and you should be able to see the procedure and its signature
under *Detected procedures & transformations*.

We can notice multiple things:

- the import of the `mgp` module, which contains helper functions and types
  for defining custom procedures
- the `@mgp.read_proc` decorator, which marks the function as a procedure
- every argument is annotated with a type
- the result is defined as an object of `mgp.Record`, which also has annotated
  types for its members

This example is probably not that interesting to you because we can get the
same result using the following query:

```cypher
MATCH (n)
WITH n, "Pop" AS genre
WHERE genre IN n.genres
RETURN genre, count(n);
```

Let's try something more interesting. The genres are listed in the order the
users added them. If we assume that users picked the genres in order of
preference, let's write a function that tells us in what percentage each genre
appears in the top n places. 
Add the following code:

```python
from collections import defaultdict


@mgp.read_proc
def in_top_n_percentage(context: mgp.ProcCtx,
                        n: int) -> mgp.Record(genre=str,
                                              percentage=float,
                                              size=int):
    genre_count = defaultdict(lambda: {'total_count': 0, 'in_top_n_count': 0})

    for v in context.graph.vertices:
        for index, genre in enumerate(v.properties['genres']):
            genre_count[genre]['total_count'] += 1
            genre_count[genre]['in_top_n_count'] += index < n

    def get_record(genre, counts):
        return mgp.Record(
            genre=genre,
            percentage=counts['in_top_n_count'] / counts['total_count'],
            size=counts['total_count'])

    return [get_record(genre, counts)
            for genre, counts in genre_count.items()]
```

*Save and close* the window, then move to the *Query Execution* section to use
the procedure.

Let's see what we get:

```cypher
CALL deezer_example.in_top_n_percentage(3)
YIELD *
WITH genre, percentage, size
WHERE size > 10
RETURN genre, percentage, size
ORDER BY percentage DESC;
```

As said in the introduction, we want to use the power of graphs to extract
useful information from our data that would otherwise stay hidden. Most such
functions are complex, and writing them from scratch would be tedious, so,
like every modern programmer, we'll just use a package that has everything we
need and more. To be precise, we'll be using `NetworkX`, which has a huge
number of utility functions and graph algorithms implemented.

To use `NetworkX` algorithms, we need to transform our graph into a type
`NetworkX` recognizes. In our case, we need to use the undirected graph class
`networkx.Graph`. To make our lives easier, let's write a helper function that
transforms the Memgraph graph into a `networkx.Graph`. 

Go back to the *Query Modules* section, find the *deezer_example* query
module, click on the arrow on the right to see its details, then edit it by
adding the following code:

```python
import networkx as nx
import networkx.algorithms as nxa
import itertools


def _create_undirected_graph(context: mgp.ProcCtx) -> nx.Graph:
    g = nx.Graph()

    for v in context.graph.vertices:
        context.check_must_abort()
        g.add_node(v)

    for v in context.graph.vertices:
        context.check_must_abort()
        for e in v.out_edges:
            g.add_edge(e.from_vertex, e.to_vertex)

    return g
```

Now let's get some information about the graph. As our data represents a
social network, we would like to know if it has
[bridges](https://tinyurl.com/y3angsdb), and we would like to calculate the
[average clustering](https://en.wikipedia.org/wiki/Clustering_coefficient).

```python
@mgp.read_proc
def analyze_graph(
        context: mgp.ProcCtx) -> mgp.Record(
        average_clustering=float,
        has_bridges=bool):
    g = _create_undirected_graph(context)
    return mgp.Record(
        average_clustering=nxa.average_clustering(g),
        has_bridges=nxa.has_bridges(g))
```

*Save and close* the window, then move to the *Query Execution* section to use
the procedure:

```cypher
CALL deezer_example.analyze_graph()
YIELD *;
```

Another interesting property of a node in a graph is its
[centrality](https://en.wikipedia.org/wiki/Centrality). Centrality tells us
how important a node is for a graph. In our case, the higher the centrality,
the more popular the user. Let's find out which user is the most popular in
our network and take a peek at their music taste. We will use
[betweenness centrality](https://en.wikipedia.org/wiki/Betweenness_centrality). 
+Edit the query module by adding the following code:
+
+```python
+@mgp.read_proc
+def betweenness_centrality(
+        context: mgp.ProcCtx) -> mgp.Record(node=mgp.Vertex,
+                                            centrality=mgp.Number):
+    g = _create_undirected_graph(context)
+    return [mgp.Record(node=node, centrality=centrality)
+            for node, centrality
+            in nxa.centrality.betweenness_centrality(g).items()]
+```
+
+Now let's take a look at the results:
+
+```cypher
+CALL deezer_example.betweenness_centrality()
+YIELD *
+RETURN node.id, node.genres, centrality
+ORDER BY centrality DESC
+LIMIT 10;
+```
+
+:::info
+
+Calculating betweenness centrality for each node can take some time to finish.
+The issue of slow `NetworkX` implementations is something Memgraph tackled by
+implementing a custom betweenness centrality algorithm within the MAGE library.
+
+:::
+
+For our last trick, let's try to locate communities inside our network. A
+community is a set of densely connected nodes. The goal of community detection
+algorithms can be nicely described with the following visualization:
+![](../data/community_detection_visualization.png)
+
+As with centrality, there are multiple algorithms for finding communities in a
+graph. We will write a function that takes a method for calculating communities,
+uses it to find the communities, and, optionally, calculates some metrics
+specific to the graph partitioning so we can compare algorithms. To make things
+more interesting, let's find out which genre is the most popular in each
+community and return the percentage, which tells us how many of the users have
+that genre on their list. In the end, music is something that connects us!
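The same NetworkX machinery can be tried outside Memgraph first. Here is a small standalone sketch (not part of the query module, and the toy graph is purely illustrative) that runs community detection on two triangles joined by a single edge:

```python
# Standalone sketch (not part of the query module): community detection
# on a toy NetworkX graph made of two triangles joined by one edge.
import networkx as nx
from networkx.algorithms import community

g = nx.Graph()
g.add_edges_from([(0, 1), (1, 2), (0, 2),   # triangle A
                  (3, 4), (4, 5), (3, 5),   # triangle B
                  (2, 3)])                  # single edge joining them

# Greedy modularity maximization recovers the two triangles as communities.
communities = community.greedy_modularity_communities(g)
print([sorted(c) for c in communities])
```

Keeping the triangles separate yields a higher modularity than merging them across the bridge, which is why the greedy algorithm stops there.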
Edit the query module by adding the following code:
+
+```python
+def _get_communities(
+        context: mgp.ProcCtx,
+        community_function,
+        calculate_quality: bool):
+    g = _create_undirected_graph(context)
+
+    communities = list(community_function(g))
+
+    if calculate_quality:
+        quality = {
+            'coverage': nxa.community.quality.coverage(g, communities),
+            'modularity': nxa.community.quality.modularity(g, communities),
+            'performance': nxa.community.quality.performance(g, communities)
+        }
+    else:
+        quality = {}
+
+    communities = [list(community) for community in communities]
+
+    def get_community_info(community):
+        info = {
+            'size': len(community),
+        }
+
+        genre_count = defaultdict(lambda: 0)
+        for genre in itertools.chain(
+                *[user.properties["genres"] for user in community]):
+            genre_count[genre] += 1
+
+        if len(genre_count) != 0:
+            mpg = max(
+                genre_count.items(),
+                key=lambda item: item[1])
+
+            info['most_popular_genre'] = mpg[0]
+            info['most_popular_genre_percentage'] = mpg[1] / info['size']
+
+        return info
+
+    return mgp.Record(communities=[get_community_info(c)
+                                   for c in communities],
+                      quality=quality)
+```
+
+Now that we have our function in place, let's test some algorithms. We will be
+checking out community detection using [greedy modularity maximization by
+Clauset-Newman-Moore](https://networkx.github.io/documentation/latest/reference/algorithms/generated/networkx.algorithms.community.modularity_max.greedy_modularity_communities.html#networkx.algorithms.community.modularity_max.greedy_modularity_communities)
+and [label
+propagation](https://networkx.github.io/documentation/latest/reference/algorithms/generated/networkx.algorithms.community.label_propagation.label_propagation_communities.html#networkx.algorithms.community.label_propagation.label_propagation_communities).
Edit the query module by adding the following code:
+
+```python
+@mgp.read_proc
+def greedy_modularity_communities(
+        context: mgp.ProcCtx,
+        calculate_quality: bool = False) -> mgp.Record(
+        communities=list,
+        quality=mgp.Map):
+    return _get_communities(
+        context,
+        nxa.community.greedy_modularity_communities,
+        calculate_quality)
+
+
+@mgp.read_proc
+def label_propagation_communities(
+        context: mgp.ProcCtx,
+        calculate_quality: bool = False) -> mgp.Record(
+        communities=list,
+        quality=mgp.Map):
+    return _get_communities(
+        context,
+        nxa.community.label_propagation_communities,
+        calculate_quality)
+```
+
+In the above snippet, notice the optional argument `calculate_quality` and the
+use of the type `mgp.Map`, which is provided by Memgraph.
+
+Let's see the results with:
+
+```cypher
+CALL deezer_example.greedy_modularity_communities()
+YIELD communities
+UNWIND communities AS community
+WITH community
+WHERE community.size > 10
+RETURN community.most_popular_genre, community.most_popular_genre_percentage, community.size
+ORDER BY community.size DESC;
+```
+
+and
+
+```cypher
+CALL deezer_example.label_propagation_communities()
+YIELD communities
+UNWIND communities AS community
+WITH community
+WHERE community.size > 10
+RETURN community.most_popular_genre, community.most_popular_genre_percentage, community.size
+ORDER BY community.size DESC;
+```
+
+Your results should look something like this:
+![](../data/community_genre_statistics.png)
+
+Hmm, `Pop` sure is popular. Let's ignore that genre in the code:
+
+```python
+for genre in itertools.chain(
+        *[user.properties["genres"] for user in community]):
+    if genre != 'Pop':
+        genre_count[genre] += 1
+```
+
+and call our procedures again for each algorithm. Well, people always liked to
+dance!
+
+We saw the biggest communities in our network using two different methods.
It's
+hard to say which of the methods did better, so let's check a couple of metrics
+by calling the same procedure with `calculate_quality` set to true:
+
+```cypher
+CALL deezer_example.greedy_modularity_communities(true)
+YIELD communities, quality
+RETURN quality;
+```
+
+and
+
+```cypher
+CALL deezer_example.label_propagation_communities(true)
+YIELD communities, quality
+RETURN quality;
+```
+
+I think it should come as no surprise that an algorithm that maximizes
+modularity has higher modularity...
+
+## Optimized NetworkX integration
+
+As noted before, we at Memgraph are aware of NetworkX's potential, but the
+performance of some functions isn't that good. We decided to tackle this
+problem by writing a wrapper object for Memgraph's graph and with smarter usage
+of NetworkX algorithms. To make things even more convenient, we decided to
+implement procedures that call NetworkX functions with our graphs, so you have
+out-of-the-box access to the graph algorithms. NetworkX contains a huge number
+of functions, and writing procedures for all of them requires considerable
+effort, so don't blame us if some of the algorithms aren't available. We are
+always open to any feedback, so if you think that an implementation for an
+algorithm is needed, we will surely take that into account.
+
+To demonstrate the performance improvement, we will calculate the betweenness
+centrality again, this time by using Memgraph's procedure.
+
+To get access to the NetworkX procedures, start your Memgraph server without
+modifying the query modules directory path. That way, the path will be set to
+the default path, which contains the `nxalg` module.
+
+Now let's call the procedure:
+
+```cypher
+CALL nxalg.betweenness_centrality()
+YIELD *
+RETURN node.id, node.genres, betweenness
+ORDER BY betweenness DESC
+LIMIT 10;
+```
+
+You should get the same results as with our previous procedure for betweenness
+centrality, but in much less time.
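To build some intuition for what betweenness centrality measures, here is a tiny standalone NetworkX sketch, independent of Memgraph (the toy graph is illustrative): on a path graph, the middle node lies on the most shortest paths, so it gets the highest score.

```python
# Intuition check: in the path 0 - 1 - 2 - 3 - 4, node 2 lies on every
# shortest path between the two halves, so its betweenness is highest.
import networkx as nx

g = nx.path_graph(5)
bc = nx.betweenness_centrality(g)
print(max(bc, key=bc.get))  # prints 2, the middle node
```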
+
+## Further reading
+
+We encourage you to take a look at our [How to
+Implement Query
+Modules](/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example.md) how-to guide.
+
+This tutorial showed you how, with a little effort, you can extend your control
+over the data. Using packages like `NetworkX`, you get a huge number of already
+implemented graph algorithms, while Memgraph gives you complete access to its
+internal graph.
+
+If you want to learn more about how to use Memgraph with NetworkX, check out the [**Memgraph for NetworkX developers resources**](https://memgraph.com/memgraph-for-networkx?utm_source=networkx-guide&utm_medium=referral&utm_campaign=networkx_ppp&utm_term=docs%2Btutorialmusic&utm_content=resources).
diff --git a/docs2/data-migration/csv.md b/docs2/data-migration/csv.md
new file mode 100644
index 00000000000..5e4780a34eb
--- /dev/null
+++ b/docs2/data-migration/csv.md
@@ -0,0 +1,694 @@
+---
+id: csv
+title: Import data from CSV files
+sidebar_label: CSV
+---
+
+import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem";
+
+If your data is in CSV format, you can import it into a running Memgraph
+database from designated CSV files using the `LOAD CSV` Cypher clause. The
+clause reads row by row from a CSV file, binds the contents of the parsed row
+to the variable you specified, and populates the database if it is empty, or
+appends new data to an existing dataset. Memgraph supports the Excel CSV
+dialect, as it's the most common one.
+
+The `LOAD CSV` clause cannot be used with a Memgraph Cloud instance because at
+the moment it is impossible to make files accessible by Memgraph.
+
+:::tip
+
+If the data is importing slower than expected, you can [speed it
+up](#increase-import-speed) by creating indexes or switching the storage mode to
+analytical.
+
+If the import speed is still unsatisfactory, don't hesitate to contact us on
+[Discord](https://discord.com/invite/memgraph).
+ +::: + +## Clause syntax + +The syntax of the `LOAD CSV` clause is: + +```cypher +LOAD CSV FROM ( WITH | NO ) HEADER [IGNORE BAD] [DELIMITER ] [QUOTE ] [NULLIF ] AS +``` + +- `` is a string of the location to the CSV file. Without a URL + protocol it refers to a file path. There are no restrictions on where in your + filesystem the file can be located, as long as the path is valid (i.e., the + file exists). If you are using Docker to run Memgraph, you will need to + [copy the files from your local directory into the Docker](/how-to-guides/work-with-docker.md#how-to-copy-files-from-and-to-a-Docker-container) + container where Memgraph can access them. If using `http://`, `https://`, or + `ftp://` the CSV file will be fetched over the network. + +- `( WITH | NO ) HEADER` flag specifies whether the CSV file has a header, in + which case it will be parsed as a map, or it doesn't have a header, in which + case it will be parsed as a list. + + If the **`WITH HEADER`** option is set, the very first line in the file will be + parsed as the header, and any remaining rows will be parsed as regular rows. The + value bound to the row variable will be a map of the form: + + ```plaintext + { ( "header_field" : "row_value" )? ( , "header_field" : "row_value" )* } + ``` + + If the **`NO HEADER`** option is set, then each row is parsed as a list of values. + The contents of the row can be accessed using the list index syntax. Note that + in this mode, there are no restrictions on the number of values a row contains. + This isn't recommended, as the user must manually handle the varying number of + values in a row. + +* `IGNORE BAD` flag specifies whether rows containing errors should be ignored + or not. If it's set, the parser attempts to return the first valid row from + the CSV file. If it isn't set, an exception will be thrown on the first + invalid row encountered. + +* `DELIMITER ` option enables the user to specify the CSV + delimiter character. 
If it isn't set, the default delimiter character `,` is
+  assumed.
+
+* `QUOTE ` option enables the user to specify the CSV quote
+  character. If it isn't set, the default quote character `"` is assumed.
+
+* `NULLIF ` option enables you to specify a sequence of
+  characters that will be parsed as null. By default, all empty columns in
+  Memgraph are treated as empty strings, so if this option is not used, no
+  values will be treated as null.
+
+* `` is a symbolic name representing the variable to which the
+  contents of the parsed row will be bound, enabling access to the row
+  contents later in the query. The variable doesn't have to be used in any
+  subsequent clause.
+
+## Clause specificities
+
+When using the `LOAD CSV` clause, please keep in mind:
+
+- **The parser parses the values as strings**, so it's up to the user to convert
+  the parsed row values to the appropriate type. This can be done using the
+  built-in conversion functions such as `ToInteger`, `ToFloat`, `ToBoolean`, etc.
+  Consult the documentation on [the available conversion
+  functions](/cypher-manual/functions).
+
+  If all values are indeed strings and the file has a header, you can import
+  data using the following query:
+
+  ```cypher
+  LOAD CSV FROM "/people.csv" WITH HEADER AS row
+  CREATE (p:People) SET p += row;
+  ```
+
+- **The `LOAD CSV` clause is not a standalone clause**, which means that a valid query
+  must contain at least one more clause, for example:
+
+  ```cypher
+  LOAD CSV FROM "/people.csv" WITH HEADER AS row
+  CREATE (p:People) SET p += row;
+  ```
+
+  In this regard, the following query will throw an exception:
+
+  ```cypher
+  LOAD CSV FROM "/file.csv" WITH HEADER AS row;
+  ```
+
+- Adding a `MATCH` or `MERGE` clause before the LOAD CSV allows you to match
+  certain entities in the graph before running LOAD CSV, which is an optimization
+  as matched entities do not need to be searched for every row in the CSV file.
+
+  However, a `MATCH` or `MERGE` clause can be used prior to the `LOAD CSV`
+  clause only if it returns a single row. Returning multiple rows before calling
+  the `LOAD CSV` clause will cause a Memgraph runtime error.
+
+- **The `LOAD CSV` clause can be used at most once per query**, so queries like the one
+  below will throw an exception:
+
+  ```cypher
+  LOAD CSV FROM "/x.csv" WITH HEADER as x
+  LOAD CSV FROM "/y.csv" WITH HEADER as y
+  CREATE (n:A {p1 : x, p2 : y});
+  ```
+
+## Increase import speed
+
+The `LOAD CSV` clause will create relationships much faster, and consequently
+speed up data import, if you [create indexes](/how-to-guides/indexes.md) on
+nodes or node properties once you import them:
+
+```cypher
+CREATE INDEX ON :Node(id);
+```
+
+If the LOAD CSV clause is merging data instead of creating it, create indexes
+before running the LOAD CSV clause.
+
+You can also speed up import if you switch Memgraph to [**analytical storage
+mode**](/reference-guide/storage-modes.md). In the analytical mode, there are no
+ACID guarantees besides manually created snapshots, but it does **increase the
+import speed up to 6 times with 6 times less memory consumption**. After import,
+you can switch the storage mode back to transactional and enable ACID
+guarantees.
+
+You can switch between modes within the session using the following query:
+
+```cypher
+STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL};
+```
+
+When in the analytical storage mode, **don't** import data using multiple
+threads.
+
+The LOAD CSV clause can handle CSV files compressed with `gzip` or `bzip2`,
+which can speed up the time it takes to fetch and load the file.
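As a rough sketch of how such a compressed file can be produced (the file name and rows here are illustrative), Python's standard library is enough:

```python
# Sketch: write a gzip-compressed CSV that LOAD CSV can read directly.
# The file name and rows are illustrative.
import csv
import gzip

rows = [("100", "Daniel"), ("101", "Alex")]

with gzip.open("people.csv.gz", "wt", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name"])  # header row
    writer.writerows(rows)
```

Assuming the compressed file is made accessible to Memgraph like any other CSV, a query such as `LOAD CSV FROM "/people.csv.gz" WITH HEADER AS row ...` should then work the same as with the uncompressed file.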
+ +## Examples + +Below, you can find two examples of how to use the LOAD CSV clause depending on +the complexity of your data: + + - [One type of nodes and relationships](#one-type-of-nodes-and-relationships) + - [Multiple types of nodes and relationships](#multiple-types-of-nodes-and-relationships) + +### One type of nodes and relationships + +Let's import a simple dataset from the `people_nodes` and `people_relationships` CSV files. + + + + +1. Download the CSV files: + + - [`people_nodes.csv`](https://public-assets.memgraph.com/import-data/load-csv-cypher/one-type-nodes/with-header/people_nodes.csv) + file with the following content: + + ```plaintext + id,name + 100,Daniel + 101,Alex + 102,Sarah + 103,Mia + 104,Lucy + ``` + + - [`people_relationships.csv`](https://public-assets.memgraph.com/import-data/load-csv-cypher/one-type-nodes/with-header/people_relationships.csv) + file with the following content: + + ```plaintext + id_from,id_to + 100,101 + 100,102 + 100,103 + 101,103 + 102,104 + ``` + + These CSV files have a header, which means the `HEADER` option of the `LOAD CSV` + clause needs to be set to `WITH`. Each row will be parsed as a map, and the + fields can be accessed using the property lookup syntax (e.g. `id: row.id`). + +2. Check the location of the CSV file. If you are working with Docker, copy the + files from your local directory into the Docker container where Memgraph can + access them. + +
+ Transfer CSV files into a Docker container + + **1.** Start your Memgraph instance using Docker. + + **2.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker + container: + + ``` + docker ps + ``` + + **3.** Copy a file from your current directory to the container with the + command: + + ``` + docker cp ./file_to_copy.csv :/file_to_copy.csv + ``` + + The file is now inside your Docker container, and you can import it using the + `LOAD CSV` clause. +
+ +3. The following query will load row by row from the CSV file, and create a new + node for each row with properties based on the parsed row values: + + ```cypher + LOAD CSV FROM "/path-to/people_nodes.csv" WITH HEADER AS row + CREATE (p:Person {id: row.id, name: row.name}); + ``` + + If successful, you should receive an `Empty set (0.014 sec)` message. + + If you have a large dataset, it's beneficial to create indexes on a property + that will be used to connect nodes and relationships, in this case, the `id` + property. + + ```cypher + CREATE INDEX ON :Person(id); + ``` + +4. With the initial nodes in place, you can now create relationships between + them by importing the `people_relationships.csv` file: + + ```cypher + LOAD CSV FROM "/path-to/people_relationships.csv" WITH HEADER AS row + MATCH (p1:Person {id: row.id_from}), (p2:Person {id: row.id_to}) + CREATE (p1)-[:IS_FRIENDS_WITH]->(p2); + ``` + +
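As a side note, the `WITH HEADER` behavior is loosely analogous to Python's `csv.DictReader`, which also turns each row into a map keyed by the header fields. A standalone illustration (not something you need to run for the import; the data string is made up):

```python
# Illustration only: WITH HEADER binds each row as a map, much like
# csv.DictReader yields dicts keyed by the header fields.
import csv
import io

data = "id,name\n100,Daniel\n101,Alex\n"
for row in csv.DictReader(io.StringIO(data)):
    print(row["id"], row["name"])  # like row.id, row.name in Cypher
```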
+ + +1. Download the CSV files: + - [`people_nodes.csv`](https://public-assets.memgraph.com/import-data/load-csv-cypher/one-type-nodes/no-header/people_nodes.csv) + file with the following content: + + ```plaintext + 100,Daniel + 101,Alex + 102,Sarah + 103,Mia + 104,Lucy + ``` + + - [`people_relationships.csv`](https://public-assets.memgraph.com/import-data/load-csv-cypher/one-type-nodes/no-header/people_relationships.csv) + file with the following content: + + ```plaintext + 100,101 + 100,102 + 100,103 + 101,103 + 102,104 + ``` + + These CSV files don't have a header, so the `HEADER` option of the `LOAD CSV` + needs to be set to `NO`. Each row will be parsed as a list, and you can access + elements by defining the position of the element in the list. + +2. Check the location of the CSV file. If you are working with Docker, copy the + files from your local directory into the Docker container where Memgraph can + access them. + +
+ Transfer CSV files into a Docker container + + **1.** Start your Memgraph instance using Docker. + + **2.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker + container: + + ``` + docker ps + ``` + + **3.** Copy a file from your current directory to the container with the + command: + + ``` + docker cp ./file_to_copy.csv :/file_to_copy.csv + ``` + + The file is now inside your Docker container, and you can import it using the + `LOAD CSV` clause. +
+
+3. The following query will load row by row from the CSV file, and create a new
+   node for each row with properties based on the parsed row values:
+
+   ```cypher
+   LOAD CSV FROM "/path-to/people_nodes.csv" NO HEADER AS row
+   CREATE (p:Person {id: row[0], name: row[1]});
+   ```
+
+   If successful, you should receive an `Empty set (0.014 sec)` message.
+
+   If you have a large dataset, it's beneficial to create indexes on a property
+   that will be used to connect nodes and relationships, in this case, the `id`
+   property.
+
+   ```cypher
+   CREATE INDEX ON :Person(id);
+   ```
+
+4. With the initial nodes in place, you can now create relationships between
+   them by importing the `people_relationships.csv` file:
+
+   ```cypher
+   LOAD CSV FROM "/path-to/people_relationships.csv" NO HEADER AS row
+   MATCH (p1:Person {id: row[0]}), (p2:Person {id: row[1]})
+   CREATE (p1)-[:IS_FRIENDS_WITH]->(p2);
+   ```
+
+
+ +
+ This is how the graph should look like in Memgraph after the import + Run the following query:
+ + MATCH p=()-[]-() RETURN p; + +

+ +

+
+ +
+
+ +____ + +### Multiple types of nodes and relationships + +In the case of a more complex graph, we have to deal with multiple node and +relationship types. + +
+ Let's say we want to create a graph like this: +
+ +
+
+ +We will create that graph by using `LOAD CSV` clause to import four CSV files. + + + + +1. Download the + [`people_nodes.csv`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/people_nodes.csv) + file, content of which is: + ```csv + id,name,age,city + 100,Daniel,30,London + 101,Alex,15,Paris + 102,Sarah,17,London + 103,Mia,25,Zagreb + 104,Lucy,21,Paris + ``` + + These CSV files have a header, which means the `HEADER` option of the `LOAD CSV` + clause needs to be set to `WITH`. Each row will be parsed as a map, and the + fields can be accessed using the property lookup syntax (e.g. `id: row.id`). + +2. Check the location of the CSV file. If you are working with Docker, copy the + files from your local directory into the Docker container where Memgraph can + access them. + +
+ Transfer CSV files into a Docker container + + **1.** Start your Memgraph instance using Docker. + + **2.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker + container: + + ``` + docker ps + ``` + + **3.** Copy a file from your current directory to the container with the + command: + + ``` + docker cp ./file_to_copy.csv :/file_to_copy.csv + ``` + + The file is now inside your Docker container, and you can import it using the + `LOAD CSV` clause. +
+ +3. The following query will load row by row from the file, and create a new node + for each row with properties based on the parsed row values: + + ```cypher + LOAD CSV FROM "/path-to/people_nodes.csv" WITH HEADER AS row + CREATE (n:Person {id: row.id, name: row.name, age: ToInteger(row.age), city: row.city}); + ``` + +
+ This is how the graph should look like in Memgraph after the import: + Run the following query:
+ + MATCH (p) RETURN p; + +

+ +

+
+ +
+
+ + 4. If you have a large dataset, it's beneficial to create indexes on a property + that will be used to connect nodes and relationships, in this case, the `id` + property. + + ```cypher + CREATE INDEX ON :Person(id); + ``` + +Now move on to the `people_relationships.csv` file. + +
+ + +Each person from the `people_nodes.csv` file is connected to at least one other +person by being friends. + +1. Download the +[`people_relationships.csv`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/people_relationships.csv) +file, where each row represents one friendship and the year it started: + ```csv + first_person,second_person,met_in + 100,102,2014 + 103,101,2021 + 102,103,2005 + 101,104,2005 + 104,100,2018 + 101,102,2017 + 100,103,2001 + ``` + +2. Check the location of the CSV file. If you are working with Docker, copy the + files from your local directory into the Docker container where Memgraph can + access them. + +
+ Transfer CSV files into a Docker container + + **1.** Start your Memgraph instance using Docker. + + **2.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker + container: + + ``` + docker ps + ``` + + **3.** Copy a file from your current directory to the container with the + command: + + ``` + docker cp ./file_to_copy.csv :/file_to_copy.csv + ``` + + The file is now inside your Docker container, and you can import it using the + `LOAD CSV` clause. +
+ +3. The following query will create relationships between the people nodes: + + ```cypher + LOAD CSV FROM "/path-to/people_relationships.csv" WITH HEADER AS row + MATCH (p1:Person {id: row.first_person}) + MATCH (p2:Person {id: row.second_person}) + CREATE (p1)-[f:IS_FRIENDS_WITH]->(p2) + SET f.met_in = row.met_in; + ``` + +
+ This is how the graph should look like in Memgraph after the import: + Run the following query:
+ + MATCH p=()-[]-() RETURN p; + +

+ +

+
+ +
+
+ +Now move on to the `restaurants_nodes.csv` file. + +
+ + +1. Download the +[`restaurants_nodes.csv`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/restaurants_nodes.csv) +file that holds a list of restaurants people ate at: + + ```csv + id,name,menu + 200,Mc Donalds,Fries;BigMac;McChicken;Apple Pie + 201,KFC,Fried Chicken;Fries;Chicken Bucket + 202,Subway,Ham Sandwich;Turkey Sandwich;Foot-long + 203,Dominos,Pepperoni Pizza;Double Dish Pizza;Cheese filled Crust + ``` + +2. Check the location of the CSV file. If you are working with Docker, copy the + files from your local directory into the Docker container where Memgraph can + access them. + +
+ Transfer CSV files into a Docker container + + **1.** Start your Memgraph instance using Docker. + + **2.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker + container: + + ``` + docker ps + ``` + + **3.** Copy a file from your current directory to the container with the + command: + + ``` + docker cp ./file_to_copy.csv :/file_to_copy.csv + ``` + + The file is now inside your Docker container, and you can import it using the + `LOAD CSV` clause. +
+ +3. The following query will create new nodes for each restaurant: + + ```cypher + LOAD CSV FROM "/path-to/restaurants_nodes.csv" WITH HEADER AS row + CREATE (n:Restaurant {id: row.id, name: row.name, menu: row.menu}); + ``` + +
+ This is how the graph should look like in Memgraph after the import: + Run the following query:
+ + MATCH (p) RETURN p; + +

+ +

+
+ +
+
+ +4. If you have a large dataset, it's beneficial to create indexes on a property + that will be used to connect nodes and relationships, in this case, the `id` + property. + + ```cypher + CREATE INDEX ON :Restaurant(id); + ``` + +Now move on to the `restaurants_relationships.csv` file. + +
+ + +1. Download the +[`restaurants_relationships.csv`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/restaurants_relationships.csv) +file that contains a list of people and the restaurants they visited: + + ```csv + PERSON_ID,REST_ID,liked + 100,200,true + 103,201,false + 104,200,true + 101,202,false + 101,203,false + 101,200,true + 102,201,true + ``` + +2. Check the location of the CSV file. If you are working with Docker, copy the + files from your local directory into the Docker container where Memgraph can + access them. + +
+ Transfer CSV files into a Docker container + + **1.** Start your Memgraph instance using Docker. + + **2.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker + container: + + ``` + docker ps + ``` + + **3.** Copy a file from your current directory to the container with the + command: + + ``` + docker cp ./file_to_copy.csv :/file_to_copy.csv + ``` + + The file is now inside your Docker container, and you can import it using the + `LOAD CSV` clause. +
+ +3. The following query will create relationships between people and restaurants +where they ate: + + ```cypher + LOAD CSV FROM "/path-to/restaurants_relationships.csv" WITH HEADER AS row + MATCH (p1:Person {id: row.PERSON_ID}) + MATCH (re:Restaurant {id: row.REST_ID}) + CREATE (p1)-[ate:ATE_AT]->(re) + SET ate.liked = ToBoolean(row.liked); + ``` + +
+ This is how the graph should look like in Memgraph after the import: + Run the following query:
+ + MATCH p=()-[]-() RETURN p; + +

+ +

+
+ +
+
+ +Congratulations! You've imported all the CSV files! + +
+
\ No newline at end of file diff --git a/docs2/data-migration/cypherl.md b/docs2/data-migration/cypherl.md new file mode 100644 index 00000000000..f6c5694d1cc --- /dev/null +++ b/docs2/data-migration/cypherl.md @@ -0,0 +1,226 @@ +--- +id: cypherl +title: Importing Cypher queries (CYPHERL format) +sidebar_label: CYPHERL +pagination_prev: import-data/overview +--- +import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; + +If your data is in the form of Cypher queries (for example, `CREATE` and `MERGE` +clauses) within a **CYPHERL** file it can be imported via Memgraph Lab or +mgconsole. + +The benefit of importing data using the CYPHERL file is that you need only +one file to import both nodes and relationships. But it can be tricky to +actually write the queries for creating nodes and relationships yourself. If you +haven't written any queries yet, check our [Cypher manual](/cypher-manual). + +:::tip + +To speed up import time consider [creating indexes](/how-to-guides/indexes.md) +on appropriate nodes or node properties. + +::: + +## Importing via Memgraph Lab + +Once you Memgraph instance in running and you've connected to it via Memgraph +Lab, go to the **Import & Export** section. To **Import Data** select the +CYPHERL file or drag and drop it into Memgraph Lab. + +You can import up to 1 million nodes and relationships via Memgraph Lab using +the CYPHERL file. + + + +## Importing via mgconsole + + + + +If you installed and started Memgraph using **Docker**, follow these steps: + +1. Open a new terminal and check the Docker container ID by running `docker ps` +2. 
Then run the following command + + ``` + docker exec -i CONTAINER_ID mgconsole < queries.cypherl + ``` + +For more information about `mgconsole` options run: + +```console +docker exec -i CONTAINER_ID mgconsole --help +``` + + + + +Once Memgraph is running in **Linux**, Cypher queries are imported by running +[mgconsole](/connect-to-memgraph/mgconsole.md) in a non-interactive mode and +importing data saved in a CYPHERL file. + +You can import queries saved in e.g. `queries.cypherl` by issuing the following +shell command: + +```plaintext +mgconsole < queries.cypherl +``` + +For more information about `mgconsole` options run: + +```console +mgconsole --help +``` + + + +## Examples + +Below, you can find two examples of how to import data within the `.cypherl` file +based on the complexity of your data: + + - [One type of nodes and relationships](#one-type-of-nodes-and-relationships) + - [Multiple types of nodes and relationships](#multiple-types-of-nodes-and-relationships) + +### One type of nodes and relationships + +Let's import data from `queries.cypherl` file with the following content: + +```plaintext +CREATE (:Person {id: "100", name: "Daniel", age: 30, city: "London"}); +CREATE (:Person {id: "101", name: "Alex", age: 15, city: "Paris"}); +CREATE (:Person {id: "102", name: "Sarah", age: 101, city: "London"}); +CREATE (:Person {id: "103", name: "Mia", age: 25, city: "Zagreb"}); +CREATE (:Person {id: "104", name: "Lucy", age: 21, city: "Paris"}); +MATCH (u:Person), (v:Person) WHERE u.id = "100" AND v.id = "102" CREATE (u)-[:IS_FRIENDS_WITH]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "100" AND v.id = "103" CREATE (u)-[:IS_FRIENDS_WITH]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "101" AND v.id = "104" CREATE (u)-[:IS_FRIENDS_WITH]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "101" AND v.id = "102" CREATE (u)-[:IS_FRIENDS_WITH]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "102" AND v.id = "103" CREATE (u)-[:IS_FRIENDS_WITH]->(v); +MATCH 
(u:Person), (v:Person) WHERE u.id = "103" AND v.id = "101" CREATE (u)-[:IS_FRIENDS_WITH]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "104" AND v.id = "100" CREATE (u)-[:IS_FRIENDS_WITH]->(v); +``` + +The first five queries create nodes for people, and the rest of the queries create +relationships between these nodes. + + + + +If you installed Memgraph using Docker, open a new terminal, position yourself +in the directory where the CYPHERL file is located and run the following +commands: + +1. Check the Docker container ID by running `docker ps` +2. Run the following command + + ``` + docker exec -i CONTAINER_ID mgconsole < queries.cypherl + ``` + + + + +Running mgconsole in a non-interactive mode and importing data saved in a +CYPHERL file: + +```console +mgconsole < queries.cypherl +``` + + + + +
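If your data doesn't start out as Cypher, a CYPHERL file like the one above can also be generated with a short script. A sketch (the records and the output file name are illustrative):

```python
# Sketch: generate a CYPHERL file from plain Python records.
# The records and the output file name are illustrative.
people = [("100", "Daniel", 30, "London"), ("101", "Alex", 15, "Paris")]
friendships = [("100", "101")]

with open("queries.cypherl", "w") as f:
    for pid, name, age, city in people:
        f.write(f'CREATE (:Person {{id: "{pid}", name: "{name}", '
                f'age: {age}, city: "{city}"}});\n')
    for a, b in friendships:
        f.write(f'MATCH (u:Person), (v:Person) WHERE u.id = "{a}" '
                f'AND v.id = "{b}" CREATE (u)-[:IS_FRIENDS_WITH]->(v);\n')
```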
+ This is how the graph should look like in Memgraph after the import: +
+ +
+
+ +### Multiple types of nodes and relationships + +Let's import data from `queries.cypherl` file with the following content: + +```plaintext +CREATE (p:Person {id: "100", name: "Daniel", age: 30, city: "London"}); +CREATE (p:Person {id: "101", name: "Alex", age: 15, city: "Paris"}); +CREATE (p:Person {id: "102", name: "Sarah", age: 17, city: "London"}); +CREATE (p:Person {id: "103", name: "Mia", age: 25, city: "Zagreb"}); +CREATE (p:Person {id: "104", name: "Lucy", age: 21, city: "Paris"}); +CREATE (r:Restaurant {id: "200", name: "McDonalds", menu: "Fries BigMac McChicken Apple Pie"}); +CREATE (r:Restaurant {id: "201", name: "KFC", menu: "Fried Chicken Fries Chicken Bucket"}); +CREATE (r:Restaurant {id: "202", name: "Subway", menu: "Ham Sandwich Turkey Sandwich Foot-long"}); +CREATE (r:Restaurant {id: "203", name: "Dominos", menu: "Pepperoni Pizza Double Dish Pizza Cheese filled Crust"}); +MATCH (u:Person), (v:Person) WHERE u.id = "100" AND v.id = "103" CREATE (u)-[:IS_FRIENDS_WITH {met_in: "2014"}]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "101" AND v.id = "104" CREATE (u)-[:IS_FRIENDS_WITH {met_in: "2001"}]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "102" AND v.id = "100" CREATE (u)-[:IS_FRIENDS_WITH {met_in: "2005"}]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "102" AND v.id = "103" CREATE (u)-[:IS_FRIENDS_WITH {met_in: "2017"}]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "103" AND v.id = "104" CREATE (u)-[:IS_FRIENDS_WITH {met_in: "2005"}]->(v); +MATCH (u:Person), (v:Person) WHERE u.id = "104" AND v.id = "102" CREATE (u)-[:IS_FRIENDS_WITH {met_in: "2021"}]->(v); +MATCH (u:Person), (v:Restaurant) WHERE u.id = "100" AND v.id = "200" CREATE (u)-[:ATE_AT {liked: true}]->(v); +MATCH (u:Person), (v:Restaurant) WHERE u.id = "102" AND v.id = "202" CREATE (u)-[:ATE_AT {liked: false}]->(v); +MATCH (u:Person), (v:Restaurant) WHERE u.id = "102" AND v.id = "203" CREATE (u)-[:ATE_AT {liked: false}]->(v); +MATCH (u:Person), (v:Restaurant) WHERE u.id = 
"102" AND v.id = "200" CREATE (u)-[:ATE_AT {liked: true}]->(v); +MATCH (u:Person), (v:Restaurant) WHERE u.id = "103" AND v.id = "201" CREATE (u)-[:ATE_AT {liked: true}]->(v); +MATCH (u:Person), (v:Restaurant) WHERE u.id = "104" AND v.id = "201" CREATE (u)-[:ATE_AT {liked: false}]->(v); +MATCH (u:Person), (v:Restaurant) WHERE u.id = "101" AND v.id = "200" CREATE (u)-[:ATE_AT {liked: true}]->(v); +``` + +The first five queries create nodes for people, and the following four queries +create nodes for restaurants. The rest of the queries are used to define +relationships between nodes. As said before, you can define different types of +nodes and relationships in one file. + + + + +If you installed Memgraph using Docker, open a new terminal, position yourself +in the directory where the CYPHERL file is located and run the following +commands: + +1. Check the Docker container ID by running `docker ps` +2. Run the following command + + ``` + docker exec -i CONTAINER_ID mgconsole < queries.cypherl + ``` + + + + +Running mgconsole in a non-interactive mode and importing data saved in a +CYPHERL file: + +```console +mgconsole < queries.cypherl +``` + + + + +
This is what the graph should look like in Memgraph after the import:
+
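CYPHERL files like the one above are usually generated by a script rather than written by hand. Below is a minimal Python sketch of that idea; the `person_to_cypher` helper and its naive quoting are illustrative only (a real pipeline should escape string values properly):

```python
def person_to_cypher(person):
    # Build one CREATE statement per node; strings are quoted,
    # numbers are emitted as-is. Quoting here is naive -- real
    # data containing quotes or backslashes needs proper escaping.
    props = ", ".join(
        f'{key}: "{value}"' if isinstance(value, str) else f"{key}: {value}"
        for key, value in person.items()
    )
    return f"CREATE (p:Person {{{props}}});"

people = [
    {"id": "100", "name": "Daniel", "age": 30, "city": "London"},
    {"id": "101", "name": "Alex", "age": 15, "city": "Paris"},
]

# One statement per line is exactly the CYPHERL layout.
cypherl = "\n".join(person_to_cypher(p) for p in people)
print(cypherl)
```

Saving the output to a `.cypherl` file makes it importable with `mgconsole` as shown above.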
diff --git a/docs2/data-migration/data-migration.md b/docs2/data-migration/data-migration.md new file mode 100644 index 00000000000..99b4de875fe --- /dev/null +++ b/docs2/data-migration/data-migration.md @@ -0,0 +1,95 @@
---
id: data-migration
title: Data migration
sidebar_label: Data migration
---

What data do you want to import?

- [CSV files](#csv-files)
- [JSON files](#json-files)
- [CYPHERL files](#cypherl-files)
- [Data from a stream](#data-from-a-stream)
- [MySQL, PostgreSQL or Oracle table data](#mysql-postgresql-or-oracle-table-data)
- [Data from an application or a program](#data-from-an-application-or-a-program)
- [Parquet, ORC or IPC/Feather/Arrow file](#parquet-orc-or-ipcfeatherarrow-file)
- [NetworkX, PyG or DGL graph](#networkx-pyg-or-dgl-graph)


:::tip

If you can choose the format of the data you want to import, the fastest way to
import data into Memgraph is from a CSV file using the [LOAD CSV
clause](/import-data/files/load-csv-clause.md).

:::

## CSV files

To import data from CSV files into Memgraph, use the [**LOAD CSV
clause**](/import-data/files/load-csv-clause.md), which is used as a standard
Cypher clause and can be invoked straight from a running Memgraph instance.

## JSON files

You can [import a **JSON** file into Memgraph](/import-data/files/load-json.md)
by using the **`json_util` query module**, which has procedures for loading a
JSON file from a local path or a remote address.

You can also use the **`import_util.json` procedure** to import data from a
local JSON file, but the file needs to be in a specific format defined by the
procedure.

## CYPHERL files

If your data is in the form of Cypher queries (`CREATE` and `MERGE` clauses)
within a **CYPHERL** file, it can be [imported via Memgraph
Lab or mgconsole](/import-data/files/cypherl.md).
+ +## Data from a stream + +Memgraph comes with full streaming support, and you can connect directly to a +**Kafka**, **Redpanda** or **Pulsar** stream using [Cypher +queries](/import-data/data-streams/manage-streams.md) or [Memgraph +Lab](/import-data/data-streams/manage-streams-lab.md). + +## MySQL, PostgreSQL or Oracle table data + +You can migrate data from a [**MySQL**](/import-data/migrate/mysql.md) or +[**PostgreSQL**](/import-data/migrate/postgresql.md) database using the +[**`mgmigrate`** tool](https://github.com/memgraph/mgmigrate). + +Alternatively, you can use the [`migration` +module](/mage/query-modules/python/migrate) from the [MAGE graph +library](/mage) which allows you to access data from a MySQL database, an SQL +server or an Oracle database. + +## Data from an application or a program + +Memgraph offers a [**wide range of clients**](/connect-to-memgraph/drivers/overview.md) that can be used to connect directly to the platform and import data. + +## Parquet, ORC or IPC/Feather/Arrow file + +If you are a Python user you can import **Parquet**, **ORC** or **IPC/Feather/Arrow** file +into Memgraph [using **GQLAlchemy**](/gqlalchemy/how-to-guides/table-to-graph-importer). + +## NetworkX, PyG or DGL graph + +If you are a Python user you can import **NetworkX**, **PyG** or **DGL graph** into Memgraph +[using **GQLAlchemy**](/gqlalchemy/how-to-guides/import-python-graphs). + +## Where to next? + +You can also connect to streams and import data from CYPHERL files to an +instance running in [Memgraph Cloud](/memgraph-cloud). + +Memgraph uses two mechanisms to [ensure the durability of stored +data](/reference-guide/backup.md) and make disaster recovery possible: +write-ahead logging (WAL) and periodic snapshot creation. + +To learn more about the Cypher language, check out our [Cypher +manual](/cypher-manual) or [Memgraph +Playground](https://playground.memgraph.com/) for interactive guides. 
+ +For real-world examples of how to use Memgraph, we strongly suggest going +through one of the available [tutorials](/tutorials/overview.md). diff --git a/docs2/data-migration/export-data.md b/docs2/data-migration/export-data.md new file mode 100644 index 00000000000..b3f800e5b42 --- /dev/null +++ b/docs2/data-migration/export-data.md @@ -0,0 +1,37 @@ +--- +id: overview +title: Export data +sidebar_label: Export data +slug: /export-data +--- + +Memgraph allows you to export all the data from the database, or results from an executed query. + +## Export database + +Export database to the following file formats: +- [CYPHERL using Memgraph Lab](/memgraph-lab/user-manual#import--export) +- [JSON using the `export_util.json` procedure](/mage/query-modules/python/export-util) from MAGE - graph algorithms and modules library. + +You can also export data to Elasticsearch and enable continuous data +synchronization using the [`elasticsearch_synchronization` query +module](/mage/query-modules/python/elasticsearch-synchronization) available in +MAGE - graph algorithms and modules library. + +## Export query results + +Query results can be exported to a CSV, TSV and JSON file [using Memgraph Lab](/memgraph-lab/user-manual#data-results). + +To export query results from Memgraph Lab: +1. Run a query or select results you want to export. +2. Click Export results and choose CSV. +3. Save the file locally. + +Results can also be exported to a CSV file using the [`export_util.csv_query()` +procedure](/mage/query-modules/python/export-util#csv_queryquery-file_path-stream) +from MAGE - graph algorithms and modules library. + +## Where to next? + +Now that you exported data, [import](/import-data/overview.md) it back into a +new Memgraph instance. 
\ No newline at end of file diff --git a/docs2/data-migration/json.md b/docs2/data-migration/json.md new file mode 100644 index 00000000000..51b290a5127 --- /dev/null +++ b/docs2/data-migration/json.md @@ -0,0 +1,199 @@
---
id: json
title: Import data from JSON files
sidebar_label: JSON
---

A JSON file stores simple data structures and objects in JavaScript Object
Notation (JSON), a standard data interchange format. The data you want to
import to the database is often saved in JSON format, and you might want to
import parts of that data as graph objects - nodes or relationships.

Data can be imported using query modules implemented in [the MAGE library](/mage):
- [`json_util`](/mage/query-modules/python/json-util) query module
- [`import_util`](/mage/query-modules/python/import-util) query module.

The difference is that `json_util.load_from_path()` has no requirements about
the formatting of data inside the JSON file, while the `import_util.json()`
procedure requires data to be formatted in a specific way. It is the same
formatting the `export_util.json()` procedure generates when it's used to export
data from Memgraph into a JSON file.
+ JSON file data format required by the import_util.json() procedure + + ```json +[ + { + "id": 6114, + "labels": [ + "Person" + ], + "properties": { + "name": "Anna" + }, + "type": "node" + }, + { + "id": 6115, + "labels": [ + "Person" + ], + "properties": { + "name": "John" + }, + "type": "node" + }, + { + "id": 6116, + "labels": [ + "Person" + ], + "properties": { + "name": "Kim" + }, + "type": "node" + }, + { + "end": 6115, + "id": 21120, + "label": "IS_FRIENDS_WITH", + "properties": {}, + "start": 6114, + "type": "relationship" + }, + { + "end": 6116, + "id": 21121, + "label": "IS_FRIENDS_WITH", + "properties": {}, + "start": 6114, + "type": "relationship" + }, + { + "end": 6116, + "id": 21122, + "label": "IS_MARRIED_TO", + "properties": {}, + "start": 6115, + "type": "relationship" + } +] + ``` +
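If you are producing such a file from your own data, the entries can be assembled with a short script. A minimal Python sketch (the `node` and `relationship` helpers and the IDs are illustrative, not part of MAGE):

```python
import json

def node(node_id, labels, properties):
    # One list entry per node, in the shape shown above.
    return {"id": node_id, "labels": labels,
            "properties": properties, "type": "node"}

def relationship(rel_id, start_id, end_id, label, properties=None):
    # "start" and "end" reference the node ids defined above.
    return {"end": end_id, "id": rel_id, "label": label,
            "properties": properties or {}, "start": start_id,
            "type": "relationship"}

entries = [
    node(6114, ["Person"], {"name": "Anna"}),
    node(6115, ["Person"], {"name": "John"}),
    relationship(21120, 6114, 6115, "IS_FRIENDS_WITH"),
]

# Written to disk, this list is ready for import_util.json().
print(json.dumps(entries, indent=2))
```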
+ +To be able to call the procedures, you need to [install MAGE and load query +modules](/mage/how-to-guides/run-a-query-module). + +:::tip + +If you can choose the format of the data you want to import, the fastest way to +import data into Memgraph is from a CSV file using the [LOAD CSV +clause](/import-data/files/load-csv-clause.md). + +::: + +## Examples + +Below, you can find two examples of how to load data from a JSON file depending +on the file location: + +- [Load JSON from a local file](#load-json-from-a-local-file) +- [Load JSON from a remote address](#load-json-from-a-remote-address) + +### Load JSON from a local file + +To import data from a local JSON file, you can use the +[`json_util.load_from_path()`](/mage/query-modules/python/json-util) procedure +or [`import_util.json()`](/mage/query-modules/python/import-util) procedure. + +The difference is that `json_util.load_from_path()` has no requirements about +the formatting of data inside the JSON file while the `import_util.json()` +procedure does. It is the same formatting the `export_util.json()` procedure +generates when it's used to export data from Memgraph into a JSON file. + +#### `json_util.load_from_path()` procedure + +The `json_util.load_from_path()` procedure takes one string argument (`path`) +and returns a list of JSON objects from the file located at the provided path. + +Let's import data from a file `data.json` with the following content: + +```json +{ + "first_name": "Jessica", + "last_name": "Rabbit", + "pets": ["dog", "cat", "bird"] +} +``` + +If you are using Docker to run Memgraph, you will need to [copy the files from +your local directory into the +Docker](/how-to-guides/work-with-docker.md#how-to-copy-files-from-and-to-a-Docker-container) +container where Memgraph can access them. 
+ +To create a node with the label `Person` and `first_name`, `last_name` and `pets` +as properties, run the following query: + +```cypher +CALL json_util.load_from_path("path/to/data.json") +YIELD objects +UNWIND objects AS o +CREATE (:Person {first_name: o.first_name, last_name: o.last_name, pets: o.pets}); +``` + +
+ After you execute the above query, the graph in Memgraph should look like this: +
+ +
+
#### `import_util.json()` procedure

To find out how to import data with the `import_util.json()` procedure, [check
out the MAGE documentation](/mage/query-modules/python/import-util).

### Load JSON from a remote address

To import data from a remote JSON file, use the `json_util.load_from_url()`
procedure, which takes one string argument (`url`) and returns a list of JSON
objects from the file located at the provided URL.

For example, at `"https://download.memgraph.com/asset/mage/data.json"`, you can
find the following `data.json` file:

```json
{
  "first_name": "James",
  "last_name": "Bond",
  "pets": ["dog", "cat", "fish"]
}
```

To create a node with the label `Person` and `first_name`, `last_name` and
`pets` as properties from the `data.json` file, run the following query:

```cypher
CALL json_util.load_from_url("https://download.memgraph.com/asset/mage/data.json")
YIELD objects
UNWIND objects AS o
CREATE (:Person {first_name: o.first_name, last_name: o.last_name, pets: o.pets});
```
+ After you run the above query, the graph in Memgraph should look like this: +
+ +
+
:::note

To load JSON files from another local or remote location, just replace the
argument of the procedure with the appropriate path or URL. If you want to
create a different kind of graph, you need to change the query accordingly. To
learn more about querying, check out the [Cypher manual](/cypher-manual).

:::

diff --git a/docs2/data-migration/migrate-from-neo4j.md b/docs2/data-migration/migrate-from-neo4j.md new file mode 100644 index 00000000000..ead519b0169 --- /dev/null +++ b/docs2/data-migration/migrate-from-neo4j.md @@ -0,0 +1,508 @@
---
id: migrate-from-neo4j
title: Migrate from Neo4j to Memgraph
sidebar_label: Migrate from Neo4j
---

import EmbedYTVideo from '@site/src/components/EmbedYTVideo';

Memgraph is a native in-memory graph database specialized for real-time use
cases such as streaming and analytical processing. It uses the Cypher query
language and the Bolt protocol, which means you can use the same tools and
drivers you are already using with Neo4j. Thanks to ACID compliance, data
persistence and replication support in the community version, Memgraph can be
used as the main database for your applications instead of Neo4j.

This tutorial is also available as a video:
## Prerequisites

To follow this tutorial, you will need:

- A running Neo4j instance (with your data, or use the sample data provided)
- [Latest `memgraph/memgraph-platform` Docker image](https://memgraph.com/download)

## Data schema

One of the first steps to consider is how to migrate your data. If you have your
data in the form of [Cypher queries](/import-data/files/cypherl.md), or in
[CSV](/import-data/files/load-csv-clause.md) or
[JSON](/import-data/files/load-json.md) format, you can import it into Memgraph.
Keep in mind that for larger datasets it is recommended to use the CSV format or
pure Cypher queries (Memgraph's CYPHERL format), since they can be imported into
Memgraph natively, faster than the JSON format.

This tutorial will go through exporting data from Neo4j into CSV files and
importing it into Memgraph using the [LOAD CSV](/import-data/files/load-csv-clause.md)
clause and Memgraph's visual user interface, [Memgraph Lab](/memgraph-lab).
+ +The sample dataset consists of 3 different kinds of nodes (Employee, Order and +Product) connected with 3 types of relationships as described by the graph +schema below: + + + +To create this graph in your Neo4j instance run the following queries: + +```cypher +LOAD CSV WITH HEADERS FROM 'https://gist.githubusercontent.com/jexp/054bc6baf36604061bf407aa8cd08608/raw/8bdd36dfc88381995e6823ff3f419b5a0cb8ac4f/orders.csv' AS column +MERGE (order:Order {orderID: column.OrderID}) + ON CREATE SET order.shipName = column.ShipName; + +LOAD CSV WITH HEADERS FROM 'https://gist.githubusercontent.com/jexp/054bc6baf36604061bf407aa8cd08608/raw/8bdd36dfc88381995e6823ff3f419b5a0cb8ac4f/products.csv' AS column +MERGE (product:Product {productID: column.ProductID}) + ON CREATE SET product.productName = column.ProductName, product.unitPrice = toFloat(column.UnitPrice); + +LOAD CSV WITH HEADERS FROM 'https://gist.githubusercontent.com/jexp/054bc6baf36604061bf407aa8cd08608/raw/8bdd36dfc88381995e6823ff3f419b5a0cb8ac4f/employees.csv' AS column +MERGE (e:Employee {employeeID:column.EmployeeID}) + ON CREATE SET e.firstName = column.FirstName, e.lastName = column.LastName, e.title = column.Title; + +CREATE INDEX product_id FOR (p:Product) ON (p.productID); +CREATE INDEX product_name FOR (p:Product) ON (p.productName); +CREATE INDEX employee_id FOR (e:Employee) ON (e.employeeID); +CALL db.awaitIndexes(); + +LOAD CSV WITH HEADERS FROM 'https://gist.githubusercontent.com/jexp/054bc6baf36604061bf407aa8cd08608/raw/8bdd36dfc88381995e6823ff3f419b5a0cb8ac4f/orders.csv' AS column +MATCH (order:Order {orderID: column.OrderID}) +MATCH (product:Product {productID: column.ProductID}) +MERGE (order)-[op:CONTAINS]->(product) + ON CREATE SET op.unitPrice = toFloat(column.UnitPrice), op.quantity = toFloat(column.Quantity); + +LOAD CSV WITH HEADERS FROM 'https://gist.githubusercontent.com/jexp/054bc6baf36604061bf407aa8cd08608/raw/8bdd36dfc88381995e6823ff3f419b5a0cb8ac4f/orders.csv' AS column +MATCH 
(order:Order {orderID: column.OrderID})
MATCH (employee:Employee {employeeID: column.EmployeeID})
MERGE (employee)-[:SOLD]->(order);

LOAD CSV WITH HEADERS FROM 'https://gist.githubusercontent.com/jexp/054bc6baf36604061bf407aa8cd08608/raw/8bdd36dfc88381995e6823ff3f419b5a0cb8ac4f/employees.csv' AS column
MATCH (employee:Employee {employeeID: column.EmployeeID})
MATCH (manager:Employee {employeeID: column.ReportsTo})
MERGE (employee)-[:REPORTS_TO]->(manager);
```

If you are going to use a different dataset for the migration, be aware of the
differences between Neo4j and [Memgraph data
types](/reference-guide/data-types.md) (for example, Memgraph doesn't support
`DateTime()` as there is no temporal type in Memgraph that supports timezones yet,
but you can modify the data to use `localDateTime()`).

## Exporting data from Neo4j

Download the CSV file
[shipping.csv](https://public-assets.memgraph.com/import-data/load-csv-cypher/shipping.csv)
containing the data above if you don't want to go through the exporting process.

To get your data out of the Neo4j instance, use the Neo4j APOC export
functionality. To install APOC, select the project, then in the right-side menu
select *Plugins -> APOC* and press *Install*.

Then enable export by setting the configuration flag `apoc.export.file.enabled`
to `true` in the `apoc.config` file located in the `config` directory. To open
the directory, select the active project, click on *...* -> *Open folder* ->
*Configuration*.

Export the data into a CSV file using:

```cypher
CALL apoc.export.csv.all("shipping.csv", {})
```

Once exported, the file is located in Neo4j's *Import* folder. To open it,
select the active project, click on *...* -> *Open folder* -> *Import*.

## Importing data into Memgraph

Now that the CSV file containing the needed data has been generated, let's
import it into Memgraph.

As the original location of the file is quite cumbersome, relocate it somewhere
more accessible.
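Before copying the file around, it can also help to sanity-check that the export contains the columns the import queries below rely on. A small Python sketch (the inline sample stands in for the real `shipping.csv` and shows only a few of its columns):

```python
import csv
import io

# A tiny stand-in for the APOC export; the real file has more columns.
sample = io.StringIO(
    '"_id","_labels","_start","_end","_type","employeeID"\n'
    '"188",":Employee","","","","1"\n'
    '"","","188","189","REPORTS_TO",""\n'
)

reader = csv.DictReader(sample)
rows = list(reader)

# Columns the LOAD CSV queries below depend on.
required = {"_id", "_labels", "_start", "_end", "_type"}
missing = required - set(reader.fieldnames)
assert not missing, f"export is missing columns: {missing}"

# Node rows carry a label; relationship rows carry a type.
node_rows = [r for r in rows if r["_labels"]]
rel_rows = [r for r in rows if r["_type"]]
print(len(node_rows), "node row(s),", len(rel_rows), "relationship row(s)")
```

Pointing `csv.DictReader` at the real file instead of the inline sample performs the same check on the full export.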
### 1. Starting Memgraph with Docker

When working with Docker, the file needs to be transferred from your local
directory into the Docker container where Memgraph can access it.

This can be done by copying the file into your running instance.

1. Run Memgraph with

   ```
   docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 memgraph/memgraph-platform
   ```

2. To copy the file inside the container, open a new terminal to find out the
   `CONTAINER ID` with `docker ps`, then run:

   ```
   docker cp /path_to_local_folder/shipping.csv CONTAINER_ID:/usr/lib/memgraph/shipping.csv
   ```

   If the container ID is `bed1e5c9192d` and the file is locally located at
   `C:/Data`, the command would look like this:

   ```
   docker cp C:/Data/shipping.csv bed1:/usr/lib/memgraph/shipping.csv
   ```

3. To check if the files are inside the container, first run:

   ```
   docker exec -it CONTAINER_ID bash
   ```

4. List the files inside `/usr/lib/memgraph`:

   ```nocopy
   C:\Users\Vlasta>docker ps
   CONTAINER ID   IMAGE                       COMMAND                  CREATED         STATUS         PORTS                                                                  NAMES
   bed1e5c9192d   memgraph/memgraph-platform  "/bin/sh -c '/usr/bi…"   2 minutes ago   Up 2 minutes   0.0.0.0:3000->3000/tcp, 0.0.0.0:7444->7444/tcp, 0.0.0.0:7687->7687/tcp   recursing_blackburn

   C:\Users\Vlasta>docker cp C:/Data/shipping.csv bed1:/usr/lib/memgraph/shipping.csv

   C:\Users\Vlasta>docker exec -it bed1 bash
   root@bed1e5c9192d:/# ls /usr/lib/memgraph
   auth_module  memgraph  python_support  query_modules  shipping.csv
   root@bed1e5c9192d:/#
   ```

### 2. Gaining speed with indexes and analytical storage mode

Although the dataset imported in this tutorial is quite small, one day you might
want to import really big datasets with billions of nodes and relationships, and
you will require all the extra speed you can get.
To gain speed, you can [create indexes](/reference-guide/indexing.md) on the
properties used to connect nodes with relationships, that is, the values from
the `_id` column in the CSV file, which in Memgraph will be named `nodeID`.

**To create indexes, run:**

```cypher
CREATE INDEX ON :Employee(nodeID);
CREATE INDEX ON :Order(nodeID);
CREATE INDEX ON :Product(nodeID);
```

You can also change the [storage mode](/reference-guide/storage-modes.md) from
`IN_MEMORY_TRANSACTIONAL` to `IN_MEMORY_ANALYTICAL`. This will disable the
creation of durability files (snapshots and WAL files) and you will no longer
have any ACID guarantees. Other transactions will be able to see the changes of
ongoing transactions, and transactions will also see their own changes while
they are running. This means that transactions can be committed in random order
and, in the end, the updates to the data might not be correct.

But if you import on a single thread, batch after batch, there should be no
issues, and you will gain a 6 times faster import with 6 times less memory
consumption.

After the import, you can switch back to the `IN_MEMORY_TRANSACTIONAL` storage
mode or continue running analytical queries (read-only queries) in the
`IN_MEMORY_ANALYTICAL` mode to keep benefiting from low memory consumption.

To switch between modes, run the following queries on a running instance:

```cypher
STORAGE MODE IN_MEMORY_ANALYTICAL;
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```

To check the current storage mode, run:

```cypher
SHOW STORAGE INFO;
```

**Change the storage mode to analytical before import.**

```cypher
STORAGE MODE IN_MEMORY_ANALYTICAL;
```

### 3. 
Importing nodes

To import nodes using the LOAD CSV clause, let's first examine the clause
syntax:

```cypher
LOAD CSV FROM <csv-location> ( WITH | NO ) HEADER [IGNORE BAD] [DELIMITER <delimiter-string>] [QUOTE <quote-string>] AS <variable-name>
```

The file is now located at `/usr/lib/memgraph/shipping.csv` and it has a header
row. There is no need to ignore bad rows, the default delimiter is `,` and the
default quote character is `"`, the same as in the exported CSV file, so no
changes are necessary.

The first row of the LOAD CSV clause therefore looks like this:

```cypher
LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row
```

Nodes are always imported before relationships, so they will be imported first.

The `shipping.csv` file contains the following columns important for node
creation: `_id`, `_labels`, `employeeID`, `firstName`, `lastName`, `orderID`,
`productID`, `productName`, `shipName`, `title`, `unitPrice`.

The `_id` property is actually an internal node ID needed to create
relationships later on.

Execute queries in Memgraph Lab. Open your browser and go to
`http://localhost:3000/`, **Connect now** to the instance and go to the **Query
Execution** section.

#### Employee nodes

Begin with `Employee` nodes.

After establishing the location and format of the CSV file, filter out the rows
that contain the label `:Employee`:

```cypher
LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row
WITH row WHERE row._labels = ':Employee'
```

Then, create nodes with a certain label and properties. As an example, let's
look at the property `_id`. To add the property to the node, define its name in
Memgraph and assign it the value of a specific column in the CSV file.
```cypher
LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row
WITH row WHERE row._labels = ':Employee'
CREATE (e:Employee {nodeID: row._id})
```

The `nodeID: row._id` part of the query instructs Memgraph to create a property
named `nodeID` and assign it the value paired with the key `_id`. The first
created node will be assigned the value from the first data row, the second node
from the second data row, and so on.

**After matching up the keys and values for all properties, the finished query looks like this:**

```cypher
LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row
WITH row WHERE row._labels = ':Employee'
CREATE (e:Employee {nodeID: row._id, employeeID: row.employeeID, firstName: row.firstName, lastName: row.lastName, title: row.title});

MATCH (e:Employee)
RETURN e;
```

The second query will show all 9 created nodes.

Copy the query into the **Cypher Editor** and **Run Query**.

#### Order nodes

Relevant properties for the `Order` nodes are `_id`, `orderID` and `shipName`.

**To create `Order` nodes, run the following query:**

```cypher
LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row
WITH row WHERE row._labels = ':Order'
CREATE (o:Order {nodeID: row._id, orderID: row.orderID, shipName: row.shipName});

MATCH (o:Order)
RETURN o;
```

The second query will show all 830 created nodes:

#### Product nodes

Relevant properties for the `Product` nodes are `_id`, `productID`, `productName`
and `unitPrice`.

As the parser reads all the values as strings, and the `unitPrice` values are
numbers, they need to be converted to the appropriate data type.
+ +**To create `Product` nodes run the following query:** + +```cypher +LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row +WITH row WHERE row._labels = ':Product' +CREATE (p:Product {nodeID: row._id, productID: row.productID, productName: row.productName, unitPrice: ToFloat(row.unitPrice)}); + +MATCH (p:Product) +RETURN p; +``` + +The second query will show all 77 created nodes: + + + +### 4. Graph improvements + +At this point it would be nice to improve the look of the nodes visually. At the +moment, nodes in the graph are represented with their labels, but it would be +more useful if their name attribute was written. + +To adjust the look of the graph using Graph Style Language, open the Graph Style +Editor. Find the following code block: + +``` +@NodeStyle HasProperty(node, "name") { + label: AsText(Property(node, "name")) +} +``` + +It defines that if the node has the property `name`, its label on the visual +graph will be that property. + +As none of the imported nodes have the property `name`, this part of the code +needs to be adjusted to use the properties nodes do have. + +Replace those three lines of code with the following block and **Apply** the +changes: + +``` +@NodeStyle HasProperty(node, "firstName") { + label: AsText(Property(node, "firstName")) +} + +@NodeStyle HasProperty(node, "orderID") { + label: AsText(Property(node, "orderID")) +} + +@NodeStyle HasProperty(node, "productName") { + label: AsText(Property(node, "productName")) +} +``` + + + +Visual appearance of the graph can be changed in many different ways, so be sure +to check the [GSS documentation](/memgraph-lab/graph-style-script-language). + +### 5. Importing relationships + +Now that all the 916 nodes have been imported, they can be connected with relationships. 
+ +The first row of the LOAD CSV remains the same: + +```cypher +LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row +``` + +The `shipping.csv` file contains the following values important for relationship +creation: `_type`, `_start`, `_end`, `quantity` and `unitPrice`. + +The `_type` defines relationships type, `_start` and `_end` values define which +nodes need to be connected based on their ID. + +#### :REPORTS_TO relationships + +Begin with `:REPORTS_TO` relationship. + +After establishing the location and format of the CSV file, filter out the rows +that contain the type `REPORTS_TO`: + +```cypher +LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row +WITH row WHERE row._type = 'REPORTS_TO' +``` + +Now instruct Memgraph to find the nodes with certain IDs in order to create a +relationship between them. As node IDs are unique we can just define that any +node with a certain ID is a starting point, and another node with a certain ID +is the end point of the relationship type `REPORTS_TO`. + +**The LOAD CSV query creates 8 `:REPORTS_TO` relationships:** + +```cypher +LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row +WITH row WHERE row._type = 'REPORTS_TO' +MATCH (n {nodeID: row._start}), (n2 {nodeID: row._end}) +CREATE (n)-[:REPORTS_TO]->(n2); + +MATCH p=()-[:REPORTS_TO]->() +RETURN p; +``` + +The second query returns all the nodes connected with the `REPORTS_TO` type of +relationship. + + + +#### :SOLD relationships + +**The LOAD CSV query creates 830 `:SOLD` relationships:** + +```cypher +LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row +WITH row WHERE row._type = 'SOLD' +MATCH (n {nodeID: row._start}), (n2 {nodeID: row._end}) +CREATE (n)-[:SOLD]->(n2); + +MATCH p=()-[:SOLD]->() +RETURN p; +``` + +The second query returns all the nodes connected with the `SOLD` type of +relationship. 
#### :CONTAINS relationships

This relationship type has properties about the `quantity` of products one order
contains.

As the parser reads all the values as strings, and the values of this
relationship property are numbers, they need to be converted to the appropriate
type.

**The LOAD CSV query creates 2155 `CONTAINS` relationships:**

```cypher
LOAD CSV FROM "/usr/lib/memgraph/shipping.csv" WITH HEADER AS row
WITH row WHERE row._type = 'CONTAINS'
MATCH (n {nodeID: row._start}), (n2 {nodeID: row._end})
CREATE (n)-[:CONTAINS {quantity: ToInteger(row.quantity)}]->(n2);

MATCH p=()-[:CONTAINS]->()
RETURN p;
```

The second query returns all the nodes connected with the `CONTAINS` type of
relationship.

## After import

Once all the 916 nodes and 2993 relationships have been imported, decide whether
you want to switch back to the transactional storage mode or not. Remember that
the analytical storage mode you are using right now has no ACID compliance.

To switch back to the transactional storage mode, run:

```cypher
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```

To check that the switch was successful, run:

```cypher
SHOW STORAGE INFO;
```

You can query the database using the [**Cypher query
language**](/cypher-manual), use various graph algorithms and modules from our
open-source repository [**MAGE**](/mage) to solve graph analytics problems,
create awesome customized visual displays of your nodes and relationships with
[**Graph Style Script**](/memgraph-lab/graph-style-script-language), find out
how to connect any [**streams of data**](/memgraph/import-data/kafka) you might
have with Memgraph and, above all, enjoy your new graph database!
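As a closing note, the per-label node import queries used in this tutorial all follow the same pattern, so when migrating many labels you can generate them with a script. A rough Python sketch (the label-to-column mapping mirrors this tutorial; type conversions such as `ToFloat(row.unitPrice)` would still need to be added by hand):

```python
# Map each label to the CSV columns imported as node properties.
LABEL_PROPERTIES = {
    "Employee": ["employeeID", "firstName", "lastName", "title"],
    "Order": ["orderID", "shipName"],
    "Product": ["productID", "productName"],
}

def node_import_query(label, properties,
                      csv_path="/usr/lib/memgraph/shipping.csv"):
    # Every node keeps nodeID (the exported _id) plus its own columns.
    props = ", ".join(["nodeID: row._id"]
                      + [f"{p}: row.{p}" for p in properties])
    return (
        f'LOAD CSV FROM "{csv_path}" WITH HEADER AS row\n'
        f"WITH row WHERE row._labels = ':{label}'\n"
        f"CREATE (n:{label} {{{props}}});"
    )

for label, columns in LABEL_PROPERTIES.items():
    print(node_import_query(label, columns), end="\n\n")
```

Each generated statement matches the hand-written queries above and can be pasted into Memgraph Lab or piped through mgconsole.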
diff --git a/docs2/data-migration/migrate-from-rdbms.md b/docs2/data-migration/migrate-from-rdbms.md new file mode 100644 index 00000000000..33c9963f6e2 --- /dev/null +++ b/docs2/data-migration/migrate-from-rdbms.md @@ -0,0 +1,368 @@
---
id: migrate-from-rdbms
title: Migrate from RDBMS to Memgraph
sidebar_label: Migrate from RDBMS
---

This tutorial will help you import your data from a MySQL database into Memgraph
using CSV files on Windows 10.

In two of our blog posts, we've explained the [differences between relational
and graph
databases](https://memgraph.com/blog/graph-database-vs-relational-database) and
listed the [benefits of graph
databases](https://memgraph.com/blog/the-benefits-of-using-a-graph-database-instead-of-sql).
+In summary, instead of tables, graph databases use nodes connected by +relationships. Graph databases are an excellent choice if the data is highly +connected, you need to retrieve it often and the data model is not set in stone. +So if you need a quick and reliable database in which you can quickly and +effortlessly change the data model and properties, a graph database is the way +to go. + +## Prerequisites + +To follow along, you will need: + +- An installation of **Memgraph Platform**, a streaming graph application + platform that includes **MemgraphDB**, a visual user interface **Memgraph + Lab**, command-line interface **mgconsole** and **MAGE**, a graph algorithms + and modules library.
  To install Memgraph Platform and set it up, please follow the Docker
  installation instructions in the [Installation
  guide](/installation/overview.mdx).
- (optional) A running relational database, either with your own schema and
  data, or with the schema we used, in which case you should also populate the
  tables.

## Data Model

We will learn how to import data from a relational database to Memgraph using
the example of an online store. The data model of the relational database that
we will use for this tutorial includes 5 tables with the following properties:

## Migrate data using CSV files

### 1. Export the data from a table to a CSV file

To begin, you need to export the existing data into CSV files table by table,
either using the *Export Wizard* or by running a query.

**Exporting data using the Export Wizard**

In this example, we are using the *Export Wizard* in the *MySQL Workbench*. To
export the **Customer** table, right-click on the table name and select the
**Table Data Export Wizard**.

Click **Next**, and on the second step of the Wizard do the following:

1. Define the **File Path**. Usually, you can choose any location, but for this
   tutorial place the files in the root and name the file the same as the
   table.
2. Select the **csv** format if it isn't already selected.
3. Select comma as the **Field Separator**.
4. Leave the **Line Separator** as **LF**.
5. Delete the quotations from the **Enclose Strings** option and leave it
   empty.

Continue clicking **Next** until **Finish**.

In the root folder of your computer, you should find the **customer.csv** file.
When opened in a text editor or a spreadsheet program, the data from the
**customer** table should look like this:

```csv
id,name,email
1,Amos Burton,amos.burton@mail.em
2,Chrisjen Avasarala,cavasarala@mail.em
3,James Holden,james.holden@mail.em
4,Alex Kamal,akamal@post.com
5,Camina Drummer,cdrummer@post.com
6,Marco Inaros,marco.inaros@post.com
7,Naomi Nagata,naomi.nagata@post.com
8,Julie Mao,jmao@post.com
```

**Exporting data by running a query**

You can also export data by writing a query, but the data can be exported only
to a specific location, which you can find by running the following query:

```sql
SHOW VARIABLES LIKE "secure_file_priv";
```

I got `'secure_file_priv', 'C:\ProgramData\MySQL\MySQL Server 8.0\Uploads\'` as
a response, which I can now use as a destination for my CSV file.

Check that you've selected the database you want to export data from as your
default one. If the database is selected, its name is bolded. If it is not,
double-click on it.

To export the **customerpurchase** table, execute the query below. Notice how
we changed the backslashes into slashes to avoid getting an error. You can also
write double backslashes:

```sql
SELECT 'id', 'idcustomer', 'idpurchase'
UNION
SELECT
id,
idcustomer,
idpurchase
FROM customerpurchase INTO OUTFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/customerpurchase.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\r\n';
```

In the first line, we defined the headings, and then selected the fields from
the table that will be exported to the specified file. We also defined the
comma as the field terminator, and lines will be terminated by `\r\n`.

**Exported CSV files**

Export the rest of the tables using the preferred process and place all the CSV
files in the root directory.

Below are the CSV files we exported from our relational database.
Feel free to
download them, place them in the root directory and use them for the rest of
this tutorial.

To place the files in the root directory, you need Admin rights on your
computer.

- [`customer.csv`](https://public-assets.memgraph.com/tutorials/rdbms-migration-to-memgraph/customer.csv)
- [`customerpurchase.csv`](https://public-assets.memgraph.com/tutorials/rdbms-migration-to-memgraph/customerpurchase.csv)
- [`product.csv`](https://public-assets.memgraph.com/tutorials/rdbms-migration-to-memgraph/product.csv)
- [`productpurchase.csv`](https://public-assets.memgraph.com/tutorials/rdbms-migration-to-memgraph/productpurchase.csv)
- [`purchase.csv`](https://public-assets.memgraph.com/tutorials/rdbms-migration-to-memgraph/purchase.csv)

### 2. Transfer CSV files into a Docker container

Now we need to copy the CSV files from your local directory into the Docker
container so Memgraph can access them.

1. Start your Memgraph instance by writing the following command in a terminal:

```
docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -v mg_lib:/var/lib/memgraph memgraph/memgraph-platform
```

2. Open a new terminal and find the CONTAINER ID of the Memgraph Docker
   container:

```
docker ps
```

3. Place yourself in the root directory and copy the files into the container
   with the following command. You should replace CONTAINER ID and, for each
   file, change the source and destination path:

```
docker cp source.csv <CONTAINER ID>:/destination.csv
```

On my computer, the CSV files we need are located in the root directory of the
Windows 10 OS, and the CONTAINER ID is `bbbc43620e5c`.

First, I place myself in the root directory:

```terminal
cd C:\
```

Then I run 5 commands to copy the 5 CSV files to the container, changing the
file paths in both the source and destination with each new file.
This is an
example of copying the `customer.csv` file:

```terminal
docker cp customer.csv bbbc43620e5c:/customer.csv
```

To check if the files have indeed been copied, run the following command, but
be sure to replace the CONTAINER ID:

```terminal
docker exec -it bbbc43620e5c bash
```

Then use the `ls` command to list all the files and directories in the
container's root. You should be able to see the CSV files we just copied to the
container:

```terminal
root@bbbc43620e5c:/# ls
bin   customer.csv          dev  home  lib    lib64  media  opt   product.csv          purchase.csv  run   srv  supervisord.pid  tmp  var
boot  customerpurchase.csv  etc  lab   lib32  mage   mnt    proc  productpurchase.csv  root          sbin  supervisord.log  sys  usr
```

### 3. Run Memgraph Lab

If you installed Memgraph Platform correctly, you should be able to access
Memgraph Lab in your browser by visiting
[`http://localhost:3000/`](http://localhost:3000) and connect to the database.

Place yourself in the **Query** tab, where we will write queries in the **Query
editor** to import data into Memgraph.

### 4. Import nodes into Memgraph

As we already mentioned, graph databases do not use tables to store data, but
nodes and the relationships that connect them. If you take a look at the data
model we were using in the relational database, we can describe it with a
single sentence: "Customers make purchases of products."

Nodes would be the customers, purchases and products, while the relationships
between them are that customers MAKE purchases (`customerpurchase` table) OF
products (`productpurchase` table).

So let's start by importing the nodes into Memgraph using the `LOAD CSV` Cypher
clause.
The syntax of the LOAD CSV clause is:

```cypher
LOAD CSV from "/file.csv"
WITH HEADER AS row
CREATE (n:nodeName {property1_memgraph_name: row.property1_relational_name, property2_memgraph_name: row.property2_relational_name});
```

First, we need to define the source file path and use the `WITH HEADER` option
because our CSV files have headers. The clause will parse each `row` and create
nodes with properties. This is the clause that creates `customer` nodes. Copy
it, paste it in the **Query editor** in **Memgraph Lab**, then click **Run
query**:

```cypher
LOAD CSV from "/customer.csv"
WITH HEADER AS row
CREATE (c:Customer {id: row.id, name: row.name, email: row.email});
```

Switch to the **Overview** tab to confirm we have created 8 new customer nodes
from our CSV file. Let's repeat the process to create nodes for purchases.

If we do not define the data type of a property, it will be a string. That is
why we defined the date of purchase as a `Date` type:

```cypher
LOAD CSV from "/purchase.csv"
WITH HEADER AS row
CREATE (p:Purchase {id: row.id, date: Date(row.date)});
```

For the `product` nodes, we'll import the products' price as a `float`:

```cypher
LOAD CSV from "/product.csv"
WITH HEADER AS row
CREATE (pr:Product {id: row.id, brand: row.brand, name: row.name, price: ToFloat(row.price)});
```

You should now have 24 nodes imported into your graph database. You can list
all the nodes to check their properties by using this Cypher query:

```cypher
MATCH (n)
RETURN n;
```

If you click on each node, you can see its properties. The nodes are still not
connected to each other, so let's focus on that by importing the rest of the
CSV files.

### 5. Import relationships into Memgraph

We've imported CSV files containing data about customers, purchases and
products. In our graph database, they are represented as nodes. Now we need to
show the relationships those nodes have with each other.
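Before moving on, it helps to picture what `LOAD CSV` did in the previous step: each CSV row became a map of header names to string values, and each map became one node. A minimal Python sketch of that per-row parsing, using a hypothetical excerpt of `customer.csv`:

```python
import csv
import io

# Hypothetical excerpt of customer.csv with a header row.
raw = io.StringIO(
    "id,name,email\n"
    "1,Amos Burton,amos.burton@mail.em\n"
    "2,Chrisjen Avasarala,cavasarala@mail.em\n"
)

# Like LOAD CSV ... WITH HEADER AS row, DictReader maps header names to values.
nodes = [{"label": "Customer", **row} for row in csv.DictReader(raw)]

print(nodes[0])
# Every property is a string unless explicitly converted,
# which is why the queries above use Date(), ToFloat(), etc.
```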
Relationships are defined by data in the `customerpurchase` and
`productpurchase` tables and CSV files. If you open the `customerpurchase.csv`
file, you can see it actually connects two different nodes, customer and
purchase, via their IDs. That is why we'll use the `LOAD CSV` clause to match
those IDs with existing nodes and create a relationship between them. In this
example, the relationship is that a customer MADE a purchase. The arrow of the
relationship defines that a customer makes the purchase, and not the other way
around. Lastly, the `id` value from the `customerpurchase` row is stored as the
`id` property of the `:MADE` relationship.

```cypher
LOAD CSV FROM "/customerpurchase.csv" WITH HEADER AS row
MATCH (c:Customer {id: row.idcustomer})
MATCH (p:Purchase {id: row.idpurchase})
CREATE (c)-[m:MADE]->(p)
SET m.id = row.id;
```

Running this query made 12 new relationships between customers and purchases.
Let's now create relationships between products and purchases. Notice how we
defined the quantity data type as an integer. Once this last query is run, you
should have 24 nodes and 29 relationships (edges).

```cypher
LOAD CSV FROM "/productpurchase.csv" WITH HEADER AS row
MATCH (pr:Product {id: row.idproduct})
MATCH (p:Purchase {id: row.idpurchase})
CREATE (p)-[o:OF]->(pr)
SET o.id = row.id
SET o.quantity = ToInteger(row.quantity);
```

### 6. Data model and updating the schema

The data model in a graph database now looks like this:

If you decide you want to add a property to any of the nodes or relationships,
you can do so at any point without disrupting the schema.
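The MATCH-then-CREATE pattern above is essentially an id-based join: index the existing nodes by `id`, then look up both endpoints for each row of the linking file. A small standard-library sketch of the same idea (all data below is hypothetical):

```python
import csv
import io

# Nodes indexed by id, as the MATCH clauses would find them (hypothetical data).
customers = {"1": {"name": "Amos Burton"}, "2": {"name": "Chrisjen Avasarala"}}
purchases = {"10": {"date": "2023-01-05"}, "11": {"date": "2023-02-17"}}

# Hypothetical excerpt of customerpurchase.csv.
raw = io.StringIO(
    "id,idcustomer,idpurchase\n"
    "100,1,10\n"
    "101,2,11\n"
)

# For each row, look up both endpoints and create a MADE "relationship",
# keeping the row id as the relationship's id property.
made = [
    {"id": row["id"],
     "from": customers[row["idcustomer"]],
     "to": purchases[row["idpurchase"]]}
    for row in csv.DictReader(raw)
]

print(len(made))  # one relationship per CSV row
```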
+ +Let's add the `city` property to customer 4: + +```cypher +MATCH (c:Customer {id: "4"}) +SET c.city = "Zagreb" +RETURN c +``` + +You can check if this property has been added by running the following query and +clicking on the node in the **Graph** view: + +```cypher +MATCH (c:Customer {id: "4"}) +RETURN c +``` + +As the last step of this tutorial let's check all the nodes and relationships +we've imported into Memgraph by running the following query: + +```cypher +MATCH (c)-[m]-(p)-[o]-(pr) +RETURN c,m,p,o,pr; +``` + + + +## Where to next? + +Congratulations! You now have a graph database. You can query it using the +[**Cypher query language**](/cypher-manual), use various graph algorithms and +modules from our open-source repository [**MAGE**](/mage) to solve graph +analytics problems, create awesome customized visual displays of your nodes and +relationships with [**Graph Style Script**](/memgraph-lab/graph-style-script-language), +find out how to connect any [**streams of data**](/memgraph/import-data/kafka) +you might have with Memgraph and above all - enjoy your graph database! diff --git a/docs2/data-migration/sql.md b/docs2/data-migration/sql.md new file mode 100644 index 00000000000..343cd99ad4c --- /dev/null +++ b/docs2/data-migration/sql.md @@ -0,0 +1,224 @@ +## Migrate MySQL database to Memgraph + +### Prerequisites + +* A running **[MySQL](https://www.mysql.com/)** instance with the database you wish to migrate. +* A running **[Memgraph](https://memgraph.com/product)** instance where you want to migrate the data. +* The **[mgmigrate](https://github.com/memgraph/mgmigrate)** tool installed. + Installation instructions can be found + [here](https://github.com/memgraph/mgmigrate). 
### Dataset

To show you how to migrate data from MySQL to Memgraph, we will be working with
a MySQL database named `users_db` that contains two tables, `users` and
`user_relationships`:

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

The `users` table contains four users with their ids and names:

```console
mysql> SELECT * FROM users;
+----+------+
| id | name |
+----+------+
|  0 | Anna |
|  1 | Josh |
|  2 | Lisa |
|  3 | Troy |
+----+------+
```

The `user_relationships` table contains the relationships between users:

```console
mysql> SELECT * FROM user_relationships;
+----------+----------+
| user_one | user_two |
+----------+----------+
|        0 |        1 |
|        2 |        3 |
+----------+----------+
```

_____

### Migrating

**1.** You can migrate this database into Memgraph by running:

```console
build/src/mgmigrate --source-kind=mysql \
  --source-host 127.0.0.1 \
  --source-port 33060 \
  --source-username root \
  --source-password mysql \
  --source-database=users_db \
  --destination-host 127.0.0.1 \
  --destination-port 7687 \
  --destination-use-ssl=false
```

**2.** Run the following query in **[Memgraph Lab](https://memgraph.com/product/lab)** or **[mgconsole](/connect-to-memgraph/mgconsole.md)** to see the results:

```cypher
MATCH (n)-[r]-(m) RETURN n,r,m;
```

The query results should be:

```console
memgraph> MATCH (n)-[r]-(m) RETURN n,r,m;
+--------------------------------+--------------------------------+--------------------------------+
| n                              | r                              | m                              |
+--------------------------------+--------------------------------+--------------------------------+
| (:users {id: 1, name: "Josh"}) | [:user_relationships]          | (:users {id: 0, name: "Anna"}) |
| (:users {id: 0, name: "Anna"}) | [:user_relationships]          | (:users {id: 1, name: "Josh"}) |
| (:users {id: 3, name: "Troy"}) | [:user_relationships]          | (:users {id: 2, name: "Lisa"}) |
| (:users {id: 2, name: "Lisa"})
| [:user_relationships]          | (:users {id: 3, name: "Troy"}) |
+--------------------------------+--------------------------------+--------------------------------+
```

![memgraph-docs-mgmigrate-results](../../data/import-data/memgraph-docs-mgmigrate-results.png)

## Migrate data from PostgreSQL to Memgraph

### Prerequisites

* A running **[PostgreSQL](https://www.postgresql.org/)** instance with the database you wish to migrate.
* A running **[Memgraph](https://memgraph.com/product)** instance where you want to migrate the data.
* The **[mgmigrate](https://github.com/memgraph/mgmigrate)** tool installed.
  Installation instructions can be found
  [here](https://github.com/memgraph/mgmigrate).

### Dataset

For this tutorial, we will be working with a PostgreSQL database named `users_db`
that contains two tables, `users` and `user_relationships`:

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

The `users` table contains four users with their ids and names:

```console
users_db=# SELECT * FROM "users";
 id | name
----+------
  0 | Anna
  1 | Josh
  2 | Lisa
  3 | Troy
```

The `user_relationships` table contains the relationships between users:

```console
users_db=# SELECT * FROM user_relationships;
 user_one | user_two
----------+----------
        0 |        1
        2 |        3
```

____

### Migrating

**1.** You can migrate this database into Memgraph by running:

```console
build/src/mgmigrate --source-kind=postgresql \
  --source-host 127.0.0.1 \
  --source-port 5432 \
  --source-username postgres \
  --source-password postgres \
  --source-database=users_db \
  --destination-host 127.0.0.1 \
  --destination-port 7687 \
  --destination-use-ssl=false
```

**2.** Run the following query in **[Memgraph Lab](https://memgraph.com/product/lab)** or **[mgconsole](/connect-to-memgraph/mgconsole.md)** to see the results:

```cypher
MATCH (n)-[r]-(m) RETURN n,r,m;
```

The query results
should be: + + + + +```console +memgraph> MATCH (n)-[r]-(m) RETURN n,r,m; ++--------------------------------+--------------------------------+--------------------------------+ +| n | r | m | ++--------------------------------+--------------------------------+--------------------------------+ +| (:users {id: 1, name: "Josh"}) | [:user_relationships] | (:users {id: 0, name: "Anna"}) | +| (:users {id: 0, name: "Anna"}) | [:user_relationships] | (:users {id: 1, name: "Josh"}) | +| (:users {id: 3, name: "Troy"}) | [:user_relationships] | (:users {id: 2, name: "Lisa"}) | +| (:users {id: 2, name: "Lisa"}) | [:user_relationships] | (:users {id: 3, name: "Troy"}) | ++--------------------------------+--------------------------------+--------------------------------+ +``` + + + + +![memgraph-docs-mgmigrate-results](../../data/import-data/memgraph-docs-mgmigrate-results.png) + + + \ No newline at end of file diff --git a/docs2/data-streams/data-streams.md b/docs2/data-streams/data-streams.md new file mode 100644 index 00000000000..4cb8213e19a --- /dev/null +++ b/docs2/data-streams/data-streams.md @@ -0,0 +1,319 @@ +--- +id: overview +title: Streams +sidebar_label: Streams overview +slug: /reference-guide/streams +--- + +Memgraph can connect to existing Kafka, Redpanda, and Pulsar sources to ingest +the data, which you can then query with the power of MAGE algorithms or your own +custom procedures. + +[![Related - +Tutorial](https://img.shields.io/static/v1?label=Related&message=Tutorial&color=008a00&style=for-the-badge)](/tutorials/graph-stream-processing-with-kafka.md) [![Related - How to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/import-data/data-streams/overview.md) + +To use streams, a user must: + +1. [Create a transformation + module](/reference-guide/streams/transformation-modules/overview.md#creating-a-transformation-module) +2. 
[Load the transformation
   module](/reference-guide/streams/transformation-modules/overview.md#loading-modules) into
   Memgraph
3. [Create the stream](#create-a-stream) with a `CREATE STREAM` query and optionally [set its offset](#setting-a-stream-offset) with
   `CALL mg.kafka_set_stream_offset(stream_name, offset)`
4. [Start the stream](#start-a-stream) with a `START STREAM` query

You can write Python transformation modules, create and start streams using the
**Stream** section in Memgraph Lab, [check out
how](/import-data/data-streams/manage-streams-lab.md).

:::tip

Check out the **example-streaming-app** on
[GitHub](https://github.com/memgraph/example-streaming-app) to see a sample
Memgraph-Kafka application.

:::

## Create a stream

The syntax for creating a stream depends on the type of the source because each
specific type supports a different set of configuration options.

There is no strict order for specifying the configuration options.

### Kafka and Redpanda

```cypher
CREATE KAFKA STREAM <stream name>
  TOPICS <topic1> [, <topic2>, ...]
  TRANSFORM <transform procedure>
  [CONSUMER_GROUP <consumer group>]
  [BATCH_INTERVAL <batch interval duration>]
  [BATCH_SIZE <batch size>]
  [BOOTSTRAP_SERVERS <bootstrap servers>]
  [CONFIGS { <key1>: <value1> [, <key2>: <value2>, ...]}]
  [CREDENTIALS { <key1>: <value1> [, <key2>: <value2>, ...]}];
```

| Option                  | Description                                                                                                                                   | Type                            | Example                            | Default     |
| :---------------------: | :-------------------------------------------------------------------------------------------------------------------------------------------: | :-----------------------------: | :--------------------------------: | :---------: |
| stream name             | Name of the stream in Memgraph                                                                                                                | plain text                      | my_stream                          | /           |
| topic                   | Name of the topic in Kafka                                                                                                                    | plain text                      | my_topic                           | /           |
| transform procedure     | Name of the transformation file followed by a procedure name                                                                                  | function                        | my_transformation.my_procedure     | /           |
| consumer group          | Name of the consumer group in Memgraph                                                                                                        | plain text                      | my_group                           | mg_consumer |
| batch interval duration | Maximum waiting time in milliseconds for consuming messages before calling the transform procedure                                            | int                             | 9999                               | 100         |
| batch size              | Maximum number of messages to wait for before calling the transform procedure                                                                 | int                             | 99                                 | 1000        |
| bootstrap servers       | Comma-separated list of bootstrap servers                                                                                                     | string                          | "localhost:9092"                   | /           |
| configs                 | String key-value pairs of configuration options for the Kafka consumer                                                                        | map with string key-value pairs | {"sasl.username": "michael.scott"} | /           |
| credentials             | String key-value pairs of configuration options for the Kafka consumer, but their values aren't shown in the Kafka-specific stream information | map with string key-value pairs | {"sasl.password": "password"}      | /           |

:::warning

The credentials are stored on the disk without any encryption, which means
everybody who has access to the data directory of Memgraph can get the
credentials.
:::

To check the list of possible configuration options and their values, please
check the documentation of the
[librdkafka](https://github.com/edenhill/librdkafka/blob/v1.7.0/CONFIGURATION.md)
library, which is used in Memgraph. At the time of writing this documentation,
Memgraph uses version 1.7.0 of librdkafka.

### Pulsar

```cypher
CREATE PULSAR STREAM <stream name>
  TOPICS <topic1> [, <topic2>, ...]
  TRANSFORM <transform procedure>
  [BATCH_INTERVAL <batch interval duration>]
  [BATCH_SIZE <batch size>]
  [SERVICE_URL <service url>];
```

| Option                  | Description                                                                                        | Type       | Example                        | Default |
| :---------------------: | :------------------------------------------------------------------------------------------------: | :--------: | :----------------------------: | :-----: |
| stream name             | Name of the stream in Memgraph                                                                     | plain text | my_stream                      | /       |
| topic                   | Name of the topic in Pulsar                                                                        | plain text | my_topic                       | /       |
| transform procedure     | Name of the transformation file followed by a procedure name                                       | function   | my_transformation.my_procedure | /       |
| batch interval duration | Maximum waiting time in milliseconds for consuming messages before calling the transform procedure | int        | 9999                           | 100     |
| batch size              | Maximum number of messages to wait for before calling the transform procedure                      | int        | 99                             | 1000    |
| service url             | URL to the running Pulsar cluster                                                                  | string     | "pulsar://127.0.0.1:6650"      | /       |

The transformation procedure is called if either the `BATCH_INTERVAL` or the
`BATCH_SIZE` is reached, and at least one message is received.

The `BATCH_INTERVAL` starts when:

- the stream is started
- the processing of the previous batch is completed
- the previous batch interval ended without receiving any messages

After the messages are processed, the stream acknowledges them. If the stream
is stopped, the next time it starts, it will continue processing from the last
acknowledged message.

The user who executes the `CREATE` query is the owner of the stream.
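The batching rule described above (dispatch a batch when `BATCH_SIZE` messages have accumulated, or when `BATCH_INTERVAL` elapses with at least one message buffered) can be illustrated with a toy simulation. This is a conceptual sketch of the rule, not Memgraph's actual implementation:

```python
# Conceptual sketch of the batching rule: a batch is dispatched when it
# reaches BATCH_SIZE messages, or when BATCH_INTERVAL elapses with at
# least one message buffered. Not Memgraph's actual implementation.
BATCH_SIZE = 3
BATCH_INTERVAL_MS = 100

def batch_messages(events):
    """events: (timestamp_ms, payload) pairs in arrival order."""
    batches, current, window_start = [], [], 0
    for ts, payload in events:
        if current and ts - window_start >= BATCH_INTERVAL_MS:
            batches.append(current)          # interval expired with messages
            current, window_start = [], ts   # a new interval starts now
        current.append(payload)
        if len(current) == BATCH_SIZE:
            batches.append(current)          # size limit reached
            current, window_start = [], ts
    if current:
        batches.append(current)
    return batches

# Three quick messages fill a batch; the fourth arrives much later
# and is dispatched on its own.
batches = batch_messages([(0, "a"), (10, "b"), (20, "c"), (250, "d")])
print(batches)
```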
**Memgraph Community** doesn't support authentication and authorization, so the
owner is always `Null`, and the privileges are not checked.

In **Memgraph Enterprise**, owner privileges are checked upon executing the
queries returned from the transformation procedures. If the owner doesn't have
the required privileges, the execution of the queries will fail. Find more
information about how the owner affects the stream in the [reference
guide](reference-guide/security.md#owners).

## Start a stream

The following query will start a specific stream with the name `<stream name>`
to consume `<count>` number of batches for a maximum duration of
`<milliseconds>` milliseconds:

```cypher
START STREAM <stream name> [BATCH_LIMIT <count>] [TIMEOUT <milliseconds>];
```

The stream will automatically stop after consuming the given number of batches
or after reaching the timeout. If `<count>` number of batches are not processed
within the specified `TIMEOUT`, probably because not enough messages were
received, an exception is thrown. `TIMEOUT` is measured in milliseconds, and
its default value is 30000. It can only be used in combination with the
`BATCH_LIMIT` option.

If `BATCH_LIMIT` (and `TIMEOUT`) is not provided, the `<stream name>` stream
will run for an infinite number of batches without a timeout limit:

```cypher
START STREAM <stream name>;
```

The following query will start all streams for an infinite number of batches
and without a timeout limit:

```cypher
START ALL STREAMS;
```

When a stream is started, it resumes ingesting data from the last committed
offset. If no offset is committed for the consumer group, the largest offset
will be used. Therefore, only the new messages will be consumed.

## Stop a stream

The following queries stop a specific stream or all streams:

```cypher
STOP STREAM <stream name>;
```

```cypher
STOP ALL STREAMS;
```

## Delete a stream

The following query drops the stream with the name `<stream name>`.
```cypher
DROP STREAM <stream name>;
```

## Show streams

To show streams, use the following query:

```cypher
SHOW STREAMS;
```

It shows a list of existing streams with the following information:

- stream name
- stream type
- batch interval
- batch size
- transformation procedure name
- the owner of the stream
- whether the stream is running or not

## Check stream

To perform a dry-run on the stream and get the results of the transformation,
use the following query:

```cypher
CHECK STREAM <stream name> [BATCH_LIMIT <count>] [TIMEOUT <milliseconds>];
```

The `CHECK STREAM` clause will do a dry-run on the `<stream name>` stream with
`<count>` number of batches and return the result of the transformation, that
is, the queries and parameters that would be executed in a normal run. If
`<count>` number of batches are not processed within the specified `TIMEOUT`,
probably because not enough messages were received, an exception is thrown.

The default value of `<count>` is 1. `TIMEOUT` is measured in milliseconds, and
its default value is 30000.

## Get stream information

To get more information about a specific Kafka or Redpanda stream, use the
following query:

```cypher
CALL mg.kafka_stream_info("stream_name") YIELD *;
```

This procedure will return information about the bootstrap servers, set
configuration, consumer group, credentials, and topics.

To get more information about a specific Pulsar stream, use the
following query:

```cypher
CALL mg.pulsar_stream_info("stream_name") YIELD *;
```

The procedure will return the service URL and topics.

## Kafka producer delivery semantics

In stream processing, it is important to consider how failures are handled. When
connecting an external application such as Memgraph to a Kafka stream, there are
two possible ways to handle failures during message processing:

1. Every message is processed **at least once**: the message offsets are
   committed to the Kafka cluster after processing.
If the committing fails, the
   messages can get processed multiple times.
2. Every message is processed **at most once**: the message offsets are
   committed to the Kafka cluster right after they are received, before the
   processing is started. If the processing fails, the same messages won't be
   processed again.

Missing a message can result in missing an edge that would connect two
independent components of a graph. Therefore, the general opinion at Memgraph
is that missing some information is a bigger problem in graph databases than
having duplicated information, so Memgraph uses **at least once** semantics,
i.e., the queries returned by the transformations are first executed and
committed to the database for every batch of messages, and only then is the
message offset committed to the Kafka cluster.

However, even though Memgraph cannot guarantee **exactly once** semantics, it
tries to minimize the possibility of processing messages multiple times. This
means committing the message offsets to the Kafka cluster happens right after
the transaction is committed to the database.

## Configuring stream transactions

A stream can fail for various reasons. One important type of failure is when a
transaction (in which the returned queries of the transformation are executed)
fails to commit because of another conflicting transaction. This is a side
effect of [isolation levels](/reference-guide/transactions.md#isolation-levels)
and can be remedied by the following Memgraph flag:

```
--stream-transaction-conflict-retries=TIMES_TO_RETRY
```

By default, Memgraph will always try to execute a transaction once. However,
for streams, if Memgraph fails because of transaction conflicts, it will retry
to execute the transaction again for up to `TIMES_TO_RETRY` times (the default
value is 30).
Moreover, the interval between retries is also important and can be configured
with the following Memgraph flag:

```
--stream-transaction-retry-interval=INTERVAL_TIME
```

The `INTERVAL_TIME` is measured in `milliseconds`, and the default value is
`500ms`.

## Setting a stream offset

When using a Kafka stream, you can manually set the offset of the next consumed
message with a call to the query procedure `mg.kafka_set_stream_offset`:

```cypher
CALL mg.kafka_set_stream_offset(stream_name, offset)
```

| Option      | Description                              | Type   | Example     | Default |
| :---------: | :--------------------------------------: | :----: | :---------: | :-----: |
| stream_name | Name of the stream to set the offset for | string | "my_stream" | /       |
| offset      | Offset number                            | int    | 0           | /       |

- An offset of `-1` denotes the start of the stream, i.e., the beginning offset
  available for the given topic/partition.
- An offset of `-2` denotes the end of the stream, i.e., for each
  topic/partition, its logical end such that only the next produced message will
  be consumed.

A stream can consume messages from multiple topics with multiple partitions.
Therefore, when setting the offset to an arbitrary number, be aware that
setting the offset of a stream internally sets all of the associated offsets of
that stream (topics/partitions) to that value.

diff --git a/docs2/data-streams/graph-stream-processing-with-kafka.md b/docs2/data-streams/graph-stream-processing-with-kafka.md
new file mode 100644
index 00000000000..fa8f8a7f992
--- /dev/null
+++ b/docs2/data-streams/graph-stream-processing-with-kafka.md
---
id: graph-stream-processing-with-kafka
title: Graph stream processing with Kafka and Memgraph
sidebar_label: Graph stream processing with Kafka
---

In this tutorial, you will learn how to connect Memgraph to an existing Kafka
stream using Memgraph Lab, and transform data into graph database objects to
analyze it in real-time.
+ +[![Related - Reference Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/streams/overview.md) + +If you are still very new to streaming, feel free to first read some of our blog +posts about the topic to learn [what stream processing +is](https://memgraph.com/blog/introduction-to-stream-processing), [how it +differs from batch +processing](https://memgraph.com/blog/batch-processing-vs-stream-processing) and +[how streaming databases work](https://memgraph.com/blog/streaming-databases). + +Now that you've covered theory, let's dive into practice! + +We will focus on processing real-time movie ratings that are streamed through +MovieLens Kafka stream from the [Awesome Data +Stream](https://awesomedata.stream/#/movielens) using Memgraph Lab and the +Cypher query language. + +## Prerequisites + +To follow this tutorial, you will need: + +- [Memgraph Platform](/installation/overview.mdx) or [Memgraph Cloud](https://cloud.memgraph.com) + +You can use Memgraph Cloud for a 2-week trial period, or you can install +Memgraph Platform locally. + +## Data stream + +For this tutorial, we will use MovieLens Kafka stream from the [Awesome Data +Stream](https://awesomedata.stream/#/movielens). MovieLens data stream streams +movie ratings, and each JSON message represents a new movie rating: + +```nocopy +"userId": "112", +"movie": { + "movieId": "4993", + "title": "Lord of the Rings: The Fellowship of the Ring, The (2001)", + "genres": ["Adventure", "Fantasy"] +}, +"rating": "5", +"timestamp": "1442535783" +``` + +## 1. Prepare Memgraph + +Let's open Memgraph Lab, where we will write the transformation module and +connect to the stream. + +If you have successfully installed Memgraph Platform, you should be able to open +Memgraph Lab in a browser at [`http://localhost:3000/`](http://localhost:3000/). 
If you are using Memgraph Cloud, open the running instance and switch to the
**Connect via Client** tab, then click **Connect in Browser** to open Memgraph
Lab in a new browser tab. Enter your project password and click **Connect Now**.

## 2. Create a transformation module

Before connecting Memgraph to a stream, you need a transformation module with
procedures that can produce Cypher queries based on the received messages.
Procedures can be written in
[Python](/reference-guide/streams/transformation-modules/api/python-api.md) or
[C](/reference-guide/streams/transformation-modules/api/c-api.md). If you
need more information about what transformation modules are, please read our [reference
guide on transformation modules](/reference-guide/streams/transformation-modules/overview.md).

Memgraph Lab allows you to develop Python transformation modules in-app:

1. Navigate to **Query Modules**. Here you can see all the query modules
   available in Memgraph, such as utility modules or query modules from the MAGE
   library. You will also be able to check out and edit any transformation
   modules you develop while using Memgraph.

2. Click on the **+ New Module** button, give the new module the name `movielens`
   and create the module.

3. Memgraph Lab creates sample procedures you can erase, so you have a clean
   slate for writing the `movielens` transformation module.

### Python API

The Python API is defined in the `mgp` module you can find in the Memgraph
installation directory `/usr/lib/memgraph/python_support`. In essence, the Python
API is a wrapper around the C API, so at the beginning of each new module, you
need to import `mgp`. As the messages from the stream arrive as JSON
messages, you also need to import the `json` module for Memgraph to read them
correctly. Below the imports, add the `@mgp.transformation`
decorator, which marks the function that handles data coming from streams.
The Python API also defines the `@mgp.read_proc` and `@mgp.write_proc` decorators.
The `@mgp.read_proc` decorator handles read-only procedures, while `@mgp.write_proc`
handles procedures that also write to the database; both are used
when [writing custom query
modules](/tutorials/implement-custom-query-module-in-python.md).

```python
import mgp
import json

@mgp.transformation
```

Now you are ready to write the function that will transform JSON messages into
graph entities.

### Transformation function

First, define the function `rating`. It receives a list of messages and returns
queries that Memgraph will execute like any regular query in order to
create nodes and relationships. The signature of the function looks like this:

```python
import mgp
import json

@mgp.transformation
def rating(messages: mgp.Messages
    ) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
    result_queries = []
```

Now you need to iterate through each message within the batch, decode it with
`json.loads` and save the elements of the message in the `movie_dict` variable.

```python
import mgp
import json

@mgp.transformation
def rating(messages: mgp.Messages
    ) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
    result_queries = []

    for i in range(messages.total_messages()):
        message = messages.message_at(i)
        movie_dict = json.loads(message.payload().decode('utf8'))
```
Now, create the queries that will be executed in Memgraph. You instruct Memgraph
to create `User`, `Movie` and `Genre` nodes, then connect the nodes with
appropriate relationships. In each query, you also define the entity properties.
+ +```python +import mgp +import json + +@mgp.transformation +def rating(messages: mgp.Messages + ) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]): + result_queries = [] + + for i in range(messages.total_messages()): + message = messages.message_at(i) + movie_dict = json.loads(message.payload().decode('utf8')) + result_queries.append( + mgp.Record( + query=("MERGE (u:User {id: $userId}) " + "MERGE (m:Movie {id: $movieId, title: $title}) " + "WITH u, m " + "UNWIND $genres as genre " + "MERGE (m)-[:OF_GENRE]->(:Genre {name: genre}) " + "MERGE (u)-[r:RATED {rating: ToFloat($rating), timestamp: $timestamp}]->(m)"), + +``` + +Once you set the placeholders, you can fill them out by applying the values +from the messages to the node and relationship properties, and return the +queries. + + +```python +import mgp +import json + +@mgp.transformation +def rating(messages: mgp.Messages + ) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]): + result_queries = [] + + for i in range(messages.total_messages()): + message = messages.message_at(i) + movie_dict = json.loads(message.payload().decode('utf8')) + result_queries.append( + mgp.Record( + query=("MERGE (u:User {id: $userId}) " + "MERGE (m:Movie {id: $movieId, title: $title}) " + "WITH u, m " + "UNWIND $genres as genre " + "MERGE (m)-[:OF_GENRE]->(:Genre {name: genre}) " + "MERGE (u)-[r:RATED {rating: ToFloat($rating), timestamp: $timestamp}]->(m)"), + parameters={ + "userId": movie_dict["userId"], + "movieId": movie_dict["movie"]["movieId"], + "title": movie_dict["movie"]["title"], + "genres": movie_dict["movie"]["genres"], + "rating": movie_dict["rating"], + "timestamp": movie_dict["timestamp"]})) + + return result_queries +``` + +Congratulations, you just created your first transformation procedure! Save it +and you should be able to see transformation `rating() -> ()` among the +**Detected procedures & transformations**. 
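Before attaching the module to a stream, you can sanity-check the parameter extraction outside Memgraph. The sketch below is plain Python with no `mgp` import, so it runs anywhere; it applies the same `json` handling to the sample MovieLens message shown earlier:

```python
import json

# Sample payload in the format of the MovieLens stream messages.
sample_payload = b'''{
    "userId": "112",
    "movie": {
        "movieId": "4993",
        "title": "Lord of the Rings: The Fellowship of the Ring, The (2001)",
        "genres": ["Adventure", "Fantasy"]
    },
    "rating": "5",
    "timestamp": "1442535783"
}'''

def build_parameters(payload: bytes) -> dict:
    """Mirror of the parameter extraction done inside rating()."""
    movie_dict = json.loads(payload.decode('utf8'))
    return {
        "userId": movie_dict["userId"],
        "movieId": movie_dict["movie"]["movieId"],
        "title": movie_dict["movie"]["title"],
        "genres": movie_dict["movie"]["genres"],
        "rating": movie_dict["rating"],
        "timestamp": movie_dict["timestamp"],
    }

print(build_parameters(sample_payload)["genres"])  # → ['Adventure', 'Fantasy']
```

If this produces the dictionary you expect for real messages, the same dictionary is what the procedure passes as `parameters` of the returned `mgp.Record`.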
+ + + +You can now **Save and Close** the module to get an overview of the module that +lists procedures and their signature. + + + +## 3. Create a stream + +To add a stream in Memgraph Lab: + +1. Switch to **Streams** and **Add New Stream**. +2. Choose Kafka stream type, enter stream name `movielens`, server address +`get.awesomedata.stream:9093`, and topics `rating` as instructed on the [Awesome +Data Stream](https://awesomedata.stream/#/movielens) +3. Go to the **Next Step**. +4. Click on **Edit** (pencil icon) to modify the *Consumer Group* to the one +written on the [Awesome Data Stream](https://awesomedata.stream/#/movielens). As +the streams are public, consumer groups need to be unique. + +The stream configuration should look something like this: + + + +## 4. Add a transformation module + +To add the `movielens` Python transformation module you developed earlier to a stream: + +1. Click on **Add Transformation Module**. +2. Click on **Choose Transformation Module**. +3. Select the `movielens` transformation module +4. Check if the necessary transformation procedure `rating() -> ()` is visible under **Detected + transformation functions** on the right. +5. Select it and **Attach to Stream**. + + + +## 5. Set Kafka configuration parameters + +Due to the nature of the public MovieLens Awesome Data Stream, you need to add +additional Kafka configuration parameters: + +* **sasl.username**: public
+* **sasl.password**: public
+* **security.protocol**: SASL_PLAINTEXT
+* **sasl.mechanism**: PLAIN
In order to do so:

1. In the Kafka Configuration Parameters section, click **+ Add parameter field**.
2. Insert the parameter name and value.
3. To add another parameter, click **+ Add parameter field** again.
4. **Save Configuration** once you have set all parameters.

## 6. Connect Memgraph to the stream and start ingesting the data

Once the stream is configured, you can **Connect to Stream**.

Memgraph will do a series of checks, ensuring that defined topics and
transformation procedures are correctly configured. If all checks pass
successfully, you can **Start the stream**. Once you start the stream, you will
no longer be able to change configuration settings, just the transformation
module.

The stream status changes to **Running**, and data is ingested into Memgraph.
You can see the number of nodes and relationships rising as the data keeps
coming in.

## 7. Analyze the streaming data

Switch to **Query Execution** and run a query to visualize the data coming in:

```cypher
MATCH p=(n)-[r]-(m)
RETURN p LIMIT 100;
```

Congratulations! You have connected Memgraph to a Kafka stream. We've prepared
queries that utilize the most powerful graph algorithms to gain every last bit
of insight that data can provide. [Let the querying
begin](https://memgraph.com/blog/how-to-analyze-a-streaming-dataset-of-movie-ratings-using-custom-query-modules)!

If you are new to Cypher, check the [**Cypher query language
manual**](/cypher-manual). You can also try using various graph algorithms and
modules from our open-source repository [**MAGE**](/mage) to solve graph
analytics problems, and create awesome customized visual displays of your nodes
and relationships with [**Graph Style
Script**](/memgraph-lab/graph-style-script-language).

You can also explore other data streams from the [Awesome Data
Stream](https://awesomedata.stream/) site!
Feel free to play around with the
Python API some more and let us know what you are working on through our
[Discord server](https://discord.gg/memgraph).

Above all - enjoy your graph database!
diff --git a/docs2/data-streams/manage-streams-lab.md b/docs2/data-streams/manage-streams-lab.md
new file mode 100644
index 00000000000..3052cd9754d
--- /dev/null
+++ b/docs2/data-streams/manage-streams-lab.md
@@ -0,0 +1,148 @@
---
id: manage-streams-lab
title: Manage data streams from Memgraph Lab
sidebar_label: Manage data streams from Memgraph Lab
---

If you prefer to use a GUI, you can connect to data streams by using a wizard in
the **Stream** section of Memgraph Lab. If you prefer writing commands, you can
[manage streams with queries](/how-to-guides/streams/manage-streams.md).

If you need a Kafka stream to play around with, we've provided some at [Awesome
Data Stream](https://awesomedata.stream/)!

[![Related - Reference Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/streams/overview.md) [![Related -
Tutorial](https://img.shields.io/static/v1?label=Related&message=Tutorial&color=008a00&style=for-the-badge)](/tutorials/graph-stream-processing-with-kafka.md)

## How to add a stream?

To add a stream in Memgraph Lab:

1. Switch to **Streams** and **Add New Stream**.
2. Choose stream type, enter a stream name, server address, and topics you want to subscribe to.
3. Go to the **Next Step**.
4. Click on **Edit** (pencil icon) to modify the *Consumer Group*, *Batch
   Interval* or *Batch Size*.

If you are trying to connect to the MovieLens Kafka data stream from the [Awesome Data
Stream](https://awesomedata.stream/#/movielens), the stream configuration should
look like this:

Once the basic configuration is finished, you need to define a transformation
module and attach it to the stream.

## How to add a transformation module?
A transformation module is a set of user-defined transformation procedures
written in [C](/reference-guide/streams/transformation-modules/api/c-api.md) or
[Python](/reference-guide/streams/transformation-modules/api/python-api.md) that
act on data received from a streaming source. Transformation procedures instruct
Memgraph on how to transform the incoming messages to consume them correctly.

At the moment, you can only develop Python transformation modules directly from
Memgraph Lab.

To add a Python transformation module to a stream:
1. Click on **Add Transformation Module**.
2. Click on **Choose Transformation Module**.
3. Select an existing transformation module or **+ Create new transformation**.
4. Review an existing module or clear the screen and write a new transformation
   procedure.
5. Save the transformation module.
6. Check if the necessary transformation procedure is visible under **Detected
   transformation functions** on the right.
7. Select a transformation procedure and **Attach to Stream**.

You can also develop transformation modules in Python beforehand, in the
**Query Modules** section. Click on **New Module**, and Memgraph Lab will automatically
recognize transformation procedures once you define them.

If you developed a procedure in C, you have to [load it into
Memgraph](manage-streams.md#how-to-create-and-load-a-transformation-module-into-memgraph)
first, and then you will be able to see it in the **Query Modules** section and
attach it to a stream.

Check the transformation module for MovieLens on [Awesome Data
Stream](https://awesomedata.stream/#/movielens).

## How to set Kafka configuration parameters?

If necessary, add Kafka configuration parameters to customize the stream further:

1. In the Kafka Configuration Parameters section, click **+ Add parameter field**.
2. Insert the parameter name and value.
3. To add another parameter, click **+ Add parameter field** again.
4.
**Save Configuration** once you have set all parameters.

To connect to the [Awesome Data Stream](https://awesomedata.stream/), you need to set
the following Kafka configuration parameters:

* **sasl.username** \| public
+* **sasl.password** \| public
+* **security.protocol** \| SASL_PLAINTEXT
+* **sasl.mechanism** \| PLAIN
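For reference, these four settings are standard librdkafka configuration property names, so any Kafka client can consume the same stream with an equivalent configuration. A sketch of such a configuration dictionary (the bootstrap server is the one used for the MovieLens stream, and the consumer group shown is a placeholder you must replace with a unique value):

```python
# librdkafka-style configuration mirroring the parameters above.
consumer_config = {
    "bootstrap.servers": "get.awesomedata.stream:9093",
    "group.id": "replace-with-your-unique-consumer-group",  # placeholder
    "sasl.username": "public",
    "sasl.password": "public",
    "security.protocol": "SASL_PLAINTEXT",
    "sasl.mechanism": "PLAIN",
}

# The SASL/security keys are exactly the ones entered in Memgraph Lab.
auth_keys = sorted(k for k in consumer_config
                   if k.startswith(("sasl.", "security.")))
print(auth_keys)
```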
## How to connect Memgraph to the stream and start ingesting the data?

Once the stream is configured, you can **Connect to Stream**.

Memgraph will do a series of checks, ensuring that defined topics and
transformation procedures are correctly configured. If all checks pass
successfully, you can **Start the stream**. Once you start the stream, you will
no longer be able to change any of the configuration settings, just the
transformation module.

The stream status changes to **Running**, and data is ingested into Memgraph.
You can see the number of nodes and relationships rising as the data keeps
coming in. If the node and relationship counts stay at zero, check the
transformation module, as there might be a flaw in the logic that needs to be
resolved.

Switch to **Query Execution** and run a query to visualize the data coming in:

```cypher
MATCH p=(n)-[r]-(m)
RETURN p LIMIT 100;
```

## How to manage a stream?

To manage a stream in Memgraph Lab, go to **Streams** and click on the stream
you want to manage.

### How to start, stop or delete a stream?

To start a draft stream, click on **Connect to Stream**.

To stop or start a stream, click on **Stop Stream**/**Start Stream**.

To delete a stream, click on **Delete Stream**.

### How to edit a stream?

You cannot edit a stream that has already been started; to make changes, create
a new stream. The only settings you can change afterwards are the transformation
module and the stream offset.

## How to change Kafka stream offset?

The Kafka stream offset can be changed using a query only:

```cypher
CALL mg.kafka_set_stream_offset(streamName, offset)
```

An offset of `-1` denotes the beginning offset available for the given
topic/partition.

An offset of `-2` denotes the end of the stream, so only the
next produced message will be consumed.
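If you generate these offset queries programmatically, for example from a script that replays a stream from the beginning, a small helper keeps the sentinel values straight. This is an illustrative sketch, not part of any Memgraph client API:

```python
BEGINNING = -1  # start of the stream
END = -2        # logical end; only the next produced message is consumed

def kafka_set_offset_query(stream_name: str, offset: int) -> str:
    """Build a mg.kafka_set_stream_offset call for the given stream."""
    if offset < END:
        raise ValueError("offset must be -1, -2, or a non-negative number")
    return f'CALL mg.kafka_set_stream_offset("{stream_name}", {offset})'

print(kafka_set_offset_query("movielens", BEGINNING))
# → CALL mg.kafka_set_stream_offset("movielens", -1)
```

The resulting string can be executed through any Bolt client connected to Memgraph.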
\ No newline at end of file
diff --git a/docs2/data-streams/manage-streams-query.md b/docs2/data-streams/manage-streams-query.md
new file mode 100644
index 00000000000..0a8a9e6a945
--- /dev/null
+++ b/docs2/data-streams/manage-streams-query.md
@@ -0,0 +1,150 @@
---
id: manage-streams-query
title: Manage data streams with queries
sidebar_label: Manage data streams with queries
---

This page explains how to manage streams using queries. Streams can
also be [managed through the **Stream** section in Memgraph
Lab](/how-to-guides/streams/manage-streams-lab.md).

If you need a Kafka stream to play around with, we've provided some at [Awesome
Data Stream](https://awesomedata.stream/)!

[![Related - Reference Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/streams/overview.md) [![Related -
Tutorial](https://img.shields.io/static/v1?label=Related&message=Tutorial&color=008a00&style=for-the-badge)](/tutorials/graph-stream-processing-with-kafka.md)

## How to create and load a transformation module into Memgraph?

A [transformation
module](/reference-guide/streams/transformation-modules/overview.md) is a set of
user-defined transformation procedures written in
[C](/reference-guide/streams/transformation-modules/api/c-api.md) or
[Python](/reference-guide/streams/transformation-modules/api/python-api.md) that
act on data received from a streaming engine. Transformation procedures instruct
Memgraph on how to transform the incoming messages to consume them correctly.

To create a transformation module, you need to:

1. [Create a Python or a shared library file
   (module).](/reference-guide/streams/transformation-modules/overview.md#creating-a-transformation-module)
2. Save the file into Memgraph's `query_modules` or `internal_modules` directory (default:
   `/usr/lib/memgraph/query_modules` and `/var/lib/memgraph/internal_modules/`).
3.
Load the file into Memgraph either on startup (automatically) or by running a
   `CALL mg.load_all();` query.

If you are using Docker to run Memgraph, check [how to transfer the file into the container](/how-to-guides/work-with-docker.md#how-to-copy-files-from-and-to-a-docker-container).

If you are using Memgraph Lab, you can [create the transformation module within the
application](/reference-guide/streams/transformation-modules/overview.md#creating-transformation-modules-within-memgraph-lab).

## How to create a Kafka or Redpanda stream?

In order to create a stream with a query, first you need to [load the
transformation module into
Memgraph](#how-to-create-and-load-a-transformation-module-into-memgraph). The
most basic query for creating a stream is:

```cypher
CREATE KAFKA STREAM streamName
TOPICS topic1[, topic2, ...]
TRANSFORM transModule.transProcedure
BOOTSTRAP_SERVERS bootstrapServers;
```

Additional options for creating a stream are explained in the [reference
guide](/reference-guide/streams/overview.md#kafka-and-redpanda).

## How to create a Pulsar stream?

In order to create a stream with a query, first you need to [load the
transformation module into
Memgraph](#how-to-create-and-load-a-transformation-module-into-memgraph). The
most basic query for creating a stream is:

```cypher
CREATE PULSAR STREAM streamName
TOPICS topic1[, topic2, ...]
TRANSFORM transModule.transProcedure
SERVICE_URL serviceURL;
```

Additional options for creating a stream are explained in the [reference
guide](/reference-guide/streams/overview.md#pulsar).

## How to get information about a stream?

You can get the basic stream information with:

```cypher
SHOW STREAMS;
```

## How to check the transformed incoming data?

To see the results of the transformation module, use the `CHECK STREAM` clause.
It will consume messages from the last committed offset but won't commit the
offsets.
A newly created stream has no committed offsets, so by
default, the query will wait `30000` milliseconds (`30` seconds) for new
messages and after that throw a timeout exception. You can change the
timeout by adding the `TIMEOUT` sub-clause with a custom value to the query.

The following query will transform new messages that come from the stream within
60 seconds:

```cypher
CHECK STREAM myStream TIMEOUT 60000;
```

To consume more batches, increase the `BATCH_LIMIT`:

```cypher
CHECK STREAM myStream BATCH_LIMIT 3 TIMEOUT 60000;
```

## How to start, stop or delete a stream?

To start a specific stream called `streamName` that will consume `batchLimit`
number of batches for a maximum duration of `timeout` milliseconds and
then stop:

```cypher
START STREAM streamName [BATCH_LIMIT batchLimit] [TIMEOUT timeout];
```

To start a stream that will run for an infinite number of batches without a
timeout limit:

```cypher
START STREAM streamName;
```

To stop a stream:

```cypher
STOP STREAM streamName;
```

To delete a stream:

```cypher
DROP STREAM streamName;
```

For more options, [check the reference guide](/reference-guide/streams/overview.md#start-a-stream).

## How to change Kafka stream offset?

Use the following query to change the Kafka stream offset:

```cypher
CALL mg.kafka_set_stream_offset(streamName, offset)
```

An offset of `-1` denotes the beginning offset available for the given
topic/partition.

An offset of `-2` denotes the end of the stream, so only the
next produced message will be consumed.
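Going back to `START STREAM`: its optional sub-clauses compose in a fixed order, with `BATCH_LIMIT` before `TIMEOUT`, and `TIMEOUT` only makes sense together with `BATCH_LIMIT`. A small query-building sketch (illustrative only, not a Memgraph API) captures that rule:

```python
from typing import Optional

def start_stream_query(name: str, batch_limit: Optional[int] = None,
                       timeout_ms: Optional[int] = None) -> str:
    """Compose START STREAM with its optional sub-clauses."""
    if timeout_ms is not None and batch_limit is None:
        raise ValueError("TIMEOUT can only be used together with BATCH_LIMIT")
    parts = [f"START STREAM {name}"]
    if batch_limit is not None:
        parts.append(f"BATCH_LIMIT {batch_limit}")
    if timeout_ms is not None:
        parts.append(f"TIMEOUT {timeout_ms}")
    return " ".join(parts) + ";"

print(start_stream_query("myStream", batch_limit=3, timeout_ms=60000))
# → START STREAM myStream BATCH_LIMIT 3 TIMEOUT 60000;
```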
\ No newline at end of file
diff --git a/docs2/data-streams/transformation-modules/c-api.md b/docs2/data-streams/transformation-modules/c-api.md
new file mode 100644
index 00000000000..5f9db2ac5eb
--- /dev/null
+++ b/docs2/data-streams/transformation-modules/c-api.md
@@ -0,0 +1,323 @@
---
id: c-api
title: Transformation modules C API
sidebar_label: C API
---

This is the C API documentation for `mg_procedure.h` that contains declarations
of all functions that can be used to implement a transformation. This source
file can be found in the Memgraph installation directory, under
`include/memgraph`. On the standard Debian installation, this will be under
`/usr/include/memgraph`.

:::caution

**NOTE:** This part of the documentation is still under development. An updated
version will soon be available.

:::

:::tip

For an example of how to implement transformation modules in C, check out the
[transformation module example](#transformation-module-example).

:::

## Types

| | Name |
| -------------- | -------------- |
| typedef void (\*)(const struct mgp_messages \*, const struct mgp_graph \*, struct mgp_result \*, struct mgp_memory \*); | **[mgp_trans_cb](#typedef-mgp_trans_cb)**
Entry-point for a transformation with a fixed result type | + +Each record of the result must contain the following fields: +* the `query` field with a Cypher query as a string that will be executed against the database +* the `parameters` field with the optional query parameters as a nullable map + +## Functions + +| | Name | +| -------------- | -------------- | +| size_t | **[mgp_messages_size](#function-mgp_messages_size)**(const struct mgp_messages \*messages)
Get the number of messages contained in the messages list. | +| const struct mgp_message \* | **[mgp_messages_at](#function-mgp_messages_at)**(const struct mgp_messages \*messages, size_t idx)
Get the mgp_message at index idx. | +| size_t | **[mgp_message_payload_size](#function-mgp_message_payload_size)**(const struct mgp_message \*message)
Get the payload size of message. | +| const char \* | **[mgp_message_payload](#function-mgp_message_payload)**(const struct mgp_message \*message)
Get the payload of message as a byte array. | | const char \* | **[mgp_message_topic_name](#function-mgp_message_topic_name)**(const struct mgp_message \*message)
Get the topic name of message. | +| size_t | **[mgp_message_key_size](#function-mgp_message_key_size)**(const struct mgp_message \*message)
Get key size of message. | +| const char \* | **[mgp_message_key](#function-mgp_message_key)**(const struct mgp_message \*message)
Get key of message as a byte array. | | int64_t | **[mgp_message_timestamp](#function-mgp_message_timestamp)**(const struct mgp_message \*message)
Get the timestamp of message. | +| int | **[mgp_module_add_transformation](#function-mgp_module_add_transformation)**(struct mgp_module \*module, const char \*name, mgp_trans_cb cb)
Registers a transformation to a module |

## Types Documentation

### typedef mgp_trans_cb

```cpp
typedef void(* mgp_trans_cb) (const struct mgp_messages *, const struct mgp_graph *, struct mgp_result *, struct mgp_memory *);
```

Entry-point for a transformation invoked through a stream.
Passed in arguments will not live longer than the callback's execution. Therefore,
you must not store them globally or use the passed in `mgp_memory` to allocate global resources.
The result type of the transformation is fixed.

## Functions Documentation

### function mgp_messages_size

```cpp
size_t mgp_messages_size(
    const struct mgp_messages* messages
)
```
Returns the total number of messages contained in the argument `messages`.

### function mgp_messages_at

```cpp
mgp_message* mgp_messages_at(
    const struct mgp_messages* messages,
    size_t idx
)
```
Accessor function that returns the underlying `message` stored at index `idx` in `messages`.
The index supplied must reside in the half-open interval [0, `mgp_messages_size(messages)`).

### function mgp_message_payload_size

```cpp
size_t mgp_message_payload_size(
    const struct mgp_message* message
)
```
Returns the payload size of the argument `message`.

### function mgp_message_payload

```cpp
const char * mgp_message_payload(
    const struct mgp_message* message
)
```
Returns the payload of the argument `message` as a byte array with size `mgp_message_payload_size(message)`.

### function mgp_message_topic_name

```cpp
const char * mgp_message_topic_name(
    const struct mgp_message* message
)
```
Returns the topic name of the argument `message`. The topic name is `NULL` terminated.

### function mgp_message_key_size

```cpp
size_t mgp_message_key_size(
    const struct mgp_message* message
)
```
Returns the key size of the argument `message`.
### function mgp_message_key

```cpp
const char * mgp_message_key(
    const struct mgp_message* message
)
```
Returns the key of the argument `message` as a byte array with size `mgp_message_key_size(message)`.

### function mgp_message_timestamp

```cpp
int64_t mgp_message_timestamp(
    const struct mgp_message* message
)
```
Returns the timestamp of the argument `message`.

### function mgp_module_add_transformation

```cpp
int mgp_module_add_transformation(
    struct mgp_module *module,
    const char *name,
    mgp_trans_cb cb
)
```
Registers a transformation to a module. The `name` must be a sequence of digits, underscores,
lowercase, and uppercase Latin letters. The `name` must begin with a non-digit character.
Note that Unicode characters are not allowed. Additionally, the `name` is case-sensitive.

Returns zero (`MGP_ERROR_NO_ERROR`) if the transformation is added successfully. Registering
might fail if memory cannot be allocated for the transformation, if `name` is not
valid, if a transformation with the same name is already registered, or if any
other unexpected failure happens.

## Transformation module example

Transformations can be implemented in C/C++ using the C API provided by
Memgraph. Such modules need to be compiled to a shared library so that they can
be loaded when Memgraph starts. This means that you can write the
transformations in any programming language which can work with C and can be
compiled to the ELF shared library format.

In this chapter, we assume that Memgraph is installed on a standard Debian or
Ubuntu machine where the necessary header file can be found under
`/usr/include/memgraph`. For other installations, the header file can be found
under the `include/memgraph` folder in the Memgraph installation directory.

As we already discussed how transformations work in the Python example, we
won't go over the transformation itself in detail.
Also, to keep the
complexity of this example low, this transformation doesn't use the query
parameters.

So let's create `c_transformation.cpp` and start to populate it!

```cpp
#include <stdexcept>
#include <string>

#include "mg_procedure.h"

const std::string query_part_1{"CREATE (n:MESSAGE {timestamp: '"};
const std::string query_part_2{"', payload: '"};
const std::string query_part_3{"', topic: '"};
const std::string query_part_4{"'})"};

std::string create_query(mgp_message &message, struct mgp_result *result) {
  int64_t timestamp{0};
  if (mgp_error::MGP_ERROR_NO_ERROR !=
      mgp_message_timestamp(&message, &timestamp)) {
    throw std::runtime_error{"Internal error!"};
  }

  const char *payload{nullptr};
  if (mgp_error::MGP_ERROR_NO_ERROR !=
      mgp_message_payload(&message, &payload)) {
    throw std::runtime_error{"Internal error!"};
  }

  size_t payload_size{0};
  if (mgp_error::MGP_ERROR_NO_ERROR !=
      mgp_message_payload_size(&message, &payload_size)) {
    throw std::runtime_error{"Internal error!"};
  }

  const char *topic_name{nullptr};
  if (mgp_error::MGP_ERROR_NO_ERROR !=
      mgp_message_topic_name(&message, &topic_name)) {
    throw std::runtime_error{"Internal error!"};
  }

  return query_part_1 + std::to_string(timestamp) + query_part_2 +
         std::string{payload, payload_size} + query_part_3 + topic_name +
         query_part_4;
}

void my_c_transformation(struct mgp_messages *messages, mgp_graph *,
                         mgp_result *result, mgp_memory *memory) {

  mgp_value *null_value{nullptr};

  try {
    size_t messages_size{0};
    if (mgp_error::MGP_ERROR_NO_ERROR !=
        mgp_messages_size(messages, &messages_size)) {
      return;
    }

    if (mgp_error::MGP_ERROR_NO_ERROR !=
        mgp_value_make_null(memory, &null_value)) {
      return;
    }

    for (size_t i = 0; i < messages_size; ++i) {

      mgp_message *message{nullptr};
      if (mgp_error::MGP_ERROR_NO_ERROR !=
          mgp_messages_at(messages, i, &message)) {
        break;
      }

      const auto query = create_query(*message, result);

      mgp_result_record *record{nullptr};
      if (mgp_error::MGP_ERROR_NO_ERROR !=
          mgp_result_new_record(result, &record)) {
        break;
      }

      mgp_value *query_value{nullptr};
      if (mgp_error::MGP_ERROR_NO_ERROR !=
          mgp_value_make_string(query.c_str(), memory, &query_value)) {
        break;
      }

      auto mgp_result = mgp_result_record_insert(record, "query", query_value);
      mgp_value_destroy(query_value);

      if (mgp_error::MGP_ERROR_NO_ERROR != mgp_result) {
        static_cast<void>(
            mgp_result_set_error_msg(result, "Couldn't insert field"));
        break;
      }

      mgp_result = mgp_result_record_insert(record, "parameters", null_value);
      if (mgp_error::MGP_ERROR_NO_ERROR != mgp_result) {
        static_cast<void>(
            mgp_result_set_error_msg(result, "Couldn't insert field"));
        break;
      }
    }
    mgp_value_destroy(null_value);
  } catch (const std::exception &e) {
    mgp_value_destroy(null_value);
    static_cast<void>(mgp_result_set_error_msg(result, e.what()));
    return;
  }
}
```

Now we have to register the transformation in the `mgp_init_module` function:

```cpp
extern "C" int mgp_init_module(mgp_module *module, mgp_memory *memory) {

  return mgp_error::MGP_ERROR_NO_ERROR !=
         mgp_module_add_transformation(module, "my_c_transformation",
                                       my_c_transformation);
}
```

Now let's compile it:

```shell
clang++ --std=c++17 -Wall -shared -fPIC -I /usr/include/memgraph c_transformation.cpp -o c_transformation.so
```

After copying the resulting `c_transformation.so` to the
`/usr/lib/memgraph/query_modules` or `/var/lib/memgraph/internal_modules` directory, we can reload the modules and check
if Memgraph found our newly created transformation:

```cypher
CALL mg.load_all();
```

Then the transformation should show up in the list of transformations:

```cypher
CALL mg.transformations() YIELD *;
```

You should see something like this:

```plaintext
+-------------------------------------------+-------------------------------------------------------+-------------+
| name                                      | path                                                  | is_editable |
+-------------------------------------------+-------------------------------------------------------+-------------+
| "c_transformation.my_c_transformation"    | "/usr/lib/memgraph/query_modules/c_transformation.so" | false       |
| "transformation.my_transformation"        | "/usr/lib/memgraph/query_modules/transformation.py"   | true        |
+-------------------------------------------+-------------------------------------------------------+-------------+
```
diff --git a/docs2/data-streams/transformation-modules/python-api.md b/docs2/data-streams/transformation-modules/python-api.md
new file mode 100644
index 00000000000..de1bcd2c8bb
--- /dev/null
+++ b/docs2/data-streams/transformation-modules/python-api.md
@@ -0,0 +1,480 @@
---
id: python-api
title: Transformations Python API
sidebar_label: Python API
---

This is the additional API documentation for `mgp.py`, which contains
definitions of the public transformation Python API provided by Memgraph. At the
core, this is a wrapper around the **[C API](c-api.md)**. This source file can
be found in the Memgraph installation directory, under `python_support`. On the
standard Debian installation, this will be under
`/usr/lib/memgraph/python_support`.

:::caution

**NOTE:** This part of the documentation is still under development. An updated
version will soon be available.

:::

:::tip

For an example of how to implement transformation modules in Python with Memgraph Lab, check out
this [tutorial](/tutorials/graph-stream-processing-with-kafka.md#2-create-a-transformation-module).

Below, you can find [transformation examples of different format
messages](#transformation-examples-of-different-format-messages) such as JSON,
Avro and Protobuf.

:::

## `mgp.transformation(func)`
Transformation modules in Python have to follow certain rules in order to work:
1. The transformation module is a Python function
2. The function has to be *decorated* with a **@mgp.transformation** decorator
3.
The function can have 1 or 2 arguments
+   - one of type `mgp.Messages` (required)
+   - one of type `mgp.TransCtx` (optional)
+4. The function has to return an `mgp.Record` in the following form:
+   - `mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map])`
+   - the return type can also be an **iterable** of `mgp.Record`s, but not a
+     generator
+
+### Examples
+```python
+import mgp
+
+@mgp.transformation
+def transformation(context: mgp.TransCtx,
+                   messages: mgp.Messages
+                   ) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+
+    for i in range(messages.total_messages()):
+        message = messages.message_at(i)
+        payload_as_str = message.payload().decode("utf-8")
+        result_queries.append(mgp.Record(
+            query=f"CREATE (n:MESSAGE {{timestamp: '{message.timestamp()}', payload: '{payload_as_str}', topic: '{message.topic_name()}'}})",
+            parameters=None))
+
+    return result_queries
+```
+This transformation extracts the interesting members of each `mgp.Message` and
+stores them in a query `Record`, which wraps a `CREATE` clause with all the
+interesting members (timestamp, payload, etc.) and an empty parameter list.
+
+Any errors can be reported by raising an exception.
+
+## `class mgp.Message(message)`
+Bases: `object`
+
+Represents a single message. You shouldn't store a `Message` globally.
+
+### `is_valid()`
+Returns true if the underlying `mgp.message` object is valid and can be
+accessed.
+
+### `payload()`
+Returns the payload of the message. Raises an `InvalidMessageError` if
+`is_valid()` is false.
+
+### `topic_name()`
+Returns the topic name of the underlying `mgp.message`. Raises an
+`InvalidMessageError` if `is_valid()` is false.
+
+### `key()`
+Returns the key of the underlying `mgp.message` as bytes. Raises an
+`InvalidMessageError` if `is_valid()` is false.
+
+### `timestamp()`
+Returns the timestamp of the underlying `mgp.message`. Raises an
+`InvalidMessageError` if `is_valid()` is false.
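The accessors above are easy to picture outside of Memgraph. The sketch below is plain Python (no `mgp` import is needed to run it): `FakeMessage` is a hypothetical stand-in that mimics only the documented accessors, and the `(query, parameters)` pair shows one way a transformation can pass message fields through `parameters` instead of interpolating them into the query text, which avoids quoting issues in arbitrary payloads.

```python
from dataclasses import dataclass

# Hypothetical stand-in for mgp.Message; the real object is created by Memgraph.
@dataclass
class FakeMessage:
    _payload: bytes
    _topic: str
    _timestamp: int

    def payload(self) -> bytes:
        return self._payload

    def topic_name(self) -> str:
        return self._topic

    def timestamp(self) -> int:
        return self._timestamp

msg = FakeMessage(b"hello", "topic_a", 1656940800)

# Values are referenced as $-parameters in the query and supplied separately.
query = "CREATE (n:MESSAGE {timestamp: $timestamp, payload: $payload, topic: $topic})"
parameters = {
    "timestamp": msg.timestamp(),
    "payload": msg.payload().decode("utf-8"),
    "topic": msg.topic_name(),
}
print(parameters["payload"])  # -> hello
```

In an actual transformation, the pair would be wrapped as `mgp.Record(query=..., parameters=...)` instead of kept in local variables.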
+
+## `class mgp.Messages(messages)`
+Bases: `object`
+
+Represents a list of messages passed to a transformation. You shouldn't store
+`Messages` globally.
+
+### `is_valid()`
+Returns true if the underlying `mgp.messages` object is valid and can be
+accessed.
+
+### `total_messages()`
+Returns the number of `mgp.messages` contained. Raises `InvalidMessagesError` if
+`is_valid()` is false.
+
+### `message_at(id)`
+Returns the underlying `mgp.message` at index `id`. Raises
+`InvalidMessagesError` if `is_valid()` is false.
+
+## `class mgp.TransCtx(graph)`
+Bases: `object`
+
+Context of a transformation being executed.
+
+Access to a `TransCtx` is only valid during a single execution of a
+transformation. You shouldn't store a `TransCtx` globally.
+
+### `graph()`
+Returns the graph. Raises `InvalidContextError` if the context is invalid.
+
+### `is_valid()`
+Returns true if the context is valid and can be accessed.
+
+## Transformation examples of different format messages
+
+If you are using Kafka or Redpanda, below are transformation examples of
+messages in the most common formats:
+
+- **[JSON](#json)**
+- **[Avro](#avro)**
+- **[Protobuf](#protobuf)**
+
+Once the transformation procedures have been written, the module needs to be
+loaded into Memgraph.
+
+### JSON
+
+[JSON](https://www.json.org/json-en.html) (JavaScript Object Notation) is an
+open standard file format and data interchange format that uses human-readable
+text to store and transmit data objects consisting of attribute-value pairs and
+arrays (or other serializable values). It is a common data format with a diverse
+range of functionality in data interchange, including communication of web
+applications with servers.
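Decoding a JSON payload is a single call, which is why the transformations below stay short. The sample record here is illustrative, but the mechanics are exactly what the code below relies on:

```python
import json

# A Kafka/Redpanda message body arrives as bytes; json.loads accepts bytes
# directly and returns a plain dict keyed by the attribute names.
payload = b'{"id": 1, "name": "Alice", "address": "Main St 1", "mail": "alice@example.com"}'
person = json.loads(payload)
print(person["name"])  # -> Alice
```

The resulting dict is what the transformations below index as `message_json["id"]`, `message_json["name"]`, and so on.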
+
+Let's assume we have the following schemas coming out of three topics:
+
+```json
+person = {
+    "id": int,
+    "name": str,
+    "address": str,
+    "mail": str,
+}
+company = {
+    "id": int,
+    "name": str,
+    "address": str,
+    "mail": str,
+}
+works_at = {
+    "person_id": int,
+    "company_id": int,
+    "start_date": date,
+}
+```
+
+The procedures within the Python transformation module that will transform the incoming
+data into Cypher queries would look like this:
+
+```python
+import mgp
+import json
+
+@mgp.transformation
+def person_transformation(messages: mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+    for i in range(messages.total_messages()):
+        message = messages.message_at(i)
+        message_json = json.loads(message.payload())
+        result_queries.append(mgp.Record(
+            query=f'''MERGE (p:Person {{ id: ToInteger({message_json["id"]}), name: "{message_json["name"]}",
+            address: "{message_json["address"]}", mail: "{message_json["mail"]}" }})''',
+            parameters=None
+        ))
+    return result_queries
+
+@mgp.transformation
+def company_transformation(messages: mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+    for i in range(messages.total_messages()):
+        message = messages.message_at(i)
+        message_json = json.loads(message.payload())
+        result_queries.append(mgp.Record(
+            query=f'''MERGE (c:Company {{ id: ToInteger({message_json["id"]}), name: "{message_json["name"]}",
+            address: "{message_json["address"]}", mail: "{message_json["mail"]}" }})''',
+            parameters=None
+        ))
+    return result_queries
+
+@mgp.transformation
+def employees_transformation(messages: mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+
+    for i in range(messages.total_messages()):
+        message = messages.message_at(i)
+        message_json = json.loads(message.payload())
+        result_queries.append(mgp.Record(
+            query=f'''MATCH (p:Person), (c:Company)
+            WHERE p.id = 
ToInteger({message_json["person_id"]}) AND c.id = ToInteger({message_json["company_id"]})
+            MERGE (p)-[:WORKS_AT {{start_date: date("{message_json["start_date"]}")}}]->(c)''',
+            parameters=None
+        ))
+
+    return result_queries
+```
+
+Upon creating three separate streams in Memgraph (one for each topic), and
+ingesting the data, the graph schema looks like this:
+
+
+
+If you need help writing transformation modules, check out [the tutorial on
+writing modules in
+Python](/tutorials/graph-stream-processing-with-kafka.md#2-create-a-transformation-module),
+and [an example of a C transformation
+procedure](/reference-guide/streams/transformation-modules/api/c-api.md#transformation-module-example).
+
+### Avro
+
+If you want to import your data in Memgraph using Apache Avro serialization, you
+need to know the [Avro
+schema](https://avro.apache.org/docs/current/gettingstartedpython.html#Defining+a+schema)
+of your data. This is necessary for deserializing the data. Each schema contains
+a single schema definition, so there should be a separate schema for each data
+representation you want to import into Memgraph.
+
+Avro data types will be flexibly mapped to the target schema, that is, Avro and
+openCypher types do not need to match exactly.
Use the table below for data type
+mappings:
+
+| Avro Data Type | Cypher Casting Function |
+|----------------|-------------------------|
+| bool           | toBoolean               |
+| float          | toFloat                 |
+| int            | toInteger               |
+
+Let's assume we have the following schemas coming out of three topics:
+
+```json
+profile_schema = """{
+    "namespace": "example.avro",
+    "name": "Person",
+    "type": "record",
+    "fields": [
+        {"name": "id", "type": "int"},
+        {"name": "name", "type": "string"},
+        {"name": "address", "type": "string"},
+        {"name": "mail", "type": "string"}
+    ]
+}"""
+
+company_schema = """{
+    "namespace": "example.avro",
+    "name": "Company",
+    "type": "record",
+    "fields": [
+        {"name": "id", "type": "int"},
+        {"name": "name", "type": "string"},
+        {"name": "address", "type": "string"},
+        {"name": "mail", "type": "string"}
+    ]
+}"""
+
+works_at_schema = """{
+    "namespace": "example.avro",
+    "name": "Works_At",
+    "type": "record",
+    "fields": [
+        {"name": "person_id", "type": "int"},
+        {"name": "company_id", "type": "int"},
+        {"name": "start_date", "type": "string"}
+    ]
+}
+"""
+```
+
+Data received by the Memgraph consumer is a byte array and needs to be
+deserialized.
The following method will deserialize data with the help of
+Confluent Kafka:
+
+```python
+from confluent_kafka.schema_registry import SchemaRegistryClient
+from confluent_kafka.schema_registry.avro import AvroDeserializer
+
+def process_record_confluent(record: bytes, src: SchemaRegistryClient, schema: str):
+    deserializer = AvroDeserializer(schema_str=schema, schema_registry_client=src)
+    return deserializer(record, None)  # returns dict
+```
+
+The procedures within the Python transformation module that will transform the incoming
+data into Cypher queries would look like this:
+
+```python
+import mgp
+
+@mgp.transformation
+def person_transformation(messages: mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+
+    for i in range(messages.total_messages()):
+        message_avro = messages.message_at(i)
+        msg_value = message_avro.payload()
+        message = process_record_confluent(msg_value, src=SchemaRegistryClient({'url': 'http://localhost:8081'}), schema=profile_schema)
+        result_queries.append(mgp.Record(
+            query=f'''MERGE (p:Person {{ id: ToInteger({message["id"]}), name: "{message["name"]}", address: "{message["address"]}", mail: "{message["mail"]}" }})''',
+            parameters=None
+        ))
+
+    return result_queries
+
+@mgp.transformation
+def company_transformation(messages: mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+
+    for i in range(messages.total_messages()):
+        message_avro = messages.message_at(i)
+        msg_value = message_avro.payload()
+        message = process_record_confluent(msg_value, src=SchemaRegistryClient({'url': 'http://localhost:8081'}), schema=company_schema)
+        result_queries.append(mgp.Record(
+            query=f'''MERGE (c:Company {{ id: ToInteger({message["id"]}), name: "{message["name"]}", address: "{message["address"]}", mail: "{message["mail"]}" }})''',
+            parameters=None
+        ))
+
+    return result_queries
+
+@mgp.transformation
+def works_at_transformation(messages: 
mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+
+    for i in range(messages.total_messages()):
+        message_avro = messages.message_at(i)
+        msg_value = message_avro.payload()
+        message = process_record_confluent(msg_value, src=SchemaRegistryClient({'url': 'http://localhost:8081'}), schema=works_at_schema)
+        result_queries.append(mgp.Record(
+            query=f'''MATCH (p:Person), (c:Company)
+            WHERE p.id = ToInteger({message["person_id"]}) AND c.id = ToInteger({message["company_id"]})
+            MERGE (p)-[:WORKS_AT {{start_date: date("{message["start_date"]}")}}]->(c)''',
+            parameters=None
+        ))
+
+    return result_queries
+```
+
+Upon creating three separate streams in Memgraph (one for each topic), and ingesting the data, the
+graph schema looks like this:
+
+
+
+### Protobuf
+
+Similar to Apache Avro,
+[Protobuf](https://developers.google.com/protocol-buffers) is a method of
+serializing structured data. A message format is defined in a `.proto` file, and
+from it you can generate code in many languages, including Java, Python, C++,
+C#, Go, and Ruby. Unlike Avro, Protobuf does not serialize the schema with the
+message. In order to deserialize the message, you need the schema in the
+consumer. A benefit of working with Protobuf is the option to define multiple
+messages in one `.proto` file.
+
+Let's assume we have the following schemas coming out of three topics:
+
+```protobuf
+syntax = "proto3";
+
+message Person {
+  int64 id = 1;
+  string name = 2;
+  string address = 3;
+  string mail = 4;
+}
+
+message Company {
+  int64 id = 1;
+  string name = 2;
+  string address = 3;
+  string mail = 4;
+}
+
+message WorksAt {
+  int64 person_id = 1;
+  int64 company_id = 2;
+  string start_date = 3;
+}
+```
+
+These message definitions make up the `.proto` file.
+Before making your transformation script, it is necessary to [generate
+code](https://developers.google.com/protocol-buffers/docs/pythontutorial#compiling-your-protocol-buffers)
+from the `.proto` file.
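Once `protoc` has generated the Python module, each message becomes a class whose proto fields are attributes, so transformation code reads `message.id` rather than `message["id"]` as in the JSON case. The sketch below illustrates that access pattern using a plain dataclass as a hypothetical stand-in, since the real class comes from the compiler:

```python
from dataclasses import dataclass

# Hypothetical stand-in for the protoc-generated Person class (person_pb2.Person):
# generated message classes expose each proto field as an attribute.
@dataclass
class Person:
    id: int
    name: str
    address: str
    mail: str

message = Person(id=1, name="Alice", address="Main St 1", mail="alice@example.com")

# Field access via attributes, exactly as the transformations below do.
query = f'MERGE (p:Person {{id: {message.id}, name: "{message.name}"}})'
print(query)  # -> MERGE (p:Person {id: 1, name: "Alice"})
```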
+
+Data received by the Memgraph consumer is a byte array and needs to be
+deserialized. The following method will help you deserialize your data with the
+help of Confluent Kafka:
+
+```python
+from confluent_kafka.schema_registry import SchemaRegistryClient
+from confluent_kafka.schema_registry.protobuf import ProtobufDeserializer
+
+import person_pb2  # proto file compiled into a Python module
+
+def process_record_protobuf(record: bytes, message_type) -> dict:
+    deserializer = ProtobufDeserializer(message_type)
+    return deserializer(record, None)
+```
+
+`message_type` corresponds to a message defined in the `.proto` file. This method
+should be added to the transformation module.
+
+The procedures within the Python transformation module that will transform the incoming
+data into Cypher queries would look like this:
+
+```python
+import mgp
+
+@mgp.transformation
+def person_transformation(messages: mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+    for i in range(messages.total_messages()):
+        message_pb = messages.message_at(i)
+        msg_value = message_pb.payload()
+        message = process_record_protobuf(msg_value, person_pb2.Person)
+        result_queries.append(mgp.Record(
+            query=f'''MERGE (p:Person {{ id: ToInteger({message.id}), name: "{message.name}", address: "{message.address}", mail: "{message.mail}" }})''',
+            parameters=None
+        ))
+
+    return result_queries
+
+@mgp.transformation
+def company_transformation(messages: mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+    for i in range(messages.total_messages()):
+        message_pb = messages.message_at(i)
+        msg_value = message_pb.payload()
+        message = process_record_protobuf(msg_value, person_pb2.Company)
+        result_queries.append(mgp.Record(
+            query=f'''MERGE (c:Company {{ id: ToInteger({message.id}), name: "{message.name}", address: "{message.address}", mail: "{message.mail}" }})''',
+            parameters=None
+        ))
+
+    return
result_queries
+
+@mgp.transformation
+def works_at_transformation(messages: mgp.Messages) -> mgp.Record(query=str, parameters=mgp.Nullable[mgp.Map]):
+    result_queries = []
+    for i in range(messages.total_messages()):
+        message_pb = messages.message_at(i)
+        msg_value = message_pb.payload()
+        message = process_record_protobuf(msg_value, person_pb2.WorksAt)
+        result_queries.append(mgp.Record(
+            query=f'''MATCH (p:Person), (c:Company)
+            WHERE p.id = ToInteger({message.person_id}) AND c.id = ToInteger({message.company_id})
+            MERGE (p)-[:WORKS_AT {{start_date: date("{message.start_date}")}}]->(c)''',
+            parameters=None
+        ))
+
+    return result_queries
+```
+
+Upon creating three separate streams in Memgraph (one for each topic), and ingesting the data, the
+graph schema looks like this:
+
+
\ No newline at end of file
diff --git a/docs2/data-streams/transformation-modules/transformation-modules.md b/docs2/data-streams/transformation-modules/transformation-modules.md
new file mode 100644
index 00000000000..fb43df40de0
--- /dev/null
+++ b/docs2/data-streams/transformation-modules/transformation-modules.md
@@ -0,0 +1,186 @@
+---
+id: overview
+title: Transformation modules
+sidebar_label: Transformation modules overview
+slug: /reference-guide/streams/transformation-modules
+---
+
+In order to connect Memgraph to a data stream, it needs to know how to transform
+the incoming messages in order to consume them correctly. This is done with a
+transformation module.
+
+[![Related -
+Tutorial](https://img.shields.io/static/v1?label=Related&message=Tutorial&color=008a00&style=for-the-badge)](/tutorials/graph-stream-processing-with-kafka.md#create-a-transformation-module)
+
+To create a transformation module, you need to:
+
+1. Create a [Python](./api/python-api.md) or a [shared library](./api/c-api.md)
+   file (module).
+2. Save the file into Memgraph's `query_modules` or `internal_modules` directory (default:
+   `/usr/lib/memgraph/query_modules` and `/var/lib/memgraph/internal_modules/`).
+3.
Load the file into Memgraph either on startup (automatically) or by running a
+   `CALL mg.load_all();` query.
+
+If you are using Memgraph Lab, you can [create a transformation module within the
+application](#creating-transformation-modules-within-memgraph-lab).
+
+## Creating a transformation module
+
+Memgraph supports user-defined transformation procedures written in **Python**
+and **C** that act on data received from a streaming engine. These
+transformation procedures are grouped into a module called a **transformation
+module**, which is then loaded into Memgraph on startup or later on. A
+transformation module consists of a transformation, a query procedure, or both.
+
+Currently, we support transformations for Kafka, Pulsar and Redpanda
+streams.
+
+The available API references are:
+
+- **[C API](./api/c-api.md)**
+- **[Python API](./api/python-api.md)**
+
+For examples of transformation modules, check out [the tutorial on implementing a
+Python transformation
+module](/tutorials/graph-stream-processing-with-kafka.md#2-create-a-transformation-module),
+[Python transformation
+examples](/reference-guide/streams/transformation-modules/api/python-api.md#transformation-examples-of-different-format-messages)
+of different format messages or [an example of a transformation module written
+in C](./api/c-api.md#transformation-module-example).
+
+## Loading modules
+
+Modules can be loaded on startup or when the instance is already running.
+
+### Loading on startup
+
+Memgraph attempts to load the modules from all `*.so` and `*.py` files it finds
+in the default (`/usr/lib/memgraph/query_modules` and
+`/var/lib/memgraph/internal_modules/`) directories. The `*.so` modules
+are written using the C API and the `*.py` modules are written using the Python
+API. Each file corresponds to one module. The names of these files will be mapped to
For example, `hello.so` will be mapped to the `hello` module and a +`py_hello.py` script will be mapped to the `py_hello` module. + +If you want to change the directory in which Memgraph searches for +transformation modules, change or extend the `--query-modules-directory` +flag in the main configuration file (`/etc/memgraph/memgraph.conf`) or supply it +as a command-line parameter (e.g., when using Docker). + +:::caution + +Please remember that if you are using Memgraph Platform image, you should pass +configuration flags within MEMGRAPH environment variable (e.g. `docker run -p +7687:7687 -p 7444:7444 -p 3000:3000 -e MEMGRAPH="--log-level=TRACE +--query-modules-directory=path/path" memgraph/memgraph-platform`) and if you +are using any other image you should pass them as arguments after the image name +(e.g., `... memgraph/memgraph-mage --log-level=TRACE +--query-modules-directory=path/path`). + +::: + +
+ Transfer transformation module into a Docker container + + If you are using Docker to run Memgraph, you will need to copy the + transformation module file from your local directory into the Docker + container where Memgraph can access it. + +

+
+**1.** Open a new terminal and find the `CONTAINER ID` of the Memgraph Docker
+container:
+
+```
+docker ps
+```
+
+**2.** Copy the file from your current directory to the container with the
+command, replacing `CONTAINER_ID` with the value from the first step:
+
+```
+docker cp ./file_name.py CONTAINER_ID:/usr/lib/memgraph/query_modules/file_name.py
+```
+
+The file is now inside your Docker container.
+
+
+### Loading while the instance is already running
+
+To load a specific transformation module from the `*.so` and `*.py` files that
+were added to the default directories (`/usr/lib/memgraph/query_modules` and
+`/var/lib/memgraph/internal_modules/`) while the instance was already running, use:
+
+```
+CALL mg.load(module_name);
+```
+
+To load all transformation modules, use:
+
+```
+CALL mg.load_all();
+```
+
+## Creating transformation modules within Memgraph Lab
+
+If you are using Memgraph Lab to connect to the database instance, you can
+create the transformation module within the application:
+
+1. Go to **Query Modules** and click on **+ New Module**.
+2. Give the transformation module a name and **Create** it.
+3. Write the transformation procedures and click **Save & Close**.
+
+You will see the signature and overview of the transformation procedure that you
+can now use while [creating a new
+stream](/import-data/data-streams/manage-streams-lab.md).
+
+## Utility procedures for transformations
+
+Query procedures that allow you to gain more insight into modules and
+transformations are written under our utility `mg` query module. For
+transformations, this module offers:
+
+| Procedure                                  | Description                          |
+| ------------------------------------------ | ------------------------------------ |
+| `mg.transformations() :: (name :: STRING)` | Lists all transformation procedures. |
+| `mg.load(module_name :: STRING) :: ()`     | Loads or reloads the given module.   |
+| `mg.load_all() :: ()`                      | Loads or reloads all modules.
| + +For example, you can invoke `mg.transformations()` from mgconsole or Memgraph +Lab with the following command: + +```cypher +CALL mg.transformations() YIELD *; +``` + +This will yield the following result: + +```nocopy ++-------------------------------------------+-------------------------------------------------------+-------------+ +| name | path | is_editable | ++-------------------------------------------+-------------------------------------------------------+-------------+ +| "batch.transform" | "/usr/lib/memgraph/query_modules/batch.py" | true | ++-------------------------------------------+-------------------------------------------------------+-------------+ +``` + +To load a module (named e.g. `hello`) that wasn't loaded on startup (probably +because it was added to Memgraph's directory once Memgraph was already running), +you can invoke: + +```cypher +CALL mg.load("hello"); +``` + +If you wish to reload an existing module, say the `hello` module above, use the +same procedure: + +```cypher +CALL mg.load("hello"); +``` + +To reload all existing modules and load any newly added ones, use: + +```cypher +CALL mg.load_all(); +``` diff --git a/docs2/data-visualization/data-visualization.md b/docs2/data-visualization/data-visualization.md new file mode 100644 index 00000000000..8594eaedd9f --- /dev/null +++ b/docs2/data-visualization/data-visualization.md @@ -0,0 +1,65 @@ +# Data visualization in Memgraph Lab + +**Memgraph Lab** is a lightweight and intuitive **visual user interface** that +enables you to: + +- visualize graph data using [the Orb library](https://github.com/memgraph/orb) +- write and execute Cypher queries +- import and export data +- manage stream connections +- view and optimize query performance +- develop query modules in Python + +It was designed to help you with every stage of your learning process and graph +development. 
+ +memgraph_lab_screenshot + +## Quick start + +If you would like to query a running Memgraph database instance using **Memgraph +Lab**, be sure to: + +### 1. Install Memgraph Platform or Memgraph Lab + +We recommend you [install **Memgraph Platform**](/memgraph/installation) and get +the complete streaming graph application platform that includes **MemgraphDB**, +command-line tool **mgconsole**, visual user interface **Memgraph Lab** running +within the browser and **MAGE** - graph algorithms and modules library. + +If you already have a running Memgraph database instance, the web application is +available at https://lab.memgraph.com/ or you can install Memgraph Lab as a +desktop application on [Windows](/installation/windows.md), +[macOS](/installation/macos.md) or [Linux](/installation/linux.md). + +### 2. Connect to Memgraph + +[Connect Memgraph Lab to Memgraph](/connect-to-memgraph.md) and start +experimenting with data and Cypher. + +### 3. Check out Graph Style Script + +To give your graphs a bit more pizzazz, dive into the [Graph Style Script +language](/style-script/overview.md) and learn how to customize the visual +appearance of your graphs to make them truly remarkable. + +### 4. Browse through the Changelog + +Want to know what's new in Memgraph Lab? Take a look at +[Changelog](/changelog.md) to see a list of new features. + +## What to do next? + +You can also execute queries using Memgraph's command-line tool +[mgconsole](https://memgraph.com/docs/memgraph/connect-to-memgraph/mgconsole). + +Those who are new to querying can head out to our +[Tutorials](https://memgraph.com/docs/memgraph/tutorials) or play around on +[Playground](https://playground.memgraph.com/) to get a feeling of what is +possible to find out from data using graphs. The [Cypher +manual](https://memgraph.com/docs/cypher-manual/) will give you an overview of +clauses and functions to help you write awesome queries. 
+ +If you need more magic to enhance your graph power, look into [MAGE - Memgraph +Advanced Graph Extensions](https://memgraph.com/docs/mage) that will provide you +with various graph algorithms and modules in the form of query modules. \ No newline at end of file diff --git a/docs2/data-visualization/graph-style-script/built-in-elements.md b/docs2/data-visualization/graph-style-script/built-in-elements.md new file mode 100644 index 00000000000..623b95be229 --- /dev/null +++ b/docs2/data-visualization/graph-style-script/built-in-elements.md @@ -0,0 +1,2184 @@ +# Built in elements + +## Colors + +Graph Style Script comes with built-in colors that you can use by their name. + +Example of using color names: + +```cpp +@NodeStyle { + color: aquamarine + color-hover: Darker(cyan) +} +``` + +Example of using color codes: + +```cpp +@NodeStyle { + color: #7FFFD4 + color-hover: Darker(#00FFFF) +} +``` + +The color names come from a list of the [X11 +colors](https://www.w3.org/TR/css-color-3/#svg-color) supported by popular +browsers with the addition of gray/grey variants from SVG 1.0. 
+
+| Color name           | HEX code |
+| -------------------- | -------- |
+| aliceblue            | #F0F8FF  |
+| antiquewhite         | #FAEBD7  |
+| aqua                 | #00FFFF  |
+| aquamarine           | #7FFFD4  |
+| azure                | #F0FFFF  |
+| beige                | #F5F5DC  |
+| bisque               | #FFE4C4  |
+| black                | #000000  |
+| blanchedalmond       | #FFEBCD  |
+| blue                 | #0000FF  |
+| blueviolet           | #8A2BE2  |
+| brown                | #A52A2A  |
+| burlywood            | #DEB887  |
+| cadetblue            | #5F9EA0  |
+| chartreuse           | #7FFF00  |
+| chocolate            | #D2691E  |
+| coral                | #FF7F50  |
+| cornflowerblue       | #6495ED  |
+| cornsilk             | #FFF8DC  |
+| crimson              | #DC143C  |
+| cyan                 | #00FFFF  |
+| darkblue             | #00008B  |
+| darkcyan             | #008B8B  |
+| darkgoldenrod        | #B8860B  |
+| darkgray             | #A9A9A9  |
+| darkgreen            | #006400  |
+| darkgrey             | #A9A9A9  |
+| darkkhaki            | #BDB76B  |
+| darkmagenta          | #8B008B  |
+| darkolivegreen       | #556B2F  |
+| darkorange           | #FF8C00  |
+| darkorchid           | #9932CC  |
+| darkred              | #8B0000  |
+| darksalmon           | #E9967A  |
+| darkseagreen         | #8FBC8F  |
+| darkslateblue        | #483D8B  |
+| darkslategray        | #2F4F4F  |
+| darkslategrey        | #2F4F4F  |
+| darkturquoise        | #00CED1  |
+| darkviolet           | #9400D3  |
+| deeppink             | #FF1493  |
+| deepskyblue          | #00BFFF  |
+| dimgray              | #696969  |
+| dimgrey              | #696969  |
+| dodgerblue           | #1E90FF  |
+| firebrick            | #B22222  |
+| floralwhite          | #FFFAF0  |
+| forestgreen          | #228B22  |
+| fuchsia              | #FF00FF  |
+| gainsboro            | #DCDCDC  |
+| ghostwhite           | #F8F8FF  |
+| gold                 | #FFD700  |
+| goldenrod            | #DAA520  |
+| gray                 | #808080  |
+| green                | #008000  |
+| greenyellow          | #ADFF2F  |
+| grey                 | #808080  |
+| honeydew             | #F0FFF0  |
+| hotpink              | #FF69B4  |
+| indianred            | #CD5C5C  |
+| indigo               | #4B0082  |
+| ivory                | #FFFFF0  |
+| khaki                | #F0E68C  |
+| lavender             | #E6E6FA  |
+| lavenderblush        | #FFF0F5  |
+| lawngreen            | #7CFC00  |
+| lemonchiffon         | #FFFACD  |
+| lightblue            | #ADD8E6  |
+| lightcoral           | #F08080  |
+| lightcyan            | #E0FFFF  |
+| lightgoldenrodyellow | #FAFAD2  |
+| lightgray            | #D3D3D3  |
+| lightgreen           | 
#90EE90 | +| lightgrey | #D3D3D3 | +| lightpink | #FFB6C1 | +| lightsalmon | #FFA07A | +| lightseagreen | #20B2AA | +| lightskyblue | #87CEFA | +| lightslategray | #778899 | +| lightslategrey | #778899 | +| lightsteelblue | #B0C4DE | +| lightyellow | #FFFFE0 | +| lime | #00FF00 | +| limegreen | #32CD32 | +| linen | #FAF0E6 | +| magenta | #FF00FF | +| maroon | #800000 | +| mediumaquamarine | #66CDAA | +| mediumblue | #0000CD | +| mediumorchid | #BA55D3 | +| mediumpurple | #9370DB | +| mediumseagreen | #3CB371 | +| mediumslateblue | #7B68EE | +| mediumspringgreen | #00FA9A | +| mediumturquoise | #48D1CC | +| mediumvioletred | #C71585 | +| midnightblue | #191970 | +| mintcream | #F5FFFA | +| mistyrose | #FFE4E1 | +| moccasin | #FFE4B5 | +| navajowhite | #FFDEAD | +| navy | #000080 | +| oldlace | #FDF5E6 | +| olive | #808000 | +| olivedrab | #6B8E23 | +| orange | #FFA500 | +| orangered | #FF4500 | +| orchid | #DA70D6 | +| palegoldenrod | #EEE8AA | +| palegreen | #98FB98 | +| paleturquoise | #AFEEEE | +| palevioletred | #DB7093 | +| papayawhip | #FFEFD5 | +| peachpuff | #FFDAB9 | +| peru | #CD853F | +| pink | #FFC0CB | +| plum | #DDA0DD | +| powderblue | #B0E0E6 | +| purple | #800080 | +| red | #FF0000 | +| rosybrown | #BC8F8F | +| royalblue | #4169E1 | +| saddlebrown | #8B4513 | +| salmon | #FA8072 | +| sandybrown | #F4A460 | +| seagreen | #2E8B57 | +| seashell | #FFF5EE | +| sienna | #A0522D | +| silver | #C0C0C0 | +| skyblue | #87CEEB | +| slateblue | #6A5ACD | +| slategray | #708090 | +| slategrey | #708090 | +| snow | #FFFAFA | +| springgreen | #00FF7F | +| steelblue | #4682B4 | +| tan | #D2B48C | +| teal | #008080 | +| thistle | #D8BFD8 | +| tomato | #FF6347 | +| turquoise | #40E0D0 | +| violet | #EE82EE | +| wheat | #F5DEB3 | +| white | #FFFFFF | +| whitesmoke | #F5F5F5 | +| yellow | #FFFF00 | +| yellowgreen | #9ACD32 | + +## Functions + +Graph Style Script has a large number of built-in functions. 
With these
+functions, you can achieve the right style for your graph.
+
+## Color functions
+
+### `Darker(color)`
+
+Returns a darker version of the given color.
+
+Example:
+
+- `color-hover: Darker(#dd2222)` will make the hover event color darker.
+
+Inputs:
+
+- `color: Color`
+
+Outputs:
+
+- `Color`
+
+### `Lighter(color)`
+
+Returns a lighter version of the given color.
+
+Example:
+
+- `color-hover: Lighter(#dd2222)` sets a lighter color for the hover event.
+
+Inputs:
+
+- `color: Color`
+
+Outputs:
+
+- `Color`
+
+### `Mix(color1, color2)`
+
+Mixes given colors (performs linear interpolation).
+
+Example:
+
+- `Mix(#1B5E20, orange)`
+
+Inputs:
+
+- `color1: Color`
+- `color2: Color`
+
+Outputs:
+
+- `Color`
+
+### `Red(color)`
+
+Returns the red component of a given color. The value will be between 0 and 255
+(both inclusive).
+
+Examples:
+
+- `Red(mediumseagreen)` will return the value 60.
+- `Red(#6a0dad)` will return the value 106.
+
+Inputs:
+
+- `color: Color`
+
+Outputs:
+
+- `number`
+
+### `Green(color)`
+
+Returns the green component of a given color. The value will be between 0 and
+255 (both inclusive).
+
+Examples:
+
+- `Green(mediumseagreen)` will return the value 179.
+- `Green(#6a0dad)` will return the value 13.
+
+Inputs:
+
+- `color: Color`
+
+Outputs:
+
+- `number`
+
+### `Blue(color)`
+
+Returns the blue component of a given color. The value will be between 0 and 255
+(both inclusive).
+
+Examples:
+
+- `Blue(mediumseagreen)` will return the value 113.
+- `Blue(#6a0dad)` will return the value 173.
+
+Inputs:
+
+- `color: Color`
+
+Outputs:
+
+- `number`
+
+### `RGB(red, green, blue)`
+
+Creates a new color with given components.
+
+Example:
+
+- `RGB(128, 159, 255)` will return the color that has value #809fff.
+
+Inputs:
+
+- `red: number`
+- `green: number`
+- `blue: number`
+
+Outputs:
+
+- `Color`
+
+### `RGBA(red, green, blue, alpha)`
+
+Creates a new color with given components.
Same as `RGB` with an additional +`alpha` value (between 0 and 1) for transparency. + +Example: +- `RGBA(128, 159, 255, 0.2)` will return the color that has value #809fff33. + +Inputs: + +- `red: number` +- `green: number` +- `blue: number` +- `alpha: number` + +Outputs: + +- `Color` + +### `Hue(color)` + +Returns the hue (HSL) component of a given color. The value will +be between 0 and 359 (both inclusive). + +Example: + +- `Hue(aliceblue)` will return value 208. +- `Hue(#00FFFF)` will return value 180. + +Inputs: + +- `color: Color` + +Outputs: + +- `number` + +### `Saturation(color)` + +Returns the saturation (HSL) component of a given color. The value will +be between 0 and 100 (both inclusive). + +Example: + +- `Saturation(aliceblue)` will return value 100. +- `Saturation(#77a4ab)` will return value 24. + +Inputs: + +- `color: Color` + +Outputs: + +- `number` + +### `Lightness(color)` + +Returns the lightness (HSL) component of a given color. The value will +be between 0 and 100 (both inclusive). + +Example: + +- `Lightness(aliceblue)` will return value 97. +- `Lightness(#FFFF00)` will return value 50. + +Inputs: + +- `color: Color` + +Outputs: + +- `number` + +### `HSL(hue, saturation, lightness)` + +Creates a new color with given HSL (hue, saturation, lightness) values. Hue +value must be between 0 and 359 (both inclusive), saturation and lightness +values must be between 0 and 100 (both inclusive). + +Example: + +- `HSL(282, 23, 56)` will return the color that has value #9975a9. + +Inputs: + +- `hue: number` +- `saturation: number` +- `lightness: number` + +Outputs: + +- `Color` + +### `HSLA(hue, saturation, lightness, alpha)` + +Creates a new color with given components. Same as `HSL` with an additional +`alpha` value (between 0 and 1) for transparency. + +Example: + +- `HSLA(282, 23, 56, 0.2)` will return the color that has value #9975a933. 

Inputs:

- `hue: number`
- `saturation: number`
- `lightness: number`
- `alpha: number`

Outputs:

- `Color`

### `Alpha(color)`

Returns the `alpha` (transparency) component of a given color. The value will
be between 0 and 1 (both inclusive).

Examples:

- `Alpha(aliceblue)` will return value 1.
- `Alpha(#FFFF0033)` will return value 0.2.
- `Alpha(RGBA(128, 159, 255, 0.8))` will return value 0.8.
- `Alpha(HSLA(282, 23, 56, 0.2))` will return value 0.2.

Inputs:

- `color: Color`

Outputs:

- `number`

## Conditional functions

### `And(value...)`

Returns `True` if all the given values are `Truthy`. Returns `False` otherwise.
Expressions after the first expression that evaluates to `Falsy` are not
evaluated.

In GSS, there are six `Falsy` values: `False`, `0`, `""`, `Null`, `[]`
(empty array), and `{}` (empty map). Everything else is considered `Truthy`.

Example:

- `And(HasProperty(node, "a"), HasProperty(node, "b"))` will return `True` if
  the node has properties `a` and `b`.

Inputs:

- `value1: any`
- `value2: any`
- `valueN: any`

Outputs:

- `boolean`

### `Or(value...)`

Returns `True` if any of the given values is `Truthy`. Returns `False`
otherwise. Expressions after the first expression that evaluates to `Truthy` are
not evaluated.

In GSS, there are six `Falsy` values: `False`, `0`, `""`, `Null`, `[]`
(empty array), and `{}` (empty map). Everything else is considered `Truthy`.

Example:

- `Or(Less(Property(node, "age"), 20), Greater(Property(node, "age"), 40))`
  returns `True` if the node's `age` property is either less than 20 or greater
  than 40.

Inputs:

- `value1: any`
- `value2: any`
- `valueN: any`

Outputs:

- `boolean`

### `Not(value)`

Returns `True` if the value is `Falsy` and returns `False` if the value is
`Truthy`.

In GSS, there are six `Falsy` values: `False`, `0`, `""`, `Null`, `[]`
(empty array), and `{}` (empty map).
Everything else is considered `Truthy`.

Example:

- `@NodeStyle Not(HasProperty(node, "count")) {...}` will apply the defined
  styles to the nodes without the `count` property.

Inputs:

- `value: any`

Outputs:

- `boolean`

### `Equals(value1, value2)`

Returns `True` if the given values are equal, `False` otherwise. Numbers,
Strings, and Booleans are compared by value; Arrays and Maps by their content;
Nodes and Edges by identity.

Examples:

- `Equals(Property(edge, "category"), "Food")` checks if `edge.category` equals
  the text "Food".
- `Equals(Property(node, "name"), "Jon Snow")` returns `True` if the condition
  is met.

Inputs:

- `value1: any`
- `value2: any`

Outputs:

- `boolean`

### `Greater(value1, value2)`

Returns `True` if `value1` is greater than `value2`, `False` otherwise.

Example:

- `Greater(Size(Labels(node)), 0)`

Inputs:

- `value1: number`
- `value2: number`

Outputs:

- `boolean`

### `Less(value1, value2)`

Returns `True` if `value1` is less than `value2`, `False` otherwise.

Example:

- `Less(Property(node, "age"), 40)` will return `True` if the given `node.age`
  is less than 40.

Inputs:

- `value1: number`
- `value2: number`

Outputs:

- `boolean`

### `If(condition, then, else)`

If the condition is `Truthy`, returns the `then` value, otherwise returns the
`else` value.

In GSS, there are six `Falsy` values: `False`, `0`, `""`, `Null`, `[]`
(empty array), and `{}` (empty map). Everything else is considered `Truthy`.

Example:

- `label: If(HasProperty(node, "name"), Property(node, "name"), "No name")`
  returns the property `name` as the label if the node has one, or `No name` if
  the node doesn't have it.

Inputs:

- `condition: any`
- `then: any`
- `else: any`

Outputs:

- `any`

## Graph functions

### `HasLabel(node, label)`

Returns `True` if the given graph node has the given label, `False` otherwise.
+ +Example: + +- `HasLabel(node, "Category")` will return `True` if a node has a label with the + name `Category`. + +Inputs: + +- `node: Node` +- `label: string` + +Outputs: + +- `boolean` + +### `HasProperty(nodeOrEdge, propertyName)` + +Returns `True` if a given graph node or relationship has the property +`propertyName`. + +Example: + +- `HasProperty(node, "City")` will return `True` if a node has a property with + the name `City`. + +Inputs: + +- `nodeOrEdge: Node | Relationship` +- `propertyName: string` + +Outputs: + +- `boolean` + +### `Id(nodeOrEdge)` + +Returns the ID of a given graph `node` or `edge`. + +Example: + +- `label: AsText(Id(node))` sets the label to be the node ID. + +Inputs: + +- `nodeOrEdge: Node | Relationship` + +Outputs: + +- `number` + +### `Identity(nodeOrEdge)` + +Returns the ID of a given graph `node` or `edge`. + +Example: + +- `label: AsText(Identity(node))` sets the label to be the node ID. + +Inputs: + +- `nodeOrEdge: Node | Relationship` + +Outputs: + +- `number` + +### `Labels(node)` + +Returns the list of labels of the given graph node. + +Example: + +- `label: Labels(node)` sets the label to be a list of all the node's labels. + +Inputs: + +- `node: Node` + +Outputs: + +- `List[string]` + +### `Property(nodeOrEdge, propertyName)` + +Returns the property with the name `propertyName` of given graph node or +relationship. + +Example: + +- `label: AsText(Property(node, "name"))` creates a label using the node's + `name` property. + +Inputs: + +- `nodeOrEdge: Node | Relationship` +- `propertyName: string` + +Outputs: + +- `any` + +### `Type(edge)` + +Returns the type of a given graph relationship. + +Example: + +- `label: Type(edge)` sets the label to the relationship type. + +Inputs: + +- `edge: Relationship` + +Outputs: + +- `string` + +### `InEdges(node)` + +Returns the list of inbound edges from a given graph node. + +Example: + +- `size: Size(InEdges(node))` sets the size to be equal to the count of inbound + edges. 
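
Graph functions are typically combined with a directive predicate. As a sketch,
a style that sizes only nodes with a certain label by their inbound edge count
(the `Person` label is an assumption for illustration):

```
@NodeStyle HasLabel(node, "Person") {
  size: Add(5, Size(InEdges(node)))
}
```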

Inputs:

- `node: Node`

Outputs:

- `List[Relationship]`

### `OutEdges(node)`

Returns the list of outbound edges of a given graph node.

Example:

- `size: Size(OutEdges(node))` sets the size to be equal to the count of
  outbound edges.

Inputs:

- `node: Node`

Outputs:

- `List[Relationship]`

### `Edges(graphOrNode)`

Returns the list of inbound and outbound edges of a given graph node. It
returns all the edges in the graph if the input is a graph.

Examples:

- `size: Size(Edges(graph))` sets the size to be equal to the count of all
  graph edges.
- `size: Size(Edges(node))` sets the size to be equal to the count of inbound
  and outbound edges.

Inputs:

- `graphOrNode: Graph | Node`

Outputs:

- `List[Relationship]`

### `Nodes(graphOrEdge)`

Returns the list of start and end nodes of a given graph edge. It returns
all the nodes in the graph if the input is a graph.

Examples:

- `size: Size(Nodes(graph))` sets the size to be equal to the count of all
  graph nodes.
- `size: Size(Nodes(edge))` sets the size to be equal to the count of nodes
  that the edge connects (usually 2).

Inputs:

- `graphOrEdge: Graph | Relationship`

Outputs:

- `List[Node]`

### `AdjacentNodes(node)`

Returns the list of adjacent nodes for a given graph node. An adjacent node is a
node connected directly with a single edge, inbound or outbound.

Example:

- `size: Size(AdjacentNodes(node))` sets the size to be equal to the count of
  adjacent nodes.

Inputs:

- `node: Node`

Outputs:

- `List[Node]`

### `StartNode(edge)`

Returns the start (source) node of a given graph edge.

Example:

- `label: AsText(Id(StartNode(edge)))` sets the label of the edge to be the
  start node ID.

Inputs:

- `edge: Relationship`

Outputs:

- `Node`

### `EndNode(edge)`

Returns the end (target) node of a given graph edge.

Example:

- `label: AsText(Id(EndNode(edge)))` sets the label of the edge to be the end
  node ID.

Inputs:

- `edge: Relationship`

Outputs:

- `Node`

### `NodeCount(graph)`

Returns the total number of nodes in the graph.

Example:

- `size: NodeCount(graph)` sets the size to be the total number of nodes in the
  graph.

Inputs:

- `graph: Graph`

Outputs:

- `number`

### `EdgeCount(graph)`

Returns the total number of edges in the graph.

Example:

- `size: EdgeCount(graph)` sets the size to be the total number of edges in the
  graph.

Inputs:

- `graph: Graph`

Outputs:

- `number`

## Map functions

### `MapKeys(map)`

Returns an array of all map keys.

Example:

- `MapKeys(AsMap("key1", "value1", "key2", "value2"))` will return an array `["key1", "key2"]`.

Inputs:

- `map: Map[string, any]`

Outputs:

- `List[string]`

### `MapValues(map)`

Returns an array of all map values.

Example:

- `MapValues(AsMap("key1", "value1", "key2", 12))` will return an array `["value1", 12]`.

Inputs:

- `map: Map[string, any]`

Outputs:

- `List[any]`

> Check other map functions below: `AsMap`, `IsMap`, `Get`, `Set`, `Del`.

## Math functions

### `Add(value...)`

Returns the sum of given values.

Example:

- `Add(10, Property(node, "age"))` will give `node.age` + 10 if `age` is
  defined (as a number).

Inputs:

- `value1: number`
- `value2: number`
- `valueN: number`

Outputs:

- `number`

### `Div(value1, value2)`

Returns `value1` divided by `value2`.

Example:

- `Div(Property(node, "population"), 2)` will divide `node.population` by 2 if
  `population` is defined (as a number).

Inputs:

- `value1: number`
- `value2: number`

Outputs:

- `number`

### `Exp(value)`

Returns e (2.71828...) raised to the power of the given value.

Example:

- `Exp(2)` will return the number 7.38905609893.

Inputs:

- `value: number`

Outputs:

- `number`

### `Log(value)`

Returns the logarithm (to the base e) of a value.

Example:

- `Log(Property(node, "sales"))`

Inputs:

- `value: number`

Outputs:

- `number`

### `Log10(value)`

Returns the logarithm (to the base 10) of a value.

Example:

- `Log10(Property(node, "sales"))`

Inputs:

- `value: number`

Outputs:

- `number`

### `Mul(value...)`

Returns the product of given values.

Example:

- `Mul(2, 10, 3)` returns 60 (`2*10*3`).

Inputs:

- `value1: number`
- `value2: number`
- `valueN: number`

Outputs:

- `number`

### `Random()`

Returns a random number between 0 (inclusive) and 1 (exclusive). All the
possible numbers are equally likely to be returned.

Example:

- `Random()`

Outputs:

- `number`

### `RandomInt(bound)`

Returns a random integer between 0 (inclusive) and bound (exclusive). All the
possible numbers are equally likely to be returned.

Example:

- `RandomInt(Property(node, "population"))` will return an integer between 0
  and `node.population` if `population` is defined (as a number).

Inputs:

- `bound: number`

Outputs:

- `number`

### `Sqrt(value)`

Returns the square root of a value.

Example:

- `Sqrt(Property(node, "surface"))` will return the square root of
  `node.surface`.

Inputs:

- `value: number`

Outputs:

- `number`

### `Sub(value1, value2)`

Subtracts `value2` from `value1`.

Example:

- `Sub(Property(node, "age"), 10)` returns `node.age` - 10 if `age` is defined
  (as a number).

Inputs:

- `value1: number`
- `value2: number`

Outputs:

- `number`

### `Floor(value)`

Returns the largest integer less than or equal to the input value.

Examples:

- `Floor(2.8)` will return number `2`.
- `Floor(2)` will return number `2`.
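
For instance, `Floor` can keep a computed size integral; a sketch assuming a
numeric `followers` property:

```
@NodeStyle HasProperty(node, "followers") {
  size: Floor(Sqrt(Property(node, "followers")))
}
```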

Inputs:

- `value: number`

Outputs:

- `number`

### `Ceil(value)`

Returns the smallest integer greater than or equal to the input value.

Examples:

- `Ceil(2.1)` will return number `3`.
- `Ceil(2)` will return number `2`.

Inputs:

- `value: number`

Outputs:

- `number`

### `Round(value)`

Returns the closest integer to the input value.

Examples:

- `Round(2.1)` will return number `2`.
- `Round(2.5)` will return number `3`.
- `Round(2.8)` will return number `3`.

Inputs:

- `value: number`

Outputs:

- `number`

### `Sum(array)`

Returns the sum of all numbers in the input array. For an empty array,
it returns `0`.

Examples:

- `Sum(AsArray())` will return number `0`.
- `Sum(AsArray(1, 2, 3, 4))` will return number `10`.
- `Sum(AsArray(5.0, 6.5))` will return number `11.5`.

Inputs:

- `array: List[number]`

Outputs:

- `number`

### `Avg(array)`

Returns the average of all numbers in the input array. The array
should have at least one number.

Examples:

- `Avg(AsArray(1))` will return number `1`.
- `Avg(AsArray(1, 2, 3, 4, 5))` will return number `3`.
- `Avg(AsArray(4.8, 6.2))` will return number `5.5`.

Inputs:

- `array: List[number]`

Outputs:

- `number`

### `Min(array)`

Returns the minimum of all numbers in the input array. The
array should have at least one number.

Examples:

- `Min(AsArray(1))` will return number `1`.
- `Min(AsArray(1, 2, 3, 4, 5))` will return number `1`.
- `Min(AsArray(4.8, 6.2))` will return number `4.8`.

Inputs:

- `array: List[number]`

Outputs:

- `number`

### `Max(array)`

Returns the maximum of all numbers in the input array. The
array should have at least one number.

Examples:

- `Max(AsArray(1))` will return number `1`.
- `Max(AsArray(1, 2, 3, 4, 5))` will return number `5`.
- `Max(AsArray(4.8, 6.2))` will return number `6.2`.
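
For instance, `Max` can enforce a lower bound on a computed node size (a
sketch):

```
@NodeStyle {
  size: Max(AsArray(5, Size(Edges(node))))
}
```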

Inputs:

- `array: List[number]`

Outputs:

- `number`

## Text functions

### `Concat(value...)`

Concatenates given strings or arrays.

Examples:

- `Concat("City", " ", "of", " ", "London")` will return `City of London`.
- `Concat(AsArray(1, 2, 3), AsArray(4, 5))` will return `[1, 2, 3, 4, 5]`.

Inputs:

- `value1: string | List[any]`
- `value2: string | List[any]`
- `valueN: string | List[any]`

Outputs:

- `string | List[any]`

### `Slice(value, start, end?)`

Returns a string or array slice defined by the start and optional end index.
Negative indexes will also work.

Examples:

- `Slice("Hello", 1)` will return `"ello"`.
- `Slice("Hello", -3, -1)` will return `"ll"`.
- `Slice(AsArray(1, 2, 3, 4, 5), 1, 3)` will return `[2, 3]`.
- `Slice(AsArray(1, 2, 3, 4, 5), -2)` will return `[4, 5]`.

Inputs:

- `value: string | List[any]`
- `start: number`
- `end?: number`

Outputs:

- `string | List[any]`

### `Split(text, delimiter)`

Splits the given text by the delimiter and returns an array of the resulting
strings.

Examples:

- `Split("Hello", "x")` will return `["Hello"]`.
- `Split("Hello", "")` will return `["H", "e", "l", "l", "o"]`.
- `Split("Hello", "lo")` will return `["Hel", ""]`.
- `Split("Hello there", " ")` will return `["Hello", "there"]`.

Inputs:

- `text: string`
- `delimiter: string`

Outputs:

- `List[string]`

### `Format(formatString, value...)`

Substitutes occurrences of curly brace pairs in `formatString` with textual
representations of given values. The first occurrence is substituted with the
first value, the second occurrence with the second value, and so on.

Examples:

- `Format("{}, {}!", "Hello", "World")` -> `"Hello, World!"`

Text inside curly braces is ignored.

- `Format("{name}: {age}", "Antun", 23)` -> `"Antun: 23"`

Inputs:

- `formatString: string`
- `value1: any`
- `valueN: any`

Outputs:

- `string`

### `Matches(text, regex)`

Returns `True` if the text matches the regex. The evaluation of the regex is
done with the JavaScript function `RegExp.test(text)`.

Examples:

- `Matches("Graph style script", "style")` -> `True`
- `Matches("Graph style script", "st.* script")` -> `True`
- `Matches("Graph style script", "^G")` -> `True`
- `Matches("Graph style script", "GRAPH?")` -> `False`

Inputs:

- `text: string`
- `regex: string`

Outputs:

- `boolean`

### `Replace(text, regex, replacement)`

Returns a new string where the replacement value is used instead of the first
regex match. The regex is created with the JavaScript constructor
`new RegExp(regex)`.

Examples:

- `Replace("Graph style script", "xyz", "text")` -> `"Graph style script"`
- `Replace("Graph style script", "style ", "")` -> `"Graph script"`
- `Replace("Graph style script", "style.*", "rocks!")` -> `"Graph rocks!"`
- `Replace("Graph style script", "s", "S!")` -> `"Graph S!tyle script"`

Inputs:

- `text: string`
- `regex: string`
- `replacement: string`

Outputs:

- `string`

### `LowerCase(text)`

Returns the value of a string converted to lower case.

Example:

- `AsText(LowerCase(Property(node, "name")))` will return the node name in
  lower case.

Inputs:

- `text: string`

Outputs:

- `string`

### `UpperCase(text)`

Returns the value of a string converted to upper case.

Example:

- `AsText(UpperCase(Property(node, "name")))` will return the node name in upper
  case.

Inputs:

- `text: string`

Outputs:

- `string`

### `Trim(text)`

Returns the string without starting and ending whitespaces.

Example:

- `Trim(" Hello there! ")` will return `"Hello there!"`.
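
Text functions chain well when cleaning label values; a sketch assuming a
string `name` property:

```
@NodeStyle {
  label: UpperCase(Trim(AsText(Property(node, "name"))))
}
```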

Inputs:

- `text: string`

Outputs:

- `string`

## Array functions

### `Join(array, delimiter)`

Returns a new string by joining array elements with the delimiter.

Example:

- `label: Join(Labels(node), ", ")` creates a label which is a string made out
  of all the labels delimited with a comma.

Inputs:

- `array: List[any]`
- `delimiter: string`

Outputs:

- `string`

### `Contains(array, value)`

Returns `True` if the array contains the defined value, `False` otherwise.

Example:

- `Contains(AsArray(2, 7, 8, 9), 2)` will return `True`.

Inputs:

- `array: List[any]`
- `value: any`

Outputs:

- `boolean`

### `RandomOf(array)`

Returns a random element of the given array. All the elements are equally likely
to be chosen.

Example:

- `RandomOf(AsArray(1, 3, 5, 7, 11, 13))` will return one of the array elements.

Inputs:

- `array: List[any]`

Outputs:

- `any | null`

### `Find(array, function)`

Returns the first element of the given array for which the function yields a
`Truthy` value.

In GSS, there are six `Falsy` values: `False`, `0`, `""`, `Null`, `[]`
(empty array), and `{}` (empty map). Everything else is considered `Truthy`.

Function argument `function` has one input argument which is the `item` of the
array.

Examples:

- `Find(AsArray(1, 2, 3, 4), Function(item, Greater(item, 2)))` will return
  number `3`.
- `Find(AsArray(1, 2, 1, 1), Function(item, Greater(item, 2)))` will return
  `Null`.

Inputs:

- `array: List[any]`
- `function: Function`

Outputs:

- `any | null`

### `Filter(array, function)`

Returns a new array with the elements of the given array for which the function
yields a `Truthy` value.

In GSS, there are six `Falsy` values: `False`, `0`, `""`, `Null`, `[]`
(empty array), and `{}` (empty map). Everything else is considered `Truthy`.

Function argument `function` has one input argument which is the `item` of the
array.

Examples:

- `Filter(AsArray(1, 2, 3, 4), Function(item, Greater(item, 2)))` will return
  array `[3, 4]`.
- `Filter(AsArray(1, 2, 1, 1), Function(item, Greater(item, 2)))` will return
  `[]`.

Inputs:

- `array: List[any]`
- `function: Function`

Outputs:

- `List[any]`

### `Map(array, function)`

Returns the new array where each element of the given array is converted
(mapped) with the defined function.

Function argument `function` has one input argument which is the `item` of the
array.

Examples:

- `Map(AsArray(1, 2, 3, 4), Function(item, Mul(item, 2)))` will return array
  `[2, 4, 6, 8]`.
- `Map(AdjacentNodes(node), Function(n, Property(n, "name")))` will return the
  list of names of adjacent nodes.

Inputs:

- `array: List[any]`
- `function: Function`

Outputs:

- `List[any]`

### `Reduce(array, function, initialValue)`

The `Reduce()` function returns a single value generated by reducing an array
of values. The `function` parameter has two arguments: the previous (reduced)
value and the current array value. The `initialValue` parameter specifies the
initial value used for the first reduce iteration.

Examples:

- The following example sums all elements in the array with the initial
  value of `1`. Because the array is empty, the returned value is the initial
  one: `1`.

```
Reduce(
  AsArray(),
  Function(prev, current, Add(prev, current)),
  1
)
```

- The same example as the one above, but with an array of three elements. The
  result will be number `6`.

```
Reduce(
  AsArray(1, 2, 3),
  Function(prev, current, Add(prev, current)),
  0
)
```

- The following example joins all letters from an array into a single text
  `"ABC"`.

```
Reduce(
  AsArray("A", "B", "C"),
  Function(prev, current, Format("{}{}", prev, current)),
  ""
)
```

Inputs:

- `array: List[any]`
- `function: Function`
- `initialValue: any`

Outputs:

- `any`

### `All(array, function)`

Returns `True` if the function yields a `Truthy` value for all elements of the
given array.

In GSS, there are six `Falsy` values: `False`, `0`, `""`, `Null`, `[]`
(empty array), and `{}` (empty map). Everything else is considered `Truthy`.

Function argument `function` has one input argument which is the `item` of the
array.

Examples:

- `All(AsArray(1, 2, 3, 4), Function(item, Greater(item, 2)))` will return
  `False`.
- `All(AsArray(1, 2, 1, 1), Function(item, Less(item, 3)))` will return `True`.

Inputs:

- `array: List[any]`
- `function: Function`

Outputs:

- `boolean`

### `Any(array, function)`

Returns `True` if the function yields a `Truthy` value for any element of the
given array.

In GSS, there are six `Falsy` values: `False`, `0`, `""`, `Null`, `[]`
(empty array), and `{}` (empty map). Everything else is considered `Truthy`.

Function argument `function` has one input argument which is the `item` of the
array.

Examples:

- `Any(AsArray(1, 2, 3, 4), Function(item, Greater(item, 2)))` will return
  `True`.
- `Any(AsArray(1, 2, 1, 1), Function(item, Greater(item, 3)))` will return
  `False`.

Inputs:

- `array: List[any]`
- `function: Function`

Outputs:

- `boolean`

### `Uniq(array)`

Returns an array of unique elements of the given array.

Examples:

- `Uniq(AsArray(2, 1, 1, 2, 1, 3, 1))` will return `[2, 1, 3]`.
- `Uniq(AsArray("1", "1", 1, True, True, 1))` will return `["1", 1, True]`.

Inputs:

- `array: List[any]`

Outputs:

- `List[any]`

### `Reverse(array)`

Returns an array with reversed elements of the given array.

Example:

- `Reverse(AsArray(1, 2, 3))` will return `[3, 2, 1]`.
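
For instance, combining `Reverse` with `Sort` gives a descending order (a
sketch):

```
Reverse(Sort(AsArray(3, 1, 2)))
```

This will return `[3, 2, 1]`.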

Inputs:

- `array: List[any]`

Outputs:

- `List[any]`

### `Sort(array)`

Returns an array with sorted items. The sort works only on arrays with
primitive types: strings, numbers, and booleans.

Example:

- `Sort(AsArray(3, 2, 1, 8, 3))` will return `[1, 2, 3, 3, 8]`.

Inputs:

- `array: List[string] | List[boolean] | List[number]`

Outputs:

- `List[string] | List[boolean] | List[number]`

### `Next(iterator)`

Returns the next item in the iterator. If the iterator has no more items,
it returns `Null`.

Example:

- `Next(AsIterator(AsArray(3, 2, 1)))` will return `3`.

Inputs:

- `iterator: Iterator[any]`

Outputs:

- `any | null`

## Type functions

### `AsArray(value...)`

Creates and returns an array of given values. The function can also be used
to convert an `Iterator` back to an array with `AsArray(AsIterator(AsArray(1, 2)))`.

Examples:

- `AsArray("Alfa", "Bravo", "Charlie", "Delta", "Echo")` -> `["Alfa", "Bravo",
  "Charlie", "Delta", "Echo"]`
- `AsArray(AsIterator(AsArray(1, 2, 3)))` -> `[1, 2, 3]`.

Inputs:

- `value1: any`
- `value2: any`
- `valueN: any`

Outputs:

- `List[any]`

### `AsMap(key, value, ...)`

Creates and returns a map of given pairs of keys and values. There must be an
even number of inputs because each key must be paired with a value. Keys must
be strings; values can be of any type.

Example:

- `AsMap("1", 10, "2", 20)` -> `{"1": 10, "2": 20}`

Inputs:

- `key1: string`
- `value1: any`
- `keyN: string`
- `valueN: any`

Outputs:

- `Map[string, any]`

### `AsIterator(array)`

Creates and returns an iterator over the given array. Iterator values can be
consumed only once with the `Next` function, until all values have been used.

Example:

- `AsIterator(AsArray(1, 2, 3))` -> `(1, 2, 3)`

Inputs:

- `array: List[any]`

Outputs:

- `Iterator[any]`

### `AsNumber(value)`

Parses the given string or boolean and returns a number.
The string should
contain only one number in base 10 and nothing else. Boolean `True` returns
number 1. Boolean `False` returns number 0.

Example:

- `AsNumber("8")` will return number 8.

Inputs:

- `value: string | number | boolean`

Outputs:

- `number`

### `AsText(value)`

Returns a textual representation of a given value.

Example:

- `AsText(Property(node, "age"))` will return `node.age` as a string.

Inputs:

- `value: any`

Outputs:

- `string`

### `TypeOf(value)`

Returns the type of a given value. The type is returned as a string. The
following types are used in GSS:

- `"number"` - represents numbers
- `"boolean"` - represents booleans (`True` and `False`)
- `"string"` - represents textual values
- `"Null"` - represents the null value (`Null`)
- `"Color"` - represents colors
- `"Node"` - represents a graph node
- `"Edge"` - represents a graph relationship
- `"Graph"` - represents a graph
- `"List"` - represents an array object (e.g. `[1, 2, 3]`)
- `"Iterator"` - represents an iterator object (e.g. `(1, 2, 3)`)
- `"Map"` - represents a map object (e.g. `{ "name": "GSS" }`)
- `"Function"` - represents a function object

Example:

- `TypeOf(Property(node, "name"))` returns `"string"`.

Inputs:

- `value: any`

Outputs:

- `string`

### `IsArray(value)`

Returns `True` if the input value is an array, otherwise `False`.

Examples:

- `IsArray(10.2)` returns `False`.
- `IsArray(AsArray(1, 2, 3))` returns `True`.

Inputs:

- `value: any`

Outputs:

- `boolean`

### `IsMap(value)`

Returns `True` if the input value is a map, otherwise `False`.

Examples:

- `IsMap(10.2)` returns `False`.
- `IsMap(AsMap("key", "value"))` returns `True`.

Inputs:

- `value: any`

Outputs:

- `boolean`

### `IsIterator(value)`

Returns `True` if the input value is an iterator, otherwise `False`.

Examples:

- `IsIterator(AsArray(1, 2, 3))` returns `False`.
- `IsIterator(AsIterator(AsArray(1, 2, 3)))` returns `True`.

Inputs:

- `value: any`

Outputs:

- `boolean`

### `IsNumber(value)`

Returns `True` if the input value is a number, otherwise `False`.

Example:

- `IsNumber(10.2)` returns `True`.

Inputs:

- `value: any`

Outputs:

- `boolean`

### `IsBoolean(value)`

Returns `True` if the input value is a boolean, otherwise `False`.

Example:

- `IsBoolean(False)` returns `True`.

Inputs:

- `value: any`

Outputs:

- `boolean`

### `IsString(value)`

Returns `True` if the input value is a string, otherwise `False`.

Example:

- `IsString("text")` returns `True`.

Inputs:

- `value: any`

Outputs:

- `boolean`

### `IsNull(value)`

Returns `True` if the input value is a `Null`, otherwise `False`.

Example:

- `IsNull(Null)` returns `True`.

Inputs:

- `value: any`

Outputs:

- `boolean`

## Utility functions

### `Define(name, value)`

Binds the given value to the given name. Names can be redefined.

Example:

- `Define(city, "London")` will set the value of `city` to `"London"`.

Inputs:

- `name: Variable`
- `value: any`

### `Function(arg..., body)`

Creates a function. `body` is the expression to evaluate when the function is
called. All arguments except `body` are argument names of the function to
create. When the created function is called, the names `arg1`, `arg2`, ... are
bound to the call arguments and available in the `body` expression. This
function is most useful in combination with `Define`.

Examples:

```
Define(makeGreeting, Function(firstName, Format("Hello, {}!", firstName)))
makeGreeting("World") // -> Hello, World!
```

```
Define(pow, Function(x, n, If(Equals(n, 1), x, Mul(x, pow(x, Sub(n, 1))))))
pow(2, 10) // -> 1024
```

Inputs:

- `arg1: Variable`
- `argN: Variable`
- `body: any`

Outputs:

- `Function`

### `Execute(expression...)`

Executes all expressions given as arguments. The function comes in handy when
there is a set of commands that should be executed, e.g. setting several
entries in a map with `Set` and returning the last value.

Example:

```
Define(map, AsMap())
Define(mapKeys, Execute(
  Set(map, "key1", "value1"),
  Set(map, "key2", "value2"),
  MapKeys(map)
))
```

Variable `map` will be `{"key1": "value1", "key2": "value2"}`. `Execute`
returns the value of its last expression, so `mapKeys` will be an array of
keys: `["key1", "key2"]`.

Inputs:

- `expression1: Expression`
- `expressionN: Expression`

Outputs:

- `any`

### `Get(object, key, defaultValue?)`

If `object` is a List, returns the element with index `key` (indexing is zero
based). If `object` is a Map, returns the value for key `key`. If `object` is a
string, returns the letter with index `key` (indexing is zero based). If
`object` is a Node or a Relationship, returns the value of the property named
`key`.

In case of invalid input or a missing value, it returns `defaultValue`, or
`Null` if the default value is not defined.

Examples:

- `Get(AsArray(3, 6, 7, 3), 2)` returns number 7.
- `Get(Property(node, "map"), "year")` will get the key `year` from the map
  stored in the node's `map` property.

Inputs:

- `object: List | Map | string | Node | Relationship`
- `key: number | string`
- `defaultValue?: any`

Outputs:

- `any`

### `Set(object, key, value)`

If `object` is a List, sets the value at index `key` (indexing is zero based).
The value is returned on a successful set. If the index is out of range of the
list, nothing is set and `Null` is returned.

If `object` is a Map, sets the value for key `key`. The key must be a string.
The input value is returned.

Examples:

- `Define(array, AsArray(1, 2, 3)) Set(array, 1, 5)` returns number `5` and the array will be `[1, 5, 3]`.
- `Define(map, AsMap()) Set(map, "key", "value")` returns `"value"` and the map will be `{"key": "value"}`.

Inputs:

- `object: List | Map`
- `key: number | string`
- `value: any`

Outputs:

- `any | null`

### `Del(map, key)`

Removes the value under key `key` from a map. The removed value is returned. If
the key is missing from the map, `Null` is returned.

Examples:

- `Define(map, AsMap("a", 1, "b", 2)) Del(map, "a")` returns `1` and the map will be `{"b": 2}`.
- `Define(map, AsMap("a", 1)) Del(map, "b")` returns `Null` and the map will be `{"a": 1}`.

Inputs:

- `map: Map[string, any]`
- `key: string`

Outputs:

- `any | null`

### `Size(value)`

If the value is of type `List` or `Map`, returns its size. If the value is of
type `string`, returns its length. If the value is of type `Node` or
`Relationship`, returns the number of its properties. If the value is of type
`Graph`, returns the size of the graph (nodes and relationships).

Example:

- `Size(Property(node, "name"))` returns the length of the node's `name`
  property.

Inputs:

- `value: List | Map | string | Node | Relationship | Graph`

Outputs:

- `number`

### `Coalesce(value...)`

Returns the first non-null value. In case of an empty call or all values being
`Null`, `Null` will be returned.

Examples:

- `Coalesce()` returns `Null`.
- `Coalesce(Null, 1, False)` returns `1`.

Inputs:

- `value1: any`
- `valueN: any`

Outputs:

- `any | null`

## Variables

Graph Style Script has a few built-in variables that you can use.

### `node`

The variable `node` is bound to the graph node for which the style directive
`@NodeStyle` is being evaluated.
Graph node is of type `Map` and has all
+information about the node (`properties`, `labels`, `id`).
+
+In the following example, you can see the usage of the variable `node` within
+the `@NodeStyle` directive.
+
+```
+@NodeStyle {
+  label: Property(node, "name")
+  size: Mul(Size(Edges(node)), 5)
+}
+```
+
+If `node` is used outside the `@NodeStyle` directive, a compile error will be
+thrown.
+
+## `edge`
+
+The variable `edge` is bound to the graph relationship for which the style
+directive `@EdgeStyle` is being evaluated. Graph relationship is of type `Map`
+and has all information about the relationship (`properties`, `type`, `start`,
+`end`, `id`).
+
+In the following example, you can see the usage of the variable `edge` within
+the `@EdgeStyle` directive.
+
+```
+@EdgeStyle {
+  label: Format("From node {}", Property(StartNode(edge), "name"))
+  size: AsNumber(Property(edge, "importance"))
+}
+```
+
+If `edge` is used outside the `@EdgeStyle` directive, a compile error will be
+thrown.
+
+## `graph`
+
+The variable `graph` is bound to the overall graph that contains nodes and
+edges. It can be useful to get the total count of nodes and edges with the
+following functions: `NodeCount(graph)` and `EdgeCount(graph)`.
+
+In the following example, you can see the usage of the variable `graph` in
+the directive context (`@NodeStyle`, `@EdgeStyle`) and the global context
+(variable `EDGE_COUNT`):
+
+```
+// Global context acts like a cache because the
+// following expression will be evaluated only once
+Define(EDGE_COUNT, EdgeCount(graph))
+
+@NodeStyle {
+  size: Sqrt(NodeCount(graph))
+}
+
+@EdgeStyle {
+  width: If(Greater(EDGE_COUNT, 1000), 1, 2)
+}
+```
+
+The `graph` variable is not bound to any of the directives (`@NodeStyle`,
+`@EdgeStyle`) so you can use it wherever you want in the Graph Style Script
+code.
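As a summary of the container helpers described earlier on this page, here is a short Python analogue of `Get` (with a default value) and `Coalesce`. This is illustrative only; the functions below are ours and not part of GSS or Memgraph.

```python
def get(obj, key, default=None):
    """Lookup by zero-based index or key that falls back to a default,
    mirroring GSS Get(object, key, defaultValue?)."""
    try:
        return obj[key]
    except (KeyError, IndexError, TypeError):
        return default


def coalesce(*values):
    """Return the first non-null value, mirroring GSS Coalesce(value...)."""
    for value in values:
        if value is not None:
            return value
    return None


print(get([3, 6, 7, 3], 2))           # 7, like the Get example above
print(get({"year": 2021}, "month"))   # None: missing key -> default
print(coalesce(None, 1, False))       # 1, like the Coalesce example
```

Note that, as in GSS, `False` counts as a regular value for `coalesce`; only `None` (GSS `Null`) is skipped.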
\ No newline at end of file
diff --git a/docs2/data-visualization/graph-style-script/directive-properties.md b/docs2/data-visualization/graph-style-script/directive-properties.md
new file mode 100644
index 00000000000..3da80a43dad
--- /dev/null
+++ b/docs2/data-visualization/graph-style-script/directive-properties.md
@@ -0,0 +1,502 @@
+# Directive properties
+
+## `@ViewStyle` directive
+
+The `@ViewStyle` directive is used for defining style properties of the general
+graph view: link distance, view, physics, repel force, etc. You can read more
+about each property in the following sections.
+
+### `@ViewStyle`
+
+Here is the list of all properties that can be defined in the `@ViewStyle`
+directive, along with their expected types.
+
+#### `collision-radius: number`
+
+Sets the margin radius for each node from its centre. If node `size` is `10` and
+`collision-radius` is set to `20`, there will be `10` pixels of space left
+around each node. No other node can be in that space.
+
+The default `collision-radius` is `15`.
+
+Example:
+
+- `collision-radius: 15` sets the margin radius for each node from its centre to `15`.
+
+#### `repel-force: number`
+
+Sets the strength of the repel force between all nodes. If positive, it adds a
+force that moves nodes away from each other; if negative, it moves nodes
+towards each other.
+
+The default `repel-force` is `-100`.
+
+Example:
+
+- `repel-force: -100` sets the repel force between all nodes to `-100`.
+
+#### `link-distance: number`
+
+Sets the minimum required distance between two connected nodes, measured from
+their centres.
+
+The default `link-distance` is `30`. If node sizes are `20` and the link
+distance is `30`, nodes might overlap because avoiding overlap requires a
+centre-to-centre distance of at least `20 + 20 = 40`.
+
+Example:
+
+- `link-distance: 30` sets the minimum required distance to `30`.
+
+#### `physics-enabled: boolean`
+
+Enables or disables physics, a real-time simulation of graph node positions.
+When physics is enabled, the graph is not static anymore.
+
+Examples:
+
+- `physics-enabled: True` enables the physics.
+- `physics-enabled: Greater(NodeCount(graph), 100)` enables the physics for graphs with more than 100 nodes.
+
+#### `background-color: Color`
+
+Sets the background color of the canvas.
+
+Examples:
+
+- `background-color: #DDDDDD` sets the background color of the canvas to light gray.
+- `background-color: black` sets the background color of the canvas to black.
+
+#### `view: string: "default" | "map"`
+
+Sets the current graph view, which can be either `"default"` or `"map"`. The
+`"default"` view is a graph visualization on a blank background. The `"map"`
+view is a graph visualization with a map as a background, where each node needs
+to provide `latitude` and `longitude` to be positioned on the map.
+
+The default `view` is `"default"`.
+
+Examples:
+
+- `view: "default"` sets the view to the default view.
+- `view: "map"` sets the view to the map view, which will be shown only if at least one node has
+  the required geo information: `latitude` and `longitude`.
+
+## `@ViewStyle.Map` directive
+
+The `@ViewStyle.Map` directive extends `@ViewStyle` by defining additional
+style properties for a graph view with a map background. Style properties of
+the `@ViewStyle.Map` directive are used to style the background map.
+
+### `@ViewStyle.Map`
+
+Here is the list of all properties that can be defined in the `@ViewStyle.Map`
+directive, along with their expected types.
+
+#### `tile-layer: string: "detailed" | "light" | "dark" | "basic" | "satellite"`
+
+Sets the map tile for the map background. The default map tile is `"light"`.
+
+Example:
+
+- `tile-layer: "dark"` sets the map tile type to `"dark"`.
+
+## `@EdgeStyle`
+
+Here is the list of all properties that can be defined in the `@EdgeStyle`
+directive, along with their expected types.
+ +### `arrow-size`: `Number` + +Sets the size of the arrow on the relationship line end. + +Examples: + +- `arrow-size: 10` sets the arrow size to be 10 pixels. + +### `color`: `Color` + +Sets the background color of an element. + +Examples: + +- `color: #FF0000` sets the background color of the element to red. +- `color: limegreen` sets the background color of the element to lime green. + +### `color-hover`: `Color` + +Sets the background color of an element on mouse hover event. + +Examples: + +- `color-hover: #FF0000` sets the background color of the shape to red on mouse + hover event. +- `color-hover: limegreen` sets the background color of the shape to lime green + on mouse hover event. + +### `color-selected`: `Color` + +Sets the background color of an element on mouse select event. + +Examples: + +- `color-selected: #FF0000` sets the background color of the shape to red on + mouse select event. +- `color-selected: limegreen` sets the background color of the shape to lime + green on mouse select event. + +### `font-background-color`: `Color` + +Sets the background color of an element's label (text). Text can be defined with +property `label`. + +Examples: + +- `font-background-color: #FF0000` sets the text background color to red. +- `font-background-color: limegreen` sets the text background color to lime + green. + +### `font-color`: `Color` + +Sets the color of the element's label (text). Text can be defined with property +`label`. + +Examples: + +- `font-color: #FF0000` sets the text color to red. +- `font-color: limegreen` sets the text color to lime green. + +### `font-family`: `String` + +Sets a font family for the element's text. Text can be defined with property +`label`. + +Examples: + +- `font-family: "sans-serif"` sets the text family font to sans-serif. +- `font-family: "cursive"` sets the text family font to cursive. + +### `font-size`: `Number` + +Sets the size of the element's text. Text can be defined with property `label`. 
+ +Example: + +- `font-size: 10` sets the size of the font to 10 pixels. + +### `label`: `String` + +Sets the element's text label. The text is shown below the element (node or +relationship). + +Examples: + +- `label: "Text"` sets the text "Text" as a label for every single element. +- `label: Property(edge, "quantity")` sets the text for the element's label + dynamically by using the `edge` property `"quantity"`. + +### `shadow-color`: `Color` + +Sets the color of the element's shadow. + +Examples: + +- `shadow-color: #FF0000` sets the shadow color to red. +- `shadow-color: limegreen` sets the shadow color to lime green. + +### `shadow-size`: `Number` + +Sets the blur size of the element's shadow. If the value is 0, the shadow will +be a solid color defined by the property `shadow-color`. + +Examples: + +- `shadow-size: 5` indicates that the shadow will be diffused across 5 pixels. + +### `shadow-offset-x`: `Number` + +Sets the horizontal offset of the element's shadow. A positive value puts the +shadow on the right side of the shape, a negative value puts the shadow on the +left side of the shape. + +Examples: + +- `shadow-offset-x: 0` indicates that the shadow is exactly behind the shape. +- `shadow-offset-x: 20` indicates that the shadow starts 20 pixels to the right. + +### `shadow-offset-y`: `Number` + +Sets the vertical offset of the element's shadow. A positive value puts the +shadow below the shape, a negative value puts the shadow above the shape. + +Examples: + +- `shadow-offset-y: 0` indicates that the shadow is exactly behind the shape. +- `shadow-offset-y: 20` indicates that the shadow starts 20 pixels below the + shape position. + +### `width`: `Number` + +Sets the width of the relationship line. + +Example: + +- `width: 2` indicates that the width of the relationship line will be 2 pixels + wide. + +### `width-hover`: `Number` + +Sets the width of the relationship line on mouse hover event. 
+ +Example: + +- `width-hover: 2` indicates that the width of the relationship will be 2 pixels + wide on mouse hover event. + +### `width-selected`: `Number` + +Sets the width of the relationship line on mouse select event. + +Examples: + +- `width-selected: 2` indicates that the width of the relationship will be 2 + pixels wide on mouse select event. + +### `z-index: number` + +Sets the stack order of an element, similar to the CSS `z-index`. The element with the +highest `z-index` will be rendered on top of every other element. + +Example: +- `z-index: 100` sets the element's z-index. + +## `@NodeStyle` + +Here is the list of all properties that can be defined in the `@NodeStyle` +directive, along with their expected types. + +### `border-color`: `Color` + +Sets a border color. + +Examples: + +- `border-color: #FF0000` sets the border color to red. +- `border-color: limegreen` sets the border color to lime green. + +### `border-color-hover`: `Color` + +Sets a border color that is applied on mouse hover event. + +Examples: + +- `border-color-hover: #FF0000` sets the border color to red on mouse hover + event. +- `border-color-hover: limegreen` sets the border color to lime green on mouse + hover event. + +### `border-color-selected`: `Color` + +Sets a border color that is applied on mouse select event. + +Examples: + +- `border-color-selected: #FF0000` sets the border color to red on mouse select + event. +- `border-color-selected: limegreen` sets the border color to lime green on + mouse select event. + +### `border-width`: `Number` + +Sets the border width. + +Example: + +- `border-width: 2` sets the border width to 2 pixels. + +### `border-width-selected`: `Number` + +Sets the border width that is applied on mouse select event. + +Example: + +- `border-width-selected: 10` sets the border width to 10 pixels on mouse select + event. + +### `color`: `Color` + +Sets the background color of an element. 
+
+Examples:
+
+- `color: #FF0000` sets the background color of the element to red.
+- `color: limegreen` sets the background color of the element to lime green.
+
+### `color-hover`: `Color`
+
+Sets the background color of an element on mouse hover event.
+
+Examples:
+
+- `color-hover: #FF0000` sets the background color of the element to red on
+  mouse hover event.
+- `color-hover: limegreen` sets the background color of the element to lime
+  green on mouse hover event.
+
+### `color-selected`: `Color`
+
+Sets the background color of an element on mouse select event.
+
+Examples:
+
+- `color-selected: #FF0000` sets the background color of the element to red on
+  mouse select event.
+- `color-selected: limegreen` sets the background color of the element to lime
+  green on mouse select event.
+
+### `font-background-color`: `Color`
+
+Sets the background color of the element's label (text). Text can be defined
+with property `label`.
+
+Examples:
+
+- `font-background-color: #FF0000` sets the text background color to red.
+- `font-background-color: limegreen` sets the text background color to lime
+  green.
+
+### `font-color`: `Color`
+
+Sets the color of the element's label (text). Text can be defined with property
+`label`.
+
+Examples:
+
+- `font-color: #FF0000` sets the text color to red.
+- `font-color: limegreen` sets the text color to lime green.
+
+### `font-family`: `String`
+
+Sets a font family for the element's label (text). Text can be defined with
+property `label`.
+
+Examples:
+
+- `font-family: "sans-serif"` sets the text font family to sans-serif.
+- `font-family: "cursive"` sets the text font family to cursive.
+
+### `font-size`: `Number`
+
+Sets the size of the element's text. Text can be defined with property `label`.
+
+Example:
+
+- `font-size: 10` sets the size of the font to 10 pixels.
+
+### `image-url`: `String`
+
+Sets the element's background to be an image from the image URL.
Supported
+formats are `png`, `jpeg`, `gif` (static, not animated), `webp`, or a base64
+encoded image using inline `data:image/png;base64`. It will override the value
+defined with the property `color`.
+
+Examples:
+
+- `image-url: "https://download.memgraph.com/asset/images/memgraph-logo.png"`
+  sets the element's background to be an image of the Memgraph logo.
+- `image-url: Property(node, "profile_image")` sets the element's background to
+  be an image from the URL that is fetched from the `node` property
+  `"profile_image"`.
+
+### `image-url-selected`: `String`
+
+Sets the element's background to be an image from the image URL on mouse select
+event. Supported formats are `png`, `jpeg`, `gif` (static, not animated),
+`webp`, or a base64 encoded image using inline `data:image/png;base64`. It will
+override the value defined with the property `color-selected`.
+
+Example:
+
+- `image-url-selected:
+  "https://download.memgraph.com/asset/images/memgraph-logo-5f60e83d.jpeg"` sets
+  the element's background to be an image of the Memgraph logo.
+
+Check the property `image-url` for more details and examples.
+
+### `label`: `String`
+
+Sets the element's text label. The text is shown below the element (node or
+relationship).
+
+Examples:
+
+- `label: "Text"` sets the text "Text" as a label for every single element.
+- `label: Property(node, "name")` sets the text for the element's label
+  dynamically by using the `node` property `"name"`.
+
+### `shadow-color`: `Color`
+
+Sets the color for the element's shadow.
+
+Examples:
+
+- `shadow-color: #FF0000` sets the shadow color to red.
+- `shadow-color: limegreen` sets the shadow color to lime green.
+
+### `shadow-size`: `Number`
+
+Sets the blur size of the element's shadow. If the value is 0, the shadow will
+be a solid color defined by the property `shadow-color`.
+
+Example:
+
+- `shadow-size: 5` indicates that the shadow will be diffused across 5 pixels.
+ +### `shadow-offset-x`: `Number` + +Sets the horizontal offset of the element's shadow. A positive value puts the +shadow on the right side of the element, a negative value puts the shadow on the +left side of the element. + +Examples: + +- `shadow-offset-x: 0` indicates that the shadow is exactly behind the element. +- `shadow-offset-x: 20` indicates that the shadow starts 20 pixels to the right. + +### `shadow-offset-y`: `Number` + +Sets the vertical offset of the element's shadow. A positive value puts the +shadow below the element, a negative value puts the shadow above the element. + +Examples: + +- `shadow-offset-y: 0` indicates that the shadow is exactly behind the element. +- `shadow-offset-y: 20` indicates that the shadow starts 20 pixels below the + element position. + +### `shape`: `String` + +Sets the shape of the element. The default shape for the node is `"dot"`. +Possible values are: `"dot"`, `"square"`, `"diamond"`, `"triangle"`, +`"triangleDown"`, `"star"` + +Examples: + +- `shape: "square"` indicates that the shape of the element will be a square. + +### `size`: `Number` + +Sets the size of the element. + +Example: + +- `size: 10` indicates that the radius of the element will be 10 pixels. + +### `z-index: number` + +Sets the stack order of an element, similar to the CSS `z-index`. The element with the +highest `z-index` will be rendered on top of every other element. + +Example: +- `z-index: 100` sets the element's z-index. 
\ No newline at end of file diff --git a/docs2/data-visualization/graph-style-script/graph-style-script.md b/docs2/data-visualization/graph-style-script/graph-style-script.md new file mode 100644 index 00000000000..974d36b0fa8 --- /dev/null +++ b/docs2/data-visualization/graph-style-script/graph-style-script.md @@ -0,0 +1,623 @@ +--- +id: graph-style-script-language +title: Graph Style Script language +sidebar_label: Graph Style Script language +slug: /graph-style-script-language +--- + +[![Related - Tutorial](https://img.shields.io/static/v1?label=Related&message=Tutorial&color=008a00&style=for-the-badge)](/memgraph/tutorials/style-your-graphs-in-memgraph-lab) + +This guide will show you how to easily get started with the Graph Style Script +language. GSS is a language for customizing the visual display of graphs. For a +complete list of available features consult the [Style script +reference guide](./reference-guide.md). + +## Graph example + +In this guide, we will use an example graph with European countries and cities. +The data can be found +[here](https://memgraph.com/docs/memgraph/tutorials-overview/backpacking-through-europe). +Countries have the label `Country`, while cities have the label `City`. All +nodes have the property `name`. Cities have many additional properties, +including `country` (containing country) and `drinks_USD` (average drink price). + +## Setting graph labels + +We want to label country nodes with country names, and city nodes with city +names and containing country names. To achieve that we can use two directives. +The first one selects countries and the second one selects cities. 
+
+```cpp
+@NodeStyle HasLabel(node, "Country") {
+  label: Property(node, "name")
+}
+
+@NodeStyle HasLabel(node, "City") {
+  label: Format("{cityName}, {countryName}",
+                Property(node, "name"),
+                Property(node, "country"))
+}
+```
+
+In the case of the [`Format`](gss-functions.md#formatformatstring-val1-val2)
+function, content inside the curly braces is ignored but can be helpful for
+clarity.
+
+## Setting node images
+
+It would be nice to display flags in the country nodes. This can be achieved
+using URLs of flag images. There is a website that hosts many world flags, so we
+can use images from [there](https://cdn.countryflags.com). Their API expects a
+country name as a part of the URL path, so we will make the following directive.
+
+```cpp
+@NodeStyle HasLabel(node, "Country") {
+  image-url: Format("https://cdn.countryflags.com/thumbs/{}/flag-800.png",
+                    LowerCase(Property(node, "name")))
+}
+```
+
+Unfortunately, this won't work for all countries. Flags for England and Scotland
+cannot be found on the website because they aren't sovereign countries. We can
+get around that by providing custom directives below the general one above.
+
+```cpp
+@NodeStyle Equals(Property(node, "name"), "England") {
+  image-url: "https://upload.wikimedia.org/wikipedia/en/thumb/b/be/Flag_of_England.svg/2560px-Flag_of_England.svg.png"
+}
+
+@NodeStyle Equals(Property(node, "name"), "Scotland") {
+  image-url: "https://upload.wikimedia.org/wikipedia/commons/thumb/1/10/Flag_of_Scotland.svg/1200px-Flag_of_Scotland.svg.png"
+}
+```
+
+Also, URLs for country names with whitespace in them don't work, so we also
+have to provide custom URLs for the Czech Republic and Bosnia and Herzegovina.
+
+```cpp
+@NodeStyle Equals(Property(node, "name"), "Bosnia and Herzegovina") {
+  image-url: "https://upload.wikimedia.org/wikipedia/commons/thumb/b/bf/Flag_of_Bosnia_and_Herzegovina.svg/1200px-Flag_of_Bosnia_and_Herzegovina.svg.png"
+}
+
+@NodeStyle Equals(Property(node, "name"), "Czech Republic") {
+  image-url: "https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/Flag_of_the_Czech_Republic.svg/2560px-Flag_of_the_Czech_Republic.svg.png"
+}
+```
+
+Now all the country nodes have their flags displayed.
+
+## Highlighting interesting nodes
+
+We can highlight nodes with low drink prices in the following way. We want to
+use a beer image and a bigger size along with a red shadow.
+
+```cpp
+@NodeStyle And(
+    HasLabel(node, "City"),
+    Less(Property(node, "drinks_USD"), 5)) {
+  size: 50
+  image-url: "https://www.sciencenews.org/wp-content/uploads/2020/05/050620_mt_beer_feat-1028x579.jpg"
+  shadow-color: red
+}
+```
+
+## Caching results for faster performance
+
+To normalize a value, for example, the size or width of all the nodes or
+relationships in the graph, you first need the minimum and maximum values
+across all nodes. For example, a node labeled `"Person"` has the property `age`
+that holds the age of a particular person. We want the node property `size` to
+be 5 for the youngest person and 20 for the oldest one in the presented graph.
+All other node sizes should be normalized
+within that range.
+ +One of the solutions could look like this: + +```cpp +// Size range min/max variables +Define(MIN_SIZE, 5) +Define(MAX_SIZE, 20) +Define(PROP_NAME, "age") +Define(SIZE_RANGE, Sub(MAX_SIZE, MIN_SIZE)) + +// A set of utility functions +// Create a new array of property values from an array of nodes +Define(GetProperties, Function(nodes, propName, + Map(nodes, Function(singleNode, Property(singleNode, propName))) +)) +// Keep only the numeric values from an array of values +Define(KeepNumericValues, Function(values, + Filter(values, Function(value, IsNumber(value))) +)) + +// Functions to find min and max value in the input nodes +Define(GetMaxValue, Function(nodes, + Max(KeepNumericValues(GetProperties(nodes, PROP_NAME))) +)) +Define(GetMinValue, Function(nodes, + Min(KeepNumericValues(GetProperties(nodes, PROP_NAME))) +)) + +// Normalize function that receives two inputs: node (n) and +// graph (g) and returns normalized value into a range +// [MIN_SIZE, MAX_SIZE] +Define(Normalize, Function(n, g, + Add( + MIN_SIZE, + Mul( + SIZE_RANGE, + Div( + Sub(Property(n, PROP_NAME), GetMinValue(Nodes(g))), + Sub(GetMaxValue(Nodes(g)), GetMinValue(Nodes(g))) + ) + ) + ) +)) + +// For all nodes with the label "Person" and numeric property "age" +@NodeStyle And(HasLabel(node, "Person"), IsNumber(Property(node, PROP_NAME))) { + color: white + size: Normalize(node, graph) + width: Div(Normalize(node, graph), 5) + label: Format("Age: {}", AsText(Property(node, PROP_NAME))) +} +``` + +![Using Graph Style Script to style different nodes by its size](../data/caching-results-gss.png) + +The problem with the solution above is slow performance. The `Normalize` function is called twice +for each node in the graph view. Each `Normalize` call iterates through all nodes three times: two +times for `GetMinValue` and once for `GetMaxValue`. For small graphs, you won't see a difference +in performance but as the number of nodes rises the performance issues will follow. 
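The `Normalize` function above is ordinary min-max scaling. As a quick sanity check of the arithmetic, here is the same computation in plain Python (illustrative only; the names and sample ages are ours, not part of GSS):

```python
MIN_SIZE, MAX_SIZE = 5, 20  # target size range, as in the GSS snippet


def normalize(value, lo, hi):
    # Min-max scaling: map value from [lo, hi] into [MIN_SIZE, MAX_SIZE].
    return MIN_SIZE + (MAX_SIZE - MIN_SIZE) * (value - lo) / (hi - lo)


ages = [18, 30, 90]          # hypothetical "age" properties
lo, hi = min(ages), max(ages)
sizes = [normalize(a, lo, hi) for a in ages]
print(sizes)  # [5.0, 7.5, 20.0]
```

The youngest value maps to `MIN_SIZE`, the oldest to `MAX_SIZE`, and everything else lands proportionally in between, which is exactly what the GSS `Normalize` function computes per node.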
+ +To solve this issue, cache the results by calculating outside of `@NodeStyle` and +`@EdgeStyle` directives where the variable `graph` is also available. +Inside the `@NodeStyle` directive, a local variable can be used to store the normalized +value and use it with `size` and `width` properties thus calling the `Normalize` function only once. + +Check the improved GSS code below: + +```cpp +// Size range min/max variables +Define(MIN_SIZE, 5) +Define(MAX_SIZE, 20) +Define(PROP_NAME, "age") +Define(SIZE_RANGE, Sub(MAX_SIZE, MIN_SIZE)) + +// A set of utility functions +// Create a new array of property values from an array of nodes +Define(GetProperties, Function(nodes, propName, + Map(nodes, Function(singleNode, Property(singleNode, propName))) +)) +// Keep only the numeric values from an array of values +Define(KeepNumericValues, Function(values, + Filter(values, Function(value, IsNumber(value))) +)) + +// Variables MAX_VALUE and MIN_VALUE will hold the max and min +// values of all node properties. +// The If statement is used to handle errors when there are no values to calculate +// min and max from. 
+Define(MAX_VALUE, If(
+  Greater(NodeCount(graph), 0),
+  Max(KeepNumericValues(GetProperties(Nodes(graph), PROP_NAME))),
+  0
+))
+Define(MIN_VALUE, If(
+  Greater(NodeCount(graph), 0),
+  Min(KeepNumericValues(GetProperties(Nodes(graph), PROP_NAME))),
+  0
+))
+
+// Normalize function that receives one input: a node (n) and
+// returns its value normalized into the range [MIN_SIZE, MAX_SIZE]
+Define(Normalize, Function(n,
+  Add(
+    MIN_SIZE,
+    Mul(
+      SIZE_RANGE,
+      Div(
+        Sub(Property(n, PROP_NAME), MIN_VALUE),
+        Sub(MAX_VALUE, MIN_VALUE)
+      )
+    )
+  )
+))
+
+// For all the nodes with the label "Person" and a numeric property "age"
+@NodeStyle And(HasLabel(node, "Person"), IsNumber(Property(node, PROP_NAME))) {
+  // Local variable used to cache the result of the function Normalize
+  Define(NORM, Normalize(node))
+
+  color: white
+  size: NORM
+  width: Div(NORM, 5)
+  label: Format("Age: {}", AsText(Property(node, PROP_NAME)))
+}
+```
+
+## Main building blocks
+
+The main building blocks of Graph Style Script (GSS) are expressions and
+directives. GSS files are a sequence of expressions and directives.
+
+### Expressions
+
+Expressions are used to combine values to create new values using functions. For
+example, the expression:
+
+```cpp
+Add(2, 5)
+  -> 7
+```
+
+creates a new value 7 from values 2 and 5. There are a lot of functions built
+into Graph Style Script, so there are even more ways to combine values. There is
+even a function to create new functions.
+
+When expressions are evaluated, values are created. There are several types of
+Graph Style Script values: `Boolean`, `Color`, `Number`, `String`, `Array`,
+`Dictionary`, `Function` and `Null`.
+
+An expression can be a literal expression, a name expression or a function
+application. Literal expressions exist for `Color`s, `Number`s and `String`s.
+
+This is a literal expression for `String`s.
+
+```cpp
+"Hello"
+  -> Hello
+```
+
+It evaluates to the value `"Hello"` of type `String`.
The newline character
+and double quotes can be escaped in strings using \\ (backslash).
+
+```
+"In the end he said: \"I am Iron Man!\""
+  -> In the end he said: "I am Iron Man!"
+```
+
+These are literal expressions for `Number`s.
+
+```cpp
+123
+  -> 123
+3.14159
+  -> 3.14159
+```
+
+Literal expressions for colors are hex strings starting with '#'. This is a
+literal expression for the color red.
+
+```cpp
+#ff0000
+  -> #ff0000
+```
+
+Name expressions are names that can be evaluated if there are values bound to
+them in the environment (lexical scope). Names can start with any of the lower
+case or upper case letters of the English alphabet and, apart from those, can
+contain digits and the following characters: -, \_. Names can be defined using
+the `Define` function.
+
+```cpp
+Define(superhero, "Iron Man")
+superhero
+  -> Iron Man
+```
+
+In the previous example, the value `"Iron Man"` was bound to the name
+`superhero`. After that, the name expression `superhero` evaluates to the value
+`"Iron Man"` of type `String`.
+
+There are many built-in names that are bound to useful values. The most used
+are the boolean values bound to `True` and `False` and the null value bound to
+`Null`. Also, all the CSS web colors are bound to their names.
+
+```cpp
+dodgerblue
+  -> #1e90ff
+forestgreen
+  -> #228b22
+```
+
+The third type of expression is the function application. A function
+can be applied to a list of expressions (arguments) in the following way.
+
+```cpp
+Concat("Agents", " ", "of", " ", "S.H.I.E.L.D.")
+  -> Agents of S.H.I.E.L.D.
+```
+
+Here the function `Concat` was applied to the list of string literal expressions
+to produce their concatenation. Any expression can be an argument.
+
+Not all expressions have to be evaluated. For example, when calling the `If`
+function, one argument will not be evaluated.
+
+```cpp
+Define(mood, "happy")
+Define(name, "Happy Hogan")
+If(Equals(mood, "happy"),
+   Format("{} is happy today.", name),
+   Format("{} is not happy today.", name))
+  -> Happy Hogan is happy today.
+```
+
+In the previous example, the expression `Format("{} is not happy today.", name)`
+will not be evaluated because its value is not needed.
+
+Some other functions will not evaluate their arguments because they are
+interested in the argument names, not their values. For example, when creating
+a new function, argument names aren't evaluated but remembered, to be bound to
+the function arguments later when the function is called.
+
+```cpp
+Define(square, Function(x, Mul(x, x)))
+square(2)
+  -> 4
+```
+
+In the previous example, the name `x` isn't evaluated in the first line, and
+neither is the expression `Mul(x, x)`. In the second line, when the function
+`square` is called, the number 2 is bound to the name `x`, and only then is
+`Mul(x, x)` evaluated.
+
+### Directives
+
+Directives are the second building block of Graph Style Script. Directive names
+start with '@'. The name is followed by an optional expression (a filter),
+then an opening curly brace, the directive body, and a closing curly brace.
+The directive body is a list of pairs of property names and expressions.
+Property names and expressions are separated by a colon, and after every
+expression a new line must follow. The directive structure is the following:
+
+```
+@<directive-name> <optional-filter-expression> {
+  <property-name>: <expression>
+  ...
+  <property-name>: <expression>
+}
+```
+
+Like in CSS, directives defined later override properties of the previous
+directives.
+
+Graph Style Script currently has four directives:
+
+* `@NodeStyle` - for defining the visual style of graph nodes.
+* `@EdgeStyle` - for defining the visual style of graph relationships.
+* `@ViewStyle` - for defining the general graph style properties.
+* `@ViewStyle.Map` - for defining the graph style properties when a map
+  is in the background.
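Because, as in CSS, directives defined later override properties of earlier ones, applying several directives to the same element amounts to merging their property maps in order. A simplified Python sketch of that ordering rule (illustrative only; not the actual Memgraph Lab implementation, which also evaluates each directive's filter predicate first):

```python
def apply_directives(*directive_bodies):
    """Merge directive property maps in order; later values win."""
    merged = {}
    for body in directive_bodies:
        merged.update(body)
    return merged


result = apply_directives(
    {"size": 10, "color": "#abcdef"},  # an earlier directive
    {"color": "red"},                  # a later directive overrides color
)
print(result)  # {'size': 10, 'color': 'red'}
```

Properties never mentioned by a later directive keep their earlier values; only the overlapping keys are overridden.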
+
+An example of a directive is the `@NodeStyle` directive, which can be used to
+specify style properties of a graph node.
+
+```cpp
+@NodeStyle {
+  border-width: 2
+  color: #abcdef
+  label: "Hello, World!"
+}
+```
+
+#### `@NodeStyle`
+
+The `@NodeStyle` directive is used for defining style properties of a graph
+node. It is possible to filter the nodes to which the directive applies by
+providing an optional predicate after the directive name and before the opening
+curly brace.
+
+Before any expressions are evaluated (including the predicate), the name `node`
+is bound to the graph node for which the directive is being evaluated. Graph
+node is of type `Dictionary` and has all information about the node (properties,
+labels).
+
+Here is an example of a `@NodeStyle` directive that is applied to all graph
+nodes with the label `vehicle`:
+
+```cpp
+@NodeStyle HasLabel(node, "vehicle") {
+  label: Format("{}, horsepower: {}",
+                Property(node, "model"),
+                Property(node, "horsepower"))
+}
+```
+
+The predicate can be any expression that returns a value of type `Boolean`. It
+should depend on `node`, because if it doesn't, it will either be applied to all
+nodes or to no nodes.
+
+```cpp
+@NodeStyle And(HasProperty(node, "name"),
+               Equals(Property(node, "name"), "Tony Stark")) {
+  color: gold
+  shadow-color: red
+  label: "You know who I am"
+}
+```
+
+Take a look at the [GSS @NodeStyle directive
+properties](/docs/memgraph-lab/style-script/gss-nodestyle-directive) page to see
+all node styling possibilities.
+
+#### `@EdgeStyle`
+
+The `@EdgeStyle` directive is used for defining the style properties of a graph
+relationship. Most things work like the `@NodeStyle` directive with one
+exception: the directive will bind the name `edge` to the relationship for which
+the directive is being evaluated (`@NodeStyle` binds the name `node`).
+
+Take a look at the [GSS @EdgeStyle directive
+properties](/docs/memgraph-lab/style-script/gss-edgestyle-directive) page to see
+all relationship styling possibilities.
+
+#### `@ViewStyle`
+
+The `@ViewStyle` directive is used for defining style properties of the general
+graph view: link distance, view, physics, repel force, etc. It is also possible
+to use a predicate expression which acts as a filter that decides whether the
+defined properties are applied to the final directive output.
+
+```
+@ViewStyle <FilterExpression> {
+  <PropertyName-1>: <PropertyValueExpression-1>
+  ...
+  <PropertyName-n>: <PropertyValueExpression-n>
+}
+```
+
+Similar to `@NodeStyle` and `@EdgeStyle`, `@ViewStyle` has a built-in variable
+`graph` which can be used in the directive filter or in property assignments.
+
+The example below shows a general style definition and a directive whose style
+properties are only applied if there are more than 10 nodes in the graph.
+
+```cpp
+@ViewStyle {
+  collision-radius: 15
+  physics-enabled: True
+}
+
+@ViewStyle Greater(NodeCount(graph), 10) {
+  physics-enabled: False
+  repel-force: -300
+}
+```
+
+If there are 10 or fewer nodes in the graph, the final default graph style
+properties will be:
+
+```json
+{
+  "collision-radius": 15,
+  "physics-enabled": true
+}
+```
+
+Otherwise, if there are more than 10 nodes in the graph, the final default graph
+style properties will be:
+
+```json
+{
+  "collision-radius": 15,
+  "physics-enabled": false,
+  "repel-force": -300
+}
+```
+
+Take a look at the [GSS @ViewStyle directive
+properties](/docs/memgraph-lab/style-script/gss-viewstyle-directive) page to see
+all styling possibilities.
+
+#### `@ViewStyle.Map`
+
+The `@ViewStyle.Map` directive is a subset of `@ViewStyle`: it defines
+additional style properties for the graph view when there is a map background.
+The map view will be available only if:
+
+* `@ViewStyle` contains the property `view` set to the value `"map"`.
+* There is at least one node with defined `latitude` and `longitude`
+  properties.
+
+It is also possible to use a predicate expression which acts as a filter that
+decides whether the defined properties are applied to the final directive
+output.
+
+```
+@ViewStyle.Map <FilterExpression> {
+  <PropertyName-1>: <PropertyValueExpression-1>
+  ...
+  <PropertyName-n>: <PropertyValueExpression-n>
+}
+```
+
+Similar to `@ViewStyle`, `@ViewStyle.Map` also has a built-in variable `graph`
+which can be used in the directive filter or in property assignments.
+
+The example below shows a general style definition and a directive whose style
+properties are only applied if there are more than 10 nodes in the graph.
+
+```cpp
+@ViewStyle {
+  view: "map"
+}
+
+@ViewStyle.Map {
+  tile-layer: "detailed"
+}
+
+@ViewStyle.Map Greater(NodeCount(graph), 10) {
+  tile-layer: "dark"
+}
+```
+
+If there are 10 or fewer nodes in the graph, the final map graph style
+properties will be:
+
+```json
+{
+  "tile-layer": "detailed"
+}
+```
+
+Otherwise, if there are more than 10 nodes in the graph, the final map graph
+style properties will be:
+
+```json
+{
+  "tile-layer": "dark"
+}
+```
+
+Take a look at the [GSS @ViewStyle.Map directive
+properties](/docs/memgraph-lab/style-script/gss-viewstyle-map-directive) page to
+see all styling possibilities.
+
+### Built-in functions
+
+Graph Style Script has a large number of built-in functions that can help you
+achieve the right style for your graph. Take a look at the [list of GSS
+built-in functions](/docs/memgraph-lab/style-script/gss-functions).
+
+### Built-in colors
+
+Graph Style Script comes with built-in colors that you can reference by name.
+Take a look at the [list of built-in
+colors](/docs/memgraph-lab/style-script/gss-colors).
+
+### Built-in variables
+
+Graph Style Script has a few built-in variables that you can use: `node`,
+`edge`, and `graph`. Read more about them in the [list of built-in
+variables](/docs/memgraph-lab/style-script/gss-variables).
+
+### File Structure
+
+Style script files are composed of expressions and directives.
All expressions
+outside directives are evaluated first in the global environment. This is
+useful for defining names using the function `Define`. After that, the
+`@NodeStyle` and `@EdgeStyle` directives are evaluated for each node and
+relationship, respectively. All the names in the global environment are visible
+while the directives are applied, so they can be used for defining property
+values inside directives.
+
+For example:
+
+```cpp
+// These are the global variables
+Define(square, Function(x, Mul(x, x)))
+Define(maxAllowedDebt, 10000)
+
+@NodeStyle HasLabel(node, "BankUser") {
+  // This is a local variable
+  Define(nodeDebt, Property(node, "debt"))
+
+  size: square(nodeDebt)
+  color: If(Greater(nodeDebt, maxAllowedDebt),
+            red,
+            lightblue)
+}
+```
+
+The names `square` and `maxAllowedDebt` are visible inside the `@NodeStyle`
+directive.
\ No newline at end of file
diff --git a/docs2/data-visualization/install-and-connect.md b/docs2/data-visualization/install-and-connect.md
new file mode 100644
index 00000000000..a9a90d585a2
--- /dev/null
+++ b/docs2/data-visualization/install-and-connect.md
@@ -0,0 +1,385 @@
+# Install Memgraph Lab and connect to a database
+
+We recommend you use the `memgraph/memgraph-platform` Docker image to install
+**Memgraph Platform** and get the complete streaming graph application platform
+that includes:
+
+- **MemgraphDB** - the database that holds your data
+- **Memgraph Lab** - visual user interface for running queries and visualizing
+  graph data
+- **mgconsole** - command-line interface for running queries
+- **MAGE** - graph algorithms and modules library
+
+After running the image, mgconsole will open in the terminal while Memgraph Lab
+is available on `http://localhost:3000`.
+
+You can install Memgraph Platform on:
+

+
+- Windows
+- macOS
+- Linux
+

+
+There is also a smaller
+[`memgraph/memgraph-platform`](https://hub.docker.com/r/memgraph/memgraph-platform/tags?page=1)
+Docker image that doesn't include MAGE - the graph algorithms and modules
+library. Its tag includes only the `memgraph` and `lab` keywords, for example:
+`2.7.1-memgraph2.7.0-lab2.6.0`.
+
+If you already have a running Memgraph database instance, you can access the Lab
+web application at http://lab.memgraph.com/, and if you want to install Memgraph
+Lab as a desktop application, check out the installation instructions for
+[Windows](/memgraph-lab/installation/windows),
+[macOS](/memgraph-lab/installation/macos) and
+[Linux](/memgraph-lab/installation/linux).
+
+## Environment variables
+
+Use the following environment variables to configure Memgraph Lab:
+
+| Variable | Description | Type | Default |
+| -------------- | -------------- | -------------- | -------------- |
+| APP_CYPHER_QUERY_MAX_LEN | Max length of a Cypher query | `[integer]` | 5000 |
+| APP_MODULE_NAME_MAX_LEN | Max length of the query module name | `[integer]` | 1000 |
+| APP_MODULE_CONTENT_MAX_LEN | Max length of a query module content | `[integer]` | 50000 |
+| APP_STREAM_NAME_MAX_LEN | Max length of the stream name | `[integer]` | 500 |
+
+Example:
+
+```bash
+docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -e APP_CYPHER_QUERY_MAX_LEN=10000 memgraph/memgraph-platform
+```
+
+## Windows
+
+We recommend you [install **Memgraph Platform**](/memgraph/installation) and get
+the complete streaming graph application platform that includes
**MemgraphDB**,
+command-line tool **mgconsole**, visual user interface **Memgraph Lab** running
+within the browser and **MAGE** - graph algorithms and modules library.
+
+To access the web application, go to http://lab.memgraph.com/, and if you want
+to install Memgraph Lab as a desktop application, follow the instructions below.
+
+## Step 1 - Download and install Memgraph
+
+Memgraph Lab needs a running MemgraphDB instance.
+
+If you installed Memgraph Platform, you should already have one.
If not,
+install [MemgraphDB](/memgraph/installation), and once the database instance is
+running, you can continue with the next step.
+
+If you installed MemgraphDB using Docker and you want to use the in-browser Memgraph Lab, be sure to expose port 3000 (`-p 3000:3000`) in the `docker run ...` command.
+
+If you installed MemgraphDB using Docker and you want to connect to it with the Memgraph Lab application, be sure to expose port 7687 for the instance connection (`-p 7687:7687`) and port 7444 for logs (`-p 7444:7444`) in the `docker run ...` command.
+
+## Step 2 - Installing and setting up Memgraph Lab
+
+**1.** Download Memgraph Lab by visiting the [Download
+Hub](https://memgraph.com/download/#memgraph-lab).
+
+**2.** You can install Memgraph Lab by double-clicking the downloaded installer
+and following the instructions.
+
+**3.** After you start Memgraph Lab, you should be presented with a login
+screen. The username and password fields are empty by default. The default
+connection string is set to `localhost:7687`. If you're using a different port,
+you will have to change the connection string to point to that port, e.g.
+`localhost:<port>`.
+
+**4.** Click **Connect**, and you should be presented with the following
+dashboard:
+
+![lab-dashboard](../data/installation/lab-dashboard.png)
+
+Congratulations! You have successfully installed Memgraph Lab and connected it
+to Memgraph. You are now ready to start building your graph and querying it.
+
+:::caution
+
+You might receive the following error message when trying to connect.
+
+![failed_connection](../data/failed_connection.png)
+
+In this case, make sure that Memgraph is properly up and running and that you
+have entered the correct port number.
+
+:::
+
+## Step 3 - Create a simple graph
+
+Let's create a simple graph and execute some queries. This will make sure
+everything is running correctly.
+
+Go to **Query execution**, enter the following query in the **Cypher Editor** tab and click **Run Query**.
+
+```cypher
+CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"});
+```
+
+You just created 2 nodes in the database, one labeled `User` with the name
+"Alice" and the other labeled `Software` with the name "Memgraph". Between them,
+you also created a relationship indicating that "Alice" likes "Memgraph".
+
+Now that the data is stored inside Memgraph, you can run a query to retrieve and
+visualize the graph. Execute the following query:
+
+```cypher
+MATCH (u:User)-[r]->(x) RETURN u, r, x;
+```
+
+You should get the following result:
+
+![graph_result](../data/installation/lab-graph.png)
+
+Now that you know your development environment is working, you are ready to
+continue exploring Memgraph and building much more interesting projects
+leveraging connected data.
+
+## macOS
+
+We recommend you [install **Memgraph Platform**](/memgraph/installation) and get
+the complete streaming graph application platform that includes
**MemgraphDB**,
+command-line tool **mgconsole**, visual user interface **Memgraph Lab** running
+within the browser and **MAGE** - graph algorithms and modules library.
+
+To access the web application, go to http://lab.memgraph.com/, and if you want
+to install Memgraph Lab as a desktop application, follow the instructions below.
+
+## Step 1 - Download and install Memgraph
+
+Memgraph Lab needs a running MemgraphDB instance.
+
+If you installed Memgraph Platform, you should already have one.
If not,
+install [MemgraphDB](/memgraph/installation), and once the database instance is
+running, you can continue with the next step.
+
+If you installed MemgraphDB using Docker and you want to use the in-browser Memgraph Lab, be sure to expose port 3000 (`-p 3000:3000`) in the `docker run ...` command.
+
+If you installed MemgraphDB using Docker and you want to connect to it with the Memgraph Lab application, be sure to expose port 7687 for the instance connection (`-p 7687:7687`) and port 7444 for logs (`-p 7444:7444`) in the `docker run ...` command.
+
+## Step 2 - Installing and setting up Memgraph Lab
+
+**1.** Download Memgraph Lab by visiting the [Download
+Hub](https://memgraph.com/download/#memgraph-lab).
+
+**2.** Once you have Memgraph Lab installed, run the app, and you should be
+presented with a login screen. The username and password fields are empty by
+default. The default connection string is set to `localhost:7687`. If you're
+using a different port, you will have to change the connection string to point
+to that port, e.g. `localhost:<port>`.
+
+**3.** Click **Connect**, and you should be presented with the following
+dashboard:
+
+![lab-dashboard](../data/installation/lab-dashboard.png)
+
+Congratulations! You have successfully installed Memgraph Lab and connected it
+to Memgraph. You are now ready to start building your graph and querying it.
+
+:::caution
+
+You might receive the following error message when trying to connect.
+
+![failed_connection](../data/failed_connection.png)
+
+In this case, make sure that Memgraph is properly up and running and that you
+have entered the correct port number.
+
+:::
+
+## Step 3 - Create a simple graph
+
+Let's create a simple graph and execute some queries. This will make sure
+everything is running correctly.
+
+Go to **Query execution**, enter the following query in the **Cypher Editor** tab and click **Run Query**.
+ +```cypher +CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"}); +``` + +You just created 2 nodes in the database, one labeled `User` with the name +"Alice" and the other labeled `Software` with the name "Memgraph". Between them, +you also created a relationship indicating that "Alice" likes "Memgraph". + +Now that the data is stored inside Memgraph, you can run a query to retrieve and +visualize the graph. Execute the following query: + +```cypher +MATCH (u:User)-[r]->(x) RETURN u, r, x; +``` + +You should get the following result: + +![graph_result](../data/installation/lab-graph.png) + +Now that you know your development environment is working, you are ready to +continue exploring Memgraph and building much more interesting projects +leveraging connected data. + +## Linux + +We recommend you [install **Memgraph Platform**](/memgraph/installation) and get +the complete streaming graph application platform that includes
**MemgraphDB**,
+command-line tool **mgconsole**, visual user interface **Memgraph Lab** running
+within the browser and **MAGE** - graph algorithms and modules library.
+
+To access the web application, go to http://lab.memgraph.com/, and if you want
+to install Memgraph Lab as a desktop application, follow the instructions below.
+
+## Step 1 - Download and install Memgraph
+
+Memgraph Lab needs a running MemgraphDB instance.
+
+If you installed Memgraph Platform, you should already have one.
If not,
+install [MemgraphDB](/memgraph/installation), and once the database instance is
+running, you can continue with the next step.
+
+If you installed MemgraphDB using Docker and you want to use the in-browser Memgraph Lab, be sure to expose port 3000 (`-p 3000:3000`) in the `docker run ...` command.
+
+If you installed MemgraphDB using Docker and you want to connect to it with the Memgraph Lab application, be sure to expose port 7687 for the instance connection (`-p 7687:7687`) and port 7444 for logs (`-p 7444:7444`) in the `docker run ...` command.
+
+## Step 2 - Installing and setting up Memgraph Lab
+
+**1.** Download Memgraph Lab by visiting the [Download
+Hub](https://memgraph.com/download/#memgraph-lab).
+
+**2.** You can install Memgraph Lab by double-clicking the downloaded installer
+or by executing:
+
+```console
+sudo dpkg -i MemgraphLab-x.x.x.deb
+```
+
+**3.** After you start Memgraph Lab, you should be presented with a login
+screen. The username and password fields are empty by default. The default
+connection string is set to `localhost:7687`. If you're using a different port,
+you will have to change the connection string to point to that port, e.g.
+`localhost:<port>`.
+
+**4.** Click **Connect**, and you should be presented with the following
+dashboard:
+
+![lab-dashboard](../data/installation/lab-dashboard.png)
+
+Congratulations! You have successfully installed Memgraph Lab and connected it
+to Memgraph. You are now ready to start building your graph and querying it.
+
+:::caution
+
+You might receive the following error message when trying to connect.
+
+![failed_connection](../data/failed_connection.png)
+
+In this case, make sure that Memgraph is properly up and running and that you
+have entered the correct port number.
+
+:::
+
+## Step 3 - Create a simple graph
+
+Let's create a simple graph and execute some queries. This will make sure
+everything is running correctly.
+
+Go to **Query execution**, enter the following query in the **Cypher Editor** tab and click **Run Query**.
+
+```cypher
+CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"});
+```
+
+You just created 2 nodes in the database, one labeled `User` with the name
+"Alice" and the other labeled `Software` with the name "Memgraph". Between them,
+you also created a relationship indicating that "Alice" likes "Memgraph".
+
+Now that the data is stored inside Memgraph, you can run a query to retrieve and
+visualize the graph. Execute the following query:
+
+```cypher
+MATCH (u:User)-[r]->(x) RETURN u, r, x;
+```
+
+You should get the following result:
+
+![graph_result](../data/installation/lab-graph.png)
+
+Now that you know your development environment is working, you are ready to
+continue exploring Memgraph and building much more interesting projects
+leveraging connected data.
+
+## Connect to a database
+
+import CompatibilityWarning from './templates/_compatibility_warning.mdx';
+
+## Prerequisites
+
+Before you proceed with the guide, make sure that you have either:
+
+- Installed [**Memgraph Platform**](/memgraph/installation) and now have a
+  running database instance with Memgraph Lab running within the browser on
+  `http://localhost:3000`, or
+- Installed [**MemgraphDB**](/memgraph/installation) and a running database
+  instance, and either an installed [**Memgraph
+  Lab**](/memgraph-lab/installation) desktop application or access to the
+  Memgraph Lab web application at http://lab.memgraph.com/
+
+## Connecting to Memgraph
+
+Make sure that Memgraph is running and open Memgraph Lab. If you are starting
+with a fresh database instance:
+
+1. Leave the `Username` and `Password` fields **empty**.
+2. The `Host` field can be **`localhost`**, **`127.0.0.1`** or **`0.0.0.0`**;
+   change it as appropriate for your setup.
+3. The `Port` field should be **`7687`**. Every Memgraph instance is listening
+   on this port by default.
+4.
The `Encrypted` option should be **disabled** and display `SSL Off` by
+   default.
+
+If you fail to connect, make sure that your database instance is up and running.
+If the `Host` address is wrong, take a look at the [Docker
+note](/memgraph/how-to-work-with-docker#docker-container-ip-address) in the
+installation guide.
+
+![Memgraph Lab](./data/getting-started/memgraph-lab-login.png)
+
+## Executing queries
+
+Now, you can execute Cypher queries on Memgraph. Open the **Query** tab, located
+in the left sidebar, copy the following query and press the **Run query**
+button:
+
+```cypher
+CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"});
+```
+
+The query above will create 2 nodes in the database, one labeled "User" with the
+name "Alice" and the other labeled "Software" with the name "Memgraph". It will
+also create a relationship indicating that "Alice" _likes_ "Memgraph".
+
+To find the created nodes and relationships, execute the following query:
+
+```cypher
+MATCH (u:User)-[r]->(x) RETURN u, r, x;
+```
+
+## Where to next?
+
+To learn more about the **Cypher** language, visit the **[Cypher
+manual](/cypher-manual)** or **[Memgraph
+Playground](https://playground.memgraph.com/)** for interactive guides. For
+real-world examples of how to use Memgraph, we strongly suggest going through
+one of the available **[Tutorials](/memgraph/tutorials)**. Details on what can
+be stored in Memgraph can be found in the article about **[Data
+storage](/memgraph/concepts/storage)**.
+
+## Getting help
+
+Visit the **[Help Center](/help-center)** page if you run into any kind of
+problem or have additional questions.
\ No newline at end of file
diff --git a/docs2/data-visualization/style-your-graphs-in-memgraph-lab.md b/docs2/data-visualization/style-your-graphs-in-memgraph-lab.md
new file mode 100644
index 00000000000..d8272487878
--- /dev/null
+++ b/docs2/data-visualization/style-your-graphs-in-memgraph-lab.md
@@ -0,0 +1,305 @@
+---
+id: style-your-graphs-in-memgraph-lab
+title: Style your graphs in Memgraph Lab
+sidebar_label: Style your graphs in Memgraph Lab
+---
+
+[![Related - Blog
+Post](https://img.shields.io/static/v1?label=Related&message=Blog%20post&color=9C59DB&style=for-the-badge)](https://memgraph.com/blog/how-to-style-your-graphs-in-memgraph-lab)
+
+In this tutorial, you'll learn how to use **Style script** to add style to your
+graphs. You'll use [**Memgraph Cloud**](https://memgraph.com/cloud) or the
+sandbox site **Memgraph Playground** that runs **Memgraph Lab** to try out
+styling graphs.
+
+## Prerequisites
+
+For this tutorial, there are no particular prerequisites. All you need is a web
+browser.
+
+## Step 1 - Connecting to Memgraph Cloud or Memgraph Playground
+
+Memgraph Cloud enables you to read and make changes to the data. It comes with a
+14-day free trial upon registration. You can also use Memgraph Playground, where
+you can only read the data; don't worry, you will still be able to complete the
+tutorial.
+
+### Memgraph Cloud
+
+1. [Sign up](https://cloud.memgraph.com/) to Memgraph Cloud.
+2. Once you finish the registration, log in and [create a new
+   project](/memgraph-cloud/cloud-projects#create-a-new-memgraph-cloud-project).
+3. Open the project and [connect to it via Memgraph
+   Lab](/memgraph-cloud/cloud-connect#connect-with-memgraph-lab).
+
+4. In Memgraph Lab, navigate to the **Datasets** section and upload the Europe
+   backpacking dataset.
+5. Run the sample query provided by the Lab.
+6. Open the **Graph Style Editor** tab.
+
+Notice there is code already present in the _Graph Style Editor_.
In the next few
+steps, you'll learn how to adjust that code to style your graph using colors and
+images.
+
+### Memgraph Playground
+
+Open the Memgraph Playground sandbox [Europe
+backpacking](https://playground.memgraph.com/sandbox/europe-backpacking). When
+the sandbox is loaded, do the following:
+
+1. Expand **Sample Query Examples**.
+2. Run the first query to display the shortest path from Spain to Russia.
+3. Click the gear icon to open the **Style editor**.
+
+![style-graphs-open-style-editor](../data/tutorials/style-your-graphs-in-memgraph-lab/style-graphs-open-style-editor.png)
+
+Notice there is code already present in the _Style editor_. In the next few
+steps, you'll learn how to adjust that code to style your graph using colors and
+images.
+
+## Step 2 - Using colors and borders to style graph nodes
+
+With the _Style editor_ in front of you, you are ready to style your graph by
+modifying the existing style and adding some new style rules. First, let's
+modify the code that defines the node style. Look for this section of the code:
+
+```nocopy
+@NodeStyle {
+  size: 50
+  border-width: 5
+  border-color: #ffffff
+  shadow-color: #bab8bb
+  shadow-size: 6
+}
+```
+
+This part of the code is called a
+[directive](https://memgraph.com/docs/memgraph-lab/style-script/reference-guide#directives),
+and it is used to define how the node looks and feels.
+
+To start, make the node smaller but with a larger and darker shadow. Update the
+values of the properties `size`, `shadow-color`, and `shadow-size`. Set the
+value of `size` to `35`, `shadow-color` to `#333333`, and `shadow-size` to `20`.
+Your code should now look like this:
+
+```
+@NodeStyle {
+  size: 35
+  border-width: 5
+  border-color: #ffffff
+  shadow-color: #333333
+  shadow-size: 20
+}
+```
+
+Click **Apply** to see what your graph looks like now.
+
+![style-graphs-node-size](../data/tutorials/style-your-graphs-in-memgraph-lab/style-graphs-node-size.png)
+
+Now change the color of the nodes from red to gold and make them orange on
+hover. Find the following code:
+
+```nocopy
+@NodeStyle HasLabel(node, "Country") {
+  color: #dd2222
+  color-hover: Darker(#dd2222)
+  color-selected: #dd2222
+}
+```
+
+Update the value of the property `color` to `#ffd700` and `color-hover` to
+`#ffa500`. The updated code should look like this:
+
+```
+@NodeStyle HasLabel(node, "Country") {
+  color: #ffd700
+  color-hover: #ffa500
+  color-selected: #dd2222
+}
+```
+
+Don't forget to click **Apply** to see your updated graph.
+
+![style-graphs-node-colors](../data/tutorials/style-your-graphs-in-memgraph-lab/style-graphs-node-colors.png)
+
+## Step 3 - Add images to the nodes
+
+Now that all the colors and borders are just right, it's time to add images to
+the nodes. Let's add them to the first and last node using two different images
+from Wikipedia. You'll use a predicate to apply the style only to nodes with a
+specific property value.
+
+To display the two images, add the following code at the end of the style
+script:
+
+```
+@NodeStyle Equals(Property(node, "name"), "Russia") {
+  image-url: "https://upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/320px-Flag_of_Russia.svg.png"
+}
+
+@NodeStyle Equals(Property(node, "name"), "Spain") {
+  image-url: "https://upload.wikimedia.org/wikipedia/en/thumb/9/9a/Flag_of_Spain.svg/320px-Flag_of_Spain.svg.png"
+}
+```
+
+Click **Apply** to update the style of your graph. Your graph is looking better
+with each step, isn't it?
+
+![style-graphs-node-with-images](../data/tutorials/style-your-graphs-in-memgraph-lab/style-graphs-node-with-images.png)
+
+## Step 4 - Using colors to style graph relationships
+
+With all of the nodes looking just like you wanted them to, it's time to style
+the relationships between them.
You'll represent your relationships as straight,
+thin lines with no arrows. To do that, locate the `@EdgeStyle` directive and the
+following code:
+
+```nocopy
+@EdgeStyle {
+  width: 3
+  label: Type(edge)
+}
+```
+
+Now replace that code with this one:
+
+```
+@EdgeStyle {
+  width: 1
+  label: Type(edge)
+  arrow-size: 0
+  color: #6AA84F
+}
+```
+
+Click **Apply** and your relationships will have a new style!
+
+![style-graphs-relationships-colors](../data/tutorials/style-your-graphs-in-memgraph-lab/style-graphs-relationships-colors.png)
+
+## Step 5 - Checking the final result
+
+We are at the end of this tutorial. Move the nodes around to get the final look.
+Your result could look similar to the image below.
+
+![style-graphs-graph-with-new-style](../data/tutorials/style-your-graphs-in-memgraph-lab/style-graphs-graph-with-new-style.png)
+
+The complete styling code for this graph is:
+
+```
+@NodeStyle {
+  size: 35
+  border-width: 5
+  border-color: #ffffff
+  shadow-color: #333333
+  shadow-size: 20
+}
+
+@NodeStyle Greater(Size(Labels(node)), 0) {
+  label: Format(":{}", Join(Labels(node), " :"))
+}
+
+@NodeStyle HasLabel(node, "Country") {
+  color: #ffd700
+  color-hover: #ffa500
+  color-selected: #dd2222
+}
+
+@NodeStyle HasProperty(node, "name") {
+  label: AsText(Property(node, "name"))
+}
+
+@EdgeStyle {
+  width: 1
+  label: Type(edge)
+  arrow-size: 0
+  color: #6AA84F
+}
+
+@NodeStyle Equals(Property(node, "name"), "Russia") {
+  image-url: "https://upload.wikimedia.org/wikipedia/en/thumb/f/f3/Flag_of_Russia.svg/320px-Flag_of_Russia.svg.png"
+}
+
+@NodeStyle Equals(Property(node, "name"), "Spain") {
+  image-url: "https://upload.wikimedia.org/wikipedia/en/thumb/9/9a/Flag_of_Spain.svg/320px-Flag_of_Spain.svg.png"
+}
+```
+
+## Use font awesome for node images
+
+[![Related - Tutorial](https://img.shields.io/static/v1?label=Related&message=Tutorial&color=008a00&style=for-the-badge)](/tutorials/style-your-graphs-in-memgraph-lab.md) [![Related - Blog
+Post](https://img.shields.io/static/v1?label=Related&message=Blog%20post&color=9C59DB&style=for-the-badge)](https://memgraph.com/blog/how-to-style-your-graphs-in-memgraph-lab)
+
+[Font Awesome](https://fontawesome.com/) is a popular icon library. If you ever
+tried to use a Font Awesome icon as a background image for a node, you might
+have noticed you were not able to do that by using the icon directly. Memgraph
+Lab doesn't support the `SVG` format at this time, but it supports the `PNG`,
+`JPEG`, `GIF` and `WEBP` formats. Here is a workaround for this problem.
+
+1. Find the Font Awesome icon that you want to convert to PNG. Go to the [Font
+   Awesome](https://fontawesome.com/icons/) website, locate the icon that you
+   want to use as a node background and download it in SVG format.
+
+2. Convert the SVG file to PNG with your favorite image editing program, or use
+   one of the dozens of online services for file conversion.
+
+:::info
+
+You can use programs such as [Gimp](https://www.gimp.org/) or
+[Inkscape](https://inkscape.org/) to convert SVG to PNG.
+
+:::
+
+3. Upload the PNG file to a web server so that you can set it as the node
+   background. If you are using an image hosting service, make a note of the
+   URL. Some of those services use URLs unrelated to the image name that are
+   hard to find again at a later time.
+
+4. Edit the code in the Graph Style Editor in Memgraph Lab by adding the `image-url` property to the `@NodeStyle` directive. Here is an example:
+
+```
+image-url: "https://i.imgur.com/bLF8qWQ.png"
+```
+
+Your `@NodeStyle` block of code should look something like this:
+
+```
+@NodeStyle {
+  size: 6
+  color: #DD2222
+  border-width: 0.6
+  border-color: #1d1d1d
+  font-size: 3
+  image-url: "https://i.imgur.com/bLF8qWQ.png"
+}
+```
+
+You can look at the [Graph Style Script @NodeStyle directive
+properties](/memgraph-lab/style-script/gss-nodestyle-directive#image-url-string)
+for additional info on the syntax.
+
+5. Apply the style and review changes.
+
+## Where to next?
+
+In this tutorial, you've learned how to style graphs, nodes and relationships in
+particular, using Memgraph Lab. We hope that you had fun going through this
+tutorial. You can continue playing in Playground, or even better, [download and
+install **Memgraph Platform**](/docs/memgraph/installation) on your computer.
+
+To get a taste of some more advanced styling features, head to our blog post
+[How to style your graphs in Memgraph
+Lab](https://memgraph.com/blog/how-to-style-your-graphs-in-memgraph-lab). Also,
+be sure to check out the [Graph Style Script
+guide](/docs/memgraph-lab/graph-style-script-language) or take a deep dive into
+the [Graph Style Script reference
+guide](/docs/memgraph-lab/style-script/reference-guide) to learn more about the
+language.
diff --git a/docs2/data-visualization/user-manual.md b/docs2/data-visualization/user-manual.md
new file mode 100644
index 00000000000..fe1d8b971f4
--- /dev/null
+++ b/docs2/data-visualization/user-manual.md
@@ -0,0 +1,331 @@
+---
+title: Memgraph Lab user manual
+sidebar_label: User manual
+---
+
+**Memgraph Lab** is a **visual user interface** that enables you to import and
+export data to and from a Memgraph database, write and execute Cypher queries,
+visualize graph data, view and optimize query performance, develop query modules
+in Python or connect to data streams.
+
+Here is a short overview of the Lab's interface, the features it provides, and
+links to resources that will help you achieve your graph goals. At the end of
+the page, there is a Lab demo video from the Memgraph Cloud launch if you need
+more visual input.
+
+## Overview
+
+Every time you open Memgraph Lab, it will greet you with an **Overview**,
+offering resources and actions depending on whether your database is empty and
+whether you have run any queries.
+
+At the top of the screen, you will find information about:
+- Connection status
+- Memgraph version, IP address, and port of the database
+- Number of nodes and relationships currently in the database
+- Disk storage used, and total and available RAM
+
+In the top right corner, you can find the help and notification buttons. The
+help section provides you with helpful documentation and links to the Memgraph
+community, while the notification section is used to inform you about important
+events within the Memgraph ecosystem.
+
+All the Memgraph Lab sections are listed in the left side menu, below which you
+can find the [Layout](#layout) options and the Memgraph Lab version.
+
+## Query Execution
+
+In this section, you can write and run queries, as well as see their tabular or
+graphical results.
+
+### Cypher Editor
+
+Here is where you write and run your Cypher queries. A keyword suggestion tool
+can help you with clause completion and give information about signatures and
+parameters. If you need help writing Cypher queries, check out the [Cypher
+manual](/cypher-manual).
+
+Once you **Run** a query (by clicking the button or pressing **CTRL** +
+**Enter**), you can **Cancel** it, but if the query has already
+reached MemgraphDB, the action won't actually stop the query's execution. In the
+case of running complex algorithms on a large dataset, you need to be a bit
+patient and wait for Memgraph to complete running the query.
+
+If the Cypher Editor contains multiple queries, you can execute a single one by
+selecting it and clicking **Run Selected**.
+
+Here you can also copy the query to your clipboard or add it to an existing or
+new query collection.
+
+### Graph Style Editor
+
+The Graph Style Editor allows you to customize the visual appearance of the
+graph results by editing the System Style or creating a custom style, changing
+the color, size, and shape of graph elements, and even adding pictures or
+backgrounds.
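+
+As a flavor of what such a custom style can look like, a minimal sketch (the
+values are illustrative; the GSS syntax itself is covered in the reference
+guide linked below):
+
+```cpp
+// Gold nodes with a thin white border
+@NodeStyle {
+  size: 20
+  color: #ffd700
+  border-width: 2
+  border-color: #ffffff
+}
+
+// Thin relationships labeled with their type
+@EdgeStyle {
+  width: 1
+  label: Type(edge)
+}
+```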
When saving a style, the graph results of an executed query will be used as a
preview picture of the style. A custom style can be set as the default style
applied to all subsequent query runs.

Be sure to check the [GSS reference guide](/style-script/overview.md) and a
[tutorial](/memgraph/tutorials/style-your-graphs-in-memgraph-lab) on how to
style your graphs.

### Data results

Once a query executes or fails, a *Query successful* message or an error
appears below the Cypher Editor. In the case of successful execution, you can
also see the query execution time, the number of rows the query generated, and
the number of nodes and relationships the query returned (if any).

Depending on the results generated by the query, they can be shown in a
table, as a graph, or both.

When rendering a graph that exceeds the set rendering limits, which
might take a considerable amount of time to preview, you will be asked if you
want to proceed with the graph visualization or switch to the data view.

The rendering limit can be set in the **Settings** section.

If the Cypher Editor contains multiple queries and all of them were executed,
you can choose to view the results of each separate query.

Here, you can also **Download Results** in JSON, CSV, and TSV formats.

Table rows can be expanded to show additional information about entities.

### Graph results

When results are shown as a graph, you can click on each node or relationship to
see additional information. You can also **Expand** a node to see its
relationships, **Collapse** a node to hide its relationships, or **Hide** the
node from the canvas.

In the bottom left corner, you can **Enable physics**, that is, make nodes
interact with each other by pulling away from or closing in on one another,
depending on the strength of the relationships between them.
In the top right corner of the graph, you can open **Graph Preferences** and set
the collision radius that dictates the margin radius for each node from its
center, the repel force that dictates how strongly nodes repel each other, and
the link distance that dictates the minimum required distance between two
connected nodes.

Another interesting feature you can use on graph data results is the map
background. This feature automatically turns on when the result nodes have
numerical `lat` and `lng` properties.

## Run History

Here you can search and view previously run queries and applied styles, along
with the time of the last run, the runtime, and whether the execution was
successful.

If the last action within the Query Execution was a query run, a clock icon will
appear next to the query. If the last action was the application of a style, a
clock icon will appear next to the style name. If the action included both the
execution of a query and the application of a style, the clock icon will
appear next to both.

If the Query Editor includes several queries, but only one query was selected
and run, the Query column of the Run History will show only that query, but the
full contents of the Query Editor can be previewed by expanding the row.
Expanding the Style Name column will show the Graph Style Script (GSS) code of
the style.

In the Run History, you can rerun queries in the current or a new execution
view, copy them to the clipboard, and save them to an existing or new
collection.

You can filter the data to view just the query history, the style history, or
both.

You can clear the run history in the **Settings** section.

## Collections

In the Collections section, you can gather your favorite queries so they are
always at hand.

Queries can be added to the collection from the **Query Execution** and the
**Latest Queries** section.
From the **Query Collections** section, you can directly run queries, copy them
to the clipboard, and save them to an existing or a new collection.

Query collections can also be imported from and exported to JSON files.

## Query Modules

[Query modules](/memgraph/reference-guide/query-modules) are
collections of procedures written in **C**, **C++**, **Python**, and **Rust**
(either `*.so` or `*.py` files) to extend the query language. Transformation
procedures necessary to ingest data from data streams are also written as query
modules.

Some query modules are built-in, and others, like those that can help you solve
complex graph issues, are available as part of the [MAGE](/mage)
library you can add to your Memgraph installation. The library is already
included if you are using the [Memgraph
Platform](/memgraph/installation) or [Memgraph
MAGE](/mage/installation) Docker images to run Memgraph, or you are
connecting to a [Cloud](/memgraph-cloud) instance.

All the query modules and procedures are listed in the **Query Modules**
section. By expanding the information about each query module, you can see the
procedures it contains, as well as their signatures and examples.

In this section, you can also implement your own custom query modules written in
Python by clicking **+ New Module**. A new file will open with example
procedures you can examine and learn from. Once you have written and saved the
query module, Memgraph Lab will automatically detect the procedures within it,
which you can then call from queries.

If you need help writing custom query modules, check out the [reference
guide](/memgraph/reference-guide/query-modules/implement-custom-query-modules/overview)
or a [tutorial](/memgraph/tutorials/implement-custom-query-module-in-python)
on query modules.
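To make this concrete, a custom query module saved through Lab might look like
the following sketch. It assumes Memgraph's `mgp` Python API; the module name
(`example.py`), the procedure, and its logic are purely illustrative. Since the
`mgp` package is only available inside Memgraph, the fallback stub below exists
solely so the sketch can also be inspected and run outside the database:

```python
# Hypothetical custom query module, e.g. saved as `example.py` through Lab.
try:
    import mgp  # Memgraph's Python API, provided by the database at runtime
except ImportError:
    # Minimal stand-ins so this sketch can also run outside Memgraph.
    import types
    mgp = types.SimpleNamespace(
        ProcCtx=object,
        read_proc=lambda func: func,
        Record=lambda **fields: fields,
    )

@mgp.read_proc
def greet(context: mgp.ProcCtx, name: str) -> mgp.Record(greeting=str):
    # Inside Memgraph, callable as: CALL example.greet("Memgraph") YIELD greeting;
    return mgp.Record(greeting=f"Hello, {name}!")
```

Once a file like this is saved, Lab detects the decorated procedures so they
can be called from the Cypher Editor.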
## Streams

In Memgraph Lab, you can connect to a data stream by running a series of Cypher
queries or by using the **Streams** section.

Once you enter basic information about the stream, such as type, name, server
address, and topics, the Streams section allows you to add an existing query
module containing a transformation procedure or to write a new one in Python.

When saving a new query module, Memgraph Lab will automatically detect the
transformation procedures within it, which you can then attach to the created
stream.

Adding Kafka Configuration Parameters is also done via the Streams section, as
well as managing the connection: starting, pausing, or deleting it.

Check out the [reference guide on
streams](/memgraph/reference-guide/streams), and check [a how-to
guide on connecting to data
streams](/memgraph/import-data/data-streams/manage-streams-lab) from Memgraph
Lab.

## Graph Schema

If you need to check the data model of the data currently in the database, you
can generate a graph schema that will show all the node types and relationships
between them.

By selecting a certain node or relationship type, Lab will provide information
about the current number of nodes or relationships of that type, as well as the
percentage of nodes and relationships of that type that contain each property.

## Datasets

From the Datasets section, you can load interesting datasets varying in topic
and size. You can use the datasets to explore the Cypher query language and
Memgraph Lab features, or to experiment with data before you tackle your own
more complex issues.

You can examine the structure of a dataset through its graph schema, as well
as read the explanations of all its entities and their properties.

There are several
[tutorials](/memgraph/tutorials/exploring-datasets) you can use to
explore the datasets available in Memgraph Lab.
## Import & Export

In this section, you can import and export data in the [CYPHERL
format](/memgraph/import-data/cypherl), which represents data in the form of
Cypher queries.

To import data from other sources, check the [guides on
importing](/memgraph/import-data).

## Logs

To see logs in the Memgraph Lab application, you need to open port 7444
when starting Memgraph with Docker.

Check the [reference guide on
configuration](/memgraph/reference-guide/configuration#other) to learn how to
modify logging, and the [how-to guide on how to access
logs](/memgraph/how-to-guides/config-logs) if you are not using Docker.

You can set the number of visible logs in the **Settings** section.

## Settings

In the Settings section, you can check your unique application identification
number and the Lab version.

You can also adjust the limits after which Lab will no longer give code
completion suggestions or automatically render graph results.

In the settings, you can also clear the run history and set the number of saved
records, as well as the number of log records.

In the Graph Style Library, you can rename or delete styles, set one as the
default style, or view their code.

## Layout

You can split the work area horizontally to work with two sections at the same
time, or vertically to work with up to five sections simultaneously.

You can also combine the horizontal and vertical splits.

## Memgraph Lab demo video

As part of the [Memgraph Cloud](/memgraph-cloud) release, we've showcased
different features of Memgraph Lab, and we invite you to check it out!
[memgraph_lab](https://youtu.be/Tt5KPKylU8k?t=1390 "Get started with Memgraph Lab")

If you are interested in a particular topic, below is a breakdown of the video
by the topics covered in this user manual (the same breakdown is available in
the video description):

- Overview section ([25:35](https://youtu.be/Tt5KPKylU8k?t=1534))
- Streams section ([26:25](https://youtu.be/Tt5KPKylU8k?t=1585))
- Graph schema ([38:30](https://youtu.be/Tt5KPKylU8k?t=2310))
- Query execution ([39:55](https://youtu.be/Tt5KPKylU8k?t=2395))
- MAGE query modules ([42:00](https://youtu.be/Tt5KPKylU8k?t=2520))
- GSS ([1:00:14](https://youtu.be/Tt5KPKylU8k?t=3614)) and ([1:14:20](https://youtu.be/Tt5KPKylU8k?t=4460))
- Query collections ([1:08:18](https://youtu.be/Tt5KPKylU8k?t=4096))
- Datasets section ([1:09:55](https://youtu.be/Tt5KPKylU8k?t=4195))
- Keyword suggestion tool ([1:11:15](https://youtu.be/Tt5KPKylU8k?t=4275))
- Customizing graph results ([1:12:48](https://youtu.be/Tt5KPKylU8k?t=4365))
- Custom query modules ([1:27:33](https://youtu.be/Tt5KPKylU8k?t=5253)) \ No newline at end of file diff --git a/docs2/data/first-steps/connect-to-memgraph-lab.png b/docs2/data/first-steps/connect-to-memgraph-lab.png new file mode 100644 index 00000000000..e274d1bcabc Binary files /dev/null and b/docs2/data/first-steps/connect-to-memgraph-lab.png differ diff --git a/docs2/data/first-steps/memgraph-lab-cypher-editor.png b/docs2/data/first-steps/memgraph-lab-cypher-editor.png new file mode 100644 index 00000000000..fa7c7443850 Binary files /dev/null and b/docs2/data/first-steps/memgraph-lab-cypher-editor.png differ diff --git a/docs2/data/first-steps/memgraph-lab-dashboard.png b/docs2/data/first-steps/memgraph-lab-dashboard.png new file mode 100644 index 00000000000..b6906e771df Binary files /dev/null and b/docs2/data/first-steps/memgraph-lab-dashboard.png differ diff --git a/docs2/data/first-steps/memgraph-lab-dataset-import.png
b/docs2/data/first-steps/memgraph-lab-dataset-import.png new file mode 100644 index 00000000000..e0a9c6274e3 Binary files /dev/null and b/docs2/data/first-steps/memgraph-lab-dataset-import.png differ diff --git a/docs2/data/first-steps/memgraph-lab-datasets.png b/docs2/data/first-steps/memgraph-lab-datasets.png new file mode 100644 index 00000000000..31dc130dfd2 Binary files /dev/null and b/docs2/data/first-steps/memgraph-lab-datasets.png differ diff --git a/docs2/data/first-steps/memgraph-lab-first-cypher-query.png b/docs2/data/first-steps/memgraph-lab-first-cypher-query.png new file mode 100644 index 00000000000..91063a41ced Binary files /dev/null and b/docs2/data/first-steps/memgraph-lab-first-cypher-query.png differ diff --git a/docs2/data/first-steps/memgraph-lab-graph-results.png b/docs2/data/first-steps/memgraph-lab-graph-results.png new file mode 100644 index 00000000000..ec4073c39a9 Binary files /dev/null and b/docs2/data/first-steps/memgraph-lab-graph-results.png differ diff --git a/docs2/data/first-steps/memgraph-lab-map-style-final.png b/docs2/data/first-steps/memgraph-lab-map-style-final.png new file mode 100644 index 00000000000..cb045f93e03 Binary files /dev/null and b/docs2/data/first-steps/memgraph-lab-map-style-final.png differ diff --git a/docs2/data/first-steps/memgraph-lab-map-style.png b/docs2/data/first-steps/memgraph-lab-map-style.png new file mode 100644 index 00000000000..859b32ea208 Binary files /dev/null and b/docs2/data/first-steps/memgraph-lab-map-style.png differ diff --git a/docs2/data/first-steps/memgraph-lab-style-editor.png b/docs2/data/first-steps/memgraph-lab-style-editor.png new file mode 100644 index 00000000000..0e050c7a645 Binary files /dev/null and b/docs2/data/first-steps/memgraph-lab-style-editor.png differ diff --git a/docs2/data/first-steps/yt-video-preview.png b/docs2/data/first-steps/yt-video-preview.png new file mode 100644 index 00000000000..3c7dda96265 Binary files /dev/null and 
b/docs2/data/first-steps/yt-video-preview.png differ diff --git a/docs2/data/install-memgraph-on-windows-10/memgraph-lab-connect-now.png b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-connect-now.png new file mode 100644 index 00000000000..9a1d1f518d3 Binary files /dev/null and b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-connect-now.png differ diff --git a/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-data.png b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-data.png new file mode 100644 index 00000000000..6e41e41be3f Binary files /dev/null and b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-data.png differ diff --git a/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-graph.png b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-graph.png new file mode 100644 index 00000000000..4fa0afef836 Binary files /dev/null and b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-graph.png differ diff --git a/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-result.png b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-result.png new file mode 100644 index 00000000000..14e681ce6ef Binary files /dev/null and b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-result.png differ diff --git a/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-query.png b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-query.png new file mode 100644 index 00000000000..f69a726139b Binary files /dev/null and b/docs2/data/install-memgraph-on-windows-10/memgraph-lab-run-query.png differ diff --git a/docs2/data/lab-user-manual/custom-query-modules.png b/docs2/data/lab-user-manual/custom-query-modules.png new file mode 100644 index 00000000000..fed0f856e52 Binary files /dev/null and b/docs2/data/lab-user-manual/custom-query-modules.png differ diff --git 
a/docs2/data/lab-user-manual/dataset.png b/docs2/data/lab-user-manual/dataset.png new file mode 100644 index 00000000000..6c1d0fabae7 Binary files /dev/null and b/docs2/data/lab-user-manual/dataset.png differ diff --git a/docs2/data/lab-user-manual/graph-results.png b/docs2/data/lab-user-manual/graph-results.png new file mode 100644 index 00000000000..b95105c6b2d Binary files /dev/null and b/docs2/data/lab-user-manual/graph-results.png differ diff --git a/docs2/data/lab-user-manual/gss.png b/docs2/data/lab-user-manual/gss.png new file mode 100644 index 00000000000..3f3a61ec685 Binary files /dev/null and b/docs2/data/lab-user-manual/gss.png differ diff --git a/docs2/data/lab-user-manual/import-export.png b/docs2/data/lab-user-manual/import-export.png new file mode 100644 index 00000000000..53410dfba50 Binary files /dev/null and b/docs2/data/lab-user-manual/import-export.png differ diff --git a/docs2/data/lab-user-manual/intelisense.png b/docs2/data/lab-user-manual/intelisense.png new file mode 100644 index 00000000000..efd8d8b8b87 Binary files /dev/null and b/docs2/data/lab-user-manual/intelisense.png differ diff --git a/docs2/data/lab-user-manual/latest.png b/docs2/data/lab-user-manual/latest.png new file mode 100644 index 00000000000..74abb62a280 Binary files /dev/null and b/docs2/data/lab-user-manual/latest.png differ diff --git a/docs2/data/lab-user-manual/layouts.png b/docs2/data/lab-user-manual/layouts.png new file mode 100644 index 00000000000..a095a4e883a Binary files /dev/null and b/docs2/data/lab-user-manual/layouts.png differ diff --git a/docs2/data/lab-user-manual/logs.png b/docs2/data/lab-user-manual/logs.png new file mode 100644 index 00000000000..34a5470d795 Binary files /dev/null and b/docs2/data/lab-user-manual/logs.png differ diff --git a/docs2/data/lab-user-manual/map.png b/docs2/data/lab-user-manual/map.png new file mode 100644 index 00000000000..1d4d2e9fe4d Binary files /dev/null and b/docs2/data/lab-user-manual/map.png differ diff --git 
a/docs2/data/lab-user-manual/overview.png b/docs2/data/lab-user-manual/overview.png new file mode 100644 index 00000000000..b7c0f94c5c1 Binary files /dev/null and b/docs2/data/lab-user-manual/overview.png differ diff --git a/docs2/data/lab-user-manual/physics.png b/docs2/data/lab-user-manual/physics.png new file mode 100644 index 00000000000..c135e0b14e0 Binary files /dev/null and b/docs2/data/lab-user-manual/physics.png differ diff --git a/docs2/data/lab-user-manual/query-collection.png b/docs2/data/lab-user-manual/query-collection.png new file mode 100644 index 00000000000..43abedf69e5 Binary files /dev/null and b/docs2/data/lab-user-manual/query-collection.png differ diff --git a/docs2/data/lab-user-manual/query-modules.png b/docs2/data/lab-user-manual/query-modules.png new file mode 100644 index 00000000000..718d74ee77f Binary files /dev/null and b/docs2/data/lab-user-manual/query-modules.png differ diff --git a/docs2/data/lab-user-manual/rows.png b/docs2/data/lab-user-manual/rows.png new file mode 100644 index 00000000000..0b3647e106c Binary files /dev/null and b/docs2/data/lab-user-manual/rows.png differ diff --git a/docs2/data/lab-user-manual/schema.png b/docs2/data/lab-user-manual/schema.png new file mode 100644 index 00000000000..1e345323d53 Binary files /dev/null and b/docs2/data/lab-user-manual/schema.png differ diff --git a/docs2/data/lab-user-manual/streams.png b/docs2/data/lab-user-manual/streams.png new file mode 100644 index 00000000000..71582bbea4d Binary files /dev/null and b/docs2/data/lab-user-manual/streams.png differ diff --git a/docs2/data/lab-user-manual/video.png b/docs2/data/lab-user-manual/video.png new file mode 100644 index 00000000000..f05e2b97ba6 Binary files /dev/null and b/docs2/data/lab-user-manual/video.png differ diff --git a/docs2/data/memgraph-cloud/account-payment.png b/docs2/data/memgraph-cloud/account-payment.png new file mode 100644 index 00000000000..d972999fce8 Binary files /dev/null and 
b/docs2/data/memgraph-cloud/account-payment.png differ diff --git a/docs2/data/memgraph-cloud/admin-credentials.png b/docs2/data/memgraph-cloud/admin-credentials.png new file mode 100644 index 00000000000..9a5284b71f0 Binary files /dev/null and b/docs2/data/memgraph-cloud/admin-credentials.png differ diff --git a/docs2/data/memgraph-cloud/cloud-img.svg b/docs2/data/memgraph-cloud/cloud-img.svg new file mode 100644 index 00000000000..cb7add58c5b --- /dev/null +++ b/docs2/data/memgraph-cloud/cloud-img.svg @@ -0,0 +1,485 @@ + + + Illustration Copy + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs2/data/memgraph-cloud/cloud-login.png b/docs2/data/memgraph-cloud/cloud-login.png new file mode 100644 index 00000000000..7b8a3c55be3 Binary files /dev/null and b/docs2/data/memgraph-cloud/cloud-login.png differ diff --git a/docs2/data/memgraph-cloud/cloud-password.png b/docs2/data/memgraph-cloud/cloud-password.png new file mode 100644 index 00000000000..cfec7b56eaa Binary files /dev/null and b/docs2/data/memgraph-cloud/cloud-password.png differ diff --git 
a/docs2/data/memgraph-cloud/connect-to-cloud-memgraph-lab-web.png b/docs2/data/memgraph-cloud/connect-to-cloud-memgraph-lab-web.png new file mode 100644 index 00000000000..227df567b02 Binary files /dev/null and b/docs2/data/memgraph-cloud/connect-to-cloud-memgraph-lab-web.png differ diff --git a/docs2/data/memgraph-cloud/new-project.png b/docs2/data/memgraph-cloud/new-project.png new file mode 100644 index 00000000000..9693cd908b7 Binary files /dev/null and b/docs2/data/memgraph-cloud/new-project.png differ diff --git a/docs2/data/memgraph-cloud/pause-project.png b/docs2/data/memgraph-cloud/pause-project.png new file mode 100644 index 00000000000..469adea1117 Binary files /dev/null and b/docs2/data/memgraph-cloud/pause-project.png differ diff --git a/docs2/data/memgraph-cloud/paused-project.png b/docs2/data/memgraph-cloud/paused-project.png new file mode 100644 index 00000000000..18abab2e803 Binary files /dev/null and b/docs2/data/memgraph-cloud/paused-project.png differ diff --git a/docs2/data/memgraph-cloud/project-management.png b/docs2/data/memgraph-cloud/project-management.png new file mode 100644 index 00000000000..b2dd284b801 Binary files /dev/null and b/docs2/data/memgraph-cloud/project-management.png differ diff --git a/docs2/data/memgraph-cloud/yt-cloud-getting-started-preview.png b/docs2/data/memgraph-cloud/yt-cloud-getting-started-preview.png new file mode 100644 index 00000000000..db7c6a2abaf Binary files /dev/null and b/docs2/data/memgraph-cloud/yt-cloud-getting-started-preview.png differ diff --git a/docs2/data/migrate-from-neo4j/GSS.png b/docs2/data/migrate-from-neo4j/GSS.png new file mode 100644 index 00000000000..169fd76b9ce Binary files /dev/null and b/docs2/data/migrate-from-neo4j/GSS.png differ diff --git a/docs2/data/migrate-from-neo4j/contains.png b/docs2/data/migrate-from-neo4j/contains.png new file mode 100644 index 00000000000..e41fb1f1cd3 Binary files /dev/null and b/docs2/data/migrate-from-neo4j/contains.png differ diff --git 
a/docs2/data/migrate-from-neo4j/employees.png b/docs2/data/migrate-from-neo4j/employees.png new file mode 100644 index 00000000000..aa6178aa172 Binary files /dev/null and b/docs2/data/migrate-from-neo4j/employees.png differ diff --git a/docs2/data/migrate-from-neo4j/import_folder.png b/docs2/data/migrate-from-neo4j/import_folder.png new file mode 100644 index 00000000000..3ee03f261d1 Binary files /dev/null and b/docs2/data/migrate-from-neo4j/import_folder.png differ diff --git a/docs2/data/migrate-from-neo4j/install_APOC.png b/docs2/data/migrate-from-neo4j/install_APOC.png new file mode 100644 index 00000000000..c9283311e59 Binary files /dev/null and b/docs2/data/migrate-from-neo4j/install_APOC.png differ diff --git a/docs2/data/migrate-from-neo4j/orders.png b/docs2/data/migrate-from-neo4j/orders.png new file mode 100644 index 00000000000..6d5f5c8a53f Binary files /dev/null and b/docs2/data/migrate-from-neo4j/orders.png differ diff --git a/docs2/data/migrate-from-neo4j/product.png b/docs2/data/migrate-from-neo4j/product.png new file mode 100644 index 00000000000..4b6256e697e Binary files /dev/null and b/docs2/data/migrate-from-neo4j/product.png differ diff --git a/docs2/data/migrate-from-neo4j/reports_to.png b/docs2/data/migrate-from-neo4j/reports_to.png new file mode 100644 index 00000000000..bd0803b9da4 Binary files /dev/null and b/docs2/data/migrate-from-neo4j/reports_to.png differ diff --git a/docs2/data/migrate-from-neo4j/shipping_schema.png b/docs2/data/migrate-from-neo4j/shipping_schema.png new file mode 100644 index 00000000000..f3076ca2efc Binary files /dev/null and b/docs2/data/migrate-from-neo4j/shipping_schema.png differ diff --git a/docs2/data/migrate-from-neo4j/sold.png b/docs2/data/migrate-from-neo4j/sold.png new file mode 100644 index 00000000000..13044fd5f47 Binary files /dev/null and b/docs2/data/migrate-from-neo4j/sold.png differ diff --git a/docs2/data/migrate-from-rdbms/migrate_relational_database_data_model.png 
b/docs2/data/migrate-from-rdbms/migrate_relational_database_data_model.png new file mode 100644 index 00000000000..1838ce0bd0d Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_data_model.png differ diff --git a/docs2/data/migrate-from-rdbms/migrate_relational_database_export_query.png b/docs2/data/migrate-from-rdbms/migrate_relational_database_export_query.png new file mode 100644 index 00000000000..c20ba705b8f Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_export_query.png differ diff --git a/docs2/data/migrate-from-rdbms/migrate_relational_database_export_wizard.png b/docs2/data/migrate-from-rdbms/migrate_relational_database_export_wizard.png new file mode 100644 index 00000000000..a0e61c33873 Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_export_wizard.png differ diff --git a/docs2/data/migrate-from-rdbms/migrate_relational_database_export_wizard_step_2.png b/docs2/data/migrate-from-rdbms/migrate_relational_database_export_wizard_step_2.png new file mode 100644 index 00000000000..6995a3337c8 Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_export_wizard_step_2.png differ diff --git a/docs2/data/migrate-from-rdbms/migrate_relational_database_file_location.png b/docs2/data/migrate-from-rdbms/migrate_relational_database_file_location.png new file mode 100644 index 00000000000..00ff802b9e0 Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_file_location.png differ diff --git a/docs2/data/migrate-from-rdbms/migrate_relational_database_graph_data_model.png b/docs2/data/migrate-from-rdbms/migrate_relational_database_graph_data_model.png new file mode 100644 index 00000000000..5dd60cf9507 Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_graph_data_model.png differ diff --git 
a/docs2/data/migrate-from-rdbms/migrate_relational_database_graph_database.png b/docs2/data/migrate-from-rdbms/migrate_relational_database_graph_database.png new file mode 100644 index 00000000000..ac6d1186487 Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_graph_database.png differ diff --git a/docs2/data/migrate-from-rdbms/migrate_relational_database_lab_overview.png b/docs2/data/migrate-from-rdbms/migrate_relational_database_lab_overview.png new file mode 100644 index 00000000000..f955d2139e7 Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_lab_overview.png differ diff --git a/docs2/data/migrate-from-rdbms/migrate_relational_database_lab_query.png b/docs2/data/migrate-from-rdbms/migrate_relational_database_lab_query.png new file mode 100644 index 00000000000..4552cb5bfbb Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_lab_query.png differ diff --git a/docs2/data/migrate-from-rdbms/migrate_relational_database_nodes.png b/docs2/data/migrate-from-rdbms/migrate_relational_database_nodes.png new file mode 100644 index 00000000000..c12fc7c7dc9 Binary files /dev/null and b/docs2/data/migrate-from-rdbms/migrate_relational_database_nodes.png differ diff --git a/docs2/data/transactions/admin_show_transactions.png b/docs2/data/transactions/admin_show_transactions.png new file mode 100644 index 00000000000..88cb9813cb7 Binary files /dev/null and b/docs2/data/transactions/admin_show_transactions.png differ diff --git a/docs2/data/transactions/terminate_transactions.png b/docs2/data/transactions/terminate_transactions.png new file mode 100644 index 00000000000..6f511aa29f5 Binary files /dev/null and b/docs2/data/transactions/terminate_transactions.png differ diff --git a/docs2/data/transactions/transaction_aborted_message.png b/docs2/data/transactions/transaction_aborted_message.png new file mode 100644 index 00000000000..ea66612e587 Binary files /dev/null and 
b/docs2/data/transactions/transaction_aborted_message.png differ diff --git a/docs2/deployment/audit-log.md b/docs2/deployment/audit-log.md new file mode 100644 index 00000000000..7b9d079b973 --- /dev/null +++ b/docs2/deployment/audit-log.md @@ -0,0 +1,78 @@
---
id: audit-log
title: Audit log (Enterprise)
sidebar_label: Audit log
---

Memgraph supports audit logging of all queries. When enabled, the audit log
contains records of all queries executed on the database. Each executed query
is one entry (one line) in the audit log. The audit log itself is a CSV file.

All audit logs are written to
`/audit/audit.log`. The log is rotated using
`logrotate`, so the entries in the `audit.log` file are always the newest
entries. Entries in `audit.log.1`, `audit.log.2.gz`, etc. are older entries.
The default log rotation configuration can be found in
`/etc/logrotate.d/memgraph`. By default, the log is rotated every day and a
full year of entries is preserved. You can modify these values to your own
needs and preferences.

## Format

The audit log contains the following information formatted into a CSV file:
```plaintext
<timestamp>,<address>,<username>,<query>,<params>
```
For each query, the supplied query parameters are also logged. The query is
escaped and quoted so that commas in queries don't affect the correctness of
the CSV. The parameters are encoded as JSON objects and are then escaped and
quoted.

## Example

This is an example of the audit log:
```plaintext
1551376833.225395,127.0.0.1,admin,"MATCH (n) DETACH DELETE n","{}"
1551376833.257825,127.0.0.1,admin,"CREATE (n {name: $name})","{\"name\":\"alice\"}"
1551376833.273546,127.0.0.1,admin,"MATCH (n), (m) CREATE (n)-[:e {when: $when}]->(m)","{\"when\":42}"
1551376833.300955,127.0.0.1,admin,"MATCH (n), (m) SET n.value = m.value","{}"
```

We can see that all of the queries were executed from the loopback address and
were executed by the user `admin`. The executed queries are:

 Query | Parameters
--------------------------------------------------|-----------
MATCH (n) DETACH DELETE n | {}
CREATE (n {name: $name}) | {"name": "alice"}
MATCH (n), (m) CREATE (n)-[:e {when: $when}]->(m) | {"when": 42}
MATCH (n), (m) SET n.value = m.value | {}

## Parsing the log

If you wish to parse the log, the following Python snippet shows how to extract
data from the audit log:
```python
import csv
import json

with open("audit.log") as f:
    reader = csv.reader(f, delimiter=',', doublequote=False,
                        escapechar='\\', lineterminator='\n',
                        quotechar='"', quoting=csv.QUOTE_MINIMAL,
                        skipinitialspace=False, strict=True)
    for line in reader:
        timestamp, address, username, query, params = line
        params = json.loads(params)
        # Rest of your code that processes the logs.
```

## Flags

This section contains the list of flags that are used to configure audit
logging in Memgraph.

 Flag | Description
------------------------------------|------------
 `--audit-enabled` | Enables audit logging.
 `--audit-buffer-size` | Controls the in-memory buffer size used for audit logs.
+ `--audit-buffer-flush-interval-ms` | Controls the time interval (in milliseconds) used for flushing the in-memory buffer to disk. diff --git a/docs2/deployment/auth-module.md b/docs2/deployment/auth-module.md new file mode 100644 index 00000000000..bd663b3b47d --- /dev/null +++ b/docs2/deployment/auth-module.md @@ -0,0 +1,149 @@ +--- +id: auth-module +title: Auth module (Enterprise) +sidebar_label: Auth module +--- + +Memgraph supports authentication and (optional) authorization using a custom +built external auth module. The two supported operation modes are: +- authentication only (username/password verification) +- authentication and authorization (username/password verification and user to + role mapping) + +When a user connects to Memgraph the database will forward the user's supplied +username and password to the external auth module and wait for it to deliver +the authentication and/or authorization verdict back to the database. Based on +the returned verdict, Memgraph will either close the connection to the +connected user or it will allow the connection and set-up the user and/or role +accordingly. + +When Memgraph is switched to use the external auth module for authentication +its internal users are automatically disabled. All users are authenticated only +using the module, existing local users are ignored (unless they can be +authenticated using the module). + +## Authentication + +In this mode Memgraph will only perform authentication (verification of +username and password) using the external auth module. All user to role +mappings and user and role permissions are managed through Memgraph. + +When a user that has never logged in to the database passes authentication +using the external auth module, a user object is created for that user. The +user can then be seen using the following query: +```cypher +SHOW USERS; +``` +This behavior can be changed to disable login to users that don't have an +explicitly created user account. 
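+
+For example, a deployment that uses the module strictly for password
+verification while keeping explicit user management might set the flags along
+these lines (the module path is illustrative; the flags are documented in the
+Flags section below):
+```plaintext
+--auth-module-executable=/path/to/my_auth_module.py
+--auth-module-create-user=false
+--auth-module-manage-roles=false
+```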
+ +## Authorization + +In this mode Memgraph will perform authentication and authorization using the +external auth module. The authorization supported is in the form of determining +the user to role mapping using the module. User and role permissions are still +managed through Memgraph. + +When a user that has a role that doesn't yet exist in the database logs in to +the database, a role object is created for that user and assigned to that user. +The role can then be seen using the following query: +```cypher +SHOW ROLES; +``` +This behavior can be changed to disable login to users that don't have an +explicitly created role. + +## Flags + +This section contains the list of flags that are used to configure the external +auth module authentication and authorization mechanisms used by Memgraph. + + Flag | Description +------------------------------------|------------ + `--auth-module-executable` | Path to the executable that should be used for user authentication/authorization. + `--auth-module-create-user` | Controls whether users should be implicitly created on first login or they should be explicitly created manually. + `--auth-module-create-role` | Controls whether roles should be implicitly created on first appearance or they should be explicitly created manually. + `--auth-module-manage-roles` | Specifies whether the module is used only for authentication (value is `false`), or it should be used for both authentication and authorization. + `--auth-module-timeout` | Specifies the maximum time that Memgraph will wait for a response from the external auth module. + `--auth-password-permit-null` | Can be set to false to disable null passwords. + `--auth-password-strength-regex` | The regular expression that should be used to match the entire entered password to ensure its strength. + +## Communication + +The external auth module can be written in any programming language. 
Because of +that, the communication protocol between Memgraph and the module is simple to +implement. + +Memgraph uses inter-process pipes to communicate with the module. The module +will receive auth requests on file descriptor `1000` and has to return auth +responses to file descriptor `1001`. You may be wondering why we didn't just +use `stdin` and `stdout` for communication. The standard streams aren't used +because external libraries often tend to write something to `stdout` which is +difficult to turn off. By using separate file descriptors, `stdout` is left +intact and can be used freely for debugging purposes (along with `stderr`). + +The protocol that is used between Memgraph and the module is as follows: + - Each auth request is sent as a JSON encoded object in a single line that is + terminated by a `\n`. + - Each auth response must be sent as a JSON encoded object in a single line + that is terminated by a `\n`. + - Auth requests are objects that contain the following keys: + - `username` - the user's username + - `password` - the user's password + - Auth responses must be objects that contain the following keys: + - `authenticated` - a `bool` indicating whether the user is allowed to log + in to the database + - `role` - a `string` indicating which role the user should have (must be + supplied even when the module is used for authentication only) + +If the external auth module crashes during the processing of an auth request, +Memgraph won't allow the user to log in to the database and will automatically +restart the auth module for the next auth request. All crash logs will be seen +in Memgraph's output (typically in `systemd` logs using `journalctl`). + +## Example + +This very simple example auth module is written in Python, but any programming language can be used. 
+ +```python +#!/usr/bin/python3 +import json +import io + + +def authenticate(username, password): + return {"authenticated": True, "role": ""} + + +if __name__ == "__main__": + input_stream = io.FileIO(1000, mode="r") + output_stream = io.FileIO(1001, mode="w") + while True: + params = json.loads(input_stream.readline().decode("ascii")) + ret = authenticate(**params) + output_stream.write((json.dumps(ret) + "\n").encode("ascii")) +``` + +In the example you can see exactly how the communication protocol works and you +can see the function that is used for authentication (and authorization). When +writing your own modules you just have to reimplement the `authenticate` +function according to your needs. + +Because the authentication (and authorization) function has a simple signature, +it is easy (and recommended) to write unit (or integration) tests in separate +files. For example: + +```python +#!/usr/bin/python3 +import module + +assert module.authenticate("sponge", "bob") == {"authenticated": True, "role": ""} +assert module.authenticate("CHUCK", "NORRIS") == {"authenticated": True, "role": ""} +``` + +## LDAP + +With every Memgraph Enterprise installation we provide our own module that +supports authentication and authorization using LDAP. For more information +about how the module should be set-up see the +[reference guide](ldap-security.md). diff --git a/docs2/deployment/deployment.md b/docs2/deployment/deployment.md new file mode 100644 index 00000000000..89b0cf0350c --- /dev/null +++ b/docs2/deployment/deployment.md @@ -0,0 +1,16 @@ +# Deployment features + +Below are the features that might be interesting if you are running Memgraph in +production. 
+ +## [Audit log](audit-log.md) +## [Auth module](auth-module.md) +## [Exposing system metrics](exposing-system-metrics.md) +## [LDAP security](ldap-security.md) +## [Metadata](metadata.md) +## [Monitoring server](monitoring-server.md) +## [Replication](replication.md) +## [Security](security.md) +## [Server stats](server-stats.md) +## [SSL encryption](ssl-encryption.md) +## [User management](user-management.md) \ No newline at end of file diff --git a/docs2/deployment/exposing-system-metrics.md b/docs2/deployment/exposing-system-metrics.md new file mode 100644 index 00000000000..e7b22894ff1 --- /dev/null +++ b/docs2/deployment/exposing-system-metrics.md @@ -0,0 +1,250 @@ +--- +id: exposing-system-metrics +title: Exposing system metrics (Enterprise) +sidebar_label: Exposing system metrics +--- + +In production systems, monitoring of applications is crucial, and that includes databases as well. +Memgraph allows tracking information about transactions, query latencies, snapshot recovery latencies, +triggers, bolt messages, indexes, streams, and many more using an HTTP server. + +Exposing metrics is a Memgraph Enterprise feature and therefore needs a valid Memgraph Enterprise license key. +After successfully entering the license key, Memgraph needs to be restarted in order to start the metrics HTTP server. + +## Configuring the HTTP endpoint + +The default address and port for the metrics server is `0.0.0.0:9091`, and can be configured using [configuration flags](/reference-guide/configuration.md) +`--metrics-address` and `--metrics-port`. If you need help changing the configuration follow [the how-to guide](/how-to-guides/config-logs.md). + +## System metrics + +All system metrics measuring different parts of the system can be divided into three different types: +- **Gauge** - a single value of some variable in the system (e.g. memory usage) +- **Counter (uint64_t)** - a value that can be incremented or decremented (e.g. 
number of active transactions in the system) +- **Histogram (uint64_t)** - distribution of measured values (e.g. certain percentile of query latency on N measured queries) + +### General metrics + + | Name | Type | Description | + | -------------- | ---------------- | ----------------------------------------------------------- | + | vertex_count | Gauge (uint64_t) | Number of nodes stored in the system. | + | edge_count | Gauge (uint64_t) | Number of relationships stored in the system. | + | average_degree | Gauge (double) | Average number of relationships of a single node. | + | memory_usage | Gauge (uint64_t) | Amount of RAM used reported by the OS (in bytes). | + | disk_usage | Gauge (uint64_t) | Amount of disk space used by the [data directory](/reference-guide/backup.md) (in bytes). | + +### Index metrics + + | Name | Type | Description | + | -------------------------- | ------- | ------------------------------------------------------ | + | ActiveLabelIndices | Counter | Number of active label indices in the system. | + | ActiveLabelPropertyIndices | Counter | Number of active label property indices in the system. | + +### Operator metrics + +Before a Cypher query is executed, it is converted into an internal form suitable for execution, known as a query plan. +A query plan is a tree-like data structure describing a pipeline of operations that will be performed on the database in order to +yield the results for a given query. Every node within a plan is known as +[a logical operator](/memgraph/reference-guide/inspecting-queries#operators) and describes a particular operation. + + | Name | Type | Description | + | ----------------------------------- | ------- | -------------------------------------------------------------- | + | OnceOperator | Counter | Number of times Once operator was used. | + | CreateNodeOperator | Counter | Number of times CreateNode operator was used. | + | CreateExpandOperator | Counter | Number of times CreateExpand operator was used. 
| + | ScanAllOperator | Counter | Number of times ScanAll operator was used. | + | ScanAllByLabelOperator | Counter | Number of times ScanAllByLabel operator was used. | + | ScanAllByLabelPropertyRangeOperator | Counter | Number of times ScanAllByLabelPropertyRange operator was used. | + | ScanAllByLabelPropertyValueOperator | Counter | Number of times ScanAllByLabelPropertyValue operator was used. | + | ScanAllByLabelPropertyOperator | Counter | Number of times ScanAllByLabelProperty operator was used. | + | ScanAllByLabelIdOperator | Counter | Number of times ScanAllByLabelId operator was used. | + | ExpandOperator | Counter | Number of times Expand operator was used. | + | ExpandVariableOperator | Counter | Number of times ExpandVariable operator was used. | + | ConstructNamedPathOperator | Counter | Number of times ConstructNamedPath operator was used. | + | FilterOperator | Counter | Number of times Filter operator was used. | + | ProduceOperator | Counter | Number of times Produce operator was used. | + | DeleteOperator | Counter | Number of times Delete operator was used. | + | SetPropertyOperator | Counter | Number of times SetProperty operator was used. | + | SetPropertiesOperator | Counter | Number of times SetProperties operator was used. | + | SetLabelsOperator | Counter | Number of times SetLabels operator was used. | + | RemovePropertyOperator | Counter | Number of times RemoveProperty operator was used. | + | RemoveLabelsOperator | Counter | Number of times RemoveLabels operator was used. | + | EdgeUniquenessFilterOperator | Counter | Number of times EdgeUniquenessFilter operator was used. | + | EmptyResultOperator | Counter | Number of times EmptyResult operator was used. | + | AccumulateOperator | Counter | Number of times Accumulate operator was used. | + | AggregateOperator | Counter | Number of times Aggregate operator was used. | + | SkipOperator | Counter | Number of times Skip operator was used. 
| + | LimitOperator | Counter | Number of times Limit operator was used. | + | OrderByOperator | Counter | Number of times OrderBy operator was used. | + | MergeOperator | Counter | Number of times Merge operator was used. | + | OptionalOperator | Counter | Number of times Optional operator was used. | + | UnwindOperator | Counter | Number of times Unwind operator was used. | + | DistinctOperator | Counter | Number of times Distinct operator was used. | + | UnionOperator | Counter | Number of times Union operator was used. | + | CartesianOperator | Counter | Number of times Cartesian operator was used. | + | CallProcedureOperator | Counter | Number of times CallProcedureOperator operator was used. | + | ForeachOperator | Counter | Number of times Foreach operator was used. | + | EvaluatePatternFilterOperator | Counter | Number of times EvaluatePatternFilter operator was used. | + | ApplyOperator | Counter | Number of times Apply operator was used. | + +### Query metrics + + | Name | Type | Description | + | ---------------------------- | --------- | ---------------------------------------------------------- | + | QueryExecutionLatency_us_50p | Histogram | Query execution latency in microseconds (50th percentile). | + | QueryExecutionLatency_us_90p | Histogram | Query execution latency in microseconds (90th percentile). | + | QueryExecutionLatency_us_99p | Histogram | Query execution latency in microseconds (99th percentile). | + +### Query type metrics + + | Name | Type | Description | + | -------------- | ------- | -------------------------------------- | + | ReadQuery | Counter | Number of read-only queries executed. | + | WriteQuery | Counter | Number of write-only queries executed. | + | ReadWriteQuery | Counter | Number of read-write queries executed. | + +### Session metrics + + | Name | Type | Description | + | ----------------------- | ------- | --------------------------------------- | + | ActiveSessions | Counter | Number of active connections. 
| + | ActiveBoltSessions | Counter | Number of active Bolt connections. | + | ActiveTCPSessions | Counter | Number of active TCP connections. | + | ActiveSSLSessions | Counter | Number of active SSL connections. | + | ActiveWebSocketSessions | Counter | Number of active WebSocket connections. | + | BoltMessages | Counter | Number of Bolt messages sent. | + +### Snapshot metrics + + | Name | Type | Description | + | ------------------------------ | --------- | ------------------------------------------------------------ | + | SnapshotCreationLatency_us_50p | Histogram | Snapshot creation latency in microseconds (50th percentile). | + | SnapshotCreationLatency_us_90p | Histogram | Snapshot creation latency in microseconds (90th percentile). | + | SnapshotCreationLatency_us_99p | Histogram | Snapshot creation latency in microseconds (99th percentile). | + | SnapshotRecoveryLatency_us_50p | Histogram | Snapshot recovery latency in microseconds (50th percentile). | + | SnapshotRecoveryLatency_us_90p | Histogram | Snapshot recovery latency in microseconds (90th percentile). | + | SnapshotRecoveryLatency_us_99p | Histogram | Snapshot recovery latency in microseconds (99th percentile). | + + +### Stream metrics + + | Name | Type | Description | + | ---------------- | ------- | ------------------------------------- | + | StreamsCreated | Counter | Number of streams created. | + | MessagesConsumed | Counter | Number of consumed streamed messages. | + +### Transaction metrics + + | Name | Type | Description | + | ---------------------- | ------- | ------------------------------------------------------------------------------- | + | ActiveTransactions | Counter | Number of active transactions. | + | CommitedTransactions | Counter | Number of committed transactions. | + | RollbackedTransactions | Counter | Number of rolled-back transactions. | + | FailedQuery | Counter | Number of times executing a query failed (either during parse time or runtime).
| + +### Trigger metrics + + | Name | Type | Description | + | ---------------- | ------- | ---------------------------- | + | TriggersCreated | Counter | Number of triggers created. | + | TriggersExecuted | Counter | Number of triggers executed. | + +## Example response + +If the configuration hasn't been modified, sending a GET request to `localhost:9091` on a +local Memgraph build will result in a response similar to the one below. + +```json +{ + "General": { + "average_degree": 0.0, + "disk_usage": 1417846, + "edge_count": 0, + "memory_usage": 36937728, + "vertex_count": 0 + }, + "Index": { + "ActiveLabelIndices": 0, + "ActiveLabelPropertyIndices": 0 + }, + "Operator": { + "AccumulateOperator": 0, + "AggregateOperator": 0, + "ApplyOperator": 0, + "CallProcedureOperator": 0, + "CartesianOperator": 0, + "ConstructNamedPathOperator": 0, + "CreateExpandOperator": 0, + "CreateNodeOperator": 0, + "DeleteOperator": 0, + "DistinctOperator": 0, + "EdgeUniquenessFilterOperator": 0, + "EmptyResultOperator": 0, + "EvaluatePatternFilterOperator": 0, + "ExpandOperator": 0, + "ExpandVariableOperator": 0, + "FilterOperator": 0, + "ForeachOperator": 0, + "LimitOperator": 0, + "MergeOperator": 0, + "OnceOperator": 0, + "OptionalOperator": 0, + "OrderByOperator": 0, + "ProduceOperator": 0, + "RemoveLabelsOperator": 0, + "RemovePropertyOperator": 0, + "ScanAllByIdOperator": 0, + "ScanAllByLabelOperator": 0, + "ScanAllByLabelPropertyOperator": 0, + "ScanAllByLabelPropertyRangeOperator": 0, + "ScanAllByLabelPropertyValueOperator": 0, + "ScanAllOperator": 0, + "SetLabelsOperator": 0, + "SetPropertiesOperator": 0, + "SetPropertyOperator": 0, + "SkipOperator": 0, + "UnionOperator": 0, + "UnwindOperator": 0 + }, + "Query": { + "QueryExecutionLatency_us_50p": 0, + "QueryExecutionLatency_us_90p": 0, + "QueryExecutionLatency_us_99p": 0 + }, + "QueryType": { + "ReadQuery": 0, + "ReadWriteQuery": 0, + "WriteQuery": 0 + }, + "Session": { + "ActiveBoltSessions": 0, +
"ActiveSSLSessions": 0, + "ActiveSessions": 0, + "ActiveTCPSessions": 0, + "ActiveWebSocketSessions": 0, + "BoltMessages": 0 + }, + "Snapshot": { + "SnapshotCreationLatency_us_50p": 4860, + "SnapshotCreationLatency_us_90p": 4860, + "SnapshotCreationLatency_us_99p": 4860, + "SnapshotRecoveryLatency_us_50p": 628, + "SnapshotRecoveryLatency_us_90p": 628, + "SnapshotRecoveryLatency_us_99p": 628 + }, + "Stream": { + "MessagesConsumed": 0, + "StreamsCreated": 0 + }, + "Transaction": { + "ActiveTransactions": 0, + "CommitedTransactions": 0, + "FailedQuery": 0, + "RollbackedTransactions": 0 + }, + "Trigger": { + "TriggersCreated": 0, + "TriggersExecuted": 0 + } +} +``` \ No newline at end of file diff --git a/docs2/deployment/ldap-security.md b/docs2/deployment/ldap-security.md new file mode 100644 index 00000000000..6069cf193ee --- /dev/null +++ b/docs2/deployment/ldap-security.md @@ -0,0 +1,330 @@ +--- +id: ldap-security +title: LDAP Security (Enterprise) +sidebar_label: LDAP Security +--- + +[![Related - How to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/manage-users-using-ldap.md) + +For the purpose of supporting LDAP authentication and (optional) +authorization, we have built an auth module that is packaged with Memgraph +Enterprise. For more information about auth modules see the +[reference guide](../reference-guide/auth-module.md). + +The module supports two operation modes: +- authentication only (LDAP bind request) +- authentication and authorization (LDAP bind and search requests) + +## Authentication + +When using LDAP authentication the module builds the DN used for authentication +using the user specified username and the following formula: +```plaintext +DN = prefix + username + suffix +``` +In most common situations the `prefix` will be `cn=` and the `suffix` will be +`,dc=example,dc=com`. 
With an example username `alice`, this formula yields the DN +`cn=alice,dc=example,dc=com`, which is then used for the LDAP bind +operation together with the user-specified password. + +## Authorization + +Authentication is performed in the same way as above. After the user is +authenticated, the module searches through the role mapping root DN object that +contains role mappings. A role mapping object that has the currently bound user +as its `member` attribute is used as the user's role. The role that is mapped +to the user is the `CN` attribute of the role mapping object. The attribute +that contains the user DN in the mapping object, as well as the attribute that +contains the role name, can be changed in the module configuration file to +accommodate your LDAP schema. + +Note: When searching for a role in directories that have thousands of roles, +the search process could take a long time and cause long login times. + +## Module requirements + +The module is written in Python 3 and it must be installed on the server for +you to be able to use it. The Python version should be at least `3.5`. Also, +you must have the following Python 3 libraries installed: + - `ldap3` - used to communicate with the LDAP server + - `PyYAML` - used to parse the configuration file + +## Module configuration + +The module configuration file is `/etc/memgraph/auth_module/ldap.yaml`. An +initial example configuration file that has all settings documented and +explained is `/etc/memgraph/auth_module/ldap.example.yaml`. You can copy the +example configuration file into the module configuration file to get up and +running quickly. + +## Database configuration + +To enable the LDAP authentication and authorization module, you have to +instruct Memgraph to use it by specifying the flag +`--auth-module-executable /usr/lib/memgraph/auth_module/ldap.py`.
+ +Other flags that adjust the database-to-module integration +can be specified according to your needs. + +## Manage authentication and authorization + +In large organizations it is often difficult to manage the permissions of +staff members. Organizations typically use an LDAP server +to hold and manage these permissions. Because an LDAP server is already set up in +most large organizations, it is convenient for the organization to allow all +staff members to have access to the database using the already available +centralized user management system. + +[![Related - Reference Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/ldap-security.md) + +:::warning +This is an Enterprise feature. +Once the Memgraph Enterprise license expires, newly created users will be granted all privileges. +The existing users' privileges will still apply, but you won't be able to manage them.
+::: + +For this guide let's assume that we have an LDAP server that is serving the +following data: + +```plaintext +# Users root entry +dn: ou=people,dc=memgraph,dc=com +objectclass: organizationalUnit +objectclass: top +ou: people + +# User dba +dn: cn=dba,ou=people,dc=memgraph,dc=com +cn: dba +objectclass: person +objectclass: top +sn: user +userpassword: dba + +# User alice +dn: cn=alice,ou=people,dc=memgraph,dc=com +cn: alice +objectclass: person +objectclass: top +sn: user +userpassword: alice + +# User bob +dn: cn=bob,ou=people,dc=memgraph,dc=com +cn: bob +objectclass: person +objectclass: top +sn: user +userpassword: bob + +# User carol +dn: cn=carol,ou=people,dc=memgraph,dc=com +cn: carol +objectclass: person +objectclass: top +sn: user +userpassword: carol + +# User dave +dn: cn=dave,ou=people,dc=memgraph,dc=com +cn: dave +objectclass: person +objectclass: top +sn: user +userpassword: dave + +# Roles root entry +dn: ou=roles,dc=memgraph,dc=com +objectclass: organizationalUnit +objectclass: top +ou: roles + +# Role moderator +dn: cn=moderator,ou=roles,dc=memgraph,dc=com +cn: moderator +member: cn=alice,ou=people,dc=memgraph,dc=com +objectclass: groupOfNames +objectclass: top + +# Role admin +dn: cn=admin,ou=roles,dc=memgraph,dc=com +cn: admin +member: cn=carol,ou=people,dc=memgraph,dc=com +member: cn=dave,ou=people,dc=memgraph,dc=com +objectclass: groupOfNames +objectclass: top +``` + +To summarize, in this dataset we have the following data: +- `ou=people,dc=memgraph,dc=com` - entry that holds all users + - `cn=dba,ou=people,dc=memgraph,dc=com` - user `dba` that will be used as the database administrator + - `cn=alice,ou=people,dc=memgraph,dc=com` - regular user `alice` + - `cn=bob,ou=people,dc=memgraph,dc=com` - regular user `bob` + - `cn=carol,ou=people,dc=memgraph,dc=com` - regular user `carol` + - `cn=dave,ou=people,dc=memgraph,dc=com` - regular user `dave` +- `ou=roles,dc=memgraph,dc=com` - entry that holds all roles + - 
`cn=moderator,ou=roles,dc=memgraph,dc=com` - role `moderator` that has `alice` as its member + - `cn=admin,ou=roles,dc=memgraph,dc=com` - role `admin` that has `carol` and `dave` as its members + +For detailed information about the LDAP integration you should first see the +reference guide: +[LDAP security](../reference-guide/ldap-security.md). + +### Authentication + +Before enabling LDAP authentication, Memgraph should be prepared for the +integration. Here we assume that you have an already running Memgraph instance +that doesn't have any users in its local authentication storage. For more +details on how the native authentication storage works in Memgraph you should +see: [User privileges](./manage-user-privileges.md). + +First you should create the user that should be the database administrator. It +is important to have in mind that the username that you create *must* exist in +the LDAP directory. For the described LDAP directory we will connect to the +database and issue the following queries all in the same connection: +```cypher +CREATE USER dba; +GRANT ALL PRIVILEGES TO dba; +``` +After the user is created and all privileges are granted, it is safe to +disconnect from the database and proceed with LDAP integration. + +To enable LDAP integration you should specify the following flag to Memgraph: +```plaintext +--auth-module-executable=/usr/lib/memgraph/auth_module/ldap.py +``` + +You should also have the following LDAP module configuration in +`/etc/memgraph/auth_module/ldap.yaml`: +```yaml +server: + host: "" + port: + encryption: "disabled" + cert_file: "" + key_file: "" + ca_file: "" + validate_cert: false + +users: + prefix: "cn=" + suffix: ",ou=people,dc=memgraph,dc=com" + +roles: + root_dn: "" + root_objectclass: "" + user_attribute: "" + role_attribute: "" +``` +You should adjust the security settings according to your LDAP server security +settings. + +After setting these configuration options you should restart your Memgraph +instance. 
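
As a quick sanity check of the `users` section above, you can trace the DN the
module will construct before binding; this small sketch mirrors the
`DN = prefix + username + suffix` formula from the reference guide:

```python
def build_dn(username, prefix="cn=", suffix=",ou=people,dc=memgraph,dc=com"):
    # The LDAP module authenticates by binding with DN = prefix + username + suffix.
    return prefix + username + suffix


# The dba user from the example directory:
print(build_dn("dba"))  # cn=dba,ou=people,dc=memgraph,dc=com
```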
+ +Now you can verify that you can still log in to the database using username +`dba` and password `dba`. + +Issuing `SHOW USERS;` should list that currently only user `dba` exists. This +is normal. It means that LDAP authentication is successfully enabled (because +you were able to log in) and no other users have yet logged in. + +You should now be able to log in using username `alice` and password `alice`. +Because Alice has never before logged in to the database a new user will be +created for Alice and she won't have any privileges (yet). + +Using user `dba` we modify Alice's privileges to include the `MATCH` privilege. +```cypher +GRANT MATCH TO alice; +``` + +After Alice logs in again into the database (to refresh her privileges) she +will be able to execute the following query: +```cypher +MATCH (n) RETURN n; +``` + +Issuing `SHOW USERS;` as `dba` should now yield both `dba` and `alice`. + +Users Bob, Carol and Dave will also be able to log in to the database using +their LDAP password. As with Alice, their users will be created and won't have +any privileges. + +If automatic user account creation is disabled using the database flag: +```plaintext +--auth-ldap-create-user=false +``` +The database administrator (user `dba`) will first have to explicitly create +the users that he wishes to allow to connect to the database: +```cypher +CREATE USER alice; +CREATE USER bob; +``` + +In this scenario only Alice and Bob will be allowed to log in to the database +because they already have existing user accounts, but users Carol and Dave +won't be able to log in. + +### Authorization + +In the previous example users could only authenticate using LDAP. In this +example we will explain how to set-up the LDAP auth module to deduce the user's +role using LDAP search queries. + +First, you should enable and verify that user authentication works. 
To enable +role mapping for the described LDAP schema, we will modify the LDAP auth module +configuration file, specifically the section `roles`, to have the following +content: +```yaml +roles: + root_dn: "ou=roles,dc=memgraph,dc=com" + root_objectclass: "groupOfNames" + user_attribute: "member" + role_attribute: "cn" +``` +This configuration tells the LDAP module that all role mapping entries are +children of the `ou=roles,dc=memgraph,dc=com` entry, that the children have +user DNs specified in their `member` attribute and that the `cn` attribute +should be used to determine the role name. + +When a user logs in to the database, the LDAP auth module will go through all +role mapping entries and try to find the one that has the +user as its member. + +So when Alice logs in, the LDAP auth module will go through the following +entries: `cn=admin,ou=roles,dc=memgraph,dc=com` and +`cn=moderator,ou=roles,dc=memgraph,dc=com`. Because Alice is a member of the +`moderator` role mapping, the LDAP auth module will assign the role `moderator` to +Alice. + +Now, as the user `dba`, we can issue `SHOW ROLE FOR alice;` and see that +Alice indeed has the role `moderator`. + +Permissions for users and roles are still managed through Memgraph; they can't +be managed through the LDAP server. + +If automatic role creation is disabled using the flag: +```plaintext +--auth-ldap-create-role=false +``` +the database administrator (user `dba`) will first have to explicitly create +the role for the users they wish to allow to connect to the database: +```cypher +CREATE ROLE moderator; +``` + +In this scenario only Alice and Bob will be allowed to log in. Alice will be +allowed to log in because her role (`moderator`) already exists. Bob will be +allowed to log in because he doesn't have any role. Carol and Dave won't be +allowed to log in because their role (`admin`) doesn't exist.
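
Conceptually, the role lookup described above can be sketched in plain Python
over the example directory data (a simplification of the real LDAP search the
module performs):

```python
# Simplified in-memory model of the role mapping entries under
# ou=roles,dc=memgraph,dc=com from the example dataset.
ROLE_MAPPINGS = [
    {"cn": "moderator", "member": ["cn=alice,ou=people,dc=memgraph,dc=com"]},
    {"cn": "admin", "member": ["cn=carol,ou=people,dc=memgraph,dc=com",
                               "cn=dave,ou=people,dc=memgraph,dc=com"]},
]


def resolve_role(user_dn):
    # Find the mapping entry whose user_attribute (`member`) contains the
    # bound user's DN and return its role_attribute (`cn`); users without
    # a mapping entry get no role.
    for entry in ROLE_MAPPINGS:
        if user_dn in entry["member"]:
            return entry["cn"]
    return ""


print(resolve_role("cn=alice,ou=people,dc=memgraph,dc=com"))  # moderator
```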
+ +If both automatic role creation and automatic user creation are disabled, then +both the user and the role must exist for a user to successfully log in to the +database. + +## Where to next? + +To learn more about Memgraph's functionalities, visit the **[Reference +guide](/reference-guide/overview.md)**. For real-world examples of how to use +Memgraph, we strongly suggest going through one of the available +**[Tutorials](/tutorials/overview.md)**. \ No newline at end of file diff --git a/docs2/deployment/metadata.md b/docs2/deployment/metadata.md new file mode 100644 index 00000000000..c09194d0a4c --- /dev/null +++ b/docs2/deployment/metadata.md @@ -0,0 +1,99 @@ +--- +id: metadata +title: Metadata +sidebar_label: Metadata +--- + +The Bolt protocol specifies additional data that can be sent along with the +requested results. Such data is called metadata and can be divided into two +groups: + - Query Statistics + - Notifications + +Both kinds of metadata can be accessed through the `summary` map that is sent +along with the results of the query. Query statistics are stored under the `stats` key, +and notifications under the `notifications` key. + +## Query Statistics + +Query statistics are sent whenever a user executes a query that affects +data in any way. In other words, Memgraph tracks the quantity of these changes +throughout the query execution and reports it back to the user. + +The structure of the statistics is a map of string keys and integer values. The data +that is tracked: + + - `nodes-created` + - `nodes-deleted` + - `relationships-created` + - `relationships-deleted` + - `labels-added` + - `labels-removed` + - `properties-set` + +This data refers only to the changes made by the query itself, so changes made by +triggers do not affect these values. + +:::caution Differences compared to triggers + +It is possible that after executing a query some of these counters are not +zero, yet the corresponding triggers were not triggered.
The reason is that triggers are only triggered when there is a difference
between the starting and ending state, while the counters also count
non-permanent changes.

For example, if a query creates and deletes nodes, like
`CREATE (n) DELETE n;`, it leaves Memgraph in the same state as before. The
value will be 1 for both `nodes-created` and `nodes-deleted`, but the triggers
will not be triggered since there is no difference between the states before
and after the query is executed.

:::

## Notifications

Notifications are sent either to confirm the results of a query or to notify
the user about possible incorrect usage. Every notification is represented as
a dictionary with these possible values:

Key|Value Type
:-:|:-:
severity|String
code|String
title|String
description|String

To enable users to handle these notifications however they see fit, the
possible values of the severity and code notification attributes are listed
below. The title and description values depend on the query and its values,
and should be used only as messages.
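As an illustration of consuming this metadata, the sketch below operates on a
plain dictionary shaped like the `summary` map described above. The exact way
to obtain the map depends on the Bolt driver in use, and the example values
are invented for illustration:

```python
def report_summary(summary):
    """Turn the metadata map sent along with query results into
    human-readable lines: non-zero statistics first, then
    notifications with warnings sorted before the rest."""
    lines = []

    # Query statistics live under the "stats" key.
    for key, value in summary.get("stats", {}).items():
        if value != 0:
            lines.append(f"{key}: {value}")

    # Notifications live under the "notifications" key.
    notifications = sorted(
        summary.get("notifications", []),
        key=lambda n: n["severity"] != "WARNING",
    )
    for n in notifications:
        lines.append(f'[{n["severity"]}] {n["code"]}: {n["title"]}')

    return lines


example = {
    "stats": {"nodes-created": 1, "nodes-deleted": 0},
    "notifications": [
        {
            "severity": "INFO",
            "code": "CreateIndex",
            "title": "Created index.",
            "description": "",
        }
    ],
}
print(report_summary(example))
```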
### Severity

- `INFO`
- `WARNING`

### Code

- `CreateConstraint`
- `CreateIndex`
- `CreateStream`
- `CheckStream`
- `CreateTrigger`
- `DropConstraint`
- `DropReplica`
- `DropIndex`
- `DropStream`
- `DropTrigger`
- `ConstraintAlreadyExists`
- `IndexAlreadyExists`
- `LoadCSVTip`
- `IndexDoesNotExist`
- `ConstraintDoesNotExist`
- `RegisterReplica`
- `ReplicaPortWarning`
- `SetReplica`
- `StartStream`
- `StartAllStreams`
- `StopStream`
- `StopAllStreams`
diff --git a/docs2/deployment/monitoring-server.md b/docs2/deployment/monitoring-server.md
new file mode 100644
index 00000000000..c9b4d1749b0
--- /dev/null
+++ b/docs2/deployment/monitoring-server.md
@@ -0,0 +1,97 @@
---
id: monitoring-server
title: Monitoring server
sidebar_label: Monitoring server
---

Memgraph allows you to connect to its monitoring server via WebSocket and
receive certain information from it. For example, each log will be forwarded
to all the connected clients.

## Connecting

To connect to Memgraph's WebSocket server, use the following URL:

```plaintext
ws://host:port
```

The default host is `localhost`, but it can be changed using the
`--monitoring-host` configuration flag. The default port is `7444`, but it can
be changed using the `--monitoring-port` configuration flag.

### Connecting with a secure connection (WSS)

As with the Bolt connection, SSL is supported. The same flags are used for
both connection types: `--bolt-cert-file` and `--bolt-key-file`.

If both of them are set, you will need to use the following URL to connect to
the WebSocket server:

```plaintext
wss://host:port
```

## Authentication

If authentication is used, Memgraph won't send messages to a connection until
it has been authenticated.
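The full flow of connecting, authenticating, and then receiving forwarded logs
can be sketched with a small Python client. This is a sketch only: the
third-party `websockets` package, the default `localhost:7444` endpoint, and
the sample credentials are assumptions:

```python
import asyncio
import json


def auth_payload(username, password):
    """Build the JSON credentials message the monitoring server expects."""
    return json.dumps({"username": username, "password": password})


async def tail_logs(uri="ws://localhost:7444", username="", password=""):
    # Third-party dependency: pip install websockets
    import websockets

    async with websockets.connect(uri) as ws:
        # Authenticate first; when authentication is enabled, no messages
        # are delivered to a connection before this succeeds.
        await ws.send(auth_payload(username, password))
        print(await ws.recv())  # authentication response
        async for message in ws:
            print(message)  # forwarded log messages


if __name__ == "__main__":
    asyncio.run(tail_logs(username="admin", password="pass"))
```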
To authenticate, a JSON message with the credentials in the following format
is required:

```json
{
  "username": "",
  "password": ""
}
```

If the credentials are valid, the connection will be made, and the client will
receive the messages. As a response, the client should receive the following
message:

```json
{
  "success": true,
  "message": "User has been successfully authenticated!"
}
```

If the credentials are invalid, or the first message is in an invalid format,
the connection is dropped. As a response, the following message is sent:

```json
{
  "success": false,
  "message": ""
}
```

:::info

If authentication is not used (there are no users present in Memgraph),
no authentication message is expected, and no response will be returned.

:::

### Authorization (Enterprise)

Permission to connect through WebSocket is controlled by the `WEBSOCKET`
privilege.

## Messages

### Logs

Each log that is written to the log file is forwarded to the connected clients
in the following format:

```json
{
  "event": "log",
  "level": "trace"|"debug"|"info"|"warning"|"error"|"critical",
  "message": ""
}
```
diff --git a/docs2/deployment/replication.md b/docs2/deployment/replication.md
new file mode 100644
index 00000000000..0af69146629
--- /dev/null
+++ b/docs2/deployment/replication.md
@@ -0,0 +1,650 @@
---
id: replication
title: Replication
sidebar_label: Replication
---

:::caution

Memgraph 2.9 introduced a new configuration flag,
`--replication-restore-state-on-startup`, which is `false` by default.

If you want instances to remember their role and configuration in a
replication cluster upon restart, `--replication-restore-state-on-startup`
needs to be set to `true` when first initializing the instances and remain
`true` throughout the instances' lifetime.

When reinstating a cluster, it is advised to first initialize the MAIN
instance and then the REPLICA instances.
+ +::: + +When distributing data across several instances, Memgraph uses replication to +provide a satisfying ratio of the following properties, known from the CAP theorem: + +1. **Consistency** (C) - every node has the same view of data at a given point in + time +2. **Availability** (A) - all clients can find a replica of the data, even in the + case of a partial node failure +3. **Partition tolerance** (P) - the system continues to work as expected despite a + partial network failure + +In the replication process, the data is replicated from one storage (MAIN +instance) to another (REPLICA instances). + +:::info + +From version 2.4 it is no longer possible to specify a timeout when registering +a sync replica. To mimic this behavior in higher releases, please use ASYNC +replication instead. + +::: + + +[![Related - How +to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/replication.md) +[![Related - Under the +Hood](https://img.shields.io/static/v1?label=Related&message=Under%20the%20hood&color=orange&style=for-the-badge)](/under-the-hood/replication.md) +[![Related - Blog +Post](https://img.shields.io/static/v1?label=Related&message=Blog%20post&color=9C59DB&style=for-the-badge)](https://memgraph.com/blog/implementing-data-replication) + + +## Data replication implementation basics + +In Memgraph, all instances are MAIN upon starting. When creating a replication +cluster, one instance has to be chosen as the MAIN instance. The rest of the +instances have to be demoted to REPLICA roles and have a port defined using a +Cypher query. + +If you want instances to remember their role and configuration in a replication +cluster upon restart, they need to be initialized with the +`--replication-restore-state-on-startup` set to `true` and remain `true` +throughout the instances' lifetime. Otherwise and by default, restarted +instances will start as MAIN instances disconnected from any replication +cluster. 
+ +Once demoted to REPLICA instances, they will no longer accept write queries. In +order to start the replication, each REPLICA instance needs to be registered +from the MAIN instance by setting [a replication +mode](/under-the-hood/replication.md#replication-modes) (SYNC or ASYNC) and +specifying the REPLICA instance's socket address. + +The replication mode defines the terms by which the MAIN instance can commit the +changes to the database, thus modifying the system to prioritize either +consistency or availability: + +- **SYNC** - After committing a transaction, the MAIN instance will communicate +the changes to all REPLICA instances running in SYNC mode and wait until it +receives a response or information that a timeout is reached. SYNC mode ensures +consistency and partition tolerance (CP), but not availability for writes. If +the primary database has multiple replicas, the system is highly available for +reads. But, when a replica fails, the MAIN instance can't process the write due +to the nature of synchronous replication. + +- **ASYNC** - The MAIN instance will commit a transaction without receiving + confirmation from REPLICA instances that they have received the same + transaction. ASYNC mode ensures system availability and partition tolerance (AP), + while data can only be eventually consistent. + +Once the REPLICA instances are registered, data storage of the MAIN instance is +replicated and synchronized using transaction timestamps and durability files +(snapshot files and WALs). Memgraph does not support replication of +authentication configurations, query and authentication modules, and audit logs. + +By using the timestamp, the MAIN instance knows the current state of the +REPLICA. If the REPLICA is not synchronized with the MAIN instance, the MAIN +instance sends the correct data for synchronization kept as deltas within WAL +files. 
Deltas are the smallest possible updates of the database, but they carry
enough information to synchronize the data on a REPLICA. Memgraph stores only
`remove` actions as deltas, for example, `REMOVE key:value ON node_id`.

If the REPLICA is so far behind the MAIN instance that synchronization using
the WAL files and the deltas within them is impossible, Memgraph will use
snapshots to synchronize the REPLICA to the state of the MAIN instance.

## Running multiple instances

When running multiple instances, each on its own machine, run Memgraph as you
usually would.

If you are exploring replication and running multiple instances on one
machine, you can run Memgraph with Docker. Check [Docker run options for
Memgraph
images](/memgraph/how-to-guides/work-with-docker#run-a-memgraph-docker-image)
to set up ports and volumes properly, if necessary.

## Assigning roles

Each Memgraph instance has the role of the MAIN instance when it is first
started.

Also, by default, each crashed instance restarts as a MAIN instance
disconnected from any replication cluster. To change this behavior, set
`--replication-restore-state-on-startup` to `true` when first initializing the
instance.

### Assigning the REPLICA role

Once you decide which instance will be the MAIN instance, all the other
instances that will serve as REPLICA instances need to be demoted and have the
port set using the following query:

```plaintext
SET REPLICATION ROLE TO REPLICA WITH PORT <port_number>;
```

If you set the port of each REPLICA instance to `10000`, it will be easier to
register replicas later on because the query for registering replicas uses
port `10000` as the default one.

Otherwise, you can use any unassigned port between 1000 and 10000.

### Assigning the MAIN role

The replication cluster should only have one MAIN instance in order to avoid
errors in the replication system.
If the original MAIN instance fails, you can promote a REPLICA instance to be
the new MAIN instance by running the following query:

```plaintext
SET REPLICATION ROLE TO MAIN;
```

If the original MAIN instance was still alive when you promoted a new MAIN,
you need to resolve any conflicts and manage replication manually.

If you demote the new MAIN instance back to the REPLICA role, it will not
resume its original function. You need to [drop
it](#dropping-a-replica-instance) from the MAIN and register it again.

If the crashed MAIN instance goes back online once a new MAIN is already
assigned, it cannot reclaim its previous role. It can be cleaned and demoted
to become a REPLICA instance of the new MAIN instance.

### Checking the assigned role

To check the replication role of an instance, run the following query:

```plaintext
SHOW REPLICATION ROLE;
```

## Registering REPLICA instances

Once all the nodes in the cluster are assigned appropriate roles, you can
enable replication on the MAIN instance by registering REPLICA instances,
setting a replication mode (SYNC or ASYNC), and specifying the REPLICA
instance's socket address. Memgraph doesn't support chaining REPLICA
instances, that is, a REPLICA instance cannot be replicated on another REPLICA
instance.
If you want to register a REPLICA instance with the SYNC replication mode, run
the following query:

```plaintext
REGISTER REPLICA name SYNC TO <socket_address>;
```

If you want to register a REPLICA instance with the ASYNC replication mode,
run the following query:

```plaintext
REGISTER REPLICA name ASYNC TO <socket_address>;
```

The socket address must be a string value as follows:

```plaintext
"IP_ADDRESS:PORT_NUMBER"
```

where `IP_ADDRESS` is a valid IP address, and `PORT_NUMBER` is a valid port
number, for example:

```plaintext
"172.17.0.4:10050"
```

The default value of the `PORT_NUMBER` is `10000`, so if you set the REPLICA
roles using that port, you can define the socket address by specifying only
the valid IP address:

```plaintext
"IP_ADDRESS"
```

Example of a `<socket_address>` using only `IP_ADDRESS`:

```plaintext
"172.17.0.5"
```

When a REPLICA instance is registered, it will start replicating in ASYNC mode
until it synchronizes to the current state of the database. Upon
synchronization, REPLICA instances will either continue working in the ASYNC
mode or reset to the SYNC mode.

### Listing all registered REPLICA instances

You can check all the registered REPLICA instances and their details by
running the following query:

```plaintext
SHOW REPLICAS;
```

### Dropping a REPLICA instance

To drop a replica, run the following query:

```plaintext
DROP REPLICA <name>;
```

## MAIN and REPLICA synchronization

By comparing timestamps, the MAIN instance knows when a REPLICA instance is
not synchronized and is missing some earlier transactions. The REPLICA
instance is then set into a RECOVERY state, where it remains until it is
[fully synchronized with the MAIN
instance](/under-the-hood/replication.md#synchronizing-instances).

The missing data changes can be sent as snapshots or WAL files. Snapshot files
represent an image of the current state of the database and are much larger
than the WAL files, which only contain the changes, deltas.
Because of the difference +in file size, Memgraph favors the WAL files. + +While the REPLICA instance is in the RECOVERY state, the MAIN instance +calculates the optimal synchronization path based on the REPLICA instance's +timestamp and the current state of the durability files while keeping the +overall size of the files necessary for synchronization to a minimum. + +## Set up a replication cluster + +In the replication process, the data is replicated from one storage (MAIN +instance) to another (REPLICA instances), thus providing a combination of +consistency, availability and partition tolerance when distributing data over +several instances. + +[![Related - Reference +Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/replication.md) +[![Related - Under the +Hood](https://img.shields.io/static/v1?label=Related&message=Under%20the%20hood&color=orange&style=for-the-badge)](/under-the-hood/replication.md) +[![Related - Blog +Post](https://img.shields.io/static/v1?label=Related&message=Blog%20post&color=9C59DB&style=for-the-badge)](https://memgraph.com/blog/implementing-data-replication) + +This example demonstrates how to create a simple cluster of nodes running +Memgraph instances, and set up replication using various replication modes. + +### Cluster topology + +The cluster will consist of three nodes, one MAIN instance and two REPLICA +instances. In order to showcase the creation of REPLICA instances with different +replication modes, we will create: + +- The MAIN instance - contains the original data that will be replicated to + REPLICA instances +- REPLICA instance 1 - replication in the SYNC mode +- REPLICA instance 2 - replication in the ASYNC mode + +### How to run multiple instances? + +If you are running multiple instances, each on its own machine, run Memgraph as +you usually would. 
+ +If you are exploring replication and running multiple instances on one machine, run Memgraph Platform with Docker. + +Memgraph 2.9 introduced a new configuration flag +`--replication-restore-state-on-startup` which is `false` by default. + +If you want instances to remember their role and configuration in a replication +cluster upon restart, the `--replication-restore-state-on-startup` needs to be +set to `true` when first initializing the instances and remain `true` throughout +the instances' lifetime. + +The MAIN instance: + +``` +docker run -it -p 3000:3000 memgraph/memgraph-platform +``` + +REPLICA instance 1: + +``` +docker run -it -p 3001:3000 memgraph/memgraph-platform +``` + +REPLICA instance 2: + +``` +docker run -it -p 3002:3000 memgraph/memgraph-platform +``` + +You can connect to each instance with the [Memgraph Lab](/memgraph-lab) +in-browser application at: + +- the MAIN instance - `localhost:3000` +- REPLICA instance 1 - `localhost:3001` +- REPLICA instance 2 - `localhost:3002` + +If you need to define additional ports or volumes, check [Docker run options for Memgraph images](/memgraph/how-to-guides/work-with-docker#run-a-memgraph-docker-image). + +### How to demote an instance to a REPLICA role? + +Run the following query in both REPLICA instances to demote them to the +REPLICA role: + +``` +SET REPLICATION ROLE TO REPLICA WITH PORT 10000; +``` + +If you set the port of each REPLICA instance to `10000`, it will be easier to +register replicas later on because the query for registering replicas uses port +`10000` as the default one. + +Otherwise, you can use any unassigned port between 1000 and 10000. + +### How to register REPLICA instances? + +To register a REPLICA instance, you need to find out the IP address of each +instance. 
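One way to look up a container's IP address is with `docker inspect` (a
sketch; `mgmain` is a placeholder for your container's name or ID):

```
docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' mgmain
```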
The IP addresses will probably be:

- the MAIN instance - `172.17.0.2`
- REPLICA instance 1 - `172.17.0.3`
- REPLICA instance 2 - `172.17.0.4`

If they are not, please change the IP addresses in the following queries to
match the [IP addresses on your
cluster](/memgraph/how-to-guides/work-with-docker#how-to-retrieve-a-docker-container-ip-address).

Then, run the following queries from the MAIN instance to register the REPLICA
instances:

1. REPLICA instance 1 at `172.17.0.3`

   ```
   REGISTER REPLICA REP1 SYNC TO "172.17.0.3";
   ```

   REPLICA instance 1 is called REP1, its replication mode is SYNC, and it is
   located at IP address `172.17.0.3` with port `10000`.

   Once the MAIN instance commits a transaction, it will communicate the
   changes to all REPLICA instances running in SYNC mode and wait until it
   receives a response that the changes have been applied to the REPLICAs or
   that a timeout has been reached.

   If you used any port other than `10000` while demoting a REPLICA instance,
   you will need to specify it like this: `"172.17.0.3:5000"`.

2. REPLICA instance 2 at `172.17.0.4`

   ```
   REGISTER REPLICA REP2 ASYNC TO "172.17.0.4";
   ```

   REPLICA instance 2 is called REP2, its replication mode is ASYNC, and it is
   located at IP address `172.17.0.4` with port `10000`.

   When a REPLICA instance is running in ASYNC mode, the MAIN instance will
   commit a transaction without receiving confirmation from REPLICA instances
   that they have received the same transaction. ASYNC mode ensures system
   availability and partition tolerance.

   If you used any port other than `10000` while demoting a REPLICA instance,
   you will need to specify it like this: `"172.17.0.4:5000"`.

### How to check info about registered REPLICA instances?

Check REPLICA instances by running the following query from the MAIN instance:

```
SHOW REPLICAS;
```

### How to drop a REPLICA instance?
To drop a replica, run the following query:

```plaintext
DROP REPLICA <name>;
```

### How to promote a REPLICA instance to MAIN?

To promote a REPLICA instance to MAIN, run the following query:

```plaintext
SET REPLICATION ROLE TO MAIN;
```


## Look under the hood of Memgraph's replication

Uninterrupted data and operational availability in production systems are
critical and can be achieved in many ways. In Memgraph, we opted for
replication.

[![Related - How
to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/replication.md)
[![Related - Reference
Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/replication.md)
[![Related - Blog
Post](https://img.shields.io/static/v1?label=Related&message=Blog%20post&color=9C59DB&style=for-the-badge)](https://memgraph.com/blog/implementing-data-replication)

In distributed systems theory, the CAP theorem, also named Brewer's theorem,
states that any distributed system can simultaneously guarantee two out of the
three properties:

1. **Consistency** (C) - every node has the same view of data at a given point
   in time
2. **Availability** (A) - all clients can find a replica of the data, even in
   the case of a partial node failure
3. **Partition tolerance** (P) - the system continues to work as expected
   despite a partial network failure

Most Memgraph use cases do not benefit from well-known algorithms that strive
to achieve all three CAP properties, such as Raft, because their complexity
introduces performance issues. Memgraph use cases are based on running
analytical graph workloads on real-time data, demanding a simpler concept such
as **replication**.

Replication consists of replicating data from one storage to one or several
other storages.
The downside of its simplicity is that only two out of three CAP +properties can be achieved. + +### Replication implementation in Memgraph + +To enable replication, there must be at least two instances of Memgraph in a +cluster. Each instance has one of two roles: MAIN or REPLICA. The MAIN instance +accepts read and write queries to the database and REPLICA instances accept only +read queries. + +The changes or state of the MAIN instance are replicated to the REPLICA +instances in a SYNC or ASYNC mode. The SYNC mode ensures consistency and +partition tolerance (CP), but not availability for writes. The ASYNC mode +ensures system availability and partition tolerance (AP), while data can only be +eventually consistent. + +By using the timestamp, the MAIN instance knows the current state of the +REPLICA. If the REPLICA is not synchronized with the MAIN instance, the MAIN +instance sends the correct data for synchronization as WAL files. + +If the REPLICA is so far behind the MAIN instance that the synchronization using +WAL files is impossible, Memgraph will use snapshots. + +### Replication modes + +:::info + +From version 2.4 it is no longer possible to specify a timeout when registering +a SYNC replica. To mimic this behavior in higher releases, please use ASYNC +replication instead. + +::: + +Replication mode defines the terms by which the MAIN instance can commit the +changes to the database, thus modifying the system to prioritize either +consistency or availability. There are two possible replication modes +implemented in Memgraph replication: + +- SYNC +- ASYNC + + + +When a REPLICA instance is registered and added to the cluster, it will start +replicating in ASYNC mode. That will allow it to catch up to the current state +of the MAIN instance. When the REPLICA instance synchronizes with the MAIN +instance, the replication mode will change according to the mode defined during +registration. 
#### SYNC replication mode

SYNC mode is the most straightforward replication mode, in which the main
storage thread waits for the response and cannot continue until the response
is received or a timeout is reached.

The following diagrams show the behavior of the MAIN instance in cases when a
SYNC REPLICA doesn't answer within the expected timeout.

##### SYNC REPLICA going down when creating index, uniqueness constraint or existence constraint

![sync-replicas-down-when-creating-index-or-constraints](data/replication/workflow_diagram_data_definition_creation.drawio.png)

##### SYNC REPLICA going down when dropping index, uniqueness constraint or existence constraint

![sync-replicas-down-when-dropping-index-or-constraints](data/replication/workflow_diagram_data_definition_dropping.drawio.png)

##### SYNC REPLICA going down when adding/updating/deleting data

![sync-replicas-down-when-modifying-data](data/replication/workflow_diagram_data_manipulation.drawio.png)

#### ASYNC replication mode

In the ASYNC replication mode, the MAIN instance will commit a transaction
without receiving confirmation from REPLICA instances that they have received
the same transaction. This means that the MAIN instance does not wait for the
response from the REPLICA instances in the main thread, but in some other
thread.

A new thread could be created every time a transaction needs to be replicated
to the REPLICA instance, but because transactions are committed often and
creating threads is expensive, each REPLICA instance has one permanent thread
connecting it with the MAIN instance. Using this background thread, the MAIN
instance pushes replication tasks to the REPLICA instance, creating a custom
thread pool pattern, and receives confirmations of successful replication from
the REPLICA instance.

ASYNC mode ensures system availability and partition tolerance.
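The pattern above, where the main thread commits immediately while one
permanent background thread per replica pushes replication tasks, can be
sketched in Python. This is an illustrative model of the idea, not Memgraph's
actual implementation:

```python
import queue
import threading


class AsyncReplicaLink:
    """One permanent background thread pushing committed transactions
    to a single replica, modeled here as a plain list."""

    def __init__(self):
        self.replica_store = []
        self.tasks = queue.Queue()
        # One long-lived worker per replica instead of a thread per transaction.
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            delta = self.tasks.get()
            if delta is None:  # shutdown signal
                break
            self.replica_store.append(delta)  # "apply" the transaction
            self.tasks.task_done()

    def replicate(self, delta):
        # The main thread only enqueues and returns immediately:
        # the commit does not wait for replica confirmation.
        self.tasks.put(delta)

    def close(self):
        self.tasks.join()  # wait for queued work to be applied
        self.tasks.put(None)
        self.worker.join()


link = AsyncReplicaLink()
for tx in ["CREATE (n)", "SET n.x = 1", "DELETE n"]:
    link.replicate(tx)  # commit path returns without blocking
link.close()
print(link.replica_store)
```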
+ +### Synchronizing instances + +By comparing timestamps, the MAIN instance knows when a REPLICA instance is not +synchronized and is missing some earlier transactions. The REPLICA instance is +then set into a RECOVERY state, where it remains until it is fully synchronized +with the MAIN instance. + +The missing data changes can be sent as snapshots or WAL files. Snapshot files +represent an image of the current state of the database and are much larger than +the WAL files, which only contain the changes, deltas. Because of the difference +in file size, Memgraph favors the WAL files. + +While the REPLICA instance is in the RECOVERY state, the MAIN instance +calculates the optimal synchronization path based on the REPLICA instance's +timestamp and the current state of the durability files while keeping the +overall size of the files necessary for synchronization to a minimum. + + + +Imagine there were 5 changes made to the database. Each change is saved in a WAL +file, so there are 5 WAL files, and the snapshot was created after 2 changes. +The REPLICA instance can be synchronized using a snapshot and the 3 latest WAL +files or using 5 WAL files. Both options would correctly synchronize the +instances, but 5 WAL files are much smaller. + +The durability files are constantly being created, deleted, and updated. Also, +each replica could need a different set of files to sync. There are several ways +to ensure that the necessary files persist and that instances can read the WAL +files currently being updated without affecting the performance of the rest of +the database. + +#### Locking durability files + +Durability files are also used for recovery and are periodically deleted to +eliminate redundant data. The problem is that they can be deleted while they are +being used to synchronize a REPLICA with the MAIN instance. + +To delay the file deletion, Memgraph uses a file retainer that consists of +multiple lockers. 
Threads can store and lock the files they found while +searching for the optimal recovery path in the lockers, thus ensuring the files +will still exist once they are sent to the REPLICA instance as a part of the +synchronization process. If some other part of the system sends a deletion +request for a certain file, the file retainer first checks if that file is +locked in a locker. If it is not, it is deleted immediately. If the file is +locked, the file retainer adds the file to the deletion queue. The file retainer +will periodically clean the queue by deleting the files that are no longer +locked inside the locker. + +#### Writing and reading files simultaneously + +Memgraph internal file buffer is used when writing deltas to WAL files, and +mid-writing, the content of one WAL file can be divided across two locations. If +at that point that WAL file is used to synchronize the REPLICA instance, once +the data is being read from the internal buffer, the buffer can be flushed, and +the REPLICA could receive an invalid WAL file because it is missing a chunk of +data. It could also happen that the WAL file is sent before all the transactions +are written to the internal buffer. + +To avoid these issues, flushing of that internal buffer is disabled while the +current WAL is sent to a REPLICA instance. To get all the data necessary for the +synchronization, the replication thread reads the content directly from the WAL +file, then reads how many bytes are written in the buffer and copies the data to +another location. Then the flushing is enabled again, and the transaction is +replicated using the copied buffer. Because the access to the internal buffer +was not blocked, new data can be written. The content of the buffer (including +any new data) is then written in a new WAL file that will be sent in the next +synchronization process. 
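The pause-flushing-and-copy protocol described above can be modeled with a toy
buffer. This is an illustrative sketch of the idea, not Memgraph internals:

```python
import threading


class WalBuffer:
    """Toy model of an internal file buffer whose flushing can be paused
    while a replication thread copies a consistent view of the data."""

    def __init__(self):
        self.wal_file = b""  # data already flushed to the WAL file
        self.buffer = b""    # data not yet flushed
        self.flushing_enabled = True
        self.lock = threading.Lock()

    def write(self, data: bytes):
        # Writers are never blocked; data lands in the in-memory buffer.
        with self.lock:
            self.buffer += data

    def flush(self):
        with self.lock:
            if not self.flushing_enabled:
                return  # flushing is paused during replication
            self.wal_file += self.buffer
            self.buffer = b""

    def snapshot_for_replication(self) -> bytes:
        # Pause flushing so the WAL file and the buffered tail stay
        # consistent, copy both, then resume flushing.
        with self.lock:
            self.flushing_enabled = False
            copied = self.wal_file + self.buffer
            self.flushing_enabled = True
        return copied


wal = WalBuffer()
wal.write(b"delta1;")
wal.flush()
wal.write(b"delta2;")
payload = wal.snapshot_for_replication()  # consistent copy for the replica
wal.write(b"delta3;")  # new writes were never blocked
print(payload)
```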
#### Fixing timestamp consistency

Timestamps are used to compare the state of the REPLICA instance with the
state of the MAIN instance.

At first, we used the current timestamp without increasing its value for
global operations, like creating an index or creating a constraint. By using a
single timestamp, it was impossible to know which operations the REPLICA had
applied because sequential global operations had the same timestamp. To avoid
this issue, a unique timestamp is assigned to each global operation.

As replicas allow read queries, each of those queries was assigned its own
timestamp. Those timestamps caused issues when replicated write transactions
were assigned an older timestamp: a read query could return different data if
a write transaction was replicated between two of its executions, which broke
snapshot isolation. To avoid this problem, the timestamp on REPLICA instances
isn't increased for read transactions, because they don't produce any changes,
so no deltas need to be timestamped.

#### Incompatible instances

To avoid issues when the durability files of two different database instances
are stored in the same folder, a unique ID is assigned to each storage
instance. The same ID is then assigned to the durability files. Replication
uses the instance ID to validate that the files and the database are
compatible.

A unique ID, `epoch_id`, is also assigned each time an instance is run as the
MAIN instance in the replication cluster, to check if the data is compatible
for replication. The `epoch_id` is necessary when the original MAIN instance
fails, a REPLICA instance becomes the new MAIN, and, after some time, the
original MAIN instance is brought back online. If no transactions were run on
the original MAIN instance, the difference in timestamps will indicate that it
is behind the new MAIN, and it would be possible to set the original
MAIN-REPLICA relationship.
But if transactions were run on the original MAIN after it was brought back
online, the timestamp would be of no help, but the `epoch_id` would indicate
incompatibility, thus preventing the original MAIN from reclaiming its
original role.

diff --git a/docs2/deployment/security.md b/docs2/deployment/security.md
new file mode 100644
index 00000000000..b01dad4c20a
--- /dev/null
+++ b/docs2/deployment/security.md
@@ -0,0 +1,698 @@
---
id: security
title: Security (Enterprise)
sidebar_label: Security
---

[![Related - How to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](how-to-guides/manage-user-privileges.md)

Before reading this article, we highly recommend going through the how-to
guide on [managing user
privileges](../how-to-guides/manage-user-privileges.md), which contains more
thorough explanations of the concepts behind the `openCypher` commands listed
in this article.

## Users

Creating a user can be done by executing the following command:

```cypher
CREATE USER user_name [IDENTIFIED BY 'password'];
```

If the username is an email address, you need to enclose it in backticks
(``` ` ```):

```cypher
CREATE USER `alice@memgraph.com` IDENTIFIED BY '0042';
```

If the user should authenticate themselves on each session, i.e. provide their
password on each session, the part within the brackets is mandatory.
Otherwise, the password is set to `null` and the user will be allowed to log
in using any password, provided that they provide the correct username.

You can also set or alter a user's password at any time by issuing the
following command:

```cypher
SET PASSWORD FOR user_name TO 'new_password';
```

Removing a user's password, i.e.
allowing the user to log in using any
+password, can be done by setting it to `null` as follows:
+
+```cypher
+SET PASSWORD FOR user_name TO null;
+```
+
+To delete a user, use:
+
+```cypher
+DROP USER user_name;
+```
+
+### Password encryption algorithm
+
+You can choose between the `bcrypt`, `sha256`, and `sha256-multiple` password encryption algorithms. SHA256 offers better performance than the more secure but less performant bcrypt. Change the encryption algorithm by setting the [`--password-encryption-algorithm`](/reference-guide/configuration.md#other) configuration option to the preferred value.
+
+## User Roles
+
+Each user can be assigned at most one user role. One can think of user roles
+as abstractions that capture the privilege levels of a set of users. For
+example, suppose that `Dominik` and `Marko` belong to the upper management of
+a certain company. It makes sense to grant them a set of privileges that other
+users are not entitled to, so instead of granting those privileges to each
+of them, we can create a role with those privileges called `manager`
+and assign it to `Dominik` and `Marko`.
+
+In other words, each privilege that is granted to a user role is automatically
+granted to a user (unless it has been explicitly denied to that user).
+Similarly, each privilege that is denied to a user role is automatically denied
+to a user (even if it has been explicitly granted to that user).
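These precedence rules (together with the silent deny described later in this article) can be condensed into a tiny resolution function. This is an illustrative sketch of the logic, not Memgraph's actual implementation:

```python
def effective_privilege(user_status, role_status):
    """Resolve one privilege from its user-level and role-level status.

    Each status is "GRANT", "DENY", or None (not set). A DENY at either
    level always wins; otherwise a GRANT at either level applies; if
    neither level says anything, the privilege is silently denied.
    """
    if "DENY" in (user_status, role_status):
        return "DENY"
    if "GRANT" in (user_status, role_status):
        return "GRANT"
    return "DENY"
```

For example, a privilege granted to a user but denied to their role resolves to a deny, matching the rule that a role-level deny overrides a user-level grant.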
+
+Creating a user role can be done by executing the following command:
+
+```cypher
+CREATE ROLE role_name;
+```
+
+Assigning a user role to a certain user can be done with the following command:
+
+```cypher
+SET ROLE FOR user_name TO role_name;
+```
+
+Removing the role from the user can be done by:
+
+```cypher
+CLEAR ROLE FOR user_name;
+```
+
+Finally, showing all users that have a certain role can be done as:
+
+```cypher
+SHOW USERS FOR role_name;
+```
+
+Similarly, querying which role a certain user has can be done as:
+
+```cypher
+SHOW ROLE FOR user_name;
+```
+
+## Privileges
+
+At the moment, privileges are confined to users' abilities to perform certain
+`openCypher` queries. Namely, users can be given permission to execute a subset
+of the following commands: `CREATE`, `DELETE`, `MATCH`, `MERGE`, `SET`,
+`REMOVE`, `INDEX`, `STATS`, `AUTH`, `REPLICATION`, `READ_FILE`, `DURABILITY`,
+`FREE_MEMORY`, `TRIGGER`, `STREAM`, `CONFIG`, `CONSTRAINT`, `DUMP`,
+`MODULE_READ`, `MODULE_WRITE`, `WEBSOCKET`, `TRANSACTION_MANAGEMENT` and `STORAGE_MODE`.
+
+Granting a certain set of privileges to a specific user or user role can be
+done by issuing the following command:
+
+```cypher
+GRANT privilege_list TO user_or_role;
+```
+
+For example, granting `AUTH` and `INDEX` privileges to users with the role
+`moderator` would be written as:
+
+```cypher
+GRANT AUTH, INDEX TO moderator;
+```
+
+Similarly, denying privileges is done using the `DENY` keyword instead of
+`GRANT`.
+
+Both denied and granted privileges can be revoked, meaning that their status is
+not defined for that user or role. Revoking is done using the `REVOKE` keyword.
+Users should note that, although semantically unintuitive, the level of a
+certain privilege can be raised by using `REVOKE`. For instance, suppose a user
+has been denied an `INDEX` privilege, but the role they are assigned is granted
+that privilege.
Currently, the user is unable to use indexing features,
+but, after revoking the user's `INDEX` privilege, they will be able to do so.
+
+If you wish to grant, deny or revoke all privileges and find it tedious
+to list them explicitly, you can use the `ALL PRIVILEGES` construct instead.
+For example, revoking all privileges from the user `jdoe` can be done with the
+following command:
+
+```cypher
+REVOKE ALL PRIVILEGES FROM jdoe;
+```
+
+Finally, obtaining the status of each privilege for a certain user or role can be
+done by issuing the following command:
+
+```cypher
+SHOW PRIVILEGES FOR user_or_role;
+```
+
+## Owners
+
+The privileges of the owners of
+[streams](/reference-guide/streams/overview.md#creating-a-stream) and
+[triggers](/reference-guide/triggers.md#owner) are propagated to the
+corresponding query executions:
+- in the case of streams, to the queries returned by the transformations
+- in the case of triggers, to trigger statements
+
+This means the execution of the queries will fail if the owner doesn't have the
+required privileges. There are a few important details:
+- If there are no existing users, no privilege check is performed, just as for
+regular queries.
+- If a stream or trigger is created without using a logged-in user
+session, the owner will be `Null`. From the moment the first user is created,
+such streams and triggers will fail because the lack of an owner is treated as a
+user without any privileges, so no queries are allowed to be executed.
+- Currently, there is no way of changing the owner. The only workaround is
+to delete the stream or trigger and then create it again as another user.
+
+## Streams
+
+The user who executes the `CREATE STREAM` query becomes the owner of the stream.
+Authentication and authorization are not supported in Memgraph Community, thus
+the owner will always be `Null`, and the privileges are not checked in Memgraph
+Community.
In Memgraph Enterprise the privileges of the owner are used when +executing the queries returned from a transformation, in other words, the +execution of the queries will fail if the owner doesn't have the required +privileges. More information about how the owner affects the stream can be +found in the [reference guide](/reference-guide/streams/overview.md#create-a-stream). + +## Label-based access control +Sometimes, disabling users from executing certain commands is too restrictive. +Label-based access control enables database administrators to disable users from +viewing or manipulating nodes with certain labels and relationships of certain types. + +[![Related - How to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/manage-label-based-access-control.md) + +Label-based permissions are divided into 4 hierarchical parts or levels: +- `NOTHING` - denies user visibility and manipulation over nodes and relationships +- `READ` - grants the user visibility over nodes and relationships +- `UPDATE` - grants the user visibility and the ability to edit nodes and relationships +- `CREATE_DELETE` - grants the user visibility, editing, creation, and deletion of a node or a +relationship + +### Node permissions + +Granting a certain set of node permissions can be done similarly to the clause +privileges using the following command: + +```cypher +GRANT permission_level ON LABELS label_list TO user_or_role; +``` + +with the legend: +- `permission_level` is either `NOTHING`, `READ`, `UPDATE` or `CREATE_DELETE` +- `label_list` is a set of node labels, separated with a comma and with a colon in front of +each label (e.g. 
`:L1`), or `*` for specifying all labels in the graph
+- `user_or_role` is an already created user or role in Memgraph
+
+For example, granting the `READ` permission on labels `L1` and `L2` would be written as:
+
+```cypher
+GRANT READ ON LABELS :L1, :L2 TO charlie;
+```
+
+while granting both `READ` and `UPDATE` permissions for all labels in the graph would be written as:
+
+```cypher
+GRANT UPDATE ON LABELS * TO charlie;
+```
+
+For denying visibility of a node, the command would be written as:
+
+```cypher
+GRANT NOTHING ON LABELS :L1 TO charlie;
+```
+
+### Relationship permissions
+
+Relationship permission queries are in essence the same as node permission queries, with the
+one difference that the name of the relationship type is `EDGE_TYPE` and not `LABEL`.
+
+Granting a certain set of edge type permissions can be done similarly to the
+clause privileges by issuing the following command:
+
+```cypher
+GRANT permission_level ON EDGE_TYPES edge_type_list TO user_or_role;
+```
+
+with the same legend as for the node permissions.
+
+For example, granting the `READ` permission on the relationship type `:CONNECTS` would be written as:
+
+```cypher
+GRANT READ ON EDGE_TYPES :CONNECTS TO charlie;
+```
+
+### Revoking label-based permissions
+
+To revoke any of the label-based permissions, use the following command:
+
+```cypher
+REVOKE (LABELS | EDGE_TYPES) label_or_edge_type_list FROM user_or_role;
+```
+
+where:
+- `label_or_edge_type_list` is a list of labels or edge types with a colon in front of each
+label or edge type (or `*` for specifying all labels or edge types)
+- `user_or_role` is an existing user or role in Memgraph
+
+### Show privileges for label-based access control
+
+To check which privileges an existing user or role has in Memgraph, it is enough to write:
+
+```cypher
+SHOW PRIVILEGES FOR user_or_role;
+```
+
+and all the values of the clause privileges, as well as the label-based permissions, will be displayed.
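Since the permission levels are hierarchical, a grant at one level implies every capability below it. The following sketch illustrates that hierarchy; the level names come from the list above, but the check itself is hypothetical and not part of Memgraph:

```python
# Label-based permission levels, ordered from weakest to strongest.
LEVELS = ["NOTHING", "READ", "UPDATE", "CREATE_DELETE"]

def allows(granted, required):
    """Check whether a granted level covers a required capability.

    UPDATE implies READ, and CREATE_DELETE implies both; NOTHING
    covers no capability at all.
    """
    if granted == "NOTHING":
        return False
    return LEVELS.index(granted) >= LEVELS.index(required)
```

For instance, a user granted `UPDATE` on a label can also read nodes with that label, but cannot create or delete them.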
+
+## Manage user privileges
+
+:::warning
+This is an Enterprise feature.
+Once the Memgraph Enterprise license expires, newly created users will be granted all privileges.
+The existing users' privileges will still apply, but you won't be able to manage them.
+:::
+
+Most databases have multiple users accessing and modifying
+data within the database. This might pose a serious security concern for
+system administrators who wish to grant only certain privileges to certain
+users. A typical example would be an internal database of a company that
+tracks data about its employees. Naturally, only certain users of the database
+should be able to perform queries that modify that data.
+
+At Memgraph, we provide administrators with the option of granting,
+denying or revoking a certain set of privileges for some users or groups of users
+(i.e. users that are assigned a specific user role), thereby eliminating such
+security concerns.
+
+By default, anyone can connect to Memgraph and is granted all privileges.
+After the first user is created, Memgraph will execute a query if and only
+if either the user or their role is granted that privilege and neither the
+user nor their role is denied that privilege. Otherwise, Memgraph will not
+execute that specific query. Note that `DENY` is a stronger
+operation than `GRANT`. This is also evident from the fact that if neither the
+user nor their role is explicitly granted or denied a certain privilege, that
+user will not be able to perform that specific query. This effect is also known
+as a silent deny.
The information above is neatly condensed in the following +table: + +User Status | Role Status | Effective Status +------------|-------------|------------------ +GRANT | GRANT | GRANT +GRANT | DENY | DENY +GRANT | NULL | GRANT +DENY | GRANT | DENY +DENY | DENY | DENY +DENY | NULL | DENY +NULL | GRANT | GRANT +NULL | DENY | DENY +NULL | NULL | DENY + +All supported commands that deal with accessing or modifying users, user +roles and privileges can only be executed by users that are granted the +`AUTH` privilege. All of those commands are listed in the appropriate +[reference guide](../reference-guide/security.md). + +At the moment, privileges are confined to users' abilities to perform certain +`OpenCypher` queries. Namely users can be given permission to execute a subset +of the following commands: `CREATE`, `DELETE`, `MATCH`, `MERGE`, `SET`, +`REMOVE`, `INDEX`, `STATS`, `AUTH`, `REPLICATION`, `READ_FILE`, `DURABILITY`, +`FREE_MEMORY`, `TRIGGER`, `STREAM`, `CONFIG`, `CONSTRAINT`, `DUMP`, +`MODULE_READ`, `MODULE_WRITE`, `WEBSOCKET` and `TRANSACTION_MANAGEMENT`. + +We could naturally cluster those privileges into groups: + + * Privilege to access data (`MATCH`) + * Privilege to modify data (`MERGE`, `SET`) + * Privilege to create and delete data (`CREATE`, `DELETE`, `REMOVE`) + * Privilege to index data (`INDEX`) + * Privilege to obtain statistics and information from Memgraph (`STATS`) + * Privilege to view and alter users, roles and privileges (`AUTH`) + * Privilege to use replication queries (`REPLICATION`) + * Privilege to access files in queries, e.g. 
`LOAD CSV` clause (`READ_FILE`)
+ * Privilege to manage durability files (`DURABILITY`)
+ * Privilege to try freeing memory (`FREE_MEMORY`)
+ * Privilege to use trigger queries (`TRIGGER`)
+ * Privilege to use stream queries (`STREAM`)
+ * Privilege to configure Memgraph during runtime and to obtain the configuration of the given Memgraph instance (`CONFIG`)
+ * Privilege to read the content of Python query module files (`MODULE_READ`)
+ * Privilege to modify the content of Python query module files (`MODULE_WRITE`)
+ * Privilege to connect to the [Memgraph monitoring server](/reference-guide/monitoring-server.md) (`WEBSOCKET`)
+ * Privilege to show and terminate transactions (`TRANSACTION_MANAGEMENT`)
+ * Privilege to change the storage mode (`STORAGE_MODE`)
+
+If you are unfamiliar with any of these commands, you can look them up in our
+[Cypher manual](/cypher-manual).
+
+Similarly, the complete list of commands which can be executed under the `AUTH`
+privilege can be viewed in the
+[appropriate article](../reference-guide/security.md) within our reference
+guide.
+
+The remainder of this article outlines a recommended workflow of
+user management within an internal database of a fictitious company.
+
+### Creating an administrator
+
+After the first user is created, Memgraph will grant all the privileges to them.
+Therefore, let's create a user named `admin` and set its password to `0000`.
+This can be done by executing:
+
+```cypher
+CREATE USER admin IDENTIFIED BY '0000';
+```
+
+### Creating other users
+
+Our fictitious company is internally divided into teams, and each team has
+its own supervisor. All employees of the company need to access and modify
+data within the database.
+
+Creating a user account for a new hire named Alice can be done as follows:
+
+```cypher
+CREATE USER alice IDENTIFIED BY '0042';
+```
+
+If the username is an email address, you need to enclose it in backticks (``` ` ```):
+
+```cypher
+CREATE USER `alice@memgraph.com` IDENTIFIED BY '0042';
+```
+
+Alice should also be granted the privileges to access and modify data, which can be done by
+executing the following:
+
+```cypher
+GRANT MATCH, MERGE, SET TO alice;
+```
+
+### Creating user roles
+
+Each team supervisor needs additional privileges that allow them to
+create new data or delete existing data from the database. Instead of tediously
+granting additional privileges to each supervisor using language constructs from
+the previous chapter, we can do so by creating a new user role for
+supervisors.
+
+Creating a user role named `supervisor` can be done by executing the following
+command:
+
+```cypher
+CREATE ROLE supervisor;
+```
+
+Granting the privilege to create and delete data to our newly created role can
+be done as follows:
+
+```cypher
+GRANT CREATE, DELETE, REMOVE TO supervisor;
+```
+
+Finally, we need to assign that role to each of the supervisors. Suppose a user
+named `bob` is indeed a supervisor within the company. Assigning them that role
+within the database can be done with the following command:
+
+```cypher
+SET ROLE FOR bob TO supervisor;
+```
+
+## Manage label-based access control
+
+[![Related - Reference Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/security.md)
+
+:::warning
+This is an Enterprise feature.
+Once the Memgraph Enterprise license expires, newly created users will be granted all privileges.
+The existing users' privileges will still apply, but you won't be able to manage them.
+:::
+
+Sometimes, authorizing the database by granting and denying clause privileges is not enough to make the
+database fully secure.
Certain nodes and relationships can be confidential and must be hidden from
+many of the database's users.
+
+To meet this need, Memgraph extends its authorization features with
+authorization on node labels and relationship edge types. By applying authorization to the graph's first-class
+citizens, a database administrator can keep all the data in one database while keeping any private data
+secure from those who don't have adequate permission.
+
+This how-to guide will walk you through label-based access control in the use case of a fictional company
+doing data analytics and machine learning.
+
+The fictional company's day-to-day business is ingesting new data for training machine learning models.
+Alice is the database administrator in the company, and she would like to set up label-based access control
+inside Memgraph to make data manipulation more secure.
+
+When she tries out Memgraph for the first time, she is connected to a session with all privileges and is
+able to create users and roles and grant them privileges. As a first task, she creates the `admin` user,
+who is automatically granted permission to execute any clause and access any node or relationship.
+
+```cypher
+CREATE USER admin IDENTIFIED BY 'PaSsWoRd';
+```
+
+The user `admin` can verify that she has all the privileges by running:
+
+```cypher
+SHOW PRIVILEGES FOR admin;
+```
+
+privilege | effective | description
+------------------|-----------------|------------------------------------------------
+CREATE | GRANT | GRANTED TO USER
+DELETE | GRANT | GRANTED TO USER
+MATCH | GRANT | GRANTED TO USER
+MERGE | GRANT | GRANTED TO USER
+SET | GRANT | GRANTED TO USER
+REMOVE | GRANT | GRANTED TO USER
+INDEX | GRANT | GRANTED TO USER
+STATS | GRANT | GRANTED TO USER
+AUTH | GRANT | GRANTED TO USER
+CONSTRAINT | GRANT | GRANTED TO USER
+DUMP | GRANT | GRANTED TO USER
+REPLICATION | GRANT | GRANTED TO USER
+READ_FILE | GRANT | GRANTED TO USER
+DURABILITY | GRANT | GRANTED TO USER
+FREE_MEMORY | GRANT | GRANTED TO USER
+TRIGGER | GRANT | GRANTED TO USER
+CONFIG | GRANT | GRANTED TO USER
+STREAM | GRANT | GRANTED TO USER
+MODULE_READ | GRANT | GRANTED TO USER
+MODULE_WRITE | GRANT | GRANTED TO USER
+WEBSOCKET | GRANT | GRANTED TO USER
+TRANSACTION_MANAGEMENT | GRANT | GRANTED TO USER
+STORAGE_MODE | GRANT | GRANTED TO USER
+ALL LABELS | CREATE_DELETE | GLOBAL LABEL PERMISSION GRANTED TO USER
+ALL EDGE_TYPES | CREATE_DELETE | GLOBAL EDGE_TYPE PERMISSION GRANTED TO USER
+
+If you want to find out more about user privileges, head over to
+**[Managing user privileges](/how-to-guides/manage-user-privileges.md)**.
+
+Alice can now log in as an administrator in Memgraph with her own account. From that point on,
+she can also create new users and roles in the database. Subsequently created users and roles
+won't have any privileges or label-based permissions and need additional commands to be granted
+permissions on the clauses and the graph.
+
+### Granting read permissions
+
+Bob is a data analyst for the company. He makes sure he can extract any useful insights
+from the data imported into the database.
For now, all the data is labeled with the `DataPoint`
+label. Alice has already created a data analyst role, as well as Bob's account, in Memgraph with:
+
+```cypher
+CREATE ROLE analyst;
+CREATE USER Bob IDENTIFIED BY 'test';
+SET ROLE FOR Bob TO analyst;
+```
+
+Unfortunately, when he writes:
+
+```cypher
+MATCH (n:DataPoint) RETURN n;
+```
+
+he gets an error that he cannot execute the query. Why is that?
+The first problem is that Bob cannot perform `MATCH` queries at all; that
+privilege must be granted explicitly.
+
+The database administrator grants him and all the other data analysts the `MATCH` privilege
+to traverse the graph with:
+
+```cypher
+GRANT MATCH TO analyst;
+```
+
+Now Bob is able to perform a match. However, executing the same query again, he
+is still not able to get any results. Now that's unfortunate. Did we do anything wrong?
+
+Enter label-based access control. Since Bob is not an administrator, he was not able
+to see any data points in the graph. In other words, he does not have the `READ` permission
+on the `DataPoint` label.
+
+Memgraph's label-based access control is hierarchically constructed, and the first
+permission one can be given on node labels or relationship edge types is `READ`.
+
+Alice now updates Bob's permissions by executing:
+
+```cypher
+GRANT READ ON LABELS :DataPoint TO analyst;
+```
+
+Bob can now execute his queries normally and get insights from the database
+about all the data points in the graph!
+
+Additionally, the company decided that all the data points would be connected
+in a time-series fashion, depending on when they were ingested into the database. One
+`DataPoint` should therefore be connected to the previously inserted one.
+The relationship type is called `:NEXT`.
+
+Bob again has problems, because when he executes:
+
+```cypher
+MATCH (n:DataPoint)-[e:NEXT]->(m:DataPoint) RETURN n, e, m;
+```
+
Although Bob can see all the data points, he doesn't +have permission to view the relationships. The database administrator executes the following +command to solve the problem: + +```cypher +GRANT READ ON EDGE_TYPES :NEXT TO analyst; +``` + +Since the users are initially constructed without any permission, they would need an explicit +grant for every new label that appears in the database. This approach is called whitelisting, +and is more secure for adding new entities in the database since confidential nodes and +relationships are not leaked into the database before securing them. + +We have now gone through the `READ` permissions for the first class graph citizens. +Let's move on to a different role in the company. + +### Granting update permissions + +Charlie is a tester and customer care specialist. He is in charge of reporting bugs and fixing +issues in the database. A common problem that he is facing is updating the classes of the data +points if they are labeled incorrectly. For example, the class of one `DataPoint` might be +'dog', while in fact it is an 'elephant', but it was wrongly selected in the rush of labeling +many data points. Charlie needs to update the wrongly labeled data points, and he already has +the IDs of all the nodes he must update. + +Alice has already set up his account with the following commands: + +```cypher +CREATE ROLE tester; +CREATE USER Charlie IDENTIFIED BY 'test'; +SET ROLE FOR Charlie TO tester; + +GRANT MATCH, SET TO tester; + +GRANT READ ON LABELS :DataPoint TO tester; +GRANT READ ON EDGE_TYPES :NEXT TO tester; +``` + +He now has read privileges just like all the data analysts, but when he gets an authorization +error while executing: + +```cypher +MATCH (n:DataPoint {id:505}) SET n.labelY = 'elephant'; +``` + +The error occurs because Charlie does not have permission to update the existing nodes in the +graph. 
The database administrator needs to update Charlie's permissions and grant him access
+to update the node properties with:
+
+```cypher
+GRANT UPDATE ON LABELS :DataPoint TO tester;
+```
+
+Charlie is now able to update the labeled categories of any data point in the graph! The same
+permission applies if he needs to update a relationship property in the graph.
+
+### Granting full access permissions
+
+David is the data engineer for the company. He is very skilled in database systems, and he has
+been assigned the task of deleting every data point in the system that's older than one year.
+Alice has his account set up with the following commands:
+
+```cypher
+CREATE ROLE dataEngineer;
+CREATE USER David IDENTIFIED BY 'test';
+SET ROLE FOR David TO dataEngineer;
+
+GRANT MATCH, DELETE TO dataEngineer;
+
+GRANT UPDATE ON LABELS :DataPoint TO dataEngineer;
+GRANT UPDATE ON EDGE_TYPES :NEXT TO dataEngineer;
+```
+
+However, the `UPDATE` permission only grants manipulation of properties, not of the nodes
+and relationships themselves. Therefore, the query:
+
+```cypher
+MATCH (n:DataPoint) WHERE localDateTime() - n.date > Duration({day:365}) DETACH DELETE n;
+```
+
+results in an error. The permission that grants read, update, create, and delete rights over
+the nodes and relationships in the graph is `CREATE_DELETE`. Alice therefore executes the following commands:
+
+```cypher
+GRANT CREATE_DELETE ON LABELS :DataPoint TO dataEngineer;
+GRANT CREATE_DELETE ON EDGE_TYPES :NEXT TO dataEngineer;
+```
+
+The permission is granted on the relationship type as well, since David needs to detach the nodes
+prior to deleting them. David is now able to successfully delete the deprecated nodes.
+
+### Denying visibility
+
+Eve is the new senior engineer in town, and she is making excellent progress in the company.
+The management therefore decided to grant her visibility and manipulation over all the nodes.
+However, there are certain confidential nodes that are only for the management people to see. + +Since there could be a lot of different node labels or relationship types in the database, +a shortcut can be made by granting `NOTHING` to the entity. The database administrator therefore +sets Eve's role as: + +```cypher +CREATE ROLE seniorEngineer; +CREATE USER Eve IDENTIFIED BY 'test'; +SET ROLE FOR Eve TO seniorEngineer; + +GRANT MATCH, DELETE TO seniorEngineer; + +GRANT CREATE_DELETE ON LABELS * TO seniorEngineer; +GRANT NOTHING ON LABELS :SecretLabel TO seniorEngineer; +``` + +When granting `NOTHING`, the user is denied both visibility and manipulation of the entity. +Eve is now able to see all the domain data while the management is happy since they have not +leaked any confidential data. + +### Templates for granting privileges + +To grant all privileges to a superuser (admin): + +``` +GRANT ALL PRIVILEGES TO admin; +GRANT CREATE_DELETE ON LABELS * TO admin; +GRANT CREATE_DELETE ON EDGE_TYPES * TO admin; +``` + +To grant all read and write privileges: + +``` +DENY ALL PRIVILEGES TO readWrite; +GRANT CREATE, DELETE, MERGE, SET, REMOVE, INDEX, MATCH, STATS TO readWrite; +GRANT CREATE_DELETE ON LABELS * TO readWrite; +GRANT CREATE_DELETE ON EDGE_TYPES * TO readWrite; +``` + +To grant read only privileges: + +``` +DENY ALL PRIVILEGES TO readonly; +GRANT MATCH, STATS TO readonly; +GRANT READ ON LABELS * TO readonly; +GRANT READ ON EDGE_TYPES * TO readonly; +``` \ No newline at end of file diff --git a/docs2/deployment/server-stats.md b/docs2/deployment/server-stats.md new file mode 100644 index 00000000000..7f13f34c23e --- /dev/null +++ b/docs2/deployment/server-stats.md @@ -0,0 +1,54 @@ +--- +id: server-stats +title: Server stats +sidebar_label: Server stats +--- + +Memgraph supports multiple queries to get information about the instance that is +being queried. 
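One of the storage fields reported below, `average_degree`, is a derived value. Assuming each relationship counts toward the degree of both of its endpoints, it can be recomputed client-side from `vertex_count` and `edge_count` (an illustrative sketch under that assumption, not Memgraph's code):

```python
def average_degree(vertex_count, edge_count):
    """Average number of relationships per node.

    Each relationship contributes to the degree of both of its
    endpoints, hence the factor of two; an empty graph reports 0.0.
    """
    if vertex_count == 0:
        return 0.0
    return 2 * edge_count / vertex_count
```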
+ +## Instance version + +To get the version of the instance being queried, run the following query: + +```cypher +SHOW VERSION; +``` + +## Storage information + +Running the following query will return certain information about the storage of +the current instance: + +```cypher +SHOW STORAGE INFO; +``` + +The result will contain the following fields: + +| Field | Description | +| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | +| vertex_count | Number of vertices stored | +| edge_count | Number of edges stored | +| average_degree | Average number of relationships of a single node | +| memory_usage | Amount of RAM used reported by the OS (in bytes) | +| disk_usage | Amount of disk space used by the data directory (in bytes) | +| memory_allocated | Amount of bytes allocated by the instance.
For more info, check out the [memory control](/reference-guide/memory-control.md). | +| allocation_limit | Current allocation limit in bytes set for this instance.
For more info, check out the [memory control](/reference-guide/memory-control.md). | +| global_isolation_level | Current `global` isolation level.
For more info, check out the [isolation levels](/reference-guide/transactions.md). | +| session_isolation_level | Current `session` isolation level. | +| next_session_isolation_level | Current `next` isolation level. | +| storage_mode | Current storage mode.
For more info, check out the [storage modes](/reference-guide/storage-modes.md). | + +## Build information + +Running the following query will return certain information about the build type of +the current instance: + +```cypher +SHOW BUILD INFO; +``` + +| Field | Description | +| ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | +| build_type | The optimization level the instance was built with. | \ No newline at end of file diff --git a/docs2/deployment/ssl-encryption.md b/docs2/deployment/ssl-encryption.md new file mode 100644 index 00000000000..63d7e54ac01 --- /dev/null +++ b/docs2/deployment/ssl-encryption.md @@ -0,0 +1,256 @@ +--- +id: ssl-encryption +title: SSL encryption +--- + +import Tabs from "@theme/Tabs"; +import TabItem from "@theme/TabItem"; + +Memgraph uses SSL (Secure Sockets Layer) protocol for establishing an +authenticated and encrypted connection to a database instance. + +[![Related - +How-to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/encryption.md) + +Achieving a secure connection is a three-step process that requires + +1. Owning a SSL certificate +2. Configuring the server +3. Enabling SSL connection + +For any errors that might come up, check out [the Help center page on +errors](/errors/memgraph/ssl). + +## SSL certificate + +SSL certificate is a pair of `.pem` documents issued by self-signing, or by a +Certification Authority. Memgraph contains a self-signed testing certificate +(`cert.pem` and `key.pem`) located at `/etc/memgraph/ssl/`. + +If you are using Docker and want to use your own certificates, you need to [copy +them into a Docker +container](/how-to-guides/work-with-docker.md#how-to-copy-files-from-and-to-a-docker-container) +in order to utilize them. 
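A minimal sketch of such a copy, assuming a running container named `memgraph` and certificate files in the current directory (both names are placeholders):

```shell
# Copy a certificate/key pair from the host into the container's
# SSL directory (adjust the container name and file names).
docker cp ./cert.pem memgraph:/etc/memgraph/ssl/cert.pem
docker cp ./key.pem memgraph:/etc/memgraph/ssl/key.pem
```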
+ +## Configure the server + +To use a certain SSL certificate, change the configuration file to include the +`--bolt-cert-file` and `--bolt-key-file` flags and set them to the location of +the certification files. + +If you are using the Memgraph self-signed certificate, set the configuration +flags to: + +``` +--bolt-cert-file=/etc/memgraph/ssl/cert.pem +--bolt-key-file=/etc/memgraph/ssl/key.pem +``` + +When using Linux, be sure that the user `memgraph` has permissions (400) to +access the files. + +Once the flags are included in the configuration, you cannot establish an +insecure connection. + +## Enable SSL connection + + + + +To enable SSL connection in Memgraph Lab, switch to **Connect Manually** view +and turn the SSL on. + + + +When Memgraph Lab is connected to MemgraphDB using SSL encryption, logs cannot +be viewed inside the Lab. + + + + +When starting mgconsol include the `--use-ssl=true` flag. Flag can also be +explicitly set to `false` if needed. + +When working with Memgraph Platform, you should pass configuration flags inside +of environment variables as a part of the `docker run` command, for example: + +``` +docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -v mg_lib:/var/lib/memgraph -v mg_etc:/etc/memgraph -e MGCONSOLE="--use-ssl=true" memgraph/memgraph-platform +``` + +In all other cases passed them on as regular configuration flags. + +For example, if you are starting mgconsole on Linux: + +``` +mgconsole --host 127.0.0.1 --port 7687 --use-ssl=true +``` + +or if you are using `memgraph` or `memgraph-mage` Docker images: + +``` +docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -v mg_lib:/var/lib/memgraph -v mg_etc:/etc/memgraph memgraph/memgraph-mage --use-ssl=true +``` + + + + +**Javascript** + +Use [Neo4j driver for JavaScript](https://neo4j.com/developer/javascript/), and +add `+ssc` to the UNI when defining a `MEMGRAPH_URI` constant:
+MEMGRAPH_URI = 'bolt+ssc://18.196.53.118:7687'. +

+

+
+**Python**
+
+Use [pymgclient](https://github.com/memgraph/pymgclient), and add
+`sslmode=mgclient.MG_SSLMODE_REQUIRE` to the `mgclient.connect` call.
+
+**C/C++**
+
+Use [mgclient](https://github.com/memgraph/mgclient), and set
+`params.use_ssl` to `true` or `false`.
+
+**Go**
+
+Use the [Neo4j driver for Go](https://neo4j.com/developer/go/), and add `+ssc`
+to the URI: `"bolt+ssc://18.196.53.118:7687"`.
+
+**PHP**
+
+Use the [Bolt protocol library by
+stefanak-michal](https://github.com/neo4j-php/Bolt) and add the following code:
+
+```php
+$conn->setSslContextOptions([
+    'passphrase' => 'bolt',
+    'allow_self_signed' => true,
+    'verify_peer' => false,
+    'verify_peer_name' => false
+]);
+```
+
+**C#**
+
+Use the [Neo4j.Driver.Simple](https://neo4j.com/developer/dotnet/), and add
+`+ssc` to the URI: `"bolt+ssc://18.196.53.118:7687"`.
+
+**Java**
+
+Use the [Neo4j driver for Java](https://neo4j.com/developer/java/) and add
+`+ssc` to the URI: `"bolt+ssc://18.196.53.118:7687"`.
+
+**Rust**
+
+Use [mgclient](https://github.com/memgraph/mgclient), and add `sslmode:
+SSLMode::Require` to the `ConnectParams`.
+
+ + +WebSocket over SSL is currently not supported in Memgraph. + + +
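For clients without a `+ssc` URI scheme, the equivalent is configuring the TLS layer to encrypt the connection but skip peer verification. As an illustration only (this is not how Memgraph or its drivers are implemented internally), Python's standard `ssl` module expresses that policy like this:

```python
import ssl


def self_signed_context() -> ssl.SSLContext:
    # Encrypt the connection, but do not verify the server certificate;
    # this is what connecting to Memgraph's bundled self-signed
    # cert.pem/key.pem pair requires.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


# The context could then wrap the raw Bolt socket, e.g.:
# secure_sock = self_signed_context().wrap_socket(raw_sock)
```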
+ +## How to set up SSL encryption + +Memgraph uses SSL (Secure Sockets Layer) protocol for establishing an +authenticated and encrypted connection to a database instance. + +[![Related - Reference +Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/encryption.md) + +## Docker + +1. Start a Memgraph instance with `docker run` command including the `-v + mg_lib:/var/lib/memgraph` and `-v mg_etc:/etc/memgraph` volumes. + +2. [Copy the SSL certificate inside of the Docker + container](/how-to-guides/work-with-docker.md#how-to-copy-files-from-and-to-a-docker-container) + or use Memgraph self-signed certificates (`cert.pem` and `key.pem`) located + at `/etc/memgraph/ssl/`. + +3. [Change the configuration file](/how-to-guides/config-logs.md#file) to + include the following configuration flags: + + ``` + --bolt-cert-file= + --bolt-key-file= + ``` + +4. Set the flags to the paths of your SSL certificate. + + If you are using the Memgraph self-signed certificate, set the configuration + flags to: + + ``` + --bolt-cert-file=/etc/memgraph/ssl/cert.pem + --bolt-key-file=/etc/memgraph/ssl/key.pem + ``` + +5. [Stop the Docker container](/how-to-guides/work-with-docker.md#stop-image), + then start it again, including the volumes you used in step 1. + + If you are running `memgraph-platform` image, pass the configuration flag + MGCONSOLE="--use-ssl=true": + + ``` + docker run -it -p 7687:7687 -p 3000:3000 -p 7444:7444 -v mg_lib:/var/lib/memgraph -v mg_etc:/etc/memgraph -e MGCONSOLE="--use-ssl=true" memgraph/memgraph-platform + ``` + +6. Open Memgraph Lab and switch to **Connect Manually** view, turn the **SSL + On** and connect. + + + +7. 
If you are using [pymgclient](https://github.com/memgraph/pymgclient) to + query the database with Python, add `sslmode=mgclient.MG_SSLMODE_REQUIRE` to + the `mgclient.connect` + +For other ways of connecting to Memgraph DB using SSL encryption, check the +[reference guide](/reference-guide/encryption.md). + +## Linux + +1. Run Memgraph. + +2. Open the configuration file available at `/etc/memgraph/memgraph.conf`. + +3. Change the configuration file to include the following configuration flags: + + ``` + --bolt-cert-file= + --bolt-key-file= + ``` + +4. Set the flags to the paths of your SSL certificate, or use Memgraph + self-signed certificates (`cert.pem` and `key.pem`) located at + `/etc/memgraph/ssl/`: + + ``` + --bolt-cert-file=/etc/memgraph/ssl/cert.pem + --bolt-key-file=/etc/memgraph/ssl/key.pem + ``` + +5. Restart Memgraph. + +6. Open Memgraph Lab and switch to **Connect Manually** view, turn the **SSL + On** and connect. + +7. If you are using [pymgclient](https://github.com/memgraph/pymgclient) to + query the database with Python, add `sslmode=mgclient.MG_SSLMODE_REQUIRE` to + the `mgclient.connect` + +For other ways of connecting to Memgraph DB using SSL encryption, check the +[reference guide](/reference-guide/encryption.md). diff --git a/docs2/deployment/user-management.md b/docs2/deployment/user-management.md new file mode 100644 index 00000000000..2bd98f73cba --- /dev/null +++ b/docs2/deployment/user-management.md @@ -0,0 +1,140 @@ +--- +id: user-management +title: User management +sidebar_label: User management +--- + +import Tabs from "@theme/Tabs"; +import TabItem from "@theme/TabItem"; + +The community edition of Memgraph enables creating users that can access the +database with or without a password. 
+
+If you want to create a user without setting a password, execute the following command:
+
+```cypher
+CREATE USER `user_name`;
+```
+
+In this case, the user can log in using any password, or none at all, provided that they enter the correct username.
+
+If you want to create a user and set a password simultaneously, use the following command:
+
+```cypher
+CREATE USER `user_name` IDENTIFIED BY 'password';
+```
+
+In this case, the user must log in with the correct username and the set password.
+
+To set or change a user's password, use the following command:
+
+```cypher
+SET PASSWORD FOR `user_name` TO 'new_password';
+```
+
+To check all the users created on an instance, use:
+
+```cypher
+SHOW USERS;
+```
+
+To remove a user's password, set it to `null`:
+
+```cypher
+SET PASSWORD FOR `user_name` TO null;
+```
+
+To delete a user, use:
+
+```cypher
+DROP USER `user_name`;
+```
+
+## Authentication
+
+
+
+**`memgraph-platform` image**
+
+If you are using Docker and the `memgraph-platform` image, you should pass the
+username and password flags through the `MGCONSOLE` environment variable when
+starting Memgraph:
+
+```terminal
+docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -e MGCONSOLE="--username <username> --password <password>" memgraph/memgraph-platform
+```
+
+Example:
+
+```terminal
+docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -e MGCONSOLE="--username vlasta --password vp" memgraph/memgraph-platform
+```
+
+Upon connecting with Memgraph Lab, select *Connect Manually* and enter the
+username (and password).
+
+**`memgraph` and `memgraph-mage` images**
+
+If you are using Docker and the `memgraph` or `memgraph-mage` image, enter the
+username and password when connecting manually to Memgraph Lab.
+
+If you are connecting with mgconsole, add the username and password
+flags to the `docker run` command:
+
+```terminal
+docker run -it --entrypoint=mgconsole memgraph/memgraph --host CONTAINER_IP --username=<username> --password=<password>
+```
+
+Example:
+
+```terminal
+docker run -it --entrypoint=mgconsole memgraph/memgraph --host 172.17.0.2 --username=vlasta --password=vp
+```
+
+
+
+
+If you are using Linux and connecting with Memgraph Lab, select *Connect
+Manually* and enter the username (and password).
+
+When connecting with mgconsole, set
+the `--username` and `--password` flags:
+
+```terminal
+./mgconsole --host 127.0.0.1 --port 7687 --username <username> --password <password>
+```
+
+Example:
+
+```terminal
+./mgconsole --host 127.0.0.1 --port 7687 --username vlasta --password vp
+```
+
+
+
+
+## Password encryption algorithm
+
+Memgraph offers multiple password encryption algorithms:
+* BCrypt
+* SHA256
+* SHA256 with multiple iterations (currently set to 1024 iterations)
+
+The above algorithms can be specified at runtime using the flag `--password-encryption-algorithm` with the
+appropriate values of `bcrypt`, `sha256` or `sha256-multiple`.
+
+### BCrypt
+This algorithm is the default for password encryption. It's the most secure of the three and offers the best
+protection against brute-force attacks. However, if you're authenticating many concurrent users with
+passwords at the same time, it may not be the best choice for you as you might experience slower performance. The performance
+is slower only during authentication of the users, and should not degrade once the connection has been established.
+
+### SHA256 and SHA256 with multiple iterations
+SHA256 is a general-purpose hashing algorithm that's usually not used for password storage on its own, but it is fast and secure enough to
+offer optimal performance when opening a lot of concurrent connections to Memgraph.
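The trade-off between the algorithms can be illustrated with the iterated-hashing idea behind `sha256-multiple`. This is only a sketch of the concept; Memgraph's actual salting and on-disk hash format are internal details:

```python
import hashlib


def sha256_multiple(password: str, iterations: int = 1024) -> str:
    # Hash the password repeatedly; more iterations make brute-forcing
    # costlier, at the price of slower authentication.
    digest = password.encode("utf-8")
    for _ in range(iterations):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()
```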
\ No newline at end of file
diff --git a/docs2/fundamentals/constraints.md b/docs2/fundamentals/constraints.md
new file mode 100644
index 00000000000..3f23a29f42a
--- /dev/null
+++ b/docs2/fundamentals/constraints.md
@@ -0,0 +1,135 @@
+---
+id: constraints
+title: Enforce constraints
+sidebar_label: Constraints
+---
+
+## Existence constraint
+
+The existence constraint enforces that each node with a specific `label`
+must also have the specified `property`. Only one label and property can be
+supplied at a time. This constraint can be enforced using the following
+language construct:
+
+```cypher
+CREATE CONSTRAINT ON (n:label) ASSERT EXISTS (n.property);
+```
+
+For example, suppose you are keeping track of basic employee info in your
+database. Obviously, each employee should have a first name and last name. You
+can enforce this by issuing the following queries:
+
+```cypher
+CREATE CONSTRAINT ON (n:Employee) ASSERT EXISTS (n.first_name);
+CREATE CONSTRAINT ON (n:Employee) ASSERT EXISTS (n.last_name);
+```
+
+You can confirm that your constraint was successfully created by issuing the
+following query:
+
+```cypher
+SHOW CONSTRAINT INFO;
+```
+
+You should get a result similar to this:
+
+```
++-----------------+-----------------+-----------------+
+| constraint type | label           | properties      |
++-----------------+-----------------+-----------------+
+| exists          | Employee        | first_name      |
+| exists          | Employee        | last_name       |
++-----------------+-----------------+-----------------+
+```
+
+Trying to modify the database in a way that violates the constraint will
+yield an error.
+
+Constraints can also be dropped using the `DROP` keyword. For example,
+dropping the previously created constraints can be done by the following
+query:
+
+```cypher
+DROP CONSTRAINT ON (n:Employee) ASSERT EXISTS (n.first_name);
+DROP CONSTRAINT ON (n:Employee) ASSERT EXISTS (n.last_name);
+```
+
+Now, `SHOW CONSTRAINT INFO;` yields an empty set.
+
+## Uniqueness constraint
+
+The uniqueness constraint enforces that each `label, property_set` pair is unique.
+Adding a uniqueness constraint does not create a label-property index; if you
+need the index, create it manually.
+
+The uniqueness constraint can be enforced using the following language
+construct:
+
+```cypher
+CREATE CONSTRAINT ON (n:label) ASSERT n.property1, n.property2, ... IS UNIQUE;
+```
+
+For example, suppose you are keeping track of basic employee info in your
+database. Obviously, each employee should have a unique e-mail address. You can
+enforce this by issuing the following query:
+
+```cypher
+CREATE CONSTRAINT ON (n:Employee) ASSERT n.email IS UNIQUE;
+```
+
+You can confirm that your constraint was successfully created by issuing the
+following query:
+
+```cypher
+SHOW CONSTRAINT INFO;
+```
+
+You should get a result similar to this:
+
+```
++-----------------+-----------------+-----------------+
+| constraint type | label           | properties      |
++-----------------+-----------------+-----------------+
+| unique          | Employee        | email           |
++-----------------+-----------------+-----------------+
+```
+
+Trying to modify the database in a way that violates the constraint will yield
+an error `Unable to commit due to unique constraint violation on
+:Employee(email)`.
+
+Naturally, you can also specify multiple properties when creating uniqueness
+constraints. For example, we might want to enforce that all employees have a
+unique `(name, surname)` pair (obviously, this would be a bad decision in real
+life). 
This can be achieved by the following query: + +```cypher +CREATE CONSTRAINT ON (n:Employee) ASSERT n.name, n.surname IS UNIQUE; +``` + +At this point, `SHOW CONSTRAINT INFO;` yields the following result: + +``` ++-----------------+-----------------+-----------------+ +| constraint type | label | properties | ++-----------------+-----------------+-----------------+ +| unique | Employee | email | +| unique | Employee | name, surname | ++-----------------+-----------------+-----------------+ +``` + +Constraints can also be dropped using the `DROP` keyword. For example, +dropping the previously created constraints can be done by the following +query: + +```cypher +DROP CONSTRAINT ON (n:Employee) ASSERT n.email IS UNIQUE; +DROP CONSTRAINT ON (n:Employee) ASSERT n.name, n.surname IS UNIQUE; +``` + +Now, `SHOW CONSTRAINT INFO;` yields an empty set. + +## Where to next? + +To learn more about Memgraph's functionalities, visit the **[Reference guide](/reference-guide/overview.md)**. +For real-world examples of how to use Memgraph, we strongly suggest going through one of the available **[Tutorials](/tutorials/overview.md)**. \ No newline at end of file diff --git a/docs2/fundamentals/data-types.md b/docs2/fundamentals/data-types.md new file mode 100644 index 00000000000..f130f7020e9 --- /dev/null +++ b/docs2/fundamentals/data-types.md @@ -0,0 +1,623 @@ +--- +id: data-types +title: Data types +sidebar_label: Data types +--- + +import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; + +Since Memgraph is a graph database management system, data is stored in the form +of graph elements: nodes and relationships. Each graph element can contain +various types of data. This page describes which data types are supported in +Memgraph. + +## Node labels & relationship types + +**Nodes** can have labels that are used to label or group nodes. A label is of +the type `String`, and each node can have none or multiple labels. Labels can be +changed at any time. 
+ +**Relationships** have a type, also represented in the form of a `String`. +Unlike nodes, relationships must have exactly one relationship type and once it +is set upon creation, it can never be modified again. + +## Property types + +Nodes and relationships can store various properties. Properties are similar to +mappings or tables containing property names and their accompanying values. +Property names are represented as text, while values can be of different types. + +Each property can store a single value, and it is not possible to have multiple +properties with the same name on a single graph element. But, the same property +names can be found across multiple graph elements. + +Also, there are no restrictions on the number of properties that can be stored +in a single graph element. The only restriction is that the values must be of +the supported types. Below is a table of supported data types. + +| Type | Description | +| --------------------------------- | --------------------------------------------------------------------------------------------------- | +| `Null` | Property has no value, which is the same as if the property doesn't exist. | +| `String` | A character string (text). | +| `Boolean` | A boolean value, either `true` or `false`. | +| `Integer` | An integer number. | +| `Float` | A floating-point number (real number). | +| `List` | A list containing any number of property values of any supported type under a single property name. | +| `Map` | A mapping of string keys to values of any supported type. | +| [`Duration`](#duration) | A period of time. | +| [`Date`](#date) | A date with year, month, and day. | +| [`LocalTime`](#localtime) | Time without the time zone. | +| [`LocalDateTime`](#localdatetime) | Date and time without the time zone. | + +:::note + +If you want to modify `List` and `Map` property values, you need to replace them +entirely. 
+ +The following queries are valid: + +```cypher +CREATE (:Node {property: [1, 2, 3]}); +CREATE (:Node {property: {key: "value"}}); +``` + +But these are not: + +```cypher +MATCH (n:Node) SET n.property[0] = 0; +MATCH (n:Node) SET n.property.key = "other value"; +``` + +::: + +## Maps + +The Cypher query language supports constructing and working with map values. + +### Literal maps + +It is possible to explicitly construct maps by stating key-value pairs: + + + + + ```cypher + RETURN {key: 'Value', listKey: [{inner: 'Map1'}, {inner: 'Map2'}]} + ``` + + + + + +```plaintext +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ {key: 'Value', listKey: [{inner: 'Map1'}, {inner: 'Map2'}]} β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ +β”‚ {Map} 2 properties β”‚ +β”‚ { β”‚ +β”‚ "key": "Value", β”‚ +β”‚ "listKey": [ β”‚ +β”‚ { β”‚ +β”‚ "inner": "Map1" β”‚ +β”‚ }, β”‚ +β”‚ { β”‚ +β”‚ "inner": "Map2" β”‚ +β”‚ } β”‚ +β”‚ ] β”‚ +β”‚ } β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + + + + + +### Map projection + +Cypher’s **map projection** syntax allows for easily constructing map +projections from nodes, relationships, other map values, and all other values +that have properties. + +A map projection begins with the variable bound to the graph entity that’s to +be projected from, and contains a body of comma-separated map elements enclosed +by `{` and `}`. + +```cypher +map_variable {map_element, [, ...n]} +``` + +A map element projects one or more key-value pairs to the map projection. 
There
+are four different types of map projection elements:
+
+* Property selector: Projects the property name as the key, and the value of
+  `map_variable.property` as the value for the projection.
+* All-properties selector: Projects all key-value pairs from the `map_variable`
+  value.
+* Literal entry: This is a key-value pair, with the value being an arbitrary
+  expression: `key: <expression>`.
+* Variable selector: Projects a variable: the variable name is the key, and the
+  value it is pointing to is the value of the projection: `<variable>`.
+
+The following conditions apply:
+
+* If `map_variable` points to a null value, its projected values will be null.
+* As with literal maps, key names must be strings.
+
+#### Examples
+
+The following graph is used by all examples here:
+
+
+
+
+
+
+
+  ```cypher
+  MATCH (n) DETACH DELETE n;
+  CREATE
+    (bradley:Person {name: 'Bradley Cooper', oscars: 0}),
+    (jennifer:Person {name: 'Jennifer Lawrence', oscars: 1}),
+    (slp:Movie {title: 'Silver Linings Playbook', released: 2012}),
+    (amhu:Movie {title: 'American Hustle', released: 2013}),
+    (joy:Movie {title: 'Joy', released: 2015}),
+    (asib:Movie {title: 'A Star Is Born', released: 2018}),
+    (dlu:Movie {title: 'Don’t Look Up', released: 2021}),
+    (bradley)-[:ACTED_IN]->(slp),
+    (bradley)-[:ACTED_IN]->(amhu),
+    (bradley)-[:ACTED_IN]->(joy),
+    (bradley)-[:ACTED_IN]->(asib),
+    (jennifer)-[:ACTED_IN]->(slp),
+    (jennifer)-[:ACTED_IN]->(amhu),
+    (jennifer)-[:ACTED_IN]->(joy),
+    (jennifer)-[:ACTED_IN]->(dlu);
+  ```
+
+
+
+
+Find Jennifer Lawrence and return data about her and the movies she’s acted in.
+This example contains a map projection with a literal entry, which in turn also
+uses map projection inside `collect()`.
+
+
+
+  ```cypher
+  MATCH (actor:Person {name: 'Jennifer Lawrence'})-[:ACTED_IN]->(movie:Movie)
+  WITH actor, collect(movie {.title, .released}) AS movies
+  RETURN actor {.name, roles: movies} AS jennifer
+  ```
+
+
+
+
+```plaintext
+β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
+β”‚ jennifer                                                    β”‚
+β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
+β”‚ {Map} 2 properties                                          β”‚
+β”‚ {                                                           β”‚
+β”‚   "name": "Jennifer Lawrence",                              β”‚
+β”‚   "roles": [                                                β”‚
+β”‚     {                                                       β”‚
+β”‚       "released": 2012,                                     β”‚
+β”‚       "title": "Silver Linings Playbook"                    β”‚
+β”‚     },                                                      β”‚
+β”‚     {                                                       β”‚
+β”‚       "released": 2013,                                     β”‚
+β”‚       "title": "American Hustle"                            β”‚
+β”‚     },                                                      β”‚
+β”‚     {                                                       β”‚
+β”‚       "released": 2015,                                     β”‚
+β”‚       "title": "Joy"                                        β”‚
+β”‚     },                                                      β”‚
+β”‚     {                                                       β”‚
+β”‚       "released": 2021,                                     β”‚
+β”‚       "title": "Don’t Look Up"                              β”‚
+β”‚     }                                                       β”‚
+β”‚   ]                                                         β”‚
+β”‚ }                                                           β”‚
+β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
+```
+
+
+
+
+The below query finds all `Person` nodes that have one or more relationships
+of type `ACTED_IN` connected to `Movie` nodes and returns the number of movies
+each `Person` has starred in. This example introduces the variable selector and
+uses it to project the movie count.
+ + + + + ```cypher + MATCH (actor:Person)-[:ACTED_IN]->(movie:Movie) + WITH actor, count(movie) AS nMovies + RETURN actor {.name, nMovies} + ``` + + + + + +```plaintext +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ actor {.name, nMovies} β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ +β”‚ {Map} 2 properties β”‚ +β”‚ { β”‚ +β”‚ "name": "Jennifer Lawrence", β”‚ +β”‚ "nMovies": 4 β”‚ +β”‚ } β”‚ +β”œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ•Œβ”€ +β”‚ {Map} 2 properties β”‚ +β”‚ { β”‚ +β”‚ "name": "Bradley Cooper", β”‚ +β”‚ "nMovies": 4 β”‚ +β”‚ } β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + + + + + +Finally, the next query returns all properties from the Bradley Cooper node. It +uses an all-properties selector to project node properties, and in addition +explicitly projects the `dateOfBirth` property. Since this property does not +exist, a null value is projected in its place. 
+ + + + + ```cypher + MATCH (actor:Person {name: 'Bradley Cooper'}) + RETURN actor {.*, .dateOfBirth} as bradley + ``` + + + + + +```plaintext +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ bradley β”‚ +β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ +β”‚ {Map} 3 properties β”‚ +β”‚ { β”‚ +β”‚ "dateOfBirth": null, β”‚ +β”‚ "name": "Bradley Cooper", β”‚ +β”‚ "oscars": 0 β”‚ +β”‚ } β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ +``` + + + + + +## Temporal types + +### Duration + +You can create a property of temporal type `Duration` from a string or a map by +calling the function `duration`. + +For strings, the duration format is: `P[nD]T[nH][nM][nS]` where `n` stands for +a number, and the capital letters are used as a separator with each field in `[]` +marked optional. For strings, Memgraph only allows the last field to be a +double, e.g., `P2DT2.5H`. However, for maps, every field can be a double, an int +or a mixture of both. Memgraph also supports negative durations. + +| name | description | +| :--: | :---------: | +| D | Days | +| H | Hours | +| M | Minutes | +| S | Seconds | + +Example: + +```cypher +CREATE (:F1Laps {lap: duration("PT2M2.33S")}); +``` + +Maps can contain the following six fields: `day`, `hour`, `minute`, `second`, +`millisecond` and `microsecond`. + +Example: + +```cypher +CREATE (:F1Laps {lap: duration({minute:2, second:2, microsecond:33})}); +``` + +At this point, it must be pointed out that durations internally hold +microseconds. 
Each of the fields specified above is first converted to
+microseconds and then reduced by addition to a single value. This has an
+interesting use case:
+
+```cypher
+CREATE (:F1Laps {lap: duration({minute:2, second:-2, microsecond:-33})});
+```
+
+This converts `minutes`, `seconds` to `microseconds` and effectively produces
+the following equation: `minutes - seconds - microseconds`.
+
+Each of the individual fields of a duration can be accessed through its
+properties as follows:
+
+| name        | description                                                        |
+| :---------: | :----------------------------------------------------------------: |
+| day         | Converts all the microseconds back to days and returns the value.  |
+| hour        | Subtracts days and returns the leftover value as hours.            |
+| minute      | Subtracts the days and returns the leftover value as minutes.      |
+| second      | Subtracts the days and returns the leftover value as seconds.      |
+| millisecond | Subtracts the days and returns the leftover value as milliseconds. |
+| microsecond | Subtracts the days and returns the leftover value as microseconds. |
+| nanosecond  | Subtracts the days and returns the leftover value as nanoseconds.  |
+
+Example:
+
+```cypher
+CREATE (:F1Laps {lap: duration({day:1, hour: 2, minute:3, second:4})});
+```
+
+```cypher
+MATCH (f:F1Laps) RETURN f.lap.day;
+// Result
+>> 1
+```
+
+```cypher
+MATCH (f:F1Laps) RETURN f.lap.hour;
+// Result
+>> 2
+```
+
+```cypher
+MATCH (f:F1Laps) RETURN f.lap.minute;
+// Result
+>> 123 // The value without days is 2 hours and 3 minutes, that is 123 minutes
+```
+
+```cypher
+MATCH (f:F1Laps) RETURN f.lap.second;
+// Result
+>> 7384 // The value without days is 2 hours, 3 minutes and 4 seconds, that is 7384 seconds
+```
+
+### Date
+
+You can create a property of temporal type `Date` from a string or map by
+calling the function `date`. For strings, the date format is specified by the
+ISO 8601: `YYYY-MM-DD` or `YYYYMMDD` or `YYYY-MM`.
+ +| name | description | +| :--: | :---------: | +| Y | Year | +| M | Month | +| D | Day | + +The smallest year is `0` and the highest is `9999`. + +You can call `date` without arguments. This effectively sets the date field to +the current date of the calendar (UTC clock). + +Example: + +```cypher +CREATE (:Person {birthday: date("1947-07-30")}); +``` + +For maps, three fields are available: `year`, `month`, `day`. + +Example: + +```cypher +CREATE (:Person {birthday: date({year:1947, month:7, day:30})}); +``` + +You can access the individual fields of a date through its properties: + +| name | description | +| :---: | :---------------------: | +| year | Returns the year field | +| month | Returns the month field | +| day | Returns the day field | + +Example: + +```cypher +MATCH (b:Person) RETURN b.birthday.year; +``` + +### LocalTime + +You can create a property of temporal type `LocalTime` from a string or map by +calling the function `localTime`. For strings, the local time format is +specified by the ISO 8601: `[T]hh:mm:ss` or `[T]hh:mm` or `[T]hhmmss` or +`[T]hhmm` or `[T]hh`. + +| name | description | +| :--: | :---------: | +| h | Hours | +| m | Minutes | +| s | Seconds | + +`seconds` can be defined as decimal fractions with up to 6 digits. The first 3 +digits represent milliseconds, and the last 3 digits microseconds. For example, +the string `T22:10:32.300600` specifies `300` milliseconds and `600` +microseconds. + +You can call `localTime` without arguments. This effectively sets the time field +to the current time of the calendar (UTC clock). + +Example: + +```cypher +CREATE (:School {Calculus: localTime("09:15:00")}); +``` + +For maps, there are 5 fields available: `hour`, `minute`, `second`, +`millisecond` and `microsecond`. 
+ +Example: + +```cypher +CREATE (:School {Calculus: localTime({hour:9, minute:15})}); +``` + +You can access the individual fields of a LocalTime through its properties: + +| name | description | +| :---------: | :---------------------------: | +| hour | Returns the hour field | +| minute | Returns the minute field | +| second | Returns the second field | +| millisecond | Returns the millisecond field | +| microsecond | Returns the microsecond field | + +Example: + +```cypher +MATCH (s:School) RETURN s.Calculus.hour; +``` + +### LocalDateTime + +You can create a property of temporal type `LocalDateTime` from a string or map +by calling the function `localDateTime`. For strings, the local time format is +specified by the ISO 8601: `YYYY-MM-DDThh:mm:ss` or `YYYY-MM-DDThh:mm` or +`YYYYMMDDThhmmss` or `YYYYMMDDThhmm` or `YYYYMMDDThh`. + +| name | description | +| :--: | :---------: | +| Y | Year | +| M | Month | +| D | Day | +| h | Hours | +| m | Minutes | +| s | Seconds | + +You can call `localDateTime` without arguments. This effectively sets the date +and time fields to the current date and time of the calendar (UTC clock). +Example: + +```cypher +CREATE (:Flights {AIR123: localDateTime("2021-10-05T14:15:00")}); +``` + +For maps the following fields are available: `year`, `month`, `day`, `hour`, +`minute`, `second`, `millisecond` and `microsecond`. 
+
+Example:
+
+```cypher
+CREATE (:Flights {AIR123: localDateTime({year:2021, month:10, day:5, hour:14, minute:15})});
+```
+
+You can access the individual fields of LocalDateTime through its properties:
+
+| name        | description                   |
+| :---------: | :---------------------------: |
+| year        | Returns the year field        |
+| month       | Returns the month field       |
+| day         | Returns the day field         |
+| hour        | Returns the hour field        |
+| minute      | Returns the minute field      |
+| second      | Returns the second field      |
+| millisecond | Returns the millisecond field |
+| microsecond | Returns the microsecond field |
+
+Example:
+
+```cypher
+MATCH (f:Flights) RETURN f.AIR123.year;
+```
+
+## Temporal types arithmetic
+
+Temporal types `Duration`, `Date`, `LocalTime` and `LocalDateTime` support
+native arithmetic, and the operations are summarized in the following tables:
+
+Duration operations:
+
+| op                  | result   |
+| :-----------------: | :------: |
+| Duration + Duration | Duration |
+| Duration - Duration | Duration |
+| - Duration          | Duration |
+
+Date operations:
+
+| op              | result   |
+| :-------------: | :------: |
+| Date + Duration | Date     |
+| Duration + Date | Date     |
+| Date - Duration | Date     |
+| Date - Date     | Duration |
+
+LocalTime operations:
+
+| op                    | result    |
+| :-------------------: | :-------: |
+| LocalTime + Duration  | LocalTime |
+| Duration + LocalTime  | LocalTime |
+| LocalTime - Duration  | LocalTime |
+| LocalTime - LocalTime | Duration  |
+
+LocalDateTime operations:
+
+| op                            | result        |
+| :---------------------------: | :-----------: |
+| LocalDateTime + Duration      | LocalDateTime |
+| Duration + LocalDateTime      | LocalDateTime |
+| LocalDateTime - Duration      | LocalDateTime |
+| LocalDateTime - LocalDateTime | Duration      |
+
+## Procedures API
+
+Data types are also used within query modules. 
Check out the documentation for the [Python API](/reference-guide/query-modules/implement-custom-query-modules/api/python-api.md), [C API](/reference-guide/query-modules/implement-custom-query-modules/api/c-api.md) and [C++ API](/reference-guide/query-modules/implement-custom-query-modules/api/cpp-api.md).
diff --git a/docs2/fundamentals/fundamentals.md b/docs2/fundamentals/fundamentals.md
new file mode 100644
index 00000000000..8ad17665e0e
--- /dev/null
+++ b/docs2/fundamentals/fundamentals.md
@@ -0,0 +1,23 @@
+---
+id: fundamentals
+title: Fundamentals
+sidebar_label: Fundamentals
+---
+
+## [Constraints](/fundamentals/constraints.md)
+
+## [Data types](/fundamentals/data-types.md)
+
+## [Indexes](/fundamentals/indexes.md)
+
+## [Memory usage](/fundamentals/memory-usage.md)
+
+## [Telemetry](/fundamentals/telemetry.md)
+
+## [Transactions](/fundamentals/transactions.md)
+
+All Cypher queries are run within transactions. Check how to create an explicit
+transaction to run multiple queries within one transaction, and which isolation
+levels are available in Memgraph.
+
+## [Triggers](/fundamentals/triggers.md)
\ No newline at end of file
diff --git a/docs2/fundamentals/indexing.md b/docs2/fundamentals/indexing.md
new file mode 100644
index 00000000000..9c9ce82e971
--- /dev/null
+++ b/docs2/fundamentals/indexing.md
@@ -0,0 +1,322 @@
+---
+id: indexing
+title: Indexing
+sidebar_label: Indexing
+---
+
+[![Related -
+How-to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/indexes.md)
+[![Related - Under the
+Hood](https://img.shields.io/static/v1?label=Related&message=Under%20the%20hood&color=orange&style=for-the-badge)](/under-the-hood/indexing.md)
+
+## When to create indexes?
+
+When you are running queries, you want to get results as soon as possible. In
+the worst-case scenario, when you execute a query, all nodes need to be checked
+to see if there is a match.
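The difference can be sketched in plain Python: a full scan touches every node, while an index partitions nodes by property value in advance (a toy dictionary model, not Memgraph's actual skip-list-based implementation):

```python
# Toy data: 100,000 nodes, property "prop" cycling through 10 values.
nodes = [{"labels": {"Person"}, "prop": i % 10} for i in range(100_000)]

# Without an index: visit every node (what ScanAllByLabel + Filter does).
full_scan = [n for n in nodes if "Person" in n["labels"] and n["prop"] == 1]

# With a label-property index: nodes are pre-partitioned by property value,
# so a lookup is a single hash access instead of a full pass.
index: dict[int, list] = {}
for n in nodes:
    if "Person" in n["labels"]:
        index.setdefault(n["prop"], []).append(n)
indexed = index.get(1, [])

assert full_scan == indexed  # same result, far less work per query
```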
+ +Here is what the query plan looks like if there is no index on the data: + +```nocopy +memgraph> EXPLAIN MATCH (n:Person {prop: 1}) RETURN n; ++---------------------------------+ +| QUERY PLAN | ++---------------------------------+ +| " * Produce {n}" | +| " * Filter" | +| " * ScanAllByLabel (n :Person)" | +| " * Once" | ++---------------------------------+ +``` + +By enabling indexes, this process can be much faster: + +```cypher +CREATE INDEX ON :Person(prop); +``` + +When a query is executed, the engine first checks if there is an index. An index +stores additional information on certain types of data so that retrieving +indexed data becomes more efficient. Indexes basically store data in a different +kind of way, i.e., they partition it with a key. For example, if you set an +index on a label, the query `MATCH (:Label)` won't have to explicitly check +every node. You just need to check the nodes that were placed on a "shelf". Each +"shelf" has nodes with a specific label. The data is not copied or duplicated to +the "shelf". You actually create a memory map to those nodes and there is no +need to look anywhere else for them. + +Here is what the query plan looks like if indexing is enabled: + +```nocopy +memgraph> EXPLAIN MATCH (n:Person {prop: 1}) RETURN n; ++-----------------------------------------------------+ +| QUERY PLAN | ++-----------------------------------------------------+ +| " * Produce {n}" | +| " * ScanAllByLabelPropertyValue (n :Person {prop})" | +| " * Once" | ++-----------------------------------------------------+ +``` + +## When not to create indexes? + +There are some downsides to indexing, so it is important to carefully choose the +right data for creating an index. The downsides of indexing are: + +- requiring extra storage (memory) for each index and +- slowing down write operations to the database. + +Indexing all of the content will not improve the database speed. 
The structures +in the index are dynamically updated on modifications or insertions of new +nodes. Once a new node is created, it needs to be assigned to an index group. +Such an indexed node will be retrieved much faster from the database. + +Indexing will also not bring any improvement if a large number of properties +have the same value. Take a look at the following example. Let's say you have +some property that can have 10 distinct values. Those values are integers in the +range 1 to 10. If you have 100 nodes stored in the database and 1 of them has a +score of 1 while the other 99 have a score of 10, the property is not a +good distinguisher. If 10 of them have a score of 1, 10 of them have a score of +2, and so on, the property is a good distinguisher because it partitions the +nodes and cuts the search space by an order of magnitude. + +Also, indexing certain data types will not bring any significant performance +gain, e.g., for booleans, even in the best-case scenario, the search time will +only be cut in half. + +## Creating an index + +Indexing can be applied to data with a specific label or a combination of label +and property. Indexes are not created automatically, and the user needs to create +them explicitly. Creation is done using a special `CREATE INDEX ON +:Label(property)` language construct. + +When you create an index, it is added to the registry of indexes. + +Memgraph supports two types of indexes: + +- label index +- label-property index + +### Label index + +Memgraph will not automatically index labeled data. 
If you want to optimize +queries that fetch nodes by label, you need to create the indexes: + +```cypher +CREATE INDEX ON :Person; +``` + +Retrieving nodes using this query is now much more efficient: + +```cypher +MATCH (n:Person) RETURN n; +``` + +### Label-property index + +For example, to index nodes that are labeled as `:Person` and have a property +named `age`: + +```cypher +CREATE INDEX ON :Person(age); +``` + +After the index is created, retrieving those nodes will become more efficient. +For example, the following query will retrieve all nodes which have an `age` +property, instead of fetching each `:Person` node and checking whether the +property exists: + +```cypher +MATCH (n :Person {age: 42}) RETURN n; +``` + +Using index-based retrieval also works when filtering labels and properties with +`WHERE`. For example, the same effect as in the previous example can be achieved +with: + +```cypher +MATCH (n) WHERE n:Person AND n.age = 42 RETURN n; +``` + +Since the filter inside `WHERE` can contain any kind of expression, the +expression can be complicated enough so that the index does not get used. We are +continuously improving the recognition of index usage opportunities from a +`WHERE` expression. If there is any suspicion that an index may not be used, we +recommend putting properties and labels inside the `MATCH` pattern. + +When it comes to label-property indexes, Memgraph stores a list of the specific +properties that are used in label-property indexes. This list is ordered to make +the search faster. All property types can be ordered. First, they are ordered +based on the type and then within that type. + +:::tip + +For the best performance, create indexes on properties containing unique integer values. + +::: + +:::caution + +Creating a label-property index will not create a label index! + +::: + +### Speed comparison + +After the query execution, you can see how much time the query took to finish. 
+Below you can see a comparison of the same query run without an index and with +an index. + +```nocopy +memgraph> SHOW INDEX INFO; +Empty set (0.001 sec) + +memgraph> MATCH (n:Person) WHERE n.name =~ ".*an$" RETURN n.name; ++-------------+ +| n.name | ++-------------+ +| "Lillian" | +| "Logan" | +| "Susan" | +| "Sebastian" | ++-------------+ +4 rows in set (0.021 sec) + +memgraph> CREATE INDEX ON :Person(name); +Empty set (0.015 sec) + +memgraph> MATCH (n:Person) WHERE n.name =~ ".*an$" RETURN n.name; ++-------------+ +| n.name | ++-------------+ +| "Lillian" | +| "Logan" | +| "Susan" | +| "Sebastian" | ++-------------+ +4 rows in set (0.006 sec) +``` + +## Display available indexes + +Information about available indexes can be retrieved by using the following +syntax: + +```cypher +SHOW INDEX INFO; +``` + +The results of this query will be all of the labels and label-property pairs +that Memgraph currently indexes. + +## Deleting index + +Created indexes can also be deleted by using the following syntax: + +```cypher +DROP INDEX ON :Label(property); +``` +## Analyze graph + +When multiple label-property indices exist, the database can sometimes select a non-optimal index due to the data's distribution. The [`ANALYZE GRAPH;`](/reference-guide/analyze-graph.md) query calculates the distribution of property values so the database can select a more optimal label-property index with the smallest average property value size. The query is run only once after all indexes have been created and data inserted in the database. + +## How-to guide + +A database index is a data structure used to improve the speed of data retrieval +within a database at the cost of additional writes and storage space for +maintaining the index data structure. + +When a query is executed, the engine first checks if there is an index. An index +stores additional information on certain types of data so that retrieving +indexed data becomes more efficient. 
+ +Memgraph supports two types of indexes: + +- label index +- label-property index + +## How to check if indexes exist? + +To check if indexes exist, use the following Cypher query: + +```cypher +SHOW INDEX INFO; +``` + +The results of this query will be all of the labels and label-property pairs +that Memgraph currently indexes. + +## How to create indexes? + +Memgraph will not automatically index labeled data. If you want to optimize +queries that fetch nodes using labels, you need to create indexes. + +If you have a node `Person` and you want to create an index for it, run the +following query: + +```cypher +CREATE INDEX ON :Person; +``` + +You can also create indexes on data with a specific combination of label and +property, hence the name label-property index. + +For example, if you are storing information about people and you often retrieve +their age, it might be beneficial to create an index on nodes labeled as +`:Person` with a property named `age` by using the following language construct: + +```cypher +CREATE INDEX ON :Person(age); +``` + +:::tip + +For the best performance, create index on properties containing unique integer values. + +::: + +:::caution + +Creating a label-property index will not create a label index! + +::: + +## How to delete indexes? + +You can delete created indexes by using the following Cypher queries: + +```cypher +DROP INDEX ON :Person; +``` + +```cypher +DROP INDEX ON :Person(age); +``` + +These queries instruct all active transactions to abort as soon as possible. Once all transactions have finished, the index will be deleted. + +## Analyze graph + +When multiple label-property indices exist, the database can sometimes select a non-optimal index due to the data's distribution. The [`ANALYZE GRAPH;`](/reference-guide/analyze-graph.md) query calculates the distribution of property values so the database can select a more optimal label-property index with the smallest average property value size. 
The query is run only once after all indexes have been created and data inserted in the database. + +## Underlying implementation + +The central part of our index data structure is a highly-concurrent [skip +list](https://en.wikipedia.org/wiki/Skip_list). Skip lists are probabilistic +data structures that allow fast search within an ordered sequence of elements. +The structure itself is built in layers where the bottom layer is an ordinary +linked list that preserves the order. Each higher level can be imagined as a +highway for layers below. + +The implementation details behind skip list operations are well documented in +the literature and are out of scope for this document. Nevertheless, we believe +that it is important for more advanced users to understand the following +implications of this data structure (`n` denotes the current number of elements +in a skip list): + +- The average insertion time is `O(log(n))` +- The average deletion time is `O(log(n))` +- The average search time is `O(log(n))` +- The average memory consumption is `O(n)` + +Read more about [memory usage in Memgraph](/under-the-hood/storage.md). diff --git a/docs2/fundamentals/storage-memory-usage.md b/docs2/fundamentals/storage-memory-usage.md new file mode 100644 index 00000000000..3aedea139b5 --- /dev/null +++ b/docs2/fundamentals/storage-memory-usage.md @@ -0,0 +1,593 @@ +--- +id: storage-memory-usage +title: Storage memory usage +sidebar_label: Storage memory usage +--- + +Estimating Memgraph's storage memory usage is not entirely straightforward +because it depends on a lot of variables, but it is possible to do so quite +accurately. Below is an example that will try to show the basic reasoning. 
+ +If you want to **estimate** the storage memory usage, use the following formula: + +$\texttt{StorageRAMUsage} = \texttt{NumberOfVertices} \times 260\text{B} + \texttt{NumberOfEdges} \times 180\text{B}$ + +Let's test this formula on the [Marvel Comic Universe Social Network +dataset](https://memgraph.com/download/datasets/marvel-cinematic-universe/marvel-cinematic-universe.cypherl.gz), +which is also available as a dataset inside Memgraph Lab and contains 21,723 +vertices and 682,943 edges. + +According to the formula, storage memory usage should be: + +$ +\begin{aligned} +\texttt{StorageRAMUsage} &= 21,723 \times 260\text{B} + 682,943 \times 180\text{B} \\ &= 5,647,980\text{B} + 122,929,740\text{B}\\ &= 128,577,720\text{B} \approx 125\text{MB} +\end{aligned} +$ + +Now, let's run an empty Memgraph instance on an x86 Ubuntu machine. It consumes **~75MB** +of RAM due to baseline runtime overhead. Once the dataset is loaded, RAM usage +rises to **~260MB**. Memory usage primarily consists of storage and query +execution memory usage. After executing the `FREE MEMORY` query to force the cleanup +of query execution memory, the RAM usage drops to **~200MB**. If the baseline runtime +overhead of **75MB** is subtracted from the total memory usage of **200MB**, the +storage memory usage comes out to **~125MB**, which matches the formula's +estimate. + +## The calculation in detail + +Let's dive deeper into the memory usage values. Because Memgraph works on the +x86 architecture, calculations are based on the x86 Linux memory usage. + +:::tip +For the latest and most precise memory layout please clone +[Memgraph](https://github.com/memgraph/memgraph) and use, e.g., +[pahole](https://github.com/PhilArmstrong/pahole-gdb) to discover accurate +information. 
+::: + +Each `Vertex` and `Edge` object has a pointer to a `Delta` object. The +`Delta` object stores all changes on a certain `Vertex` or `Edge` and that's +why `Vertex` and `Edge` memory usage will be increased by the memory of +the `Delta` objects they are pointing to. If there are few updates, there are +also few `Delta` objects because the latest data is stored in the object. +But, if the database has a lot of concurrent operations, many `Delta` objects +will be created. Of course, the `Delta` objects will be kept in memory as long as +needed, and a bit more, because of the internal GC inefficiencies. + +### `Delta` memory layout + +Each `Delta` object has at least **104B**. + +### `Vertex` memory layout + +Each `Vertex` object has at least **112B** + **104B** for the `Delta` object, in +total, a minimum of **216B**. + +Each additional label takes **8B**. + +Keep in mind that three labels take as much space as four labels, and five to +seven labels take as much space as eight labels, etc., due to the dynamic +memory allocation. + +### `Edge` memory layout + +Each `Edge` object has at least **40B** + **104B** for the `Delta` object, in +total, a minimum of **144B**. + +### `SkipList` memory layout + +Each object (`Vertex`, `Edge`) is placed inside a data structure +called a `SkipList`. The `SkipList` has an additional overhead in terms of +`SkipListNode` structure and `next_pointers`. Each `SkipListNode` has an +additional **8B** element overhead and another **8B** for each of the `next_pointers`. + +It is impossible to know the exact number of **next_pointers** upfront, and +consequently the total size, but it's never more than **double the number of +objects** because the number of pointers is generated by a binomial distribution +(take a look at [the source +code](https://github.com/memgraph/memgraph/blob/master/src/utils/skip_list.hpp) +for details). + +### Index memory layout + +Each `LabelIndex::Entry` object has exactly **16B**. 
+ +Depending on the actual value stored, each `LabelPropertyIndex::Entry` has at least **72B**. + +Objects of both types are placed into the `SkipList`. + +#### Each index object in total + +- A `LabelIndex::Entry` `SkipListNode` object has **24B**. +- A `LabelPropertyIndex::Entry` `SkipListNode` object has at least **80B**. +- Each `SkipListNode` has an additional **16B** because of the **next_pointers**. + +### Properties + +All properties use **1B** for metadata - type, size of property ID and the size +of payload in the case of `NULL` and `BOOLEAN` values, or size of payload size +indicator for other types (how big is the stored value, for example, integers +can be 1B, 2B, 4B or 8B depending on their value). + +Then they take up **another byte** for storing property ID, which means each +property takes up at least 2B. After those 2B, some properties (for example, +`STRING` values) store additional metadata. And lastly, all properties store the +value. So the layout of each property is: + + +$\texttt{propertySize} = \texttt{basicMetadata} + \texttt{propertyID} + [\texttt{additionalMetadata}] + \texttt{value}.$ + + +|Value type |Size |Note | +|-----------------|--------------------------------|-----------------------------------------------------------------------------------------------------| +|`NULL` |1B + 1B | The value is written in the first byte of the basic metadata. | +|`BOOL` |1B + 1B | The value is written in the first byte of the basic metadata. | +|`INT` |1B + 1B + 1B, 2B, 4B or 8B | Basic metadata, property ID and the value depending on the size of the integer. | +|`DOUBLE` |1B + 1B + 8B | Basic metadata, property ID and the value | +|`STRING` |1B + 1B + 1B + min 1B | Basic metadata, property ID, additional metadata and lastly the value depending on the size of the string, where 1 ASCII character in the string takes up 1B.| +|`LIST` |1B + 1B + 1B + min 1B | Basic metadata, property ID, additional metadata and the total size depends on the number and size of the values in the list.| +|`MAP` |1B + 1B + 1B + min 1B | Basic metadata, property ID, additional metadata and the total size depends on the number and size of the values in the map.| +|`TEMPORAL_DATA` |1B + 1B + 1B + min 1B + min 1B | Basic metadata, property ID, additional metadata, seconds, microseconds. Value of the seconds and microseconds is at least 1B, but probably 4B in most cases due to the large values they store.| + +### Marvel dataset use case + +The Marvel dataset consists of `Hero`, `Comic` and `ComicSeries` labels, which +are indexed. There are also three label-property indices - on the `name` +property of `Hero` and `Comic` vertices, and on the `title` property of +`ComicSeries` vertices. The `ComicSeries` vertices also have the `publishYear` +property. + + + +There are 6,487 `Hero` and 12,661 `Comic` vertices with the property `name`. +That's 19,148 vertices in total. To calculate how much storage those vertices +and properties occupy, we are going to use the following formula: + +$\texttt{NumberOfVertices} \times (\texttt{Vertex} + \texttt{properties} + \texttt{SkipListNode} + \texttt{next\_pointers} + \texttt{Delta}).$ + +Let's assume the name on average has $3\text{B}+10\text{B} = 13\text{B}$ (each +name is on average 10 characters long). Once the average values are included, the +calculation is: + +$19,148 \times (112\text{B} + 13\text{B} + 16\text{B} + 16\text{B} + 104\text{B}) = 19,148 \times 261\text{B} = 4,997,628\text{B}.$ + +The remaining 2,584 vertices are the `ComicSeries` vertices with the `title` and +`publishYear` properties. 
Let's assume that the `title` property is +approximately the same length as the `name` property. The `publishYear` property +is a list of integers. The average length of the `publishYear` list is 2.17, +let's round it up to 3 elements. Since the year is an integer, 2B for each +integer will be more than enough, plus the 2B for the metadata of each element. +Therefore, each list occupies $3 \times (2\text{B} + 2\text{B}) = 12\text{B}$. Using the same +formula as above, but being careful to include both `title` and `publishYear` +properties, the calculation is: + +$2,584 \times (112\text{B} + 13\text{B} + 12\text{B} + 16\text{B} + 16\text{B} + 104\text{B}) = 2,584 \times 273\text{B} = 705,432\text{B}.$ + +In total, $5,703,060\text{B}$ to store vertices. + +The edges don't have any properties on them, so the formula is as follows: + +$\texttt{NumberOfEdges} \times (\texttt{Edge} + \texttt{SkipListNode} + \texttt{next\_pointers} + \texttt{Delta}).$ + +There are 682,943 edges in the Marvel dataset. Hence, we have: + +$682,943 \times (40\text{B}+16\text{B}+16\text{B}+104\text{B}) = 682,943 \times 176\text{B} = 120,197,968\text{B}.$ + +Next, `Hero`, `Comic` and `ComicSeries` labels have label indices. To calculate +how much space they take up, use the following formula: + +$\texttt{NumberOfLabelIndices} \times \texttt{NumberOfVertices} \times (\texttt{SkipListNode} + \texttt{next\_pointers}).$ + +Since there are three label indices, we have the following calculation: + +$3 \times 21,723 \times (24\text{B}+16\text{B}) = 65,169 \times 40\text{B} = 2,606,760\text{B}.$ + +For label-property indices, the indexed property needs to be taken into account. +Property `name` is indexed on `Hero` and `Comic` vertices, while property +`title` is indexed on `ComicSeries` vertices. We already assumed that the +`title` property is approximately the same length as the `name` property. 
+ +Here is the formula: + +$\texttt{NumberOfLabelPropertyIndices} \times \texttt{NumberOfVertices} \times (\texttt{SkipListNode} + \texttt{property} + \texttt{next\_pointers}).$ + +When the appropriate values are included, the calculation is: + +$3 \times 21,723 \times (80\text{B}+13\text{B}+16\text{B}) = 65,169 \times 109\text{B} = 7,103,421\text{B}.$ + +Now let's sum up everything we calculated: + +$5,703,060\text{B} + 120,197,968\text{B} + 2,606,760\text{B} + 7,103,421\text{B} = 135,611,209 \text{B} \approx 130\text{MB}.$ + +Bear in mind the number can vary because objects can have higher overhead due to +the additional data. + +## Query execution memory usage + +Query execution also uses up RAM. In some cases, intermediate results are +aggregated to return valid query results and the query execution memory can end +up using a large amount of RAM. Keep in mind that query execution memory +monotonically grows in size during the execution, and it's freed once the query +execution is done. A general rule of thumb is to have twice as much RAM as +the actual dataset occupies. + +## Configuration options to reduce memory usage + +Here are several tips on how to reduce memory usage and increase scalability: + +1. Consider removing a label index by executing `DROP INDEX ON :Label;` +2. Consider removing a label-property index by executing `DROP INDEX + ON :Label(property);` +3. If you don't have properties on relationships, disable them in the + configuration file by setting the `--storage-properties-on-edges` flag to + `false`. This can significantly reduce memory usage because effectively + `Edge` objects will not be created, and all information will be inlined under + `Vertex` objects. You can disable properties on relationships even with a + non-empty database, as long as the existing relationships have no properties. 
If you need + help with adapting the configuration to your needs, check out the how-to + guide on [changing configuration settings](/how-to-guides/config-logs.md). + +You can also check our reference guide for information about [controlling memory +usage](/reference-guide/memory-control.md), and you can +[inspect](/reference-guide/optimizing-queries/inspecting-queries.md) and +[profile](/reference-guide/optimizing-queries/profiling-queries.md) your queries +to devise a plan for their optimization. + +## Control memory usage + +In Memgraph, you can control memory usage by limiting, inspecting and +deallocating memory. + +You can control the memory usage of: + - a whole instance by setting the `--memory-limit` flag within the configuration file + - a query by including the `QUERY MEMORY` clause at the end of a query + - a procedure by including a `PROCEDURE MEMORY` clause + +### Controlling the memory usage of an instance + +By setting the `--memory-limit` flag in the configuration file, you can set +the maximum amount of memory (in MiB) that a Memgraph instance can allocate +during its runtime. If the memory limit is exceeded, only the queries that don't +require additional memory are allowed. If the memory limit is exceeded while a +query is running, the query is aborted and its transaction becomes invalid. + +If the flag is set to 0, it will use the default values. +Default values are: +- 90% of the total memory if the system doesn't have swap memory. +- 100% of the total memory if the system has swap memory. + +### Controlling the memory usage of a query + +Each Cypher query can include the following clause at the end: + +```plaintext +QUERY MEMORY ( UNLIMITED | LIMIT num (KB | MB) ) +``` + +If you use the `LIMIT` option, you have to specify the amount of memory a query +can allocate for its execution. You can use this clause in a query only once at +the end of the query. The limit is applied to the entire query. 
+ +Examples: +```plaintext +MATCH (n) RETURN (n) QUERY MEMORY LIMIT 10 KB; +``` +```plaintext +MATCH (n) RETURN (n) QUERY MEMORY UNLIMITED; +``` +### Controlling the memory usage of a procedure + +Each procedure call can contain the following clause: + +```plaintext +PROCEDURE MEMORY ( UNLIMITED | LIMIT num ( KB | MB) ) +``` + +If you use the `LIMIT` option, you can specify the amount of memory that the +called procedure can allocate for its execution. If you use the `UNLIMITED` +option, no memory restrictions will be imposed when the procedure is called. If +you don't specify the clause, the memory limit is set to a default value of 100 MB. + +One procedure call can have only one `PROCEDURE MEMORY` clause at the end of the +call. If a query contains multiple procedure calls, each call can have its own +limit specification. + +Examples: +```plaintext +CALL example.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 KB YIELD result; +``` +```plaintext +CALL example.procedure(arg1, arg2, ...) PROCEDURE MEMORY LIMIT 100 MB YIELD result; +``` +```plaintext +CALL example.procedure(arg1, arg2, ...) PROCEDURE MEMORY UNLIMITED YIELD result; +``` + +### Inspecting memory usage + +Run the following query to inspect memory usage: + +```plaintext +SHOW STORAGE INFO; +``` + +Find out more about the `SHOW STORAGE INFO` query on the [Server stats](./server-stats.md) page. + +### Deallocating memory + +Memgraph has a garbage collector that deallocates unused objects, thus freeing +the memory. The rate of the garbage collection in seconds can be specified in +the configuration file by setting the `--storage-gc-cycle-sec` flag. + +You can free up memory by using the following query: + +```cypher +FREE MEMORY; +``` + +This query tries to clean up as much unused memory as possible without affecting +currently running transactions. 
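To observe the effect of a cleanup, you can inspect memory usage before and after forcing it, using the two queries described above:

```cypher
SHOW STORAGE INFO;
FREE MEMORY;
SHOW STORAGE INFO;
```

Comparing the memory figures reported by the two `SHOW STORAGE INFO` calls shows how much unused memory the cleanup released.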
+ +## Storage modes + +Memgraph supports three different storage modes: +* `IN_MEMORY_TRANSACTIONAL` - the default database mode that favors + strongly-consistent ACID transactions using WAL files and snapshots, + but requires more time and resources during data import and analysis. +* `IN_MEMORY_ANALYTICAL` - speeds up import and data analysis but offers no ACID + guarantees besides manually created snapshots. +* `ON_DISK_TRANSACTIONAL` - supports ACID properties in the same way as `IN_MEMORY_TRANSACTIONAL` + with the additional ability to store data on disk (HDD or SSD) thus trading performance for lower costs. **Experimental** + + +### Switching storage modes + +You can switch between in-memory modes within a session using the following query: + +```cypher +STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL}; +``` + +When switching modes, Memgraph will wait until all other transactions are done. +If some other transactions are running in your system, you will receive a +warning message, so be sure to [set the log level to +`WARNING`](/reference-guide/configuration.md#other). + +Switching from the in-memory storage mode to the on-disk storage mode is allowed +when there is only one active session and the database is empty. As Memgraph Lab +uses multiple sessions to run queries in parallel, it is currently impossible to +switch to the on-disk storage mode within Memgraph Lab. You can change the +storage mode to on-disk transactional using `mgconsole`, then connect to the +instance with Memgraph Lab and query the instance as usual. + +To change the storage mode to `ON_DISK_TRANSACTIONAL`, use the following query: + +```cypher +STORAGE MODE ON_DISK_TRANSACTIONAL; +``` + +It is forbidden to change the storage mode from `ON_DISK_TRANSACTIONAL` to any +of the in-memory storage modes while there is data in the database as it might +not fit in the RAM. To change the storage mode to any of the in-memory storages, +empty the instance and restart it. 
An empty database will start in the default +storage mode (in-memory transactional). + +If you are running the Memgraph Enterprise Edition, you need to have +[`STORAGE_MODE` permission](/reference-guide/auth-module.md) to change the +storage mode. + +You can check the current storage mode using the following query: + +```cypher +SHOW STORAGE INFO; +``` + +An empty instance will always restart in in-memory transactional storage mode. +Upon restart, a non-empty instance in the on-disk storage mode will not change +storage mode, but an instance in the in-memory analytical storage mode will revert +to the default in-memory transactional storage mode. + +### In-memory transactional storage mode (default) + +`IN_MEMORY_TRANSACTIONAL` storage mode offers all ACID guarantees. WAL files and +periodic snapshots are created automatically, and you can also create snapshots +manually. + +In the `IN_MEMORY_TRANSACTIONAL` mode, Memgraph creates a +[`Delta`](/memgraph/under-the-hood/storage#delta-memory-layout) object each time +data is changed. Deltas are the backbone upon which Memgraph provides atomicity, +consistency, isolation, and durability - ACID. By using `Deltas`, Memgraph +creates [write-ahead-logs](/memgraph/reference-guide/backup#write-ahead-logging) +for durability, and provides isolation, consistency, and atomicity (by ensuring +that either everything is executed or nothing is). + +Depending on the transaction [isolation +level](/memgraph/reference-guide/transactions#isolation-levels), a transaction may +see changes made by other transactions. + +In the transactional storage mode, +[snapshots](/memgraph/reference-guide/backup#snapshots) are created periodically +or manually. They capture the database state and store it on the disk. A +snapshot is used to recover the database upon startup (depending on the setting +of the configuration flag `--storage-recover-on-startup`, which defaults to +`true`). 
+ +When Memgraph starts creating a periodic snapshot, it is not possible to +manually create a snapshot until the periodic snapshot is finished. + +Manual snapshots are created by running the `CREATE SNAPSHOT;` query. + +### In-memory analytical storage mode + +In the transactional storage mode, Memgraph is fully [ACID +compliant](/reference-guide/backup.md), which can cause memory spikes during data +import because each time data is changed Memgraph creates +[`Delta`](/memgraph/under-the-hood/storage#delta-memory-layout) objects to +provide atomicity, consistency, isolation, and durability. + +But `Deltas` also require a lot of memory (104B per change), especially when +there are a lot of changes (for example, during import with the `LOAD CSV` +clause). Switching to the `IN_MEMORY_ANALYTICAL` storage mode disables +the creation of `Deltas`, thus drastically speeding up import with lower memory +consumption - up to 6 times faster import with 6 times less memory consumption. + +If you want to enable ACID compliance, you can switch back to +`IN_MEMORY_TRANSACTIONAL` and continue with regular work on the database, or you +can take advantage of the low memory costs of the analytical mode to run +analytical queries that will not change the data, but be aware that no backup is +created automatically, and there are no ACID guarantees besides manually created +snapshots. There are no `WAL` files created nor periodic snapshots. Users +**can** create a snapshot manually. + +#### Transactions + +In the analytical storage mode, there are no ACID guarantees and other +transactions can see the changes of ongoing transactions. Also, a [transaction +can see the changes it is +doing](/memgraph/reference-guide/transactions#isolation-levels). This means that +the transactions can be committed in random orders, and the updates to the data, +in the end, might not be correct. 
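The import workflow described above can be sketched as follows; the CSV file path and property names are only illustrative:

```cypher
// Disable Delta creation for a fast, low-memory import.
STORAGE MODE IN_MEMORY_ANALYTICAL;

// Hypothetical file and columns - adjust to your dataset.
LOAD CSV FROM "/path/to/people.csv" WITH HEADER AS row
CREATE (:Person {name: row.name});

// Restore ACID guarantees and persist the imported data.
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
CREATE SNAPSHOT;
```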
+ +#### WAL + +As mentioned, no [write-ahead +logs](/memgraph/reference-guide/backup#write-ahead-logging) are created in the +`IN_MEMORY_ANALYTICAL` mode. When switching back to the +`IN_MEMORY_TRANSACTIONAL` mode, it is recommended to create a snapshot manually +with the `CREATE SNAPSHOT;` Cypher query. Once Memgraph switches to the +`IN_MEMORY_TRANSACTIONAL` mode, it will create WAL files for all new updates +unless instructed otherwise by the configuration file. + +#### Snapshots + +[Snapshots](/memgraph/reference-guide/backup#snapshots) capture the database +state and store it on the disk. A snapshot is used to recover the database upon +startup (depending on the setting of the configuration flag +`--storage-recover-on-startup`, which defaults to `true`). + +In Memgraph, snapshots are created periodically or manually. + +In the `IN_MEMORY_ANALYTICAL` mode, periodic snapshots are **disabled**. + +Manual snapshots are created by running the `CREATE SNAPSHOT;` query. When the +query is run in the `IN_MEMORY_ANALYTICAL` mode, Memgraph guarantees that it +will be **the only** transaction present in the system, and all the other +transactions will wait until the snapshot is created to ensure its validity. + +### On-disk transactional storage mode + +In the on-disk transactional storage mode, the disk is used as physical storage, +which allows you to save more data than can fit into your RAM. This helps +keep the hardware costs to a minimum, but you should expect slower performance +when compared to the in-memory transactional storage mode. Keep in mind that +while executing queries, all the graph objects used in the transactions still +need to be able to fit in the RAM, or Memgraph will throw an exception. + +#### Architecture + +RocksDB is used as a background storage to serialize nodes and relationships +into a key-value format. 
This architecture is also known as "larger than +memory" as it enables in-memory databases to save more data than the main memory +can hold, without the performance overhead caused by the buffer pool. + +The imported data resides on the disk, while the main memory contains two +caches, one for executing operations on the main RocksDB instance and the other for +operations that require indices. In both cases, Memgraph's custom +`SkipList` cache is used, which allows a multithreaded read-write access pattern. + +#### MVCC + +Concurrent execution of transactions is supported differently for on-disk +storage than for in-memory. The in-memory storage mode relies on delta objects +which store the exact versions of data at a specific moment in time. +Therefore, the in-memory storage mode uses a pessimistic approach and +immediately checks whether there is a conflict between two transactions. + +In the on-disk storage mode, the cache is used per transaction. This +significantly simplifies object management since there is no need to question +a certain object's validity, but it also requires an optimistic approach to +conflict resolution between transactions. + +In the on-disk storage mode, the conflict is checked at the transaction's commit +time with the help of RocksDB's transaction support. This also implies that +deltas are cleared after each transaction, which can optimize memory usage +during execution. Deltas are still used to fully support Cypher's semantics of +the write queries. The design of the on-disk storage also simplifies the process +of garbage collection, since all the data is on disk. + +#### Isolation level + +The on-disk storage mode supports only the snapshot isolation level, mostly +because it is Memgraph's view that snapshot isolation should be the default +isolation level for most applications relying on databases. 
But the snapshot isolation level also simplifies the query's execution flow,
since no data is transferred to the disk until the transaction commits.

#### Indices

The on-disk storage mode supports both label and label-property indices. They
are stored in separate RocksDB instances as key-value pairs so that the access
to the data is faster. Whenever an indexed node is accessed, it's stored in a
separate in-memory cache to maximize the reading speed.

#### Constraints

The on-disk storage mode supports both existence and uniqueness constraints.
Existence constraints don't use context from the disk since the validity of a
node can be checked by looking only at that single node. On the other hand,
uniqueness constraints require a different approach. For a node to be valid,
the engine needs to iterate through all other nodes under the constraint and
check whether a conflict exists. To speed up this iteration process, nodes
under a constraint are stored in a separate RocksDB instance to eliminate the
cost of iterating over nodes which are not under the constraint.

#### Data formats

Below is the format in which data is serialized to the disk.

Vertex format for the main disk storage:

Key - `label1, label2, ... | vertex gid | commit_timestamp`

Value - `property1, property2`

Edge format for the main disk storage:

Key - `from vertex gid | to vertex gid | 0 | edge type | edge gid | commit_timestamp`

Value - `property1, property2`

`0` is a placeholder for the edge direction in the future.

Format for the label index on disk:

Key - `indexing label | vertex gid | commit_timestamp`

Value - `label1_id, label2_id, ... | property1, property2, ...`

The value does not contain the `indexing label`.

Format for the label-property index on disk:

Key - `indexing label | indexing property | vertex gid | commit_timestamp`

Value - `label1_id, label2_id, ... | property1, property2, ...`

The value does not contain the `indexing label`.
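To make the key layouts above concrete, here is a small illustrative Python sketch. The helper names and the `|`/`,` separators are assumptions made for readability; the real serialization is binary and internal to Memgraph:

```python
def vertex_key(label_ids, gid, commit_ts):
    # label1, label2, ... | vertex gid | commit_timestamp
    labels = ",".join(str(label) for label in label_ids)
    return "|".join([labels, str(gid), str(commit_ts)])

def edge_key(from_gid, to_gid, edge_type_id, edge_gid, commit_ts):
    # from vertex gid | to vertex gid | 0 (direction placeholder) | edge type | edge gid | commit_timestamp
    return "|".join(str(part) for part in [from_gid, to_gid, 0, edge_type_id, edge_gid, commit_ts])

print(vertex_key([1, 2], 42, 1001))   # 1,2|42|1001
print(edge_key(42, 43, 5, 7, 1001))   # 42|43|0|5|7|1001
```

Note how the commit timestamp is part of the key, which is why the engine must delete the old version of a node when a new one is written.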
#### Durability

In the on-disk storage mode, durability is supported by RocksDB since it keeps
its own
[WAL](https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log-%28WAL%29) files.
Memgraph persists the metadata used in the implementation of the on-disk
storage.

#### Memory control

Even if the workload is larger than memory, a single transaction must still fit
into the memory. A memory tracker tracks all allocations happening throughout
the transaction's lifetime. Disk space also has to be carefully managed. Since
the timestamp is serialized together with the raw node and relationship data,
the engine needs to ensure that when a new version of the same node is stored,
the old one is deleted.

#### Replication

At the moment, the on-disk storage doesn't support replication.

diff --git a/docs2/fundamentals/telemetry.md b/docs2/fundamentals/telemetry.md
new file mode 100644
index 00000000000..0399a460c1a
--- /dev/null
+++ b/docs2/fundamentals/telemetry.md
@@ -0,0 +1,45 @@
---
id: telemetry
title: Telemetry
sidebar_label: Telemetry
---

Telemetry is an automated process that collects data at a remote point. We at
Memgraph use telemetry data for the sole purpose of improving our products by
focusing on areas that we believe are important to users. Telemetry is
**completely optional** and can be **[fully
disabled](#how-to-disable-telemetry)** before starting the database.

## What kind of data is collected?
+ +While a Memgraph database instance is running and an open internet connection is available, the following data will be sent to and stored on our servers: +* **Information about the host machine** + * CPU model + * Memory information + * Host OS + * Kernel information +* **Database runtime information** + * CPU usage + * Memory usage + * The number of vertices and edges stored in the database + * Event counters (for example, number of failed queries or ScanAll operator calls) + * Query module calls* + +\***Only the names** of the query module and procedure are recorded. + +No personal information is sent in the process of collecting telemetry data. Each database generates a unique identifier by which we can group data coming from the same database instance. This unique identifier is in no way connected to other personal information about the user. + +## Why do we collect this data? + +Telemetry data is used by Memgraph's developers for the purpose of developing new functionalities and the general maintenance of our products. By analyzing the host machine environment and runtime information, we can further optimize our products to better suit specific user needs. + +For example, if there is a considerable number of users who regularly call NetworkX query modules, we would invest more resources in the development of similar new features and extending the support for implemented ones. + +As is often the case, we need to prioritize certain goals over others. A data-driven understanding of product usage will help us prioritize features that are more likely to benefit a larger subset of our users. + +## How to disable telemetry? + +Telemetry is **completely optional** and can be fully disabled when starting the database. 
There are two ways to disable Memgraph's telemetry features:
* In `/etc/memgraph/memgraph.conf`, change the line `--telemetry-enabled=true` to `--telemetry-enabled=false`
* Include `--telemetry-enabled=false` as a command-line argument when starting the database

## Additional remarks

We fully understand the need for user privacy, which is why we made the
telemetry feature completely optional and provided this article to cultivate
transparent communication with the developer community. Your feedback is very
much appreciated, and telemetry data is only one way of receiving such feedback.
If you wish to get in touch with us, you can always send us an email at
[tech@memgraph.com](mailto:tech@memgraph.com) or join our community on
[Discord](https://www.discord.gg/memgraph).

diff --git a/docs2/fundamentals/transactions.md b/docs2/fundamentals/transactions.md
new file mode 100644
index 00000000000..400f11e1c0b
--- /dev/null
+++ b/docs2/fundamentals/transactions.md
@@ -0,0 +1,240 @@
---
id: transactions
title: Transactions
sidebar_label: Transactions
---

All Cypher queries are run within transactions, which means that all
modifications made by a single query are held in memory by the transaction
until the query is successfully executed. The changes are then committed and
visible to all other transactions, users and systems. In the case of an error,
the transaction is rolled back and no changes are committed.

These automatic transactions are also called implicit transactions.

Users can also create explicit transactions to execute multiple Cypher queries
in sequence, then commit them or roll them back.

During transaction execution, an important property of a database is the
isolation level that defines how or when the changes made by one operation
become visible to others.

## Explicit transactions

To start a transaction, run the `BEGIN;` query.

All the following queries will be executed as a part of a single transaction.
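For example, a minimal explicit transaction could look like this (the `Person` nodes are purely illustrative):

```cypher
BEGIN;
CREATE (:Person {name: "Alice"});
CREATE (:Person {name: "Bob"});
COMMIT;
```

Until the transaction is committed, other sessions won't see the two created nodes.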
If any of the queries fails, further queries will no longer be successfully
executed and it won't be possible to commit the transaction.

Commit successful transactions by executing the `COMMIT;` query.
Roll back unsuccessful transactions by executing the `ROLLBACK;` query.

## Managing transactions

Memgraph can return information about running transactions and allow you to terminate them.

### Show running transactions

To get information about running transactions, execute the following query:

```cypher
SHOW TRANSACTIONS;
```

The query shows only the transactions you started or transactions for which you
have the necessary [privileges](#privileges-needed-to-manage-all-transactions).

If you are connecting to Memgraph using a client, you can pass additional
metadata when starting a transaction (if the client supports additional
metadata) which will be visible when running the `SHOW TRANSACTIONS;` query,
thus allowing you to identify each transaction precisely.

The Python example below demonstrates how to pass metadata for
both an implicit and an explicit transaction:

```python
import neo4j

def main():
    driver = neo4j.GraphDatabase.driver("bolt://localhost:7687", auth=("user", "pass"))

    # Explicit transaction with metadata attached.
    s1 = driver.session()
    tx = s1.begin_transaction(metadata={"where": "in explicit tx", "my_uuid": 1})
    tx.run("MATCH (n) RETURN n LIMIT 1")

    # Implicit (auto-commit) transaction with metadata attached.
    s2 = driver.session()
    query = neo4j.Query("SHOW TRANSACTIONS", metadata={"where": "in implicit tx", "my_uuid": 2})
    print(s2.run(query).values())

    tx.close()
    s1.close()
    s2.close()
    driver.close()

if __name__ == '__main__':
    main()
```

### Terminate transactions

To terminate one or more transactions, you need to open a new session and use the following query:

```cypher
TERMINATE TRANSACTIONS "<tid1>", "<tid2>", ...;
```

Each `<tid>` is a transaction ID that can be seen using the `SHOW TRANSACTIONS;` query.
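For instance, terminating two transactions at once might look like this (the IDs are illustrative values of the kind returned by `SHOW TRANSACTIONS;`):

```cypher
TERMINATE TRANSACTIONS "9223372036854775809", "9223372036854775810";
```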
The `TERMINATE TRANSACTIONS` query signals to the thread executing the
transaction that it should stop the execution. No violent interruption will
happen, and the whole system will stay in a consistent state. To terminate a
transaction you haven't started, you need to have the necessary
[privileges](#privileges-needed-to-manage-all-transactions).

#### Terminating custom procedures

If you want to be able to [terminate custom
procedures](/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example.md),
crucial parts of the code, such as `while` and `until` loops, or similar points
where the procedure might become costly, need to be preceded with the
`CheckMustAbort()` function.

### Privileges needed to manage all transactions

By default, users can see and terminate only the transactions they started. For
all other transactions, the user must have the **TRANSACTION_MANAGEMENT**
privilege, which the admin assigns with the following query:

```cypher
GRANT TRANSACTION_MANAGEMENT TO user;
```

The privilege to see all the transactions running in Memgraph is revoked using the following query:

```cypher
REVOKE TRANSACTION_MANAGEMENT FROM user;
```

:::info
When Memgraph is first started, there is only one explicit super-admin user
that has all privileges, including **TRANSACTION_MANAGEMENT**. The super-admin
user is able to see all transactions.
:::

### Example

Managing transactions is done by establishing a new connection to the database.

#### New session with Docker

If you are using **Memgraph Lab**, you can vertically split screens and open
another Query Execution section.

If you are using **mgconsole** on an instance running in a Docker container:

1. Open a new terminal and find the CONTAINER ID of the Memgraph Docker container:

   ```
   docker ps
   ```

2. Enter the container with the following command:

   ```
   docker exec -it CONTAINER_ID bash
   ```
3. 
Execute the `mgconsole` command to run the client.

4. Run the `SHOW TRANSACTIONS;` and `TERMINATE TRANSACTIONS "tid";` queries.

#### Show and terminate transactions

The output of the `SHOW TRANSACTIONS` command shows that an infinite query is
currently being run as part of the transaction ID "9223372036854775809".

To terminate the transaction, run the following query:

```cypher
TERMINATE TRANSACTIONS "9223372036854775809";
```

Upon termination, a confirmation will appear, and a message will appear in the
session in which the infinite query was being run.

## Isolation levels

In database systems, isolation determines how transaction integrity is visible
to other users and systems.

A lower isolation level allows many users to access the same data at the same
time but increases the number of concurrency effects (such as dirty reads or
lost updates). A higher isolation level secures data consistency but requires
more system resources and increases the chances that one transaction will block
another.

Memgraph currently supports three isolation levels, from the highest to the
lowest:
 - SNAPSHOT_ISOLATION (default) - guarantees that all reads made in a
   transaction will see a consistent snapshot of the database, and the
   transaction itself will successfully commit only if no updates it has made
   conflict with any concurrent updates made since that snapshot.
 - READ_COMMITTED - guarantees that any data read was committed at the moment it
   is read.
 - READ_UNCOMMITTED - one transaction may read not yet committed changes made by
   other transactions.

To check the current isolation level, run the following query:

```cypher
SHOW STORAGE INFO;
```

### Setting the isolation level

To change the isolation level, change the `--isolation-level` configuration flag
to any of the supported values.
If you need help changing the configuration,
check out [the how-to guide](/how-to-guides/config-logs.md).

You can change the initially set isolation level when Memgraph is running in the
[`IN_MEMORY_TRANSACTIONAL` mode](/reference-guide/storage-modes.md) using the
following query:

```cypher
SET <scope> TRANSACTION ISOLATION LEVEL <level>;
```

`<scope>` defines the scope to which the isolation level change should apply:
 - GLOBAL - apply the new isolation level globally
 - SESSION - apply the new isolation level only for the current session
 - NEXT - apply the new isolation level only for the next transaction in the current session

`<level>` defines the isolation level:
 - SNAPSHOT ISOLATION
 - READ COMMITTED
 - READ UNCOMMITTED

## Storage modes

Memgraph can work in `IN_MEMORY_ANALYTICAL`, `IN_MEMORY_TRANSACTIONAL` or
`ON_DISK_TRANSACTIONAL` [storage mode](/reference-guide/storage-modes.md).
`IN_MEMORY_TRANSACTIONAL` is the default mode in which Memgraph runs on startup.

The `IN_MEMORY_TRANSACTIONAL` mode offers all the mentioned isolation levels
and all ACID guarantees. The `IN_MEMORY_ANALYTICAL` mode offers no isolation
levels and no ACID guarantees: multiple transactions can write data to Memgraph
simultaneously, so one transaction can see all the changes from other
transactions.

The `ON_DISK_TRANSACTIONAL` storage mode uses only snapshot isolation.

There can't be any active transactions if you want to switch from one in-memory
mode to another. Memgraph will log a warning message if it finds any active
transactions, so set the log level to `WARNING` to see them. No other
transactions will take place during the switch between modes.

When changing the storage mode to on-disk, there should be only one active
session and the database must be empty. The database also needs to be empty if
you want to change the storage mode from on-disk to in-memory.
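As a sketch (assuming no other active transactions, as noted above), switching between the in-memory modes and checking the result could look like this:

```cypher
STORAGE MODE IN_MEMORY_ANALYTICAL;
SHOW STORAGE INFO;
STORAGE MODE IN_MEMORY_TRANSACTIONAL;
```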
diff --git a/docs2/fundamentals/triggers.md b/docs2/fundamentals/triggers.md
new file mode 100644
index 00000000000..69e3a9ccbca
--- /dev/null
+++ b/docs2/fundamentals/triggers.md
@@ -0,0 +1,485 @@
---
id: triggers
title: Triggers
sidebar_label: Triggers
---

**Database triggers** are an integral part of most database systems. A trigger
is procedural code that is automatically executed in response to specific
events. Events are related to some change in data, such as created, updated and
deleted data records. Triggers are often used for maintaining the integrity of
the information in the database. For example, in a graph database, when a new
property is added to the Employee node, a new Tax, Vacation, and Salary node
should be created, along with the relationships between them. Triggers can also
be used to log historical data, for example, to keep track of employees'
previous salaries.

[![Related -How-to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/set-up-triggers.md)

## Introduction

Memgraph supports running openCypher clauses after a certain event happens
during database transaction execution, i.e. triggers.

You can **create**, **delete** and **print** triggers. All the triggers are
persisted on the disk, so no information is lost on database reruns.

## Creating a trigger

To create a new trigger, a query of the following format should be used:

```plaintext
CREATE TRIGGER trigger_name ( ON ( () | --> ) CREATE | UPDATE | DELETE )
( BEFORE | AFTER ) COMMIT
EXECUTE openCypherStatements
```

As the format shows, you can choose the object on which the event needs to
happen: a `()` node or a `-->` relationship. After that, you can define the
type of event that will execute the trigger: `CREATE`, `UPDATE` or `DELETE`.
After `EXECUTE` comes the series of Cypher clauses you want to execute.
An example of a trigger would be:

```cypher
CREATE TRIGGER exampleTrigger
ON UPDATE AFTER COMMIT EXECUTE
UNWIND updatedObjects AS updatedObject
WITH CASE
    WHEN updatedObject.vertex IS NOT null THEN updatedObject.vertex
    WHEN updatedObject.edge IS NOT null THEN updatedObject.edge
  END AS object
SET object.updated_at = timestamp();
```

The query may seem complex, so let's break it down:
* `CREATE TRIGGER exampleTrigger`: This statement creates the trigger. Here the
  part `exampleTrigger` is the name of the trigger and it must be unique.
* `ON UPDATE AFTER COMMIT EXECUTE`: This statement specifies what kind of event
  should activate the execution of the trigger. This one will be triggered for
  every update operation, and its statements will be executed after the update
  event has been committed.
* `UNWIND updatedObjects AS updatedObject`: If multiple objects were updated,
  unwind the list and go over each one.
* `WITH CASE...`: The `CASE` expression checks what type of object was updated,
  a node (vertex) or a relationship (edge).
* `SET object.updated_at = timestamp();`: Add an `updated_at` property to the
  object indicating when the action happened.

### Trigger name

Each created trigger must have a globally unique name. This implies that you
can't have a pair of triggers with the same name, even if they apply to
different events.

### Event type

Optionally, users can define on which event a trigger should execute its
statements. The event type is defined using the following part:

```plaintext
ON ( () | --> ) CREATE | UPDATE | DELETE
```

There are three main event types:
 - CREATE
 - UPDATE
 - DELETE

For each event type, users can specify whether to execute the trigger statements
only on the events that happened on a vertex, or on an edge. Vertices are
denoted with `()`, and edges with `-->`.
A few examples would be:
* `ON CREATE` - trigger the statements only if an object (vertex and/or edge)
  was created during the transaction execution.
* `ON () UPDATE` - trigger the statements only if a vertex was updated (e.g. a
  property was set on it) during the transaction execution.
* `ON --> DELETE` - trigger the statements only if an edge was deleted during
  the transaction execution.

Each event comes with certain information that can be used in the openCypher
statements the trigger executes. The information is contained in the form of
[predefined variables](#predefined-variables).

If no event type is specified, the trigger executes its statements every time,
and all the predefined variables can be used.

### Statement execution phase

A trigger can execute its statements at a specified phase, before or after
committing the transaction that triggered it. If the `BEFORE COMMIT` option is
used, the trigger will execute its statements as part of that transaction before
it's committed. If the `AFTER COMMIT` option is used, the trigger will execute
its statements asynchronously after that transaction is committed.

### Execute statements

A trigger can execute any valid openCypher query. No specific constraints are
imposed on the queries. The only way trigger queries (i.e. statements) differ
from standard queries is that a trigger query may use predefined variables,
which are based on the event type specified for the trigger.

### Predefined variables

Statements that a trigger executes can contain certain predefined variables
which contain information about the event that triggered it. Values of
predefined variables are determined by database transactions, that is, by all
the creations, updates or deletes that are part of a single transaction.
Based on the event type, the following predefined variables are available:

| Event type | Predefined variables |
| ---------- | -------------------- |
| ON CREATE | createdVertices, createdEdges, createdObjects |
| ON () CREATE | createdVertices |
| ON --> CREATE | createdEdges |
| ON UPDATE | setVertexProperties, setEdgeProperties, removedVertexProperties, removedEdgeProperties, setVertexLabels, removedVertexLabels, updatedVertices, updatedEdges, updatedObjects |
| ON () UPDATE | setVertexProperties, removedVertexProperties, setVertexLabels, removedVertexLabels, updatedVertices |
| ON --> UPDATE | setEdgeProperties, removedEdgeProperties, updatedEdges |
| ON DELETE | deletedVertices, deletedEdges, deletedObjects |
| ON () DELETE | deletedVertices |
| ON --> DELETE | deletedEdges |
| no event type specified | All predefined variables can be used |

#### createdVertices

List of all created vertices.

#### createdEdges

List of all created edges.

#### createdObjects

List of all created objects, where each element is a map. If the element
contains a created vertex, it will be in the following format:

```json
{
  "event_type": "created_vertex",
  "vertex": created_vertex_object
}
```

If the element contains a created edge, it will be in the following format:

```json
{
  "event_type": "created_edge",
  "edge": created_edge_object
}
```

#### deletedVertices

List of all deleted vertices.

#### deletedEdges

List of all deleted edges.

#### deletedObjects

List of all deleted objects, where each element is a map.
If the element contains a deleted vertex, it will be in the following format:

```json
{
  "event_type": "deleted_vertex",
  "vertex": deleted_vertex_object
}
```

If the element contains a deleted edge, it will be in the following format:

```json
{
  "event_type": "deleted_edge",
  "edge": deleted_edge_object
}
```

#### General notes about the predefined variables for updates

Setting an element to `NULL` is counted as a removal. The changes are looked at
on the transaction level only. That means if the value under a property on the
same object was changed multiple times, only one update will be generated. The
same applies to the labels on the vertex.

#### setVertexProperties

List of all set vertex properties. Each element is in the following format:

```json
{
  "vertex": updated_vertex_object,
  "key": property_that_was_updated,
  "old": old_value_of_that_property,
  "new": new_value_of_that_property
}
```

#### setEdgeProperties

List of all set edge properties. Each element is in the following format:

```json
{
  "edge": updated_edge_object,
  "key": property_that_was_updated,
  "old": old_value_of_that_property,
  "new": new_value_of_that_property
}
```

#### removedVertexProperties

List of all removed vertex properties. Each element is in the following format:

```json
{
  "vertex": updated_vertex_object,
  "key": property_that_was_updated,
  "old": old_value_of_that_property
}
```

#### removedEdgeProperties

List of all removed edge properties. Each element is in the following format:

```json
{
  "edge": updated_edge_object,
  "key": property_that_was_updated,
  "old": old_value_of_that_property
}
```

#### setVertexLabels

List of all set vertex labels. Each element is in the following format:

```json
{
  "label": label,
  "vertices": list_of_updated_vertices
}
```

#### removedVertexLabels

List of all removed vertex labels.
Each element is in the following format: +```json +{ + "label": label, + "vertices": list_of_updated_vertices +} +``` + +#### updatedVertices +List of updates consisting of set and removed properties, and set and removed +labels on vertices. + +#### updatedEdges +List of updates consisting of set and removed properties on edges. + +#### updatedObjects +List of updates consisting of set and removed properties on edges and vertices, +and set and removed labels on vertices. + +#### Elements of the predefined variables for update +Each element has a similar format as the previously defined elements. + +If the element contains information about a set vertex property, it's in the +following format: +```json +{ + "event_type": "set_vertex_property", + "vertex": updated_vertex_object, + "key": property_that_was_updated, + "old": old_value_of_that_property, + "new": new_value_of_that_property +} +``` + +If the element contains information about a removed vertex property, it's in the +following format: +```json +{ + "event_type": "removed_vertex_property", + "vertex": updated_vertex_object, + "key": property_that_was_updated, + "old": old_value_of_that_property +} +``` + +If the element contains information about a set edge property, it's in the +following format: +```json +{ + "event_type": "set_edge_property", + "edge": updated_edge_object, + "key": property_that_was_updated, + "old": old_value_of_that_property, + "new": new_value_of_that_property +} +``` + +If the element contains information about a removed edge property, it's in the +following format: +```json +{ + "event_type": "removed_edge_property", + "edge": updated_edge_object, + "key": property_that_was_updated, + "old": old_value_of_that_property +} +``` + +If the element contains information about a set vertex label, it's in the +following format: +```json +{ + "event_type": "set_vertex_label", + "vertex": updated_vertex_object, + "label": label +} +``` + +If the element contains information about a removed 
vertex label, it's in the +following format: +```json +{ + "event_type": "removed_vertex_label", + "vertex": updated_vertex_object, + "label": label +} +``` +### Owner + +The user who executes the create query is going to be the owner of the trigger. +Authentication and authorization are not supported in Memgraph Community, thus +the owner will always be `Null`, and the privileges are not checked in Memgraph +Community. In Memgraph Enterprise the privileges of the owner are used when +executing `openCypherStatements`, in other words, the execution of the +statements will fail if the owner doesn't have the required privileges. More +information about how the owner affects the trigger can be found in the +[reference guide](reference-guide/security.md#owners). + +## Dropping a trigger +A trigger can be removed by running the following query: + +```plaintext +DROP TRIGGER trigger_name; +``` + +## Trigger info +Users can get info about all the triggers by using the following query: + +```plaintext +SHOW TRIGGERS; +``` +which returns results in the following format: + +|trigger name| statement | event type | phase | owner | +|----------- |---------- | -----------| ------|-------| +| name of the trigger| statement which the trigger executes | event which triggers the statement | phase at which the trigger executes its statement | owner of the trigger or `Null` | + + +## How to guide + +Memgraph supports **database triggers** that can be executed if a particular +type of event occurs. Events are related to changes in data, such as created, +updated, and deleted nodes or relationships. + +[![Related - Reference Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/triggers.md) + + +### How to create a trigger? + +You can create a trigger by executing Cypher clauses. Creating a trigger will +ensure that some procedural code is executed on a certain type of event. 
All created triggers are persisted on the disk, which means they will be active
on database reruns and no information is ever lost.

#### Trigger execution upon node creation

Node creation is the most common event your Memgraph database can react to. For
example, you may need to update some values on a created node. If you need a
trigger to run after a node (vertex) has been created, you can set up the
following trigger:

```cypher
CREATE TRIGGER createVertex
ON () CREATE AFTER COMMIT EXECUTE
UNWIND createdVertices AS createdVertex
SET createdVertex.created = timestamp()
```

Here the trigger's name is `createVertex` and it should be unique. The Cypher
clause `ON () CREATE` defines the event on which the trigger is executed.
`AFTER COMMIT EXECUTE` means the trigger will be executed after changes have
been committed to the database. For ease of use, triggers have a set of
**predefined variables**. One of them is `createdVertices`, a list of all
created nodes (vertices). In this example, the list is unwound by the Cypher
clause `UNWIND`. To find a complete list of predefined variables, supported
operations, and configuration details, look at the triggers [reference
guide](/reference-guide/triggers.md).

In this trigger, a node gets a timestamp upon creation via the `SET
createdVertex.created = timestamp()` Cypher clause.

#### Trigger execution upon node update

Node property updates are common in graphs. To react to them, you can create a
trigger for that type of event:

```cypher
CREATE TRIGGER updateVertex
ON () UPDATE AFTER COMMIT EXECUTE
UNWIND updatedVertices AS updatedVertex
SET updatedVertex.updated_at = timestamp()
```

The trigger for node updates is almost the same as the one for node creation.
Notice the different predefined variable `updatedVertices` and the `ON ()
UPDATE` Cypher clause.

In this trigger, a node gets a new updated timestamp via the `SET
updatedVertex.updated_at = timestamp()` Cypher clause.
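Assuming the `updateVertex` trigger above is active, it can be checked by updating any node and reading the property back. The `Person` node here is purely illustrative, and since the trigger runs `AFTER COMMIT`, the property is set asynchronously and may appear a moment after the update commits:

```cypher
MATCH (n:Person {name: "Alice"}) SET n.age = 31;
MATCH (n:Person {name: "Alice"}) RETURN n.updated_at;
```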
#### Trigger execution upon node or relationship creation

You can also set up a trigger for multiple events, such as node or relationship
creation. It doesn't matter whether a node or a relationship is created; either
event will execute the trigger. A sample query for that kind of trigger:

```cypher
CREATE TRIGGER exampleTrigger
ON CREATE AFTER COMMIT EXECUTE
UNWIND createdObjects AS createdObject
WITH CASE
    WHEN createdObject.vertex IS NOT null THEN createdObject.vertex
    WHEN createdObject.edge IS NOT null THEN createdObject.edge
  END AS object
SET object.created_at = timestamp();
```

The predefined variable `createdObjects` is a list of dictionaries. Each
dictionary contains information about the created object, which can be either a
node or a relationship. The object's key `event_type` is set based on the
dictionary and information within it, and the value of the key `vertex` or `edge`
(depending on the type of object) is set to that created object.

In this trigger, the node's or relationship's property `created_at` is set to
the current timestamp value via the `SET object.created_at = timestamp();`
Cypher clause.

### How to create a trigger for a Python query module?

If you want a trigger to be activated by executing code from a Python query
module, you can call the query module from the trigger. In the example below,
the trigger will call `query_module.new_edge(edge)` each time a new relationship
(edge) is created:

```cypher
CREATE TRIGGER newEdge
ON CREATE BEFORE COMMIT EXECUTE
UNWIND createdEdges AS edge
CALL query_module.new_edge(edge) YIELD *;
```

Make sure your function accepts the proper Memgraph type, `mgp.Edge` in this
case:

```python
import mgp

@mgp.read_proc
def new_edge(context: mgp.ProcCtx,
             edge: mgp.Edge) -> mgp.Record():
    # Process the newly created relationship here.
    return mgp.Record()
```

The Memgraph Python API is defined by the `mgp.py` script, and in it, you can
find all supported types, such as `mgp.Edge`, `mgp.Vertex`, etc.
If you want to explore the +API further, feel free to check the reference guide on [Python +API](/reference-guide/query-modules/implement-custom-query-modules/api/python-api.md). + +### How to create a trigger for dynamic algorithms? + +Dynamic algorithms are often designed for dataset updates. With a trigger, you +can ensure that any dataset is up to date and consistent. In the sample code +below, a trigger is set to use MAGE `pagerank_online` algorithm. For more +details on dynamic algorithms, visit [MAGE +docs](/mage/query-modules/available-queries). In this +case, all created or deleted objects are passed from the database transaction to +the trigger. After each transaction that has created or deleted objects, the +trigger will automatically execute the PageRank algorithm and update the rank +property. This will ensure data consistency and lead to performance benefits. + +```cypher +CREATE TRIGGER pagerank_trigger +BEFORE COMMIT +EXECUTE CALL pagerank_online.update(createdVertices, createdEdges, deletedVertices, deletedEdges) +YIELD node, rank +SET node.rank = rank; +``` \ No newline at end of file diff --git a/docs2/getting-started/cli.md b/docs2/getting-started/cli.md new file mode 100644 index 00000000000..e72e233b0c8 --- /dev/null +++ b/docs2/getting-started/cli.md @@ -0,0 +1,271 @@ +--- +id: cli +title: Command line interface +sidebar_label: CLI +--- + +import Tabs from "@theme/Tabs"; import TabItem from "@theme/TabItem"; + +The easiest way to execute Cypher queries against Memgraph is by using +Memgraph's command-line tool, **mgconsole**. + +## 1. Install mgconsole + +:::tip + +If you installed **Memgraph Platform** with the Docker image +(`memgraph/memgraph-platform`), mgconsole will start automatically when you run +the container. Skip the installation steps and continue with [executing +Cypher queries](#execute-cypher-queries). 

If you installed any other Docker image or want to start mgconsole explicitly
from the Memgraph Platform image, you need to run mgconsole manually by
following the steps described below.

:::

If you want to install or run mgconsole to query a running Memgraph database
instance, use the following steps:

**Docker**

If you installed MemgraphDB using Docker, or closed the terminal running
mgconsole from the Memgraph Platform image, run the mgconsole client from your
Docker image using the following commands:

**1.** First, you need to find the `CONTAINER_ID` by running:

```terminal
docker ps
```

**2.** Once you know the `CONTAINER_ID`, you can start mgconsole by running the following command:

```terminal
docker exec -it CONTAINER_ID mgconsole
```

**Windows**

**1.** Download mgconsole from the [Download
Hub](https://memgraph.com/download#mgconsole).

**2.** From PowerShell, start mgconsole with the command:

```terminal
./mgconsole --host HOST --port PORT
```

If Memgraph is running locally using the default configuration, start
mgconsole with:

```terminal
./mgconsole --host 127.0.0.1 --port 7687
```

**macOS**

**1.** Download mgconsole from the [Download
Hub](https://memgraph.com/download#mgconsole).

**2.** From the terminal, provide execution permission to the current user:

```terminal
chmod u+x ./mgconsole
```

**3.** Start mgconsole with the command:

```terminal
./mgconsole --host HOST --port PORT
```

If Memgraph is running locally using the default configuration, start
mgconsole with:

```terminal
./mgconsole --host 127.0.0.1 --port 7687
```

**Linux**

:::note

We will soon release a downloadable Debian package, so you don't have to install
mgconsole from source.

:::

**1.** Follow the instructions on how to [build and
install](https://github.com/memgraph/mgconsole#building-and-installing)
mgconsole from source. 

**2.** Start mgconsole with the command:

```terminal
mgconsole --host HOST --port PORT
```

If Memgraph is running locally using the default configuration, start
mgconsole with:

```terminal
mgconsole --host 127.0.0.1 --port 7687
```

## 2. Execute a Cypher query {#execute-cypher-queries}

After the client has started, it should present a command prompt similar to:

```
mgconsole X.X
Connected to 'memgraph://127.0.0.1:7687'
Type :help for shell usage
Quit the shell by typing Ctrl-D(eof) or :quit
memgraph>
```

At this point, it is possible to execute Cypher queries against a running
Memgraph database instance.

:::tip

You can use the `TAB` key to autocomplete commands in `mgconsole`.

:::

Each query needs to end with the `;` (*semicolon*) character. For example:

```cypher
CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"});
```

The above query will create two nodes in the database, one labeled "User" with
the name "Alice" and the other labeled "Software" with the name "Memgraph". It
will also create a relationship indicating that "Alice" *likes* "Memgraph".

To find the created nodes and relationships, execute the following query:

```cypher
MATCH (u:User)-[r]->(x) RETURN u, r, x;
```

### Query execution time

To get a breakdown of the execution time, set the `-verbose_execution_info` flag
to `true`.

Upon query execution, you will get this information:

```bash
Query COST estimate: 3066
Query PARSING time: 0.000175982 sec
Query PLAN EXECUTION time: 0.0154524 sec
Query PLANNING time: 8.054e-05 sec
```

The values show:

- COST estimate - The planner's internal estimate of the query's cost. When comparing two query executions, an order of magnitude larger COST estimate might indicate a longer execution time for that query.
- PARSING time - Time spent checking if the query is valid and normalizing it for the cache.
- PLAN EXECUTION time - Time spent executing the plan. 
- PLANNING time - Time it takes the query planner to create the optimal plan to execute the query.

## Configure mgconsole

Below are the configuration flags you can use with mgconsole:

### Main

| Flag | Description | Type | Default |
|--------------------------|-------------------------------------------------------------------------------------------------------------------------|---------|---------|
| -csv_delimiter | Character used to separate fields. | string | "," |
| -csv_doublequote | Controls how instances of the quote character (") appearing inside a field should themselves be quoted. When `true`, the character is doubled. When `false`, the escape character is used as a prefix to the quote character. If `csv_doublequote` is `false`, `csv_escapechar` must be set. | bool | true |
| -csv_escapechar | Character used to escape the quote character (") if `csv_doublequote` is `false`. | string | "" |
| -fit_to_screen | Fit output width to screen width. | bool | false |
| -history | Use the specified directory to save history. | string | "~/.memgraph" |
| -host | Server address. It can be a DNS resolvable hostname. | string | "127.0.0.1" |
| -no_history | Do not save history. | bool | false |
| -output_format | Query output format. Can be `csv` or `tabular`. If the output format is not tabular, the `fit_to_screen` flag is ignored. | string | "tabular" |
| -password | Database password. | string | "" |
| -port | Server port. | int32 | 7687 |
| -term_colors | Use terminal colors for syntax highlighting. | bool | false |
| -use_ssl | Use SSL when connecting to the server. | bool | false |
| -username | Database username. | string | "" |
| -verbose_execution_info | Output additional information about the query, such as query cost, parsing, planning, and execution times. 
| bool | false | + +### Flags + +| Flag | Description | Type | Default | +|--------------------------|-------------------------------------------------------------------------------------------------------------------------|---------|---------| +| -flagfile | Load flags from a file. | string | "" | +| -fromenv | Set flags from the environment [example: 'export FLAGS_flag1=value']. | string | "" | +| -tryfromenv | Set flags from the environment if present. | string | "" | +| -undefok | Comma-separated list of flag names. These flags can be specified on the command line even if the program does not define a flag with that name. IMPORTANT: Flags from the list that have arguments MUST use the flag=value format. | string | "" | +| -tab_completion_columns | The number of columns used in output for tab completion. | int32 | 80 | +| -tab_completion_word | If non-empty, `HandleCommandLineCompletions()` will hijack the process and attempt to do bash-style command line flag completion on this value. | string | "" | + +### Help + +| Flag | Description | Type | Default | +|--------------------------|-------------------------------------------------------------------------------------------------------------------------|---------|---------| +| -help | Show help on all flags [tip: all flags can have two dashes]. | bool | false | +| -helpfull | Show help on all flags -- same as -help. | bool | false | +| -helpmatch | Show help on modules, names of which contain the specified substring. | string | "" | +| -helpon | Show help on the modules named by this flag value. | string | "" | +| -helppackage | Show help on all modules in the main package. | bool | false | +| -helpshort | Show help on the main module for this program only. | bool | false | +| -helpxml | Produce an .xml version of help. | bool | false | +| -version | Show version and build info then exit. 
| bool | false |

:::caution

When working with Memgraph Platform, you should pass configuration flags inside
environment variables.

For example, you should start Memgraph Platform with `docker run -e
MGCONSOLE="-output_format=csv" memgraph/memgraph-platform`.

:::

## Non-interactive mode

To get the query result in bash, use the following command:
```bash
mgconsole < <(echo "MATCH (n:Person) RETURN n;")
```
or
```bash
echo "MATCH (n:Person) RETURN n;" | mgconsole
```

To save the query results in a file, use the following command:
```bash
mgconsole < <(echo "MATCH (n:Person) RETURN n;") > results.txt
```

## Where to next? {#where-to-next}

If you want to learn more about graph databases and Cypher queries, visit
[Memgraph Playground](https://playground.memgraph.com/) and go through the
guided lessons. All the datasets and most of the queries used in the guided
lessons can also be explored in the [Tutorials](/tutorials/overview.md) section,
and knowledge about Cypher is gathered in the [Cypher manual](/cypher-manual).

If you are all good to go on your own - [import your
data](/import-data/overview.md)!

diff --git a/docs2/getting-started/first-steps-with-memgraph.md b/docs2/getting-started/first-steps-with-memgraph.md
new file mode 100644
index 00000000000..76cbce3dda7
--- /dev/null
+++ b/docs2/getting-started/first-steps-with-memgraph.md

---
id: first-steps-with-memgraph
title: First steps with Memgraph
sidebar_label: First steps with Memgraph
---

import EmbedYTVideo from '@site/src/components/EmbedYTVideo';

In this tutorial, you will learn how to install Memgraph Platform, connect to it
using Memgraph Lab, run your first query, and style your graph. You will see
that using Memgraph is not hard at all!

This tutorial is also available as a video on Memgraph's YouTube channel:


Let's get started!

## Prerequisites

Memgraph Platform can be installed only with **Docker**. Instructions on how to
install Docker can be found on the [official Docker
website](https://docs.docker.com/get-docker/).


## 1. Install Memgraph Platform

First, you need to download and install Memgraph Platform. All you need to do is
open a terminal on your computer and run the following command:

```bash
docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -v mg_lib:/var/lib/memgraph memgraph/memgraph-platform
```

Once the installation is done, you will see a message similar to this one:

```nocopy

Status: Downloaded newer image for memgraph/memgraph-platform:latest
Memgraph Lab is running at localhost:3000

mgconsole 1.1
Connected to 'memgraph://127.0.0.1:7687'
Type :help for shell usage
Quit the shell by typing Ctrl-D(eof) or :quit
memgraph>

```

That means you have installed Memgraph Platform and that Memgraph is up and
running. Kudos!

## 2. Connect to Memgraph Lab

Since you installed and started Memgraph Platform, Memgraph Lab is already
running, so open your web browser and go to
[`localhost:3000`](http://localhost:3000). When Memgraph Lab loads, click
**Connect now**.

That's it! You can see the Memgraph Lab Dashboard, so you are ready to head over
to the next step.

## 3. Import dataset

Since this is a fresh install, there are no nodes or relationships in your
database. We have prepared more than 20 datasets that you can use for testing
and learning. You will now import one of those datasets. In the sidebar, click
**Datasets**. Next, go to **Capital cities and borders** and click **Load Dataset**.

You will see a warning that the new dataset will overwrite the current data in
the database. That is not a problem for you since you don't have any data in
your database, but in the future, be careful when importing data. Go ahead and
click **Confirm**. 
Once the import is done, click the **X** to close the dialog.

## 4. Run query

Now that the data is imported, it is time to run your first Cypher query. You
will write a query that displays all of the cities and all of the connections.

Click **Query Execution** in the sidebar, and then copy and paste the
following code into the **Cypher Editor**.

```cypher

MATCH (n)-[r]-(m)
RETURN n, r, m;

```

Click **Run query** to run the above query and see the result in the **Graph
results** tab.

Here is another query for you. Imagine that you are in Madrid and you want to
visit other capital cities that are one or two hops away from Madrid. How can
you figure out which cities are your possible destinations? You will use the
Cypher query language to find that out.

Click **Query Execution** in the sidebar, and then copy and paste the
following code into the **Cypher Editor**.

```cypher

MATCH p = (madrid:City { name: "Madrid" })-[e *BFS ..2]-(:City)
RETURN p;

```

This query will show all of the capital cities on the map that are up to two
hops away from Madrid. You don't have to worry about the exact semantics of this
query for now, but if you want to find out more, check out the [learning
materials](/cypher-manual/) for Cypher. Click **Run query** to run the above
query and see the result in the **Graph results** tab.

The result shows all of the capital cities that are up to two hops away from
Madrid.

## 5. Style your graph

When your results are shown on the map, you can move around. Go ahead and
zoom in and change the map style to **Detailed**.

You will now use the **Graph Style Editor** to change how nodes and
relationships are shown on the map. Each capital city node includes the flag of
its country as a node property. You will now add one line of code to change the
style of the graph. 
+ +Find the part of the code that looks like this: + +```nocopy +@NodeStyle HasLabel(node, "City") { + color: #DD2222 + color-hover: Lighter(#DD2222) + color-selected: Lighter(#DD2222) +} +``` +and add the line + +``` + image-url: Property(node, "flag") +``` + +so that the above block looks like this: + +```nocopy +@NodeStyle HasLabel(node, "City") { + image-url: Property(node, "flag") + color: #DD2222 + color-hover: Lighter(#DD2222) + color-selected: Lighter(#DD2222) +} +``` + +Click **Apply**, and your result should look like this: + + + +That looks great, but let's make the names of the cities and nodes a little bit +bigger. + +In the Graph Style Editor, locate the following code: + +```nocopy +@NodeStyle { + size: 6 + color: #DD2222 + color-hover: Lighter(#DD2222) + color-selected: Lighter(#DD2222) + border-width: 1.8 + border-color: #1d1d1d + font-size: 7 +} +``` +and replace it with: + +``` +@NodeStyle { + size: 10 + color: #DD2222 + color-hover: Lighter(#DD2222) + color-selected: Lighter(#DD2222) + border-width: 1.8 + border-color: #1d1d1d + font-size: 12 +} +``` + +You have increased the node size to 10, and the font size to 12. Now you will update the styling for the relationships. To make them thicker and change their color to red on hover, replace the following code in the Graph Style Editor: + +```nocopy +@EdgeStyle { + color: #999999 + color-hover: #1d1d1d + color-selected: #1d1d1d + width: 0.9 + width-hover: 2.7 + width-selected: 2.7 + font-size: 7 + label: Type(edge) +} +``` + +with + +``` +@EdgeStyle { + color: #999999 + color-hover: #ff0000 + color-selected: #1d1d1d + width: 2 + width-hover: 2.7 + width-selected: 2.7 + font-size: 7 + label: Type(edge) +} +``` + +
+ In case you need it, here is the complete Graph Style Code: + +``` +@NodeStyle { + size: 10 + color: #DD2222 + color-hover: Lighter(#DD2222) + color-selected: Lighter(#DD2222) + border-width: 1.8 + border-color: #1d1d1d + font-size: 12 +} + +@NodeStyle HasLabel(node, "City") { + image-url: Property(node, "flag") + color: #DD2222 + color-hover: Lighter(#DD2222) + color-selected: Lighter(#DD2222) +} + +@NodeStyle Greater(Size(Labels(node)), 0) { + label: Format(":{}", Join(Labels(node), " :")) +} + +@NodeStyle HasProperty(node, "name") { + label: AsText(Property(node, "name")) +} + +@EdgeStyle { + color: #999999 + color-hover: #ff0000 + color-selected: #1d1d1d + width: 2 + width-hover: 2.7 + width-selected: 2.7 + font-size: 7 + label: Type(edge) +} +``` + +

Below you can see how the graph looks in the end. We hope that you have
enjoyed this short tutorial. Now that you have seen Memgraph in action, we
encourage you to keep exploring Memgraph features. A wonderful world of graphs
awaits you!

## Where to next?

In this tutorial, you've learned how to install Memgraph Platform, use Memgraph
Lab to import a dataset, run queries, and style your graph. Not bad for a start,
right?

You don't want to bother with installation? Done! [Memgraph
Cloud](/memgraph-cloud) at your service - register and run an instance in a few
easy steps.

We have promised some more resources along the way, so here they are:

* [Cypher manual](/cypher-manual/)
* [Graph Style Script guide](/docs/memgraph-lab/graph-style-script-language)
* [How to work with Docker](/how-to-guides/work-with-docker.md)

We hope that you had fun going through this tutorial! You can check out
[some of the tutorials](/memgraph/tutorials/) that we have prepared for you, or
you can go to [Memgraph Playground](https://playground.memgraph.com/) and go
through the guided lessons.

diff --git a/docs2/getting-started/getting-started.md b/docs2/getting-started/getting-started.md
new file mode 100644
index 00000000000..c1b1d32f18f
--- /dev/null
+++ b/docs2/getting-started/getting-started.md

---
id: getting-started
title: Getting started with Memgraph
sidebar_label: Getting started
---

Memgraph is an open source graph database built for teams who expect highly
performant, advanced analytical insights - as compatible with your current
infrastructure as Neo4j (but up to 120x faster). Memgraph is powered by an
in-memory graph database built to handle real-time use cases at enterprise
scale. Memgraph supports strongly-consistent ACID transactions and uses the
standardized Cypher query language for structuring, manipulating, and exploring
data. 

If you're interested in trying out Memgraph from the comfort of your browser,
you can run an instance on [Memgraph Cloud](/memgraph-cloud) and explore
Memgraph during the 2-week trial period, or you can play around with datasets
and queries on [Memgraph Playground](https://playground.memgraph.com/).

Are you eager to start working with the real thing? Read on!

## Quick start

Follow these three steps, and you will have Memgraph running as a full graph
application platform in no time at all. Here is what you need to do:

### 1. Download and install Memgraph or run it in the Cloud

Start your journey through the world of graph analytics by [downloading and
installing](/installation/overview.mdx) Memgraph. You can install Memgraph using
Docker on Windows and macOS, or natively on Linux and WSL.

For a quick start, register at [Memgraph Cloud](https://cloud.memgraph.com/) and
create a project in a few easy steps!

### 2. Connect to Memgraph

Once your Memgraph instance is up and running, you are ready to [connect to
Memgraph](/connect-to-memgraph/overview.mdx). If you are a command line fan, you
can query using [mgconsole](/connect-to-memgraph/mgconsole.md). If you prefer to
query using a visual interface, go ahead and use [Memgraph Lab](/memgraph-lab).
You can also connect to Memgraph using
[drivers](/connect-to-memgraph/drivers/overview.md) for your favorite
programming language (as long as your favorite programming language is either
[Python](/connect-to-memgraph/drivers/python.md),
[Rust](/connect-to-memgraph/drivers/rust.md),
[C#](/connect-to-memgraph/drivers/c-sharp.md),
[Java](/connect-to-memgraph/drivers/java.md),
[Go](/connect-to-memgraph/drivers/go.md),
[JavaScript](/connect-to-memgraph/drivers/javascript.md) or
[PHP](/connect-to-memgraph/drivers/php.md)). The choice is yours!

### 3. Import data

Now it's time to [import your data](/import-data/overview.md) into Memgraph, and
you can do so from different sources. 
Memgraph supports importing [CSV
files](/import-data/files/load-csv-clause.md), establishing [connections to data
streams](/import-data/data-streams/overview.md) with Kafka, Pulsar and Redpanda,
as well as migrating data from SQL databases like
[PostgreSQL](/import-data/migrate/postgresql.md) and
[MySQL](/import-data/migrate/mysql.md).

## What to do next?

Now that your data is safe and sound within Memgraph, it's time to discover all
the possibilities Memgraph offers.

### Dive into learning

#### Memgraph Playground

You can start your learning on [Memgraph
Playground](https://playground.memgraph.com/), where guided lessons will help
you become familiar with graph databases and Cypher queries. Lessons vary in
difficulty and datasets, so feel free to choose the topic that will keep you
extra motivated. For example, you can start with the [TED-talks
lessons](https://playground.memgraph.com/dataset/ted-talks) that use real-world
data related to TED talks, providing you with tips and tricks that will help you
explore your own datasets.

#### Tutorials and How-to guides

All the datasets and most of the queries used in the guided lessons can also be
explored here, in the [Tutorials](/tutorials/overview.md) section. If you are
interested in using a particular Memgraph feature or you are stuck solving a
tricky problem, try to find the solution in the [How-to
guides](/how-to-guides/overview.md). Even more tutorials dealing with specific
issues are available on our [Blog](https://memgraph.com/category/tutorials).

#### Email courses

We have created two free email courses for you. The first one covers the [Cypher
query language](https://memgraph.com/learn-cypher-query-language). By the end of
the ten days, you'll have learned everything you need to start with Cypher and
graph databases. The second one is a [Graph modeling
course](https://memgraph.com/learn-graph-modeling). After ten days of this
course, you will know how to model graphs. 

#### Video courses

If you are more of a visual or auditory learner, you can find the best materials
related to graphs and graph analytics in our [list of recommended
content](https://www.youtube.com/channel/UCZ3HOJvHGxtQ_JHxOselBYg/playlists),
most of which is free.

#### Run an example streaming application

We've built an example streaming application to get you started quickly. Pull
the code from our [GitHub
repository](https://github.com/memgraph/example-streaming-app) and get started.

### Look under the hood

If you want to know more about Memgraph and learn the details of implemented
features, take a deep dive into our [Reference
guide](/reference-guide/overview.md) and look [under Memgraph's
hood](/under-the-hood/overview.md).

### Power up with MAGE

[Memgraph Advanced Graph Extensions (MAGE)](/mage) is an open-source repository
that contains graph algorithms and modules in the form of query modules written
by the team behind Memgraph and its users. It aims to help you tackle the most
interesting and challenging graph analytics problems.

### Browse through the Changelog

Want to know what's new in Memgraph? Take a look at the
[Changelog](/changelog.md) to see a list of new features.

diff --git a/docs2/getting-started/install-memgraph/debian.md b/docs2/getting-started/install-memgraph/debian.md
new file mode 100644
index 00000000000..7c7d260d31b
--- /dev/null
+++ b/docs2/getting-started/install-memgraph/debian.md

---
id: debian
title: Install Memgraph on Debian
sidebar_label: Debian
---

This article briefly outlines the basic steps necessary to install and run
Memgraph on Debian. 

import BackwardCompatibilityWarning from '../../templates/_backward_compatibility_warning.mdx';

<BackwardCompatibilityWarning />

## Prerequisites

Before you proceed with the installation guide, make sure that you have:
* The latest **Memgraph Debian Package**, which can be downloaded from the
  [Memgraph download hub](https://memgraph.com/download/).

:::note

Memgraph packages are available for:
- **Debian 10**
- **Debian 11**

:::

You can also use [direct download](../direct-download-links.md) links to get the
latest Memgraph packages.

## Installation guide {#installation-guide}

After downloading Memgraph as a Debian package, install it by running the
following:

```console
sudo dpkg -i /path-to/memgraph_.deb
```

:::note Why use sudo?
To perform some actions on your operating system, like installing new software,
you may need **superuser** privileges (commonly called **root**).
:::

:::caution Potential installation error
You could get errors while installing the package with the above command if you
don't have all of Memgraph's dependencies installed. The issues mostly look like
the following:

```console
dpkg: error processing package memgraph (--install):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 memgraph
```

To install the missing dependencies and finish the installation of the Memgraph
package, just issue the following command:

```console
sudo apt-get install -f
```

The above command will install all missing dependencies and finish configuring
the Memgraph package. 
+::: + +To verify that Memgraph is running, run the following: + +```console +sudo journalctl --unit memgraph +``` + +If successful, you should receive an output similar to the following: + +```console +You are running Memgraph vX.X.X +``` + +If the Memgraph database instance is not running, you can start it explicitly: + +```console +sudo systemctl start memgraph +``` + +If you want to start Memgraph with different configuration settings, check out +the [Configuration section](#configuration). At this point, Memgraph is ready for you +to [submit queries](/connect-to-memgraph/overview.mdx). + +## Stopping Memgraph + +To shut down the Memgraph server, issue the following command: + +```console +sudo systemctl stop memgraph +``` + +## Configuration + +The Memgraph configuration is available in `/etc/memgraph/memgraph.conf`. If the +configuration file is altered, Memgraph needs to be restarted. To learn about +all the configuration options, check out the [Reference +guide](/reference-guide/configuration.md). + +## Troubleshooting + +### Unable to install the Memgraph package with `dpkg` + +While running the following `dpkg` command: + +```bash +dpkg -i /path-to/memgraph_.deb +``` + +you may encounter errors that resemble the following: + +```console +dpkg: error processing package memgraph (--install): dependency problems - +leaving unconfigured Errors were encountered while processing: memgraph +``` + +These errors indicate that you don’t have all of the necessary dependencies +installed. To install the missing dependencies and finish the installation, +issue the following command: + +```console +sudo apt-get install -f +``` + +### Multiple notes when starting Memgraph + +When you start a Memgraph instance, you may see the following list of notes in +your terminal: + +```console +You are running Memgraph v1.4.0-community + +NOTE: Please install networkx to be able to use graph_analyzer module. 
Using Python: +3.8.2 (default, Jul 16 2020, 14:00:26) +[GCC 9.3.0] + +NOTE: Please install networkx to be able to use Memgraph NetworkX wrappers. Using Python: +3.8.2 (default, Jul 16 2020, 14:00:26) +[GCC 9.3.0] + +NOTE: Please install networkx, numpy, scipy to be able to use proxied NetworkX algorithms. E.g., CALL nxalg.pagerank(...). +Using Python: +3.8.2 (default, Jul 16 2020, 14:00:26) +[GCC 9.3.0] + +NOTE: Please install networkx to be able to use wcc module. +Using Python: +3.8.2 (default, Jul 16 2020, 14:00:26) +[GCC 9.3.0] +``` + +If you wish to work with built-in NetworkX modules in Memgraph, you need to +install the following Python libraries: +* [NumPy](https://numpy.org/) +* [SciPy](https://www.scipy.org/) +* [NetworkX](https://networkx.org/) + +For more information on how to install Python libraries in Linux, follow the +[Installing Packages +guide](https://packaging.python.org/tutorials/installing-packages/). If you are +not interested in working with query modules that depend on these libraries, you +can ignore the warnings. + +For more information on the installation process and for additional questions, +visit the **[Help Center](/help-center)** page. + +## Where to next? + +To learn how to query the database, take a look at the +**[querying](/connect-to-memgraph/overview.mdx)** guide or **[Memgraph +Playground](https://playground.memgraph.com/)** for interactive tutorials.
+Visit the **[Drivers overview](/connect-to-memgraph/drivers/overview.md)** +page if you need to connect to the database programmatically. \ No newline at end of file diff --git a/docs2/getting-started/install-memgraph/direct-download-links.md b/docs2/getting-started/install-memgraph/direct-download-links.md new file mode 100644 index 00000000000..cf4a36f02be --- /dev/null +++ b/docs2/getting-started/install-memgraph/direct-download-links.md @@ -0,0 +1,62 @@ +--- +id: direct-download-links +title: Memgraph direct download links +sidebar_label: Direct download links +--- + +You can download all of the MemgraphDB packages from the [Memgraph download +hub](https://memgraph.com/download/). If you need direct links for the latest +version of MemgraphDB, take a look at the list below. + +## Docker + +- **Memgraph DB docker** - + [https://download.memgraph.com/memgraph/v2.9.0/docker/memgraph-2.9.0-docker.tar.gz](https://download.memgraph.com/memgraph/v2.9.0/docker/memgraph-2.9.0-docker.tar.gz) + +## Linux + +### Amazon Linux 2 + +- **Amazon Linux 2** - + [https://download.memgraph.com/memgraph/v2.9.0/amzn-2/memgraph-2.9.0_1-1.x86_64.rpm](https://download.memgraph.com/memgraph/v2.9.0/amzn-2/memgraph-2.9.0_1-1.x86_64.rpm) + + +### CentOS + +- **CentOS 7** - + [https://download.memgraph.com/memgraph/v2.9.0/centos-7/memgraph-2.9.0_1-1.x86_64.rpm](https://download.memgraph.com/memgraph/v2.9.0/centos-7/memgraph-2.9.0_1-1.x86_64.rpm) +- **CentOS 9** - + [https://download.memgraph.com/memgraph/v2.9.0/centos-9/memgraph-2.9.0_1-1.x86_64.rpm](https://download.memgraph.com/memgraph/v2.9.0/centos-9/memgraph-2.9.0_1-1.x86_64.rpm) + +### Debian + +- **Debian 10** - + [https://download.memgraph.com/memgraph/v2.9.0/debian-10/memgraph_2.9.0-1_amd64.deb](https://download.memgraph.com/memgraph/v2.9.0/debian-10/memgraph_2.9.0-1_amd64.deb) +- **Debian 11** - + 
[https://download.memgraph.com/memgraph/v2.9.0/debian-11/memgraph_2.9.0-1_amd64.deb](https://download.memgraph.com/memgraph/v2.9.0/debian-11/memgraph_2.9.0-1_amd64.deb)
- **Debian 11 (ARM64/AArch64)** -
  [https://download.memgraph.com/memgraph/v2.9.0/debian-11-aarch64/memgraph_2.9.0-1_arm64.deb](https://download.memgraph.com/memgraph/v2.9.0/debian-11-aarch64/memgraph_2.9.0-1_arm64.deb)


### Fedora

- **Fedora 36** - [https://download.memgraph.com/memgraph/v2.9.0/fedora-36/memgraph-2.9.0_1-1.x86_64.rpm](https://download.memgraph.com/memgraph/v2.9.0/fedora-36/memgraph-2.9.0_1-1.x86_64.rpm)

### RedHat

- **RedHat 7** -
  [https://download.memgraph.com/memgraph/v2.9.0/centos-7/memgraph-2.9.0_1-1.x86_64.rpm](https://download.memgraph.com/memgraph/v2.9.0/centos-7/memgraph-2.9.0_1-1.x86_64.rpm)
- **RedHat 9** -
  [https://download.memgraph.com/memgraph/v2.9.0/centos-9/memgraph-2.9.0_1-1.x86_64.rpm](https://download.memgraph.com/memgraph/v2.9.0/centos-9/memgraph-2.9.0_1-1.x86_64.rpm)


### Ubuntu

- **Ubuntu 18.04** -
  [https://download.memgraph.com/memgraph/v2.9.0/ubuntu-18.04/memgraph_2.9.0-1_amd64.deb](https://download.memgraph.com/memgraph/v2.9.0/ubuntu-18.04/memgraph_2.9.0-1_amd64.deb)
- **Ubuntu 20.04** -
  [https://download.memgraph.com/memgraph/v2.9.0/ubuntu-20.04/memgraph_2.9.0-1_amd64.deb](https://download.memgraph.com/memgraph/v2.9.0/ubuntu-20.04/memgraph_2.9.0-1_amd64.deb)
- **Ubuntu 22.04** -
  [https://download.memgraph.com/memgraph/v2.9.0/ubuntu-22.04/memgraph_2.9.0-1_amd64.deb](https://download.memgraph.com/memgraph/v2.9.0/ubuntu-22.04/memgraph_2.9.0-1_amd64.deb)
- **Ubuntu 22.04 (ARM64/AArch64)** -
  [https://download.memgraph.com/memgraph/v2.9.0/ubuntu-22.04-aarch64/memgraph_2.9.0-1_arm64.deb](https://download.memgraph.com/memgraph/v2.9.0/ubuntu-22.04-aarch64/memgraph_2.9.0-1_arm64.deb)

diff --git a/docs2/getting-started/install-memgraph/docker-compose.md 
b/docs2/getting-started/install-memgraph/docker-compose.md new file mode 100644 index 00000000000..38bd135babe --- /dev/null +++ b/docs2/getting-started/install-memgraph/docker-compose.md @@ -0,0 +1,150 @@ +--- +id: docker-compose +title: Docker Compose +sidebar_label: Docker Compose +--- + +If you define an application with **Docker Compose**, you can use that +definition to run the application in CI, staging, or production environments. +Here you can find `docker-compose.yml` files necessary to run [**Memgraph +Platform**](#docker-compose-for-memgraph-platform-image), [**Memgraph +MAGE**](#docker-compose-for-memgraph-mage-image) and +[**Memgraph**](#docker-compose-for-memgraph-image) images. + +[![Related - How-to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/work-with-docker.md) + +## Docker Compose for Memgraph Platform image + +The **Memgraph Platform** image contains: + +- **MemgraphDB** - the database that holds your data +- **Memgraph Lab** - visual user interface for running queries and visualizing + graph data +- **mgconsole** - command-line interface for running queries +- **MAGE** - graph algorithms and modules library + +```yaml +version: "3" +services: + memgraph-platform: + image: "memgraph/memgraph-platform" + ports: + - "7687:7687" + - "3000:3000" + - "7444:7444" + volumes: + - mg_lib:/var/lib/memgraph + - mg_log:/var/log/memgraph + - mg_etc:/etc/memgraph + environment: + - MEMGRAPH="--log-level=TRACE" + entrypoint: ["/usr/bin/supervisord"] +volumes: + mg_lib: + mg_log: + mg_etc: +``` + +The port `7687` is used for communication with Memgraph via Bolt protocol. The +port `3000` is exposed because Memgraph Lab will be running on `localhost:3000`, +while the port `7444` is there so that you can see logs from Memgraph inside +Memgraph Lab. 
We specified three useful volumes: + +- `mg_lib` - directory containing data that enables data persistency +- `mg_log` - directory containing log files +- `mg_etc` - directory containing the configuration file + +The exact location of the local directories depends on your specific setup. + +[Configuration settings](/reference-guide/configuration.md) can be changed by +setting the value of the `MEMGRAPH` environment variable. In the above example, +you can see how to set `--log-level` to `TRACE`. Since Memgraph Platform is not +a single service, the process manager +[`supervisord`](https://docs.docker.com/config/containers/multi-service_container/) +is used as the main running process in the `entrypoint`. The MAGE library is +included in this image, so you can use the available graph algorithms. + +## Docker Compose for Memgraph MAGE image + +The **Memgraph MAGE** image contains: + +- **MemgraphDB** - the database that holds your data +- **MAGE** - graph algorithms and modules library + +```yaml +version: "3" +services: + memgraph-mage: + image: "memgraph/memgraph-mage" + volumes: + - mg_lib:/var/lib/memgraph + - mg_log:/var/log/memgraph + - mg_etc:/etc/memgraph + ports: + - "7687:7687" + - "7444:7444" + entrypoint: ["/usr/lib/memgraph/memgraph", "--log-level=TRACE"] +volumes: + mg_lib: + mg_log: + mg_etc: +``` + +The port `7687` is used for communication with Memgraph via Bolt protocol, while +the port `7444` is there so that you can see logs from Memgraph inside the +Memgraph Lab application. We specified three useful volumes: + +- `mg_lib` - directory containing data that enables data persistency +- `mg_log` - directory containing log files +- `mg_etc` - directory containing the configuration file + +The exact location of the local directories depends on your specific setup. + +[Configuration settings](/reference-guide/configuration.md) can be changed by +adding the `entrypoint`. 
You first need to add `/usr/lib/memgraph/memgraph` and +then the configuration setting you'd like to change. In the above example, you +can see how to set `--log-level` to `TRACE`. Since the MAGE library is included +in this image, you can use the available graph algorithms. + +## Docker Compose for Memgraph image + +The **Memgraph** image contains **MemgraphDB** - the database that holds your +data. + +```yaml +version: "3" +services: + memgraph: + image: "memgraph/memgraph" + ports: + - "7687:7687" + - "7444:7444" + volumes: + - mg_lib:/var/lib/memgraph + - mg_log:/var/log/memgraph + - mg_etc:/etc/memgraph + entrypoint: ["/usr/lib/memgraph/memgraph", "--log-level=TRACE"] +volumes: + mg_lib: + mg_log: + mg_etc: +``` + +The port `7687` is used for communication with Memgraph via Bolt protocol, while +the port `7444` is there so that you can see logs from Memgraph inside the +Memgraph Lab application. We specified three useful volumes: + +- `mg_lib` - directory containing data that enables data persistency +- `mg_log` - directory containing log files +- `mg_etc` - directory containing the configuration file + +The exact location of the local directories depends on your specific setup. + +[Configuration settings](/reference-guide/configuration.md) can be changed by +adding the `entrypoint`. You first need to add `/usr/lib/memgraph/memgraph` and +then the configuration setting you'd like to change. In the above example, you +can see how to set `--log-level` to `TRACE`. Since this image doesn't have the +MAGE library included, you won't be able to use graph algorithms. + +> Want to see applications built with Memgraph and Docker Compose? Check out +> [Memgraph's Github](https://github.com/memgraph) repositories. 
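Once one of the `docker-compose.yml` files above is saved in the current directory, the stack is managed with the usual Compose commands. This is a minimal sketch; the service name must match the one in your file (`memgraph-platform`, `memgraph-mage` or `memgraph` in the examples above), and older Docker installations use the standalone `docker-compose` binary instead of the `docker compose` plugin:

```shell
# Start the services defined in docker-compose.yml in the background
docker compose up -d

# Follow the logs of the service (here: memgraph) to confirm the database started
docker compose logs -f memgraph

# Stop and remove the containers; the named volumes (mg_lib, mg_log, mg_etc)
# are kept, so the data survives the next `docker compose up`
docker compose down
```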
diff --git a/docs2/getting-started/install-memgraph/docker.md b/docs2/getting-started/install-memgraph/docker.md new file mode 100644 index 00000000000..9427fc3ceed --- /dev/null +++ b/docs2/getting-started/install-memgraph/docker.md @@ -0,0 +1,451 @@
+# Install Memgraph with Docker
+
+[Docker](https://www.docker.com) is a service that uses OS-level virtualization
+to deliver software in packages that are called
+[containers](https://www.docker.com/resources/what-container).
+
+Memgraph uses Docker because it is:
+
+- Flexible
+- Lightweight
+- Portable - you can build locally or deploy to the cloud
+- Cross-platform - it runs on Windows, Linux and macOS
+- Deployable in Kubernetes
+
+We recommend installing the **Memgraph Platform** Docker image, which contains:
+- **MemgraphDB** - the database that holds your data
+- **Memgraph Lab** - visual user interface for running queries and visualizing
+  graph data
+- **mgconsole** - command-line interface for running queries
+- **MAGE** - graph algorithms and modules library
+
+1. Install [**Docker Desktop**](https://docs.docker.com/get-docker/).
+2. Open a terminal and use the following command:
+
+   ```console
+   docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -v mg_lib:/var/lib/memgraph memgraph/memgraph-platform
+   ```
+
+   If successful, you should see a message similar to the following:
+
+   ```nocopy
+   mgconsole X.X
+   Connected to 'memgraph://127.0.0.1:7687'
+   Type :help for shell usage
+   Quit the shell by typing Ctrl-D(eof) or :quit
+   memgraph>
+   ```
+   The command-line tool **mgconsole** opens in the terminal, and the visual user
+   interface **Memgraph Lab** is available at [`http://localhost:3000`](http://localhost:3000).
+
+3. Use either the **mgconsole** CLI, the **Memgraph Lab** visual interface or various **clients** to connect
+   to and query the database.
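As a quick sanity check, you can also pipe a query into `mgconsole` inside the running container. This sketch assumes the container started above is the only one running the `memgraph/memgraph-platform` image, so the `--filter ancestor=...` lookup finds it:

```shell
# Find the ID of the running Memgraph Platform container
CONTAINER_ID=$(docker ps -q --filter ancestor=memgraph/memgraph-platform)

# Create a node, then read it back
echo "CREATE (n:TestNode {message: 'Hello, Memgraph!'});" | docker exec -i "$CONTAINER_ID" mgconsole
echo "MATCH (n:TestNode) RETURN n.message;" | docker exec -i "$CONTAINER_ID" mgconsole
```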
+
+
+The configuration file is stored in the `mg_etc` volume, usually located on the
+host at `/var/lib/docker/volumes/mg_etc/_data/memgraph.conf`, and the logs are
+stored in volumes under `/var/lib/docker/volumes/`.
+
+When using Docker, you can also specify the configuration options in the `docker
+run` command:
+
+```console
+docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -e MEMGRAPH="--log-level=TRACE" memgraph/memgraph-platform
+```
+
+:::caution
+
+When working with Memgraph Platform, you should pass configuration flags
+through environment variables.
+
+For example, you can start the MemgraphDB image with `docker run memgraph/memgraph
+--bolt-port=7687 --log-level=TRACE`, but you should start Memgraph Platform with
+`docker run -p 7687:7687 -p 7444:7444 -p 3000:3000 -e MEMGRAPH="--bolt-port=7687 --log-level=TRACE"
+memgraph/memgraph-platform`.
+
+:::
+
+To learn about all the configuration options, check out the [Reference
+guide](/reference-guide/configuration.md).
+
+## Other available Docker images
+
+- `memgraph/memgraph` - includes MemgraphDB and mgconsole
+- `memgraph/memgraph-mage` - includes MemgraphDB, mgconsole and MAGE
+- `memgraph/memgraph-mage` with cuGraph - includes MemgraphDB, mgconsole, MAGE and NVIDIA cuGraph GPU-powered graph algorithms
+- `memgraph/memgraph-platform` - includes MemgraphDB, mgconsole, MAGE and Memgraph Lab
+
+Memgraph Platform is also available as an image without MAGE - look for tags
+containing only the `memgraph` and `lab` keywords.
+
+## Install Memgraph using other Docker images
+
+**1.** Download the latest **Docker image** from the [Download
+Hub](https://memgraph.com/download/).
+ +**2.** Import the image using the following command, for example: + +```console +docker load -i /path-to/memgraph--docker.tar.gz +``` + +**3.** Start Memgraph using the following command: + +```console +docker run -p 7687:7687 -p 7444:7444 -v mg_lib:/var/lib/memgraph memgraph/memgraph +``` + +## Troubleshooting + +### Issues with loading Memgraph +```console +docker load -i memgraph.tar.gz +``` + +#### Error during connect:
`This error may indicate that the docker daemon is not running.` +Run the Docker Desktop application and wait for it to load fully. + +#### Error response from daemon:
`open \\.\pipe\docker_engine_linux: The system cannot find the file specified.` +Reload the Docker Desktop application and wait for it to load fully. + +#### Unsupported os linux + +You need to download the [Windows Subsystem for +Linux](https://docs.microsoft.com/en-gb/windows/wsl/install-win10#step-4---download-the-linux-kernel-update-package), +and enable experimental features in Docker Desktop, under *Settings* -> *Docker +Engine*, change *experimental* to *true*. + +### Issues when connecting to Memgraph + +```console +docker run -it memgraph/memgraph-platform +``` + +While this command will start a Memgraph instance, not publishing the port will +cause problems when trying to connect to the database via **Memgraph Lab** or +**mgconsole**. To avoid this, you should publish the +container's port to the host using the `-p` flag and by specifying the port: + +```console +docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 memgraph/memgraph-platform +``` + +### Issues with connecting **mgconsole** to the database + +```console +docker run -it --entrypoint=mgconsole memgraph/memgraph-platform --host HOST +``` + +Although unlikely, sometimes there are issues with connecting **mgconsole** to +the Docker Container’s IP address because it is running on a custom IP rather +than `localhost`. This problem is most often accompanied with the following +error: + +```console +Connection failure: Couldn't connect to 127.0.0.1:7687! +``` + +To fix this issue, just replace `HOST` from the first command with +`host.docker.internal`. To find out more about networking in Docker, take a look +at [Networking features in Docker Desktop for +Windows](https://docs.docker.com/docker-for-windows/networking/) guide or +[Mac](https://docs.docker.com/docker-for-mac/networking/) guide . + +### Issues with the IP address + +Although unlikely, some users might experience minor difficulties after the +Docker installation. 
Instead of running on `localhost`, a Docker container for
+Memgraph may be running on a custom IP address. Fortunately, that IP address can
+be found as follows:
+
+**1.** Determine the ID of the Memgraph container by running the
+command `docker ps`. You should get output similar to the following:
+
+```console
+CONTAINER ID    IMAGE       COMMAND                  CREATED
+9397623cd87e    memgraph    "/usr/lib/memgraph/m…"   2 seconds ago
+```
+
+At this point, note the container ID of the Memgraph container. In this case,
+it is `9397623cd87e`.
+
+**2.** Use this ID to retrieve the IP address of the container:
+
+```console
+docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' 9397623cd87e
+```
+
+The command above will yield the IP address that should be used when connecting
+to Memgraph via **Memgraph Lab** or **mgconsole** as described in
+the [querying](/connect-to-memgraph/overview.mdx) section. Just replace
+`HOST` from the following command with the appropriate IP address:
+
+```console
+docker run -it --entrypoint=mgconsole memgraph/memgraph-platform --host HOST
+```
+
+## Work with Docker
+
+If you are new to Docker, this guide will help you get a grasp of Docker and
+make it easier to accomplish tasks within Memgraph. After installing Docker, all
+commands are run from the command-line tool of your preference.
+
+### Download the Memgraph Docker image
+
+Images are downloaded using the `docker pull` command followed by the name of
+the Docker image. We encourage you to use Memgraph Platform as it includes
+everything you might need to make the best of Memgraph.
+
+To download the latest Memgraph Platform image, run:
+
+```
+docker pull memgraph/memgraph-platform
+```
+
+### Architecture of Docker container running Memgraph
+
+The picture below shows the architecture of the Memgraph Docker ecosystem.
+
+
+### Run a Memgraph Docker image
+
+All images are started using the `docker run` command followed by various flags,
+environment variables and configuration options.
+
+The most common flags used while running Memgraph images are:
+
+- enable interactive mode: `-it`
+- publish ports: `-p 3000:3000`
+- specify volumes for data persistence: `-v mg_lib:/var/lib/memgraph`
+- set up configuration using environment variables in the case of the
+  `memgraph-platform` image, or configuration flags using the `memgraph` or
+  `memgraph-mage` image
+
+A `docker run` command can look like this:
+
+```
+docker run -it -p 7687:7687 [-p host_port:container_port] [-v volume_name:volume_path] [configuration] memgraph/image_name
+```
+
+#### Publish ports
+
+Ports are published to allow services outside the container to connect to the
+container. When publishing ports, you need to define two ports separated by a
+colon. The left side port stands for the **host port** and the right side port
+is the **container port**.
+
+The most common ports published while running Memgraph are:
+
+- `-p 7687:7687` - connection to the database instance (the Bolt protocol uses
+  this port by default)
+- `-p 3000:3000` - connection to the Memgraph Lab application when running
+  Memgraph Platform
+- `-p 7444:7444` - connection to fetch log files from Memgraph Lab (only in
+  version 2.0 and newer)
+
+For example, suppose you are running two instances using the `memgraph-platform`
+image and you want to connect to both instances using the Memgraph Lab
+in-browser application.
You would run the first instance with:
+
+```
+docker run -it -p 7444:7444 -p 3000:3000 memgraph/memgraph-platform
+```
+
+Because port `3000` is now taken, you need to change the left side port (host
+port):
+
+```
+docker run -it -p 7444:7444 -p 3001:3000 memgraph/memgraph-platform
+```
+
+To connect to the first instance, you should open Memgraph Lab in your browser
+at `localhost:3000`, while the second instance is reachable at `localhost:3001`.
+
+#### Specify volumes
+
+Specifying a volume syncs a directory inside the Docker container with a local
+directory and provides durability. The `-v` flag is followed by the name of the
+local directory, separated from the path of the volume in the container by a
+colon:
+
+```
+-v volume_name:volume_path
+```
+
+Named volumes handle data permissions, so there shouldn't be any permission
+issues.
+
+Useful volumes you can specify while running Memgraph are:
+
+- `-v mg_lib:/var/lib/memgraph` - directory containing data, enables data
+  persistency
+- `-v mg_log:/var/log/memgraph` - directory containing log files
+- `-v mg_etc:/etc/memgraph` - directory containing the configuration file
+
+The exact location of the local directories depends on your specific setup.
+
+The configuration file can usually be found at
+`/var/lib/docker/volumes/mg_etc/_data/memgraph.conf` but you can also copy the
+file from the Docker container, modify it and copy it back into the container.
+
+The logs will be saved to the `mg_log` volume, and the directories can usually be
+found in `/var/lib/docker/volumes/`, but you can also view them in the Memgraph
+Lab 2.0 (or newer) by publishing the port `7444`.
+
+#### Specify bind mounts
+
+Bind mounts are local directories or files that can be modified by processes
+other than Docker. Any changes made to these directories or files locally will
+be reflected inside the Docker container and vice-versa.
Also, a
+bind mount will overwrite the content of the Docker container.
+
+For example, if you have a `data` directory on your `C:` disk, and you want to
+access it from inside the container at `/usr/lib/memgraph/data`, you would run
+Docker with the following `-v` flag:
+
+```
+docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -v "C:/data":/usr/lib/memgraph/data memgraph/memgraph-platform
+```
+
+You can use bind mounts to transfer durability files such as snapshot or
+write-ahead log (WAL) files inside the container to restore data, or CSV files
+you will use to import data with the `LOAD CSV` clause.
+
+Bind mounts do not handle data permissions, which can cause permission issues.
+
+#### Set up the configuration
+
+If you want a certain configuration setting to be applied during this run only,
+you need to pass the configuration option within the `docker run` command
+instead of changing the configuration file.
+
+If you are working with the `memgraph-platform` image, you should pass
+configuration options with environment variables.
+
+For example, if you want to limit memory usage for the whole instance to 50 MiB
+and set the log level to `TRACE`, pass the configuration like this:
+
+```
+docker run -it -p 7687:7687 -p 3000:3000 -p 7444:7444 -e MEMGRAPH="--memory-limit=50 --log-level=TRACE" memgraph/memgraph-platform
+```
+
+When you are working with `memgraph` or `memgraph-mage` images, you should pass
+configuration options as arguments.
+
+For example, if you want to limit memory usage for the whole instance to 50 MiB
+and set the log level to `TRACE`, pass the configuration argument like this:
+
+```
+docker run -it -p 7687:7687 memgraph/memgraph --memory-limit=50 --log-level=TRACE
+```
+
+### Stop container
+
+Database instances are stopped by stopping the Docker container with the command
+`docker stop`. To stop a container you need [to know the container's
+ID](#how-to-retrieve-a-docker-container-id).
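For example, assuming a single container was started from the `memgraph/memgraph-platform` image, you could look up its ID and stop it in one step:

```shell
# Look up the ID of the container started from the platform image
CONTAINER_ID=$(docker ps -q --filter ancestor=memgraph/memgraph-platform)

# docker stop sends SIGTERM and, after a grace period, SIGKILL
docker stop "$CONTAINER_ID"
```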
+
+
+You can list all the containers you want to stop in one `docker stop` command:
+
+```
+docker stop CONTAINER1_ID CONTAINER2_ID
+```
+
+### Start container
+
+If you want to start a stopped container, list them using the following command:
+
+```
+docker ps -a
+```
+
+Then start the container with:
+```
+docker start CONTAINER_ID
+```
+
+### Retrieve a Docker container ID
+
+Run the following command:
+
+```
+docker ps
+```
+
+You should get an output similar to this:
+
+```console
+CONTAINER ID   IMAGE                        COMMAND                  CREATED        STATUS        PORTS                                                                    NAMES
+45fa0f86f826   memgraph/memgraph-platform   "/bin/sh -c '/usr/bi…"   21 hours ago   Up 21 hours   0.0.0.0:3000->3000/tcp, 0.0.0.0:7444->7444/tcp, 0.0.0.0:7687->7687/tcp   admiring_almeida
+```
+
+You can shorten this ID to 4 letters if the ID remains unique, for example,
+`45fa`.
+
+### Retrieve a Docker container IP address
+
+To retrieve the Docker container IP address, first, you need [to retrieve its
+ID](#how-to-retrieve-a-docker-container-id).
+
+Then run the following command if the container ID is `9397623cd87e`.
+
+```console
+docker inspect -f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' 9397623cd87e
+```
+
+### Browse files inside a Docker container
+
+To browse files inside a Docker container, first, you need [to retrieve its
+ID](#how-to-retrieve-a-docker-container-id).
+
+Then run the following command if the container ID is `9397623cd87e`:
+
+```
+docker exec -it 9397623cd87e bash
+```
+
+To navigate through the container, use the following commands:
+
+- `ls` - list all the directories and files
+- `cd directory_name` - enter a directory
+- `cd ..` - leave the directory
+- `cat file_name` - list the content of a file
+
+You don't have to write file and directory names in full; once you have typed
+enough letters to form a unique string, press the `TAB` key.
+
+## Copy files from and to a Docker container
+
+To copy files from and to the Docker container, first, you need [to retrieve its
+ID](#how-to-retrieve-a-docker-container-id).
+
+1.
Navigate to the local directory where you want to copy the file.
+
+2. Copy the file from the container to your current directory with the command:
+
+   ```
+   docker cp CONTAINER_ID:path_to_file_in_container file_name
+   ```
+
+   Be sure to replace the `CONTAINER_ID` parameter.
+
+   The example below will copy the configuration file to the user's Desktop:
+
+   ```
+   C:\Users\Vlasta\Desktop>docker cp bb3de2634afe:/etc/memgraph/memgraph.conf memgraph.conf
+   ```
+
+3. Copy the file from your current directory to the container with the command:
+
+   ```
+   docker cp file_name CONTAINER_ID:path_to_file_in_container
+   ```
+
+   Be sure to replace the `CONTAINER_ID` parameter.
+
+   The example below will replace the configuration file with the one from the
+   user's Desktop:
+
+   ```
+   C:\Users\Vlasta\Desktop>docker cp memgraph.conf bb3de2634afe:/etc/memgraph/memgraph.conf
+   ```
diff --git a/docs2/getting-started/install-memgraph/install-memgraph.md b/docs2/getting-started/install-memgraph/install-memgraph.md new file mode 100644 index 00000000000..31bef757125 --- /dev/null +++ b/docs2/getting-started/install-memgraph/install-memgraph.md @@ -0,0 +1,35 @@
+# Install Memgraph
+
+Install Memgraph Platform and get the complete graph application platform that includes:
+
+- MemgraphDB - the database that holds your data
+- Memgraph Lab - visual user interface for running queries and visualizing graph data
+- mgconsole - command-line interface for running queries
+- MAGE - graph algorithms and modules library
+
+Open a terminal and use the following command:
+
+```console
+docker run -it -p 7687:7687 -p 7444:7444 -p 3000:3000 -v mg_lib:/var/lib/memgraph memgraph/memgraph-platform
+```
+
+For more details on the Docker installation or other installation options, check
+the installation guide.
+
+Don't want to bother with installation? No problem - register for Memgraph Cloud
+and run an instance in a few easy steps.
+
+## System requirements
+
+Below are minimum and recommended system requirements for installing Memgraph.
+
+
+|         | Minimum  | Recommended                    |
+| ------- | -------- | ------------------------------ |
+| CPU     | Server or desktop processor: Intel Xeon, AMD Opteron/Epyc, ARM machines or Apple M1, Amazon Graviton | Server processor: Intel Xeon, AMD Opteron/Epyc, ARM machines or Apple M1, Amazon Graviton |
+| RAM     | 1 GB     | β‰₯ 16 GB ECC                    |
+| Disk    | 1 GB     | same as the amount of RAM      |
+| Cores   | 1 vCPU   | β‰₯ 8 vCPUs (β‰₯ 4 physical cores) |
+| Network | 100 Mbps | β‰₯ 1 Gbps                       |
+
+The disk is used for storing database snapshots and write-ahead logs.
+
diff --git a/docs2/getting-started/install-memgraph/kubernetes.md b/docs2/getting-started/install-memgraph/kubernetes.md new file mode 100644 index 00000000000..b9eff9af039 --- /dev/null +++ b/docs2/getting-started/install-memgraph/kubernetes.md @@ -0,0 +1,87 @@
+---
+id: kubernetes
+title: Kubernetes
+sidebar_label: Kubernetes
+---
+
+To include **standalone Memgraph** as a part of your Kubernetes cluster, you can use the Helm chart provided in the [**Memgraph Helm charts repository**](https://github.com/memgraph/helm-charts). Due to numerous possible use cases and deployment setups via Kubernetes, the provided Helm chart is just a starting point you can modify according to your needs.
+
+
+Memgraph Helm charts repository currently contains a chart for [**standalone Memgraph deployment**](#helm-chart-for-standalone-memgraph) as a Kubernetes `StatefulSet` workload, which is designed for services that require permanent storage, such as databases.
+
+:::note
+The currently available Helm chart uses the latest **Memgraph** Docker image from the [Docker Hub](https://hub.docker.com/r/memgraph/memgraph). For other Memgraph Docker images (Memgraph MAGE or Memgraph Platform), modify the chart accordingly. We are eager to see new pull requests on our [helm charts repository](https://github.com/memgraph/helm-charts).
+
+:::
+
+## Helm chart for standalone Memgraph
+
+
+
+Since the [Helm chart for standalone Memgraph](https://github.com/memgraph/helm-charts/tree/main/charts/memgraph) is configured to deploy Memgraph as a Kubernetes `StatefulSet` workload, it is also necessary to define `PersistentVolumeClaims` to store [the data directory](/reference-guide/backup.md) (`/var/lib/memgraph`). This enables the data to be persisted even if the pod is restarted or deleted.
+
+If you don't require data persistency or your dataset is static, there is no need to use the `StatefulSet` workload. Stateful applications are more complex to set up and maintain as they require more attention when handling storage information and security.
+
+To include standalone Memgraph as a part of your Kubernetes cluster, you need to [**add the repository**](#add-the-repository) and [**install Memgraph**](#install-memgraph).
+
+### Add the repository
+
+Add the Memgraph Helm chart repository to your local Helm setup by running the following command:
+
+```
+helm repo add memgraph https://memgraph.github.io/helm-charts
+```
+
+Make sure to update the repository to fetch the latest Helm charts available:
+
+```
+helm repo update
+```
+
+### Install Memgraph
+
+To install the Memgraph Helm chart, run the following command:
+```
+helm install release-name memgraph/memgraph
+```
+Replace `release-name` with the name of the release you chose.
+
+### Access Memgraph
+Once Memgraph is installed, you can access it using the provided services and endpoints. Refer to the [Memgraph documentation](/docs/connect-to-memgraph/overview.mdx) for details on how to connect to and interact with Memgraph.
+
+### Configuration options
+The following table lists the configurable parameters of the Memgraph chart and their default values.
+
+
+parameter | description | default
+--- | --- | ---
+`image` | Memgraph Docker image repository | `memgraph`
+`persistentVolumeClaim.storagePVC` | Enable persistent volume claim for storage | `true`
+`persistentVolumeClaim.storagePVCSize` | Size of the persistent volume claim for storage | `1Gi`
+`persistentVolumeClaim.logPVC` | Enable persistent volume claim for logs | `true`
+`persistentVolumeClaim.logPVCSize` | Size of the persistent volume claim for logs | `256Mi`
+`service.type` | Kubernetes service type | `NodePort`
+`service.port` | Kubernetes service port | `7687`
+`service.targetPort` | Kubernetes service target port | `7687`
+`memgraphConfig` | Memgraph configuration settings | `["--also-log-to-stderr=true"]`
+
+To change the default chart values, provide your own `values.yaml` file during the installation:
+```
+helm install release-name memgraph/memgraph -f values.yaml
+```
+Default chart values can also be changed by setting the values of appropriate parameters:
+```
+helm install release-name memgraph/memgraph --set parameter1=value1,parameter2=value2,...
+```
+
+:::info
+Memgraph will start with the `--also-log-to-stderr=true` flag, meaning the logs will also be written to the standard error output and you can access logs using the `kubectl logs` command. To modify other Memgraph database settings, you should update the `memgraphConfig` parameter. It should be a list of strings defining the values of Memgraph configuration settings. For example, this is how you can define the `memgraphConfig` parameter in your `values.yaml`:
+```
+memgraphConfig:
+  - "--also-log-to-stderr=true"
+  - "--log-level=TRACE"
+```
+For all available database settings, refer to the [Configuration settings reference guide](https://memgraph.com/docs/memgraph/reference-guide/configuration).
+::: + +:::note +Since Memgraph Docker image has root privileges on the data located on volumes and log directories, it is necessary that `runAsUser` is set to `0` in the `securityContext` section of the pod to override the `memgraph` user from the Docker image. Currently, Memgraph must have root privileges on the volumes. +::: diff --git a/docs2/getting-started/install-memgraph/memgraph-cloud.md b/docs2/getting-started/install-memgraph/memgraph-cloud.md new file mode 100644 index 00000000000..31cd8d7cbd6 --- /dev/null +++ b/docs2/getting-started/install-memgraph/memgraph-cloud.md @@ -0,0 +1,918 @@ +--- +id: memgraph-cloud +title: Memgraph Cloud +sidebar_label: Memgraph Cloud +--- + +import EmbedYTVideo from '@site/src/components/EmbedYTVideo'; + +[Memgraph Cloud](https://memgraph.com/cloud) is a cloud service fully managed +on AWS and available in 6 geographic regions around the world. Memgraph Cloud +allows you to create projects with Enterprise instances of MemgraphDB from your +browser. The instances can use up to 32 GB of RAM and you can connect to them +using [Memgraph Lab](cloud-connect#connect-with-memgraph-lab), +[mgconsole](cloud-connect#connect-with-mgconsole) or various +[drivers](cloud-connect#connect-with-drivers). All connections use SSL +encryption with a self-signed certificate. + + +![Cloud-Img](../data/memgraph-cloud/cloud-img.svg) + + +Use Memgraph Cloud to stream data into Memgraph in real-time and run complex +graph algorithms and modules developed within the [MAGE](/docs/mage) repository, +such as +[PageRank](/docs/mage/algorithms/traditional-graph-analytics/pagerank-algorithm), +[Community +detection](/docs/mage/algorithms/traditional-graph-analytics/community-detection-algorithm) +or [Betweenness +centrality](/docs/mage/algorithms/traditional-graph-analytics/betweenness-centrality-algorithm). +You can also extend the Cypher query language by developing your own procedures +within query modules in Memgraph Lab. 
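For illustration, MAGE algorithms are invoked as Cypher procedures once you are connected to the instance. In this sketch, `HOST` and `PASSWORD` are placeholders for your Cloud instance's address and project password, and the `name` property on nodes is an assumption about your dataset:

```shell
# Run PageRank through mgconsole over an SSL-encrypted connection
echo "CALL pagerank.get() YIELD node, rank
      RETURN node.name AS name, rank
      ORDER BY rank DESC LIMIT 10;" | \
  mgconsole --host HOST --port 7687 --username memgraph --password PASSWORD --use-ssl=true
```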
+
+
+Instances can be easily paused to save resources, backed up and cloned by
+creating snapshots, and they all use the Enterprise edition of Memgraph, which
+includes [role-based access control](cloud-projects/#role-base-access-control).
+
+As a new user, try out Memgraph Cloud in a 14-day free trial and provide us
+with feedback on [Discord](https://discord.com/invite/memgraph):
+
+1. Go to [Memgraph Cloud](https://cloud.memgraph.com).
+2. Log in with a Google account or create a Memgraph Cloud account.
+3. Give your project a name and provide a password.
+4. Your project is up and running - connect to the instance, import data and
+   start querying!
+
+If you are looking for a quick start, take a look at our short tutorial.
+
+
+For a more detailed explanation of Memgraph Cloud, take a look at the demo video made for the launch that will take you through its features:
+
+[memgraph_cloud](https://youtu.be/Tt5KPKylU8k?t=683 "Get started with Memgraph Cloud")
+
+## Cloud account
+
+Find out how to sign up for Memgraph Cloud, manage passwords and add a payment method.
+
+Feel free to watch a demo video made for the Cloud launch that will explain
+the Account section of Memgraph Cloud:
+
+[account-payment](https://youtu.be/Tt5KPKylU8k?t=941 "Account section")
+
+### Create Memgraph Cloud account
+
+To create a Memgraph Cloud account:
+
+1. Go to the [Memgraph Cloud sign-up](https://cloud.memgraph.com/signup) page.
+2. Provide your personal information, set up a password and accept the terms of
+   service.
+3. Verify your email address by clicking on the link in the email you got from
+   Memgraph.
+4. Before you start using Cloud, help us by choosing the programming language you
+   prefer. In return, we can direct our support better, and adding languages
+   that we haven't listed helps us make sure no user is left behind once a user
+   base is established.
+
+:::tip
+
+You can also register for Memgraph Cloud with your Google account.
+
+:::
+
+
+As a new user, you will start using a 14-day free trial version of Memgraph
+Cloud, in which you can create one project that uses up to 2GB RAM.
+
+If you require more compute, enter a valid payment method and upgrade your
+project.
+
+Below is a demo video made for the launch that will take you through setting up
+the Cloud account:
+
+[cloud-signup](https://youtu.be/Tt5KPKylU8k?t=683 "How to create Cloud account")
+
+### Change Memgraph Cloud password
+
+To change your Memgraph Cloud account password, log into your account and:
+
+1. Click **Account** in the left sidebar.
+2. In the **Payments section** tab, locate the **Personal information** section and
+   click the **Change password** link.
+3. In the pop-up, fill in the **Old Password** and **New Password**.
+4.
Click **Confirm** to save changes.
+
+### Retrieve Memgraph Cloud password
+
+If you forgot your Memgraph Cloud account password, you can reset it:
+
+1. Visit the [Forgot your
+   password](https://cloud.memgraph.com/reset-password-request) page.
+2. Enter your email address and click **Send recovery email**.
+3. Click on the link in the *Reset the password for Memgraph Cloud* email. It
+   will redirect you to the *Reset your password* page.
+4. Enter a new password and **Confirm changes**.
+
+### Retrieve Memgraph Cloud project password
+
+Each project within your Memgraph Cloud has its own password. The project
+password is not the same password you use to log in to Memgraph Cloud. Memgraph
+**doesn't have access** to those credentials and can't retrieve lost credentials
+for Memgraph Cloud projects.
+
+Below is a demo video made for the launch that will explain the importance of
+the Memgraph Cloud project password:
+
+[project-password](https://youtu.be/Tt5KPKylU8k?t=862 "Why is it important to remember your project password")
+
+### Manage payment methods
+
+In the **Account** section of Memgraph Cloud you can **Add Credit Card**,
+**Redeem Code** or switch to the **Invoices** tab to check paid and due invoices.
+
+For more details and current rates, visit the [payment](payment) section of the
+docs.
+
+## Cloud projects
+
+After you've created a Memgraph Cloud project, you can pause, resume, delete,
+back up, restore, clone and resize it.
+
+### Create a new Memgraph Cloud project
+
+If you are using a 14-day free trial version of Memgraph Cloud, you can create
+one project that uses up to 2 GB of RAM.
+
+If you are using a paid version of Memgraph Cloud, you can create a maximum of 3
+projects at the following [rates](payment). If you need more projects, feel
+free to [contact us](/help-center).
+
+To create a new project:
+
+1. Click **Projects** in the left sidebar.
+2. Click the **Add new** button.
+3. 
In the pop-up, enter the project name, choose the cloud region, size, and
+   Memgraph version, and click **Next**.
+4. Add a password for connecting to your Memgraph project and click **Next**.
+   Keep in mind that Memgraph can't retrieve this password if you lose it.
+5. Click **Go to project** to complete the project creation.
+
+Below is a demo video made for the launch that will take you through setting up
+a new Cloud project:
+
+[cloud-new-project](https://youtu.be/Tt5KPKylU8k?t=774 "How to create Cloud project")
+
+### Pause, resume or delete a project
+
+When you don't need compute, you can pause the project, and you won't be charged
+for compute as long as the project is paused. However, you will continue to be
+charged for storage.
+
+To pause a project:
+1. Click **Projects** in the left sidebar.
+2. Click on the project you want to pause.
+3. In the **Actions** section, click **Pause Project**.
+
+To resume a project:
+1. Click **Projects** in the left sidebar.
+2. Click on the project you want to resume.
+3. In the **Actions** section, click **Resume Project**.
+
+When you no longer need a specific project, you can delete it. Keep in mind that
+you can't undo this action.
+
+To delete a project:
+1. Click **Projects** in the left sidebar.
+2. Click on the project you want to delete.
+3. In the **Actions** section, click **Delete Project**.
+4. In the confirmation pop-up, click the **Confirm** button.
+
+Feel free to watch a demo video made for the Cloud launch that will explain
+the Projects section of Memgraph Cloud where you manage projects:
+
+[project-management](https://youtu.be/Tt5KPKylU8k?t=1029 "Projects section")
+
+### Back up a project
+
+A project is backed up by creating a snapshot with Amazon EBS. You cannot create
+a snapshot if you are using a 14-day free trial version of Memgraph Cloud.
+
+If you are using a paid version of Memgraph Cloud, you can create a maximum of 5
+snapshots at the following [rates](payment). 
If you need more snapshots, feel
+free to [contact us](/help-center).
+
+The size of the snapshot is 8 GB smaller than the disk size the project is
+using. If you are using 1 GB of RAM and 11 GB of disk, the snapshot size is 3 GB.
+
+To create a snapshot:
+1. Click **Projects** in the left sidebar.
+2. Click on the project you want to back up.
+3. In the **Actions** section, click **Create Snapshot**.
+4. In the pop-up, give the snapshot a name and **Create** it.
+
+You can manage your snapshots in the **Snapshots** view, where you can **Edit
+Name** or **Delete Snapshot**.
+
+### Restore or clone a project
+
+You can restore or clone projects from the snapshots you've created from
+existing projects.
+
+To restore or clone a project:
+1. Click **Snapshots** in the left sidebar.
+2. Click on the snapshot you want to use.
+3. In the **Actions** section, click **Reboot as Project**.
+4. In the pop-up, give the new project a name, set a password and select the
+   project size, then **RESTORE**.
+
+### Resize a project
+
+When your project becomes too big for the current compute, upgrade it:
+
+1. [Back up the project](#back-up-a-project) by creating a snapshot.
+2. [Clone the project](#restore-or-clone-a-project) to a bigger project.
+3. [Delete the smaller project](#pause-resume-or-delete-a-project).
+
+If your project is too small for the current compute, downgrade it:
+1. Export the database (using Memgraph Lab, mgconsole, GQLAlchemy, a driver or
+   any other tool).
+2. [Create a new Memgraph Cloud project](#create-a-new-memgraph-cloud-project).
+3. Use an appropriate tool to connect to the new project and import the database.
+
+### Role-based access control {#role-base-access-control}
+
+Memgraph Cloud project instances come with 3 roles: `admin`, `readonly` and
+`readwrite`.
+
+Users can belong to one of these three roles, and the admin can grant, deny or
+revoke a set of privileges for each of them, thereby addressing security
+concerns. 
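As an illustration, once connected as a user with the `admin` role, privileges can be managed with Cypher. The user name `alice` is hypothetical and these statements are a sketch, not a complete reference:

```
CREATE USER alice IDENTIFIED BY 'strong_password';
SET ROLE FOR alice TO readwrite;
GRANT MATCH, CREATE TO readwrite;
DENY DELETE TO readwrite;
```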
+
+Read more about [how to manage user
+privileges](/docs/memgraph/how-to-guides/manage-user-privileges).
+
+## Connect to Cloud instances
+
+You can connect to an instance running within the Memgraph Cloud project via
+**Memgraph Lab**, a visual interface, **mgconsole**, a command-line interface,
+or one of the many **drivers** below.
+
+Feel free to watch a demo video made for the Cloud launch that will explain
+how to connect to Memgraph using Memgraph Cloud:
+
+[paused-project](https://youtu.be/Tt5KPKylU8k?t=1233 "Connect to Memgraph from Memgraph Cloud")
+
+### Connect with Memgraph Lab
+
+Memgraph Lab comes in two flavors: as a desktop application and as an in-browser
+application.
+
+To connect using the in-browser application:
+1. Click **Projects** in the left sidebar.
+2. Locate the **Connect via client** section.
+3. Click the **Connect in browser** button to open Memgraph Lab in your browser.
+   The login form will be automatically filled with the connection data, except
+   for the password.
+
+To use the desktop version of Memgraph Lab:
+1. Download [Memgraph Lab](https://memgraph.com/download/#memgraph-lab).
+2. Open Memgraph Lab and switch to **Connect Manually**.
+3. Expand the **Advanced Settings** and fill out the connection fields with the
+   data from the **Connect via client** section of the Memgraph Cloud project.
+4. Enable SSL **Encryption** and **Connect now**.
+
+### Connect with CLI `mgconsole`
+
+To connect to Cloud via the command-line interface **mgconsole**:
+1. [Build **mgconsole** from source](https://github.com/memgraph/mgconsole) or
+   [download it](https://memgraph.com/download/#mgconsole).
+2. Run `mgconsole` with the `--host`, `--port`, `--username`, `--password` and
+   `--use-ssl` parameters set to the values provided in the **Connect via
+   console** section of the Memgraph Cloud project. 
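With the placeholder values used elsewhere on this page, the invocation looks like this:

```
mgconsole --host MEMGRAPH_HOST_ADDRESS --port 7687 \
          --username YOUR_MEMGRAPH_USERNAME --password YOUR_MEMGRAPH_PASSWORD \
          --use-ssl=true
```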
+
+### Connect with clients
+
+#### Python
+
+Step 1: Install the driver with `pip` or Poetry:
+
+```
+pip install gqlalchemy
+# or with Poetry: poetry add gqlalchemy
+```
+
+Step 2: Copy the following code and fill out the missing details (`YOUR_MEMGRAPH_PASSWORD`, `YOUR_MEMGRAPH_USERNAME` and `MEMGRAPH_HOST_ADDRESS`) before running it:
+
+```python
+from gqlalchemy import Memgraph
+
+MEMGRAPH_HOST = 'MEMGRAPH_HOST_ADDRESS'
+MEMGRAPH_PORT = 7687
+MEMGRAPH_USERNAME = 'YOUR_MEMGRAPH_USERNAME'
+# Place your Memgraph password that was created during Project creation
+MEMGRAPH_PASSWORD = 'YOUR_MEMGRAPH_PASSWORD'
+
+def hello_memgraph(host: str, port: int, username: str, password: str):
+    connection = Memgraph(host, port, username, password, encrypted=True)
+    results = connection.execute_and_fetch(
+        'CREATE (n:FirstNode { message: "Hello Memgraph from Python!" }) RETURN n.message AS message'
+    )
+    print("Created node with message:", next(results)["message"])
+
+if __name__ == "__main__":
+    hello_memgraph(MEMGRAPH_HOST, MEMGRAPH_PORT, MEMGRAPH_USERNAME, MEMGRAPH_PASSWORD)
+```
+
+Read more about it on the [GQLAlchemy Quick Start Guide](/gqlalchemy/how-to-guides).
+
+#### Rust
+
+The Rust driver `rsmgclient` is implemented as a wrapper around `mgclient`, the official Memgraph client library, so you will need to install `mgclient` before using `rsmgclient`.
+
+Step 1: Install `mgclient`, which is a C library interface for the Memgraph database. Follow the installation instructions from the [GitHub main page](https://github.com/memgraph/mgclient). 
+
+```
+git clone https://github.com/memgraph/mgclient
+# Install the library by following the GitHub installation instructions
+```
+
+Step 2: Add the following line to the `Cargo.toml` file under `[dependencies]`:
+
+```
+rsmgclient = "2.0.0"
+```
+
+Step 3: Copy the following code and fill out the missing details (`YOUR_MEMGRAPH_PASSWORD`, `YOUR_MEMGRAPH_USERNAME` and `MEMGRAPH_HOST_ADDRESS`) before running it:
+
+```rust
+use rsmgclient::{ConnectParams, Connection, MgError, Value, SSLMode};
+
+fn execute_query() -> Result<(), MgError> {
+    // Connect to Memgraph.
+    let connect_params = ConnectParams {
+        host: Some(String::from("MEMGRAPH_HOST_ADDRESS")),
+        port: 7687,
+        username: Some(String::from("YOUR_MEMGRAPH_USERNAME")),
+        password: Some(String::from("YOUR_MEMGRAPH_PASSWORD")),
+        sslmode: SSLMode::Require,
+        ..Default::default()
+    };
+    let mut connection = Connection::connect(&connect_params)?;
+
+    // Create a simple graph.
+    connection.execute_without_results(
+        "CREATE (p1:Person {name: 'Alice'})-[l1:Likes]->(m:Software {name: 'Memgraph'}) \
+         CREATE (p2:Person {name: 'John'})-[l2:Likes]->(m);",
+    )?;
+
+    // Fetch the graph.
+    let columns = connection.execute("MATCH (n)-[r]->(m) RETURN n, r, m;", None)?;
+    println!("Columns: {}", columns.join(", "));
+    for record in connection.fetchall()? {
+        for value in record.values {
+            match value {
+                Value::Node(node) => print!("{}", node),
+                Value::Relationship(edge) => print!("-{}-", edge),
+                value => print!("{}", value),
+            }
+        }
+        println!();
+    }
+    connection.commit()?;
+
+    Ok(())
+}
+
+fn main() {
+    if let Err(error) = execute_query() {
+        panic!("{}", error)
+    }
+}
+```
+
+Read more about it on the [Rust Quick Start Guide](/memgraph/connect-to-memgraph/drivers/rust).
+
+#### C++
+
+Step 1: Install `mgclient`, which is a C library interface for the Memgraph database. Follow the installation instructions from the [GitHub main page](https://github.com/memgraph/mgclient). 
+
+```
+git clone https://github.com/memgraph/mgclient
+# Install the library by following the GitHub installation instructions
+```
+
+Step 2: Copy the following code and fill out the missing details (`YOUR_MEMGRAPH_PASSWORD`, `YOUR_MEMGRAPH_USERNAME` and `MEMGRAPH_HOST_ADDRESS`) before running it:
+
+```cpp
+#include <cstdlib>
+#include <iostream>
+
+#include <mgclient.hpp>
+
+int main(int argc, char *argv[]) {
+  mg::Client::Init();
+
+  mg::Client::Params params;
+  params.host = "MEMGRAPH_HOST_ADDRESS";
+  params.port = 7687;
+  params.username = "YOUR_MEMGRAPH_USERNAME";
+  params.password = "YOUR_MEMGRAPH_PASSWORD";
+  params.use_ssl = true;
+  auto client = mg::Client::Connect(params);
+
+  if (!client) {
+    std::cerr << "Failed to connect!\n";
+    return 1;
+  }
+
+  if (!client->Execute("CREATE (n:FirstNode {message: 'Hello Memgraph from C++!'}) RETURN n")) {
+    std::cerr << "Failed to execute query!\n";
+    return 1;
+  }
+
+  while (const auto maybe_result = client->FetchOne()) {
+    const auto result = *maybe_result;
+    if (result.size() < 1) {
+      continue;
+    }
+    const auto value = result[0];
+    if (value.type() != mg::Value::Type::Node) {
+      continue;
+    }
+    const auto node = value.ValueNode();
+    const auto props = node.properties();
+    std::cout << "Created node: " << props["message"].ValueString() << std::endl;
+  }
+
+  // Deallocate the client because mg_finalize has to be called globally.
+  client.reset(nullptr);
+
+  mg::Client::Finalize();
+
+  return 0;
+}
+```
+
+#### Java
+
+Step 1: Add the following driver dependency in your `pom.xml` file:
+
+```xml
+<dependency>
+  <groupId>org.neo4j.driver</groupId>
+  <artifactId>neo4j-java-driver</artifactId>
+  <version>4.1.1</version>
+</dependency>
+```
+
+:::info
+If you want to use neo4j-java-driver v5, please connect to the local instance following the instructions on the [Java Quick Start Guide](/memgraph/connect-to-memgraph/drivers/java). 
+
+:::
+
+Step 2: Copy the following code and fill out the missing details (`YOUR_MEMGRAPH_PASSWORD`, `YOUR_MEMGRAPH_USERNAME` and `MEMGRAPH_HOST_ADDRESS`) before running it:
+
+```java
+import org.neo4j.driver.AuthTokens;
+import org.neo4j.driver.Driver;
+import org.neo4j.driver.GraphDatabase;
+import org.neo4j.driver.Session;
+import org.neo4j.driver.Result;
+import org.neo4j.driver.Transaction;
+import org.neo4j.driver.TransactionWork;
+
+import static org.neo4j.driver.Values.parameters;
+
+public class HelloMemgraph implements AutoCloseable
+{
+    private final Driver driver;
+
+    public HelloMemgraph( String uri, String username, String password )
+    {
+        driver = GraphDatabase.driver( uri, AuthTokens.basic( username, password ) );
+    }
+
+    public void close() throws Exception
+    {
+        driver.close();
+    }
+
+    public void createAndPrintNode( final String message )
+    {
+        try ( Session session = driver.session() )
+        {
+            String nodeMessage = session.writeTransaction( new TransactionWork<String>()
+            {
+                @Override
+                public String execute( Transaction tx )
+                {
+                    Result result = tx.run( "CREATE (n:FirstNode {message: $message}) " +
+                                            "RETURN id(n) AS nodeId, n.message AS message",
+                            parameters( "message", message ) );
+                    return result.single().get( 1 ).asString();
+                }
+            } );
+            System.out.println( "Created node: " + nodeMessage );
+        }
+    }
+
+    public static void main( String... args ) throws Exception
+    {
+        try ( HelloMemgraph program = new HelloMemgraph( "bolt+ssc://MEMGRAPH_HOST_ADDRESS:7687", "YOUR_MEMGRAPH_USERNAME", "YOUR_MEMGRAPH_PASSWORD" ) )
+        {
+            program.createAndPrintNode( "Hello Memgraph from Java!" );
+        }
+    }
+}
+```
+
+Read more about it on the [Java Quick Start Guide](/memgraph/connect-to-memgraph/drivers/java). 
+
+#### C#
+
+Step 1: Install the driver with Package Manager:
+
+```
+Install-Package Neo4j.Driver.Simple@4.4.0
+```
+
+:::info
+If you want to use Neo4j.Driver.Simple v5, please connect to the local instance following the instructions on the [C# Quick Start Guide](/memgraph/connect-to-memgraph/drivers/c-sharp).
+:::
+
+Step 2: Copy the following code and fill out the missing details (`YOUR_MEMGRAPH_PASSWORD`, `YOUR_MEMGRAPH_USERNAME` and `MEMGRAPH_HOST_ADDRESS`) before running it:
+
+```cs
+using System;
+using System.Linq;
+using Neo4j.Driver;
+
+namespace MemgraphApp
+{
+    public class Program : IDisposable
+    {
+        private readonly IDriver _driver;
+
+        public Program(string uri, string user, string password)
+        {
+            _driver = GraphDatabase.Driver(uri, AuthTokens.Basic(user, password));
+        }
+
+        public void CreateAndPrintNode(string message)
+        {
+            using (var session = _driver.Session())
+            {
+                var nodeMessage = session.WriteTransaction(tx =>
+                {
+                    var result = tx.Run("CREATE (n:FirstNode {message: $message}) " +
+                                        "RETURN id(n) AS nodeId, n.message AS message",
+                        new { message });
+                    return result.Single()[1].As<string>();
+                });
+                Console.WriteLine("Created node: " + nodeMessage);
+            }
+        }
+
+        public void Dispose()
+        {
+            _driver?.Dispose();
+        }
+
+        public static void Main()
+        {
+            using (var program = new Program("bolt+ssc://MEMGRAPH_HOST_ADDRESS:7687", "YOUR_MEMGRAPH_USERNAME", "YOUR_MEMGRAPH_PASSWORD"))
+            {
+                program.CreateAndPrintNode("Hello Memgraph from C#!");
+            }
+        }
+    }
+}
+```
+
+Read more about it on the [C# Quick Start Guide](/memgraph/connect-to-memgraph/drivers/c-sharp).
+
+#### Golang
+
+Step 1: Make sure your application has been set up to use go modules (there should be a `go.mod` file in your application root). 
Add the driver with:
+
+```
+go get github.com/neo4j/neo4j-go-driver/v5
+```
+
+Step 2: Copy the following code and fill out the missing details (`YOUR_MEMGRAPH_PASSWORD`, `YOUR_MEMGRAPH_USERNAME` and `MEMGRAPH_HOST_ADDRESS`) before running it:
+
+```go
+package main
+
+import (
+    "fmt"
+
+    "github.com/neo4j/neo4j-go-driver/v5/neo4j"
+)
+
+func main() {
+    dbUri := "bolt+ssc://MEMGRAPH_HOST_ADDRESS:7687"
+    driver, err := neo4j.NewDriver(dbUri, neo4j.BasicAuth("YOUR_MEMGRAPH_USERNAME", "YOUR_MEMGRAPH_PASSWORD", ""))
+    if err != nil {
+        panic(err)
+    }
+    defer driver.Close()
+
+    item, err := insertItem(driver)
+    if err != nil {
+        panic(err)
+    }
+    fmt.Printf("%v\n", item.Message)
+}
+
+func insertItem(driver neo4j.Driver) (*Item, error) {
+    session := driver.NewSession(neo4j.SessionConfig{})
+    defer session.Close()
+
+    result, err := session.WriteTransaction(createItemFn)
+    if err != nil {
+        return nil, err
+    }
+    return result.(*Item), nil
+}
+
+func createItemFn(tx neo4j.Transaction) (interface{}, error) {
+    records, err := tx.Run(
+        "CREATE (a:Greeting) SET a.message = $message RETURN 'Node ' + id(a) + ': ' + a.message",
+        map[string]interface{}{"message": "Hello Memgraph from Go!"})
+    if err != nil {
+        return nil, err
+    }
+    record, err := records.Single()
+    if err != nil {
+        return nil, err
+    }
+
+    return &Item{
+        Message: record.Values[0].(string),
+    }, nil
+}
+
+type Item struct {
+    Message string
+}
+```
+
+Read more about it on the [Go Quick Start Guide](/memgraph/connect-to-memgraph/drivers/go).
+
+#### PHP
+
+Step 1: Install the driver with Composer:
+
+```
+composer require stefanak-michal/memgraph-bolt-wrapper
+```
+
+Step 2: Copy the following code and fill out the missing details (`YOUR_MEMGRAPH_PASSWORD`, `YOUR_MEMGRAPH_USERNAME` and `MEMGRAPH_HOST_ADDRESS`) before running it:
+
+```php
+<?php
+require_once 'vendor/autoload.php';
+
+// Create a connection class and specify the target host and port.
+$conn = new \Bolt\connection\StreamSocket('MEMGRAPH_HOST_ADDRESS', 7687);
+
+// Set the SSL context options.
+$conn->setSslContextOptions([
+    'peer_name' => 'Memgraph DB',
+    'allow_self_signed' => true
+]);
+
+// Create a new Bolt instance and provide a connection object. 
+
+$bolt = new \Bolt\Bolt($conn);
+
+// Set available Bolt versions for Memgraph.
+$bolt->setProtocolVersions(4.1, 4, 3);
+
+// Build and get a protocol version instance, which creates the connection and executes a handshake.
+$protocol = $bolt->build();
+
+// Log in to the database with credentials.
+$protocol->hello(\Bolt\helpers\Auth::basic('YOUR_MEMGRAPH_USERNAME', 'YOUR_MEMGRAPH_PASSWORD'));
+
+// Pipeline two messages: one to execute the query with parameters, a second to pull records.
+$protocol
+    ->run('CREATE (a:Greeting) SET a.message = $message RETURN id(a) AS nodeId, a.message AS message', ['message' => 'Hello, World!'])
+    ->pull();
+
+// Server responses are waiting to be fetched through the iterator.
+$rows = iterator_to_array($protocol->getResponses(), false);
+
+// Get content from the requested record.
+$row = $rows[1]->getContent();
+echo 'Node ' . $row[0] . ' says: ' . $row[1];
+```
+
+Read more about it on the [PHP Quick Start Guide](/memgraph/connect-to-memgraph/drivers/php).
+
+#### Node.js
+
+Step 1: Install the driver with npm:
+
+```
+npm install neo4j-driver
+```
+
+Step 2: Copy the following code and fill out the missing details (`YOUR_MEMGRAPH_PASSWORD`, `YOUR_MEMGRAPH_USERNAME` and `MEMGRAPH_HOST_ADDRESS`) before running it:
+
+```js
+const neo4j = require('neo4j-driver')
+
+const MEMGRAPH_URI = 'bolt+ssc://MEMGRAPH_HOST_ADDRESS:7687';
+const MEMGRAPH_USERNAME = 'YOUR_MEMGRAPH_USERNAME';
+// Place your Memgraph password that was created during Project creation
+const MEMGRAPH_PASSWORD = 'YOUR_MEMGRAPH_PASSWORD';
+
+const helloMemgraph = async (uri, username, password) => {
+  const driver = neo4j.driver(uri, neo4j.auth.basic(username, password));
+
+  const session = driver.session();
+  const message = 'Hello Memgraph from Node.js!';
+
+  try {
+    const result = await session.run(
+      `CREATE (n:FirstNode {message: $message}) RETURN n`,
+      { message },
+    );
+
+    const singleRecord = result.records[0];
+    const node = singleRecord.get(0);
+
+    console.log('Created 
node:', node.properties.message);
+  } finally {
+    await session.close()
+  }
+
+  await driver.close()
+};
+
+helloMemgraph(MEMGRAPH_URI, MEMGRAPH_USERNAME, MEMGRAPH_PASSWORD)
+  .catch((error) => console.error(error));
+```
+
+Read more about it on the [Node.js Quick Start Guide](/memgraph/connect-to-memgraph/drivers/nodejs).
+
+## Payment
+
+Below are instructions on how to manage Memgraph Cloud payments, along with the
+current Cloud rates.
+
+Feel free to watch a demo video made for the Cloud launch that will explain
+the Account section of Memgraph Cloud where you handle your payment methods:
+
+[account-payment](https://youtu.be/Tt5KPKylU8k?t=941 "Account section")
+
+### Add a payment method
+
+To add a payment method:
+
+1. Go to **Account** and expand the **Add Credit Card** section
+2. Enter *Cardholder Name* and credit card details and **Add Card**
+3. Verify the credit card
+
+You can replace the current credit card with a new credit card by following the
+same steps, and the **Remove** button will remove the credit card completely.
+
+### Redeem coupon code
+
+To redeem a coupon code, you first need to [add a payment
+method](#add-a-payment-method), then:
+
+1. Go to **Account** and expand the **Add Coupon Code** section
+2. Enter the coupon code and **Redeem code**
+
+Each code has an expiration date. If you do not create a project or snapshot within
+that period, the code will expire.
+
+Once you redeem a code, it will be applied in full to your next invoice, even if
+the value of the coupon is higher than the amount of the invoice it is applied
+to.
+
+### Check paid and due invoices
+
+To check paid and due invoices:
+
+1. Go to **Account** and open the **Invoices** tab
+2. Check an estimate for the next payment or the amount of paid invoices
+
+You can also download paid invoices as PDFs to check the cost breakdown. 
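The coupon rule described above can be sketched in a few lines of Python (the function and the amounts are purely illustrative, not part of any Memgraph API):

```python
def apply_coupon(invoice_total: float, coupon_value: float) -> float:
    """Per the rule above: the whole coupon is applied to the next invoice,
    even when its value exceeds the invoice amount, and the charge never
    goes below zero."""
    return max(invoice_total - coupon_value, 0.0)

# A $25.00 coupon fully covers a $16.61 invoice.
print(apply_coupon(16.61, 25.00))  # 0.0
# On a larger invoice, only the difference is charged.
print(apply_coupon(30.73, 25.00))
```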
+
+### Charge rates
+
+Below are the daily and monthly project and snapshot rates within Memgraph
+Cloud.
+
+Feel free to watch a demo video made for the Cloud launch that will explain
+the logic behind payment rates:
+
+[paused-project](https://youtu.be/Tt5KPKylU8k?t=1070 "How are rates applied")
+
+#### Project rates
+
+Once your 14-day free trial is finished, the cost of the project will be
+calculated by the following rates:
+
+| AWS region                   | RAM (GB)  | Disk (GB) | Daily price ($) | Monthly price ($) |
+|------------------------------|-----------|-----------|-----------------|-------------------|
+| N. Virginia (us-east-1)      | 1         | 11        | 0.55            | 16.61             |
+|                              | 2         | 14        | 1.02            | 30.73             |
+|                              | 4         | 20        | 1.97            | 58.99             |
+|                              | 8         | 32        | 4.85            | 145.45            |
+|                              | 16        | 56        | 6.45            | 193.39            |
+|                              | 32        | 104       | 12.81           | 384.30            |
+| N. California (us-west-1)    | 1         | 11        | 0.65            | 19.63             |
+|                              | 2         | 14        | 1.21            | 36.32             |
+|                              | 4         | 20        | 2.33            | 69.85             |
+|                              | 8         | 32        | 5.66            | 169.74            |
+|                              | 16        | 56        | 7.19            | 215.72            |
+|                              | 32        | 104       | 14.29           | 428.66            |
+| Frankfurt (eu-central-1)     | 1         | 11        | 0.63            | 19.03             |
+|                              | 2         | 14        | 1.18            | 35.29             |
+|                              | 4         | 20        | 2.26            | 67.80             |
+|                              | 8         | 32        | 5.80            | 174.01            |
+|                              | 16        | 56        | 7.76            | 232.92            |
+|                              | 32        | 104       | 15.44           | 463.07            |
+| Hong Kong (ap-east-1)        | 1         | 11        | 0.84            | 25.09             |
+|                              | 2         | 14        | 1.56            | 46.90             |
+|                              | 4         | 20        | 3.02            | 90.51             |
+|                              | 8         | 32        | 6.65            | 199.62            |
+|                              | 16        | 56        | 8.54            | 256.28            |
+|                              | 32        | 104       | 16.98           | 509.27            |
+| Sydney (ap-southeast-2)      | 1         | 11        | 0.70            | 21.13             |
+|                              | 2         | 14        | 1.30            | 38.98             |
+|                              | 4         | 20        | 2.48            | 74.52             |
+|                              | 8         | 32        | 6.06            | 181.76            |
+|                              | 16        | 56        | 7.74            | 232.06            |
+|                              | 32        | 104       | 15.36           | 460.84            |
+| Ohio (us-east-2)             | 1         | 11        | 0.55            | 16.61             |
+|                              | 2         | 14        | 1.02            | 30.73             |
+|                              | 4         | 20        | 1.97            | 58.99             |
+|                              | 8         | 32        | 4.85            | 145.45            |
+|                              | 16        | 56        | 6.45            | 193.39            |
+|                              | 32        | 104       | 12.81           | 384.30            |
+
+#### Snapshot rates
+
+The size of a snapshot is 8 GB smaller than the disk size the project is using. 
+
+If you are using 1 GB of RAM and 11 GB of disk, the snapshot size is 3 GB.
+Snapshots will be charged by the following rates:
+
+| AWS region                       | Source project size | Disk (GB) | Daily price ($) | Monthly price ($) |
+|----------------------------------|---------------------|-----------|-----------------|-------------------|
+| N. Virginia (us-east-1)          | 1 GB RAM            | 3         | 0.01            | 0.29              |
+|                                  | 2 GB RAM            | 6         | 0.02            | 0.59              |
+|                                  | 4 GB RAM            | 12        | 0.04            | 1.18              |
+|                                  | 8 GB RAM            | 24        | 0.08            | 2.35              |
+|                                  | 16 GB RAM           | 48        | 0.16            | 4.70              |
+|                                  | 32 GB RAM           | 96        | 0.32            | 9.40              |
+| N. California (us-west-1)        | 1 GB RAM            | 3         | 0.01            | 0.32              |
+|                                  | 2 GB RAM            | 6         | 0.02            | 0.65              |
+|                                  | 4 GB RAM            | 12        | 0.04            | 1.30              |
+|                                  | 8 GB RAM            | 24        | 0.09            | 2.59              |
+|                                  | 16 GB RAM           | 48        | 0.17            | 5.18              |
+|                                  | 32 GB RAM           | 96        | 0.34            | 10.37             |
+| Frankfurt (eu-central-1)         | 1 GB RAM            | 3         | 0.01            | 0.32              |
+|                                  | 2 GB RAM            | 6         | 0.02            | 0.65              |
+|                                  | 4 GB RAM            | 12        | 0.04            | 1.30              |
+|                                  | 8 GB RAM            | 24        | 0.09            | 2.59              |
+|                                  | 16 GB RAM           | 48        | 0.17            | 5.18              |
+|                                  | 32 GB RAM           | 96        | 0.35            | 10.37             |
+| Hong Kong (ap-east-1)            | 1 GB RAM            | 3         | 0.01            | 0.32              |
+|                                  | 2 GB RAM            | 6         | 0.02            | 0.65              |
+|                                  | 4 GB RAM            | 12        | 0.04            | 1.30              |
+|                                  | 8 GB RAM            | 24        | 0.09            | 2.59              |
+|                                  | 16 GB RAM           | 48        | 0.17            | 5.18              |
+|                                  | 32 GB RAM           | 96        | 0.35            | 10.37             |
+| Sydney (ap-southeast-2)          | 1 GB RAM            | 3         | 0.01            | 0.32              |
+|                                  | 2 GB RAM            | 6         | 0.02            | 0.65              |
+|                                  | 4 GB RAM            | 12        | 0.04            | 1.30              |
+|                                  | 8 GB RAM            | 24        | 0.09            | 2.59              |
+|                                  | 16 GB RAM           | 48        | 0.17            | 5.18              |
+|                                  | 32 GB RAM           | 96        | 0.35            | 10.37             |
+| Ohio (us-east-2)                 | 1 GB RAM            | 3         | 0.01            | 0.29              |
+|                                  | 2 GB RAM            | 6         | 0.02            | 0.59              |
+|                                  | 4 GB RAM            | 12        | 0.04            | 1.18              |
+|                                  | 8 GB RAM            | 24        | 0.08            | 2.35              |
+|                                  | 16 GB RAM           | 48        | 0.16            | 4.70              |
+|                                  | 32 GB RAM           | 96        | 0.31            | 9.40              |
+
+#### CPU number
+
+The number of CPUs in current instances:
+
+| RAM       | Instance type | vCPU*  |
+| --------- | ------------- | ------ |
+| 1 GB RAM  | t3a.micro     | 2 vCPU |
+| 2 GB RAM  | t3a.small     | 2 vCPU |
+| 4 GB RAM  | t3a.medium    | 2 vCPU |
+| 8 GB RAM  | m5.large      | 2 
vCPU |
+| 16 GB RAM | r5.large      | 2 vCPU |
+| 32 GB RAM | r5.xlarge     | 4 vCPU |
+
+*vCPU definition from AWS: Each virtual CPU is a hyperthread of an Intel Xeon core.
\ No newline at end of file
diff --git a/docs2/getting-started/install-memgraph/rpm-package.md b/docs2/getting-started/install-memgraph/rpm-package.md
new file mode 100644
index 00000000000..bed82ce9eb8
--- /dev/null
+++ b/docs2/getting-started/install-memgraph/rpm-package.md
@@ -0,0 +1,108 @@
+---
+id: rpm-package
+title: Install Memgraph from RPM package
+sidebar_label: RPM package
+---
+
+This article briefly outlines the basic steps necessary to install and run
+Memgraph from an RPM package.
+
+import BackwardCompatibilityWarning from '../../templates/_backward_compatibility_warning.mdx';
+
+<BackwardCompatibilityWarning />
+
+## Prerequisites
+
+Before you proceed with the installation guide, make sure that you have:
+* The latest **Memgraph RPM Package**, which can be downloaded from the
+  [Memgraph download hub](https://memgraph.com/download/).
+
+:::note
+
+Memgraph packages are available for:
+- **CentOS 7**
+- **CentOS 9**
+- **Fedora 36**
+- **RedHat 7**
+- **RedHat 9**
+
+:::
+
+You can also use [direct download](../direct-download-links.md) links to get the
+latest Memgraph packages.
+
+## Installation guide
+
+After downloading the Memgraph RPM package, you can install it by issuing the
+following command:
+
+```console
+sudo yum --nogpgcheck localinstall /path-to/memgraph-<version>.rpm
+```
+
+:::info
+**NOTE:** Please take care of the SELinux configuration. The easiest way of
+running Memgraph is to disable SELinux by executing `setenforce 0`. If that's
+not an option, please configure the system properly. 
+::: + +After successful installation, Memgraph can be started as a service using the +following command: + +```console +systemctl start memgraph +``` + +To verify that Memgraph is running, run the following command: + +```console +journalctl --unit memgraph +``` + +If successful, you should receive an output similar to the following: + +```console +You are running Memgraph vX.X.X +``` + +If you want the Memgraph service to start automatically on each startup, run the +following command: + +```console +systemctl enable memgraph +``` + +If you want to start Memgraph with different configuration settings, check out +the [Configuration section](#configuration). At this point, Memgraph is ready for you +to [submit queries](/connect-to-memgraph/overview.mdx). + +## Stopping Memgraph + +To shut down the Memgraph server, issue the following command: + +```console +systemctl stop memgraph +``` + +## Configuration + +The Memgraph configuration is available in `/etc/memgraph/memgraph.conf`. If the +configuration file is altered, Memgraph needs to be restarted. To learn about +all the configuration options, check out the [Reference +guide](/reference-guide/configuration.md). + +## Where to next? + +To learn how to query the database, take a look at the +**[querying](/connect-to-memgraph/overview.mdx)** guide or **[Memgraph +Playground](https://playground.memgraph.com/)** for interactive tutorials.
+
+Visit the **[Drivers overview](/connect-to-memgraph/drivers/overview.md)**
+page if you need to connect to the database programmatically.
+
+## Getting help
+
+If you run into problems during the installation process, check out our
+**[installation troubleshooting
+guide](/installation/linux/linux-installation-troubleshooting.md)** to see if we
+have already covered the topic. For more information on the installation process
+and for additional questions, visit the **[Help Center](/help-center)** page.
diff --git a/docs2/getting-started/install-memgraph/ubuntu.md b/docs2/getting-started/install-memgraph/ubuntu.md
new file mode 100644
index 00000000000..f9d8238a82e
--- /dev/null
+++ b/docs2/getting-started/install-memgraph/ubuntu.md
@@ -0,0 +1,183 @@
+---
+id: ubuntu-installation
+title: Install Memgraph on Ubuntu
+sidebar_label: Ubuntu
+slug: /install-memgraph-on-ubuntu
+pagination_prev: installation/overview
+pagination_next: connect-to-memgraph/overview
+---
+
+This article briefly outlines the basic steps necessary to install and run
+Memgraph on Ubuntu.
+
+import BackwardCompatibilityWarning from '../../templates/_backward_compatibility_warning.mdx';
+
+<BackwardCompatibilityWarning />
+
+## Prerequisites
+
+Before you proceed with the installation guide, make sure that you have:
+* The latest **Memgraph Ubuntu Package**, which can be downloaded from the
+  [Memgraph download hub](https://memgraph.com/download/).
+
+:::note
+
+Memgraph packages are available for:
+- **Ubuntu 18.04**
+- **Ubuntu 20.04**
+- **Ubuntu 22.04**
+
+:::
+
+You can also use [direct download](../direct-download-links.md) links to get the
+latest Memgraph packages.
+
+## Installation guide {#installation-guide}
+
+After downloading Memgraph as a Debian package, install it by running the
+following:
+
+```console
+sudo dpkg -i /path-to/memgraph_<version>.deb
+```
+
+:::note Why use sudo? 
In order to perform some actions on your operating system
+like installing new software, you may need **superuser** privileges (commonly
+called **root**).
+:::
+
+:::caution Potential installation error
+You could get errors while installing
+the package with the above command if you don't have all of Memgraph's
+dependencies installed. The issues mostly look like the following:
+
+```console
+dpkg: error processing package memgraph (--install):
+ dependency problems - leaving unconfigured
+Errors were encountered while processing:
+ memgraph
+```
+
+To install missing dependencies and finish the installation of the Memgraph
+package, just issue the following command:
+
+```console
+sudo apt-get install -f
+```
+
+The above command will install all missing dependencies and will finish
+configuring the Memgraph package.
+:::
+
+To verify that Memgraph is running, run the following:
+
+```console
+sudo journalctl --unit memgraph
+```
+
+If successful, you should receive an output similar to the following:
+
+```console
+You are running Memgraph vX.X.X
+```
+
+If the Memgraph database instance is not running, you can start it explicitly:
+
+```console
+sudo systemctl start memgraph
+```
+
+If you want to start Memgraph with different configuration settings, check out
+the [Configuration section](#configuration). At this point, Memgraph is ready for you
+to [submit queries](/connect-to-memgraph/overview.mdx).
+
+## Stopping Memgraph
+
+To shut down the Memgraph server, issue the following command:
+
+```console
+sudo systemctl stop memgraph
+```
+
+## Configuration
+
+The Memgraph configuration is available in `/etc/memgraph/memgraph.conf`. If the
+configuration file is altered, Memgraph needs to be restarted. To learn about
+all the configuration options, check out the [Reference
+guide](/reference-guide/configuration.md). 
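For example, to change a setting (the log-level flag is shown purely as an illustration), edit the configuration file and restart the service:

```console
sudo vi /etc/memgraph/memgraph.conf    # e.g. set --log-level=TRACE
sudo systemctl restart memgraph
```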
+
+## Troubleshooting
+
+### Unable to install the Memgraph package with `dpkg`
+
+While running the following `dpkg` command:
+
+```bash
+dpkg -i /path-to/memgraph_.deb
+```
+
+you may encounter errors that resemble the following:
+
+```console
+dpkg: error processing package memgraph (--install):
+ dependency problems - leaving unconfigured
+Errors were encountered while processing:
+ memgraph
+```
+
+These errors indicate that you don’t have all of the necessary dependencies
+installed. To install the missing dependencies and finish the installation,
+issue the following command:
+
+```console
+sudo apt-get install -f
+```
+
+### Multiple notes when starting Memgraph
+
+When you start a Memgraph instance, you may see the following list of notes in
+your terminal:
+
+```console
+You are running Memgraph v1.4.0-community
+
+NOTE: Please install networkx to be able to use graph_analyzer module. Using Python:
+3.8.2 (default, Jul 16 2020, 14:00:26)
+[GCC 9.3.0]
+
+NOTE: Please install networkx to be able to use Memgraph NetworkX wrappers. Using Python:
+3.8.2 (default, Jul 16 2020, 14:00:26)
+[GCC 9.3.0]
+
+NOTE: Please install networkx, numpy, scipy to be able to use proxied NetworkX algorithms. E.g., CALL nxalg.pagerank(...).
+Using Python:
+3.8.2 (default, Jul 16 2020, 14:00:26)
+[GCC 9.3.0]
+
+NOTE: Please install networkx to be able to use wcc module.
+Using Python:
+3.8.2 (default, Jul 16 2020, 14:00:26)
+[GCC 9.3.0]
+```
+
+If you wish to work with built-in NetworkX modules in Memgraph, you need to
+install the following Python libraries:
+* [NumPy](https://numpy.org/)
+* [SciPy](https://www.scipy.org/)
+* [NetworkX](https://networkx.org/)
+
+For more information on how to install Python libraries in Linux, follow the
+[Installing Packages
+guide](https://packaging.python.org/tutorials/installing-packages/). If you are
+not interested in working with query modules that depend on these libraries, you
+can ignore the warnings.
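If you are unsure which of the three libraries are already present, a quick check along these lines can help. This is a sketch; it assumes `python3` is on the `PATH`, and the suggested install command mirrors the packaging guide linked above.

```shell
# Print one status line per library mentioned in the notes above.
for m in networkx numpy scipy; do
  if python3 -c "import $m" 2>/dev/null; then
    echo "$m: installed"
  else
    echo "$m: missing (try: sudo pip3 install $m)"
  fi
done
```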
+ +For more information on the installation process and for additional questions, +visit the **[Help Center](/help-center)** page. + +## Where to next? + +To learn how to query the database, take a look at the +**[querying](/connect-to-memgraph/overview.mdx)** guide or **[Memgraph +Playground](https://playground.memgraph.com/)** for interactive tutorials.
+Visit the **[Drivers overview](/connect-to-memgraph/drivers/overview.md)** +page if you need to connect to the database programmatically. \ No newline at end of file diff --git a/docs2/getting-started/install-memgraph/wsl.md b/docs2/getting-started/install-memgraph/wsl.md new file mode 100644 index 00000000000..4f2b3543e71 --- /dev/null +++ b/docs2/getting-started/install-memgraph/wsl.md @@ -0,0 +1,415 @@ +--- +id: wsl +title: Install MemgraphDB on Windows with WSL +sidebar_label: Windows Subsystem for Linux +--- + +[![Related - Tutorial](https://img.shields.io/static/v1?label=Related&message=Tutorial&color=008a00&style=for-the-badge)](/tutorials/install-memgraph-on-windows-10.md) + +This article briefly outlines the basic steps necessary to install and run +Memgraph on Windows with the Windows Subsystem for Linux. + +import BackwardCompatibilityWarning from '../../templates/_backward_compatibility_warning.mdx'; + + + +## Prerequisites + +Before you proceed with the installation guide make sure that you have: + +- The latest **Memgraph Ubuntu package** which can be downloaded from the + [Memgraph download hub](https://memgraph.com/download/). +- Installed **Windows Subsystem for Linux (WSL)**. For detailed instructions, + refer to the [Microsoft + documentation](https://docs.microsoft.com/en-us/windows/wsl/install). 
+
+## Installation guide {#installation-guide}
+
+**1.** Start WSL by running the following command from **PowerShell**:
+
+```console
+wsl
+```
+
+**2.** Install MemgraphDB using the latest Memgraph Ubuntu package by running the
+following command in the Ubuntu terminal:
+
+```console
+sudo dpkg -i /mnt//Users//Downloads/memgraph_.deb
+```
+
+**3.** Start the Memgraph server by issuing the following command:
+
+```
+sudo runuser -l memgraph -c '/usr/lib/memgraph/memgraph'
+```
+
+If successful, you should receive an output similar to the following:
+
+```
+You are running Memgraph vX.X.X
+```
+
+If you want to start Memgraph with different configuration settings, check out
+the [Configuration section](#configuration).
+At this point, Memgraph is ready for you
+to [submit queries](/connect-to-memgraph/overview.mdx).
+
+:::caution Potential installation error
+You could get errors while installing the package
+with the above commands if you don't have all of Memgraph's dependencies
+installed. The issues mostly look like the following:
+
+```
+dpkg: error processing package memgraph (--install):
+ dependency problems - leaving unconfigured
+Errors were encountered while processing:
+ memgraph
+```
+
+To install missing dependencies and finish the installation of the Memgraph
+package, just issue the following command:
+
+```
+sudo apt-get install -f
+```
+
+The above command will install all missing dependencies and will finish
+configuring the Memgraph package.
+:::
+
+## Configuration
+
+The Memgraph configuration file is available at `/etc/memgraph/memgraph.conf`. If the
+configuration file is altered, Memgraph needs to be restarted.
+
+To learn about
+all the configuration options, check out the [Reference
+guide](/reference-guide/configuration.md).
+
+## Troubleshooting
+
+### Accessing files from your Windows system
+
+Usually, you can find the Windows users directories in this location:
+
+```console
+/mnt//Users/
+```
+
+### Unable to install the Memgraph package with `dpkg`
+
+While running the following `dpkg` command:
+
+```bash
+sudo dpkg -i /mnt//Users//Downloads/memgraph_.deb
+```
+
+you may encounter errors that resemble the following:
+
+```console
+dpkg: error processing package memgraph (--install):
+ dependency problems - leaving unconfigured
+Errors were encountered while processing:
+ memgraph
+```
+
+These errors indicate that you don’t have all of the necessary dependencies
+installed.
To install the missing dependencies and finish the installation, +issue the following command: + +```console +sudo apt-get install -f +``` + +### Multiple notes when starting Memgraph + +When you start a Memgraph instance, you may see the following list of notes in +your terminal: + +```console +You are running Memgraph v1.4.0-community + +NOTE: Please install networkx to be able to use graph_analyzer module. Using Python: +3.8.2 (default, Jul 16 2020, 14:00:26) +[GCC 9.3.0] + +NOTE: Please install networkx to be able to use Memgraph NetworkX wrappers. Using Python: +3.8.2 (default, Jul 16 2020, 14:00:26) +[GCC 9.3.0] + +NOTE: Please install networkx, numpy, scipy to be able to use proxied NetworkX algorithms. E.g., CALL nxalg.pagerank(...). +Using Python: +3.8.2 (default, Jul 16 2020, 14:00:26) +[GCC 9.3.0] + +NOTE: Please install networkx to be able to use wcc module. +Using Python: +3.8.2 (default, Jul 16 2020, 14:00:26) +[GCC 9.3.0] +``` + +If you wish to work with built-in NetworkX modules in Memgraph, you need to +install the following Python libraries: +* [NumPy](https://numpy.org/) +* [SciPy](https://www.scipy.org/) +* [NetworkX](https://networkx.org/) + +For more information on how to install Python libraries in WSL, follow the [Python +installation +guide](https://docs.microsoft.com/en-us/windows/python/web-frameworks#install-python-pip-and-venv). +If you are not interested in working with query modules that depend on these +libraries, you can ignore the warnings. + +For more information on the installation process and for additional questions, +visit the **[Help Center](/help-center)** page. + +## Where to next? + +To learn how to query the database, take a look at the +**[querying](/connect-to-memgraph/overview.mdx)** guide or **[Memgraph +Playground](https://playground.memgraph.com/)** for interactive tutorials.
+Visit the **[Drivers overview](/connect-to-memgraph/drivers/overview.md)** +page if you need to connect to the database programmatically. + +## Getting help + +If you run into problems during the installation process, check out our +**[installation troubleshooting +guide](/installation/windows/windows-installation-troubleshooting.md)** to see +if we have already covered the topic. For more information on the installation +process and for additional questions, visit the **[Help Center](/help-center)** +page. + +## Tutorial - Install Memgraph on Windows 10 with WSL + +In this tutorial, you will install both MemgraphDB and Memgraph Lab on Windows 10. +You will then test each installation by running a few basic queries to make +sure that everything is working correctly. + +:::info + +You can install MemgraphDB and Memgraph Lab as separate components or you can +use the **Memgraph Platform** Docker image. Memgraph Platform contains +MemgraphDB, Memgraph Lab, mgconsole and MAGE. It is the recommended installation +option, and it isn't part of this tutorial. + +If you want to install Memgraph Platform, please follow the [Memgraph Platform installation guide](../install-memgraph-on-windows-docker). + +::: + +:::info + +Memgraph is also available as a [Memgraph Cloud](/memgraph-cloud) solution that +requires no installation - be sure to check it out. + +::: + +[MemgraphDB](https://memgraph.com/product/) is a native, in-memory graph +database built for real-time business-critical applications. Memgraph supports +strongly-consistent ACID transactions and uses the [Cypher query +language](/cypher-manual/) for structuring, manipulating, and exploring data. + +[Memgraph Lab](https://memgraph.com/product/lab/) is a lightweight and intuitive +Cypher and [Bolt](https://boltprotocol.org/) compatible integrated development +environment (IDE). It's designed to help you import data, develop, debug, and +profile database queries and visualize query results. 
+
+### Prerequisites
+
+For a seamless installation of MemgraphDB and Memgraph Lab on Windows 10, ensure
+that you have:
+
+- A computer running Windows 10 (64-bit version) with the Windows Subsystem for
+  Linux
+- Administrative rights to your Windows PC and an internet connection.
+- Basic knowledge of working with the command line.
+
+### Step 1 - Enable Windows Subsystem for Linux
+
+First, you need to enable the Windows Subsystem for Linux (WSL) by following the
+[Microsoft
+instructions](https://docs.microsoft.com/en-us/windows/wsl/install-win10).
+
+After you install WSL, the next step is to install the Ubuntu Linux
+distribution. To install it, do the following:
+
+1. Open Windows PowerShell
+2. Run the `wsl --install -d Ubuntu` command
+3. Enter the username and password for your new Linux user
+
+If everything works properly, you will get the following output:
+
+```nocopy
+Enter new UNIX username: james
+New password:
+Retype new password:
+passwd: password updated successfully
+Installation successful!
+```
+
+Congratulations! You have successfully installed the Ubuntu distribution of
+Linux on your Windows machine. You are now ready to install Memgraph.
+
+### Step 2 - Installing Memgraph {#step-2-installing-memgraph}
+
+You must know your exact Ubuntu version so that you can download the right
+Memgraph package. To find out the version, run the following command in the
+Ubuntu shell:
+
+```bash
+lsb_release -d
+```
+
+Your output will look something like this:
+
+```nocopy
+Description: Ubuntu 20.04 LTS
+```
+
+Therefore, the Linux distribution is Ubuntu 20.04.
+
+Now you can go to Memgraph's [download](https://memgraph.com/download/#memgraph)
+page and download the installation package for your Linux distribution (in this
+example, Ubuntu 20.04).
+
+Once the download is complete, open your Ubuntu shell and run the following
+command to start the installation process:
+
+```bash
+sudo dpkg -i /path/to/memgraph_.deb
+```
+
+- replace `/path/to` with the path to where you downloaded your installation
+  package.
+- replace `_version` with the version of the package that you are installing
+  (usually part of the name of the installation package you downloaded).
+
+For example, if user Arthur downloads version `2.1.1-1_amd64` to the default
+Windows download folder, the file will be located at
+`/mnt/c/Users/Arthur/Downloads`, and the command would be:
+
+```bash
+sudo dpkg -i /mnt/c/Users/Arthur/Downloads/memgraph_2.1.1-1_amd64.deb
+```
+
+:::note
+
+If you see any error related to missing dependency packages, you might have to
+run the following commands before installing Memgraph:
+
+```bash
+sudo apt-get update
+sudo apt-get -f install
+```
+
+:::
+
+Normally, you would start Memgraph using `systemd`, but unfortunately, this is
+not an option in WSL. You can bypass this inconvenience by using the command
+`runuser`, which allows you to run commands with a substitute user and group ID.
+Start the Memgraph server by issuing the following command:
+
+```bash
+sudo runuser -l memgraph -c '/usr/lib/memgraph/memgraph'
+```
+
+If Memgraph has been installed correctly, you will see something like this:
+
+```nocopy
+You are running Memgraph v2.1.1
+```
+
+Awesome! Now you have a running Memgraph instance on your Windows machine.
+
+### Step 3 - Installing Memgraph Lab and connecting to Memgraph
+
+Start by downloading [Memgraph Lab](https://memgraph.com/download/#memgraph-lab)
+for Windows.
+
+The downloaded package will be a `.exe` installer and can be easily run just
+like other Windows installers.
+
+:::note
+
+Before connecting, ensure that the Memgraph server is running as explained in
+[Step 2](#step-2-installing-memgraph). You won't be able to connect if the
+server is not already running!
+
+:::
+
+Double click on the installer to start the installation process.
+
+Once the installation is completed, Memgraph Lab will launch, and you will be
+presented with the Home screen. Click **Connect now** to connect to your
+Memgraph instance.
+
+![Connect to MemgraphDB](../data/install-memgraph-on-windows-10/memgraph-lab-connect-now.png)
+
+:::note
+
+You can also click **Connect Manually** to connect to Memgraph. Manual
+connection is usually used when you want to connect to a remote instance of
+Memgraph, and not a local one. Using the default values of the "Host" and "Port"
+text fields, and leaving the "Username" and "Password" fields blank, will also
+connect you to your running Memgraph instance.
+
+:::
+
+Once connected, you'll be presented with Memgraph Lab's user interface.
+
+Now that you have Memgraph Lab installed and connected to Memgraph, you will run
+a few basic queries to make sure everything works properly.
+
+### Step 4 - Testing Memgraph Lab's connection to Memgraph
+
+You can test Memgraph Lab's connection to Memgraph by running your first
+query. You will use a Cypher query to create a simple graph that has two nodes
+and one relationship.
+
+![memgraph-lab-run-match-query-result](../data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-result.png)
+
+1. First, click **Query** in the sidebar.
+2. Next, enter this first query in the query editor, which is located at the top
+   of the screen:
+
+   ```cypher
+   CREATE (u:User {name: "Alice"})-[:Likes]->(m:Software {name: "Memgraph"});
+   ```
+
+   The query above creates 2 nodes and a relationship between them.
+
+3. Lastly, click "Run" or press `Ctrl + Enter` to execute the query.
+
+![memgraph-lab-run-query](../data/install-memgraph-on-windows-10/memgraph-lab-run-query.png)
+
+If no error message appeared, that means your query was executed successfully.
+
+You can retrieve the nodes and relationships you've just created by executing
+the following Cypher query:
+
+```cypher
+MATCH (u:User)-[r]->(x)
+RETURN u, r, x;
+```
+
+On the **Data** tab, your result should look similar to this:
+
+![memgraph-lab-run-match-query-data](../data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-data.png)
+
+If you switch to the **Graph** tab, you will see something like this:
+
+![memgraph-lab-run-match-query-graph](../data/install-memgraph-on-windows-10/memgraph-lab-run-match-query-graph.png)
+
+Now you have Memgraph Lab working correctly on your system. Memgraph Lab's
+visual presentation of query results is one of its best features.
+
+### Where to next?
+
+In this tutorial, you installed MemgraphDB and Memgraph Lab on Windows 10 using
+Windows Subsystem for Linux. You tested Memgraph Lab's connection to
+MemgraphDB by executing Cypher queries.
+
+To learn how to query the database, take a look at the
+**[querying](/connect-to-memgraph/overview.mdx)** guide or **[Memgraph
+Playground](https://playground.memgraph.com/)** for interactive tutorials.
+Visit the **[Drivers overview](/connect-to-memgraph/drivers/overview.md)** +page if you need to connect to the database programmatically. \ No newline at end of file diff --git a/docs/tutorials/graph-modeling.md b/docs2/graph-modeling/graph-modeling.md similarity index 100% rename from docs/tutorials/graph-modeling.md rename to docs2/graph-modeling/graph-modeling.md diff --git a/docs2/help-center/errors/auth.md b/docs2/help-center/errors/auth.md new file mode 100644 index 00000000000..50c2a5757d3 --- /dev/null +++ b/docs2/help-center/errors/auth.md @@ -0,0 +1,56 @@ +--- +id: auth +title: Auth module errors +sidebar_label: Auth module +--- + +import Help from '../templates/_help.mdx'; + + + +## Errors + +1. [Couldn't authenticate user '{}' using the auth module because the user + already exists as a role. For more details, visit: memgr.ph/auth.](#error-1) +2. [Couldn't authenticate user '{}' using the auth module because the user + doesn't exist. For more details, visit: memgr.ph/auth.](#error-2) +3. [Couldn't authenticate user '{}' using the auth module because the user's + role '{}' already exists as a user. For more details, visit: + memgr.ph/auth.](#error-3) +4. [Couldn't authenticate user '{}' using the auth module because the user's + role '{}' doesn't exist. For more details, visit: memgr.ph/auth.](#error-4) +5. [Couldn't authenticate user '{}' because the user doesn't exist. For more + details, visit: memgr.ph/auth.](#error-2) +6. [Couldn't authenticate user '{}'. For more details, visit: + memgr.ph/auth.](#error-5) + +## The user already exists as a role {#error-1} + +A user and a role can't share the same name. Please change the name of the user +or the role. + +## The user doesn't exist {#error-2} + +By using the `--auth-module-create-user` flag, you can specify if a missing user +should be created. Otherwise, the user can't be created and this error will be +thrown. 
+ +## The role already exists as a user {#error-3} + +A user and a role can't share the same name. Please change the name of the user +or the role. + +## The role doesn't exist {#error-4} + +By using the `--auth-module-create-user` flag, you can specify if a missing role +should be created. Otherwise, the role can't be created and this error will be +thrown. + +## Couldn't authenticate user {#error-5} + +The specified password was most probably wrong. Please check the credentials +again. + +import SubmitError from '../templates/_submit-error.mdx'; + + diff --git a/docs2/help-center/errors/durability.md b/docs2/help-center/errors/durability.md new file mode 100644 index 00000000000..baea2cdcb37 --- /dev/null +++ b/docs2/help-center/errors/durability.md @@ -0,0 +1,52 @@ +--- +id: durability +title: Durability errors +sidebar_label: Durability +--- + +import Help from '../templates/_help.mdx'; + + + +## Errors + +1. [Snapshot or WAL directory don't exist, there is nothing to recover. For more + details, visit: memgr.ph/durability.](#error-1) +2. [No snapshot or WAL file found. For more details, visit: + memgr.ph/durability.](#error-1) +3. [Couldn't get WAL file info from the WAL directory. For more details, visit: + memgr.ph/durability.](#error-1) + +## Why are snapshot and WAL files missing? {#error-1} + +There are two options: +1. [The files are missing because Docker doesn't store them.](#error-1-1) +2. [Memgraph is looking in the wrong directory.](#error-1-2) + +### Docker not persisting data {#error-1-1} + +It's possible that your Docker containers don’t persist data by default (all +changes are lost when the container is stopped). You need to use local volumes +to store the data permanently which is why Memgraph is started with the `-v` +flag: + +```console +docker run -p 7687:7687 -v mg_lib:/var/lib/memgraph memgraph/memgraph +``` + +More information on Docker Volumes can be found +[here](https://docs.docker.com/storage/volumes/). 
+ +### Memgraph data directory not set correctly {#error-1-2} + +Make sure that Memgraph is searching for the snapshot files in the right +directory. The Memgraph configuration is available in +`/etc/memgraph/memgraph.conf` and you can specify the directory with the +`--data-directory` flag. If the configuration file is altered, Memgraph needs to +be restarted. The default directory is `/var/lib/memgraph`. To learn about all +the configuration options, check out the [Reference +guide](/memgraph/reference-guide/configuration). + +import SubmitError from '../templates/_submit-error.mdx'; + + diff --git a/docs2/help-center/errors/memory.md b/docs2/help-center/errors/memory.md new file mode 100644 index 00000000000..d719670cba0 --- /dev/null +++ b/docs2/help-center/errors/memory.md @@ -0,0 +1,27 @@ +--- +id: memory +title: Memory (RAM) errors +sidebar_label: Memory (RAM) +--- + +import Help from '../templates/_help.mdx'; + + + +## Warnings + +1. [Running out of available RAM, only {} MB left. For more details, visit + memgr.ph/ram.](#error-1) + +## Running out of available RAM {#error-1} + +This is a warning that can be disabled in the Memgraph configuration. The +Memgraph configuration is available in `/etc/memgraph/memgraph.conf` and you can +disable the warning with the `--memory-warning-threshold` flag. The default +value is `true`. If the configuration file is altered, Memgraph needs to be +restarted. To learn about all the configuration options, check out the +[Reference guide](/memgraph/reference-guide/configuration). + +import SubmitError from '../templates/_submit-error.mdx'; + + diff --git a/docs2/help-center/errors/modules.md b/docs2/help-center/errors/modules.md new file mode 100644 index 00000000000..35a0561f0b1 --- /dev/null +++ b/docs2/help-center/errors/modules.md @@ -0,0 +1,59 @@ +--- +id: modules +title: Module errors +sidebar_label: Modules +--- + +import Help from '../templates/_help.mdx'; + + + +## Errors + +1. [Unable to load module {}; {}. 
For more details, visit + memgr.ph/modules.](#error-1) +2. [Failed to close module {}; {}. For more details, visit + memgr.ph/modules.](#error-1) +3. [Unable to overwrite an already loaded module {}. For more details, visit + memgr.ph/modules.](#error-2) +4. [Module directory {} doesn't exist. For more details, visit + memgr.ph/modules.](#error-3) + +## Warnings + +1. [Unknown query module file {}. For more details, visit: + memgr.ph/modules.](#warning-1) + +## Errors when loading or closing modules {#error-1} + +When Memgraph is loading/closing modules, an error can occur if: +1. The file could not be found: check if the file has been deleted. +2. The file is not readable: make the file readable for the user `memgraph`. +3. The file had the wrong format: check if the file has the expected format. +4. The file caused errors during loading. + +## Unable to overwrite an already loaded module {#error-2} + +Module names need to be distinct. Try to rename your module and load it again +with `CALL mg.load_all();`. + +## Module directory {} doesn't exist {#error-3} + +Make sure that Memgraph is searching for the modules in the right directory. The +Memgraph configuration is available in `/etc/memgraph/memgraph.conf` and you can +specify the directory with the `--query-modules-directory` flag. The default +directory is `/usr/lib/memgraph/query-modules`. If the configuration file is +altered, Memgraph needs to be restarted. To learn about all the configuration +options, check out the [Reference guide](/memgraph/reference-guide/configuration). + +## Unknown query module file {#warning-1} + +Query modules can be implemented using the Python or C API provided by Memgraph. +Modules written in languages other than Python need to be compiled to a shared +library so that they can be loaded when Memgraph starts. This means that you can +write the procedures in any programming language which can work with C and can +be compiled to the ELF shared library format. 
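The "file could not be found" and "file is not readable" cases in the loading-errors list above can be ruled out with a quick shell check. This is a sketch: the module name is hypothetical and a scratch file stands in for a real module under `/usr/lib/memgraph/query-modules/`, where the file must additionally be readable by the `memgraph` user.

```shell
# Create a stand-in module file, then verify it exists and is readable.
module=example_module.py
printf 'def procedure(ctx):\n    pass\n' > "$module"
if [ -r "$module" ]; then
  echo "$module is readable"
else
  echo "$module is missing or not readable (try: chmod a+r $module)"
fi
```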
+
+import SubmitError from '../templates/_submit-error.mdx';
+
+
diff --git a/docs2/help-center/errors/overview.md b/docs2/help-center/errors/overview.md
new file mode 100644
index 00000000000..837ae3de5a3
--- /dev/null
+++ b/docs2/help-center/errors/overview.md
@@ -0,0 +1,17 @@
+---
+id: overview
+title: Errors overview
+sidebar_label: Errors overview
+slug: /
+---
+
+import Help from '../templates/_help.mdx';
+
+
+
+Welcome to the Memgraph Error troubleshooting guide. This site contains
+descriptions of various errors, warnings, and other logged messages that can be
+observed in Memgraph. Be aware that only a fraction of the errors and warnings
+are covered here. If you encounter an error that isn't covered here, please
+contact us on our [Discord](https://www.discord.gg/memgraph) server or
+submit a [Support ticket](https://support.memgraph.com).
diff --git a/docs2/help-center/errors/ports.md b/docs2/help-center/errors/ports.md
new file mode 100644
index 00000000000..f5dfb98f002
--- /dev/null
+++ b/docs2/help-center/errors/ports.md
@@ -0,0 +1,74 @@
+---
+id: ports
+title: Port errors
+sidebar_label: Ports
+---
+
+import Help from '../templates/_help.mdx';
+
+
+
+## Errors
+
+1. [Invalid port number {}. For more details, visit: memgr.ph/ports.](#error-1)
+2. [Invalid port number {}. The port number must be a positive integer. For more
+   details, visit: memgr.ph/ports.](#error-2)
+3. [Invalid port number {}. The port number exceedes the maximum possible size.
+   For more details, visit: memgr.ph/ports.](#error-2)
+
+## What port is Memgraph running on? {#error-1}
+
+The default port Memgraph uses is `7687`, if not otherwise specified.
+
+## How to change the port?
+
+You can change the default port using the configuration settings.
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+
+
+
+
+The Memgraph configuration is available in Docker's named volume `mg_etc`.
On
+Linux systems, it should be in
+`/var/lib/docker/volumes/mg_etc/_data/memgraph.conf`. Keep in mind that this way
+of specifying configuration options is only valid if Memgraph was started [using
+volumes](/memgraph/how-to-guides/work-with-docker).
+
+When using Docker, you can also specify the configuration options in the `docker
+run` command:
+
+```
+docker run -p 7687:7687 memgraph/memgraph --log-level=TRACE
+```
+
+
+
+
+The Memgraph configuration is available in `/etc/memgraph/memgraph.conf`. If the
+configuration file is altered, Memgraph needs to be restarted.
+
+
+
+
+To learn about all the configuration options, check out the [Reference
+guide](/memgraph/reference-guide/configuration).
+
+## What is the valid range for choosing a port? {#error-2}
+
+A port number is a 16-bit unsigned integer, thus ranging from 0 to 65535. Ports
+0 through 1023 are defined as well-known ports. Registered ports are from 1024
+to 49151. The remainder of the ports from 49152 to 65535 can be used dynamically
+by applications.
+
+import SubmitError from '../templates/_submit-error.mdx';
+
+
diff --git a/docs2/help-center/errors/python-modules.md b/docs2/help-center/errors/python-modules.md
new file mode 100644
index 00000000000..1a909ebd8de
--- /dev/null
+++ b/docs2/help-center/errors/python-modules.md
@@ -0,0 +1,33 @@
+---
+id: python-modules
+title: Python module errors
+sidebar_label: Python modules
+---
+
+import Help from '../templates/_help.mdx';
+
+
+
+## Errors
+
+1. [Unable to load support for embedded Python: {}. For more details, visit:
+   memgr.ph/python.](#error-1)
+2. [Unable to load support for embedded Python: missing directory {}. For more
+   details, visit: memgr.ph/python.](#error-1)
+
+## How to install Python packages globally? {#error-1}
+
+The Python packages need to be installed globally for Memgraph to access them.
+To install a Python module globally, you will need to install it with the
+following command:
+
+```console
+sudo pip3 install
+```
+
+If this approach doesn't work, try to install a pre-compiled package using
+`apt-get` if available.
+
+import SubmitError from '../templates/_submit-error.mdx';
+
+
diff --git a/docs2/help-center/errors/replication.md b/docs2/help-center/errors/replication.md
new file mode 100644
index 00000000000..a61ea167fb6
--- /dev/null
+++ b/docs2/help-center/errors/replication.md
@@ -0,0 +1,36 @@
+---
+id: replication
+title: Replication errors
+sidebar_label: Replication
+---
+
+import Help from '../templates/_help.mdx';
+
+
+
+## Errors
+
+1. [Failed to connect to replica {} at the endpoint {}. For more details, visit:
+   memgr.ph/replication.](#error-1)
+2. [Couldn't replicate data to {}. For more details, visit:
+   memgr.ph/replication.](#error-1)
+
+## Warning
+
+1. [Snapshots are disabled for replicas. For more details, visit: memgr.ph/replication.](#warning-1)
+
+## Troubleshooting replication errors {#error-1}
+
+1. Make sure that the Memgraph instances serving as replicas are up and running.
+2. Check the firewall on your machine because it could be blocking the traffic
+   requested by Memgraph.
+3. Verify that there are no network problems.
+
+## Snapshots are disabled for replicas {#warning-1}
+
+Because of consistency constraints, snapshots are disabled on replicas. If you
+need a snapshot of the database, then create one on the main instance.
+
+import SubmitError from '../templates/_submit-error.mdx';
+
+
diff --git a/docs2/help-center/errors/snapshots.md b/docs2/help-center/errors/snapshots.md
new file mode 100644
index 00000000000..83dc9326112
--- /dev/null
+++ b/docs2/help-center/errors/snapshots.md
@@ -0,0 +1,71 @@
+---
+id: snapshots
+title: Snapshot errors
+sidebar_label: Snapshots
+---
+
+import Help from '../templates/_help.mdx';
+
+
+
+## Errors
+
+1. 
[Couldn't ensure that exactly {} snapshots exist because an error occurred: + {}. For more information about snapshots, visit: memgr.ph/snapshots.](#error-1) +2. [Couldn't ensure that only the absolutely necessary WAL files exist because an + error occurred: {}. For more details, visit: memgr.ph/snapshots.](#error-1) + +## What are snapshots? {#error-1} + +Database snapshots are like a view of a database as it was at a certain point in +time. It is a read-only copy of the data that can be used for backup or data +persistence. Memgraph will try to load the newest snapshot file on startup. + +## What to do with corrupt snapshots? + +Because snapshots are read-only, any modifications will result in corrupt +files that won't be loaded. The solution is to delete the snapshot files and to +start Memgraph again. + +## Why is data lost when Memgraph is restarted? + +import Tabs from '@theme/Tabs'; +import TabItem from '@theme/TabItem'; + + + + +Docker containers don’t persist data by default (all changes are lost when the +container is stopped). You need to use local volumes to store the data +permanently which is why Memgraph is started with the `-v` flags: + +```console +docker run -p 7687:7687 -v mg_lib:/var/lib/memgraph memgraph/memgraph +``` + +More information on Docker Volumes can be found +[here](https://docs.docker.com/storage/volumes/). + + + + +Make sure that Memgraph is searching for the snapshot files in the right +directory. The Memgraph configuration is available in +`/etc/memgraph/memgraph.conf` and you can specify the directory with the +`--data-directory` flag. If the configuration file is altered, Memgraph needs to +be restarted. The default directory is `/var/lib/memgraph`. To learn about all the +configuration options, check out the [Reference +guide](/memgraph/reference-guide/configuration). 
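The lookup described above can be sketched in shell. This is a sketch using stand-in files so it is safe to run anywhere; on a real system the config is `/etc/memgraph/memgraph.conf`, the default data directory is `/var/lib/memgraph`, and the `snapshots` subdirectory name is an assumption about the on-disk layout.

```shell
# Sketch: read the --data-directory flag from a (stand-in) config file and
# list the snapshot files under it.
conf=memgraph.conf.example
printf -- '--data-directory=./mg_data_demo\n' > "$conf"
datadir=$(sed -n 's/^--data-directory=//p' "$conf")
mkdir -p "$datadir/snapshots"
touch "$datadir/snapshots/20230101120000_snapshot"
ls "$datadir/snapshots"   # prints: 20230101120000_snapshot
```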
+
+
+
+
+import SubmitError from '../templates/_submit-error.mdx';
+
+
diff --git a/docs2/help-center/errors/socket.md b/docs2/help-center/errors/socket.md
new file mode 100644
index 00000000000..a7605e1a464
--- /dev/null
+++ b/docs2/help-center/errors/socket.md
@@ -0,0 +1,26 @@
+---
+id: socket
+title: Socket errors
+sidebar_label: Socket
+---
+
+import Help from '../templates/_help.mdx';
+
+
+
+## Errors
+
+1. [Cannot bind to socket on endpoint {}. For more details, visit:
+   memgr.ph/socket.](#error-1)
+2. [Cannot listen on socket endpoint {}. For more details, visit:
+   memgr.ph/socket.](#error-1)
+
+## Memgraph cannot bind to or listen on socket endpoint {#error-1}
+
+Make sure that the specified port (Memgraph's default port is 7687) isn't being
+used by another process and that you haven't already started another Memgraph
+instance on the same port.
+
+import SubmitError from '../templates/_submit-error.mdx';
+
+
diff --git a/docs2/help-center/errors/ssl.md b/docs2/help-center/errors/ssl.md
new file mode 100644
index 00000000000..60dac2501b8
--- /dev/null
+++ b/docs2/help-center/errors/ssl.md
@@ -0,0 +1,43 @@
+---
+id: ssl
+title: SSL errors
+sidebar_label: SSL
+---
+
+import Help from '../templates/_help.mdx';
+
+
+
+## Errors
+
+1. [An unknown error occurred while processing an SSL message. Please make sure
+   that you have SSL properly configured on the server and the client. For more
+   details, visit: memgr.ph/ssl.](#error-1)
+
+## Warnings
+
+1. [Using non-secure Bolt connection (without SSL). For more details, visit:
+   memgr.ph/ssl.](#error-1)
+
+## Secure Sockets Layer (SSL) {#error-1}
+
+Secure connections are supported but disabled by default. The server initially
+ships with a self-signed testing certificate located at `/etc/memgraph/ssl/`.
+You can use it by [changing the configuration](/memgraph/how-to-guides/config-logs) and passing its path within the
+following parameters:
+
+```
+--bolt-cert-file=/etc/memgraph/ssl/cert.pem
+--bolt-key-file=/etc/memgraph/ssl/key.pem
+```
+
+If you are using your own certificate, be sure to enter the correct path to the
+certificate.
+
+To disable SSL support and use insecure connections to the database, either
+delete both parameters (`--bolt-cert-file` and `--bolt-key-file`) or comment
+them out by adding a `#` in front of them.
+
+import SubmitError from '../templates/_submit-error.mdx';
+
+
diff --git a/docs2/help-center/errors/templates/_help.mdx b/docs2/help-center/errors/templates/_help.mdx
new file mode 100644
index 00000000000..900a00b9800
--- /dev/null
+++ b/docs2/help-center/errors/templates/_help.mdx
@@ -0,0 +1,4 @@
+:::info
+If you are having trouble dealing with an error, please let us know on Discord.
+:::
diff --git a/docs2/help-center/errors/templates/_submit-error.mdx b/docs2/help-center/errors/templates/_submit-error.mdx
new file mode 100644
index 00000000000..81641f207e2
--- /dev/null
+++ b/docs2/help-center/errors/templates/_submit-error.mdx
@@ -0,0 +1,4 @@
+:::note
+If you weren't able to find the error, please submit it through a Support Ticket so we can look into it and get back to you.
+:::
\ No newline at end of file
diff --git a/docs2/help-center/errors/unknown.md b/docs2/help-center/errors/unknown.md
new file mode 100644
index 00000000000..ee514cdc190
--- /dev/null
+++ b/docs2/help-center/errors/unknown.md
@@ -0,0 +1,24 @@
+---
+id: unknown
+title: Unknown errors
+sidebar_label: Unknown
+---
+
+import Help from '../templates/_help.mdx';
+
+
+
+## Errors
+
+1. [Unknown exception occurred during query execution {}. For more details,
+   visit: memgr.ph/unknown.](#error-1)
+
+## How to handle an unknown error?
{#error-1}
+
+Please report the error by opening an issue on
+[GitHub](https://github.com/memgraph/memgraph), joining our
+[Discord](https://www.discord.gg/memgraph), or submitting a [Support
+ticket](https://support.memgraph.com). We will contact you about the error as
+soon as possible.
+
+import SubmitError from '../templates/_submit-error.mdx';
+
+
diff --git a/docs2/help-center/faq.md b/docs2/help-center/faq.md
new file mode 100644
index 00000000000..6bfbb23f737
--- /dev/null
+++ b/docs2/help-center/faq.md
@@ -0,0 +1,438 @@
+# Frequently asked questions
+
+## Memgraph 101
+
+### What is Memgraph?
+
+Memgraph is an **open-source in-memory graph database** built for teams that expect
+highly performant, advanced analytical insights - as compatible with your
+current infrastructure as Neo4j (but up to 120x faster). Memgraph is powered by
+a query engine built in C/C++ to handle real-time use cases at an enterprise
+scale. Memgraph supports **strongly-consistent ACID transactions** and uses the
+standardized **Cypher query language** over the Bolt protocol for structuring,
+manipulating, and exploring data.
+
+### What are the benefits of being an in-memory graph database?
+
+When data is stored on disk, the computer has to physically read it from the
+disk and transfer it to the RAM before it can be processed. This process is
+relatively slow because it involves several physical operations, such as seeking
+the right location on the disk and waiting for the data to be read. Writing
+data is also much slower for the same reasons.
+
+Storing data in the computer's RAM eliminates the need for these physical
+processes, and data can be accessed and added almost instantly.
+
+Therefore, in-memory graph databases are ideal for applications requiring fast
+data processing, real-time analytics, and quick response times.
+
+### What use cases is Memgraph best suited for?
+
+Memgraph is best suited for use cases with complex data relationships that
+require real-time processing and high scalability.
+
+### How do I know my use case has complex data relationships?
+
+In relational databases, complex data relationships arise when data from
+different tables is related or somehow interconnected. Because data is spread
+across multiple tables, querying it requires hopping from one table to another
+and combining them with slow and resource-intensive join operations.
+
+The complexity of join operations can increase exponentially as the number of
+tables grows and as the links between tables stop following a clearly set,
+neatly structured pattern. It is no longer sufficient to join just two or three
+tables; you may need to hop through seven or more tables to find the
+correct link between the data and gain valuable analytics.
+
+Examples of complex data are deep hierarchical relationships, such as
+parent-child relationships, or many-to-many relationships between different
+tables.
+
+### How does Memgraph compare to other graph databases regarding performance?
+
+Memgraph is designed to be a high-performance graph database, and it typically
+outperforms many other graph databases in terms of speed and scalability. Key
+factors contributing to Memgraph's performance are its in-memory architecture
+and a performant query engine written in C++. Memgraph also offers a variety of
+tools and features to help optimize query performance, including label and
+label-property indexes and a custom visualization library. Check our
+[benchmark](https://memgraph.com/benchgraph/) comparing Memgraph and Neo4j.
+
+### Is Memgraph a distributed database?
+
+At the moment, Memgraph does not support running and storing data across
+multiple physical locations, but the next version of Memgraph, Memgraph 3.0, will
+enable horizontal scaling. Check out plans for [Memgraph 3.0 on
+GitHub](https://github.com/orgs/memgraph/projects/5).
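+The multi-table join pattern described above maps directly onto graph
+traversal. As an illustrative sketch (the `Employee` label, the `REPORTS_TO`
+relationship type, and the property values are hypothetical, not part of any
+Memgraph dataset), a deep parent-child hierarchy that would require many
+self-joins in SQL becomes a single variable-length Cypher pattern:
+
+```cypher
+MATCH (e:Employee {name: "Alice"})-[:REPORTS_TO*1..7]->(manager:Employee)
+RETURN manager.name;
+```
+
+One pattern replaces up to seven join operations: the traversal simply follows
+the `REPORTS_TO` relationships hop by hop instead of recomputing table joins.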
+
+### How does Memgraph ensure persistence and durability?
+
+Although Memgraph is an in-memory database, the data is persistent and durable.
+This means data will not be lost upon reboot.
+
+Memgraph uses two mechanisms to ensure the
+[durability](/memgraph/reference-guide/backup) of
+stored data and make disaster recovery possible:
+
+* write-ahead logging (WAL)
+* periodic snapshot creation
+
+
+Each database modification is recorded in a log file before being written to the
+database. Therefore, the log file contains all the steps needed to reconstruct the
+DB's most recent state. Memgraph also periodically takes snapshots during
+runtime to write the entire data storage to the drive. On startup, the database
+state is recovered from the most recent snapshot file. The timestamp of the
+snapshot is compared with the latest update recorded in the WAL file and, if the
+snapshot is less recent, the state of the DB will be completely recovered using
+the WAL file.
+
+If you are using Memgraph with Docker, be sure to [specify a volume for data
+persistence](/memgraph/how-to-guides/work-with-docker#specify-volumes).
+
+### How does Memgraph ensure high availability?
+
+Memgraph ensures high availability by using
+[replication](/memgraph/reference-guide/replication). Replication involves
+replicating data from one MAIN instance to one or several REPLICA instances. If
+the MAIN instance fails, a REPLICA instance can be promoted to serve as the
+MAIN instance, thus ensuring continuous data availability.
+
+### Does Memgraph support multitenancy?
+
+Memgraph doesn't support multitenancy yet, but support is planned for the next
+version of Memgraph, Memgraph 3.0. Check out plans [for Memgraph 3.0 on
+GitHub](https://github.com/orgs/memgraph/projects/5).
+
+### How many cores does Memgraph utilize?
+
+Memgraph is designed to utilize all available CPU cores on a machine to process
+queries and perform other operations in parallel, significantly improving
+performance and reducing query response times.
+
+### What are the minimum and recommended requirements to run an on-premise instance?
+
+To run Memgraph on-premise, you need a server or desktop processor such as an
+Intel Xeon, an AMD Opteron/Epyc, or an ARM machine (Apple M1, Amazon Graviton),
+at least 1 GB of RAM and disk, and at least 1 vCPU. We recommend using a server
+processor, at least 16 GB of ECC RAM, the same amount of disk storage and at
+least 8 vCPUs or 4 physical cores.
+
+### How much RAM do I need for my graph?
+
+We recommend twice as many GB of RAM as the data size. If you have 8 GB of
+data, we recommend having at least 16 GB of RAM. Of course, the actual memory
+needs depend on the complexity of executed queries. The more graph objects a query
+needs to return as a result, the more RAM will be required. To calculate the
+Memgraph RAM instance requirements based on your data, check out [how Memgraph
+uses memory](/memgraph/under-the-hood/storage).
+
+### Are there any graph size limits?
+
+Memgraph vertically scales effortlessly up to 1B nodes and 10B edges. The only
+limit is the size of your RAM. We recommend twice as many GB of RAM as the data
+size. If you have 8 GB of data, we recommend having at least 16 GB of RAM. Of
+course, the actual memory needs depend on the complexity of executed queries.
+The more graph objects a query needs to return as a result, the more RAM will be
+required.
+
+### What is the difference between Memgraph and Memgraph Platform?
+
+There are three official Docker images for Memgraph:
+
+* `memgraph/memgraph` - the most basic MemgraphDB instance.
+* `memgraph/memgraph-mage` - the image contains a MemgraphDB instance together
+  with all the newest [MAGE](/mage) modules and graph algorithms.
+* `memgraph/memgraph-platform` - the image contains MemgraphDB, Memgraph Lab,
+  mgconsole and MAGE. Once started, mgconsole will be opened in the terminal,
+  while Memgraph Lab is available at `http://localhost:3000`.
+
+The MAGE graph algorithm library includes [NVIDIA
+cuGraph](https://github.com/rapidsai/cugraph) GPU-powered graph algorithms. To
+use them, you need a specific kind of `memgraph-mage` image, so check the
+[documentation](/mage/installation/cugraph) or
+[DockerHub](https://hub.docker.com/r/memgraph/memgraph-mage/tags?page=1&name=cugraph)
+for tags.
+
+### Do I need to define a schema before importing data?
+
+It is not necessary to define any data schema to import data. Data will be
+imported into the database regardless of the number of properties and their
+types. You can enforce property
+[uniqueness](https://memgraph.com/docs/memgraph/how-to-guides/constraints/uniqueness-constraint)
+and
+[existence](https://memgraph.com/docs/memgraph/how-to-guides/constraints/existence-constraint).
+
+### Are there any educational materials available?
+
+You can try running queries on preloaded datasets in [Memgraph
+Playground](https://playground.memgraph.com/). If you need help with Cypher
+queries, check out [the Cypher manual](/cypher-manual). We've prepared
+[tutorials](/memgraph/tutorials) and [how-to guides](/memgraph/how-to-guides) to
+help you navigate Memgraph more easily. You can also take our data modeling and
+Cypher [e-mail courses](https://memgraph.com/email-courses) or watch one of our
+webinars. You can even deep dive into code with Memgraph's CTO -> [Code with
+Buda](https://www.youtube.com/playlist?list=PL7Eotag2rRhaYDrSNcltkbtj0S3yC7h-u).
+For all other questions and help, feel free to join our
+[community](https://memgraph.com/community).
+
+### Can I try out Memgraph Enterprise before making a decision?
+
+Yes, Memgraph offers a free 30-day Memgraph Enterprise Trial. 
Send a request +via [the +form](https://webforms.pipedrive.com/f/1sUK9YYKJnygcFEDI0SOpSGB2YBK2nP8xdjAiwnhEVXXohYvodHTPAzB1o4bZ8Tuz). + +### Does Memgraph offer professional services such as data modelling, development, integration and similar? + +It depends on the scope of the project and the requirements. [Contact +us](https://webforms.pipedrive.com/f/1sUK9YYKJnygcFEDI0SOpSGB2YBK2nP8xdjAiwnhEVXXohYvodHTPAzB1o4bZ8Tuz) +for more information. + +## MemgraphDB + +### What is the fastest way to import data into Memgraph? + +Currently, the fastest way to import data is from a CSV file with a [LOAD CSV +clause](/memgraph/import-data/load-csv-clause). LOAD CSV clause imports between +100K and 350K nodes per second and between 60K and 80K edges per second. To +achieve this import speed, indexes have to be [set up +appropriately](/memgraph/how-to-guides/indexes). + +[Other import methods](/memgraph/import-data) include importing data from JSON +and CYPHERL files, migrating from SQL and Neo4j with mgmigrate tool, or +connecting to a data stream. + +### How to import data from MySQL or PostgreSQL? + +You can migrate from [MySQL](/memgraph/import-data/migrate/mysql) or +[PostgreSQL](/memgraph/import-data/migrate/postgresql) using the +[mgmigrate](https://github.com/memgraph/mgmigrate) tool. + +### What file formats does Memgraph support for import? + +You can import data from [CSV](/memgraph/import-data/load-csv-clause), +[JSON](/memgraph/import-data/files/load-json) or +[CYPHERL](/memgraph/import-data/files/cypherl) files. + +CSV files can be imported in on-premise instances using the [LOAD CSV +clause](/cypher-manual/clauses/load-csv). + +Local JSON files and files on a remote address can be imported in on-premise +instances using a [json_util](/docs/mage/query-modules/python/json-util) module +from the MAGE library. On a Cloud instance, data from JSON files can be imported +only from a remote address. 
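+As a sketch of the JSON route, the `json_util` module can be combined with
+`UNWIND` to turn a JSON array into nodes (the URL, the file contents, and the
+`Person`/`name` names below are hypothetical, used only for illustration):
+
+```cypher
+CALL json_util.load_from_url("https://example.com/people.json")
+YIELD objects
+UNWIND objects AS person
+CREATE (:Person {name: person.name});
+```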
+
+A CYPHERL file contains the Cypher queries necessary for creating nodes and
+relationships.
+
+### What data formats does Memgraph support for export?
+
+You can export data to JSON or CYPHERL files. Query results can be exported to a
+CSV file.
+
+Data can be exported to a JSON file from on-premise instances using the
+[export_util](/mage/query-modules/python/export-util) module from the MAGE
+library. The same module can be used to export query results to a CSV file.
+
+A CYPHERL file contains the Cypher queries necessary for creating nodes and
+relationships, and you can export such files via Memgraph Lab.
+
+### Can the Memgraph database ingest streaming data?
+
+Yes, you can [connect your
+instance](/memgraph/import-data/data-streams/overview) to Kafka, Redpanda or
+Pulsar streams and ingest data. You will need to write a transformation module
+that will instruct Memgraph on how to transform the incoming messages and
+consume them correctly.
+
+### Is data automatically indexed during import?
+
+No, data is not automatically indexed during import. You need to [create label
+or label-property indexes](/memgraph/how-to-guides/indexes) manually once the
+import is finished.
+
+### What languages can be used to communicate with the database?
+
+At the moment, you can [connect to a Memgraph
+instance](/memgraph/connect-to-memgraph/drivers) using the Bolt protocol and
+query the database using C#, C/C++, Go, Haskell, Java, JavaScript, Node.js, PHP,
+Python, Ruby, and Rust.
+
+### Can I create logically separated graphs within the same database instance?
+
+You can create logically separated graphs within the same instance by [using
+different
+labels](/cypher-manual/updating-nodes-and-relationships#creating-and-updating-node-labels).
+Each node can have multiple labels, and [the cost of labels is 8B per
+label](/memgraph/under-the-hood/storage#vertex-memory-layout) (but the memory is
+allocated dynamically, so 3 labels take up as much memory as 4, and 5-7 labels
+take as much space as 8, etc.). You can use the same technique to store multilayer
+networks.
+
+### Can I run MAGE modules and algorithms on just a part of the graph/subgraph?
+
+You can [run MAGE modules and algorithms on
+subgraphs](/mage/how-to-guides/run-a-subgraph-module) by using the [project()
+function](/cypher-manual/functions#graph-projection-functions).
+
+### How can I visualize query results?
+
+You can use Memgraph Lab, a visual user interface that enables you to:
+
+* visualize graph data using [the Orb library](https://github.com/memgraph/orb)
+* write and execute Cypher queries
+* import and export data
+* view and optimize query performance
+* develop query modules in Python
+* manage connections to streams.
+
+### Does replication affect performance?
+
+Replication should not in any way affect the performance of your database
+instance.
+
+### How can I check storage information?
+
+You can check storage information by running [SHOW STORAGE
+INFO;](/memgraph/reference-guide/server-stats#storage-information), which will
+provide information about the number of stored nodes and relationships and
+memory usage.
+
+### Where does Memgraph save or preview logs?
+
+By default, Memgraph saves the log at `/var/log/memgraph/memgraph.log`.
+Accessing logs depends on how you've started Memgraph, so check the [how-to
+guide about accessing logs](/memgraph/how-to-guides/config-logs).
+
+You can check the logs using Memgraph Lab (the visual interface), which listens
+for logs on the web socket port 7444. Your own system can connect to the same
+web socket port 7444 and consume the logs as well.
+
+Log level and location can be modified using [configuration
+flags](/memgraph/reference-guide/configuration#other).
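+For example, a sketch of the relevant lines in `/etc/memgraph/memgraph.conf`
+(the values shown are illustrative, not recommendations):
+
+```console
+--log-level=WARNING
+--log-file=/var/log/memgraph/memgraph.log
+```
+
+Restart Memgraph after editing the configuration file for the change to take
+effect.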
+ +### Do I need to know Cypher to query the database? + +You don't need to know Cypher to query the database. You can use +[GQLAlchemy](/gqlalchemy), an Object Graph Mapper (OGM). OGM provides a +developer-friendly workflow for writing object-oriented notation to communicate +to a graph database. Instead of writing Cypher queries, you can write Python +code, which the OGM will automatically translate into Cypher queries. It +supports both Memgraph and Neo4j. + +## Cypher + +### Are there any differences in Cypher implementation between Memgraph and Neo4j? + +Although we tried to implement openCypher query language as closely to the +language reference as possible, we made some changes that can enhance the user +experience. You can find the differences listed in the [Cypher +manual](/cypher-manual/differences). + +### Can I expand Cypher query language with custom procedures? + +Yes, you can expand the Cypher query language with custom procedures grouped in +query modules. Modules can be written in C/C++ and Python (which also has a mock +API). For more details, check out the documentation on [query +modules](/memgraph/reference-guide/query-modules). + +## MAGE graph library + +### What is MAGE? + +[**Memgraph Advanced Graph Extensions (MAGE)**](/mage) is an open-source repository that contains graph algorithms and utility modules. It encourages developers to share innovative and useful [query modules](/mage/query-modules/available-queries) (custom Cypher procedures) the whole community can benefit from. It corresponds to APOC in Neo4j, except it's free and open source. + +The MAGE library also includes dynamic algorithms specially designed for analyzing real-time data, NetworkX and igraph integrations, Elasticsearch synchronization module and NVIDIA GPU-powered algorithms. 
Check [the full list of modules](/mage/query-modules/available-queries), and if there is a specific procedure you can't find in the MAGE library which you would like to use, please [let us know](overview.md).
+
+### What are query modules?
+
+[Query modules](/memgraph/reference-guide/query-modules) are collections of custom Cypher procedures that extend the basic functionalities of the Cypher query language. Each query module consists of procedures connected by a common theme (for example, community detection). The [MAGE graph library](/mage) gathers a number of implemented graph algorithms and utility modules. Still, if you need a specific procedure unavailable in MAGE, you can [implement it using the Python or C/C++ API](/memgraph/reference-guide/query-modules/implement-custom-query-modules/overview) and [contribute to the library](/mage/contributing) or [contact us](overview.md).
+
+## Memgraph Lab visual user interface
+
+### What is Memgraph Lab?
+
+[Memgraph Lab](/memgraph-lab) is a lightweight and intuitive visual user interface that enables you to:
+- write and execute Cypher queries and algorithms
+- visualize graph data using [the Orb library](https://github.com/memgraph/orb)
+- import and export data
+- generate data schema
+- view and optimize query performance
+- develop custom procedures in Python
+- manage stream connections.
+
+### Can I use Memgraph Lab on its own?
+
+No, Memgraph Lab is not a standalone tool; it can only connect to a running Memgraph instance.
+
+### Can I customize the visual appearance of my graph results?
+
+Yes, you can customize the visual appearance of your graph results by using [the Graph Style Script language](/memgraph-lab/graph-style-script-language). You can add images to nodes, change their shape, size and color, and change the line appearance and thickness of relationships. For a complete list of available features, consult [the GSS reference guide](/memgraph-lab/style-script/reference-guide).
+
+## Memgraph Cloud
+
+### What is the pricing?
+
+You can estimate the cost of Memgraph Cloud's service by selecting your cloud
+region and instance size with our [Cost
+Calculator](https://cloud.memgraph.com/pricing), or you can check the prices at
+[Project rates](/docs/memgraph-cloud/payment#project-rates).
+
+### How can I redeem a coupon that I got for Memgraph Cloud?
+First, you need to [add a credit card](/docs/memgraph-cloud/payment#add-a-payment-method) to your account. Then you can [redeem](/docs/memgraph-cloud/payment#add-a-payment-method) the
+coupon.
+
+### What will happen to my instance after the free trial?
+The instance will be stopped for the next 7 days. If you wish to continue, [add a payment
+method](/docs/memgraph-cloud/payment#add-a-payment-method).
+
+### Why can't I create more than 3 projects and 5 snapshots?
+That is the initial limit for new users. If you want to create more projects,
+[let us know](/help-center).
+
+### Is it possible to upgrade a project to use more RAM?
+Yes, it is. You can find detailed instructions in the [Memgraph Cloud
+documentation](/docs/memgraph-cloud/cloud-projects#resize-a-project).
+
+### I've created a project with 2GB RAM, but Memgraph Lab says there is only 1.54GB available. Why is that so?
+A part of the RAM is allocated to the operating system and other services. They
+usually take 13-15% of the total RAM. The approximate free RAM is:
+- 1GB RAM Memgraph Cloud project has about 860 MB free RAM
+- 2GB RAM Memgraph Cloud project has about 1.60 GB free RAM
+- 4GB RAM Memgraph Cloud project has about 3.40 GB free RAM
+- 8GB RAM Memgraph Cloud project has about 6.7 GB free RAM
+- 16GB RAM Memgraph Cloud project has about 14 GB free RAM
+- 32GB RAM Memgraph Cloud project has about 28 GB free RAM
+
+### I've created a new project, and when I try to connect to the instance, I get an error: Unable to connect.
+Upon creating a project, Memgraph loads all the MAGE algorithms, so it takes
+some time to load them all. Wait 30 seconds, and then try to connect again.
+
+### I've paused my project and resumed it, but my Memgraph instance's IP is different now. Why?
+When you pause your project, the IP usually stays the same, but sometimes your
+IP can be released and a new one will be allocated. You can always check the IP
+in the connection details.
+
+### How can I retrieve my project password?
+If you have forgotten your project password, we can't help you. We don't have a
+way of finding out or resetting your project password.
+
+### How can I connect to my project?
+You can connect to an instance running within the Memgraph Cloud project via
+**Memgraph Lab**, a visual interface, **mgconsole**, a command-line interface, or
+one of many drivers. You can find detailed instructions in the [Memgraph Cloud
+documentation](/docs/memgraph-cloud/cloud-connect).
+
+### How can I back up my project?
+A project is backed up by creating a snapshot with Amazon EBS. You cannot create
+a snapshot if you are using the 14-day free trial version of Memgraph Cloud. You
+can find detailed instructions in the [Memgraph Cloud
+documentation](/docs/memgraph-cloud/cloud-projects#back-up-a-project).
+
+### Is AWS available?
+Yes, Memgraph Cloud runs on AWS.
+
+### Is GCP available?
+No, at the moment, Memgraph Cloud is not available on the Google Cloud Platform.
+
+
diff --git a/docs2/help-center/help-center.md b/docs2/help-center/help-center.md
new file mode 100644
index 00000000000..c6d9d0fb3ed
--- /dev/null
+++ b/docs2/help-center/help-center.md
@@ -0,0 +1,44 @@
+---
+id: overview
+title: Help Center
+sidebar_label: Help Center
+slug: /
+---
+
+Are you stuck? Don't worry, here at Memgraph we are all eager to help - we
+won't leave you stranded!
+
+❓ Try to find an answer on one of our **FAQ** pages - maybe we have already provided an
+answer to your inquiry:
+
+ - **[Memgraph FAQ](/faq/memgraph-faq.md)**
+ - **[Memgraph Cloud FAQ](/faq/cloud-faq.md)**
+ - **[Memgraph Lab FAQ](/faq/memgraph-lab-faq.md)**
+ - **[MAGE FAQ](/faq/mage-faq.md)**
+
+🙋 Post a question on
+**[StackOverflow](https://stackoverflow.com/questions/tagged/memgraphdb)** with
+the tag **memgraphdb**. You can also ask your question on our
+[**Discord server**](https://discord.gg/memgraph). There is always someone from Memgraph
+or the graph community there to help!
+
+🎫 Open **[a GitHub issue](https://github.com/memgraph)** in the corresponding repository to:
+
+ - report a bug or a technical issue
+ - submit a feature request or an improvement suggestion
+ - request information not present in the documentation
+ - ask any other kind of technical question.
+
+📧 Email us at **[tech@memgraph.com](mailto:tech@memgraph.com)**.
+
+## Community
+
+If you want to be a part of Memgraph's fast-growing community, join us on our
+path by following us, participating in discussions and asking questions. We are
+available on the following platforms:
+
+- :purple_heart: [**Discord**](https://discord.gg/memgraph)
+- :open_file_folder: [**Memgraph GitHub**](https://github.com/memgraph)
+- :bird: [**Twitter**](https://twitter.com/memgraphdb)
+- :movie_camera:
+  [**YouTube**](https://www.youtube.com/channel/UCZ3HOJvHGxtQ_JHxOselBYg)
diff --git a/docs2/querying/clauses/call.md b/docs2/querying/clauses/call.md
new file mode 100644
index 00000000000..b6f41070443
--- /dev/null
+++ b/docs2/querying/clauses/call.md
@@ -0,0 +1,255 @@
+---
+id: call
+title: CALL clause
+sidebar_label: CALL
+---
+
+The `CALL` clause is used to call a subquery inside an existing query.
+
+:::info
+
+[MAGE procedures](/docs/mage/usage/calling-procedures) are also invoked with the `CALL` clause at the beginning of a query.
+Switch to MAGE documentation if you want to CALL a graph algorithm or some other procedure from the MAGE library. + +::: + +1. [Uses of CALL subquery](#1-uses-of-call-subquery)
+ 1.1. [Cartesian products](#11-cartesian-products)
+ 1.2. [Cartesian products with bounded symbols](#12-cartesian-products-with-bounded-symbols)
+ 1.3. [Post-union processing](#13-post-union-processing)
+ 1.4. [Observing changes from previous executions](#14-observing-changes-from-previous-executions)
+ 1.5. [Unit subqueries](#15-unit-subqueries) + +2. [Invalid uses of CALL subquery](#2-invalid-uses-of-call-subquery)
2.1. [Returning variables with the same name as those in the outer scope](#21-returning-variables-with-the-same-name-as-those-in-the-outer-scope)
+ 2.2. [Returning non-aliased expressions](#22-returning-non-aliased-expressions)
2.3. [Referencing outer scope variables that don't exist](#23-referencing-outer-scope-variables-that-dont-exist)
+
+## 1. Uses of CALL subquery
+
+### 1.1. Cartesian products
+
+A `CALL` subquery is executed once for each incoming row. If the subquery
+produces multiple rows, the result is a Cartesian product of two branches: the
+input branch (rows produced before calling the subquery) and the subquery
+branch (rows produced by the subquery).
+Imagine the data includes two `:Person` nodes, one named `John` and one named `Alice`,
+as well as two `:Animal` nodes, one named `Rex` and one named `Lassie`.
+
+Running the following query would produce the output below:
+```cypher
+MATCH (p:Person)
+CALL {
+    MATCH (a:Animal)
+    RETURN a.name AS animal_name
+}
+RETURN p.name AS person_name, animal_name
+```
+
+
+Output:
+```nocopy
++-------------+-------------+
+| person_name | animal_name |
++---------------------------+
+| 'John'      | 'Rex'       |
+| 'John'      | 'Lassie'    |
+| 'Alice'     | 'Rex'       |
+| 'Alice'     | 'Lassie'    |
++---------------------------+
+```
+
+### 1.2. Cartesian products with bounded symbols
+
+To reference variables from the outer scope in the subquery, start the subquery with the `WITH` clause.
+It allows using the same symbols to expand on the neighborhood of the referenced nodes or relationships.
+Otherwise, the subquery behaves as if it were seeing the variable for the first time.
+
+In the following query, the `WITH` clause binds the variable `person` inside the
+subquery to the node with the label `:Person` matched in the outer scope:
+
+```cypher
+MATCH (person:Person)
+CALL {
+    WITH person
+    MATCH (person)-[:HAS_PARENT]->(parent:Parent)
+    RETURN parent
+}
+RETURN person.name AS person_name, parent.name AS parent_name
+```
+
+Output:
+```nocopy
++-------------+-------------+
+| person_name | parent_name |
++---------------------------+
+| 'John'      | 'John Sr.'  |
+| 'John'      | 'Anna'      |
+| 'Alice'     | 'Roxanne'   |
+| 'Alice'     | 'Bill'      |
++---------------------------+
+```
+
+### 1.3. 
Post-union processing
+
+Output from all `UNION` queries inside a subquery can be combined and
+forwarded as a single output to make the queries more expressive:
+
+```cypher
+CALL {
+    MATCH (n:Person)
+    RETURN n.name AS name, n.ssn AS ID_number
+    UNION
+    MATCH (n:Company)
+    RETURN n.name AS name, n.corporate_id AS ID_number
+}
+RETURN name, ID_number
+```
+
+Output:
+```nocopy
++------------+-------------+
+| name       | ID_number   |
++--------------------------+
+| 'John'     | '123456789' |
+| 'Memgraph' | '555555555' |
++--------------------------+
+```
+
+### 1.4. Observing changes from previous executions
+
+Each execution of a `CALL` clause can observe changes from previous executions.
+
+
+```cypher
+UNWIND [0, 1, 2] AS x
+CALL {
+    MATCH (n:Counter)
+    SET n.count = n.count + 1
+    RETURN n.count AS innerCount
+}
+WITH innerCount
+MATCH (n:Counter)
+RETURN
+    innerCount,
+    n.count AS totalCount
+```
+
+Output:
+```nocopy
++------------+-------------+
+| innerCount | totalCount  |
++--------------------------+
+| 1          | 3           |
+| 2          | 3           |
+| 3          | 3           |
++--------------------------+
+```
+
+### 1.5. Unit subqueries
+
+Unit subqueries perform a side effect for every incoming row without returning
+anything. If the starting state of the database contains only one `:Person`
+node, the following query will create five additional `:Person` nodes through
+the `FOREACH` clause, resulting in six nodes in total.
+
+```cypher
+MATCH (p:Person)
+CALL {
+    FOREACH (i IN range(1, 5) | CREATE (:Person {id: i}))
+}
+
+MATCH (n) RETURN COUNT(n) AS no_created_nodes;
+```
+
+Output:
+```nocopy
++------------------+
+| no_created_nodes |
++------------------+
+| 6                |
++------------------+
+```
+
+## 2. Invalid uses of CALL subquery
+
+### 2.1. 
Returning variables with the same name as those in the outer scope
+
+Invalid use:
+```cypher
+MATCH (n:Person)
+CALL {
+  MATCH (n:Parent)
+  RETURN n
+}
+RETURN n;
+```
+
+The above query results in a semantic exception because the variable `n` has
+already been used in the outer scope of the query. The query will
+execute successfully after renaming either the outer scope variable or the subquery variable.
+
+Valid use:
+```cypher
+MATCH (n:Person)
+CALL {
+  MATCH (p:Parent)
+  RETURN p
+}
+RETURN n, p;
+```
+
+### 2.2. Returning non-aliased expressions
+
+Invalid use:
+```cypher
+MATCH (n:Person)
+CALL {
+  WITH n
+  MATCH (n)-[:HAS_PARENT]->(parent:Parent)
+  RETURN parent.age
+}
+RETURN n, parent.age;
+```
+
+The above query results in a semantic exception since the expression returned in the
+subquery has not been aliased and cannot be interpreted correctly. By aliasing the
+returned expression upon exiting the subquery, it can be used in the outer scope.
+
+Valid use:
+```cypher
+MATCH (n:Person)
+CALL {
+  WITH n
+  MATCH (n)-[:HAS_PARENT]->(parent:Parent)
+  RETURN parent.age AS parent_age
+}
+RETURN n, parent_age;
+```
+
+### 2.3. Referencing outer scope variables that don't exist
+
+Invalid use:
+```cypher
+MATCH (n:Person)
+CALL {
+  WITH o
+  MATCH (o)-[:HAS_CHILD]->(child:Parent)
+  RETURN child
+}
+RETURN DISTINCT n;
+```
+
+The above query results in a semantic exception because the variable from the outer scope does not exist.
+The subquery can only reference variables bound in its input branch.
+By renaming the variable to the already bound variable `n`, the query will execute correctly. 
+
+Valid use:
+```cypher
+MATCH (n:Person)
+CALL {
+  WITH n
+  MATCH (n)-[:HAS_CHILD]->(child:Parent)
+  RETURN child
+}
+RETURN DISTINCT n;
+``` diff --git a/docs2/querying/clauses/case.md b/docs2/querying/clauses/case.md new file mode 100644 index 00000000000..9a5b95414c6 --- /dev/null +++ b/docs2/querying/clauses/case.md @@ -0,0 +1,22 @@ +Conditional expressions can be expressed in the Cypher language with the `CASE`
+expression. A simple form is used to compare an expression against multiple
+predicates. For the first matching predicate, the result of the expression provided
+after the `THEN` keyword is returned. If no predicate matches, the value
+following `ELSE` is returned if provided, or `null` if there is no `ELSE`:
+
+```cypher
+MATCH (n)
+RETURN CASE n.currency WHEN "DOLLAR" THEN "$" WHEN "EURO" THEN "€" ELSE "UNKNOWN" END;
+```
+
+In the generic form, you don't need to provide an expression whose value is compared
+to predicates; instead, you can list multiple predicates, and the first one that
+evaluates to true is matched:
+
+```cypher
+MATCH (n)
+RETURN CASE WHEN n.height < 30 THEN "short" WHEN n.height > 300 THEN "tall" END;
+```
+
+Most expressions that take `null` as input will produce `null`. This includes boolean expressions that are used as
+predicates, in which case anything that is not true is interpreted as being false. It also follows that, logically, `null != null`. 
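+
+To make the `null` rule above concrete, here is a small illustrative query (a sketch added for this guide; the literal values are chosen only for the example):
+
+```cypher
+RETURN CASE WHEN null THEN "matched" ELSE "not matched" END AS result;
+```
+
+Because `null` is not `true`, the predicate is treated as not matching, so the `ELSE` value `"not matched"` is returned.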
diff --git a/docs2/querying/clauses/clauses.md b/docs2/querying/clauses/clauses.md new file mode 100644 index 00000000000..5724379d5d0 --- /dev/null +++ b/docs2/querying/clauses/clauses.md @@ -0,0 +1,23 @@ +# Clauses
+
+The **Cypher** language enables users to perform standard database operations by using the following clauses:
+
+ * [`CALL`](call.md) - calls a subquery inside the query
+ * [`CASE`](case.md) - creates conditional expressions
+ * [`CREATE`](create.md) - creates new nodes and relationships
+ * [`DELETE`](delete.md) - deletes nodes and relationships
+ * [`EXPLAIN`](explain.md) - inspects the execution plan of a query
+ * [`FOREACH`](foreach.md) - iterates over a list of elements and applies update clauses to each of them
+ * [`LOAD CSV`](load-csv.md) - loads data from a CSV file
+ * [`MATCH`](match.md) - searches for patterns
+ * [`MERGE`](merge.md) - creates patterns if they don't exist
+ * [`OPTIONAL MATCH`](optional-match.md) - behaves the same as [`MATCH`](match.md), but when it fails to find the pattern it fills missing parts of the pattern with null values
+ * [`PROFILE`](profile.md) - profiles the execution of a query
+ * [`REMOVE`](remove.md) - removes labels and properties
+ * [`RETURN`](return.md) - defines what will be presented to the user in the result set
+ * [`SET`](set.md) - adds new or updates existing labels and properties
+ * [`UNION`](union.md) and [`UNION ALL`](union.md) - combines results from multiple queries
+ * [`UNWIND`](unwind.md) - unwinds a list of values as individual rows
+ * [`WHERE`](where.md) - filters the matched data
+ * [`WITH`](with.md) - combines multiple reads and writes
+ diff --git a/docs2/querying/clauses/create.md b/docs2/querying/clauses/create.md new file mode 100644 index 00000000000..aef409cf73b --- /dev/null +++ b/docs2/querying/clauses/create.md @@ -0,0 +1,191 @@ +---
+id: create
+title: CREATE clause
+sidebar_label: CREATE
+---
+
+The `CREATE` clause is used to create nodes and relationships in a graph.
+
+:::info
+
+Indexing can increase performance when executing queries.
Please take a look at +our [documentation on indexing](/docs/memgraph/reference-guide/indexing) for +more details. + +::: + +1. [Creating nodes](#1-creating-nodes)
+ 1.1. [Creating a single node](#11-creating-a-single-node)
+ 1.2. [Creating a node with properties](#12-creating-a-node-with-properties)
+ 1.3. [Creating multiple nodes](#13-creating-multiple-nodes)
+2. [Creating relationships](#2-creating-relationships)
+ 2.1. [Creating a relationship between two nodes](#21-creating-a-relationship-between-two-nodes)
+ 2.2. [Creating a relationship with properties](#22-creating-a-relationship-with-properties)
+3. [Creating a path](#3-creating-a-path)
+
+## 1. Creating nodes
+
+### 1.1. Creating a single node
+
+Use the following query to create a single node.
+The `RETURN` clause is used to return results. A newly created node can be returned in the same query.
+
+```cypher
+CREATE (n)
+RETURN n;
+```
+
+Output:
+```nocopy
++----+
+| n  |
++----+
+| () |
++----+
+```
+
+You can also specify a label while creating a node:
+
+```cypher
+CREATE (n:Country)
+RETURN n;
+```
+
+Output:
+```nocopy
++------------+
+| n          |
++------------+
+| (:Country) |
++------------+
+```
+
+If you wish to add multiple labels to a node, use the following syntax:
+
+```cypher
+CREATE (n:Country:City)
+RETURN n;
+```
+
+Output:
+```nocopy
++-----------------+
+| n               |
++-----------------+
+| (:Country:City) |
++-----------------+
+```
+
+### 1.2. Creating a node with properties
+
+A node can be created with initial properties.
+
+```cypher
+CREATE (n:Country {name: 'San Marino', continent: 'Europe'})
+RETURN n;
+```
+
+Output:
+```nocopy
++------------------------------------------------------+
+| n                                                    |
++------------------------------------------------------+
+| (:Country {continent: "Europe", name: "San Marino"}) |
++------------------------------------------------------+
+```
+
+### 1.3. Creating multiple nodes
+
+To create multiple nodes, separate them with a comma.
+
+```cypher
+CREATE (n:Country), (m:City)
+RETURN n,m;
+```
+
+Output:
+```nocopy
++------------+------------+
+| n          | m          |
++------------+------------+
+| (:Country) | (:City)    |
++------------+------------+
+```
+
+## 2. Creating relationships
+
+### 2.1. Creating a relationship between two nodes
+
+To create a relationship between two nodes, we need to specify the nodes,
+either by creating them or by matching existing ones with the `WHERE` clause.
+ +```cypher +CREATE (c1:Country {name: 'Belgium'}), (c2:Country {name: 'Netherlands'}) +CREATE (c1)-[r:BORDERS_WITH]->(c2) +RETURN r; +``` + +Output: +```nocopy ++----------------+ +| r | ++----------------+ +| [BORDERS_WITH] | ++----------------+ +``` + +If the nodes already exist, the query would look like this: + +```cypher +MATCH (c1:Country),(c2:Country) +WHERE c1.name = 'Belgium' AND c2.name = 'Netherlands' +CREATE (c1)-[r:NEIGHBOURS]->(c2) +RETURN r; +``` + +Output: +```nocopy ++--------------+ +| r | ++--------------+ +| [NEIGHBOURS] | ++--------------+ +``` + +### 2.2. Creating a relationship with properties + +You can add properties to a relationship at the time of creation. + +```cypher +MATCH (c1:Country),(c2:Country) +WHERE c1.name = 'Belgium' AND c2.name = 'Netherlands' +CREATE (c1)-[r:BORDERS_WITH {length: '30KM'}]->(c2) +RETURN r; +``` + +Output: +```nocopy ++---------------------------------+ +| r | ++---------------------------------+ +| [BORDERS_WITH {length: "30KM"}] | ++---------------------------------+ +``` + +## 3. Creating a path + +When creating a path all the entities of the pattern will be created. 
+ +```cypher +CREATE p=((n:Country {name: 'Belgium'})-[r:BORDERS_WITH {length: '30KM'}]->(m:Country {name: 'Netherlands'})) +RETURN p; +``` + +Output: +```nocopy ++------------------------------------------------------------------------------------------------+ +| p | ++------------------------------------------------------------------------------------------------+ +| (:Country {name: "Belgium"})-[BORDERS_WITH {length: "30KM"}]->(:Country {name: "Netherlands"}) | ++------------------------------------------------------------------------------------------------+ +``` diff --git a/docs2/querying/clauses/delete.md b/docs2/querying/clauses/delete.md new file mode 100644 index 00000000000..3c3336027c3 --- /dev/null +++ b/docs2/querying/clauses/delete.md @@ -0,0 +1,117 @@ +--- +id: delete +title: DELETE clause +sidebar_label: DELETE +--- + +The `DELETE` clause is used to delete nodes and relationships from the database. + +1. [Deleting a node](#1-deleting-a-node)
+2. [Deleting a node and its relationships](#2-deleting-a-node-and-its-relationships)
+3. [Deleting a relationship](#3-deleting-a-relationship)
+4. [Deleting everything](#4-deleting-everything)
+
+## Dataset
+
+The following examples are executed with this dataset. You can create this dataset
+locally by executing the queries at the end of the page: [Dataset queries](#dataset-queries).
+
+![Data set](../data/clauses/data_set.png)
+
+## 1. Deleting a node
+
+The `DELETE` clause can be used to delete a node:
+
+```cypher
+MATCH (c:Country {name: 'United Kingdom'})
+DELETE c;
+```
+
+Output:
+
+```nocopy
+Failed to remove node because of it's existing connections. Consider using DETACH DELETE.
+```
+
+On the dataset we are using, this query results in an error because `DELETE`
+can only be used on nodes that have no relationships.
+
+## 2. Deleting a node and its relationships
+
+The `DELETE` clause can be used to delete a node along with all of its relationships with the keyword `DETACH`:
+
+```cypher
+MATCH (n:Country {name: 'United Kingdom'})
+DETACH DELETE n;
+```
+
+Output:
+
+```nocopy
+Empty set (0.001 sec)
+```
+
+## 3. Deleting a relationship
+
+The `DELETE` clause can be used to delete a relationship:
+
+```cypher
+MATCH (n:Country {name: 'Germany'})<-[r:LIVING_IN]-()
+DELETE r;
+```
+
+Output:
+
+```nocopy
+Empty set (0.003 sec)
+```
+
+## 4. Deleting everything
+
+To delete all nodes and relationships in a graph, use the following query:
+
+```cypher
+MATCH (n)
+DETACH DELETE n;
+```
+
+Output:
+
+```nocopy
+Empty set (0.001 sec)
+```
+
+## Dataset queries
+
+We encourage you to try out the examples by yourself.
+You can get our dataset locally by executing the following query block.
+ +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name = 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name = 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` diff --git a/docs2/querying/clauses/explain.md b/docs2/querying/clauses/explain.md new file mode 100644 index 00000000000..86b4221d25d --- /dev/null +++ b/docs2/querying/clauses/explain.md @@ -0,0 +1,26 @@ +--- +id: explain +title: EXPLAIN clause +sidebar_label: EXPLAIN +--- + +The EXPLAIN clause can be used to inspect a particular Cypher query in order to see its +execution plan. 
+
+For example, the following query will return the execution plan:
+
+```cypher
+EXPLAIN MATCH (n) RETURN n;
+```
+
+```
++----------------+
+| QUERY PLAN     |
++----------------+
+| * Produce {n}  |
+| * ScanAll (n)  |
+| * Once         |
++----------------+
+```
+
+For more information, check the [reference guide on inspecting queries](/memgraph/reference-guide/inspecting-queries). \ No newline at end of file diff --git a/docs2/querying/clauses/foreach.md b/docs2/querying/clauses/foreach.md new file mode 100644 index 00000000000..e4a424212b0 --- /dev/null +++ b/docs2/querying/clauses/foreach.md @@ -0,0 +1,57 @@ +---
+id: foreach
+title: FOREACH clause
+sidebar_label: FOREACH
+---
+
+`FOREACH` iterates over a list of elements. Each element is stored inside a
+variable which can optionally be used inside the update clauses. All update
+clauses are executed per iteration of the list.
+
+```cypher
+ FOREACH ( <variable name> IN <expression> | <update clauses> )
+```
+
+| Option | Description |
+| :------------: | :-----------------------------------------------------------------------------------------------------------: |
+| variable name | The variable name that stores each element |
+| expression | Any expression that results in a list |
+| update clauses | One or more Cypher update clauses: `SET`, `REMOVE`, `CREATE`, `MERGE`, `DELETE` including `FOREACH` extension |
+
+It must be noted that if the result of `<expression>` is null, then `FOREACH`
+will not fail but rather skip the execution of `<update clauses>`
+altogether.
+
+Examples:
+
+```cypher
+ FOREACH ( i IN [1, 2, 3] | CREATE (n {id : i}) )
+```
+
+Creates 3 nodes, each with the id property set to 1, 2 and 3 respectively.
+
+```cypher
+ CREATE (n { prop : [[1, 2], [3, 4]] });
+
+ MATCH (n) FOREACH ( inner_list IN n.prop | FOREACH ( j IN inner_list | CREATE (u { prop : j }) ) );
+```
+
+Creates 4 nodes, each with the prop property set to 1, 2, 3 and 4 respectively.
+
+:::note
+
+Similarly, the rest of the update clauses mentioned in the table above can be
+used.
+ +::: + +One more important detail of FOREACH, is that it supports shadowing of variables +names. For example, the query below: + +```cypher + CREATE (n { prop : 0 }); + + MATCH (n) FOREACH ( i IN [1] | FOREACH ( i IN [3] | SET n.prop = i ) ); +``` + +will end up setting the property **prop** of the created node to 3. \ No newline at end of file diff --git a/docs2/querying/clauses/load-csv.md b/docs2/querying/clauses/load-csv.md new file mode 100644 index 00000000000..7fac129eb46 --- /dev/null +++ b/docs2/querying/clauses/load-csv.md @@ -0,0 +1,68 @@ +--- +id: load-csv +title: LOAD CSV clause +sidebar_label: LOAD CSV +--- + +The `LOAD CSV` clause enables you to load and use data from a CSV file of your +choosing in a row-based manner within a query. We support the Excel CSV dialect, +as it's the most commonly used one. + +[![Related - How-to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/docs/memgraph/import-data/load-csv-clause) + +The syntax of the clause is: + +```cypher +LOAD CSV FROM ( WITH | NO ) HEADER [IGNORE BAD] [DELIMITER ] [QUOTE ] [NULLIF ] AS +``` + +* `` is a string of the location to the CSV file. Without a URL + protocol it refers to a file path. There are no restrictions on where in your + filesystem the file can be located, as long as the path is valid (i.e., the + file exists). If using `http://`, `https://`, or `ftp://` the CSV file will + be fetched over the network. + +* `( WITH | NO ) HEADER ` flag specifies whether the CSV file is to be parsed as + though it has or hasn't got a header. + +* `IGNORE BAD` flag specifies whether rows containing errors should be ignored + or not. If it's set, the parser attempts to return the first valid row from + the CSV file. If it isn't set, an exception will be thrown on the first + invalid row encountered. + +* `DELIMITER ` option enables you to specify the CSV delimiter + character. If it isn't set, the default delimiter character `,` is assumed. 
+
+* `QUOTE <quote-string>` option enables you to specify the CSV quote character.
+  If it isn't set, the default quote character `"` is assumed.
+
+* `NULLIF <nullif-string>` option enables you to specify a sequence of characters that will be parsed as null.
+  By default, all empty columns in Memgraph are treated as empty strings, so if this option is not used, no values will be treated as null.
+
+* `<variable-name>` is a symbolic name representing the variable to which the
+  contents of the parsed row will be bound, enabling access to the row
+  contents later in the query.
+
+The clause reads row by row from a CSV file and binds the contents of the parsed
+row to the variable you specified.
+
+Adding a `MATCH` or `MERGE` clause before `LOAD CSV` allows you to match
+certain entities in the graph before running `LOAD CSV`, which is an optimization,
+as matched entities do not need to be searched for every row in the CSV file.
+
+However, the `MATCH` or `MERGE` clause can be used prior to the `LOAD CSV` clause only
+if the clause returns only one row. Returning multiple rows before calling the
+`LOAD CSV` clause will cause a Memgraph runtime error.
+
+:::info
+It's important to note that the parser parses the values as strings.
+It's up to the user to convert the parsed row values to the appropriate type.
+This can be done using the built-in conversion functions such as `ToInteger`,
+`ToFloat`, `ToBoolean` etc. Consult the [documentation](/functions.md) on the
+available conversion functions.
+:::
+
+:::info
+CSV content compressed with `gzip` or `bzip2` will be automatically
+decompressed on read.
+::: diff --git a/docs2/querying/clauses/match.md b/docs2/querying/clauses/match.md new file mode 100644 index 00000000000..6f0e0bf879f --- /dev/null +++ b/docs2/querying/clauses/match.md @@ -0,0 +1,459 @@ +---
+id: match
+title: MATCH clause
+sidebar_label: MATCH
+---
+
+The `MATCH` clause is used to obtain data from the database by matching it to a given pattern.
+
+1. [Matching nodes](#1-matching-nodes)
+ 1.1. [Get all nodes](#11-get-all-nodes)
+ 1.2. [Get all nodes with a label](#12-get-all-nodes-with-a-label)
+2. [Matching relationships](#2-matching-relationships)
+ 2.1. [Get all related nodes](#21-get-all-related-nodes)
+ 2.2. [Get related nodes with a label](#22-get-related-nodes-with-a-label)
+ 2.3. [Get related nodes with a directed relationship](#23-get-related-nodes-with-a-directed-relationship)
+ 2.4. [Get a relationship](#24-get-a-relationship)
+ 2.5. [Matching on a relationship with a type](#25-matching-on-a-relationship-with-a-type)
+ 2.6. [Matching on relationships with multiple types](#26-matching-on-relationships-with-multiple-types)
+ 2.7. [Uncommon characters in relationship types](#27-uncommon-characters-in-relationship-types)
+ 2.8. [Match with multiple relationships](#28-match-with-multiple-relationships)
+3. [Matching with variable length relationships](#3-matching-with-variable-length-relationships)
+ 3.1. [Variable length relationships](#31-variable-length-relationships)
+ 3.2. [Variable length relationships with multiple relationship types](#32-variable-length-relationships-with-multiple-relationship-types)
+ 3.3. [Returning multiple relationships with variable length](#33-returning-multiple-relationships-with-variable-length)
+4. [Using multiple MATCH clauses](#4-using-multiple-match-clauses)
+ 4.1. [Cartesian product of nodes](#41-cartesian-product-of-nodes)
+ 4.2. [Creating a list](#42-creating-a-list)
+
+:::tip
+
+Each node and relationship gets an identifier generated during its initialization, which is persisted through the durability mechanism.
+
+Return it with the [`id()` function](/cypher-manual/functions#scalar-functions).
+
+:::
+
+## Data Set
+
+The following examples are executed with this data set. You can create this data set
+locally by executing the queries at the end of the page: [Data Set](#data-set-queries).
+
+![Data set](../data/clauses/data_set.png)
+
+## 1. Matching nodes
+
+### 1.1. Get all nodes
+
+Without specifying labels, the query will return all the nodes in a graph:
+
+```cypher
+MATCH (n)
+RETURN n;
+```
+
+Output:
+```nocopy
++-----------------------------------------------------------------------------------------------------+
+| n                                                                                                   |
++-----------------------------------------------------------------------------------------------------+
+| (:Country {continent: "Europe", language: "German", name: "Germany", population: 83000000})         |
+| (:Country {continent: "Europe", language: "French", name: "France", population: 67000000})          |
+| (:Country {continent: "Europe", language: "English", name: "United Kingdom", population: 66000000}) |
+| (:Person {name: "John"})                                                                            |
+| (:Person {name: "Harry"})                                                                           |
+| (:Person {name: "Anna"})                                                                            |
++-----------------------------------------------------------------------------------------------------+
+```
+
+### 1.2. 
Get all nodes with a label + +By specifying the label of a node, all the nodes with that label are returned: + +```cypher +MATCH (c:Country) +RETURN c; +``` + +Output: +```nocopy ++-----------------------------------------------------------------------------------------------------+ +| c | ++-----------------------------------------------------------------------------------------------------+ +| (:Country {continent: "Europe", language: "German", name: "Germany", population: 83000000}) | +| (:Country {continent: "Europe", language: "French", name: "France", population: 67000000}) | +| (:Country {continent: "Europe", language: "English", name: "United Kingdom", population: 66000000}) | ++-----------------------------------------------------------------------------------------------------+ +``` + +## 2. Matching relationships + +### 2.1. Get all related nodes + +By using the *related to* symbol `--`, nodes that have a relationship with the specified node can be returned. +The symbol represents an undirected relationship which means the direction of the relationship is not taken into account. + +```cypher +MATCH (:Person {name: 'John'})--(n) +RETURN n; +``` + +Output: +```nocopy ++---------------------------------------------------------------------------------------------+ +| n | ++---------------------------------------------------------------------------------------------+ +| (:Person {name: "Anna"}) | +| (:Country {continent: "Europe", language: "French", name: "France", population: 67000000}) | +| (:Country {continent: "Europe", language: "German", name: "Germany", population: 83000000}) | +| (:Person {name: "Harry"}) | ++---------------------------------------------------------------------------------------------+ +``` + +### 2.2. 
Get related nodes with a label + +To only return *related to* nodes with a specific label you need to add it using the label syntax: + +```cypher +MATCH (:Person {name: 'John'})--(p:Person) +RETURN p; +``` + +Output: +```nocopy ++---------------------------+ +| p | ++---------------------------+ +| (:Person {name: "Harry"}) | +| (:Person {name: "Anna"}) | ++---------------------------+ +``` + +### 2.3. Get related nodes with a directed relationship + +The *related to* symbol `--` can be extended by using: + * `-->` to specify outgoing relationships, + * `<--` to specify ingoing relationships. + +```cypher +MATCH (:Country {name: 'France'})<--(p:Person) +RETURN p; +``` + +Output: +```nocopy ++--------------------------+ +| p | ++--------------------------+ +| (:Person {name: "John"}) | ++--------------------------+ +``` + +### 2.4. Get a relationship + +If you want to return the relationship between two nodes or a property of the relationship, a variable is required. +A directed or undirected relationship can be used. + +This query returns the relationship and its type: + +```cypher +MATCH (:Person {name: 'John'})-[r]->() +RETURN type(r); +``` + +Output: +```nocopy ++--------------+ +| type(r) | ++--------------+ +| WORKING_IN | +| LIVING_IN | +| FRIENDS_WITH | ++--------------+ +``` + +This query also returns the property `date_of_start` of the relationship: + +```cypher +MATCH (:Person {name: 'John'})-[r]->() +RETURN type(r), r.date_of_start; +``` + +Output: +```nocopy ++-----------------+-----------------+ +| type(r) | r.date_of_start | ++-----------------+-----------------+ +| WORKING_IN | 2014 | +| LIVING_IN | 2014 | +| FRIENDS_WITH | 2011 | ++-----------------+-----------------+ +``` + +### 2.5. Matching on a relationship with a type + +To return a relationship with a specified type you need to use the type syntax. 
+A directed or undirected relationship can be used:
+
+```cypher
+MATCH (p:Person {name: 'John'})-[:LIVING_IN]->(c)
+RETURN c.name;
+```
+
+Output:
+```nocopy
++---------+
+| c.name  |
++---------+
+| Germany |
++---------+
+```
+
+### 2.6. Matching on relationships with multiple types
+
+To return relationships with any of the specified types, the types need to be chained together with the pipe symbol `|`:
+
+```cypher
+MATCH (p:Person {name: 'John'})-[:LIVING_IN |:WORKING_IN]->(c)
+RETURN c.name;
+```
+
+Output:
+```nocopy
++---------+
+| c.name  |
++---------+
+| France  |
+| Germany |
++---------+
+```
+
+### 2.7. Uncommon characters in relationship types
+
+If a type has non-letter characters, like spaces, for example, the backtick symbol \` needs to be used to quote these.
+If the relationship type `LIVING_IN` had a space instead of an underscore, a possible query would look like this:
+
+```cypher
+MATCH (:Country {name: 'France'})<-[r:`LIVING IN`]-()
+RETURN r.name;
+```
+
+### 2.8. Match with multiple relationships
+
+Multiple relationship statements can be specified in the query:
+
+```cypher
+MATCH (:Country {name: 'France'})<-[l:WORKING_IN]-(p)-[w:LIVING_IN]->(:Country {name: 'Germany'})
+RETURN p.name;
+```
+
+Output:
+```nocopy
++--------+
+| p.name |
++--------+
+| John   |
++--------+
+```
+
+## 3. Matching with variable length relationships
+
+### 3.1. Variable length relationships
+
+If a node needs to be matched by its distance in relationship hops, the following syntax is used: `-[:TYPE*minHops..maxHops]->`.
+`minHops` and `maxHops` are optional and default to 1 and infinity, respectively.
+The dots can be omitted if neither value is specified, or if only a single value is
+given, which implies a fixed-length pattern.
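+
+For instance, with a single value the pattern has a fixed length; the following sketch (added here for illustration, reusing the data set above) matches nodes exactly two hops away from the United Kingdom node:
+
+```cypher
+MATCH ({name: 'United Kingdom'})<-[:LIVING_IN|FRIENDS_WITH*2]-(n)
+RETURN n;
+```
+
+Writing `*2` is equivalent to `*2..2`, so only paths with exactly two relationships of the listed types are matched.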
+ +```cypher +MATCH ({name: 'United Kingdom'})<-[:LIVING_IN*1..2]-(n) +RETURN n; +``` + +Output: +```nocopy ++---------------------------------------------------------------------------------------------+ +| n | ++---------------------------------------------------------------------------------------------+ +| (:Person {name: "Harry"}) | +| (:Person {name: "Anna"}) | +| (:Country {continent: "Europe", language: "German", name: "Germany", population: 83000000}) | ++---------------------------------------------------------------------------------------------+ +``` + +### 3.2. Variable length relationships with multiple relationship types + +If variable lengths are used with multiple stacked up relationship types, `*minHops..maxHops` applies to any combination of relationships: + +```cypher +MATCH ({name: 'United Kingdom'})<-[:WORKING_IN|FRIENDS_WITH*1..2]-(p:Person) +RETURN p; +``` + +Output: +```nocopy ++---------------------------+ +| p | ++---------------------------+ +| (:Person {name: "John"}) | +| (:Person {name: "Harry"}) | +| (:Person {name: "Anna"}) | ++---------------------------+ +``` + +### 3.3. Returning multiple relationships with variable length + +If a variable length is used, the list of relationships can be returned by adding `variable=` at the beginning of the `MATCH` clause: + +```cypher +MATCH p=({name: 'John'})<-[:FRIENDS_WITH*1..2]-() +RETURN relationships(p); +``` + +Output: +```nocopy ++----------------------------------------+ +| relationships(p) | ++----------------------------------------+ +| [[FRIENDS_WITH {date_of_start: 2012}]] | ++----------------------------------------+ +``` + +## 4. Using multiple `MATCH` clauses + +### 4.1. Cartesian product of nodes + +To create a Cartesian product, match the nodes in two separate `MATCH` queries. 
+
+For example, the following query will match each person from the dataset with each European country from the dataset:
+
+```cypher
+MATCH (p:Person)
+MATCH (c:Country {continent: "Europe"})
+RETURN p.name, c.name;
+```
+
+Output:
+```nocopy
++------------------+------------------+
+| p.name           | c.name           |
++------------------+------------------+
+| "John"           | "Germany"        |
+| "Harry"          | "Germany"        |
+| "Anna"           | "Germany"        |
+| "John"           | "France"         |
+| "Harry"          | "France"         |
+| "Anna"           | "France"         |
+| "John"           | "United Kingdom" |
+| "Harry"          | "United Kingdom" |
+| "Anna"           | "United Kingdom" |
++------------------+------------------+
+```
+
+The query returns the Cartesian product of matched nodes. The output of the first `MATCH` clause is matched with each output of the second `MATCH` clause. In this case, each person from the dataset is matched with each European country.
+
+### 4.2. Creating a list
+
+If you want to create a list containing the results of different `MATCH` queries, you can achieve that with multiple `MATCH` clauses in one query:
+
+```cypher
+MATCH (p:Person)
+WITH COLLECT(p.name) as people
+MATCH (c:Country {continent: "Europe"})
+WITH people + COLLECT(c.name) as names
+RETURN names;
+```
+
+Output:
+```nocopy
++------------------------------------------------------------------+
+| names                                                            |
++------------------------------------------------------------------+
+| ["John", "Harry", "Anna", "Germany", "France", "United Kingdom"] |
++------------------------------------------------------------------+
+```
+
+The query returns a list of names of all people from the dataset concatenated with the names of all European countries.
+
+:::caution
+
+If any of the sets returned by the `MATCH` clauses is empty, the whole output will be an empty list.
+ +::: + +The following query will return an empty list: + +```cypher +MATCH (p:Person) +WITH COLLECT(p.name) as people +MATCH (c:Country {continent: "Asia"}) +WITH people + COLLECT(c.name) as names +RETURN names; +``` + +Output: + +```nocopy ++-------+ +| names | ++-------+ +| Null | ++-------+ +``` + +Since the dataset doesn't contain any nodes labeled as `Country` with a property `continent` with the value `Asia`, the second `MATCH` clause returns an empty dataset and therefore the output is also an empty list. To avoid getting an empty list as an output, due to any of the `MATCH` clauses returning an empty set, use `OPTIONAL MATCH` clause: + +```cypher +MATCH (p:Person) +WITH COLLECT(p.name) as people +OPTIONAL MATCH (c:Country {continent: "Asia"}) +WITH people + COLLECT(c.name) as names +RETURN names; +``` + +Output: + +```nocopy ++---------------------------+ +| names | ++---------------------------+ +| ["John", "Harry", "Anna"] | ++---------------------------+ +``` + +The `OPTIONAL MATCH` clause bypasses the empty set and the query returns only non-empty sets. Therefore, the output of the query is a list containing only the results of the first `MATCH` clause. + +## Data set Queries + +We encourage you to try out the examples by yourself. +You can get our data set locally by executing the following query block. 
+ +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name = 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name = 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` \ No newline at end of file diff --git a/docs2/querying/clauses/merge.md b/docs2/querying/clauses/merge.md new file mode 100644 index 00000000000..08dbaf4db2c --- /dev/null +++ b/docs2/querying/clauses/merge.md @@ -0,0 +1,305 @@ +--- +id: merge +title: MERGE clause +sidebar_label: MERGE +--- + +The `MERGE` clause is used to ensure that a pattern you are looking for exists +in the database. This means that if the pattern is not found, it will be +created. In a way, this clause is like a combination of `MATCH` and `CREATE`. + +:::info + +Indexing can increase performance when executing queries. Please take a look at +our [documentation on indexing](/docs/memgraph/reference-guide/indexing) for +more details. + +::: + +1. 
[Merging nodes](#1-merging-nodes)
+ 1.1. [Merging nodes with labels](#11-merging-nodes-with-labels)
+ 1.2. [Merging nodes with properties](#12-merging-nodes-with-properties)
+ 1.3. [Merging nodes with labels and properties](#13-merging-nodes-with-labels-and-properties)
+ 1.4. [Merging nodes with existing node properties](#14-merging-nodes-with-existing-node-properties)
+2. [Merging relationships](#2-merging-relationships)
+ 2.1. [Merging relationships](#21-merging-relationships)
+ 2.2. [Merging on undirected relationships](#22-merging-on-undirected-relationships)
+3. [Merging with ON CREATE SET and ON MATCH SET](#3-merging-with-on-create-set-and-on-match-set)
+ 3.1. [Merging with ON CREATE SET](#31-merging-with-on-create-set)
+ 3.2. [Merging with ON MATCH SET](#32-merging-with-on-match-set)
+ 3.3. [Merging with ON CREATE SET and ON MATCH SET](#33-merging-with-on-create-set-and-on-match-set)
+ 3.4. [Merging with SET](#34-merging-with-set)
+ 3.5. [Combination of clauses](#35-combination-of-clauses) + +## Data Set + +The following examples are executed with this data set. You can create this data set +locally by executing the queries at the end of the page: [Data Set](#data-set-queries). + +![Data set](../data/clauses/data_set.png) + +## 1. Merging nodes + +### 1.1. Merging nodes with labels + +If `MERGE` is used on a node with a label that doesn't exist in the database, the node is created: + +```cypher +MERGE (city:City) +RETURN city; +``` + +Output: +```nocopy ++---------+ +| city | ++---------+ +| (:City) | ++---------+ +``` + +### 1.2. Merging nodes with properties + +If `MERGE` is used on a node with properties that don't match any existing node, that node is created: + +```cypher +MERGE (city {name: 'London'}) +RETURN city; +``` + +Output: +```nocopy ++--------------------+ +| city | ++--------------------+ +| ({name: "London"}) | ++--------------------+ +``` + +### 1.3. Merging nodes with labels and properties + +If `MERGE` is used on a node with labels and properties that don't match any existing node, that node is created: + +```cypher +MERGE (city:City {name: 'London'}) +RETURN city; +``` + +Output: +```nocopy ++--------------------------+ +| city | ++--------------------------+ +| (:City {name: "London"}) | ++--------------------------+ +``` + +### 1.4. Merging nodes with existing node properties + +If `MERGE` is used with properties on an existing node, a new node is created for each unique value of that property: + +```cypher +MATCH (p:Person) +MERGE (h:Human {name: p.name}) +RETURN h.name; +``` + +Output: +```nocopy ++--------+ +| h.name | ++--------+ +| John | +| Harry | +| Anna | ++--------+ +``` + +## 2. Merging relationships + +### 2.1. 
Merging relationships

Just as with nodes, `MERGE` can be used to match or create relationships:

```cypher
MATCH (p1:Person {name: 'John'}), (p2:Person {name: 'Anna'})
MERGE (p1)-[r:RELATED]->(p2)
RETURN r;
```

Output:
```nocopy
+-----------+
| r         |
+-----------+
| [RELATED] |
+-----------+
```

Multiple relationships can be matched or created with `MERGE` in the same query:

```cypher
MATCH (p1:Person {name: 'John'}), (p2:Person {name: 'Anna'})
MERGE (p1)-[r1:RELATED_TO]->(p2)-[r2:RELATED_TO]->(p1)
RETURN r1, r2;
```

Output:
```nocopy
+--------------+--------------+
| r1           | r2           |
+--------------+--------------+
| [RELATED_TO] | [RELATED_TO] |
+--------------+--------------+
```

### 2.2. Merging on undirected relationships

If `MERGE` is used on an undirected relationship, the direction will be chosen at random:

```cypher
MATCH (p1:Person {name: 'John'}), (p2:Person {name: 'Anna'})
MERGE path=((p1)-[r:WORKS_WITH]-(p2))
RETURN path;
```

Output:
```nocopy
+-----------------------------------------------------------------+
| path                                                            |
+-----------------------------------------------------------------+
| (:Person {name: "John"})-[WORKS_WITH]->(:Person {name: "Anna"}) |
+-----------------------------------------------------------------+
```

In this example, a path is returned to show the direction of the created relationship.

## 3. Merging with `ON CREATE SET` and `ON MATCH SET`

### 3.1. Merging with `ON CREATE SET`

The `ON CREATE SET` part of a `MERGE` clause will only be executed if the node needs to be created:

```cypher
MERGE (p:Person {name: 'Lucille'})
ON CREATE SET p.date_of_creation = timestamp()
RETURN p.name, p.date_of_creation;
```

Output:
```nocopy
+--------------------+--------------------+
| p.name             | p.date_of_creation |
+--------------------+--------------------+
| Lucille            | 1605080852685000   |
+--------------------+--------------------+
```

### 3.2.
Merging with `ON MATCH SET`

The `ON MATCH SET` part of a `MERGE` clause will only be executed if the node is found:

```cypher
MERGE (p:Person {name: 'John'})
ON MATCH SET p.found = TRUE
RETURN p.name, p.found;
```

Output:
```nocopy
+---------+---------+
| p.name  | p.found |
+---------+---------+
| John    | true    |
+---------+---------+
```

### 3.3. Merging with `ON CREATE SET` and `ON MATCH SET`

The `MERGE` clause can be used with both the `ON CREATE SET` and `ON MATCH SET` options:

```cypher
MERGE (p:Person {name: 'Angela'})
ON CREATE SET p.notFound = TRUE
ON MATCH SET p.found = TRUE
RETURN p.name, p.notFound, p.found;
```

Output:
```nocopy
+------------+------------+------------+
| p.name     | p.notFound | p.found    |
+------------+------------+------------+
| Angela     | true       | Null       |
+------------+------------+------------+
```

### 3.4. Merging with `SET`

If you want to set a property to the same value in both the `ON CREATE SET` and the `ON MATCH SET` case, you can simply use `SET`:

```cypher
MERGE (p:Person {name: 'Angela'})
ON CREATE SET p.found = TRUE
ON MATCH SET p.found = TRUE;
```

is the same as the query below:

```cypher
MERGE (p:Person {name: 'Angela'})
SET p.found = TRUE;
```

### 3.5. Combination of clauses

You can also combine all three options (`ON CREATE SET`, `ON MATCH SET` and
`SET`) to set a certain property depending on whether the node was matched or
created, and to set another property to the same value in either case:

```cypher
MERGE (p:Person {name: 'Angela'})
ON CREATE SET p.found = FALSE
ON MATCH SET p.found = TRUE
SET p.last_name = 'Smith';
```

The `found` property will be set to `FALSE` if the node was created, or to `TRUE`
if it was matched, but in either case, the last name will be set to `Smith`.

## Data set Queries

We encourage you to try out the examples by yourself.
+You can get our data set locally by executing the following query block. + +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name = 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name = 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` diff --git a/docs2/querying/clauses/optional-match.md b/docs2/querying/clauses/optional-match.md new file mode 100644 index 00000000000..57cd64feb02 --- /dev/null +++ b/docs2/querying/clauses/optional-match.md @@ -0,0 +1,98 @@ +--- +id: optional-match +title: OPTIONAL MATCH clause +sidebar_label: OPTIONAL MATCH +--- + +The `MATCH` clause can be modified by prepending the `OPTIONAL` keyword. +`OPTIONAL MATCH` clause behaves the same as a regular `MATCH`, but when it fails to find the pattern, +missing parts of the pattern will be filled with null values. + +1. [Get optional relationships](#1-get-optional-relationships)
+2. [Optional typed and named relationship](#2-optional-typed-and-named-relationship)

## Dataset

The following examples are executed with this dataset. You can create this dataset
locally by executing the queries at the end of the page: [Dataset queries](#data-set-queries).

![Data set](../data/clauses/data_set.png)

## 1. Get optional relationships

Using `OPTIONAL MATCH` to match a relationship that doesn't exist will return the default value `Null` instead.

The returned property of an optional element that is `Null` will also be `Null`:

```cypher
MATCH (c1:Country {name: 'France'})
OPTIONAL MATCH (c1)--(c2:Country {name: 'Germany'})
RETURN c2;
```

Output:

```nocopy
+------+
| c2   |
+------+
| Null |
+------+
```

## 2. Optional typed and named relationship

The `OPTIONAL MATCH` clause allows you to use the same conventions as `MATCH` when it comes to handling variables and relationship types:

```cypher
MATCH (c:Country {name: 'United Kingdom'})
OPTIONAL MATCH (c)-[r:LIVES_IN]->()
RETURN c.name, r;
```

Output:

```nocopy
+----------------+----------------+
| c.name         | r              |
+----------------+----------------+
| United Kingdom | Null           |
+----------------+----------------+
```

Because there are no outgoing relationships of type `LIVES_IN` from the node, the value of `r` is `Null`, while the value of `c.name` is `'United Kingdom'`.

## Dataset queries

We encourage you to try out the examples by yourself.
+ +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name= 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name= 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` diff --git a/docs2/querying/clauses/profile.md b/docs2/querying/clauses/profile.md new file mode 100644 index 00000000000..f9e55d3d7e5 --- /dev/null +++ b/docs2/querying/clauses/profile.md @@ -0,0 +1,42 @@ +--- +id: profile +title: PROFILE clause +sidebar_label: PROFILE +--- + +The PROFILE clause can be used to profile the execution of a query and get a detailed +report on how the query's plan behaved. To get a query plan, use the [EXPLAIN +clause](/clauses/explain.md). + +For every logical operator the following info is provided: + +- `OPERATOR` — the name of the operator, just like in the output of an + `EXPLAIN` query. + +- `ACTUAL HITS` — the number of times a particular logical operator was + pulled from. 
+ +- `RELATIVE TIME` — the amount of time that was spent processing a + particular logical operator, relative to the execution of the whole plan. + +- `ABSOLUTE TIME` — the amount of time that was spent processing a + particular logical operator. + +A simple example to illustrate the output: + +```cypher +PROFILE MATCH (n :Node)-[:Edge]-(m :Node) WHERE n.prop = 42 RETURN *; +``` + +```plaintext ++---------------+---------------+---------------+---------------+ +| OPERATOR | ACTUAL HITS | RELATIVE TIME | ABSOLUTE TIME | ++---------------+---------------+---------------+---------------+ +| * Produce | 1 | 7.134628 % | 0.003949 ms | +| * Filter | 1 | 12.734765 % | 0.007049 ms | +| * Expand | 1 | 5.181460 % | 0.002868 ms | +| * ScanAll | 1 | 3.325061 % | 0.001840 ms | +| * ScanAll | 1 | 71.061241 % | 0.039334 ms | +| * Once | 2 | 0.562844 % | 0.000312 ms | ++---------------+---------------+---------------+---------------+ +``` \ No newline at end of file diff --git a/docs2/querying/clauses/remove.md b/docs2/querying/clauses/remove.md new file mode 100644 index 00000000000..f6340e5e1ad --- /dev/null +++ b/docs2/querying/clauses/remove.md @@ -0,0 +1,119 @@ +--- +id: remove +title: REMOVE clause +sidebar_label: REMOVE +--- + +The `REMOVE` clause is used to remove labels and properties from nodes and relationships. + +1. [Removing a property](#1-removing-a-property)
+2. [Removing a label](#2-removing-a-label) + +## Dataset + +The following examples are executed with this dataset. You can create this dataset +locally by executing the queries at the end of the page: [Dataset queries](#data-set-queries). + +![Data set](../data/clauses/data_set.png) + +## 1. Removing a property + +The `REMOVE` clause can be used to remove a property from a node or relationship: + +```cypher +MATCH (n:Country {name: 'United Kingdom'}) +REMOVE n.name +RETURN n; +``` + +Output: + +```nocopy ++-----------------------------------------------------------------------------+ +| n | ++-----------------------------------------------------------------------------+ +| (:Country {continent: "Europe", language: "English", population: 66000000}) | ++-----------------------------------------------------------------------------+ +``` + +The `REMOVE` clause can't be used to remove all properties from a node or relationship. Instead, take a look at the `SET` clause. + +## 2. Removing a label + +The `REMOVE` clause can be used to remove a label from a node: + +```cypher +MATCH (n:Country {name: 'United Kingdom'}) +REMOVE n:Country +RETURN n; +``` + +Output: + +```nocopy ++--------------------------------------------------------------------------------------------+ +| n | ++--------------------------------------------------------------------------------------------+ +| ({continent: "Europe", language: "English", name: "United Kingdom", population: 66000000}) | ++--------------------------------------------------------------------------------------------+ +``` + +Let's add the label `Country` back to the node with the name `United Kingdom` and the additional label `Kingdom`. + +```cypher +MATCH (n {name: 'United Kingdom'}) +SET n:Country:Kingdom; +``` + +You can now remove multiple labels from a node at the same time. 
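To verify that the node now carries both labels before removing them, you can inspect it with the `labels()` function. This query is a sketch that assumes the dataset from this page with the queries above already executed:

```cypher
MATCH (n {name: 'United Kingdom'})
RETURN labels(n);
```

The returned list should contain both `Country` and `Kingdom`.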
+ +```cypher +MATCH (n:Country {name: 'United Kingdom'}) +REMOVE n:Country:Kingdom +RETURN n; +``` + +Output: + +```nocopy ++--------------------------------------------------------------------------------------------+ +| n | ++--------------------------------------------------------------------------------------------+ +| ({continent: "Europe", language: "English", name: "United Kingdom", population: 66000000}) | ++--------------------------------------------------------------------------------------------+ +``` + +## Dataset queries + +We encourage you to try out the examples by yourself. +You can get our dataset locally by executing the following query block. + +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name= 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name= 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` diff --git a/docs2/querying/clauses/return.md 
b/docs2/querying/clauses/return.md new file mode 100644 index 00000000000..97c6edd98cf --- /dev/null +++ b/docs2/querying/clauses/return.md @@ -0,0 +1,353 @@ +--- +id: return +title: RETURN clause +sidebar_label: RETURN +--- + +The `RETURN` clause defines which data should be included in the resulting set. + +1. [Returning nodes](#1-returning-nodes)
+2. [Returning relationships](#2-returning-relationships)
+3. [Returning properties](#3-returning-properties)
+4. [Returning multiple elements](#4-returning-multiple-elements)
+5. [Returning all elements](#5-returning-all-elements)
+6. [Handling uncommon characters](#6-handling-uncommon-characters)
+7. [Returning elements with an alias](#7-returning-elements-with-an-alias)
+8. [Optional properties](#8-optional-properties)
+9. [Returning expressions](#9-returning-expressions)
+10. [Returning unique results](#10-returning-unique-results)
+11. [Returning aggregated results](#11-returning-aggregated-results)
+12. [Limiting the number of returned results](#12-limiting-the-number-of-returned-results)
+13. [Order results](#13-order-results) + +## Dataset + +The following examples are executed with this dataset. You can create this dataset +locally by executing the queries at the end of the page: [Dataset queries](#data-set-queries). + +![Data set](../data/clauses/data_set.png) + +## 1. Returning nodes + +The node variable needs to be added to the `RETURN` statement: + +```cypher +MATCH (c:Country {name: 'United Kingdom'}) +RETURN c; +``` + +Output: + +```nocopy ++-----------------------------------------------------------------------------------------------------+ +| c | ++-----------------------------------------------------------------------------------------------------+ +| (:Country {continent: "Europe", language: "English", name: "United Kingdom", population: 66000000}) | ++-----------------------------------------------------------------------------------------------------+ +``` + +## 2. Returning relationships + +The relationship variable needs to be added to the `RETURN` statement: + +```cypher +MATCH (c:Country {name: 'United Kingdom'})<-[r]-(:Person {name: 'Harry'}) +RETURN type(r); +``` + +Output: + +```nocopy ++------------+ +| type(r) | ++------------+ +| WORKING_IN | +| LIVING_IN | ++------------+ +``` + +## 3. Returning properties + +The property of a node or a relationship can be returned by using the dot separator: + +```cypher +MATCH (c:Country {name: 'United Kingdom'}) +RETURN c.name; +``` + +Output: + +```nocopy ++----------------+ +| c.name | ++----------------+ +| United Kingdom | ++----------------+ +``` + +## 4. 
Returning multiple elements

To return multiple elements, separate them with commas:

```cypher
MATCH (c:Country {name: 'United Kingdom'})
RETURN c.name, c.population, c.continent;
```

Output:

```nocopy
+----------------+----------------+----------------+
| c.name         | c.population   | c.continent    |
+----------------+----------------+----------------+
| United Kingdom | 66000000       | Europe         |
+----------------+----------------+----------------+
```

## 5. Returning all elements

To return all the elements from a query, use the `*` symbol:

```cypher
MATCH (:Country {name: 'United Kingdom'})-[]-(p:Person)
RETURN *;
```

Output:

```nocopy
+---------------------------+
| p                         |
+---------------------------+
| (:Person {name: "Harry"}) |
| (:Person {name: "Harry"}) |
| (:Person {name: "Anna"})  |
+---------------------------+
```

## 6. Handling uncommon characters

Variable names that contain uncommon characters need to be enclosed in backticks (`` ` ``).
For example, a query could look like this:

```cypher
MATCH (`An uncommon variable!`)
WHERE `An uncommon variable!`.name = 'A'
RETURN `An uncommon variable!`.value;
```

## 7. Returning elements with an alias

You can specify an alias for an element in the `RETURN` statement using `AS`:

```cypher
MATCH (c:Country {name: 'United Kingdom'})
RETURN c.name AS Name;
```

Output:

```nocopy
+----------------+
| Name           |
+----------------+
| United Kingdom |
+----------------+
```

## 8. Optional properties

If the property being returned does not exist, `Null` will be returned:

```cypher
MATCH (c:Country {name: 'United Kingdom'})
RETURN c.color;
```

Output:

```nocopy
+---------+
| c.color |
+---------+
| Null    |
+---------+
```

## 9.
Returning expressions + +Expressions can be included in the `RETURN` statement: + +```cypher +MATCH (c:Country {name: 'United Kingdom'}) +RETURN c.name = 'United Kingdom', "Literal"; +``` + +Output: + +```nocopy ++---------------------------+---------------------------+ +| c.name = 'United Kingdom' | "Literal" | ++---------------------------+---------------------------+ +| true | Literal | ++---------------------------+---------------------------+ +``` + +## 10. Returning unique results + +The `RETURN` statement can be followed by the `DISTINCT` operator, which will remove duplicate results: + +```cypher +MATCH ()-[:LIVING_IN]->(c) +RETURN DISTINCT c; +``` + +Output: + +```nocopy ++-----------------------------------------------------------------------------------------------------+ +| c | ++-----------------------------------------------------------------------------------------------------+ +| (:Country {continent: "Europe", language: "German", name: "Germany", population: 83000000}) | +| (:Country {continent: "Europe", language: "English", name: "United Kingdom", population: 66000000}) | ++-----------------------------------------------------------------------------------------------------+ +``` + +## 11. 
Returning aggregated results

The `RETURN` statement can be used with [aggregation functions](https://memgraph.com/docs/cypher-manual/functions#aggregation-functions):

```cypher
MATCH (c:Country)
RETURN AVG(c.population) AS average_population;
```
Output:

```nocopy
+--------------------+
| average_population |
+--------------------+
| 72000000           |
+--------------------+
```

Aggregation functions can also be used with the `DISTINCT` operator, which performs calculations only on unique values:

```cypher
MATCH ()-[:LIVING_IN]->(c)
RETURN AVG(DISTINCT c.population) AS average_population;
```
Output:

```nocopy
+--------------------+
| average_population |
+--------------------+
| 74500000           |
+--------------------+
```

## 12. Limiting the number of returned results

You can limit the number of returned results with the `LIMIT` sub-clause.
To get the first ten results, you can use this query:

```cypher
MATCH (n:Person) RETURN n LIMIT 10;
```

## 13. Order results

Since the patterns which are matched can come in any order, it is very useful to
be able to enforce some ordering among the results. In such cases, you can use
the `ORDER BY` sub-clause.

For example, the following query will get all `:Person` nodes and order them by
their names:

```cypher
MATCH (n:Person) RETURN n ORDER BY n.name;
```

By default, ordering will be ascending. To change the order to be descending,
you should append `DESC`.
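If you prefer to make the direction explicit, ascending order can also be requested with the `ASC` keyword, which is equivalent to the default:

```cypher
MATCH (n:Person) RETURN n ORDER BY n.name ASC;
```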
+ +For example, you can use this query to order people by their name descending: + +```cypher +MATCH (n:Person) RETURN n ORDER BY n.name DESC; +``` + +You can also order by multiple variables. The results will be sorted by the +first variable listed. If the values are equal, the results are sorted by the +second variable, and so on. + +For example, ordering by first name descending and last name ascending: + +```cypher +MATCH (n:Person) RETURN n ORDER BY n.name DESC, n.lastName; +``` + +Note that `ORDER BY` sees only the variable names as carried over by `RETURN`. +This means that the following will result in an error. + +```cypher +MATCH (old:Person) RETURN old AS new ORDER BY old.name; +``` + +Instead, the `new` variable must be used: + +```cypher +MATCH (old:Person) RETURN old AS new ORDER BY new.name; +``` + +The `ORDER BY` sub-clause may come in handy with `SKIP` and/or `LIMIT` +sub-clauses. For example, to get the oldest person you can use the following: + +```cypher +MATCH (n:Person) RETURN n ORDER BY n.age DESC LIMIT 1; +``` + +You can also order result before returning them. The following query will order +all the nodes according to name, and then return them in a list. + +```cypher +MATCH (n) +WITH n ORDER BY n.name DESC +RETURN collect(n.name) AS names; +``` + +## Dataset queries + +We encourage you to try out the examples by yourself. +You can get our dataset locally by executing the following query block. 
+ +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name= 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name= 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` + diff --git a/docs2/querying/clauses/set.md b/docs2/querying/clauses/set.md new file mode 100644 index 00000000000..08e94c2cd75 --- /dev/null +++ b/docs2/querying/clauses/set.md @@ -0,0 +1,256 @@ +--- +id: set +title: SET clause +sidebar_label: SET +--- + +The `SET` clause is used to update labels on nodes and properties on nodes and relationships. + +1. [Setting a property](#1-setting-a-property)
+2. [Setting multiple properties](#2-setting-multiple-properties)
+3. [Setting node labels](#3-setting-node-labels)
+4. [Update a property](#4-update-a-property)
+5. [Remove a property](#5-remove-a-property)
+6. [Copy all properties](#6-copy-all-properties)
+7. [Replace all properties using map](#7-replace-all-properties-using-map)
+8. [Update all properties using map](#8-update-all-properties-using-map)

## Dataset

The following examples are executed with this dataset. You can create this dataset
locally by executing the queries at the end of the page: [Dataset queries](#data-set-queries).

![Data set](../data/clauses/data_set.png)

## 1. Setting a property

The `SET` clause can be used to set the value of a property on a node or relationship:

```cypher
MATCH (c:Country {name: 'Germany'})
SET c.population = 83000001
RETURN c.name, c.population;
```

Output:

```nocopy
+--------------+--------------+
| c.name       | c.population |
+--------------+--------------+
| Germany      | 83000001     |
+--------------+--------------+
```

## 2. Setting multiple properties

The `SET` clause can be used to set the value of multiple properties on nodes or relationships by separating them with a comma:

```cypher
MATCH (c:Country {name: 'Germany'})
SET c.capital = 'Berlin', c.population = 83000002
RETURN c.name, c.population, c.capital;
```

Output:

```nocopy
+--------------+--------------+--------------+
| c.name       | c.population | c.capital    |
+--------------+--------------+--------------+
| Germany      | 83000002     | Berlin       |
+--------------+--------------+--------------+
```

## 3. Setting node labels

The `SET` clause can be used to set the label on a node.
If the node has a label, a new one will be added while the old one is left as is: + +```cypher +MATCH (c {name: 'Germany'}) +SET c:Land +RETURN labels(c); +``` + +Output: + +```nocopy ++---------------------+ +| labels(c) | ++---------------------+ +| ["Country", "Land"] | ++---------------------+ +``` + +Multiple labels can be also set: + +```cypher +MATCH (c {name: 'Germany'}) +SET c:Place:Area +RETURN labels(c); +``` + +Output: + +```nocopy ++--------------------------------------+ +| labels(c) | ++--------------------------------------+ +| ["Country", "Land", "Place", "Area"] | ++--------------------------------------+ +``` + +## 4. Update a property + +The `SET` clause can be used to update the value or type of a property on a node or relationship: + +```cypher +MATCH (c:Country {name: 'Germany'}) +SET c.population = 'not available' +RETURN c.population; +``` + +Output: + +```nocopy ++---------------+ +| c.population | ++---------------+ +| not available | ++---------------+ +``` + +## 5. Remove a property + +The `SET` clause can be used to remove the value of a property on a node or relationship by setting it to `NULL`: + +```cypher +MATCH (c:Country {name: 'Germany'}) +SET c.population = NULL +RETURN c.population; +``` + +Output: + +```nocopy ++--------------+ +| c.population | ++--------------+ +| Null | ++--------------+ +``` + +## 6. 
Copy all properties + +If `SET` is used to copy the properties of one node/relationship to another, all the properties of the latter will be removed and replaced with the new ones: + +```cypher +MATCH (c1:Country {name: 'Germany'}), (c2:Country {name: 'France'}) +SET c2 = c1 +RETURN c2, c1; +``` + +Output: + +```nocopy ++----------------------------------------------------------------------------+----------------------------------------------------------------------------+ +| c2 | c1 | ++----------------------------------------------------------------------------+----------------------------------------------------------------------------+ +| (:Country {continent: "Europe", language: "German", name: "Germany"}) | (:Country:Land {continent: "Europe", language: "German", name: "Germany"}) | ++----------------------------------------------------------------------------+----------------------------------------------------------------------------+ +``` + +## 7. Replace all properties using map + +If `SET` is used with the property replacement operator `=`, all the properties in the map that are on the node or relationship will be updated. +The properties that are not on the node or relationship but are in the map will be added. The properties that are not in the map will be removed. + +```cypher +MATCH (c:Country {name: 'Germany'}) +SET c = {name: 'Germany', population: '85000000'} +RETURN c; +``` + +Output: + +```nocopy ++------------------------------------------------------+ +| c | ++------------------------------------------------------+ +| (:Country {name: "Germany", population: "85000000"}) | ++------------------------------------------------------+ +``` + +If an empty map is used, all the properties of a node or relationship will be set to `NULL`: + +```cypher +MATCH (c:Country {name: 'Germany'}) +SET c = {} +RETURN c; +``` + +Output: + +```nocopy ++------------+ +| c | ++------------+ +| (:Country) | ++------------+ +``` + +## 8. 
Update all properties using map + +If `SET` is used with the property mutation operator `+=`, all the properties in the map that are on the node or relationship will be updated. +The properties that are not on the node or relationship but are in the map will be added. Properties that are not present in the map will be left as is. + +```cypher +MATCH (c:Country {name: 'Germany'}) +SET c += {name: 'Germany', population: '85000000'} +RETURN c; +``` + +Output: + +```nocopy ++-----------------------------------------------------------------------------------------------+ +| c | ++-----------------------------------------------------------------------------------------------+ +| (:Country {continent: "Europe", language: "German", name: "Germany", population: "85000000"}) | ++-----------------------------------------------------------------------------------------------+ +``` + +## Dataset queries + +We encourage you to try out the examples by yourself. +You can get our dataset locally by executing the following query block. 
+ +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name= 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name= 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` diff --git a/docs2/querying/clauses/union.md b/docs2/querying/clauses/union.md new file mode 100644 index 00000000000..20f87e9f3f8 --- /dev/null +++ b/docs2/querying/clauses/union.md @@ -0,0 +1,119 @@ +--- +id: union +title: UNION clause +sidebar_label: UNION +--- + +The `UNION` clause is used to combine the result of multiple queries. + +1. [Combine queries and retain duplicates](#1-combine-queries-and-retain-duplicates)
+2. [Combine queries and remove duplicates](#2-combine-queries-and-remove-duplicates)
+
+## Dataset
+
+The following examples are executed with this dataset. You can create this dataset
+locally by executing the queries at the end of the page: [Dataset queries](#dataset-queries).
+
+![Data set](../data/clauses/data_set.png)
+
+## 1. Combine queries and retain duplicates
+
+To combine two or more queries and return their results without removing duplicates, use the `UNION ALL` clause.
+First, let's add a few `Person` nodes whose names already exist in the dataset:
+
+```cypher
+CREATE (:Person {name: 'John'});
+CREATE (:Person {name: 'Anna'});
+```
+
+A query with the `UNION ALL` clause could look like this:
+
+```cypher
+MATCH (c:Country)
+RETURN c.name AS columnName
+UNION ALL
+MATCH (p:Person)
+RETURN p.name AS columnName;
+```
+
+Output:
+
+```nocopy
++----------------+
+| columnName     |
++----------------+
+| Germany        |
+| France         |
+| United Kingdom |
+| John           |
+| Harry          |
+| Anna           |
+| John           |
+| Anna           |
++----------------+
+```
+
+## 2. Combine queries and remove duplicates
+
+To combine two or more queries and return their results while removing duplicates, use the `UNION` clause without `ALL`.
+
+```cypher
+MATCH (c:Country)
+RETURN c.name AS columnName
+UNION
+MATCH (p:Person)
+RETURN p.name AS columnName;
+```
+
+Output:
+
+```nocopy
++----------------+
+| columnName     |
++----------------+
+| Germany        |
+| France         |
+| United Kingdom |
+| John           |
+| Harry          |
+| Anna           |
++----------------+
+```
+
+## Dataset queries
+
+We encourage you to try out the examples by yourself.
+You can get our dataset locally by executing the following query block. 
+ +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name= 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name= 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +CREATE (:Person {name: 'John'}); +CREATE (:Person {name: 'Anna'}); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` diff --git a/docs2/querying/clauses/unwind.md b/docs2/querying/clauses/unwind.md new file mode 100644 index 00000000000..648fc702fdb --- /dev/null +++ b/docs2/querying/clauses/unwind.md @@ -0,0 +1,107 @@ +--- +id: unwind +title: UNWIND clause +sidebar_label: UNWIND +--- + +The `UNWIND` clause is used to unwind a list of values as individual rows. + +1. [Unwinding lists](#1-unwinding-lists)
+2. [Distinct list](#2-distinct-list)
+3. [Expression returning lists](#3-expression-returning-lists)
+4. [Unwinding lists of lists](#4-unwinding-lists-of-lists) + +## 1. Unwinding lists + +Use `UNWIND` to transform a literal list into rows: + +```cypher +UNWIND [1,2,3] AS listElement +RETURN listElement; +``` + +Output: + +```nocopy ++-------------+ +| listElement | ++-------------+ +| 1 | +| 2 | +| 3 | ++-------------+ +``` + +## 2. Distinct list + +The `UNWIND` clause can be used to remove duplicates from a list: + +```cypher +WITH [1,1,1,2,2,3] AS list +UNWIND list AS listElement +RETURN collect(DISTINCT listElement) AS distinctElements; +``` + +Output: + +```nocopy ++------------------+ +| distinctElements | ++------------------+ +| [1, 2, 3] | ++------------------+ +``` + +## 3. Expression returning lists + +An expression that returns a list can be used with the `UNWIND` clause: + +```cypher +WITH [1,2,3] AS listOne, [4,5,6] AS listTwo +UNWIND (listOne + listTwo) AS list +RETURN list; +``` + +Output: + +```nocopy ++------+ +| list | ++------+ +| 1 | +| 2 | +| 3 | +| 4 | +| 5 | +| 6 | ++------+ +``` + +## 4. Unwinding lists of lists + +Multiple `UNWIND` clauses can be combined to unwind nested lists: + +```cypher +WITH [[1,2,3],[4,5,6],[7,8,9]] AS listOne +UNWIND listOne AS listOneElement +UNWIND listOneElement AS element +RETURN element; +``` + +Output: + +```nocopy ++---------+ +| element | ++---------+ +| 1 | +| 2 | +| 3 | +| 4 | +| 5 | +| 6 | +| 7 | +| 8 | +| 9 | ++---------+ +``` diff --git a/docs2/querying/clauses/where.md b/docs2/querying/clauses/where.md new file mode 100644 index 00000000000..7dd3fa4186c --- /dev/null +++ b/docs2/querying/clauses/where.md @@ -0,0 +1,271 @@ +--- +id: where +title: WHERE clause +sidebar_label: WHERE +--- + +`WHERE` isn't usually considered a standalone clause but rather a part of the +`MATCH`, `OPTIONAL MATCH` and `WITH` clauses. + +When used next to the `WITH` clause, the `WHERE` clause only filters the +results, while when used with `MATCH` and `OPTIONAL MATCH` it adds constraints +to the described patterns. 
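+
+For example, on the dataset created at the end of this page, the following two
+queries return the same rows: the first uses `WHERE` as a constraint on the
+matched pattern, the second as a filter after `WITH` (a sketch; row order may
+vary):
+
+```cypher
+MATCH (c:Country) WHERE c.population > 66000000
+RETURN c.name;
+
+MATCH (c:Country)
+WITH c
+WHERE c.population > 66000000
+RETURN c.name;
+```
+
+Both return `Germany` and `France`, the only countries in the dataset with a
+population above 66000000.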
+
+`WHERE` should be used directly after the `MATCH` or `OPTIONAL MATCH` clause it
+refers to; placing other clauses between them can change the results of the
+query or degrade its performance.
+
+1. [Basic usage](#1-basic-usage)
+ 1.1. [Boolean Operators](#11-boolean-operators)
+ 1.2. [Inequality operators](#12-inequality-operators-operators)
+ 1.3. [Filter with node labels](#13-filter-with-node-labels)
+ 1.4. [Filter with node properties](#14-filter-with-node-properties)
+ 1.5. [Filter with relationship properties](#15-filter-with-relationship-properties)
+ 1.6. [Check if property is not null](#16-check-if-property-is-not-null)
+ 1.7. [Filter with pattern expressions](#17-filter-with-pattern-expressions)
+2. [String matching](#2-string-matching)
+3. [Regular Expressions](#3-regular-expressions) + +## Dataset + +The following examples are executed with this dataset. You can create this dataset +locally by executing the queries at the end of the page: [Dataset queries](#data-set-queries). + +![Data set](../data/clauses/data_set.png) + +## 1. Basic Usage + +### 1.1. Boolean Operators + +Standard boolean operators like `NOT`, `AND`, `OR` and `XOR` can be used: + +```cypher +MATCH (c:Country) +WHERE c.language = 'English' AND c.continent = 'Europe' +RETURN c.name; +``` + +Output: + +```nocopy ++----------------+ +| c.name | ++----------------+ +| United Kingdom | ++----------------+ +``` + +### 1.2. Inequality Operators Operators + +Standard inequality operators like `<`, `<=`, `>` and `>=` can be used: + +```cypher +MATCH (c:Country) +WHERE (c.population > 80000000) +RETURN c.name; +``` + +Output: + +```nocopy ++---------+ +| c.name | ++---------+ +| Germany | ++---------+ +``` + +### 1.3. Filter with node labels + +Nodes can be filtered by their label using the `WHERE` clause instead of specifying it directly in the `MATCH` clause: + +```cypher +MATCH (c) +WHERE c:Country +RETURN c.name; +``` + +Output: + +```nocopy ++----------------+ +| c.name | ++----------------+ +| Germany | +| France | +| United Kingdom | ++----------------+ +``` + +### 1.4. Filter with node properties + +Just as labels, node properties can be used in the WHERE clause to filter nodes: + +```cypher +MATCH (c:Country) +WHERE c.population < 70000000 +RETURN c.name; +``` + +Output: + +```nocopy ++----------------+ +| c.name | ++----------------+ +| France | +| United Kingdom | ++----------------+ +``` + +### 1.5. 
Filter with relationship properties
+
+Just as with node properties, relationship properties can be used as filters:
+
+```cypher
+MATCH (:Country {name: 'United Kingdom'})-[r]-(p)
+WHERE r.date_of_start = 2014
+RETURN p;
+```
+
+Output:
+
+```nocopy
++---------------------------+
+| p                         |
++---------------------------+
+| (:Person {name: "Harry"}) |
+| (:Person {name: "Anna"})  |
++---------------------------+
+```
+
+### 1.6. Check if property is not null
+
+To check if a node or relationship property exists, use the `IS NOT NULL` option:
+
+```cypher
+MATCH (c:Country)
+WHERE c.name = 'United Kingdom' AND c.population IS NOT NULL
+RETURN c.name, c.population;
+```
+
+Output:
+
+```nocopy
++----------------+----------------+
+| c.name         | c.population   |
++----------------+----------------+
+| United Kingdom | 66000000       |
++----------------+----------------+
+```
+
+### 1.7. Filter with pattern expressions
+
+Currently, we support pattern expression filters with the `exists(pattern)` function, which can filter based on
+neighboring entities:
+
+```cypher
+MATCH (p:Person)
+WHERE exists((p)-[:LIVING_IN]->(:Country {name: 'Germany'}))
+RETURN p.name
+ORDER BY p.name;
+```
+
+Output:
+
+```nocopy
++----------------+
+| p.name         |
++----------------+
+| Anna           |
+| John           |
++----------------+
+```
+
+## 2. String matching
+
+Apart from comparison and concatenation operators, Cypher provides special
+string operators for easier matching of substrings:
+
+| Operator          | Description                                                      |
+| ----------------- | ---------------------------------------------------------------- |
+| `a STARTS WITH b` | Returns true if the prefix of string a is equal to string b.     |
+| `a ENDS WITH b`   | Returns true if the suffix of string a is equal to string b.     |
+| `a CONTAINS b`    | Returns true if some substring of string a is equal to string b. 
| + +```cypher +MATCH (c:Country) +WHERE c.name STARTS WITH 'G' AND NOT c.name CONTAINS 't' +RETURN c.name; +``` + +Output: + +```nocopy ++---------+ +| c.name | ++---------+ +| Germany | ++---------+ +``` + +## 3. Regular expressions + +Inside `WHERE` clause, you can use regular expressions for text filtering. To +use a regular expression, you need to use the `=~` operator. + +For example, finding all `Person` nodes which have a name ending with `a`: + +```cypher +MATCH (n:Person) WHERE n.name =~ ".*a$" RETURN n; +``` + +Output: + +```nocopy ++--------------------------+ +| n | ++--------------------------+ +| (:Person {name: "Anna"}) | ++--------------------------+ +``` + +The regular expression syntax is based on the modified ECMAScript regular +expression grammar. The ECMAScript grammar can be found +[here](http://ecma-international.org/ecma-262/5.1/#sec-15.10), while the +modifications are described in [this +document](https://en.cppreference.com/w/cpp/regex/ecmascript). + +## Dataset queries + +We encourage you to try out the examples by yourself. +You can get our dataset locally by executing the following query block. 
+ +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name= 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name= 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` diff --git a/docs2/querying/clauses/with.md b/docs2/querying/clauses/with.md new file mode 100644 index 00000000000..772a5fe2c65 --- /dev/null +++ b/docs2/querying/clauses/with.md @@ -0,0 +1,161 @@ +--- +id: with +title: WITH clause +sidebar_label: WITH +--- + +The `WITH` is used to chain together parts of a query, piping the results from one to be used as starting points or criteria in the next. + +1. [Filter on aggregate functions](#1-filter-on-aggregate-functions)
+2. [Sorting results](#2-sorting-results)
+3. [Limited path searches](#3-limited-path-searches) + +## Dataset + +The following examples are executed with this dataset. You can create this dataset +locally by executing the queries at the end of the page: [Dataset queries](#data-set-queries). + +![Data set](../data/clauses/data_set.png) + +## 1. Filter on aggregate functions + +Aggregated results have to pass through a `WITH` if you want to filter them: + +```cypher +MATCH (p:Person {name: 'John'})--(person)-->() +WITH person, count(*) AS foaf +WHERE foaf > 1 +RETURN person.name; +``` + +Output: + +```nocopy ++-------------+ +| person.name | ++-------------+ +| Harry | +| Anna | ++-------------+ +``` + +Sorting unique aggregated results can be done with `DISTINCT` operator in aggregation function which can be then filtered: + +```cypher +MATCH (p:Person {name: 'John'})--(person)-->(m) +WITH person, count(DISTINCT m) AS foaf +WHERE foaf > 1 +RETURN person.name; +``` + +Output: + +```nocopy ++-------------+ +| person.name | ++-------------+ +| Harry | +| Anna | ++-------------+ +``` + +## 2. Sorting results + +The `WITH` clause can be used to order results before using `collect()` on them: + +```cypher +MATCH (n) +WITH n +ORDER BY n.name ASC LIMIT 3 +RETURN collect(n.name); +``` + +Output: + +```nocopy ++-------------------------------+ +| collect(n.name) | ++-------------------------------+ +| ["Anna", "France", "Germany"] | ++-------------------------------+ +``` + +if you want to `collect()` only unique values: + +```cypher +MATCH (n) +WITH n +ORDER BY n.name ASC LIMIT 3 +RETURN collect(DISTINCT n.name) as unique_names; +``` + +Output: + +```nocopy ++-------------------------------+ +| unique_names | ++-------------------------------+ +| ["Anna", "France", "Germany"] | ++-------------------------------+ +``` + +## 3. 
Limited path searches + +The `WITH` clause can be used to match paths, limit to a certain number, +and then match again using those paths as a base: + +```cypher +MATCH (p1 {name: 'John'})--(p2) +WITH p2 +ORDER BY p2.name ASC LIMIT 1 +MATCH (p2)--(p3) +RETURN p3.name; +``` + +Output: + +```nocopy ++----------------+ +| p3.name | ++----------------+ +| John | +| Harry | +| Germany | +| United Kingdom | ++----------------+ +``` + +## Dataset queries + +We encourage you to try out the examples by yourself. +You can get our dataset locally by executing the following query block. + +```cypher +MATCH (n) DETACH DELETE n; + +CREATE (c1:Country {name: 'Germany', language: 'German', continent: 'Europe', population: 83000000}); +CREATE (c2:Country {name: 'France', language: 'French', continent: 'Europe', population: 67000000}); +CREATE (c3:Country {name: 'United Kingdom', language: 'English', continent: 'Europe', population: 66000000}); + +MATCH (c1),(c2) +WHERE c1.name= 'Germany' AND c2.name = 'France' +CREATE (c2)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'John'})-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (c) +WHERE c.name= 'United Kingdom' +CREATE (c)<-[:WORKING_IN {date_of_start: 2014}]-(p:Person {name: 'Harry'})-[:LIVING_IN {date_of_start: 2013}]->(c); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)-[:FRIENDS_WITH {date_of_start: 2011}]->(p2); + +MATCH (p1),(p2) +WHERE p1.name = 'John' AND p2.name = 'Harry' +CREATE (p1)<-[:FRIENDS_WITH {date_of_start: 2012}]-(:Person {name: 'Anna'})-[:FRIENDS_WITH {date_of_start: 2014}]->(p2); + +MATCH (p),(c1),(c2) +WHERE p.name = 'Anna' AND c1.name = 'United Kingdom' AND c2.name = 'Germany' +CREATE (c2)<-[:LIVING_IN {date_of_start: 2014}]-(p)-[:LIVING_IN {date_of_start: 2014}]->(c1); + +MATCH (n)-[r]->(m) RETURN n,r,m; +``` diff --git a/docs2/querying/create-graph-objects.md b/docs2/querying/create-graph-objects.md new file mode 100644 index 00000000000..1f87ad41a1f --- /dev/null +++ 
b/docs2/querying/create-graph-objects.md @@ -0,0 +1,182 @@ +--- +id: create-graph-objects +title: Create graph objects +sidebar_label: Create graph objects +--- + +For creating graph objects, you can use the following clauses. + +- `CREATE`, for creating new nodes and relationships. +- `SET`, for adding new or updating existing labels and properties. + +You can still use the `RETURN` clause to produce results after writing, but it +is not mandatory. + +Details on which kind of data can be stored in Memgraph can be found in the +[Storage](/memgraph/concepts/storage) chapter. + +:::info + +Indexing can increase performance when executing queries. Please take a look at +our [documentation on indexing](/docs/memgraph/reference-guide/indexing) for +more details. + +::: + +## CREATE + +This clause is used to add new nodes and relationships to the database. The creation is +done by providing a pattern, similarly to `MATCH` clause. + +For example, use this query to create two new nodes connected with a new relationship. + +```cypher +CREATE (node1)-[:RELATIONSHIP_TYPE]->(node2); +``` + +Labels and properties can be set during creation using the same syntax as in +`MATCH` patterns. For example, creating a node with a label and a property: + +```cypher +CREATE (node:Label {property: 'my property value'}); +``` + +Additional information on `CREATE` is available [here](./clauses/create.md). + +## WITH + +The write part of the query cannot be simply followed by another read part. To +combine them, the `WITH` clause must be used. The names this clause establishes +are transferred from one part to another. + +For example, creating a node and finding all nodes with the same property. + +```cypher +CREATE (node {property: 42}) WITH node.property AS propValue +MATCH (n {property: propValue}) RETURN n; +``` + +Note that the `node` is not visible after `WITH`, since only `node.property` was +carried over. 
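+
+If the node itself is needed later in the query, carry the variable over
+instead. A minimal sketch of the same query, keeping `node` in scope:
+
+```cypher
+CREATE (node {property: 42}) WITH node
+MATCH (n {property: node.property}) RETURN n;
+```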
+ +This clause behaves very much like `RETURN`, so you should refer to features of +`RETURN`. + +## MERGE + +The `MERGE` clause is used to ensure that a pattern you are looking for exists +in the database. This means that it will be created if the pattern is not found. +In a way, this clause is like a combination of `MATCH` and `CREATE`. + +For example, ensure that a person has at least one friend: + +```cypher +MATCH (n:Person) MERGE (n)-[:FRIENDS_WITH]->(m); +``` + +The clause also provides additional features for updating the values depending +on whether the pattern was created or matched. This is achieved with `ON CREATE` +and `ON MATCH` sub clauses. + +For example, set different properties depending on what `MERGE` did: + +```cypher +MATCH (n:Person) MERGE (n)-[:FRIENDS_WITH]->(m) +ON CREATE SET m.prop = "created" ON MATCH SET m.prop = "existed"; +``` + +For more details, check out [this guide](./clauses/merge.md). + +## Import existing data from CSV + +Using CSV files is just one of the ways to [import your +data](/docs/memgraph/import-data) into Memgraph. The `LOAD CSV` clause enables +you to [load and use data](/docs/memgraph/import-data/load-csv-clause) from a +CSV file. Memgraph supports the Excel CSV dialect, as it's the most commonly +used one. For the syntax of the clause, please check the [LOAD +CSV](/cypher-manual/clauses/load-csv) page in the Cypher manual. + +## Relationships + +**Relationships** (or edges) are the **lines that connect nodes** to each other +and represent a defined connection between them. Every relationship has a source +node and a target node that represent in which direction the relationship works. +If this direction is important, the relationship is considered directed, +otherwise, it's undirected. + +Relationships can also store data in the form of **properties**, just as nodes. +In most cases, relationships store quantitative properties such as weight, +costs, distances, ratings, etc. 
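+
+For example, a road network might store the distance between two cities on the
+relationship itself (the `City` label, `ROAD` type and `distance` property here
+are illustrative and not part of the dataset used in this guide):
+
+```cypher
+CREATE (:City {name: 'Zagreb'})-[:ROAD {distance: 163}]->(:City {name: 'Rijeka'});
+```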
+ +![](data/connecting-nodes/connecting-nodes.png) + +In our example, the relationship between two nodes labeled `Person` could be of +the type `MARRIED_TO`. The relationship between `Person` and `City` is +represented by the type `LIVES_IN`. + +The relationship of the type `MARRIED_TO` has the property `weddingDate`, which +represents the date when the marriage was formed. Relationships of the type +`LIVES_IN` have the property `durationInYears` which denotes how long a person +has lived in the specified location. + +### Creating relationships + +To create a relationship between two nodes, we need to specify which nodes +either by creating them or filtering them with the `WHERE` clause: + +```cypher +CREATE (p1:Person {name: 'Harry'}), (p2:Person {name: 'Anna'}) +CREATE (p1)-[r:MARRIED_TO]->(p2) +RETURN p1, r, p2; +``` + +If the nodes already existed, the query would look like this: + +```cypher +MATCH (p1:Person),(p2:Person) +WHERE p1.name = 'Harry' AND p2.name = 'Anna' +CREATE (p1)-[r:MARRIED_TO]->(p2) +RETURN p1, r, p2; +``` + +Instead of using the `CREATE` clause, you are just searching for existing nodes +using the `WHERE` clause and accessing them using variables `p1` and `p2`. + +### Retrieving relationship types + +The built-in function `type()` can be used to return the type of a relationship: + +```cypher +CREATE (p1:Person {name: 'Harry'}), (p2:Person {name: 'Anna'}) +CREATE (p1)-[r:MARRIED_TO {weddingDate: '27-06-2019'}]->(p2) +RETURN type(r); +``` + +### Querying using relationships + +You can query the database using relationship types. The following query will +return nodes connected with the relationship of the following type: + +```cypher +MATCH (p1)-[r:MARRIED_TO]->(p2) +RETURN p1, r, p2; +``` + +### Relationship properties + +Just like with properties on nodes, the same rules apply when creating or +matching a relationship. 
You can add properties to relationships at the time of +creation: + +```cypher +CREATE (p1:Person {name: 'Harry'}), (p2:Person {name: 'Anna'}) +CREATE (p1)-[r:MARRIED_TO {weddingDate: '27-06-2019'}]->(p2) +RETURN p1, r, p2; +``` + +You can also specify them in the `MATCH` clause: + +```cypher +MATCH (p1)-[r:MARRIED_TO {weddingDate: '27-06-2019'}]->(p2) +RETURN p1, r, p2; +``` \ No newline at end of file diff --git a/docs2/querying/differences-in-cypher-implementations.md b/docs2/querying/differences-in-cypher-implementations.md new file mode 100644 index 00000000000..ca2dd788ed7 --- /dev/null +++ b/docs2/querying/differences-in-cypher-implementations.md @@ -0,0 +1,60 @@ +--- +id: differences-in-cypher-implementations +title: Differences in Cypher implementations +sidebar_label: Differences in Cypher implementations +--- + +Although we try to implement the [openCypher](https://www.opencypher.org/) query +language as close to the language reference as possible, we had to make some +changes to enhance the user experience. + +## Unicode codepoints in string literals + +Use `\u` followed by 4 hex digits in string literals for UTF-16 codepoint and +`\U` with 8 hex digits for UTF-32 codepoint in Memgraph. + +## Difference from Neo4j's Cypher implementation + +The openCypher initiative stems from Neo4j's Cypher query language. Following is +a list of the most important differences between Neo's Cypher and Memgraph's +openCypher implementation for users already familiar with Neo4j. Other +differences might not be documented here (especially subtle semantic ones). + +### Unsupported constructs + +- Stored procedures. +- `shortestPath` and `allShortestPaths` functions. They can be expressed using + Memgraph's depth-first search and all shortest paths expansion syntax. Among + Memgraph's [built in + algorithms](/memgraph/reference-guide/built-in-graph-algorithms) are also + breadth-first search and weighted shortest path. +- Patterns in expressions. 
For example, Memgraph doesn't support + `size((n)-->())`. Most of the time, the same functionalities can be expressed + differently in Memgraph using `OPTIONAL` expansions, function calls etc. You + can check out [this example](#patterns-in-expressions). + +### Unsupported functions + +General purpose functions: + +- `exists(n.property)` - This can be expressed using `n.property IS NOT NULL`. +- `length()` is named `size()` in Memgraph. + +Mathematical functions: + +- `percentileDisc()` +- `stDev()` +- `point()` +- `distance()` +- `degrees()` + +List functions: + +- `none()` + +## Patterns in expressions + +Patterns in expressions are supported in Memgraph in particular functions, like `exists(pattern)`. +In other cases, Memgraph does not yet support patterns in functions, e.g. `size((n)-->())`. +Most of the time, the same functionalities can be expressed differently in Memgraph +using `OPTIONAL` expansions, function calls, etc. diff --git a/docs2/querying/exploring-datasets/analyzing-ted-talks.md b/docs2/querying/exploring-datasets/analyzing-ted-talks.md new file mode 100644 index 00000000000..801fdf189d8 --- /dev/null +++ b/docs2/querying/exploring-datasets/analyzing-ted-talks.md @@ -0,0 +1,176 @@ +--- +id: analyzing-ted-talks +title: Analyzing TED Talks +sidebar_label: Analyzing TED Talks +--- + +This article is a part of a series intended to show how to use Memgraph on +real-world data to retrieve some interesting and useful information. + +We highly recommend checking out the other articles from this series which are +listed in our [tutorial overview section](/tutorials/overview.md), where you +can also find instructions on how to start with the tutorial. + +## Introduction + +[TED](https://www.ted.com/) is a nonprofit organization devoted to spreading +ideas, usually in the form of short, powerful talks. Today, TED talks are +influential videos from expert speakers on almost all topics — from +science to business to global issues. 
Here we present a small dataset which +consists of 97 talks, show how to model this data as a graph and demonstrate a +few example queries. + +## Data Model + +- Each TED talk has a main speaker, so we identify two types of nodes — + `Talk` and `Speaker`. +- We add an edge of type `Gave` pointing to a `Talk` from its main `Speaker`. +- Each speaker has a name so we can add property `name` to `Speaker` node. +- We'll add properties `name`, `title` and `description` to node `Talk`. +- Each talk is given in a specific TED event, so we can create node `Event` with + property `name` and relationship `InEvent` between talk and event. +- Talks are tagged with keywords to facilitate searching, hence we add node + `Tag` with property `name` and relationship `HasTag` between talk and tag. +- Users give ratings to each talk by selecting up to three predefined string + values. Therefore we add node `Rating` with these values as property `name` + and relationship`HasRating` with property `user_count` between talk and rating + nodes. + +![TED](../../data/TED_metagraph.png) + +## Exploring the dataset + +You have two options for exploring this dataset. If you just want to take a look +at the dataset and try out a few queries, open [Memgraph +Playground](https://playground.memgraph.com/sandbox/ted-talks) and continue with +the tutorial there. Note that you will not be able to execute `write` +operations. + +On the other hand, if you would like to add changes to the dataset, download the +[Memgraph Platform](https://memgraph.com/download#memgraph-platform). Once you +have it up and running, open Memgraph Lab web application within the browser on +[`localhost:3000`](http://localhost:3000) and navigate to `Datasets` in the +sidebar. From there, choose the dataset `TED talks` and continue with the +tutorial. + +## Example queries using Cypher + +In the queries below, we are using [Cypher](/cypher-manual) to query Memgraph +via the console. 
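+
+Before trying the examples, you can check that the dataset loaded correctly: the
+introduction above states that it consists of 97 talks, so counting the `Talk`
+nodes should return 97:
+
+```cypher
+MATCH (t:Talk) RETURN count(t) AS talks;
+```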
+ +**1\.** Find all talks given by specific speaker: + +```cypher +MATCH (n:Speaker {name: "Hans Rosling"})-[:Gave]->(m:Talk) +RETURN m.title; +``` + +**2\.** Find the top 20 speakers with most talks given: + +```cypher +MATCH (n:Speaker)-[:Gave]->(m) +RETURN n.name, count(m) AS talksGiven +ORDER BY talksGiven +DESC LIMIT 20; +``` + +**3\.** Find talks related by tag to specific talk and count them: + +```cypher +MATCH (n:Talk {name: "Michael Green: Why we should build wooden skyscrapers"}) + -[:HasTag]->(t:Tag)<-[:HasTag]-(m:Talk) +WITH * +ORDER BY m.name +RETURN t.name, collect(m.name) AS names, count(m) AS talksCount +ORDER BY talksCount DESC; +``` + +**4\.** Find 20 most frequently used tags: + +```cypher +MATCH (t:Tag)<-[:HasTag]-(n:Talk) +RETURN t.name AS tag, count(n) AS talksCount +ORDER BY talksCount DESC, tag +LIMIT 20; +``` + +**5\.** Find 20 talks most rated as "Funny". If you want to query by other + ratings, possible values are: Obnoxious, Jaw-dropping, OK, Persuasive, + Beautiful, Confusing, Longwinded, Unconvincing, Fascinating, Ingenious, + Courageous, Funny, Informative and Inspiring. + +```cypher +MATCH (r:Rating {name: "Funny"})<-[e:HasRating]-(m:Talk) +RETURN m.name, e.user_count +ORDER BY e.user_count DESC +LIMIT 20; +``` + +**6\.** Find inspiring talks and their speakers from the field of technology: + +```cypher +MATCH (n:Talk)-[:HasTag]->(m:Tag {name: "technology"}) +MATCH (n)-[r:HasRating]->(p:Rating {name: "Inspiring"}) +MATCH (n)<-[:Gave]-(s:Speaker) +WHERE r.user_count > 1000 +RETURN n.title, s.name, r.user_count +ORDER BY r.user_count DESC; +``` + +**7\.** Now let's see one real-world example — how to make a real-time + recommendation. If you've just watched a talk from a certain speaker (e.g. 
Hans Rosling), you might be interested in finding more talks from the same
speaker on a similar topic:

```cypher
MATCH (n:Speaker {name: "Hans Rosling"})-[:Gave]->(m:Talk)
MATCH (t:Talk {title: "New insights on poverty"})
      -[:HasTag]->(tag:Tag)<-[:HasTag]-(m)
WITH *
ORDER BY tag.name
RETURN m.title AS title, collect(tag.name) AS names, count(tag) AS tagCount
ORDER BY tagCount DESC, title;
```

The following few queries focus on extracting information about TED events.

**8\.** Find how many talks were given per event:

```cypher
MATCH (n:Event)<-[:InEvent]-(t:Talk)
RETURN n.name AS event, count(t) AS talksCount
ORDER BY talksCount DESC, event
LIMIT 20;
```

**9\.** Find the most popular tags in a specific event:

```cypher
MATCH (n:Event {name:"TED2006"})<-[:InEvent]-(t:Talk)-[:HasTag]->(tag:Tag)
RETURN tag.name AS tag, count(t) AS talksCount
ORDER BY talksCount DESC, tag
LIMIT 20;
```

**10\.** Discover which speakers participated in more than 2 events:

```cypher
MATCH (n:Speaker)-[:Gave]->(t:Talk)-[:InEvent]->(e:Event)
WITH n, count(e) AS eventsCount
WHERE eventsCount > 2
RETURN n.name AS speaker, eventsCount
ORDER BY eventsCount DESC, speaker;
```

**11\.** For each speaker, find other speakers that participated in the same
events:

```cypher
MATCH (n:Speaker)-[:Gave]->()-[:InEvent]->(e:Event)<-[:InEvent]-()<-[:Gave]-(m:Speaker)
WHERE n.name != m.name
WITH DISTINCT n, m
ORDER BY m.name
RETURN n.name AS speaker, collect(m.name) AS others
ORDER BY speaker;
```
diff --git a/docs2/querying/exploring-datasets/backpacking-through-europe.md b/docs2/querying/exploring-datasets/backpacking-through-europe.md
new file mode 100644
index 00000000000..79f49cd6f68
--- /dev/null
+++ b/docs2/querying/exploring-datasets/backpacking-through-europe.md
@@ -0,0 +1,185 @@
---
id: backpacking-through-europe
title: Backpacking through Europe
sidebar_label: Backpacking through Europe
---

This article is a part of
a series intended to show users how to use Memgraph on
real-world data and, by doing so, retrieve some interesting and useful
information.

We highly recommend checking out the other articles from this series, which are
listed in our [tutorial overview section](/tutorials/overview.md), where you can
also find instructions on how to start with the tutorial.

## Introduction

Backpacking is a form of low-cost independent travel. It includes the use of
public transportation and inexpensive hostels, and it is often longer in
duration than a conventional vacation. This article explores the European
Backpackers Index from 2018. The dataset contains tourist prices and other data
for 56 of the most popular European cities. Here we showcase how Memgraph's
graph traversal algorithms can be used to make a real-time travelling
recommendation system.

## Data model

The European Backpacker Index (2018) contains information for 56 cities from 36
European countries. Two cities are connected via the `:CloseTo` edge if they are
from the same or from neighboring countries. Every edge has an `eu_border`
property to indicate whether the EU border needs to be crossed to reach the
other city. The index lists the cheapest and most attractive hostel from each
city. The hostel name can be accessed via the `cheapest_hostel` property, and
its website is stored in `hostel_url`. The city nodes also contain properties
with tourist information such as `local_currency`, `local_currency_code`, and
`total_USD`. `total_USD` is the sum of the most common tourist expenses:
`cost_per_night_USD`, `attractions_USD`, `drinks_USD`, `meals_USD`, and
`transportation_USD`. The country nodes are connected with the `:Borders` edge
if they are neighboring countries. This edge also has the `eu_border` property.
Every city node is connected to its parent country node via the `:Inside` edge.

![Backpacking](../../data/backpacking_metagraph.png)

## Exploring the dataset

You have two options for exploring this dataset. If you just want to take a look
at the dataset and try out a few queries, open [Memgraph
Playground](https://playground.memgraph.com/sandbox/europe-backpacking) and
continue with the tutorial there. Note that you will not be able to execute
`write` operations.

On the other hand, if you would like to make changes to the dataset, download
the [Memgraph Platform](https://memgraph.com/download#memgraph-platform). Once
you have it up and running, open the Memgraph Lab web application in your
browser at [`localhost:3000`](http://localhost:3000) and navigate to `Datasets`
in the sidebar. From there, choose the dataset `Backpacking through Europe` and
continue with the tutorial.

## Example queries

**1\.** Let's list the top 10 cities with the cheapest hostels by cost per night
from the European Backpacker Index.

```cypher
MATCH (n:City)
RETURN n.name, n.cheapest_hostel, n.cost_per_night_USD, n.hostel_url
ORDER BY n.cost_per_night_USD LIMIT 10;
```

**2\.** Say we want to visit Croatia. Which cities does the Backpackers Index
recommend? Let's sort them by total costs.

```cypher
MATCH (c:City)-[:Inside]->(:Country {name: "Croatia"})
RETURN c.name, c.cheapest_hostel, c.total_USD
ORDER BY c.total_USD;
```

**3\.** What if we want to visit multiple cities in a single country and want to
know which country has the most cities in the index?

```cypher
MATCH (n:Country)<-[:Inside]-(m:City)
RETURN n.name AS CountryName, COUNT(m) AS HostelCount
ORDER BY HostelCount DESC, CountryName LIMIT 10;
```

Now, let's start backpacking. This is where Memgraph's graph traversal
capabilities come into play.

**4\.** We're on a trip from Spain to Russia and want to cross as few borders as
possible. This is a great job for the breadth-first search (BFS) algorithm.
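The Cypher query for this delegates the traversal to Memgraph's built-in BFS.
As a mental model, here is the same idea in plain Python on a toy border graph;
the countries and borders below are illustrative, not taken from the dataset.

```python
from collections import deque

# Toy country-border adjacency; the real borders live in the
# dataset's :Borders relationships.
borders = {
    "Spain": ["France"],
    "France": ["Spain", "Germany"],
    "Germany": ["France", "Poland"],
    "Poland": ["Germany", "Russia"],
    "Russia": ["Poland"],
}

def fewest_borders(start, goal):
    # BFS explores neighbors level by level, so the first time we
    # reach the goal we have crossed the minimum number of borders.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in borders.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(fewest_borders("Spain", "Russia"))
# ['Spain', 'France', 'Germany', 'Poland', 'Russia']
```

The `* bfs` syntax in the query below performs this expansion natively inside
the database, so no data has to leave Memgraph.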

```cypher
MATCH p = (n:Country {name: "Spain"})
          -[r:Borders * bfs]-
          (m:Country {name: "Russia"})
UNWIND (nodes(p)) AS rows
RETURN rows.name;
```

**5\.** What if we're interested in going from Bratislava to Madrid with the
fewest stops? Also, we can't be bothered to switch currencies and want to pay
with the Euro everywhere along the trip.

```cypher
MATCH p = (:City {name: "Bratislava"})
          -[:CloseTo * bfs (e, v | v.local_currency = "Euro")]-
          (:City {name: "Madrid"})
UNWIND (nodes(p)) AS rows
RETURN rows.name;
```

Here we can see how to use the _filter lambda_ to filter paths where the local
currency in the city vertex `v` is the Euro. `nodes(p)` returns the path as a
list, and `UNWIND` unpacks the list into individual rows.

**6\.** This time we're going from Brussels to Athens on a budget. We're
interested in the route with the cheapest stays. But there's a problem: we've
lost our passport! Luckily, we're European Union citizens and can travel freely
within the EU. Let's find the cheapest route from Brussels to Athens with no EU
border crossings. This is a good use case for Dijkstra's shortest path
algorithm.

```cypher
MATCH p = (:City {name: "Brussels"})
          -[:CloseTo * wShortest(e, v | v.cost_per_night_USD) total_cost (e, v | e.eu_border=FALSE)]-
          (:City {name: "Athens"})
WITH extract(city in nodes(p) | city.name) AS trip, total_cost
RETURN trip, total_cost;
```

Here we used the _weight lambda_ to specify the cost of expanding to the
specified vertex using the given edge (`v.cost_per_night_USD`), and the _total
cost_ symbol to calculate the cost of the trip. This can also be done using an
edge property, like in the [Exploring the European Road
Network](exploring-the-european-road-network.md) tutorial. Here we use the
`cost_per_night_USD` property of the city vertex `v` as our weight. Finally, we
use the _filter lambda_ to only consider paths with no EU border crossings.
The `extract` function is used to only show the city names. To get the full
city information, we would simply return `nodes(p)`.

**7\.** We're on a trip with our friends from Madrid to Belgrade, but want to
visit Vienna along the way. We want to party it up on the first part of our trip
and are only interested in the cost of staying and drinks. After that, we plan
on sightseeing and are interested in the cost of attractions from Vienna to
Belgrade. What is our cheapest option?

```cypher
MATCH p = (:City {name: "Madrid"})
          -[:CloseTo * wShortest(e, v | v.cost_per_night_USD + v.drinks_USD) cost1]-
          (:City {name: "Vienna"})
          -[:CloseTo * wShortest(e, v | v.cost_per_night_USD + v.attractions_USD) cost2]-
          (:City {name: "Belgrade"})
WITH extract(city in nodes(p) | city.name) AS trip, cost1, cost2
RETURN trip, cost1 + cost2 AS total_cost;
```

**8\.** We're on a trip from Paris to Zagreb and want to visit at least 3
cities, but no more than 5 (excluding the starting location, Paris). Let's list
our top 10 options sorted by the total trip cost and the number of cities in the
path.

```cypher
MATCH path = (n:City {name: "Paris"})-[:CloseTo *3..5]-(m:City {name: "Zagreb"})
WITH nodes(path) AS trip
WITH extract(city in trip | [city, trip]) AS lst
UNWIND lst AS rows
WITH rows[0] AS city, extract(city in rows[1] | city.name) AS trip
RETURN trip,
       toInteger(sum(city.total_USD)) AS trip_cost_USD,
       count(trip) AS city_count
ORDER BY trip_cost_USD, city_count DESC LIMIT 10;
```

Here we can see the usage of variable-length paths. By using the `*`
_(asterisk)_ symbol, we can traverse from one node to another by following any
number of connections. We then use the `extract` function to get a list of
(city, trip) tuples. The city is used to calculate the total cost of the trip
using the `sum` function. Finally, we sort our results by price first, and then
by city count.
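The `wShortest` expansions used above combine a weight lambda with an optional
filter lambda. As a conceptual sketch only, here is Dijkstra's algorithm with a
filter predicate in plain Python, mirroring query 6; the cities, nightly costs,
and border flags below are invented for illustration.

```python
import heapq

# Per-city cost of expanding to that city (like v.cost_per_night_USD);
# all numbers are made up.
cost_per_night = {"Brussels": 20, "Munich": 30, "Milan": 40,
                  "Belgrade": 15, "Athens": 25}

# Edges as (target, crosses_eu_border), like the eu_border edge property.
edges = {
    "Brussels": [("Munich", False)],
    "Munich": [("Milan", False), ("Belgrade", True)],
    "Milan": [("Athens", False)],
    "Belgrade": [("Athens", True)],
}

def cheapest_route(start, goal, allow_eu_border_crossing=False):
    # Dijkstra: always expand the cheapest known path first; edges the
    # filter rejects (EU border crossings here) are never expanded.
    heap = [(0, start, [start])]
    best = {}
    while heap:
        cost, city, path = heapq.heappop(heap)
        if city == goal:
            return path, cost
        for nxt, crosses_border in edges.get(city, []):
            if crosses_border and not allow_eu_border_crossing:
                continue
            new_cost = cost + cost_per_night[nxt]
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt, path + [nxt]))
    return None, None

print(cheapest_route("Brussels", "Athens"))
# (['Brussels', 'Munich', 'Milan', 'Athens'], 95)
```

Note that, like `wShortest`, the sketch sums the weight of each vertex expanded
*to*, so the starting city's cost is not counted.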

To learn more about these algorithms, we suggest you check out their Wikipedia
pages:

- [Breadth-first search](https://en.wikipedia.org/wiki/Breadth-first_search)
- [Dijkstra's algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)
diff --git a/docs2/querying/exploring-datasets/exploring-datasets.md b/docs2/querying/exploring-datasets/exploring-datasets.md
new file mode 100644
index 00000000000..8432339c6df
--- /dev/null
+++ b/docs2/querying/exploring-datasets/exploring-datasets.md
@@ -0,0 +1,47 @@
---
id: exploring-datasets
title: Exploring datasets with graph analytics
---

The tutorials that focus on exploring datasets showcase how to use Memgraph on a
particular dataset using Cypher queries. We encourage all Memgraph users to go
through at least one of the tutorials to get familiar with Memgraph.

You can explore the datasets in two ways. If you just want to take a better look
at the data and the data model, and try out a few queries, open [Memgraph
Playground](https://playground.memgraph.com/sandboxes/) and continue with the
tutorials there. Note that you will not be able to execute `write` operations.

On the other hand, if you would like to make changes to the dataset, download
the [Memgraph Platform](https://memgraph.com/download#memgraph-platform). Once
you have it up and running, open the Memgraph Lab web application in your
browser at [`localhost:3000`](http://localhost:3000) and navigate to `Datasets`
in the sidebar. From there, choose the dataset that seems interesting to you and
continue with the tutorial.

You can also run an instance in [Memgraph
Cloud](https://memgraph.com/docs/memgraph-cloud/). Once you [sign
up](https://cloud.memgraph.com/), create a new project. From the project, you
can connect to the Memgraph Lab web application and navigate to `Datasets` in
the sidebar to choose the preferred dataset.
+ +So far we have covered the following topics with basic tutorials: + +- **[Analyzing TED Talks](analyzing-ted-talks.md)** - [Try it on + Playground!](https://playground.memgraph.com/sandbox/ted-talks) +- **[Backpacking Through Europe](backpacking-through-europe.md)** - [Try it on + Playground!](https://playground.memgraph.com/sandbox/europe-backpacking) +- **[Exploring the European Road + Network](exploring-the-european-road-network.md)** - [Try it on + Playground!](https://playground.memgraph.com/sandbox/europe-roads) +- **[Football Transfers](football-transfers.md)** - [Try it on + Playground!](https://playground.memgraph.com/sandbox/football-transfers) +- **[Game of Thrones deaths](got-deaths.md)** - [Try it on + Playground!](https://playground.memgraph.com/sandbox/game-of-thrones-deaths) +- **[Graphing the Premier League](graphing-the-premier-league.md)** - [Try it on + Playground!](https://playground.memgraph.com/sandbox/football-premier-league) +- **[Marvel Comic Universe Social Network](marvel-universe.md)** - [Try it on + Playground!](https://playground.memgraph.com/sandbox/marvel-comics) +- **[Movie Recommendation System](movie-recommendation.md)** - [Try it on + Playground!](https://playground.memgraph.com/sandbox/movielens) \ No newline at end of file diff --git a/docs2/querying/exploring-datasets/exploring-the-european-road-network.md b/docs2/querying/exploring-datasets/exploring-the-european-road-network.md new file mode 100644 index 00000000000..04fdb2c4bad --- /dev/null +++ b/docs2/querying/exploring-datasets/exploring-the-european-road-network.md @@ -0,0 +1,172 @@ +--- +id: exploring-the-european-road-network +title: Exploring the European road network +sidebar_label: Exploring the European road network +--- + +This article is a part of a series intended to show users how to use Memgraph on +real-world data and, by doing so, retrieve some interesting and useful +information. 

We highly recommend checking out the other articles from this series, which are
listed in our [tutorial overview section](/tutorials/overview.md), where you can
also find instructions on how to start with the tutorial.

## Introduction

This particular article outlines how to use some of Memgraph's built-in graph
algorithms. More specifically, it shows how to use the breadth-first search
graph traversal algorithm and Dijkstra's algorithm for finding weighted shortest
paths between nodes in the graph.

## Data model

One of the most common applications of graph traversal algorithms is driving
route computation, so we will use the European road network graph as an example.
The graph consists of 999 major European cities from 39 countries in total. Each
city is connected to the country it belongs to via an edge of type `:In_`. There
are edges of type `:Road` connecting cities less than 500 kilometers apart. The
distance between cities is specified in the `length` property of the edge.

![Road network](../../data/road_network_metagraph.png)

## Exploring the dataset

You have two options for exploring this dataset. If you just want to take a look
at the dataset and try out a few queries, open [Memgraph
Playground](https://playground.memgraph.com/sandbox/europe-roads) and continue
with the tutorial there. Note that you will not be able to execute `write`
operations.

On the other hand, if you would like to make changes to the dataset, download
the [Memgraph Platform](https://memgraph.com/download#memgraph-platform). Once
you have it up and running, open the Memgraph Lab web application in your
browser at [`localhost:3000`](http://localhost:3000) and navigate to `Datasets`
in the sidebar. From there, choose the dataset `Europe road network` and
continue with the tutorial.

## Example queries

**1\.** Let's list all of the countries in our road network.

```cypher
MATCH (c:Country)
RETURN c.name
ORDER BY c.name;
```

**2\.** Which Croatian cities are in our road network?

```cypher
MATCH (c:City)-[:In_]->(:Country {name: "Croatia"})
RETURN c.name
ORDER BY c.name;
```

**3\.** Which cities in our road network are less than 200 km away from Zagreb?

```cypher
MATCH (:City {name: "Zagreb"})-[r:Road]->(c:City)
WHERE r.length < 200
RETURN c.name
ORDER BY c.name;
```

Now let's try some queries using Memgraph's graph traversal capabilities.

**4\.** Say you want to drive from Zagreb to Paris. You might wonder what the
least number of cities you have to visit is if you don't want to drive more than
500 kilometers between stops. Since the edges in our road network don't connect
cities that are more than 500 km apart, this is a great use case for the
breadth-first search (BFS) algorithm.

```cypher
MATCH p = (:City {name: "Zagreb"})
          -[:Road * bfs]->
          (:City {name: "Paris"})
RETURN nodes(p);
```

**5\.** What if we want to bike to Paris instead of driving? It is unreasonable
(and dangerous!) to bike 500 km per day. Let's limit ourselves to biking no more
than 200 km in one go.

```cypher
MATCH p = (:City {name: "Zagreb"})
          -[:Road * bfs (e, v | e.length <= 200)]->
          (:City {name: "Paris"})
RETURN nodes(p);
```

"What is this special syntax?", you might wonder.

`(e, v | e.length <= 200)` is called a _filter lambda_. It's a function that
takes an edge symbol `e` and a vertex symbol `v` and decides whether this edge
and vertex pair should be considered valid in breadth-first expansion by
returning true or false (or Null). In the above example, the lambda returns
true if the edge length is not greater than 200, because we don't want to bike
more than 200 km in one go.

**6\.** Let's say we also don't want to visit Vienna on our way to Paris,
because we have a lot of friends there and visiting all of them would take up a
lot of our time.
We just have to update our filter lambda.

```cypher
MATCH p = (:City {name: "Zagreb"})
          -[:Road * bfs (e, v | e.length <= 200 AND v.name != "Vienna")]->
          (:City {name: "Paris"})
RETURN nodes(p);
```

As you can see, without the additional restriction we could visit 11 cities. If
we want to avoid Vienna, we must visit at least 12 cities.

**7\.** Instead of counting the cities visited, we might want to find the
shortest paths in terms of distance travelled. This is a textbook application of
Dijkstra's algorithm. The following query will return the list of cities on the
shortest path from Zagreb to Paris along with the total length of the path.

```cypher
MATCH p = (:City {name: "Zagreb"})
          -[:Road * wShortest (e, v | e.length) total_weight]->
          (:City {name: "Paris"})
RETURN nodes(p) AS cities, total_weight;
```

As you can see, the syntax is quite similar to the breadth-first search syntax.
Instead of a filter lambda, we need to provide a _weight lambda_ and the _total
weight symbol_. Given an edge and vertex pair, the weight lambda must return the
cost of expanding to the given vertex using the given edge. The path returned
will have the smallest possible sum of costs, and it will be stored in the total
weight symbol. A limitation of Dijkstra's algorithm is that the cost must be
non-negative.

**8\.** We can also combine weight and filter lambdas in the shortest-path
query. Let's say we're interested in the shortest path for our bike route that
doesn't require travelling more than 200 km in one go.

```cypher
MATCH p = (:City {name: "Zagreb"})
          -[:Road * wShortest (e, v | e.length) total_weight (e, v | e.length <= 200)]->
          (:City {name: "Paris"})
RETURN nodes(p) AS cities, total_weight;
```

**9\.** Let's try and find the 10 cities that are furthest away from Zagreb.
+ +```cypher +MATCH (:City {name: "Zagreb"}) + -[:Road * wShortest (e, v | e.length) total_weight]-> + (c:City) +RETURN c, total_weight +ORDER BY total_weight DESC +LIMIT 10; +``` + +It is not surprising to see that they are all in Siberia. + +To learn more about these algorithms, we suggest you check out their Wikipedia +pages: + +- [Breadth-first search](https://en.wikipedia.org/wiki/Breadth-first_search) +- [Dijkstra's algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm) diff --git a/docs2/querying/exploring-datasets/football-transfers.md b/docs2/querying/exploring-datasets/football-transfers.md new file mode 100644 index 00000000000..fe5ff86f079 --- /dev/null +++ b/docs2/querying/exploring-datasets/football-transfers.md @@ -0,0 +1,374 @@ +--- +id: football-transfers +title: Football transfers +sidebar_label: Football transfers +--- + +This article is a part of a series intended to show how to use Memgraph on +real-world data to retrieve some interesting and useful information. + +We highly recommend checking out the other articles from this series which are +listed in our [tutorial overview section](/tutorials/overview.md), where you +can also find instructions on how to start with the tutorial. + +## Introduction + +Football is a word that could mean one of several sports. In this article, we +are referring to the best-known type of football, association football. In North +America, South Africa, and Australia, to avoid confusion with other types of +football, it is called "soccer". + +In professional football, a transfer is the action taken whenever a player under +contract moves between teams. It refers to the transferring of a player's +registration from one association football club to another. In general, the +players can only be transferred during a transfer window and according to the +rules. The transfer window is a period during the year in which a football team +can transfer players. 
There are two transfer windows per season: the winter window and the summer
window. The winter transfer window is open throughout January, while the summer
window runs from July till August.

Usually some sort of compensation is paid for the player's rights, which is
known as a transfer fee. When a player moves from one team to another, their old
contract is terminated and they negotiate a new one with the team they are
moving to. In some cases, however, transfers can function similarly to player
trades, as teams can offer another player on their team as part of the fee.

As you may presume, there is a lot of money involved in the game of transfers.
According to FIFA, in 2018, from January till September, there were 15,626
international transfers with fees totaling US$7.5 billion.

The football season is the part of the year during which football matches are
held. A typical football season generally runs from August/September to May,
although in some countries, such as in Northern Europe or East Asia, the season
starts in the spring and finishes in autumn due to weather conditions
encountered during the winter.

## Data model

In this article, we will present a graph model of football transfers from season
1992/1993 to season 2019/2020 in the following five leagues:

- English Premier League
- French Ligue 1
- German Bundesliga
- Italian Serie A
- Spanish Primera Division

The model consists of the following nodes:

- `Team` - a football team with a property `name` (e.g. `"FC Barcelona"`).
- `Player` - a professional football player, contains the properties `name`
  (e.g. `"Luka Modric"`) and `position` (e.g. `"Central Midfield"`).
- `League` - a football league in which multiple teams play, contains one
  property `name` (e.g. `"Premier League"`).
- `Transfer` - represents a football transfer that connects a `Player` that is
  transferred from one `Team` to another `Team` within a `Season`. A transfer
  contains one optional property `fee` (e.g.
`80.50`) that represents the transfer
  fee in millions of euros and one property `year` (e.g. `1995`) that
  represents how old the player was when the transfer occurred.
- `Season` - a football season with two properties `name` (e.g. `"2019/2020"`)
  and `year` (e.g. `2019`).

Nodes are connected with the following edges:

- `:TRANSFERRED_FROM` - connects a `Team` node to a `Transfer` node,
  representing the team the player is being transferred from.
- `:TRANSFERRED_TO` - connects a `Transfer` node to the `Team` node the player
  is being transferred to.
- `:TRANSFERRED_IN` - connects a `Player` node to the `Transfer` node in which
  that player was transferred.
- `:HAPPENED_IN` - connects a `Transfer` node to the `Season` node in which the
  transfer happened.
- `:PLAYS_IN` - connects a `Team` node to the `League` node it plays in.

![Football transfers](../../data/football_transfers_metagraph.png)

## Exploring the dataset

You have two options for exploring this dataset. If you just want to take a look
at the dataset and try out a few queries, open [Memgraph
Playground](https://playground.memgraph.com/sandbox/football-transfers) and
continue with the tutorial there. Note that you will not be able to execute
`write` operations.

On the other hand, if you would like to make changes to the dataset, download
the [Memgraph Platform](https://memgraph.com/download#memgraph-platform). Once
you have it up and running, open the Memgraph Lab web application in your
browser at [`localhost:3000`](http://localhost:3000) and navigate to `Datasets`
in the sidebar. From there, choose the dataset `Football player's transfers` and
continue with the tutorial.

## Example queries using Cypher

In the queries below, we use [Cypher](/cypher-manual) to query Memgraph via the
console.
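To make the data model concrete before querying, here is a tiny in-memory
Python sketch of the transfer records described above, including the
fee-filtering and descending ordering that the first query below performs in
Cypher. All players, teams, seasons, and fees in this sketch are invented for
illustration.

```python
# Minimal stand-in for Transfer nodes and their connected Player,
# Team, and Season nodes; values are made up, not real dataset rows.
transfers = [
    {"player": "Player A", "from": "Team X", "to": "Team Y",
     "season": "2018/2019", "fee": 80.5},
    {"player": "Player B", "from": "Team Y", "to": "Team Z",
     "season": "2018/2019", "fee": None},  # like a missing fee property
    {"player": "Player C", "from": "Team Z", "to": "Team X",
     "season": "2019/2020", "fee": 12.0},
]

# Mirrors: WHERE t.fee IS NOT NULL ... ORDER BY t.fee DESC
priced = [t for t in transfers if t["fee"] is not None]
most_expensive = sorted(priced, key=lambda t: t["fee"], reverse=True)
for t in most_expensive:
    print(f'{t["fee"]}M € - {t["player"]}')  # 80.5M € - Player A, then Player C
```

The Cypher versions below do the same filtering and ordering declaratively,
directly over the graph.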

Now that we have the dataset of football transfers from season 1992/1993 to
season 2019/2020 loaded in Memgraph, we are ready to get some information out of
it.

**1\.** Let's say you want to find the 20 most expensive transfers. As mentioned
before, transfer fees are represented in millions of euros.

```cypher
MATCH (t:Transfer)<-[:TRANSFERRED_IN]-(p:Player)
WHERE t.fee IS NOT NULL
RETURN round(t.fee) + 'M €' AS transfer_fee, p.name AS player_name
ORDER BY t.fee DESC
LIMIT 20;
```

**2\.** What about finding the most expensive transfer per season?

```cypher
MATCH (s:Season)<-[:HAPPENED_IN]-(t:Transfer)<-[:TRANSFERRED_IN]-(:Player)
WHERE t.fee IS NOT NULL
WITH s.name AS season_name, max(t.fee) AS max_fee
RETURN round(max_fee) + 'M €' AS max_transfer_fee, season_name
ORDER BY max_fee DESC;
```

**3\.** How about finding out which teams your favorite player has played for?
If you wish to check the teams for another player, replace "Sime Vrsaljko" with
the name of your favorite player.

```cypher
MATCH (player:Player)-[:TRANSFERRED_IN]->(t:Transfer)-[]-(team:Team)
WHERE player.name = "Sime Vrsaljko"
WITH DISTINCT team
RETURN team.name AS team_name;
```

You might wonder why we haven't specified a direction in our Cypher traversal
with `(:Transfer)-[]-(:Team)`. As we want to find the teams the player was
transferred from (`(:Transfer)<-[]-(:Team)`) and transferred to
(`(:Transfer)-[]->(:Team)`), we want to collect both inbound and outbound
connections. In order to do so, we omit the arrow (`>`, `<`) in our Cypher
command.

**4\.** Find players that were transferred to and played for FC Barcelona and
count them by playing position.

```cypher
MATCH (team:Team)<-[:TRANSFERRED_TO]-(t:Transfer)<-[:TRANSFERRED_IN]-(player:Player)
WHERE team.name = "FC Barcelona"
WITH DISTINCT player
RETURN player.position AS player_position, count(player) AS position_count, collect(player.name) AS player_names
ORDER BY position_count DESC;
```

**5\.** Football has seen a lot of rivalries develop between clubs during its
rich and long history. One of the most famous ones is between the fierce rivals
FC Barcelona and Real Madrid. There is a term, El Clasico, for a match between
those two teams. Let's find all the transfers between FC Barcelona and Real
Madrid.

```cypher
MATCH (m:Team)-[:TRANSFERRED_FROM]-(t:Transfer)-[:TRANSFERRED_TO]-(n:Team),
      (t)<-[:TRANSFERRED_IN]-(p:Player)
WHERE
  (m.name = "FC Barcelona" AND n.name = "Real Madrid") OR
  (m.name = "Real Madrid" AND n.name = "FC Barcelona")
RETURN m.name AS transferred_from_team, p.name AS player_name, n.name AS transferred_to_team;
```

**6\.** FC Barcelona is one of the most valuable football clubs in the world.
Players often want to play there as long as possible. But what about those
players who didn't fit in well? Where do they go?

```cypher
MATCH (m:Team)-[:TRANSFERRED_FROM]->(t:Transfer)<-[:TRANSFERRED_IN]-(p:Player),
      (t)-[:TRANSFERRED_TO]->(n:Team)
WHERE m.name = "FC Barcelona"
RETURN n.name AS team_name, collect(p.name) AS player_names, count(p) AS number_of_players
ORDER BY number_of_players DESC;
```

**7\.** Which teams did most players move to in season 2003/2004? The results
may surprise you.

```cypher
MATCH (season:Season)<-[:HAPPENED_IN]-(t:Transfer)<-[:TRANSFERRED_IN]-(player:Player),
      (t)-[:TRANSFERRED_TO]->(team:Team)
WHERE season.name = "2003/2004"
WITH DISTINCT player, team
RETURN team.name AS team_name, count(player) AS number_of_players, collect(player.name) AS player_names
ORDER BY number_of_players DESC, team_name
LIMIT 20;
```

**8\.** In great teams, there are players who seem irreplaceable. When they
leave, the club board often struggles to find a proper replacement for them.
Let's find out which positions the club "FC Barcelona" spent money on in season
2015/2016.

```cypher
MATCH (:Team)-[:TRANSFERRED_FROM]->(t:Transfer)<-[:TRANSFERRED_IN]-(player:Player),
      (s:Season)<-[:HAPPENED_IN]-(t)-[:TRANSFERRED_TO]->(m:Team)
WHERE t.fee IS NOT NULL AND
      s.name = "2015/2016" AND
      m.name = "FC Barcelona"
RETURN collect(player.name) AS player_names, player.position AS player_position, round(sum(t.fee)) + 'M €' AS money_spent_per_position
ORDER BY money_spent_per_position DESC;
```

**9\.** But what was the highest transfer fee FC Barcelona paid per position in
the seasons from 1992/1993 till 2019/2020?

```cypher
MATCH (:Team)-[:TRANSFERRED_FROM]->(t:Transfer)<-[:TRANSFERRED_IN]-(player:Player),
      (t)-[:TRANSFERRED_TO]->(team:Team)
WHERE t.fee IS NOT NULL AND
      team.name = "FC Barcelona"
RETURN max(t.fee) + 'M €' AS max_money_spent, player.position AS player_position
ORDER BY max_money_spent DESC;
```

**10\.** Now, let's find the most expensive players per position in FC
Barcelona.

```cypher
MATCH (team:Team)<-[:TRANSFERRED_TO]-(t:Transfer)<-[:TRANSFERRED_IN]-(p:Player),
      (t)-[:HAPPENED_IN]->(s:Season)
WHERE t.fee IS NOT NULL AND
      team.name = "FC Barcelona"
WITH p.position AS player_position, max(t.fee) AS max_fee
MATCH (p:Player)-[:TRANSFERRED_IN]->(t:Transfer)-[:TRANSFERRED_TO]->(team:Team)
WHERE p.position = player_position AND
      t.fee = max_fee AND
      team.name = "FC Barcelona"
RETURN max_fee, player_position, collect(p.name) AS player_names
ORDER BY max_fee DESC;
```

If we only needed the maximum transfer fee per position, the first `MATCH` in
the above query would suffice, making it much shorter. In order to match players
with the maximum transfer fee per position, the query is split into two parts:

- The first `MATCH` finds the maximum transfer fee per position.
- The second `MATCH` finds all players transferred to "FC Barcelona" with the
  same position and a transfer fee equal to the maximum one from the first part.

**11\.** You can also find all player transfers between two clubs.

```cypher
MATCH (t:Transfer)<-[:TRANSFERRED_IN]-
      (player:Player)-[:TRANSFERRED_IN]->
      (:Transfer)<-[:TRANSFERRED_FROM]-(team:Team)
WHERE team.name = "FC Barcelona"
WITH player, collect(t) AS transfers
MATCH player_path = (a:Team)
      -[*bfs..10 (e, n | 'Team' IN labels(n) OR ('Transfer' IN labels(n) AND n IN transfers) )]->(b:Team)
WHERE a.name = "FC Barcelona" AND
      b.name = "Sevilla FC"
UNWIND nodes(player_path) AS player_path_node
WITH player_path_node, player
WHERE 'Team' IN labels(player_path_node)
WITH collect(player_path_node.name) AS team_names, player
RETURN player.name AS player_name, team_names;
```

In the above query, we find all players that transferred from "FC Barcelona" to
"Sevilla FC".
It will include direct transfers (from "FC Barcelona" to "Sevilla FC") and
indirect transfers (from "FC Barcelona" through one or more other clubs and
finally to "Sevilla FC"). That is the reason why we started the first `MATCH`
by searching for all players and transfers that were transferred from "FC
Barcelona". Next up is the player transfer traversal through transfers and
teams all the way to "Sevilla FC".

For this part, we used the breadth-first search (BFS) algorithm with the lambda
filter `(e, v | condition)`. It's a function that takes an edge symbol `e` and a
vertex symbol `v` and decides whether this edge and vertex pair should be
considered valid in breadth-first expansion by returning true or false (or
Null). In the above example, the lambda returns true if a vertex has the label
`Team` or the label `Transfer`. If a vertex is a `Transfer`, there is an
additional check to make sure the transfer is one of the transfers of players
transferred from "FC Barcelona". It needs to be either a `Team` or a `Transfer`
because, to get from the team that made the transfer to the team the player is
being transferred to, we need to go through the `Transfer` node that connects
those two teams. So the traversal from "FC Barcelona" to "Sevilla FC" will go
through the following nodes: Transfer, Team, Transfer, Team, Transfer, etc.

**12\.** In the previous query, we found all transfers between two clubs. Let's
now filter out the direct ones. We only need a small change in the query to get
just the indirect transfers.
+
+```cypher
+MATCH (player:Player)-[:TRANSFERRED_IN]->(t:Transfer)<-[:TRANSFERRED_FROM]-(barca:Team),
+      (t)-[:TRANSFERRED_TO]->(sevilla:Team)
+WHERE barca.name = "FC Barcelona" AND
+      sevilla.name = "Sevilla FC"
+WITH collect(player) AS players_direct_to_sevilla
+MATCH (t:Transfer)<-[e:TRANSFERRED_IN]-
+      (player:Player)-[:TRANSFERRED_IN]->
+      (:Transfer)<-[:TRANSFERRED_FROM]-(barca:Team)
+WHERE barca.name = "FC Barcelona" AND
+      NOT player IN players_direct_to_sevilla
+WITH player, collect(t) AS transfers
+MATCH path_indirect = (a:Team)
+      -[*bfs..10 (e, n | 'Team' IN labels(n) OR ('Transfer' IN labels(n) AND n IN transfers) )]->(b:Team)
+WHERE a.name = "FC Barcelona" AND
+      b.name = "Sevilla FC"
+UNWIND nodes(path_indirect) AS player_path_node
+WITH player_path_node, player
+WHERE 'Team' IN labels(player_path_node)
+WITH collect(player_path_node.name) AS team_names, player
+RETURN player.name AS player_name, team_names;
+```
+
+In this query, the only difference is that we first need to find the players
+who had a direct transfer to Sevilla. In the next `MATCH`, we use that
+information to keep only players transferred from FC Barcelona who didn't have
+a direct transfer to Sevilla FC.
+
+If you are running this in [Memgraph Lab](https://memgraph.com/product/lab) you
+can change the query a bit in order to get all nodes and edges required for a
+visual graph representation of players transferring through teams.
+
+```cypher
+MATCH (player:Player)-[:TRANSFERRED_IN]->
+      (t:Transfer)<-[:TRANSFERRED_FROM]-(barca:Team)
+MATCH (t)-[:TRANSFERRED_TO]->(sevilla:Team)
+WHERE barca.name="FC Barcelona" AND
+      sevilla.name="Sevilla FC"
+WITH collect(player) AS players_direct_to_sevilla
+MATCH (t:Transfer)<-[e:TRANSFERRED_IN]-
+      (player:Player)-[:TRANSFERRED_IN]->
+      (tr:Transfer)<-[:TRANSFERRED_FROM]-(barca:Team)
+WHERE barca.name = "FC Barcelona" AND
+      NOT player IN players_direct_to_sevilla
+WITH player, collect(t) AS transfers, collect(e) AS player_to_transfers
+MATCH path_indirect = (a:Team)
+      -[*bfs..10 (e, n | 'Team' IN labels(n) OR ('Transfer' IN labels(n) AND n IN transfers) )]->(b:Team)
+WHERE a.name = "FC Barcelona" AND
+      b.name = "Sevilla FC"
+UNWIND player_to_transfers AS player_to_transfer
+RETURN player, player_to_transfer, path_indirect;
+```
+
+Memgraph Lab's graph visualization draws nodes and edges from query results. If
+the results contain only nodes, only nodes will be drawn on the canvas. If both
+nodes and edges are present in the results, Memgraph Lab can draw the nodes and
+the connections between them because it has all the information relevant for
+drawing.
+
+To accommodate that, we need to change the types of results that are returned
+and collect any missing edge or node information throughout the query. The
+first part of the query, where we check whether the player was transferred from
+"FC Barcelona" to "Sevilla FC", stays the same. In order to draw all
+connections from players to transfers, we need to collect the edges connecting
+them. That is why we collect edges `e` into the variable `player_to_transfers`:
+it contains information on which player is connected to which transfer. 
With that in mind, our results contain
+all the information for the graph visual:
+
+- A path that contains `Transfer` and `Team` nodes, and all the edges collected
+  on the `Team` to `Team` traversal
+- A list of `Player` nodes
+- A list of `Player - Transfer` edges
+
+Here is a picture of how it will look if you run the query in Memgraph Lab.
+
+![football_transfers_MemgraphLab_visual](../../data/football_transfers_MemgraphLab_visual.png)
diff --git a/docs2/querying/exploring-datasets/got-deaths.md b/docs2/querying/exploring-datasets/got-deaths.md
new file mode 100644
index 00000000000..35819cf09c1
--- /dev/null
+++ b/docs2/querying/exploring-datasets/got-deaths.md
@@ -0,0 +1,280 @@
+---
+id: got-deaths
+title: Game of Thrones deaths
+sidebar_label: Game of Thrones deaths
+---
+
+This article is part of a series intended to show how to use Memgraph on
+real-world data to retrieve some interesting and useful information.
+
+We highly recommend checking out the other articles from this series which are
+listed in our [tutorial overview section](/tutorials/overview.md), where you
+can also find instructions on how to start with the tutorial.
+
+## Introduction
+
+**WARNING** - this tutorial could contain Game of Thrones **_spoilers_**.
+
+Game of Thrones is an American fantasy drama television series created by David
+Benioff and D. B. Weiss for HBO. It is an adaptation of A Song of Ice and Fire,
+George R. R. Martin's series of fantasy novels, the first of which is A Game of
+Thrones. The Game of Thrones world is full of interesting characters, locations,
+scenarios, unexpected deaths, and plot twists.
+
+Even though the COVID-19 pandemic made 2020 one of the worst years in recent
+history, 2019 was also a huge disappointment to all Game of Thrones fans. 
According to user reactions, a
+seven-year build-up resulted in a poorly written final season that ruined the
+ending of one of the most popular shows on the planet. Nonetheless, if you want
+to know how many characters would have survived if Jon Snow had stayed dead,
+which House had the best Kill/Death Ratio, or who was the biggest traitor in
+the show, you've come to the right place!
+
+## Data model
+
+Although the Game of Thrones TV show is based on a series of books, our graph
+database contains only characters from the previously mentioned TV show, as the
+books are still not finished. This tutorial would not be possible without data
+analyst David Murphy, who shared his [collection of on-screen
+deaths](https://data.world/datasaurusrex/game-of-thones-deaths). For more
+information, you can visit his
+[blog post](https://datasaurus-rex.com/gallery/gotviz-mkiii) with an interactive
+analysis of the show's deaths. We won't be working with kills and deaths that
+happened off-screen or were tied to the undead (wraiths). The dataset we used
+was slightly modified: the columns "Episode Name" and "IMDb Rating" were
+added.
+
+The model consists of the following nodes:
+
+- a `Character` node has a `name` attribute corresponding to the character's
+  name (e.g. `"Jon Snow"`)
+- an `Allegiance` node has a `name` attribute corresponding to the house name
+  the character is loyal to (e.g. `"House Stark"`)
+- a `Death` node has an `order` attribute corresponding to the order in which
+  the death happened in the show (e.g. `602`)
+- an `Episode` node has a `number` attribute corresponding to the episode's
+  number (e.g. `10`), a `name` attribute corresponding to the name of the
+  episode (e.g. `"Mothers Mercy"`) and an `imdb_rating` attribute corresponding
+  to the IMDb rating of the episode (e.g. `9.1`)
+- a `Season` node has a `number` attribute corresponding to the number of the
+  season (e.g. `10`)
+- a `Location` node has a `name` attribute corresponding to the location's name
+  (e.g. `"Castle Black"`)
+
+Nodes are connected with the following edges:
+
+- `:KILLED` - connects two `Character` nodes and has 2 attributes: `method`,
+  which says how the character was killed (e.g. `"Knife"`), and `count`,
+  representing how many of these characters were killed (e.g. if `"Jon Snow"`
+  killed `10` `"Lannister soldiers"`, then on this edge `count` would be `10`)
+- `:LOYAL_TO` - connects a `Character` node with the `Allegiance` node
+  representing an allegiance the character is loyal to
+- `:VICTIM_IN` and `:KILLER_IN` - connect a `Character` node with the `Death`
+  node in which the death happened
+- `:HAPPENED_IN` - connects a `Death` node with the `Episode`, `Season` and
+  `Location` nodes representing details of the death
+- `:PART_OF` - connects an `Episode` node with the `Season` node the episode is
+  part of
+
+![GOT deaths](../../data/got-deaths.png)
+
+## Exploring the dataset
+
+You have two options for exploring this dataset. If you just want to take a look
+at the dataset and try out a few queries, open [Memgraph
+Playground](https://playground.memgraph.com/sandbox/game-of-thrones-deaths) and
+continue with the tutorial there. Note that you will not be able to execute
+`write` operations.
+
+On the other hand, if you would like to add changes to the dataset, download the
+[Memgraph Platform](https://memgraph.com/download#memgraph-platform). Once you
+have it up and running, open the Memgraph Lab web application in your browser at
+[`localhost:3000`](http://localhost:3000) and navigate to `Datasets` in the
+sidebar. From there, choose the dataset `Game of Thrones deaths` and continue
+with the tutorial.
+
+## Example queries using Cypher
+
+In the queries below, we are using [Cypher](/cypher-manual) to query Memgraph
+via the console. 
+
+Here are some queries you might find interesting:
+
+**MINI-GAME** - If you have watched the TV show, try to guess each result before
+executing the query!
+
+**1\.** Let's start with a couple of simple queries. List the locations where
+the most deaths occurred. Can you guess which one it is?
+
+```cypher
+MATCH (l:Location)<-[:HAPPENED_IN]-(d:Death)
+RETURN l.name AS location_name, count(d) AS death_count
+ORDER BY death_count DESC;
+```
+
+**2\.** Now that we have the location with the most deaths, let's list the
+episodes with the most deaths as well.
+
+```cypher
+MATCH (d:Death)-[:HAPPENED_IN]->(e:Episode)
+RETURN e.name AS episode_name, count(d) AS kill_count
+ORDER BY kill_count DESC;
+```
+
+**3\.** List the number of kills per season. If you have watched the show, you
+should be able to guess this one.
+
+```cypher
+MATCH (d:Death)-[:HAPPENED_IN]->(s:Season)
+RETURN s.number AS season_number, count(d) AS death_count
+ORDER BY season_number ASC;
+```
+
+**4\.** The most poorly rated season by far was the last one, but can you guess
+the best one? Let's list the seasons by IMDb rating. The problem with using the
+`avg()` function is that it returns too many decimals, so a useful trick, shown
+in this example, is to combine it with `round()`.
+
+```cypher
+MATCH (e:Episode)-[:PART_OF]->(s:Season)
+RETURN s.number AS season_name, round(100 * avg(e.imdb_rating))/100 AS rating
+ORDER BY rating DESC;
+```
+
+**5\.** There are many methods by which characters were killed, such as a sword
+or Dragonfire, but let's list the victims of unique methods. 
+
+```cypher
+MATCH (:Character)-[k:KILLED]->(:Character)
+WITH k.method AS kill_method, count(k.method) AS method_count
+WHERE method_count < 2
+MATCH (killer:Character)-[k:KILLED]->(victim:Character)
+WHERE k.method = kill_method
+RETURN kill_method, victim.name AS victim;
+```
+
+**6\.** Daenerys Stormborn of House Targaryen, the First of Her Name, Queen of
+the Andals and the First Men, Protector of the Seven Kingdoms, the Mother of
+Dragons, the Khaleesi of the Great Grass Sea, the Unburnt, the Breaker of
+Chains, or "Daenerys Targaryen" for short in our database, is the biggest
+killer on the show. Let's list all the episodes she killed in as well as the
+characters she killed.
+
+```cypher
+MATCH (daenerys:Character {name: 'Daenerys Targaryen'})-[:KILLED]->(victim:Character)
+MATCH (daenerys)-[:KILLER_IN]->(d:Death)<-[:VICTIM_IN]-(victim)
+MATCH (d)-[:HAPPENED_IN]-(e:Episode)
+RETURN DISTINCT victim.name AS victim, count(d) AS kill_count, e.name AS episode_name
+ORDER BY kill_count DESC;
+```
+
+**7\.** Houses or allegiances are one of the main aspects of Westeros. Some
+houses killed more characters than others, but that doesn't matter in the end;
+what matters is efficiency. Let's list the allegiances with the best Kill/Death
+Ratios, or KDR for short. Here we come across one additional problem: with
+integer division, an allegiance with more deaths than kills would get a KDR of
+0. This can easily be fixed with the `toFloat()` function.
+
+```cypher
+MATCH (:Character)-[death:KILLED]->(:Character)-[:LOYAL_TO]->(a:Allegiance)
+WITH a, sum(death.count) AS deaths
+MATCH (:Character)<-[kill:KILLED]-(:Character)-[:LOYAL_TO]->(a)
+RETURN a.name AS allegiance_name,
+       sum(kill.count) AS kills,
+       deaths,
+       round(100 * (toFloat(sum(kill.count)) / deaths)) / 100 AS KDR
+ORDER BY KDR DESC;
+```
+
+**8\.** One of the best-rated episodes, Battle of the Bastards, showed us a
+fight between two houses: Stark and Bolton. Let's see which one had more
+casualties. 
+
+```cypher
+MATCH (c:Character)-[:LOYAL_TO]->(a:Allegiance)
+MATCH (c)-[:VICTIM_IN]-(d:Death)-[:HAPPENED_IN]-(:Episode {name: 'Battle of the Bastards'})
+RETURN a.name AS house_name, count(d) AS death_count
+ORDER BY death_count DESC
+LIMIT 2;
+```
+
+**9\.** One of the biggest features of Memgraph Lab is drawing graphs of the
+queries we execute. Let's visualize all the loyalties between characters and
+allegiances. Execute the following query and head out to the `GRAPH` tab.
+
+```cypher
+MATCH (character:Character)-[loyal_to:LOYAL_TO]-(allegiance)
+RETURN character, loyal_to, allegiance;
+```
+
+**10\.** Remember that shocking last episode of the fifth season when they
+killed Jon Snow and we totally thought he was gonna stay dead? Well, let's list
+all the characters that would have survived if he had actually stayed dead.
+
+```cypher
+MATCH (jon:Character {name: 'Jon Snow'})-[:KILLED]->(victim:Character)
+MATCH (jon)-[:VICTIM_IN]->(jon_death:Death)
+MATCH (jon)-[:KILLER_IN]->(victim_death:Death)<-[:VICTIM_IN]-(victim)
+WHERE victim_death.order > jon_death.order
+RETURN DISTINCT victim.name AS victim, count(victim_death) AS kill_count
+ORDER BY kill_count DESC;
+```
+
+**11\.** If we want to see the above example in graph form, we have to modify
+the query a bit, saving the relationships to variables that can then be used in
+`RETURN`.
+
+```cypher
+MATCH (jon:Character {name: 'Jon Snow'})-[:KILLED]->(victim:Character)
+MATCH (jon)-[:VICTIM_IN]->(jon_death:Death)
+MATCH (jon)-[killed:KILLER_IN]->(victim_death:Death)<-[died:VICTIM_IN]-(victim)
+WHERE victim_death.order > jon_death.order
+RETURN jon, killed, victim_death, died, victim;
+```
+
+**12\.** Let's see what it looks like when we visualize all of Jon Snow's kills
+with their locations. 
+ +```cypher +MATCH (jon:Character {name: 'Jon Snow'})-[:KILLED]->(victim:Character) +MATCH (jon)-[:KILLER_IN]->(death:Death)<-[victim_to_death:VICTIM_IN]-(victim) +MATCH (death)-[death_to_location:HAPPENED_IN]->(location:Location) +RETURN victim, victim_to_death, death, death_to_location, location +``` + +**13\.** Who do you think was the biggest traitor in terms of killing in its own +allegiance? Well, let's check it out! + +```cypher +MATCH (killer:Character)-[:KILLED]->(victim:Character) +MATCH (killer)-[:LOYAL_TO]->(a:Allegiance)<-[:LOYAL_TO]-(victim) +RETURN killer.name AS traitor, count(victim) AS kill_count +ORDER BY kill_count DESC; +``` + +**14\.** To visualize the last example, we have to add paths between nodes in +the result. + +```cypher +MATCH (killer:Character)-[killed:KILLED]->(victim:Character) +MATCH (killer)-[:LOYAL_TO]->(allegiance:Allegiance)<-[loyal_to:LOYAL_TO]-(victim) +RETURN killer, killed, victim, loyal_to, allegiance; +``` + +**15\.** Memgraph supports graph algorithms as well. Let's use Dijkstra's +shortest path algorithm to show the most gruesome path of kills. An example kill +path is: `Jon Snow` killed `5` `Lannister Soldiers` and they killed `10` `Stark +soldiers` with total `kill_count` of `15`. 
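Under the hood, `wShortest` is Dijkstra's algorithm accumulating the edge weight (`e.count`) along the path. A small Python sketch over a toy kill graph (the names and numbers mirror the example above, not the real dataset):

```python
import heapq

def dijkstra(adj, source):
    """Accumulate edge weights (kill counts) along the cheapest path,
    the way wShortest accumulates `e.count`."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, node = heapq.heappop(pq)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(pq, (nd, nxt))
    return dist

# Jon Snow killed 5 Lannister soldiers, who killed 10 Stark soldiers.
adj = {
    "Jon Snow": [("Lannister soldiers", 5)],
    "Lannister soldiers": [("Stark soldiers", 10)],
}
print(dijkstra(adj, "Jon Snow")["Stark soldiers"])  # -> 15
```

And here is the Cypher query itself: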
+
+```cypher
+MATCH p = (:Character)-[:KILLED * wShortest (e,v | e.count) kill_count]->(:Character)
+RETURN nodes(p) AS kill_list, kill_count
+ORDER BY kill_count DESC
+LIMIT 1;
+```
+
+To learn more about these algorithms, we suggest you check out their Wikipedia
+pages:
+
+- [Breadth-first search](https://en.wikipedia.org/wiki/Breadth-first_search)
+- [Dijkstra's algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)
diff --git a/docs2/querying/exploring-datasets/graphing-the-premier-league.md b/docs2/querying/exploring-datasets/graphing-the-premier-league.md
new file mode 100644
index 00000000000..cde90074589
--- /dev/null
+++ b/docs2/querying/exploring-datasets/graphing-the-premier-league.md
@@ -0,0 +1,159 @@
+---
+id: graphing-the-premier-league
+title: Graphing the Premier League
+sidebar_label: Graphing the Premier League
+---
+
+This article is a part of a series intended to show users how to use Memgraph on
+real-world data and, by doing so, retrieve some interesting and useful
+information.
+
+We highly recommend checking out the other articles from this series which are
+listed in our [tutorial overview section](/tutorials/overview.md), where you
+can also find instructions on how to start with the tutorial.
+
+## Introduction
+
+[Football](https://en.wikipedia.org/wiki/Association_football) is a team sport
+played between two teams of eleven players with a spherical ball. The game is
+played on a rectangular pitch with a goal at each end. The object of the game is
+to score by moving the ball beyond the goal line into the opposing goal. The
+game is played by more than 250 million players in over 200 countries, making it
+the world's most popular sport.
+
+In this article, we will present a graph model of a reasonably sized dataset of
+football matches across the world's most popular leagues.
+
+## Data model
+
+In essence, we are trying to model a set of football matches. 
All information +about a single match is going to be contained in three nodes and two edges. Two +of the nodes will represent the teams that have played the match, while the +third node will represent the game itself. Both edges are directed from the team +nodes to the game node and are labeled as `:Played`. + +Every bit of information regarding the data model is nicely condensed in the +following visual representation. + +![Football](../../data/football_metagraph.png) + +## Exploring the dataset + +You have two options for exploring this dataset. If you just want to take a look +at the dataset and try out a few queries, open [Memgraph +Playground](https://playground.memgraph.com/sandbox/football-premier-league) and +continue with the tutorial there. Note that you will not be able to execute +`write` operations. + +On the other hand, if you would like to add changes to the dataset, download the +[Memgraph Platform](https://memgraph.com/download#memgraph-platform). Once you +have it up and running, open Memgraph Lab web application within the browser on +[`localhost:3000`](http://localhost:3000) and navigate to `Datasets` in the +sidebar. From there, choose the dataset `Football Premier league games` and +continue with the tutorial. + +## Example queries + +**1\.** You might wonder, what leagues are supported? + +```cypher +MATCH (n:Game) +RETURN DISTINCT n.league +ORDER BY n.league; +``` + +**2\.** We have stored a certain number of seasons for each league. What is the +oldest/newest season we have included? + +```cypher +MATCH (n:Game) +RETURN DISTINCT n.league AS league, min(n.season) AS oldest, max(n.season) AS newest +ORDER BY league; +``` + +**3\.** You have already seen one game between Chelsea and Arsenal, let's list +all of them in chronological order. 
+
+```cypher
+MATCH (n:Team {name: "Chelsea"})-[e:Played]->(w:Game)<-[f:Played]-(m:Team {name: "Arsenal"})
+RETURN w.date AS date, e.side AS chelsea, f.side AS arsenal,
+       w.FT_home_score AS home_score, w.FT_away_score AS away_score
+ORDER BY date;
+```
+
+**4\.** How about filtering games in which Chelsea won?
+
+```cypher
+MATCH (n:Team {name: "Chelsea"})-[e:Played {outcome: "won"}]->
+      (w:Game)<-[f:Played]-(m:Team {name: "Arsenal"})
+RETURN w.date AS date, e.side AS chelsea, f.side AS arsenal,
+       w.FT_home_score AS home_score, w.FT_away_score AS away_score
+ORDER BY date;
+```
+
+**5\.** Home field advantage is a thing in football. Let's list the number of
+home defeats for each Premier League team in the 2016/2017 season.
+
+```cypher
+MATCH (n:Team)-[:Played {side: "home", outcome: "lost"}]->
+      (w:Game {league: "ENG-Premier League", season: 2016})
+RETURN n.name AS team, count(w) AS home_defeats
+ORDER BY home_defeats, team;
+```
+
+**6\.** At the end of the season, the team with the most points wins the league.
+For each victory, a team is awarded 3 points, and for each draw it is awarded 1
+point. Let's find out how many points the reigning champions (Chelsea) had at
+the end of the 2016/2017 season.
+
+```cypher
+MATCH (n:Team {name: "Chelsea"})-[:Played {outcome: "drew"}]->(w:Game {season: 2016})
+WITH n, count(w) AS draw_points
+MATCH (n)-[:Played {outcome: "won"}]->(w:Game {season: 2016})
+RETURN draw_points + 3 * count(w) AS total_points;
+```
+
+**7\.** In fact, why not retrieve the whole table?
+
+```cypher
+MATCH (n)-[:Played {outcome: "drew"}]->(w:Game {league: "ENG-Premier League", season: 2016})
+WITH n, count(w) AS draw_points
+MATCH (n)-[:Played {outcome: "won"}]->(w:Game {league: "ENG-Premier League", season: 2016})
+RETURN n.name AS team, draw_points + 3 * count(w) AS total_points
+ORDER BY total_points DESC;
+```
+
+**8\.** People have always debated which of the major leagues is the most
+exciting. 
One basic metric is the average number of goals per game. Let's see
+the results at the end of the 2016/2017 season. WARNING: This might shock you.
+
+```cypher
+MATCH (w:Game {season: 2016})
+RETURN w.league, avg(w.FT_home_score) + avg(w.FT_away_score) AS avg_goals_per_game
+ORDER BY avg_goals_per_game DESC;
+```
+
+**9\.** Another metric might be the number of comebacks—games where one
+side was winning at half time but was overturned by the other side by the end
+of the match. Let's count such occurrences during all supported seasons across
+all supported leagues.
+
+```cypher
+MATCH (g:Game)
+WHERE (g.HT_result = "H" AND g.FT_result = "A") OR
+      (g.HT_result = "A" AND g.FT_result = "H")
+RETURN g.league AS league, count(g) AS comebacks
+ORDER BY comebacks DESC;
+```
+
+**10\.** Exciting leagues also tend to be very unpredictable. On that note,
+let's list all triplets of teams where, during the course of one season, team A
+won against team B, team B won against team C and team C won against team A.
+
+```cypher
+MATCH (a)-[:Played {outcome: "won"}]->(p:Game {league: "ENG-Premier League", season: 2016})<--
+      (b)-[:Played {outcome: "won"}]->(q:Game {league: "ENG-Premier League", season: 2016})<--
+      (c)-[:Played {outcome: "won"}]->(r:Game {league: "ENG-Premier League", season: 2016})<--(a)
+WHERE p.date < q.date AND q.date < r.date
+RETURN a.name AS team1, b.name AS team2, c.name AS team3;
+```
diff --git a/docs2/querying/exploring-datasets/marvel-universe.md b/docs2/querying/exploring-datasets/marvel-universe.md
new file mode 100644
index 00000000000..71b202f050b
--- /dev/null
+++ b/docs2/querying/exploring-datasets/marvel-universe.md
@@ -0,0 +1,264 @@
+---
+id: marvel-universe
+title: Marvel Comic Universe social network
+sidebar_label: Marvel Comic Universe social network
+---
+
+This article is a part of a series intended to show how to use Memgraph on
+real-world data to retrieve some interesting and useful information. 
+ +We highly recommend checking out the other articles from this series which are +listed in our [tutorial overview section](/tutorials/overview.md), where you +can also find instructions on how to start with the tutorial. + +## Introduction + +Spandex. Muscles. Big egos. Bad hair. No, we're not talking about your high +school thrash metal band. We're talking about one of the largest fictional +social networks that is the Marvel Comic Universe! Here we'll teach you how to +navigate this complex and confusing assembly of heroes and villains. If you've +ever wanted to know who's Spider-Man's best super-buddy, or wanted to find all +the comic issues where Hulk, Wolverine, Thor, and Black Panther appear together, +look no further and fire up that Memgraph copy of yours! + +## Data model + +Although the MCU is chock-full of heroes, the real hero here is Russ Chappell, +who painstakingly gathered the MCU data for the [Marvel Chronology +Project](http://www.chronologyproject.com). In addition, R. Alberich, J. +Miro-Julia, and F. Rossello, three data scientists, scraped the Chronology +Project database, processed the data and put it into a format that can be easily +imported into any data-processing framework available today. Their aim was to +investigate whether this fictional "social network" has a structure similar to a +real-life social network. You can find their interesting findings in the paper +that was the culmination of their work, linked +[here](https://arxiv.org/pdf/cond-mat/0202174.pdf). The data they used, on the +other hand, can be found +[here](https://www.kaggle.com/csanhueza/the-marvel-universe-social-network). +We've used a slightly modified version of this data to create a graph database +snapshot ready for use. + +Now, the data we'll be using in our queries can be classified as follows: + +- nodes, labeled as "Hero", "Comic", or "ComicSeries" + - a "Hero" node has a "name" attribute corresponding to both a hero's moniker + and her/his real name (e.g. 
"SPIDER-MAN/PETER PARKER") + - a "Comic" node has a "name" attribute corresponding to the comic series name + and the issue/volume number if it's included (e.g. "Astonishing Tales Vol. 2 + 12") + - a "ComicSeries" node has a "title" attribute corresponding to the title of + the series a given comic is a part of, e.g. the "Comic" node "AVENGERS VOL. + 3 17" is part of the "AVENGERS VOL. 3" series. In addition, each + "ComicSeries" node has a "publishYear" attribute, which is a list of years + in which the series was published. +- edges, of type "AppearedIn", "AppearedInSameComic", or "IsPartOfSeries" + - edges connecting a "Hero" node to the "Comic" node it appears in are of type + "AppearedIn" + - edges connecting two "Hero" nodes that appeared in the same comic are of + type "AppearedInSameComic" + - edges connecting a "Comic" node and its corresponding "ComicSeries" node, + representing the inclusion relationship between a particular comic issue and + the series it's part of, are of type "IsPartOfSeries" + +A visual scheme of our graph database is given below. + +![MCU](../../data/mcu_metagraph.png) + +## Exploring the dataset + +You have two options for exploring this dataset. If you just want to take a look +at the dataset and try out a few queries, open [Memgraph +Playground](https://playground.memgraph.com/sandbox/marvel-comics) and continue +with the tutorial there. Note that you will not be able to execute `write` +operations. + +On the other hand, if you would like to add changes to the dataset, download the +[Memgraph Platform](https://memgraph.com/download#memgraph-platform). Once you +have it up and running, open Memgraph Lab web application within the browser on +[`localhost:3000`](http://localhost:3000) and navigate to `Datasets` in the +sidebar. From there, choose the dataset `Marvel Comic Universe social network` +and continue with the tutorial. 
+
+## Example queries using Cypher
+
+In the queries below, we are, as usual, using [Cypher](/cypher-manual) to query
+Memgraph via the console.
+
+Here are some queries you might find interesting:
+
+**1\.** List all the comic series present in the database, along with the number
+of comics each contains:
+
+```cypher
+MATCH (series:ComicSeries)-[:IsPartOfSeries]-(comic:Comic)
+RETURN series.title AS title, count(comic)
+ORDER BY title;
+```
+
+**2\.** List all heroes that have "SPIDER" in their name:
+
+If you take a peek at the Hero nodes, you'll find that their names, while
+accurate in most cases, can be a bit mangled. We didn't have time to check and
+update all the names that were already present. We swear! Super-busy! But, no
+worries, we'll show you how to get a list of potential heroes you might be
+looking for. One of the most flexible ways is to use regex matching (represented
+by the regex-matching operator "=~").
+
+```cypher
+MATCH (hero:Hero)
+WHERE hero.name =~ ".*SPIDER.*"
+RETURN hero.name AS potential_spider_dude
+ORDER BY potential_spider_dude;
+```
+
+The other option is to use the CONTAINS operator:
+
+```cypher
+MATCH (hero:Hero)
+WHERE hero.name CONTAINS "SPIDER"
+RETURN hero.name AS potential_spider_dude
+ORDER BY potential_spider_dude;
+```
+
+We recommend you search for your heroes of interest this way, which might save
+you some time! 
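Outside the database, the two filters behave like Python's full-string regex match and plain substring test, which is a quick way to sanity-check a pattern before running it (the hero list below is a made-up sample, not the full dataset):

```python
import re

heroes = ["SPIDER-MAN/PETER PARKER", "SPIDER-WOMAN", "VENOM/EDDIE BROCK"]

# Cypher's  hero.name =~ ".*SPIDER.*"  matches against the whole string,
# so the leading/trailing  .*  are what make it behave like a substring test.
regex_hits = [h for h in heroes if re.fullmatch(r".*SPIDER.*", h)]

# hero.name CONTAINS "SPIDER"  is a plain substring test.
contains_hits = [h for h in heroes if "SPIDER" in h]

print(regex_hits)     # -> ['SPIDER-MAN/PETER PARKER', 'SPIDER-WOMAN']
print(contains_hits)  # -> ['SPIDER-MAN/PETER PARKER', 'SPIDER-WOMAN']
```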
+
+**3\.** List all the comic issues where Spider-Man (Peter Parker) and Venom
+(Eddie Brock) appear together:
+
+```cypher
+MATCH (:Hero {name: "SPIDER-MAN/PETER PARKER"})
+      -[:AppearedIn]->(c:Comic)
+      <-[:AppearedIn]-(:Hero {name: "VENOM/EDDIE BROCK"})
+RETURN c.name AS spidey_and_venom_comic
+ORDER BY spidey_and_venom_comic;
+```
+
+**4\.** List all the comic series in which Spider-Man/Peter Parker appears:
+
+```cypher
+MATCH (:Hero {name: "SPIDER-MAN/PETER PARKER"})
+      -[:AppearedIn]->(c:Comic)
+      -[:IsPartOfSeries]-(s:ComicSeries)
+RETURN DISTINCT s.title AS series
+ORDER BY series;
+```
+
+**5\.** List 10 heroes with whom Spider-Man (Peter Parker) appeared most
+frequently together:
+
+```cypher
+MATCH (:Hero {name: "SPIDER-MAN/PETER PARKER"})
+      -[:AppearedIn]->(c:Comic)
+      <-[:AppearedIn]-(h:Hero)
+RETURN DISTINCT h AS spidey_friend, count(h) AS friend_count
+ORDER BY friend_count DESC
+LIMIT 10;
+```
+
+**6\.** Find out if there's a connection between Peter Parker/Spider-Man and
+Beef:
+
+"Who the hell is Beef?", you might ask. Well, let's just run a
+breadth-first search starting from good ol' Spider-Man, with the constraint that
+we stay within a radius of at most 10 hops from him, and see whether there's
+a way Spidey can reach Beef. According to the six degrees of separation
+philosophy, we should be able to find him on some path at most six hops long,
+but we relax that bound a bit just to be sure.
+
+```cypher
+MATCH p = (:Hero {name: "SPIDER-MAN/PETER PARKER"})
+          -[*bfs 1..10]-(b:Hero {name: "BEEF"})
+RETURN p;
+```
+
+**7\.** List the 10 most popular heroes and comic series in the MCU:
+
+Quickly, name the five most popular heroes in the MCU! Alright, how did your
+brain decide what to give as the answer? We're assuming that you have no clue,
+but it vaguely has to do with the number and quality of connections each of
+those heroes has in your brain. However, how do we explain the concept of
+"popular" to our database engine? 
+ +Well, our philosophy is as follows - a popular hero is the one who's "known" by +more other heroes, or in terms of our MCU graph, a hero that the other heroes +have more connections (edges) to than some other hero is deemed "more popular". +We'll apply analogous reasoning to define the "most popular" comic book series +as well. This philosophy is the one underlying Google's search engine, and the +algorithm embodying it is PageRank, so it would be convenient if we could make +use of it. + +However, the query engine doesn't support PageRank out-of-the-box, so we have to +come up with a way to plug in PageRank to our database. That's precisely the +purpose of [query modules](/reference-guide/query-modules/overview.md)! + +Long story short, the query module system enables us to write C or Python +modules that can access the data stored in our graph database, do some +processing, and return the results of this processing to the query engine, so we +can perform further queries on them. In this particular case, the PageRank +algorithm is implemented as a Python module, and can be found in the query +module directory `/usr/lib/memgraph/query-modules/`, along with its description +and the examples of usage. What you as a user must know is that the pagerank +procedure automatically takes the MCU graph as an argument, and returns a record +of pairs of nodes and the corresponding rank values (rank is a number +representing the "popularity" of a given node). + +```cypher +CALL pagerank.pagerank() YIELD node, rank +WITH node, rank +WHERE node:Hero +RETURN node.name AS most_popular_heroes +ORDER BY rank DESC +LIMIT 10; +``` + +How do the results of this query match with your own list? Not bad, right? 
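If you are curious what the `pagerank` procedure computes under the hood, the core of PageRank is just a few lines of power iteration. A toy Python sketch (the tiny graph below is made up for illustration, not the MCU data):

```python
def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank over an adjacency dict {node: [out-links]}."""
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        # Every node gets the "random teleport" share...
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in adj.items():
            if not outs:  # dangling node: spread its rank evenly
                for m in nodes:
                    new[m] += damping * rank[n] / len(nodes)
            else:  # ...plus an equal share of each in-neighbour's rank
                for m in outs:
                    new[m] += damping * rank[n] / len(outs)
        rank = new
    return rank

# Everyone links to "SPIDER-MAN"; he links only to "BEEF".
adj = {"SPIDER-MAN": ["BEEF"], "BEEF": ["SPIDER-MAN"], "HULK": ["SPIDER-MAN"]}
rank = pagerank(adj)
print(max(rank, key=rank.get))  # -> 'SPIDER-MAN'
```

The node with the most (and best-connected) in-links ends up with the highest rank, which is exactly the notion of popularity used above.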
+ +Now, let's figure out the most popular comic series: + +```cypher +CALL pagerank.pagerank() YIELD node, rank +WITH node, rank +WHERE node:ComicSeries +RETURN node.title AS most_popular_comic_series +ORDER BY rank DESC +LIMIT 10; +``` + +Or we can do it without query modules: + +```cypher +MATCH (hero:Hero)-[r]-() +RETURN hero.name, count(r) AS relationships +ORDER BY relationships DESC +LIMIT 10; +``` + +And that, folks, is all there is to it, so go and try out some graph magic of +your own! + +If you're interested in the PageRank algorithm, we recommend you start +[here](https://en.wikipedia.org/wiki/PageRank). + +## Nifty things you could do + +While the thing we've shown you how to do might be fun for a while, there are +loads of cool things you could do to improve the fun-factor. Here's a very short +list of things we think you could pull off: + +- we have loads of Hero nodes, so even the Hobgoblin or Magneto are deemed + "heroes", but if you were the mayor of the Marvel Comic Universe Town, you + wouldn't give those guys medals of honor, would you? It would be pretty cool + if we could classify the MCU entities into "Hero" and "Villain" categories. + Then you could ask the query engine to give you a list of Spidey's + arch-nemeses in addition to Spidey's best hero buddies. +- similar to the previous idea, it would be insanely cool if someone would add + more attributes to the heroes like "Superpower", "Level", "Affiliation", + "Signature moves" etc. If you had that, you could perhaps make a simple + Pokemon-like game where you'd randomly pick a team of villains and choose a + team of heroes to fight them. +- you could write your own query module that could run more sophisticated + analyses on the social network like closeness centrality, Louvain modularity + etc. + +Now go and use your graph database superpowers for the greater good! Although +the comic universe is full of heroes, there's always room for one more! 
diff --git a/docs2/querying/exploring-datasets/movie-recommendation.md b/docs2/querying/exploring-datasets/movie-recommendation.md
new file mode 100644
index 00000000000..a59ac953882
--- /dev/null
+++ b/docs2/querying/exploring-datasets/movie-recommendation.md
@@ -0,0 +1,228 @@
---
id: movie-recommendation
title: Movie recommendation system
sidebar_label: Movie recommendation system
---

This article is part of a series intended to show users how to use Memgraph on
real-world data and, by doing so, retrieve some interesting and useful
information.

We highly recommend checking out the other articles from this series, which are
listed in our [tutorial overview section](/tutorials/overview.md), where you
can also find instructions on how to start with the tutorial.

## Introduction

This example shows how to implement a simple recommendation system with
`openCypher` in Memgraph. First, we will show how to perform simple operations,
and then we will implement a query for the movie recommendation.

## Data model

In this example, we will use the MovieLens dataset, which consists of 9742
movies across 20 genres. There are three types of nodes: `Movie`, `User` and
`Genre`. `Movie` nodes have `id` and `title` properties, `User` nodes have an
`id` property, and `Genre` nodes have a `name` property.

Each movie can be connected to different genres with an `:OF_GENRE`
relationship. A user can rate movies. A rating is modeled with a `:RATED`
relationship that has a `rating` property, a float between 0 and 5.

![Movies](../../data/movielens_model.png)

## Exploring the dataset

To follow this tutorial, download the [Memgraph
Platform](https://memgraph.com/download#memgraph-platform). Once you have it up
and running, open the Memgraph Lab web application in your browser at
[`localhost:3000`](http://localhost:3000) and navigate to `Datasets` in the
sidebar.
From there, choose the dataset `MovieLens: Movies, genres and users` +and continue with the tutorial. + +## Example queries + +**1\.** List first 10 movies sorted by title: + +```cypher +MATCH (movie:Movie) +RETURN movie +ORDER BY movie.title +LIMIT 10; +``` + +**2\.** List 15 users from the dataset: + +```cypher +MATCH (user:User) +RETURN user +LIMIT 15; +``` + +**3\.** List 10 movies that have _Comedy_ and _Action_ genres and sort them by +title: + +```cypher +MATCH (movie:Movie)-[:OF_GENRE]->(:Genre {name:'Action'}) +MATCH (movie)-[:OF_GENRE]->(:Genre {name:'Comedy'}) +RETURN movie.title +ORDER BY movie.title +LIMIT 10; +``` + +**4\.** Average score for _Star Wars: Episode IV - A New Hope (1977)_ movie: + +```cypher +MATCH (:User)-[r:RATED]->(:Movie {title:"Star Wars: Episode IV - A New Hope (1977)"}) +RETURN avg(r.rating) +``` + +**5\.** Return the first 10 movies that are ordered by rating: + +```cypher +MATCH (:User)-[r:RATED]->(movie:Movie) +RETURN movie.title, avg(r.rating) AS rating +ORDER BY rating DESC +LIMIT 10; +``` + +**6\.** Create a new user and rate some movies: + +```cypher +CREATE (:User {id:1000}); +``` + +**7\.** Check if new user is created: + +```cypher +MATCH (user:User{id:1000}) +RETURN user; +``` + +**8\.** Create some ratings for the user: + +```cypher +MATCH (u:User {id:1000}), (m:Movie {title:"2 Guns (2013)"}) +MERGE (u)-[:RATED {rating:3.0}]->(m); +MATCH (u:User {id:1000}), (m:Movie {title:"21 Jump Street (2012)"}) +MERGE (u)-[:RATED {rating:3.0}]->(m); +MATCH (u:User {id:1000}), (m:Movie {title:"Toy Story (1995)"}) +MERGE (u)-[:RATED {rating:3.5}]->(m); +MATCH (u:User {id:1000}), (m:Movie {title:"Lion King, The (1994)"}) +MERGE (u)-[:RATED {rating:4.0}]->(m); +MATCH (u:User {id:1000}), (m:Movie {title:"Dark Knight, The (2008)"}) +MERGE (u)-[:RATED {rating:4.5}]->(m); +MATCH (u:User {id:1000}), (m:Movie {title:"Star Wars: Episode VI - Return of the Jedi (1983)"}) +MERGE (u)-[:RATED {rating:4.5}]->(m); +MATCH (u:User {id:1000}), 
(m:Movie {title:"Godfather, The (1972)"})
MERGE (u)-[:RATED {rating:5.0}]->(m);
MATCH (u:User {id:1000}), (m:Movie {title:"Lord of the Rings: The Return of the King, The (2003)"})
MERGE (u)-[:RATED {rating:4.0}]->(m);
MATCH (u:User {id:1000}), (m:Movie {title:"Aladdin (1992)"})
MERGE (u)-[:RATED {rating:4.0}]->(m);
MATCH (u:User {id:1000}), (m:Movie {title:"Pirates of the Caribbean: The Curse of the Black Pearl (2003)"})
MERGE (u)-[:RATED {rating:4.5}]->(m);
MATCH (u:User {id:1000}), (m:Movie {title:"Departed, The (2006)"})
MERGE (u)-[:RATED {rating:4.0}]->(m);
MATCH (u:User {id:1000}), (m:Movie {title:"Texas Rangers (2001)"})
MERGE (u)-[:RATED {rating:2.0}]->(m);
MATCH (u:User {id:1000}), (m:Movie {title:"Eve of Destruction (1991)"})
MERGE (u)-[:RATED {rating:1.0}]->(m);
MATCH (u:User {id:1000}), (m:Movie {title:"Sharkwater (2006)"})
MERGE (u)-[:RATED {rating:2.0}]->(m);
MATCH (u:User {id:1000}), (m:Movie {title:"Extreme Days (2001)"})
MERGE (u)-[:RATED {rating:1.5}]->(m);
```

**9\.** Check all the movies the user with `id = 1000` has rated:

```cypher
MATCH (user:User {id:1000})-[rating:RATED]->(movie:Movie)
RETURN user, movie, rating;
```

**10\.** Recommendation system:

The idea is to implement simple [memory-based collaborative
filtering](https://en.wikipedia.org/wiki/Collaborative_filtering).
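Before writing the Cypher, it helps to see the idea in plain Python. The sketch
below (toy ratings and hypothetical user names, purely illustrative) measures
similarity as the average absolute rating difference on co-rated movies, then
predicts scores by averaging the ratings of the most similar users, which is
exactly the two-step logic the recommendation query performs inside the
database.

```python
# Minimal memory-based collaborative filtering on toy data.
from statistics import mean

# ratings[user] = {movie: rating}
ratings = {
    "target": {"Toy Story": 3.5, "Aladdin": 4.0, "Sharkwater": 2.0},
    "alice":  {"Toy Story": 3.0, "Aladdin": 4.5, "Sharkwater": 2.5, "Heat": 5.0},
    "bob":    {"Toy Story": 1.0, "Aladdin": 1.5, "Sharkwater": 5.0, "Speed": 4.0},
}

def similarity(a, b):
    """Average absolute difference on co-rated movies (lower = more similar)."""
    shared = set(ratings[a]) & set(ratings[b])
    return mean(abs(ratings[a][m] - ratings[b][m]) for m in shared)

# Users most similar to the target (smallest average distance first).
others = sorted((u for u in ratings if u != "target"),
                key=lambda u: similarity("target", u))
similar_user_set = others[:1]  # keep the closest neighbour(s)

# Predict a score for every movie the neighbours rated: average their ratings.
predictions = {}
for u in similar_user_set:
    for movie, r in ratings[u].items():
        predictions.setdefault(movie, []).append(r)
recommended = {movie: mean(rs) for movie, rs in predictions.items()}

print(recommended)  # "alice" rates most like the target, so she drives it
```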
+

Let's recommend some movies for the user with `id = 1000`:

```cypher
MATCH (u:User {id:1000})-[r:RATED]-(m:Movie)
      -[other_r:RATED]-(other:User)
WITH other.id AS other_id,
     avg(abs(r.rating-other_r.rating)) AS similarity,
     count(*) AS same_movies_rated
WHERE same_movies_rated > 2
WITH other_id
ORDER BY similarity
LIMIT 10
WITH collect(other_id) AS similar_user_set
MATCH (some_movie:Movie)-[fellow_rate:RATED]-(fellow_user:User)
WHERE fellow_user.id IN similar_user_set
WITH some_movie, avg(fellow_rate.rating) AS prediction_rating
RETURN some_movie.title AS Title, prediction_rating
ORDER BY prediction_rating DESC;
```

How does this query work?

The query has two parts:

- Finding similar users
- Predicting the score for some movie (recommendation)

In the first part, we are looking for similar users, so we first need to define
similarity: two users are considered similar if they tend to give similar
ratings to the same movies. For the target user and every other user, we start
by finding the movies they both rated:

```cypher
MATCH (u:User {id:1000})-[r:RATED]-(m:Movie)-[other_r:RATED]-(other:User)
RETURN *;
```

If you execute the query above with the added `RETURN` clause, you will get all
potentially similar users and the movies they rated. But this is not enough for
finding similar users. We need to choose users who rated the same movies with
similar ratings:

```cypher
WITH other.id AS other_id,
     avg(abs(r.rating-other_r.rating)) AS similarity,
     count(*) AS same_movies_rated
WHERE same_movies_rated > 2
WITH other_id
ORDER BY similarity
LIMIT 10
WITH collect(other_id) AS similar_user_set
```

Here we calculate the similarity as the average distance between the target
user's rating and the other user's rating on the same set of movies.
There are two parameters: `same_movies_rated` counts the movies that both the
target user and the other user have rated, and we require it to be greater than
2, so only users who share at least three rated movies with the target user are
considered. The 10 users with the smallest average rating distance are then
collected into `similar_user_set`, the set of users most similar to the target
user. Together, these parameters extract the best users for movie
recommendations.

Now we have a set of similar users. We will use those users to calculate the
average rating for every movie they rated in the database as the
`prediction_rating` variable, and return the best-rated movies ordered by
`prediction_rating`.

```cypher
MATCH (some_movie:Movie)-[fellow_rate:RATED]-(fellow_user:User)
WHERE fellow_user.id IN similar_user_set
WITH some_movie, avg(fellow_rate.rating) AS prediction_rating
RETURN some_movie.title AS title, prediction_rating
ORDER BY prediction_rating DESC;
```

We encourage you to play with the parameters, like the `same_movies_rated`
limit and the `similar_user_set` size limit. You can also try to use a
different similarity function, for example the [Euclidean
distance](https://en.wikipedia.org/wiki/Euclidean_distance):

```cypher
sqrt(reduce(a=0, x IN collect((r.rating - other_r.rating) * (r.rating - other_r.rating)) | a + x)) AS similarity;
```

Here we use the `reduce` function, which accumulates list elements into a
single result by applying an expression. In our query, it starts with 0 and
sums up the squared differences, while the `collect` function is used to put
the squared differences into a list.
diff --git a/docs2/querying/expressions.md b/docs2/querying/expressions.md
new file mode 100644
index 00000000000..81e4651e10c
--- /dev/null
+++ b/docs2/querying/expressions.md
@@ -0,0 +1,90 @@
---
id: expressions
title: Expressions
sidebar_label: Expressions
---

The following sections describe the string operators, parameters and
conditional expressions supported in Memgraph's Cypher implementation.
+

## String operators

Apart from comparison and concatenation operators, Cypher provides special
string operators for easier matching of substrings:

| Operator | Description |
| ----------------- | ---------------------------------------------------------------- |
| `a STARTS WITH b` | Returns true if the prefix of string a is equal to string b. |
| `a ENDS WITH b` | Returns true if the suffix of string a is equal to string b. |
| `a CONTAINS b` | Returns true if some substring of string a is equal to string b. |

## Parameters

When automating queries for Memgraph, it comes in handy to change only some
parts of the query. Usually, these parts are values used for filtering results
or similar, while the rest of the query remains the same.

Parameters allow reusing the same query with different parameter values. The
syntax uses the `$` symbol to designate a parameter name. The old Cypher
parameter syntax using curly braces is not supported. For example, you can
parameterize filtering on a node property:

```cypher
MATCH (node1 {property: $propertyValue}) RETURN node1;
```

You can use parameters instead of any literal in the query. Using parameters as
property maps is partially supported: it works in the `CREATE` clause, but not
in `MATCH` or `MERGE`.
For example, the following query is illegal:

```cypher
MATCH (n $propertyMap) RETURN n;
```

but this is supported:

```cypher
CREATE (n $propertyMap) RETURN n;
```

To use parameters with a Python driver, use the following syntax:

```python
session.run('CREATE (alice:Person {name: $name, age: $ageValue})',
            name='Alice', ageValue=22).consume()
```

To use parameters whose names are integers, you will need to wrap the
parameters in a dictionary and convert the names to strings before running the
query:

```python
session.run('CREATE (alice:Person {name: $0, age: $1})',
            {'0': "Alice", '1': 22}).consume()
```

To use parameters with some other driver, please consult its documentation.

## CASE

Conditional expressions can be expressed in the Cypher language with the `CASE`
expression. The simple form compares an expression against multiple
predicates. For the first predicate that matches, the result of the expression
provided after the `THEN` keyword is returned. If no predicate matches, the
value following `ELSE` is returned, or `null` if there is no `ELSE`:

```cypher
MATCH (n)
RETURN CASE n.currency WHEN "DOLLAR" THEN "$" WHEN "EURO" THEN "€" ELSE "UNKNOWN" END;
```

In the generic form, you don't provide an expression whose value is compared to
predicates; instead, you list multiple predicates, and the first one that
evaluates to true is matched:

```cypher
MATCH (n)
RETURN CASE WHEN n.height < 30 THEN "short" WHEN n.height > 300 THEN "tall" END;
```

Most expressions that take `null` as input will produce `null`. This includes
boolean expressions used as predicates, in which case anything that is not true
is interpreted as being false. It also means that, logically, `null` is not
equal to `null`: comparing them yields `null` rather than `true`.
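To make the `null` behavior concrete, here is a small Python sketch of Cypher's
three-valued logic, with `None` standing in for `null`. This is an illustration
of the semantics only, not how Memgraph implements them.

```python
# Toy model of Cypher's three-valued (Kleene) logic, with Python's None
# standing in for Cypher's null.

def cypher_and(a, b):
    """Ternary AND: a definite False dominates, otherwise null propagates."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def cypher_equals(a, b):
    """Comparing anything with null yields null - even null itself."""
    if a is None or b is None:
        return None
    return a == b

# null = null is null, not true...
print(cypher_equals(None, None))       # None
# ...and used as a predicate, "not true" filters the row out:
passes_filter = cypher_equals(None, None) is True
print(passes_filter)                   # False
# false AND null is false, because one operand is already false:
print(cypher_and(False, None))         # False
```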
\ No newline at end of file
diff --git a/docs2/querying/functions.md b/docs2/querying/functions.md
new file mode 100644
index 00000000000..c7f2c013da1
--- /dev/null
+++ b/docs2/querying/functions.md
@@ -0,0 +1,138 @@
---
id: functions
title: Functions
sidebar_label: Functions
---

## User-defined Memgraph Magic functions

Memgraph offers the flexibility of implementing custom functions. When the
supported built-in functions are not enough, you can define a custom function
using C, C++, Python or Rust. The mechanism of [query
modules](/memgraph/reference-guide/query-modules) enables the integration of
custom functionality.

Semantically, a function should be a small fragment of functionality that does
not require long computations or large memory consumption. The only requirement
is that functions do not modify the graph. This makes them flexible enough to
be nested within Cypher expressions.

## Supported built-in functions

This section contains the list of supported functions.

### Temporal functions

 | Name | Signature | Description |
 | --------------- | -------------------------------------------------------------------------- | ------------------------------------------------------------------------- |
 | `duration` | duration(value: string\|Duration) -> (Duration) | Returns the data type that represents a period of time. |
 | `date` | date(value: string\|Date) -> (Date) | Returns the data type that represents a date with year, month, and day. |
 | `localTime` | localTime(value: string\|LocalTime) -> (LocalTime) | Returns the data type that represents time within a day without timezone. |
 | `localDateTime` | localDateTime(value: string\|LocalDateTime) -> (LocalDateTime) | Returns the data type that represents a date and local time.
| + + ### Scalar functions + + | Name | Signature | Description | + | ------------ | ------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | + | `assert` | `assert(expression: boolean, message: string = null) -> ()` | Raises an exception if the given argument is not `true`. | + | `coalesce` | `coalesce(expression: any [, expression: any]*) -> (any)` | Returns the first non-`null` value in the given list of expressions. | + | `counter` | `counter(name: string, initial-value: integer, increment: integer = 1) -> (integer)` | Generates integers that are guaranteed to be unique within a single query for a given counter name. The increment parameter can be any integer besides zero. | + | `degree` | `degree(node: Node) -> (integer)` | Returns the number of relationships (both incoming and outgoing) of a node. | + | `outDegree` | `outDegree(node: Node) -> (integer)` | Returns the number of outgoing relationships of a node. | + | `inDegree` | `inDegree(node: Node) -> (integer)` | Returns the number of incoming relationships of a node. | + | `endNode` | `endNode(relationship: Relationship) -> (Node)` | Returns the destination node of a relationship. | + | `head` | `head(list: List[any]) -> (any)` | Returns the first element of a list. | + | `id` | id(value: Node\|Relationship) -> (integer) | Returns identifier for a given node or relationship. The identifier is generated during the initialization of a node or a relationship and will be persisted through the durability mechanism. | + | `last` | `last(list: List[any]) -> (any)` | Returns the last element of a list. | + | `properties` | properties(value: Node\|Relationship) -> (Map[string, any]) | Returns the property map of a node or a relationship. 
|
 | `size` | size(value: List[any]\|string\|Map[string, any]\|Path) -> (integer) | Returns the number of elements in the value. When given a **list** it returns the size of the list. When given a string it returns the number of characters. When given a path it returns the number of expansions (relationships) in that path. |
 | `startNode` | `startNode(relationship: Relationship) -> (Node)` | Returns the starting node of a relationship. |
 | `toBoolean` | toBoolean(value: boolean\|integer\|string) -> (boolean) | Converts the argument to a boolean. |
 | `toFloat` | toFloat(value: number\|string) -> (float) | Converts the argument to a floating point number. |
 | `toInteger` | toInteger(value: boolean\|number\|string) -> (integer) | Converts the argument to an integer. |
 | `toString` | toString(value: string\|number\|Date\|LocalTime\|LocalDateTime\|Duration\|boolean) -> (string) | Converts the argument to a string. |
 | `type` | `type(relationship: Relationship) -> (string)` | Returns the type of a relationship as a character string. |
 | `timestamp` | `timestamp() -> (integer)` | Returns the difference, measured in microseconds, between the current time and midnight, January 1, 1970 UTC. |

### Pattern functions
 | Name | Signature | Description |
 | --------------- | ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
 | `exists` | `exists(pattern: Pattern)` | Checks if a pattern exists as part of the filtering clause. Symbols provided in the MATCH clause can also be used here.
| + + ### Lists + + | Name | Signature | Description | + | --------------- | ---------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | + | `all` | `all(variable IN list WHERE predicate)` | Check if all elements of a list satisfy a predicate.
NOTE: Whenever possible, use Memgraph's lambda functions when matching instead. |
 | `any` | `any(element IN list WHERE predicate_using_element)` | Check if any element in the list satisfies the predicate. |
 | `extract` | extract(variable IN list\|expression) | A list of values obtained by evaluating an expression for each element in list. |
 | `keys` | keys(value: Node\|Relationship) -> (List[string]) | Returns a list of property keys from a relationship or a node. Each key is represented as a string. |
 | `labels` | `labels(node: Node) -> (List[string])` | Returns a list of labels from a node. Each label is represented as a string. |
 | `nodes` | `nodes(path: Path) -> (List[Node])` | Returns a list of nodes from a path. |
 | `range` | `range(start-number: integer, end-number: integer, increment: integer = 1) -> (List[integer])` | Constructs a list of values in the given range. |
 | `reduce` | reduce(accumulator = initial_value, variable IN list\|expression) | Accumulates list elements into a single result by applying an expression. |
 | `relationships` | `relationships(path: Path) -> (List[Relationship])` | Returns a list of relationships (edges) from a path. |
 | `single` | `single(variable IN list WHERE predicate)` | Check if only one element of a list satisfies a predicate. |
 | `tail` | `tail(list: List[any]) -> (List[any])` | Returns all elements after the first of a given list. |
 | `uniformSample` | `uniformSample(list: List[any], size: integer) -> (List[any])` | Returns elements of a given list randomly oversampled or undersampled to the desired size. |


### Math functions

 | Name | Signature | Description |
 | ------- | --------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
 | `abs` | abs(number: integer\|float) -> (integer\|float) | Returns the absolute value of a number.
|
 | `acos` | acos(number: integer\|float) -> (float) | Calculates the arccosine of a number between -1 and 1 in radians. |
 | `asin` | asin(number: integer\|float) -> (float) | Calculates the arcsine of a number between -1 and 1 in radians. |
 | `atan` | atan(number: integer\|float) -> (float) | Calculates the arctangent of a given number in radians. |
 | `atan2` | atan2(y: integer\|float, x: integer\|float) -> (float) | Calculates a unique arctangent value from a set of coordinates in radians. |
 | `ceil` | `ceil(number: float) -> (integer)` | Returns the smallest integer greater than or equal to the given float number. |
 | `cos` | cos(number: integer\|float) -> (float) | Calculates the cosine of an angle specified in radians. |
 | `e` | `e() -> (float)` | Returns the base of the natural logarithm (2.71828). |
 | `exp` | exp(number: integer\|float) -> (float) | Calculates `e^n` where `e` is the base of the natural logarithm, and `n` is the given number. |
 | `floor` | `floor(number: float) -> (integer)` | Returns the largest integer smaller than or equal to the given float number. |
 | `log` | log(number: integer\|float) -> (float) | Calculates the natural logarithm of a given number. |
 | `log10` | log10(number: integer\|float) -> (float) | Calculates the logarithm (base 10) of a given number. |
 | `pi` | `pi() -> (float)` | Returns the constant *pi* (3.14159). |
 | `rand` | `rand() -> (float)` | Returns a random floating point number between 0 (inclusive) and 1 (exclusive). |
 | `round` | `round(number: float) -> (integer)` | Returns the number, rounded to the nearest integer. Tie-breaking is done using *commercial rounding*, where -1.5 produces -2 and 1.5 produces 2. |
 | `sign` | sign(number: integer\|float) -> (integer) | Applies the signum function to a given number and returns the result. The signum of positive numbers is 1, of negative -1, and of 0 is 0.
|
 | `sin` | sin(number: integer\|float) -> (float) | Calculates the sine of an angle specified in radians. |
 | `sqrt` | sqrt(number: integer\|float) -> (float) | Calculates the square root of a given number. |
 | `tan` | tan(number: integer\|float) -> (float) | Calculates the tangent of an angle specified in radians. |


 ### Aggregation functions

 | Name | Signature | Description |
 | --------- | ------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
 | `avg` | avg(row: integer\|float) -> (float) | Returns an average value of rows with numerical values generated with the `MATCH` or `UNWIND` clause. |
 | `collect` | `collect(values: any) -> (List[any])` | Returns a single aggregated list containing returned values. |
 | `count` | `count(values: any) -> (integer)` | Counts the number of non-null values returned by the expression. |
 | `max` | max(row: integer\|float) -> (integer\|float) | Returns the maximum value in a set of values. |
 | `min` | min(row: integer\|float) -> (integer\|float) | Returns the minimum value in a set of values. |
 | `sum` | sum(row: integer\|float) -> (integer\|float) | Returns a sum value of rows with numerical values generated with the `MATCH` or `UNWIND` clause. |

 ### Graph projection functions

 | Name | Signature | Description |
 | --------- | ----------------------------------------------------------------- | ------------------------------------------------------------------------------ |
 | `project` | project(row: path) -> map("nodes":list[Node], "edges":list[Edge]) | Creates a projected graph consisting of nodes and edges from aggregated paths. |


:::info
All aggregation functions can be used with the `DISTINCT` operator to perform calculations only on unique values. For example, `count(DISTINCT n.prop)` and `collect(DISTINCT n.prop)`.
+
:::

### String functions

| Name | Signature | Description |
| ------------ | ---------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `contains` | `contains(string: string, substring: string) -> (boolean)` | Check if the first argument contains the second argument as a substring. |
| `endsWith` | `endsWith(string: string, substring: string) -> (boolean)` | Check if the first argument ends with the second. |
| `left` | `left(string: string, count: integer) -> (string)` | Returns a string containing the specified number of leftmost characters of the original string. |
| `lTrim` | `lTrim(string: string) -> (string)` | Returns the original string with leading whitespace removed. |
| `replace` | `replace(string: string, search-string: string, replacement-string: string) -> (string)` | Returns a string in which all occurrences of a specified string in the original string have been replaced by another (specified) string. |
| `reverse` | `reverse(string: string) -> (string)` | Returns a string in which the order of all characters in the original string has been reversed. |
| `right` | `right(string: string, count: integer) -> (string)` | Returns a string containing the specified number of rightmost characters of the original string. |
| `rTrim` | `rTrim(string: string) -> (string)` | Returns the original string with trailing whitespace removed. |
| `split` | `split(string: string, delimiter: string) -> (List[string])` | Returns a list of strings resulting from the splitting of the original string around matches of the given delimiter. |
| `startsWith` | `startsWith(string: string, substring: string) -> (boolean)` | Check if the first argument starts with the second.
|
| `substring` | `substring(string: string, start-index: integer, length: integer = null) -> (string)` | Returns a substring of the original string, beginning with a 0-based index start and length. |
| `toLower` | `toLower(string: string) -> (string)` | Returns the original string in lowercase. |
| `toUpper` | `toUpper(string: string) -> (string)` | Returns the original string in uppercase. |
| `trim` | `trim(string: string) -> (string)` | Returns the original string with leading and trailing whitespace removed. |

diff --git a/docs2/querying/performance-optimization.md b/docs2/querying/performance-optimization.md
new file mode 100644
index 00000000000..12f6143399a
--- /dev/null
+++ b/docs2/querying/performance-optimization.md
@@ -0,0 +1,291 @@
---
id: performance-optimization
title: Performance optimization
---

The `ANALYZE GRAPH` query will check and calculate certain properties of a
graph so the database can choose a more optimal index or `MERGE` transaction.

Before the introduction of the `ANALYZE GRAPH` query, the database would choose
an index solely based on the number of indexed nodes. But if the number of nodes
is the only condition, in some cases the database would choose a non-optimal
index. Once `ANALYZE GRAPH` is run, Memgraph analyzes the distribution of
property values and can select a more optimal label-property index, the one with
the smallest average property value group size.

The average property value group size directly represents the database's
expected number of hits, which can be used to estimate the query's cost. When
the average group size is the same, the chi-squared statistic is used to measure
how close the distribution of property-value group sizes is to the uniform
distribution. The index with the distribution closest to the uniform
distribution is selected.

Upon running the `ANALYZE GRAPH` query, Memgraph also checks the node degree of
every indexed node and calculates the average degree.
By having these values, +Memgraph can make a more optimal `MERGE` expansion and improve performance. It's +always better to perform a `MERGE` by expanding from the node that has a lesser +degree than the connecting node. + +The `ANALYZE GRAPH;` command should be run only once after all indexes have been +created and nodes inserted in the database. In rare situations when one property +is set on many more nodes than another property, choosing an index based on +average group size and uniform distribution would be misleading. That's why the +database always selects the label-property index with >= 10x fewer nodes than +the other label-property index. + +[![Related - Reference +Guide](https://img.shields.io/static/v1?label=Related&message=Reference%20Guide&color=yellow&style=for-the-badge)](/reference-guide/indexing.md) +[![Related - How +to](https://img.shields.io/static/v1?label=Related&message=How-to&color=blue&style=for-the-badge)](/how-to-guides/indexes.md) +[![Related - Under the +Hood](https://img.shields.io/static/v1?label=Related&message=Under%20the%20hood&color=orange&style=for-the-badge)](/under-the-hood/indexing.md) +[![Related - Blog +Post](https://img.shields.io/static/v1?label=Related&message=Blog%20post&color=9C59DB&style=for-the-badge)](https://memgraph.com/blog/implementing-data-replication) + + +## Calculate the statistic + +Run the following query to calculate the statistics: + +```cypher +ANALYZE GRAPH; +``` + +The query will iterate over all label and label-property indices in the database +and calculate the average group size, chi-squared statistic and avg degree for +each one, then return the following output: + +| label | property | num estimation nodes | num groups | avg group size | chi-squared value | avg degree +| ----- | -------- | -------------------- | ---------- | -------------- | ----------------- | ---------- +| index's label | index's property | number of nodes used for estimation | number of distinct values the property contains | 
average group size of property's values | value of the chi-squared statistic | average degree of the indexed nodes + + +Once the necessary information is obtained, Memgraph can choose the optimal +index and `MERGE` expansion. If you don't want to run the analysis on all labels, +you can specify which labels to use by adding the labels to the query: + +```cypher +ANALYZE GRAPH ON LABELS :Label1, :Label2; +``` + +## Delete statistic + +If you want the database to ignore information about the average group size, the +chi-squared statistic and the average degree, the existing statistic can be +deleted by running: + +```cypher +ANALYZE GRAPH DELETE STATISTICS; +``` + +The results will contain all label-property indices that were successfully deleted: + +| label | property | +| ----- | -------- | +| index's label | index's property | + +Specific labels can be specified with the construct `ON LABELS`: + +```cypher +ANALYZE GRAPH ON LABELS :Label1 DELETE STATISTICS; +``` + +## Inspecting queries + +Before a Cypher query is executed, it is converted into an internal form +suitable for execution, known as a *plan*. A plan is a tree-like data structure +describing a pipeline of operations which will be performed on the database in +order to yield the results for a given query. Every node within a plan is known +as a *logical operator* and describes a particular operation. + +Because a plan represents a pipeline, the logical operators are iteratively +executed as data passes from one logical operator to the other. Every logical +operator *pulls* data from the logical operator(s) preceding it, processes it +and passes it onto the logical operator next in the pipeline for further +processing. + +Using the `EXPLAIN` clause, it is possible for the user to inspect the +produced plan and gain insight into the execution of a query. 
+ +## Operators + +| Operator | Description | +| ----------------------------- | -------------------------------------------------------------------------------------------------------------------------- | +| `Accumulate` | Accumulates the input it received. | +| `Aggregate` | Aggregates the input it received. | +| `Apply` | Joins the returned symbols from two branches of execution. | +| `CallProcedure` | Calls a procedure. | +| `Cartesian` | Applies the Cartesian product (the set of all possible ordered combinations consisting of one member from each of those sets) on the input it received. | +| `ConstructNamedPath` | Creates a path. | +| `CreateNode` | Creates a node. | +| `CreateExpand` | Creates edges and new nodes to connect with existing nodes. | +| `Delete` | Deletes nodes and edges. | +| `EdgeUniquenessFilter` | Filters unique edges. | +| `EmptyResult` | Discards results from the previous operator. | +| `EvaluatePatternFilter` | Part of the filter operator that contains a sub-branch which yields either true or false. | +| `Expand` | Expands the node by finding the node's relationships. | +| `ExpandVariable` | Performs a node expansion of a variable number of relationships | +| `Filter` | Filters the input it received. | +| `Foreach` | Iterates over a list and applies one or more update clauses. | +| `Limit` | Limits certain rows from the pull chain. | +| `LoadCsv` | Loads CSV file in order to import files into the database. | +| `Merge` | Applies merge on the input it received. | +| `Once` | Forms the beginning of an operator chain with "only once" semantics. The operator will return false on subsequent pulls. | +| `Optional` | Performs optional matching. | +| `OrderBy` | Orders the input it received. | +| `Produce` | Produces results. | +| `RemoveLabels` | Removes a variable number of node labels. | +| `RemoveProperty` | Removes a node or relationship property. | +| `ScanAll` | Produces all nodes in the database. 
|
+| `ScanAllById`                 | Produces nodes with a certain ID. |
+| `ScanAllByLabel`              | Produces nodes with a certain label. |
+| `ScanAllByLabelProperty`      | Produces nodes with a certain label and property. |
+| `ScanAllByLabelPropertyRange` | Produces nodes with a certain label and property value within the given range (both inclusive and exclusive). |
+| `ScanAllByLabelPropertyValue` | Produces nodes with a certain label and property value. |
+| `SetLabels`                   | Sets a variable number of node labels. |
+| `SetProperty`                 | Sets a node or relationship property. |
+| `SetProperties`               | Sets a list of node or relationship properties. |
+| `Skip`                        | Skips certain rows from the pull chain. |
+| `Unwind`                      | Unwinds an expression to multiple records. |
+| `Distinct`                    | Applies a distinct filter on the input it received. |
+
+## Example plans
+
+As an example, let's inspect the plan produced for a simple query:
+
+```cypher
+EXPLAIN MATCH (n) RETURN n;
+```
+
+```
++----------------+
+| QUERY PLAN     |
++----------------+
+| * Produce {n}  |
+| * ScanAll (n)  |
+| * Once         |
++----------------+
+```
+
+The output of the query using the `EXPLAIN` clause is a representation of the
+produced plan. Every logical operator within the plan starts with an asterisk
+character (`*`) and is followed by its name (and sometimes additional
+information). The execution of the query proceeds iteratively (generating one
+entry of the result set at a time), with data flowing from the bottom-most
+logical operator(s) (the start of the pipeline) to the top-most logical
+operator(s) (the end of the pipeline).
+
+In the example above, the resulting plan is a pipeline of 3 logical operators.
+`Once` is the identity logical operator which does nothing and is always found +at the start of the pipeline; `ScanAll` is a logical operator which iteratively +produces all of the nodes in the graph; and `Produce` is a logical operator +which takes data produced by another logical operator and produces data for the +query's result set. + +A slightly more complicated example would be: + +```cypher +EXPLAIN MATCH (n :Node)-[:Edge]-(m :Node) WHERE n.prop = 42 RETURN *; +``` + +``` ++--------------------------------+ +| QUERY PLAN | ++--------------------------------+ +| * Produce {m, n} | +| * Filter | +| * Expand (m)-[anon1:Edge]-(n) | +| * ScanAllByLabel (n :Node) | +| * ScanAllByLabel (m :Node) | +| * Once | ++--------------------------------+ +``` + +In this example, the `Filter` logical operator is used to filter the matched +nodes because of the `WHERE n.prop = 42` construct. The `Expand` logical +operator is used to find an edge between two nodes, in this case `m` and `n` +which were matched previously using the `ScanAllByLabel` logical operator (a +variant of the `ScanAll` logical operator mentioned previously). + +The execution of the query proceeds iteratively as follows. First, two vertices +of type `:Node` are found as the result of the two scans. Then, we try to find a +path that consists of the two vertices and an edge between them. If a path is +found, it is further filtered based on a property of one of the vertices. +Finally, if the path satisfied the filter, its two vertices are added to the +query's result set. 
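
Other pattern features surface in plans in a similar way. For instance, a
variable-length pattern should be planned with the `ExpandVariable` operator
instead of `Expand`. As a sketch (the exact plan output may differ between
Memgraph versions):

```cypher
EXPLAIN MATCH (n:Node)-[:Edge*1..3]->(m) RETURN m;
```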
+ +A simple example showcasing the fully general tree structure of the plan could +be: + +```cypher +EXPLAIN MERGE (n) RETURN n; +``` + +``` ++------------------+ +| QUERY PLAN | ++------------------+ +| * Produce {n} | +| * Accumulate | +| * Merge | +| |\ On Match | +| | * ScanAll (n) | +| | * Once | +| |\ On Create | +| | * CreateNode | +| | * Once | +| * Once | ++------------------+ +``` + +The `Merge` logical operator (constructed as a result of the `MERGE` construct) +can take input from up to 3 places. The `On Match` and `On Create` branches are +"pulled from" only if a match was found or if a new vertex has to be created, +respectively. + +## Profiling queries + +Along with inspecting a query's plan as described in the [Inspecting +queries](./inspecting-queries.md) guide, it is also possible to profile the +execution of a query and get a detailed report on how the query's plan behaved. +For every logical operator the following info is provided: + +- `OPERATOR` — the name of the operator, just like in the output of an + `EXPLAIN` query. + +- `ACTUAL HITS` — the number of times a particular logical operator was + pulled from. + +- `RELATIVE TIME` — the amount of time that was spent processing a + particular logical operator, relative to the execution of the whole plan. + +- `ABSOLUTE TIME` — the amount of time that was spent processing a + particular logical operator. 
+ +A simple example to illustrate the output: + +```cypher +PROFILE MATCH (n :Node)-[:Edge]-(m :Node) WHERE n.prop = 42 RETURN *; +``` + +```plaintext ++---------------+---------------+---------------+---------------+ +| OPERATOR | ACTUAL HITS | RELATIVE TIME | ABSOLUTE TIME | ++---------------+---------------+---------------+---------------+ +| * Produce | 1 | 7.134628 % | 0.003949 ms | +| * Filter | 1 | 12.734765 % | 0.007049 ms | +| * Expand | 1 | 5.181460 % | 0.002868 ms | +| * ScanAll | 1 | 3.325061 % | 0.001840 ms | +| * ScanAll | 1 | 71.061241 % | 0.039334 ms | +| * Once | 2 | 0.562844 % | 0.000312 ms | ++---------------+---------------+---------------+---------------+ +``` + +## Where to next? + +To learn more about Memgraph's functionalities, visit the **[Reference +guide](/reference-guide/overview.md)**. For real-world examples of how to use +Memgraph, we strongly suggest going through one of the available +**[Tutorials](/tutorials/overview.md)**. + diff --git a/docs2/querying/querying.md b/docs2/querying/querying.md new file mode 100644 index 00000000000..a1026afd152 --- /dev/null +++ b/docs2/querying/querying.md @@ -0,0 +1,158 @@ +--- +id: querying +title: Querying +sidebar_label: Querying +--- + + +**Cypher** is the most widely adopted, fully-specified, and open query language +for property graph databases. It provides an intuitive way to work with property +graphs. + +## Quick start + +If you are new to the **Cypher** query language, take a look at what you can do +with a few simple commands. You will use our sandbox that we have already filled +with sample data. There is no need for you to install anything at this point. +Simply open Game of Thrones Deaths dataset on [**Memgraph +playground**](https://playground.memgraph.com/sandbox/game-of-thrones-deaths). +You will find some predefined queries there that will help you to get a glimpse +of what you can accomplish with Cypher. + +:::info + +Playground supports only read operations. 
If you'd like to modify the dataset, +you will need to [install and run Memgraph](../memgraph/installation) on your +computer. + +::: + +:::tip + +Check out our free [**Cypher e-mail +course**](https://memgraph.com/learn-cypher-query-language) and learn the Cypher +query language in 10 days. + +::: + +## What is Cypher? + +Cypher is a declarative query language specifically designed to handle querying +graph data efficiently. With Cypher, you express what to retrieve but not how to +retrieve it. This allows you to focus on the problem domain instead of worrying +about the syntax. + +Cypher was designed to be easy to learn but very powerful when it comes to graph +analytics. This means that you can use Cypher to write complex queries +relatively easily. + +You can think of Cypher as mapping English language sentence structure to +patterns in a graph. In most cases, the nouns are nodes of the graph, the verbs +are the relationships in the graph, and the adjectives and adverbs are the +properties. + +In the following image, you can see one such example. We have a graph that +consists of two nodes and one relationship: + +![](data/cypher-query-language/graph-example.png) + +We can interpret this graph by using the said method of mapping patterns to +language structures: + +```nocopy +A person named Harry is married to a person named Anna. +``` + +## Cypher styling and syntax + +Same as other languages, Cypher has its own set of syntax rules and styling +recommendations. And as always, it is sensible to add comments to code as you +write it. + +### Comments + +To specify a comment in Cypher, place the characters `//` before the line you +want to be a comment: + +```cypher +// This is a Cypher comment +CREATE (p1:Person {name: 'Harry'}), (p2:Person {name: 'Anna'}) +CREATE (p1)-[r:MARRIED_TO]->(p2) +RETURN r; +``` + +### Naming convention + +**Node labels** should be written using CamelCase and start with an upper-case +letter. Node labels are case-sensitive. 
+
+```nocopy
+(:Country)
+(:City)
+(:CapitalCity)
+```
+
+**Property keys**, **variables**, **parameters**, **aliases**, and **functions**
+are camelCase and begin with a lower-case letter. These components are
+case-sensitive.
+
+```cypher
+dateOfBirth // Property key
+largestCountry // Variable
+size() // Function
+countryOne // Alias
+```
+
+**Relationship types** are styled upper-case and use the underscore character
+`_` to separate multiple words. Relationship types are case-sensitive and you
+cannot use the `-` character in a relationship type.
+
+```cypher
+[:LIVES_IN]
+[:BORDERS_WITH]
+```
+
+Aside from clauses, there are a number of **keywords** that should be styled
+with capital letters even though they are not case-sensitive. These include:
+`DISTINCT`, `IN`, `STARTS WITH`, `CONTAINS`, `NOT`, `AND`, `OR`, and `AS`.
+
+```cypher
+MATCH (c:Country)
+WHERE c.name CONTAINS 'United' AND c.population > 9000000
+RETURN c AS Country;
+```
+
+### Indentations and line breaks
+
+Sometimes it's helpful to separate new clauses with an indent. Even though they
+are in a new line, subqueries should be indented to ensure readability. If there
+are multiple subqueries, they can be further grouped with curly brackets.
+
+```cypher
+// Indent 2 spaces on lines with ON CREATE or ON MATCH subqueries
+MATCH (p:Person {name: 'Helga'})
+MERGE (c:Country {name: 'UK'})
+MERGE (p)-[l:LIVES_IN]->(c)
+  ON CREATE SET l.movedIn = date({year: 2020})
+  ON MATCH SET l.modified = date()
+RETURN p, l, c;
+```
+
+An exception to this rule would be a one-line subquery, where you don't need to
+use a new line or an indent.
+
+### Quotes
+
+When it comes to quotes, a simple rule is to use whichever provides the fewest
+escaped characters in the string. If escaped characters are not needed, or their
+number is the same for single and double quotes, then single quotes should be
+favored.
+
+```cypher
+// Bad syntax
+RETURN 'Memgraph\'s mission is: ', "A very famous quote is: \"Astra inclinant, sed non obligant.\""
+
+// Recommended syntax
+RETURN "Memgraph's mission is: ", 'A very famous quote is: "Astra inclinant, sed non obligant."'
+```
diff --git a/docs2/querying/read-and-modify-data.md b/docs2/querying/read-and-modify-data.md
new file mode 100644
index 00000000000..dc5ff25c568
--- /dev/null
+++ b/docs2/querying/read-and-modify-data.md
@@ -0,0 +1,517 @@
+---
+id: reading-and-modify-data
+title: Read and modify data
+sidebar_label: Read and modify data
+---
+
+The simplest usage of the language is to find data stored in the database. For
+that, you can use one of the following clauses:
+
+- `MATCH` which searches for patterns.
+- `WHERE` for filtering the matched data.
+- `RETURN` for defining what will be presented to the user in the result set.
+- `UNION` and `UNION ALL` for combining results from multiple queries.
+- `UNWIND` for unwinding a list of values as individual rows.
+
+## MATCH
+
+This clause is used to obtain data from Memgraph by matching it to a given
+pattern. For example, you can use the following query to find each node in the
+database:
+
+```cypher
+MATCH (node) RETURN node;
+```
+
+Finding connected nodes can be achieved by using the query:
+
+```cypher
+MATCH (node1)-[connection]-(node2) RETURN node1, connection, node2;
+```
+
+In addition to general pattern matching, you can narrow the search down by
+specifying node labels and properties. Similarly, relationship types and
+properties can also be specified. For example, finding each node labeled as
+`Person` and with property `age` being 42 is done with the following query:
+
+```cypher
+MATCH (n:Person {age: 42}) RETURN n;
+```
+
+```tip
+
+Each node and relationship gets an identifier generated during its
+initialization, which is persisted through the durability mechanism.
+
+Return it with the [`id()` function](/cypher-manual/functions#scalar-functions).
+ +``` + +You can use the following query to find their friends: + +```cypher +MATCH (n:Person {age: 42})-[:FRIENDS_WITH]-(friend) RETURN friend; +``` + +There are cases when a user needs to find data that is connected by traversing a +path of connections, but the user doesn't know how many connections need to be +traversed. Cypher allows for designating patterns with _variable path +lengths_. Matching such a path is achieved by using the `*` (_asterisk_) symbol +inside the relationship element of a pattern. For example, traversing from `node1` to +`node2` by following any number of connections in a single direction can be +achieved with: + +```cypher +MATCH (node1)-[r*]->(node2) RETURN node1, r, node2; +``` + +If paths are very long, finding them could take a long time. To prevent that, a +user can provide the minimum and maximum length of the path. For example, paths +of length between two and four nodes can be obtained with a query like: + +```cypher +MATCH (node1)-[r*2..4]->(node2) RETURN node1, r, node2; +``` + +It is possible to name patterns in the query and return the resulting paths. +This is especially useful when matching variable length paths: + +```cypher +MATCH path = ()-[r*2..4]->() RETURN path; +``` + +More details on how `MATCH` works can be found [here](./clauses/match.md). + +The `MATCH` clause can be modified by prepending the `OPTIONAL` keyword. +`OPTIONAL MATCH` clause behaves the same as a regular `MATCH`, but when it fails +to find the pattern, missing parts of the pattern will be filled with `null` +values. Examples can be found [here](./clauses/optional-match.md). + +## WHERE + +You have already seen how to achieve simple filtering by using labels and +properties in `MATCH` patterns. When more complex filtering is desired, you can +use `WHERE` paired with `MATCH` or `OPTIONAL MATCH`. 
For example, finding each +person older than 20 is done with this query: + +```cypher +MATCH (n:Person) WHERE n.age > 20 RETURN n; +``` + +Additional examples can be found [here](./clauses/where.md). + +### Regular expressions + +Inside `WHERE` clause, you can use regular expressions for text filtering. To +use a regular expression, you need to use the `=~` operator. + +For example, finding all `Person` nodes which have a name ending with `son`: + +```cypher +MATCH (n:Person) WHERE n.name =~ ".*son$" RETURN n; +``` + +The regular expression syntax is based on the modified ECMAScript regular +expression grammar. The ECMAScript grammar can be found +[here](http://ecma-international.org/ecma-262/5.1/#sec-15.10), while the +modifications are described in [this +document](https://en.cppreference.com/w/cpp/regex/ecmascript). + +## RETURN + +The `RETURN` clause defines which data should be included in the resulting set. +Basic usage was already shown in the examples for `MATCH` and `WHERE` clauses. +Another feature of `RETURN` is renaming the results using the `AS` keyword. + +For example: + +```cypher +MATCH (n:Person) RETURN n AS people; +``` + +That query would display all nodes under the header named `people` instead of +`n`. + +When you want to get everything that was matched, you can use the `*` +(_asterisk_) symbol. + +This query: + +```cypher +MATCH (node1)-[connection]-(node2) RETURN *; +``` + +is equivalent to: + +```cypher +MATCH (node1)-[connection]-(node2) RETURN node1, connection, node2; +``` + +`RETURN` can be followed by the `DISTINCT` operator, which will remove duplicate +results. 
For example, getting unique names of people can be achieved with: + +```cypher +MATCH (n:Person) RETURN DISTINCT n.name; +``` + +Besides choosing what will be the result and how it will be named, the `RETURN` +clause can also be used to: + +- limit results with `LIMIT` sub-clause; +- skip results with `SKIP` sub-clause; +- order results with `ORDER BY` sub-clause and +- perform aggregations (such as `count`). + +More details on `RETURN` can be found [here](./clauses/return.md). + +### SKIP & LIMIT + +These sub-clauses take a number of how many results to skip or limit. For +example, to get the first three results you can use this query: + +```cypher +MATCH (n:Person) RETURN n LIMIT 3; +``` + +If you want to get all the results after the first 3, you can use the following: + +```cypher +MATCH (n:Person) RETURN n SKIP 3; +``` + +The `SKIP` and `LIMIT` can be combined. So for example, to get the 2nd result, +you can do: + +```cypher +MATCH (n:Person) RETURN n SKIP 1 LIMIT 1; +``` + +### ORDER BY + +Since the patterns which are matched can come in any order, it is very useful to +be able to enforce some ordering among the results. In such cases, you can use +the `ORDER BY` sub-clause. + +For example, the following query will get all `:Person` nodes and order them by +their names: + +```cypher +MATCH (n:Person) RETURN n ORDER BY n.name; +``` + +By default, ordering will be ascending. To change the order to be descending, +you should append `DESC`. + +For example, you can use this query to order people by their name descending: + +```cypher +MATCH (n:Person) RETURN n ORDER BY n.name DESC; +``` + +You can also order by multiple variables. The results will be sorted by the +first variable listed. If the values are equal, the results are sorted by the +second variable, and so on. 
+ +For example, ordering by first name descending and last name ascending: + +```cypher +MATCH (n:Person) RETURN n ORDER BY n.name DESC, n.lastName; +``` + +Note that `ORDER BY` sees only the variable names as carried over by `RETURN`. +This means that the following will result in an error. + +```cypher +MATCH (old:Person) RETURN old AS new ORDER BY old.name; +``` + +Instead, the `new` variable must be used: + +```cypher +MATCH (old:Person) RETURN old AS new ORDER BY new.name; +``` + +The `ORDER BY` sub-clause may come in handy with `SKIP` and/or `LIMIT` +sub-clauses. For example, to get the oldest person you can use the following: + +```cypher +MATCH (n:Person) RETURN n ORDER BY n.age DESC LIMIT 1; +``` + +You can also order result before returning them. The following query will order +all the nodes according to name, and then return them in a list. + +```cypher +MATCH (n) +WITH n ORDER BY n.name DESC +RETURN collect(n.name) AS names; +``` + +### Aggregating + +Cypher has functions for aggregating data. Memgraph currently supports the +following aggregating functions. + +- `avg`, for calculating the average value. +- `sum`, for calculating the sum of numeric values. +- `collect`, for collecting multiple values into a single list or map. If + given a single expression values are collected into a list. If given two + expressions, values are collected into a map where the first expression + denotes map keys (must be string values) and the second expression denotes + map values. +- `count`, for counting the resulting values. +- `max`, for returning the maximum value. +- `min`, for returning the minimum value. 
+ +Example, calculating the average age: + +```cypher +MATCH (n:Person) RETURN avg(n.age) AS averageAge; +``` + +Collecting items into a list: + +```cypher +MATCH (n:Person) RETURN collect(n.name) AS list_of_names; +``` + +Collecting items into a map: + +```cypher +MATCH (n:Person) RETURN collect(n.name, n.age) AS map_name_to_age; +``` + +Check the detailed signatures of [aggregation +functions](./functions.md#aggregation-functions). + +## UNION and UNION ALL + +Cypher supports combining results from multiple queries into a single result +set. That result will contain rows that belong to queries in the union +respecting the union type. + +Using `UNION` will contain only distinct rows, while `UNION ALL` will keep all +rows from all given queries. + +Restrictions when using `UNION` or `UNION ALL`: + +- The number and the names of columns returned by queries must be the same for + all of them. +- There can be only one union type between single queries, i.e. a query can't + contain both `UNION` and `UNION ALL`. + +For example to get distinct names that are shared between persons and movies use +the following query: + +```cypher +MATCH (n:Person) RETURN n.name AS name UNION MATCH (n:Movie) RETURN n.name AS name; +``` + +To get all names that are shared between persons and movies (including +duplicates) do the following: + +```cypher +MATCH (n:Person) RETURN n.name AS name UNION ALL MATCH (n:Movie) RETURN n.name AS name; +``` + +## UNWIND + +The `UNWIND` clause is used to unwind a list of values as individual rows. + +To produce rows out of a single list, use the following query: + +```cypher +UNWIND [1,2,3] AS listElement RETURN listElement; +``` + +More examples can be found [here](./clauses/unwind.md). + +## Traversing relationships + +Patterns are used to indicate specific graph traversals given directional +relationships. 
How a graph is traversed for a query depends on what directions
+are defined for relationships and how the pattern is specified in the `MATCH`
+clause.
+
+### Patterns in a query
+
+Here is an example of a pattern that utilizes the `FRIENDS_WITH` relationships
+from our graph:
+
+```cypher
+MATCH (p1:Person)-[r:FRIENDS_WITH]->(p2:Person {name:'Alison'})
+RETURN p1, r, p2;
+```
+
+The output is:
+
+![patterns-in-a-query](data/read-existing-data/patterns-in-a-query.png)
+
+Because the `FRIENDS_WITH` relationship is directional, only these two nodes are
+returned.
+
+### Reversing traversals
+
+When the relationship from the previous query is reversed, with the person
+named Alison being the anchor node, the returned results are:
+
+```cypher
+MATCH (p1:Person {name:'Alison'})-[r:FRIENDS_WITH]->(p2:Person)
+RETURN p1, r, p2;
+```
+
+The output is:
+
+![reversing-traversals](data/read-existing-data/reversing-traversals.png)
+
+### Bidirectional traversals
+
+We can also find out what `Person` nodes are connected with the `FRIENDS_WITH`
+relationship in either direction by removing the directional arrow from the
+pattern:
+
+```cypher
+MATCH (p1:Person)-[r:FRIENDS_WITH]-(p2:Person {name:'Alison'})
+RETURN p1, r, p2;
+```
+
+The output is:
+
+![bidirectional-traversals](data/read-existing-data/bidirectional-traversals.png)
+
+### Traversing multiple relationships
+
+Since we have a graph, we can traverse through nodes to obtain relationships
+further into the traversal.
+
+For example, we can write a Cypher query to return all friends of friends of the
+person named Alison:
+
+```cypher
+MATCH (p1:Person {name:'Alison'})-[r1:FRIENDS_WITH]->
+      (p2:Person)-[r2:FRIENDS_WITH]-(p3:Person)
+RETURN p1, r1, p2, r2, p3;
+```
+
+Keep in mind that the first relationship is directional while the second one
+isn't.
The output is: + +![traversing-multiple-relationships](data/read-existing-data/traversing-multiple-relationships.png) + +## Modify data + +### SET clause + +Use the `SET` clause to update labels on nodes and properties on nodes and +relationships. + +Click [here](./clauses/set.md) for a more detailed explanation of what can be +done with `SET`. + +Cypher supports combining multiple reads and writes using the `WITH` clause. +In addition to combining, the `MERGE` clause is provided which may create +patterns if they do not exist. + +#### Creating and updating properties + +The `SET` clause can be used to create/update the value of a property on a node or +relationship: + +```cypher +MATCH (c:City) +WHERE c.name = 'London' +SET c.population = 8900000 +RETURN c; +``` + +The `SET` clause can be used to create/update the value of multiple properties +on nodes or relationships by separating them with a comma: + +```cypher +MATCH (c:City) +WHERE c.name = 'London' +SET c.population = 8900000, c.country = 'United Kingdom' +RETURN c; +``` + +#### Creating and updating node labels + +The `SET` clause can be used to create/update the label on a node. If the node has +a label, a new one will be added while the old one is left as is: + +```cypher +MATCH (c:City:Location) +SET c:City +RETURN labels(c); +``` + +#### Removing a property + +The `SET` clause can be used to remove the value of a property on a node or +relationship by setting it to `NULL`: + +```cypher +MATCH (c:City) +WHERE c.name = 'London' +SET c.country = NULL +RETURN c; +``` + +#### Copy all properties + +If `SET` is used to copy the properties of one node/relationship to another, all +the properties of the latter will be removed and replaced with the new ones: + +```cypher +CREATE (p1:Person {name: 'Harry'}), (p2:Person {name: 'Anna'}) +SET p1 = p2 +RETURN p1, p2; +``` + +#### Bulk update + +You can use `SET` clause to do a bulk update. 
Here is an example of how to +increment everyone's age by 1: + +```cypher +MATCH (n:Person) SET n.age = n.age + 1; +``` + +## Delete data + +### DELETE + +This clause is used to delete nodes and relationships from the database. + +For example, removing all relationships of a single type: + +```cypher +MATCH ()-[relationship :type]-() DELETE relationship; +``` + +When testing the database, you often want to have a clean start by deleting +every node and relationship in the database. It is reasonable that deleting each node +should delete all relationships coming into or out of that node. + +```cypher +MATCH (node) DELETE node; +``` + +But, Cypher prevents accidental deletion of relationships. Therefore, the above +query will report an error. Instead, you need to use the `DETACH` keyword, which +will remove relationships from a node you are deleting. The following should work and +*delete everything* in the database. + +```cypher +MATCH (node) DETACH DELETE node; +``` + +More examples are available [here](./clauses/delete.md). + +### REMOVE + +The `REMOVE` clause is used to remove labels and properties from nodes and +relationships: + +```cypher +MATCH (n:WrongLabel) REMOVE n:WrongLabel, n.property; +``` \ No newline at end of file diff --git a/docs2/release-notes.md b/docs2/release-notes.md new file mode 100644 index 00000000000..5bc5d90d99f --- /dev/null +++ b/docs2/release-notes.md @@ -0,0 +1,1896 @@ +# Release notes + +import VideoBySide from '@site/src/components/VideoBySide'; + +## Memgraph v2.9 - Jul 21, 2023 + +:::caution + +Memgraph 2.9 introduced a new configuration flag +`--replication-restore-state-on-startup` which is `false` by default. + +If you want instances to remember their role and configuration in a replication +cluster upon restart, the `--replication-restore-state-on-startup` needs to be +set to `true` when first initializing the instances and remain `true` throughout +the instances' lifetime. 
+ +When reinstating a cluster it is advised to first initialize the MAIN +instance, then the REPLICA instances. + +::: + +### New features and improvements + +- The new [`ON_DISK_TRANSACTIONAL` storage + mode](/reference-guide/storage-modes.md) allows you to store data on disk + rather than in-memory. Check the implementation and implications in the + reference guide. [#850](https://github.com/memgraph/memgraph/pull/850) +- Memgraph now works with all Bolt v5.2 drivers. + [#938](https://github.com/memgraph/memgraph/pull/938) +- The [LOAD CSV clause](/import-data/files/load-csv-clause.md) has several new improvements: + - You can now import data from web-hosted CSV files by passing the URL as a + file location. You can also import files compressed with `gzip` or `bzip2` + algorithms. [#1027](https://github.com/memgraph/memgraph/pull/1027) + - To speed up the execution of the `LOAD CSV` clause, you can add `MATCH` and + `MERGE` entities prior to reading the rows from a CSV file. But, the `MATCH` + or `MERGE` clause has to return just one row or Memgraph will throw an + exception. [#916](https://github.com/memgraph/memgraph/pull/916) + - If a certain sequence of characters in a CSV file needs to be imported as + null, you can now specify them with the NULLIF option of the LOAD CSV + clause. [#914](https://github.com/memgraph/memgraph/pull/914) +- You can now use `mgp::Type::Any` while developing a custom query procedure + with [the C++ + API](/reference-guide/query-modules/implement-custom-query-modules/api/cpp-api.md) + to specify that the argument of the procedure can be of any type. + [#982](https://github.com/memgraph/memgraph/pull/982) +- When you need to differentiate transactions, you can now define and pass + [transaction metadata](/reference-guide/transactions.md) via the client and check it in Memgraph by running the + `SHOW TRANSACTIONS;` query. 
+ [#945](https://github.com/memgraph/memgraph/pull/945) +- You can now create [custom batch procedures in Python and + C++](/reference-guide/query-modules/implement-custom-query-modules/overview.md) + that process data in batches, thus consuming less memory. +[#964](https://github.com/memgraph/memgraph/pull/964) +- The [`ANALYZE GRAPH;` query](/reference-guide/analyze-graph.md) now includes information about the degree of all + nodes to enhance the MERGE optimizations on supernodes. + [#1026](https://github.com/memgraph/memgraph/pull/1026) +- The `--replication-restore-state-on-startup` configuration flag allows you to + define whether instances in the [replication + cluster](/reference-guide/replication.md) will regain their roles upon restart + (`true`) or restart as disconnected "blank" MAIN instances (default setting + `false`). This flag resolved the unwanted behavior of restarted REPLICA + instances disconnecting from the cluster, but it also needs to be introduced + to MAIN instances so they are not disconnected from the cluster upon restart. + [#791](https://github.com/memgraph/memgraph/pull/791) + +### Bug fixes + +- `init-file` and `init-data-file` configuration flags allow the execution of + queries from a CYPHERL file, prior to or immediately after the Bolt server + starts and are now possible to configure in the Community Edition as well. + [#850](https://github.com/memgraph/memgraph/pull/850) +- The IN_MEMORY_ANALYTICAL storage mode now deallocates memory as expected and + no longer consumes memory excessively. [#1025](https://github.com/memgraph/memgraph/pull/1025) +- When no values are returned from a map, a null is returned instead of an + exception occurring. [#1039](https://github.com/memgraph/memgraph/pull/1039) + + + +## MAGE v1.8.0 - Jul 21, 2023 + +### Features and improvements + +- With the [`llm_util` module](/query-modules/python/llm-util.md) you can generate a graph schema in a format best + suited for large language models (LLMs). 
[#225](https://github.com/memgraph/mage/pull/225)
- When executing complex queries, the [`periodic`
  module](/query-modules/cpp/periodic.md) allows batching results from one query
  into another to improve execution time.
  [#221](https://github.com/memgraph/mage/pull/221)
- The [`conditional_execution`
  module](/query-modules/cpp/conditional-execution.md), which allows the
  execution of different queries depending on certain conditions being met, has
  been rewritten from Python to C++ to improve performance, and it can now also
  iterate periodically. [#222](https://github.com/memgraph/mage/pull/222)
- The [`migrate` module](/query-modules/python/migrate.md) has the option to get data from MySQL, SQL Server, or
  Oracle DB and migrate it to Memgraph.
  [#209](https://github.com/memgraph/mage/pull/209)

## Memgraph Lab v2.7.1 - Jul 05, 2023

### Improvements

- The System Default style has been renamed to System Style.
- If you run a query that has errors in the Graph Style Script code, you can
  decide to run it using the System Style.

### Bug fixes

- A bug that would allow multiple styles to be the default has been fixed.
- The System Default Style now has `background-color` set to white.
- Queries selected in the Query Editor now execute as expected.
- Creating and editing a query module, as well as selecting a transformation
  module in the Streams section, now work as expected.
- All links now point to the appropriate external resources.
- The pop-up window in the Run History that allows rerunning a query now
  closes once an option is selected.

## Memgraph Lab v2.7.0 - Jun 28, 2023

### What's new

- Now you can adjust the following settings:
  - Code completion and automatic graph rendering limits
  - The capacity of the run history and its clearing
  - The limit for visible logs


- The new interfaces for managing saved styles enable searching and changing
  the default style in the Lab. The saved styles now also have a preview.

- The run history now also tracks changes to the query, style, or both. You can
  also filter records to show All (both query runs and applied styles),
  Query history (only query runs), and Style history (only applied style
  changes). You can expand both the query and style to see the full Cypher or
  GSS code.



- Queries inside a collection can be expanded and collapsed by clicking on their
  name.
- When testing and trying out different functions in the GSS, you can use
  single-line (`// comment`) and multi-line (`/* comment */`) comments in the
  GSS code editor without losing the previous state.
- Change the canvas color of the graph view with the new property
  `background-color` in [`@ViewStyle`](/style-script/gss-viewstyle-directive.md).
- Change the stack order of how nodes and edges are rendered in the graph view
  with the property `z-index` in the
  [`@NodeStyle`](/style-script/gss-nodestyle-directive.md) and
  [`@EdgeStyle`](/style-script/gss-edgestyle-directive.md) directives. It works
  the same as the CSS `z-index` property.
- Set up transparent colors with the [new GSS
  functions](/style-script/gss-functions.md) `RGBA` and `HSLA`. You can also get
  the transparency value with the function `Alpha`.
- [New functions](/style-script/gss-functions.md) allow more customizations: `Sort`, `Coalesce`, `Reverse`,
  `IsMap`, `AsMap`, `Execute`, `Get`, `Set`, `Del`, `MapKeys`, `MapValues`,
  `AsIterator`, `IsIterator`, and `Next`.
- [Global and local
  variables](/memgraph-lab/graph-style-script-language#caching-results-for-faster-performance)
  make developing new styles easier:
  - The `graph` variable is now available outside the `@NodeStyle` and
    `@EdgeStyle` context
  - Local variables can be defined with `Define` within the `@NodeStyle` and
    `@EdgeStyle` context
- Memgraph Lab is now packaged as an RPM package and as an arm64 (M1 chip)
  build for macOS.
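
The new styling options above can be combined in a short GSS sketch. The directive, property, and function names (`@ViewStyle`, `@NodeStyle`, `@EdgeStyle`, `background-color`, `z-index`, `RGBA`) come from this release and the linked docs; the specific color and `z-index` values are only illustrative:

```
// Dark canvas behind the graph view
@ViewStyle {
  background-color: #1d1d2b
}

// Semi-transparent nodes drawn above edges
@NodeStyle {
  color: RGBA(255, 140, 0, 0.6)
  z-index: 2
}

@EdgeStyle {
  z-index: 1
}
```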

### Bug fixes

- Running a selected part of the Cypher query would place just that selected
  part in the run history. Now, the full query is saved in the run history,
  and on its rerun, only the selected part will be executed again.
- Rows that would disappear when scrolling the data results view are now
  displayed as expected.
- The System Default style now colors all the nodes with the same label with a
  unique color.
- When showing a graph view on a map, you will no longer see a progress
  percentage, which is unnecessary because each node has a fixed and known
  position due to its latitude and longitude values.
- All tables across the Lab are responsive as expected.
- Layouts no longer cause memory leaks and work as expected.
- A bug has been fixed so you can now successfully connect to Memgraph using a
  hostname that contains numbers in the top-level domain.
- Markdown lists in query descriptions are indented as expected.

## Memgraph v2.8 - May 18, 2023

### New features and improvements

- Data recovery is now up to 6x faster depending on the number of available
  cores, as [snapshot loading is distributed among several
  threads](/memgraph/reference-guide/backup#snapshots).
  [#868](https://github.com/memgraph/memgraph/pull/868)
- During recovery, indexes can also be created using multiple threads, thus
  speeding up the process. [#882](https://github.com/memgraph/memgraph/pull/882)
- In the Enterprise Edition, Memgraph now [exposes system
  metrics](/reference-guide/exposing-system-metrics.md) via an HTTP endpoint, so
  you can get information about transactions, query latency, and various other
  properties. [#940](https://github.com/memgraph/memgraph/pull/940)
- It’s now possible to use the [map projection
  syntax](/memgraph/reference-guide/data-types#maps) to create maps. Map
  projections are convenient for building maps based on existing values, and they
  are used by data access tools like GraphQL.
[#892](https://github.com/memgraph/memgraph/pull/892)
- You can now check if [the data directory](/reference-guide/backup.md) is
  (un)locked with the `DATA DIRECTORY LOCK STATUS;` query.
  [#933](https://github.com/memgraph/memgraph/pull/933)
- You can now check the current [storage
  mode](/reference-guide/storage-modes.md) and [isolation
  levels](/reference-guide/transactions.md) by running the `SHOW STORAGE INFO;`
  query. [#883](https://github.com/memgraph/memgraph/pull/883)
- Check the suspected [build type of the Memgraph
  executable](/memgraph/reference-guide/server-stats#build-information) by
  running the `SHOW BUILD INFO;` query.
  [#894](https://github.com/memgraph/memgraph/pull/894)
- Performance has been improved by optimizing the deallocation of resources in
  Memgraph's custom `PoolResource` memory allocator.
  [#898](https://github.com/memgraph/memgraph/pull/898)



### Bug fixes

- Running Python procedures now consumes less memory.
  [#932](https://github.com/memgraph/memgraph/pull/932)
- Memory allocation in LOAD CSV queries has been optimized to avoid performance
  degradation. [#877](https://github.com/memgraph/memgraph/pull/877)
- Query profiles of LOAD CSV queries now show the correct values of memory
  usage. [#885](https://github.com/memgraph/memgraph/pull/885)



## Memgraph Lab v2.6.0 - Apr 20, 2023

### What's new

- If you execute multiple Cypher queries, you can now view the result of each
  query instead of viewing just the last result.
  ![results](./data/lab-260/results.png)
- Besides exporting query results to JSON, you can also export them to the CSV
  and TSV file formats.
  ![download](./data/lab-260/download.png)
- If the dataset contains millions or billions of nodes and relationships, their
  count in the status bar will be in the following format: X.XXM or X.XXB.
- Syntax of code blocks in the query collection description can now be
  highlighted by using one of the following language
  styles: `cypher`, `bash`, `python`, `css`, `c`, `cpp`, `json`, `sql`,
  and `yaml`. Check the examples of syntax highlighting in the [Markdown
  Guide](https://www.markdownguide.org/extended-syntax/#syntax-highlighting).
  ![markdown](./data/lab-260/markdown.png)
- [New functions of the Graph Style Script
  language](/style-script/gss-functions.md) used for customizing graph appearance
  are: `Reduce`, `Sum`, `Avg`, `Min`, `Max`, `IsArray`, `Hue`, `Saturation`,
  `Lightness`, and `HSL`.
  ![gss](./data/lab-260/gss.png)

### Bug fixes

- The initial node count has been removed from the connection initialization, so
  connecting to a Memgraph instance containing a huge number of nodes will no
  longer cause a timeout.
- Run History now logs queries as expected.
- When switching between the map view and the default view, the graph view no
  longer becomes unresponsive.
- Using a newline character `\n` in the query module no longer results in a new
  line but in the explicit character `\n`.
- Viewing the code of multiple query modules in the split screen now
  works as expected.
- Notifications no longer mix with the Query Editor and Query Collections visual
  elements.
- Pressing CMD/CTRL + S will save a query within a query collection execution
  section as intended.
- The autosave in query collections is now triggered on every query run, as
  intended.
- Running a selected portion of the query won’t remove the rest of the query
  from the query collection execution view.
- The GSS `Blue` function was returning a wrong value; this has been fixed. The
  `Lighter` and `Darker` functions, which depend on the output of the `Blue`
  function, now work correctly as well.
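
As a quick illustration, the new color functions could appear in a node style like this. This is only a sketch: the `HSL` function name is from the list above, while the argument values and their assumed CSS-like ranges (hue in degrees, saturation and lightness as fractions) are illustrative:

```
@NodeStyle {
  // HSL(hue, saturation, lightness); values here are illustrative
  color: HSL(210, 0.7, 0.5)
}
```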

## Memgraph v2.7 - Apr 5, 2023

### New features and improvements

- You can now choose between [two different storage modes](/reference-guide/storage-modes.md):
  - Transactional mode - the default database mode that favors
    strongly-consistent ACID transactions using WAL files and periodic
    snapshots, but requires more time and resources during data import and
    analysis.
  - Analytical mode - speeds up import and data analysis but offers no ACID
    guarantees besides manually created snapshots.
  Switch between modes within the session using the `STORAGE MODE
  IN_MEMORY_{TRANSACTIONAL|ANALYTICAL};` query. [#772](https://github.com/memgraph/memgraph/pull/772)
- You can now call [subqueries](/cypher-manual/clauses/call) inside existing queries using the CALL clause.
  [#794](https://github.com/memgraph/memgraph/pull/794)
- When you want to filter data using properties that all have label:property
  indexes set, you can make Memgraph analyze the properties on all or several
  labels with the [`ANALYZE GRAPH;`
  query](/memgraph/reference-guide/indexing#analyze-graph). By calculating the
  distribution of property values, Memgraph will be able to select the optimal
  index for the query, and the query will execute faster.
  [#812](https://github.com/memgraph/memgraph/pull/812)
- If, for example, you are no longer interested in the results of the query you
  ran, or the procedure you built is running in an infinite loop, you can [stop
  the transaction](/memgraph/reference-guide/transactions#managing-transactions)
  with the `TERMINATE TRANSACTIONS tid;` query. Find out the transaction ID with
  the `SHOW TRANSACTIONS;` query.
  [#790](https://github.com/memgraph/memgraph/pull/790)
- With the [new flag](/memgraph/reference-guide/configuration#other)
  `password-encryption-algorithm` you can choose between the `bcrypt`, `sha256`, and
  `sha256-multiple` encryption algorithms. SHA256 offers better performance
  than the more secure bcrypt.
[#839](https://github.com/memgraph/memgraph/pull/839)
- Import using the [LOAD CSV clause](/import-data/files/load-csv-clause.md) has
  been further improved by using a memory allocator which reuses memory
  blocks allocated while processing the `LOAD CSV` query.
  [#825](https://github.com/memgraph/memgraph/pull/825)



### Bug fixes

- Users who have [global visibility on the
  graph](/memgraph/reference-guide/security#label-based-access-control) will
  experience a slight improvement in performance regarding label-based access
  control, as the engine no longer checks privileges for each node.
  [#837](https://github.com/memgraph/memgraph/pull/837)
- The [All shortest paths
  algorithm](/memgraph/reference-guide/built-in-graph-algorithms#all-shortest-paths)
  now supports multiedges. [#832](https://github.com/memgraph/memgraph/pull/832)

## MAGE v1.7.0 - Apr 5, 2023

### Features and improvements

- The [new conditional execution
  module](/query-modules/cpp/conditional-execution.md) lets you define
  conditions not expressible in Cypher and use them to control query
  execution. [#203](https://github.com/memgraph/mage/pull/203)

## MAGE v1.6.1 - Mar 20, 2023

### Features and improvements

- With the `export_util.csv_query()` procedure, you can export query results to a CSV file or as a stream. [#199](https://github.com/memgraph/mage/pull/199)
- Similarity algorithms (`jaccard`, `overlap`, and `cosine`) have been rewritten in C++ to improve performance. [#196](https://github.com/memgraph/mage/pull/196)

## Memgraph Lab v2.5.0 - Mar 17, 2023

### What's New

* If there are several Cypher queries in the query editor, you can select a single query and run
  it without commenting out all the other queries.

![multiple-query-executions-in-the-lab](./data/lab-multiple-editors-demo.gif)

* Query modules are now sorted alphabetically for easier and faster browsing. A search box has also been added to query modules with more
  than 5 procedures to help narrow them down (e.g. the `nxalg` query module has [49 procedures](https://memgraph.com/docs/mage/query-modules/python/nxalg)).
* When rendering a graph with more than 3,500 nodes or 8,500 relationships, which might take a considerable amount of time to preview, you will be
  asked if you want to proceed with the graph visualization or switch to the data view.
* Besides manually saving changes in the Cypher query and GSS style editor in the query collections section, they will also be saved
  automatically after each query run.
* Memgraph Lab will now notify you of any product updates and offer various tips and tricks for using the Memgraph ecosystem.

### Bug Fixes

* Cypher code suggestions can now handle labels and properties of 250k nodes and 500k relationships, compared to the previous limit of
  100k nodes and 200k relationships.
* Multiple scrollable elements of the query collections were making scrolling difficult. Now you can focus on a particular element and
  scroll through it by clicking on it.
* The browser's back button now works as expected when using Lab as a web application.
* Data in the query results, query modules, and query run history tables now loads faster, making the scrolling smoother and improving
  the user experience.
* The graph schema is now generated even if the database has no relationships.
* In-progress feedback when generating a graph schema and exporting datasets for graphs with more than 10M nodes
  is now previewed as expected.
* A scrolling issue with expanded results in the Data view, where you couldn't see a completely expanded row because the
  scroll would jump to the next row, is now fixed.
* Dataset cards no longer spread apart when conducting a search.

## Memgraph v2.6 - Mar 07, 2023

### Major features and improvements

- Importing speed using the LOAD CSV clause has been improved due to two changes:
  - Performance improvement in accessing values from large arrays or maps with numerous properties. [#774](https://github.com/memgraph/memgraph/pull/774)
  - Upon creating a large number of node or relationship properties, properties are stored in the property store all at once, instead of individually. [#788](https://github.com/memgraph/memgraph/pull/788)
- The newly implemented `exists()` function allows using patterns as part of the filtering clause. Check the [Cypher Manual](/cypher-manual/clauses/where) for usage. [#818](https://github.com/memgraph/memgraph/pull/818)
- With the new [Python mock query module API](/reference-guide/query-modules/implement-custom-query-modules/api/mock-python-api.md), you can now develop and test Python query modules for Memgraph without having to run a Memgraph instance. The mock API is compatible with the Python API, so developed modules can be added to Memgraph as-is. [#757](https://github.com/memgraph/memgraph/pull/757)
- Memgraph now supports Fedora 36 and Ubuntu 22.04 for ARM. [#787](https://github.com/memgraph/memgraph/pull/787) [#810](https://github.com/memgraph/memgraph/pull/810)

### Bug fixes

- `torch` and `igraph` can no longer be removed from the `sys.modules` cache, avoiding issues after a reload. [#720](https://github.com/memgraph/memgraph/pull/720)
- Newly created nodes now comply with the set label-based authorization rules. [#755](https://github.com/memgraph/memgraph/pull/755)
- Constructing LocalDateTime objects with invalid parameters doesn’t crash Memgraph anymore but throws an informative exception. [#819](https://github.com/memgraph/memgraph/pull/819)
- The error message warning about an incompatible `epoch_id` between a MAIN and a REPLICA instance has been improved.
[#786](https://github.com/memgraph/memgraph/pull/786)

## MAGE v1.6 - Jan 30, 2023

### Major Features and Improvements

- The `setup` script now halts if the build fails on the C++ or Rust side. [#194](https://github.com/memgraph/mage/pull/194)
- With the [`meta_util.schema()` procedure](/query-modules/python/meta-util.md), you can generate a graph schema as a graph result. [#187](https://github.com/memgraph/mage/pull/187)
- The repeated execution of the `single` method has been sped up by rewriting [the distance calculator](/query-modules/cpp/distance-calculator.md) from Python to C++. [#191](https://github.com/memgraph/mage/pull/191)
- [Dynamic graph analytics](/algorithms/dynamic-graph-analytics/betweenness-centrality-online-algorithm.md) have been ported to C++ to improve performance. [#182](https://github.com/memgraph/mage/pull/182)
- The [new `elastic_search_serialization` module](/query-modules/python/elasticsearch-synchronization.md) enables developers to serialize Memgraph into an Elasticsearch instance using basic authentication. [#170](https://github.com/memgraph/mage/pull/170)

## Memgraph v2.5.2 - Jan 26, 2023

### Bug Fixes

- Variables can be used inside nested [FOREACH clauses](/cypher-manual/extension-clauses). [#725](https://github.com/memgraph/memgraph/pull/725)
- The [FOREACH clause](/cypher-manual/extension-clauses) can now use indexes if needed (e.g. in the case of MERGE). [#736](https://github.com/memgraph/memgraph/pull/736)
- The [C++ API](/reference-guide/query-modules/implement-custom-query-modules/api/cpp-api.md) now allows setting and getting node and relationship properties. [#732](https://github.com/memgraph/memgraph/pull/732)
- [OPTIONAL MATCH](/cypher-manual/clauses/optional-match) can now use label property indexes that reference the previously matched variables.
[#736](https://github.com/memgraph/memgraph/pull/736)
- Iterating over all relationships in a graph now works as expected, as well as checking whether the graph contains a given relationship. [#743](https://github.com/memgraph/memgraph/pull/743)
- The implementation of the [All Shortest Paths algorithm](/memgraph/reference-guide/built-in-graph-algorithms#all-shortest-paths) was fixed so the paths are no longer duplicated when the upper bound is used. [#737](https://github.com/memgraph/memgraph/pull/737)

## MAGE v1.5.1 - Jan 20, 2023

### Major Features and Improvements

- The version of MemgraphDB that will be used in the Docker image has been updated to 2.5.1.
  [#193](https://github.com/memgraph/mage/pull/193)

## Memgraph v2.5.1 - Jan 19, 2023

### Bug Fixes

- The LOAD CSV clause now uses less RAM to load a whole CSV file. The
  modification made to improve the LOAD CSV operation also reduced the high
  memory usage of operations with objects such as lists and maps. Modifying or
  accessing elements inside those objects now also uses less RAM.
  [#712](https://github.com/memgraph/memgraph/pull/712)
- The logic of the `read_write_type_checker` was corrected so queries now get
  the right `rw_type`, making the replication system work as expected.
  [#709](https://github.com/memgraph/memgraph/pull/709)
- The Bolt protocol has been improved by adding the server-assigned query ID (`qid`)
  as part of the transactions' metadata.
  [#721](https://github.com/memgraph/memgraph/pull/721)
- Fixed a trigger bug that would cause an error if Memgraph is configured to run
  without properties on edges. As a result of the fix, triggers are now working
  as expected when there are no properties on edges.
[#717](https://github.com/memgraph/memgraph/pull/717)

## MAGE v1.5 - Dec 20, 2022

### Major Features and Improvements

- Starting from a certain node, you can now find ancestors (all the nodes from which a path exists) and descendants (all nodes to which a path exists), topologically sort a directed acyclic graph so that a node which appears before others comes first, return a subgraph from nodes using the `connect_nodes` method, and create relationships between nodes in a list using the `chain_nodes` method.
  [#180](https://github.com/memgraph/mage/pull/180)
- The C++ API is now aligned with Memgraph 2.5.
  [#184](https://github.com/memgraph/mage/pull/184)
- Graph Coloring no longer outputs strings but vertices and integers. This allows you to use the result of graph coloring directly in Memgraph Lab.
  [#177](https://github.com/memgraph/mage/pull/177)

### Bug Fixes
- By enabling module reset, you can now train and evaluate the model without shutting down the database.
  Class labels can now start from 0 or negative numbers.
  The low limit of the early stopping flag no longer prematurely stops the training of the model while running the Node classification module.
  [#173](https://github.com/memgraph/mage/pull/173)

## Memgraph v2.5.0 - Dec 13, 2022

### Major Features and Improvements

- The `DISTINCT` operator can now be used with aggregation functions. Until now, if
  you wanted to use an aggregation function with distinct values, you had to
  write a query similar to `WITH DISTINCT n.prop as distinct_prop
  RETURN COUNT(distinct_prop)`. Now you can use the `DISTINCT` operator as in
  the following query: `RETURN COUNT(DISTINCT n.prop)`.
[#665](https://github.com/memgraph/memgraph/pull/665)
- You can now create a user before the Bolt server starts using the environment
  variables `MEMGRAPH_USER` for the username, `MEMGRAPH_PASSWORD` for the
  password, and `MEMGRAPH_PASSFILE` for a file that contains the username and
  password for creating the user in the following format: `username:password`.
  [#696](https://github.com/memgraph/memgraph/pull/696)
- With the new configuration flag `init_file` you can execute queries from a
  CYPHERL file which need to be executed before the Bolt server starts, and with
  the configuration flag `init_data_file` you can execute queries from a
  CYPHERL file immediately after the Bolt server starts.
  [#696](https://github.com/memgraph/memgraph/pull/696)

### Bug Fixes

- Constructors and assignment operators in the C++ query modules API now work as
  expected, and the API type check in the `ValueNumeric` method now correctly
  recognizes numeric types.
  [#688](https://github.com/memgraph/memgraph/pull/688)
- Error message support (`SetErrorMessage`) has been added to query methods that
  use the MAGE C++ API. [#688](https://github.com/memgraph/memgraph/pull/688)
- The `EmptyResult` sink operator was added to Memgraph's planner. This
  means that results produced by a query such as `MATCH (n) SET n.test_prop = 2`
  will get exhausted, which was a problem in some Bolt client implementations,
  e.g. in Golang's client. [#667](https://github.com/memgraph/memgraph/pull/667)
- Fixed Python submodule reloading when calling `CALL mg.load()` and `CALL
  mg.load_all()`. Before, only the Python module would be reloaded, but now all
  dependencies get reloaded as well. This includes Python's utility submodules
  and Python packages, which means that the environment with Python packages can
  be changed without turning off the database.
[#653](https://github.com/memgraph/memgraph/pull/653)

## Memgraph Lab v2.4.0 - Dec 2, 2022

### What's New

* Memgraph Lab now supports manual transaction workflows you can construct using the transaction commands `BEGIN`, `COMMIT`, and `ROLLBACK`.
* Cypher intellisense has been updated to suggest new Cypher features from Memgraph 2.4.0, such as:
  * Privileges for user-role authorization.
  * Commands and privileges for label-based authorization.
  * Manual transaction commands: `BEGIN`, `COMMIT`, `ROLLBACK`.
  * Checking configuration with `SHOW CONFIG`.
  * The all shortest paths algorithm `allShortest`.
  * The graph projection function `project`.
  * An additional query module signature that accepts a projected graph as an optional first argument.
* The graph results view will check for nodes and relationships in arrays and projected graphs. It simplifies
  the visualization of a projected graph or an array of nodes/relationships without using `UNWIND`.

### Bug Fixes

* Once the table results view is selected, the results of the following query run will also preview in the table results view, instead of automatically switching to the graph view.
* Exploring a dataset's query collection now works as expected. It opens up a list of queries that can be used to explore the dataset.
* Failed queries from the rich collections now return a detailed error message.
* The _Save code changes_ button in rich collections will now be enabled only if there are unsaved changes to the Cypher query
  or GSS.
* A bug that would only show the first node label instead of all node labels in the table results view has been fixed.

## MAGE v1.4 - Nov 15, 2022

### Major Features and Improvements

- Implemented Link prediction with [DGL](https://www.dgl.ai/).
  [#160](https://github.com/memgraph/mage/pull/160)
- Implemented Node classification with PyTorch.
  [#161](https://github.com/memgraph/mage/pull/161)
- Added igraph support.
[#150](https://github.com/memgraph/mage/pull/150)
- Added the _k_-means embedding clustering algorithm.
  [#105](https://github.com/memgraph/mage/pull/105)
- Added better support for the C++ API.
  [#174](https://github.com/memgraph/mage/pull/174)

### Bug Fixes
- Enabled module reset so you can train and evaluate without shutting down the database, enabled working with class labels which don't start from 0, and fixed potential early stopping due to a low limit in the Node classification module.
  [#173](https://github.com/memgraph/mage/pull/173)

## Memgraph v2.4.2 - Nov 7, 2022

### Bug Fixes

- Fixed a bug when calling `AllShortestPath` with the `id` function.
  [#636](https://github.com/memgraph/memgraph/pull/636)
- Fixed a bug when iterating over the in-edges of a node.
  [#613](https://github.com/memgraph/memgraph/pull/613)

## Memgraph Lab v2.3.1 - Nov 4, 2022

### Bug Fixes

* Writing a single-line comment in the Cypher code no longer results in an error.
* Having different map tiles (e.g. a "light" map tile on one map view, but a "dark" map tile on another map view) for multiple graph map views in the rich collection is enabled and works as expected.
* A graph rendering freeze when toggling the map view on/off during the graph rendering process has been fixed.
* All the information about nodes and edges on the graph schema is now previewed as expected.
* A bug that would mix the query title and description when queries are reordered in the rich query collection has been fixed.
* A bug that would not reset the description field when adding a new query to the query collection has been fixed.
* Saving a new style now works as expected. The active style is saved, not the last applied one.

## Memgraph Lab v2.3.0 - Oct 24, 2022

### What's New

* Add new updates to the prepared datasets:
  * Add a search bar for searching and filtering datasets.
  * Add featured (highlighted) datasets.
  * Add rich collections with prepared queries, descriptions, and GSS for each dataset.
* Add new updates to the latest queries:
  * Change the name from "Latest queries" to "Run history" because it contains both queries and GSS changes.
  * Show GSS changes in the "Run history" section.
* Replace previous collections with "Rich collections":
  * Add more context to each collected query: title, markdown description, Cypher query, and GSS.
  * Add the ability to run multiple query executions within the query collection.
  * Add import and export functionality for a collection.
* Add a new version of GSS:
  * Add new GSS directive `@ViewStyle` to configure physics, link distance, repel force, and view type (`default` or `map`).
  * Add new GSS directive `@ViewStyle.Map` to configure map tiles for the map view.
  * Add new GSS functions: `Slice`, `Split`, `Replace`, `Trim`, `Nodes`, `Edges`, `IsNumber`, `IsBoolean`, `IsString`, `IsNull`.
  * Add new GSS node properties `latitude` and `longitude` used to define the latitude and longitude of each node for the map view.
* Integrate the graph visualization library `orb`.
* Add the ability to enable/disable the map background view for nodes with geo information.
* Add the ability to connect to Neo4j, load datasets, and run Cypher queries.

### Bug Fixes

* Fix the map view to use latitude and longitude from the GSS style instead of the `lat` and `lng` node properties.
* Fix the default GSS to match the new map view configuration by checking the `lat` and `lng` node properties.

## MAGE v1.3.2 - Oct 10, 2022

### Major Features and Improvements
- Allowed restricting community detection to subgraphs.
  [#152](https://github.com/memgraph/mage/pull/152)
- Implemented the degree centrality algorithm.
  [#162](https://github.com/memgraph/mage/pull/162)
- Updated the Memgraph version.
  [#171](https://github.com/memgraph/mage/pull/171)

### Bug Fixes
- Fixed a bug in dynamic betweenness centrality.
[#147](https://github.com/memgraph/mage/pull/147)

## Memgraph v2.4.1 - Oct 7, 2022

### Bug Fixes

- Fixed a bug when getting the EdgeType from an Edge object or the Label from a Vertex object
  in query modules. [#582](https://github.com/memgraph/memgraph/pull/582)
- Fixed a bug when changing role permissions for label-based authorization,
  caused by passing the user's `fine_grained_access_handler` instead of the role's.
  [#579](https://github.com/memgraph/memgraph/pull/579)

## Memgraph v2.4.0 - Sep 15, 2022

### Major Features and Improvements

- Add the replica state to the `SHOW REPLICAS` query.
  [#379](https://github.com/memgraph/memgraph/pull/379)
- Add `current_timestamp` and `number_of_timestamp_behind_master` to the `SHOW
  REPLICAS` query. [#412](https://github.com/memgraph/memgraph/pull/412)
- The query `REGISTER REPLICA replica_name SYNC` no longer supports the `TIMEOUT`
  parameter. To mimic the previous behavior of `REGISTER REPLICA replica_name
  SYNC WITH TIMEOUT 1`, one should use `REGISTER REPLICA replica_name ASYNC`
  instead. [#423](https://github.com/memgraph/memgraph/pull/423)
- Make behavior more [openCypher](http://opencypher.org/) compliant regarding
  checking against `NULL` values in `CASE` expressions.
  [#432](https://github.com/memgraph/memgraph/pull/432)
- Previously registered replicas are automatically registered on restart of
  Memgraph. [#415](https://github.com/memgraph/memgraph/pull/415)
- Add a new command `SHOW CONFIG` that returns the configuration of the currently
  running Memgraph instance.
  [#459](https://github.com/memgraph/memgraph/pull/459)
- Extend the shortest paths functionality with the [All Shortest
  Paths](/reference-guide/graph-algorithms.md#all-shortest-paths)
  query. [#409](https://github.com/memgraph/memgraph/pull/409)
- Extend the query modules C and Python APIs to enable logging on different
  levels. [#417](https://github.com/memgraph/memgraph/pull/417)
- Added the C++ query modules API.
Compared to the C API, the C++ API significantly simplifies the implementation of fast query modules. [#546](https://github.com/memgraph/memgraph/pull/546)
+- [Enterprise] Added support for label-based authorization. In addition to clause-based authorization rules, each user can now be granted `NOTHING`, `READ`, `UPDATE`, or `CREATE_DELETE` permission on a given label or edge type. [#484](https://github.com/memgraph/memgraph/pull/484)
+- The new Cypher function `project()` creates a projected graph consisting of nodes and edges from aggregated paths. Any query module or algorithm can now be run on a subgraph by passing the variable of the projected graph as the first argument of the query module procedure. [#535](https://github.com/memgraph/memgraph/pull/535)
+
+### Bug Fixes
+
+- Added a check to ensure two replicas cannot be registered on an identical endpoint. [#406](https://github.com/memgraph/memgraph/pull/406)
+- The `toString` function is now able to accept the `Date`, `LocalTime`, `LocalDateTime`, and `Duration` data types. [#429](https://github.com/memgraph/memgraph/pull/429)
+- Aggregation functions now return openCypher-compliant results on `null` input and display the correct behavior when grouped with other operators. [#448](https://github.com/memgraph/memgraph/pull/448)
+- Corrected inconsistencies and incorrect behavior with regard to sync replicas. For more detail about the behavior, please check [Under the hood view on replication](/under-the-hood/replication.md). [#435](https://github.com/memgraph/memgraph/pull/435)
+- Fixed handling of the `ROUTE` Bolt message. Memgraph didn't handle the fields of the `ROUTE` message properly, so the session might get stuck in a state where even the `RESET` message did not help. With this fix, sending a `RESET` message will properly reset the session.
+  [#475](https://github.com/memgraph/memgraph/pull/475)
+
+## Memgraph Lab v2.2.1 - Aug 12, 2022
+
+### What's New
+
+* Add improved and more precise progress reporting when importing built-in datasets.
+* Add an indicator for the total count of error log messages in the sidebar.
+* Change the color scheme of code snippets for query modules.
+* Add a help section shown while Lab's connection is reconnecting.
+* Add breadcrumbs for the layout titles.
+
+### Bug Fixes
+
+* Fix issues with query collections.
+* Fix vertical layout usability when the help sidebar is opened.
+* Fix various UI and UX issues across the application.
+* Fix query results on a reconnected connection.
+
+## Memgraph Lab v2.2.0 - Jul 15, 2022
+
+### What's New
+
+* Add a new table look and feel across the application: query results, the latest queries, modules, and streams.
+* Add a help section with relevant links, guides, and documentation search capability.
+* Add test parameters (batch size, timeout) for testing stream transformations.
+* Add new GSS functions: `Round`, `Floor`, and `Ceil`.
+
+### Bug Fixes
+
+* Fix various issues in graph view, streams, and query collections.
+
+## MAGE v1.3.1 - Jul 14, 2022
+
+### Major Features and Improvements
+- Updated the Memgraph version. [#154](https://github.com/memgraph/mage/pull/154)
+- Introduced E2E group testing. [#145](https://github.com/memgraph/mage/pull/145)
+
+## Memgraph Lab v2.1.2 - Jun 21, 2022
+
+### What's New
+
+* Add a dashboard and overview page for a better onboarding experience.
+* Add environment variables for query, module, and stream name length validator limits.
+* Add connection status messages in the logs view.
+
+### Bug Fixes
+
+* Fix several bugs with the stream configuration creation.
+* Fix showing the logs when connected to Memgraph via an encrypted SSL connection.
+
+## Memgraph v2.3.1 - Jun 23, 2022
+
+### Improvements
+
+- Updated the results returned by the [`CHECK STREAM`](/reference-guide/streams/overview.md#check-stream) query to group all queries/raw messages on a single line per batch. [#394](https://github.com/memgraph/memgraph/pull/394)
+- Added a frequent replica ping. The `main` instance checks the state of the replicas at a frequency controlled by `--replication-replica-check-delay-sec`. The check keeps the information about the state of each replica up to date from `main`'s point of view. [#380](https://github.com/memgraph/memgraph/pull/380)
+- Added `BATCH_LIMIT` and `TIMEOUT` options to the [`START STREAM`](/reference-guide/streams/overview.md#start-a-stream) query, which returns the raw messages received by the transformation. [#392](https://github.com/memgraph/memgraph/pull/392)
+
+### Bug Fixes
+
+- Fix the header of the `SHOW REPLICATION ROLE` query and wrong timeout info in the `SHOW REPLICAS` query. [#376](https://github.com/memgraph/memgraph/pull/376)
+- Fix the WebSocket connection with clients that do not use the binary protocol header. [#403](https://github.com/memgraph/memgraph/pull/403)
+- Fix SSL connection shutdown hanging. [#395](https://github.com/memgraph/memgraph/pull/395)
+- Fix module symbol loading with Python modules. [#335](https://github.com/memgraph/memgraph/pull/335)
+- Adapted a compilation flag so that the memory allocator uses JEMALLOC while counting allocated memory. [#401](https://github.com/memgraph/memgraph/pull/401)
+
+
+## Memgraph Lab v2.1.1 - May 27, 2022
+
+### What's New
+
+* Add tooltips and highlights throughout the application.
+
+### Bug Fixes
+
+* Fix several bugs with streams.
+
+## Memgraph Lab v2.1.0 - May 25, 2022
+
+### What's New
+
+* Add the ability to view, create, edit, start, stop, test, and remove streams.
+* Add a new connecting screen with the ability to set the monitoring (logs) port.
+* Add Cypher query persistence when closing/opening the Cypher query editor.
+* Add node label, relationship type, and node/relationship property Cypher code suggestions for small graphs (number of nodes < 100k and number of relationships < 200k).
+* Add module function Cypher code suggestions.
+* Add module support for adding functions, along with `mgp` suggestions and documentation.
+* Add new GSS graph functions: `InEdges`, `OutEdges`, `Edges`, `AdjacentNodes`, `StartNode`, `EndNode`, `NodeCount`, `EdgeCount`.
+* Add new GSS array functions: `RandomOf`, `Find`, `Filter`, `Map`, `All`, `Any`, `Uniq`.
+
+### Bug Fixes
+
+* Fix the UI for the GSS error messages.
+* Fix the Cypher code suggestion for modules with `.` in the namespace name.
+* Fix several bugs with query collections.
+* Fix the empty states across the application.
+* Fix the import progress bar.
+* Fix the graph schema for an empty database.
+* Fix the responsiveness across the application.
+* Add a maximum limit of five vertical layouts.
+* Fix the loading issue when running multiple Cypher queries at once.
+
+## MAGE v1.3 - May 23, 2022
+
+### Major Features and Improvements
+- Added integration between cuGraph and Memgraph. [#99](https://github.com/memgraph/mage/pull/99)
+
+### Bug Fixes
+- Fixed node deletion. [#141](https://github.com/memgraph/mage/pull/141)
+
+## Memgraph v2.3.0 - Apr 27, 2022
+
+### Major Features and Improvements
+
+- Added the [`FOREACH`](/cypher-manual/extension-clauses) clause. [#351](https://github.com/memgraph/memgraph/pull/351)
+- Added [Bolt over WebSocket](/connect-to-memgraph/websocket.md) support to Memgraph. [#384](https://github.com/memgraph/memgraph/pull/384)
+- Added [user-defined Memgraph magic functions](/cypher-manual/functions/#user-defined-memgraph-magic-functions).
+  [#345](https://github.com/memgraph/memgraph/pull/345)
+
+### Bug Fixes
+
+- Fixed incorrect loading of C query modules. [#387](https://github.com/memgraph/memgraph/pull/387)
+
+## Memgraph Lab v2.0.3 - Apr 27, 2022
+
+### Bug Fixes
+
+* Fix the creation of encrypted connections towards Memgraph.
+* Fix duplicate keywords in the Cypher and Python code suggestion tools.
+
+## Memgraph Lab v2.0.2 - Apr 22, 2022
+
+### Major Features and Improvements
+
+- Add guides for empty states throughout the app.
+- Add the ability to close hints for transformations and procedures in the module view.
+- Add the ability to download query results in JSON format.
+- Add a confirmation step for all delete actions throughout the app.
+- Add a generic Cypher query as a sample query after a custom dataset file import.
+
+### Bug Fixes
+
+- Fix the table view with better resize functionality throughout the app.
+- Change the color of the node labels and relationship types in the Cypher query editor.
+- Fix the delete query collection action.
+- Fix opening an external link in the browser instead of the Lab app.
+- Fix the initial render of the map for geo graph results.
+- Replace the toast message "Web socket stopped working" with a better notice in the "Logs" view.
+
+## MAGE v1.2 - Apr 20, 2022
+
+### Major Features and Improvements
+
+- Implemented Temporal graph networks. [#121](https://github.com/memgraph/mage/pull/121)
+- Implemented Dynamic Betweenness Centrality. [#127](https://github.com/memgraph/mage/pull/127)
+- Implemented Dynamic Katz Centrality. [#117](https://github.com/memgraph/mage/pull/117)
+- Implemented Louvain Community Detection. [#48](https://github.com/memgraph/mage/pull/48)
+- Implemented Maximum Flow. [#125](https://github.com/memgraph/mage/pull/125)
+- Implemented Static Katz Centrality. [#117](https://github.com/memgraph/mage/pull/117)
+- Added a utility Import/Export module (JSON).
+  [#100](https://github.com/memgraph/mage/pull/100)
+- Bumped the version of the Black formatter. [#132](https://github.com/memgraph/mage/pull/132)
+
+### Bug Fixes
+
+- Fixed IsSubset checking for unordered sets. [#135](https://github.com/memgraph/mage/pull/135)
+- Fixed continuous integration. [#133](https://github.com/memgraph/mage/pull/133)
+- Fixed E2E testing. [#128](https://github.com/memgraph/mage/pull/128)
+- Fixed the ID validity check. [#129](https://github.com/memgraph/mage/pull/129)
+
+## Memgraph Lab v2.0.1 - Apr 8, 2022
+
+### Major Features and Improvements
+
+- Add context (graph schema, description) to each dataset template.
+- Add an action to download query results.
+
+### Bug Fixes
+
+- Fix a bug when adding a query to the query collection.
+- Fix several typos and copy issues.
+- Fix the WebSocket connection issue for the manual Memgraph connect.
+- Fix initial code suggestions that depend on the Memgraph version.
+
+## Memgraph Lab v2.0.0 - Mar 31, 2022
+
+### Major Features and Improvements
+
+- Add horizontal and vertical layouts for custom layout configuration.
+- Add more query information in the latest queries: runtime, status, and number of results.
+- Add query collections to structure and save favorite queries.
+- Add better Cypher code suggestions for functions, modules, nodes, relationships, and properties.
+- Add Cypher code documentation on highlight.
+- Add Graph Style Script code suggestions for `@NodeStyle`, `@EdgeStyle`, properties, and functions.
+- Add Graph Style Script code documentation on highlight.
+- Add improved table views throughout the app.
+- Add a new rendering and simulation engine based on D3.js.
+- Add new rendering simulation options: collision, repel force, and link distance.
+- Remove the definition of query parameters when running a Cypher query with `$variable`.
+- Add a real-time logs view from Memgraph.
+- Add a status tray with connection status and main Memgraph metrics.
+- Add real-time connection status and automatic reconnect ability.
+- Add a new graph schema view with the distribution of present properties in nodes/relationships.
+- Add the ability to view, edit, remove, and change query modules.
+
+## Memgraph v2.2.1 - Mar 17, 2022
+
+### Bug Fixes
+
+- Added a CentOS 7 release by fixing the compatibility issue with the older version of SSL used on CentOS 7. [#361](https://github.com/memgraph/memgraph/pull/361)
+
+## Memgraph v2.2.0 - Feb 18, 2022
+
+### Major Features and Improvements
+
+- Added support for compilation on ARM architectures (aarch64) and Docker support for running Memgraph on Apple M1 machines. [#340](https://github.com/memgraph/memgraph/pull/340)
+- Added a [monitoring server](/reference-guide/monitoring-server.md) that forwards certain information from Memgraph (e.g., logs) to the clients connected to it using WebSocket. [#337](https://github.com/memgraph/memgraph/pull/337)
+- Added `CONFIGS` and `CREDENTIALS` options to [Kafka streams](/reference-guide/streams/overview.md/#kafka). [#328](https://github.com/memgraph/memgraph/pull/328)
+- Added [built-in procedures for handling Python module files](/reference-guide/query-modules/module-file-utilities.md). `mg.create_module_file`, `mg.update_module_file`, `mg.delete_module_file`, `mg.get_module_file`, and `mg.get_module_files` allow you to modify your Python module files, get their content, and list all the files present in your query module directories directly from Memgraph. [#330](https://github.com/memgraph/memgraph/pull/330)
+- The built-in procedures [`mg.procedures`](/mage/usage/loading-modules#utility-query-module) and [`mg.transformations`](/reference-guide/streams/transformation-modules/overview.md#utility-procedures-for-transformations) return additional information about the procedure and transformation scripts.
`path` returns an absolute path to the module file containing the procedure, while `is_editable` returns `true` if the file can be edited using Memgraph and `false` otherwise. [#310](https://github.com/memgraph/memgraph/pull/310)
+- [Added the `SHOW VERSION` query](/reference-guide/server-stats.md) that returns the version of the Memgraph server being queried. [#265](https://github.com/memgraph/memgraph/pull/265)
+
+### Bug Fixes
+
+- The reference count is increased when `Py_None` is returned from the `_mgp` module. This fixes a nondeterministic fatal Python error. [#320](https://github.com/memgraph/memgraph/pull/320)
+- Use the correct error when printing a warning in the rebalance callback of the Kafka consumer. [#321](https://github.com/memgraph/memgraph/pull/321)
+- Fix transaction handling in streams in case of a serialization error. Previously, a serialization error caused an exception to be thrown since nested transactions are not supported. After this fix, the transactions are handled correctly in the transaction retry logic. [#339](https://github.com/memgraph/memgraph/pull/339)
+- Instantiations of the temporal types `LocalTime` and `LocalDateTime` return subsecond precision. Additionally, the query module functions `mg_local_date_time_now()` and `mg_local_time_now()` also return subsecond precision. [#333](https://github.com/memgraph/memgraph/pull/333)
+
+## MAGE v1.1 - Dec 13, 2021
+
+### Major Features and Improvements
+
+- Updated rsmgp-sys to the new MGP API. [#78](https://github.com/memgraph/mage/pull/78)
+- Added temporal types to rsmgp-sys. [#82](https://github.com/memgraph/mage/pull/82)
+- Implemented node2vec. [#81](https://github.com/memgraph/mage/pull/81)
+- Updated the GraphView abstraction. [#85](https://github.com/memgraph/mage/pull/85)
+- Implemented approximate streaming PageRank. [#69](https://github.com/memgraph/mage/pull/69)
+- Implemented weighted graph methods built for dynamic community detection.
+  [#89](https://github.com/memgraph/mage/pull/89)
+- Implemented the LabelRankT dynamic community detection algorithm. [#66](https://github.com/memgraph/mage/pull/66)
+
+### Bug Fixes
+
+- Fixed a memory leak. [#77](https://github.com/memgraph/mage/pull/77)
+- Solved a dependency vulnerability. [#83](https://github.com/memgraph/mage/pull/83)
+- Fixed the `set_cover.greedy` result type bug. [#76](https://github.com/memgraph/mage/pull/76)
+- Fixed MAGE installation on Linux-based distributions. [#92](https://github.com/memgraph/mage/pull/92)
+
+## Memgraph v2.1.1 - Dec 07, 2021
+
+:::warning
+
+### Breaking Changes
+
+- Loading streams created by versions of Memgraph older than 2.1 is not possible. We suggest you extract the necessary information using the older version of Memgraph and recreate the streams in a newer version (Memgraph 2.1 and newer).
+
+:::
+
+### Major Features and Improvements
+
+- Added procedures for retrieving configuration information specific to each stream type. `mg.pulsar_stream_info` will return information about a specific Pulsar stream, and `mg.kafka_stream_info` will return information about a specific Kafka stream. [#301](https://github.com/memgraph/memgraph/pull/301)
+- `SHOW STREAMS` now returns default values for batch interval and batch size if they weren't specified. [#306](https://github.com/memgraph/memgraph/pull/306)
+
+### Bug Fixes
+
+- Query execution stats, returned after a Cypher query was executed, are now updated with the changes made in write procedures. [#304](https://github.com/memgraph/memgraph/pull/304)
+- Loading streams created by older versions won't cause Memgraph to crash. [#302](https://github.com/memgraph/memgraph/pull/302)
+
+## Memgraph Lab v1.3.6 - Dec 3, 2021
+
+### Bug Fixes
+
+* Fix a bug when returning edges: `Cannot read properties of undefined (reading 'push')`.
+
+## Memgraph v2.1.0 - Nov 22, 2021
+
+:::warning
+
+### Breaking Changes
+
+- Loading streams created by older versions causes Memgraph to crash. The only possible workaround involves **deleting the existing streams**. The streams can be deleted with the `DROP STREAM` query in the old versions of Memgraph. After upgrading to this version, the `streams` directory has to be deleted manually from Memgraph's data directory (on Debian-based systems, it is `/var/lib/memgraph` by default).
+- The query for creating a Kafka stream now requires the `KAFKA` keyword. The previous form `CREATE STREAM ...` was changed to `CREATE KAFKA STREAM ...`.
+
+:::
+
+### Major Features and Improvements
+
+- Now supporting Bolt protocol version 4.3. [#226](https://github.com/memgraph/memgraph/pull/226)
+- Streams support retrying conflicting transactions. When a message is processed from a certain stream source, a query is executed as part of a transaction. If that transaction fails because of other conflicting transactions, it is retried a set number of times. The number of retries and the interval between retries can be controlled with the configs `--stream-transaction-conflict-retries` and `--stream-transaction-retry-interval`. [#294](https://github.com/memgraph/memgraph/pull/294)
+- Added a procedure to configure the starting offset (to consume messages from) of a topic (and its partitions). [#282](https://github.com/memgraph/memgraph/pull/282)
+- Added the `BOOTSTRAP_SERVERS` option to `CREATE KAFKA STREAM`, which you can check [here](reference-guide/streams/overview.md). [#282](https://github.com/memgraph/memgraph/pull/282)
+- Added Bolt notifications in the query summary to inform the user about results or to give useful tips. When a query executes successfully, it is sometimes necessary to give users tips or extra information about the execution.
+  [#285](https://github.com/memgraph/memgraph/pull/285)
+- Added execution statistics in the query summary to inform the user how many objects were affected. E.g., when you run a query with a `CREATE` clause, you'll know how many nodes/edges were created by it. [#285](https://github.com/memgraph/memgraph/pull/285)
+- Added support for connecting to Pulsar as a new stream source. For more details, check out our [reference pages](reference-guide/streams). [#293](https://github.com/memgraph/memgraph/pull/293)
+
+### Bug Fixes
+
+- Allow duration values to be used as weights in the [Weighted Shortest Path](/memgraph/reference-guide/built-in-graph-algorithms#weighted-shortest-path) query. [#278](https://github.com/memgraph/memgraph/pull/278)
+- Fix a linkage error when `mgp_local_time_get_minute` is used. [#273](https://github.com/memgraph/memgraph/pull/273)
+- Fix a crash when temporal types are used with the `ORDER BY` clause. [#299](https://github.com/memgraph/memgraph/pull/299)
+
+## Memgraph Lab v1.3.5 - Nov 17, 2021
+
+### What's New
+
+* Add the new Cypher stream keywords from the Memgraph 2.1.0 release.
+
+### Bug Fixes
+
+* Fix the copy-to-clipboard bug so that new lines are kept.
+
+## Memgraph Lab v1.3.4 - Nov 15, 2021
+
+### What's New
+
+* Add quick connect for Memgraph running locally.
+* Add guides on how to install Memgraph locally.
+
+## Memgraph Lab v1.3.3 - Oct 22, 2021
+
+### Bug Fixes
+
+- Fixed the action of exporting the database to a `cypherl` file.
+- Added support for the temporal types in query responses.
+
+## Memgraph v2.0.1 - Oct 12, 2021
+
+### Major Features and Improvements
+
+- Updated the startup message with a link to the [getting started page](getting-started.md). [#259](https://github.com/memgraph/memgraph/pull/259)
+- Updated certain error and warning messages in the logs with links to the documentation explaining the problem in more detail.
+  [#243](https://github.com/memgraph/memgraph/pull/243)
+- Updated mgconsole to [v1.1.0](https://github.com/memgraph/mgconsole/releases/tag/v1.1.0). [#260](https://github.com/memgraph/memgraph/pull/260)
+
+### Bug Fixes
+
+- Graph updates made in the write procedures are now correctly registered in the triggers. [#262](https://github.com/memgraph/memgraph/pull/262)
+- Fixed `DETACH DELETE` interaction with the triggers. Previously, vertices deleted by `DETACH DELETE` would not be registered by triggers if only an `ON () DELETE` trigger existed. [#266](https://github.com/memgraph/memgraph/pull/266)
+
+## Memgraph v2.0.0 - Oct 5, 2021
+
+:::warning
+
+### Breaking Changes
+
+- Changed the `timestamp()` function to return `microseconds` instead of `milliseconds`.
+- Most of the query modules C API functions were changed to return an `mgp_error` as a more fine-grained way of error reporting. The only exceptions are the functions that free allocated memory (`mgp_free` and `mgp_global_free`) and destroy objects (`mgp_value_destroy`, `mgp_list_destroy`, etc.), which remain the same.
+- The first user created using the `CREATE USER` query will have all the privileges granted. Previously, you could've locked yourself out of Memgraph by creating a user and immediately disconnecting.
+
+:::
+
+### Major Features and Improvements
+
+- Added support for temporal types, a feature that allows the user to manipulate and store time-related data in the graph. For more information, take a look at the [reference guide](/reference-guide/data-types.md).
+- Added support for parameters with the `CREATE` clause in the following form: `CREATE (n $param)`.
+- Introduced settings that can be modified while Memgraph is running. You can check out more details [here](reference-guide/runtime-settings).
+- Added writeable procedure support, so [procedures](/reference-guide/query-modules/implement-custom-query-modules/custom-query-module-example.md) can modify the graph by creating and deleting vertices and edges, modifying the labels of vertices, or setting the properties of vertices and edges.
+
+### Bug Fixes
+
+- Fixed planning of queries with the `MERGE` clause. If a previously defined symbol is used as a property value inside the `MERGE` clause, the planner will correctly use the label-property index if present.
+- Unused memory is correctly returned to the OS when the `FREE MEMORY` query is used. Before, Memgraph would free up the memory internally and not return it to the OS. Because of that, Memgraph could allocate more memory from the OS than it's allowed.
+- Fixed recovery from durability files. Because of a wrong check, Memgraph could crash and leave the durability files in an invalid state, making recovery impossible.
+- Fixed usage of the `execute` keyword in queries. Because of the special way we handle the `EXECUTE` keyword in the `CREATE TRIGGER` query, using that same keyword in other contexts caused Memgraph to crash.
+
+## Memgraph Lab v1.3.2 - Oct 5, 2021
+
+### Bug Fixes
+
+- Fixed the copy-to-clipboard bug with removed spaces.
+- Updated the Cypher IntelliSense with the latest commands.
+
+## Memgraph Lab v1.3.1 - Sep 27, 2021
+
+### Major Features and Improvements
+
+- Signed the Memgraph Lab applications for macOS and Windows.
+
+### Bug Fixes
+
+- Fixed the paste overwrite action in the query editor.
+- Fixed the bug `Cannot read property 'class' of null`.
+
+## Memgraph v1.6.1 - Jul 24, 2021
+
+### Major Features and Improvements
+
+- Added proper privilege checks for queries executed by triggers and stream transformations.
+
+### Bug Fixes
+
+- Fixed error handling in streams to make restarting streams possible after failing.
The issue is caused by not rolling back the transaction in which the + query failed, so when the stream was restarted and tried to process the next + batch of messages it was still in a transaction, but it tried to start a new + one. Now the transaction is rolled back in case of any errors during query + execution, so a new transaction can be started during the processing of the + next batch of messages. + +## Memgraph v1.6.0 - Jul 7, 2021 + +:::warning + +### Breaking Changes + +- Changed the `LOCK_PATH` permission to `DURABILITY`. + +::: + +### Major Features and Improvements + +- Added support for consuming Kafka streams. You can connect Memgraph to a Kafka + cluster and run queries based on the messages received. The transformation + from Kafka to Cypher queries is done using **Transformation Modules**, a + concept similar to Query Modules. Using our Python and C API, you can easily + define functions that analyze Kafka messages and generate different queries + based on them. The stream connection can be configured, tested, stopped, + started, checked, and dropped. +- Introduced global allocators for Query Modules using C API, so the data can be + preserved between multiple runs of the same procedure. +- Introduced new isolation levels, `READ COMMITTED` and `READ_UNCOMMITTED`. The + isolation level can be set with a config. Also, you can set the isolation + level for a certain session or the next transaction. The names of the + isolation levels should be self-explanatory, unlike the `SNAPSHOT ISOLATION` + which is still the default isolation level. +- The query timeouts are now triggered using a different method. Before, we used + the TSC to measure the execution time. Unfortunately, this proved unreliable + for certain CPUs (AMD Ryzen 7 and M1), which caused queries to timeout almost + instantly. We switched to POSIX timer which **should** work on every hardware, + while not affecting the performance. 
+- Added a config, `allow-load-csv`, with which you can disable the `LOAD CSV` clause. `LOAD CSV` can read and display data from any file on the system, which could be insecure on some systems. Because of that, we added a config that allows you to disable the clause entirely.
+- Added the `CREATE SNAPSHOT` query. Snapshots are created every few minutes; using this query, you can trigger snapshot creation instantly.
+- Increased the default query timeout to 10 minutes. The previous default of 3 minutes proved too short, especially for queries that use `LOAD CSV` with a large dataset.
+
+### Bug Fixes
+
+- Fixed parsing of certain types in Query Modules using the Python API.
+- Fixed a concurrency bug for Query Modules using the Python API. Running the same procedure from multiple clients caused the Memgraph instance to crash.
+- Fixed restoring triggers that call procedures. Because the triggers were restored before the procedures, the trigger execution couldn't find the called procedure, which caused the restore to fail. Switching the order was enough to fix the problem.
+
+## Memgraph v1.5.0 - May 28, 2021
+
+### Major Features and Improvements
+
+- Added database triggers. You can now create, delete, and print out triggers that execute Cypher statements. You can create custom actions that run whenever a node or an edge is created, updated, or deleted. All the triggers are persisted on disk, so no information is lost between runs.
+- Replaced mg_client with the mgconsole command-line interface, which ships directly with Memgraph. You can now install mgconsole directly on Windows and macOS.
+
+### Bug Fixes
+
+- Fixed parsing of types for Python procedures for types nested in `mgp.List`. For example, parsing of `mgp.List[mgp.Map]` works now.
+- Fixed memory tracking issues. Some of the allocations and deallocations weren't tracked during query execution.
+- Fixed reading CSV files that use CRLF as the newline symbol.
+- Fixed permission issues for `LOAD CSV`, `FREE MEMORY`, `LOCK DATA DIRECTORY`, and replication queries.
+
+## Memgraph v1.4.0 - Apr 2, 2021
+
+:::warning
+
+### Breaking Changes
+
+- Changed the `MEMORY LIMIT num (KB|MB)` clause in procedure calls to `PROCEDURE MEMORY LIMIT num (KB|MB)`. The functionality is still the same.
+
+:::
+
+### Major Features and Improvements
+
+- Added replication to the community version.
+- Added support for multiple query module directories at the same time. You can now define multiple, comma-separated paths to directories from which the modules will be loaded using the `--query-modules-directory` flag.
+- Added support for programmatically reading in data from CSV files through the `LOAD CSV` clause. We support CSV files with and without a header, the supported dialect being Excel.
+- Added a new flag, `--memory-limit`, which enables the user to set the maximum total amount of memory Memgraph can allocate during its runtime.
+- Added the `FREE MEMORY` query, which tries to free unused memory chunks in different parts of storage.
+- Added the memory limit and the amount of currently allocated bytes to the result of the `SHOW STORAGE INFO` query.
+- Added `QUERY MEMORY LIMIT num (KB|MB)` to Cypher queries, which allows you to limit memory allocation for the entire query. It can be added only at the end of the entire Cypher query.
+- Added logs for the different parts of the recovery process. The `INFO`, `DEBUG`, and `TRACE` levels all contain additional information that is printed out while the recovery is in progress.
+
+### Bug Fixes
+
+- Fixed the garbage collector by correctly marking the oldest current timestamp after the database was recovered using the durability files.
+- Fixed reloading of modules with changed result names.
+- Fixed the profile query to show the correct name of the ScanAll operator variant.
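+
+As an illustration of the v1.4.0 additions, a minimal sketch combining `LOAD CSV` and `QUERY MEMORY LIMIT` might look like this (the file path, header names, and label are hypothetical):
+
+```cypher
+// Import rows from a headered CSV file (Excel dialect) as Person nodes,
+// capping the memory the whole query may allocate at 100 MB.
+LOAD CSV FROM "/import/people.csv" WITH HEADER AS row
+CREATE (:Person {name: row.name, age: toInteger(row.age)})
+QUERY MEMORY LIMIT 100 MB;
+```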
+
+## Memgraph Lab v1.3.0 - Feb 19, 2021
+
+### Major Features and Improvements
+
+- Added an option to show predefined datasets with the ability to import them into Memgraph.
+- Added an option to show a sample query for every loaded predefined dataset.
+- Added import of custom Cypher file datasets (`cypherl` format).
+- Added export of the current database state to a Cypher file (`cypherl` format).
+- Added a default node label in graph view if the name property is missing.
+- Added a default relationship type label in graph view for smaller graphs.
+
+### Bug Fixes and Other Changes
+
+- Fixed sidebar links in the browser Lab.
+- Fixed columns in the favorite queries view.
+- Fixed showing large numbers of properties in a popup when viewing node details in the graph view.
+- Fixed the label in the popup when switching between edges and nodes in the graph view.
+- Fixed the node count in the dashboard view.
+- Added descriptive and better error messages when connecting to Memgraph with encryption on/off.
+- Fixed the close button in a node popup in the graph view.
+- Fixed the spacing of the close button and relationship type in a relationship popup in the graph view.
+- Fixed storing physics and styles across multiple query runs.
+- Fixed initial positioning in graph view when running a query in the data view.
+- Fixed graph view reset when a query in the data view had no results to show.
+- Fixed the map disappearing when running a query multiple times in a row.
+- Fixed running multiple Lab instances of the application on Windows and Linux.
+- Fixed node size and spacing in graph view when showing smaller graphs.
+- Fixed transition state issues between graph view and data view.
+
+## Memgraph v1.3.0 - Jan 26, 2021
+
+:::warning
+
+### Breaking Changes
+
+- Added extra information in durability files to support replication, making them incompatible with the durability files generated by older versions of Memgraph.
Even though the replication is an Enterprise feature, the files are
+  compatible with the Community version.
+
+:::
+
+### Major Features and Improvements
+
+- Added support for data replication across a cluster of Memgraph instances.
+  Supported instance types are MAIN and REPLICA. Supported replication modes are
+  SYNC (all SYNC REPLICAS have to receive data before the MAIN can commit the
+  transaction), ASYNC (MAIN doesn't care if data is replicated), and SYNC WITH
+  TIMEOUT (MAIN will wait for REPLICAS within the given timeout period; after
+  the timeout expires, replication isn't aborted, but the REPLICA is demoted to
+  ASYNC mode).
+- Added support for query type deduction. Possible query types are `r` (read),
+  `w` (write), and `rw` (read-write). The query type is returned as a part of
+  the summary.
+- Improved logging capabilities by introducing granular logging levels. Added a
+  new flag, `--log-level`, which specifies the minimum log level that will be
+  printed. E.g., it's possible to print incoming queries or Bolt server states.
+- Added ability to lock the storage data directory by executing the `LOCK DATA DIRECTORY` query, which delays the deletion of the files contained in the
+  data directory. The data directory can be unlocked again by executing the
+  `UNLOCK DATA DIRECTORY` query.
+
+## Memgraph Lab v1.2.0 - Nov 3, 2020
+
+### Major Features and Improvements
+
+- Added ability to create custom graph styling for nodes and edges in graph view
+  with graph style language (similar to CSS).
+- Added ability to save and load custom graph styling.
+- Added ability to show map background for nodes with lat and lng numeric
+  properties.
+- Added ability to change map background style.
+- Edge labels are no longer shown by default in graph view.
+- Improved overall UI and UX.
+- Set encrypted connection to be turned off by default on login screen (Memgraph
+  v1.2.0 comes with SSL off by default).
+
+### Bug Fixes
+
+- Added ability to hide graph view if there is no node/edge data in the
+  response.
+
+## Memgraph v1.1.0 - Jul 1, 2020
+
+### Major Features and Improvements
+
+- Properties in nodes and edges are now stored encoded and compressed. This
+  change significantly reduces memory usage. Depending on the specific dataset,
+  total memory usage can be reduced by up to 50%.
+- Added support for rescanning query modules. Previously, the query modules
+  directory was scanned only upon startup. Now it is scanned each time the user
+  requests to load a query module. The functions used to load the query modules
+  were renamed to `mg.load()` and `mg.load_all()` (from `mg.reload()` and
+  `mg.reload_all()`).
+- Improved execution performance of queries that have an IN list filter by using
+  label+property indices. Example: `MATCH (n:Label) WHERE n.property IN [] ...`
+- Added support for `ANY` and `NONE` openCypher functions. Previously, only
+  `ALL` and `SINGLE` functions were implemented.
+
+### Bug Fixes and Other Changes
+
+- Fixed invalid paths returned by variable expansion when the starting node and
+  destination node used the same symbol. Example: `MATCH path = (n:Person {name: "John"})-[:KNOWS*]->(n) RETURN path`
+- Improved semantics of `ALL` and `SINGLE` functions to be consistent with
+  openCypher when handling lists with `Null`s.
+- `SHOW CONSTRAINT INFO` now returns property names as a list for unique
+  constraints.
+- Escaped label/property/edge-type names in `DUMP DATABASE` to support names
+  with spaces in them.
+- Fixed handling of `DUMP DATABASE` queries in multi-command transactions
+  (`BEGIN`, ..., `COMMIT`).
+- Fixed handling of various query types in explicit transactions. For example,
+  constraints were allowed to be created in multi-command transactions
+  (`BEGIN`, ..., `COMMIT`), but that isn't a transactional operation and as
+  such can't be allowed in multi-command transactions.
+- Fixed integer overflow bugs in `COUNT`, `LIMIT` and `SKIP`.
+- Fixed integer overflow bugs in weighted shortest path expansions.
+- Fixed various other integer overflow bugs in query execution.
+- Added Marvel Comic Universe tutorial.
+- Added FootballTransfers tutorial.
+
+## Memgraph Lab v1.1.3 - Jun 5, 2020
+
+### Bug Fixes
+
+* Disable hardware acceleration.
+
+## Memgraph Lab v1.1.2 - Apr 10, 2020
+
+### Bug Fixes
+
+* Fix side menu documentation and support links.
+
+## Memgraph v1.0.0 - Apr 3, 2020
+
+### Major Features and Improvements
+
+- [Enterprise Ed.] Exposed authentication username/rolename regex as a flag
+  (`--auth-user-or-role-name-regex`).
+- [Enterprise Ed.] Improved auth module error handling and added support for
+  relative paths.
+- Added support for Python query modules. This release of Memgraph supports
+  query modules written using the already existing C API and the new Python API.
+- Added support for unique constraints. The unique constraint is created with a
+  label and one or more properties.
+- Implemented support for importing CSV files (`mg_import_csv`). The importer is
+  compatible with the Neo4j batch CSV importer.
+- Snapshot and write-ahead log format changed (backward compatible with v0.50).
+- Vertices looked up by their openCypher ID (`MATCH (n) WHERE ID(n) = ...`) will
+  now find the node in O(log n) instead of O(n).
+- Improved planning of BFS expansion; a faster, specific approach is now favored
+  instead of a ScanAll+Filter operation.
+- Added syntax for limiting memory of `CALL`.
+- Exposed the server name that should be used for the Bolt handshake as a flag
+  (`--bolt-server-name-for-init`).
+- Added several more functions to the query module C API.
+- Implemented a storage locking mechanism that prevents the user from
+  concurrently starting two Memgraph instances with the same data directory.
+
+### Bug Fixes and Other Changes
+
+- [Enterprise Ed.] Fixed a bug that crashed the database when granting
+  privileges to a user.
+- [Enterprise Ed.] Improved Louvain algorithm for community detection.
+- Type of variable expansion is now printed in `EXPLAIN` (e.g. ExpandVariable,
+  STShortestPath, BFSExpand, WeightedShortestPath).
+- Correctly display `CALL` in `EXPLAIN` output.
+- Correctly delimit arguments when printing the signature of a query module.
+- Fixed a planning issue when `CALL` preceded filtering.
+- Fixed spelling mistakes in the storage durability module.
+- Fixed a subtle race condition in storage GC indices/constraints.
+- Reduced memory allocations in storage API and indices.
+- Memgraph version is now printed to `stdout` when Memgraph is started.
+- Improved RPM packaging.
+- Reduced the number of errors reported in the production log when loading
+  query modules.
+- Removed `early access` wording from the Community Offering license.
+
+## Memgraph Lab v1.1.1 - Apr 3, 2020
+
+### Bug Fixes
+
+* Fix a bug showing integers in node properties as strings.
+
+## Memgraph Lab v1.1.0
+
+### Major Features and Improvements
+
+- Enable explain and profile view.
+- Memgraph v0.15.0 keywords support.
+
+### Bug Fixes and Other Changes
+
+- Fix a newline bug in parsing multi-command queries.
+- Redirect to data view when there is no graph data to show.
+
+## Memgraph Lab v1.0.0
+
+### Major Features and Improvements
+
+- Added insecure connection option.
+- Improved UX of login screen.
+- Added basic tutorial that shows on the initial run.
+- Added text search of history and favorite queries.
+- Added storage statistics on overview screen.
+- Added debug view with query explain and profile capabilities.
+- Added graph schema (metagraph) generator.
+- Improved query data (table) view.
+
+## Memgraph Lab v0.1.2
+
+### Bug Fixes and Other Changes
+
+- Fixed app icon on macOS.
+- Improved error handling on the initial connect screen. Handle availability and
+  secure connection errors.
+
+## Memgraph Lab v0.1.1
+
+### Major Features and Improvements
+
+- Added overview view.
+- Added query view (Monaco editor).
+- Added graph, data and table data views.
+- Added JSON export.
+- Added electron builder packages for macOS and Debian.
+
+## Memgraph v0.50.0 - Dec 11, 2019
+
+:::warning
+
+### Breaking Changes
+
+- [Enterprise Ed.] Remove support for Kafka streams.
+- Snapshot and write-ahead log format changed (not backward compatible).
+- Removed support for unique constraints.
+- Label indices aren't created automatically, create them explicitly instead.
+- Renamed several database flags. Please see the configuration file for a list
+  of current flags.
+
+:::
+
+### Major Features and Improvements
+
+- [Enterprise Ed.] Add support for auth module.
+- [Enterprise Ed.] LDAP support migrated to auth module.
+- Implemented new graph storage engine.
+- Add support for disabling properties on edges.
+- Add support for existence constraints.
+- Add support for custom openCypher procedures using a C API.
+- Support loading query modules implementing read-only procedures.
+- Add `CALL ... YIELD ...` syntax for invoking loaded procedures.
+- Add `CREATE INDEX ON :Label` for creating label indices.
+- Add `DROP INDEX ON :Label` for dropping label indices.
+- Add `DUMP DATABASE` clause to openCypher.
+- Add functions for treating character strings as byte strings.
+
+### Bug Fixes and Other Changes
+
+- Fix several memory management bugs.
+- Reduce memory usage in query execution.
+- Fix a bug that crashes the database when `EXPLAIN` is used.
+
+## Memgraph v0.15.0 - Jul 17, 2019
+
+:::warning
+
+### Breaking Changes
+
+- Snapshot and write-ahead log format changed (not backward compatible).
+- `indexInfo()` function replaced with `SHOW INDEX INFO` syntax.
+- Removed support for unique index. Use unique constraints instead.
+- `CREATE UNIQUE INDEX ON :label (property)` replaced with `CREATE CONSTRAINT ON (n:label) ASSERT n.property IS UNIQUE`.
+- Changed semantics for `COUNTER` openCypher function.
+
+:::
+
+### Major Features and Improvements
+
+- [Enterprise Ed.] Add a new privilege, `STATS`, for accessing storage info.
+- [Enterprise Ed.] LDAP authentication and authorization support.
+- [Enterprise Ed.] Add audit logging feature.
+- Add unique constraints over multiple properties, which replace unique
+  indices.
+- Add `SHOW STORAGE INFO` feature.
+- Add `PROFILE` clause to openCypher.
+- Add `CREATE CONSTRAINT` clause to openCypher.
+- Add `DROP CONSTRAINT` clause to openCypher.
+- Add `SHOW CONSTRAINT INFO` feature.
+- Add `uniformSample` function to openCypher.
+- Add regex matching to openCypher.
+
+### Bug Fixes and Other Changes
+
+- Fix a bug in query comment parsing.
+- Fix a bug in the query symbol table.
+- Fix OpenSSL memory leaks.
+- Make authentication case insensitive.
+- Remove `COALESCE` function.
+- Add movie tutorial.
+- Add backpacking tutorial.
+
+## Memgraph v0.14.1 - Jan 22, 2019
+
+### Bug Fixes and Other Changes
+
+- Fix a bug in explicit transaction handling.
+- Fix a bug in edge filtering by edge type and destination.
+
+## Memgraph v0.14.0 - Oct 30, 2018
+
+:::warning
+
+### Breaking Changes
+
+- Write-ahead log format changed (not backward compatible).
+
+:::
+
+### Major Features and Improvements
+
+- [Enterprise Ed.] Reduce memory usage in distributed usage.
+- Add `DROP INDEX` feature.
+- Improve SSL error messages.
+
+### Bug Fixes and Other Changes
+
+- [Enterprise Ed.] Fix issues with reading and writing in a distributed query.
+- Correctly handle an edge case with unique constraint checks.
+- Fix a minor issue with `mg_import_csv`.
+- Fix an issue with `EXPLAIN`.
+
+## Memgraph v0.13.0 - Oct 18, 2018
+
+:::warning
+
+### Breaking Changes
+
+- Write-ahead log format changed (not backward compatible).
+- Snapshot format changed (not backward compatible).
+
+:::
+
+### Major Features and Improvements
+
+- [Enterprise Ed.] Authentication and authorization support.
+- [Enterprise Ed.] Kafka integration.
+- [Enterprise Ed.]
Support dynamic worker addition in distributed.
+- Reduce memory usage and improve overall performance.
+- Add `CREATE UNIQUE INDEX` clause to openCypher.
+- Add `EXPLAIN` clause to openCypher.
+- Add `inDegree` and `outDegree` functions to openCypher.
+- Improve BFS performance when both endpoints are known.
+- Add new `node-label`, `relationship-type` and `quote` options to the
+  `mg_import_csv` tool.
+- Reduce memory usage of `mg_import_csv`.
+
+### Bug Fixes and Other Changes
+
+- [Enterprise Ed.] Fix an edge case in distributed index creation.
+- [Enterprise Ed.] Fix issues with Cartesian in distributed queries.
+- Correctly handle large messages in the Bolt protocol.
+- Fix issues when handling explicitly started transactions in queries.
+- Allow openCypher keywords to be used as variable names.
+- Revise user-visible error messages to make them consistent.
+- Improve aborting of time-consuming execution.
+
+## Memgraph v0.12.0 - Jul 4, 2018
+
+:::warning
+
+### Breaking Changes
+
+- Snapshot format changed (not backward compatible).
+
+:::
+
+### Major Features and Improvements
+
+- Improved the Id Cypher function.
+- Added string functions to openCypher (`lTrim`, `left`, `rTrim`, `replace`,
+  `reverse`, `right`, `split`, `substring`, `toLower`, `toUpper`, `trim`).
+- Added `timestamp` function to openCypher.
+- Added support for dynamic property access with the `[]` operator.
+
+## Memgraph v0.11.0 - Jun 20, 2018
+
+### Major Features and Improvements
+
+- [Enterprise Ed.] Improve Cartesian support in distributed queries.
+- [Enterprise Ed.] Improve distributed execution of BFS.
+- [Enterprise Ed.] Dynamic graph partitioner added.
+- Static node/edge id generators exposed through the Id Cypher function.
+- Properties on disk added.
+- Telemetry added.
+- SSL support added.
+- `toString` function added.
+
+### Bug Fixes and Other Changes
+
+- Document issues with Docker on OS X.
+- Add BFS and Dijkstra's algorithm examples to documentation.
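+
+The v0.12.0 string helpers and dynamic property access, together with the
+`toString` function from v0.11.0 above, can be combined along these lines (the
+label and property names are made up for illustration):
+
+```cypher
+// Normalize a string property, read a property dynamically by name,
+// and stringify the node id.
+MATCH (n:City)
+RETURN toUpper(trim(n.name)) AS name,
+       n["population"] AS population,
+       toString(id(n)) AS node_id;
+```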
+
+## Memgraph v0.10.0 - Apr 24, 2018
+
+:::warning
+
+### Breaking Changes
+
+- Snapshot format changed (not backward compatible).
+
+:::
+
+### Major Features and Improvements
+
+- [Enterprise Ed.] Distributed storage and execution.
+- `reduce` and `single` functions added to openCypher.
+- `wShortest` edge expansion added to openCypher.
+- Support packaging RPM on CentOS 7.
+
+### Bug Fixes and Other Changes
+
+- Report an error if updating a deleted element.
+- Log an error if reading info on available memory fails.
+- Fix a bug where `MATCH` would stop matching if a result was empty, but later
+  results still contained data to be matched. The simplest case of this was the
+  query: `UNWIND [1, 2, 3] AS x MATCH (n:Label {prop: x}) RETURN n`. If there
+  was no node `(:Label {prop: 1})`, then the `MATCH` wouldn't even try to match
+  for `x` being 2 or 3.
+- Report an error if trying to compare a property value with something that
+  cannot be stored in a property.
+- Fix crashes in some obscure cases.
+- Commit log automatically garbage collected.
+- Add minor performance improvements.
+
+## Memgraph v0.9.0 - Dec 18, 2017
+
+:::warning
+
+### Breaking Changes
+
+- Snapshot format changed (not backward compatible).
+- Snapshot configuration flags changed, general durability flags added.
+
+:::
+
+### Major Features and Improvements
+
+- Write-ahead log added.
+- `nodes` and `relationships` functions added.
+- `UNION` and `UNION ALL` are implemented.
+- Concurrent index creation is now enabled.
+
+### Bug Fixes and Other Changes
+
+## Memgraph v0.8.0
+
+### Major Features and Improvements
+
+- CASE construct (without aggregations).
+- Named path support added.
+- Maps can now be stored as node/edge properties.
+- Map indexing supported.
+- `rand` function added.
+- `assert` function added.
+- `counter` and `counterSet` functions added.
+- `indexInfo` function added.
+- `collect` aggregation now supports Map collection.
+- Changed the BFS syntax.
+
+### Bug Fixes and Other Changes
+
+- Use `\u` to specify a 4-digit codepoint and `\U` for an 8-digit codepoint.
+- Keywords appearing in the header (named expressions) keep their original
+  case.
+- Our Bolt protocol implementation is now completely compatible with the
+  protocol version 1 specification. (https://boltprotocol.org/v1/)
+- Added a log warning when running out of memory, along with the
+  `memory_warning_threshold` flag.
+- Edges are no longer additionally filtered after expansion.
+
+## Memgraph v0.7.0
+
+### Major Features and Improvements
+
+- Variable length path `MATCH`.
+- Explicitly started transactions (multi-query transactions).
+- Map literal.
+- Query parameters (except for parameters in place of property maps).
+- `all` function in openCypher.
+- `degree` function in openCypher.
+- User specified transaction execution timeout.
+
+### Bug Fixes and Other Changes
+
+- Concurrent `BUILD INDEX` deadlock now returns an error to the client.
+- Fixed expansion inconsistencies when a `MATCH` is preceded by `OPTIONAL
+  MATCH`.
+- Fixed a high-concurrency Antlr parsing bug.
+- Indexing improvements.
+- Query stripping and caching speedups.
+
+## Memgraph v0.6.0
+
+### Major Features and Improvements
+
+- AST caching.
+- Label + property index support.
+- Different logging setup & format.
+
+## Memgraph v0.5.0
+
+### Major Features and Improvements
+
+- Use label indexes to speed up querying.
+- Generate multiple query plans and use the cost estimator to select the best.
+- Snapshots & Recovery.
+- Abandon old yaml configuration and migrate to gflags.
+- Query stripping & AST caching support.
+
+### Bug Fixes and Other Changes
+
+- Fixed race condition in MVCC. Hints exp+aborted race condition prevented.
+- Fixed conceptual bug in MVCC GC. Evaluate old records w.r.t. the oldest
+  transaction's id AND snapshot.
+- User-friendly error messages thrown from the query engine.
+
+## Build 837
+
+### Bug Fixes and Other Changes
+
+- List indexing supported with a preceding IN (for example, in the query
+  `RETURN 1 IN [[1, 2]][0]`).
+
+## Build 825
+
+### Major Features and Improvements
+
+- `RETURN *`, `count(*)`, `OPTIONAL MATCH`, `UNWIND`, `DISTINCT` (except
+  `DISTINCT` in aggregate functions), list indexing and slicing, escaped
+  labels, the `IN LIST` operator, and the `range` function.
+
+### Bug Fixes and Other Changes
+
+- Enabled TCP_NODELAY -> import should be faster.
+- Clear hint bits.
+
+## Build 783
+
+### Major Features and Improvements
+
+- `SKIP`, `LIMIT`, `ORDER BY`.
+- Math functions.
+- Initial support for the `MERGE` clause.
+
+### Bug Fixes and Other Changes
+
+- Fixed an unhandled lock timeout exception.
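+
+The early-build features above can be exercised with queries along these lines
+(a sketch only; it assumes the `range` function and list indexing behave as
+described in Builds 825 and 837):
+
+```cypher
+// List indexing with a preceding IN (Build 837).
+RETURN 1 IN [[1, 2]][0] AS contained;
+
+// ORDER BY, SKIP and LIMIT (Build 783) over a generated list (Build 825).
+UNWIND range(1, 10) AS x
+RETURN x ORDER BY x DESC SKIP 2 LIMIT 3;
+```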