Cypher Tutorial

If you’re new to Cypher and just getting started with Kuzu, you’re in the right place! This tutorial will guide you through the basics of Cypher, including how to create nodes and relationships, and how to scan, copy and query your data that’s in a Kuzu database.

See the link below for the example dataset used in this tutorial.

Example dataset Social network dataset used in this tutorial

The dataset is a social network dataset of users and posts. Download the dataset and unzip it in your current working directory.

Working with the Kuzu CLI

Kuzu’s command line interface (CLI) is a great way to get started with Kuzu. It allows you to interact with your database in a terminal and quickly test out your ideas in Cypher.

Let’s start by running the Kuzu CLI shell. Install the Kuzu CLI and then run the following command to start the CLI.

# Open an in-memory database
kuzu

Define a schema

The first step in getting your data into Kuzu is creating node and relationship tables. This step is called schema definition. Follow the steps in the example database to create the node and relationship tables.

Ingest the data

The COPY FROM command is used to ingest data from various file formats into Kuzu. In this case, the data is stored in CSV format, in the local directory tutorial_data.

COPY User FROM 'tutorial_data/node/user.csv';
COPY Post FROM 'tutorial_data/node/post.csv';
COPY LIKES FROM 'tutorial_data/relation/LIKES.csv';
COPY FOLLOWS FROM 'tutorial_data/relation/FOLLOWS.csv';
COPY POSTED FROM 'tutorial_data/relation/POSTS.csv';

You should see messages in your terminal indicating how many tuples were copied into each table. An example is shown below.

┌───────────────────────────────────────────────┐
│ result                                        │
│ STRING                                        │
├───────────────────────────────────────────────┤
│ 20 tuples have been copied to the User table. │
└───────────────────────────────────────────────┘

MATCH

In Kuzu, a graph’s nodes and relationships are stored in tables. The MATCH clause is used to find nodes that match the pattern specified in the clause. All entities in the pattern are returned via the RETURN clause.

Match nodes

Let’s say we want to match only User nodes in the database. We can do this by specifying the label in the MATCH clause.

MATCH (a:User) RETURN a.* LIMIT 3;

Note the a.* in the RETURN clause. This is a wildcard that returns all properties of the User node, regardless of how many properties there are.

┌──────────┬────────────────┬─────────────────────────┐
│ a.userID │ a.username     │ a.account_creation_date │
│ INT64    │ STRING         │ DATE                    │
├──────────┼────────────────┼─────────────────────────┤
│ 1        │ epicwolf202    │ 2022-09-09              │
│ 2        │ silentninja637 │ 2023-01-27              │
│ 3        │ stormcat597    │ 2023-02-25              │
└──────────┴────────────────┴─────────────────────────┘

You can also match nodes across multiple node tables, as shown below. However, it only makes sense to do this when there exists a common property between the two tables.

MATCH (a:User:Post) RETURN * LIMIT 3;

Match a relationship pattern

You can match a relationship pattern by specifying the relationship in the MATCH clause. In the below example, we match the LIKES relationship between a User node and a Post node.

MATCH (a:User)-[r:LIKES]->(b:Post) RETURN a.username, b.* LIMIT 3;

┌─────────────┬──────────┬─────────────────┬──────────────┬─────────────────┐
│ a.username  │ b.postID │ b.creation_date │ b.like_count │ b.retweet_count │
│ STRING      │ INT64    │ DATE            │ INT64        │ INT64           │
├─────────────┼──────────┼─────────────────┼──────────────┼─────────────────┤
│ epicwolf202 │ 5        │ 2022-10-26      │ 103          │ 73              │
│ epicwolf202 │ 11       │ 2020-02-03      │ 448          │ 168             │
│ epicwolf202 │ 18       │ 2021-03-12      │ 244          │ 166             │
└─────────────┴──────────┴─────────────────┴──────────────┴─────────────────┘

The user named epicwolf202 has liked the posts with IDs 5, 11 and 18.

Match on all nodes and relationships

If you want to match on an arbitrary node or relationship in the database, you can use the MATCH clause without any label in the pattern.

// Ask for all nodes and relationships in the database
MATCH (a)-[b]->(c) RETURN * LIMIT 3;

┌──────────────────────────────────┬──────────────────────────────────┬──────────────────────────────────┐
│ a                                │ c                                │ b                                │
│ NODE                             │ NODE                             │ REL                              │
├──────────────────────────────────┼──────────────────────────────────┼──────────────────────────────────┤
│ {_ID: 0:14, _LABEL: User, use... │ {_ID: 0:0, _LABEL: User, user... │ (0:14)-{_LABEL: FOLLOWS, _ID:... │
│ {_ID: 0:15, _LABEL: User, use... │ {_ID: 0:0, _LABEL: User, user... │ (0:15)-{_LABEL: FOLLOWS, _ID:... │
│ {_ID: 0:17, _LABEL: User, use... │ {_ID: 0:0, _LABEL: User, user... │ (0:17)-{_LABEL: FOLLOWS, _ID:... │
└──────────────────────────────────┴──────────────────────────────────┴──────────────────────────────────┘

An alternate way to do this would be to leave the node and relationship fields blank, but store the path in a variable p as shown below.

// Ask for all paths in the database
MATCH p=()-[]->() RETURN p LIMIT 3;

Both the above queries return similar results.

Match multiple patterns

You can combine multiple match clauses that each specify a particular pattern.

MATCH (a:User)-[:Follows]->(b:User)-[:Follows]->(c:User),
      (a)-[:Follows]->(c)
RETURN
  a.username AS user1,
  b.username AS user2,
  c.username AS user3
LIMIT 3;

The above query is the same as having written the following two match clauses one after the other.

MATCH (a:User)-[:Follows]->(b:User)-[:Follows]->(c:User)
MATCH (a)-[:Follows]->(c)
RETURN a.username, b.username, c.username LIMIT 3;

Instead of repeating the match clause, you can comma-separate the clauses as shown above. The following result is returned.

┌────────────────┬───────────────┬───────────────┐
│ user1          │ user2         │ user3         │
│ STRING         │ STRING        │ STRING        │
├────────────────┼───────────────┼───────────────┤
│ silentninja637 │ stormninja678 │ darkdog878    │
│ silentninja637 │ darkdog878    │ stormninja678 │
│ silentninja637 │ stormcat597   │ darkdog878    │
└────────────────┴───────────────┴───────────────┘

Match variable-length relationships

One of the most powerful features of Cypher is the ability to match variable-length relationships. This is done using the Kleene star operator *. The following query aims to find all users that are one or two hops (i.e., paths with 1 or 2 edges) away from a particular user.

MATCH (a:User)-[:Follows*1..2]->(b:User)
RETURN a.username, b.username
LIMIT 5;

┌─────────────┬─────────────┐
│ a.username  │ b.username  │
│ STRING      │ STRING      │
├─────────────┼─────────────┤
│ epicwolf202 │ epicking81  │
│ epicwolf202 │ darkdog878  │
│ epicwolf202 │ coolking201 │
│ epicwolf202 │ stormfox762 │
│ epicwolf202 │ coolwolf752 │
└─────────────┴─────────────┘

The full Kleene star syntax has the form *<low>..<high>. In the above example, the lower bound was 1 and the upper bound was 2, and so we match any paths with one or two edges. Alternatively, we can do *..<high> in which case the lower bound defaults to 1. And, if we just want to match paths with an exact number of edges, we can simply do *<len> instead of specifying the same bound twice.

For example, if we want to find all users that are exactly two hops away, we can use the following query.

MATCH (a:User)-[:Follows*2]->(b:User)
RETURN a.username, b.username
LIMIT 5;

┌────────────────┬───────────────┐
│ a.username     │ b.username    │
│ STRING         │ STRING        │
├────────────────┼───────────────┤
│ epicwolf202    │ epicking81    │
│ epicwolf202    │ darkdog878    │
│ epicwolf202    │ coolking201   │
│ silentninja637 │ coolking201   │
│ silentninja637 │ smartdragon25 │
└────────────────┴───────────────┘

The above result is telling us that the user in column b is followed by a user that the user in column a follows. We can verify if this is true by manually writing the following query without the * operator.

MATCH (a:User)-[:Follows]->(x:User)-[:Follows]->(b:User)
WHERE a.username = 'epicwolf202'
RETURN a.username, b.username;

┌─────────────┬─────────────┐
│ a.username  │ b.username  │
│ STRING      │ STRING      │
├─────────────┼─────────────┤
│ epicwolf202 │ coolking201 │
│ epicwolf202 │ darkdog878  │
│ epicwolf202 │ epicking81  │
└─────────────┴─────────────┘

Indeed, the same result is returned.

DISTINCT

The DISTINCT clause is used to return unique tuples in the result. The following query returns all users who have liked a post. No duplicates are returned.

MATCH (u:User)-[:LIKES]->(p:Post)
RETURN DISTINCT u.username
LIMIT 3;

┌────────────────┐
│ u.username     │
│ STRING         │
├────────────────┤
│ stormcat597    │
│ silentninja637 │
│ epicwolf202    │
└────────────────┘

OPTIONAL MATCH

The OPTIONAL MATCH clause is used to define a pattern to find in the database. The difference from a regular MATCH is that if the system cannot match a pattern defined by OPTIONAL MATCH, it will set the values in the variables defined only in the OPTIONAL MATCH, to NULL. Depending on what the end goal is, returning nulls may or may not be acceptable, so use a conventional MATCH if no nulls are desired.

MATCH (u1:User)
OPTIONAL MATCH (u2)-[:Follows]->(u1:User)
RETURN u1.username, u2.username
LIMIT 3;

┌─────────────┬───────────────┐
│ u1.username │ u2.username   │
│ STRING      │ STRING        │
├─────────────┼───────────────┤
│ epicwolf202 │ smartdragon25 │
│ epicwolf202 │ fastqueen400  │
│ epicwolf202 │ mysticwolf198 │
└─────────────┴───────────────┘

WHERE

The WHERE clause allows you to specify predicates/constraints on a part of your query. The query below shows how to filter the results to only include users whose account was created before a particular date.

MATCH (a:User)
WHERE a.account_creation_date < DATE('2023-02-01')
RETURN a.username, a.account_creation_date
LIMIT 3;

The date format in the WHERE predicate is specified in the format YYYY-MM-DD, as a string, and transformed into a date object that can be compared with the account_creation_date property of the User node.

┌────────────────┬─────────────────────────┐
│ a.username     │ a.account_creation_date │
│ STRING         │ DATE                    │
├────────────────┼─────────────────────────┤
│ epicwolf202    │ 2022-09-09              │
│ silentninja637 │ 2023-01-27              │
│ fastgirl798    │ 2021-06-11              │
└────────────────┴─────────────────────────┘

WHERE EXISTS subquery

You can specify a subquery in a WHERE clause with the EXISTS keyword. The following query returns all users who have at least one post.

MATCH (u:User)
WHERE EXISTS {
  MATCH (u)-[:POSTED]->()
}
RETURN u.username
LIMIT 3;

┌────────────────┐
│ u.username     │
│ STRING         │
├────────────────┤
│ epicwolf202    │
│ silentninja637 │
│ stormcat597    │
└────────────────┘

Grouping and Aggregation

Cypher does not have an explicit GROUP BY clause. Instead, you can simply apply an aggregation function in the RETURN clause and group by the specified property. The following query returns the total number of posts.

MATCH (p:Post)
RETURN COUNT(p) AS num_posts;

┌───────────┐
│ num_posts │
│ INT64     │
├───────────┤
│ 50        │
└───────────┘

The following example shows how to group by the userID property and return the number of posts for each user.

MATCH (u:User)-[:POSTED]->(p:Post)
RETURN u.userID, COUNT(p) AS num_posts
LIMIT 3;

┌──────────┬───────────┐
│ u.userID │ num_posts │
│ INT64    │ INT64     │
├──────────┼───────────┤
│ 14       │ 3         │
│ 8        │ 2         │
│ 15       │ 4         │
└──────────┴───────────┘

ORDER BY

The ORDER BY clause is used to sort the results of a query. The following query returns all users sorted in descending order of the number of posts they have.

MATCH (u:User)-[:POSTED]->(p:Post)
RETURN u.username, COUNT(p) AS num_posts
ORDER BY num_posts DESC
LIMIT 3;

┌───────────────┬───────────┐
│ u.username    │ num_posts │
│ STRING        │ INT64     │
├───────────────┼───────────┤
│ stormqueen831 │ 5         │
│ cooldragon866 │ 5         │
│ mysticcat651  │ 4         │
└───────────────┴───────────┘

WITH

The WITH clause is used to chain the results of one query to another. The example below shows how to find the top 3 users with the most posts and then find the users who follow them.

MATCH (u:User)-[:POSTED]->(p:Post)
WITH u, COUNT(p) AS num_posts
ORDER BY num_posts DESC
LIMIT 3
WITH u
MATCH (u2:User)-[:Follows]->(u)
RETURN u.username AS most_prolific_user, u2.username AS follower
LIMIT 8;

Note the use of the AS keyword to rename the columns in the result.

┌────────────────────┬───────────────┐
│ most_prolific_user │ follower      │
│ STRING             │ STRING        │
├────────────────────┼───────────────┤
│ stormqueen831      │ brightking765 │
│ stormqueen831      │ coolking201   │
│ stormqueen831      │ mysticwolf198 │
│ stormqueen831      │ fastqueen400  │
│ cooldragon866      │ silentguy245  │
│ mysticcat651       │ coolking201   │
│ mysticcat651       │ darkdog878    │
│ mysticcat651       │ cooldragon866 │
└────────────────────┴───────────────┘

UNWIND

The UNWIND clause is used to unnest (i.e., explode) a list and return the elements as separate rows. Consider that you have a list of user IDs and you want to find their account creation dates.

UNWIND ["epicwolf202", "cooldragon866", "stormcat597"] AS user
MATCH (u:User {username: user})
RETURN u.username, u.account_creation_date;

┌───────────────┬─────────────────────────┐
│ u.username    │ u.account_creation_date │
│ STRING        │ DATE                    │
├───────────────┼─────────────────────────┤
│ epicwolf202   │ 2022-09-09              │
│ cooldragon866 │ 2020-10-30              │
│ stormcat597   │ 2023-02-25              │
└───────────────┴─────────────────────────┘

CASE

Cypher supports the CASE expression to handle conditional logic. The following query returns the number of posts for each user, but only if the post was created after a certain date.

MATCH (u:User)-[:POSTED]->(p:Post)
RETURN u.username,
       COUNT(CASE WHEN p.creation_date > DATE('2023-04-01') THEN p ELSE NULL END) AS num_posts
ORDER BY num_posts DESC LIMIT 3;

┌────────────────┬───────────┐
│ u.username     │ num_posts │
│ STRING         │ INT64     │
├────────────────┼───────────┤
│ cooldragon866  │ 2         │
│ silentninja637 │ 2         │
│ epiccat105     │ 1         │
└────────────────┴───────────┘

This was a basic introduction to the kinds of queries you can ask in Cypher. There are many other clauses in Cypher that were not covered in this tutorial. Check out the Cypher manual for more information!