Cypher Tutorial
If youβre new to Cypher and just getting started with Kuzu, youβre in the right place! This tutorial will guide you through the basics of Cypher, including how to create nodes and relationships, and how to scan, copy and query your data thatβs in a Kuzu database.
See the link below for the example dataset used in this tutorial.
The dataset is a social network dataset of users and posts. Download the dataset and unzip it in your current working directory.
Working with the Kuzu CLI
Kuzuβs command line interface (CLI) is a great way to get started with Kuzu. It allows you to interact with your database in a terminal and quickly test out your ideas in Cypher.
Letβs start by running the Kuzu CLI shell. Install the Kuzu CLI and then run the following command to start the CLI.
# Open an in-memory databasekuzuDefine a schema
The first step in getting your data into Kuzu is creating node and relationship tables. This step is called schema definition. Follow the steps in the example database to create the node and relationship tables.
Ingest the data
The COPY FROM command is used to ingest data from various file formats into Kuzu. In this case,
the data is stored in CSV format, in the local directory tutorial_data.
COPY User FROM 'tutorial_data/node/user.csv';COPY Post FROM 'tutorial_data/node/post.csv';COPY LIKES FROM 'tutorial_data/relation/LIKES.csv';COPY FOLLOWS FROM 'tutorial_data/relation/FOLLOWS.csv';COPY POSTED FROM 'tutorial_data/relation/POSTS.csv';You should see messages in your terminal indicating how many tuples were copied into each table. An example is shown below.
ββββββββββββββββββββββββββββββββββββββββββββββββββ result ββ STRING ββββββββββββββββββββββββββββββββββββββββββββββββββ€β 20 tuples have been copied to the User table. ββββββββββββββββββββββββββββββββββββββββββββββββββMATCH
In Kuzu, a graphβs nodes and relationships are stored in tables. The MATCH clause is used to
find nodes that match the pattern specified in the clause. All entities in the pattern are
returned via the RETURN clause.
Match nodes
Letβs say we want to match only User nodes in the database. We can do this by specifying the label
in the MATCH clause.
MATCH (a:User) RETURN a.* LIMIT 3;Note the a.* in the RETURN clause. This is a wildcard that returns all properties of the User
node, regardless of how many properties there are.
ββββββββββββ¬βββββββββββββββββ¬βββββββββββββββββββββββββββ a.userID β a.username β a.account_creation_date ββ INT64 β STRING β DATE βββββββββββββΌβββββββββββββββββΌββββββββββββββββββββββββββ€β 1 β epicwolf202 β 2022-09-09 ββ 2 β silentninja637 β 2023-01-27 ββ 3 β stormcat597 β 2023-02-25 βββββββββββββ΄βββββββββββββββββ΄ββββββββββββββββββββββββββYou can also match nodes across multiple node tables, as shown below. However, it only makes sense to do this when there exists a common property between the two tables.
MATCH (a:User:Post) RETURN * LIMIT 3;Match a relationship pattern
You can match a relationship pattern by specifying the relationship in the MATCH clause.
In the below example, we match the LIKES relationship between a User node and a Post node.
MATCH (a:User)-[r:LIKES]->(b:Post) RETURN a.username, b.* LIMIT 3;βββββββββββββββ¬βββββββββββ¬ββββββββββββββββββ¬βββββββββββββββ¬βββββββββββββββββββ a.username β b.postID β b.creation_date β b.like_count β b.retweet_count ββ STRING β INT64 β DATE β INT64 β INT64 ββββββββββββββββΌβββββββββββΌββββββββββββββββββΌβββββββββββββββΌββββββββββββββββββ€β epicwolf202 β 5 β 2022-10-26 β 103 β 73 ββ epicwolf202 β 11 β 2020-02-03 β 448 β 168 ββ epicwolf202 β 18 β 2021-03-12 β 244 β 166 ββββββββββββββββ΄βββββββββββ΄ββββββββββββββββββ΄βββββββββββββββ΄ββββββββββββββββββThe user named epicwolf202 has liked the posts with IDs 5, 11 and 18.
Match on all nodes and relationships
If you want to match on an arbitrary node or relationship in the database, you can use the MATCH clause without any label
in the pattern.
// Ask for all nodes and relationships in the databaseMATCH (a)-[b]->(c) RETURN * LIMIT 3;ββββββββββββββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββ¬ββββββββββββββββββββββββββββββββββββ a β c β b ββ NODE β NODE β REL βββββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββΌβββββββββββββββββββββββββββββββββββ€β {_ID: 0:14, _LABEL: User, use... β {_ID: 0:0, _LABEL: User, user... β (0:14)-{_LABEL: FOLLOWS, _ID:... ββ {_ID: 0:15, _LABEL: User, use... β {_ID: 0:0, _LABEL: User, user... β (0:15)-{_LABEL: FOLLOWS, _ID:... ββ {_ID: 0:17, _LABEL: User, use... β {_ID: 0:0, _LABEL: User, user... β (0:17)-{_LABEL: FOLLOWS, _ID:... βββββββββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββ΄βββββββββββββββββββββββββββββββββββAn alternate way to do this would be to leave the node and relationship fields blank, but store the path in a variable p as shown below.
// Ask for all paths in the databaseMATCH p=()-[]->() RETURN p LIMIT 3;Both the above queries return similar results.
Match multiple patterns
You can combine multiple match clauses that each specify a particular pattern.
MATCH (a:User)-[:Follows]->(b:User)-[:Follows]->(c:User), (a)-[:Follows]->(c)RETURN a.username AS user1, b.username AS user2, c.username AS user3LIMIT 3;The above query is the same as having written the following two match clauses one after the other.
MATCH (a:User)-[:Follows]->(b:User)-[:Follows]->(c:User)MATCH (a)-[:Follows]->(c)RETURN a.username, b.username, c.username LIMIT 3;Instead of repeating the match clause, you can comma-separate the clauses as shown above. The following result is returned.
ββββββββββββββββββ¬ββββββββββββββββ¬βββββββββββββββββ user1 β user2 β user3 ββ STRING β STRING β STRING βββββββββββββββββββΌββββββββββββββββΌββββββββββββββββ€β silentninja637 β stormninja678 β darkdog878 ββ silentninja637 β darkdog878 β stormninja678 ββ silentninja637 β stormcat597 β darkdog878 βββββββββββββββββββ΄ββββββββββββββββ΄ββββββββββββββββMatch variable-length relationships
One of the most powerful features of Cypher is the ability to match variable-length relationships.
This is done using the Kleene star operator *. The following query aims to find all users that
are one or two hops (i.e., paths with 1 or 2 edges) away from a particular user.
MATCH (a:User)-[:Follows*1..2]->(b:User)RETURN a.username, b.usernameLIMIT 5;βββββββββββββββ¬βββββββββββββββ a.username β b.username ββ STRING β STRING ββββββββββββββββΌββββββββββββββ€β epicwolf202 β epicking81 ββ epicwolf202 β darkdog878 ββ epicwolf202 β coolking201 ββ epicwolf202 β stormfox762 ββ epicwolf202 β coolwolf752 ββββββββββββββββ΄ββββββββββββββThe full Kleene star syntax has the form *<low>..<high>. In the above example, the lower bound was 1 and the upper bound was 2,
and so we match any paths with one or two edges. Alternatively, we can do *..<high> in which case the lower bound defaults to 1.
And, if we just want to match paths with an exact number of edges, we can simply do *<len> instead of specifying the same bound twice.
For example, if we want to find all users that are exactly two hops away, we can use the following query.
MATCH (a:User)-[:Follows*2]->(b:User)RETURN a.username, b.usernameLIMIT 5;ββββββββββββββββββ¬βββββββββββββββββ a.username β b.username ββ STRING β STRING βββββββββββββββββββΌββββββββββββββββ€β epicwolf202 β epicking81 ββ epicwolf202 β darkdog878 ββ epicwolf202 β coolking201 ββ silentninja637 β coolking201 ββ silentninja637 β smartdragon25 βββββββββββββββββββ΄ββββββββββββββββThe above result is telling us that the user in column b is followed by a user that the user in column a follows.
We can verify if this is true by manually writing the following query without the * operator.
MATCH (a:User)-[:Follows]->(x:User)-[:Follows]->(b:User)WHERE a.username = 'epicwolf202'RETURN a.username, b.username;βββββββββββββββ¬βββββββββββββββ a.username β b.username ββ STRING β STRING ββββββββββββββββΌββββββββββββββ€β epicwolf202 β coolking201 ββ epicwolf202 β darkdog878 ββ epicwolf202 β epicking81 ββββββββββββββββ΄ββββββββββββββIndeed, the same result is returned.
DISTINCT
The DISTINCT clause is used to return unique tuples in the result. The following query returns all
users who have liked a post. No duplicates are returned.
MATCH (u:User)-[:LIKES]->(p:Post)RETURN DISTINCT u.usernameLIMIT 3;βββββββββββββββββββ u.username ββ STRING βββββββββββββββββββ€β stormcat597 ββ silentninja637 ββ epicwolf202 βββββββββββββββββββOPTIONAL MATCH
The OPTIONAL MATCH clause is used to define a pattern to find in the database. The difference from a regular
MATCH is that if the system cannot match a pattern defined by OPTIONAL MATCH, it will set the values in
the variables defined only in the OPTIONAL MATCH, to NULL.
Depending on what the end goal is, returning nulls may or may not be acceptable, so use a conventional
MATCH if no nulls are desired.
MATCH (u1:User)OPTIONAL MATCH (u2)-[:Follows]->(u1:User)RETURN u1.username, u2.usernameLIMIT 3;βββββββββββββββ¬βββββββββββββββββ u1.username β u2.username ββ STRING β STRING ββββββββββββββββΌββββββββββββββββ€β epicwolf202 β smartdragon25 ββ epicwolf202 β fastqueen400 ββ epicwolf202 β mysticwolf198 ββββββββββββββββ΄ββββββββββββββββWHERE
The WHERE clause allows you to specify predicates/constraints on a part of your query. The query
below shows how to filter the results to only include users whose account was created before a
particular date.
MATCH (a:User)WHERE a.account_creation_date < DATE('2023-02-01')RETURN a.username, a.account_creation_dateLIMIT 3;The date format in the WHERE predicate is specified in the format YYYY-MM-DD, as a string, and
transformed into a date object that can be compared with the account_creation_date property of the
User node.
ββββββββββββββββββ¬βββββββββββββββββββββββββββ a.username β a.account_creation_date ββ STRING β DATE βββββββββββββββββββΌββββββββββββββββββββββββββ€β epicwolf202 β 2022-09-09 ββ silentninja637 β 2023-01-27 ββ fastgirl798 β 2021-06-11 βββββββββββββββββββ΄ββββββββββββββββββββββββββWHERE EXISTS subquery
You can specify a subquery in a WHERE clause with the EXISTS keyword. The following query returns all users
who have at least one post.
MATCH (u:User)WHERE EXISTS { MATCH (u)-[:POSTED]->()}RETURN u.usernameLIMIT 3;βββββββββββββββββββ u.username ββ STRING βββββββββββββββββββ€β epicwolf202 ββ silentninja637 ββ stormcat597 βββββββββββββββββββGrouping and Aggregation
Cypher does not have an explicit GROUP BY clause. Instead, you can simply apply an aggregation function
in the RETURN clause and group by the specified property. The following query returns the total number of
posts.
MATCH (p:Post)RETURN COUNT(p) AS num_posts;ββββββββββββββ num_posts ββ INT64 ββββββββββββββ€β 50 ββββββββββββββThe following example shows how to group by the userID property and return the number of posts for each user.
MATCH (u:User)-[:POSTED]->(p:Post)RETURN u.userID, COUNT(p) AS num_postsLIMIT 3;ββββββββββββ¬βββββββββββββ u.userID β num_posts ββ INT64 β INT64 βββββββββββββΌββββββββββββ€β 14 β 3 ββ 8 β 2 ββ 15 β 4 βββββββββββββ΄ββββββββββββORDER BY
The ORDER BY clause is used to sort the results of a query. The following query returns all users
sorted in descending order of the number of posts they have.
MATCH (u:User)-[:POSTED]->(p:Post)RETURN u.username, COUNT(p) AS num_postsORDER BY num_posts DESCLIMIT 3;βββββββββββββββββ¬βββββββββββββ u.username β num_posts ββ STRING β INT64 ββββββββββββββββββΌββββββββββββ€β stormqueen831 β 5 ββ cooldragon866 β 5 ββ mysticcat651 β 4 ββββββββββββββββββ΄ββββββββββββWITH
The WITH clause is used to chain the results of one query to another. The example below shows how to
find the top 3 users with the most posts and then find the users who follow them.
MATCH (u:User)-[:POSTED]->(p:Post)WITH u, COUNT(p) AS num_postsORDER BY num_posts DESCLIMIT 3WITH uMATCH (u2:User)-[:Follows]->(u)RETURN u.username AS most_prolific_user, u2.username AS followerLIMIT 8;Note the use of the AS keyword to rename the columns in the result.
ββββββββββββββββββββββ¬βββββββββββββββββ most_prolific_user β follower ββ STRING β STRING βββββββββββββββββββββββΌββββββββββββββββ€β stormqueen831 β brightking765 ββ stormqueen831 β coolking201 ββ stormqueen831 β mysticwolf198 ββ stormqueen831 β fastqueen400 ββ cooldragon866 β silentguy245 ββ mysticcat651 β coolking201 ββ mysticcat651 β darkdog878 ββ mysticcat651 β cooldragon866 βββββββββββββββββββββββ΄ββββββββββββββββUNWIND
The UNWIND clause is used to unnest (i.e., explode) a list and return the elements as separate rows.
Consider that you have a list of user IDs and you want to find their account creation dates.
UNWIND ["epicwolf202", "cooldragon866", "stormcat597"] AS userMATCH (u:User {username: user})RETURN u.username, u.account_creation_date;βββββββββββββββββ¬βββββββββββββββββββββββββββ u.username β u.account_creation_date ββ STRING β DATE ββββββββββββββββββΌββββββββββββββββββββββββββ€β epicwolf202 β 2022-09-09 ββ cooldragon866 β 2020-10-30 ββ stormcat597 β 2023-02-25 ββββββββββββββββββ΄ββββββββββββββββββββββββββCASE
Cypher supports the CASE expression to handle conditional logic. The following query returns
the number of posts for each user, but only if the post was created after a certain date.
MATCH (u:User)-[:POSTED]->(p:Post)RETURN u.username, COUNT(CASE WHEN p.creation_date > DATE('2023-04-01') THEN p ELSE NULL END) AS num_postsORDER BY num_posts DESC LIMIT 3;ββββββββββββββββββ¬βββββββββββββ u.username β num_posts ββ STRING β INT64 βββββββββββββββββββΌββββββββββββ€β cooldragon866 β 2 ββ silentninja637 β 2 ββ epiccat105 β 1 βββββββββββββββββββ΄ββββββββββββThis was a basic introduction to the kinds of queries you can ask in Cypher. There are many other clauses in Cypher that were not covered in this tutorial. Check out the Cypher manual for more information!