Optimizing Graphs with Indexing
Indexing is a technique that enhances query performance by providing an efficient way to access data in the database. In the context of graph queries, indexing can considerably improve traversals, searches, and other operations involving vertices and edges. Indexes allow the database to quickly locate relevant data without scanning the entire collection.
For more information about indexes and indexing, refer to Indexing.
Key Indexing Aspects for Graph Queries
Here are some important aspects to consider when indexing for graph queries.
Vertex-Centric Indexes
Vertex-centric indexes optimize queries involving filtering or sorting based on vertex properties. In Macrometa, create a "persistent" index on a vertex collection to expedite queries filtering vertices by specific attributes. For example, in a social network graph, if querying users by their email address is frequent, a "persistent" index on the email attribute can enhance query performance.
Edge-Centric Indexes
Edge-centric indexes optimize queries involving filtering or sorting based on edge properties. In Macrometa, create a "persistent" index on an edge collection to expedite queries filtering edges by specific attributes. For example, if querying relationships in a social network graph by a "relationship type" attribute is common, a "persistent" index on the relationship type can enhance query performance.
Improving Traversal Performance
Traversal performance can be impacted by the choice of indexes on vertex and edge collections. Proper indexing can quickly locate traversal starting points and navigate the graph efficiently. Analyze your application's specific traversal patterns to determine the most beneficial indexes.
Balancing Indexing
When creating indexes, balance the benefits of indexing with the overhead of maintaining them. Indexes improve query performance but consume storage space and require extra maintenance during data modifications. Also, indexes speed up reads, but they might slow down writes. Create indexes selectively, concentrating on the properties and relationships frequently used in your application's queries.
Choosing the Right Index Type for Graph Performance
Selecting the appropriate index type for your specific use case and query patterns is crucial for enhancing graph performance. Consider the following index types for graph data in Macrometa.
Persistent Index
A persistent index is stored on disk rather than in memory. This index type is useful for large datasets that do not fit in memory, but it might have slightly higher access latency compared to in-memory indexes. If your graph queries involve large datasets and require filtering or sorting based on attribute values, then a persistent index might be an appropriate choice.
Use case: Large graphs where the index size exceeds available memory, and queries involve sorting or filtering based on attribute values.
Fulltext Index
A fulltext index is designed for text-based searches, supporting word or phrase searches within the indexed attributes. If your graph queries involve searching for vertices or edges based on text content, then a fulltext index can be useful.
Use case: Searching for posts containing specific keywords or finding users based on their interests.
Hash Index
Hash indexes are deprecated and replaced by persistent indexes. You can still use these indexes in the API, but not the web console.
A hash index is ideal for equality lookups on single attributes. It offers fast document access based on the indexed attribute but does not support range queries or sorting. If your graph queries involve filtering vertices or edges by a specific attribute's exact value, then a hash index can be useful.
Use case: Searching for users by their email address or finding edges with a specific relationship type.
Skiplist Index
Skiplist indexes are deprecated and replaced by persistent indexes. You can still use these indexes in the API, but not the web console.
A skiplist index is suitable for single or multiple attributes, supporting both equality lookups and range queries. It also enables sorting by the indexed attribute(s). If your graph queries involve filtering or sorting vertices or edges based on one or multiple attributes' values, particularly when range queries are involved, a skiplist index can be helpful.
Use case: Locating posts within a specific date range or sorting edges by a weight attribute.
Example: Using Indexes with Graphs
In this theoretical example, a simple graph with two vertex collections and one edge collection is created, indexes are added to optimize query performance, and then a graph query is executed using those indexes. The graph represents a social network with users and posts, where users can create posts and follow other users.
Creating Collections
Before traversing graphs, collections need to be created and data added to them:
- Vertex (document) collection
users
- stores user information. - Vertex (document) collection
posts
- stores post information. - Edge collection
relations
- stores relationships between users (follows) and user-post relationships (creates).
For more information about creating collections, refer to Collections.
Adding Indexes
After creating these collections and inserting sample data, indexes can be added to optimize query performance.
For more information on adding indexes, refer to Working with Indexes or the gdnsl index command.
email_index: Add a "persistent" index on the email
attribute in the users
collection. This index would speed up queries that filter users based on their email addresses.
Running the Query
With the added indexes, a graph query could be executed. Suppose the goal is to find all posts created by users followed by a specific user, identified by their email address. A C8QL query to achieve this might look like:
LET followers = (FOR follower IN users FILTER follower.email == 'jane@example.com' RETURN follower)
LET relevantRelations = (FOR follower IN followers FOR relation IN relations FILTER relation._from == follower._id RETURN relation)
LET followees = (FOR relation IN relevantRelations FOR followee IN users FILTER followee._id == relation._to RETURN followee)
LET relevantPostRelations = (FOR followee IN followees FOR postRelation IN relations FILTER postRelation._from == followee._id RETURN postRelation)
FOR postRelation IN relevantPostRelations
FOR post IN posts
FILTER post._id == postRelation._to
RETURN post
In this query, the user with the given email address is first found using the email_index
. Then, the relations
collection is traversed using the from_index
to find edges that start from the follower. Next, the followee
users and their associated posts are found by traversing the graph.
By using the appropriate indexes, this query would execute more efficiently, reducing the time it takes to search for relevant vertices and edges in the collections.
Keep in mind that the actual performance improvement depends on various factors, including the size of the collections, the complexity of the graph, and the specific query patterns used in an application. It is essential to analyze an application's query patterns and create indexes selectively, focusing on the properties and relationships that are most frequently used in your queries.