Why does neo4j warn: “This query builds a cartesian product between disconnected patterns”?

The browser is telling you that:

  1. It is handling your query by doing a comparison between every Gene instance and every Chromosome instance. If your DB has G genes and C chromosomes, then the complexity of the query is O(GC). For instance, if we are working with the human genome, there are 46 chromosomes and maybe 25000 genes, so the DB would have to do 1150000 comparisons.
  2. You might be able to improve the complexity (and performance) by altering your query. For example, if we created an index on :Gene(chromosomeID), and altered the query so that we initially matched just on the node with the smallest cardinality (the 46 chromosomes), we would only do O(G) (or 25000) “comparisons” — and those comparisons would actually be quick index lookups! This is approach should be much faster.

    Once we have created the index, we can use this query:

    MATCH (c:Chromosome)
    WITH c
    MATCH (g:Gene) 
    WHERE g.chromosomeID = c.chromosomeID
    CREATE (g)-[:PART_OF]->(c);
    

    It uses a WITH clause to force the first MATCH clause to execute first, avoiding the cartesian product. The second MATCH (and WHERE) clause uses the results of the first MATCH clause and the index to quickly get the exact genes that belong to each chromosome.

[UPDATE]

The WITH clause was helpful when this answer was originally written. The Cypher planner in newer versions of neo4j (like 4.0.3) now generate the same plan even if the WITH is omitted, and without creating a cartesian product. You can always PROFILE both versions of your query to see the effect with/without the WITH.

Leave a Comment