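Most of this material comes from the Coursera course Big Data Graph Analytics. This post collects the Cypher statements in one place for easy reference, covering basic statements as well as path-analytics and connectivity-analytics examples.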

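Neo4j uses Cypher to query graph data. Cypher is a declarative graph query language with a simple syntax and powerful features. Because Neo4j holds a leading position among graph databases and has a large user base, Cypher has become the de facto standard graph query language.
Much like SQL, Cypher keywords are case-insensitive, but property values, labels, relationship types, and variable names are case-sensitive.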

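Let's look at the basic concepts.

Variable
Variables name parts of a search pattern so that they can be referenced elsewhere in the same query. A node variable is declared inside parentheses (), and variable names are case-sensitive. The example below declares two variables, n and b, and returns b through the RETURN clause: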
MATCH (n)-->(b)
RETURN b
Property access
In a Cypher query, properties are accessed with a dot, in the form Variable.PropertyKey. The id function returns the ID of an entity, in the form id(Variable).
match (n)-->(b)
where id(n)=5 and b.age=18
return b;
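Node
Node pattern: (Variable:Label1 {Key1:Value1, Key2:Value2})
Every node has an integer ID. When a new node is created, Neo4j assigns its ID automatically. IDs are unique across the database and generally increase as nodes are added.
The following Cypher query creates a node with the label Person and two properties, name and born, and returns the new node through the RETURN clause: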
create (n:Person { name: 'Tom Hanks', born: 1956 }) return n;
Match
The MATCH clause queries the database by specifying a search pattern; the WHERE clause adds predicates that constrain that pattern.
match(n) return n;
Relationship
Relationship pattern: (StartNode)-[Variable:RelationshipType {Key1:Value1, Key2:Value2}]->(EndNode)
A relationship type must be specified when creating a relationship.
MATCH (a:Person),(b:Movie)
WHERE a.name = 'Robert Zemeckis' AND b.title = 'Forrest Gump'
CREATE (a)-[r:DIRECTED]->(b)
RETURN r;

Dataset for this post:
Link: http://pan.baidu.com/s/1dEHWQch
Password: 00v4

Create and Delete

Graph requirements

==============
Five Nodes
N1 = Tom
N2 = Harry
N3 = Julian
N4 = Michele
N5 = Josephine
Five Edges
e1 = Harry ‘is known by’ Tom
e2 = Julian ‘is co-worker of’ Harry
e3 = Michele ‘is wife of’ Harry
e4 = Josephine ‘is wife of’ Tom
e5 = Josephine ‘is friend of’ Michele
==============
A simple text description of a graph
N1 - e1 -> N2
N2 - e2 -> N3
N2 - e3 -> N4
N1 - e4 -> N5
N4 - e5 -> N5
==============

Create the complete graph

create (N1:ToyNode {name: 'Tom'}) - [:ToyRelation {relationship: 'knows'}] -> (N2:ToyNode {name: 'Harry'}),
(N2) - [:ToyRelation {relationship: 'co-worker'}] -> (N3:ToyNode {name: 'Julian', job: 'plumber'}),
(N2) - [:ToyRelation {relationship: 'wife'}] -> (N4:ToyNode {name: 'Michele', job: 'accountant'}),
(N1) - [:ToyRelation {relationship: 'wife'}] -> (N5:ToyNode {name: 'Josephine', job: 'manager'}),
(N4) - [:ToyRelation {relationship: 'friend'}] -> (N5)
;

ToyNode is a node type (label) and ToyRelation is an edge type (relationship type). Both nodes and edges can have properties.

//View the resulting graph

match (n:ToyNode)-[r]-(m) return n, r, m

//Delete all nodes and edges (nodes without any edge are not matched by this pattern)

match (n)-[r]-() delete n, r

//Delete all nodes which have no edges

match (n) where not (n)--() delete n

//Delete only ToyNode nodes which have no edges

match (n:ToyNode) where not (n)--() delete n

//Delete all edges

match (n)-[r]-() delete r

//Delete only ToyRelation edges

match (n)-[r:ToyRelation]-() delete r

//Selecting an existing single ToyNode node

match (n:ToyNode {name:'Julian'}) return n

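Adding or Modifying

The MERGE clause matches a pattern when it exists and creates it when it does not, so it works like a combination of MATCH and CREATE. After MERGE you can add explicit ON CREATE and ON MATCH clauses to set properties on the bound nodes or relationships.

With MERGE you can require that a node with a particular label and properties exists in the graph; if no such node exists, MERGE creates it.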
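
The match-or-create behavior described above can be sketched outside the database. The following is a toy in-memory model in Python, not Neo4j code: the store dictionary, the merge_node helper, and the seen property are all made up for illustration, and the model is simplified in that it identifies a node by its exact label and property set.

```python
# Toy in-memory model of MERGE semantics (illustration only, not Neo4j code).
# A node is identified here by (label, frozenset of its property items).

def merge_node(store, label, props, on_create=None, on_match=None):
    """Return (node, created): reuse the matching node or create a new one."""
    key = (label, frozenset(props.items()))
    if key in store:
        node = store[key]
        if on_match:
            node.update(on_match)      # mimics ON MATCH SET ...
        return node, False
    node = dict(props)
    if on_create:
        node.update(on_create)         # mimics ON CREATE SET ...
    store[key] = node
    return node, True

store = {}
first, created_first = merge_node(
    store, "ToyNode", {"name": "Joyce"}, on_create={"seen": 1})
second, created_second = merge_node(
    store, "ToyNode", {"name": "Joyce"}, on_match={"seen": 2})
```

The first call creates the node and the second call finds it again, so the store never holds a duplicate. This is the property that makes MERGE safer than CREATE when a query may be run more than once.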

//Adding a Node Correctly
First find the node you want to attach to, then add the new node.

match (n:ToyNode {name:'Julian'})
merge (n)-[:ToyRelation {relationship: 'fiancee'}]->(m:ToyNode {name:'Joyce', job:'store clerk'})

//Adding a Node Incorrectly
This creates a second Julian node instead of reusing the existing one.

create (n:ToyNode {name:'Julian'})-[:ToyRelation {relationship: 'fiancee'}]->(m:ToyNode {name:'Joyce', job:'store clerk'})

//Correct your mistake by deleting the bad nodes and edge

match (n:ToyNode {name:'Joyce'})-[r]-(m) delete n, r, m

//Modify a Node’s Information

match (n:ToyNode) where n.name = 'Harry' set n.job = 'drummer'
;
match (n:ToyNode) where n.name = 'Harry' set n.job = n.job + ['lead guitarist']

Import

//One way to “clean the slate” in Neo4j before importing (run both statements):

match (a)-[r]->() delete a,r
;
match (a) delete a

//Script to Import Data Set: test.csv (simple road network)
//[NOTE: replace any spaces in your path with %20, “percent twenty” ]

LOAD CSV WITH HEADERS FROM "file:///test.csv" AS line
MERGE (n:MyNode {Name:line.Source})
MERGE (m:MyNode {Name:line.Target})
MERGE (n) -[:TO {dist:line.distance}]-> (m)

//Script to import global terrorist data

LOAD CSV WITH HEADERS FROM "file:///terrorist_data_subset.csv" AS row
MERGE (c:Country {Name:row.Country})
MERGE (a:Actor {Name: row.ActorName, Aliases: row.Aliases, Type: row.ActorType})
MERGE (o:Organization {Name: row.AffiliationTo})
MERGE (a)-[:AFFILIATED_TO {Start: row.AffiliationStartDate, End: row.AffiliationEndDate}]->(o)
MERGE(c)<-[:IS_FROM]-(a);

If you get an error like “Couldn’t load the external resource at: file: […]” when loading a CSV, put the CSV file in the database's import directory, for example /Users/shuang/Documents/Neo4j/default.graphdb/import/, and the problem should be solved.

Basic Graph Operations

//Counting the number of nodes

match (n:MyNode) return count(n)

//Counting the number of edges

match (n:MyNode)-[r]->() return count(r)

//Finding leaf nodes:
Leaf node: a node which has no outgoing edges

match (n:MyNode)-[r:TO]->(m)
where not ((m)-->())
return m

//Finding root nodes:
Root node: a node which has no incoming edges

match (m)-[r:TO]->(n:MyNode)
where not (()-->(m))
return m

//Finding triangles:
Triangle: a three cycle, consisting of three nodes and three edges where the beginning and end node are the same

match (a)-[:TO]->(b)-[:TO]->(c)-[:TO]->(a) return distinct a, b, c
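
To make the triangle pattern concrete, here is a minimal Python sketch of the same idea on a small edge list. The edge list and the directed_triangles helper are made up for illustration. Unlike the Cypher query, which returns one row per rotation of (a, b, c), this sketch reports each triangle once.

```python
# Hypothetical directed edge list; node names are made up for illustration.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]

def directed_triangles(edges):
    """Return each directed 3-cycle a->b->c->a once, as a sorted tuple."""
    out = {}
    for src, dst in edges:
        out.setdefault(src, set()).add(dst)
    found = set()
    for a in out:
        for b in out[a]:
            for c in out.get(b, ()):
                # Close the cycle back to a, and require three distinct nodes.
                if a in out.get(c, ()) and len({a, b, c}) == 3:
                    found.add(tuple(sorted((a, b, c))))
    return sorted(found)

triangles = directed_triangles(edges)
```

On this edge list the only cycle of length three is A, B, C; the edges C to D and D to E do not close a cycle.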

//Finding 2nd neighbors of D:
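2nd neighbor: a node two hops away from D. The pattern *..2 below matches paths of one or two edges, so first neighbors are returned as well.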

match (a)-[:TO*..2]-(b)
where a.Name='D'
return distinct a, b

Some nodes appear to be only one node away from the node D but we can get to those nodes indirectly through another node, which means that they’re not only a first neighbor but they’re also a second neighbor.
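
The remark above can be checked with a small breadth-first search. This is a sketch in Python on a made-up undirected edge list; the within_hops helper is hypothetical and simply records the minimum hop count of every node reachable within the given number of edges.

```python
from collections import deque

# Hypothetical undirected edge list around node D.
edges = [("D", "E"), ("D", "F"), ("E", "F"), ("F", "G"), ("G", "H")]

def within_hops(edges, start, max_hops):
    """Return {node: minimum hop count} for nodes reachable in 1..max_hops edges."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue                    # do not expand beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]                     # the start node is not its own neighbor
    return hops

reach = within_hops(edges, "D", 2)
```

Here E is a first neighbor of D, but it can also be reached in two hops through F, which is the situation the note describes. H is three hops away and is left out.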

//Finding the types of a node:

match (n)
where n.Name = 'Afghanistan'
return labels(n)

//Finding the label of an edge:

match (n {Name: 'Afghanistan'})<-[r]-() return distinct type(r)

//Finding all properties of a node:

match (n:Actor) return * limit 20

//Finding loops:

match (n)-[r]->(n) return n, r limit 10

//Finding multigraphs:
Multigraph: any two nodes which have two or more edges between them

match (n)-[r1]->(m), (n)-[r2]-(m)
where r1 <> r2
return n, r1, r2, m limit 10

Remember to add a constraint requiring the two edges to be different for the same pair of nodes.

//Finding the induced subgraph given a set of nodes:

match (n)-[r:TO]-(m)
where n.Name in ['A', 'B', 'C', 'D', 'E'] and m.Name in ['A', 'B', 'C', 'D', 'E']
return n, r, m

Path Analytics

//Viewing the graph

match (n:MyNode)-[r]->(m) return n, r, m

//Finding paths between specific nodes:
The match command binds the variable p, which represents our path, to a pattern that starts at node a and goes through edges to node c. The edge part is slightly different from earlier patterns: the star stands for an arbitrary number of edges in sequence between a and c, and the query returns all of the edges needed to complete the path. In this case we only want to return a single path.

match p=(a)-[:TO*]-(c)
where a.Name='H' and c.Name='P'
return p limit 1

*Your results might not be the same as the video hands-on demo. If not, try the following query and it should return the shortest path between nodes H and P:

match p=(a)-[:TO*]-(c) where a.Name='H' and c.Name='P' return p order by length(p) asc limit 1

//Finding the length between specific nodes:

match p=(a)-[:TO*]-(c)
where a.Name='H' and c.Name='P'
return length(p) limit 1

//Finding a shortest path between specific nodes:
Use the built-in function shortestPath.

match p=shortestPath((a)-[:TO*]-(c))
where a.Name='A' and c.Name='P'
return p, length(p) limit 1

//All Shortest Paths:
Use the built-in function allShortestPaths.

MATCH p = allShortestPaths((source)-[r:TO*]-(destination))
WHERE source.Name='A' AND destination.Name = 'P'
RETURN EXTRACT(n IN NODES(p)| n.Name) AS Paths

//All Shortest Paths with Path Conditions:

MATCH p = allShortestPaths((source)-[r:TO*]->(destination))
WHERE source.Name='A' AND destination.Name = 'P' AND LENGTH(NODES(p)) > 5
RETURN EXTRACT(n IN NODES(p)| n.Name) AS Paths,length(p)

//Diameter of the graph:
Diameter: the longest shortest path between any two nodes in the graph. The query below computes the shortest path for every ordered pair of distinct nodes and keeps the longest one.
Note on the all-shortest-paths queries above: their result is returned in the form of an array. They use a new term, extract: once the path p has been matched, extract takes all of the nodes in p and pulls out their names, and the names are returned as a list under the alias Paths. If there is more than one shortest path, we get multiple lists of node names.

match (n:MyNode), (m:MyNode)
where n <> m
with n, m
match p=shortestPath((n)-[*]->(m))
return n.Name, m.Name, length(p)
order by length(p) desc limit 1

//Extracting and computing with node and properties:
The number of hops is returned as pathLength and the summed distance as pathDist.
The reduce expression starts by setting an accumulator s to 0. The variable e then takes each relationship (edge) of the returned path in turn, and the distance assigned to that edge is added to s.

match p=(a)-[:TO*]-(c)
where a.Name='H' and c.Name='P'
return extract(n in nodes(p)|n.Name) as Nodes, length(p) as pathLength,
reduce(s=0, e in relationships(p)| s + toInt(e.dist)) as pathDist limit 1

The path itself, as we know, begins in H and ends in P. And it has a pathLength of 8, but it has a pathDist of 40.
So we could interpret this to mean that even though there are 7 towns between the source town and the destination town, or a pathLength of 8,
the actual distance in miles would be a value of 40.
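
The difference between pathLength and pathDist can be seen in a few lines of Python. This is only a sketch: the three-edge path and its distances are made up, and the dist values are kept as strings because LOAD CSV reads every field as a string, which is why the Cypher query calls toInt.

```python
from functools import reduce

# Hypothetical path from H to P, given as (source, target, dist) edges.
# dist is a string on purpose, mirroring values loaded from a CSV file.
path_edges = [("H", "I", "5"), ("I", "J", "3"), ("J", "P", "7")]

# Node names along the path, similar to extracting n.Name from nodes(p).
nodes = [path_edges[0][0]] + [target for _, target, _ in path_edges]

# Number of edges in the path, similar to length(p).
path_length = len(path_edges)

# Sum of edge distances, similar to
# reduce(s=0, e in relationships(p) | s + toInt(e.dist)).
path_dist = reduce(lambda s, e: s + int(e[2]), path_edges, 0)
```

In this made-up path the hop count is 3 while the summed distance is 15, so the two measures answer different questions about the same path.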

//Dijkstra’s algorithm for a specific target node:
Despite the title, this query does not find the least-weight path in our network. shortestPath picks the path with the fewest hops, and reduce then sums the weights along that path.

MATCH (from: MyNode {Name:'A'}), (to: MyNode {Name:'P'}),
path = shortestPath((from)-[:TO*]->(to))
WITH REDUCE(dist = 0, rel in rels(path) | dist + toInt(rel.dist)) AS distance, path
RETURN path, distance

//Dijkstra’s algorithm SSSP:
What this query calculates, for every target node, is the fewest-hop path from A together with the sum of the edge weights along it. This is not necessarily the least-weight path in the network.

MATCH (from: MyNode {Name:'A'}), (to: MyNode),
path = shortestPath((from)-[:TO*]->(to))
WITH REDUCE(dist = 0, rel in rels(path) | dist + toInt(rel.dist)) AS distance, path, from, to
RETURN from, to, path, distance order by distance desc

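Problem not solved: as written, this query can fail with the shortestPath error about identical start and end nodes, because to also matches node A itself. Refer to discussions of the allShortestPaths start/end-node error and the setting cypher.forbid_shortestpath_common_node=false; filtering out the case where from and to are the same node before calling shortestPath, as the diameter query above does with where n <> m, is another option to try.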
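
To show why a fewest-hop path with summed weights is not the same as a least-weight path, here is a Python sketch that runs both a breadth-first search and Dijkstra's algorithm on a small made-up graph. The graph, the weights, and the two helper functions are assumptions for illustration; this is not what the Cypher queries above execute.

```python
import heapq
from collections import deque

# Hypothetical weighted directed graph: node -> list of (neighbor, dist).
graph = {
    "A": [("B", 1), ("P", 10)],
    "B": [("C", 1)],
    "C": [("P", 1)],
    "P": [],
}

def fewest_hops(graph, source, target):
    """Breadth-first search: path with the fewest edges, plus its summed weight."""
    queue = deque([(source, [source], 0)])
    seen = {source}
    while queue:
        node, path, dist = queue.popleft()
        if node == target:
            return path, dist
        for nxt, w in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt], dist + w))
    return None, None

def dijkstra(graph, source, target):
    """Dijkstra's algorithm: least total weight, assuming non-negative weights."""
    heap = [(0, source, [source])]
    best = {}
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node in best:
            continue                    # already settled with a smaller distance
        best[node] = dist
        if node == target:
            return path, dist
        for nxt, w in graph[node]:
            if nxt not in best:
                heapq.heappush(heap, (dist + w, nxt, path + [nxt]))
    return None, None

hop_path, hop_dist = fewest_hops(graph, "A", "P")
least_path, least_dist = dijkstra(graph, "A", "P")
```

On this graph the fewest-hop path is the direct edge from A to P with weight 10, while the least-weight path goes through B and C with total weight 3. The first result corresponds to what shortestPath plus reduce reports.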

//Graph not containing a selected node:

match (n)-[r:TO]->(m)
where n.Name <> 'D' and m.Name <> 'D'
return n, r, m

//Shortest path over a Graph not containing a selected node:

match p=shortestPath((a {Name: 'A'})-[:TO*]-(b {Name: 'P'}))
where not('D' in (extract(n in nodes(p)|n.Name)))
return p, length(p)

//Graph not containing the immediate neighborhood of a specified node:
Remember to take leaf and root node into account.

match (d {Name:'D'})-[:TO]-(b)
with collect(distinct b.Name) as neighbors
match (n)-[r:TO]->(m)
where
not (n.Name in (neighbors+'D'))
and
not (m.Name in (neighbors+'D'))
return n, r, m
;
match (d {Name:'D'})-[:TO]-(b)-[:TO]->(leaf)
where not((leaf)-->())
return (leaf)
;
match (d {Name:'D'})-[:TO]-(b)<-[:TO]-(root)
where not((root)<--())
return (root)

[Figure: result of the first statement]

//Graph not containing a selected neighborhood:

match (a {Name: 'F'})-[:TO*..2]-(b)
with collect(distinct b.Name) as MyList
match (n)-[r:TO]->(m)
where not(n.Name in MyList) and not (m.Name in MyList)
return distinct n, r, m

Connectivity Analytics

Connectivity analytics is treated here in terms of network robustness, in other words, a measure of how resistant a graph network is to being disconnected.
There are two ways of doing connectivity analytics: one computes eigenvalues, and the other computes the degree distribution. For these examples, we’re going to use the second one, degree distributions.

//Viewing the graph

match (n:MyNode)-[r]->(m) return n, r, m

// Find the outdegree of all nodes

match (n:MyNode)-[r]->()
return n.Name as Node, count(r) as Outdegree
order by Outdegree
union
match (a:MyNode)-[r]->(leaf)
where not((leaf)-->())
return leaf.Name as Node, 0 as Outdegree

// Find the indegree of all nodes

match (n:MyNode)<-[r]-()
return n.Name as Node, count(r) as Indegree
order by Indegree
union
match (a:MyNode)<-[r]-(root)
where not((root)<--())
return root.Name as Node, 0 as Indegree

// Find the degree of all nodes

match (n:MyNode)-[r]-()
return n.Name, count(distinct r) as degree
order by degree

// Find degree histogram of the graph

match (n:MyNode)-[r]-()
with n as nodes, count(distinct r) as degree
return degree, count(nodes) order by degree asc
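
The degree histogram can be sketched in Python as a two-step count: first the degree of each node, then the number of nodes per degree value. The edge list is made up and is assumed to contain no self-loops. As in the Cypher query, nodes without any edge do not appear.

```python
from collections import Counter

# Hypothetical edge list; each tuple is one relationship, direction ignored.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

# Step 1: degree of each node (every edge counts once for each endpoint).
degree = Counter()
for src, dst in edges:
    degree[src] += 1
    degree[dst] += 1

# Step 2: degree value -> number of nodes with that degree, ordered by degree.
histogram = sorted(Counter(degree.values()).items())
```

For this edge list, one node has degree 1, two nodes have degree 2, and one node has degree 3.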

//Save the degree of the node as a new node property

match (n:MyNode)-[r]-()
with n, count(distinct r) as degree
set n.deg = degree
return n.Name, n.deg

// Construct the Adjacency Matrix of the graph
Philosophical issue:
Every database will allow you some analytical computation, and the remainder of the analytical computations must be done outside of the database. However, it is always a judicious idea to get the database to produce an intermediate result formatted in the way you need for the next computation, and then to use that intermediate result as the input to the next computation. We’ve seen that a number of computations in graph analytics start with the adjacency matrix, so we should be able to force Cypher to produce an adjacency matrix.

match (n:MyNode), (m:MyNode)
return n.Name, m.Name,
case
when (n)-->(m) then 1
else 0
end as value
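
The query returns the adjacency matrix in long format, one row per ordered pair of nodes. A downstream step would typically pivot those rows into a dense matrix. The Python sketch below shows that step on a made-up three-node graph; the node order in nodes is an assumption that fixes the row and column order.

```python
# Hypothetical nodes and directed edges.
nodes = ["A", "B", "C"]
edges = {("A", "B"), ("B", "C"), ("C", "A")}

# Long format, one (n, m, value) row per ordered pair, like the Cypher result.
rows = [(n, m, 1 if (n, m) in edges else 0) for n in nodes for m in nodes]

# Pivot the rows into a dense matrix indexed by the order of `nodes`.
index = {name: i for i, name in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for n, m, value in rows:
    matrix[index[n]][index[m]] = value
```

Row i and column j of matrix hold 1 when there is an edge from the i-th node to the j-th node, which is the form most matrix libraries expect.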

// Construct the Normalized Laplacian Matrix of the graph

match (n:MyNode), (m:MyNode)
return n.Name, m.Name,
case
when n.Name = m.Name then 1
when (n)-->(m) then -1/(sqrt(toInt(n.deg))*sqrt(toInt(m.deg)))
else 0
end as value
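
As a cross-check of the formula, here is a Python sketch of the normalized Laplacian on a made-up three-node path graph. It assumes that edges are read as undirected and that every node has at least one edge, so each diagonal entry is 1 and each off-diagonal entry for adjacent nodes is -1 / (sqrt(deg(n)) * sqrt(deg(m))). The Cypher query above tests the directed pattern (n)-->(m), so its output can differ from this symmetric version.

```python
import math

# Hypothetical nodes and edges, with edges read as undirected.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C")]

adjacent = set()
deg = {n: 0 for n in nodes}
for u, v in edges:
    adjacent.add((u, v))
    adjacent.add((v, u))
    deg[u] += 1
    deg[v] += 1

def laplacian_entry(n, m):
    """One entry of the normalized Laplacian (assumes deg > 0 for all nodes)."""
    if n == m:
        return 1.0
    if (n, m) in adjacent:
        return -1.0 / (math.sqrt(deg[n]) * math.sqrt(deg[m]))
    return 0.0

L = [[laplacian_entry(n, m) for m in nodes] for n in nodes]
```

With degrees 1, 2, and 1, the entries between A and B and between B and C are both -1 / sqrt(2), and the entry between A and C is 0 because they are not adjacent.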

Scale View

You can resize the display area: switch the browser to inspect mode and add a scale function in the d3 code area.

References:

Neo4j Cypher Refcard 3.2
Neo4j Cypher查询语言详解 (Neo4j Cypher Query Language in Detail)