What is Neo4j?
Neo4j is a high-performance NoSQL database built on the graph model. It has no tables with strictly defined fields; instead it works with a flexible structure of nodes and the relationships between them.
Terminology of Neo4j and graph databases in general.
- graph database - a database built on graphs: nodes and the relationships between them
- Cypher - the language for writing queries to a Neo4j database (roughly what SQL is to MySQL)
- node - an object in the database, a graph node. The number of nodes is limited to 2 to the power of 35 (about 34 billion)
- node label - used as a conditional "node type". For example, movie nodes can be linked to actor nodes. Node labels are case-sensitive, and Cypher does not raise an error if you type a label in the wrong case.
- relationship - a connection between two nodes, a graph edge. The number of relationships is limited to 2 to the power of 35 (about 34 billion)
- relationship identifier (relationship type) - in Neo4j every relationship has a type. The maximum number of relationship types is 32767
- properties - a set of data that can be assigned to a node. For example, if a node is a product, its properties can store the product id from the MySQL database
- node ID - the node's unique identifier. By default, this is the ID displayed when viewing a result. I did not find a way to use it in Cypher queries
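The limits quoted in the list above are plain powers of two. A quick check in Python (used here only as a calculator, not as anything Neo4j-specific) shows where "about 34 billion" comes from:

```python
# 2 to the power of 35: the node and relationship limit quoted above.
node_limit = 2 ** 35
print(node_limit)   # 34359738368, roughly 34 billion

# The relationship-type limit quoted above is one less than 2 to the power of 15.
type_limit = 2 ** 15 - 1
print(type_limit)   # 32767
```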
Simple Cypher commands
Creating a node with a label
create (n:Ware {wareId: 1});
Select all nodes
MATCH (n) RETURN n;
Counter
MATCH (n:Ware {wareId:1}) RETURN "Our graph has "+count(*)+" nodes with label Ware and wareId=1" as counter;
Create 2 related nodes
CREATE (n{wareId:1})-[r:SUIT]->(m{criteriaId:1})
Link 2 existing nodes
MATCH (a {wareId: 1}), (b {criteriaId: 2}) MERGE (a)-[r:SUIT]->(b)
Delete all connected nodes together with their relationships
match (n)-[r]-() DELETE n,r;
Delete all unconnected nodes. This command fails if the database still contains nodes with relationships, so delete the connected nodes first.
MATCH (n) DELETE n;
Choose products that fit criterion 3
MATCH (a:Ware)-->(b:Criteria {criteriaId: 3}) RETURN a;
The web client cannot run several Cypher commands at once. The old client reportedly can, but I did not find a way to do it, so you have to paste the commands one line at a time.
You can create a whole set of nodes and relationships with a single command. The nodes must be given different names; naming the relationships is optional.
CREATE (w1:Ware{wareId:1})-[:SUIT]->(c1:Criteria{criteriaId:1}), (w2:Ware{wareId:2})-[:SUIT]->(c2:Criteria{criteriaId:2}), (w3:Ware{wareId:3})-[:SUIT]->(c3:Criteria{criteriaId:3}), (w4:Ware{wareId:4})-[:SUIT]->(c1), (w5:Ware{wareId:5})-[:SUIT]->(c1), (w4)-[:SUIT]->(c2), (w5)-[:SUIT]->(c3);
The web client then displays the resulting structure. If the layout is hard to read, you can rearrange the nodes with the mouse.
How did I get to this?
I have not used SQL in my projects for more than a year, ever since I tried the document-oriented DBMS MongoDB. After MySQL, my joy knew no bounds: everything could be done so simply and conveniently in MongoDB. Over that year our web development studio rewrote our top three CMSs around Mongo's main feature, its documents, along with about a dozen sites built on them. Everything was fine, and I had already begun to forget what it was like to write fifty-line queries for every database action. It would have stayed that way if a project had not landed on my head with a pile of relationships that did not fit into documents. I really did not want to go back to SQL, so I spent a couple of days searching for a NoSQL solution that allows flexible relationships, and settled on graph DBMSs. For a number of reasons I chose Neo4j. One of the main ones is that my engine is written in PHP, and there is a good PHP driver, Neo4jPHP, which covers almost 100% of the REST interface provided by the Neo4j server.
Closer to the point
Graph databases are designed, first of all, for problems where data items are closely linked by relationships that can go several levels deep. In a relational database, for example, it is not hard to write the query "Give me a list of all the actors who were in a film with Kevin Bacon":
> SELECT actor_name, role_name FROM roles WHERE movie_title IN (SELECT DISTINCT movie_title FROM roles WHERE actor_name='Kevin Bacon')
I wrote the example with a subquery; you can rewrite it in your head using JOIN.
But suppose we want the names of all the actors who were in a film with someone who was in a film with Kevin Bacon. That takes another JOIN. Now try adding a third degree: "someone who was in a film with someone who was in a film with someone who was in a film with Kevin Bacon." It sounds scary, but the task is real, and with each new degree we have to add another JOIN, so the query becomes more complex, more time-consuming, and less and less efficient.
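To make the growth of the query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two-column roles table is a simplified assumption (the role_name column from the query above is dropped), and the rows are the actors and films used in the PHP example later in this post. With this schema each extra degree costs two more self-joins:

```python
import sqlite3

# In-memory database; the two-column "roles" table is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roles (actor_name TEXT, movie_title TEXT)")
conn.executemany(
    "INSERT INTO roles VALUES (?, ?)",
    [
        ("Keanu Reeves", "The Matrix"),
        ("Laurence Fishburne", "The Matrix"),
        ("Laurence Fishburne", "Higher Learning"),
        ("Jennifer Connelly", "Higher Learning"),
        ("Laurence Fishburne", "Mystic River"),
        ("Kevin Bacon", "Mystic River"),
    ],
)

# First degree: actors who share a film with Kevin Bacon (one self-join).
FIRST_DEGREE = """
SELECT DISTINCT r2.actor_name
FROM roles r1
JOIN roles r2 ON r2.movie_title = r1.movie_title
WHERE r1.actor_name = 'Kevin Bacon'
"""

# Second degree: one more hop needs two more self-joins.
SECOND_DEGREE = """
SELECT DISTINCT r4.actor_name
FROM roles r1
JOIN roles r2 ON r2.movie_title = r1.movie_title
JOIN roles r3 ON r3.actor_name = r2.actor_name
JOIN roles r4 ON r4.movie_title = r3.movie_title
WHERE r1.actor_name = 'Kevin Bacon'
"""

first = sorted(row[0] for row in conn.execute(FIRST_DEGREE))
second = sorted(row[0] for row in conn.execute(SECOND_DEGREE))
print(first)
print(second)
```

Both result lists include Kevin Bacon himself, because the sketch does not filter out the starting actor. A third degree would add yet another pair of joins in the same pattern.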
Deep relationships matter most in social projects, where we need friends of friends, in route-finding tasks, and so on. Graph databases are designed for exactly these problems, where pieces of data are two or more relationships apart. They are solved very elegantly when we model the data as graph nodes and the connections as graph edges between those nodes. We can then traverse the graph using well-known and efficient algorithms.
The example above is easy to model: every actor and every film is a node, and the roles are relationships that go from an actor to the film they appeared in.
Now it becomes very easy to find a path from Kevin Bacon to any other actor.
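As an illustration of why this becomes easy, here is a minimal in-memory sketch in Python. It is not how Neo4j implements path finding; it is an ordinary breadth-first search over the actor and film nodes used in the PHP example later in this post. The function names build_graph and shortest_path are made up for this sketch.

```python
from collections import deque

# Each role is an edge between an actor node and a film node.
ROLES = [
    ("Keanu Reeves", "The Matrix"),
    ("Laurence Fishburne", "The Matrix"),
    ("Laurence Fishburne", "Higher Learning"),
    ("Jennifer Connelly", "Higher Learning"),
    ("Laurence Fishburne", "Mystic River"),
    ("Kevin Bacon", "Mystic River"),
]


def build_graph(roles):
    """Build an undirected adjacency map: node name -> set of neighbours."""
    graph = {}
    for actor, film in roles:
        graph.setdefault(actor, set()).add(film)
        graph.setdefault(film, set()).add(actor)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


graph = build_graph(ROLES)
path = shortest_path(graph, "Keanu Reeves", "Kevin Bacon")
print(" -> ".join(path))
```

Adding another degree of separation does not change the code at all: the search simply walks one step further, which is the contrast with the growing chain of JOINs.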
Some code
First, we need to establish a connection to the database. Since Neo4jPHP works with the database server through the REST interface, there is no permanent connection, and data transfer occurs only when we need to read or write data:
use Everyman\Neo4j\Client,
    Everyman\Neo4j\Transport,
    Everyman\Neo4j\Node,
    Everyman\Neo4j\Relationship;

$client = new Client(new Transport('localhost', 7474));
Now we need to create nodes for each actor and film. This is similar to how we do INSERT in traditional relational database management systems:
$keanu = new Node($client);
$keanu->setProperty('name', 'Keanu Reeves')->save();
$laurence = new Node($client);
$laurence->setProperty('name', 'Laurence Fishburne')->save();
$jennifer = new Node($client);
$jennifer->setProperty('name', 'Jennifer Connelly')->save();
$kevin = new Node($client);
$kevin->setProperty('name', 'Kevin Bacon')->save();

$matrix = new Node($client);
$matrix->setProperty('title', 'The Matrix')->save();
$higherLearning = new Node($client);
$higherLearning->setProperty('title', 'Higher Learning')->save();
$mysticRiver = new Node($client);
$mysticRiver->setProperty('title', 'Mystic River')->save();
Each node has setProperty and getProperty methods that let you write arbitrary data to the node and read it back. A node has no fixed structure, much like a document in a document-oriented DBMS, although we cannot nest data, and property values are limited to simple types such as strings and numbers (Neo4j also accepts booleans and arrays of these).
Data is sent to the server only when we call save(), and this has to be done for each node.
Now we have to define the relationships between the actors and the films they appeared in. In a relational DBMS we would create a foreign key for this. Here we create a relationship, which can have any name and, like a node, can store arbitrary properties; it is also saved to the database:
$keanu->relateTo($matrix, 'IN')->save();
$laurence->relateTo($matrix, 'IN')->save();

$laurence->relateTo($higherLearning, 'IN')->save();
$jennifer->relateTo($higherLearning, 'IN')->save();

$laurence->relateTo($mysticRiver, 'IN')->save();
$kevin->relateTo($mysticRiver, 'IN')->save();
As you can see, all the relationships are named "IN", but we could give them any other name, for example "ACTED IN". We could also define the inverse relationship from films to actors and phrase it as a film "HAS" an actor. Paths can be found regardless of the direction in which the relationships were created, so we can use whatever semantics suit the subject area. There can also be multiple relationships between two nodes, in either direction.
All the relationships are set up, and now we are ready to find a connection between any actor in our system and Kevin Bacon, up to any given depth:
$path = $keanu->findPathsTo($kevin)
    ->setMaxDepth(12)
    ->getSinglePath();

foreach ($path as $i => $node) {
    if ($i % 2 == 0) {
        echo $node->getProperty('name');
        if ($i+1 != count($path)) {
            echo " was in\n";
        }
    } else {
        echo "\t" . $node->getProperty('title') . " with\n";
    }
}
We can also fetch the relationships between nodes rather than the nodes themselves, for example:
echo $laurence->getProperty('name') . " was in:\n";
$relationships = $laurence->getRelationships('IN');
foreach ($relationships as $relationship) {
    $movie = $relationship->getEndNode();
    echo "\t" . $movie->getProperty('title') . "\n";
}
getRelationships can also return all of a node's relationships; it does not have to be limited to a specific relationship type. We can also request only the incoming or only the outgoing relationships.
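The idea of filtering a node's relationships by type and by direction can be sketched in memory like this. The sketch is Python rather than PHP and does not reproduce the Neo4jPHP API: the Rel tuple, the relationships helper, and its "in" / "out" / "all" direction strings are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Optional


class Rel(NamedTuple):
    """A stored relationship: start node, relationship type, end node."""
    start: str
    rel_type: str
    end: str


RELS = [
    Rel("Laurence Fishburne", "IN", "The Matrix"),
    Rel("Laurence Fishburne", "IN", "Higher Learning"),
    Rel("Laurence Fishburne", "IN", "Mystic River"),
    Rel("Kevin Bacon", "IN", "Mystic River"),
]


def relationships(node: str, rel_type: Optional[str] = None, direction: str = "all"):
    """Hypothetical helper: return the relationships that touch one node.

    rel_type=None means any type; direction is "in", "out" or "all".
    """
    found = []
    for rel in RELS:
        outgoing = rel.start == node
        incoming = rel.end == node
        if direction == "out" and not outgoing:
            continue
        if direction == "in" and not incoming:
            continue
        if direction == "all" and not (outgoing or incoming):
            continue
        if rel_type is not None and rel.rel_type != rel_type:
            continue
        found.append(rel)
    return found


films = [rel.end for rel in relationships("Laurence Fishburne", "IN", "out")]
cast = [rel.start for rel in relationships("Mystic River", direction="in")]
print(films)
print(cast)
```

Here films lists the films Laurence Fishburne points to, and cast lists the actors whose relationships point at Mystic River; asking Mystic River for outgoing relationships returns nothing, because every relationship in this data runs from an actor to a film.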
I will end this post here. I hope it sparks more articles about graph databases, and Neo4j in particular.