Recently, we were working on a group project in my Data Science Primer Course. A dataset scraped from iFlix.com is given in an Excel spreadsheet, tabbed with the category name in each sheet. My teammate managed to transform all these sheets into a single master dataset and export as .csv file (his article). I copied his first few sentences to start off my post… thanks bro! My task is to import the .csv file into Neo4j and visualize it using Neo4j to help data story telling.
Step 1. Create a iFlix Neo4j database
We need to work on a clean database. So, I created a new folder call “iflix.graphdb” and selected as my “Database Location” on the Neo4j desktop app dialog. Click the “Start” button to start the Neo4j engine then click on the localhost link to launch the Neo4j browser.
Step 2. Create index
CREATE INDEX ON :Category(name)
CREATE INDEX ON :Title(name)
CREATE INDEX ON :Type(name)
Step 3. Import the CSV file into Neo4j
There are 3 columns in the CSV file: title, category and type. In order to get the correct result, I will need to apply some fundamental relational database design concept where normalization is needed.
I created a new CSV file which contains only title and category.
Load the CSV file into Neo4j:-
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:///D:/cheet/Downloads/iflix_category.csv" AS csvline WITH csvline WHERE csvline.title IS NOT NULL AND csvline.category IS NOT NULL MERGE (t:Title {name: csvline.title}) MERGE (c:Category {name: csvline.category}) CREATE (t)-[:CATEGORIZED_AS]->(c)
This will basically:-
- Create Title nodes with the property name label as t
- Create Category nodes with the property name label as c
- Create the CATEGORIZED_AS relationship to link Title nodes to Category nodes
* The database will only create unique nodes due to the MERGE command
Then, I did a pivot on the original CSV file to get unique values of category and type.
Load the CSV file into Neo4j:-
USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM "file:///D:/cheet/Downloads/iflix_type.csv" AS csvline WITH csvline WHERE csvline.type IS NOT NULL AND csvline.category IS NOT NULL MERGE (p:Type {name: csvline.type}) MERGE (c:Category {name: csvline.category}) CREATE (c)-[:UNDER]->(p)
This will basically:-
- Create Type nodes with the property name label as p
- Create Category nodes with the property name label as c
- Create the UNDER relationship to link Category nodes to Type nodes
Step 4. Result(s)
Run the following to get some result
MATCH (n) RETURN n
Due to the Neo4j browser limitation, I can only make use of my Chrome zooming function to get a better view of the data.
Neo4j is so powerful!
The final step is to run the following to get the categories for TV, Movie & Studio
MATCH (c:Category)-[:UNDER]->(p:Type) RETURN p, c
From the graph, we can see clearly which are the common categories between TV and Movie, which category can be expanded into TV, etc…
I have uploaded the files to https://app.box.com/s/vwp3ixlcyknbkzsikbtedti7zzadtgj2. You may try it out 🙂
Anything to add? Do share with me in the comments 🙂