Thursday, December 10, 2020

OpenPyXL colouring tabs

Below is how to color the tabs 

from xlsxwriter.workbook import Workbook

workbook = Workbook('tab_colors.xlsx')


# Set up some worksheets.

worksheet1 = workbook.add_worksheet()

worksheet2 = workbook.add_worksheet()

worksheet3 = workbook.add_worksheet()

worksheet4 = workbook.add_worksheet()


# Set tab colours

worksheet1.set_tab_color('red')

worksheet2.set_tab_color('green')

worksheet3.set_tab_color('#FF9900')  # Orange


# worksheet4 will have the default colour.

workbook.close()


References:

https://stackoverflow.com/questions/15667750/coloring-a-tab-in-openpyxl


neo4J Loading XLS using Apache

Neo4J uses Apache POI library

The installation of the neo4j desktop was done using installer and attempting to perform the below command was giving the error below 

There is no procedure with the name `apoc.load.xls` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.

CALL apoc.load.xls('file:////Users/rekumarm/Downloads/Sheet.xlsx','Full',{mapping:{Integer:{type:'int'}, Array:{type:'int',array:true,arraySep:';'}}})

However, looking around this error is due to the fact that apache plugin was not installed. 

There is a poorly documented difference between installing neo4j using an installer (which is most common) versus from a zip or tar file.


(A) When you install neo4j using an installer, the installer configures the neo4j server to look for the plugins folder in the database folder (i.e., under the default.graphdb folder).


(B) When you install neo4j from a zip or tar file, the neo4j server will look for the plugins folder in the neo4j installation folder (for example, the plugins folder should be a sibling of the neo4j bin folder).


It sounds like (A) applies to me.



These below jar files are to be placed inside the library folder. By default the neo4j site specifies these are HTTP version of the file and hence it gives a 501 error when directly try to link to it


https://repo1.maven.org/maven2/org/apache/poi/poi/3.17/poi-3.17.jar

https://repo1.maven.org/maven2/org/apache/commons/commons-collections4/4.1/commons-collections4-4.1.jar

https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml-schemas/3.17/poi-ooxml-schemas-3.17.jar

https://repo1.maven.org/maven2/org/apache/xmlbeans/xmlbeans/2.6.0/xmlbeans-2.6.0.jar

https://repo1.maven.org/maven2/com/github/virtuald/curvesapi/1.04/curvesapi-1.04.jar


Now, placed all these in the plugins folder and restarted the server, and attempted to call the below command to load data from a spreadsheet, 

This was a simple spreadsheet. However, this was not getting successful 


CALL apoc.load.xls('file:////Users/rekumarm/Desktop/neo_test_book1.xlsx','Full',{mapping:{Integer:{type:'int'}, Array:{type:'int',array:true,arraySep:';'}}})


Neo.ClientError.Procedure.ProcedureNotFound

There is no procedure with the name `apoc.load.xls` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.


Looking around, below two were giving some suggestion to get around this. 


But if added the jar file the apoc related, it was not starting the server itself, it was giving the below issue 


2020-12-10 15:13:23.842+0000 ERROR Failed to start Neo4j on dbms.connector.http.listen_address, a socket address. If missing port or hostname it is acquired from dbms.default_listen_address.


Even appal.help was not working. But then then looked at this page https://neo4j.com/labs/apoc/4.1/installation/ and found the apoc is available in the labs directory and placing it in plugins folder, it was helpful to resolve the error with apoc.help. the command is now giving all the available commands with apoc. But still XL loading procedure is not found


CALL dbms.procedures()



References:

https://neo4j.com/labs/apoc/4.1/import/xls/

https://stackoverflow.com/questions/42286508/apoc-is-only-partially-installing-its-extension-in-neo4j-one-procedure




"apoc.load.json"

"apoc.load.jsonArray"

"apoc.load.xml"


So from this reading it looked like there are two pieces. APOC core and APOC full. The labs contained only core. So from below link could download the full 


https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/tag/4.2.0.0


Also, in this doc, it mentions about the configuration to be made. 

https://community.neo4j.com/t/how-can-i-install-apoc-library-for-neo4j-version-3-4-6-edition-community/1495


But still, it was coming as error.  


020-12-11 01:26:50.300+0000 ERROR Failed to start Neo4j on dbms.connector.http.listen_address, a socket address. If missing port or hostname it is acquired from dbms.default_listen_address.

java.lang.RuntimeException: Error starting Neo4j database server at /Users/rekumarm/Documents/RR/projects/KenNetworkVisualization/tools/CommunityServer/neo4j-community-4.2.1/data/databases

at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:207) ~[neo4j-4.2.1.jar:4.2.1]



Then looking to the error bit more, found the exception trace saying aooc,collection was already enabled. This was because both core and full was present in the plugins folder and removing the core actually helped to start the server successfully i.e. only full is required, to note the same apoc version as dbms version was the one I used. Which was 4.2.0.0 


Now, the excel load procedure complaint has gone, but results in the below error


Failed to invoke procedure `apoc.load.xls`: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf


This below link helped to work around this. 


https://community.neo4j.com/t/setting-apoc-import-file-enabled-true-in-your-neo4j-conf/4293


Essentially, these below notes


The apoc.conf file must be created manually (next to neo4j.conf file) because it's not necessary for other procedures.


In this file, you must write:


apoc.import.file.enabled=true

apoc.import.file.use_neo4j_config=true

After, you must restart your database in order to apply changes :slight_smile:



Now left with the one error of unable to load the file because trying from browser, the file path 


So it appears that the neo4j by default looks into the import folder. So that solves the problem! Now Able get past all of the issues related to connectivities 


References:

https://stackoverflow.com/questions/43893457/understanding-inplace-true


Neo4J installing community server

It was as simple as this


Open up your terminal/shell.

Extract the contents of the archive, using:

tar -xf <filecode>.

For example,

tar -xf neo4j-community-4.2.1-unix.tar.gz

Place the extracted files in a permanent home on your server. The top level directory is referred to as NEO4J_HOME.

To run Neo4j as a console application, use:

<NEO4J_HOME>/bin/neo4j console

To run Neo4j in a background process, use:

<NEO4J_HOME>/bin/neo4j start

For additional commands see the Unix tarball installation documentation.

Visit http://localhost:7474 in your web browser.

Connect using the username 'neo4j' with default password 'neo4j'. You'll then be prompted to change the password.



Only thing to note is that if the DB is already running this does not work very well as the database provided by the community server 

Does not start OK.


References:

https://neo4j.com/download-thanks/?edition=community&release=4.2.1&flavour=unix&_gl=1*1gycz6j*_ga*MTE2NzMwNjc4Ny4xNjAzMzY1Njg1*_ga_DL38Q8KGQC*MTYwNzU5MzIzOS4yMy4xLjE2MDc1OTMyNTUuMA..&_ga=2.189343721.31438437.1607496289-1167306787.1603365685

What is Cypher Shell?

Cypher Shell is a command-line tool that comes with the Neo4j installation. It can also be downloaded from Neo4j Download Center and installed separately.


Cypher Shell CLI is used to run queries and perform administrative tasks against a Neo4j instance. By default, the shell is interactive, but you can also use it for scripting, by passing cypher directly on the command line or by piping a file with cypher statements (requires PowerShell on Windows). It communicates via the Bolt protocol.



Below is the syntax


cypher-shell [-u USERNAME, --username USERNAME]

              [cypher]

              [-h, --help]

              [--fail-fast]

              [--fail-at-end]

              [--format]

              [--debug]

              [--non-interactive]

              [-v, --version]

              [-a ADDRESS, --address ADDRESS]

              [-p PASSWORD, --password PASSWORD]

              [--encryption]

              [-d DATABASE, --database DATABASE]

              [--P PARAM, --param PARAM]

              [--sample-rows SAMPLE-ROWS]

              [--wrap]

              [--driver-version]

              [-f FILE, --file FILE]


 

$neo4j-home> bin/cypher-shell -u neo4j -p <password>



References:

https://neo4j.com/docs/operations-manual/current/tools/cypher-shell/#:~:text=Cypher%20Shell%20is%20a%20command,comes%20with%20the%20Neo4j%20installation.&text=Cypher%20Shell%20CLI%20is%20used,tasks%20against%20a%20Neo4j%20instance.


Wednesday, December 9, 2020

Importing CSV to neo4J

CSV is a file of comma-separated values, often viewed in Excel or some other spreadsheet tool. There can be other types of values as the delimiter, but the most standard is the comma. Many systems and processes today already convert their data into CSV format for file outputs to other systems, human-friendly reports, and other needs. It is a standard file format that humans and systems are already familiar with using and handling.


Ways to Import CSV Files



LOAD CSV Cypher command: this command is a great starting point and handles small- to medium-sized data sets (up to 10 million records).


neo4j-admin bulk import tool: command line tool useful for straightforward loading of large data sets.


Kettle import tool: maps and executes steps for the data process flow and works well for very large data sets, especially if developers are already familiar with using this tool.





LOAD CSV command with Cypher


Supports loading / ingesting CSV data from a URI


Directly maps input data into complex graph/domain structure


Handles data conversion


Supports complex computations


Creates or merges entities, relationships, and structure




LOAD CSV can handle local and remote files, and there is some syntax associated with each. This can be an easy thing to miss and end up with an access error, so we will try to clarify the rules here.



Local files are referenced with a file:/// prefix before the file name. Neo4j security has a default setting that local files can only be read from the Neo4j import directory, which is different based on your operating system.


recommend putting files in Neo4j’s import directory, as it keeps the environment secure


//Example 1 - file directly placed in import directory (import/data.csv)

LOAD CSV FROM "file:///data.csv"


//Example 2 - file placed in subdirectory within import directory (import/northwind/customers.csv)

LOAD CSV FROM "file:///northwind/customers.csv"




mportant Tips for LOAD CSV

There are a few things to keep in mind with LOAD CSV and a few helpful tips for handling the variety of data scenarios you are likely to encounter.


Newer versions of Neo4j will most likely be faster due to continued optimization.


All data from the CSV file is read as a string, so you need to use toInteger(), toFloat(), split() or similar functions to convert values.


Check your Cypher import statement for typos. Labels, property names, relationship-types, and variables are case-sensitive.


The cleaner the data, the easier the load. Try to handle complex cleanup/manipulation before load.




References:

https://neo4j.com/developer/guide-import-csv/


Tuesday, December 8, 2020

Importing Data from Spreadsheet to Neo4J

From this article, the sample sheet is here https://docs.google.com/spreadsheets/d/1ggI2s-ttysSxJ1PymNz3GESE5vkbRIQI1MSsFVbe-3U/edit#gid=1 

This sheet is composed of two parts:

Columns A, B and C: These contain the data for the Nodes of our graph, using an “id”, a “name”, and a “type”

Columns F, G and H: These contain the data for the Relationships of our graph, having a “from-id” (where the relationship starts), a “to-id” (where the relationship ends), and a “relationship type”. Columns F and G reference the nodes and their id’s in column A.


For this we use very simple statements that leverage the columns mentioned above, the Cypher syntax and string concatenation. Look at the columns D and I:


Cypher statements to create the nodes:


create n={id:'"&A2&"', name:'"&B2&"', type:'"&C2&"'};


output for row 2:


create n={id:'1', name:'Amada Emory', type:'Female'};


Copy/paste the Cypher statements from the Import Sheet into a text file.

Wrap these with a Neo4j transaction (begin, commit) – so that all of the statements get persisted to disk in the same transaction (or not in case of an error). (This is not important for smaller datasets, but is much more important for larger datasets.)

Some instructions on how to enable auto-indexing on Neo4j. This is important, because as you insert data into the database, it needs to get indexed for setting up the relationships properly (see above) and for future use.

And some instructions on how you can pipe the text file into the Neo4j shell – if necessary. For small datasets (and therefore, a limited number of Cypher statements) you can do with copy/pasting the text file into the Web-UI console – but that might not always work.

Starting the server and browsing the Web-UI


Thats it mainly. This is quite straight forward. 



References:

https://neo4j.com/blog/importing-data-into-neo4j-the-spreadsheet-way/


Friday, December 4, 2020

What are various symbols in Excel

= This is an equals sign and is used at the beginning of a formula

+ This is an addition sign and is used in sums and formulas

- This is a subtraction sign and is used in sums and formulas

/ This is a division sign and is used in sums and formulas

* This is a multiplication sign and is used in sums and formulas

( ) These are rounded brackets and are used to group together smaller sums in more complex formulas

: This is a colon and is used in a formula to create a range of cells (e.g. A2:B4)

, This is a comma and is used for separating cell references in formulas (often for non-adjacent cells)

$ This is a dollar sign and is used when creating absolute references

% This is a percentage sign and is used when dealing with figures in percentages

[ ] These are square brackets and are used for identifying a workbook which is being linked in to a formula e.g. =[Book1]Sheet1!$A$1

! This is an exclamation mark and is used for identifying a worksheet which is being linked in to a formula e.g. =Sheet2!A1 


references

https://www.jplcomputer.co.uk/excel-symbols/