Thursday, December 10, 2020

neo4J Loading XLS using Apache

Neo4J uses Apache POI library

The installation of the neo4j desktop was done using installer and attempting to perform the below command was giving the error below 

There is no procedure with the name `apoc.load.xls` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.

CALL apoc.load.xls('file:////Users/rekumarm/Downloads/Sheet.xlsx','Full',{mapping:{Integer:{type:'int'}, Array:{type:'int',array:true,arraySep:';'}}})

However, looking around this error is due to the fact that apache plugin was not installed. 

There is a poorly documented difference between installing neo4j using an installer (which is most common) versus from a zip or tar file.


(A) When you install neo4j using an installer, the installer configures the neo4j server to look for the plugins folder in the database folder (i.e., under the default.graphdb folder).


(B) When you install neo4j from a zip or tar file, the neo4j server will look for the plugins folder in the neo4j installation folder (for example, the plugins folder should be a sibling of the neo4j bin folder).


It sounds like (A) applies to me.



These below jar files are to be placed inside the library folder. By default the neo4j site specifies these are HTTP version of the file and hence it gives a 501 error when directly try to link to it


https://repo1.maven.org/maven2/org/apache/poi/poi/3.17/poi-3.17.jar

https://repo1.maven.org/maven2/org/apache/commons/commons-collections4/4.1/commons-collections4-4.1.jar

https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml-schemas/3.17/poi-ooxml-schemas-3.17.jar

https://repo1.maven.org/maven2/org/apache/xmlbeans/xmlbeans/2.6.0/xmlbeans-2.6.0.jar

https://repo1.maven.org/maven2/com/github/virtuald/curvesapi/1.04/curvesapi-1.04.jar


Now, placed all these in the plugins folder and restarted the server, and attempted to call the below command to load data from a spreadsheet, 

This was a simple spreadsheet. However, this was not getting successful 


CALL apoc.load.xls('file:////Users/rekumarm/Desktop/neo_test_book1.xlsx','Full',{mapping:{Integer:{type:'int'}, Array:{type:'int',array:true,arraySep:';'}}})


Neo.ClientError.Procedure.ProcedureNotFound

There is no procedure with the name `apoc.load.xls` registered for this database instance. Please ensure you've spelled the procedure name correctly and that the procedure is properly deployed.


Looking around, below two were giving some suggestion to get around this. 


But if added the jar file the apoc related, it was not starting the server itself, it was giving the below issue 


2020-12-10 15:13:23.842+0000 ERROR Failed to start Neo4j on dbms.connector.http.listen_address, a socket address. If missing port or hostname it is acquired from dbms.default_listen_address.


Even appal.help was not working. But then then looked at this page https://neo4j.com/labs/apoc/4.1/installation/ and found the apoc is available in the labs directory and placing it in plugins folder, it was helpful to resolve the error with apoc.help. the command is now giving all the available commands with apoc. But still XL loading procedure is not found


CALL dbms.procedures()



References:

https://neo4j.com/labs/apoc/4.1/import/xls/

https://stackoverflow.com/questions/42286508/apoc-is-only-partially-installing-its-extension-in-neo4j-one-procedure




"apoc.load.json"

"apoc.load.jsonArray"

"apoc.load.xml"


So from this reading it looked like there are two pieces. APOC core and APOC full. The labs contained only core. So from below link could download the full 


https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/tag/4.2.0.0


Also, in this doc, it mentions about the configuration to be made. 

https://community.neo4j.com/t/how-can-i-install-apoc-library-for-neo4j-version-3-4-6-edition-community/1495


But still, it was coming as error.  


020-12-11 01:26:50.300+0000 ERROR Failed to start Neo4j on dbms.connector.http.listen_address, a socket address. If missing port or hostname it is acquired from dbms.default_listen_address.

java.lang.RuntimeException: Error starting Neo4j database server at /Users/rekumarm/Documents/RR/projects/KenNetworkVisualization/tools/CommunityServer/neo4j-community-4.2.1/data/databases

at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.startDatabaseServer(DatabaseManagementServiceFactory.java:207) ~[neo4j-4.2.1.jar:4.2.1]



Then looking to the error bit more, found the exception trace saying aooc,collection was already enabled. This was because both core and full was present in the plugins folder and removing the core actually helped to start the server successfully i.e. only full is required, to note the same apoc version as dbms version was the one I used. Which was 4.2.0.0 


Now, the excel load procedure complaint has gone, but results in the below error


Failed to invoke procedure `apoc.load.xls`: Caused by: java.lang.RuntimeException: Import from files not enabled, please set apoc.import.file.enabled=true in your apoc.conf


This below link helped to work around this. 


https://community.neo4j.com/t/setting-apoc-import-file-enabled-true-in-your-neo4j-conf/4293


Essentially, these below notes


The apoc.conf file must be created manually (next to neo4j.conf file) because it's not necessary for other procedures.


In this file, you must write:


apoc.import.file.enabled=true

apoc.import.file.use_neo4j_config=true

After, you must restart your database in order to apply changes :slight_smile:



Now left with the one error of unable to load the file because trying from browser, the file path 


So it appears that the neo4j by default looks into the import folder. So that solves the problem! Now Able get past all of the issues related to connectivities 


References:

https://stackoverflow.com/questions/43893457/understanding-inplace-true


No comments:

Post a Comment