Introduction
This is a REPL (Read-Eval-Print Loop) shell for executing JanusGraph DB queries. You can type Gremlin queries at the prompt, or put them in a script file, and run them against the graph DB.
Usecase / Scenario
- Debugging or exploring the object graph and metagraph in scenarios such as:
  - App config corruptions
  - Vertex corruptions
- Accessing the data even when LDM is not up (for example, when LDM is down because of the very corruption you are debugging)
Features
- Can execute Gremlin Java API queries.
- Can run any Java code.
Usage / Example
- Download the source.
- Set INFA_HOME
- Run the command:
    ./gshell.sh <zookeeper_url> <SCN> <graph_name>
    # Graph name can be one of the following (this is for 10.2.2; graph names depend on the version)
    OBJECT_GRAPH
    META_GRAPH
    PARAM_GRAPH
- A global variable `G` of type `GraphTraversalSource` will be available for usage in the shell.
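Before launching the shell, it can help to check the prerequisites listed above. The following is a minimal pre-flight sketch, not part of the distributed scripts: the function name `gshell_preflight` is hypothetical, and the graph-name check only warns, since the valid names depend on the version.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of gshell.sh): checks the documented
# prerequisites before the shell is started.
gshell_preflight() {
    # INFA_HOME must be set, as required by the usage steps.
    if [ -z "${INFA_HOME:-}" ]; then
        echo "INFA_HOME is not set" >&2
        return 1
    fi
    # The command takes exactly three arguments.
    if [ "$#" -ne 3 ]; then
        echo "usage: ./gshell.sh <zookeeper_url> <SCN> <graph_name>" >&2
        return 1
    fi
    case "$3" in
        OBJECT_GRAPH|META_GRAPH|PARAM_GRAPH) ;;
        # The listed names are for 10.2.2; other versions may differ, so only warn.
        *) echo "warning: '$3' is not a known 10.2.2 graph name" >&2 ;;
    esac
    return 0
}
```

One possible use, assuming the wrapper is sourced first: `gshell_preflight "$@" && ./gshell.sh "$@"`.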
Sample Run Output
- Connecting to META_GRAPH, listing all the vertices, and then listing the properties of a particular vertex.
    # bash gshell.sh inhtw03.informatica.com:2181 infa1022hf1 META_GRAPH
    19/12/17 13:37:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    gshell> G.V().toList()
    executing [G.V().toList()]
    compiling gexec_tempClass_1
    gshell> loading class: gexec_tempClass_1
    19/12/17 13:38:10 WARN transaction.StandardJanusGraphTx: Query requires iterating over all vertices [()]. For better performance, use indexes
    [v[11784], v[11296], v[10288], v[13368], v[13888], v[13896], v[14408], v[8272], v[14968], v[13440], v[81929856], v[10888], v[9872], v[12440], v[9376], v[10400], v[13992], v[9392], v[14008], v[9416], v[8400], v[14032], v[15088], v[12536], v[13056], v[15632], v[13592], v[8992], v[12064], v[14144], v[13664], v[10088], v[11624], v[16232], v[18280], v[75624], v[14192], v[81932144], v[81929600], v[15752], v[11664], v[15264], v[10160], v[13232], v[81932720], v[10176], v[16320], v[9672], v[13776], v[13272], v[15320], v[11232], v[12768], v[19424], v[10216], v[11240], v[14824], v[19432], v[27624], v[35816], v[44008], v[52200], v[60392], v[68584], v[76776], v[84968], v[93160], v[101352]]
    gshell> G.V(93160).valueMap().toList()
    executing [G.V(93160).valueMap().toList()]
    compiling gexec_tempClass_2
    gshell> loading class: gexec_tempClass_2
    [{_objId=[KeyValuePair.viewmodel.idxAttrs\com.infa.ldm.isp.datasetUsedByBase], _resourceNameKey=[2], _metaJson=[{"namespace":"viewmodel.idxAttrs","uuid":"viewmodel.idxAttrs\\com.infa.ldm.isp.datasetUsedByBase","parameters":{"com.infa.ldm.isp.datasetUsedByBase":"\u003c?xml version\u003d\"1.0\" encoding\u003d\"UTF-8\" standalone\u003d\"yes\"?\u003e\u003cns6:indexAttribute search\u003d\"true\" facet\u003d\"true\" sort\u003d\"false\" index\u003d\"false\" category\u003d\"all\" system\u003d\"false\" suggested\u003d\"false\" stored\u003d\"true\" boosted\u003d\"false\" xmlns\u003d\"http://www.example.org/base\" xmlns:ns6\u003d\"http://www.example.org/includeattr\" xmlns:ns5\u003d\"http://www.example.org/excludeclass\" xmlns:ns2\u003d\"http://www.example.org/viewmodel\" xmlns:ns4\u003d\"http://www.example.org/association_config\" xmlns:ns3\u003d\"http://www.example.org/view_configuration\"\u003ecom.infa.ldm.isp.datasetUsedByBase\u003c/ns6:indexAttribute\u003e"},"vertexLabel":"_VKeyValuePair"}]}]
Files (Runnable & Source Code)
- The tool is distributed as source:
  - gexec.java // Graph query executor
  - repl.sh // the query compiler
  - gshell.sh // the manager
System Requirement
- Informatica Catalog Service files
More Info (How does this work)
- The tool is built on a simple, brute-force idea.
- There are three components in the tool:
  - repl.sh (be careful before writing anything to stdout from this script, because gexec reads its stdout as class names)
    - Reads a graph query from stdin.
    - Creates a Java file with a unique name (a counter, starting at 0, is used to generate the unique names).
    - Writes an implementation of the interface `gexec.GraphQuery` to this file. It simply wraps the given query in some static content (the static content lives in the same script).
    - Compiles the class.
    - Writes the class name to stdout.
  - gexec (Java process)
    - Reads class names from stdin, line by line.
    - Loads the class.
    - Creates an object of that class.
    - Calls the method `GraphQuery.execute` on that object.
    - Writes the `toString()` of the returned value to stdout.
  - gshell.sh
    - Is responsible for setting up the runtime environment for the other two components.
    - Sets CLASSPATH, does some cleanup, etc.
    - Creates a properties file named application.properties, which gexec uses to connect to the graph DB (HBase, basically).
    - Starts repl.sh and pipes its stdout to the stdin of the gexec process.
- GraphQuery interface
    public static interface GraphQuery {
        /* Note that G is available for the traversal. */
        public Object execute(GraphTraversalSource G);
    }
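The repl.sh steps described in this section (read a query, wrap it in a class implementing `gexec.GraphQuery`, announce the class name on stdout) can be sketched roughly as follows. This is an illustration under assumptions, not the distributed script: the function names `wrap_query` and `repl_loop`, the Java template, and the temporary directory are hypothetical; the `gexec_tempClass_` prefix is taken from the sample run; the compile step is left out. The `while` loop at the end is only a stand-in for the gexec process.

```bash
#!/usr/bin/env bash
# Sketch of the wrap-and-announce step described for repl.sh.
# Assumption: the Java template below is illustrative, not the script's
# actual static content.

counter=0
work_dir="$(mktemp -d)"

# Wrap one query in a class implementing gexec.GraphQuery and write it to
# <work_dir>/<class>.java. Sets the global last_class to the class name.
wrap_query() {
    local query="$1"
    counter=$((counter + 1))
    last_class="gexec_tempClass_${counter}"
    cat > "${work_dir}/${last_class}.java" <<EOF
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;

public class ${last_class} implements gexec.GraphQuery {
    public Object execute(GraphTraversalSource G) {
        return ${query};
    }
}
EOF
}

# Read queries line by line, skipping empty lines. Diagnostics go to stderr;
# only the class name goes to stdout, because the consumer treats every
# stdout line as a class name.
repl_loop() {
    local query
    while IFS= read -r query; do
        [ -z "$query" ] && continue
        wrap_query "$query"
        echo "generated ${last_class}.java" >&2
        # A real script would compile the file here before announcing the class.
        echo "$last_class"
    done
}

# Stand-in for the gexec process: it just echoes each class name it receives.
printf 'G.V().toList()\n' | repl_loop | while IFS= read -r cls; do
    echo "loading class: ${cls}"
done
```

Sending the diagnostic line to stderr is what keeps the pipe clean: if it went to stdout, the consumer would try to load `generated gexec_tempClass_1.java` as a class name.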