gshell

Version 1

    Introduction

    This is a REPL (Read-Eval-Print Loop) program, i.e. a shell, for executing JanusGraph DB queries. You can type Gremlin queries (or put them in a script file) and run them against the graph DB.

    Use Cases / Scenarios

    • Debugging or exploring the object graph and the metagraph in scenarios such as:
      • App config corruptions
      • Vertex corruptions
      • Accessing the data even when LDM is not up (for example, when LDM is down because of the very corruption you are debugging)

    Features

    • Can execute Gremlin queries written with the Java API.
    • Can run arbitrary Java code.

    Usage / Example

    1. Download the source from the link below (it is probably shipped with the product as well):
      1. https://informatica-my.sharepoint.com/:f:/p/gmurthy/ElcbsSa6NBFNiEvhMpo7FLMBw1D_bo660x-rqEaOyrHHnQ?e=Ed7wko
    2. Set INFA_HOME
    3. Run the command:
      bash ./gshell.sh <zookeeper_url> <SCN> <graph_name>
      # <graph_name> can be one of the following (valid for 10.2.2; graph names depend on the version):
      #   OBJECT_GRAPH
      #   META_GRAPH
      #   PARAM_GRAPH
    4. A global variable `G` of type `GraphTraversalSource` is available in the shell.
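To make the role of `G` concrete, here is a hypothetical Java rendering of the wrapping step that repl.sh performs for each query typed at the prompt (the real script is bash). The template is an assumption: the tool is only described as wrapping the query in "static content" that implements `gexec.GraphQuery`, so this sketch returns the typed text as a single expression from `execute`, with `G` as the method parameter. The class-name prefix is taken from the sample run; `QueryWrapper`, `wrap`, and `className` are illustrative names, not part of the tool.

```java
// Hypothetical Java rendering of the repl.sh wrapping step (the real script is bash).
// ASSUMPTION: the exact template text is not documented; this one returns the typed
// text as a single expression, with G bound to the execute() parameter.
public class QueryWrapper {
    // Prefix seen in the sample run ("compiling gexec_tempClass_1").
    static final String CLASS_PREFIX = "gexec_tempClass_";

    // Unique class name derived from the repl.sh counter.
    static String className(int counter) {
        return CLASS_PREFIX + counter;
    }

    // Builds the Java source for one typed query; a trailing ';' is dropped
    // so that "G.V().toList()" and "G.V().toList();" produce the same class.
    static String wrap(String query, int counter) {
        String q = query.trim();
        if (q.endsWith(";")) {
            q = q.substring(0, q.length() - 1);
        }
        return String.join("\n",
            "import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;",
            "",
            "public class " + className(counter) + " implements gexec.GraphQuery {",
            "    @Override",
            "    public Object execute(GraphTraversalSource G) {",
            "        return " + q + ";",
            "    }",
            "}",
            "");
    }

    public static void main(String[] args) {
        // Prints the source that would be compiled for the first query of the sample run.
        System.out.print(wrap("G.V().toList()", 1));
    }
}
```

For `G.V().toList()` with counter 1, this produces a class named `gexec_tempClass_1` whose `execute` method returns `G.V().toList()`. Under this assumed template the typed text must be one Java expression; a template that pastes statements into the method body would be needed for multi-statement code.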

    Sample Run Output

    1. Connecting to META_GRAPH, listing all vertices, and listing the properties of one vertex.

       

      # bash gshell.sh inhtw03.informatica.com:2181 infa1022hf1 META_GRAPH
      19/12/17 13:37:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      gshell> G.V().toList()
      executing [G.V().toList()]
      compiling gexec_tempClass_1
      gshell> loading class: gexec_tempClass_1
      19/12/17 13:38:10 WARN transaction.StandardJanusGraphTx: Query requires iterating over all vertices [()]. For better performance, use indexes
      [v[11784], v[11296], v[10288], v[13368], v[13888], v[13896], v[14408], v[8272], v[14968], v[13440], v[81929856], v[10888], v[9872], v[12440], v[9376], v[10400], v[13992], v[9392], v[14008], v[9416], v[8400], v[14032], v[15088], v[12536], v[13056], v[15632], v[13592], v[8992], v[12064], v[14144], v[13664], v[10088], v[11624], v[16232], v[18280], v[75624], v[14192], v[81932144], v[81929600], v[15752], v[11664], v[15264], v[10160], v[13232], v[81932720], v[10176], v[16320], v[9672], v[13776], v[13272], v[15320], v[11232], v[12768], v[19424], v[10216], v[11240], v[14824], v[19432], v[27624], v[35816], v[44008], v[52200], v[60392], v[68584], v[76776], v[84968], v[93160], v[101352]]
      gshell> G.V(93160).valueMap().toList()
      executing [G.V(93160).valueMap().toList()]
      compiling gexec_tempClass_2
      gshell> loading class: gexec_tempClass_2
      [{_objId=[KeyValuePair.viewmodel.idxAttrs\com.infa.ldm.isp.datasetUsedByBase], _resourceNameKey=[2], _metaJson=[{"namespace":"viewmodel.idxAttrs","uuid":"viewmodel.idxAttrs\\com.infa.ldm.isp.datasetUsedByBase","parameters":{"com.infa.ldm.isp.datasetUsedByBase":"\u003c?xml version\u003d\"1.0\" encoding\u003d\"UTF-8\" standalone\u003d\"yes\"?\u003e\u003cns6:indexAttribute search\u003d\"true\" facet\u003d\"true\" sort\u003d\"false\" index\u003d\"false\" category\u003d\"all\" system\u003d\"false\" suggested\u003d\"false\" stored\u003d\"true\" boosted\u003d\"false\" xmlns\u003d\"http://www.example.org/base\" xmlns:ns6\u003d\"http://www.example.org/includeattr\" xmlns:ns5\u003d\"http://www.example.org/excludeclass\" xmlns:ns2\u003d\"http://www.example.org/viewmodel\" xmlns:ns4\u003d\"http://www.example.org/association_config\" xmlns:ns3\u003d\"http://www.example.org/view_configuration\"\u003ecom.infa.ldm.isp.datasetUsedByBase\u003c/ns6:indexAttribute\u003e"},"vertexLabel":"_VKeyValuePair"}]}]

       

    Files (Runnable & Source Code)

    • The tools are distributed as source:
      1. gexec.java   // the graph query executor
      2. repl.sh      // the query compiler
      3. gshell.sh    // the manager

    System Requirements

    • Informatica Catalog Service files

    More Info (How It Works)

    1. The tool is built on a simple, brute-force idea: each query is turned into a small Java class that is compiled, loaded, and executed.
    2. The tool has three components:
      1. repl.sh (be careful about writing anything else to stdout from this script: its stdout is piped into gexec, which reads every line as a class name)
        1. Reads a graph query from stdin.
        2. Creates a Java file with a unique name (I am using a counter that starts at 0 to generate unique names).
        3. Writes an implementation of the interface `gexec.GraphQuery` to this file. It simply wraps the given query in some static content (the static content lives in the same script).
        4. Compiles the class.
        5. Writes the class name to stdout.
      2. gexec (Java process)
        1. Reads class names from stdin, one per line.
        2. Loads the class.
        3. Creates an instance of that class.
        4. Calls GraphQuery.execute on that instance.
        5. Writes the toString() of the returned value to stdout.
      3. gshell.sh
        1. Sets up the runtime environment for the other two components.
        2. Sets CLASSPATH, does some cleanup, etc.
        3. Creates a properties file named application.properties, which gexec uses to connect to the graph DB (basically HBase).
        4. Starts repl.sh and pipes its stdout into the stdin of the gexec process.
    3. GraphQuery interface

       

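      // Declared inside the gexec class (hence "static"; referred to as gexec.GraphQuery).
      // gexec.java needs: import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
      public static interface GraphQuery {
          /* Note that G is available for the traversal. */
          public Object execute(GraphTraversalSource G);
      }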
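The gexec loop described above (read a class name per line, load it, instantiate it, call `execute`, print the result) can be sketched as follows. This is a minimal, self-contained approximation, not the shipped gexec.java: `GexecSketch`, `DemoQuery`, and `run` are hypothetical names, the graph parameter is typed as `Object` instead of `GraphTraversalSource` so the sketch compiles without TinkerPop or JanusGraph on the classpath, and the class loader is passed in (the real tool presumably needs one that can see the directory where repl.sh compiles its classes).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.io.StringReader;

// Minimal sketch of the gexec read/load/execute/print loop.
// ASSUMPTION: Object stands in for GraphTraversalSource so no TinkerPop jars are needed.
public class GexecSketch {

    public interface GraphQuery {
        Object execute(Object G);
    }

    // Hypothetical stand-in for a class that repl.sh would generate and compile.
    public static class DemoQuery implements GraphQuery {
        @Override
        public Object execute(Object G) {
            return "traversal source = " + G;
        }
    }

    // Reads one class name per line, loads and instantiates it, calls execute(G),
    // and prints toString() of the result ("null" if execute returns null).
    static void run(BufferedReader in, PrintStream out, ClassLoader loader, Object G) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            String className = line.trim();
            if (className.isEmpty()) {
                continue;
            }
            try {
                Class<?> cls = Class.forName(className, true, loader);
                GraphQuery query = (GraphQuery) cls.getDeclaredConstructor().newInstance();
                out.println(String.valueOf(query.execute(G)));
            } catch (ReflectiveOperationException | RuntimeException | LinkageError e) {
                // Report the failure and keep reading, so one bad query does not end the session.
                out.println("error running " + className + ": " + e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In gshell, stdin is the stdout of repl.sh; here it is simulated with a fixed string.
        BufferedReader in = new BufferedReader(new StringReader("GexecSketch$DemoQuery\n"));
        run(in, System.out, GexecSketch.class.getClassLoader(), "G");
    }
}
```

Catching errors per line keeps the session alive after a class fails to load or a query throws; whether the real gexec behaves this way is not stated here.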