# simba

An insertion, extraction, and analysis framework for LDM.

# Notice 1: version compatibility

The Scala version must be compatible with both the system and the Spark distribution:

  1. Spark 1.3.1
  2. Scala 2.10.4
  3. Hadoop 1.2.1
  4. Titan 1.0.0
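A `build.sbt` pinning the versions above might look like the following sketch. Only the version numbers come from the list; the dependency coordinates for Titan are an assumption, not taken from this repository's actual `build.sbt`.

```scala
// Hypothetical build.sbt sketch pinning the versions listed above.
// The Titan coordinate is an assumption; check the real build.sbt.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark"        %% "spark-core"    % "1.3.1" % "provided",
  "org.apache.hadoop"       %  "hadoop-client" % "1.2.1",
  "com.thinkaurelius.titan" %  "titan-core"    % "1.0.0"
)
```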

# Notice 2: required libraries

Assume `lib/` under the simba home contains the following jars; otherwise you need to include them by modifying `build.sbt`:

  - hadoop-client-1.2.1.jar
  - hadoop-core-1.2.1.jar
  - hadoop-gremlin-3.0.1-incubating.jar
  - hbase-common-0.98.2-hadoop1.jar
  - hbase-client-0.98.2-hadoop1.jar
  - hbase-protocol-0.98.2-hadoop1.jar
  - htrace-core-2.04.jar
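sbt already treats jars placed in `lib/` as unmanaged dependencies by default; if the jars live in a different directory, that location can be pointed at explicitly. A sketch, assuming sbt 0.13:

```scala
// Sketch: tell sbt where the unmanaged jars live.
// lib/ is sbt's default unmanagedBase, shown here explicitly;
// adjust the path if the jars are kept elsewhere.
unmanagedBase := baseDirectory.value / "lib"
```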

# Notice 3: Titan setup

  1. `conf` contains the `conf/titan-hbase-es-simba.properties` configuration file for TitanDB (HBase + Elasticsearch by default).
  2. `test_input` contains the docs and links data, which can be loaded as:

     ```scala
     // docPath / linkPath are placeholders; the original omits the input paths
     val docRDD  = sc.objectFile[Document](docPath)
     val linkRDD = sc.objectFile[DocumentLink](linkPath)
     ```
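The Titan properties file referenced above typically selects the storage and index backends. A minimal sketch of what `conf/titan-hbase-es-simba.properties` might contain; the hostnames are placeholders, not values from this repository:

```
# Hypothetical sketch of conf/titan-hbase-es-simba.properties
# (hostnames are placeholders, not from this repo)
storage.backend=hbase
storage.hostname=127.0.0.1
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
```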

#### compile

```
sbt clean compile
```

#### run

```
sbt run
```

#### test

```
sbt test
```

# Simple Example

```scala
// docPath is a placeholder; the original omits the input path
var gDB = TitanSimbaDB(sc, titanConf)
val docRDD = sc.objectFile[Document](docPath)
gDB.insert(docRDD)
gDB.docs().foreach(s => s.simbaPrint())
gDB.close()
```
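The example assumes an existing `SparkContext` (`sc`) and a Titan configuration (`titanConf`). In a standalone driver these might be created roughly as follows; this is a sketch assuming the Spark 1.3-era API, and the form of `titanConf` (a path string to the properties file from Notice 3) is an assumption, since `TitanSimbaDB`'s signature is not shown here:

```scala
// Hypothetical driver setup for the example above (Spark 1.3-era API).
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
  .setAppName("simba-example")
  .setMaster("local[2]")  // assumption: local mode for testing
val sc = new SparkContext(sparkConf)

// Assumption: titanConf is the path to the properties file from Notice 3.
val titanConf = "conf/titan-hbase-es-simba.properties"
```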