# simba

An insertion, extraction, and analysis framework for LDM.

# Notice 1: version compatibility

The Scala version must be compatible with both the system and the Spark distribution:

  1. Spark 1.3.1
  2. Scala 2.10.4
  3. Hadoop 1.2.1
  4. Titan 1.0.0
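A `build.sbt` pinning the versions above might look like the following sketch. Only the version numbers come from the list; the dependency coordinates for Titan are an assumption, not taken from this repository's actual `build.sbt`.

```scala
// Hypothetical build.sbt sketch pinning the versions listed above.
// The Titan coordinate is an assumption; check the real build.sbt.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark"        %% "spark-core"    % "1.3.1" % "provided",
  "org.apache.hadoop"       %  "hadoop-client" % "1.2.1",
  "com.thinkaurelius.titan" %  "titan-core"    % "1.0.0"
)
```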

# Notice 2: required libraries

Assume `lib/` under the simba home contains the following jars; otherwise you need to include them by modifying `build.sbt`:

  - hadoop-client-1.2.1.jar
  - hadoop-core-1.2.1.jar
  - hadoop-gremlin-3.0.1-incubating.jar
  - hbase-common-0.98.2-hadoop1.jar
  - hbase-client-0.98.2-hadoop1.jar
  - hbase-protocol-0.98.2-hadoop1.jar
  - htrace-core-2.04.jar
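sbt already treats jars placed in `lib/` as unmanaged dependencies by default; if the jars live in a different directory, that location can be pointed at explicitly. A sketch, assuming sbt 0.13:

```scala
// Sketch: tell sbt where the unmanaged jars live.
// lib/ is sbt's default unmanagedBase, shown here explicitly;
// adjust the path if the jars are kept elsewhere.
unmanagedBase := baseDirectory.value / "lib"
```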

# Notice 3: Titan setup

  1. `conf` contains the `conf/titan-hbase-es-simba.properties` configuration file for TitanDB (HBase + Elasticsearch by default).
  2. `test_input` contains the docs and links data, which can be loaded as:

     ```scala
     // docPath / linkPath are placeholders; the original omits the input paths
     val docRDD  = sc.objectFile[Document](docPath)
     val linkRDD = sc.objectFile[DocumentLink](linkPath)
     ```
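The Titan properties file referenced above typically selects the storage and index backends. A minimal sketch of what `conf/titan-hbase-es-simba.properties` might contain; the hostnames are placeholders, not values from this repository:

```
# Hypothetical sketch of conf/titan-hbase-es-simba.properties
# (hostnames are placeholders, not from this repo)
storage.backend=hbase
storage.hostname=127.0.0.1
index.search.backend=elasticsearch
index.search.hostname=127.0.0.1
```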

#### compile

```
sbt clean compile
```

#### run

```
sbt run
```

#### test

```
sbt test
```

# Simple Example

```scala
// docPath is a placeholder; the original omits the input path
var gDB = TitanSimbaDB(sc, titanConf)
val docRDD = sc.objectFile[Document](docPath)
gDB.insert(docRDD)
gDB.docs().foreach(s => s.simbaPrint())
gDB.close()
```
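The example assumes an existing `SparkContext` (`sc`) and a Titan configuration (`titanConf`). In a standalone driver these might be created roughly as follows; this is a sketch assuming the Spark 1.3-era API, and the form of `titanConf` (a path string to the properties file from Notice 3) is an assumption, since `TitanSimbaDB`'s signature is not shown here:

```scala
// Hypothetical driver setup for the example above (Spark 1.3-era API).
import org.apache.spark.{SparkConf, SparkContext}

val sparkConf = new SparkConf()
  .setAppName("simba-example")
  .setMaster("local[2]")  // assumption: local mode for testing
val sc = new SparkContext(sparkConf)

// Assumption: titanConf is the path to the properties file from Notice 3.
val titanConf = "conf/titan-hbase-es-simba.properties"
```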