Difference between HBase and Hadoop/HDFS

Hadoop is basically three things: a file system (the Hadoop Distributed File System, HDFS), a computation framework (MapReduce), and a resource management layer (Yet Another Resource Negotiator, YARN). HDFS allows you to store huge amounts of data in a distributed manner (which gives faster read/write access) and a redundant manner (which gives better availability). MapReduce allows you to process this huge data in a …
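As a rough illustration of the MapReduce model named above, here is a minimal in-memory sketch in plain Java. It does not use the Hadoop API; the class name `MapReduceSketch` and the sample lines are made up for the example, and a real job would run the same two phases across many machines over data stored in HDFS.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical, JDK-only illustration of the map and reduce phases (no Hadoop involved).
class MapReduceSketch {

    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                // "Map" phase: break each input line into individual words.
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                // "Reduce" phase: group identical words and count each group.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hbase runs on hdfs", "hdfs stores blocks");
        System.out.println(wordCount(lines));
    }
}
```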

hadoop java.net.URISyntaxException: Relative path in absolute URI: rsrc:hbase-common-0.98.1-hadoop2.jar

The exception is a bit misleading: no real relative path is being parsed. The issue is that Hadoop’s “Path” doesn’t support ‘:’ in filenames. In your case, “rsrc:hbase-common-0.98.1-hadoop2.jar” is interpreted with “rsrc” as the “scheme”, whereas I suspect you really intended to add the resource “file:///path/to/your/jarfile/rsrc:hbase-common-0.98.1-hadoop2.jar”. Here’s an old JIRA discussing the illegal …
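The message can be reproduced with the JDK alone. The sketch below assumes a simplified version of the splitting rule (everything before the first ‘:’ that precedes any ‘/’ is taken as the scheme) and then hands the pieces to the multi-argument `java.net.URI` constructor, which rejects a scheme combined with a path that does not start with ‘/’. The class and helper names are hypothetical, and this is not Hadoop’s actual code.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical, JDK-only reproduction of the "Relative path in absolute URI" message.
class RelativePathInAbsoluteUri {

    // Assumed simplification: the text before the first ':' is the scheme,
    // unless a '/' appears earlier in the string.
    static String schemeOf(String s) {
        int colon = s.indexOf(':');
        int slash = s.indexOf('/');
        return (colon != -1 && (slash == -1 || colon < slash)) ? s.substring(0, colon) : null;
    }

    // Returns the URISyntaxException reason, or null if java.net.URI accepts the parts.
    static String reasonFor(String scheme, String path) {
        try {
            new URI(scheme, null, path, null, null);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        String name = "rsrc:hbase-common-0.98.1-hadoop2.jar";
        String scheme = schemeOf(name);                      // "rsrc"
        String rest = name.substring(scheme.length() + 1);   // no leading '/'
        System.out.println(reasonFor(scheme, rest));

        // With an explicit file scheme and an absolute path, the ':' is harmless to java.net.URI.
        System.out.println(reasonFor("file", "/path/to/your/jarfile/" + name));
    }
}
```

The first call reports the reason from the exception in the title; the second returns `null` because the path after the scheme is absolute.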

How to read from HBase using Spark

A basic example of reading HBase data using Spark (Scala); you can also write this in Java:

    import org.apache.hadoop.hbase.client.{HBaseAdmin, Result}
    import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.spark._

    object HBaseRead {
      def main(args: Array[String]) {
        val sparkConf = new SparkConf().setAppName("HBaseRead").setMaster("local[2]")
        val sc = new SparkContext(sparkConf)
        val conf = HBaseConfiguration.create()
        val …

HBase Kerberos connection renewal strategy

A Kerberos TGT has a lifetime (e.g. 12h) and a renewable lifetime (e.g. 7 days). As long as the ticket is still valid and still renewable, you can request a “free” renewal (no password required), and the lifetime counter is reset (e.g. 12h to go, again). The Hadoop authentication library spawns a …
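The two lifetimes can be modelled with a few lines of date arithmetic. The sketch below is a hypothetical illustration, not the Hadoop or JDK Kerberos API: the class `TgtLifetimeSketch` and its fields are invented for the example, and it assumes a renewed ticket never outlives the end of the renewable lifetime.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical model of a TGT's lifetime and renewable lifetime (no real Kerberos calls).
class TgtLifetimeSketch {
    final Duration lifetime;    // e.g. 12h
    final Instant renewUntil;   // issue time + renewable lifetime, e.g. 7 days
    Instant expiresAt;          // current end of validity

    TgtLifetimeSketch(Instant issuedAt, Duration lifetime, Duration renewableLifetime) {
        this.lifetime = lifetime;
        this.renewUntil = issuedAt.plus(renewableLifetime);
        this.expiresAt = issuedAt.plus(lifetime);
    }

    // A "free" renewal is possible only while the ticket is still valid and still renewable.
    boolean canRenew(Instant now) {
        return now.isBefore(expiresAt) && now.isBefore(renewUntil);
    }

    // Resets the lifetime counter; returns false when a fresh login is needed instead.
    boolean renew(Instant now) {
        if (!canRenew(now)) {
            return false;
        }
        Instant candidate = now.plus(lifetime);
        // Assumption: the new expiry is capped at the end of the renewable lifetime.
        expiresAt = candidate.isBefore(renewUntil) ? candidate : renewUntil;
        return true;
    }

    public static void main(String[] args) {
        Instant issued = Instant.parse("2024-01-01T00:00:00Z"); // made-up issue time
        TgtLifetimeSketch tgt = new TgtLifetimeSketch(issued, Duration.ofHours(12), Duration.ofDays(7));
        System.out.println(tgt.renew(issued.plus(Duration.ofHours(10)))); // renewed before expiry
        System.out.println(tgt.renew(issued.plus(Duration.ofHours(23)))); // too late, ticket expired
    }
}
```

Once `renew` returns `false`, the only way forward in this model is a full re-authentication with a password or keytab.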