java.lang.NoClassDefFoundError: org/apache/spark/Logging

org.apache.spark.Logging is available in Spark 1.5.2 and earlier versions. It is not in 2.0.0. Please change the versions as follows, using the same Scala suffix (here _2.11) for every Spark artifact, since mixing _2.10 and _2.11 artifacts on one classpath causes binary incompatibilities:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>1.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>1.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
        <version>1.6.2</version>
    </dependency>

How to create auto increment IDs in Cassandra

How about the following, using Cassandra’s lightweight transactions:

1 – Create the IDs table:

    CREATE TABLE ids (
        id_name varchar,
        next_id int,
        PRIMARY KEY (id_name)
    );

2 – Insert every ID you’d like to use a global sequence with. For example:

    INSERT INTO ids (id_name, next_id) VALUES ('person_id', 1);

3 – Then, when inserting to a … Read more
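The excerpt is cut off before it shows how the ids table is used, so the following is only a sketch under an assumption: that the next ID is claimed with a compare-and-set loop, i.e. read next_id, then run a conditional update of the form UPDATE ids SET next_id = ? WHERE id_name = ? IF next_id = ?, retrying when the condition fails. The IdsTable class is a hypothetical in-memory stand-in for Cassandra (no driver is used), and allocate_id is an illustrative name, not something from the original answer.

```python
import threading


class IdsTable:
    """Hypothetical in-memory stand-in for the `ids` table (illustration only)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert(self, id_name, next_id):
        # Mirrors: INSERT INTO ids (id_name, next_id) VALUES (?, ?)
        with self._lock:
            self._rows[id_name] = next_id

    def select_next_id(self, id_name):
        # Mirrors: SELECT next_id FROM ids WHERE id_name = ?
        with self._lock:
            return self._rows[id_name]

    def update_if(self, id_name, new_next_id, expected):
        # Mirrors the conditional (lightweight transaction) update:
        # UPDATE ids SET next_id = ? WHERE id_name = ? IF next_id = ?
        # Returns True only if the stored value still equals `expected`.
        with self._lock:
            if self._rows[id_name] != expected:
                return False
            self._rows[id_name] = new_next_id
            return True


def allocate_id(table, id_name):
    """Claim the next ID, retrying whenever another writer got there first."""
    while True:
        current = table.select_next_id(id_name)
        if table.update_if(id_name, current + 1, expected=current):
            return current


table = IdsTable()
table.insert("person_id", 1)
first = allocate_id(table, "person_id")
second = allocate_id(table, "person_id")
```

With a real cluster, each conditional update costs a Paxos round, so a sequence like this trades write throughput for uniqueness; that trade-off is worth measuring before relying on it for high-volume inserts.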

Is there a reason not to use SparkContext.getOrCreate when writing a spark job?

TL;DR There are many legitimate applications of the getOrCreate methods, but an attempt to find a loophole to perform map-side joins is not one of them. In general there is nothing deeply wrong with SparkContext.getOrCreate. The method has its applications, although there are some caveats, most notably: in its simplest form it doesn’t allow you to … Read more

Cassandra port usage – how are the ports used?

@Schildmeijer is largely right; however, port 7001 is still used for TLS-encrypted internode communication. So my complete list for current versions of Cassandra would be:

7199 – JMX (was 8080 pre Cassandra 0.8.xx)
7000 – Internode communication (not used if TLS enabled)
7001 – TLS internode communication (used if TLS enabled)
9160 – … Read more

Inner Join in cassandra CQL

Because of its distributed nature, Cassandra has no support for RDBMS-style joins. You have a few options when you want something like a join. One option is to perform separate queries and then have your application join the data itself. This makes sense if the data is relatively small and you only have to perform … Read more
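The first option, joining in the application, can be sketched as a simple hash join over the rows returned by two separate queries. This is a minimal illustration, not a Cassandra API: the rows are plain dictionaries, and the users and orders data and the join_rows helper are hypothetical names chosen for the example.

```python
from collections import defaultdict


def join_rows(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on `key`, entirely in application code."""
    # Index the right-hand rows by join key so each left row is matched in O(1).
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    joined = []
    for left in left_rows:
        # Left rows without a match are dropped, as in an inner join.
        for right in index.get(left[key], []):
            joined.append({**left, **right})
    return joined


# Hypothetical results of two separate queries.
users = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Linus"},
]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "lamp"},
]

joined = join_rows(users, orders, "user_id")
```

Both result sets are held in memory here, which matches the excerpt's point that this approach suits relatively small data.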

cassandra get all records in time range

The timeout occurs because Cassandra is taking longer than the timeout (default is 10 seconds) to return the data. For your query, Cassandra will attempt to fetch the entire dataset before returning. For more than a few records, this can easily take longer than the timeout. For queries that are producing lots of data you … Read more

MongoDB vs. Cassandra [closed]

Lots of reads in every query, fewer regular writes

Both databases perform well on reads where the hot data set fits in memory. Both also emphasize join-less data models (and encourage denormalization instead), and both provide indexes on documents or rows, although MongoDB’s indexes are currently more flexible. Cassandra’s storage engine provides constant-time writes no … Read more