cassandra - w3toppers.com

Understand cassandra replication factor versus consistency level

Short summary: Replication factor describes how many copies of your data exist. Consistency level describes the behavior seen by the client. Perhaps there’s a better way to categorize these. As an example, you can have a replication factor of 2. When you write, two copies will always be stored, assuming enough nodes are up. When …
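The sketch below makes the distinction concrete. For a given replication factor, it computes how many replica acknowledgements some consistency levels need, and whether a request can succeed with a given number of live replicas. It is a simplified single-datacenter model that covers only ONE, TWO, QUORUM and ALL. The function names are hypothetical, and QUORUM is taken as floor(RF / 2) + 1.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Replica responses needed for a consistency level (single-DC sketch)."""
    if level == "ONE":
        return 1
    if level == "TWO":
        return 2
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def can_succeed(level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are up to satisfy the consistency level."""
    needed = required_acks(level, replication_factor)
    return needed <= replication_factor and live_replicas >= needed


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    """Read and write replica sets are guaranteed to overlap when R + W > RF."""
    return required_acks(read_level, rf) + required_acks(write_level, rf) > rf


# RF = 2 with one replica down: ONE still succeeds, QUORUM (2 of 2) does not.
print(can_succeed("ONE", 2, 1))      # True
print(can_succeed("QUORUM", 2, 1))   # False
print(reads_see_latest_write("QUORUM", "QUORUM", 3))  # True (2 + 2 > 3)
```

The last helper encodes the usual rule of thumb: a read sees the latest successful write when the read and write replica counts overlap.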

java.lang.NoClassDefFoundError: org/apache/spark/Logging

org.apache.spark.Logging exists in Spark 1.x (for example 1.5.2) but is no longer available in 2.0.0. Change the versions as follows. Use the same Spark version and the same Scala suffix (_2.10 here) for every Spark artifact:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.5.2</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.5.2</version>
</dependency>

How to create auto increment IDs in Cassandra

How about the following, using Cassandra’s lightweight transactions:

1 – Create an IDs table:
CREATE TABLE ids (
    id_name varchar,
    next_id int,
    PRIMARY KEY (id_name)
);

2 – Insert a row for every ID that should use a global sequence, for example:
INSERT INTO ids (id_name, next_id) VALUES ('person_id', 1);

3 – Then, when inserting to a …
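The answer is cut off before step 3, so what follows is an assumption about how such a table is typically consumed. The usual approach is a compare-and-set loop. Read next_id, then try `UPDATE ids SET next_id = <n+1> WHERE id_name = 'person_id' IF next_id = <n>`, and retry if the condition fails. The sketch below simulates that loop against an in-memory stand-in for the table, so it runs without a cluster. `FakeIdsTable` and `next_id` are hypothetical names.

```python
import threading


class FakeIdsTable:
    """In-memory stand-in for the `ids` table. conditional_update() mimics
    `UPDATE ids SET next_id = :new WHERE id_name = :name IF next_id = :expected`,
    which is applied only when the current value still matches."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # plays the role of LWT serialization

    def insert(self, name, value):
        with self._lock:
            self._rows[name] = value

    def select(self, name):
        with self._lock:
            return self._rows[name]

    def conditional_update(self, name, expected, new):
        with self._lock:
            if self._rows.get(name) != expected:
                return False  # like "[applied] = False" in cqlsh
            self._rows[name] = new
            return True


def next_id(table, name):
    """Claim the current value of the sequence with a compare-and-set retry loop."""
    while True:
        current = table.select(name)
        if table.conditional_update(name, current, current + 1):
            return current  # no other client claimed `current` in between


table = FakeIdsTable()
table.insert("person_id", 1)
claimed = []
workers = [threading.Thread(target=lambda: claimed.append(next_id(table, "person_id")))
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(claimed))  # [1, 2, 3, 4, 5, 6, 7, 8]: no ID handed out twice
```

Every failed condition costs another round trip, so a single hot sequence row can become a bottleneck under heavy contention.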

Is there a reason not to use SparkContext.getOrCreate when writing a spark job?

TL;DR There are many legitimate uses of the getOrCreate methods, but trying to find a loophole for map-side joins is not one of them. In general there is nothing deeply wrong with SparkContext.getOrCreate. The method has its uses, although there are some caveats, most notably: in its simplest form it doesn’t allow you to …
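The toy model below illustrates the get-or-create semantics. Only one SparkContext may be active per JVM, and getOrCreate returns the active context if one exists. As a result, a configuration passed on a later call does not reconfigure it; Spark only logs a warning. The `ToyContext` class is hypothetical and is not Spark's API.

```python
import threading


class ToyContext:
    """Hypothetical stand-in for a context that must be a per-process singleton."""

    _active = None
    _lock = threading.Lock()

    def __init__(self, conf):
        self.conf = dict(conf)

    @classmethod
    def get_or_create(cls, conf=None):
        with cls._lock:
            if cls._active is None:
                cls._active = cls(conf or {})
            elif conf:
                # The existing context is reused as-is; the new conf is ignored.
                print("Using an existing context; some configuration may not take effect.")
            return cls._active

    @classmethod
    def stop(cls):
        with cls._lock:
            cls._active = None


first = ToyContext.get_or_create({"app.name": "job-a"})
second = ToyContext.get_or_create({"app.name": "job-b"})
print(first is second)          # True
print(second.conf["app.name"])  # job-a
```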

TaskSchedulerImpl: Initial job has not accepted any resources;

I faced a similar issue. After some online research and trial and error, I narrowed it down to three causes. Apart from the first, none of them is even close to what the error message suggests:
As the error indicates, you are probably requesting more resources than are available. => This was not my …
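The sketch below models the first cause in a simplified way: an executor can only be launched on a worker that has at least the requested cores and memory free. The `Worker` class and `schedulable_workers` function are hypothetical, and the model ignores details such as multiple executors per worker. In practice, the numbers to compare are the free worker resources shown in the cluster UI and settings such as spark.executor.cores and spark.executor.memory.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    free_cores: int
    free_memory_mb: int


def schedulable_workers(workers, executor_cores, executor_memory_mb):
    """Names of workers that could host one executor of the requested size."""
    return [w.name for w in workers
            if w.free_cores >= executor_cores and w.free_memory_mb >= executor_memory_mb]


cluster = [Worker("worker-1", free_cores=4, free_memory_mb=2048),
           Worker("worker-2", free_cores=2, free_memory_mb=6144)]

# 2 cores / 4 GB per executor: only worker-2 fits.
print(schedulable_workers(cluster, executor_cores=2, executor_memory_mb=4096))  # ['worker-2']
# 2 cores / 8 GB per executor: nothing fits, so the job keeps waiting for resources.
print(schedulable_workers(cluster, executor_cores=2, executor_memory_mb=8192))  # []
```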

Cassandra port usage – how are the ports used?

@Schildmeijer is largely right; however, port 7001 is still used when TLS-encrypted internode communication is enabled. So my complete list for current versions of Cassandra would be:
7199 – JMX (was 8080 before Cassandra 0.8.xx)
7000 – Internode communication (not used if TLS enabled)
7001 – TLS internode communication (used if TLS enabled)
9160 – …
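The port list above can be encoded as a small lookup, for instance when generating firewall rules. The helper below is hypothetical. It includes only the ports whose roles are spelled out above; 9160 is left out because its description is cut off. With TLS enabled, the internode port is 7001 instead of 7000.

```python
# Ports and roles as listed above (9160 omitted: its description is cut off).
CASSANDRA_PORTS = {
    7199: "JMX",
    7000: "Internode communication (not used if TLS enabled)",
    7001: "TLS internode communication (used if TLS enabled)",
}


def required_ports(tls_enabled):
    """Internode port for the chosen mode plus the JMX port (used by nodetool)."""
    internode = 7001 if tls_enabled else 7000
    return sorted([internode, 7199])


for port in required_ports(tls_enabled=True):
    print(port, CASSANDRA_PORTS[port])
# 7001 TLS internode communication (used if TLS enabled)
# 7199 JMX
```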

Inner Join in cassandra CQL

Because of its distributed nature, Cassandra has no support for RDBMS-style joins. You have a few options when you want something like a join. One option is to perform separate queries and then have your application join the data itself. This makes sense if the data is relatively small and you only have to perform …
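The first option, an application-side join, can be sketched as a hash join over the rows returned by two separate queries. The row data and column names (`users`, `orders`, `user_id`) are hypothetical stand-ins for real query results.

```python
def inner_join(left_rows, right_rows, key):
    """Hash join: index the left rows by key, then probe with each right row."""
    index = {}
    for row in left_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for right in right_rows:
        for left in index.get(right[key], []):
            joined.append({**left, **right})
    return joined


# Results of two separate queries, e.g. one against users and one against orders.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [{"order_id": 10, "user_id": 1, "total": 25},
          {"order_id": 11, "user_id": 3, "total": 40}]

print(inner_join(users, orders, "user_id"))
# [{'user_id': 1, 'name': 'ann', 'order_id': 10, 'total': 25}]
```

Both result sets are held in memory, which is why this approach only fits when the data is relatively small.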

cassandra get all records in time range

The timeout occurs because Cassandra takes longer than the timeout (10 seconds by default) to return the data. For your query, Cassandra attempts to fetch the entire dataset before returning, and for more than a few records this can easily exceed the timeout. For queries that produce lots of data you …
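The rest of the answer is cut off, so the usual remedy is shown here as an assumption. Instead of requesting the whole range at once, fetch it in pages so that each request returns quickly. The DataStax drivers do this with a fetch size (in the Python driver, for example, `SimpleStatement(query, fetch_size=...)`). The sketch below only simulates page-by-page fetching over an in-memory list. `fetch_page` is a hypothetical stand-in for one bounded request.

```python
from datetime import datetime, timedelta

# Hypothetical rows of a time-series table, ordered by a unique timestamp.
start = datetime(2024, 1, 1)
ROWS = [{"ts": start + timedelta(minutes=i), "value": i} for i in range(25)]


def fetch_page(lo, hi, after, page_size):
    """One bounded request: up to page_size rows with lo <= ts < hi and ts > after."""
    page = [r for r in ROWS if lo <= r["ts"] < hi and (after is None or r["ts"] > after)]
    return page[:page_size]


def rows_in_range(lo, hi, page_size=10):
    """Yield every row in [lo, hi) using many small requests instead of one large one."""
    after = None
    while True:
        page = fetch_page(lo, hi, after, page_size)
        yield from page
        if len(page) < page_size:
            return
        after = page[-1]["ts"]  # resume after the last row seen, like a paging state


values = [r["value"] for r in rows_in_range(start + timedelta(minutes=5),
                                            start + timedelta(minutes=20))]
print(values[0], values[-1], len(values))  # 5 19 15
```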

MAX(), DISTINCT and group by in Cassandra

With Cassandra you solve these kinds of problems by doing more work when you insert your data. That sounds like it would be slow, but Cassandra is designed for fast writes, and you’re probably going to read the data many more times than you write it, so it makes sense when you consider the …
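The sketch below shows the idea in miniature. Whenever a row is inserted, a per-user running maximum and a per-user set of distinct values are updated as well, so reads become simple key lookups instead of scans. `on_insert` and the in-memory dicts are hypothetical stand-ins for the extra tables you would write to alongside the main one.

```python
from collections import defaultdict

# Stand-ins for denormalized tables maintained at write time.
events = []                                # the "main" table
max_score_by_user = {}                     # answers MAX(score) per user
distinct_pages_by_user = defaultdict(set)  # answers DISTINCT page per user


def on_insert(user, page, score):
    """Write the row plus everything later reads will need."""
    events.append((user, page, score))
    if score > max_score_by_user.get(user, float("-inf")):
        max_score_by_user[user] = score
    distinct_pages_by_user[user].add(page)


for row in [("ann", "/home", 3), ("ann", "/cart", 7), ("ann", "/home", 5), ("bob", "/home", 2)]:
    on_insert(*row)

print(max_score_by_user["ann"])               # 7
print(sorted(distinct_pages_by_user["ann"]))  # ['/cart', '/home']
```

The read side does no aggregation at all; the cost has moved to the write path.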

MongoDB vs. Cassandra [closed]

Lots of reads in every query, fewer regular writes:
Both databases perform well on reads where the hot data set fits in memory. Both also emphasize join-less data models (and encourage denormalization instead), and both provide indexes on documents or rows, although MongoDB’s indexes are currently more flexible. Cassandra’s storage engine provides constant-time writes no …