How to save/insert each DStream into a permanent table

Vanilla Spark does not provide a way to persist data unless you’ve downloaded the version packaged with HDFS (although the Spark developers appear to be playing with the idea in Spark 2.0). One way to store the results in a permanent table and query them later is to use one of the databases in the Spark database ecosystem. Each has pros and cons, and the right choice depends on your use case. I’ll provide something close to a master list, segmented by:

Type of data management, form the data is stored in, connection to Spark

Database, SQL, Integrated

Database, SQL, Connector

Database, NoSQL, Connector

Database, Document, Connector

Database, Graph, Connector

Search, Document, Connector

Data grid, SQL, Connector

Data grid, NoSQL, Connector

File System, Files, Integrated

  • HDFS

File System, Files, Connector

Data warehouse, SQL, Connector
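
Whichever store you pick from the categories above, the write path from a DStream usually has the same shape: for each batch, for each partition, open a connection and insert the records. Here is a minimal sketch of that pattern in Python. It rests on several assumptions of mine, not anything specific to one of the databases: the stream carries (word, total) pairs, the table is called word_counts, and the standard-library sqlite3 module stands in for a real database connector (it is only sensible in local mode, since each worker would otherwise write to its own local file). The function names and DB_PATH are hypothetical. The only Spark calls used are foreachRDD on the DStream and foreachPartition on the RDD.

```python
import sqlite3

# Hypothetical location; with a real connector this would be a connection URL.
DB_PATH = "stream_results.db"


def save_partition(records, db_path=DB_PATH):
    """Insert one partition's (word, total) pairs into a permanent table."""
    # Open the connection here so it is created on the worker that runs
    # this partition, instead of being serialized from the driver.
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS word_counts (word TEXT, total INTEGER)"
        )
        # executemany accepts any iterable, including the partition iterator.
        conn.executemany(
            "INSERT INTO word_counts (word, total) VALUES (?, ?)", records
        )
        conn.commit()
    finally:
        conn.close()


def save_rdd(rdd):
    """Write every partition of one batch; meant to be passed to foreachRDD."""
    rdd.foreachPartition(save_partition)


def attach_sink(dstream):
    """Register the writer as an output operation on a DStream."""
    dstream.foreachRDD(save_rdd)
```

You would call attach_sink(counts) on your DStream before starting the streaming context. Opening one connection per partition rather than per record keeps the connection overhead manageable; with a real database you would probably swap in its connector and a connection pool.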
