Vanilla Spark does not provide a way to persist data unless you’ve downloaded the version packaged with HDFS (although the developers appear to be experimenting with the idea in Spark 2.0). One way to store results in a permanent table and query them later is to use one of the many databases in the Spark database ecosystem. Each has pros and cons, and the right choice depends on your use case. I’ll provide something close to a master list. The entries are segmented by:
Type of data management, form the data is stored in, connection to Spark
Database, SQL, Integrated
Database, SQL, Connector
Database, NoSQL, Connector
Database, Document, Connector
Database, Graph, Connector
Search, Document, Connector
Data grid, SQL, Connector
Data grid, NoSQL, Connector
File System, Files, Integrated
- HDFS