Spark SQL – load data with JDBC using SQL statement, not table name

You should pass a valid subquery as a dbtable argument. For example in Scala:

val query = """(SELECT TOP 1000 
  -- and the rest of your query
  -- ...
) AS tmp  -- alias is mandatory*"""   

val url: String = ??? 

val jdbcDF = sqlContext.read.format("jdbc")
  .options(Map("url" -> url, "dbtable" -> query))
  .load()

* Hive Language Manual SubQueries: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries

Leave a Comment