How to run a stored procedure on SQL Server from Spark (Databricks) over JDBC in Python?

Yes, it's possible; you just need access to the underlying Java JDBC classes, something like this:

    # the first line is the main entry point into the JDBC world
    driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
    connection = driver_manager.getConnection(mssql_url, mssql_user, mssql_pass)
    connection.prepareCall("EXEC sys.sp_tables").execute()
    connection.close()
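For a procedure that takes parameters or returns rows, the same gateway exposes the full JDBC CallableStatement API. A minimal sketch, assuming mssql_url, mssql_user, and mssql_pass are already defined and dbo.my_proc is a hypothetical procedure taking one integer argument:

    driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
    connection = driver_manager.getConnection(mssql_url, mssql_user, mssql_pass)
    try:
        # bind a parameter using the standard JDBC call syntax
        stmt = connection.prepareCall("{call dbo.my_proc(?)}")
        stmt.setInt(1, 42)
        has_results = stmt.execute()
        if has_results:
            rs = stmt.getResultSet()
            while rs.next():
                # read the first column of each returned row
                print(rs.getString(1))
        stmt.close()
    finally:
        # make sure the connection is released even if the call fails
        connection.close()

Note that this runs entirely on the driver node through py4j, outside of Spark's own DataFrame machinery, so the results come back as plain Java objects rather than a DataFrame.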

Not able to cat a DBFS file in a Databricks Community Edition cluster. FileNotFoundError: [Errno 2] No such file or directory:

By default, this data lives on DBFS, and your code needs to know how to access it. Plain Python file APIs don't, and that is why the call fails. But there is a workaround: DBFS is mounted on the cluster nodes at /dbfs, so you just need to prepend it to your file name: instead of … Read more
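A minimal sketch of the idea, using a hypothetical path for a file uploaded to FileStore:

    # Spark APIs address the file as "dbfs:/FileStore/my_data.csv";
    # plain Python sees the same file through the /dbfs mount point
    with open("/dbfs/FileStore/my_data.csv") as f:
        print(f.read())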

Save a Spark dataframe as a single file at an HDFS location [duplicate]

It's not possible with the standard Spark library, but you can use the Hadoop API to manage the filesystem: save the output into a temporary directory and then move the file to the requested path. For example (in PySpark):

    df.coalesce(1) \
      .write.format("com.databricks.spark.csv") \
      .option("header", "true") \
      .save("mydata.csv-temp")

    from py4j.java_gateway import java_import
    java_import(spark._jvm, 'org.apache.hadoop.fs.Path')
    fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
    file = fs.globStatus(sc._jvm.Path('mydata.csv-temp/part*'))[0].getPath().getName()
    fs.rename(sc._jvm.Path('mydata.csv-temp/' … Read more
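For reference, a self-contained sketch of the full pattern, finishing the rename step that the excerpt cuts off. The output paths are placeholders, and the built-in "csv" source stands in for the external com.databricks.spark.csv package (they are equivalent on Spark 2+):

    from py4j.java_gateway import java_import

    # write the single-partition output to a temporary directory
    df.coalesce(1) \
      .write.format("csv") \
      .option("header", "true") \
      .save("mydata.csv-temp")

    # reach the Hadoop FileSystem API through py4j
    java_import(spark._jvm, "org.apache.hadoop.fs.Path")
    Path = spark._jvm.Path
    fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())

    # locate the single part file inside the temp directory
    part_file = fs.globStatus(Path("mydata.csv-temp/part*"))[0].getPath().getName()

    # move it to the requested path and clean up the temp directory
    fs.rename(Path("mydata.csv-temp/" + part_file), Path("mydata.csv"))
    fs.delete(Path("mydata.csv-temp"), True)

The rename and delete go through the driver's Hadoop FileSystem client, so the data is moved in place on the distributed filesystem without being downloaded to the driver.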