databricks
How to run stored procedure on SQL server from Spark (Databricks) JDBC python?
Yes, it's possible: you just need to get access to the underlying Java classes of JDBC, something like this:

```python
# DriverManager is the main entry point into the JDBC world
driver_manager = spark._sc._gateway.jvm.java.sql.DriverManager
connection = driver_manager.getConnection(mssql_url, mssql_user, mssql_pass)
connection.prepareCall("EXEC sys.sp_tables").execute()
connection.close()
```
Not able to cat dbfs file in databricks community edition cluster. FileNotFoundError: [Errno 2] No such file or directory:
By default, this data is on the DBFS, and your code needs to understand how to access it. Plain Python file APIs don't know about DBFS – that's why it's failing. But there is a workaround: DBFS is mounted on the cluster nodes at /dbfs, so you just need to prepend that to your file name: instead of … Read more
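One way to apply the workaround is a small helper that rewrites a `dbfs:/` URI into the `/dbfs` fuse-mount path (the function name and the example `FileStore` path are illustrative, not from the original answer):

```python
def dbfs_to_local(path: str) -> str:
    """Translate a dbfs:/ URI into the /dbfs mount path
    that plain Python file APIs can open on a cluster node."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):]
    return path  # already a local path, leave it alone

# open(dbfs_to_local("dbfs:/FileStore/tables/data.csv")) then works,
# where open("dbfs:/FileStore/tables/data.csv") raises FileNotFoundError
```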
Exploding nested Struct in Spark dataframe
In my opinion the most elegant solution is to star-expand a Struct using a select operator, as shown below:

```scala
var explodedDf2 = explodedDf.select("department.*", "*")
```

https://docs.databricks.com/spark/latest/spark-sql/complex-types.html
databricks configure using cmd and R
Steps for installing and configuring the Azure Databricks CLI from the command line: Step 1: Install Python. You'll need Python 2.7.9 or above if you're using Python 2, or Python 3.6 or above if you're using Python 3. Step 2: Run pip install databricks-cli using the appropriate version of pip for your Python installation. If you are using Python … Read more
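After installation, running `databricks configure --token` prompts for a workspace URL and a personal access token and stores them in `~/.databrickscfg`. A minimal sketch of that file, with placeholder host and token values:

```ini
[DEFAULT]
host = https://<your-workspace>.azuredatabricks.net
token = <personal-access-token>
```

You can then verify the setup with a simple command such as `databricks fs ls dbfs:/`.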
Spark dataframe save in single file on hdfs location [duplicate]
It's not possible using the standard Spark library, but you can use the Hadoop API for managing the filesystem – save the output to a temporary directory, then move the file to the requested path. For example (in PySpark):

```python
df.coalesce(1) \
  .write.format("com.databricks.spark.csv") \
  .option("header", "true") \
  .save("mydata.csv-temp")

from py4j.java_gateway import java_import
java_import(spark._jvm, 'org.apache.hadoop.fs.Path')
fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
file = fs.globStatus(sc._jvm.Path('mydata.csv-temp/part*'))[0].getPath().getName()
```

fs.rename(sc._jvm.Path('mydata.csv-temp/' … Read more
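The same save-to-temp-then-rename pattern can be sketched against the local filesystem with only the standard library (the function and file names here are illustrative; on a real cluster you would use the Hadoop FileSystem API as above):

```python
import glob
import os
import shutil
import tempfile

def promote_single_part(temp_dir: str, final_path: str) -> None:
    """Move the single part-* file out of a Spark-style temporary
    output directory to the requested path, then drop the directory."""
    part = glob.glob(os.path.join(temp_dir, "part*"))[0]  # the lone part file
    shutil.move(part, final_path)
    shutil.rmtree(temp_dir)

# usage sketch: simulate the directory that coalesce(1).write produces
workdir = tempfile.mkdtemp()
outdir = os.path.join(workdir, "mydata.csv-temp")
os.makedirs(outdir)
with open(os.path.join(outdir, "part-00000"), "w") as f:
    f.write("a,b\n1,2\n")
promote_single_part(outdir, os.path.join(workdir, "mydata.csv"))
```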