For non-integral values you should use the percentile_approx UDF:
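import org.apache.spark.mllib.random.RandomRDDs
import sqlContext.implicits._  // needed for toDF; spark-shell imports it automatically

// 1000 samples from the standard normal distribution, 10 partitions, seed 1
val df = RandomRDDs.normalRDD(sc, 1000, 10, 1).map(Tuple1(_)).toDF("x")
df.registerTempTable("df")
sqlContext.sql("SELECT percentile_approx(x, 0.5) FROM df").show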
// +--------------------+
// |                 _c0|
// +--------------------+
// |0.035379710486199915|
// +--------------------+
On a side note, you should use GROUP BY, not PARTITION BY. The latter is used for window functions and has a different effect than you expect:
SELECT source, percentile_approx(value, 0.5) FROM df GROUP BY source
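To make the difference concrete, here is a minimal sketch that runs both forms side by side. It assumes Spark 2.x or later (SparkSession and createOrReplaceTempView instead of the sqlContext and registerTempTable used above); the object name, the helper methods and the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object GroupByVsPartitionBy {
  // Hypothetical sample data: two sources with three values each.
  def register(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(("a", 1.0), ("a", 2.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0))
      .toDF("source", "value")
      .createOrReplaceTempView("df")
  }

  // GROUP BY collapses each source into a single result row.
  def grouped(spark: SparkSession): DataFrame = spark.sql(
    "SELECT source, percentile_approx(value, 0.5) AS median FROM df GROUP BY source")

  // PARTITION BY defines a window: every input row is kept and
  // annotated with the median of its partition.
  def windowed(spark: SparkSession): DataFrame = spark.sql(
    "SELECT source, value, " +
      "percentile_approx(value, 0.5) OVER (PARTITION BY source) AS median FROM df")

  def main(args: Array[String]): Unit = {
    // Local session, assumed only for the sake of a runnable example.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("median-sketch")
      .getOrCreate()
    register(spark)
    grouped(spark).show()   // 2 rows, one per source
    windowed(spark).show()  // 6 rows, one per input row
    spark.stop()
  }
}
```

If you only want one median per source, the grouped query is the one to use; the windowed form makes sense when you need the median next to each original row.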
See also How to find median using Spark