Spark DataFrame: How to Add an Index Column (aka Distributed Data Index)

With Scala you can use:

import org.apache.spark.sql.functions._ 

df.withColumn("id", monotonically_increasing_id())

You can refer to this example and the Scala docs. (`monotonicallyIncreasingId` is deprecated; `monotonically_increasing_id()` is the current name in `org.apache.spark.sql.functions`.)

With Pyspark you can use:

from pyspark.sql.functions import monotonically_increasing_id 

df_index = df.withColumn("id", monotonically_increasing_id())
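One caveat worth knowing: the generated IDs are guaranteed to be monotonically increasing and unique, but not consecutive. Per the Spark docs, the current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits, so IDs jump by large gaps at partition boundaries. A minimal pure-Python sketch of that bit layout (the helper `mono_id` is hypothetical, for illustration only):

```python
def mono_id(partition_id: int, offset: int) -> int:
    # Sketch of the documented layout: upper 31 bits = partition ID,
    # lower 33 bits = record number within the partition.
    return (partition_id << 33) | offset

# First row of partition 0 gets ID 0; first row of partition 1 gets 2**33,
# so IDs are increasing and unique across partitions, but not consecutive.
print(mono_id(0, 0))  # → 0
print(mono_id(1, 0))  # → 8589934592
```

If you need gap-free, consecutive indices, a window function such as `row_number()` can assign them, at the cost of shuffling data through a single partition.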
