If you want to keep your JSON file as it is (without stripping its newline characters `\n`), include the `multiLine=True` keyword argument:
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)
df = sqlc.read.json('my_file.json', multiLine=True)
df.show()  # show() prints the DataFrame itself and returns None
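To see why the flag matters: by default `spark.read.json` expects JSON Lines input, i.e. one complete JSON object per line, while a pretty-printed file spreads a single object across several lines and needs `multiLine=True`. A minimal stdlib-only sketch of the two layouts (the record and filename are made up for illustration):

```python
import json

record = {"name": "Alice", "age": 30}

# JSON Lines layout (Spark's default expectation):
# the whole object sits on one line, no embedded newlines.
json_lines = json.dumps(record)

# Pretty-printed layout: the same object spans multiple lines,
# which spark.read.json can only parse with multiLine=True.
multiline = json.dumps(record, indent=2)

print("\n" in json_lines)  # False
print("\n" in multiline)   # True
```

Both layouts decode to the same record; the flag only tells Spark how objects are delimited in the file.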