If you want to keep your JSON file as it is (without stripping its newline characters `\n`), include the `multiLine=True` keyword argument:
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)
df = sqlc.read.json('my_file.json', multiLine=True)
df.show()  # show() prints the DataFrame itself and returns None
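To see why the flag matters: by default `spark.read.json` expects JSON Lines input, i.e. one complete JSON object per line, while a pretty-printed file spreads a single object across several lines and needs `multiLine=True`. A minimal stdlib-only sketch of the two layouts (the record and filename are made up for illustration):

```python
import json

record = {"name": "Alice", "age": 30}

# JSON Lines layout (Spark's default expectation):
# the whole object sits on one line, no embedded newlines.
json_lines = json.dumps(record)

# Pretty-printed layout: the same object spans multiple lines,
# which spark.read.json can only parse with multiLine=True.
multiline = json.dumps(record, indent=2)

print("\n" in json_lines)  # False
print("\n" in multiline)   # True
```

Both layouts decode to the same record; the flag only tells Spark how objects are delimited in the file.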