What is the fastest and most efficient way to append rows to a DataFrame?

As Mohit Motwani suggested, the fastest way is to collect the data into dictionaries and then load it all into a data frame at once. Below are some example speed measurements:

import pandas as pd
import numpy as np
import time
import random

end_value = 10000

Measurement for creating a list of dictionaries and loading it all into a data frame at the end:

start_time …
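The measurement code in the excerpt is truncated, so here is a separate minimal sketch of the same idea. It assumes two hypothetical columns, a and b. build_from_dicts collects one dictionary per row and calls pd.DataFrame once. For comparison, build_by_concat grows a frame with pd.concat inside the loop. DataFrame.append, the older way of doing this, was deprecated in pandas 1.4 and removed in 2.0. Timings vary by machine, so none are shown.

```python
import time

import pandas as pd


def build_from_dicts(n):
    # Collect plain Python dicts first; pandas builds the frame in one pass.
    rows = [{"a": i, "b": i * 2} for i in range(n)]
    return pd.DataFrame(rows)


def build_by_concat(n):
    # For comparison only: every concat copies all rows accumulated so far,
    # so the total work grows roughly quadratically with n. Assumes n >= 1.
    df = pd.DataFrame([{"a": 0, "b": 0}])
    for i in range(1, n):
        df = pd.concat([df, pd.DataFrame([{"a": i, "b": i * 2}])], ignore_index=True)
    return df


if __name__ == "__main__":
    n = 1000  # kept small so the slow variant finishes quickly
    for build in (build_from_dicts, build_by_concat):
        start = time.perf_counter()
        build(n)
        print(f"{build.__name__}: {time.perf_counter() - start:.3f} s")
```

Both functions return the same frame. The dict-based version avoids repeated copying, which is why it scales better as the row count grows.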

Finding the intersection between two series in Pandas

Convert both series to Python sets and use the set intersection method, then transform the result back to a list if needed. I just noticed the pandas tag; you can translate the result back to a Series:

pd.Series(list(set(s1).intersection(set(s2))))

Following the comments, I have changed this to a more Pythonic expression, which is shorter and easier to read:

pd.Series(list(set(s1) & set(s2)))

…
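Here is a small runnable version of the set-based approach, using made-up sample data. A Python set keeps neither order nor duplicates, so the sketch sorts the intersection to get a deterministic result. As an alternative not taken from the answer, s1[s1.isin(s2)] keeps the original order, index, and any duplicates of s1.

```python
import pandas as pd

s1 = pd.Series([4, 5, 6, 20, 42])
s2 = pd.Series([1, 2, 3, 5, 42])

# Set intersection: duplicates and order are lost, so sort for a stable result.
common = pd.Series(sorted(set(s1) & set(s2)))
print(common.tolist())  # [5, 42]

# Order-preserving alternative: boolean mask over s1, keeping s1's index.
in_order = s1[s1.isin(s2)]
print(in_order.index.tolist())  # [1, 4]
```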

Convert pandas data frame to series

You can squeeze the single-row dataframe into a series (the inverse of to_frame). Passing axis=0 squeezes the single row directly. Transposing first (which still results in a dataframe) and then squeezing gives the same result.

df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
>>> df.squeeze(axis=0)
a0    0
a1    1
a2    2
a3    3
a4    4
Name: 0, dtype: int64

Note: To accommodate the point raised by …
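The sketch below checks that three approaches produce the same Series for the single-row frame from the answer: squeeze(axis=0), transposing and then squeezing, and plain positional indexing with iloc[0]. The extra 1x1 frame tiny is illustrative only. Without an axis argument, squeeze collapses both axes and returns a scalar, which is one reason to pass axis=0 explicitly.

```python
import pandas as pd

df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])

s = df.squeeze(axis=0)       # squeeze only the row axis
s_via_t = df.T.squeeze()     # 5x1 frame; squeezing drops the single column
row = df.iloc[0]             # positional selection of the only row

pd.testing.assert_series_equal(s, s_via_t)
pd.testing.assert_series_equal(s, row)

tiny = pd.DataFrame({"a": [7]})
print(tiny.squeeze())                       # both axes have length 1, so this is a scalar: 7
print(type(tiny.squeeze(axis=0)).__name__)  # Series
```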

Remove name, dtype from pandas output of dataframe or series

DataFrame/Series.to_string: these methods have a variety of arguments that let you configure what information is displayed when you print, and how. By default, Series.to_string has name=False and dtype=False, so we additionally specify index=False:

s = pd.Series(['race', 'gender'], index=[311, 317])
print(s.to_string(index=False))
# race
# gender

If the Index is important, keep the default index=True:

print(s.to_string())
#311 …
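This quick sketch shows which parts of the output each to_string flag controls. It uses a hypothetical Series name, col, so the Name footer has something to show. DataFrame.to_string accepts index=False in the same way. Exact column padding can differ between pandas versions, so no printed output is reproduced here.

```python
import pandas as pd

s = pd.Series(["race", "gender"], index=[311, 317], name="col")

print(s)                                   # repr: index, values, plus a Name/dtype footer
print(s.to_string())                       # index and values, no footer
print(s.to_string(index=False))            # values only
print(s.to_string(name=True, dtype=True))  # opt back in to the footer

df = s.to_frame()
print(df.to_string(index=False))           # column header and values, no index
```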

Pandas pd.Series.isin performance with set versus array

This might not be obvious, but pd.Series.isin uses an O(1) lookup per element. After an analysis that proves this statement, we will use its insights to create a Cython prototype that can easily beat the fastest out-of-the-box solution. Let's assume that the "set" has n elements and the "series" has m elements. The running time is then: …
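The analysis and the Cython prototype are cut off in this excerpt. To make the O(1)-lookup idea concrete, here is a pure-Python model of hash-based membership testing. It is not the answer's prototype, and it is far slower than pd.Series.isin itself. The helper isin_via_hash_set is hypothetical. It follows the answer's naming (n values, m series elements) and returns the same boolean mask as isin for plain integers.

```python
import numpy as np
import pandas as pd


def isin_via_hash_set(series, values):
    """Hypothetical helper: membership test via a Python hash set.

    Building the set costs O(n) for n values; each probe is O(1) on average,
    so checking m series elements adds O(m). NaN matching is not handled the
    way pd.Series.isin handles it.
    """
    lookup = set(values)
    return np.fromiter((x in lookup for x in series), dtype=bool, count=len(series))


s = pd.Series([3, 7, 1, 7, 9])
values = [7, 9, 10]
print(isin_via_hash_set(s, values).tolist())  # [False, True, False, True, True]
print(s.isin(values).tolist())                # same mask from pandas
```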

Pandas: change data type of Series to String

A new answer to reflect the most current practices: as of now (v1.2.4), neither astype('str') nor astype(str) converts a Series to the dedicated string dtype; both return object dtype. As per the documentation, a Series can be converted to the string dtype in the following ways:

df['id'] = df['id'].astype("string")
df['id'] = pandas.Series(df['id'], dtype="string")
df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype())
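This sketch compares the three conversions from the answer with plain astype(str). It uses a hypothetical id column of integers. At least in pandas 1.x and 2.x, astype(str) gives object dtype, while the three documented conversions give the dedicated string dtype. Note that pd.StringDtype is instantiated with parentheses here.

```python
import pandas as pd

df = pd.DataFrame({"id": [101, 102, 103]})

as_object = df["id"].astype(str)  # Python str values; object dtype in pandas 1.x/2.x
a = df["id"].astype("string")
b = pd.Series(df["id"], dtype="string")
c = pd.Series(df["id"], dtype=pd.StringDtype())

# Print each result's dtype to see which conversions produce the string dtype.
for label, converted in [
    ("astype(str)", as_object),
    ("astype('string')", a),
    ("Series(dtype='string')", b),
    ("Series(dtype=StringDtype())", c),
]:
    print(label, converted.dtype)
```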