Compute the running (cumulative) maximum for a series in pandas
Use cummax:

    df.High.cummax()

    0    954
    1    954
    2    954
    3    955
    4    956
    5    956
    6    956
    7    956
    Name: High, dtype: int64

    df['Max'] = df.High.cummax()
    df
As Mohit Motwani suggested, the fastest way is to collect the data into a dictionary and then load it all into a DataFrame. Some speed measurements follow:

    import pandas as pd
    import numpy as np
    import time
    import random

    end_value = 10000

Measurement for creating a list of dictionaries and loading them all into a DataFrame at the end:

    start_time …
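The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the original benchmark: the column names `a` and `b`, the helper names, and the row count of 500 are assumptions chosen to keep the run short, and the timings will vary by machine and pandas version.

```python
import time

import pandas as pd


def build_from_dicts(n):
    """Collect rows as plain dictionaries, then build the frame once."""
    rows = [{"a": i, "b": i * 2} for i in range(n)]
    return pd.DataFrame(rows)


def build_row_by_row(n):
    """Grow the frame one row at a time (the slow pattern being compared)."""
    df = pd.DataFrame(columns=["a", "b"])
    for i in range(n):
        df.loc[i] = [i, i * 2]
    return df


n = 500  # hypothetical size, much smaller than the original end_value

start = time.perf_counter()
fast = build_from_dicts(n)
t_dicts = time.perf_counter() - start

start = time.perf_counter()
slow = build_row_by_row(n)
t_rows = time.perf_counter() - start

print(f"list of dicts: {t_dicts:.4f} s, row by row: {t_rows:.4f} s")
```

Both functions produce the same values; only the construction strategy differs.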
Place both series in Python's set container, then use the set intersection method:

    s1.intersection(s2)

and then transform the result back to a list if needed. Just noticed pandas in the tag, so this can be translated back to a Series:

    pd.Series(list(set(s1).intersection(set(s2))))

Following the comments, I have changed this to a more Pythonic expression, which is shorter and easier to read:

    Series(list(set(s1) & set(s2))) …
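A small worked example of the set-based intersection, with made-up sample values for `s1` and `s2`. Sets are unordered, so the result is sorted here to make the output predictable; that sorting step is an addition, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample data
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([3, 4, 5, 6])

# Values present in both series; set() also drops duplicates
common = pd.Series(sorted(set(s1) & set(s2)))
print(common.tolist())
```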
You can transpose the single-row dataframe (which still results in a dataframe) and then squeeze the results into a series (the inverse of to_frame).

    >>> df = pd.DataFrame([list(range(5))], columns=["a{}".format(i) for i in range(5)])
    >>> df.squeeze(axis=0)
    a0    0
    a1    1
    a2    2
    a3    3
    a4    4
    Name: 0, dtype: int64

Note: To accommodate the point raised by …
DataFrame/Series.to_string

These methods have a variety of arguments that allow you to configure what information is displayed, and how, when you print. By default Series.to_string has name=False and dtype=False, so we additionally specify index=False:

    s = pd.Series(['race', 'gender'], index=[311, 317])
    print(s.to_string(index=False))
    # race
    # gender

If the Index is important, the default is index=True:

    print(s.to_string())
    #311 …
Use shift:

    df['dA'] = df['A'] - df['A'].shift(-1)
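To see what this does, here is a short example on made-up data. `shift(-1)` moves each value up by one position, so every row is compared with the row that follows it, and the last row has nothing to compare against.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"A": [1, 3, 6, 10]})

# Difference between each row and the next row
df["dA"] = df["A"] - df["A"].shift(-1)
print(df)
# The last row of dA is NaN because no following row exists
```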
There is no simple way to do that, because the argument that is passed to the rolling-applied function is a plain numpy array, not a pandas Series, so it doesn't know about the index. Moreover, the rolling functions must return a float result, so they can't directly return the index values if they're not floats. …
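One possible workaround, assuming a pandas version in which `rolling(...).apply` accepts `raw=False` so that each window arrives as a Series carrying its index labels. The float restriction mentioned above still applies, so this sketch only works when the index labels are numeric; the sample data is made up.

```python
import pandas as pd

# Hypothetical sample data with a numeric index
s = pd.Series([3, 1, 4, 1, 5], index=[10, 20, 30, 40, 50])

# With raw=False each window is a Series, so idxmax() can see the labels.
# The label comes back as a float because rolling results are floats.
result = s.rolling(3).apply(lambda window: window.idxmax(), raw=False)
print(result)
```

The first two entries are NaN because the window of 3 is not yet full.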
This might not be obvious, but pd.Series.isin uses an O(1) lookup per element. After an analysis that proves this statement, we will use its insights to create a Cython prototype which can easily beat the fastest out-of-the-box solution. Let's assume that the "set" has n elements and the "series" has m elements. The running time is then: …
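As a rough mental model only (this is not pandas' internal implementation), `isin` behaves like building a hash set from the lookup values once and then doing one membership probe per series element. The names and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
series = pd.Series([1, 5, 7, 5, 2])
lookup = [5, 2, 9]

# Built-in membership test
mask = series.isin(lookup)

# Pure-Python analogue: build the hash set once, probe it per element
lookup_set = set(lookup)
mask_manual = pd.Series([x in lookup_set for x in series], index=series.index)

print(mask.tolist())
```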
A new answer to reflect the most current practices: as of now (v1.2.4), neither astype('str') nor astype(str) produces the dedicated string dtype; both leave the column with object dtype. As per the documentation, a Series can be converted to the string datatype in the following ways:

    df['id'] = df['id'].astype("string")
    df['id'] = pandas.Series(df['id'], dtype="string")
    df['id'] = pandas.Series(df['id'], dtype=pandas.StringDtype())
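A quick check of the first variant on made-up data, assuming a pandas version that provides the "string" extension dtype (1.0 or later):

```python
import pandas as pd

# Hypothetical sample data: an integer id column
df = pd.DataFrame({"id": [101, 102, 103]})

# Convert to the dedicated string dtype
df["id"] = df["id"].astype("string")
print(df["id"].dtype)
```

After the conversion the values are strings, so the `.str` accessor methods are available on the column.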
To check values, use boolean indexing:

    # get the value where the index is 1
    print(col1[1])
    2

    # more common with loc
    print(col1.loc[1])
    2

    print(col1 == '2')
    0    False
    1     True
    2    False
    3    False
    Name: col1, dtype: bool

And if you need to get the rows:

    print(col1[col1 == '2'])
    1    2
    Name: col1, dtype: object

For check …