## Does Pandas calculate ewm wrong?

There are several ways to initialize an exponential moving average, so I wouldn’t say pandas is doing it wrong, just differently. Here is a way to calculate it the way you want: In [20]: s.head() Out[20]: 0 22.27 1 22.19 2 22.08 3 22.17 4 22.18 Name: Price, dtype: float64 In [21]: span = 10 …
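As a rough illustration of one alternative initialization, the sketch below seeds the recursion with the simple mean of the first `span` values and then applies the usual recursive update. The excerpt above is truncated, so this is an assumption about what "like you want" means, not a reconstruction of the answer; the helper name `sma_seeded_ema` and the toy data are made up for the example.

```python
import pandas as pd


def sma_seeded_ema(s: pd.Series, span: int) -> pd.Series:
    """EMA whose first value is the simple mean of the first `span` points (hypothetical helper)."""
    # Rolling mean over the first `span` rows: NaN everywhere except the last row.
    seed = s.rolling(window=span, min_periods=span).mean().iloc[:span]
    rest = s.iloc[span:]
    # adjust=False gives the plain recursion y_t = (1 - a) * y_{t-1} + a * x_t,
    # which starts from the first non-NaN value, i.e. the seed.
    return pd.concat([seed, rest]).ewm(span=span, adjust=False).mean()


# Toy data, not the prices from the question.
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
ema = sma_seeded_ema(prices, span=3)
print(ema)
```

With `span=3` the smoothing factor is 2 / (3 + 1) = 0.5, so the first two entries stay NaN and the recursion starts at the mean of the first three values.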

## Grouped seaborn.barplot from a wide pandas.DataFrame

I think you need melt if you want to use barplot: data = df.melt('date', var_name="a", value_name="b") print(data) date a b 0 2017-09-05 A 25 1 2017-09-06 A 261 2 2017-09-07 A 188 3 2017-09-08 A 200 4 2017-09-09 A 292 5 2017-09-05 B 261 6 2017-09-06 B 1519 7 2017-09-07 B 1545 8 2017-09-08 B 2110 9 …
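A self-contained version of the reshaping step might look like the sketch below. The wide frame is an assumption built from the first few values shown in the printed output; the real `df` from the question is not visible here.

```python
import pandas as pd

# Assumed wide layout: one row per date, one column per series.
df = pd.DataFrame(
    {
        "date": ["2017-09-05", "2017-09-06", "2017-09-07"],
        "A": [25, 261, 188],
        "B": [261, 1519, 1545],
    }
)

# Long layout: one row per (date, series) pair.
data = df.melt("date", var_name="a", value_name="b")
print(data)
```

The long frame can then be handed to `seaborn.barplot` with `x="date"`, `y="b"`, `hue="a"` to get one bar group per date.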

## Using in operator with Pandas series [duplicate]

In the first case: the in operator is interpreted as a call to df['name'].__contains__('Adam'). If you look at the implementation of __contains__ in pandas.Series, you will find that it is the following (inherited from pandas.core.generic.NDFrame): def __contains__(self, key): """True if the key is in the info axis""" return key in self._info_axis so, your first …
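A small sketch of the consequence: because `in` checks the index (the "info axis") of a Series rather than its values, a value lookup has to be spelled differently. The two-row frame is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Adam", "Eve"]})  # illustrative data
s = df["name"]

label_hit = 0 in s                  # True: 0 is an index label
naive_hit = "Adam" in s             # False: "Adam" is a value, not a label
value_hit = (s == "Adam").any()     # True: compares against the values
isin_hit = s.isin(["Adam"]).any()   # True: same idea for several candidates

print(label_hit, naive_hit, value_hit, isin_hit)
```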

## Convert dataframe with start and end date to daily data

Edit: I had to revisit this problem in a project, and it looks like using DataFrame.apply with pd.date_range and DataFrame.explode is almost 3x faster: df["date"] = df.apply( lambda row: pd.date_range(row["start_date"], row["end_date"]), axis=1 ) df = ( df.explode("date", ignore_index=True) .drop(columns=["start_date", "end_date"]) ) Output id age state date 0 123 18 CA 2019-02-17 1 123 18 CA 2019-02-18 …
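The approach can be tried end to end with a small frame. In the sketch below the first row mirrors the values visible in the output above; the second row and both date ranges are made up for illustration, and `explode(..., ignore_index=True)` assumes pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [123, 456],
        "age": [18, 30],
        "state": ["CA", "NY"],
        "start_date": ["2019-02-17", "2019-03-01"],  # assumed ranges
        "end_date": ["2019-02-19", "2019-03-02"],
    }
)

# One DatetimeIndex per row, covering start_date..end_date inclusive.
df["date"] = df.apply(
    lambda row: pd.date_range(row["start_date"], row["end_date"]), axis=1
)

# One output row per day; the range columns are no longer needed.
daily = df.explode("date", ignore_index=True).drop(
    columns=["start_date", "end_date"]
)
print(daily)
```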

## use Featureunion in scikit-learn to combine two pandas columns for tfidf

FeatureUnion was not meant to be used that way. It instead takes two feature extractors / vectorizers and applies them to the input. It does not take data in the constructor the way it is shown. CountVectorizer is expecting a sequence of strings. The easiest way to provide it with that is to concatenate the …

## concise way of flattening multiindex columns

You can do a map join on the columns: out.columns = out.columns.map('_'.join) out Out[23]: B_mean B_std C_median A 1 0.204825 0.169408 0.926347 2 0.362184 0.404272 0.224119 3 0.533502 0.380614 0.218105 For some reason (when the columns contain ints) I like this way better: out.columns.map('{0[0]}_{0[1]}'.format) Out[27]: Index(['B_mean', 'B_std', 'C_median'], dtype='object')
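Both variants can be checked on a small aggregation. The sketch below uses made-up data; the `mixed` index is a hypothetical case with an integer second level, where `'_'.join` would raise a `TypeError` but the `format`-based mapping still works.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [0.1, 0.3, 0.2, 0.6], "C": [1.0, 3.0, 5.0, 7.0]}
)

# Aggregating with lists of functions produces MultiIndex columns.
out = df.groupby("A").agg({"B": ["mean", "std"], "C": ["median"]})

# Each column label is a tuple such as ("B", "mean"); join its parts.
out.columns = out.columns.map("_".join)
print(out.columns.tolist())

# Hypothetical columns with an int level: str.join would fail here.
mixed = pd.MultiIndex.from_tuples([("B", 1), ("C", 2)])
flat_mixed = mixed.map("{0[0]}_{0[1]}".format)
print(flat_mixed.tolist())
```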

## How can I left justify text in a pandas DataFrame column in an IPython notebook

If you’re willing to use another library, tabulate will do this: $ pip install tabulate and then from tabulate import tabulate df = pd.DataFrame({'Text': ['abcdef', 'x'], 'Value': [12.34, 4.2]}) print(tabulate(df, showindex=False, headers=df.columns)) Text Value ------ ------- abcdef 12.34 x 4.2 It has various other output formats also.

## Turn pandas dataframe into a file-like object in memory?

The Python io module has the necessary tools for file-like objects. import io # text buffer s_buf = io.StringIO() # saving a data frame to a buffer (same as with a regular file): df.to_csv(s_buf) Edit (I forgot): in order to read from the buffer afterwards, its position should be set to the beginning: s_buf.seek(0) I’m not familiar …
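A minimal round trip through an in-memory buffer could look like this. The frame is made up, and `index=False` is an addition not in the excerpt above; it only keeps the index out of the CSV so that reading it back yields the same columns.

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})  # illustrative data

s_buf = io.StringIO()            # in-memory text buffer
df.to_csv(s_buf, index=False)    # write exactly as to a regular file
s_buf.seek(0)                    # rewind before reading

round_trip = pd.read_csv(s_buf)  # any reader of file-like objects works
print(round_trip)
```

Without the `seek(0)` call the reader starts at the end of the buffer and finds nothing to parse.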

## Plot pandas dataframe containing NaNs

The reason you’re not seeing anything is that the default plot style is only a line. The line gets interrupted at NaNs, so only runs of consecutive values are plotted, and there are none in your case. You need to change the plotting style, which depends on what you want to see. …

## How can I draw scatter trend line on matplot? Python-Pandas

I’m sorry, I found the answer myself: How to add trendline in python matplotlib dot (scatter) graphs? import pandas as pd import numpy as np import matplotlib.pyplot as plt csv = pd.read_csv('/tmp/test.csv') data = csv[['fee', 'time']] x = data['fee'] y = data['time'] plt.scatter(x, y) z = np.polyfit(x, y, 1) p = np.poly1d(z) plt.plot(x,p(x),"r--") …
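The fitting step can be looked at separately from the plotting. The sketch below uses made-up points that lie exactly on a line instead of the `fee` and `time` columns from the CSV, so the fitted coefficients are easy to check by eye.

```python
import numpy as np

# Made-up points on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

z = np.polyfit(x, y, 1)  # degree-1 least-squares fit: [slope, intercept]
p = np.poly1d(z)         # callable polynomial built from the coefficients
trend = p(x)             # y-values of the trend line at each x

print(z)
```

`trend` is what gets passed to `plt.plot(x, trend, "r--")` to draw the dashed red line over the scatter.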