How to understand the closed and label arguments in the pandas resample method?

Short answer: if you use closed='left' and loffset="2T", you'll get what you expected:

series.resample('3T', label='left', closed='left', loffset='2T').sum()
2000-01-01 00:02:00     3
2000-01-01 00:05:00    12
2000-01-01 00:08:00    21

Long answer (or: why the results you got were correct, given the arguments you used): this may not be clear from the documentation, but open and closed in … Read more
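A minimal runnable sketch of the same behaviour, assuming a toy series of the values 0..8 at one-minute intervals (the data implied by the sums 3, 12, 21). Note that loffset was removed in pandas 2.0; shifting the index after aggregation is a stand-in for it here:

```python
import pandas as pd

# Nine minutes of data: values 0..8, one per minute.
index = pd.date_range("2000-01-01", periods=9, freq="min")
series = pd.Series(range(9), index=index)

# closed='left' includes each bin's left edge in that bin;
# label='left' stamps each bin with its left edge.
result = series.resample("3min", label="left", closed="left").sum()
# 2000-01-01 00:00:00     3   (0+1+2)
# 2000-01-01 00:03:00    12   (3+4+5)
# 2000-01-01 00:06:00    21   (6+7+8)

# Stand-in for the removed loffset="2T": shift the labels afterwards.
result.index = result.index + pd.Timedelta("2min")
print(result)
```

The shifted labels (00:02, 00:05, 00:08) match the output in the answer above.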

SparkSQL on pyspark: how to generate time series?

EDIT: This creates a dataframe with one row containing an array of consecutive dates:

from pyspark.sql.functions import sequence, to_date, explode, col
spark.sql("SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month) as date")

+------------------------------------------+
| date                                     |
+------------------------------------------+
| ["2018-01-01","2018-02-01","2018-03-01"] |
+------------------------------------------+

You can use the explode function to "pivot" this array into rows:

spark.sql("SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 … Read more
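The two steps above can be combined into a single query; a sketch in Spark SQL, wrapping sequence directly in explode (the column name date follows the excerpt):

```sql
-- One row per month between the two dates, inclusive.
SELECT explode(
  sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month)
) AS date
```

Run via spark.sql(...), this yields three rows: 2018-01-01, 2018-02-01, 2018-03-01.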

Annotate Time Series plot

Matplotlib uses an internal floating-point format for dates. You just need to convert your date to that format (using matplotlib.dates.date2num or matplotlib.dates.datestr2num) and then use annotate as usual. As a somewhat excessively fancy example: import datetime as dt import matplotlib.pyplot as plt import matplotlib.dates as mdates x = [dt.datetime(2009, 5, 1), dt.datetime(2010, 6, 1), … Read more

Convert dataframe with start and end date to daily data

Edit: I had to revisit this problem in a project, and it looks like using DataFrame.apply with pd.date_range and DataFrame.explode is almost 3x faster:

df["date"] = df.apply(
    lambda row: pd.date_range(row["start_date"], row["end_date"]), axis=1
)
df = (
    df.explode("date", ignore_index=True)
    .drop(columns=["start_date", "end_date"])
)

Output:

   id  age state       date
0  123   18    CA 2019-02-17
1  123   18    CA 2019-02-18
… Read more
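A runnable sketch of the pattern end to end, assuming an illustrative two-row frame with the columns from the excerpt (id, age, state, start_date, end_date):

```python
import pandas as pd

# Illustrative input: one row per (id, start_date, end_date) span.
df = pd.DataFrame(
    {
        "id": [123, 456],
        "age": [18, 21],
        "state": ["CA", "NY"],
        "start_date": pd.to_datetime(["2019-02-17", "2019-03-01"]),
        "end_date": pd.to_datetime(["2019-02-18", "2019-03-02"]),
    }
)

# Build one DatetimeIndex per row, then explode it into one row per day.
df["date"] = df.apply(
    lambda row: pd.date_range(row["start_date"], row["end_date"]), axis=1
)
daily = df.explode("date", ignore_index=True).drop(
    columns=["start_date", "end_date"]
)
print(daily)
```

Each input row expands into as many rows as there are days in its inclusive date range, with the other columns repeated.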