time-series
how to understand closed and label arguments in pandas resample method?
Short answer: if you use closed='left' and loffset='2T', then you’ll get what you expected:

    series.resample('3T', label='left', closed='left', loffset='2T').sum()

    2000-01-01 00:02:00     3
    2000-01-01 00:05:00    12
    2000-01-01 00:08:00    21

Long answer (or why the results you got were correct, given the arguments you used): this may not be clear from the documentation, but open and closed in …
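The excerpt is cut off, so here is a rough sketch of what the two arguments do. It assumes the nine-sample, one-minute series used in the pandas resample documentation (the asker's series is not shown above), and it uses the "min" frequency alias instead of "T". Because loffset was removed in pandas 2.0, the sketch shifts the index by hand to get the same labels.

```python
import pandas as pd

# Assumed input: nine one-minute samples holding 0..8.
series = pd.Series(
    range(9),
    index=pd.date_range("2000-01-01", periods=9, freq="min"),
)

# closed="left": each 3-minute bin is [start, start + 3min).
# label="left": each bin is labelled with its start time.
binned = series.resample("3min", label="left", closed="left").sum()

# Stand-in for the removed loffset argument: move every label 2 minutes later.
shifted = binned.copy()
shifted.index = shifted.index + pd.Timedelta(minutes=2)

print(shifted)
```

Under these assumptions the sums are 3, 12 and 21, labelled 00:02, 00:05 and 00:08, which matches the output quoted in the short answer.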
SparkSQL on pyspark: how to generate time series?
EDIT: This creates a dataframe with one row containing an array of consecutive dates:

    from pyspark.sql.functions import sequence, to_date, explode, col

    spark.sql("SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 month) as date")

    +------------------------------------------+
    |                   date                   |
    +------------------------------------------+
    | ["2018-01-01","2018-02-01","2018-03-01"] |
    +------------------------------------------+

You can use the explode function to "pivot" this array into rows:

    spark.sql("SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 …
Annotate Time Series plot
Matplotlib uses an internal floating point format for dates. You just need to convert your date to that format (using matplotlib.dates.date2num or matplotlib.dates.datestr2num) and then use annotate as usual. As a somewhat excessively fancy example:

    import datetime as dt
    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates

    x = [dt.datetime(2009, 5, 1), dt.datetime(2010, 6, 1), …
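Since the example above is truncated, the following is a minimal sketch of the same idea rather than the original answer's code. The data points, the "peak" label and the text position are hypothetical, and the Agg backend is selected only so the sketch runs without a display.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical data, chosen only for illustration.
dates = [dt.datetime(2009, 5, 1), dt.datetime(2010, 6, 1), dt.datetime(2011, 4, 1)]
values = [1.0, 3.0, 2.0]

fig, ax = plt.subplots()
ax.plot(dates, values, marker="o")

# Convert the dates to matplotlib's float representation before annotating.
x_num = mdates.date2num(dates[1])
text_x = mdates.date2num(dt.datetime(2009, 9, 1))
annotation = ax.annotate(
    "peak",
    xy=(x_num, values[1]),       # point being annotated
    xytext=(text_x, 2.8),        # where the label text is placed
    arrowprops=dict(arrowstyle="->"),
)
fig.autofmt_xdate()  # rotate the date tick labels so they do not overlap
```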
Creating graph with date and time in ticklabels with matplotlib
I hope this helps. I’ve always had a hard time with matplotlib’s dates. Matplotlib requires a float format, which is days since the epoch. The helper functions num2date and date2num, along with Python’s built-in datetime, can be used to convert to and from that format. The formatting business was lifted from this example. You can change an axis on any …
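As a small sketch of the conversion helpers named above, the snippet below round-trips one timestamp through date2num and num2date and then puts both date and time in the tick labels with a DateFormatter. The timestamp, the plotted values and the format string are hypothetical; the Agg backend is assumed so that no display is needed.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

stamp = dt.datetime(2012, 3, 4, 15, 30)  # hypothetical timestamp

as_float = mdates.date2num(stamp)        # days since matplotlib's epoch
round_trip = mdates.num2date(as_float)   # timezone-aware datetime (UTC by default)

fig, ax = plt.subplots()
ax.plot([stamp, stamp + dt.timedelta(hours=6)], [0, 1])

# Show both the date and the time of day in each tick label.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d %H:%M"))
fig.autofmt_xdate()
```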
Python: Matplotlib avoid plotting gaps
Simply set the two values defining the line you don’t want to see to NaN (Not a Number). Matplotlib will hide the line between the two values automatically. Check out this example: http://matplotlib.org/examples/pylab_examples/nan_test.html
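A minimal sketch of the NaN approach, with made-up data and an arbitrarily chosen gap at samples 4 and 5. Every segment that touches a NaN sample is left undrawn, so the line breaks between samples 3 and 6.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for this sketch
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10, dtype=float)
y = x ** 2  # hypothetical data

# Mark the two samples that bound the unwanted stretch as missing.
y_gapped = y.copy()
y_gapped[4:6] = np.nan

fig, ax = plt.subplots()
(line,) = ax.plot(x, y_gapped)  # drawn as two separate pieces
```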
Convert dataframe with start and end date to daily data
Edit: I had to revisit this problem in a project, and it looks like using DataFrame.apply with pd.date_range and DataFrame.explode is almost 3x faster:

    df["date"] = df.apply(
        lambda row: pd.date_range(row["start_date"], row["end_date"]),
        axis=1
    )
    df = (
        df.explode("date", ignore_index=True)
        .drop(columns=["start_date", "end_date"])
    )

Output

        id  age state       date
    0  123   18    CA 2019-02-17
    1  123   18    CA 2019-02-18 …
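To try the approach end to end, here is a self-contained sketch. The input frame is hypothetical: the column names follow the excerpt, but the second record and both date ranges are invented for illustration.

```python
import pandas as pd

# Hypothetical input: one row per record, with an inclusive date range.
df = pd.DataFrame(
    {
        "id": [123, 456],
        "age": [18, 30],
        "state": ["CA", "NY"],
        "start_date": ["2019-02-17", "2019-03-01"],
        "end_date": ["2019-02-19", "2019-03-02"],
    }
)

# Build one DatetimeIndex per row covering start_date..end_date.
df["date"] = df.apply(
    lambda row: pd.date_range(row["start_date"], row["end_date"]),
    axis=1,
)

# Expand each range into one row per day, then drop the range columns.
daily = (
    df.explode("date", ignore_index=True)
    .drop(columns=["start_date", "end_date"])
)

print(daily)
```

With these inputs the first record expands to three daily rows and the second to two. Note that the ignore_index argument of explode needs pandas 1.1 or later.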