pandas - w3toppers.com

Python returns "SyntaxError: invalid syntax sys module" [closed]

The line

    import pandas as pd sys.path.insert(0, "/usr/lib/python2.7/site-packages")

contains two statements. Split them into two lines:

    import pandas as pd
    sys.path.insert(0, "/usr/lib/python2.7/site-packages")

Or, if they must be on one line, separate them with a semicolon (strongly not recommended):

    import pandas as pd; sys.path.insert(0, "/usr/lib/python2.7/site-packages")
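As a minimal sketch of the split version: `sys` itself has to be imported before `sys.path` can be used, which the snippet in the question does not show. The `extra_dir` name and the membership check are my own additions for illustration, not part of the original answer.

```python
import sys

# Path taken from the question; the variable name is hypothetical.
extra_dir = "/usr/lib/python2.7/site-packages"

# One statement per line. Skip the insert if the directory is already listed.
if extra_dir not in sys.path:
    sys.path.insert(0, extra_dir)

print(extra_dir in sys.path)
```

Once the path is registered, `import pandas as pd` goes on its own line after it.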

Replacing values represented by 'UN' with NaN

While reading the CSV, use the na_values parameter (pass your file path as the first argument): pd.read_csv('', na_values="UN")
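A small sketch of what `na_values` does, assuming made-up CSV content held in memory (the column names `city` and `temp` are invented for illustration):

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real file on disk.
csv_text = "city,temp\nOslo,UN\nRome,21\n"

# Every cell equal to "UN" is parsed as NaN instead of a string.
df = pd.read_csv(io.StringIO(csv_text), na_values="UN")

print(df)
```

Because the marker is turned into NaN at read time, the `temp` column comes out numeric rather than as strings.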

How to change column names from numbers to names [closed]

First, move the non-date columns into the index, then use rename and reset_index:

    df = df.set_index('CashFlows')
    df = df.rename(columns=lambda x: 'year_' + str(x.year))
    df = df.reset_index()
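To see the steps end to end, here is a sketch on an invented frame. Only the `CashFlows` column name comes from the answer; the row labels, dates, and amounts are assumptions.

```python
import pandas as pd

# Hypothetical data: one label column plus two date-labelled columns.
df = pd.DataFrame({
    "CashFlows": ["Revenue", "Costs"],
    pd.Timestamp("2019-12-31"): [100, 40],
    pd.Timestamp("2020-12-31"): [120, 45],
})

df = df.set_index("CashFlows")                            # keep the label out of the rename
df = df.rename(columns=lambda x: "year_" + str(x.year))   # Timestamp -> "year_YYYY"
df = df.reset_index()                                     # bring the label column back

print(df.columns.tolist())
```

Moving `CashFlows` into the index first matters because the lambda calls `.year`, which a plain string label does not have.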

Calculate cumulative intraday measures that reset every day in pandas

A cumulative sum that resets every day is equivalent to applying the cumulative sum per group: each new group restarts the sum when it begins. It is always easier to illustrate an answer with a good minimal reproducible example:

    df = pd.DataFrame([
        ['20191224', '20191224 01:00', 50, 'Merry'],
        ['20191224', '20191224 02:30', 50, 'Christmas'],
        ['20191225', '20191225 02:00', …
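The grouping idea can be sketched like this. The column names and the values in the last two rows are invented for illustration and are not the rows from the truncated example.

```python
import pandas as pd

# Hypothetical intraday data: a day key, a timestamp, and a value to accumulate.
df = pd.DataFrame(
    [
        ["20191224", "20191224 01:00", 50],
        ["20191224", "20191224 02:30", 50],
        ["20191225", "20191225 02:00", 30],
        ["20191225", "20191225 03:00", 20],
    ],
    columns=["day", "timestamp", "value"],
)

# Grouping by day makes the running total restart at each new day.
df["cum_value"] = df.groupby("day")["value"].cumsum()

print(df)
```

The same pattern should carry over to other cumulative measures such as `cummax` or `cumcount`, as long as the rows are already in time order within each day.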

how to do multiple scatter plots with matplotlib

Are you a Udacity Machine Learning student? Something like this might be useful for you:

    import numpy as np
    import matplotlib.pyplot as plt

    labels = ['output', 'varA', 'varB', 'varC']
    data = np.array([[4, 5, 6, 2, 5],
                     [3, 6, 4, 6, 3],
                     [12, 3, 5, 3, 2],
                     [4, 1, 1, 44, 7]])
    colors = ["r", "g", "b", "k"]  # make sure you have enough colors to match
                                   # the number of variables
    for i …
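The excerpt cuts off at the loop, so the following is only one plausible way to finish it, not the original answer's code. It assumes that row 0 (`output`) goes on the y-axis and that each remaining row is plotted against it on a single set of axes.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np

labels = ["output", "varA", "varB", "varC"]
data = np.array([[4, 5, 6, 2, 5],
                 [3, 6, 4, 6, 3],
                 [12, 3, 5, 3, 2],
                 [4, 1, 1, 44, 7]])
colors = ["r", "g", "b", "k"]

fig, ax = plt.subplots()

# Assumption: one scatter series per variable, each drawn against the output row.
for i in range(1, len(labels)):
    ax.scatter(data[i], data[0], color=colors[i], label=labels[i])

ax.set_ylabel(labels[0])
ax.legend()
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` at the end.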

how to save sql query result to csv in pandas

You can try the following code:

    import pandas as pd

    df1 = pd.read_csv("Insert file path")
    df2 = pd.read_csv("Insert file path")
    df1['Date'] = pd.to_datetime(df1['Date'], errors="coerce", format="%Y-%m-%d")
    df2['Date'] = pd.to_datetime(df2['Date'], errors="coerce", format="%Y-%m-%d")
    df = df1.merge(df2, how='inner', on='Date')
    df.to_csv('data.csv', index=False)

This should solve your problem.
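If the data lives in a database rather than in CSV files, a sketch along these lines might apply. It uses an in-memory SQLite database, and the table name, columns, and rows are all invented for illustration.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2021-01-01", 10), ("2021-01-02", 20)],
)
conn.commit()

# Run the query straight into a DataFrame, then serialise it as CSV.
result = pd.read_sql_query("SELECT Date, amount FROM sales ORDER BY Date", conn)
csv_text = result.to_csv(index=False)  # pass a file path instead to write to disk
conn.close()

print(csv_text)
```

Calling `to_csv` without a path returns the CSV as a string. Passing a path such as `'data.csv'` writes the file, as in the answer above.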

how to use if statement with multiple conditions on pandas dataframe

Try this:

    import pandas as pd

    data = {
        "YEAR": [2016, 2016, 2020, 2020, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021],
        "MONTH": [1, 2, 4, 5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
    df = pd.DataFrame(data)
    df.loc[(df["YEAR"] <= 2021) & (df["MONTH"] < 7), "TIME_TYPE"] = "History"
    df.loc[(df["YEAR"] >= 2021) & (df["MONTH"] >= 7), "TIME_TYPE"] = "Forecast"
    print(df)

Result:

       YEAR  MONTH TIME_TYPE
    0  2016      1   History
    1  2016      2   History
    2  2020      4   History
    3  …
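An alternative worth knowing is `numpy.select`, which is a different technique from the `df.loc` assignments in the answer. It takes a list of conditions and a fallback label. The sample rows and the `"Unknown"` default below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample, smaller than the frame in the answer.
df = pd.DataFrame({"YEAR": [2016, 2020, 2021, 2021], "MONTH": [1, 8, 3, 9]})

# Each condition is wrapped in parentheses because & binds tighter than <= and >=.
conditions = [
    (df["YEAR"] <= 2021) & (df["MONTH"] < 7),
    (df["YEAR"] >= 2021) & (df["MONTH"] >= 7),
]
choices = ["History", "Forecast"]

# Rows matching neither condition get the default label.
df["TIME_TYPE"] = np.select(conditions, choices, default="Unknown")

print(df)
```

With the `df.loc` version, a row that matches neither condition (here 2020, month 8) is left as NaN instead of getting an explicit label.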

Python: create a list of list of lists

You can do this using two applications of itertools.groupby, one to group by ID, and one to group by date. The code below uses a triple-nested list comprehension, which is compact, but not so easy to read. I'll post a longer version shortly.

    from itertools import groupby
    from operator import itemgetter

    data = '''\
    ID date product …
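Since the original code is cut off, here is a rough sketch of the two-level grouping on invented rows. The tuple layout `(ID, date, product)` follows the header in the excerpt, but the values are made up.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (ID, date, product).
rows = [
    ("A", "2020-01-01", "apple"),
    ("A", "2020-01-01", "pear"),
    ("A", "2020-01-02", "plum"),
    ("B", "2020-01-01", "kiwi"),
]

# groupby only merges adjacent items, so sort by ID and then date first.
rows.sort(key=itemgetter(0, 1))

# Outer level: one list per ID. Middle level: one list per date. Inner: products.
nested = [
    [
        [product for _, _, product in by_date]
        for _, by_date in groupby(by_id, key=itemgetter(1))
    ]
    for _, by_id in groupby(rows, key=itemgetter(0))
]

print(nested)
```

Each inner group is consumed before the outer `groupby` advances. That matters because the group iterators share the underlying iterator.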

how to calculate time cost [closed]

I think I understand what you're asking. You just want a new dataframe that holds the time difference between the three different entries for each unique order id? So, I start by creating the dataframe:

    # '新建订单' means "new order created"; '初审成功' means "initial review passed"
    data = [
        [11238, 3943, 201805030759165986, '新建订单', 20180503075916, '2018/5/3 07:59:16', '2018/5/3 07:59:16'],
        [11239, 3943, 201805030759165986, '新建订单', 20180503082115, '2018/5/3 08:21:15', '2018/5/3 08:21:15'],
        [11240, 3943, 201805030759165986, '新建订单', 20180503083204, '2018/5/3 08:32:04', '2018/5/3 08:32:04'],
        [11241, 3941, 201805030856445991, '新建订单', 20180503085644, '2018/5/3 08:56:02', '2018/5/3 08:56:44'],
        [11242, 3941, 201805022232081084, '初审成功', 20180503085802, '2018/5/3 08:58:02', '2018/5/3 08:58:02'],
        …
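The rest of the answer is truncated, so this is only a sketch of one way to get per-order time differences, not the original solution. The column names are hypothetical. It also assumes that the second field identifies the order and that the 14-digit field is the event timestamp.

```python
import pandas as pd

# Subset of the fields above: row id, assumed order key, 14-digit timestamp.
df = pd.DataFrame(
    [
        [11238, 3943, 20180503075916],
        [11239, 3943, 20180503082115],
        [11240, 3943, 20180503083204],
        [11241, 3941, 20180503085644],
        [11242, 3941, 20180503085802],
    ],
    columns=["id", "order_id", "event_time"],
)

# Parse the integer timestamps (YYYYMMDDHHMMSS) into real datetimes.
df["event_time"] = pd.to_datetime(df["event_time"].astype(str), format="%Y%m%d%H%M%S")

# Time since the previous entry of the same order; the first entry of each order is NaT.
df["elapsed"] = df.groupby("order_id")["event_time"].diff()

print(df)
```

`diff` within each group gives the gap between consecutive entries. For the total time per order, something like `groupby("order_id")["event_time"].agg(lambda s: s.max() - s.min())` would be a possible variation.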