How to create a table with clickable hyperlink in pandas & Jupyter Notebook

If you want to apply URL formatting only to a single column, you can use:

    data = [dict(name="Google", url="http://www.google.com"),
            dict(name="Stackoverflow", url="http://stackoverflow.com")]
    df = pd.DataFrame(data)

    def make_clickable(val):
        # target _blank to open new window
        return '<a target="_blank" href="{}">{}</a>'.format(val, val)

    df.style.format({'url': make_clickable})

(PS: Unfortunately, I didn’t have enough reputation to post this as a comment to @Abdou’s … Read more
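One caveat with a formatter like make_clickable is that the cell value is pasted into the markup unescaped. Here is a minimal sketch, assuming you want characters such as & or quotes in the URL escaped before the link is built. The name make_clickable_escaped is hypothetical; html.escape comes from the Python standard library.

```python
from html import escape


def make_clickable_escaped(val):
    """Return an anchor tag for val with HTML-special characters escaped."""
    url = escape(str(val), quote=True)
    # target=_blank opens the link in a new window, as in the excerpt
    return '<a target="_blank" href="{0}">{0}</a>'.format(url)


print(make_clickable_escaped("http://stackoverflow.com"))
```

It should be usable in the same place as the original formatter, for example df.style.format({'url': make_clickable_escaped}).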

Requirements for converting Spark dataframe to Pandas/R dataframe

toPandas (PySpark) / as.data.frame (SparkR)

Data has to be collected before the local data frame is created. For example, the toPandas method looks as follows:

    def toPandas(self):
        import pandas as pd
        return pd.DataFrame.from_records(self.collect(), columns=self.columns)

You need Python, optimally with all the dependencies, installed on each node. The SparkR counterpart (as.data.frame) is simply an alias for collect. To summarize … Read more
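To make the collect step concrete, here is a small sketch of the same from_records call run on plain Python data. The rows and columns lists are made-up stand-ins for what self.collect() and self.columns would hand back on the driver; no Spark session is involved.

```python
import pandas as pd

# Hypothetical stand-ins for self.collect() and self.columns
rows = [("alice", 34), ("bob", 29)]
columns = ["name", "age"]

# Same call the excerpt's toPandas makes once the rows are local
local_df = pd.DataFrame.from_records(rows, columns=columns)
print(local_df.shape)
```

Because every row passes through the driver this way, the whole dataset has to fit in the driver's memory.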

Make Pandas DataFrame apply() use all cores?

The simplest way is to use Dask’s map_partitions. You need these imports (you will need to pip install dask):

    import pandas as pd
    import dask.dataframe as dd

and the syntax is

    data = <your_pandas_dataframe>
    ddata = dd.from_pandas(data, npartitions=30)

    def myfunc(x, y, z, …): return <whatever>

    res = ddata.map_partitions(lambda df: df.apply((lambda row: myfunc(*row)), axis=1)).compute(scheduler='processes')

… Read more

Show DataFrame as table in iPython Notebook

You’ll need to use the HTML() or display() functions from IPython’s display module:

    from IPython.display import display, HTML

    # Assuming that dataframes df1 and df2 are already defined:
    print("Dataframe 1:")
    display(df1)
    print("Dataframe 2:")
    display(HTML(df2.to_html()))

Note that if you just print df1.to_html() you’ll get the raw, unrendered HTML. You can also import from IPython.core.display … Read more
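The difference between printing and rendering comes down to to_html() returning an ordinary string. A small sketch, assuming a made-up two-row frame; only pandas is needed to see this.

```python
import pandas as pd

# Made-up example frame
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

html = df2.to_html()
# to_html() returns markup as a plain str, so print(html) shows the tags themselves
print(type(html).__name__)
print(html.splitlines()[0])
```

Wrapping that string in HTML() and passing it to display() is what tells the notebook to render it as a table.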

Pandas bar plot changes date format

The plotting code assumes that each bar in a bar plot deserves its own label. You could override this assumption by specifying your own formatter: ax.xaxis.set_major_formatter(formatter) The pandas.tseries.converter.TimeSeries_DateFormatter that Pandas uses to format the dates in the “good” plot works well with line plots when the x-values are dates. However, with a bar plot the … Read more

How to split data into 3 sets (train, validation and test)?

NumPy solution. We will shuffle the whole dataset first (df.sample(frac=1, random_state=42)) and then split our data set into the following parts: 60% – train set, 20% – validation set, 20% – test set

    In [305]: train, validate, test = \
                  np.split(df.sample(frac=1, random_state=42),
                           [int(.6*len(df)), int(.8*len(df))])

    In [306]: train
    Out[306]:
    A B C D E
    0 0.046919 … Read more
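The 60/20/20 proportions can be checked on a small frame. The sketch below assumes a made-up 100-row DataFrame and uses iloc slicing at the same two cut points instead of np.split, which should yield the same three partitions. The names n, train_end and validate_end are illustrative.

```python
import numpy as np
import pandas as pd

# Made-up data: 100 rows, 5 columns
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 5)), columns=list("ABCDE"))

shuffled = df.sample(frac=1, random_state=42)  # shuffle all rows
n = len(shuffled)
train_end = int(0.6 * n)      # first 60% of rows go to train
validate_end = int(0.8 * n)   # next 20% go to validation, the rest to test

train = shuffled.iloc[:train_end]
validate = shuffled.iloc[train_end:validate_end]
test = shuffled.iloc[validate_end:]

print(len(train), len(validate), len(test))
```

Slicing the shuffled frame keeps each row in exactly one of the three sets, and fixing random_state makes the split reproducible.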