binning - w3toppers.com

Python: Checking to which bin a value belongs

Probably too late, but for future reference, NumPy has a function that does just that: http://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html

    >>> import numpy as np
    >>> my_list = [3,2,56,4,32,4,7,88,4,3,4]
    >>> bins = [0,20,40,60,80,100]
    >>> np.digitize(my_list,bins)
    array([1, 1, 3, 1, 2, 1, 1, 5, 1, 1, 1])

The result is an array of indexes corresponding to the bin from bins that each element from my_list …
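As a minimal sketch of how those indexes can be used, the snippet below maps each index back to its bin edges and counts the values per bin. The names `indices`, `intervals`, and `counts` are illustrative and not part of the original answer, and the mapping assumes every value lies inside the range covered by `bins`.

```python
import numpy as np

my_list = [3, 2, 56, 4, 32, 4, 7, 88, 4, 3, 4]
bins = [0, 20, 40, 60, 80, 100]

# With the default right=False, index i means bins[i-1] <= value < bins[i].
indices = np.digitize(my_list, bins)

# Map each index back to its (low, high) edges.
# Assumption: no value is out of range (index 0 or len(bins) would signal that).
intervals = [(bins[i - 1], bins[i]) for i in indices]

# Count values per bin; slot 0 of bincount is the "below bins[0]" bucket, so drop it.
counts = np.bincount(indices, minlength=len(bins))[1:]

print(intervals[2])  # edges of the bin that holds 56
print(counts)        # one count per bin, in bin order
```

Passing `right=True` to `np.digitize` would instead close each interval on the right, which matches the default behaviour of `pd.cut`.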

Pandas pd.cut() – binning datetime column / series

UPDATE: starting from Pandas v0.20.1 (May 5, 2017), pd.cut and pd.qcut support datetime64 and timedelta64 dtypes (GH14714, GH14798). Thanks @lighthouse65 for checking this!

Updated answer:

    df = pd.DataFrame(pd.date_range('2000-01-02', freq='1D', periods=15), columns=['Date'])
    bins_dt = pd.date_range('2000-01-01', freq='3D', periods=6)
    bins_str = bins_dt.astype(str).values
    labels = ['({}, {}]'.format(bins_str[i-1], bins_str[i]) for i in range(1, len(bins_str))]
    df['cat'] = pd.cut(df['Date'], bins=bins_dt, labels=labels)

Old …

Binning a column with pandas

You can use pandas.cut:

    bins = [0, 1, 5, 10, 25, 50, 100]
    df['binned'] = pd.cut(df['percentage'], bins)
    print(df)
       percentage     binned
    0       46.50   (25, 50]
    1       44.20   (25, 50]
    2      100.00  (50, 100]
    3       42.12   (25, 50]

    bins = [0, 1, 5, 10, 25, 50, 100]
    labels = [1,2,3,4,5,6]
    df['binned'] = pd.cut(df['percentage'], bins=bins, labels=labels)
    print …

2D and 3D Scatter Histograms from arrays in Python

Here are two functions, hist2d_bubble and hist3d_bubble, that may fit your purpose:

    import numpy as np
    import matplotlib.pyplot as pyplot
    from mpl_toolkits.mplot3d import Axes3D

    def hist2d_bubble(x_data, y_data, bins=10):
        ax = np.histogram2d(x_data, y_data, bins=bins)
        xs = ax[1]
        ys = ax[2]
        points = []
        for (i, j), v in np.ndenumerate(ax[0]):
            points.append((xs[i], ys[j], v))
        points = …

How does cut with breaks work in R

cut in your example splits the vector into the following parts: 0-1 (1); 1-2 (2); 2-3 (3); 3-5 (4); 5-7 (5); 7-8 (6); 8-10 (7)

The numbers in parentheses are the default labels that cut assigns to each bin, based on the breaks values provided. By default, cut excludes the lower bound of each interval. If you …
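The same half-open behaviour can be sketched in Python with pandas, whose `pd.cut` also closes intervals on the right by default. This is an analogue under stated assumptions, not the R code from the answer: the break points are read off the parts listed above, and the sample `values` are made up for illustration.

```python
import pandas as pd

# Break points read off the parts listed above.
breaks = [0, 1, 2, 3, 5, 7, 8, 10]

# Hypothetical sample values, chosen to sit on and between the breaks.
values = pd.Series([1, 1.5, 3, 4, 10, 0])

# labels=False returns 0-based bin codes; intervals are (low, high] by default,
# so a value equal to the lowest break falls outside every bin and becomes NaN.
codes = pd.cut(values, bins=breaks, labels=False)

# include_lowest=True closes the first interval on the left as well.
with_lowest = pd.cut(values, bins=breaks, labels=False, include_lowest=True)

# Adding 1 gives the 1-based numbering used in the list above.
print((codes + 1).tolist())
print((with_lowest + 1).tolist())
```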

Mapping ranges of values in pandas dataframe [duplicate]

There are a few alternatives.

Pandas via pd.cut / NumPy via np.digitize

You can construct a list of boundaries, then use specialist library functions. This is described in @EdChum's solution, and also in this answer.

NumPy via np.select

    df = pd.DataFrame(data=np.random.randint(1,10,10), columns=['a'])
    criteria = [df['a'].between(1, 3), df['a'].between(4, 7), df['a'].between(8, 10)]
    values = [1, 2, 3]
    …
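A minimal, self-contained sketch of the np.select approach follows. The excerpt is cut off before the final step, so the output column name `b`, the `default=0` fallback, and the fixed input data are assumptions made here for reproducibility rather than details from the original answer.

```python
import numpy as np
import pandas as pd

# Fixed data instead of random integers, so the result is reproducible.
df = pd.DataFrame({"a": [1, 3, 4, 7, 8, 9]})

# Series.between is inclusive on both ends by default.
criteria = [
    df["a"].between(1, 3),
    df["a"].between(4, 7),
    df["a"].between(8, 10),
]
values = [1, 2, 3]

# np.select picks, per row, the value of the first criterion that is True;
# rows matching no criterion get the default (0 is an assumption here).
df["b"] = np.select(criteria, values, default=0)

print(df)
```

Compared with pd.cut, this form makes each range explicit, which can be easier to read when the ranges are not contiguous.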

Getting data for histogram plot

This is a post about a super quick-and-dirty way to create a histogram in MySQL for numeric values. There are multiple other ways to create histograms that are better and more flexible, using CASE statements and other types of complex logic. This method wins me over time and time again since it’s just so easy …

Bin pandas dataframe by every X rows

In Python 2 use:

    >>> df.groupby(df.index / 3).mean()
       col1
    0   2.0
    1   0.5
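In Python 3, `/` is true division, so the expression above would produce fractional group keys; floor division (`//`) gives the same grouping. The sketch below assumes a default integer index, and the data is hypothetical, chosen only so that the group means match the output shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data: six rows, binned three at a time.
df = pd.DataFrame({"col1": [1, 3, 2, 0, 1, 0.5]})

# Floor division maps index labels 0,1,2 -> 0 and 3,4,5 -> 1.
binned = df.groupby(df.index // 3).mean()

# If the index is not 0..n-1, group by row position instead.
binned_by_position = df.groupby(np.arange(len(df)) // 3).mean()

print(binned)
```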

Define and apply custom bins on a dataframe

Another cut answer that takes into account extrema:

    dat <- read.table("clipboard", header=TRUE)
    cuts <- apply(dat, 2, cut, c(-Inf,seq(0.5, 1, 0.1), Inf), labels=0:6)
    cuts[cuts=="6"] <- "0"
    cuts <- as.data.frame(cuts)

      cosinFcolor cosinEdge cosinTexture histoFcolor histoEdge histoTexture jaccard
    1           3         0            0           1         1            0       0
    2           0         0            5           0         2            2       0
    3           1         0            2 …

Histogram using gnuplot?

Yes, and it's quick and simple, though well hidden:

    binwidth=5
    bin(x,width)=width*floor(x/width)
    plot 'datafile' using (bin($1,binwidth)):(1.0) smooth freq with boxes

Check out help smooth freq to see why the above makes a histogram. To deal with ranges, just set the xrange variable.
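The binning formula itself is easy to check outside gnuplot. The Python sketch below applies the same `width*floor(x/width)` rule and then sums one count per value for each bin, which is roughly the role `smooth freq` plays in the plot command. The function name `bin_value` and the sample data are illustrative assumptions.

```python
import math
from collections import Counter

def bin_value(x, width):
    """Return the left edge of the bin containing x, as in the gnuplot bin()."""
    return width * math.floor(x / width)

# Hypothetical data standing in for the first column of 'datafile'.
data = [1, 4, 5, 7, 12, 13, 14, 22]
binwidth = 5

# Each value contributes 1 to its bin, mirroring the (1.0) column in the plot command.
freq = Counter(bin_value(x, binwidth) for x in data)

for left_edge in sorted(freq):
    print(left_edge, freq[left_edge])
```

Because `floor` rounds toward negative infinity, negative values land in the bin below zero rather than being folded into the first positive bin.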