Pandas select rows and columns based on boolean condition

Let’s break down your problem. You want to

  1. Filter rows based on some boolean condition
  2. You want to select a subset of columns from the result.

For the first point, the condition you’d need is –

df["col_z"] < m

For the second requirement, you’d want to specify the list of columns that you need –

["col_x", "col_y"]

How would you combine these two to produce an expected output with pandas? The most straightforward way is using loc

df.loc[df["col_z"] < m, ["col_x", "col_y"]]

The first argument selects rows, and the second argument selects columns.


More About loc

Think of this in terms of the relational algebra operations – selection and projection. If you’re from the SQL world, this would be a relatable equivalent. The above operation, in SQL syntax, would look like this –

SELECT col_x, col_y     # projection on columns
FROM df
WHERE col_z < m         # selection on rows

pandas loc allows you to specify index labels for selecting rows. For example, if you have a dataframe –

   col_x  col_y
a      1      4
b      2      5
c      3      6

To select index a, and c, and col_x you’d use –

df.loc[['a', 'c'], ['col_x']]

   col_x
a      1
c      3

Alternatively, for selecting by a boolean condition (using a series/array of bool values, as your original question asks), where all values in col_x are odd –

df.loc[(df.col_x % 2).ne(0), ['col_y']]

   col_y
a      4
c      6

For details, df.col_x % 2 computes the modulus of each value with respect to 2. The ne(0) will then compare the value to 0, and return True if it isn’t (all odd numbers are selected like this). Here’s what that expression results in –

(df.col_x % 2).ne(0)

a     True
b    False
c     True
Name: col_x, dtype: bool

Further Reading

Leave a Comment