Filtering (reducing) a NumPy Array

Summary: using a loop-based approach with a single pass and copying, accelerated with Numba, offers the best overall trade-off in terms of speed, memory efficiency and flexibility. If the condition function executes sufficiently fast, the two-pass approach (filter2_nb()) may be faster, and it is more memory efficient regardless. Also, for sufficiently large inputs, resizing instead …
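To make the single-pass vs. two-pass trade-off concrete, here is a minimal sketch of both strategies. It is not the answer's filter2_nb(); the names filter_single_pass, filter_two_pass and the condition is_positive are hypothetical. Numba is treated as optional: if it is not installed, a no-op stand-in for njit is used and the loops run as plain (slow) Python.

```python
import numpy as np

try:
    from numba import njit  # optional: JIT-compile the loops if Numba is installed
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def is_positive(x):
    # hypothetical condition function used for illustration
    return x > 0


@njit
def filter_single_pass(arr):
    # One pass: write matches into a full-size buffer, then copy the used prefix.
    out = np.empty_like(arr)
    n = 0
    for x in arr:
        if is_positive(x):
            out[n] = x
            n += 1
    return out[:n].copy()


@njit
def filter_two_pass(arr):
    # First pass: count the matches so the output can be allocated exactly.
    n = 0
    for x in arr:
        if is_positive(x):
            n += 1
    out = np.empty(n, dtype=arr.dtype)
    # Second pass: fill the exact-size output (the condition is evaluated again).
    i = 0
    for x in arr:
        if is_positive(x):
            out[i] = x
            i += 1
    return out
```

The single-pass version evaluates the condition once per element but temporarily allocates a buffer as large as the input; the two-pass version evaluates it twice but only ever allocates the output, which is why it is the more memory-efficient of the two.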

Improve Pandas Merge performance

Using set_index on the merge column does indeed speed this up. Below is a slightly more realistic version of julien-marrec's answer:

```python
import pandas as pd
import numpy as np

myids = np.random.choice(np.arange(10000000), size=1000000, replace=False)
df1 = pd.DataFrame(myids, columns=['A'])
df1['B'] = np.random.randint(0, 1000, (1000000))
df2 = pd.DataFrame(np.random.permutation(myids), columns=['A2'])
df2['B2'] = np.random.randint(0, 1000, (1000000))
```

```ipython
%%timeit
x = df1.merge(df2, how='left', left_on='A', right_on='A2')
#1 loop, best of …
```
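A sketch of the set_index variant, for comparison with the column-based merge above. It uses a smaller, seeded dataset (the size n and the seed are arbitrary choices) and DataFrame.join, which joins on the index by default. No timings are claimed here; measure both with %timeit on your own data and pandas version.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # smaller than the answer's 1,000,000 rows so the sketch runs quickly
myids = rng.choice(10 * n, size=n, replace=False)  # unique keys

df1 = pd.DataFrame({"A": myids, "B": rng.integers(0, 1000, n)})
df2 = pd.DataFrame({"A2": rng.permutation(myids), "B2": rng.integers(0, 1000, n)})

# Baseline: left merge on columns, as in the answer.
merged_cols = df1.merge(df2, how="left", left_on="A", right_on="A2")

# Variant: move the keys into the index once, then join index-on-index.
df1_idx = df1.set_index("A")
df2_idx = df2.set_index("A2")
merged_idx = df1_idx.join(df2_idx, how="left")

# In IPython/Jupyter, compare:
#   %timeit df1.merge(df2, how="left", left_on="A", right_on="A2")
#   %timeit df1_idx.join(df2_idx, how="left")
```

If set_index has to be called before every single merge, its cost counts toward the total, so the gain is most likely when the indexed frames are reused for several joins.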

Cython class AttributeError

By default, cdef attributes are only accessible from within Cython. If you make it a public attribute by putting cdef public in front of the attribute name, Cython will generate suitable properties so that it can be accessed from Python. Some extra notes about related problems: if you're getting the same error from within Cython …
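As an illustration, here is a small extension type written in Cython's pure Python mode, where cython.declare(..., visibility="public") plays the role of `cdef public double x` in a .pyx file (and "readonly" that of `cdef readonly double y`). The class Point and its attributes are hypothetical. The access rules only take effect once the module is compiled (for example with cythonize); run as plain Python, all three are ordinary attributes.

```python
import cython


@cython.cclass
class Point:
    # Equivalent of "cdef public double x": read/write property from Python.
    x = cython.declare(cython.double, visibility="public")
    # Equivalent of "cdef readonly double y": getter only from Python.
    y = cython.declare(cython.double, visibility="readonly")
    # Equivalent of "cdef double z": private by default; once compiled,
    # reading p.z from Python raises AttributeError.
    z = cython.declare(cython.double)

    def __init__(self, x: cython.double, y: cython.double, z: cython.double):
        self.x = x
        self.y = y  # assignment is allowed here because this is Cython-level code
        self.z = z
```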

Add numpy.get_include() argument to setuptools without preinstalled numpy

First question: when is numpy needed? It is needed during the setup (i.e. when the build_ext functionality is called) and after installation, when the module is used. That means numpy should be in setup_requires and in install_requires. There are the following alternatives to solve the issue for the setup: using PEP 517/518 (which is more straightforward …
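One widely used workaround for the setup-time part (a swapped-in technique, not necessarily one of the alternatives the answer goes on to list) is to defer `import numpy` until the build_ext command is finalized, by which point the build dependencies have been installed. The class name NumpyBuildExt is hypothetical. With PEP 517/518, listing numpy under `[build-system] requires` in pyproject.toml makes it importable at the top of setup.py instead, so this trick is not needed there.

```python
from setuptools.command.build_ext import build_ext


class NumpyBuildExt(build_ext):
    """build_ext that adds NumPy's header directory lazily.

    numpy is imported inside finalize_options() rather than at the top of
    setup.py, so setup.py can be executed before numpy is installed.
    """

    def finalize_options(self):
        super().finalize_options()
        import numpy  # deferred: only needed once extensions are about to be built

        self.include_dirs.append(numpy.get_include())


# In setup.py, register it with:
#   setup(..., cmdclass={"build_ext": NumpyBuildExt}, ext_modules=[...])
```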

Running Cython in Windows x64 – fatal error C1083: Cannot open include file: ‘basetsd.h’: No such file or directory

In case anyone is currently (2017) facing the same error with the Visual C++ 2015 Build Tools: launch the setup again and also select the Windows 8.1 / 10 SDK, depending on your OS. This will fix the basetsd.h error. If it is still not working, try launching the Build Tools from C:\Program Files (x86)\Microsoft Visual C++ Build Tools. Another alternative would …
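To check whether the compiler can actually see the header after installing the SDK, one option is to look for it in the directories listed in the INCLUDE environment variable, which MSVC's cl.exe searches. This is only meaningful inside a Build Tools / Developer Command Prompt, where INCLUDE has been set; setuptools may configure the MSVC environment on its own, so a miss in a plain shell does not prove the build will fail. find_header is a hypothetical helper.

```python
import os
from typing import Optional


def find_header(name: str, env_var: str = "INCLUDE") -> Optional[str]:
    """Return the first directory entry in env_var that contains `name`, else None."""
    for directory in os.environ.get(env_var, "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if directory and os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    found = find_header("basetsd.h")
    print(found or "basetsd.h not found on INCLUDE; is the Windows SDK installed and selected?")
```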

Make distutils look for numpy header files in the correct place

Use numpy.get_include():

```python
# distutils was removed from the standard library in Python 3.12; on newer
# versions, import setup and Extension from setuptools instead.
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy as np  # <-- new line

ext_modules = [Extension("hello", ["hello.pyx"],
                         include_dirs=[np.get_include()])]  # <-- new argument

setup(
    name="Hello world app",
    cmdclass={"build_ext": build_ext},
    ext_modules=ext_modules,
)
```
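A quick sanity check, assuming NumPy is installed in the environment used for the build, that the directory returned by numpy.get_include() really holds the headers that Cython-generated C code includes, such as numpy/arrayobject.h:

```python
import os
import numpy as np

include_dir = np.get_include()
header = os.path.join(include_dir, "numpy", "arrayobject.h")
print(include_dir)
print("arrayobject.h found:", os.path.isfile(header))
```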