numexpr: “faster than numpy” library, which most data scientists have never used

liralbes April 28, 2025

0 7 minutes read

numexpr: “faster than numpy” library, which most data scientists have never used

A few days ago, I came across a library that I had never heard of before. it’s known numexpr.

I was immediately interested because of some claims about the library. In particular, it points out that for some complex numerical calculations it is 15 times faster than Numpy.

I’m interested because so far Numpy has remained unchallenged in Python’s numerical computing space. Especially in data science, Numpy is the cornerstone of machine learning, exploratory data analysis and model training. Anything we can use to extrude every last performance in the system will be welcome. So I decided to make a claim for the exam myself.

You can find a link to the NUMEXPR repository at the end of this article.

What is numexpr?

According to its GitHub page, NumExpr is a fast numeric expression evaluator for Numpy. Using it, expressions running on arrays accelerate and use less memory than the same calculations as those in Python in other numeric libraries, such as Numpy.

Additionally, because it is multithreaded, NumExpr can use all CPU cores, which usually leads to a lot of performance scaling compared to Numpy.

Establish a development environment

Before we start coding, let’s set up the development environment. The best thing to do is to create a separate Python environment where you can install any necessary software and try coding, knowing that nothing you do in this environment will affect the rest of the system. I use Conda for this, but you can use whatever method is best for you.

If you want to follow the Miniconda route and don’t have it yet, you must first install Miniconda. Use this link to get it:

1/Create our new development environment and install the required libraries

(base) $ conda create -n numexpr_test python=3.12-y
(base) $ conda activate numexpr
(numexpr_test) $ pip install numexpr
(numexpr_test) $ pip install jupyter

2/Start jupyter
Enter now jupyter notebook Go to your command prompt. You should see your Jupyter laptop in your browser. If this does not happen automatically, you may jupyter notebook Order. Near the bottom you will find a URL that should be copied and pasted into your browser to start your jupyter notebook.

Your URL is different from mine, but it should look like this: –

Comparing NumExpr and Numpy performance

To compare performance, we will run a series of numerical calculations using Numpy and NumExpr and two systems.

Example 1 – Simple array addition calculations
In this example, we run 5000 times a vector addition of two large arrays.

import numpy as np
import numexpr as ne
import timeit

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# Using timeit with lambda functions
time_np_expr = timeit.timeit(lambda: 2*a + 3*b, number=5000)
time_ne_expr = timeit.timeit(lambda: ne.evaluate("2*a + 3*b"), number=5000)

print(f"Execution time (NumPy): {time_np_expr} seconds")
print(f"Execution time (NumExpr): {time_ne_expr} seconds")

>>>>>>>>>>>


Execution time (NumPy): 12.03680682599952 seconds
Execution time (NumExpr): 1.8075962659931974 seconds

I have to say that this is already an impressive start to the NumExpr library. I’ve improved 6 times more than Numpy runtime.

Let’s double check that both operations return the same result set.


# Arrays to store the results
result_np = 2*a + 3*b
result_ne = ne.evaluate("2*a + 3*b")

# Ensure the two new arrays are equal
arrays_equal = np.array_equal(result_np, result_ne)
print(f"Arrays equal: {arrays_equal}")

>>>>>>>>>>>>

Arrays equal: True

Example 2 – Calculate PI using Monte Carlo Simulation

Our second example will examine a more complex use case and have more real-life applications.

Monte Carlo simulation involves running iterations of many random processes to estimate the properties of the system, which can be computationally intensive.

In this case, we will use Monte Carlo to calculate the value of the PI. This is a well-known example where we occupy a square with the side length of a unit and engraved with the radius of a unit in one of the quarter circles. The ratio of the area of a quarter circle to the square area is (π/4)/1, we can multiply this expression by four to get π Own.

Therefore, if we consider all the many random (x,y) points that lie inside or within the square, because the total number of these points tends to be infinity, then the points that lie within a quarter circle or within an inner circle tend toward PI.

First, Numpy implementation.

import numpy as np
import timeit

def monte_carlo_pi_numpy(num_samples):
    x = np.random.rand(num_samples)
    y = np.random.rand(num_samples)
    inside_circle = (x**2 + y**2) >>>>>>>

Estimated Pi (NumPy): 3.144832
Execution Time (NumPy): 10.642843848007033 seconds

Now, use numexpr.

import numpy as np
import numexpr as ne
import timeit

def monte_carlo_pi_numexpr(num_samples):
    x = np.random.rand(num_samples)
    y = np.random.rand(num_samples)
    inside_circle = ne.evaluate("(x**2 + y**2) >>>>>>>>>>>>>>

Estimated Pi (NumExpr): 3.141684
Execution Time (NumExpr): 8.077501275009126 seconds

OK, so the acceleration isn’t impressive, but a 20% increase isn’t scary either. Part of the reason is that NumExpr does not have an optimized sum() function, so we have to go back to Numpy by default for that operation.

Example 3 – Implementing SOBEL Image Filter

In this example, we will implement a SOBEL filter for the image. SOBEL filters are commonly used in image processing for edge detection. It calculates the image intensity gradient at each pixel, highlighting edges and intensity transitions. Our input image is the Taj Mahal of India.

Original image of Yury Taranik (licensed from Shutterstock)

Let’s look at the Numpy code that runs first, and then the time.

import numpy as np
from scipy.ndimage import convolve
from PIL import Image
import timeit

# Sobel kernels
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

def sobel_filter_numpy(image):
    """Apply Sobel filter using NumPy."""
    img_array = np.array(image.convert('L'))  # Convert to grayscale
    gradient_x = convolve(img_array, sobel_x)
    gradient_y = convolve(img_array, sobel_y)
    gradient_magnitude = np.sqrt(gradient_x**2 + gradient_y**2)
    gradient_magnitude *= 255.0 / gradient_magnitude.max()  # Normalize to 0-255
    
    return Image.fromarray(gradient_magnitude.astype(np.uint8))

# Load an example image
image = Image.open("/mnt/d/test/taj_mahal.png")

# Benchmark the NumPy version
time_np_sobel = timeit.timeit(lambda: sobel_filter_numpy(image), number=100)
sobel_image_np = sobel_filter_numpy(image)
sobel_image_np.save("/mnt/d/test/sobel_taj_mahal_numpy.png")

print(f"Execution Time (NumPy): {time_np_sobel} seconds")

>>>>>>>>>

Execution Time (NumPy): 8.093792188999942 seconds

Now it’s the digital code.

import numpy as np
import numexpr as ne
from scipy.ndimage import convolve
from PIL import Image
import timeit

# Sobel kernels
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

sobel_y = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])

def sobel_filter_numexpr(image):
    """Apply Sobel filter using NumExpr for gradient magnitude computation."""
    img_array = np.array(image.convert('L'))  # Convert to grayscale
    gradient_x = convolve(img_array, sobel_x)
    gradient_y = convolve(img_array, sobel_y)
    gradient_magnitude = ne.evaluate("sqrt(gradient_x**2 + gradient_y**2)")
    gradient_magnitude *= 255.0 / gradient_magnitude.max()  # Normalize to 0-255
    
    return Image.fromarray(gradient_magnitude.astype(np.uint8))

# Load an example image
image = Image.open("/mnt/d/test/taj_mahal.png")

# Benchmark the NumExpr version
time_ne_sobel = timeit.timeit(lambda: sobel_filter_numexpr(image), number=100)
sobel_image_ne = sobel_filter_numexpr(image)
sobel_image_ne.save("/mnt/d/test/sobel_taj_mahal_numexpr.png")

print(f"Execution Time (NumExpr): {time_ne_sobel} seconds")

>>>>>>>>>>>>>

Execution Time (NumExpr): 4.938702256011311 seconds

In this case, using NumExpr leads to a good result, with nearly twice as much performance as Numpy.

This is what the edge detection image looks like.

Example 4-Fourier Series Approximation

It is well known that complex periodic functions can be simulated by applying a series of sine waves to superimpose each other. In extreme cases, even square waves are easily modeled in this way. This method is called Fourier series approximation. Despite the approximation, we can get as close to the target waveform as the target waveform and computing power allow.

The math behind all this is not the main point. Note that the solution’s runtime increases significantly as we increase the number of iterations.

import numpy as np
import numexpr as ne
import time
import matplotlib.pyplot as plt

# Define the constant pi explicitly
pi = np.pi

# Generate a time vector and a square wave signal
t = np.linspace(0, 1, 1000000) # Reduced size for better visualization
signal = np.sign(np.sin(2 * np.pi * 5 * t))

# Number of terms in the Fourier series
n_terms = 10000

# Fourier series approximation using NumPy
start_time = time.time()
approx_np = np.zeros_like
for n in range(1, n_terms + 1, 2):
    approx_np += (4 / (np.pi * n)) * np.sin(2 * np.pi * n * 5 * t)
numpy_time = time.time() - start_time

# Fourier series approximation using NumExpr
start_time = time.time()
approx_ne = np.zeros_like
for n in range(1, n_terms + 1, 2):
    approx_ne = ne.evaluate("approx_ne + (4 / (pi * n)) * sin(2 * pi * n * 5 * t)", local_dict={"pi": pi, "n": n, "approx_ne": approx_ne, "t": t})
numexpr_time = time.time() - start_time

print(f"NumPy Fourier series time: {numpy_time:.6f} seconds")
print(f"NumExpr Fourier series time: {numexpr_time:.6f} seconds")

# Plotting the results
plt.figure(figsize=(10, 6))

plt.plot(t, signal, label='Original Signal (Square Wave)', color='black', linestyle='--')
plt.plot(t, approx_np, label='Fourier Approximation (NumPy)', color='blue')
plt.plot(t, approx_ne, label='Fourier Approximation (NumExpr)', color='red', linestyle='dotted')

plt.title('Fourier Series Approximation of a Square Wave')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.legend()
plt.grid(True)
plt.show()

and output?

This is another good result. NumExpr shows 5 times more than Numpy on this occasion.

Summary

Numpy and Numexpr are both powerful libraries for Python numerical calculations. Each of them has unique advantages and use cases that make them suitable for different types of tasks. Here we compare their performance and applicability for specific computing tasks, focusing on examples such as simple arrays in more complex applications, such as image edge detection using SOBEL filters.

While I haven’t seen the claimed 15x speed increase by 15x over Numpy in my tests, there is no doubt that in many cases NumExpr may be much faster than Numpy.

If you are a heavy Numpy user and need to extract all performance from your code, it is recommended to try using the NumExpr library. Apart from the fact that all Numpy code can be copied using NumExpr, there are few drawbacks, and uplinks may surprise you.

For more details on the NumExpr library, check out the GitHub page here.

liralbes April 28, 2025

0 7 minutes read