Building a Modern GUI for a Computer Vision Application in Python

I am a big fan of interactive visualization. As a computer vision engineer, I work on visual tasks almost every day, and I often iterate on problems where I need visual feedback to make decisions. Consider a very simple image processing pipeline in which one step has some parameters that transform the image:

How do you know which parameters to adjust? Will the pipeline even work as expected? Without visualizing your output, you may miss key insights and make suboptimal choices.
Sometimes simply displaying the output image and/or some calculated metrics is enough to iterate on the parameters. But I find myself in many situations where a tool to iterate quickly and interactively across the pipeline makes all the difference. So in this article I will show you how to work with the simple built-in interactive elements of OpenCV, and how to build a modern-looking UI for computer vision projects using customtkinter.
Prerequisites
If you want to follow along, I recommend setting up your local environment with uv and installing the following packages:

```shell
uv add numpy opencv-python pillow customtkinter
```
Goal
Before digging into the code, let's quickly outline what we want to build. The application should read a webcam feed and allow the user to select different types of filters that are applied to the stream. The processed image should be displayed in real time in the window. A rough sketch of a potential UI looks as follows:

OpenCV GUI
Let’s start with a simple loop that takes frames from the webcam and then displays them in the OpenCV window.
```python
import cv2

cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    cv2.imshow("Video Feed", frame)

    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
Keyboard input
The easiest way to add interactivity here is to add keyboard input. For example, we can use the numeric keys to loop through different filters.
```python
...
filter_type = "normal"

while True:
    ...
    if filter_type == "grayscale":
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    elif filter_type == "normal":
        pass
    ...
    if key == ord('1'):
        filter_type = "normal"
    if key == ord('2'):
        filter_type = "grayscale"
    ...
```
Now you can toggle between the normal image and the grayscale version by pressing the number keys 1 and 2. Let's also quickly add a title to the image so that we can actually see the name of the applied filter.
We need to be careful here: if you look at the frame shape after the filter, you will notice that the dimensions of the frame array have changed. Remember that OpenCV image arrays are ordered HWC (height, width, channels) and the channel order is BGR (blue, green, red), so my webcam's 640×480 image has shape (480, 640, 3).
```python
print(filter_type, frame.shape)
# normal (480, 640, 3)
# grayscale (480, 640)
```
The channel dimension is dropped because the grayscale operation outputs a single-channel image. If we want to draw on top of this image, we either need to specify a single-channel color for the grayscale image, or we convert the image back to BGR format. The second option is cleaner, because it lets us unify the annotation of the image.
```python
if filter_type == "grayscale":
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
elif filter_type == "normal":
    pass

if len(frame.shape) == 2:  # Convert grayscale to BGR
    frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)
```
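If you want to see this shape change without a camera, here is a small numpy-only sketch. The channel mean below is just a stand-in for OpenCV's grayscale conversion (cv2.cvtColor uses weighted coefficients, so the pixel values would differ, but the shapes behave the same way):

```python
import numpy as np

# A dummy 480x640 BGR frame, ordered HWC as OpenCV would return it
frame = np.zeros((480, 640, 3), dtype=np.uint8)
print(frame.shape)  # (480, 640, 3)

# Stand-in for cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY): the channel axis is dropped
gray = frame.mean(axis=2).astype(np.uint8)
print(gray.shape)  # (480, 640)

# Stand-in for cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR): replicate the single channel
bgr_again = np.stack([gray] * 3, axis=2)
print(bgr_again.shape)  # (480, 640, 3)
```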
Title
I want to add a black border at the bottom of the image with the name of the filter displayed on it. We can use the copyMakeBorder function to pad the image with a border color at the bottom. Then we can draw text on top of this border.
```python
# Add a black border at the bottom of the frame
border_height = 50
border_color = (0, 0, 0)
frame = cv2.copyMakeBorder(frame, 0, border_height, 0, 0, cv2.BORDER_CONSTANT, value=border_color)

# Show the filter name
cv2.putText(
    frame,
    filter_type,
    (frame.shape[1] // 2 - 50, frame.shape[0] - border_height // 2 + 10),
    cv2.FONT_HERSHEY_SIMPLEX,
    1,
    (255, 255, 255),
    2,
    cv2.LINE_AA,
)
```
This is how the output looks: you can switch between the normal and grayscale modes, and the frame is titled accordingly.

Slider
Instead of using the keyboard as our input method, OpenCV also offers a basic trackbar slider UI element. The trackbar needs to be created at the beginning of the script, and it must reference the same window as the image we show later, so I create a variable for the window name. With that name we can create the trackbar and use it as a selector for the index into our filter list.
```python
filter_types = ["normal", "grayscale"]

win_name = "Webcam Stream"
cv2.namedWindow(win_name)

tb_filter = "Filter"
# def createTrackbar(trackbarName: str, windowName: str, value: int, count: int, onChange: _typing.Callable[[int], None]) -> None: ...
cv2.createTrackbar(
    tb_filter,
    win_name,
    0,
    len(filter_types) - 1,
    lambda _: None,
)
```
Note how we pass an empty lambda as the onChange callback; we will fetch the value manually in the loop instead. Everything else stays the same.
```python
while True:
    ...
    # Get the selected filter type
    filter_id = cv2.getTrackbarPos(tb_filter, win_name)
    filter_type = filter_types[filter_id]
    ...
```
And there we go: we have a trackbar to select the filter.

Now we can also easily add more filters by expanding our list and implementing each processing step.
```python
import numpy as np  # needed for the uint8 conversion below

filter_types = [
    "normal",
    "grayscale",
    "blur",
    "threshold",
    "canny",
    "sobel",
    "laplacian",
]
...
    if filter_type == "grayscale":
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    elif filter_type == "blur":
        frame = cv2.GaussianBlur(frame, ksize=(15, 15), sigmaX=0)
    elif filter_type == "threshold":
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, frame = cv2.threshold(gray, thresh=127, maxval=255, type=cv2.THRESH_BINARY)
    elif filter_type == "canny":
        frame = cv2.Canny(frame, threshold1=100, threshold2=200)
    elif filter_type == "sobel":
        frame = cv2.Sobel(frame, ddepth=cv2.CV_64F, dx=1, dy=0, ksize=5)
    elif filter_type == "laplacian":
        frame = cv2.Laplacian(frame, ddepth=cv2.CV_64F)
    elif filter_type == "normal":
        pass

    if frame.dtype != np.uint8:
        # Scale the frame to uint8 if necessary
        cv2.normalize(frame, frame, 0, 255, cv2.NORM_MINMAX)
        frame = frame.astype(np.uint8)
```
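As a side note, once the filter list grows, the if/elif chain can be replaced by a dictionary that maps filter names to functions, so adding a filter becomes a single entry. Here is a sketch of the pattern with plain numpy stand-ins (the real app would register the cv2 calls above instead; the invert filter is a hypothetical example):

```python
import numpy as np
from typing import Callable

# Hypothetical numpy stand-ins for the cv2 filter functions
FILTERS: dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "normal": lambda f: f,
    "grayscale": lambda f: f.mean(axis=2).astype(np.uint8),
    "invert": lambda f: 255 - f,  # adding a new filter is one line
}

def apply_filter(frame: np.ndarray, filter_type: str) -> np.ndarray:
    """Look up the filter by name and apply it to the frame."""
    return FILTERS[filter_type](frame)

frame = np.full((4, 4, 3), 200, dtype=np.uint8)
out = apply_filter(frame, "invert")
print(out[0, 0])  # [55 55 55]
```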

Modern GUI using CustomTkinter
Now, I don't know about you, but the current user interface doesn't exactly scream modern. Don't get me wrong, it has a certain charm, but I prefer a cleaner, more modern design. Besides, we have already hit the limits of what OpenCV offers in UI elements out of the box: there are no buttons, text fields, drop-down lists, checkboxes, or radio buttons, and no custom layouts. So let's see how we can transform the look and feel of this basic app into a fresh, clean application.

First, we need to create a class for our application. We create two frames: the first contains our filter selection, the second the image display. For now, let's start with a simple placeholder text. Unfortunately, there is no CustomTkinter component that can directly display OpenCV images, so we will build our own in the next steps. But first, let's finish the basic UI layout.
```python
import customtkinter

class App(customtkinter.CTk):
    def __init__(self) -> None:
        super().__init__()
        self.title("Webcam Stream")
        self.geometry("800x600")

        self.filter_var = customtkinter.IntVar(value=0)

        # Frame for filters
        self.filters_frame = customtkinter.CTkFrame(self)
        self.filters_frame.pack(side="left", fill="both", expand=False, padx=10, pady=10)

        # Frame for image display
        self.image_frame = customtkinter.CTkFrame(self)
        self.image_frame.pack(side="right", fill="both", expand=True, padx=10, pady=10)

        self.image_display = customtkinter.CTkLabel(self.image_frame, text="Loading...")
        self.image_display.pack(fill="both", expand=True, padx=10, pady=10)

app = App()
app.mainloop()
```

Filter radio buttons
Now that the skeleton is built, we can start filling in the components. For the left side, I will use the same filter_types list to populate a set of radio buttons for selecting the filter.
```python
# Create radio buttons for each filter type
self.filter_var = customtkinter.IntVar(value=0)
for filter_id, filter_type in enumerate(filter_types):
    rb_filter = customtkinter.CTkRadioButton(
        self.filters_frame,
        text=filter_type.capitalize(),
        variable=self.filter_var,
        value=filter_id,
    )
    rb_filter.pack(padx=10, pady=10)
    if filter_id == 0:
        rb_filter.select()
```

Image display component
Now we get to the interesting part: how do we make our OpenCV frames appear in the image component? Since there is no built-in component for this, we subclass CTkLabel. This also allows us to display a loading text until the webcam stream starts.
```python
from typing import Any
...

class CTkImageDisplay(customtkinter.CTkLabel):
    """
    A reusable ctk widget to display opencv images.
    """

    def __init__(
        self,
        master: Any,
    ) -> None:
        self._textvariable = customtkinter.StringVar(master, "Loading...")
        super().__init__(
            master,
            textvariable=self._textvariable,
            image=None,
        )
```
```python
...

class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        self.image_display = CTkImageDisplay(self.image_frame)
        self.image_display.pack(fill="both", expand=True, padx=10, pady=10)
```
So far, nothing has changed; we simply swapped the existing label for our custom class. In our CTkImageDisplay class, we can now define a function that displays an image in the widget; let's call it set_frame.
```python
import cv2
import numpy.typing as npt
from PIL import Image

class CTkImageDisplay(customtkinter.CTkLabel):
    ...

    def set_frame(self, frame: npt.NDArray) -> None:
        """
        Set the frame to be displayed in the widget.

        Args:
            frame: The new frame to display, in opencv format (BGR).
        """
        target_width, target_height = frame.shape[1], frame.shape[0]

        # Convert the frame to PIL Image format
        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame_pil = Image.fromarray(frame_rgb, "RGB")

        ctk_image = customtkinter.CTkImage(
            light_image=frame_pil,
            dark_image=frame_pil,
            size=(target_width, target_height),
        )
        self.configure(image=ctk_image, text="")
        self._textvariable.set("")
```
Let's digest this. First, we determine the size of the image, which we can extract from the shape attribute of the array. In tkinter we need a Pillow Image; we cannot use OpenCV arrays directly. To convert an OpenCV array to Pillow, we first convert the channel order from BGR to RGB and then use the Image.fromarray function to create a Pillow image object. Next, we create a CTkImage, using the same image regardless of theme and setting the size from the frame. Finally, we set the image on the label using the configure method and reset the text variable to remove the "Loading..." text, even though in theory it would be hidden behind the image.
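The BGR-to-RGB step is worth a closer look: for this particular conversion, cv2.cvtColor is equivalent to simply reversing the channel axis, which the following numpy-only sketch demonstrates:

```python
import numpy as np

# A 1x2 BGR image: one pure-blue pixel, one pure-red pixel (OpenCV channel order)
frame_bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps B and R, yielding RGB as Pillow expects
frame_rgb = frame_bgr[..., ::-1]
print(frame_rgb[0, 0])  # [  0   0 255] -> the blue pixel in RGB order
print(frame_rgb[0, 1])  # [255   0   0] -> the red pixel in RGB order
```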
For a quick test, we can set the first image of the webcam in the constructor. (We will see in a second why this is not a good idea.)
```python
class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        cap = cv2.VideoCapture(0)
        _, frame0 = cap.read()
        self.image_display.set_frame(frame0)
```
If you run this, you will notice that the window takes longer to pop up, but after a short while you should see a static image from the webcam.
Note: If you don't have a webcam at hand, you can also pass a video file path to the cv2.VideoCapture constructor.

Now, this is not very exciting yet, since the frame is never updated. So let's see what happens if we try to do that naively.
```python
class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        cap = cv2.VideoCapture(0)
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            self.image_display.set_frame(frame)
```
This is almost the same as before, except that we now run the frame loop as in the previous chapter. If you run this, you will see... nothing. The window never appears, because we created an infinite loop in the constructor of the application! This is also why the window only appeared after a delay in the previous example: opening the webcam stream is a blocking operation during which the window's event loop cannot run, so nothing is displayed yet.
Let's fix this with a slightly better implementation that lets the GUI event loop run while we periodically update the frame. We can use tkinter's after method to schedule a function call after a waiting time while yielding control back to the event loop.
```python
    ...
        self.cap = cv2.VideoCapture(0)
        self.after(10, self.update_frame)

    def update_frame(self) -> None:
        """
        Update the displayed frame.
        """
        ret, frame = self.cap.read()
        if not ret:
            return

        self.image_display.set_frame(frame)
        self.after(10, self.update_frame)
```
We are still opening the webcam stream in the constructor, so we haven't solved that problem yet. But at least we now see a continuous stream of frames in the image component.

Apply filters
Now that the frame loop is running, we can bring back the filters from the beginning and apply them to our webcam stream. In the update_frame function, we check the current filter variable and apply the corresponding filter.
```python
def update_frame(self) -> None:
    ...
    # Get the selected filter type
    filter_id = self.filter_var.get()
    filter_type = filter_types[filter_id]

    if filter_type == "grayscale":
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    elif filter_type == "blur":
        frame = cv2.GaussianBlur(frame, ksize=(15, 15), sigmaX=0)
    elif filter_type == "threshold":
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        _, frame = cv2.threshold(gray, thresh=127, maxval=255, type=cv2.THRESH_BINARY)
    elif filter_type == "canny":
        frame = cv2.Canny(frame, threshold1=100, threshold2=200)
    elif filter_type == "sobel":
        frame = cv2.Sobel(frame, ddepth=cv2.CV_64F, dx=1, dy=0, ksize=5)
    elif filter_type == "laplacian":
        frame = cv2.Laplacian(frame, ddepth=cv2.CV_64F)
    elif filter_type == "normal":
        pass

    if frame.dtype != np.uint8:
        # Scale the frame to uint8 if necessary
        cv2.normalize(frame, frame, 0, 255, cv2.NORM_MINMAX)
        frame = frame.astype(np.uint8)

    if len(frame.shape) == 2:  # Convert grayscale to BGR
        frame = cv2.cvtColor(frame, cv2.COLOR_GRAY2BGR)

    self.image_display.set_frame(frame)
    self.after(10, self.update_frame)
```
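The uint8 rescaling step deserves a short aside: Sobel and Laplacian return signed float arrays (because of ddepth=cv2.CV_64F), which must be mapped into the displayable 0-255 range. Here is a numpy-only sketch of the min-max scaling that the cv2.normalize call with cv2.NORM_MINMAX performs (edge cases handled explicitly; the exact rounding of cv2.normalize may differ slightly):

```python
import numpy as np

def to_uint8(frame: np.ndarray) -> np.ndarray:
    """Min-max normalize an arbitrary float array into the 0-255 uint8 range."""
    if frame.dtype == np.uint8:
        return frame
    fmin, fmax = frame.min(), frame.max()
    if fmax == fmin:  # avoid division by zero on constant images
        return np.zeros_like(frame, dtype=np.uint8)
    scaled = (frame - fmin) / (fmax - fmin) * 255.0
    return scaled.astype(np.uint8)

# Signed edge-filter output, as Sobel/Laplacian would produce it
edges = np.array([[-512.0, 0.0], [256.0, 512.0]])
print(to_uint8(edges))  # [[  0 127]
                        #  [191 255]]
```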
Now we are back to the full functionality of the app: you can select any filter on the left, and it is applied in real time to the webcam feed!

Multithreading and synchronization
Although the application runs as it is, there are some problems with the way we currently run the frame loop. At the moment, everything runs on a single thread, the main GUI thread. That is why the window did not pop up immediately at the beginning: the webcam initialization blocks the main thread. Now imagine running some heavier image processing, perhaps passing the images through a neural network; you do not want the user interface to block whenever the network runs inference. That would make for a very unresponsive experience when clicking UI elements!

A better approach in our application is to separate the image processing from the user interface. In general, it is almost always a good idea to separate your GUI logic from any kind of non-trivial processing. In our case, we will run a separate thread responsible for the image loop: it reads frames from the webcam stream and applies the filters.

Note: Python threads are not "real" threads in the sense that they cannot run on different logical CPU cores in parallel. Python multithreading context-switches between threads, but due to the GIL (global interpreter lock), a Python process executes only one thread at a time. If you want "real" parallelism, you need multiprocessing. Since our workload here is not CPU-bound but I/O-bound, multithreading is sufficient.
```python
import threading

class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        self.webcam_thread = threading.Thread(target=self.run_webcam_loop, daemon=True)
        self.webcam_thread.start()

    def run_webcam_loop(self) -> None:
        """
        Run the webcam loop in a separate thread.
        """
        self.cap = cv2.VideoCapture(0)
        if not self.cap.isOpened():
            return

        while True:
            ret, frame = self.cap.read()
            if not ret:
                break

            # Filters
            ...

            self.image_display.set_frame(frame)
```
If you run this, the window now opens immediately, and we even see the loading text while the webcam stream is being opened. However, once the stream begins, the frames start to flicker. Depending on many factors, you may encounter different visual artifacts or errors at this stage.
Warning: flickering images

Why is this happening? The problem is that we update the frame from our thread while the UI's internal refresh loop may simultaneously be drawing it to the screen from the same array: both are racing on the same frame array.
In general, updating UI elements directly from other threads is a bad idea; some frameworks even prevent it and raise exceptions. In tkinter it happens to work, but we get strange results. We need some kind of synchronization between our threads. That is where a Queue comes in.

You may be familiar with queues from grocery stores or theme parks. The concept here is very similar: the first element that enters the queue is also the first to leave (First In, First Out).
In our case, we actually only need a queue with a single slot. The queue implementation in Python's standard library is thread-safe, which means we can put and get objects from different threads. Perfect for our use case: the processing thread puts image arrays into the queue, while the GUI thread tries to get an element but does not block if the queue is empty.
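Stripped of all GUI code, the producer/consumer pattern we are about to use looks like this self-contained sketch (the names producer and consumer are illustrative, not part of the app):

```python
import queue
import threading
import time

q: queue.Queue = queue.Queue(maxsize=1)

def producer() -> None:
    """Stand-in for the webcam loop: pushes 'frames' into the single-slot queue."""
    for i in range(5):
        q.put(f"frame-{i}")  # blocks while the slot is still occupied
        time.sleep(0.01)

received = []

def consumer() -> None:
    """Stand-in for the GUI update: polls without blocking."""
    deadline = time.time() + 1.0
    while len(received) < 5 and time.time() < deadline:
        try:
            received.append(q.get_nowait())
        except queue.Empty:
            time.sleep(0.005)  # nothing new yet; the GUI would keep running

t = threading.Thread(target=producer, daemon=True)
t.start()
consumer()
t.join()
print(received)  # frames arrive in FIFO order
```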
```python
import queue

class App(customtkinter.CTk):
    def __init__(self) -> None:
        ...
        self.queue = queue.Queue(maxsize=1)

        self.webcam_thread = threading.Thread(target=self.run_webcam_loop, daemon=True)
        self.webcam_thread.start()

        self.frame_loop_dt_ms = 16  # ~60 FPS
        self.after(self.frame_loop_dt_ms, self._update_frame)

    def _update_frame(self) -> None:
        """
        Update the frame in the image display widget.
        """
        try:
            frame = self.queue.get_nowait()
            self.image_display.set_frame(frame)
        except queue.Empty:
            pass

        self.after(self.frame_loop_dt_ms, self._update_frame)

    def run_webcam_loop(self) -> None:
        ...
        while True:
            ...
            self.queue.put(frame)
```
Note how we moved the direct call to set_frame out of the webcam loop and into the _update_frame function, which runs on the main thread and is rescheduled repeatedly at a 16 ms interval.
In the main thread we use the get_nowait function; if we used the blocking get function instead, we would block the GUI here. This call does not block, but raises a queue.Empty exception if there is no element to fetch, so we catch and ignore it. In the webcam loop we can use the blocking put function, since nothing else needs to run on that thread.
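One design note: the blocking put means the webcam thread waits until the GUI has consumed each frame. If you ever want the processing thread to run at full speed and drop frames the GUI did not pick up in time, a non-blocking put is an option. This helper is a sketch of that alternative, not part of the app above:

```python
import queue

def offer_latest(q: queue.Queue, frame) -> None:
    """Try to hand the newest frame to the GUI; drop the stale one if the slot is full."""
    try:
        q.put_nowait(frame)
    except queue.Full:
        try:
            q.get_nowait()  # discard the unconsumed frame
        except queue.Empty:
            pass  # the consumer grabbed it in the meantime
        try:
            q.put_nowait(frame)
        except queue.Full:
            pass  # give up this round; the next frame will try again

q: queue.Queue = queue.Queue(maxsize=1)
offer_latest(q, "frame-0")
offer_latest(q, "frame-1")  # replaces frame-0, which was never consumed
print(q.get_nowait())  # frame-1
```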

Now everything runs as expected: no more flickering frames!
Conclusion
Combining a UI framework like tkinter with OpenCV lets us build modern-looking applications with an interactive graphical user interface. Because the UI runs in the main thread, we run the image processing in a separate thread and synchronize data between the threads using a single-slot queue. You can find a cleaned-up version of the demo, with a more modular structure, in the repository below. Let me know if you build something interesting with this approach. Take care!
View the full source code in the GitHub repository: