Data Science

Build modern dashboards with Python and Gradio

This is the second article in a short series on developing data dashboards using the latest Python-based GUI development tools: Streamlit, Gradio, and Taipy.

The source dataset for each dashboard is the same, but stored in a different format. As far as possible, I also try to make the dashboard layouts of each tool similar to one another, with the same functionality.

In the first part of this series, I created a Streamlit version of the dashboard that retrieves its data from a local PostgreSQL database. You can view that article here.

This time, we are exploring the Gradio library.

The data for this dashboard will be in a local CSV file, and Pandas will be our primary data processing engine.

If you want to see a quick demo of the app, I have deployed it to Hugging Face Spaces. You can run it with the link below, but note that the two date picker pop-ups don't work there, due to a known bug in the Hugging Face environment. You can still enter the dates manually when the app is running on Hugging Face. Running the application locally works fine, with no such problem.

Dashboard demo on Hugging Face Spaces

What is Gradio?

Gradio is an open-source Python package that simplifies the process of building demos or web applications for machine learning models, APIs, or any Python function. With it, you can create a demo or web application without needing JavaScript, CSS, or web-hosting experience. By writing just a few lines of Python code, you can unlock Gradio’s capabilities and seamlessly present your machine learning model to a wider audience.
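
To give a feel for just how few lines are involved, here is a minimal, hypothetical example (not part of the dashboard): wrapping an ordinary Python function in gr.Interface produces a complete web demo.

import gradio as gr

def to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

# One function plus one Interface call is all a working web demo needs
demo = gr.Interface(fn=to_fahrenheit,
                    inputs=gr.Number(label="Celsius"),
                    outputs=gr.Number(label="Fahrenheit"))
demo.launch()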

Gradio simplifies the development process by providing an intuitive framework that removes the complexity associated with building user interfaces from scratch. Whether you are a machine learning developer, researcher or enthusiast, Gradio allows you to create beautiful and interactive presentations that enhance the understanding and accessibility of machine learning models.

This open source Python package helps you bridge the gap between machine learning expertise and a wider audience, making your model accessible and actionable.

What will we develop

We are developing a data dashboard. Our source data will be a CSV file containing 100,000 synthetic sales records.

The actual source of the data is not that important. It could just as easily be a text file, an Excel file, SQLite, or any database you can connect to.
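
For example, pointing the dashboard at a SQLite database instead of the CSV file would only change the loading step. A minimal sketch, assuming a hypothetical sales.db file containing a sales table with the same columns:

import sqlite3
import pandas as pd

# Load the same columns from SQLite instead of CSV (sales.db and the
# sales table are hypothetical stand-ins for your own database)
conn = sqlite3.connect("sales.db")
csv_data = pd.read_sql_query(
    "SELECT * FROM sales",
    conn,
    parse_dates=["order_date"],
)
conn.close()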

Here’s what our final dashboard looks like.

Image by the author

There are four main parts.

  • The top row enables users to select specific start and end dates and/or a product category, using the date pickers and drop-down list respectively.
  • The second row – Key Metrics – shows a top-level summary of the selected data.
  • The visualization section lets the user display one of three charts for the selected data.
  • The Raw Data section is exactly what it claims to be: a table of the selected data, effectively a snapshot of the underlying CSV data file.

It’s easy to use the dashboard. Initially, statistics for the entire dataset are displayed. The user can then narrow the data focus using the three filter fields at the top of the display. The graphs, key metrics, and raw data sections all update dynamically to reflect the user’s selections in the filter fields.

The source data

As mentioned earlier, the source data for the dashboard is contained in a single comma-separated value (CSV) file. The data consists of 100,000 synthetic sales records. Here are the first ten records of the file, to give you an idea of its appearance.

+----------+------------+------------+----------------+------------+---------------+------------+----------+-------+--------------------+
| order_id | order_date | customer_id| customer_name  | product_id | product_names | categories | quantity | price | total              |
+----------+------------+------------+----------------+------------+---------------+------------+----------+-------+--------------------+
| 0        | 01/08/2022 | 245        | Customer_884   | 201        | Smartphone    | Electronics| 3        | 90.02 | 270.06             |
| 1        | 19/02/2022 | 701        | Customer_1672  | 205        | Printer       | Electronics| 6        | 12.74 | 76.44              |
| 2        | 01/01/2017 | 184        | Customer_21720 | 208        | Notebook      | Stationery | 8        | 48.35 | 386.8              |
| 3        | 09/03/2013 | 275        | Customer_23770 | 200        | Laptop        | Electronics| 3        | 74.85 | 224.55             |
| 4        | 23/04/2022 | 960        | Customer_23790 | 210        | Cabinet       | Office     | 6        | 53.77 | 322.62             |
| 5        | 10/07/2019 | 197        | Customer_25587 | 202        | Desk          | Office     | 3        | 47.17 | 141.51             |
| 6        | 12/11/2014 | 510        | Customer_6912  | 204        | Monitor       | Electronics| 5        | 22.5  | 112.5              |
| 7        | 12/07/2016 | 150        | Customer_17761 | 200        | Laptop        | Electronics| 9        | 49.33 | 443.97             |
| 8        | 12/11/2016 | 997        | Customer_23801 | 209        | Coffee Maker  | Electronics| 7        | 47.22 | 330.54             |
| 9        | 23/01/2017 | 151        | Customer_30325 | 207        | Pen           | Stationery | 6        | 3.5   | 21                 |
+----------+------------+------------+----------------+------------+---------------+------------+----------+-------+--------------------+

Here is some Python code you can use to generate a similar dataset. Make sure you install the NumPy and Polars libraries first.

# generate the 100K record CSV file
#
import polars as pl
import numpy as np
from datetime import datetime, timedelta

def generate(nrows: int, filename: str):
    names = np.asarray(
        [
            "Laptop",
            "Smartphone",
            "Desk",
            "Chair",
            "Monitor",
            "Printer",
            "Paper",
            "Pen",
            "Notebook",
            "Coffee Maker",
            "Cabinet",
            "Plastic Cups",
        ]
    )
    categories = np.asarray(
        [
            "Electronics",
            "Electronics",
            "Office",
            "Office",
            "Electronics",
            "Electronics",
            "Stationery",
            "Stationery",
            "Stationery",
            "Electronics",
            "Office",
            "Sundry",
        ]
    )
    product_id = np.random.randint(len(names), size=nrows)
    quantity = np.random.randint(1, 11, size=nrows)
    price = np.random.randint(199, 10000, size=nrows) / 100
    # Generate random dates between 2010-01-01 and 2023-12-31
    start_date = datetime(2010, 1, 1)
    end_date = datetime(2023, 12, 31)
    date_range = (end_date - start_date).days
    # Create random dates as np.array and convert to string format
    order_dates = np.array([(start_date + timedelta(days=np.random.randint(0, date_range))).strftime('%Y-%m-%d') for _ in range(nrows)])
    # Define columns
    columns = {
        "order_id": np.arange(nrows),
        "order_date": order_dates,
        "customer_id": np.random.randint(100, 1000, size=nrows),
        "customer_name": [f"Customer_{i}" for i in np.random.randint(2**15, size=nrows)],
        "product_id": product_id + 200,
        "product_names": names[product_id],
        "categories": categories[product_id],
        "quantity": quantity,
        "price": price,
        "total": price * quantity,
    }
    # Create Polars DataFrame and write to CSV with explicit delimiter
    df = pl.DataFrame(columns)
    df.write_csv(filename, separator=",", include_header=True)  # ensure a comma is used as the delimiter

# Generate 100,000 rows of data with random order_date and save to CSV
generate(100_000, "/mnt/d/sales_data/sales_data.csv")

Install and use Gradio

Installing Gradio is easy using pip, but for coding, the best practice is to create a separate Python environment for all your work. I use Miniconda for this purpose, but feel free to use whichever method suits your workflow.

If you want to go down the Conda route and don’t already have it, you must first install Miniconda (recommended) or Anaconda.

Note that, at the time of writing, Gradio requires Python 3.8 or later to work correctly.

After creating the environment, activate it with the conda activate command, then use pip install to install the Python libraries we need.

#create our test environment
(base) C:\Users\thoma>conda create -n gradio_dashboard python=3.12 -y

# Now activate it
(base) C:\Users\thoma>conda activate gradio_dashboard

# Install python libraries, etc ...
(gradio_dashboard) C:\Users\thoma>pip install gradio pandas matplotlib cachetools
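
A quick way to confirm the install worked is to print the Gradio version:

# Quick sanity check that Gradio is importable
(gradio_dashboard) C:\Users\thoma>python -c "import gradio; print(gradio.__version__)"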

Key Differences between Streamlit and Gradio

As I have demonstrated in this article, it is possible to build very similar data dashboards using Streamlit and Gradio. However, the two differ in several key respects.

Focus

  • Gradio creates interfaces specifically for machine learning models, while Streamlit is designed more for general-purpose data applications and visualizations.

Ease of use

  • Gradio is known for its simplicity and fast prototyping capabilities, making it easier for beginners to use. Streamlit provides more advanced features and customization options, which may mean a steeper learning curve.

Interactivity

  • Streamlit uses a reactive programming model, where any input change triggers a full script rerun and updates all components immediately. By default, Gradio updates only when the user clicks the Submit button, although it can be configured for real-time updates, as the sketch below shows.
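
As a minimal sketch of that difference, the same tiny Gradio interface becomes reactive simply by passing live=True:

import gradio as gr

def greet(name):
    return f"Hello, {name}!"

# live=True re-runs the function on every input change;
# remove it and Gradio waits for the Submit button instead
demo = gr.Interface(fn=greet, inputs="text", outputs="text", live=True)
demo.launch()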

Customization

  • Gradio focuses on pre-built components for quickly demonstrating AI models. Streamlit offers a wider range of customization options and more flexibility for complex projects.

Deployment

  • Having deployed both Streamlit and Gradio applications, I would say deploying a Streamlit app is the easier of the two. Streamlit has one-click deployment to the Streamlit Community Cloud built into every app you create. Gradio offers deployment via Hugging Face Spaces, which involves a bit more work (see the sketch below). Neither method is particularly complicated, though.
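
For reference, recent Gradio releases include a command-line helper that pushes the current project folder to a Hugging Face Space; the exact workflow varies by version, so treat this as a sketch:

# log in to Hugging Face once, then push the app to a Space
(gradio_dashboard) $ huggingface-cli login
(gradio_dashboard) $ gradio deploy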

Use Cases

Streamlit excels at creating data-centric applications and interactive dashboards for complex projects. Gradio is ideal for quickly showcasing machine learning models and building simpler applications.

Gradio dashboard code

I’ll break the code into parts and explain each part as I go.

We first need to import the required external libraries, then load the entire dataset from the CSV file into a Pandas DataFrame.

import gradio as gr
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import warnings
import os
import tempfile
from cachetools import cached, TTLCache

warnings.filterwarnings("ignore", category=FutureWarning, module="seaborn")

# ------------------------------------------------------------------
# 1) Load CSV data once
# ------------------------------------------------------------------
csv_data = None

def load_csv_data():
    global csv_data
    
    # Optional: specify column dtypes if known; adjust as necessary
    dtype_dict = {
        "order_id": "Int64",
        "customer_id": "Int64",
        "product_id": "Int64",
        "quantity": "Int64",
        "price": "float",
        "total": "float",
        "customer_name": "string",
        "product_names": "string",
        "categories": "string"
    }
    
    csv_data = pd.read_csv(
        "d:/sales_data/sales_data.csv",
        parse_dates=["order_date"],
        dayfirst=True,      # if your dates are DD/MM/YYYY format
        low_memory=False,
        dtype=dtype_dict
    )

load_csv_data()

Next, we configure a cache with a maximum of 128 items and an expiry time of 300 seconds. It stores the results of expensive function calls so that repeated look-ups are faster.

The get_unique_categories function returns a list of the unique, cleaned (capitalised) categories in the csv_data DataFrame, caching the result for faster access.

The get_date_range function returns the minimum and maximum order dates in the dataset, or None values if no data is available.

The filter_data function filters the csv_data DataFrame by the specified date range and optional category, returning the filtered DataFrame.

The get_dashboard_stats function retrieves the summary metrics – total revenue, total orders, average order value, and top category – for the given filters. It uses filter_data() internally to scope the dataset, then calculates these key statistics.

The get_data_for_table function returns a detailed DataFrame of the filtered sales data, sorted by order_id and order_date, with an additional revenue column per sale.

The get_plot_data function shapes the data for the revenue-over-time chart, grouping by date and summing revenue.

The get_revenue_by_category function aggregates and returns revenue by category, sorted by revenue, within the specified date range and category.

The get_top_products function returns the top ten products by revenue, filtered by date range and category.

The create_matplotlib_figure function generates a bar chart from the data – vertical or horizontal, depending on the orientation parameter – and saves it to an image file.

cache = TTLCache(maxsize=128, ttl=300)

@cached(cache)
def get_unique_categories():
    global csv_data
    if csv_data is None:
        return []
    cats = sorted(csv_data['categories'].dropna().unique().tolist())
    cats = [cat.capitalize() for cat in cats]
    return cats

def get_date_range():
    global csv_data
    if csv_data is None or csv_data.empty:
        return None, None
    return csv_data['order_date'].min(), csv_data['order_date'].max()

def filter_data(start_date, end_date, category):
    global csv_data

    if isinstance(start_date, str):
        start_date = datetime.datetime.strptime(start_date, '%Y-%m-%d').date()
    if isinstance(end_date, str):
        end_date = datetime.datetime.strptime(end_date, '%Y-%m-%d').date()

    df = csv_data.loc[
        (csv_data['order_date'] >= pd.to_datetime(start_date)) &
        (csv_data['order_date'] <= pd.to_datetime(end_date))
    ]

    # "All Categories" disables the category filter; otherwise match the
    # capitalised category name shown in the drop-down
    if category and category != "All Categories":
        df = df[df['categories'].str.capitalize() == category]

    return df
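
For completeness, here is a minimal sketch of the remaining helper functions, reconstructed from the descriptions above and from the way update_dashboard calls them; the exact details are my assumptions rather than a definitive implementation.

def get_dashboard_stats(start_date, end_date, category):
    # headline metrics for the current filter selection
    df = filter_data(start_date, end_date, category)
    if df.empty:
        return 0, 0, 0, "N/A"
    total_revenue = float(df['total'].sum())
    total_orders = int(df['order_id'].nunique())
    avg_order_value = total_revenue / total_orders
    top_category = df.groupby('categories')['total'].sum().idxmax()
    return total_revenue, total_orders, avg_order_value, top_category

def get_data_for_table(start_date, end_date, category):
    # detailed rows for the Raw Data table, with a revenue column per sale
    df = filter_data(start_date, end_date, category).copy()
    df = df.sort_values(['order_id', 'order_date'])
    df['revenue'] = df['price'] * df['quantity']
    return df

def get_plot_data(start_date, end_date, category):
    # daily revenue for the "Revenue Over Time" chart
    df = filter_data(start_date, end_date, category)
    plot_data = df.groupby(df['order_date'].dt.date)['total'].sum().reset_index()
    plot_data.columns = ['date', 'revenue']
    return plot_data

def get_revenue_by_category(start_date, end_date, category):
    # revenue per category, highest first
    df = filter_data(start_date, end_date, category)
    cat_data = df.groupby('categories')['total'].sum().reset_index()
    cat_data.columns = ['categories', 'revenue']
    return cat_data.sort_values('revenue', ascending=False)

def get_top_products(start_date, end_date, category):
    # ten best-selling products by revenue
    df = filter_data(start_date, end_date, category)
    prod_data = df.groupby('product_names')['total'].sum().reset_index()
    prod_data.columns = ['product_names', 'revenue']
    return prod_data.sort_values('revenue', ascending=False).head(10)

def create_matplotlib_figure(data, x_col, y_col, title, xlabel, ylabel, orientation='v'):
    # draw a vertical or horizontal bar chart, save it as a PNG, and
    # return the file path for gr.Image to display
    fig, ax = plt.subplots(figsize=(10, 6))
    if not data.empty:
        if orientation == 'v':
            ax.bar(data[x_col].astype(str), data[y_col])
            plt.setp(ax.get_xticklabels(), rotation=45, ha='right')
        else:
            ax.barh(data[x_col].astype(str), data[y_col])
            ax.invert_yaxis()  # largest bar at the top
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    fig.tight_layout()
    path = os.path.join(tempfile.gettempdir(), f"{title.lower().replace(' ', '_')}.png")
    fig.savefig(path)
    plt.close(fig)
    return path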

The update_dashboard function retrieves the key sales statistics (total revenue, total orders, average order value, and top category) by calling get_dashboard_stats. It gathers the data for the three visualizations (revenue over time, revenue by category, and top products) and uses create_matplotlib_figure to generate each chart. It also prepares the data table (via the get_data_for_table function) and returns all the generated charts, statistics, and table data so they can be displayed on the dashboard.

The create_dashboard function sets the date boundaries (minimum and maximum dates) and creates the initial default filter values. It uses Gradio to build the user interface, with date pickers, a category drop-down list, key-metric displays, tabs for the charts, and a data table. It then hooks these up to the filters so that changing any of them triggers a call to update_dashboard, ensuring the dashboard’s visuals and metrics are always in sync with the selected filters. Finally, it returns the assembled Blocks interface, which is launched as a web application.

def update_dashboard(start_date, end_date, category):
    total_revenue, total_orders, avg_order_value, top_category = get_dashboard_stats(start_date, end_date, category)

    # Generate plots
    revenue_data = get_plot_data(start_date, end_date, category)
    category_data = get_revenue_by_category(start_date, end_date, category)
    top_products_data = get_top_products(start_date, end_date, category)

    revenue_over_time_path = create_matplotlib_figure(
        revenue_data, 'date', 'revenue',
        "Revenue Over Time", "Date", "Revenue"
    )
    revenue_by_category_path = create_matplotlib_figure(
        category_data, 'categories', 'revenue',
        "Revenue by Category", "Category", "Revenue"
    )
    top_products_path = create_matplotlib_figure(
        top_products_data, 'product_names', 'revenue',
        "Top Products", "Revenue", "Product Name", orientation='h'
    )

    # Data table
    table_data = get_data_for_table(start_date, end_date, category)

    return (
        revenue_over_time_path,
        revenue_by_category_path,
        top_products_path,
        table_data,
        total_revenue,
        total_orders,
        avg_order_value,
        top_category
    )

def create_dashboard():
    min_date, max_date = get_date_range()
    if min_date is None or max_date is None:
        min_date = datetime.datetime.now()
        max_date = datetime.datetime.now()

    default_start_date = min_date
    default_end_date = max_date

    with gr.Blocks(css="""
        footer {display: none !important;}
        .tabs {border: none !important;}  
        .gr-plot {border: none !important; box-shadow: none !important;}
    """) as dashboard:
        
        gr.Markdown("# Sales Performance Dashboard")

        # Filters row
        with gr.Row():
            start_date = gr.DateTime(
                label="Start Date",
                value=default_start_date.strftime('%Y-%m-%d'),
                include_time=False,
                type="datetime"
            )
            end_date = gr.DateTime(
                label="End Date",
                value=default_end_date.strftime('%Y-%m-%d'),
                include_time=False,
                type="datetime"
            )
            category_filter = gr.Dropdown(
                choices=["All Categories"] + get_unique_categories(),
                label="Category",
                value="All Categories"
            )

        gr.Markdown("# Key Metrics")

        # Stats row
        with gr.Row():
            total_revenue = gr.Number(label="Total Revenue", value=0)
            total_orders = gr.Number(label="Total Orders", value=0)
            avg_order_value = gr.Number(label="Average Order Value", value=0)
            top_category = gr.Textbox(label="Top Category", value="N/A")

        gr.Markdown("# Visualisations")
        # Tabs for Plots
        with gr.Tabs():
            with gr.Tab("Revenue Over Time"):
                revenue_over_time_image = gr.Image(label="Revenue Over Time", container=False)
            with gr.Tab("Revenue by Category"):
                revenue_by_category_image = gr.Image(label="Revenue by Category", container=False)
            with gr.Tab("Top Products"):
                top_products_image = gr.Image(label="Top Products", container=False)

        gr.Markdown("# Raw Data")
        # Data Table (below the plots)
        data_table = gr.DataFrame(
            label="Sales Data",
            type="pandas",
            interactive=False
        )

        # When filters change, update everything
        for f in [start_date, end_date, category_filter]:
            f.change(
                fn=lambda s, e, c: update_dashboard(s, e, c),
                inputs=[start_date, end_date, category_filter],
                outputs=[
                    revenue_over_time_image, 
                    revenue_by_category_image, 
                    top_products_image,
                    data_table,
                    total_revenue, 
                    total_orders,
                    avg_order_value, 
                    top_category
                ]
            )

        # Initial load
        dashboard.load(
            fn=lambda: update_dashboard(default_start_date, default_end_date, "All Categories"),
            outputs=[
                revenue_over_time_image, 
                revenue_by_category_image, 
                top_products_image,
                data_table,
                total_revenue, 
                total_orders,
                avg_order_value, 
                top_category
            ]
        )

    return dashboard

if __name__ == "__main__":
    dashboard = create_dashboard()
    dashboard.launch(share=False)

Run the program

Create a Python file, e.g. gradio_test.py, and paste in all the code snippets above. Save it, then run it like this:

(gradio_dashboard) $ python gradio_test.py

* Running on local URL:  

To create a public link, set `share=True` in `launch()`.

Click the local URL displayed and the dashboard will open full screen in your browser.

Summary

This article provides a comprehensive guide to building an interactive sales performance dashboard using Gradio, with a CSV file as its source data.

Gradio is a modern Python-based open source framework that simplifies the creation of data-driven dashboards and GUI applications. The dashboard I developed allows users to filter data by date range and product category, view key metrics such as total revenue and best performing categories, explore visualizations such as revenue trends and best products, and browse raw data with paging.

I also covered some of the key differences between developing visualization tools with Gradio and with Streamlit, another popular front-end Python library.

This guide provides a comprehensive implementation of the Gradio data dashboard, from creating sample data to developing Python capabilities for querying data, generating graphs, and processing user input. This step-by-step approach demonstrates how to leverage Gradio’s ability to create user-friendly and dynamic dashboards, perfect for data engineers and scientists who want to build interactive data applications.

Although I used a CSV file for the data, it should be straightforward to modify the code to use another data source, such as a relational database management system (RDBMS) like SQLite. For example, in another post in this series, on creating a similar dashboard using Streamlit, the data source is a PostgreSQL database.
