Master SQL window functions | Going to Data Science

In my work, I write countless SQL queries to extract insights from data. It is always a challenging task, because the queries need to be not only efficient but also simple enough to maintain over time.

Every new problem brings a new lesson, and lately I have been digging into SQL window functions. These powerful tools are very useful when you need to perform calculations across a set of rows without losing the granularity of the individual records.

In this article, I will break down SQL window functions step by step. They may seem complex or unintuitive at first, but once you understand how they work, you will see how essential they are. Are you ready? Let’s dive in and master them together!

Table of contents

Why do we need window functions?
Syntax of window functions
Four simple examples

Why do we need window functions?

To understand what window functions do, let’s start with a simple example. Imagine we have a table of six orders from an e-commerce website. Each row includes the order ID, date, product, brand, and price.

Illustration by the author. Example table used to demonstrate window functions.

Suppose we want to calculate the total price for each brand. Using the GROUP BY clause, we can write a query like this:

SELECT 
      brand, 
      SUM(price) as total_price 
FROM Orders 
GROUP BY brand

This returns a result in which each row represents one brand, along with the total price of all orders for that brand.

|brand  |total_price|
|-------|-----------|
|carpisa|30         |
|nike   |175        |
|parfois|25         |
|zara   |65         |

This aggregation removes the details of the individual orders, since the output contains only one row per brand. What if we want to keep all the original rows and add the total price for each brand as an extra field?

By using SUM(price) OVER (PARTITION BY brand), we can calculate the total price for each brand without collapsing the rows:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    SUM(price) OVER (PARTITION BY brand) as total_price
FROM Orders

We get the following result:

|order_id|date      |product|brand  |price|total_price|
|--------|----------|-------|-------|-----|-----------|
|6       |2025/05/01|bag    |carpisa|30   |30         |
|1       |2024/02/01|shoes  |nike   |90   |175        |
|3       |2024/06/01|shoes  |nike   |85   |175        |
|5       |2025/04/01|bag    |parfois|25   |25         |
|2       |2024/05/01|dress  |zara   |50   |65         |
|4       |2025/01/01|t-shirt|zara   |15   |65         |

The query returns all six rows, keeps each individual order, and adds a new column showing the total price for each brand. For example, the Carpisa order shows a total of 30 because it is the only Carpisa order, the two Nike orders show 175 (90+85), and so on.

You may notice that the table is no longer ordered by order_id. This is because the window function partitions the data by brand, and SQL does not guarantee row order unless it is explicitly specified. To restore the original order, we just need to add an ORDER BY clause:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    SUM(price) OVER (PARTITION BY brand) as total_price
FROM Orders
ORDER BY order_id

Finally, we have the output with all the required details:

|order_id|date      |product|brand  |price|total_price|
|--------|----------|-------|-------|-----|-----------|
|1       |2024/02/01|shoes  |nike   |90   |175        |
|2       |2024/05/01|dress  |zara   |50   |65         |
|3       |2024/06/01|shoes  |nike   |85   |175        |
|4       |2025/01/01|t-shirt|zara   |15   |65         |
|5       |2025/04/01|bag    |parfois|25   |25         |
|6       |2025/05/01|bag    |carpisa|30   |30         |

Now we have the same totals that GROUP BY gave us, while retaining all the individual order details.
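To try the comparison yourself, here is a minimal Python sketch that runs both queries against an in-memory SQLite database. The table is rebuilt from the six rows shown in the output above. The choice of SQLite and the names `conn`, `grouped`, and `windowed` are assumptions made for illustration, not part of the original setup. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Rebuild the six-row Orders table from the article in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders "
    "(order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-02-01", "shoes", "nike", 90),
        (2, "2024-05-01", "dress", "zara", 50),
        (3, "2024-06-01", "shoes", "nike", 85),
        (4, "2025-01-01", "t-shirt", "zara", 15),
        (5, "2025-04-01", "bag", "parfois", 25),
        (6, "2025-05-01", "bag", "carpisa", 30),
    ],
)

# GROUP BY collapses the table to one row per brand.
grouped = conn.execute(
    "SELECT brand, SUM(price) AS total_price "
    "FROM Orders GROUP BY brand ORDER BY brand"
).fetchall()

# The window version keeps all six rows and repeats the brand total on each.
windowed = conn.execute(
    """
    SELECT order_id, brand, price,
           SUM(price) OVER (PARTITION BY brand) AS total_price
    FROM Orders
    ORDER BY order_id
    """
).fetchall()

print(len(grouped), "rows from GROUP BY")
print(len(windowed), "rows from the window function")
for row in windowed:
    print(row)
```

The row counts (four versus six) make the difference concrete: the totals are identical, but only the window query preserves one row per order.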

Syntax of window functions

Typically, window functions have a syntax like this:

f(col2) OVER(
[PARTITION BY col1] 
[ORDER BY col3]
)

Let’s break it down. f(col2) is the operation you want to perform, such as a sum, a count, or a ranking. The OVER clause defines the “window”, that is, the subset of rows the window function operates on. PARTITION BY col1 groups the data, and ORDER BY col3 determines the order of the rows within each partition.
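As a rough illustration of how the template maps onto a real query, the sketch below uses f = SUM, col2 = price, col1 = brand, and col3 = date. It runs on an in-memory SQLite database (an assumption for illustration; SQLite 3.25 or newer is needed) with the six orders from the example table. One thing it shows that the template alone does not: when an aggregate is combined with ORDER BY inside OVER, the default window frame runs from the start of the partition up to the current row, so the sum becomes a running total per brand.

```python
import sqlite3

# In-memory copy of the article's Orders table (illustrative setup).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders "
    "(order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-02-01", "shoes", "nike", 90),
        (2, "2024-05-01", "dress", "zara", 50),
        (3, "2024-06-01", "shoes", "nike", 85),
        (4, "2025-01-01", "t-shirt", "zara", 15),
        (5, "2025-04-01", "bag", "parfois", 25),
        (6, "2025-05-01", "bag", "carpisa", 30),
    ],
)

# f(col2) OVER (PARTITION BY col1 ORDER BY col3)
# with f = SUM, col2 = price, col1 = brand, col3 = date.
running = conn.execute(
    """
    SELECT order_id, brand, price,
           SUM(price) OVER (PARTITION BY brand ORDER BY date) AS running_total
    FROM Orders
    ORDER BY brand, date
    """
).fetchall()

for row in running:
    print(row)
```

For Nike, the first order shows 90 and the second shows 175, because the second row's window contains both Nike orders up to that date.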

In addition, window functions are divided into three main categories:

Aggregate functions: COUNT, SUM, AVG, MIN, and MAX
Ranking functions: ROW_NUMBER, RANK, DENSE_RANK, CUME_DIST, PERCENT_RANK, and NTILE
Value functions: LEAD, LAG, FIRST_VALUE, and LAST_VALUE
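To make the three categories concrete, the following sketch picks one function from each (MAX, ROW_NUMBER, and LAG) and applies them in a single query. It assumes an in-memory SQLite database (3.25 or newer) loaded with the six example orders. The column aliases `max_price`, `row_num`, and `prev_price` are made up for this illustration.

```python
import sqlite3

# In-memory copy of the article's Orders table (illustrative setup).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders "
    "(order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-02-01", "shoes", "nike", 90),
        (2, "2024-05-01", "dress", "zara", 50),
        (3, "2024-06-01", "shoes", "nike", 85),
        (4, "2025-01-01", "t-shirt", "zara", 15),
        (5, "2025-04-01", "bag", "parfois", 25),
        (6, "2025-05-01", "bag", "carpisa", 30),
    ],
)

rows = conn.execute(
    """
    SELECT order_id,
           price,
           MAX(price)   OVER ()                    AS max_price,   -- aggregate
           ROW_NUMBER() OVER (ORDER BY price DESC) AS row_num,     -- ranking
           LAG(price)   OVER (ORDER BY date)       AS prev_price   -- value
    FROM Orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```

The first order has no earlier order to look back at, so LAG returns NULL for it, which Python shows as `None`.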

Four simple examples

Let’s walk through a few examples to get comfortable with window functions.

Example 1: Simple Window Function

To understand the concept of window functions, let’s start with a straightforward example. Suppose we want to calculate the total price of all orders in the table. A regular aggregate query would give us a single value: 295. However, this collapses the rows and loses the details of the individual orders. Instead, if we want to display the total price next to each record, we can use a window function like this:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    SUM(price) OVER () as tot_price
FROM Orders

Here is the output:

|order_id|date      |product|brand  |price|tot_price|
|--------|----------|-------|-------|-----|---------|
|1       |2024-02-01|shoes  |nike   |90   |295      |
|2       |2024-05-01|dress  |zara   |50   |295      |
|3       |2024-06-01|shoes  |nike   |85   |295      |
|4       |2025-01-01|t-shirt|zara   |15   |295      |
|5       |2025-04-01|bag    |parfois|25   |295      |
|6       |2025-05-01|bag    |carpisa|30   |295      |

This way, we get the sum of all prices in the entire dataset and repeat it for each row.

Example 2: PARTITION BY clause

Now, let’s calculate the average price per year while still retaining all the details. We can use the PARTITION BY clause in the window function to group the rows by year and calculate the average within each group:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    ROUND(AVG(price) OVER (PARTITION BY YEAR(date)), 2) as avg_price
FROM Orders

Here is what the output looks like:

|order_id|date      |product|brand  |price|avg_price|
|--------|----------|-------|-------|-----|---------|
|1       |2024-02-01|shoes  |nike   |90   |75       |
|2       |2024-05-01|dress  |zara   |50   |75       |
|3       |2024-06-01|shoes  |nike   |85   |75       |
|4       |2025-01-01|t-shirt|zara   |15   |23.33    |
|5       |2025-04-01|bag    |parfois|25   |23.33    |
|6       |2025-05-01|bag    |carpisa|30   |23.33    |

Great! We now see the average price of the year next to each row.
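One portability note: YEAR() exists in dialects such as MySQL and SQL Server, but not everywhere. The sketch below reproduces the same yearly average on an in-memory SQLite database, where `strftime('%Y', date)` plays the role of YEAR(date). The SQLite setup and the variable name `averages` are assumptions for illustration, and SQLite 3.25 or newer is required for window functions.

```python
import sqlite3

# In-memory copy of the article's Orders table (illustrative setup).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders "
    "(order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-02-01", "shoes", "nike", 90),
        (2, "2024-05-01", "dress", "zara", 50),
        (3, "2024-06-01", "shoes", "nike", 85),
        (4, "2025-01-01", "t-shirt", "zara", 15),
        (5, "2025-04-01", "bag", "parfois", 25),
        (6, "2025-05-01", "bag", "carpisa", 30),
    ],
)

# strftime('%Y', date) extracts the year as text, standing in for YEAR(date).
averages = conn.execute(
    """
    SELECT order_id,
           price,
           ROUND(AVG(price) OVER (PARTITION BY strftime('%Y', date)), 2)
               AS avg_price
    FROM Orders
    ORDER BY order_id
    """
).fetchall()

for row in averages:
    print(row)
```

The 2024 orders average (90 + 50 + 85) / 3 = 75, and the 2025 orders average (15 + 25 + 30) / 3, which rounds to 23.33, matching the table above.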

Example 3: ORDER BY clause

One of the best ways to understand how ordering works in window functions is to apply a ranking function. Suppose we want to rank all orders from the highest to the lowest price. Here is how we can use the RANK() function:

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    RANK() OVER (ORDER BY price DESC) as Rank
FROM Orders

We get an output like this:

|order_id|date      |product|brand  |price|Rank|
|--------|----------|-------|-------|-----|----|
|1       |2024-02-01|shoes  |nike   |90   |1   |
|3       |2024-06-01|shoes  |nike   |85   |2   |
|2       |2024-05-01|dress  |zara   |50   |3   |
|6       |2025-05-01|bag    |carpisa|30   |4   |
|5       |2025-04-01|bag    |parfois|25   |5   |
|4       |2025-01-01|t-shirt|zara   |15   |6   |

As shown in the table, the most expensive order gets rank 1, and the remaining orders are ranked in descending order of price.

Example 4: Combining PARTITION BY and ORDER BY clauses

In the previous example, we ranked all orders in the entire dataset from the highest to the lowest price. But what if we want to restart the ranking for each year? We can add a PARTITION BY clause to the window function. This splits the data into separate groups by year, and within each group the orders are ranked from the highest to the lowest price.

SELECT 
    order_id,
    date,
    product,
    brand,
    price,
    RANK() OVER (PARTITION BY YEAR(date) ORDER BY price DESC) as Rank
FROM Orders

The result should look like this:

|order_id|date      |product|brand  |price|Rank|
|--------|----------|-------|-------|-----|----|
|1       |2024-02-01|shoes  |nike   |90   |1   |
|3       |2024-06-01|shoes  |nike   |85   |2   |
|2       |2024-05-01|dress  |zara   |50   |3   |
|6       |2025-05-01|bag    |carpisa|30   |1   |
|5       |2025-04-01|bag    |parfois|25   |2   |
|4       |2025-01-01|t-shirt|zara   |15   |3   |

Now the ranking restarts for each year, just as we wanted.
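The per-year ranking can be checked with a small sketch on an in-memory SQLite database (3.25 or newer). Two details differ from the query above and are assumptions made for this illustration: `strftime('%Y', date)` replaces YEAR(date), which SQLite does not provide, and the alias is `price_rank` instead of `Rank`, because RANK is a reserved word in some dialects (MySQL 8, for example).

```python
import sqlite3

# In-memory copy of the article's Orders table (illustrative setup).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders "
    "(order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2024-02-01", "shoes", "nike", 90),
        (2, "2024-05-01", "dress", "zara", 50),
        (3, "2024-06-01", "shoes", "nike", 85),
        (4, "2025-01-01", "t-shirt", "zara", 15),
        (5, "2025-04-01", "bag", "parfois", 25),
        (6, "2025-05-01", "bag", "carpisa", 30),
    ],
)

# PARTITION BY restarts the ranking per year; ORDER BY ranks by price.
ranked = conn.execute(
    """
    SELECT order_id,
           strftime('%Y', date) AS year,
           price,
           RANK() OVER (
               PARTITION BY strftime('%Y', date)
               ORDER BY price DESC
           ) AS price_rank
    FROM Orders
    ORDER BY year, price_rank
    """
).fetchall()

for row in ranked:
    print(row)
```

Each year produces its own ranks 1 to 3, which is the same pattern as the table above.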

Final thoughts:

I hope this guide has given you a clear introduction to SQL window functions. At first they may feel a little unintuitive, but once you compare them side by side with the GROUP BY clause, the value they bring becomes easier to understand.

In my own experience, window functions are very powerful for extracting insights without losing the row-level details that traditional aggregations hide. They are very useful when calculating metrics such as totals, rankings, and year-over-year or month-over-month comparisons.

However, there are some limitations. Window functions can be computationally expensive, especially on large datasets or with complex partitions. It is important to evaluate whether the added flexibility justifies the performance trade-off in your specific use case.

Thank you for reading! I wish you a happy day!

liralbes 4 days ago