Master SQL window functions | Towards Data Science

In my work, I write countless SQL queries to extract insights from data. This is always a challenging task, because it matters not only that the queries are effective, but also that they are simple enough to maintain over time.
With each new problem comes a new lesson, and lately I've been working with SQL window functions. These powerful tools are very useful when you need to perform calculations across a set of rows without losing the granularity of individual records.
In this article, I will break down SQL window functions step by step. They may seem complex or unintuitive at first, but once you understand how they work, you will see why they are so essential. Are you ready? Let’s dive in and master them together!
Table of contents
- Why do we need window functions?
- Syntax of window functions
- Four simple examples
Why do we need window functions?
To understand how window functions work, let’s start with a simple example. Imagine we have a table of six orders from an e-commerce website. Each row includes the order ID, date, product, brand, and price.
Suppose we want to calculate the total price for each brand. Using a GROUP BY clause, we can write a query like this:
SELECT
brand,
SUM(price) as total_price
FROM Orders
GROUP BY brand
This returns one row per brand, showing the total price of all orders for that brand:
|brand |total_price|
|-------|-----------|
|carpisa|30 |
|nike |175 |
|parfois|25 |
|zara |65 |
This aggregation removes the details of individual orders, since the output contains only one row per brand. What if we want to keep all the original rows and add the total price of each brand as an extra column?
By using SUM(price) OVER (PARTITION BY brand), we can calculate the total price of each brand without collapsing the rows:
SELECT
order_id,
date,
product,
brand,
price,
SUM(price) OVER (PARTITION BY brand) as total_price
FROM Orders
We got the following results:
|order_id|date |product|brand |price|total_price|
|--------|----------|-------|-------|-----|-----------|
|6 |2025/05/01|bag |carpisa|30 |30 |
|1 |2024/02/01|shoes |nike |90 |175 |
|3 |2024/06/01|shoes |nike |85 |175 |
|5 |2025/04/01|bag |parfois|25 |25 |
|2 |2024/05/01|dress |zara |50 |65 |
|4 |2025/01/01|t-shirt|zara |15 |65 |
The query returns all six rows, keeps each individual order, and adds a new column showing the total price for its brand. For example, the Carpisa order shows a total of 30, since it is the only Carpisa order, while the two Nike orders both show 175 (90 + 85), and so on.
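To make the difference concrete, here is a minimal plain-Python sketch that computes the per-brand totals once and then either collapses them into one row per brand, like GROUP BY, or attaches them to every original row, like SUM(price) OVER (PARTITION BY brand). It only illustrates the idea; it is not how a database engine actually evaluates the query, and it keeps just the order_id, brand, and price columns of the sample table.

```python
from collections import defaultdict

# The article's six sample orders, reduced to (order_id, brand, price).
orders = [
    (1, "nike", 90),
    (2, "zara", 50),
    (3, "nike", 85),
    (4, "zara", 15),
    (5, "parfois", 25),
    (6, "carpisa", 30),
]

# Aggregate once per partition (brand): this is the number both approaches need.
totals = defaultdict(int)
for _, brand, price in orders:
    totals[brand] += price

# GROUP BY style: one row per brand, order-level detail is gone.
grouped = sorted(totals.items())

# Window style: every original row is kept, with its partition total attached.
windowed = [(order_id, brand, price, totals[brand]) for order_id, brand, price in orders]

print(grouped)
for row in windowed:
    print(row)
```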
You may notice that the table is no longer ordered by order_id. This is because SQL does not guarantee any output order unless one is explicitly specified, and to evaluate the window the database typically groups the rows by brand, so they come back in that order. To restore the original order, we just need to add an ORDER BY clause:
SELECT
order_id,
date,
product,
brand,
price,
SUM(price) OVER (PARTITION BY brand) as total_price
FROM Orders
ORDER BY order_id
Finally, we have the output with all the required details:
|order_id|date |product|brand |price|total_price|
|--------|----------|-------|-------|-----|-----------|
|1 |2024/02/01|shoes |nike |90 |175 |
|2 |2024/05/01|dress |zara |50 |65 |
|3 |2024/06/01|shoes |nike |85 |175 |
|4 |2025/01/01|t-shirt|zara |15 |65 |
|5 |2025/04/01|bag |parfois|25 |25 |
|6 |2025/05/01|bag |carpisa|30 |30 |
Now we have the per-brand totals that GROUP BY would give us, while retaining all the individual order details.
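One way to convince yourself of what the window function computes is to compare it with a query that builds the same column without a window: aggregate with GROUP BY in a subquery, then join the totals back to each order. The sketch below runs both queries against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is an assumption here (the article does not name a database), and it needs version 3.25 or later for window functions.

```python
import sqlite3

# The article's sample Orders table.
ORDERS = [
    (1, "2024-02-01", "shoes", "nike", 90),
    (2, "2024-05-01", "dress", "zara", 50),
    (3, "2024-06-01", "shoes", "nike", 85),
    (4, "2025-01-01", "t-shirt", "zara", 15),
    (5, "2025-04-01", "bag", "parfois", 25),
    (6, "2025-05-01", "bag", "carpisa", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", ORDERS)

# Window version: every order keeps its row and gets its brand's total.
window_rows = conn.execute(
    """
    SELECT order_id, brand, price,
           SUM(price) OVER (PARTITION BY brand) AS total_price
    FROM Orders
    ORDER BY order_id
    """
).fetchall()

# Same result without a window: GROUP BY in a subquery, joined back on brand.
join_rows = conn.execute(
    """
    SELECT o.order_id, o.brand, o.price, t.total_price
    FROM Orders AS o
    JOIN (SELECT brand, SUM(price) AS total_price
          FROM Orders
          GROUP BY brand) AS t
      ON t.brand = o.brand
    ORDER BY o.order_id
    """
).fetchall()

print(window_rows == join_rows)  # True
for row in window_rows:
    print(row)
```

The join version needs an extra subquery and join, which is the boilerplate the OVER clause lets you skip.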
Syntax of window functions
Typically, window functions have a syntax like this:
f(col2) OVER(
[PARTITION BY col1]
[ORDER BY col3]
)
Let’s break it down:
- f(col2) is the operation you want to perform, such as a sum, a count, or a ranking.
- The OVER clause defines the “window”, that is, the set of rows the function operates on.
- PARTITION BY col1 splits the rows into groups (partitions); the function is computed separately within each one.
- ORDER BY col3 determines the order of rows within each partition.
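As a mental model of the syntax above, the sketch below mimics ROW_NUMBER() OVER (PARTITION BY year ORDER BY price DESC) in plain Python: split the rows into partitions, sort each partition, then compute one value per row. The helper name row_number_over and the simplified (order_id, year, price) rows are hypothetical, chosen only for illustration; a real database may evaluate windows quite differently.

```python
from itertools import groupby
from operator import itemgetter

# Simplified sample orders: (order_id, year, price).
orders = [
    (1, 2024, 90),
    (2, 2024, 50),
    (3, 2024, 85),
    (4, 2025, 15),
    (5, 2025, 25),
    (6, 2025, 30),
]


def row_number_over(rows, partition_key, order_key, descending=False):
    """Hypothetical helper mimicking ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).

    Returns a list of (row, row_number) pairs; numbering restarts at 1 in each partition.
    """
    result = []
    # PARTITION BY: sort so equal keys are adjacent, then split into groups.
    for _, partition in groupby(sorted(rows, key=partition_key), key=partition_key):
        # ORDER BY: sort only within the current partition.
        ordered = sorted(partition, key=order_key, reverse=descending)
        # f(): assign 1, 2, 3, ... following that order.
        for number, row in enumerate(ordered, start=1):
            result.append((row, number))
    return result


numbers = row_number_over(
    orders, partition_key=itemgetter(1), order_key=itemgetter(2), descending=True
)
for row, number in numbers:
    print(row, number)
```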
In addition, window functions fall into three main categories:
- Aggregate functions: COUNT, SUM, AVG, MIN, and MAX
- Ranking functions: ROW_NUMBER, RANK, DENSE_RANK, CUME_DIST, PERCENT_RANK, and NTILE
- Value functions: LEAD, LAG, FIRST_VALUE, and LAST_VALUE
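To see one member of each category side by side, the following sketch runs COUNT, ROW_NUMBER, and LAG over the article's sample orders. It assumes SQLite (3.25 or later) through Python's sqlite3 module; the column aliases brand_orders, nth_order, and prev_price are made up for this example. LAG returns NULL (None in Python) for the first row, since there is no previous row.

```python
import sqlite3

# The article's sample Orders table.
ORDERS = [
    (1, "2024-02-01", "shoes", "nike", 90),
    (2, "2024-05-01", "dress", "zara", 50),
    (3, "2024-06-01", "shoes", "nike", 85),
    (4, "2025-01-01", "t-shirt", "zara", 15),
    (5, "2025-04-01", "bag", "parfois", 25),
    (6, "2025-05-01", "bag", "carpisa", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", ORDERS)

rows = conn.execute(
    """
    SELECT order_id, date, brand, price,
           COUNT(*)     OVER (PARTITION BY brand) AS brand_orders,  -- aggregate
           ROW_NUMBER() OVER (ORDER BY date)      AS nth_order,     -- ranking
           LAG(price)   OVER (ORDER BY date)      AS prev_price     -- value
    FROM Orders
    ORDER BY date
    """
).fetchall()

for row in rows:
    print(row)
```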
Four simple examples
Let’s walk through a few examples of the main window function features.
Example 1: Simple Window Function
To understand the concept of window functions, let’s start with a straightforward example. Suppose we want to calculate the total price of all orders in the table. A plain aggregate query such as SELECT SUM(price) FROM Orders would give us a single value: 295. However, this collapses the rows and loses the details of individual orders. If we instead want to display the total price next to each record, we can use a window function like this:
SELECT
order_id,
date,
product,
brand,
price,
SUM(price) OVER () as tot_price
FROM Orders
Here is the output:
|order_id|date |product|brand |price|tot_price|
|--------|----------|-------|-------|-----|---------|
|1 |2024-02-01|shoes |nike |90 |295 |
|2 |2024-05-01|dress |zara |50 |295 |
|3 |2024-06-01|shoes |nike |85 |295 |
|4 |2025-01-01|t-shirt|zara |15 |295 |
|5 |2025-04-01|bag |parfois|25 |295 |
|6 |2025-05-01|bag |carpisa|30 |295 |
The empty OVER () treats the whole table as a single window, so we get the sum of all prices in the entire dataset, repeated on every row.
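Here is a runnable sketch of this example, assuming SQLite (3.25 or later) via Python's sqlite3 module. As an illustrative extra that is not part of the original example, the hypothetical pct_of_total column shows a common reason to put the grand total on every row: dividing each price by it.

```python
import sqlite3

# The article's sample Orders table.
ORDERS = [
    (1, "2024-02-01", "shoes", "nike", 90),
    (2, "2024-05-01", "dress", "zara", 50),
    (3, "2024-06-01", "shoes", "nike", 85),
    (4, "2025-01-01", "t-shirt", "zara", 15),
    (5, "2025-04-01", "bag", "parfois", 25),
    (6, "2025-05-01", "bag", "carpisa", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", ORDERS)

rows = conn.execute(
    """
    SELECT order_id, price,
           SUM(price) OVER () AS tot_price,
           -- Illustrative extra column: each order's share of the grand total, in percent.
           ROUND(100.0 * price / SUM(price) OVER (), 1) AS pct_of_total
    FROM Orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```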
Example 2: Partition by clause
Now, let’s calculate the average price per year while still retaining all the details. We can add a PARTITION BY clause to the window function to group the rows by year and calculate the average within each group:
SELECT
order_id,
date,
product,
brand,
price,
ROUND(AVG(price) OVER (PARTITION BY YEAR(date)), 2) as avg_price
FROM Orders
Here is what the output looks like:
|order_id|date |product|brand |price|avg_price|
|--------|----------|-------|-------|-----|---------|
|1 |2024-02-01|shoes |nike |90 |75 |
|2 |2024-05-01|dress |zara |50 |75 |
|3 |2024-06-01|shoes |nike |85 |75 |
|4 |2025-01-01|t-shirt|zara |15 |23.33 |
|5 |2025-04-01|bag |parfois|25 |23.33 |
|6 |2025-05-01|bag |carpisa|30 |23.33 |
That’s awesome! Each row now shows the average price of all orders placed in the same year.
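A sketch of the same query in SQLite, run from Python's sqlite3 module. SQLite has no YEAR() function, so this version assumes the dates are stored as YYYY-MM-DD text and substitutes strftime('%Y', date) as the partition key; other databases offer their own equivalents, such as EXTRACT(YEAR FROM date).

```python
import sqlite3

# The article's sample Orders table, with dates stored as ISO text.
ORDERS = [
    (1, "2024-02-01", "shoes", "nike", 90),
    (2, "2024-05-01", "dress", "zara", 50),
    (3, "2024-06-01", "shoes", "nike", 85),
    (4, "2025-01-01", "t-shirt", "zara", 15),
    (5, "2025-04-01", "bag", "parfois", 25),
    (6, "2025-05-01", "bag", "carpisa", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", ORDERS)

rows = conn.execute(
    """
    SELECT order_id, date, price,
           -- strftime('%Y', date) stands in for YEAR(date) in SQLite.
           ROUND(AVG(price) OVER (PARTITION BY strftime('%Y', date)), 2) AS avg_price
    FROM Orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
```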
Example 3: Order by clause
One of the best ways to understand ORDER BY in window functions is to apply a ranking function. Suppose we want to rank all orders from the highest to the lowest price. This is where we can use the RANK() function:
SELECT
order_id,
date,
product,
brand,
price,
RANK() OVER (ORDER BY price DESC) as price_rank
FROM Orders
We get an output like this:
|order_id|date |product|brand |price|price_rank|
|--------|----------|-------|-------|-----|----------|
|1 |2024-02-01|shoes |nike |90 |1 |
|3 |2024-06-01|shoes |nike |85 |2 |
|2 |2024-05-01|dress |zara |50 |3 |
|6 |2025-05-01|bag |carpisa|30 |4 |
|5 |2025-04-01|bag |parfois|25 |5 |
|4 |2025-01-01|t-shirt|zara |15 |6 |
As shown in the table, the most expensive order gets rank 1, and the remaining orders follow in descending order of price.
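The sample data has no duplicate prices, so it does not show how ranking functions treat ties. The sketch below uses a small, hypothetical set of prices with a tie at 85 to compare ROW_NUMBER, RANK, and DENSE_RANK in SQLite (3.25 or later, via Python's sqlite3 module): RANK gives tied rows the same rank and then skips ahead, DENSE_RANK gives them the same rank without skipping, and ROW_NUMBER always assigns distinct numbers (the extra item column only makes its tie-breaking deterministic).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical prices with a tie at 85 (the article's sample data has no ties).
rows = conn.execute(
    """
    WITH prices(item, price) AS (
        VALUES ('a', 90), ('b', 85), ('c', 85), ('d', 50)
    )
    SELECT item, price,
           ROW_NUMBER() OVER (ORDER BY price DESC, item) AS row_num,
           RANK()       OVER (ORDER BY price DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY price DESC)       AS dense_rnk
    FROM prices
    ORDER BY price DESC, item
    """
).fetchall()

for row in rows:
    print(row)
# ('a', 90, 1, 1, 1)
# ('b', 85, 2, 2, 2)
# ('c', 85, 3, 2, 2)
# ('d', 50, 4, 4, 3)
```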
Example 4: Combining partition by and order by clauses
In the previous example, we ranked all orders in the entire dataset from the highest to the lowest price. But what if we want to restart the ranking every year? We can add a PARTITION BY clause to the window function. This divides the data into a separate group for each year and ranks the orders within each group from the highest to the lowest price.
SELECT
order_id,
date,
product,
brand,
price,
RANK() OVER (PARTITION BY YEAR(date) ORDER BY price DESC) as price_rank
FROM Orders
The result should look like this:
|order_id|date |product|brand |price|price_rank|
|--------|----------|-------|-------|-----|----------|
|1 |2024-02-01|shoes |nike |90 |1 |
|3 |2024-06-01|shoes |nike |85 |2 |
|2 |2024-05-01|dress |zara |50 |3 |
|6 |2025-05-01|bag |carpisa|30 |1 |
|5 |2025-04-01|bag |parfois|25 |2 |
|4 |2025-01-01|t-shirt|zara |15 |3 |
As intended, the ranking now restarts every year.
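For completeness, here is a sketch of Example 4 in SQLite (3.25 or later) via Python's sqlite3 module, again substituting strftime('%Y', date) for YEAR(date) as an assumption about the target database. The alias price_rank is used instead of Rank because RANK is a reserved word in some databases, such as MySQL 8.

```python
import sqlite3

# The article's sample Orders table.
ORDERS = [
    (1, "2024-02-01", "shoes", "nike", 90),
    (2, "2024-05-01", "dress", "zara", 50),
    (3, "2024-06-01", "shoes", "nike", 85),
    (4, "2025-01-01", "t-shirt", "zara", 15),
    (5, "2025-04-01", "bag", "parfois", 25),
    (6, "2025-05-01", "bag", "carpisa", 30),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (order_id INTEGER, date TEXT, product TEXT, brand TEXT, price INTEGER)"
)
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?, ?)", ORDERS)

rows = conn.execute(
    """
    SELECT order_id, date, price,
           -- Ranking restarts at 1 for each year's partition.
           RANK() OVER (PARTITION BY strftime('%Y', date) ORDER BY price DESC) AS price_rank
    FROM Orders
    ORDER BY strftime('%Y', date), price_rank
    """
).fetchall()

for row in rows:
    print(row)
```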
Final Thought:
I hope this guide has given you a clear introduction to SQL window functions. At first they may feel a little unintuitive, but once you compare them side by side with the GROUP BY clause, the value they bring becomes easier to understand.
In my own experience, window functions are very powerful for extracting insights without losing the row-level details that traditional aggregations hide. They are especially useful for computing metrics such as totals, rankings, and year-over-year or month-over-month comparisons.
However, they have some limitations. Window functions can be computationally expensive, especially on large datasets or with complex partitions, because the database usually has to sort or partition the rows before computing each value. It is important to evaluate whether the additional flexibility justifies the performance trade-off in your specific use case.
Thank you for reading, and have a great day!
Useful resources: