Data Science

From data to stories: code agents for KPI narratives

We often need to investigate what is going on with KPIs, whether we are reacting to anomalies on a dashboard or preparing routine metric updates. Based on my years of experience as a KPI analyst, I estimate that more than 80% of these tasks are fairly standard and can be solved by following a simple checklist.

Here is a high-level plan for investigating KPI changes (you can find more details in the article “Root Cause Analysis 101”):

  • Estimate the top-line change in the metric to understand the magnitude of the shift.
  • Check data quality to make sure the numbers are accurate and reliable.
  • Gather context about internal and external events that might have affected the change.
  • Slice and dice the metric to determine which segments are driving the change.
  • Consolidate your findings in an executive summary that includes hypotheses and estimates of their impact on the headline KPIs.

Since we have an explicit plan to execute, we can use AI agents to automate such tasks. The code agents we discussed recently could be a great fit, since their ability to write and execute code helps them analyse data with minimal back-and-forth. So let’s try to build such an agent using the HuggingFace Smolagents framework.

While working on this task, we will discuss more advanced features of the Smolagents framework:

  • Prompt-tweaking techniques to ensure the desired behaviour.
  • Building a multi-agent system that explains KPI changes and links them to root causes.
  • Adding reflection to the flow with supplementary planning steps.

MVP for explaining KPI changes

As usual, we will take an iterative approach and start with a simple MVP, focusing on the slicing-and-dicing step of the analysis. We will analyse changes of a simple metric (revenue) split by one dimension (country). We will use the dataset from my previous article, “Making sense of KPI changes.”

Let’s load the data first.

import pandas as pd

raw_df = pd.read_csv('absolute_metrics_example.csv', sep='\t')
df = (
    raw_df.groupby('country')[['revenue_before', 'revenue_after_scenario_2']]
    .sum()
    .sort_values('revenue_before', ascending=False)
    .rename(columns={
        'revenue_after_scenario_2': 'after',
        'revenue_before': 'before'})
)
Image by author

Next, let’s initialize the model. I chose OpenAI GPT-4o mini as my preferred option for simple tasks. However, the Smolagents framework supports various models, so you can use whichever model you prefer. Then we just need to create an agent and provide it with the task and the dataset.

from smolagents import CodeAgent, LiteLLMModel

model = LiteLLMModel(model_id="openai/gpt-4o-mini", 
  api_key=config['OPENAI_API_KEY']) 

agent = CodeAgent(
    model=model, tools=[], max_steps=10,
    additional_authorized_imports=["pandas", "numpy", "matplotlib.*", 
      "plotly.*"], verbosity_level=1 
)

task = """
Here is a dataframe showing revenue by segment, comparing values 
before and after.
Could you please help me understand the changes? Specifically:
1. Estimate how the total revenue and the revenue for each segment 
have changed, both in absolute terms and as a percentage.
2. Calculate the contribution of each segment to the total 
change in revenue.

Please round all floating-point numbers in the output 
to two decimal places.
"""

agent.run(
    task,
    additional_args={"data": df},
)

The agent returned fairly reasonable results. We obtained detailed statistics on the metric change in each segment and its impact on the top-line KPI.

{'total_before': 1731985.21, 'total_after': 
1599065.55, 'total_change': -132919.66, 'segment_changes': 
{'absolute_change': {'other': 4233.09, 'UK': -4376.25, 'France': 
-132847.57, 'Germany': -690.99, 'Italy': 979.15, 'Spain': 
-217.09}, 'percentage_change': {'other': 0.67, 'UK': -0.91, 
'France': -55.19, 'Germany': -0.43, 'Italy': 0.81, 'Spain': 
-0.23}, 'contribution_to_change': {'other': -3.18, 'UK': 3.29, 
'France': 99.95, 'Germany': 0.52, 'Italy': -0.74, 'Spain': 0.16}}}

Let’s take a look at the code generated by the agent. It’s quite good, but there is one potential problem: the LLM recreated the dataframe from the input data instead of referencing the provided variable directly. This approach is not ideal (especially with large datasets), since it can introduce errors and increase token usage.

import pandas as pd                                                                                                        
 
# Creating the DataFrame from the provided data                 
data = {                                                        
    'before': [632767.39, 481409.27, 240704.63, 160469.75,      
120352.31, 96281.86],                                           
    'after': [637000.48, 477033.02, 107857.06, 159778.76,       
121331.46, 96064.77]                                            
}                                                               
index = ['other', 'UK', 'France', 'Germany', 'Italy', 'Spain']  
df = pd.DataFrame(data, index=index)                            
                                                                
# Calculating total revenue before and after                    
total_before = df['before'].sum()                               
total_after = df['after'].sum()                                 
                                                                
# Calculating absolute and percentage change for each segment   
df['absolute_change'] = df['after'] - df['before']              
df['percentage_change'] = (df['absolute_change'] /              
df['before']) * 100                                             
                                                                
# Calculating total revenue change                              
total_change = total_after - total_before                       
                                                                
# Calculating contribution of each segment to the total change  
df['contribution_to_change'] = (df['absolute_change'] /         
total_change) * 100                                             
                                                                
# Rounding results                                              
df = df.round(2)                                                
                                                                
# Printing the calculated results                               
print("Total revenue before:", total_before)                    
print("Total revenue after:", total_after)                      
print("Total change in revenue:", total_change)                 
print(df)

It is worth solving this problem before continuing to build more complex systems.

Tweaking the prompts

Since the LLM only follows the instructions it is given, we will fix this issue by tweaking the prompts.

Initially, I tried to make the task prompt more explicit, clearly instructing the LLM to use the provided variable.

task = """Here is a dataframe showing revenue by segment, comparing 
values before and after. The data is stored in df variable. 
Please, use it and don't try to parse the data yourself. 

Could you please help me understand the changes?
Specifically:
1. Estimate how the total revenue and the revenue for each segment 
have changed, both in absolute terms and as a percentage.
2. Calculate the contribution of each segment to the total change in revenue.

Please round all floating-point numbers in the output to two decimal places.
"""

It didn’t work. So the next step was to examine the system prompt to understand why the agent behaves this way.

print(agent.prompt_templates['system_prompt'])

#... 
# Here are the rules you should always follow to solve your task:
# 1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```' sequence, else you will fail.
# 2. Use only variables that you have defined.
# 3. Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in 'answer = wiki({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = wiki(query="What is the place where James Bond lives?")'.
# 4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.
# 5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.
# 6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'.
# 7. Never create any notional variables in our code, as having these in your logs will derail you from the true variables.
# 8. You can use imports in your code, but only from the following list of modules: ['collections', 'datetime', 'itertools', 'math', 'numpy', 'pandas', 'queue', 'random', 're', 'stat', 'statistics', 'time', 'unicodedata']
# 9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist.
# 10. Don't give up! You're in charge of solving the task, not providing directions to solve it.
# Now Begin!

At the end of the prompt, we have the instruction "# 2. Use only variables that you have defined!". This may be interpreted as a strict rule not to use any other variables. So I changed it to "# 2. Use only variables that you have defined or ones provided in additional arguments! Never try to copy and parse additional arguments."

modified_system_prompt = agent.prompt_templates['system_prompt'].replace(
    '2. Use only variables that you have defined!',
    '2. Use only variables that you have defined or ones provided in additional arguments! Never try to copy and parse additional arguments.'
)
agent.prompt_templates['system_prompt'] = modified_system_prompt

This change alone didn’t help either. Next, I checked the task message.

╭─────────────────────────── New run ────────────────────────────╮
│                                                                │
│ Here is a pandas dataframe showing revenue by segment,         │
│ comparing values before and after.                             │
│ Could you please help me understand the changes?               │
│ Specifically:                                                  │
│ 1. Estimate how the total revenue and the revenue for each     │
│ segment have changed, both in absolute terms and as a          │
│ percentage.                                                    │
│ 2. Calculate the contribution of each segment to the total     │
│ change in revenue.                                             │
│                                                                │
│ Please round all floating-point numbers in the output to two   │
│ decimal places.                                                │
│                                                                │
│ You have been provided with these additional arguments, that   │
│ you can access using the keys as variables in your python      │
│ code:                                                          │
│ {'df':             before      after                           │
│ country                                                        │
│ other    632767.39  637000.48                                  │
│ UK       481409.27  477033.02                                  │
│ France   240704.63  107857.06                                  │
│ Germany  160469.75  159778.76                                  │
│ Italy    120352.31  121331.46                                  │
│ Spain     96281.86   96064.77}.                                │
│                                                                │
╰─ LiteLLMModel - openai/gpt-4o-mini ────────────────────────────╯

The instruction related to the use of additional arguments is "You have been provided with these additional arguments, that you can access using the keys as variables in your python code". We can try to make it more specific and clear. Unfortunately, this text is not exposed as a parameter, so I had to find it in the source code. To find the path to the Python package, we can use the following code.

import smolagents 
print(smolagents.__path__)

Then I found the agents.py file and modified the line that appends this note to the task, making the instructions more specific.

self.task += f"""
You have been provided with these additional arguments available as variables 
with names {",".join(additional_args.keys())}. You can access them directly. 
Here is what they contain (just for informational purposes): 
{str(additional_args)}."""

It’s a little hacky, but that’s sometimes how it goes with LLM frameworks. Don’t forget to reload the package afterwards, and we’re good to go. Let’s test whether it works now.
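Reloading after editing package source can be done with importlib. Here is a minimal sketch, demonstrated on a stdlib module as a stand-in; for our case you would reload the smolagents module containing agents.py instead, and in a notebook a kernel restart also works.

```python
import importlib
import json  # stand-in for the module whose source was edited on disk

# importlib.reload re-executes the module's source, so edits made on disk
# take effect in the running session; it returns the same module object.
reloaded = importlib.reload(json)
print(reloaded is json)  # → True
```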

task = """
Here is a pandas dataframe showing revenue by segment, comparing values 
before and after. 

Your task will be to understand the changes to the revenue (after vs before) 
in different segments and provide an executive summary.
Please, follow the following steps:
1. Estimate how the total revenue and the revenue for each segment 
have changed, both in absolute terms and as a percentage.
2. Calculate the contribution of each segment to the total change 
in revenue.

Round all floating-point numbers in the output to two decimal places. 
"""
agent.logger.level = 1 # Lower verbosity level
agent.run(
    task,
    additional_args={"df": df},
)

Wow! The problem has been resolved. The agent no longer copies the input variable and references df directly. Here is the newly generated code.

import pandas as pd                                             
                                                                  
# Calculate total revenue before and after                      
total_before = df['before'].sum()                               
total_after = df['after'].sum()                                 
total_change = total_after - total_before                       
percentage_change_total = (total_change / total_before * 100) if total_before != 0 else 0
                                                                
# Round values                                                  
total_before = round(total_before, 2)                           
total_after = round(total_after, 2)                             
total_change = round(total_change, 2)                           
percentage_change_total = round(percentage_change_total, 2)     
                                                                
# Display results                                               
print(f"Total Revenue Before: {total_before}")                  
print(f"Total Revenue After: {total_after}")                    
print(f"Total Change: {total_change}")                          
print(f"Percentage Change: {percentage_change_total}%")

Now we are ready to continue to build the actual agent that can solve our tasks.

AI Agent for KPI Narrative

Finally, it’s time to work on the AI agent that will help us explain KPI changes and create an executive summary.

Our agent will follow this plan for root cause analysis:

  • Estimate the top-line KPI change to understand the magnitude of the shift.
  • Slice and dice the metric to see which segments are driving the change.
  • Look up events in the change log to see whether they can explain the metric changes.
  • Consolidate all findings into a comprehensive executive summary.

After a lot of experimentation and several adjustments, I achieved a promising result. Here are the key tweaks I made (we will discuss them in more detail later):

  • I used a multi-agent setup, adding another team member: a change log agent that can access the change log and help interpret KPI changes.
  • I tried more powerful models, such as gpt-4o and gpt-4.1-mini, since gpt-4o-mini was not enough. Using a more powerful model not only improved the results but also greatly reduced the number of steps: with gpt-4.1-mini I got the final result after six steps, versus 14-16 steps with gpt-4o-mini. This suggests that investing in more expensive models may be worthwhile for agentic workflows.
  • I gave the agent a complex tool for analysing KPI changes instead of simple metrics. The tool performs all the calculations, so the LLM only has to interpret the results. I discussed the approach to KPI change analysis in detail in a previous article.
  • I rewrote the prompt as a clear step-by-step guide to help the agent stay on track.
  • I added planning steps that encourage the LLM agent to think about its approach first and revisit the plan every three iterations.

After all these adjustments, I got the following summary from the agent, which is quite good.

Executive Summary:
Between April 2025 and May 2025, total revenue declined sharply by
approximately 36.03%, falling from 1,731,985.21 to 1,107,924.43, a
drop of -624,060.78 in absolute terms.
This decline was primarily driven by significant revenue 
reductions in the 'new' customer segments across multiple 
countries, with declines of approximately 70% in these segments.

The most impacted segments include:
- other_new: before=233,958.42, after=72,666.89, 
abs_change=-161,291.53, rel_change=-68.94%, share_before=13.51%, 
impact=25.85, impact_norm=1.91
- UK_new: before=128,324.22, after=34,838.87, 
abs_change=-93,485.35, rel_change=-72.85%, share_before=7.41%, 
impact=14.98, impact_norm=2.02
- France_new: before=57,901.91, after=17,443.06, 
abs_change=-40,458.85, rel_change=-69.87%, share_before=3.34%, 
impact=6.48, impact_norm=1.94
- Germany_new: before=48,105.83, after=13,678.94, 
abs_change=-34,426.89, rel_change=-71.56%, share_before=2.78%, 
impact=5.52, impact_norm=1.99
- Italy_new: before=36,941.57, after=11,615.29, 
abs_change=-25,326.28, rel_change=-68.56%, share_before=2.13%, 
impact=4.06, impact_norm=1.91
- Spain_new: before=32,394.10, after=7,758.90, 
abs_change=-24,635.20, rel_change=-76.05%, share_before=1.87%, 
impact=3.95, impact_norm=2.11

Based on analysis from the change log, the main causes for this 
trend are:
1. The introduction of new onboarding controls implemented on May 
8, 2025, which reduced new customer acquisition by about 70% to 
prevent fraud.
2. A postal service strike in the UK starting April 5, 2025, 
causing order delivery delays and increased cancellations 
impacting the UK new segment.
3. An increase in VAT by 2% in Spain as of April 22, 2025, 
affecting new customer pricing and causing higher cart 
abandonment.

These factors combined explain the outsized negative impacts 
observed in new customer segments and the overall revenue decline.

The LLM agent also generated a number of descriptive charts (they are part of our growth narrative tool). For example, this one shows the impact across the combined country and maturity segments.

Image by author

The results look really exciting. Now let’s take a deeper look at the actual implementation to understand how it works under the hood.

Multi-agent setup

We will start with the change log agent. This agent will query the change log and try to identify potential root causes of the observed metric changes. Since looking up the change log is a simple operation, we implement it as a tool and wrap it in a ToolCallingAgent. Because this agent will be called by another agent, we need to define its name and description attributes.

@tool 
def get_change_log(month: str) -> str: 
    """
    Returns the change log (list of internal and external events that might have affected our KPIs) for the given month 

    Args:
        month: month in the format %Y-%m-01, for example, 2025-04-01
    """
    return events_df[events_df.month == month].drop('month', axis = 1).to_dict('records')

model = LiteLLMModel(model_id="openai/gpt-4.1-mini", api_key=config['OPENAI_API_KEY'])
change_log_agent = ToolCallingAgent(
    tools=[get_change_log],
    model=model,
    max_steps=10,
    name="change_log_agent",
    description="Helps you find the relevant information in the change log that can explain changes on metrics. Provide the agent with all the context to receive info",
)
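The get_change_log tool above assumes an events_df dataframe loaded elsewhere. Its exact schema isn’t shown here, but a minimal hypothetical example, consistent with how the tool filters and returns it, might look like this:

```python
import pandas as pd

# Hypothetical change log data; the real events_df is loaded elsewhere,
# and these column names are an assumption based on the tool's usage.
events_df = pd.DataFrame([
    {'month': '2025-04-01', 'event': 'Postal service strike in the UK',
     'type': 'external'},
    {'month': '2025-05-01', 'event': 'New onboarding controls rolled out',
     'type': 'internal'},
])

# Same filtering logic as the get_change_log tool above
records = (events_df[events_df.month == '2025-04-01']
           .drop('month', axis=1).to_dict('records'))
print(records)
# → [{'event': 'Postal service strike in the UK', 'type': 'external'}]
```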

Since the manager agent will be calling this agent, we won’t be able to control the queries it receives directly. So I decided to modify the system prompt to include additional context.

change_log_system_prompt = '''
You're a master of the change log and you help others to explain 
the changes to metrics. When you receive a request, look up the list of events 
happened by month, then filter the relevant information based 
on provided context and return back. Prioritise the most probable factors 
affecting the KPI and limit your answer only to them.
'''

modified_system_prompt = (
    change_log_agent.prompt_templates['system_prompt']
    + '\n\n\n' + change_log_system_prompt
)

change_log_agent.prompt_templates['system_prompt'] = modified_system_prompt

To enable the primary agent to delegate tasks to the change log agent, we just need to list it in the managed_agents parameter.

agent = CodeAgent(
    model=model,
    tools=[calculate_simple_growth_metrics],
    max_steps=20,
    additional_authorized_imports=["pandas", "numpy", "matplotlib.*", "plotly.*"],
    verbosity_level = 2, 
    planning_interval = 3,
    managed_agents = [change_log_agent]
)

Let’s see how it works. First, we can check the updated system prompt of the main agent. It now includes information about team members and instructions on how to delegate tasks to them.

You can also give tasks to team members.
Calling a team member works the same as for calling a tool: simply, 
the only argument you can give in the call is 'task'.
Given that this team member is a real human, you should be very verbose 
in your task, it should be a long string providing informations 
as detailed as necessary.
Here is a list of the team members that you can call:
```python
def change_log_agent("Your query goes here.") -> str:
    """Helps you find the relevant information in the change log that 
    can explain changes on metrics. Provide the agent with all the context 
    to receive info"""
```

The execution log shows that the primary agent successfully delegated the task to the second agent and received the following response.



─ Executing parsed code: ─────────────────────────────────────── 
  # Query change_log_agent with the detailed task description     
  prepared                                                        
  context_for_change_log = (                                      
      "We analyzed changes in revenue from April 2025 to May      
  2025. We found large decreases "                                
      "mainly in the 'new' maturity segments across countries:    
  Spain_new, UK_new, Germany_new, France_new, Italy_new, and      
  other_new. "                                                    
      "The revenue fell by around 70% in these segments, which    
  have outsized negative impact on total revenue change. "        
      "We want to know the 1-3 most probable causes for this      
  significant drop in revenue in the 'new' customer segments      
  during this period."                                            
  )                                                               
                                                                  
  explanation = change_log_agent(task=context_for_change_log)     
  print("Change log agent explanation:")                          
  print(explanation)                                              
 ──────────────────────────────────────────────────────────────── 


╭──────────────────── New run - change_log_agent ─────────────────────╮
│                                                                     │
│ You're a helpful agent named 'change_log_agent'.                    │
│ You have been submitted this task by your manager.                  │
│ ---                                                                 │
│ Task:                                                               │
│ We analyzed changes in revenue from April 2025 to May 2025.         │
│ We found large decreases mainly in the 'new' maturity segments      │
│ across countries: Spain_new, UK_new, Germany_new, France_new,       │
│ Italy_new, and other_new. The revenue fell by around 70% in these   │
│ segments, which have outsized negative impact on total revenue      │
│ change. We want to know the 1-3 most probable causes for this       │
│ significant drop in revenue in the 'new' customer segments during   │
│ this period.                                                        │
│ ---                                                                 │
│ You're helping your manager solve a wider task: so make sure to     │
│ not provide a one-line answer, but give as much information as      │
│ possible to give them a clear understanding of the answer.          │
│                                                                     │
│ Your final_answer WILL HAVE to contain these parts:                 │
│ ### 1. Task outcome (short version):                                │
│ ### 2. Task outcome (extremely detailed version):                   │
│ ### 3. Additional context (if relevant):                            │
│                                                                     │
│ Put all these in your final_answer tool, everything that you do     │
│ not pass as an argument to final_answer will be lost.               │
│ And even if your task resolution is not successful, please return   │
│ as much context as possible, so that your manager can act upon      │
│ this feedback.                                                      │
│                                                                     │
╰─ LiteLLMModel - openai/gpt-4.1-mini ────────────────────────────────╯

With the Smolagents framework, we can easily set up a simple multi-agent system where managers coordinate and delegate tasks to team members with specific skills.

Iterate on the prompt

I started with a very high-level prompt, outlining the goal and only vague directions. Unfortunately, it didn’t work consistently: the LLM is not smart enough to figure out the approach on its own. So I created a detailed step-by-step prompt that describes the entire plan and includes a detailed specification of the growth narrative tool we use.

task = """
Here is a pandas dataframe showing the revenue by segment, comparing values 
before (April 2025) and after (May 2025). 

You're a senior and experienced data analyst. Your task will be to understand 
the changes to the revenue (after vs before) in different segments 
and provide executive summary.

## Follow the plan:
1. Start by identifying the list of dimensions (columns in dataframe that 
are not "before" and "after")
2. There might be multiple dimensions in the dataframe. Start high-level 
by looking at each dimension in isolation, combine all results 
together into the list of segments analysed (don't forget to save 
the dimension used for each segment). 
Use the provided tools to analyse the changes of metrics: {tools_description}. 
3. Analyse the results from previous step and keep only segments 
that have outsized impact on the KPI change (absolute of impact_norm 
is above 1.25). 
4. Check what dimensions are present in the list of significant segment, 
if there are multiple ones - execute the tool on their combinations 
and add to the analysed segments. If after adding an additional dimension, 
all subsegments show close difference_rate and impact_norm values, 
then we can exclude this split (even though impact_norm is above 1.25), 
since it doesn't explain anything. 
5. Summarise the significant changes you identified. 
6. Try to explain what is going on with metrics by getting info 
from the change_log_agent. Please, provide the agent the full context 
(what segments have outsized impact, what is the relative change and 
what is the period we're looking at). 
Summarise the information from the changelog and mention 
only 1-3 the most probable causes of the KPI change 
(starting from the most impactful one).
7. Put together 3-5 sentences commentary what happened high-level 
and why (based on the info received from the change log). 
Then follow it up with more detailed summary: 
- Top-line total value of metric before and after in human-readable format, 
absolute and relative change 
- List of segments that meaningfully influenced the metric positively 
or negatively with the following numbers: values before and after, 
absolute and relative change, share of segment before, impact 
and normed impact. Order the segments by absolute value 
of absolute change since it represents the power of impact. 

## Instruction on the calculate_simple_growth_metrics tool:
By default, you should use the tool for the whole dataset not the segment, 
since it will give you the full information about the changes.

Here is the guidance how to interpret the output of the tool
- difference - the absolute difference between after and before values
- difference_rate - the relative difference (if it's close for 
  all segments then the dimension is not informative)
- impact - the share of KPI difference explained by this segment 
- segment_share_before - share of segment before
- impact_norm - impact normed on the share of segments, we're interested 
  in very high or very low numbers since they show outsized impact, 
  rule of thumb - impact_norm between -1.25 and 1.25 is not-informative 

If you're using the tool on the subset of dataframe keep in mind, 
that the results won't be applicable to the full dataset, so avoid using it 
unless you want to explicitly look at subset (i.e. change in France). 
If you decided to use the tool on a particular segment 
and share these results in the executive summary, explicitly outline 
that we're diving deeper into a particular segment.
""".format(tools_description = tools_description)
agent.run(
    task,
    additional_args={"df": df},
)
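To make the tool’s output concrete, here is a minimal sketch of how metrics like these could be computed. This is my own reimplementation for illustration only, not the article’s actual calculate_simple_growth_metrics tool; the column names follow the interpretation guide in the prompt above.

```python
import pandas as pd

def growth_metrics_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reimplementation of the growth metrics described above."""
    out = df[['before', 'after']].copy()
    total_change = out['after'].sum() - out['before'].sum()
    out['difference'] = out['after'] - out['before']
    out['difference_rate'] = 100 * out['difference'] / out['before']
    # share of the total KPI change explained by this segment
    out['impact'] = 100 * out['difference'] / total_change
    out['segment_share_before'] = 100 * out['before'] / out['before'].sum()
    # impact normalised by segment share; |impact_norm| > 1.25 signals outsized impact
    out['impact_norm'] = out['impact'] / out['segment_share_before']
    return out.round(2)

demo = pd.DataFrame({'before': [100.0, 300.0], 'after': [50.0, 310.0]},
                    index=['A', 'B'])
print(growth_metrics_sketch(demo))
```

On this toy data, segment A loses half its revenue and fully dominates the total change, so its impact_norm (5.0) is far above the 1.25 rule-of-thumb threshold, while B stays below it.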

Detailing everything is a difficult task, but it is necessary if we want consistent results.

Planning steps

The Smolagents framework allows you to add planning steps to the agent flow. This encourages the agent to start with a plan and to update it after the specified number of steps. In my experience, this reflection is very helpful for keeping the agent focused on the problem and adjusting its actions to stay aligned with the initial plan and goal. I definitely recommend using it when complex reasoning is required.

Setting it up is as easy as specifying planning_interval = 3 when creating the code agent.

agent = CodeAgent(
    model=model,
    tools=[calculate_simple_growth_metrics],
    max_steps=20,
    additional_authorized_imports=["pandas", "numpy", "matplotlib.*", "plotly.*"],
    verbosity_level = 2, 
    planning_interval = 3,
    managed_agents = [change_log_agent]
)

That’s it. The agent then starts by thinking through an initial plan, which provides the reflection.

────────────────────────── Initial plan ──────────────────────────
Here are the facts I know and the plan of action that I will 
follow to solve the task:
```
## 1. Facts survey

### 1.1. Facts given in the task
- We have a pandas dataframe `df` showing revenue by segment, for 
two time points: before (April 2025) and after (May 2025).
- The dataframe columns include:
  - Dimensions: `country`, `maturity`, `country_maturity`, 
`country_maturity_combined`
  - Metrics: `before` (revenue in April 2025), `after` (revenue in
May 2025)
- The task is to understand the changes in revenue (after vs 
before) across different segments.
- Key instructions and tools provided:
  - Identify all dimensions except before/after for segmentation.
  - Analyze each dimension independently using 
`calculate_simple_growth_metrics`.
  - Filter segments with outsized impact on KPI change (absolute 
normed impact > 1.25).
  - Examine combinations of dimensions if multiple dimensions have
significant segments.
  - Summarize significant changes and engage `change_log_agent` 
for contextual causes.
  - Provide a final executive summary including top-line changes 
and segment-level detailed impacts.
- Dataset snippet shows segments combining countries (`France`, 
`UK`, `Germany`, `Italy`, `Spain`, `other`) and maturity status 
(`new`, `existing`).
- The combined segments are uniquely identified in columns 
`country_maturity` and `country_maturity_combined`.

### 1.2. Facts to look up
- Definitions or descriptions of the segments if unclear (e.g., 
what defines `new` vs `existing` maturity).
  - Likely not mandatory to proceed, but could be requested from 
business documentation or change log.
- More details on the change log (accessible via 
`change_log_agent`) that could provide probable causes for revenue
changes.
- Confirmation on handling combined dimension splits - how exactly
`country_maturity_combined` is formed and should be interpreted in
combined dimension analysis.
- Data dictionary or description of metrics if any additional KPI 
besides revenue is relevant (unlikely given data).
- Dates confirm period of analysis: April 2025 (before) and May 
2025 (after). No need to look these up since given.

### 1.3. Facts to derive
- Identify all dimension columns available for segmentation:
  - By excluding 'before' and 'after', likely candidates are 
`country`, `maturity`, `country_maturity`, and 
`country_maturity_combined`.
- For each dimension, calculate change metrics using the given 
tool:
  - Absolute and relative difference in revenue per segment.
  - Impact, segment share before, and normed impact for each 
segment.
- Identify which segments have outsized impact on KPI change 
(|impact_norm| > 1.25).
- If multiple dimensions have significant segments, combine 
dimensions (e.g., country + maturity) and reanalyze.
- Determine if combined dimension splits provide meaningful 
differentiation or not, based on delta rate and impact_norm 
consistency.
- Summarize direction and magnitude of KPI changes at top-line 
level (aggregate revenue before and after).
- Identify top segments driving positive and negative changes 
based on ordered absolute absolute_change.
- Gather contextual insights from the change log agent regarding 
probable causes tied to significant segments and the May 2025 vs 
April 2025 period.

## 2. Plan

1. Identify all dimension columns present in the dataframe by 
listing columns and excluding 'before' and 'after'.
2. For each dimension identified (`country`, `maturity`, 
`country_maturity`, `country_maturity_combined`):
   - Use `calculate_simple_growth_metrics` on the full dataframe 
grouped by that dimension.
   - Extract segments with calculated metrics including 
impact_norm.
3. Aggregate results from all single-dimension analyses and filter
segments where |impact_norm| > 1.25.
4. Determine which dimensions these significant segments belong 
to.
5. If more than one dimension is represented in these significant 
segments, analyze the combined dimension formed by those 
dimensions (for example, combination of `country` and `maturity` 
or use existing combined dimension columns).
6. Repeat metric calculation using 
`calculate_simple_growth_metrics` on the combined dimension.
7. Examine if the combined dimension splits create meaningful 
differentiation - if all subsegments show close difference_rate 
and impact_norm, exclude the split.
8. Prepare a summary of significant changes:
   - Top-line KPIs before and after (absolute and relative 
changes).
   - List of impactful segments sorted by absolute absolute_change
that influenced overall revenue.
9. Provide the list of segments with details (values before, 
after, absolute and relative change, share before, impact, 
impact_norm).
10. Using this summarized information, query `change_log_agent` 
with full context:
    - Include significant segments, their relative changes, and 
periods (April to May 2025).
11. Process the agent's response to identify 1-3 main probable 
causes of the KPI changes.
12. Draft executive summary commentary:
    - High-level overview of what happened and why, based on log 
info.
    - Detailed summary including top-line changes and 
segment-level metrics impact.
13. Deliver the final answer using `final_answer` tool containing 
the above executive summary and data-driven insights.
```

Then, after every three steps, the agent revisits and updates the plan.

────────────────────────── Updated plan ──────────────────────────
I still need to solve the task I was given:
```

Here is a pandas dataframe showing the revenue by segment, 
comparing values before (April 2025) and after (May 2025). 

You're a senior and experienced data analyst. Your task will be 
understand the changes to the revenue (after vs before) in 
different segments 
and provide executive summary.

<... repeating the full initial task ...>
```

Here are the facts I know and my new/updated plan of action to 
solve the task:
```
## 1. Updated facts survey

### 1.1. Facts given in the task
- We have a pandas dataframe with revenue by segment, showing 
values "before" (April 2025) and "after" (May 2025).
- Columns in the dataframe include multiple dimensions and the 
"before" and "after" revenue values.
- The goal is to understand revenue changes by segment and provide
an executive summary.
- Guidance and rules about how to analyze and interpret results 
from the `calculate_simple_growth_metrics` tool are provided.
- The dataframe contains columns: country, maturity, 
country_maturity, country_maturity_combined, before, after.

### 1.2. Facts that we have learned
- The dimensions to analyze are: country, maturity, 
country_maturity, and country_maturity_combined.
- Analyzed revenue changes by dimension.
- Only the "new" maturity segment has significant impact 
(impact_norm=1.96 > 1.25), with a large negative revenue change (~
-70.6%).
- In the combined segment "country_maturity," the "new" segments 
across countries (Spain_new, UK_new, Germany_new, France_new, 
Italy_new, other_new) all have outsized negative impacts with 
impact_norm values all above 1.9.
- The mature/existing segments in those countries have smaller 
normed impacts below 1.25.
- Country-level and maturity-level segment dimension alone are 
less revealing than the combined country+maturity segment 
dimension which highlights the new segments as strongly impactful.
- Total revenue dropped substantially from before to after, mostly
driven by new segments shrinking drastically.

### 1.3. Facts still to look up
- Whether splitting the data by additional dimensions beyond 
country and maturity (e.g., country_maturity_combined) explains 
further heterogeneous impacts or if the pattern is uniform.
- Explanation/context from change log about what caused the major 
drop predominantly in new segments in all countries.
- Confirming whether any country within the new segment behaved 
differently or mitigated losses.

### 1.4. Facts still to derive
- A concise executive summary describing the top-level revenue 
change and identifying which segments explain the declines.
- Explanation involving the change log agent with summary of 
probable reasons for these outsized reductions in revenue in the 
new segments across countries for April-May 2025.

## 2. Plan

### 2.1. Verify if adding the additional dimension 
'country_maturity_combined' splits the impactful "new" segments 
into subsegments with significantly different impacts or if the 
change rates and normed impacts are relatively homogeneous. If 
homogeneous, we do not gain deeper insight and should disregard 
further splitting.

### 2.2. Summarize all significant segments identified with 
outsized impact_norm ≥ 1.25, including their before and after 
values, absolute and relative changes, segment shares before, 
impact, and normalized impact, ordered by absolute value of the 
change.

### 2.3. Query the change_log_agent with the full context: 
significant segments are the new country_maturity segments with 
large negative changes (~ -70%), timeframe April 2025 to May 2025,
and request top 1-3 most probable causes for the KPI revenue drop 
in these segments.

### 2.4. Based on the change log agent's response, synthesize a 
3-5 sentence high-level commentary explaining what happened 
broadly and why.

### 2.5. Draft a detailed executive summary including:
- Total revenue before and after in human-readable format with 
absolute and relative change.
- A list of significant segments driving these changes, in order 
by absolute impact, with detailed numbers (before, after, absolute
and relative change, segment share before, impact, normed impact).

### 2.6. Use the `final_answer` tool to produce the finalized 
executive summary report.
```

I really like how the agent is encouraged to reiterate its initial task and refocus on the main problem. Regular reflection like this helps in real life too: teams often get so caught up in the process that they lose sight of why they are doing the work in the first place. It’s nice to see management best practices incorporated into an agentic framework.

That’s it! We’ve built a code agent capable of analysing KPI changes for simple metrics, and explored all the key nuances of the process along the way.
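As a reminder of what the agent has been working with throughout these plans, here is a rough sketch of what a tool like calculate_simple_growth_metrics computes. The actual implementation is defined earlier in the article; the column names match the agent logs above, but the exact formulas (in particular for impact_norm) are my assumptions.

```python
import pandas as pd

def calculate_simple_growth_metrics(df: pd.DataFrame, dimension: str) -> pd.DataFrame:
    """Hypothetical sketch: growth metrics per segment of one dimension."""
    grouped = df.groupby(dimension)[['before', 'after']].sum()
    grouped['difference'] = grouped['after'] - grouped['before']
    grouped['difference_rate'] = (100 * grouped['difference'] / grouped['before']).round(2)
    # Contribution of each segment to the KPI change, as % of total revenue before
    grouped['impact'] = (100 * grouped['difference'] / grouped['before'].sum()).round(2)
    # The segment's share of total revenue before the change
    grouped['segment_share_before'] = (100 * grouped['before'] / grouped['before'].sum()).round(2)
    # Normed impact: how outsized the segment's contribution is relative to its size;
    # segments with |impact_norm| > 1.25 change much faster than the top line
    grouped['impact_norm'] = (grouped['impact'] / grouped['segment_share_before']).round(2)
    return grouped.sort_values('difference')
```

With a definition like this, a segment holding 10% of revenue before the change but contributing a swing equal to 20% of total revenue would get |impact_norm| = 2.0, exactly the kind of outsized mover the agent's plan filters on.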

You can find the complete code and execution log on GitHub.

Summary

We have done a lot of experiments with code agents and are now ready to draw conclusions. For our experiments, we used the HuggingFace Smolagents framework for code agents, a very convenient toolset that provides:

  • Easy integration with different LLMs (from local models via Ollama to public providers like Anthropic or OpenAI),
  • Excellent logging that makes it easy to follow the agent’s entire thought process and debug problems,
  • The ability to build complex setups, such as multi-agent systems or planning steps, without much effort.

Although Smolagents is my favourite agentic framework, it has its limitations:

  • It can sometimes lack flexibility. For example, I had to modify the prompt directly in the source code to get the behaviour I wanted.
  • It only supports hierarchical multi-agent setups (where a manager agent can delegate tasks to other agents), and does not cover sequential workflows or consensus decision-making processes.
  • Long-term memory is not supported out of the box, which means every task starts from scratch.

Thank you very much for reading this article. I hope it gave you some useful insights.

Reference

This article was inspired by the “Building Code Agents with Hugging Face smolagents” short course by DeepLearning.AI.
