
An Advanced Coding Implementation: Mastering Browser-Driven AI in Google Colab with Playwright, browser_use Agent & BrowserContext, LangChain, and Gemini

In this tutorial, we will learn how to harness browser-driven AI agents entirely within Google Colab. We will use Playwright’s headless Chromium engine, along with the browser_use library’s high-level Agent and BrowserContext abstractions, to programmatically navigate websites, extract data, and automate complex workflows. We will wrap Google’s Gemini model through the langchain_google_genai connector to provide natural-language reasoning and decision-making, with API keys handled securely via Pydantic’s SecretStr. With getpass for entering credentials, asyncio for orchestrating non-blocking execution, and optional .env support through python-dotenv, this setup gives you an end-to-end, interactive agent platform without leaving your notebook environment.

!apt-get update -qq
!apt-get install -y -qq chromium-browser chromium-chromedriver fonts-liberation
!pip install -qq playwright python-dotenv langchain-google-genai browser-use
!playwright install

We first refresh the system package lists and install headless Chromium, its WebDriver, and the Liberation fonts to enable browser automation. We then install Playwright along with python-dotenv, the LangChain Google Generative AI connector, and browser-use, and finally download the necessary browser binaries via playwright install.

import os
import asyncio
from getpass import getpass
from pydantic import SecretStr
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent, Browser, BrowserContextConfig, BrowserConfig
from browser_use.browser.browser import BrowserContext

We bring in the core Python utilities: os for environment management, asyncio for asynchronous execution, and getpass together with Pydantic’s SecretStr for secure API-key input and storage. We then load LangChain’s Gemini wrapper (ChatGoogleGenerativeAI) and the browser_use toolkit (Agent, Browser, BrowserContextConfig, BrowserConfig, and BrowserContext) to configure and drive a headless browser agent.
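The introduction mentions optional environment-based configuration alongside getpass. As a minimal sketch of that idea, the hypothetical helper resolve_api_key below looks for the key in an environment mapping first and only falls back to an interactive prompt when none is set. The function name and its parameters are illustrative assumptions and not part of any library; it uses only the standard library, and wrapping the returned string in SecretStr is left to the caller.

```python
import os
from getpass import getpass
from typing import Callable, Mapping, Optional


def resolve_api_key(
    env: Optional[Mapping[str, str]] = None,
    prompt: Callable[[str], str] = getpass,
    var_name: str = "GEMINI_API_KEY",
) -> str:
    """Return the API key from `env` if present, otherwise prompt for it.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        # No key in the environment: ask interactively without echoing input.
        key = prompt(f"Enter your {var_name}: ").strip()
    if not key:
        raise ValueError(f"{var_name} is required")
    return key
```

Passing the prompt function as a parameter keeps the helper easy to exercise without a terminal, for example resolve_api_key(env={}, prompt=lambda _: "my-key").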

os.environ["ANONYMIZED_TELEMETRY"] = "false"

We disable anonymized usage reporting by setting the ANONYMIZED_TELEMETRY environment variable to “false”, the opt-out switch read by the browser_use library, so that no telemetry data is sent back to its maintainers.

async def setup_browser(headless: bool = True):
    browser = Browser(config=BrowserConfig(headless=headless))
    context = BrowserContext(
        browser=browser,
        config=BrowserContextConfig(
            wait_for_network_idle_page_load_time=5.0,
            highlight_elements=True,
            save_recording_path="./recordings",
        )
    )
    return browser, context

This asynchronous helper initializes a headless (or headed) Browser instance and wraps it in a BrowserContext configured to wait for network-idle page loads, visually highlight elements during interaction, and save a recording of each session under ./recordings. It then returns both the browser and its ready-to-use context for the agent’s tasks.
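Because the context is configured to write recordings to ./recordings, it can be convenient to make sure that folder exists before a session starts. The sketch below is a hypothetical helper (ensure_recording_dir is not part of browser_use), and it makes no assumption about whether the library creates the folder on its own.

```python
from pathlib import Path


def ensure_recording_dir(path: str = "./recordings") -> Path:
    """Create the recording directory if it is missing and return it.

    Hypothetical helper for illustration only.
    """
    directory = Path(path)
    # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Calling ensure_recording_dir() once before setup_browser() is enough; repeated calls are harmless.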

async def agent_loop(llm, browser_context, query, initial_url=None):
    initial_actions = [{"open_tab": {"url": initial_url}}] if initial_url else None
    agent = Agent(
        task=query,
        llm=llm,
        browser_context=browser_context,
        use_vision=True,
        generate_gif=False,  
        initial_actions=initial_actions,
    )
    result = await agent.run()
    return result.final_result() if result else None

This asynchronous helper encapsulates one “think-and-browse” cycle: it spins up an Agent configured with your LLM, the browser context, and an optional initial URL tab, leverages vision when available, and disables GIF recording. Once you call agent_loop, it runs the agent through its steps and returns the agent’s final result (or None if nothing is produced).
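The initial_actions expression in agent_loop can be pulled out into a small function so that the optional URL is checked before the agent starts. The following is a minimal sketch: build_initial_actions is a hypothetical name, and defaulting a bare host to HTTPS is an assumption made for illustration. Only the {"open_tab": {"url": ...}} shape is taken from the code above.

```python
from typing import Optional
from urllib.parse import urlparse


def build_initial_actions(initial_url: Optional[str]) -> Optional[list]:
    """Mirror agent_loop's initial_actions logic, with a basic URL check."""
    if not initial_url:
        return None
    url = initial_url.strip()
    # Assumption: a bare host such as "example.com" should default to HTTPS.
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Unsupported URL: {initial_url!r}")
    return [{"open_tab": {"url": url}}]
```

Inside agent_loop, the first line could then read initial_actions = build_initial_actions(initial_url), which rejects a mistyped address early rather than after the agent has started.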

async def main():
    # Prompt for the key without echoing it to the notebook output
    raw_key = getpass("Enter your GEMINI_API_KEY: ")

    # Expose the key to libraries that read it from the environment
    os.environ["GEMINI_API_KEY"] = raw_key

    # Wrap the key so it is masked if the object is printed or logged
    api_key = SecretStr(raw_key)
    model_name = "gemini-2.5-flash-preview-04-17"

    llm = ChatGoogleGenerativeAI(model=model_name, api_key=api_key)

    browser, context = await setup_browser(headless=True)

    try:
        while True:
            query = input("\nEnter prompt (or leave blank to exit): ").strip()
            if not query:
                break
            url = input("Optional URL to open first (or blank to skip): ").strip() or None

            print("\n🤖 Running agent…")
            answer = await agent_loop(llm, context, query, initial_url=url)
            print("\n📊 Search Results\n" + "-"*40)
            print(answer or "No results found")
            print("-"*40)
    finally:
        # Always release the browser, even if a prompt or agent run fails
        print("Closing browser…")
        await browser.close()


# Top-level await works inside Colab/Jupyter cells
await main()

Finally, this main coroutine drives the entire Colab session: it securely prompts for your Gemini API key (using getpass and SecretStr), sets up the ChatGoogleGenerativeAI LLM and a headless Playwright browser context, then enters an interactive loop where it reads your natural-language prompts (and an optional start URL), invokes agent_loop to perform the browser-driven AI task, prints the results, and ultimately ensures the browser closes cleanly.

In short, by following this guide, you now have a reproducible Colab template that integrates browser automation, LLM reasoning, and secure credential management into a single cohesive pipeline. Whether you’re scraping real-time market data, summarizing news articles, or automating reporting tasks, the combination of Playwright, browser_use, and LangChain’s Gemini interface provides a flexible foundation for your next AI-powered project. Feel free to extend the agent’s capabilities, re-enable GIF recording, add custom navigation steps, or swap in other LLM backends to tailor the workflow precisely to your research or production needs.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform known for in-depth coverage of machine learning and deep learning news that is both technically sound and easily understood by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.
