Browser Backends

Strategy pattern for browser lifecycle management.

BrowserBackend (ABC)

Abstract base class for browser backends.

All browser lifecycle management (starting, configuring, and tearing down the browser) is handled through this interface. AsyncScraper delegates to a concrete backend chosen at construction time.
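As a rough illustration of that delegation, the sketch below picks a strategy once from a `use_local_browser` flag (the flag name comes from the ManagedBrowserBackend notes further down). `make_backend`, `ScraperSketch`, and the two dataclasses are hypothetical stand-ins, not intelliscraper's actual code.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the real backend classes; only constructor
# arguments documented on this page are modeled.
@dataclass
class LocalBrowserBackend:
    cdp_port: int = 9222
    headless: bool = True


@dataclass
class ManagedBrowserBackend:
    headless: bool = True


def make_backend(use_local_browser: bool, headless: bool = True, cdp_port: int = 9222):
    """Hypothetical factory: choose the concrete strategy once, at construction time."""
    if use_local_browser:
        return LocalBrowserBackend(cdp_port=cdp_port, headless=headless)
    return ManagedBrowserBackend(headless=headless)


class ScraperSketch:
    """Hypothetical scraper that stores its backend and delegates browser work to it."""

    def __init__(self, use_local_browser: bool = False, headless: bool = True):
        self.backend = make_backend(use_local_browser, headless=headless)
```

Because the choice is made in the constructor, the rest of the scraper only ever talks to the shared interface and never branches on the backend type.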

class intelliscraper.browser.backend.BrowserBackend

Bases: ABC

Abstract base for browser lifecycle management.

Subclasses handle how a browser is started, configured, and cleaned up. The scraper delegates all browser-level concerns here.

Typical usage:

backend = ManagedBrowserBackend(headless=True, ...)
browser, context = await backend.initialize(playwright)
try:
    ...  # use browser / context
finally:
    # Always release resources, even if scraping raised.
    await backend.cleanup(browser, context)

abstractmethod async initialize(playwright)

Start the browser and return a ready-to-use context.

Parameters:

playwright (Playwright) – A started Playwright instance.

Returns:

A (browser, context) tuple.

Raises:

LocalBrowserConnectionError – If the local browser cannot be reached (only from LocalBrowserBackend).

Return type:

tuple[Browser, BrowserContext]

abstractmethod async cleanup(browser, context)

Release browser resources.

What gets closed depends on the backend:

  • Local: Only the pages opened by the scraper are closed. The Chrome process and its context are left running.

  • Managed: Pages, context, and browser process are all closed.

Parameters:
  • browser (Browser) – The browser instance to clean up.

  • context (BrowserContext) – The browser context to clean up.

Return type:

None

abstract property owns_browser: bool

Whether this backend owns (and should close) the browser process.

Returns:

True for managed backends, False for local/CDP backends where the browser belongs to the user.
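A self-contained sketch of the same contract using only the standard library, assuming `Any` in place of Playwright's `Browser` and `BrowserContext` types. `BrowserBackendSketch`, `FakeBackend`, and `run_once` are illustrative names; the point is that a subclass missing any abstract member cannot be instantiated, and that callers always pair `initialize` with `cleanup`.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any


class BrowserBackendSketch(ABC):
    """Mirrors the documented interface; Browser/BrowserContext are typed as Any."""

    @abstractmethod
    async def initialize(self, playwright: Any) -> tuple[Any, Any]:
        """Start (or attach to) a browser and return (browser, context)."""

    @abstractmethod
    async def cleanup(self, browser: Any, context: Any) -> None:
        """Release whatever resources this backend is responsible for."""

    @property
    @abstractmethod
    def owns_browser(self) -> bool:
        """True if this backend should close the browser process."""


class FakeBackend(BrowserBackendSketch):
    """Toy backend that returns placeholder objects instead of a real browser."""

    def __init__(self) -> None:
        self.cleaned_up = False

    async def initialize(self, playwright: Any) -> tuple[Any, Any]:
        return "browser", "context"

    async def cleanup(self, browser: Any, context: Any) -> None:
        self.cleaned_up = True

    @property
    def owns_browser(self) -> bool:
        return True


async def run_once(backend: BrowserBackendSketch) -> tuple[Any, Any]:
    # Same shape as "Typical usage": initialize, use, always clean up.
    browser, context = await backend.initialize(playwright=None)
    try:
        return browser, context
    finally:
        await backend.cleanup(browser, context)
```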

LocalBrowserBackend

Local browser backend connects to Chrome via CDP.

Connects to a user’s already-running Chrome instance via the Chrome DevTools Protocol (CDP) on a configurable port. All existing cookies, logins, and sessions in that browser are immediately available; no session_data or authentication code is needed.

Before using this backend, either:

  1. Start Chrome manually:

    google-chrome \
        --remote-debugging-port=9222 \
        --user-data-dir="$HOME/.config/google-chrome-debug" \
        --profile-directory="Default"
    
  2. Or let the backend auto-launch Chrome (it will attempt to find and start Chrome with the debug profile if the port is not already open).
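The auto-launch decision in option 2 can be approximated as "probe the port, and only start Chrome if nothing answers". A minimal sketch, assuming a plain TCP probe is enough to detect the debug port; `is_port_open` and `chrome_launch_args` are hypothetical helpers, and locating the Chrome executable is left to the caller.

```python
import os
import socket
from typing import Optional


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def chrome_launch_args(
    executable: str,
    cdp_port: int = 9222,
    profile_dir: Optional[str] = None,
    headless: bool = True,
) -> list:
    """Build an argv equivalent to the manual command above (hypothetical helper)."""
    profile_dir = profile_dir or os.path.expanduser("~/.config/google-chrome-debug")
    args = [
        executable,
        f"--remote-debugging-port={cdp_port}",
        f"--user-data-dir={profile_dir}",
        "--profile-directory=Default",
    ]
    if headless:
        args.append("--headless")
    return args
```

Passing the returned list to `subprocess.Popen` would start Chrome in the background with the same flags as the manual command in option 1.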

Important

The debug profile at ~/.config/google-chrome-debug is separate from your default Chrome profile. You must log in to target sites (e.g. LinkedIn) in this profile before scraping. Use:

make chrome-debug-login URL=https://www.linkedin.com

to open Chrome with this profile and log in.

class intelliscraper.browser.local.LocalBrowserBackend(cdp_port=9222, headless=True, profile_dir=None)

Bases: BrowserBackend

Connect to an already-running Chrome instance via CDP.

This backend reuses the user’s real Chrome session, preserving all cookies, localStorage, and authenticated state. It is ideal for scraping sites that require complex authentication flows (e.g. LinkedIn, Gmail).

Parameters:
  • cdp_port (int) – Chrome DevTools Protocol port. Defaults to 9222.

  • headless (bool) – Whether to run Chrome headless if it has to be auto-launched. Defaults to True.

  • profile_dir (str | None) – Path to the Chrome user-data-dir used for the debug profile. Defaults to ~/.config/google-chrome-debug.

Raises:

LocalBrowserConnectionError – If Chrome is not reachable after _CDP_MAX_WAIT_SEC seconds.
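Waiting up to `_CDP_MAX_WAIT_SEC` seconds suggests a polling loop. A minimal asyncio sketch, assuming the wait is a simple TCP poll; the constant's value, the poll interval, and `wait_for_cdp` are hypothetical, and `LocalBrowserConnectionError` is redefined locally so the block runs on its own.

```python
import asyncio
import socket
import time


class LocalBrowserConnectionError(Exception):
    """Local stand-in for the library's exception of the same name."""


_CDP_MAX_WAIT_SEC = 10.0  # hypothetical value; the real constant is not documented here


def _port_open(port: int, host: str = "127.0.0.1") -> bool:
    try:
        with socket.create_connection((host, port), timeout=0.5):
            return True
    except OSError:
        return False


async def wait_for_cdp(port: int, max_wait: float = _CDP_MAX_WAIT_SEC, interval: float = 0.25) -> None:
    """Poll the CDP port until it accepts connections or max_wait elapses."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        # Run the blocking connect attempt off the event loop.
        if await asyncio.to_thread(_port_open, port):
            return
        await asyncio.sleep(interval)
    raise LocalBrowserConnectionError(
        f"Chrome not reachable on port {port} after {max_wait} seconds"
    )
```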

Example:

backend = LocalBrowserBackend(cdp_port=9222, headless=False)
browser, context = await backend.initialize(playwright)

property owns_browser: bool

Local backend does not own the browser process.

async initialize(playwright)

Connect to Chrome via CDP and return browser + context.

If Chrome is not already running on the configured port, the backend will attempt to auto-launch it using the debug profile.

Parameters:

playwright (Playwright) – A started Playwright instance.

Returns:

A (browser, context) tuple. The context is the first existing context found in the CDP browser (preserving all cookies and logins).

Raises:

LocalBrowserConnectionError – If Chrome cannot be reached.

Return type:

tuple[Browser, BrowserContext]
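The connect-and-reuse step maps onto Playwright's `chromium.connect_over_cdp` and the `Browser.contexts` list. A sketch under the assumption that an empty context list should be treated as an error (the documentation above does not say what happens in that case); `connect_local` is a hypothetical name, and the `playwright` argument is duck-typed so the block can be exercised without a real browser.

```python
import asyncio
from typing import Any


class LocalBrowserConnectionError(Exception):
    """Local stand-in for the library's exception of the same name."""


async def connect_local(playwright: Any, cdp_port: int = 9222) -> tuple[Any, Any]:
    """Attach to a running Chrome and reuse its first existing context."""
    browser = await playwright.chromium.connect_over_cdp(f"http://localhost:{cdp_port}")
    if not browser.contexts:
        # Assumption: a browser with no context to reuse counts as a failure.
        raise LocalBrowserConnectionError("No existing browser context to reuse")
    # The first context carries the profile's cookies and logged-in sessions.
    return browser, browser.contexts[0]
```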

async cleanup(browser, context)

Clean up local browser resources.

In CDP mode the Chrome process and context belong to the user, so they are not closed. Only a debug log entry is emitted.

Parameters:
  • browser (Browser) – The CDP-connected browser (not closed).

  • context (BrowserContext) – The reused browser context (not closed).

Return type:

None
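In other words, cleanup in CDP mode is deliberately a no-op apart from logging. A minimal sketch of that behavior; `local_cleanup` and the logger name are hypothetical.

```python
import asyncio
import logging
from typing import Any

logger = logging.getLogger("local_backend_sketch")  # hypothetical logger name


async def local_cleanup(browser: Any, context: Any) -> None:
    """CDP mode: the Chrome process and context belong to the user, so close nothing."""
    logger.debug("CDP mode: leaving user-owned browser %r and context %r open", browser, context)
```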

ManagedBrowserBackend

Managed browser backend launches Chromium via Playwright.

Launches a fresh Chromium instance managed entirely by the scraper. Applies fingerprint spoofing, proxy configuration, anti-detection scripts, and session cookies/storage when provided.

This is the default backend used when use_local_browser=False.

class intelliscraper.browser.managed.ManagedBrowserBackend(headless=True, browser_launch_options=None, proxy=None, session_data=None)

Bases: BrowserBackend

Launch and manage a Chromium browser instance via Playwright.

Handles the full lifecycle of a Playwright-managed browser: launching, configuring the context (fingerprint, proxy, cookies, anti-detection scripts), and tearing everything down on cleanup.

Parameters:
  • headless (bool) – Run browser without a visible UI. Defaults to True.

  • browser_launch_options (dict | None) – Custom Chromium launch options. Merged with the headless flag. Defaults to BROWSER_LAUNCH_OPTIONS.

  • proxy (Proxy | None) – Proxy configuration for network requests. Applied at the browser-context level so all pages share the same proxy. Defaults to None.

  • session_data (Session | None) – Pre-authenticated session with cookies, localStorage, sessionStorage, and browser fingerprint. Defaults to None.
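One way to read "merged with the headless flag" is a dictionary merge in which the `headless` argument is applied last. A sketch under that assumption; the contents of `BROWSER_LAUNCH_OPTIONS` below are a placeholder, since the real defaults are not listed here, and `build_launch_options` is a hypothetical helper.

```python
from typing import Any, Optional

# Placeholder defaults: the real BROWSER_LAUNCH_OPTIONS contents are not shown here.
BROWSER_LAUNCH_OPTIONS: dict = {"args": ["--no-first-run"]}


def build_launch_options(headless: bool = True, browser_launch_options: Optional[dict] = None) -> dict:
    """Merge custom (or default) launch options with the headless flag."""
    base = browser_launch_options if browser_launch_options is not None else BROWSER_LAUNCH_OPTIONS
    # Build a new dict so the shared defaults are never mutated;
    # headless is applied last, so it overrides any value in base (assumption).
    return {**base, "headless": headless}
```

The resulting dict could then be unpacked into Playwright's `chromium.launch(**options)`.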

Example:

backend = ManagedBrowserBackend(
    headless=True,
    proxy=my_proxy,
    session_data=my_session,
)
browser, context = await backend.initialize(playwright)

property owns_browser: bool

Managed backend owns and should close the browser process.

async initialize(playwright)

Launch Chromium and create a fully configured context.

Steps performed:

  1. Launch Chromium with the configured options.

  2. Create a browser context with fingerprint spoofing and optional proxy.

  3. Inject session cookies (if provided).

  4. Apply anti-detection JavaScript scripts.

Parameters:

playwright (Playwright) – A started Playwright instance.

Returns:

A (browser, context) tuple.

Return type:

tuple[Browser, BrowserContext]
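The four steps line up with standard Playwright calls: `chromium.launch`, `Browser.new_context`, `BrowserContext.add_cookies`, and `BrowserContext.add_init_script`. A duck-typed sketch, assuming fingerprint and proxy settings arrive pre-built as `new_context` keyword arguments; `launch_managed` and the `HIDE_WEBDRIVER_JS` script are hypothetical, not the library's actual anti-detection code.

```python
import asyncio
from typing import Any, Optional

# Hypothetical anti-detection script; the library's real scripts are not shown here.
HIDE_WEBDRIVER_JS = "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"


async def launch_managed(
    playwright: Any,
    launch_options: dict,
    context_options: dict,
    cookies: Optional[list] = None,
    init_scripts: tuple = (HIDE_WEBDRIVER_JS,),
) -> tuple[Any, Any]:
    # 1. Launch Chromium with the configured options.
    browser = await playwright.chromium.launch(**launch_options)
    # 2. Create a context; fingerprint fields (user_agent, viewport, ...) and
    #    an optional proxy dict are passed through context_options.
    context = await browser.new_context(**context_options)
    # 3. Inject session cookies, if any.
    if cookies:
        await context.add_cookies(cookies)
    # 4. Register anti-detection scripts; they run before any page script.
    for script in init_scripts:
        await context.add_init_script(script=script)
    return browser, context
```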

async cleanup(browser, context)

Close context and browser process.

Parameters:
  • browser (Browser) – The Playwright-managed browser to close.

  • context (BrowserContext) – The browser context to close.

Return type:

None
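Closing the context before the browser, and closing the browser even if the first step fails, keeps a managed Chromium process from leaking. A minimal sketch assuming that ordering; `managed_cleanup` is a hypothetical name.

```python
import asyncio
from typing import Any


async def managed_cleanup(browser: Any, context: Any) -> None:
    """Close the context (and its pages), then the browser process."""
    try:
        await context.close()
    finally:
        # Runs even if closing the context raised, so Chromium is not leaked.
        await browser.close()
```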

async apply_session_storage(page)

Apply localStorage and sessionStorage to a page.

If session_data is configured with storage data, navigates the page to the session’s base_url and injects the stored key-value pairs.

Parameters:

page – A Playwright Page instance to configure.

Return type:

None
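Web Storage is scoped per origin, which is why the page must first be on the session's base_url. A sketch using Playwright's `Page.goto` and `Page.evaluate`; the `local_storage`/`session_storage` parameter names are hypothetical, because the `Session` field names are not documented here, and the early return when both are empty is an assumption.

```python
import asyncio
from typing import Any

# JS run in the page: write every key/value into the two Web Storage areas.
_STORAGE_JS = """([local, session]) => {
  for (const [k, v] of Object.entries(local)) localStorage.setItem(k, v);
  for (const [k, v] of Object.entries(session)) sessionStorage.setItem(k, v);
}"""


async def apply_storage(page: Any, base_url: str, local_storage: dict, session_storage: dict) -> None:
    """Navigate to base_url, then inject the stored key/value pairs."""
    if not local_storage and not session_storage:
        return  # Nothing to inject, so skip navigation entirely.
    # Storage is per origin: the page must be on the target site first.
    await page.goto(base_url)
    await page.evaluate(_STORAGE_JS, [local_storage, session_storage])
```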