Browser Backends
Strategy pattern for browser lifecycle management.
BrowserBackend (ABC)
Abstract base class for browser backends.
All browser lifecycle management (starting, configuring, and tearing
down the browser) is handled through this interface. AsyncScraper
delegates to a concrete backend chosen at construction time.
- class intelliscraper.browser.backend.BrowserBackend
Bases: ABC
Abstract base for browser lifecycle management.
Subclasses handle how a browser is started, configured, and cleaned up. The scraper delegates all browser-level concerns here.
Typical usage:
    backend = ManagedBrowserBackend(headless=True, ...)
    browser, context = await backend.initialize(playwright)
    # ... use browser / context ...
    await backend.cleanup(browser, context)
- abstractmethod async initialize(playwright)
Start the browser and return a ready-to-use context.
- Parameters:
playwright (Playwright) – A started Playwright instance.
- Returns:
A (browser, context) tuple.
- Raises:
LocalBrowserConnectionError – If the local browser cannot be reached (only from LocalBrowserBackend).
- Return type:
tuple[Browser, BrowserContext]
- abstractmethod async cleanup(browser, context)
Release browser resources.
What gets closed depends on the backend:
Local: Only the pages opened by the scraper are closed. The Chrome process and its context are left running.
Managed: Pages, context, and browser process are all closed.
- Parameters:
browser (Browser) – The browser instance to clean up.
context (BrowserContext) – The browser context to clean up.
- Return type:
None
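The initialize/cleanup contract above can be illustrated with a small, self-contained sketch. It does not use the real intelliscraper classes: `Backend`, `RecordingBackend`, and `run_with_backend` are hypothetical stand-ins, and the "browser" and "context" values are plain strings. It only shows how a caller might pair the two calls so that cleanup runs even if the work in between fails.

```python
import asyncio
from abc import ABC, abstractmethod


class Backend(ABC):
    """Stand-in for BrowserBackend: the same two-method lifecycle contract."""

    @abstractmethod
    async def initialize(self, playwright):
        """Return a (browser, context) tuple."""

    @abstractmethod
    async def cleanup(self, browser, context):
        """Release whatever initialize() acquired."""


class RecordingBackend(Backend):
    """Hypothetical backend that records calls instead of driving a browser."""

    def __init__(self):
        self.calls = []

    async def initialize(self, playwright):
        self.calls.append("initialize")
        return "browser", "context"  # placeholders for real Playwright objects

    async def cleanup(self, browser, context):
        self.calls.append("cleanup")


async def run_with_backend(backend, playwright, work):
    """Run work(browser, context) and always call cleanup afterwards."""
    browser, context = await backend.initialize(playwright)
    try:
        return await work(browser, context)
    finally:
        # Runs on success and on failure, so resources are never leaked.
        await backend.cleanup(browser, context)
```

Because the caller depends only on the abstract interface, swapping one backend for another would not change `run_with_backend`.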
LocalBrowserBackend
Local browser backend: connects to Chrome via CDP.
Connects to a user’s already-running Chrome instance via the Chrome
DevTools Protocol (CDP) on a configurable port. All existing cookies,
logins, and sessions in that browser are immediately available; no
session_data or authentication code is needed.
Before using this backend, either:
Start Chrome manually:
    google-chrome \
        --remote-debugging-port=9222 \
        --user-data-dir="$HOME/.config/google-chrome-debug" \
        --profile-directory="Default"
Or let the backend auto-launch Chrome (it will attempt to find and start Chrome with the debug profile if the port is not already open).
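How the backend itself detects an open debug port is not shown in this reference. As a rough sketch, a plain TCP probe can tell whether something is listening on the port, and a polling loop can wait for it up to a time limit. The function names `is_port_open` and `wait_for_port` and the `max_wait_sec` parameter are assumptions made for illustration, not part of the library. A successful connection only shows that some process accepts connections on the port, not that it is Chrome.

```python
import socket
import time


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts alike.
        return False


def wait_for_port(host: str, port: int, max_wait_sec: float,
                  interval: float = 0.1) -> bool:
    """Poll until the port accepts connections or max_wait_sec elapses."""
    deadline = time.monotonic() + max_wait_sec
    while True:
        if is_port_open(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 9222, max_wait_sec=10)` would block for up to roughly ten seconds while a freshly started Chrome opens its debug port.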
Important
The debug profile at ~/.config/google-chrome-debug is separate
from your default Chrome profile. You must log into target sites
(e.g. LinkedIn) in this profile before scraping. Use:
make chrome-debug-login URL=https://www.linkedin.com
to open Chrome with this profile and log in.
- class intelliscraper.browser.local.LocalBrowserBackend(cdp_port=9222, headless=True, profile_dir=None)
Bases: BrowserBackend
Connect to an already-running Chrome instance via CDP.
This backend reuses the user’s real Chrome session, preserving all cookies, localStorage, and authenticated state. It is ideal for scraping sites that require complex authentication flows (e.g. LinkedIn, Gmail).
- Parameters:
- Raises:
LocalBrowserConnectionError – If Chrome is not reachable after _CDP_MAX_WAIT_SEC seconds.
Example:
    backend = LocalBrowserBackend(cdp_port=9222, headless=False)
    browser, context = await backend.initialize(playwright)
- async initialize(playwright)
Connect to Chrome via CDP and return browser + context.
If Chrome is not already running on the configured port, the backend will attempt to auto-launch it using the debug profile.
- Parameters:
playwright (Playwright) – A started Playwright instance.
- Returns:
A (browser, context) tuple. The context is the first existing context found in the CDP browser (preserving all cookies and logins).
- Raises:
LocalBrowserConnectionError – If Chrome cannot be reached.
- Return type:
tuple[Browser, BrowserContext]
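The connection step described above might look roughly like the following sketch. It is not the backend's actual code. `connect_local` is a hypothetical helper, the `http://localhost:<port>` endpoint format is an assumption, and a plain `RuntimeError` stands in for LocalBrowserConnectionError. The two Playwright features it relies on, `chromium.connect_over_cdp(...)` and the `browser.contexts` list, are part of Playwright's Python API.

```python
import asyncio


async def connect_local(playwright, cdp_port: int = 9222):
    """Attach to a running Chrome over CDP and reuse its first context."""
    endpoint = f"http://localhost:{cdp_port}"
    # connect_over_cdp attaches to an existing browser instead of launching one.
    browser = await playwright.chromium.connect_over_cdp(endpoint)
    if not browser.contexts:
        # Stand-in error; the real backend raises LocalBrowserConnectionError.
        raise RuntimeError(f"No browser context found at {endpoint}")
    # The first existing context carries the user's cookies and logins.
    return browser, browser.contexts[0]
```

With a real Playwright installation, the `playwright` argument would be the object yielded by `async with async_playwright() as playwright:`.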
- async cleanup(browser, context)
Clean up local browser resources.
In CDP mode the Chrome process and context belong to the user, so they are not closed. Only a debug log entry is emitted.
- Parameters:
browser (Browser) – The CDP-connected browser (not closed).
context (BrowserContext) – The reused browser context (not closed).
- Return type:
None
ManagedBrowserBackend
Managed browser backend: launches Chromium via Playwright.
Launches a fresh Chromium instance managed entirely by the scraper. Applies fingerprint spoofing, proxy configuration, anti-detection scripts, and session cookies/storage when provided.
This is the default backend used when use_local_browser=False.
- class intelliscraper.browser.managed.ManagedBrowserBackend(headless=True, browser_launch_options=None, proxy=None, session_data=None)
Bases: BrowserBackend
Launch and manage a Chromium browser instance via Playwright.
Handles the full lifecycle of a Playwright-managed browser: launching, configuring the context (fingerprint, proxy, cookies, anti-detection scripts), and tearing everything down on cleanup.
- Parameters:
headless (bool) – Run browser without a visible UI. Defaults to True.
browser_launch_options (dict | None) – Custom Chromium launch options. Merged with the headless flag. Defaults to BROWSER_LAUNCH_OPTIONS.
proxy (Proxy | None) – Proxy configuration for network requests. Applied at the browser-context level so all pages share the same proxy. Defaults to None.
session_data (Session | None) – Pre-authenticated session with cookies, localStorage, sessionStorage, and browser fingerprint. Defaults to None.
Example:
    backend = ManagedBrowserBackend(
        headless=True,
        proxy=my_proxy,
        session_data=my_session,
    )
    browser, context = await backend.initialize(playwright)
- async initialize(playwright)
Launch Chromium and create a fully configured context.
Steps performed:
Launch Chromium with the configured options.
Create a browser context with fingerprint spoofing and optional proxy.
Inject session cookies (if provided).
Apply anti-detection JavaScript scripts.
- Parameters:
playwright (Playwright) – A started Playwright instance.
- Returns:
A (browser, context) tuple.
- Return type:
tuple[Browser, BrowserContext]
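The four steps listed for initialize can be sketched with Playwright's public API. This is an illustration under assumptions, not the backend's implementation. `launch_managed` and its parameters are hypothetical, fingerprint spoofing is reduced to a single `user_agent` override, and the anti-detection script is whatever string the caller passes in. The Playwright calls used (`chromium.launch`, `browser.new_context`, `context.add_cookies`, `context.add_init_script`) exist in Playwright's Python API.

```python
import asyncio


async def launch_managed(playwright, headless=True, proxy=None, cookies=None,
                         user_agent=None, init_script=None):
    """Launch Chromium and build a context following the four documented steps."""
    # 1. Launch Chromium with the configured options.
    browser = await playwright.chromium.launch(headless=headless)

    # 2. Create a context, passing only the options that were provided.
    context_options = {}
    if user_agent is not None:
        context_options["user_agent"] = user_agent  # simplified "fingerprint"
    if proxy is not None:
        context_options["proxy"] = proxy  # e.g. {"server": "http://host:port"}
    context = await browser.new_context(**context_options)

    # 3. Inject session cookies, if any were supplied.
    if cookies:
        await context.add_cookies(cookies)

    # 4. Register a script that runs in every new document before page scripts.
    if init_script:
        await context.add_init_script(init_script)

    return browser, context
```

Setting the proxy on the context rather than on individual pages matches the parameter description above: every page opened from that context shares it.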
- async cleanup(browser, context)
Close context and browser process.
- Parameters:
browser (Browser) – The Playwright-managed browser to close.
context (BrowserContext) – The browser context to close.
- Return type:
None
- async apply_session_storage(page)
Apply localStorage and sessionStorage to a page.
If session_data is configured with storage data, navigates the page to the session’s base_url and injects the stored key-value pairs.
- Parameters:
page – A Playwright Page instance to configure.
- Return type:
None
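One plausible way to perform the navigation and injection described above is sketched below. The shape of the session object is not documented here, so the helper takes `base_url`, `local_storage`, and `session_storage` as separate arguments. Those names, and `apply_storage` itself, are assumptions. `page.goto` and `page.evaluate(expression, arg)` are part of Playwright's Python API.

```python
import asyncio

# JavaScript run in the page: copy each key-value pair into the named storage.
_SET_ITEMS_JS = """
([storageName, items]) => {
    const storage = window[storageName];
    for (const [key, value] of Object.entries(items)) {
        storage.setItem(key, value);
    }
}
"""


async def apply_storage(page, base_url, local_storage=None, session_storage=None):
    """Navigate to base_url, then inject localStorage/sessionStorage entries."""
    if not local_storage and not session_storage:
        return  # nothing to apply, so skip the navigation as well
    # Web storage is scoped per origin, so the page must be on that origin first.
    await page.goto(base_url)
    if local_storage:
        await page.evaluate(_SET_ITEMS_JS, ["localStorage", local_storage])
    if session_storage:
        await page.evaluate(_SET_ITEMS_JS, ["sessionStorage", session_storage])
```

Navigating before injecting matters because localStorage and sessionStorage belong to an origin: values written on a blank page would not be visible to the target site.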