Data Models

Pydantic data models used throughout IntelliScraper.

Defines the core data structures used throughout the library:

RequestEvent / SessionStats — time-series request tracking.
Session — browser session with cookies, storage, and fingerprint.
Proxy — proxy server configuration.
ScrapeRequest — input parameters for a scrape operation.
ScrapeResponse — output of a scrape operation with enriched metadata.

class intelliscraper.common.models.RequestEvent(*, sent_at, request_status)

Bases: BaseModel

A single scraping request event in time-series format.

Each event captures when a request was made and its outcome, enabling audit trails and performance analysis.

Parameters:

sent_at (float)
request_status (ScrapStatus)

sent_at

Unix timestamp when this request was sent.

Type: float

request_status

Outcome status of the scraping request.

Type: intelliscraper.enums.ScrapStatus

sent_at: float

request_status: ScrapStatus

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
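Because each RequestEvent pairs a timestamp with an outcome, a list of events can be sliced by time window for performance analysis. The sketch below illustrates that idea under stated assumptions: it uses stand-in classes so it runs without the library installed, the `ScrapStatus` members shown are guesses at the real enum in `intelliscraper.enums`, and `success_rate` is a hypothetical helper, not part of IntelliScraper.

```python
import time
from dataclasses import dataclass
from enum import Enum


class ScrapStatus(str, Enum):
    # Assumed members; the real enum lives in intelliscraper.enums.
    SUCCESS = "success"
    FAILED = "failed"


@dataclass(frozen=True)
class RequestEvent:
    # Stand-in mirroring the two documented fields.
    sent_at: float
    request_status: ScrapStatus


def success_rate(events, since):
    """Share of successful events sent at or after `since` (Unix seconds).

    Returns None when no event falls inside the window.
    """
    recent = [e for e in events if e.sent_at >= since]
    if not recent:
        return None
    ok = sum(1 for e in recent if e.request_status is ScrapStatus.SUCCESS)
    return ok / len(recent)


now = time.time()
events = [
    RequestEvent(sent_at=now - 120, request_status=ScrapStatus.FAILED),
    RequestEvent(sent_at=now - 30, request_status=ScrapStatus.SUCCESS),
    RequestEvent(sent_at=now - 10, request_status=ScrapStatus.SUCCESS),
    RequestEvent(sent_at=now - 5, request_status=ScrapStatus.FAILED),
]

# Only the three events from the last 60 seconds are considered.
rate_last_minute = success_rate(events, since=now - 60)
```

With the library installed, the stand-ins would presumably be replaced by imports from `intelliscraper.common.models` and `intelliscraper.enums`.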

class intelliscraper.common.models.SessionStats(*, request_events=<factory>)

Bases: BaseModel

Thread-safe statistics collector for scraping sessions.

Maintains a time-series log of all request events and provides computed statistics about success rates, failures, and performance. All operations are thread-safe via an internal Lock.

Parameters: request_events (list[RequestEvent])

request_events

Chronological list of all request events.

Type: list[intelliscraper.common.models.RequestEvent]

model_config = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

request_events: list[RequestEvent]

add_request_event(request_event)

Add a request event to the log in a thread-safe manner.

Parameters: request_event (RequestEvent) – The RequestEvent to record.
Return type: None

property stats: dict[str, int]

Get a breakdown of all request statuses.

Returns:

Dictionary mapping status names to counts, e.g.:

{"success": 42, "partial_success": 3, "failed": 1, ...}

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self (BaseModel) – The BaseModel instance.
context (Any) – The context.

Return type:

None
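The pattern SessionStats describes (an append-only event log guarded by a lock, with counts computed on demand) can be sketched with the standard library alone. This is an illustration of the technique, not the library's implementation: `SessionStatsSketch` is a hypothetical stand-in, events are simplified to `(sent_at, status)` tuples, and the status strings are taken from the example dictionary above.

```python
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


class SessionStatsSketch:
    """Stand-in showing a lock-guarded event log with computed counts."""

    def __init__(self):
        self.request_events = []        # (sent_at, status) pairs in arrival order
        self._lock = threading.Lock()   # guards every access to the list

    def add_request_event(self, sent_at, status):
        # Appending under the lock keeps concurrent writers from interleaving.
        with self._lock:
            self.request_events.append((sent_at, status))

    @property
    def stats(self):
        # Take a snapshot under the lock, then count outside it.
        with self._lock:
            statuses = [status for _, status in self.request_events]
        return dict(Counter(statuses))


collector = SessionStatsSketch()


def record(i):
    # Every tenth request is marked as failed, the rest as successful.
    status = "failed" if i % 10 == 0 else "success"
    collector.add_request_event(float(i), status)


# Eight worker threads write to the same collector concurrently.
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(record, range(100)))

summary = collector.stats
```

Copying the statuses while holding the lock and counting afterwards keeps the critical section short, which matters when many pages report results at once.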

class intelliscraper.common.models.Session(*, site, base_url, cookies=<factory>, localStorage=None, sessionStorage=None, fingerprint=None, stats=<factory>)

Bases: BaseModel

Browser session data for authenticated scraping.

Captures all state needed to resume an authenticated browser session: cookies, localStorage, sessionStorage, and a browser fingerprint for anti-detection.

Parameters:

site (str)
base_url (str)
cookies (list[dict])
localStorage (dict | None)
sessionStorage (dict | None)
fingerprint (dict | None)
stats (SessionStats)

site

Identifier for the target site (e.g. "linkedin").

Type: str

base_url

The base URL used for scraping.

Type: str

cookies

List of cookie dicts captured from the session.

Type: list[dict]

localStorage

Key-value pairs from the browser’s localStorage.

Type: dict | None

sessionStorage

Key-value pairs from the browser’s sessionStorage.

Type: dict | None

fingerprint

Browser fingerprint data for anti-detection.

Type: dict | None

stats

Time-series event log and computed statistics.

Type: intelliscraper.common.models.SessionStats

site: str

base_url: str

cookies: list[dict]

localStorage: dict | None

sessionStorage: dict | None

fingerprint: dict | None

stats: SessionStats

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
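Since a Session exists to resume a saved browser state, a common need is to persist it and load it back. The sketch below round-trips a session-shaped dictionary through JSON and fills in the documented defaults for optional fields. `load_session_payload` is a hypothetical helper, the cookie keys and values are illustrative, and treating the `cookies` factory default as an empty list is an assumption based on its `list[dict]` type.

```python
import json

# A payload shaped like the documented Session fields (stats omitted).
session_payload = {
    "site": "linkedin",
    "base_url": "https://www.linkedin.com",
    "cookies": [
        {"name": "sessionid", "value": "abc123", "domain": ".linkedin.com", "path": "/"}
    ],
    "localStorage": {"theme": "dark"},
    "sessionStorage": None,
    "fingerprint": None,
}


def load_session_payload(text):
    """Parse saved session JSON and apply defaults for optional fields."""
    data = json.loads(text)
    missing = [key for key in ("site", "base_url") if key not in data]
    if missing:
        raise ValueError(f"missing required session fields: {missing}")
    return {
        "site": data["site"],
        "base_url": data["base_url"],
        "cookies": data.get("cookies", []),   # assumed default: empty list
        "localStorage": data.get("localStorage"),
        "sessionStorage": data.get("sessionStorage"),
        "fingerprint": data.get("fingerprint"),
    }


saved = json.dumps(session_payload)
restored = load_session_payload(saved)
```

With the library installed, the restored dictionary should be usable as keyword arguments, for example `Session(**restored)`, because the constructor accepts exactly these keyword-only fields.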

class intelliscraper.common.models.Proxy(*, server, bypass=None, username=None, password=None)

Bases: BaseModel

Proxy configuration for network requests.

Applied at the browser-context level in managed browser mode only. All pages within a scraper instance share the same proxy.

Not used in local browser mode — the user’s Chrome instance manages its own network configuration.

Parameters:

server (str)
bypass (str | None)
username (str | None)
password (str | None)

server

Proxy server URL (e.g. http://myproxy.com:3128).

Type: str

bypass

Comma-separated domains to bypass the proxy.

Type: str | None

username

Proxy authentication username.

Type: str | None

password

Proxy authentication password.

Type: str | None

server: str

bypass: str | None

username: str | None

password: str | None

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
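As a rough illustration of how these four fields fit together, the sketch below builds a proxy configuration, drops the unset optional fields, and splits the comma-separated `bypass` string into individual domains. The `Proxy` class here is a stand-in dataclass so the snippet runs without the library, and `proxy_settings` and `bypass_domains` are hypothetical helpers; how IntelliScraper passes the configuration to the browser is not shown in this reference.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Proxy:
    # Stand-in mirroring the documented fields and defaults.
    server: str
    bypass: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None


def proxy_settings(proxy):
    """Hypothetical helper: keep only the fields that were actually set."""
    return {key: value for key, value in asdict(proxy).items() if value is not None}


def bypass_domains(proxy):
    """Hypothetical helper: split the comma-separated bypass string."""
    if not proxy.bypass:
        return []
    return [domain.strip() for domain in proxy.bypass.split(",") if domain.strip()]


proxy = Proxy(
    server="http://myproxy.com:3128",
    bypass="localhost, .internal.example",
    username="scraper",
    password="s3cret",
)

settings = proxy_settings(proxy)
domains = bypass_domains(proxy)
```

An unauthenticated proxy needs only `server`; the other keys simply drop out of the resulting dictionary.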

class intelliscraper.common.models.ScrapeRequest(*, url, timeout, browser_launch_options=None, proxy=None, session_data=None, browsing_mode=None)

Bases: BaseModel

Input configuration for a single scraping request.

Captures all parameters used to initiate a scrape, enabling full traceability from request to response.

Parameters:

url (str)
timeout (timedelta)
browser_launch_options (dict | None)
proxy (Proxy | None)
session_data (Session | None)
browsing_mode (BrowsingMode | None)

url

The target URL to scrape.

Type: str

timeout

Maximum time allowed for page load.

Type: datetime.timedelta

browser_launch_options

Options used to launch the browser.

Type: dict | None

proxy

Proxy configuration used, if any.

Type: intelliscraper.common.models.Proxy | None

session_data

Session information used, if any.

Type: intelliscraper.common.models.Session | None

browsing_mode

Browser behaviour mode (FAST or HUMAN_LIKE).

Type: intelliscraper.enums.BrowsingMode | None

url: str

timeout: timedelta

browser_launch_options: dict | None

proxy: Proxy | None

session_data: Session | None

browsing_mode: BrowsingMode | None

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
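One detail worth illustrating is that `timeout` is a `datetime.timedelta`, not a bare number of seconds. The sketch below builds a request-shaped object with only the two required fields and converts the timeout to whole milliseconds, a unit many browser automation APIs expect. `ScrapeRequestSketch` is a stand-in dataclass and `timeout_ms` is a hypothetical helper; whether IntelliScraper performs such a conversion internally is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class ScrapeRequestSketch:
    # Stand-in mirroring the documented fields; only url and timeout are required.
    url: str
    timeout: timedelta
    browser_launch_options: Optional[dict] = None
    proxy: Optional[object] = None
    session_data: Optional[object] = None
    browsing_mode: Optional[str] = None


def timeout_ms(timeout: timedelta) -> int:
    """Hypothetical helper: convert a positive timedelta to whole milliseconds."""
    if timeout <= timedelta(0):
        raise ValueError("timeout must be positive")
    # Dividing one timedelta by another with // yields an exact integer.
    return timeout // timedelta(milliseconds=1)


request = ScrapeRequestSketch(
    url="https://example.com/products",
    timeout=timedelta(seconds=30),
)
page_load_budget = timeout_ms(request.timeout)
```

Using a `timedelta` makes the unit explicit at the call site, so `timedelta(seconds=30)` and `timedelta(minutes=0.5)` describe the same limit without ambiguity.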

class intelliscraper.common.models.ScrapeResponse(*, scrape_request, status, http_status_code=None, elapsed_time=None, scrap_html_content=None, error_msg=None, session_id=None, browser_mode=None)

Bases: BaseModel

Output of a web scraping operation with enriched metadata.

Contains the scraped content, timing information, HTTP status, and metadata about which session and browser mode were used.

Parameters:

scrape_request (ScrapeRequest)
status (ScrapStatus)
http_status_code (int | None)
elapsed_time (float | None)
scrap_html_content (str | None)
error_msg (str | None)
session_id (str | None)
browser_mode (str | None)

scrape_request

The original request parameters.

Type: intelliscraper.common.models.ScrapeRequest

status

Final outcome status of the scrape.

Type: intelliscraper.enums.ScrapStatus

http_status_code

Actual HTTP status code returned by the server (e.g. 200, 403, 429). None if the request failed before receiving a response.

Type: int | None

elapsed_time

Total scrape duration in seconds.

Type: float | None

scrap_html_content

Raw HTML content from the page.

Type: str | None

error_msg

Error message if the scrape failed.

Type: str | None

session_id

Identifier of the session used (the site field from Session), or None if no session.

Type: str | None

browser_mode

Which browser backend was used: "local_browser" or "managed_browser".

Type: str | None

scrape_request: ScrapeRequest

status: ScrapStatus

http_status_code: int | None

elapsed_time: float | None

scrap_html_content: str | None

error_msg: str | None

session_id: str | None

browser_mode: str | None

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
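Most fields on a ScrapeResponse are optional, so calling code has to handle `None` before acting on them. The sketch below shows one possible way to branch on `http_status_code` and `scrap_html_content`. `ScrapeResponseSketch` is a stand-in that omits `scrape_request`, `session_id`, and `browser_mode`, it uses a plain string where the real model uses `ScrapStatus`, and both `next_action` and the action names it returns are hypothetical policy choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScrapeResponseSketch:
    # Stand-in with a subset of the documented fields.
    status: str
    http_status_code: Optional[int] = None
    elapsed_time: Optional[float] = None
    scrap_html_content: Optional[str] = None
    error_msg: Optional[str] = None


def next_action(response):
    """Hypothetical policy: decide what to do with a finished scrape."""
    if response.http_status_code is None:
        # The request failed before any HTTP response arrived.
        return "retry"
    if response.http_status_code == 429:
        return "back_off"
    if response.http_status_code == 403:
        return "rotate_session"
    if response.scrap_html_content:
        return "parse"
    return "inspect_error"


ok = ScrapeResponseSketch(
    status="success",
    http_status_code=200,
    elapsed_time=1.8,
    scrap_html_content="<html><body>Hello</body></html>",
)
throttled = ScrapeResponseSketch(status="failed", http_status_code=429, error_msg="Too Many Requests")
dropped = ScrapeResponseSketch(status="failed", error_msg="connection reset")

actions = [next_action(r) for r in (ok, throttled, dropped)]
```

Checking for a missing status code first separates network-level failures from server-side rejections, which usually call for different recovery steps.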