Data Models

Pydantic data models used throughout IntelliScraper.

Defines the core data structures used throughout the library:

  • RequestEvent / SessionStats — time-series request tracking.

  • Session — browser session with cookies, storage, and fingerprint.

  • Proxy — proxy server configuration.

  • ScrapeRequest — input parameters for a scrape operation.

  • ScrapeResponse — output of a scrape operation with enriched metadata.

class intelliscraper.common.models.RequestEvent(*, sent_at, request_status)

Bases: BaseModel

A single scraping request event in time-series format.

Each event captures when a request was made and its outcome, enabling audit trails and performance analysis.
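
The sketch below makes the time-series idea concrete. A stdlib dataclass stands in for RequestEvent (the real class is a Pydantic model). Two metrics are computed from a log of events: how many requests were sent in a recent window, and what fraction failed. The ScrapStatus enum is a hypothetical stand-in that contains only the values appearing in the stats example on this page. The helpers requests_in_window and failure_rate are illustrative and not part of IntelliScraper.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from enum import Enum


class ScrapStatus(str, Enum):
    # Hypothetical stand-in for intelliscraper.enums.ScrapStatus; only the
    # values shown in this page's `stats` example are included.
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILED = "failed"


@dataclass(frozen=True)
class RequestEvent:
    # Stand-in mirroring the two documented fields.
    sent_at: float  # Unix timestamp, e.g. from time.time()
    request_status: ScrapStatus


def requests_in_window(
    events: list[RequestEvent], window_s: float, now: float | None = None
) -> int:
    """Count events sent during the last `window_s` seconds (hypothetical helper)."""
    now = time.time() if now is None else now
    return sum(1 for e in events if now - window_s <= e.sent_at <= now)


def failure_rate(events: list[RequestEvent]) -> float:
    """Fraction of events with FAILED status; 0.0 for an empty log."""
    if not events:
        return 0.0
    failed = sum(1 for e in events if e.request_status is ScrapStatus.FAILED)
    return failed / len(events)


# Usage with a fixed "now" so the result is deterministic.
now = 1_700_000_000.0
log = [
    RequestEvent(now - 120, ScrapStatus.SUCCESS),
    RequestEvent(now - 30, ScrapStatus.FAILED),
    RequestEvent(now - 5, ScrapStatus.SUCCESS),
]
print(requests_in_window(log, 60, now=now))  # 2
```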

Parameters:
  • sent_at (float)

  • request_status (ScrapStatus)

sent_at

Unix timestamp when this request was sent.

Type:

float

request_status

Outcome status of the scraping request.

Type:

intelliscraper.enums.ScrapStatus

sent_at: float
request_status: ScrapStatus
model_config = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class intelliscraper.common.models.SessionStats(*, request_events=<factory>)

Bases: BaseModel

Thread-safe statistics collector for scraping sessions.

Maintains a time-series log of all request events and provides computed statistics about success rates, failures, and performance. All operations are thread-safe via an internal Lock.
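
The following is a minimal sketch of the locking pattern described above. It uses a plain dataclass in place of the Pydantic model. Appends and the status count both run under one per-instance threading.Lock, so concurrent workers cannot corrupt the list. The ScrapStatus and RequestEvent classes are hypothetical simplifications. In the Pydantic version the lock is presumably a private attribute created through model_post_init; here a default_factory field with init=False plays that role.

```python
from __future__ import annotations

import threading
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class ScrapStatus(str, Enum):
    # Hypothetical stand-in for intelliscraper.enums.ScrapStatus.
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILED = "failed"


@dataclass(frozen=True)
class RequestEvent:
    sent_at: float
    request_status: ScrapStatus


@dataclass
class SessionStats:
    request_events: list[RequestEvent] = field(default_factory=list)
    # Per-instance lock, kept out of __init__, repr and equality checks.
    _lock: threading.Lock = field(
        default_factory=threading.Lock, init=False, repr=False, compare=False
    )

    def add_request_event(self, request_event: RequestEvent) -> None:
        # Serialize appends so concurrent workers cannot interleave writes.
        with self._lock:
            self.request_events.append(request_event)

    @property
    def stats(self) -> dict[str, int]:
        # Copy the log under the lock, then count events by status value.
        with self._lock:
            snapshot = list(self.request_events)
        return dict(Counter(e.request_status.value for e in snapshot))


# Usage: four threads record 100 events each against one shared collector.
session_stats = SessionStats()


def record(status: ScrapStatus) -> None:
    for i in range(100):
        session_stats.add_request_event(RequestEvent(float(i), status))


workers = [
    threading.Thread(target=record, args=(status,))
    for status in (ScrapStatus.SUCCESS,) * 3 + (ScrapStatus.FAILED,)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(session_stats.stats)  # success: 300, failed: 100 (key order may vary)
```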

Parameters:

request_events (list[RequestEvent])

request_events

Chronological list of all request events.

Type:

list[intelliscraper.common.models.RequestEvent]

model_config = {'arbitrary_types_allowed': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

request_events: list[RequestEvent]
add_request_event(request_event)

Add a request event to the log in a thread-safe manner.

Parameters:

request_event (RequestEvent) – The RequestEvent to record.

Return type:

None

property stats: dict[str, int]

Get a breakdown of all request statuses.

Returns:

Dictionary mapping status names to counts, e.g.:

{"success": 42, "partial_success": 3, "failed": 1, ...}

model_post_init(context, /)

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:
  • self (BaseModel) – The BaseModel instance.

  • context (Any) – The context.

Return type:

None

class intelliscraper.common.models.Session(*, site, base_url, cookies=<factory>, localStorage=None, sessionStorage=None, fingerprint=None, stats=<factory>)

Bases: BaseModel

Browser session data for authenticated scraping.

Captures all state needed to resume an authenticated browser session: cookies, localStorage, sessionStorage, and a browser fingerprint for anti-detection.
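
A Session holds only JSON-friendly data (strings, lists and dicts), so it can be saved after login and reloaded later. The sketch below shows that round trip with a stdlib dataclass stand-in. The stand-in mirrors the documented fields but omits stats for brevity. save_session and load_session are hypothetical helpers. With a Pydantic v2 model, the equivalent calls would presumably be session.model_dump_json() and Session.model_validate_json(raw).

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class Session:
    # Stand-in mirroring the documented fields; `stats` is omitted for brevity.
    site: str
    base_url: str
    cookies: list[dict] = field(default_factory=list)
    localStorage: dict | None = None
    sessionStorage: dict | None = None
    fingerprint: dict | None = None


def save_session(session: Session) -> str:
    """Serialize captured browser state to a JSON string (hypothetical helper)."""
    return json.dumps(asdict(session))


def load_session(raw: str) -> Session:
    """Rebuild a Session from the output of save_session (hypothetical helper)."""
    return Session(**json.loads(raw))


original = Session(
    site="example",
    base_url="https://example.com",
    cookies=[{"name": "sid", "value": "abc123", "domain": "example.com", "path": "/"}],
    localStorage={"theme": "dark"},
)
restored = load_session(save_session(original))
print(restored == original)  # True
```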

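Parameters:
  • site (str)

  • base_url (str)

  • cookies (list[dict])

  • localStorage (dict | None)

  • sessionStorage (dict | None)

  • fingerprint (dict | None)

  • stats (SessionStats)

site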

Identifier for the target site (e.g. "linkedin").

Type:

str

base_url

The base URL used for scraping.

Type:

str

cookies

List of cookie dicts captured from the session.

Type:

list[dict]

localStorage

Key-value pairs from the browser’s localStorage.

Type:

dict | None

sessionStorage

Key-value pairs from the browser’s sessionStorage.

Type:

dict | None

fingerprint

Browser fingerprint data for anti-detection.

Type:

dict | None

stats

Time-series event log and computed statistics.

Type:

intelliscraper.common.models.SessionStats

site: str
base_url: str
cookies: list[dict]
localStorage: dict | None
sessionStorage: dict | None
fingerprint: dict | None
stats: SessionStats
model_config = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class intelliscraper.common.models.Proxy(*, server, bypass=None, username=None, password=None)

Bases: BaseModel

Proxy configuration for network requests.

Applied at the browser-context level in managed browser mode only. All pages within a scraper instance share the same proxy.

Not used in local browser mode — the user’s Chrome instance manages its own network configuration.
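
The Proxy fields share their names with the keys of Playwright's proxy settings (server, bypass, username, password). Playwright accepts those settings once per context through browser.new_context(proxy=...). This page does not say which library drives the managed browser. If it is Playwright or something with a similar interface, a Proxy could be converted as sketched below. to_context_proxy is a hypothetical helper that drops unset optional fields, and the dataclass is a stand-in for the Pydantic model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Proxy:
    # Stand-in mirroring the documented Proxy fields.
    server: str
    bypass: str | None = None
    username: str | None = None
    password: str | None = None


def to_context_proxy(proxy: Proxy | None) -> dict[str, str] | None:
    """Build a Playwright-style proxy dict, leaving out unset optional fields."""
    if proxy is None:
        return None  # no proxy configured
    settings: dict[str, str] = {"server": proxy.server}
    for key in ("bypass", "username", "password"):
        value = getattr(proxy, key)
        if value is not None:
            settings[key] = value
    return settings


print(to_context_proxy(Proxy(server="http://myproxy.com:3128", bypass="localhost,.internal.example")))
# {'server': 'http://myproxy.com:3128', 'bypass': 'localhost,.internal.example'}
```

Building the dict once at context creation fits the behaviour described above: every page in a scraper instance shares the same proxy.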

Parameters:
  • server (str)

  • bypass (str | None)

  • username (str | None)

  • password (str | None)

server

Proxy server URL (e.g. http://myproxy.com:3128).

Type:

str

bypass

Comma-separated list of domains whose requests bypass the proxy.

Type:

str | None

username

Proxy authentication username.

Type:

str | None

password

Proxy authentication password.

Type:

str | None

server: str
bypass: str | None
username: str | None
password: str | None
model_config = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class intelliscraper.common.models.ScrapeRequest(*, url, timeout, browser_launch_options=None, proxy=None, session_data=None, browsing_mode=None)

Bases: BaseModel

Input configuration for a single scraping request.

Captures all parameters used to initiate a scrape, enabling full traceability from request to response.

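Parameters:
  • url (str)

  • timeout (timedelta)

  • browser_launch_options (dict | None)

  • proxy (Proxy | None)

  • session_data (Session | None)

  • browsing_mode (BrowsingMode | None)

url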

The target URL to scrape.

Type:

str

timeout

Maximum time allowed for page load.

Type:

datetime.timedelta
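
Storing the timeout as a timedelta makes the unit explicit. Browser automation APIs often expect a plain number instead; Playwright's page.goto(timeout=...), for example, takes milliseconds. Below is a minimal conversion sketch, assuming a consumer that expects milliseconds. timeout_to_ms is a hypothetical helper, and rejecting negative durations is this sketch's own choice.

```python
from datetime import timedelta


def timeout_to_ms(timeout: timedelta) -> float:
    """Convert a ScrapeRequest-style timeout into milliseconds.

    Rejecting negative values is an assumption of this sketch.
    """
    if timeout < timedelta(0):
        raise ValueError("timeout must not be negative")
    return timeout.total_seconds() * 1000


print(timeout_to_ms(timedelta(seconds=30)))  # 30000.0
```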

browser_launch_options

Options used to launch the browser.

Type:

dict | None

proxy

Proxy configuration used, if any.

Type:

intelliscraper.common.models.Proxy | None

session_data

Session information used, if any.

Type:

intelliscraper.common.models.Session | None

browsing_mode

Browser behaviour mode (FAST or HUMAN_LIKE).

Type:

intelliscraper.enums.BrowsingMode | None

url: str
timeout: timedelta
browser_launch_options: dict | None
proxy: Proxy | None
session_data: Session | None
browsing_mode: BrowsingMode | None
model_config = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class intelliscraper.common.models.ScrapeResponse(*, scrape_request, status, http_status_code=None, elapsed_time=None, scrap_html_content=None, error_msg=None, session_id=None, browser_mode=None)

Bases: BaseModel

Output of a web scraping operation with enriched metadata.

Contains the scraped content, timing information, HTTP status, and metadata about which session and browser mode were used.
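
The sketch below shows how the timing, HTTP status, error and session fields fit together. A hypothetical fetch callable replaces a real browser, and a dataclass stands in for ScrapeResponse. The sketch follows two rules stated on this page: http_status_code stays None when the request fails before any response arrives, and session_id is the site value of the session used. Mapping HTTP codes to a "success" or "failed" status is an assumption for illustration; this page does not document the actual ScrapStatus logic.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScrapeResponseSketch:
    # Hypothetical stand-in for a subset of ScrapeResponse's documented fields.
    status: str
    http_status_code: int | None
    elapsed_time: float
    scrap_html_content: str | None
    error_msg: str | None
    session_id: str | None


def timed_scrape(
    fetch: Callable[[], tuple[int, str]], session_site: str | None = None
) -> ScrapeResponseSketch:
    """Run `fetch`, time it in seconds, and package the outcome (illustrative only)."""
    start = time.perf_counter()
    try:
        code, html = fetch()
    except Exception as exc:  # any error before a response counts as a failure here
        return ScrapeResponseSketch(
            status="failed",
            http_status_code=None,  # no response received, so no status code
            elapsed_time=time.perf_counter() - start,
            scrap_html_content=None,
            error_msg=str(exc),
            session_id=session_site,
        )
    # Assumed mapping: 4xx/5xx codes are failures; the real logic may differ.
    status = "success" if code < 400 else "failed"
    return ScrapeResponseSketch(
        status=status,
        http_status_code=code,
        elapsed_time=time.perf_counter() - start,
        scrap_html_content=html,
        error_msg=None,
        session_id=session_site,
    )


ok = timed_scrape(lambda: (200, "<html><body>hi</body></html>"), session_site="example")
print(ok.status, ok.http_status_code, ok.session_id)  # success 200 example
```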

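Parameters:
  • scrape_request (ScrapeRequest)

  • status (ScrapStatus)

  • http_status_code (int | None)

  • elapsed_time (float | None)

  • scrap_html_content (str | None)

  • error_msg (str | None)

  • session_id (str | None)

  • browser_mode (str | None)

scrape_request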

The original request parameters.

Type:

intelliscraper.common.models.ScrapeRequest

status

Final outcome status of the scrape.

Type:

intelliscraper.enums.ScrapStatus

http_status_code

Actual HTTP status code returned by the server (e.g. 200, 403, 429). None if the request failed before receiving a response.

Type:

int | None

elapsed_time

Total scrape duration in seconds.

Type:

float | None

scrap_html_content

Raw HTML content from the page.

Type:

str | None

error_msg

Error message if the scrape failed.

Type:

str | None

session_id

Identifier of the session used (the site field of Session), or None if no session was used.

Type:

str | None

browser_mode

Which browser backend was used: "local_browser" or "managed_browser".

Type:

str | None

scrape_request: ScrapeRequest
status: ScrapStatus
http_status_code: int | None
elapsed_time: float | None
scrap_html_content: str | None
error_msg: str | None
session_id: str | None
browser_mode: str | None
model_config = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.