Data Models
Pydantic data models used throughout IntelliScraper.
Defines the core data structures used throughout the library:
- RequestEvent / SessionStats: time-series request tracking.
- Session: browser session with cookies, storage, and fingerprint.
- Proxy: proxy server configuration.
- ScrapeRequest: input parameters for a scrape operation.
- ScrapeResponse: output of a scrape operation with enriched metadata.
- class intelliscraper.common.models.RequestEvent(*, sent_at, request_status)
Bases: BaseModel
A single scraping request event in time-series format.
Each event captures when a request was made and its outcome, enabling audit trails and performance analysis.
- Parameters:
sent_at (float)
request_status (ScrapStatus)
- request_status
Outcome status of the scraping request.
- request_status: ScrapStatus
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class intelliscraper.common.models.SessionStats(*, request_events=<factory>)
Bases: BaseModel
Thread-safe statistics collector for scraping sessions.
Maintains a time-series log of all request events and provides computed statistics about success rates, failures, and performance. All operations are thread-safe via an internal Lock.
- Parameters:
request_events (list[RequestEvent])
- request_events
Chronological list of all request events.
- model_config = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- request_events: list[RequestEvent]
- add_request_event(request_event)
Add a request event to the log in a thread-safe manner.
- Parameters:
request_event (RequestEvent) – The RequestEvent to record.
- Return type:
None
- property stats: dict[str, int]
Get a breakdown of all request statuses.
- Returns:
Dictionary mapping status names to counts, e.g.:
{"success": 42, "partial_success": 3, "failed": 1, ...}
- model_post_init(context, /)
This function is meant to behave like a BaseModel method to initialize private attributes.
It takes context as an argument since that’s what pydantic-core passes when calling it.
- Parameters:
self (BaseModel) – The BaseModel instance.
context (Any) – The context.
- Return type:
None
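The lock-guarded event log described for SessionStats can be sketched with the standard library alone. This is an illustration of the pattern, not the library's implementation: Status, Event, and StatsSketch are hypothetical stand-ins, and the status names are taken from the example dictionary shown for the stats property.

```python
import threading
import time
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Status(str, Enum):
    """Hypothetical stand-in for ScrapStatus; the real members may differ."""
    SUCCESS = "success"
    PARTIAL_SUCCESS = "partial_success"
    FAILED = "failed"


@dataclass
class Event:
    """Stand-in for RequestEvent: a timestamp plus an outcome."""
    sent_at: float
    request_status: Status


@dataclass
class StatsSketch:
    """Stand-in for SessionStats: an event log guarded by a lock."""
    request_events: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def add_request_event(self, event: Event) -> None:
        # Hold the lock so concurrent appends cannot interleave.
        with self._lock:
            self.request_events.append(event)

    @property
    def stats(self) -> dict:
        # Copy the log under the lock, then count outside it.
        with self._lock:
            events = list(self.request_events)
        return dict(Counter(e.request_status.value for e in events))


collector = StatsSketch()


def worker(status: Status, count: int) -> None:
    for _ in range(count):
        collector.add_request_event(Event(time.time(), status))


threads = [
    threading.Thread(target=worker, args=(Status.SUCCESS, 42)),
    threading.Thread(target=worker, args=(Status.PARTIAL_SUCCESS, 3)),
    threading.Thread(target=worker, args=(Status.FAILED, 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

summary = collector.stats
```

Copying the list while holding the lock and counting afterwards keeps the critical section short, which matters when many scraper tasks record events at once.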
- class intelliscraper.common.models.Session(*, site, base_url, cookies=<factory>, localStorage=None, sessionStorage=None, fingerprint=None, stats=<factory>)
Bases: BaseModel
Browser session data for authenticated scraping.
Captures all state needed to resume an authenticated browser session: cookies, localStorage, sessionStorage, and a browser fingerprint for anti-detection.
- Parameters:
- stats
Time-series event log and computed statistics.
- stats: SessionStats
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class intelliscraper.common.models.Proxy(*, server, bypass=None, username=None, password=None)
Bases: BaseModel
Proxy configuration for network requests.
Applied at the browser-context level in managed browser mode only. All pages within a scraper instance share the same proxy.
Not used in local browser mode — the user’s Chrome instance manages its own network configuration.
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
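As a rough illustration of applying a proxy at the browser-context level, the sketch below turns the four fields from the Proxy signature into a plain settings dictionary and drops the unset ones. ProxySketch and to_context_proxy are hypothetical names, the field types are assumed to be strings (the reference does not list them), and whether IntelliScraper builds its context options this way is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ProxySketch:
    """Stand-in mirroring the Proxy signature: server is required, the rest optional."""
    server: str
    bypass: Optional[str] = None
    username: Optional[str] = None
    password: Optional[str] = None


def to_context_proxy(proxy: ProxySketch) -> dict:
    """Drop unset fields so only configured options are passed along."""
    return {key: value for key, value in asdict(proxy).items() if value is not None}


settings = to_context_proxy(
    ProxySketch(server="http://proxy.example.com:8080", username="user", password="secret")
)
```

Because the proxy is shared by all pages within a scraper instance, a dictionary like this would be built once per browser context rather than once per request.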
- class intelliscraper.common.models.ScrapeRequest(*, url, timeout, browser_launch_options=None, proxy=None, session_data=None, browsing_mode=None)
Bases: BaseModel
Input configuration for a single scraping request.
Captures all parameters used to initiate a scrape, enabling full traceability from request to response.
- Parameters:
- timeout
Maximum time allowed for page load.
- Type:
timedelta
- proxy
Proxy configuration used, if any.
- Type:
- session_data
Session information used, if any.
- Type:
- browsing_mode
Browser behaviour mode (FAST or HUMAN_LIKE).
- Type:
BrowsingMode | None
- timeout: timedelta
- browsing_mode: BrowsingMode | None
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
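Since timeout is a timedelta rather than a bare number, callers state the unit explicitly. The sketch below shows one way such a value could be converted to whole milliseconds, on the assumption that the underlying browser API expects milliseconds; timeout_ms is a hypothetical helper, not part of the library.

```python
from datetime import timedelta


def timeout_ms(timeout: timedelta) -> int:
    """Convert a timedelta to whole milliseconds using exact integer division."""
    return timeout // timedelta(milliseconds=1)


page_load_timeout = timedelta(seconds=30)
millis = timeout_ms(page_load_timeout)
```

Floor-dividing by a one-millisecond timedelta avoids the floating-point rounding that total_seconds() * 1000 can introduce.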
- class intelliscraper.common.models.ScrapeResponse(*, scrape_request, status, http_status_code=None, elapsed_time=None, scrap_html_content=None, error_msg=None, session_id=None, browser_mode=None)
Bases: BaseModel
Output of a web scraping operation with enriched metadata.
Contains the scraped content, timing information, HTTP status, and metadata about which session and browser mode were used.
- Parameters:
scrape_request (ScrapeRequest)
status (ScrapStatus)
http_status_code (int | None)
elapsed_time (float | None)
scrap_html_content (str | None)
error_msg (str | None)
session_id (str | None)
browser_mode (str | None)
- scrape_request
The original request parameters.
- status
Final outcome status of the scrape.
- http_status_code
Actual HTTP status code returned by the server (e.g. 200, 403, 429). None if the request failed before receiving a response.
- Type:
int | None
- session_id
Identifier of the session used (the site field from Session), or None if no session was used.
- Type:
str | None
- browser_mode
Which browser backend was used: "local_browser" or "managed_browser".
- Type:
str | None
- scrape_request: ScrapeRequest
- status: ScrapStatus
- model_config = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
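The optional fields on ScrapeResponse suggest a simple triage step on the caller's side. The sketch below is one possible way to branch on http_status_code, using a hypothetical ResponseSketch stand-in that carries only a subset of the documented fields; the category labels are illustrative, and status is typed as a plain string here instead of ScrapStatus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseSketch:
    """Stand-in carrying a subset of the ScrapeResponse fields."""
    status: str
    http_status_code: Optional[int] = None
    scrap_html_content: Optional[str] = None
    error_msg: Optional[str] = None


def triage(response: ResponseSketch) -> str:
    """Classify a response by its HTTP status code (labels are illustrative)."""
    code = response.http_status_code
    if code is None:
        # The request failed before any response was received.
        return "no_response"
    if code == 429:
        return "rate_limited"
    if code == 403:
        return "blocked"
    if 200 <= code < 300 and response.scrap_html_content:
        return "ok"
    return "other"


results = [
    triage(ResponseSketch(status="failed", error_msg="timeout")),
    triage(ResponseSketch(status="failed", http_status_code=429)),
    triage(ResponseSketch(status="success", http_status_code=200, scrap_html_content="<html></html>")),
]
```

Checking for None first mirrors the documented meaning of http_status_code: a missing code signals a failure before the server responded, which usually calls for different handling than a 403 or 429.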