Challenges index

Every scraping challenge on Catalog108, categorized. Each link is a permanent URL referenced from Scraping Central / Learn. Each challenge page has its own grader endpoint at /challenges/…/grade.

Phase 1A: URL surface is laid out below. Phase 1B+ ships the actual challenge pages and graders.

Static scraping

HTML + HTTP only. No JavaScript needed to extract the data.

/challenges/static/tables/simple Basic HTML table extraction
/challenges/static/tables/nested Tables with merged cells, nested tables
/challenges/static/tables/dynamic-headers Tables where headers differ per row
/challenges/static/pagination/numbered ?page=N URL pagination
/challenges/static/pagination/offset ?offset=N&limit=M
/challenges/static/pagination/cursor Cursor-based pagination
/challenges/static/pagination/load-more-http "Load more" without JS (server-rendered)
/challenges/static/pagination/unknown-end Last page is not advertised; the scraper must detect the end
/challenges/static/forms/simple GET form
/challenges/static/forms/post POST form
/challenges/static/forms/csrf CSRF-protected form
/challenges/static/forms/multi-step 3-step wizard with session state
/challenges/static/forms/hidden-fields Required hidden inputs
/challenges/static/encoding/utf8 Mixed UTF-8 content
/challenges/static/encoding/latin1 Legacy Latin-1 page
/challenges/static/encoding/broken Malformed HTML that breaks parsers
/challenges/static/files/images Page with 50 images to bulk download
/challenges/static/files/pdfs Page linking to 20 PDFs
/challenges/static/files/large Large file requiring streaming download
/challenges/static/lists/cards Card-grid pattern
/challenges/static/lists/nested Deeply nested ul/ol structures
/challenges/static/lists/mixed-types Heterogeneous items in one list
/challenges/static/redirects/single 301 redirect
/challenges/static/redirects/chain 5-hop redirect chain
/challenges/static/redirects/meta-refresh <meta http-equiv="refresh"> redirect
/challenges/static/redirects/js JavaScript-based redirect
/challenges/static/cookies/required Content gated behind a specific cookie
/challenges/static/cookies/set-on-visit Cookie set after first visit, content unlocks
/challenges/static/json-ld Structured data via JSON-LD
/challenges/static/microdata Schema.org microdata
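
As an illustration of the unknown-end pagination case listed above, here is a minimal sketch of a loop that stops when a page comes back empty or repeats the previous page. The `fetch_page` callable is a hypothetical stand-in for whatever HTTP and parsing code retrieves one page of items; the stop conditions are assumptions, since the challenge page may signal its end differently.

```python
from typing import Callable, Iterator, List, Optional


def iter_pages(
    fetch_page: Callable[[int], List[str]],
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield items page by page until an empty or repeated page signals the end.

    fetch_page is a hypothetical callable: page number in, list of items out.
    max_pages is a safety cap so a misbehaving server cannot loop forever.
    """
    previous: Optional[List[str]] = None
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        # Assumed end markers: an empty page, or the server re-serving the last page.
        if not items or items == previous:
            return
        yield from items
        previous = items


# Usage with an in-memory fake instead of a real HTTP call.
fake_site = {1: ["a", "b"], 2: ["c"], 3: []}
print(list(iter_pages(lambda n: fake_site.get(n, []))))  # ['a', 'b', 'c']
```

Passing the fetcher in as a parameter keeps the termination logic separate from the network code, so it can be exercised without touching the site.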

Dynamic / browser automation

Requires a real browser (Playwright, Selenium, Puppeteer) or a careful reading of the page's JavaScript.

/challenges/dynamic/spa-pure Empty initial HTML, everything JS-rendered
/challenges/dynamic/spa-routed Client-side router (no page reloads)
/challenges/dynamic/lazy-images Images with data-src and lazy-load
/challenges/dynamic/infinite-scroll/intersection IntersectionObserver-triggered
/challenges/dynamic/infinite-scroll/scroll-event Scroll-event-triggered
/challenges/dynamic/infinite-scroll/button-jsappend Button clicks, content appended via JS
/challenges/dynamic/modals/cookie-banner EU-style cookie banner blocking content
/challenges/dynamic/modals/marketing-popup Popup after N seconds
/challenges/dynamic/modals/login-wall Login-wall modal after scroll
/challenges/dynamic/iframe/same-origin iframe content extraction
/challenges/dynamic/iframe/multi-nested Nested iframes (2 levels deep)
/challenges/dynamic/shadow-dom/open Open Shadow DOM
/challenges/dynamic/shadow-dom/closed Closed Shadow DOM (harder)
/challenges/dynamic/canvas/text-rendered Text rendered as canvas (requires OCR)
/challenges/dynamic/canvas/chart Chart data only in canvas
/challenges/dynamic/click-required/reveal Data appears only after specific button click
/challenges/dynamic/hover-required/tooltip Data shown only on hover
/challenges/dynamic/drag-drop/list-reorder Drag interaction required
/challenges/dynamic/date-picker/custom Custom JS date picker
/challenges/dynamic/auto-typed/animated Text typed in via JS animation
/challenges/dynamic/heavy-dom/10k-items DOM with 10k items, virtualized
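
Some of the cases above may be solvable without a browser once the page's behavior is understood. The sketch below assumes the lazy-images page serves `<img>` tags whose real URL sits in a `data-src` attribute while `src` holds a placeholder; that markup is an assumption, not something the index confirms. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser
from typing import List


class LazyImageParser(HTMLParser):
    """Collect image URLs, preferring data-src over src (assumed lazy-load markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        # data-src is assumed to hold the real URL; src is the fallback.
        url = attr.get("data-src") or attr.get("src")
        if url:
            self.urls.append(url)


def image_urls(html: str) -> List[str]:
    parser = LazyImageParser()
    parser.feed(html)
    parser.close()
    return parser.urls


sample = '<img src="placeholder.gif" data-src="/img/1.jpg"><img src="/img/2.jpg">'
print(image_urls(sample))  # ['/img/1.jpg', '/img/2.jpg']
```

If the real URLs are injected only by script at runtime, this approach will not find them and a browser-based tool is the fallback.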

API challenges

REST, GraphQL, auth flows, and (simulated) WebSocket / SSE endpoints.

/challenges/api/rest/simple Plain JSON GET endpoint
/challenges/api/rest/paginated-page Paginated REST with ?page=
/challenges/api/rest/paginated-cursor Cursor-paginated REST
/challenges/api/rest/rate-limited Returns 429 after threshold
/challenges/api/rest/flaky Fails randomly on 30% of requests; requires retry with backoff
/challenges/api/rest/large-payload 50 MB response; requires streaming
/challenges/api/graphql/playground GraphQL endpoint with introspection enabled
/challenges/api/graphql/no-introspection Same endpoint, introspection disabled
/challenges/api/graphql/persisted Persisted queries only (hash-based)
/challenges/api/websocket/echo Simulated echo WS (polling shim)
/challenges/api/websocket/live-prices Simulated live price feed (3s polling)
/challenges/api/websocket/socketio Socket.IO-style protocol (simulated)
/challenges/api/auth/basic HTTP Basic Auth
/challenges/api/auth/bearer-static Static bearer token
/challenges/api/auth/jwt-with-refresh JWT + refresh token flow
/challenges/api/auth/oauth2 Full OAuth 2.0 Authorization Code flow
/challenges/api/auth/csrf-form CSRF token dynamically extracted
/challenges/api/auth/hmac-signed HMAC-signed requests
/challenges/api/auth/api-key-in-js API key hidden in minified JS
/challenges/api/sse/notifications Server-Sent Events (simulated via polling)
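
For the rate-limited and flaky endpoints listed above, one common client-side approach is retrying with exponential backoff and jitter. The sketch below is a generic pattern, not the grader's expected solution. `TransientError` and `with_retries` are hypothetical names; the caller is assumed to raise `TransientError` when it sees a retryable response such as a 429.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. a 429 or 5xx response)."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last failure
            # Full jitter: wait a random time up to base_delay * 2^attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so the delay can be replaced with a no-op in tests. If the server sends a `Retry-After` header, honoring it is usually preferable to a computed delay.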

Anti-bot (light, educational)

Observable, not punishing: every block shows a friendly "Detection reason".

/challenges/antibot/js-challenge Interstitial that sets cookie via JS
/challenges/antibot/captcha-mock Mock CAPTCHA (button labeled "I am human")
/challenges/antibot/header-fingerprint Rejects requests with wrong header order
/challenges/antibot/ua-blocklist Blocks bot-like User-Agents
/challenges/antibot/tls-fingerprint Blocks default Python requests TLS fingerprint
/challenges/antibot/canvas-fingerprint Detects headless browsers via canvas
/challenges/antibot/webdriver-detected Checks navigator.webdriver
/challenges/antibot/rate-limit-aggressive 5 req/min hard cap
/challenges/antibot/honeypot-links Hidden links humans don't follow
/challenges/antibot/timing-detection Detects scrape patterns by request timing
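
For the 5 req/min hard cap listed above, a client can stay under the limit by throttling itself. This is a minimal sketch assuming the cap applies over a sliding 60-second window, which the index does not specify. `SlidingWindowThrottle` is a hypothetical helper, and its clock and sleep functions are injectable so it can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowThrottle:
    """Block in wait() so that at most max_calls happen in any period seconds."""

    def __init__(
        self,
        max_calls: int = 5,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._max_calls = max_calls
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = deque()

    def wait(self) -> None:
        now = self._clock()
        # Forget calls that have already left the window.
        while self._stamps and now - self._stamps[0] >= self._period:
            self._stamps.popleft()
        if len(self._stamps) >= self._max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self._period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
```

Calling `throttle.wait()` before each request keeps the request rate at or below the configured cap. A server that counts in fixed calendar minutes rather than a sliding window would be at least as permissive, so the sliding assumption is the conservative one.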