Challenges index
Every scraping challenge on Catalog108, categorized. Each link is a permanent URL referenced from
Scraping Central / Learn.
Each challenge page has its own grader endpoint at /challenges/…/grade.
Phase 1A: URL surface is laid out below. Phase 1B+ ships the actual challenge pages and graders.
Static scraping
HTML + HTTP only. No JavaScript needed to extract the data.
- /challenges/static/tables/simple: Basic HTML table extraction
- /challenges/static/tables/nested: Tables with merged cells, nested tables
- /challenges/static/tables/dynamic-headers: Tables where headers differ per row
- /challenges/static/pagination/numbered: ?page=N URL pagination
- /challenges/static/pagination/offset: ?offset=N&limit=M
- /challenges/static/pagination/cursor: Cursor-based pagination
- /challenges/static/pagination/load-more-http: "Load more" without JS (server-rendered)
- /challenges/static/pagination/unknown-end: Last page indeterminate, must detect
- /challenges/static/forms/simple: GET form
- /challenges/static/forms/post: POST form
- /challenges/static/forms/csrf: CSRF-protected form
- /challenges/static/forms/multi-step: 3-step wizard with session state
- /challenges/static/forms/hidden-fields: Required hidden inputs
- /challenges/static/encoding/utf8: Mixed UTF-8 content
- /challenges/static/encoding/latin1: Legacy Latin-1 page
- /challenges/static/encoding/broken: Malformed HTML that breaks parsers
- /challenges/static/files/images: Page with 50 images to bulk download
- /challenges/static/files/pdfs: Page linking to 20 PDFs
- /challenges/static/files/large: Large file requiring streaming download
- /challenges/static/lists/cards: Card-grid pattern
- /challenges/static/lists/nested: Deeply nested ul/ol structures
- /challenges/static/lists/mixed-types: Heterogeneous items in one list
- /challenges/static/redirects/single: 301 redirect
- /challenges/static/redirects/chain: 5-hop redirect chain
- /challenges/static/redirects/meta-refresh: <meta refresh> redirect
- /challenges/static/redirects/js: JavaScript-based redirect
- /challenges/static/cookies/required: Content gated behind a specific cookie
- /challenges/static/cookies/set-on-visit: Cookie set after first visit, content unlocks
- /challenges/static/json-ld: Structured data via JSON-LD
- /challenges/static/microdata: Schema.org microdata
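As an illustration of the simplest item in this category (basic HTML table extraction), here is a minimal sketch using only the Python standard library. The sample markup, the column names, and the helper names `TableParser` and `extract_table` are all hypothetical; the real challenge pages may be structured differently. The sketch also assumes the first row of the table holds the headers.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_table(html):
    """Return one dict per body row, keyed by the first row's cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample markup, not taken from the actual challenge page.
SAMPLE = """
<table>
  <tr><th>SKU</th><th>Name</th><th>Price</th></tr>
  <tr><td>A-1</td><td>Widget</td><td>9.99</td></tr>
  <tr><td>B-2</td><td>Gadget</td><td>19.50</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in extract_table(SAMPLE):
        print(record)
```

This sketch does not handle merged cells (rowspan/colspan) or nested tables, which the other table challenges in this category are described as covering.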
Dynamic / browser automation
Requires a real browser (Playwright, Selenium, Puppeteer) or careful JS understanding.
- /challenges/dynamic/spa-pure: Empty initial HTML, everything JS-rendered
- /challenges/dynamic/spa-routed: Client-side router (no page reloads)
- /challenges/dynamic/lazy-images: Images with data-src and lazy-load
- /challenges/dynamic/infinite-scroll/intersection: IntersectionObserver-triggered
- /challenges/dynamic/infinite-scroll/scroll-event: Scroll-event-triggered
- /challenges/dynamic/infinite-scroll/button-jsappend: Button clicks, content appended via JS
- /challenges/dynamic/modals/cookie-banner: EU-style cookie banner blocking content
- /challenges/dynamic/modals/marketing-popup: Popup after N seconds
- /challenges/dynamic/modals/login-wall: Login-wall modal after scroll
- /challenges/dynamic/iframe/same-origin: iframe content extraction
- /challenges/dynamic/iframe/multi-nested: Nested iframes (2 levels deep)
- /challenges/dynamic/shadow-dom/open: Open Shadow DOM
- /challenges/dynamic/shadow-dom/closed: Closed Shadow DOM (harder)
- /challenges/dynamic/canvas/text-rendered: Text rendered as canvas (requires OCR)
- /challenges/dynamic/canvas/chart: Chart data only in canvas
- /challenges/dynamic/click-required/reveal: Data appears only after specific button click
- /challenges/dynamic/hover-required/tooltip: Data shown only on hover
- /challenges/dynamic/drag-drop/list-reorder: Drag interaction required
- /challenges/dynamic/date-picker/custom: Custom JS date picker
- /challenges/dynamic/auto-typed/animated: Text typed in via JS animation
- /challenges/dynamic/heavy-dom/10k-items: DOM with 10k items, virtualized
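Some of these challenges may be solvable without a browser if the page's JavaScript is understood well enough. As a sketch of that "careful JS understanding" route, the snippet below reads lazy-loaded image URLs straight from the data-src attribute instead of waiting for a script to copy them into src. It assumes the real URL is present in the initial HTML under data-src, which the challenge description suggests but does not guarantee. The base URL, the sample markup, and the names `LazyImageParser` and `lazy_image_urls` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LazyImageParser(HTMLParser):
    """Collect image URLs, preferring data-src over the placeholder src."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags are routed here too by HTMLParser.
        if tag != "img":
            return
        attributes = dict(attrs)
        # Assumption: lazy images keep the real URL in data-src.
        source = attributes.get("data-src") or attributes.get("src")
        if source:
            self.urls.append(urljoin(self.base_url, source))


def lazy_image_urls(html, base_url):
    """Return absolute image URLs found in the given HTML, in page order."""
    parser = LazyImageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical host and markup, for illustration only.
BASE_URL = "https://catalog108.example/challenges/dynamic/lazy-images"
SAMPLE = """
<img src="/static/placeholder.gif" data-src="/img/p1.jpg">
<img data-src="https://cdn.example.test/p2.jpg" />
<img src="/img/eager.png">
"""

if __name__ == "__main__":
    for url in lazy_image_urls(SAMPLE, BASE_URL):
        print(url)
```

If the image URLs are computed at runtime rather than embedded in the markup, this approach will not work and a real browser is needed, as the category description says.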
API challenges
REST, GraphQL, auth flows, and (simulated) WebSocket / SSE endpoints.
- /challenges/api/rest/simple: Plain JSON GET endpoint
- /challenges/api/rest/paginated-page: Paginated REST with ?page=
- /challenges/api/rest/paginated-cursor: Cursor-paginated REST
- /challenges/api/rest/rate-limited: Returns 429 after threshold
- /challenges/api/rest/flaky: Randomly fails 30%, retry/backoff
- /challenges/api/rest/large-payload: 50MB response, streaming
- /challenges/api/graphql/playground: GraphQL endpoint with introspection enabled
- /challenges/api/graphql/no-introspection: Same endpoint, introspection disabled
- /challenges/api/graphql/persisted: Persisted queries only (hash-based)
- /challenges/api/websocket/echo: Simulated echo WS (polling shim)
- /challenges/api/websocket/live-prices: Simulated live price feed (3s polling)
- /challenges/api/websocket/socketio: Socket.IO-style protocol (simulated)
- /challenges/api/auth/basic: HTTP Basic Auth
- /challenges/api/auth/bearer-static: Static bearer token
- /challenges/api/auth/jwt-with-refresh: JWT + refresh token flow
- /challenges/api/auth/oauth2: Full OAuth 2.0 Authorization Code flow
- /challenges/api/auth/csrf-form: CSRF token dynamically extracted
- /challenges/api/auth/hmac-signed: HMAC-signed requests
- /challenges/api/auth/api-key-in-js: API key hidden in minified JS
- /challenges/api/sse/notifications: Server-Sent Events (simulated via polling)
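The rate-limited and flaky REST challenges both call for retrying with backoff. A minimal sketch of exponential backoff follows. The names `fetch_with_retry`, `RetriesExhausted`, and `RETRYABLE` are hypothetical, and which status codes the real endpoints return on failure (other than 429 for the rate-limited one) is an assumption. The HTTP call is passed in as a callable so the retry logic stays independent of any particular client library.

```python
import time

# Assumption: these statuses are worth retrying; only 429 is named in the index.
RETRYABLE = {429, 500, 502, 503, 504}


class RetriesExhausted(Exception):
    """Raised when every attempt returned a retryable status."""


def fetch_with_retry(fetch, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it returns a non-retryable status.

    fetch is any zero-argument callable returning (status_code, body).
    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = fetch()
        if status not in RETRYABLE:
            return status, body
        # No point sleeping after the final attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    raise RetriesExhausted(
        f"gave up after {max_attempts} attempts (last status {status})"
    )


if __name__ == "__main__":
    # Simulated endpoint: fails twice, then succeeds.
    responses = iter([(503, ""), (429, ""), (200, '{"ok": true}')])
    print(fetch_with_retry(lambda: next(responses), base_delay=0.01))
```

In practice it is common to add random jitter to each delay and to honor a Retry-After header when the server sends one; both are left out here to keep the sketch deterministic.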
Anti-bot (light, educational)
Observable, not punishing: every block shows a friendly "Detection reason".
- /challenges/antibot/js-challenge: Interstitial that sets cookie via JS
- /challenges/antibot/captcha-mock: Mock CAPTCHA (button labeled "I am human")
- /challenges/antibot/header-fingerprint: Rejects requests with wrong header order
- /challenges/antibot/ua-blocklist: Blocks bot-like User-Agents
- /challenges/antibot/tls-fingerprint: Blocks default Python requests TLS fingerprint
- /challenges/antibot/canvas-fingerprint: Detects headless browsers via canvas
- /challenges/antibot/webdriver-detected: Checks navigator.webdriver
- /challenges/antibot/rate-limit-aggressive: 5 req/min hard cap
- /challenges/antibot/honeypot-links: Hidden links humans don't follow
- /challenges/antibot/timing-detection: Detects scrape patterns by request timing
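For the 5 req/min hard cap, one polite client-side approach is to space requests evenly so the cap is never reached. The sketch below is one way to do that. The class name `Throttle` is hypothetical, and how the server actually counts its window (fixed or sliding) is not stated here, so the even spacing of 60 / 5 = 12 seconds is an assumption that may need a small safety margin.

```python
import time


class Throttle:
    """Block so that calls to wait() are at least min_interval seconds apart."""

    def __init__(self, max_requests=5, per_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = per_seconds / max_requests
        self._clock = clock          # injectable for testing
        self._sleep = sleep          # injectable for testing
        self._next_allowed = None    # earliest time the next request may start

    def wait(self):
        """Sleep if needed, then reserve the next request slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


if __name__ == "__main__":
    # Short interval here so the demo finishes quickly.
    throttle = Throttle(max_requests=5, per_seconds=1.0)
    for i in range(3):
        throttle.wait()
        print("request", i, "at", round(time.monotonic(), 2))
```

Call `throttle.wait()` immediately before each request. Evenly spaced requests also have a very regular rhythm, which the timing-detection challenge may be designed to flag, so the two challenges likely need different pacing strategies.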