How I Processed 53M+ Products on One of Eastern Europe's Largest E-Commerce Platforms

Distributed e-commerce price monitoring at scale

I wanted to build something that actually worked in the real world — not just another toy scraper that gets blocked after a few hours.

So I created a distributed system that continuously monitors one of Eastern Europe's biggest e-commerce platforms, processes 53,168,387 products, detects real price drops, and sends instant Telegram alerts — all while keeping the entire operation under €26.50 per month.

The Smart Bet: Cheap Proxies + Clever Anti-Bot Layers

Most people immediately reach for expensive residential proxies when they hit anti-bot protections. I took a different path.

I used only cheap datacenter proxies (just 20 of them) and turned them into a highly effective solution by layering smart techniques on top:

  • Round-robin rotation on every single request — so no single IP gets overwhelmed and blocked.
  • Full browser fingerprint randomization (user-agent, viewport, timezone, language, Canvas noise) — to make each visit look like it comes from a completely different real user.
  • Playwright stealth patches + custom middleware — to hide the fact that it was an automated browser.
  • Automatic CAPTCHA solving with a low-cost service (CapSolver) — so when a challenge appeared, it was solved instantly and transparently.
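As a rough sketch, the rotation and fingerprint layers can be combined into a per-request profile whose keys mirror Playwright's `browser.new_context(...)` arguments. Everything below is a placeholder: real proxy endpoints, a much larger user-agent pool, and the Canvas-noise init script are not shown.

```python
import itertools
import random

# Placeholder pools: real proxy endpoints and a larger UA pool would live in config.
PROXIES = [f"http://dc-proxy-{i}.example.com:8000" for i in range(20)]
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
]
VIEWPORTS = [(1920, 1080), (1366, 768), (1536, 864)]
TIMEZONES = ["Europe/Bucharest", "Europe/Warsaw", "Europe/Prague"]
LOCALES = ["en-US", "ro-RO", "pl-PL"]

# Round-robin: every request takes the next proxy, so load spreads evenly.
proxy_cycle = itertools.cycle(PROXIES)

def next_session_profile() -> dict:
    """Next proxy plus a freshly randomized fingerprint for one request."""
    width, height = random.choice(VIEWPORTS)
    return {
        "proxy": {"server": next(proxy_cycle)},
        "user_agent": random.choice(USER_AGENTS),
        "viewport": {"width": width, "height": height},
        "timezone_id": random.choice(TIMEZONES),
        "locale": random.choice(LOCALES),
    }
```

Each profile can be passed straight to `browser.new_context(**profile)`, so every visit looks like a different user on a different network.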

The trade-off was clear: datacenter proxies are much cheaper and faster, but easier to detect. By carefully combining them with fingerprint spoofing, cookie persistence, and instant CAPTCHA handling, I achieved a scraping success rate above 95% without ever paying for premium services. This is the kind of resourceful engineering I'm most proud of — solving hard problems without throwing money at them.

Maximum Efficiency with Minimum Resources

I was obsessed with using every resource intelligently.

I built several custom middlewares (small, focused pieces of code that sit between the browser and the target site) that:

  • Blocked images, fonts, videos, and unnecessary scripts at the browser level → 60–70% faster page loads and 30–40% less network traffic.
  • Applied early-stop logic so the crawler automatically stops pagination when discounts are no longer relevant.
  • Added aggressive filtering and deduplication → 99.93% of scanned products were smartly skipped.

Of course, adding middlewares means extra computation. But I designed them to be simple, smart, and to follow SOLID principles — so they added almost no overhead while delivering massive performance gains. This is a pattern anyone can reuse in their own projects to get much better results without needing more powerful hardware.
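The first two middlewares can be sketched as two small, focused functions. The resource-type names follow Playwright's `request.resource_type` values; the tracker URL fragments and the discount threshold are illustrative, not the project's real settings.

```python
from typing import Iterable, Iterator

# Illustrative block lists; real ones would be tuned per target site.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}
TRACKER_HINTS = ("analytics", "doubleclick", "tracking")

def should_block(resource_type: str, url: str) -> bool:
    """Abort heavy or irrelevant requests before they ever leave the browser."""
    return resource_type in BLOCKED_RESOURCE_TYPES or any(
        hint in url for hint in TRACKER_HINTS
    )

def relevant_products(
    pages: Iterable[list[dict]], min_discount: float = 20.0
) -> Iterator[dict]:
    """Early-stop pagination with dedup: listing pages are assumed sorted by
    discount descending, so the first page with nothing relevant ends the crawl."""
    seen: set[str] = set()
    for page in pages:
        discounted = [p for p in page if p["discount"] >= min_discount]
        if not discounted:
            break  # everything from here on is below the threshold: stop paginating
        for product in discounted:
            if product["id"] not in seen:  # dedup across pages
                seen.add(product["id"])
                yield product
```

In Playwright, `should_block` would be wired into `page.route("**/*", ...)`, aborting matching requests and letting the rest continue.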

Real-Time Impact with Clean Decoupling

One of the best architectural decisions was building the system around a clean fan-out / fan-in pattern with Redis queues.

  • Raw data is collected once.
  • It is immediately fanned out to multiple specialized queues.
  • Multiple independent consumers process everything in parallel.

This decoupling makes the system extremely flexible. For example, today it sends Telegram alerts, but tomorrow I could easily add a new consumer that forwards the same product data to a machine-learning model for price prediction, a recommendation engine, or an analytics dashboard — without touching the core scraping logic or slowing down real-time notifications.
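A minimal model of the fan-out, using in-memory deques as stand-ins for Redis lists. In production each `append`/`popleft` would be an `LPUSH`/`RPOP` through redis-py, and the queue names here are illustrative.

```python
import json
from collections import deque

# In-memory stand-ins for Redis lists; queue names are illustrative.
QUEUES = {name: deque() for name in ("alerts", "history", "analytics")}

def fan_out(product: dict) -> None:
    """Collect once, then push the same payload to every specialized queue."""
    payload = json.dumps(product)
    for queue in QUEUES.values():
        queue.append(payload)

def consume(queue_name: str):
    """Each consumer drains only its own queue, independently of the others."""
    queue = QUEUES[queue_name]
    while queue:
        yield json.loads(queue.popleft())
```

Adding a new downstream consumer is then just a new queue name plus a new worker loop; the scraping side never changes.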

  • Typical alert delivery: under one second.
  • Maximum observed latency: 3 seconds.
  • I used two separate Telegram bots: one dedicated to sending the initial notification as fast as possible (high priority), and a second that later edits the message to enrich it with additional data. The user receives the alert almost instantly, while the heavier processing happens in the background. Splitting the work across two bots also worked around Telegram's per-bot rate limits and doubled overall throughput.
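The two-bot flow can be sketched with the Telegram calls stubbed out. In reality `fast_send` and `edit` would hit the Bot API's `sendMessage` and `editMessageText` using two different bot tokens; the function names and the enrichment step are placeholders.

```python
# Two-bot pattern: bot A posts a minimal alert immediately, bot B edits the
# same message once enrichment finishes. `fast_send`, `enrich` and `edit`
# stand in for real Telegram Bot API calls made with two separate tokens.
def send_alert(fast_send, enrich, edit, product: dict) -> int:
    message = fast_send(f"Price drop: {product['name']} -> {product['price']}€")
    details = enrich(product)             # heavy work runs after the user is notified
    edit(message["message_id"], details)  # second bot enriches the message in place
    return message["message_id"]
```

In production the enrichment and edit would run on a background worker, so the sending path never blocks on heavy processing.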

Responsible & Scalable Design

Aggressive scraping can put heavy load on websites, so I made the entire system highly configurable. I could easily control scraping speed, number of concurrent proxies, and request frequency depending on the situation.
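As an illustration of those knobs (the field names below are assumptions, not the project's actual config schema), a small config object can derive the pacing from a per-proxy request budget while enforcing a politeness floor:

```python
from dataclasses import dataclass

@dataclass
class CrawlConfig:
    # Hypothetical throttling knobs for the crawler.
    concurrent_proxies: int = 4      # how many proxies run in parallel
    requests_per_minute: int = 30    # per-proxy request budget
    min_delay_s: float = 1.0         # politeness floor between requests

    def delay_between_requests(self) -> float:
        """Pause per proxy: budget-derived, but never below the politeness floor."""
        return max(self.min_delay_s, 60.0 / self.requests_per_minute)
```

Dialing `concurrent_proxies` and `requests_per_minute` up or down is then a config change, not a code change.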

For a performance test, I ran all 20 proxies in parallel and watched the total scanning time drop by a factor of four while the VM reached 90% utilization, yet the detection rate stayed exactly the same. This showed that with smart architecture I could scale performance dramatically without increasing costs.

Results at a Glance

  • Products processed: 53,168,387
  • Monthly cost: €26.50 (€20 proxies, €1.50 CAPTCHA solving, €5 VM)
  • Scraping success rate: 95%+
  • Skip rate (efficiency): 99.93%
  • Alert latency: 1–3 s typical
  • Uptime: 100% over 3 months

Key Lessons

In an era where AI lets us build things faster than ever, it's easy to forget one important question: at what cost?

This project taught me that being a truly effective software engineer is not about using the most expensive tools or the latest hype. It's about being resourceful and strategic with every resource—time, money, and infrastructure.

Instead of throwing budget at "it works" solutions, I focused on smart patterns: layered anti-bot techniques, custom middlewares, aggressive optimization, and clean decoupling through queues. The result was a professional-grade system that delivered outstanding performance at a fraction of the usual cost.

Great engineering means choosing the right trade-offs: knowing when to spend money on speed and reliability, and when clever architecture delivers more value than cash.

This project proves you can build production-grade systems serving 53M+ products without enterprise budgets—by prioritizing decoupling, efficiency, and layered defenses. That strategic thinking shapes how I approach every system I design.