Image Upload & Thumbnails Track

How to Use This Track

Learning by Shipping File Services

Ground rules

Validate early. User uploads are an attack vector (decompression bombs, never-ending streams, malware). Validate format, size, and content before you store or process anything.
Process asynchronously. Image resizing is expensive. Never block the upload on processing.
Embrace immutability. Once uploaded, images don't change. Use ETags and cache aggressively.
Pick a storage backend and SDK. AWS S3, Google Cloud Storage, or self-hosted MinIO are all valid. Learn your SDK deeply.
Edge cases are the work. EXIF data, corrupted files, unsupported formats, SSRF, path traversal — think like an attacker.

The eleven modules

MODULE 01

Foundations & Project Setup

Repo, schema, design doc.

MODULE 02

Upload API & Blob Storage

Multipart upload, blob storage, HTTP headers.

MODULE 03

Input Validation & Security

Magic bytes, size limits, rate limits, SSRF.

MODULE 04

Async Image Processing

Job queue, workers, processing state.

MODULE 05

Multiple Sizes & Formats

Thumb, medium, full; WebP and AVIF.

MODULE 06

Metadata & EXIF Handling

Extract, store, strip sensitive data.

MODULE 07

CDN Integration & Distribution

Edge caching, cache headers, purging.

MODULE 08

Signed URLs & Access Control

Private images, ownership, sharing.

MODULE 09

Observability & Performance

Logs, metrics, traces, SLOs.

MODULE 10

Testing & Resilience

Unit, integration, edge cases, chaos tests.

MODULE 11

Capstone & Production Readiness

Docker, CI/CD, runbooks, docs.

Module 01 · ~2–3 hrs

Foundations & Project Setup

Start with a clear design. Image services are deceptively complex — edge cases, security, and performance are everywhere.

Tasks

Create a Git repo with standard config (.editorconfig, .gitignore, linter/formatter).
Scaffold an HTTP server. Pick one: Node, Python, Go, Rust.
Write a one-page design doc: problem, goals, storage backend choice, image sizes you'll generate, API sketch, failure modes.
Create database schema: images (metadata), variants (sizes/formats), uploads (user tracking).
Document your validation and security strategy in the design doc.
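
One way to sketch the three tables before committing to a real database is with SQLite from the Python standard library. Every column name and the status values below are illustrative assumptions, not a prescribed schema; adapt them to your own design doc.

```python
import sqlite3

# Hypothetical schema sketch: table names come from the task list,
# all columns are assumptions made for illustration.
SCHEMA = """
CREATE TABLE uploads (
    id          INTEGER PRIMARY KEY,
    user_id     TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE images (
    id            TEXT PRIMARY KEY,          -- opaque ID returned by the API
    upload_id     INTEGER NOT NULL REFERENCES uploads(id),
    content_type  TEXT NOT NULL,
    byte_size     INTEGER NOT NULL CHECK (byte_size > 0),
    storage_key   TEXT NOT NULL UNIQUE,      -- object key in blob storage
    status        TEXT NOT NULL DEFAULT 'pending'
                  CHECK (status IN ('pending', 'processing', 'success', 'failed'))
);

CREATE TABLE variants (
    image_id     TEXT NOT NULL REFERENCES images(id) ON DELETE CASCADE,
    size_name    TEXT NOT NULL,              -- e.g. 'thumb', 'medium'
    format       TEXT NOT NULL,              -- e.g. 'jpeg', 'webp'
    storage_key  TEXT NOT NULL UNIQUE,
    byte_size    INTEGER NOT NULL,
    PRIMARY KEY (image_id, size_name, format)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables and turn on foreign-key enforcement."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
```

Keying variants on (image, size, format) makes duplicate variants impossible by construction, which is one reasonable choice among several.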

Acceptance criteria

git clone + one command spins up the server.
Design doc committed to /docs/design.md with schema sketch.
Linter passes. Pre-commit hook blocks failures.

Deep dives: Documentation & Writing · Problem Decomposition · Schema Design

Module 02 · ~4–6 hrs

Upload API & Blob Storage

The baseline: accept uploads, store them durably, and serve them back. Start with your storage backend.

Tasks

Set up blob storage: AWS S3, Google Cloud Storage, or MinIO locally.
Implement POST /upload: multipart file upload, store in blob storage, record metadata in DB.
Implement GET /images/:id: fetch from blob storage, return with correct content-type.
Implement DELETE /images/:id: delete from blob storage and DB.
Return proper HTTP headers: Content-Type, ETag, Cache-Control.
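
The header task can be sketched independently of any web framework. The function names below are hypothetical, the ETag is assumed to be a SHA-256 of the stored bytes, and the long-lived Cache-Control value assumes a public, immutable image.

```python
import hashlib
from typing import Dict, Optional


def build_image_headers(body: bytes, content_type: str) -> Dict[str, str]:
    """Build response headers for an immutable, publicly cacheable image."""
    # A content-derived ETag changes only if the bytes change.
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    return {
        "Content-Type": content_type,
        "Content-Length": str(len(body)),
        "ETag": etag,
        "Cache-Control": "public, max-age=31536000, immutable",
    }


def is_not_modified(if_none_match: Optional[str], etag: str) -> bool:
    """Return True when the client's If-None-Match header matches our ETag."""
    if if_none_match is None:
        return False
    if if_none_match.strip() == "*":
        return True
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        # If-None-Match uses weak comparison, so ignore a W/ prefix.
        if candidate.startswith("W/"):
            candidate = candidate[2:]
        if candidate == etag:
            return True
    return False
```

When is_not_modified returns True, the handler can answer 304 Not Modified with no body instead of re-reading blob storage.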

Acceptance criteria

A curl -F file=@test.jpg request to POST /upload stores the file and returns an image ID.
GET /images/:id returns the file with correct content-type.
Deleting an image removes it from blob storage and the database.

Deep dives: REST API Design · Relational Databases

Module 03 · ~4–6 hrs

Input Validation & Security

Users can upload anything. Validate format, size, content. Prevent zip bombs, SSRF, path traversal.

Tasks

File type validation: check magic bytes (file signature), not just extension.
Size limits: max file size (e.g., 50MB). Rate limit uploads per user.
SSRF prevention: if you support URL uploads, block private IPs and metadata endpoints.
Malware scanning (optional): integrate ClamAV or VirusTotal for safety.
Reject uploads immediately if validation fails, with a clear 4xx error: 400 for invalid content, 413 for oversized files, 429 for rate limits.
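
A minimal sketch of the magic-bytes and size checks above. The set of accepted formats, the helper names, and the reading of 50MB as 50 * 1024 * 1024 bytes are all assumptions; the signatures themselves are the standard JPEG, PNG, GIF, and WebP file headers.

```python
from typing import Optional, Tuple

# Leading bytes ("magic numbers") of the formats this sketch accepts.
SIGNATURES = {
    "image/jpeg": [b"\xff\xd8\xff"],
    "image/png": [b"\x89PNG\r\n\x1a\n"],
    "image/gif": [b"GIF87a", b"GIF89a"],
}

MAX_BYTES = 50 * 1024 * 1024  # assumed limit, matching the 50MB example


def sniff_image_type(head: bytes) -> Optional[str]:
    """Guess the MIME type from the first bytes of the file, or None."""
    for mime, prefixes in SIGNATURES.items():
        if any(head.startswith(prefix) for prefix in prefixes):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte length, then "WEBP".
    if head[:4] == b"RIFF" and head[8:12] == b"WEBP":
        return "image/webp"
    return None


def validate_upload(head: bytes, size: int) -> Tuple[int, str]:
    """Return (HTTP status, message) for an upload's first bytes and size."""
    if size > MAX_BYTES:
        return 413, "file too large"
    if size == 0:
        return 400, "empty file"
    if sniff_image_type(head) is None:
        return 400, "unsupported or unrecognized image format"
    return 200, "ok"
```

A real handler should count bytes as they stream in rather than trust a client-supplied size, and a matching signature only proves the header looks right; decoding the image is still the stronger check.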

Acceptance criteria

Uploading a text file with .jpg extension is rejected.
Uploading a file larger than your limit is rejected with 413 Payload Too Large.
A rate-limited user gets 429 Too Many Requests.
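
For the SSRF task, one possible shape of the check is to resolve the host and refuse any address that is private, loopback, link-local (which covers the 169.254.169.254 metadata endpoint), or otherwise non-routable. The function names and the injectable resolve parameter are assumptions made so the sketch can be tested without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit


def is_forbidden_ip(ip_text: str) -> bool:
    """True for addresses a URL-upload fetcher should never connect to."""
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return True  # unparseable addresses are rejected, not trusted
    # Unwrap IPv4-mapped IPv6 such as ::ffff:127.0.0.1 before checking.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    )


def resolve_host(host: str) -> Iterable[str]:
    """Resolve a hostname to the set of IP addresses it maps to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_url_allowed(url: str,
                   resolve: Optional[Callable[[str], Iterable[str]]] = None) -> bool:
    """Allow only http(s) URLs whose every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    addresses = list((resolve or resolve_host)(parts.hostname))
    return bool(addresses) and not any(is_forbidden_ip(a) for a in addresses)
```

This check alone does not stop DNS rebinding: the fetcher should connect to the address that was validated, and should re-check every redirect target.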

Deep dives: OWASP Top 10 · Input Validation

Module 04 · ~5–7 hrs

Async Image Processing

Resizing images is expensive. Don't block the upload. Enqueue, process asynchronously, store variants.

Tasks

On successful upload, enqueue a processing job for the image.
A worker processes the job: read the original from blob storage, generate a thumbnail (200x200), and store the thumbnail alongside the untouched original.
Use ImageMagick, libvips, or equivalent. libvips is faster and uses less memory.
Track processing state: pending, processing, success, failed.
If processing fails, don't lose the original image. Log the error and expose it in the API.
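
The state tracking and failure handling above can be sketched with an in-memory queue standing in for a real message broker. The ImageRecord type, the allowed-transition table, and the make_thumbnail callback are hypothetical; a production worker would persist state in the database.

```python
import queue
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Legal state transitions; "failed" may go back to "pending" for a retry.
ALLOWED = {
    "pending": {"processing"},
    "processing": {"success", "failed"},
    "failed": {"pending"},
    "success": set(),
}


@dataclass
class ImageRecord:
    image_id: str
    original_key: str                 # never modified by processing
    status: str = "pending"
    error: Optional[str] = None
    variants: Dict[str, str] = field(default_factory=dict)


def transition(record: ImageRecord, new_status: str) -> None:
    """Move a record to a new state, refusing illegal jumps."""
    if new_status not in ALLOWED[record.status]:
        raise ValueError(f"illegal transition {record.status} -> {new_status}")
    record.status = new_status


def process_next(jobs: "queue.Queue[str]",
                 records: Dict[str, ImageRecord],
                 make_thumbnail: Callable[[str], str]) -> None:
    """Take one job off the queue and try to generate its thumbnail."""
    record = records[jobs.get_nowait()]
    transition(record, "processing")
    try:
        record.variants["thumb"] = make_thumbnail(record.original_key)
        transition(record, "success")
    except Exception as exc:
        # Keep the original; record the error so the API can expose it.
        record.error = str(exc)
        transition(record, "failed")
```

Because the worker only ever writes new variant keys, a failure leaves original_key pointing at intact data.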

Acceptance criteria

Upload an image; GET /images/:id?size=thumb blocks until the thumbnail is ready.
Processing a 5MB image takes < 5 seconds on local hardware.
A processing failure doesn't delete the original.

Deep dives: Message Queues · Event-Driven Patterns

Module 05 · ~5–7 hrs

Multiple Sizes & Formats

Different clients need different sizes. Generate: thumbnail (200x200), medium (800x800), full (original). Support modern formats: WebP, AVIF.

Tasks

Generate multiple variants: thumb, medium, and full. Store each as a variant in blob storage.
Implement responsive images: GET /images/:id?size=medium&format=webp returns WebP, falling back to JPEG when no WebP variant exists.
On-demand generation (optional): if a variant doesn't exist, generate it on-the-fly and cache.
Document your variant strategy in the design doc: which sizes, which formats, storage costs.
Measure disk usage: log variant sizes so you can optimize later.
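
The format fallback in the tasks above comes down to a lookup with a second choice. In this sketch the available variants are assumed to be a dict keyed by (size, format), and the storage keys are made up for illustration.

```python
from typing import Dict, Optional, Tuple

CONTENT_TYPES = {
    "jpeg": "image/jpeg",
    "webp": "image/webp",
    "avif": "image/avif",
}


def resolve_variant(available: Dict[Tuple[str, str], str],
                    size: str,
                    requested_format: str,
                    fallback_format: str = "jpeg") -> Optional[Tuple[str, str]]:
    """Return (content type, storage key) for the best matching variant.

    Tries the requested format first, then the fallback; returns None when
    the size does not exist in either format.
    """
    for fmt in (requested_format, fallback_format):
        key = available.get((size, fmt))
        if key is not None:
            return CONTENT_TYPES[fmt], key
    return None


# Hypothetical variants recorded for one image.
available = {
    ("thumb", "jpeg"): "variants/img1/thumb.jpg",
    ("medium", "jpeg"): "variants/img1/medium.jpg",
    ("medium", "webp"): "variants/img1/medium.webp",
}
```

A None result is the natural hook for the optional on-demand generation: generate, store, then serve.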

Acceptance criteria

GET /images/:id?size=thumb returns a 200x200 thumbnail.
GET /images/:id?format=webp returns WebP if available, else original.
You can describe storage overhead: original + 2 resized variants (thumb, medium) + their WebP versions.

Deep dives: Caching · Performance Optimization

Module 06 · ~3–4 hrs

Metadata & EXIF Handling

EXIF data can leak sensitive info (GPS, camera, timestamps). Extract what's useful, strip what's not.

Tasks

Extract EXIF on upload: camera model, date taken, dimensions. Store in the images table.
Strip sensitive EXIF on variant generation (GPS, camera serial).
Expose GET /images/:id/metadata with safe EXIF (omit GPS, serial).
Use image dimensions to validate uploads (e.g., reject images < 100x100). Read them from the decoded image header; EXIF dimension tags can be missing or wrong.
Document your EXIF stripping strategy in the design doc.
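
One stripping strategy worth documenting is an allowlist: only tags known to be safe are kept, so a sensitive tag nobody thought of is dropped by default. The sketch assumes EXIF has already been parsed into a dict keyed by standard tag names; the exact allowlist is an assumption to adjust.

```python
from typing import Any, Dict, List

# Tags considered safe to store and expose (assumed allowlist).
SAFE_EXIF_TAGS = {"Make", "Model", "DateTimeOriginal", "Orientation"}

# Tags that must never appear in variants or API responses.
SENSITIVE_EXIF_TAGS = {"GPSInfo", "BodySerialNumber", "LensSerialNumber"}


def safe_metadata(exif: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted tags; everything else is dropped."""
    return {tag: value for tag, value in exif.items() if tag in SAFE_EXIF_TAGS}


def find_sensitive_tags(exif: Dict[str, Any]) -> List[str]:
    """List sensitive tags still present, e.g. to assert on in tests."""
    return sorted(tag for tag in exif if tag in SENSITIVE_EXIF_TAGS)


# Hypothetical parsed EXIF from an uploaded photo.
parsed = {
    "Make": "ExampleCam",
    "Model": "X100",
    "DateTimeOriginal": "2024:05:01 10:30:00",
    "GPSInfo": {"GPSLatitude": (52, 31, 12), "GPSLongitude": (13, 24, 18)},
    "BodySerialNumber": "SN-0001",
    "Software": "Editor 1.0",
}
```

Running find_sensitive_tags against the EXIF of each generated variant gives a direct check for the acceptance criteria.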

Acceptance criteria

GET /images/:id/metadata returns EXIF data (no GPS or sensitive info).
Variants don't contain GPS or camera serial in EXIF.
You can describe why EXIF stripping matters (privacy).

Deep dives: Privacy & Data Protection

Module 07 · ~4–6 hrs

CDN Integration & Distribution

Serve images from edge locations, not your origin. Integrate with a CDN: Cloudflare, Fastly, or Amazon CloudFront.

Tasks

Set up a CDN (Cloudflare, Fastly, or equivalent). Point it at your origin.
Configure cache headers: Cache-Control: public, max-age=31536000 for immutable images.
Implement POST /images/:id/purge to purge from CDN if an image is deleted.
Use a CDN-friendly URL format: https://cdn.example.com/images/:id.
Measure: edge cache hit rate, origin traffic reduction, latency from different regions.
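
The first two measurements are simple ratios. This sketch assumes you have collected per-request cache status values (for example from a CF-Cache-Status style header) and request counts at the edge and at the origin; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def cache_hit_rate(cache_statuses: Iterable[str]) -> float:
    """Fraction of sampled requests the edge answered from cache."""
    counts = Counter(status.strip().upper() for status in cache_statuses)
    total = sum(counts.values())
    return counts["HIT"] / total if total else 0.0


def origin_offload(edge_requests: int, origin_requests: int) -> float:
    """Fraction of edge requests that never reached the origin."""
    if edge_requests == 0:
        return 0.0
    return 1 - origin_requests / edge_requests
```

For example, 1000 requests at the edge and 250 at the origin means the CDN absorbed 75% of the traffic.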

Acceptance criteria

Image served from CDN (check response headers for CF-Cache-Status or equivalent).
Deleting an image purges it from the CDN.
Cache-Control headers are correct (long-lived for immutable images).

Deep dives: Caching · Deployment & Distribution

Module 08 · ~4–6 hrs

Signed URLs & Access Control

Not all images are public. Generate time-limited signed URLs for private uploads. Implement ownership and sharing.

Tasks

Add a public flag to images. Private images are only accessible via signed URLs.
Implement POST /images/:id/signed-url: generate a time-limited signed URL (valid for 1 hour, configurable).
Implement ownership: only the uploader can delete their images and generate signed URLs.
Use HMAC or JWT for signing. Validate signature on GET /images/:id.
Expose POST /images/:id/share to grant time-limited access to another user.
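
A minimal HMAC signing sketch for the tasks above. The query parameter names (expires, sig), the message layout, and the secret are assumptions; the one-hour default matches the task description.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def _signature(secret: bytes, image_id: str, expires: int) -> str:
    """HMAC-SHA256 over the image ID and expiry timestamp."""
    message = f"{image_id}:{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_signed_url(base_url: str, image_id: str, secret: bytes,
                    ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Build a URL that is valid for ttl_seconds from now."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires,
                       "sig": _signature(secret, image_id, expires)})
    return f"{base_url}/images/{image_id}?{query}"


def verify_signed_url(url: str, secret: bytes,
                      now: Optional[float] = None) -> bool:
    """Check expiry and signature; any malformed input is rejected."""
    current = int(time.time() if now is None else now)
    parts = urlsplit(url)
    image_id = parts.path.rsplit("/", 1)[-1]
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False
    if current > expires:
        return False
    expected = _signature(secret, image_id, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Because the expiry is part of the signed message, a client cannot extend a link by editing the expires parameter.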

Acceptance criteria

Private images return 403 Forbidden without a valid signed URL.
POST /images/:id/signed-url generates a URL valid for 1 hour.
User A cannot delete User B's private images.

Deep dives: Authentication · Authorization

Module 09 · ~5–7 hrs

Observability & Performance

You can't debug what you can't see. Instrument: logs, metrics, traces. Monitor upload latency, processing time, CDN performance.

Tasks

Structured logs for every upload: file size, content-type, processing time, variants generated.
Metrics: uploads/sec, total storage used, variant generation time (p50, p95, p99), CDN hit rate.
Distributed tracing: instrument the full path (upload → validate → store → process variants).
Build a Grafana dashboard: upload volume, processing queue depth, variant generation latency, storage usage.
Define SLOs: e.g., 99.9% of uploads complete within 30 seconds (over 30 days).
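
For the structured-logging task, one workable shape is a JSON line per pipeline stage, each carrying the same upload ID so a single upload can be traced. The logger name, field names, and sample values below are illustrative assumptions.

```python
import io
import json
import logging
from typing import Any, Dict, List


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, sort_keys=True)


def make_upload_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("image_service.uploads")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def log_stage(logger: logging.Logger, upload_id: str, stage: str, **fields: Any) -> None:
    """Log one pipeline stage, always tagged with the upload ID."""
    logger.info(stage, extra={"fields": {"upload_id": upload_id, **fields}})


def events_for(log_text: str, upload_id: str) -> List[Dict[str, Any]]:
    """Filter JSON log lines down to a single upload."""
    events = [json.loads(line) for line in log_text.splitlines() if line]
    return [event for event in events if event.get("upload_id") == upload_id]


stream = io.StringIO()
logger = make_upload_logger(stream)
log_stage(logger, "up_123", "validated", content_type="image/jpeg", file_size=482113)
log_stage(logger, "up_123", "stored", storage_key="originals/up_123")
log_stage(logger, "up_456", "validated", content_type="image/png", file_size=1024)
log_stage(logger, "up_123", "variants_generated", variants=["thumb", "medium"], processing_ms=840)
```

In production the stream would be stdout or a log shipper, and the filtering would happen in your log store rather than in Python.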

Acceptance criteria

You can query logs for a single upload: validation → storage → variant generation.
Dashboard shows variant generation latency; you can identify bottlenecks.
SLO doc with error budget committed.
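
The latency percentiles and the error budget can both be worked out by hand, which helps when sanity-checking a dashboard. This sketch uses the nearest-rank percentile definition (metrics systems often interpolate or use histograms instead) and treats the SLO as a percentage string to avoid floating-point drift; both choices are assumptions.

```python
from decimal import Decimal
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer ceiling of p * n / 100
    return ordered[rank - 1]


def error_budget(total_events: int, slo_percent: str) -> int:
    """How many events may miss the target without breaking the SLO."""
    allowed_fraction = (Decimal(100) - Decimal(slo_percent)) / Decimal(100)
    return int(Decimal(total_events) * allowed_fraction)


def budget_remaining(total_events: int, bad_events: int, slo_percent: str) -> int:
    """Budget left after subtracting events that already missed the target."""
    return error_budget(total_events, slo_percent) - bad_events
```

With a 99.9% target, one million uploads in the window leaves room for 1,000 slow or failed uploads; a negative remaining budget means the SLO is already blown.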

Deep dives: Logging · Metrics · Tracing · SLOs

Module 10 · ~5–7 hrs

Testing & Resilience

Edge cases are everywhere: corrupted uploads, missing processing, quota exceeded. Test them all.

Tasks

Unit tests for validation: magic bytes, file size, EXIF parsing.
Integration tests: upload → validate → store → process → serve. Verify end-to-end.
Edge case tests: corrupted JPEG, 0-byte file, 50MB file, animated GIF, AVIF upload when your image library lacks AVIF support.
Chaos tests: kill blob storage, kill processing worker. Verify originals aren't lost.
Coverage threshold: 80% for validation and processing logic.

Acceptance criteria

Test suite runs in under 2 minutes.
You can describe 5 edge cases your tests cover (e.g., corrupted JPEG, missing processor).
Killing the processor doesn't lose uploads; jobs retry on restart.
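
Retry-on-restart can be sketched as a recovery pass that runs when a worker starts: any job stuck in processing longer than a lease is assumed to belong to a dead worker and goes back to pending. The job dict layout, the 300-second lease, and the attempt cap are hypothetical.

```python
from typing import Dict, List


def recover_stale_jobs(jobs: List[Dict], now: float,
                       lease_seconds: int = 300,
                       max_attempts: int = 3) -> List[str]:
    """Requeue jobs whose worker appears to have died mid-processing.

    Each job dict is assumed to have: image_id, status, started_at (epoch
    seconds), and attempts (incremented whenever a worker claims the job).
    Returns the image IDs that were put back to "pending".
    """
    requeued = []
    for job in jobs:
        if job["status"] != "processing":
            continue
        if now - job["started_at"] < lease_seconds:
            continue  # still within its lease; a live worker may own it
        if job["attempts"] >= max_attempts:
            job["status"] = "failed"  # stop retrying; the original stays put
        else:
            job["status"] = "pending"
            requeued.append(job["image_id"])
    return requeued
```

This gives at-least-once processing, so variant generation should be idempotent: running the same job twice must produce the same keys.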

Deep dives: Unit Testing · Integration Testing · TDD

Module 11 · ~6–8 hrs

Capstone & Production Readiness

Ship it. Deploy, document, set up monitoring, go live.

Tasks

Write a multi-stage Dockerfile. Use docker-compose.yml for local dev (app + Postgres + worker + MinIO).
GitHub Actions: lint → test → build → push → deploy.
Deploy to production: Fly.io, Render, Railway, or a VM. Set up health checks and auto-rollback.
Write two runbooks: "storage quota exceeded" and "variant generation is slow".
Write a capacity planning guide: storage calculations, processing capacity, CDN costs.
Polish the README: architecture diagram, API examples, how to run locally, how to deploy.
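
The capacity planning guide mostly needs two back-of-the-envelope formulas: storage growth and worker count. The sketch below is one way to write them down; every input (upload rate, average size, variant overhead, utilization target) is a placeholder you would replace with your own measurements.

```python
import math


def storage_growth_gb(uploads_per_day: float,
                      avg_original_mb: float,
                      variant_overhead_ratio: float,
                      days: int = 30) -> float:
    """Estimated storage added over a period, in GB (1 GB = 1024 MB).

    variant_overhead_ratio is the total size of all variants divided by
    the size of the original, e.g. 0.5 if variants add 50% on top.
    """
    total_mb = uploads_per_day * days * avg_original_mb * (1 + variant_overhead_ratio)
    return total_mb / 1024


def workers_needed(peak_uploads_per_sec: float,
                   seconds_per_image: float,
                   target_utilization: float = 0.7) -> int:
    """Workers required to keep up at peak without running flat out."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy_workers = peak_uploads_per_sec * seconds_per_image
    return max(1, math.ceil(busy_workers / target_utilization))
```

Leaving headroom below 100% utilization keeps queue depth bounded during bursts; the right target depends on how bursty your traffic is.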

Acceptance criteria

Every PR gets a preview URL; main deploys to production automatically.
A bad deploy auto-rolls back.
Runbooks + capacity planning guide committed to /docs.

Deep dives: CI/CD · Containers · Operational Excellence

After the Track

Where to Go Next

Stretch goals on the same project: watermarking, background blur, AI tagging, image search, video support, progressive uploads.
Read about media processing. Study libvips, FFmpeg, ML-based image processing; sketch how you'd add video transcoding.
Try the Webhook Delivery track to deepen your async systems knowledge.
Write about it. Publish a blog post per module, especially on EXIF stripping, CDN cache invalidation, and variant generation.
