Deployment

This guide covers deploying LearnPanta to Google Cloud Platform.

Architecture Overview

Prerequisites

1. Authentication

gcloud auth login
gcloud config set project YOUR_PROJECT_ID
gcloud container clusters get-credentials learnpanta-gke --region us-central1

Verify access:

kubectl get pods
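
Before running the commands in the rest of this guide, it can help to confirm that kubectl points at the intended cluster and not at another one in your kubeconfig. The following is a minimal sketch, assuming the context name follows the usual GKE pattern gke_PROJECT_LOCATION_CLUSTER. Both function names are hypothetical helpers, not part of the LearnPanta repository.

```shell
# Build the kubeconfig context name that get-credentials creates for a GKE
# cluster (pattern: gke_<project>_<location>_<cluster>).
expected_gke_context() {
  local project="$1" location="$2" cluster="$3"
  printf 'gke_%s_%s_%s\n' "$project" "$location" "$cluster"
}

# Return 0 if the active kubectl context matches the expected cluster.
check_gke_context() {
  local expected current
  expected="$(expected_gke_context "$@")"
  current="$(kubectl config current-context)"
  if [ "$current" != "$expected" ]; then
    echo "Wrong context: $current (expected $expected)" >&2
    return 1
  fi
}
```

Example: check_gke_context YOUR_PROJECT_ID us-central1 learnpanta-gke && kubectl get pods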

2. Deploying Backend Changes

Option A: Cloud Build (Recommended)

Trigger the automated build pipeline:

cd backend
gcloud builds submit --config=cloudbuild.yaml

This will:

  1. Build Docker image from Dockerfile
  2. Push to Artifact Registry
  3. Update Kubernetes deployment
  4. Perform rolling restart

Option B: Manual Deployment

Build and push manually:

# Build image (run from the backend directory, next to the Dockerfile)
docker build -t us-central1-docker.pkg.dev/YOUR_PROJECT/learnpanta/backend:latest .

# Push to registry
docker push us-central1-docker.pkg.dev/YOUR_PROJECT/learnpanta/backend:latest

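# Update deployment (has no effect if the image reference is unchanged)
kubectl set image deployment/backend backend=us-central1-docker.pkg.dev/YOUR_PROJECT/learnpanta/backend:latest

# If the tag was reused (as with :latest), restart so pods pull the new image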
kubectl rollout restart deployment/backend
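
Because kubectl set image only triggers a rollout when the image reference changes, tagging each build uniquely (for example with the short git commit hash) makes rollouts and rollbacks easier to trace than reusing :latest. The following is a sketch under that assumption. The function names are hypothetical, and the registry path is the one used above.

```shell
# Compose the Artifact Registry image reference used in this guide.
backend_image() {
  local project="$1" tag="$2"
  printf 'us-central1-docker.pkg.dev/%s/learnpanta/backend:%s\n' "$project" "$tag"
}

# Build, push, and roll out one uniquely tagged image.
# Run from the directory that contains the backend Dockerfile.
deploy_backend() {
  local project="$1" tag="$2" image
  image="$(backend_image "$project" "$tag")"
  docker build -t "$image" . &&
    docker push "$image" &&
    kubectl set image deployment/backend "backend=$image" &&
    kubectl rollout status deployment/backend --timeout=180s
}
```

Example: deploy_backend YOUR_PROJECT "$(git rev-parse --short HEAD)"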

3. Deploying Frontend

Option A: Vercel

cd frontend
vercel --prod

Option B: Cloud Build

If you use Cloud Build for frontend deployments, run:

cd frontend
gcloud builds submit --config=cloudbuild-frontend.yaml

4. Environment Configuration

Kubernetes Secrets

# View current secrets
kubectl get secrets

# Update a secret value
kubectl create secret generic backend-secrets \
  --from-literal=database-url="postgresql://..." \
  --from-literal=google-api-key="..." \
  --dry-run=client -o yaml | kubectl apply -f -

Required Environment Variables

Variable          Description
DATABASE_URL      Cloud SQL connection string
GOOGLE_API_KEY    Gemini API key
TEMPORAL_ADDRESS  Temporal server address
API_KEY           Backend authentication key
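
A missing required variable usually surfaces only as a crash at startup, so a quick pre-flight check can save a failed rollout. The sketch below checks the four variables listed above in the current shell environment. The function name is hypothetical, and where you run it (for example inside the backend container) is up to you.

```shell
# Print the names of required variables that are unset or empty.
# Returns non-zero if any are missing.
missing_required_env() {
  local name value missing=0
  for name in DATABASE_URL GOOGLE_API_KEY TEMPORAL_ADDRESS API_KEY; do
    # Indirect lookup: read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Example: missing_required_env || echo "Set the variables listed above before deploying"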

Optional Environment Variables

Variable          Description
PINECONE_API_KEY  Semantic search vectors
TIMESCALE_HOST    TimescaleDB host for analytics

5. Verifying Deployment

Check Pod Status

kubectl get pods
kubectl logs deployment/backend --tail=50

Test Endpoints

# Health check
curl https://learnpanta.com/api/v1/health

# Curation status
curl https://learnpanta.com/api/v1/curator/status
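
Right after a rollout, the endpoints may take a few seconds to respond, so a single curl can fail spuriously. The following sketch retries a URL a fixed number of times. The function name, retry count, and delay are illustrative assumptions.

```shell
# Poll a URL until curl succeeds or the attempts run out.
# curl -f turns HTTP errors (status >= 400) into a non-zero exit code.
wait_for_url() {
  local url="$1" attempts="${2:-5}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 10 -o /dev/null "$url"; then
      echo "OK $url (attempt $i)"
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts, but not after the last one.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  echo "FAILED $url after $attempts attempts" >&2
  return 1
}
```

Example: wait_for_url https://learnpanta.com/api/v1/health && wait_for_url https://learnpanta.com/api/v1/curator/status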

Check Temporal

kubectl exec -it deployment/temporal -- tctl namespace list
kubectl exec -it deployment/temporal -- tctl workflow list

6. Monitoring

View Logs

# Backend logs
kubectl logs -f deployment/backend

# Worker logs
kubectl logs -f deployment/worker

# Temporal logs
kubectl logs -f deployment/temporal

Resource Usage

kubectl top pods

7. Troubleshooting

Pod CrashLoopBackOff

# Check events
kubectl describe pod POD_NAME

# Check logs
kubectl logs POD_NAME --previous
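
When several pods are failing, looking up each POD_NAME by hand gets tedious. The sketch below finds pods in CrashLoopBackOff and runs the two commands above for each one. It assumes the default kubectl get pods column layout (NAME, READY, STATUS, ...). The function names are hypothetical.

```shell
# List pods whose STATUS column (third field of `kubectl get pods`) is
# CrashLoopBackOff.
crashlooping_pods() {
  kubectl get pods --no-headers | awk '$3 == "CrashLoopBackOff" { print $1 }'
}

# Show the tail of `describe` (where events appear) and the logs of the
# previous, crashed container for each such pod.
inspect_crashlooping_pods() {
  local pod
  for pod in $(crashlooping_pods); do
    echo "=== $pod ==="
    kubectl describe pod "$pod" | tail -n 20
    kubectl logs "$pod" --previous --tail=50
  done
}
```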

Database Connection Issues

# Verify Cloud SQL proxy or direct connection
kubectl exec -it deployment/backend -- python -c "from app.database import engine; print(engine.url)"

Temporal Connection Issues

# Verify Temporal is running
kubectl exec -it deployment/worker -- python -c "from temporalio.client import Client; import asyncio; asyncio.run(Client.connect('temporal-service:7233'))"

8. Scaling

Manual Scaling

# Scale backend
kubectl scale deployment/backend --replicas=3

# Scale workers
kubectl scale deployment/worker --replicas=4

Auto-scaling (HPA)

kubectl autoscale deployment/backend --min=2 --max=10 --cpu-percent=70
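
The --cpu-percent=70 flag sets the target average CPU utilization, measured relative to each pod's CPU request. The autoscaler's documented core rule is desired = ceil(current replicas × current utilization / target utilization), clamped to the min and max. The sketch below reproduces only that arithmetic so you can estimate how the backend would scale. It ignores the controller's tolerance band and stabilization behaviour, and the function name is hypothetical.

```shell
# desired = ceil(current_replicas * current_cpu_percent / target_cpu_percent),
# clamped to [min, max]. Integer arithmetic; (a + b - 1) / b rounds up.
hpa_desired_replicas() {
  local current="$1" cpu="$2" target="$3" min="$4" max="$5" desired
  desired=$(( ($current * $cpu + $target - 1) / $target ))
  [ "$desired" -lt "$min" ] && desired="$min"
  [ "$desired" -gt "$max" ] && desired="$max"
  echo "$desired"
}
```

Example: hpa_desired_replicas 2 105 70 2 10 estimates the replica count for 2 pods averaging 105% of their CPU request.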

9. Rollback

If a deployment fails:

# Check rollout history
kubectl rollout history deployment/backend

# Rollback to previous version
kubectl rollout undo deployment/backend

# Rollback to specific revision
kubectl rollout undo deployment/backend --to-revision=2
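
The check and the rollback can be combined so that a rollout that never becomes healthy is reverted automatically. This is a sketch, not the project's pipeline. The function name and the default timeout are assumptions.

```shell
# Wait for the rollout to finish; if it does not complete within the
# timeout, revert to the previous revision and report failure.
rollout_or_undo() {
  local deployment="$1" timeout="${2:-120s}"
  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    echo "Rollout of $deployment succeeded"
    return 0
  fi
  echo "Rollout of $deployment failed; rolling back" >&2
  kubectl rollout undo "deployment/$deployment"
  return 1
}
```

Example: rollout_or_undo backend 180s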

10. Database Migrations

Run migrations from a pod:

kubectl exec -it deployment/backend -- alembic upgrade head

Or via job:

kubectl create job --from=cronjob/migration-job manual-migration
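
A fixed job name such as manual-migration can only be used once per namespace until the old job is deleted. The sketch below generates a unique name, waits for the job, and prints its logs. The function names and the 300s timeout are assumptions, and if the job fails, kubectl wait simply blocks until the timeout.

```shell
# Job names must be unique within a namespace, so append a UTC timestamp.
migration_job_name() {
  printf 'manual-migration-%s\n' "$(date -u +%Y%m%d%H%M%S)"
}

# Create the job from the cronjob template, wait for it, and print its logs.
run_migration_job() {
  local job
  job="$(migration_job_name)"
  kubectl create job "$job" --from=cronjob/migration-job &&
    kubectl wait --for=condition=complete "job/$job" --timeout=300s &&
    kubectl logs "job/$job"
}
```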

11. Observability & Alerts

  • Logs: Cloud Logging for backend/worker; filter by severity>=ERROR and deployment.
  • Metrics: Export FastAPI/worker metrics via Prometheus (or Cloud Monitoring): latency p95, error rate, Temporal poller metrics (temporal_workflow_task_queue_poll_succeed), DB connections.
  • Tracing: (Optional) Enable OpenTelemetry in FastAPI/worker to trace request → workflow → activity.
  • Alerts (suggested):
    • HTTP 5xx rate > 2% for 5m
    • Worker poller failures > 0 for 5m
    • Temporal persistence DB CPU > 80% for 10m
    • Cloud SQL connection errors spike
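
To make the first threshold concrete: with request counts for a 5-minute window, "5xx rate > 2%" is a simple ratio test. The sketch below shows only the comparison. Where the counts come from (Cloud Monitoring, Prometheus, access logs) is left open, and the function name is hypothetical.

```shell
# Return 0 (alert) when errors / total exceeds threshold_percent.
# Cross-multiplying avoids floating point: errors * 100 > total * threshold.
exceeds_error_rate() {
  local errors="$1" total="$2" threshold_percent="${3:-2}"
  [ "$total" -gt 0 ] && [ $(($errors * 100)) -gt $(($total * $threshold_percent)) ]
}
```

Example: exceeds_error_rate 21 1000 && echo "5xx rate above 2%"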

12. Temporal Operations Runbook

  • Stuck workflow: run tctl workflow describe -w marathon-{session_id} and check the history; if it exceeds 50 loops, have the workflow continue-as-new; to force-close it, run tctl workflow terminate -w marathon-{session_id}.
  • History bloat: Lower continue-as-new threshold in workflow; monitor Temporal DB size; archive old namespaces.
  • Queue backlog: Scale worker replicas; verify task queue matches marathon-session-queue.
  • Namespace retention: Set retention to 30d+ for audit; prune completed workflows older than retention.
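
The stuck-workflow steps can be wrapped in small helpers so the workflow ID is always built the same way. This is a sketch that assumes tctl is available in the temporal deployment, as in the verification commands earlier. The function names are hypothetical.

```shell
# Workflow IDs in this runbook follow the pattern marathon-<session_id>.
marathon_workflow_id() {
  printf 'marathon-%s\n' "$1"
}

# Describe a session's workflow via tctl inside the Temporal deployment.
describe_marathon_workflow() {
  kubectl exec deployment/temporal -- \
    tctl workflow describe -w "$(marathon_workflow_id "$1")"
}

# Force-close a workflow, recording why. Termination cannot be undone.
terminate_marathon_workflow() {
  local session_id="$1" reason="$2"
  kubectl exec deployment/temporal -- \
    tctl workflow terminate -w "$(marathon_workflow_id "$session_id")" --reason "$reason"
}
```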

13. Release & Rollback Checklist

  1. Run pytest --cov=app and pnpm lint && pnpm build.
  2. Tag release vX.Y.Z; update changelog.
  3. Cloud Build: ensure substitutions set (_REGION, Firebase keys, TLDRAW key).
  4. Deploy backend, then frontend.
  5. Post-deploy smoke:
    • GET /api/v1/health
    • Create session, stream telemetry, finalize, debrief stream.
  6. Rollback: kubectl rollout undo deployment/backend, and likewise for the frontend deployment if it runs on the cluster; if a DB migration failed, run alembic downgrade -1.

14. Backup & Restore (DB)

  • Cloud SQL: Automated daily backups; for point-in-time recovery, enable PITR. To restore, restore to a new instance, then point DATABASE_URL at it.
  • TimescaleDB: Use pg_dump for analytics metrics; keep last 7 days if storage is tight.
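
The TimescaleDB policy above (dump with pg_dump, keep roughly a week) can be sketched as two small functions. TIMESCALE_HOST comes from the environment table earlier. TIMESCALE_USER, TIMESCALE_DB, the file naming, and the function names are assumptions for illustration.

```shell
# Dump the analytics database in pg_dump's custom format (-Fc).
# TIMESCALE_USER and TIMESCALE_DB are assumed names, not from this guide.
dump_timescale() {
  local dir="$1" file
  file="$dir/timescale-$(date -u +%Y%m%d).dump"
  pg_dump -h "$TIMESCALE_HOST" -U "$TIMESCALE_USER" -d "$TIMESCALE_DB" -Fc -f "$file" &&
    echo "$file"
}

# Delete dumps last modified more than N days ago (default 7).
prune_old_dumps() {
  local dir="$1" days="${2:-7}"
  find "$dir" -type f -name 'timescale-*.dump' -mtime +"$days" -delete
}
```

Example: dump_timescale /backups && prune_old_dumps /backups 7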

Next Steps