Deployment

This guide covers deploying LearnPanta to Google Cloud Platform.

Architecture Overview

Prerequisites

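- Google Cloud SDK (gcloud) installed
- kubectl installed, plus the GKE auth plugin (gcloud components install gke-gcloud-auth-plugin)
- Docker, for manual image builds
- Vercel CLI, if you deploy the frontend to Vercel
- Access to the GCP project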

1. Authentication

gcloud auth login
gcloud config set project YOUR_PROJECT_ID
gcloud container clusters get-credentials learnpanta-gke --region us-central1

Verify access:

kubectl get pods

2. Deploying Backend Changes

Option A: Cloud Build (Recommended)

Trigger the automated build pipeline:

cd backend
gcloud builds submit --config=cloudbuild.yaml

This will:

- Build the Docker image from the Dockerfile
- Push it to Artifact Registry
- Update the Kubernetes deployment
- Perform a rolling restart

Option B: Manual Deployment

Build and push manually:

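# Authenticate Docker to Artifact Registry (one-time setup)
gcloud auth configure-docker us-central1-docker.pkg.dev

# Tag with the current commit so each image reference is unique
IMAGE=us-central1-docker.pkg.dev/YOUR_PROJECT_ID/learnpanta/backend:$(git rev-parse --short HEAD)

# Build image
docker build -t "$IMAGE" .

# Push to registry
docker push "$IMAGE"

# Update deployment (a new tag triggers a rolling update)
kubectl set image deployment/backend backend="$IMAGE"

# If you push :latest instead, `set image` with an unchanged reference does nothing;
# restart so pods re-pull the image (requires imagePullPolicy: Always, the default for :latest)
kubectl rollout restart deployment/backend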
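Artifact Registry Docker image paths follow the pattern REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG. If you script manual deploys, a small helper can assemble that reference and refuse the mutable latest tag, so every kubectl set image call points at a new reference and triggers a rollout. A minimal Python sketch; `image_ref` and the reject-latest policy are illustrative assumptions, not part of the existing build tooling.

```python
import re

# Docker tag grammar: 1-128 chars of [A-Za-z0-9_.-], first char must be [A-Za-z0-9_]
_TAG_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")


def image_ref(project: str, tag: str, *, region: str = "us-central1",
              repository: str = "learnpanta", image: str = "backend") -> str:
    """Build an Artifact Registry image reference (hypothetical helper).

    Rejects 'latest' so each deploy uses a distinct, immutable reference.
    """
    if tag == "latest":
        raise ValueError("use an immutable tag such as a commit SHA, not 'latest'")
    if not _TAG_RE.match(tag):
        raise ValueError(f"invalid Docker tag: {tag!r}")
    return f"{region}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


if __name__ == "__main__":
    print(image_ref("YOUR_PROJECT_ID", "3f2a9c1"))
    # us-central1-docker.pkg.dev/YOUR_PROJECT_ID/learnpanta/backend:3f2a9c1
```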

3. Deploying Frontend

Option A: Vercel

cd frontend
vercel --prod

Option B: Cloud Build

If you use Cloud Build for frontend deployments, run:

cd frontend
gcloud builds submit --config=cloudbuild-frontend.yaml

4. Environment Configuration

Kubernetes Secrets

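# List secrets (names only; values are not printed)
kubectl get secrets

# Create or update backend-secrets. Include every key, not just the one you are
# changing: keys omitted here can be dropped from the secret on apply.
kubectl create secret generic backend-secrets \
  --from-literal=database-url="postgresql://..." \
  --from-literal=google-api-key="..." \
  --dry-run=client -o yaml | kubectl apply -f -

# Env vars from a secret are read at container start; restart the deployments that use it
kubectl rollout restart deployment/backend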
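The --dry-run=client -o yaml step only renders a Secret manifest locally; kubectl apply then sends it to the cluster. The Python sketch below builds the same kind of manifest (as JSON, which kubectl apply -f - also accepts) to make one detail explicit: Secret data values are base64-encoded, not encrypted. The function name `secret_manifest` is hypothetical; the key names mirror the command above.

```python
import base64
import json


def secret_manifest(name: str, literals: dict[str, str]) -> dict:
    """Build a v1 Secret similar to `kubectl create secret generic --from-literal`.

    base64 is an encoding, not encryption: anyone who can read the Secret
    object can decode these values.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in literals.items()
        },
    }


if __name__ == "__main__":
    manifest = secret_manifest("backend-secrets", {
        "database-url": "postgresql://...",
        "google-api-key": "...",
    })
    # The printed JSON could be piped into `kubectl apply -f -`
    print(json.dumps(manifest, indent=2))
```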

Required Environment Variables

Variable	Description
`DATABASE_URL`	Cloud SQL connection string
`GOOGLE_API_KEY`	Gemini API key
`TEMPORAL_ADDRESS`	Temporal server address (host:port, e.g. temporal-service:7233)
`API_KEY`	Backend authentication key

Optional Environment Variables

Variable	Description
`PINECONE_API_KEY`	Pinecone API key for semantic search (vector index)
`TIMESCALE_HOST`	TimescaleDB host for analytics
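A startup check can fail fast when a required variable is missing, instead of the problem surfacing later as a connection error. A minimal Python sketch, assuming the backend reads these names from the process environment; `require_env` is a hypothetical helper, not an existing module in the codebase.

```python
from collections.abc import Mapping

# Variable names taken from the tables above
REQUIRED = ("DATABASE_URL", "GOOGLE_API_KEY", "TEMPORAL_ADDRESS", "API_KEY")
OPTIONAL = ("PINECONE_API_KEY", "TIMESCALE_HOST")


def require_env(env: Mapping[str, str]) -> list[str]:
    """Raise if a required variable is missing or empty; return unset optional names."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required environment variables: {', '.join(missing)}")
    return [name for name in OPTIONAL if not env.get(name)]


if __name__ == "__main__":
    sample = {
        "DATABASE_URL": "postgresql://...",
        "GOOGLE_API_KEY": "...",
        "TEMPORAL_ADDRESS": "temporal-service:7233",
        "API_KEY": "...",
    }
    print(require_env(sample))  # ['PINECONE_API_KEY', 'TIMESCALE_HOST']
```

In the service itself you would call require_env(os.environ) once at startup, before opening database or Temporal connections, and log the returned optional names as warnings.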

5. Verifying Deployment

Check Pod Status

kubectl get pods
kubectl logs deployment/backend --tail=50

Test Endpoints

# Health check (-f makes curl exit non-zero on HTTP errors)
curl -fsS https://learnpanta.com/api/v1/health

# Curation status
curl -fsS https://learnpanta.com/api/v1/curator/status
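The two curl checks can also be scripted so a deploy pipeline fails on a non-2xx response. A minimal sketch using only the Python standard library; the base URL and paths come from the commands above, while `http_status`, `smoke_test`, and `failed` are hypothetical helpers.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable

BASE_URL = "https://learnpanta.com/api/v1"
PATHS = ("/health", "/curator/status")


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET `url` and return the HTTP status code, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses raise HTTPError
        return err.code
    except OSError:  # DNS failure, refused connection, timeout (URLError is an OSError)
        return 0


def smoke_test(base_url: str, paths: Iterable[str],
               fetch: Callable[[str], int] = http_status) -> dict[str, int]:
    """Return {path: status} for each endpoint; `fetch` can be swapped out in tests."""
    return {path: fetch(base_url + path) for path in paths}


def failed(results: dict[str, int]) -> list[str]:
    """Paths whose status is not 2xx (0 means unreachable)."""
    return [path for path, status in results.items() if not 200 <= status < 300]
```

For example, failed(smoke_test(BASE_URL, PATHS)) returns an empty list when both endpoints answer with a 2xx status; a deploy script can treat anything else as a failed rollout.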

Check Temporal

kubectl exec -it deployment/temporal -- tctl namespace list
kubectl exec -it deployment/temporal -- tctl workflow list

6. Monitoring

View Logs

# Backend logs
kubectl logs -f deployment/backend

# Worker logs
kubectl logs -f deployment/worker

# Temporal logs
kubectl logs -f deployment/temporal

Resource Usage

kubectl top pods

7. Troubleshooting

Pod CrashLoopBackOff

# Check events
kubectl describe pod POD_NAME

# Check logs
kubectl logs POD_NAME --previous
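CrashLoopBackOff is not a crash reason in itself: it means the kubelet is waiting before restarting a container that keeps exiting, so the real cause is in the events and previous logs above. Per the Kubernetes documentation, the default wait doubles from 10s up to a 5-minute cap and resets after the container runs 10 minutes without problems, which is why a crashing pod can sit idle for minutes between attempts. The Python sketch below only reproduces that default schedule; `restart_delays` is a hypothetical name.

```python
def restart_delays(restarts: int, base: float = 10.0, cap: float = 300.0) -> list[float]:
    """Seconds the kubelet waits before each successive restart of a crashing container.

    Documented default: 10s, doubling each time, capped at 5 minutes.
    (The counter resets after 10 minutes of healthy running; not modeled here.)
    """
    return [min(base * 2 ** n, cap) for n in range(restarts)]


if __name__ == "__main__":
    print(restart_delays(7))  # [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```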

Database Connection Issues

# Confirm which database URL the backend is using (Cloud SQL proxy or direct connection)
kubectl exec -it deployment/backend -- python -c "from app.database import engine; print(engine.url)"
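When you share the output of that check in a ticket or chat, avoid pasting a raw connection string, which contains the database password. A standard-library sketch that extracts the host, port, and database name from a DATABASE_URL-style string and masks the password. It assumes a standard postgresql://user:password@host:port/dbname URL (optionally with a driver suffix such as +asyncpg, or a Cloud SQL socket path in ?host=); `describe_db_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def describe_db_url(url: str) -> dict[str, object]:
    """Summarize a postgresql:// URL without exposing the password."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    # Cloud SQL Unix-socket URLs often carry the socket path in ?host=/cloudsql/...
    host = parts.hostname or query.get("host", [None])[0]
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": "***" if parts.password else None,
        "host": host,
        "port": parts.port or 5432,  # PostgreSQL default port
        "database": parts.path.lstrip("/") or None,
    }


if __name__ == "__main__":
    print(describe_db_url("postgresql+asyncpg://app:s3cret@10.0.0.5:5432/learnpanta"))
```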

Temporal Connection Issues

# Verify the worker can reach Temporal (raises an error if the connection fails)
kubectl exec -it deployment/worker -- python -c "import asyncio; from temporalio.client import Client; asyncio.run(Client.connect('temporal-service:7233')); print('Temporal reachable')"
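If the Temporal client call fails or hangs, a plain TCP check helps separate network problems (Service name, port, DNS, network policy) from Temporal-level errors. A standard-library sketch; temporal-service:7233 is taken from the command above, and `tcp_reachable` is a hypothetical helper you would run from inside a pod such as the worker.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> tuple[bool, str]:
    """Try a TCP connection and return (ok, human-readable detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"connected to {host}:{port}"
    except socket.gaierror as err:  # name resolution failed (wrong Service name?)
        return False, f"DNS lookup failed for {host}: {err}"
    except OSError as err:  # refused, timed out, or unreachable
        return False, f"cannot connect to {host}:{port}: {err}"
```

For example, tcp_reachable("temporal-service", 7233) returning a DNS failure points at the Service name, while a refused or timed-out connection points at the Temporal pod or network policy.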

8. Scaling

Manual Scaling

# Scale backend
kubectl scale deployment/backend --replicas=3

# Scale workers
kubectl scale deployment/worker --replicas=4

Auto-scaling (HPA)

kubectl autoscale deployment/backend --min=2 --max=10 --cpu-percent=70
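kubectl autoscale creates a HorizontalPodAutoscaler whose CPU target is a percentage of each pod's CPU request, so the backend containers need resources.requests.cpu set for --cpu-percent=70 to take effect. The controller's core calculation, as described in the Kubernetes HPA documentation, is sketched below in Python with the min/max from the command above. It deliberately omits the scale-down stabilization window and missing-metric handling, so treat it as an approximation; `desired_replicas` is a hypothetical name.

```python
import math


def desired_replicas(current: int, cpu_utilization: float, target: float = 70.0,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate HPA: ceil(current * currentMetric / targetMetric), clamped to [min, max].

    cpu_utilization and target are percentages of the pods' CPU request.
    Ratios within the tolerance (default 10%) leave the replica count unchanged.
    """
    ratio = cpu_utilization / target
    if abs(ratio - 1.0) <= tolerance:
        proposed = current
    else:
        proposed = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    print(desired_replicas(current=3, cpu_utilization=140))  # 6
    print(desired_replicas(current=3, cpu_utilization=75))   # 3 (within tolerance)
    print(desired_replicas(current=8, cpu_utilization=200))  # 10 (capped at max)
```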

9. Rollback

If a deployment fails:

# Check rollout history
kubectl rollout history deployment/backend

# Rollback to previous version
kubectl rollout undo deployment/backend

# Rollback to specific revision
kubectl rollout undo deployment/backend --to-revision=2

10. Database Migrations

Run migrations from a pod:

kubectl exec -it deployment/backend -- alembic upgrade head

Or run the migration as a one-off Job created from the migration-job CronJob:

kubectl create job --from=cronjob/migration-job manual-migration

11. Observability & Alerts

Logs: Backend and worker logs go to Cloud Logging; filter by severity>=ERROR and by deployment.
Metrics: Export FastAPI and worker metrics via Prometheus (or Cloud Monitoring): p95 latency, error rate, Temporal poller metrics (temporal_workflow_task_queue_poll_succeed), and DB connections.
Tracing: (Optional) Enable OpenTelemetry in FastAPI/worker to trace request → workflow → activity.
Alerts (suggested):
- HTTP 5xx rate > 2% for 5m
- Worker poller failures > 0 for 5m
- Temporal persistence DB CPU > 80% for 10m
- Cloud SQL connection errors spike
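One way to read "HTTP 5xx rate > 2% for 5m" is that the per-minute 5xx ratio stayed above 2% for five consecutive minutes, similar to a Prometheus alert rule with a for: 5m clause. The Python sketch below evaluates that interpretation over per-minute request counts; the MinuteStats shape and the rule that minutes without traffic count as healthy are assumptions. In production the same rule would live in Cloud Monitoring or Prometheus rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class MinuteStats:
    total: int       # requests served in this minute
    errors_5xx: int  # of which returned 5xx


def error_rate(stats: MinuteStats) -> float:
    """5xx ratio for one minute; a minute with no traffic counts as 0."""
    return stats.errors_5xx / stats.total if stats.total else 0.0


def should_alert(window: list[MinuteStats], threshold: float = 0.02, minutes: int = 5) -> bool:
    """True only if the 5xx rate exceeded `threshold` in each of the last `minutes` minutes."""
    if len(window) < minutes:
        return False
    return all(error_rate(m) > threshold for m in window[-minutes:])


if __name__ == "__main__":
    healthy = [MinuteStats(1000, 5)] * 5                          # 0.5% errors
    degraded = [MinuteStats(1000, 30)] * 5                        # 3% errors
    blip = [MinuteStats(1000, 30)] * 4 + [MinuteStats(1000, 5)]   # recovered in last minute
    print(should_alert(healthy), should_alert(degraded), should_alert(blip))
    # False True False
```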

12. Temporal Operations Runbook

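Stuck workflow: Run tctl workflow describe -w marathon-{session_id} and check its history. A workflow that has run more than 50 loops should already have continued as new, so check its continue-as-new logic. To force-close it, run tctl workflow terminate -w marathon-{session_id}.
History bloat: Lower the continue-as-new threshold in the workflow code, monitor the Temporal DB size, and archive old namespaces.
Queue backlog: Scale up worker replicas and verify that the workers poll the marathon-session-queue task queue.
Namespace retention: Set retention to 30 days or more for auditing. Temporal deletes closed workflows automatically once they are older than the retention period.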

13. Release & Rollback Checklist

Run pytest --cov=app in backend/ and pnpm lint && pnpm build in frontend/.
Tag the release vX.Y.Z and update the changelog.
Cloud Build: ensure substitutions are set (_REGION, Firebase keys, TLDRAW key).
Deploy the backend first, then the frontend.
Post-deploy smoke test:
- GET /health
- Create a session, stream telemetry, finalize it, and open the debrief stream.
Rollback: kubectl rollout undo deployment/backend, and roll back the frontend the same way if it runs on GKE (on Vercel, use vercel rollback); if a DB migration failed, run kubectl exec -it deployment/backend -- alembic downgrade -1.

14. Backup & Restore (DB)

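Cloud SQL: Automated backups run daily; enable point-in-time recovery (PITR) if you need to restore to a specific moment. Restore to a new instance, then point DATABASE_URL at it (update backend-secrets and restart the backend).
TimescaleDB: Back up analytics metrics with pg_dump; if storage is tight, keep only the last 7 days of dumps.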
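Assuming the 7-day guidance refers to how long pg_dump files are kept, pruning can be automated. A standard-library sketch that decides which dumps to delete; the file naming and where dumps are stored (local disk, a Cloud Storage bucket) are assumptions left to the caller, and `dumps_to_delete` is a hypothetical helper that never deletes the newest dump, so a stalled backup job cannot leave you with nothing.

```python
from datetime import datetime, timedelta


def dumps_to_delete(dumps: dict[str, datetime], now: datetime,
                    keep_days: int = 7) -> list[str]:
    """Return names of dumps older than `keep_days`, always keeping the newest one."""
    if not dumps:
        return []
    newest = max(dumps, key=dumps.get)
    cutoff = now - timedelta(days=keep_days)
    return sorted(name for name, created in dumps.items()
                  if created < cutoff and name != newest)


if __name__ == "__main__":
    dumps = {f"metrics-2024-06-{d:02d}.dump": datetime(2024, 6, d) for d in (1, 5, 8, 9, 14)}
    print(dumps_to_delete(dumps, now=datetime(2024, 6, 15)))
    # ['metrics-2024-06-01.dump', 'metrics-2024-06-05.dump']
```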

Next Steps

Development - Local setup guide
Architecture - System overview
Agents - AI agent handbook