Deployment
This guide covers deploying LearnPanta to Google Cloud Platform.
Architecture Overview
Prerequisites
- Google Cloud SDK installed
- `kubectl` installed
- Access to the GCP project
1. Authentication
gcloud auth login
gcloud config set project YOUR_PROJECT_ID
gcloud container clusters get-credentials learnpanta-gke --region us-central1
Verify access:
kubectl get pods
2. Deploying Backend Changes
Option A: Cloud Build (Recommended)
Trigger the automated build pipeline:
cd backend
gcloud builds submit --config=cloudbuild.yaml
This will:
- Build Docker image from `Dockerfile`
- Push to Artifact Registry
- Update Kubernetes deployment
- Perform rolling restart
Option B: Manual Deployment
Build and push manually:
# Build image
docker build -t us-central1-docker.pkg.dev/YOUR_PROJECT/learnpanta/backend:latest .
# Push to registry
docker push us-central1-docker.pkg.dev/YOUR_PROJECT/learnpanta/backend:latest
# Update deployment
kubectl set image deployment/backend backend=us-central1-docker.pkg.dev/YOUR_PROJECT/learnpanta/backend:latest
# Or, if the tag is unchanged, restart so the pods pull the latest image
kubectl rollout restart deployment/backend
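Both commands above use the mutable `latest` tag. When the deployment already references that exact image string, `kubectl set image` leaves the pod template unchanged and triggers no rollout, which is why the restart is listed as an alternative. One common way to avoid this, shown here as a sketch and not as the project's documented practice, is to tag each build with the commit SHA. The helper name `image_ref` and the idea of passing a short SHA are assumptions; the registry path is copied from the commands above.

```python
REGISTRY = "us-central1-docker.pkg.dev"  # registry host from the commands above


def image_ref(project, tag, repo="learnpanta", image="backend"):
    """Build a fully qualified Artifact Registry image reference.

    `tag` is assumed to be something immutable, such as a short git SHA.
    """
    if not tag or ":" in tag or "/" in tag:
        raise ValueError(f"invalid image tag: {tag!r}")
    return f"{REGISTRY}/{project}/{repo}/{image}:{tag}"
```

For example, `image_ref("YOUR_PROJECT", "3f2a9c1")` returns `us-central1-docker.pkg.dev/YOUR_PROJECT/learnpanta/backend:3f2a9c1`, which can be passed to `docker build -t`, `docker push`, and `kubectl set image`.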
3. Deploying Frontend
Option A: Vercel
cd frontend
vercel --prod
Option B: Cloud Build
If you use Cloud Build for frontend deployments, run:
cd frontend
gcloud builds submit --config=cloudbuild-frontend.yaml
4. Environment Configuration
Kubernetes Secrets
# View current secrets
kubectl get secrets
# Update a secret value
kubectl create secret generic backend-secrets \
--from-literal=database-url="postgresql://..." \
--from-literal=google-api-key="..." \
--dry-run=client -o yaml | kubectl apply -f -
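The pipeline above renders a Secret manifest client-side (`--dry-run=client -o yaml`) and hands it to `kubectl apply`, so the same command works whether or not the secret already exists. For readers who want to see what that manifest contains, here is a rough Python equivalent. `secret_manifest` is a hypothetical helper, not part of the project. Kubernetes stores values under `data` as base64, which is an encoding, not encryption.

```python
import base64


def secret_manifest(name, values):
    """Return an Opaque Secret manifest as a dict, with base64-encoded values."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }
```

Serializing the result with `json.dumps` gives a document that `kubectl apply -f -` accepts, since kubectl reads JSON as well as YAML.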
Required Environment Variables
| Variable | Description |
|---|---|
| `DATABASE_URL` | Cloud SQL connection string |
| `GOOGLE_API_KEY` | Gemini API key |
| `TEMPORAL_ADDRESS` | Temporal server address |
| `API_KEY` | Backend authentication key |
Optional Environment Variables
| Variable | Description |
|---|---|
| `PINECONE_API_KEY` | Semantic search vectors |
| `TIMESCALE_HOST` | TimescaleDB host for analytics |
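A pod that starts without one of the required variables usually fails later and less clearly, so a fail-fast check at startup can help. The following is a minimal sketch of such a check, not the backend's actual code: the variable names come from the tables above, while `missing_required` is a hypothetical helper.

```python
# Names copied from the tables above; the check itself is hypothetical.
REQUIRED = ("DATABASE_URL", "GOOGLE_API_KEY", "TEMPORAL_ADDRESS", "API_KEY")
OPTIONAL = ("PINECONE_API_KEY", "TIMESCALE_HOST")


def missing_required(env):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED if not env.get(name)]
```

At startup this could be called as `missing_required(os.environ)`, exiting with a clear message if the returned list is not empty.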
5. Verifying Deployment
Check Pod Status
kubectl get pods
kubectl logs deployment/backend --tail=50
Test Endpoints
# Health check
curl https://learnpanta.com/api/v1/health
# Curation status
curl https://learnpanta.com/api/v1/curator/status
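The two curl checks can also be scripted so they are easy to run after every deploy. This is a sketch under assumptions: it treats HTTP 200 as success for both endpoints, which the guide does not state, and `http_status` and `smoke_check` are hypothetical names. The URLs are the ones used in the curl commands above.

```python
import urllib.error
import urllib.request

BASE_URL = "https://learnpanta.com/api/v1"  # from the curl commands above
PATHS = ("/health", "/curator/status")


def http_status(url, timeout=10.0):
    """Return the HTTP status code for a GET request, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...


def smoke_check(base_url=BASE_URL, paths=PATHS, fetch=http_status):
    """Map each path to True if it returned HTTP 200 (assumed success), else False."""
    return {path: fetch(base_url + path) == 200 for path in paths}
```

The `fetch` parameter exists so the check can be exercised without network access; in normal use `smoke_check()` with no arguments is enough.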
Check Temporal
kubectl exec -it deployment/temporal -- tctl namespace list
kubectl exec -it deployment/temporal -- tctl workflow list
6. Monitoring
View Logs
# Backend logs
kubectl logs -f deployment/backend
# Worker logs
kubectl logs -f deployment/worker
# Temporal logs
kubectl logs -f deployment/temporal
Resource Usage
kubectl top pods
7. Troubleshooting
Pod CrashLoopBackOff
# Check events
kubectl describe pod POD_NAME
# Check logs
kubectl logs POD_NAME --previous
Database Connection Issues
# Verify Cloud SQL proxy or direct connection
kubectl exec -it deployment/backend -- python -c "from app.database import engine; print(engine.url)"
Temporal Connection Issues
# Verify Temporal is running
kubectl exec -it deployment/worker -- python -c "from temporalio.client import Client; import asyncio; asyncio.run(Client.connect('temporal-service:7233'))"
8. Scaling
Manual Scaling
# Scale backend
kubectl scale deployment/backend --replicas=3
# Scale workers
kubectl scale deployment/worker --replicas=4
Auto-scaling (HPA)
kubectl autoscale deployment/backend --min=2 --max=10 --cpu-percent=70
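The command above creates an HPA that targets 70% average CPU utilization, measured against each pod's CPU request, so the backend containers need CPU requests defined. The scaling rule Kubernetes documents is desired = ceil(current × currentUtilization / targetUtilization), with no change while the ratio stays within a tolerance (0.1 by default). The sketch below only illustrates that arithmetic. `desired_replicas` is a hypothetical name, and the real controller also applies stabilization windows and readiness handling that are not modelled here.

```python
import math


def desired_replicas(current, cpu_percent, target=70,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for a CPU utilization target."""
    ratio = cpu_percent / target
    if abs(ratio - 1.0) <= tolerance:
        wanted = current  # close enough to target: no scaling
    else:
        wanted = math.ceil(current * ratio)
    # Clamp to the --min / --max bounds from the command above.
    return max(min_replicas, min(max_replicas, wanted))
```

With three replicas at 140% utilization the function returns 6; at 72% it stays at 3 because the ratio is within tolerance.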
9. Rollback
If a deployment fails:
# Check rollout history
kubectl rollout history deployment/backend
# Rollback to previous version
kubectl rollout undo deployment/backend
# Rollback to specific revision
kubectl rollout undo deployment/backend --to-revision=2
10. Database Migrations
Run migrations from a pod:
kubectl exec -it deployment/backend -- alembic upgrade head
Or via job:
kubectl create job --from=cronjob/migration-job manual-migration
11. Observability & Alerts
- Logs: Cloud Logging for backend/worker; filter by `severity>=ERROR` and `deployment`.
- Metrics: Export FastAPI/worker metrics via Prometheus (or Cloud Monitoring): latency p95, error rate, Temporal poller metrics (`temporal_workflow_task_queue_poll_success`), DB connections.
- Tracing: (Optional) Enable OpenTelemetry in FastAPI/worker to trace request → workflow → activity.
- Alerts (suggested):
  - HTTP 5xx rate > 2% for 5m
  - Worker poller failures > 0 for 5m
  - Temporal persistence DB CPU > 80% for 10m
  - Cloud SQL connection errors spike
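To make the "for 5m" wording concrete, the sketch below evaluates the first suggested alert against per-minute request counts. It assumes the rule means the 5xx rate stayed above 2% in every one of the last five one-minute samples, which is one reasonable reading and similar in spirit to how a Prometheus `for:` clause behaves. `should_alert` and the sample format are hypothetical.

```python
def should_alert(samples, threshold=0.02, window=5):
    """Return True if the 5xx rate exceeded `threshold` in each of the last
    `window` samples.

    `samples` is a list of (total_requests, errors_5xx) tuples, one per
    minute, oldest first. Minutes with no traffic do not count as breaching.
    """
    if len(samples) < window:
        return False  # not enough history to say the condition held
    recent = samples[-window:]
    return all(total > 0 and errors / total > threshold
               for total, errors in recent)
```

A single healthy minute inside the window resets the condition, which keeps short error bursts from paging anyone.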
12. Temporal Operations Runbook
- Stuck workflow: `tctl workflow describe -w marathon-{session_id}` → check history; if over 50 loops, continue-as-new; to force close: `tctl workflow terminate`.
- History bloat: Lower continue-as-new threshold in workflow; monitor Temporal DB size; archive old namespaces.
- Queue backlog: Scale worker replicas; verify task queue matches `marathon-session-queue`.
- Namespace retention: Set retention to 30d+ for audit; prune completed workflows older than retention.
13. Release & Rollback Checklist
- Run `pytest --cov=app` and `pnpm lint && pnpm build`.
- Tag release `vX.Y.Z`; update changelog.
- Cloud Build: ensure substitutions are set (`_REGION`, Firebase keys, TLDRAW key).
- Deploy backend, then frontend.
- Post-deploy smoke:
  - `GET /health`
  - Create session, stream telemetry, finalize, debrief stream.
- Rollback: `kubectl rollout undo deployment/examforge-backend` and `deployment/examforge-frontend`; if DB migration failed, run `alembic downgrade -1`.
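The tagging step assumes a consistent `vX.Y.Z` format. A small check like the one below could run in CI before the deploy. Reading X, Y, and Z as non-negative integers without leading zeros is an assumption borrowed from semantic versioning, and `parse_release_tag` is a hypothetical helper.

```python
import re

# "v" followed by three dot-separated integers, no leading zeros (assumed).
_TAG = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_release_tag(tag):
    """Return (major, minor, patch) for a vX.Y.Z tag, or None if malformed."""
    match = _TAG.fullmatch(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())
```

Returning a tuple rather than a boolean lets the caller compare tags numerically, so `v1.10.0` sorts after `v1.9.0`.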
14. Backup & Restore (DB)
- Cloud SQL: Automated daily backups; for point-in-time recovery, enable PITR. Restore to a new instance, then point `DATABASE_URL` at it.
- TimescaleDB: Use `pg_dump` for analytics metrics; keep the last 7 days if storage is tight.
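The seven-day retention for TimescaleDB dumps can be enforced with a small pruning step. The sketch below only decides which dumps are old enough to delete. It assumes each dump's date is already known (for example, parsed from its filename) and that "last 7 days" means anything dated before today minus seven days is removable. `dumps_to_delete` is a hypothetical helper.

```python
from datetime import date, timedelta


def dumps_to_delete(dump_dates, today, keep_days=7):
    """Return the names of dumps dated before the retention cutoff, sorted.

    `dump_dates` maps a dump name to the `date` it was taken.
    """
    cutoff = today - timedelta(days=keep_days)
    return sorted(name for name, taken in dump_dates.items() if taken < cutoff)
```

Keeping the selection separate from the deletion makes it easy to print the list and review it before removing anything.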
Next Steps
- Development - Local setup guide
- Architecture - System overview
- Agents - AI agent handbook