AI in CareerForge
Deep Dive

A complete technical breakdown of how artificial intelligence is embedded in CareerForge — architecture, implementation code, guardrails, current strengths and weaknesses, and the path toward a self-improving, simulation-driven platform.

Last updated: 23 April 2026  ·  Internal use only

What AI Does in CareerForge

AI in CareerForge is not a single feature — it is an intelligence layer that runs across all four portals, generating personalised career guidance, synthesising market data, and building structured learning paths. Every AI call is grounded in real business context (the employee's profile, skill inventory, anonymised coaching relationship) and constrained to JSON-structured output.

🗺️
Learning Paths
Claude generates a personalised node-graph learning path for each employee at registration, spanning skills, certifications, projects, and milestones.
🎯
Role Suggestions
Based on the employee's specialisation and seniority, AI proposes 5 emerging roles with match scores and required skills.
📊
Market Intelligence
Six technology market trends updated every 6 hours. Role demand catalogue refreshed every 2 hours with high/medium/low distribution.
🤝
Coaching Advice
Coaches submit a question about an employee (using their pseudo-ID only). AI responds with specific, actionable advice grounded in market context.
🧭
Skill Gap Analysis
AI maps the top 8 required skills for any target role, with required proficiency level and market demand score, cached for 12 hours.
📰
Intelligence Digest
A weekly digest synthesising connector-ingested market data into a structured report: top rising skills, role movements, and recommended actions.
ℹ️
Zero hallucination surface on user identity: Every AI prompt uses anonymised profiles only. The system prompt for coaching advice explicitly says "Never reveal or speculate about personal identity or PII." Employee names, emails, and exact locations are never sent to any LLM.

AI Architecture

The AI layer is a dedicated microservice (ai-connector-service, port 8001) plus two Celery workers (celery-worker, celery-beat). All other backend services import directly from the shared RAG module — there is no inter-service HTTP call to the AI connector for inference; only for connector management (CRUD) and async task status polling.

Component Map

Component | File | Port / Trigger | Role
ai-connector-service | services/ai_connector/main.py | :8001 | FastAPI: connector CRUD, task status polling, Prometheus metrics
rag.py | services/ai_connector/rag.py | imported | LLM fallback chain, caching, text search, all public AI functions
embeddings.py | services/ai_connector/embeddings.py | imported | Text chunking + embedding generation (Ollama / OpenAI)
celery-worker | tasks/ingestion.py, tasks/pipeline.py | Redis queue | Connector sync, learning path generation, intelligence digest
celery-beat | services/ai_connector/celery_app.py | Cron (hourly) | Schedules sync_all_connectors every hour
PostgreSQL / pgvector | shared/models/ai.py | DB | Stores ai_documents, vectors (1536 dims), connector configs, ingestion history
Redis | celery_app.py | :6379 | Celery broker + result backend

LLM Provider Fallback Chain

Claude Sonnet 4.6 (primary: reasoning) → Perplexity sonar-pro (fallback: live web search) → graceful empty (both unavailable)
async def _ai_chat(prompt: str, system: str = "") -> str:
    # rag.py:72 — try Claude first (more capable), fall back to Perplexity (has live web search)
    if ANTHROPIC_API_KEY:
        result = await _anthropic_chat(prompt, system)
        if result:
            return result
    return await _perplexity_chat(prompt, system)

Embedding Provider Fallback

Ollama nomic-embed-text (default: local, free) → OpenAI text-embedding-3-large (override: OPENAI_API_KEY set)

Both providers produce 1536-dimensional vectors, stored in PostgreSQL with the pgvector extension.
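The provider override reduces to a single environment check. A minimal sketch, assuming a hypothetical helper name (the real selection logic lives in embeddings.py and may differ):

```python
import os

def pick_embedding_provider() -> tuple[str, str]:
    """Return (provider, model): OpenAI when OPENAI_API_KEY is set, local Ollama otherwise."""
    if os.environ.get("OPENAI_API_KEY"):
        return ("openai", "text-embedding-3-large")
    return ("ollama", "nomic-embed-text")
```

Both backends are configured to emit 1536-dimensional vectors, so rows in ai_document_vectors stay interchangeable across providers.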

The RAG Pattern

RAG — Retrieval-Augmented Generation — grounds LLM responses in real documents. In CareerForge the knowledge base is the ai_documents table, populated by connector ingestion. Every AI function first searches this table, prepends relevant chunks to the prompt, then calls the LLM.

Request Flow

Portal request (employee / coach) → backend service (employee/coach svc) → rag.py cache lookup → on cache miss: text search (ai_documents LIKE) → build prompt (context + task) → LLM call (Claude / Perplexity) → parse JSON + cache result
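The flow can be sketched end to end. This is an illustrative skeleton, not the actual rag.py code: the collaborators are injected as parameters, and a plain json.loads stands in for the tolerant _parse_json shown later in this document:

```python
import json

async def rag_answer(task_prompt, cache_key, ttl_hours, *, cache_get, search, llm, cache_put):
    """Sketch of the rag.py request flow with all collaborators injected."""
    cached = await cache_get(cache_key, ttl_hours)
    if cached is not None:
        return cached                                  # cache hit: no LLM call
    context = await search(task_prompt)                # LIKE-based search over ai_documents
    prompt = (f"Connector market data:\n{context}\n\n" if context else "") + task_prompt
    result = json.loads(await llm(prompt))             # real code uses the tolerant _parse_json
    await cache_put(cache_key, result)
    return result
```

Injecting the cache, search, and LLM calls keeps the orchestration testable without a database or API key.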

Public AI Functions

Function | Consumer | Output Schema | Cache TTL
get_role_suggestions_for_profile() (rag.py:236) | Employee portal | [{role_title, match_score, description, required_skills[]}] | 24 hours
get_role_skill_requirements() (rag.py:272) | Employee portal | [{skill, skill_ids[], required_level, market_demand}] | 12 hours
get_coaching_advice() (rag.py:301) | Coach portal | {answer, sources[], model_used, generated_at} | No cache
get_emerging_roles() (rag.py:340) | Coach portal (Catalogues) | [{role_title, demand_level, avg_salary_range, required_skills[], growth_rate_percent}] | 2 hours
get_market_trends() (rag.py:373) | Coach portal (Analytics) | [{topic, trend_direction, summary, confidence, sources[]}] | 6 hours

Caching Strategy

AI responses are expensive — both in latency (1–5 seconds) and API cost. CareerForge uses the ai_documents PostgreSQL table as a key-value cache. This gives the cache an automatic audit trail: every cache entry is a document with an ingestion timestamp.

TTL Tiers (by volatility)

roles_catalogue
2h
market_trends
6h
skill_gap (per role)
12h
role_suggestions (per employee)
24h
intelligence_digest
7 days (168h)

Cache Key Design

Key Pattern | Scoped By | Example
__cache_roles_{pseudo_id}__ | Individual employee | __cache_roles_a3f7b2__
__cache_roles_{spec}_{seniority}__ | Specialisation + level | __cache_roles_backend_senior__
__cache_roles_catalogue_{spec}__ | Specialisation (coach view) | __cache_roles_catalogue_general__
__cache_gap_{safe_role}__ | Target role | __cache_gap_data_engineer__
__cache_trends_{topic}__ | Topic keyword | __cache_trends_cloud__
__digest__ | Global singleton | __digest__
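A hypothetical helper showing how these patterns compose (the real code builds keys inline in rag.py):

```python
def cache_key(kind: str, *parts: str) -> str:
    """Build a double-underscore-delimited cache key, e.g. __cache_gap_data_engineer__."""
    safe = "_".join(p.strip().lower().replace(" ", "_") for p in parts if p)
    return f"__cache_{kind}_{safe}__" if safe else f"__{kind}__"
```

With no scoping parts the key collapses to the global singleton form used by the digest.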
# rag.py:120 — cache lookup with TTL check
async def _get_cached_response(cache_key: str, ttl_hours: int) -> list | dict | None:
    async with AsyncSessionLocal() as session:
        row = (await session.execute(
            sql_text("SELECT content, ingested_at FROM ai_documents "
                     "WHERE source_name = :key ORDER BY ingested_at DESC LIMIT 1"),
            {"key": cache_key}
        )).fetchone()

    if not row:
        return None
    gen_at = datetime.fromisoformat(row.ingested_at)
    if (datetime.now(UTC) - gen_at) > timedelta(hours=ttl_hours):
        return None              # expired — caller will regenerate
    return json.loads(row.content)
⚠️
Cache accumulation: Old cache entries are never deleted — they simply age past their TTL and become invisible to the lookup query. The table grows over time. A scheduled cleanup task (not yet implemented) should prune entries older than 2× their TTL.

Connector Ingestion

Connectors are configurable data sources — REST APIs, RSS feeds, JSON endpoints — that feed real-world market intelligence into CareerForge. The Admin portal provides CRUD for connectors. Ingestion runs hourly via Celery Beat.

Data Models

Table | Purpose | Key Fields
connector_configs | Registry of data sources | name, source_type, config_json_encrypted, fetch_interval_minutes, trend_score
connector_sources | Individual URLs per connector | connector_id, url, label, last_status_code
ai_documents | Ingested content + AI cache | source_name, content, topics[], trend_score, metadata_json
ai_document_vectors | Chunk embeddings | document_id, chunk_index, chunk_text, embedding vector(1536)
ingestion_history | Audit trail per sync | connector_id, chunks_ingested, duration_seconds, success, error_message

Ingestion Pipeline (tasks/ingestion.py)

Beat (hourly) → HTTP GET each source → normalise fields → SHA-256 dedup → chunk (512 words, 64 overlap) → embed + store vector
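The chunking step is a sliding window over whitespace-separated words. A minimal sketch under the 512/64 parameters above (the real implementation in embeddings.py may tokenise differently):

```python
def chunk_text(text: str, chunk_words: int = 512, overlap: int = 64) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    if not words:
        return []
    chunks = []
    step = chunk_words - overlap          # advance 448 words per chunk
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break                         # last chunk reached the end of the text
    return chunks
```

The overlap ensures a sentence falling on a chunk boundary still appears whole in at least one chunk.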

Response normalisation handles multiple API schemas automatically: items[], data[], results[], or bare dict. Field aliases resolve content/body/description/text and url/link transparently.
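That normalisation can be sketched as follows; the function name and the {content, url} record shape are illustrative, not the exact internals of tasks/ingestion.py:

```python
def normalise_records(payload) -> list[dict]:
    """Extract {content, url} records from heterogeneous API response shapes."""
    if isinstance(payload, list):
        items = payload
    elif isinstance(payload, dict):
        for key in ("items", "data", "results"):      # common envelope keys
            if isinstance(payload.get(key), list):
                items = payload[key]
                break
        else:
            items = [payload]                         # bare dict: single record
    else:
        return []
    records = []
    for item in items:
        if not isinstance(item, dict):
            continue
        # resolve field aliases: content/body/description/text, url/link
        content = next((item[k] for k in ("content", "body", "description", "text") if item.get(k)), "")
        url = item.get("url") or item.get("link") or ""
        records.append({"content": content, "url": url})
    return records
```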

Content-addressed dedup: Each document is hashed (SHA-256 of content). Re-ingesting the same document has no effect — the hash is checked in metadata_json["content_hash"] before insertion. This makes repeated hourly syncs of slow-updating sources cheap.
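The dedup check itself is tiny. In the real pipeline the hash is compared against metadata_json["content_hash"] in the database; the in-memory set here is a stand-in:

```python
import hashlib

def content_hash(content: str) -> str:
    """SHA-256 hex digest used as the content address."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def should_ingest(content: str, seen_hashes: set[str]) -> bool:
    """Insert only unseen content; re-ingesting an identical document is a no-op."""
    h = content_hash(content)
    if h in seen_hashes:
        return False
    seen_hashes.add(h)
    return True
```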

Celery Pipeline Tasks

Long-running AI generation — learning paths and the weekly digest — runs asynchronously in Celery workers, so API responses remain fast. The client polls GET /api/v1/tasks/{task_id} for status.
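A client-side polling loop can be sketched as below. SUCCESS and FAILURE are standard Celery terminal states; the exact response schema of the status endpoint is an assumption here, so the HTTP fetch is injected rather than hard-coded:

```python
import time

def poll_task(fetch_status, task_id: str, interval_s: float = 2.0, timeout_s: float = 120.0) -> dict:
    """Poll fetch_status(task_id) until the Celery task reaches a terminal state."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status(task_id)   # e.g. GET /api/v1/tasks/{task_id} → parsed JSON
        if status.get("state") in ("SUCCESS", "FAILURE"):
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} still running after {timeout_s}s")
```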

Task | Trigger | LLM Budget | Output
generate_initial_profile (pipeline.py:19) | Employee registration (fire-and-forget) | Claude, thinking budget 2048 | LearningPath node graph stored in DB
generate_learning_path (pipeline.py:125) | Manual refresh by employee | Claude, thinking budget 2048 | Updated LearningPath, task_id returned for polling
generate_intelligence_digest (pipeline.py:227) | Weekly (Beat) or coach-requested refresh | Claude, thinking budget 4096 | JSON digest: rising skills, role movements, recommended actions
sync_all_connectors (ingestion.py:249) | Hourly Beat schedule | No LLM (embedding model only) | New documents + vectors in ai_documents

Learning Path Output Schema

{
  "target_role": "Senior Data Engineer",
  "summary": "12-month path targeting 8 skills across 4 phases...",
  "nodes": [
    {
      "id": "n1",
      "title": "Apache Kafka Fundamentals",
      "type": "skill",            // skill | cert | project | milestone
      "description": "...",
      "estimated_hours": 40,
      "prerequisites": [],
      "resources": [{ "type": "course", "title": "...", "url": "..." }]
    }
  ]
}

Intelligence Digest Schema

{
  "week_summary": "...",
  "top_skills_rising": [{ "skill": "Rust", "trend": "growing 34% YoY" }],
  "role_movements": [{ "role": "AI Engineer", "direction": "rising", "insight": "..." }],
  "recommended_actions": ["Upskill Python ML team in LangGraph", "..."],
  "generated_at": "2026-04-21T08:00:00Z"
}

Code Walkthrough

The following are the key implementation files with their roles and the most important code patterns.

services/
  ai_connector/
    rag.py  ← LLM orchestration, caching, all public AI functions (401 lines)
    embeddings.py  ← Ollama/OpenAI embedding + chunking (67 lines)
    main.py  ← FastAPI app: connector CRUD + task status API (52 lines)
    celery_app.py  ← Celery config + Beat schedule (39 lines)
    db.py  ← Async SQLAlchemy session (46 lines)
    tasks/
      ingestion.py  ← Connector sync: HTTP fetch → chunk → embed → store (273 lines)
      pipeline.py  ← Learning path + digest generation with extended thinking (333 lines)
    routers/
      tasks.py  ← GET /tasks/{id} status polling (35 lines)
  employee/routers/
    learning.py  ← /learning-path, /skill-gap, /role-suggestions (214 lines)
    registration.py  ← triggers generate_initial_profile.delay() (175 lines)
  coach/routers/
    advice.py  ← POST /employees/{pseudo_id}/ai-advice (79 lines)
    catalogues.py  ← GET /catalogues/roles (194 lines)
    analytics.py  ← GET /analytics/market-trend, /intelligence-digest (288 lines)
shared/
  models/
    ai.py  ← ConnectorConfig, AiDocument, AiDocumentVector, IngestionHistory (103 lines)
  auth/
    keycloak.py  ← RS256 JWT verify, JWKS caching 300s, realm-based verifiers (169 lines)

Prompt Engineering Pattern

All prompts follow a consistent 3-part template: identity declaration → context injection → JSON-only output constraint.

# rag.py:253–265 — role suggestions prompt assembly
system = (
    "You are a career intelligence system specialising in IT career paths. "
    "Suggest emerging roles based on current 2025 market data. "
    "Respond ONLY with a valid JSON array of objects with keys: "
    "role_title (string), match_score (0.0-1.0), description (string), "
    "required_skills (array of strings)."
)
context_block = ("Connector market data:\n" + context + "\n\n") if context else ""
prompt = (
    context_block
    + f"Suggest 5 emerging IT career roles for a {seniority} {specialization} professional "
    + "in 2025. Focus on high-demand, growing roles. Return JSON array only."
)

JSON Parsing with Graceful Fallback

# rag.py:81 — robust JSON extraction from LLM output
def _parse_json(text: str) -> list:
    if not text:
        return []
    text = text.strip()
    if "```" in text:                        # strip markdown code fences
        for part in text.split("```"):
            p = part.strip()
            if p.startswith("json"):
                p = p[4:].strip()
            if p.startswith("[") or p.startswith("{"):
                text = p
                break
    try:
        result = json.loads(text)
        return result if isinstance(result, list) else [result]
    except json.JSONDecodeError:
        m = re.search(r"\[.*\]", text, re.DOTALL)  # last-chance regex
        if m:
            try:
                return json.loads(m.group())
            except json.JSONDecodeError:     # narrow except instead of bare `except`
                pass
    return []                                # graceful empty on failure

Safety & Guardrails

Guardrails in CareerForge operate at three levels: prompt-level constraints, input validation, and infrastructure-level access control. The goal is to prevent PII leaks, hallucinated identities, runaway API spend, and prompt injection from malicious connector data.

Guardrail | Layer | Implementation
PII never sent to LLM | Prompt | Coaching prompts use pseudo_id + aggregated profile only. System prompt: "Never reveal or speculate about personal identity or PII."
JSON-only output | Prompt | Every system prompt ends with "Respond ONLY with a valid JSON array". _parse_json() strips markdown and falls back to an empty list — never returns free text to the UI.
Input length cap | API | Coach advice question capped at 2000 chars (FastAPI validator, advice.py:42); 422 returned if exceeded.
LLM timeout | HTTP | httpx.AsyncClient(timeout=45.0) for all LLM calls; connector fetches use a 60s timeout. Prevents stuck requests from blocking workers.
Celery task isolation | Infra | task_acks_late=True, worker_prefetch_multiplier=1: tasks ACK only on success, preventing duplicate processing after a crash.
Connector dedup | Ingestion | SHA-256 content hash checked before insertion. Prevents prompt poisoning via re-injecting modified documents at the same URL.
Encrypted connector config | DB | Connector API keys stored in config_json_encrypted (AES-256-GCM via a Vault-backed key); never logged.
API key injection | Infra | LLM keys come from Kubernetes Sealed Secrets → env vars; never in code or ConfigMaps.
Retry cap | Celery | max_retries=3 on connector sync tasks. Failed ingestion writes IngestionHistory.error_message and stops retrying after 3 attempts.
CORS allow-list | Network | Only the four portal domains are allowed; wildcard CORS is explicitly not configured.

Role-Based AI Access

Keycloak enforces which user roles can access which AI features. Each portal authenticates against its own realm; the token's realm_access.roles claim is verified on every request.

AI Feature | Required Role | Endpoint
Learning path generation | employee | GET /learning-path
Skill gap analysis | employee | GET /skill-gap
Role suggestions | employee | GET /role-suggestions
AI coaching advice | coach | POST /employees/{pseudo_id}/ai-advice
Emerging roles catalogue | coach | GET /catalogues/roles
Market trends | coach | GET /analytics/market-trend
Intelligence digest | coach | GET /analytics/intelligence-digest
Connector management | SUPER_ADMIN / APP_ADMIN | POST/PUT/DELETE /connectors
Trigger digest refresh | coach | GET /analytics/intelligence-digest?refresh=true

PII Protection

Personal data is protected at two levels: pseudonymisation of the employee identity before it reaches any AI prompt, and AES-256-GCM encryption of the stored PII fields.

Pseudonymisation Flow

Keycloak sub (real user ID) → HMAC-SHA256 with HMAC_KEY_HEX → pseudo_id (opaque token) → sent to LLM (no name/email)

Coaches see only the pseudo_id, the employee's specialisation, seniority, experience years, and region (not city). The HMAC key is injected from Sealed Secrets — not derivable from any data in the LLM prompt.
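The derivation is plain HMAC-SHA256 over the Keycloak sub. A minimal sketch; whether the production pseudo_id is the full 64-character digest or a truncation of it (the example a3f7b2 above looks truncated) is an assumption left open here:

```python
import hashlib
import hmac

def pseudonymise(keycloak_sub: str, hmac_key_hex: str) -> str:
    """Derive the opaque pseudo_id from the real Keycloak sub via HMAC-SHA256."""
    key = bytes.fromhex(hmac_key_hex)    # HMAC_KEY_HEX injected from Sealed Secrets
    return hmac.new(key, keycloak_sub.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because HMAC is keyed, the mapping is deterministic for joins across services but not reversible without the secret key.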

At-rest Encryption

Employee PII fields (name, email, exact location) are stored encrypted in employee.encrypted_pii using AES-256-GCM. The key is injected at runtime from DEV_PII_KEY_HEX (Vault-backed in production, Sealed Secret in the demo cluster).
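A round-trip sketch using the third-party cryptography package. The nonce-prefixed blob layout is an assumption for illustration; the real storage format of encrypted_pii may differ:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_pii(plaintext: bytes, key_hex: str) -> bytes:
    """AES-256-GCM with a random 96-bit nonce prepended to the ciphertext."""
    key = bytes.fromhex(key_hex)                 # 64 hex chars → 32-byte AES-256 key
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def decrypt_pii(blob: bytes, key_hex: str) -> bytes:
    key = bytes.fromhex(key_hex)
    return AESGCM(key).decrypt(blob[:12], blob[12:], None)
```

GCM authenticates as well as encrypts, so a tampered blob fails to decrypt rather than yielding garbage PII.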

Strengths & Weaknesses

An honest assessment of what the current AI implementation does well and where it falls short.

Strengths

Dual-LLM fallback with graceful degradation: The Claude → Perplexity → empty chain means the platform stays usable even if the primary LLM is down. Users see a "being configured" message rather than a crash.
Tiered caching in the DB: TTLs are set by volatility (digest 7 days, role suggestions 24h, market trends 6h, roles catalogue 2h). The LLM is called only when the cache expires — typical users never trigger a live call.
PII-safe by design: Pseudo-IDs and HMAC pseudonymisation ensure real identity never reaches any LLM, regardless of whether the connector documents happen to mention company names.
Structured output enforcement: Every prompt specifies the exact JSON schema. _parse_json() strips markdown and falls back to an empty list. The UI never receives free-form prose from the LLM.
Modular connector ingestion: New data sources can be added via the Admin portal without code changes. The ingestion task normalises heterogeneous API schemas automatically.
Async + fire-and-forget: Learning path generation is queued at registration — the API responds immediately. If Celery is unavailable, the path is generated on demand instead. Zero hard dependency.
Extended thinking for deep tasks: Learning path and digest generation use Claude with budget_tokens 2048–4096 for extended thinking, producing higher-quality, more coherent multi-node paths.

Weaknesses

Text search uses LIKE, not vector similarity: _cached_text_search() uses SQL LIKE '%word%' wildcards. pgvector and embeddings are stored, but RAG retrieval doesn't use them — semantic search is a stub.
No rate limiting on LLM calls: A coach can call POST /ai-advice in a tight loop. There is no per-user or per-hour quota on LLM API calls, exposing the platform to accidental or deliberate cost explosion.
Coaching advice is uncached: get_coaching_advice() makes a live LLM call on every request — no dedup, no cache. Two coaches asking the same question about the same pseudo_id both pay the API cost.
Cache entries never purged: Expired entries accumulate in ai_documents indefinitely. No scheduled cleanup task exists; over months, the table will grow without bound.
No outcome feedback loop: The platform does not capture whether employees actually completed suggested skills or whether role suggestions led to promotions. LLM quality cannot improve without this signal.
No hallucination detection: LLM output is parsed and served directly. There is no consistency check (e.g. does match_score stay in [0,1]?), no cross-validation between calls, and no human review step for high-stakes recommendations.
Single embedding dimension assumption: The schema hardcodes Vector(1536). Switching to a newer embedding model with different dimensions requires a schema migration.

Company Simulation

CareerForge is seeded with 18 realistic personas across 8 engineering practices, spanning 5 seniority levels and 4 user roles. This creates a miniature simulation of a real outsourcing company's internal dynamics — skill distributions, coaching relationships, cohort readiness, and management visibility.

Simulated Population

Role | Count | Practices Covered | AI Interactions
Employees | 7 | Backend, Frontend, Data, Cloud, ML, QA, Security, PMO | Learning paths, role suggestions, skill gap
Coaches | 6 | Backend+Frontend, Data+ML, Cloud+Security, Mobile, QA+PMO, Backend | AI coaching advice, market trends, digest
Managers | 1 | Cross-practice | Team readiness dashboard (AI-scored)
Admins | 4 | HR, Platform, Org, L&D | Connector management, system config

Dynamics the Simulation Captures

💡
The simulation is a stress test and a data generator: Running all 18 personas through their natural workflows for 30 days produces a dataset of coaching question → advice pairs, employee learning events, skill gap evolutions, role suggestion accept/reject rates, and digest quality scores. This dataset is the foundation for the reconciliation loops described below.

Reconciliation Loops

A reconciliation loop is a closed feedback cycle: the platform observes the outcome of its AI recommendations, compares them to what actually happened, and adjusts future recommendations accordingly. CareerForge has the infrastructure to support several such loops — some partially built, some requiring additional work.

The Current Data Signals

Reconciliation Loop — Current + Future

  1. AI generates recommendation (learning path, role suggestion)
  2. Employee accepts & acts (commits, completes skill)
  3. Coach signs off (IDP form, verified skill)
  4. Outcome measured (readiness Δ, role movement)
  5. Signal fed back (update prompt, weights)

Loop 1 — Connector Quality Feedback

Current state: fully implemented infrastructure, loop not closed.

Every connector source has a trend_score field and last_status_code. The ingestion task records IngestionHistory.chunks_ingested and success. A reconciliation job could:

  1. Correlate connector sources with AI output quality (do documents from source X lead to higher-confidence market trends?)
  2. Automatically lower trend_score for sources that repeatedly 404 or return empty content
  3. Promote sources whose topics appear frequently in accepted learning paths

Loop 2 — Learning Path Acceptance Rate

Current state: data exists, loop not implemented.

The learning_path_nodes table records which nodes employees mark complete. A weekly digest job could compute:

-- Proposed reconciliation query
SELECT node_type, skill_name,
       COUNT(*) FILTER (WHERE status = 'completed') AS completed,
       COUNT(*) FILTER (WHERE status = 'skipped')  AS skipped,
       COUNT(*) AS total
FROM learning_path_nodes
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY node_type, skill_name
-- PostgreSQL cannot reference output aliases inside ORDER BY expressions,
-- so the skip-rate expression is spelled out here
ORDER BY (COUNT(*) FILTER (WHERE status = 'skipped'))::float
         / NULLIF(COUNT(*), 0) DESC;

Nodes with high skip rates could feed back into the system prompt: "Avoid recommending X for specialisation at seniority level — historically low completion rate."
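Turning the query's output into that prompt guidance could look like this; the function name, threshold, and wording are illustrative:

```python
def skip_rate_guidance(rows: list[dict], threshold: float = 0.5) -> list[str]:
    """Convert high skip-rate query rows into guidance lines for the system prompt."""
    lines = []
    for r in rows:
        if r["total"] and r["skipped"] / r["total"] > threshold:
            lines.append(
                f"Avoid recommending {r['skill_name']} ({r['node_type']}): "
                "historically low completion rate."
            )
    return lines
```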

Loop 3 — Role Suggestion Relevance

Current state: data partially exists.

If an employee clicks through a role suggestion and sets it as their target role, that is an acceptance signal. The ratio of suggested-to-accepted roles, broken down by specialisation and seniority, tells us whether the LLM is recommending roles people actually want; like the skip-rate signal in Loop 2, this ratio can be fed back into the suggestion prompt as guidance.

Loop 4 — Coaching Advice Quality Score

Current state: not implemented. Requires UI addition.

Add a thumbs-up/down widget after each AI coaching response. Store the score in a new ai_feedback table keyed by (pseudo_id, question_hash, model_used, score). The reconciliation job then:

  1. Computes per-model quality scores over rolling 30-day windows
  2. Routes to the higher-scoring model when both are available
  3. Feeds low-scored (question, answer) pairs back to the prompt as negative examples

Loop 5 — Market Trend Accuracy

Current state: infrastructure exists for trend_direction tracking. Loop not closed.

Perplexity sonar-pro returns real-time web search context. Store the predicted trend_direction at time T. At time T+30d, re-run the trend query and compare. If the direction is consistent, increase the confidence score for that topic. If it reversed, lower it and flag for human review. Over time this builds a calibration curve for the AI's trend forecasting ability.
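A sketch of the confidence adjustment; the delta values are illustrative placeholders, not tuned numbers:

```python
def reconcile_trend_confidence(predicted: str, observed: str, confidence: float) -> float:
    """Nudge a topic's confidence after the 30-day re-check of trend_direction."""
    if predicted == observed:
        return min(1.0, confidence + 0.05)   # consistent forecast: small reward
    return max(0.0, confidence - 0.15)       # reversed direction: penalise, flag for review
```

Applied weekly per topic, the adjustments accumulate into exactly the calibration curve described above.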

Self-Improvement Architecture (Target State)

# Proposed: weekly reconciliation task
# Note: Celery task bodies must be synchronous, so the async pipeline is
# wrapped with asyncio.run rather than declared as an async task.
@celery_app.task(name="reconcile_ai_quality")
def reconcile_ai_quality():
    asyncio.run(_reconcile_ai_quality())

async def _reconcile_ai_quality():
    signals = {
        "path_skip_rates":   await compute_node_skip_rates(),
        "role_accept_rates": await compute_role_acceptance(),
        "advice_scores":     await aggregate_feedback_scores(),
        "connector_quality": await score_connector_sources(),
    }
    # Write signals as a new ai_document for RAG context injection
    await store_cached_response("__reconciliation_signals__", signals)
    # Update connector trend_scores based on quality signals
    await update_connector_scores(signals["connector_quality"])
    # Invalidate stale caches for sub-segments with poor outcomes
    await invalidate_poor_performing_caches(signals)

Future Roadmap

The following developments are ordered by impact and implementation complexity. Each builds on the existing infrastructure without requiring architectural rewrites.

# | Feature | Impact | Complexity | Depends On
1 | Vector similarity search (replace LIKE): replace _cached_text_search() LIKE queries with pgvector cosine similarity over ai_document_vectors | High | Low | pgvector already installed; embeddings already stored
2 | LLM rate limiter per user/hour: Redis counter keyed by pseudo_id:hour; return 429 after N coaching calls | Medium | Low | Redis already available as Celery backend
3 | Cache coaching advice by question hash: SHA-256(pseudo_id + question) as cache key, 1h TTL | Medium | Low | Existing cache infrastructure
4 | Cache pruning task: nightly Celery Beat job deletes ai_documents entries older than 2× their TTL | Low-med | Low | Celery Beat already running
5 | Coaching advice feedback widget: 👍/👎 on each AI advice response, stored in an ai_feedback table | High | Medium | Frontend change + new DB table
6 | Reconciliation Celery task: weekly job computing skip rates, acceptance rates, and feedback scores; updates connector trend_score; injects signals into the next prompt | High | Medium | Feedback widget (#5) + path completion tracking
7 | Anthropic prompt caching: add cache_control: {"type": "ephemeral"} to the system prompt block; reduces latency and cost by 60–80% on repeated similar queries | High | Low | Anthropic API key available
8 | Fine-tuning dataset pipeline: export (profile, question) → (accepted advice) pairs as JSONL for fine-tuning Claude or a smaller local model | High | High | Feedback loop (#6) + 3+ months of data
9 | Local LLM option (Ollama Mistral/Llama): add Ollama as a third fallback in _ai_chat() for air-gapped environments; Ollama is already used for embeddings | Low-med | Medium | Ollama service already in cluster for embeddings
10 | Anomaly detection on skill gap evolution: alert when an employee's readiness score drops unexpectedly fast (possible skill obsolescence or coaching relationship issue) | Medium | High | Reconciliation task (#6) + Alertmanager integration
🚀
Quick wins (items 1, 2, 3, 4, 7): These five improvements require no new infrastructure, only code changes to rag.py and celery_app.py. Together they would reduce LLM API costs by an estimated 60–80%, eliminate the unbounded table growth, and protect against cost explosion from bot traffic, all within a single sprint.
