Transcript Intelligence

Production AI pipeline that turns sales call transcripts into structured CRM records within minutes of a call ending, with zero manual steps.

Language

Python

Year

2026

Category

ai, automation

Private repo: Omodunjo11/transcript-intelligence-pipeline

01 The Problem

Sales discovery call transcripts were piling up in Google Drive with no structured extraction. Pain points, commitments, and follow-ups were getting lost between the call ending and the CRM being updated.

02 What I Built

A Python pipeline that connects Google Drive, Claude or GPT-4.1, and Fibery. It watches for new transcripts; extracts insights, pain points, and next steps; writes formatted Google Docs; and syncs structured records to a GTM workspace with zero manual steps.

03 Overview

A B2B fintech startup in elder care financial management was running discovery calls with daily money managers and elder care professionals, a specialized high-trust buyer segment. Auto-generated meeting transcripts piled up in Google Drive with no systematic extraction of pain points, commitments, or follow-ups. I built a production pipeline that watches Drive for new transcripts, analyzes each with Claude or GPT-4.1, extracts structured intelligence, writes formatted summaries back to Drive, and syncs records to Fibery as CRM-quality call summaries.

04 Key Objectives

1. End-to-End Automation: A Drive folder watcher triggers analysis on each new transcript automatically, with no human initiation. Idempotent processed-files tracking prevents duplicate runs.
2. Schema-Constrained Extraction: A structured prompt extracts three intelligence types on every call: insights, pain points, and next steps. A regex-based section parser splits the model output into those sections, with a fallback for empty sections.
3. Dual-Provider LLM: Switchable between Claude Sonnet and GPT-4.1 via a single environment flag. Both providers retry up to three times with exponential backoff, and token usage is logged per run for cost tracking.
4. Fibery GTM Sync: A custom HTTP client for Fibery's commands and documents APIs pushes structured call data to the GTM workspace, with fallback between Token and Bearer authentication for rich-text fields.
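The section-parsing step in objective 2 could look something like the sketch below. It is an illustration under stated assumptions, not the project's code: the three heading names, the `## Heading` / `Heading:` formats it recognizes, and the `None identified.` fallback text are all hypothetical.

```python
import re

# Hypothetical section titles; the real prompt's headings are not published.
SECTIONS = ("Insights", "Pain Points", "Next Steps")
EMPTY_FALLBACK = "None identified."  # hypothetical placeholder for empty sections

# Matches a heading line such as "## Pain Points" or "Pain Points:" (case-insensitive).
_HEADING = re.compile(
    r"^[ \t]*(?:#{1,6}[ \t]*)?("
    + "|".join(re.escape(s) for s in SECTIONS)
    + r")[ \t]*:?[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_sections(model_output: str) -> dict[str, str]:
    """Split model output into the expected sections, filling gaps with a fallback."""
    matches = list(_HEADING.finditer(model_output))
    parsed: dict[str, str] = {}
    for i, match in enumerate(matches):
        # Map whatever casing the model used back to the canonical heading.
        name = next(s for s in SECTIONS if s.lower() == match.group(1).lower())
        # A section's body runs until the next recognized heading (or end of text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(model_output)
        parsed.setdefault(name, model_output[match.end():end].strip())
    # Every section is present in fixed order, even if the model skipped or blanked it.
    return {s: (parsed.get(s) or EMPTY_FALLBACK) for s in SECTIONS}
```

Because this sketch always returns all three keys in the same order, later steps such as writing the Google Doc or building the Fibery record never need to special-case a missing section.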

05 Methodology

◆ Schema Before Prompt: Before writing any prompts, defined what the sales team needs from a call summary to act within 24 hours. The LLM is constrained to that format rather than left to summarize freely.
◆ Production Reliability: Idempotent file tracking, retry-with-backoff on LLM calls, filename regex filtering, a local-folder dev mode that runs without Google OAuth, and a pytest suite covering parsing and DOCX extraction.
◆ Drive API Integration: Polls Drive for new files, downloads them via native Google Docs export or as DOCX, and writes formatted output back with consistent section ordering and heading hierarchy.
◆ Domain-Specific Prompting: The prompt is engineered for elder care fintech discovery conversations in a high-trust buyer segment, not for generic meeting summaries.
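Idempotent processed-files tracking with filename filtering can be sketched as below, assuming processed file IDs are persisted to a local JSON file. The regex, the state-file format, and the `run_once` / `handle` names are hypothetical; in local-folder dev mode the `(file_id, filename)` pairs could come from a directory listing rather than the Drive API.

```python
import json
import re
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical filename pattern; the project's actual regex is not published.
TRANSCRIPT_NAME = re.compile(r"transcript", re.IGNORECASE)


def load_processed(state_file: Path) -> set[str]:
    """Return the file IDs already handled, or an empty set on the first run."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()


def mark_processed(state_file: Path, processed: set[str], file_id: str) -> None:
    """Record one file ID and persist the whole set via write-then-rename."""
    processed.add(file_id)
    tmp = state_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(sorted(processed)))
    tmp.replace(state_file)  # rename over the old state file in one step


def run_once(files: Iterable[tuple[str, str]], state_file: Path,
             handle: Callable[[str], None]) -> list[str]:
    """Process each (file_id, filename) pair at most once; return IDs handled now."""
    processed = load_processed(state_file)
    handled = []
    for file_id, name in files:
        if file_id in processed or not TRANSCRIPT_NAME.search(name):
            continue  # already done, or not a transcript
        handle(file_id)  # e.g. analyze, write the Doc, sync to Fibery
        mark_processed(state_file, processed, file_id)
        handled.append(file_id)
    return handled
```

For a local dev run, something like `[(p.name, p.name) for p in Path("transcripts").iterdir()]` could stand in for a Drive listing. Saving state after every file means a crash mid-batch repeats at most the one file that was in flight.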

06 Why Schema-First Matters

Prompting the model to simply summarize a meeting produces output that feels thorough but is hard to act on, because every summary is structured differently. The schema-first approach means every output answers the same questions in the same order: What insights emerged? What pain points came up? Who owns which next step? The sales team can scan a summary in 90 seconds, and Fibery gets a consistent record.
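To make the schema-first idea concrete, here is a sketch of a fixed output schema, a prompt that pins the model to it, and a renderer that always emits the same headings in the same order. The `CallSummary` dataclass, the prompt wording, and the `render` helper are illustrative assumptions; the production prompt is domain-specific and is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class CallSummary:
    """Hypothetical fixed schema: every call answers the same three questions."""
    insights: list[str] = field(default_factory=list)
    pain_points: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)  # e.g. "Owner: action"


# Hypothetical prompt wording that constrains the model to the schema.
PROMPT_TEMPLATE = """You are analyzing a sales discovery call transcript.
Respond with exactly these three headings, in this order, and nothing else:
## Insights
## Pain Points
## Next Steps
Under each heading, write one bullet per item. Prefix each next step with its owner.
If a section has no items, write "None identified."

Transcript:
{transcript}
"""


def build_prompt(transcript: str) -> str:
    # str.format does not re-interpret braces inside the substituted transcript.
    return PROMPT_TEMPLATE.format(transcript=transcript)


def render(summary: CallSummary) -> str:
    """Render a summary with identical headings and section order every time."""
    parts = []
    for title, items in (("Insights", summary.insights),
                         ("Pain Points", summary.pain_points),
                         ("Next Steps", summary.next_steps)):
        body = "\n".join(f"- {item}" for item in items) or "None identified."
        parts.append(f"## {title}\n{body}")
    return "\n\n".join(parts)
```

The design choice is that the schema, not the model, decides the shape: an empty section still renders its heading, so readers always find "Next Steps" in the same place.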

07 Built for Unattended Operation

This was not a demo. The sales team had to rely on outputs in Fibery without checking whether the pipeline ran. That meant idempotent processing, structured logging, dual-provider failover, and local dev mode for offline testing. Production AI automation fails when the happy path works but edge cases silently drop files.
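Retry with exponential backoff plus dual-provider failover might look like the sketch below. It assumes a hypothetical `LLM_PROVIDER` environment variable names the primary provider, that "retry 3x" means three attempts in total, and that failover to the other provider happens only after the primary exhausts its attempts. Provider calls are passed in as plain functions, so no real Anthropic or OpenAI client code is shown, and token-usage logging is omitted.

```python
import os
import time
from typing import Callable, Optional

# A provider takes a prompt and returns the model's text. Real Claude/OpenAI
# client calls would be wrapped in functions of this shape (not shown here).
Provider = Callable[[str], str]


def with_retries(call: Provider, prompt: str, attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Try `call` up to `attempts` times, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call(prompt)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, then 2s, with the defaults
    raise AssertionError("unreachable")


def analyze(prompt: str, providers: dict[str, Provider],
            sleep: Callable[[float], None] = time.sleep) -> str:
    """Call the provider named by LLM_PROVIDER first, then fail over to the others."""
    primary = os.environ.get("LLM_PROVIDER", "claude")  # hypothetical flag name
    # Stable sort: the primary moves to the front, the rest keep their order.
    order = sorted(providers, key=lambda name: name != primary)
    last_error: Optional[Exception] = None
    for name in order:
        try:
            return with_retries(providers[name], prompt, sleep=sleep)
        except Exception as exc:
            last_error = exc
    raise RuntimeError("all LLM providers failed") from last_error
```

Injecting `sleep` keeps tests fast; in normal runs the default `time.sleep` applies the real backoff delays.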

PM Angle

I designed the output schema from the sales team's actual workflow, not from what the LLM naturally produces. The pipeline is only as valuable as the structure it enforces on unstructured conversation data.

Outcome

Production pipeline running unattended with dual LLM providers, Fibery sync, and idempotent Drive processing. Structured call intelligence synced to CRM with zero manual steps.

Key Features

▸ Google Drive folder watcher with regex filename filtering
▸ Claude Sonnet + GPT-4.1 with config-switchable provider
▸ Structured extraction: insights, pain points, next steps
▸ Formatted Google Doc output written back to Drive
▸ Fibery GTM workspace sync via custom API client
▸ Idempotent processing, retry-with-backoff, local dev mode, pytest suite
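The Fibery sync with its Token/Bearer auth fallback can be sketched as below. The `fibery.entity/create` command shape follows Fibery's commands API; the type and field names (`GTM/Call Summary`, `GTM/Name`), the rule of retrying with `Bearer` only after a 401 or 403, and the injected `send` transport are assumptions made so the example runs offline.

```python
from typing import Callable

# A transport takes request headers and a JSON-serializable payload and returns
# an HTTP status code. It is injected so this sketch needs no network access.
Transport = Callable[[dict[str, str], list[dict]], int]


def build_create_command(entity_type: str, fields: dict[str, str]) -> list[dict]:
    """A commands-API batch that creates one entity (type/field names hypothetical)."""
    return [{"command": "fibery.entity/create",
             "args": {"type": entity_type, "entity": fields}}]


def post_with_auth_fallback(token: str, payload: list[dict],
                            send: Transport) -> tuple[str, int]:
    """Try the Token scheme first; retry once with Bearer if auth is rejected."""
    for scheme in ("Token", "Bearer"):
        headers = {"Authorization": f"{scheme} {token}",
                   "Content-Type": "application/json"}
        status = send(headers, payload)
        if status not in (401, 403):
            break  # accepted, or failed for a non-auth reason
    return scheme, status
```

In a real client, `send` would POST the JSON-encoded batch to the workspace's `/api/commands` endpoint, or to the documents API when writing rich-text fields.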

Tech Stack

Python, Claude API, OpenAI API, Google Drive API, Google Docs API, Fibery API, pytest

Private repository

Omodunjo11/transcript-intelligence-pipeline

Code available on request for recruiters and hiring teams. Email odunjoonaolapo@gmail.com with your GitHub username.
