
Deep Dive Into Webex Contact Center Transcript Extraction Code

Introduction

Have you ever needed to analyze customer chat conversations from Webex Contact Center? In this post, we’ll explore a Python script that automates the process of downloading chat transcripts and converting them into an easy-to-analyze Excel file. Even if you’re new to Python, we’ll break down the complex parts into digestible pieces.

What Does This Script Do?

At its core, this script:

 

  1. Connects to Webex Contact Center securely
  2. Searches for chat conversations within a specified date range
  3. Downloads the transcripts
  4. Converts them into a structured Excel file

The walkthrough below is split into digestible parts for a novice programmer. It starts with the foundational pieces and then moves through how everything works together.

Part 1: Understanding the Foundation

Core Building Blocks

First, let’s understand what we’re working with:

				
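import asyncio
import json
import logging
import os
from dataclasses import dataclass
from datetime import datetime, timedelta
from http.server import BaseHTTPRequestHandler  # base class of AuthHandler below
from threading import Event  # assumed source of the Event() used in WebexAPI.__init__
from typing import Dict, List, Optional

import aiohttp
import pandas as pd
import requests
from selenium import webdriver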
				
			

These imports tell us a lot about what the script does:

  • asyncio and aiohttp: For making non-blocking API calls
  • json: For handling configuration and API responses
  • logging: For tracking what’s happening
  • dataclasses: For creating clean data structures
  • selenium: For browser automation

Data Structures

Let’s look at our data containers:

				
@dataclass
class TokenData:
    access_token: str
    expires_in: int
    refresh_token: str
    refresh_token_expires_in: int
    token_type: str
    scope: str
    expires_at: str
				
			

Think of TokenData as a secure container for authentication. When you log into a website, you get a ticket (token) that proves who you are. This class stores that ticket and its details.
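As a rough illustration of how such a container can be saved and restored, the sketch below round-trips a TokenData through JSON. The sample values are made up, and the helper names token_to_json and token_from_json are hypothetical; the original script may persist its tokens differently.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TokenData:
    access_token: str
    expires_in: int
    refresh_token: str
    refresh_token_expires_in: int
    token_type: str
    scope: str
    expires_at: str


def token_to_json(token: TokenData) -> str:
    """Serialize the dataclass to a JSON string (hypothetical helper)."""
    return json.dumps(asdict(token))


def token_from_json(text: str) -> TokenData:
    """Rebuild the dataclass from JSON produced by token_to_json."""
    return TokenData(**json.loads(text))


# Made-up sample values, for illustration only
sample = TokenData(
    access_token="abc123",
    expires_in=3600,
    refresh_token="def456",
    refresh_token_expires_in=86400,
    token_type="Bearer",
    scope="cjp:user",
    expires_at="2025-01-31T16:00:00",
)
restored = token_from_json(token_to_json(sample))
```

Because a dataclass compares field by field, restored is equal to sample after the round trip.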

				
@dataclass
class Task:
    id: str
    channelType: str
    createdTime: int
    endedTime: Optional[int]
    status: str
				
			
				
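
Task represents one chat conversation. It’s like a file folder containing basic information about a chat:

  • Which conversation it is (id)
  • How they chatted (channelType)
  • When it started (createdTime)
  • When it ended (endedTime)
  • Current status (status)

Authentication Handler

				
class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore anything that is not the OAuth callback
        if "/callback" not in self.path:
            return

        try:
            query = self.path.split("?")[1]
            params = dict(param.split("=") for param in query.split("&"))
            # Store the code on the server object so the main script can read it
            self.server.auth_code = params.get("code")
        except (IndexError, ValueError):
            # The original excerpt ends inside the try block; this minimal
            # handler is added only so that the snippet parses
            self.server.auth_code = None
				
			

Think of AuthHandler as a security guard. When you try to log into Webex: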

  1. The script opens a browser
  2. You log in
  3. Webex sends a special code to this handler
  4. The handler saves this code so we can use it to get access
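The code-capturing step can also be sketched with the standard library's URL parser, which decodes percent-escapes that a manual split on "=" and "&" would leave untouched. The function name extract_auth_code is hypothetical and not part of the original script.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_auth_code(path: str) -> Optional[str]:
    """Return the `code` query parameter of a callback path, or None."""
    parsed = urlparse(path)
    if not parsed.path.endswith("/callback"):
        return None  # not the OAuth callback, ignore it
    params = parse_qs(parsed.query)  # values are lists, already URL-decoded
    codes = params.get("code")
    return codes[0] if codes else None
```

A handler like AuthHandler could call this with self.path and store the result on the server object.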

Understanding Authentication in the Webex Contact Center (WxCC) API

One of the challenges when working with the WxCC API is the two-step authentication process required to obtain and reuse an access token.

In many APIs, obtaining an access token is straightforward: you typically just send a request with your client ID and client secret to get a valid token. However, WxCC follows a more interactive authentication flow (the OAuth 2.0 authorization code flow) that involves manual user intervention.

Step 1: Authenticate via User Login

Instead of directly using a client_id and client_secret, WxCC requires an authorization code first. This is obtained through a user authentication session via a redirect URL in a browser. A WxCC user with the appropriate access rights must manually log in using username, password, and MFA to authorize the API request.

Step 2: Exchange the Authorization Code for an Access Token

Once the user successfully authenticates, the authorization code can then be used to request an access token. This access token is what allows API interactions with WxCC.
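To make Step 1 concrete, here is a minimal sketch of how the browser login URL could be assembled. The parameter names follow the standard OAuth 2.0 authorization request; the function name build_authorize_url, the client ID, and the state value are placeholders, not taken from the original script.

```python
from urllib.parse import urlencode


def build_authorize_url(auth_url: str, client_id: str, redirect_uri: str,
                        scope: str, state: str) -> str:
    """Assemble the URL the user opens in a browser to log in."""
    params = {
        "response_type": "code",   # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,            # space-separated list of scopes
        "state": state,            # opaque value echoed back to the callback
    }
    return f"{auth_url}?{urlencode(params)}"


# Placeholder values for illustration
url = build_authorize_url(
    "https://webexapis.com/v1/authorize",
    "my-client-id",
    "http://localhost:8089/callback",
    "cjp:user cjp:config_read",
    "demo-state",
)
```

After login, the browser is redirected to the redirect_uri with the code attached as a query parameter.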

Why Is This Process Cumbersome?

Unlike APIs that support client credentials flow (where authentication is fully automated), WxCC’s approach requires a human step in the loop. The need for interactive authentication via a browser session makes automation more complex and less seamless for API-driven workflows.

To work around this, developers often store the token and refresh it while the refresh token remains valid, reducing the need for repeated manual logins. However, once the refresh token itself expires, the authentication flow must be restarted via the manual login process.
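The store-and-refresh workaround boils down to a three-way decision. The sketch below is one possible way to express it; the function name decide_auth_action, the returned labels, and the five-minute safety margin are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def decide_auth_action(now: datetime,
                       access_expires_at: datetime,
                       refresh_expires_at: datetime,
                       margin: timedelta = timedelta(minutes=5)) -> str:
    """Pick the cheapest way to end up with a valid access token."""
    if now + margin < access_expires_at:
        return "use_cached"   # stored access token is still good
    if now < refresh_expires_at:
        return "refresh"      # trade the refresh token for a new access token
    return "login"            # both expired: manual browser login again
```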

Part 2: The Core Engine – WebexAPI Class

Let’s dive into the main class that does all the heavy lifting. I’ll explain why we built it this way and what each part does.

WebexAPI Class Structure

				
class WebexAPI:
    def __init__(self, config_file: str = "config.json"):
        self.config = self._load_config(config_file)
        self.setup_logging()
        self.session = None
        self.auth_code_event = Event()
        self.token_data = None
				
			

Think of this class as a control center. When you create it:

  1. It loads your settings (config)
  2. Sets up logging to track what happens
  3. Prepares to create a connection (session)
  4. Gets ready to handle authentication (auth_code_event)

Configuration and Logging Setup

				
def _load_config(self, config_file: str) -> Dict:
    try:
        with open(config_file) as f:
            config = json.load(f)
        logging.debug(f"Loaded configuration from {config_file}")
        return config
    except Exception:
        # logging.exception records the full traceback before re-raising
        logging.exception(f"Failed to load config from {config_file}")
        raise
				
			

This is like reading your instruction manual:

  • Opens the config.json file
  • Loads all your settings
  • If something goes wrong, it tells you exactly what happened
				
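def setup_logging(self) -> None:
    # Timestamp, level, function name and line number precede every message
    log_format = "%(asctime)s - %(levelname)s - %(funcName)s - %(lineno)d - %(message)s"
    root_logger = logging.getLogger()
    # log_level comes from config.json, e.g. "INFO" maps to logging.INFO
    root_logger.setLevel(getattr(logging, self.config["log_level"]))
    # (the excerpt ends here; attaching handlers that use log_format is not shown)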
				
			

Think of logging as your script’s diary:

  • Records everything that happens
  • Shows when it happened
  • Tells you where in the code it happened
  • Helps you fix problems later
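The excerpt above stops before any handler is attached. As a minimal sketch of how the same format string could be wired to a rotating log file, assuming the standard library's RotatingFileHandler (the function name build_logger, the size limit, and the backup count are illustrative choices, not the script's actual settings):

```python
import logging
from logging.handlers import RotatingFileHandler

LOG_FORMAT = "%(asctime)s - %(levelname)s - %(funcName)s - %(lineno)d - %(message)s"


def build_logger(name: str, log_file: str, level: str = "INFO") -> logging.Logger:
    """Create a logger that writes formatted records to a size-limited file."""
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, level))
    # Start a new file after about 1 MB and keep three old files (assumed values)
    handler = RotatingFileHandler(log_file, maxBytes=1_000_000, backupCount=3)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger
```

Calling build_logger("transcripts", "transcript.log").info("started") would then append one formatted line to transcript.log.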

Authentication Process

				
async def initialize(self):
    self.session = aiohttp.ClientSession()
    self.token_data = await self._load_or_refresh_token()
    logging.info("WebexAPI initialized successfully")
    return self
				
			

This is where we “turn on” our control center:

  1. Creates a reusable connection (like opening a phone line)
  2. Gets or refreshes our access token
  3. Confirms everything is ready
				
async def _exchange_auth_code(self, auth_code: str) -> TokenData:
    url = f"{self.config['api_urls']['base']}/access_token"
    payload = {
        "grant_type": "authorization_code",
        "client_id": self.config["client_id"],
        "client_secret": self.config["client_secret"],
        "code": auth_code,
        "redirect_uri": self.config["redirect_uri"]
    }
				
			

This exchanges your temporary authorization code for a longer-lasting token:

  • Like trading a temporary visitor pass for a proper ID card
  • Includes your application’s credentials
  • Gets back a token you can use multiple times
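The excerpt shows the request payload but not what happens with the reply. A minimal sketch of turning a token response into TokenData follows; the response field names mirror the TokenData fields, the ISO-8601 format for expires_at is an assumption, and token_from_response is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TokenData:
    access_token: str
    expires_in: int
    refresh_token: str
    refresh_token_expires_in: int
    token_type: str
    scope: str
    expires_at: str


def token_from_response(body: dict, now: datetime) -> TokenData:
    """Build TokenData from a token response and record when it expires."""
    # expires_in is a lifetime in seconds, so add it to the current time
    expires_at = now + timedelta(seconds=body["expires_in"])
    return TokenData(
        access_token=body["access_token"],
        expires_in=body["expires_in"],
        refresh_token=body["refresh_token"],
        refresh_token_expires_in=body["refresh_token_expires_in"],
        token_type=body.get("token_type", "Bearer"),
        scope=body.get("scope", ""),
        expires_at=expires_at.isoformat(),  # assumed storage format
    )
```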

Making API Requests

				
async def _make_request(self, method: str, url: str, **kwargs) -> Dict:
    try:
        async with self.session.request(method, url, **kwargs) as response:
            if response.status == 401:
                # Access token rejected: refresh it and retry the request once
                logging.info("Token expired, refreshing...")
                await self._refresh_token()
                # setdefault avoids a KeyError when the caller passed no headers
                kwargs.setdefault("headers", {})["Authorization"] = f"Bearer {self.token_data.access_token}"
                async with self.session.request(method, url, **kwargs) as retry_response:
                    return await retry_response.json()
            return await response.json()
    except Exception:
        logging.exception(f"API request failed: {method} {url}")
        raise
				
			

This is your universal API communicator:

  • Handles all communication with Webex
  • Automatically refreshes expired tokens
  • Retries a request once after a 401 response, using the refreshed token
  • Returns the data in a usable format
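The refresh-and-retry idea can be isolated from aiohttp and tried out with stand-in functions. In this sketch, request_with_refresh, send, and refresh are hypothetical names; send represents one HTTP attempt and returns a status code plus the decoded body.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def request_with_refresh(
    send: Callable[[], Awaitable[Tuple[int, dict]]],
    refresh: Callable[[], Awaitable[None]],
) -> dict:
    """Send once; on HTTP 401 refresh the token and send exactly once more."""
    status, body = await send()
    if status == 401:
        await refresh()
        status, body = await send()
    if status >= 400:
        raise RuntimeError(f"request failed with HTTP {status}")
    return body


async def _demo():
    state = {"token": "stale", "calls": 0}

    async def send():
        state["calls"] += 1
        if state["token"] == "stale":
            return 401, {}
        return 200, {"ok": True}

    async def refresh():
        state["token"] = "fresh"

    body = await request_with_refresh(send, refresh)
    return body, state["calls"]


result = asyncio.run(_demo())
```

In the demo the first attempt is rejected, the token is refreshed, and the second attempt succeeds, so send runs twice.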

Why Async?

You’ll notice many functions have async in front of them. This is important because:

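  • It lets the script overlap waiting time instead of blocking on each network call
  • While waiting for Webex to respond, it can start another task
  • This pays off when many transcripts are requested concurrently; the processing loop shown later fetches them one at a time with a short pause for rate limiting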
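To illustrate what overlapping requests could look like, the sketch below runs several simulated downloads concurrently while capping how many are in flight. fetch_one is a stand-in that merely sleeps, and the limit of three is an arbitrary assumption, not a documented Webex rate limit.

```python
import asyncio
from typing import List


async def fetch_one(task_id: str) -> str:
    """Stand-in for a network call; sleeps instead of contacting an API."""
    await asyncio.sleep(0.05)
    return f"transcript-{task_id}"


async def fetch_all(task_ids: List[str], max_concurrent: int = 3) -> List[str]:
    """Fetch all transcripts, with at most max_concurrent running at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(task_id: str) -> str:
        async with semaphore:
            return await fetch_one(task_id)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(limited(t) for t in task_ids))


results = asyncio.run(fetch_all(["a", "b", "c", "d"]))
```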

Part 3: Searching, Processing, and Exporting Data

Searching for Chat Transcripts

				
async def search_tasks(
    self, 
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    queue_name: Optional[str] = None
) -> List[Task]:
    """
    GraphQL query for chat tasks in date range
    """
    query = """
    query($startTime: Long!, $endTime: Long!) {
        task(
            from: $startTime,
            to: $endTime,
            filter: {
                channelType: { equals: chat }
            }
        ) {
            tasks {
                id channelType createdTime endedTime status
            }
        }
    }
    """
				
			

This search function is like a smart filter:

  • You can specify a date range (or it uses defaults)
  • It uses GraphQL (a smart way to request exactly what you need)
  • Returns only chat conversations
  • Packages results into Task objects
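The excerpt ends at the query string, so the following is only a sketch of the two surrounding steps: building the request body and packaging the reply into Task objects. It assumes that the Long variables are epoch milliseconds and that the reply nests records under data.task.tasks as the query suggests; build_search_payload and parse_tasks are hypothetical helper names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Task:
    id: str
    channelType: str
    createdTime: int
    endedTime: Optional[int]
    status: str


def to_epoch_ms(moment: datetime) -> int:
    """Convert a datetime to milliseconds since the Unix epoch."""
    return int(moment.timestamp() * 1000)


def build_search_payload(query: str, start_time: datetime, end_time: datetime) -> Dict:
    """Wrap the GraphQL query and its variables in one JSON-ready dict."""
    return {
        "query": query,
        "variables": {
            "startTime": to_epoch_ms(start_time),
            "endTime": to_epoch_ms(end_time),
        },
    }


def parse_tasks(response: Dict) -> List[Task]:
    """Turn the (assumed) data.task.tasks records into Task objects."""
    records = response.get("data", {}).get("task", {}).get("tasks", [])
    return [
        Task(
            id=r["id"],
            channelType=r["channelType"],
            createdTime=r["createdTime"],
            endedTime=r.get("endedTime"),  # may be absent for open chats
            status=r["status"],
        )
        for r in records
    ]
```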

Processing Transcripts

				
async def process_transcripts(self, tasks: List[Task], output_file: str = "transcripts.xlsx"):
    logging.info(f"Processing transcripts for {len(tasks)} tasks")
    df_data = []  # one dict per chat message, collected across all tasks

    for task in tasks:
        try:
            await asyncio.sleep(0.5)  # Rate limiting
            transcript = await self.get_transcript(task.id)
            if transcript and transcript.get('filePath'):
                transcript_data = await self._fetch_transcript_content(transcript['filePath'])
                df_data.extend(self._parse_transcript_data(task.id, transcript_data))
                logging.debug(f"Processed transcript for task {task.id}")
        except Exception:
            # Log the failure and continue with the next task
            logging.exception(f"Failed to process transcript for task {task.id}")
				
			

The processing pipeline works like this:

  1. Takes a list of chat tasks
  2. For each task:
    • Gets the transcript location
    • Downloads the transcript
    • Parses the data
    • Adds it to our collection
  3. Handles errors without stopping

Creating the Excel File

				
def _parse_transcript_data(self, task_id: str, transcript_data: List[Dict]) -> List[Dict]:
    parsed = []
    for entry in transcript_data:
        # "or {}" also covers entries where participant is present but null
        participant = entry.get("participant") or {}
        parsed.append({
            "task_id": task_id,
            "timestamp": entry.get("timestamp"),
            "direction": entry.get("direction"),
            "message": entry.get("message"),
            "participant_name": participant.get("name"),
            "participant_role": participant.get("role"),
            "participant_userId": participant.get("userId"),
            "participant_aliasId": participant.get("aliasId")
        })
    return parsed
				
			

This organizes our data for Excel:

  • Creates consistent structure
  • Extracts key information
  • Handles missing data gracefully
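The step that actually writes the Excel file is not part of the excerpts above. A minimal sketch of it, assuming pandas with an Excel writer such as openpyxl is installed, could look like this; normalize_rows and export_to_excel are hypothetical names, and the column list simply mirrors the keys produced by _parse_transcript_data.

```python
from typing import Dict, List

COLUMNS = [
    "task_id", "timestamp", "direction", "message",
    "participant_name", "participant_role",
    "participant_userId", "participant_aliasId",
]


def normalize_rows(rows: List[Dict]) -> List[Dict]:
    """Give every row the same keys in the same order; missing values become None."""
    return [{col: row.get(col) for col in COLUMNS} for row in rows]


def export_to_excel(rows: List[Dict], output_file: str = "transcripts.xlsx") -> int:
    """Write the rows to an Excel sheet and return how many were written."""
    import pandas as pd  # imported lazily; also needs an engine such as openpyxl

    df = pd.DataFrame(normalize_rows(rows), columns=COLUMNS)
    df.to_excel(output_file, index=False)  # index=False drops the row-number column
    return len(df)
```

Each chat message becomes one spreadsheet row, so a single conversation usually spans several rows that share a task_id.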

Error Handling Throughout

				
async def _make_request(self, method: str, url: str, **kwargs) -> Dict:
    try:
        async with self.session.request(method, url, **kwargs) as response:
            if response.status == 401:
                logging.info("Token expired, refreshing...")
                await self._refresh_token()
                # Retry with the refreshed token (abbreviated here; the full
                # version appears under "Making API Requests")
            return await response.json()
    except Exception:
        logging.exception(f"API request failed: {method} {url}")
        raise
				
			

Error handling is built into every level:

  • Network errors
  • Authentication failures
  • Missing data
  • API rate limits

Understanding the config.json Configuration File

				
{
    "auth_method": "browser",
    "client_id": "<YOUR CLIENT ID>",
    "client_secret": "<YOUR CLIENT SECRET>",
    "org_id": "<YOUR WEBEX ORG ID>",
    "redirect_uri": "http://localhost:8089/callback",
    "scope": "spark:kms cloud-contact-center:pod_conv cjp:user spark:people_read cjp:config cjp:config_read cjds:admin_org_read",
    "log_level": "INFO",
    "log_file": "transcript.log",
    "api_urls": {
        "base": "https://webexapis.com/v1",
        "eu": "https://api.wxcc-eu2.cisco.com",
        "auth": "https://webexapis.com/v1/authorize"
    },
    "token_file": "access_token.json",
    "default_queue": "<YOUR QUEUE NAME>",
    "url_expiration": 3600
}
				
			

Authentication Settings

  1. "auth_method": "browser"
    • Options: “browser” or “curl”
    • Controls how authentication is handled
    • “browser” opens a web browser for interactive login
    • “curl” is for automated/headless scenarios
  2. "client_id" and "client_secret"
    • Your Webex API application credentials
    • Obtained from Webex Developer portal
    • Keep these secure and never share them
    • Used for OAuth2 authentication
  3. "org_id"
    • Your Webex Contact Center organization identifier
    • Found in WxCC Administration Portal
    • Required for API access
  4. "redirect_uri"
    • Needed for the automated access_token process
    • OAuth callback URL
    • Must match what’s configured in Webex Developer portal
    • Default uses localhost for development
    • Can be changed for production deployments

API Permissions

"scope"

  • Space-separated list of required permissions:
    • spark:kms: Key Management System access
    • cloud-contact-center:pod_conv: Chat conversation access
    • cjp:user: User context permissions
    • spark:people_read: User profile access
    • cjp:config: Configuration access
    • cjp:config_read: Read configuration data
    • cjds:admin_org_read: Organization data access

Logging Configuration

  1. "log_level"
    • Options: “DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”
    • Controls detail level of logging
    • “DEBUG” most verbose, “CRITICAL” least verbose
    • “INFO” recommended for production
  2. "log_file"
    • Path to log file
    • Can be relative or absolute path
    • Default: “transcript.log”
    • Logs are rotated to manage size

API Endpoints

				
{
    "base": "https://webexapis.com/v1",
    "eu": "https://api.wxcc-eu2.cisco.com",
    "auth": "https://webexapis.com/v1/authorize"
}
				
			
  • base: Main Webex API endpoint
  • eu: Contact Center EU datacenter endpoint
  • auth: OAuth authorization endpoint
  • Can be modified for different regions/environments

Token Management

  1. "token_file"
    • Where OAuth tokens are stored
    • Default: “access_token.json”
    • Contains refresh token for reuse
    • Automatically managed by script

Contact Center Settings

  1. "default_queue"
    • Default queue for transcript search
    • Optional – can be overridden in code
    • Used when no specific queue is specified
  2. "url_expiration"
    • Time in seconds until the AWS S3 transcript URLs expire
    • Default: 3600 (1 hour)
    • Can be adjusted based on needs
    • Maximum allowed by API is 24 hours
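Since a forgotten placeholder or a missing key only shows up at runtime, a small pre-flight check can save a failed login attempt. The sketch below is one possible check, not part of the original script; find_config_problems and the choice of required keys are assumptions.

```python
from typing import Dict, List

REQUIRED_KEYS = (
    "client_id", "client_secret", "org_id",
    "redirect_uri", "scope", "api_urls", "token_file",
)
LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def find_config_problems(config: Dict) -> List[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    for key, value in config.items():
        # Template values such as "<YOUR WEBEX ORG ID>" were never filled in
        if isinstance(value, str) and value.startswith("<YOUR"):
            problems.append(f"placeholder not replaced: {key}")
    level = config.get("log_level", "INFO")
    if level not in LOG_LEVELS:
        problems.append(f"unknown log_level: {level}")
    return problems
```

Running it on the dict returned by json.load before anything else starts gives one list of everything that still needs attention.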

Setting Up Webex Contact Center API Access

Developer Portal Access and Setup

  1. Access Developer Portal
    • Visit Webex Developer Portal
    • Sign in with your Webex admin credentials
    • Ensure you have Contact Center administrator access
  2. Create New Application
    • Navigate to: My Apps > Create a New App > Integration
  3. Configure Application Settings
    • App Name: Choose descriptive name (e.g., “Transcript Downloader”)
    • Description: Purpose of your application
    • Icon: Optional branding
    • Redirect URI: Must match config.json
      • Development: http://localhost:8089/callback
      • Production: Your secure endpoint
      • ⚠️ Make sure that the local firewall accepts the connection request.
  4. Required Scopes
    • Select these OAuth scopes: spark:kms cloud-contact-center:pod_conv cjp:user spark:people_read cjp:config cjp:config_read cjds:admin_org_read

Obtaining Required Credentials

  1. Client ID and Secret: after app creation, you’ll receive:
    { "client_id": "Your_Client_ID", "client_secret": "Your_Client_Secret" }
    ⚠️ Store these securely – client_secret cannot be retrieved later
  2. Organization ID: get it from Contact Center:
    • Log into Control Hub
    • Navigate to Contact Center > Settings
    • Copy Organization ID

GitHub

The full codebase, including setup instructions and usage details, is available on GitHub:

🔗 WxCC Transcript Extraction