Introduction
Have you ever needed to analyze customer chat conversations from Webex Contact Center? In this post, we’ll explore a Python script that automates the process of downloading chat transcripts and converting them into an easy-to-analyze Excel file. Even if you’re new to Python, we’ll break down the complex parts into digestible pieces.
What Does This Script Do?
At its core, this script:
- Connects to Webex Contact Center securely
- Searches for chat conversations within a specified date range
- Downloads the transcripts
- Converts them into a structured Excel file
We’ll start with the foundational pieces and then move through how everything works together.
Part 1: Understanding the Foundation
Core Building Blocks
First, let’s understand what we’re working with:
import asyncio
import aiohttp
import json
import logging
import os
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional
import pandas as pd
import requests
from selenium import webdriver
These imports tell us a lot about what the script does:
- asyncio and aiohttp: For making non-blocking API calls
- json: For handling configuration and API responses
- logging: For tracking what’s happening
- dataclasses: For creating clean data structures
- selenium: For browser automation
Data Structures
Let’s look at our data containers:
@dataclass
class TokenData:
    access_token: str
    expires_in: int
    refresh_token: str
    refresh_token_expires_in: int
    token_type: str
    scope: str
    expires_at: str
Think of TokenData as a secure container for authentication. When you log into a website, you get a ticket (token) that proves who you are. This class stores that ticket and its details.
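To make the "ticket with details" idea concrete, here is a minimal sketch of how a token response could be turned into a TokenData and checked for expiry. The helpers token_from_response and is_expired are hypothetical names, the sample values are made up, and storing expires_at as an ISO 8601 string is an assumption rather than something the script confirms.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TokenData:
    access_token: str
    expires_in: int
    refresh_token: str
    refresh_token_expires_in: int
    token_type: str
    scope: str
    expires_at: str  # assumption: stored as an ISO 8601 timestamp


def token_from_response(payload: dict, now: datetime) -> TokenData:
    """Hypothetical helper: build TokenData and stamp its expiry time."""
    expires_at = now + timedelta(seconds=payload["expires_in"])
    return TokenData(
        access_token=payload["access_token"],
        expires_in=payload["expires_in"],
        refresh_token=payload["refresh_token"],
        refresh_token_expires_in=payload["refresh_token_expires_in"],
        token_type=payload.get("token_type", "Bearer"),
        scope=payload.get("scope", ""),
        expires_at=expires_at.isoformat(),
    )


def is_expired(token: TokenData, now: datetime, margin_seconds: int = 60) -> bool:
    """Treat the token as expired slightly early to avoid edge-of-expiry calls."""
    expires_at = datetime.fromisoformat(token.expires_at)
    return now >= expires_at - timedelta(seconds=margin_seconds)


# Made-up sample values, for illustration only
sample = {
    "access_token": "example-access",
    "expires_in": 3600,
    "refresh_token": "example-refresh",
    "refresh_token_expires_in": 86400,
}
issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
token = token_from_response(sample, issued)
print(token.expires_at)
print(is_expired(token, issued + timedelta(minutes=30)))
```

The safety margin is a design choice: refreshing a minute early is cheaper than having a request rejected mid-download.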
@dataclass
class Task:
    id: str
    channelType: str
    createdTime: int
    endedTime: Optional[int]
    status: str
Task represents one chat conversation. It’s like a file folder containing basic information about a chat:
- Which conversation it is (id)
- How they chatted (channelType)
- When it started (createdTime)
- When it ended (endedTime)
- Current status (status)
Authentication Handler
from http.server import BaseHTTPRequestHandler

class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore anything that is not the OAuth callback
        if "/callback" not in self.path:
            return
        try:
            # Split "/callback?code=...&state=..." into key/value pairs
            query = self.path.split("?")[1]
            params = dict(param.split("=") for param in query.split("&"))
            self.server.auth_code = params.get("code")
        # (excerpt: the except branch and the HTTP response are not shown)
Think of AuthHandler as a security guard. When you try to log into Webex:
- The script opens a browser
- You log in
- Webex sends a special code to this handler
- The handler saves this code so we can use it to get access
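The handler's job of pulling the code out of the callback URL can also be sketched with the standard library's URL parser, which decodes percent-encoded values and tolerates a missing query string. The function name extract_auth_code is hypothetical, and this is an alternative illustration, not the script's own parsing.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_auth_code(path: str) -> Optional[str]:
    """Return the `code` query parameter from a callback path, or None."""
    parts = urlsplit(path)
    if not parts.path.endswith("/callback"):
        return None
    values = parse_qs(parts.query).get("code")
    return values[0] if values else None


print(extract_auth_code("/callback?code=abc%3D123&state=xyz"))
print(extract_auth_code("/favicon.ico"))
```

Compared with splitting on "?" and "=", this avoids errors when the browser requests the callback without a query string or when the code contains an encoded "=".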
Understanding Authentication in the Webex Contact Center (WxCC) API
One of the challenges when working with the WxCC API is the two-step authentication process required to obtain and reuse an access token.
In many APIs, obtaining an access token is straightforward: you typically just send a request with your client ID and client secret to get a valid token. However, WxCC follows a more interactive authentication flow that involves manual user intervention.
Step 1: Authenticate via User Login
Instead of directly using a client_id and client_secret, WxCC requires an authentication token first (in OAuth terms, an authorization code). This is obtained through a user authentication session via a redirect URL in a browser. A WxCC user with the appropriate access rights must manually log in using username, password, and MFA to authorize the API request.
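As a rough sketch of this first step, the URL that the browser opens can be assembled from the config values. The parameter names follow the standard OAuth 2.0 authorization-code request; build_authorize_url and the state value are hypothetical, and the client ID below is a placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorize_url(config: dict, state: str) -> str:
    """Assemble the login URL the user opens in a browser."""
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": config["client_id"],
        "redirect_uri": config["redirect_uri"],
        "scope": config["scope"],
        "state": state,  # opaque value echoed back on the callback
    }
    return f"{config['api_urls']['auth']}?{urlencode(params)}"


# Placeholder values, for illustration only
example_config = {
    "client_id": "my-client-id",
    "redirect_uri": "http://localhost:8089/callback",
    "scope": "spark:kms cjp:user",
    "api_urls": {"auth": "https://webexapis.com/v1/authorize"},
}
url = build_authorize_url(example_config, state="xyz123")
print(url)
```

After the user logs in, the browser is redirected to redirect_uri with the code attached as a query parameter, which is where the callback handler comes in.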
Step 2: Exchange the Authentication Token for an Access Token
Once the user successfully authenticates, the authentication token can then be used to request an access token. This access token is what allows API interactions with WxCC.
Why Is This Process Cumbersome?
Unlike APIs that support client credentials flow (where authentication is fully automated), WxCC’s approach requires a human step in the loop. The need for interactive authentication via a browser session makes automation more complex and less seamless for API-driven workflows.
To work around this, developers often store the access token and refresh it while the refresh token remains valid, reducing the need for repeated manual logins. However, once the refresh token expires, the authentication flow must be restarted via the manual login process.
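The store-and-refresh workaround boils down to a three-way decision each time the script starts. The sketch below is one way to express it; the file layout and the field names access_expires_at and refresh_expires_at (epoch seconds) are assumptions made for illustration, not the script's actual token file format.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def save_token(path: Path, token: dict) -> None:
    """Persist the token so later runs can skip the browser login."""
    path.write_text(json.dumps(token))


def load_token(path: Path) -> Optional[dict]:
    """Return the stored token, or None if nothing has been saved yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


def next_auth_step(token: Optional[dict], now: float) -> str:
    """Return 'reuse', 'refresh', or 'login' for the stored token."""
    if token is None or now >= token["refresh_expires_at"]:
        return "login"    # nothing usable left: manual browser login
    if now >= token["access_expires_at"]:
        return "refresh"  # access token expired, refresh token still valid
    return "reuse"


token_path = Path(tempfile.mkdtemp()) / "access_token.json"
first_run = next_auth_step(load_token(token_path), now=1_000.0)
save_token(token_path, {"access_expires_at": 2_000.0, "refresh_expires_at": 9_000.0})
stored = load_token(token_path)
print(first_run, next_auth_step(stored, now=1_500.0), next_auth_step(stored, now=5_000.0))
```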
Part 2: The Core Engine – WebexAPI Class
Let’s dive into the main class that does all the heavy lifting. I’ll explain why we built it this way and what each part does.
WebexAPI Class Structure
class WebexAPI:
    def __init__(self, config_file: str = "config.json"):
        self.config = self._load_config(config_file)
        self.setup_logging()
        self.session = None
        self.auth_code_event = Event()
        self.token_data = None
Think of this class as a control center. When you create it:
- It loads your settings (config)
- Sets up logging to track what happens
- Prepares to create a connection (session)
- Gets ready to handle authentication (auth_code_event)
Configuration and Logging Setup
def _load_config(self, config_file: str) -> Dict:
    try:
        with open(config_file) as f:
            config = json.load(f)
        logging.debug(f"Loaded configuration from {config_file}")
        return config
    except Exception:
        # Log the full traceback, then let the caller see the error
        logging.exception(f"Failed to load config from {config_file}")
        raise
This is like reading your instruction manual:
- Opens the config.json file
- Loads all your settings
- If something goes wrong, it tells you exactly what happened
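Loading the file is only half the story, because the sample config.json ships with blank placeholders (" ") for the credentials. A small check like the one below could catch that early. It is a sketch: REQUIRED_KEYS and missing_config_keys are hypothetical names, and which keys count as required is an assumption based on the settings described in this post.

```python
import json

# Assumption: these are the settings the script cannot run without
REQUIRED_KEYS = ("client_id", "client_secret", "org_id", "redirect_uri", "scope")


def missing_config_keys(config: dict) -> list:
    """Return required keys that are absent or contain only whitespace."""
    return [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]


raw = '{"client_id": " ", "client_secret": "s3cret", "org_id": "org-1", "scope": "cjp:user"}'
config = json.loads(raw)
missing = missing_config_keys(config)
print(missing)
```

Failing fast with a list of missing keys is usually friendlier than a KeyError halfway through the login flow.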
def setup_logging(self) -> None:
    log_format = "%(asctime)s - %(levelname)s - %(funcName)s - %(lineno)d - %(message)s"
    root_logger = logging.getLogger()
    root_logger.setLevel(getattr(logging, self.config["log_level"]))
Think of logging as your script’s diary:
- Records everything that happens
- Shows when it happened
- Tells you where in the code it happened
- Helps you fix problems later
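Since the excerpt above stops before any handlers are attached, here is a minimal, self-contained sketch of how the same format string could be wired to a rotating log file. The function build_logger, the logger name, and the size and backup limits are illustrative assumptions, not values taken from the script.

```python
import logging
import tempfile
from logging.handlers import RotatingFileHandler
from pathlib import Path

LOG_FORMAT = "%(asctime)s - %(levelname)s - %(funcName)s - %(lineno)d - %(message)s"


def build_logger(name: str, level_name: str, log_file: Path) -> logging.Logger:
    """Create a logger that writes formatted records to a size-limited file."""
    logger = logging.getLogger(name)
    # Fall back to INFO if the configured level name is not recognized
    logger.setLevel(getattr(logging, level_name.upper(), logging.INFO))
    # Assumed limits: roll over at about 1 MB and keep three old files
    handler = RotatingFileHandler(log_file, maxBytes=1_000_000, backupCount=3)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger


log_path = Path(tempfile.mkdtemp()) / "transcript.log"
logger = build_logger("transcript_demo", "debug", log_path)
logger.debug("Loaded configuration")
print(log_path.read_text())
```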
Authentication Process
async def initialize(self):
    self.session = aiohttp.ClientSession()
    self.token_data = await self._load_or_refresh_token()
    logging.info("WebexAPI initialized successfully")
    return self
This is where we “turn on” our control center:
- Creates a reusable connection (like opening a phone line)
- Gets or refreshes our access token
- Confirms everything is ready
async def _exchange_auth_code(self, auth_code: str) -> TokenData:
    url = f"{self.config['api_urls']['base']}/access_token"
    payload = {
        "grant_type": "authorization_code",
        "client_id": self.config["client_id"],
        "client_secret": self.config["client_secret"],
        "code": auth_code,
        "redirect_uri": self.config["redirect_uri"]
    }
This exchanges your temporary authorization code for a longer-lasting token:
- Like trading a temporary visitor pass for a proper ID card
- Includes your application’s credentials
- Gets back a token you can use multiple times
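The script also calls a _refresh_token method that is not shown in this post. In standard OAuth 2.0 the refresh request looks much like the exchange above, with a different grant type. The sketch below builds such a form body; build_refresh_payload is a hypothetical helper and the credentials are placeholders.

```python
from urllib.parse import urlencode


def build_refresh_payload(config: dict, refresh_token: str) -> dict:
    """Form fields for a standard OAuth 2.0 refresh-token request."""
    return {
        "grant_type": "refresh_token",
        "client_id": config["client_id"],
        "client_secret": config["client_secret"],
        "refresh_token": refresh_token,
    }


# Placeholder credentials, for illustration only
example_config = {"client_id": "my-client-id", "client_secret": "my-client-secret"}
payload = build_refresh_payload(example_config, "stored-refresh-token")
body = urlencode(payload)  # form-encoded, as sent in the POST body
print(body)
```

Note that no redirect_uri or browser is involved here, which is why a valid refresh token lets the script run unattended.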
Making API Requests
async def _make_request(self, method: str, url: str, **kwargs) -> Dict:
    try:
        async with self.session.request(method, url, **kwargs) as response:
            if response.status == 401:
                logging.info("Token expired, refreshing...")
                await self._refresh_token()
                # Make sure a headers dict exists before swapping in the new token
                headers = kwargs.setdefault('headers', {})
                headers['Authorization'] = f"Bearer {self.token_data.access_token}"
                async with self.session.request(method, url, **kwargs) as retry_response:
                    return await retry_response.json()
            return await response.json()
    except Exception:
        logging.exception(f"API request failed: {method} {url}")
        raise
This is your universal API communicator:
- Handles all communication with Webex
- Automatically refreshes expired tokens
- Retries the request once after refreshing an expired token
- Returns the data in a usable format
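The refresh-then-retry pattern can be tried out without a network connection by swapping the HTTP call for a stand-in. In the sketch below, request_with_refresh, fake_send, and fake_refresh are all hypothetical names; the fake transport simply rejects any token other than "fresh".

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

Response = Tuple[int, Dict]


async def request_with_refresh(
    send: Callable[[Dict[str, str]], Awaitable[Response]],
    refresh: Callable[[], Awaitable[str]],
    headers: Dict[str, str],
) -> Dict:
    """Send once; on 401, refresh the token and retry exactly one time."""
    status, body = await send(headers)
    if status == 401:
        new_token = await refresh()
        headers = {**headers, "Authorization": f"Bearer {new_token}"}
        status, body = await send(headers)
    if status >= 400:
        raise RuntimeError(f"request failed with status {status}")
    return body


calls = []


async def fake_send(headers: Dict[str, str]) -> Response:
    # Stand-in for the real HTTP call: only the "fresh" token is accepted
    calls.append(headers.get("Authorization"))
    if headers.get("Authorization") == "Bearer fresh":
        return 200, {"ok": True}
    return 401, {"error": "expired"}


async def fake_refresh() -> str:
    return "fresh"


result = asyncio.run(
    request_with_refresh(fake_send, fake_refresh, {"Authorization": "Bearer stale"})
)
print(result, calls)
```

Retrying only once keeps a revoked or misconfigured credential from causing an endless refresh loop.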
Why Async?
You’ll notice many functions have async in front of them. This is important because:
- It allows the script to do multiple things at once
- While waiting for Webex to respond, it can start another task
- Can make the script much faster when many transcripts are downloaded concurrently
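To see the "start another task while waiting" idea in isolation, the sketch below runs several simulated downloads concurrently while capping how many are in flight at once. The names fetch_one and fetch_all, the delay, and the limit of three are illustrative assumptions; asyncio.sleep stands in for the network wait.

```python
import asyncio
from typing import List


async def fetch_one(task_id: str, limiter: asyncio.Semaphore, delay: float = 0.05) -> str:
    """Simulate one download; the sleep stands in for waiting on the API."""
    async with limiter:  # at most `max_concurrent` downloads run at once
        await asyncio.sleep(delay)
        return f"transcript-{task_id}"


async def fetch_all(task_ids: List[str], max_concurrent: int = 3) -> List[str]:
    limiter = asyncio.Semaphore(max_concurrent)
    # gather() returns results in the same order as the inputs
    return await asyncio.gather(*(fetch_one(t, limiter) for t in task_ids))


results = asyncio.run(fetch_all(["a", "b", "c", "d"]))
print(results)
```

The semaphore is the design choice worth noting: it keeps the speed benefit of concurrency while still being polite to API rate limits.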
Part 3: Searching, Processing, and Exporting Data
Searching for Chat Transcripts
async def search_tasks(
    self,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    queue_name: Optional[str] = None
) -> List[Task]:
    """
    GraphQL query for chat tasks in date range
    """
    query = """
    query($startTime: Long!, $endTime: Long!) {
        task(
            from: $startTime,
            to: $endTime,
            filter: {
                channelType: { equals: chat }
            }
        ) {
            tasks {
                id channelType createdTime endedTime status
            }
        }
    }
    """
This search function is like a smart filter:
- You can specify a date range (or it uses defaults)
- It uses GraphQL (a smart way to request exactly what you need)
- Returns only chat conversations
- Packages results into Task objects
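The query declares $startTime and $endTime as Long values, so the datetime arguments have to become numbers before the request is sent. The sketch below assumes they are epoch milliseconds and that the default window is the last seven days; both are assumptions, and to_epoch_ms and build_search_variables are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_epoch_ms(moment: datetime) -> int:
    """Convert a datetime to epoch milliseconds, treating naive values as UTC."""
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


def build_search_variables(
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    now: Optional[datetime] = None,
    default_days: int = 7,  # assumed default window
) -> dict:
    """Fill in defaults and return the GraphQL variables dictionary."""
    now = now or datetime.now(timezone.utc)
    end_time = end_time or now
    start_time = start_time or end_time - timedelta(days=default_days)
    return {"startTime": to_epoch_ms(start_time), "endTime": to_epoch_ms(end_time)}


fixed_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
variables = build_search_variables(now=fixed_now)
print(variables)
```

The resulting dictionary would travel next to the query text in the request body, under the usual GraphQL "variables" key.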
Processing Transcripts
async def process_transcripts(self, tasks: List[Task], output_file: str = "transcripts.xlsx"):
    logging.info(f"Processing transcripts for {len(tasks)} tasks")
    df_data = []
    for task in tasks:
        try:
            await asyncio.sleep(0.5)  # Rate limiting
            transcript = await self.get_transcript(task.id)
            if transcript and transcript.get('filePath'):
                transcript_data = await self._fetch_transcript_content(transcript['filePath'])
                df_data.extend(self._parse_transcript_data(task.id, transcript_data))
                logging.debug(f"Processed transcript for task {task.id}")
        except Exception:
            # Log and move on so one bad transcript does not stop the run
            logging.exception(f"Failed to process transcript for task {task.id}")
The processing pipeline works like this:
- Takes a list of chat tasks
- For each task:
  - Gets the transcript location
  - Downloads the transcript
  - Parses the data
  - Adds it to our collection
- Handles errors without stopping
Creating the Excel File
def _parse_transcript_data(self, task_id: str, transcript_data: List[Dict]) -> List[Dict]:
    parsed = []
    for entry in transcript_data:
        parsed.append({
            "task_id": task_id,
            "timestamp": entry.get("timestamp"),
            "direction": entry.get("direction"),
            "message": entry.get("message"),
            "participant_name": entry.get("participant", {}).get("name"),
            "participant_role": entry.get("participant", {}).get("role"),
            "participant_userId": entry.get("participant", {}).get("userId"),
            "participant_aliasId": entry.get("participant", {}).get("aliasId")
        })
    return parsed
This organizes our data for Excel:
- Creates consistent structure
- Extracts key information
- Handles missing data gracefully
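One subtlety about "missing data": entry.get("participant", {}) only falls back to the empty dictionary when the key is absent. If a transcript entry ever carried an explicit null for participant (an assumption, since the transcript format is not documented here), the chained .get would raise an AttributeError. A hypothetical helper such as participant_field below covers both cases.

```python
from typing import Any, Dict, Optional


def participant_field(entry: Dict[str, Any], field: str) -> Optional[Any]:
    """Read a participant attribute, tolerating a missing or null participant."""
    participant = entry.get("participant") or {}
    return participant.get(field)


# Made-up entries, for illustration only
entries = [
    {"message": "Hello", "participant": {"name": "Agent Smith", "role": "agent"}},
    {"message": "System notice", "participant": None},  # explicit null
    {"message": "No participant key at all"},
]
names = [participant_field(entry, "name") for entry in entries]
print(names)
```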
Error Handling Throughout
async def _make_request(self, method: str, url: str, **kwargs) -> Dict:
    try:
        async with self.session.request(method, url, **kwargs) as response:
            if response.status == 401:
                logging.info("Token expired, refreshing...")
                await self._refresh_token()
                # Retry logic...
            return await response.json()
    except Exception:
        logging.exception(f"API request failed: {method} {url}")
        raise
Error handling is built into every level:
- Network errors
- Authentication failures
- Missing data
- API rate limits
Understanding the config.json Configuration File
{
    "auth_method": "browser",
    "client_id": " ",
    "client_secret": " ",
    "org_id": " ",
    "redirect_uri": "http://localhost:8089/callback",
    "scope": "spark:kms cloud-contact-center:pod_conv cjp:user spark:people_read cjp:config cjp:config_read cjds:admin_org_read",
    "log_level": "INFO",
    "log_file": "transcript.log",
    "api_urls": {
        "base": "https://webexapis.com/v1",
        "eu": "https://api.wxcc-eu2.cisco.com",
        "auth": "https://webexapis.com/v1/authorize"
    },
    "token_file": "access_token.json",
    "default_queue": " ",
    "url_expiration": 3600
}
Authentication Settings
"auth_method": "browser"
- Options: “browser” or “curl”
- Controls how authentication is handled
- “browser” opens a web browser for interactive login
- “curl” is for automated/headless scenarios
"client_id" and "client_secret"
- Your Webex API application credentials
- Obtained from Webex Developer portal
- Keep these secure and never share them
- Used for OAuth2 authentication
"org_id"
- Your Webex Contact Center organization identifier
- Found in WxCC Administration Portal
- Required for API access
"redirect_uri"
- OAuth callback URL, needed for the automated access_token process
- Must match what’s configured in Webex Developer portal
- Default uses localhost for development
- Can be changed for production deployments
API Permissions
"scope"
- Space-separated list of required permissions:
  - spark:kms: Key Management System access
  - cloud-contact-center:pod_conv: Chat conversation access
  - cjp:user: User context permissions
  - spark:people_read: User profile access
  - cjp:config: Configuration access
  - cjp:config_read: Read configuration data
  - cjds:admin_org_read: Organization data access
Logging Configuration
"log_level"
- Options: “DEBUG”, “INFO”, “WARNING”, “ERROR”, “CRITICAL”
- Controls detail level of logging
- “DEBUG” most verbose, “CRITICAL” least verbose
- “INFO” recommended for production
"log_file"
- Path to log file
- Can be relative or absolute path
- Default: “transcript.log”
- Logs are rotated to manage size
API Endpoints
{
    "base": "https://webexapis.com/v1",
    "eu": "https://api.wxcc-eu2.cisco.com",
    "auth": "https://webexapis.com/v1/authorize"
}
- base: Main Webex API endpoint
- eu: Contact Center EU datacenter endpoint
- auth: OAuth authorization endpoint
- Can be modified for different regions/environments
Token Management
"token_file"
- Where OAuth tokens are stored
- Default: “access_token.json”
- Contains refresh token for reuse
- Automatically managed by script
Contact Center Settings
"default_queue"
- Default queue for transcript search
- Optional – can be overridden in code
- Used when no specific queue is specified
"url_expiration"
- Time in seconds before the AWS S3 transcript download URLs expire
- Default: 3600 (1 hour)
- Can be adjusted based on needs
- Maximum allowed by API is 24 hours
Setting Up Webex Contact Center API Access
Developer Portal Access and Setup
- Access Developer Portal
  - Visit Webex Developer Portal
  - Sign in with your Webex admin credentials
  - Ensure you have Contact Center administrator access
- Create New Application
  - Navigate to: My Apps > Create a New App > Integration
- Configure Application Settings
  - App Name: Choose a descriptive name (e.g., “Transcript Downloader”)
  - Description: Purpose of your application
  - Icon: Optional branding
  - Redirect URI: Must match config.json
    - Development: http://localhost:8089/callback
    - Production: Your secure endpoint
    - ⚠️ Make sure that the local firewall accepts the connection request.
- Required Scopes: Select these OAuth scopes:
  spark:kms cloud-contact-center:pod_conv cjp:user spark:people_read cjp:config cjp:config_read cjds:admin_org_read
Obtaining Required Credentials
- Client ID and Secret: After app creation, you’ll receive:
  { "client_id": "Your_Client_ID", "client_secret": "Your_Client_Secret" }
- Organization ID: Get it from Contact Center:
  - Log into Control Hub
  - Navigate to Contact Center > Settings
  - Copy Organization ID
GitHub
The full codebase, including setup instructions and usage details, is available on GitHub: