DSR Processor MCP Server

Enables conversational processing of Daily Status Reports (DSR) by extracting, validating, and storing data from JSON, DOCX, and XLSX files in IBM Cloud Object Storage. It provides a human-in-the-loop workflow for batch file processing and schema validation directly within MCP-compatible clients like Claude Desktop.

DSR Processor Agentic Agent

Complete implementation of DSR (Daily Status Report) processing using an IBM watsonx Orchestrate ADK agent with Agentic Workflow tools, plus an MCP (Model Context Protocol) server for Claude Desktop integration.

Overview

This agent provides conversational, human-in-the-loop DSR processing capabilities for the C4I SOT system. It handles bulk file processing from Cloud Object Storage (COS), data extraction from multiple formats, schema validation, and automated storage of processed results.

Two deployment options:

  1. WxO ADK Agent - Deploy to IBM watsonx Orchestrate for enterprise workflows
  2. MCP Server - Use with Claude Desktop or other MCP clients for local AI assistance

Uses WxO Knowledge Base for schema management - The DSR schema is stored in WxO Knowledge Base as a .json.txt file, making it easy to update without redeploying tools.

Key Features

  • Conversational Interface: Natural language interaction for DSR processing tasks
  • LLM-Powered Extraction: Intelligent extraction from complex DOCX files using watsonx.ai, OpenAI, Groq, or Anthropic
  • Multi-Format Support: Processes JSON, DOCX, and XLSX DSR files
  • Batch Processing: Handle multiple files with single commands
  • Schema Validation: Validates against C4I SOT DSR unified schema v1.2 from Knowledge Base
  • Knowledge Base Integration: Schema stored in WxO Knowledge Base for easy updates
  • Human Review: Optional review steps for quality control
  • Error Handling: Graceful error handling with helpful suggestions
  • COS Integration: Direct integration with IBM Cloud Object Storage

Architecture

Agentic Workflow Tools

The agent uses 5 custom tools that work together:

  1. list_cos_files - List and filter files in COS bucket
  2. download_cos_file - Download files from COS to processing area
  3. extract_dsr_data - Extract structured data from DSR files
  4. validate_schema - Validate data against DSR schema
  5. save_to_cos - Save processed data back to COS
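As a sketch of what the first tool's core logic might look like — the helper below is illustrative, not the shipped tool code — it accepts any S3-compatible client, for example one created with `ibm_boto3.client("s3", ...)` from the ibm-cos-sdk dependency:

```python
def list_cos_files(client, bucket, suffix=None):
    """List object keys in a COS bucket, optionally filtered by suffix.

    `client` is any S3-compatible client (e.g. from ibm_boto3), so the
    function can be exercised with a stub in tests.
    """
    response = client.list_objects_v2(Bucket=bucket)
    keys = [obj["Key"] for obj in response.get("Contents", [])]
    if suffix:
        keys = [k for k in keys if k.endswith(suffix)]
    return keys
```

Injecting the client keeps the listing logic independent of credentials, which the real tool reads from the environment.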

Agent Flow

User Request
    ↓
Agent (LLM) interprets intent
    ↓
Agent selects appropriate tool(s)
    ↓
Tool executes and returns result
    ↓
Agent processes result and responds
    ↓
[Repeat for multi-step workflows]
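The flow above can be sketched as a plain orchestration function. Here the canonical tool order is hard-coded and the `tools` mapping is a hypothetical stand-in for the LLM's step-by-step tool selection:

```python
def run_pipeline(filename, tools):
    """Illustrative happy-path run of the five-tool workflow.

    `tools` maps tool names to callables; in the real agent the LLM
    chooses each step conversationally rather than following this
    fixed sequence.
    """
    local_path = tools["download_cos_file"](filename)
    data = tools["extract_dsr_data"](local_path)
    report = tools["validate_schema"](data)
    if not report["valid"]:
        # Surface validation errors for human review instead of saving.
        return {"status": "needs_review", "errors": report["errors"]}
    saved_key = tools["save_to_cos"](data)
    return {"status": "saved", "key": saved_key}
```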

Project Structure

dsr-agent-agentic/
├── README.md                           # This file
├── requirements.txt                    # Python dependencies (ADK)
├── requirements-mcp.txt                # Python dependencies (MCP)
├── dsr-processor-agentic.yaml         # Agent specification (ADK)
├── mcp_server.py                       # MCP server implementation
├── tools/                              # Agentic Workflow tools
│   ├── list_cos_files.py              # COS file listing
│   ├── download_cos_file.py           # COS file download
│   ├── extract_dsr_data.py            # Data extraction
│   ├── validate_schema.py             # Schema validation
│   └── save_to_cos.py                 # COS file upload
├── docs/                               # Documentation
│   ├── DEPLOYMENT-GUIDE.md            # ADK deployment guide
│   ├── MCP-SERVER-GUIDE.md            # MCP server guide
│   └── TESTING-GUIDE.md               # Comprehensive testing
└── examples/                           # Usage examples
    └── EXAMPLE-CONVERSATIONS.md       # Conversation examples

Quick Start

Option 1: MCP Server (Claude Desktop)

Prerequisites:

  • Python 3.8+
  • Claude Desktop or other MCP client
  • IBM Cloud Object Storage credentials

Setup:

# Install dependencies
cd dsr-agent-agentic
pip install -r requirements-mcp.txt

# Configure environment variables
cp .env.example .env
# Edit .env with your COS credentials

# Add to Claude Desktop config
# See docs/MCP-SERVER-GUIDE.md for details
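For reference, a Claude Desktop entry for this server typically follows the standard `mcpServers` shape; the server name, paths, and environment values below are placeholders — docs/MCP-SERVER-GUIDE.md has the authoritative version:

```json
{
  "mcpServers": {
    "dsr-processor": {
      "command": "python",
      "args": ["/path/to/dsr-agent-agentic/mcp_server.py"],
      "env": {
        "COS_API_KEY_ID": "<your-cos-api-key>",
        "COS_BUCKET_NAME": "<your-bucket-name>"
      }
    }
  }
}
```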

Usage: Open Claude Desktop and use natural language:

  • "List all DSR files in Cloud Object Storage"
  • "Download and process the USS VALOR DSR file"
  • "Validate the latest DSR and save it"

See docs/MCP-SERVER-GUIDE.md for complete setup instructions.

Option 2: WxO ADK Agent

Prerequisites:

  • IBM Cloud account with watsonx Orchestrate TZ Essentials (trial)
  • Cloud Object Storage instance with bucket created
  • COS credentials (API key, instance CRN, endpoint, bucket name)

Note: This implementation uses the gpt-oss-120b-groq model (GPT-OSS 120B - OpenAI via Groq) which is available by default in WxO TZ Essentials. No additional watsonx.ai project setup is required.

Deployment:

  1. Deploy Tools to WxO:

    • Follow docs/DEPLOYMENT-GUIDE.md for detailed steps
    • Deploy each of the 5 tools through the WxO UI
    • Configure environment variables for COS access
  2. Deploy Agent:

    • Create the agent in the WxO AI assistant builder
    • Use the configuration from dsr-processor-agentic.yaml
    • Connect all 5 tools to the agent
    • Test and publish
  3. Test:

    • Try the sample conversations under Basic Usage below

Basic Usage

User: "List all DSR files in COS"
Agent: [Lists files with details]

User: "Process the USS VALOR file from August 14"
Agent: [Downloads → Extracts → Validates → Saves]

User: "Process all JSON files from last week"
Agent: [Batch processes multiple files]

Features in Detail

LLM-Powered Extraction (NEW!)

Intelligent extraction from complex DOCX files using Large Language Models:

  • Configurable Providers: watsonx.ai, OpenAI, Groq, or Anthropic
  • Schema-Aware: Automatically maps extracted data to C4I SOT DSR schema
  • Complex Structure Handling: Extracts multi-section documents (Event Info, Issue Tracker, OJT Hours, History of Effort)
  • Data Normalization: Automatically formats dates, hull numbers, ship names
  • Fallback Support: Falls back to basic extraction if LLM unavailable

See LLM-EXTRACTION-GUIDE.md for complete documentation.

Multi-Format Processing

  • JSON: Direct parsing of structured DSR data
  • DOCX: LLM-powered intelligent extraction (when enabled) or heuristic extraction
  • XLSX: Cyber findings extraction from Excel spreadsheets
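A minimal dispatch-by-extension sketch illustrates the multi-format split. Only the JSON branch is filled in here, and the function name is illustrative, not the actual tool code:

```python
import json
from pathlib import Path

def extract_dsr(path):
    """Dispatch DSR extraction by file extension (illustrative sketch).

    DOCX and XLSX branches would use python-docx and openpyxl
    respectively, as listed in requirements.txt.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        return json.loads(Path(path).read_text())
    if suffix == ".docx":
        raise NotImplementedError("use python-docx (plus optional LLM extraction)")
    if suffix == ".xlsx":
        raise NotImplementedError("use openpyxl for cyber findings sheets")
    raise ValueError(f"unsupported DSR format: {suffix}")
```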

Schema Validation

Validates against C4I SOT DSR unified schema v1.2 from Knowledge Base:

  • Knowledge Base Integration: Schema retrieved automatically from WxO Knowledge Base
  • Easy Updates: Update schema without redeploying tools
  • Required fields verification
  • Data type checking
  • Pattern matching (dates, GUIDs, hull numbers)
  • Enum validation (status, enclave, issue types)
  • Detailed error messages with fix suggestions
  • Fallback to minimal schema if Knowledge Base unavailable
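As an illustration of the kinds of checks involved — the field names, date pattern, and enum values below are placeholders, not the actual v1.2 schema, which the real tool fetches from the Knowledge Base and validates with jsonschema:

```python
import re

# Hypothetical subset of schema rules, for illustration only.
REQUIRED = {"report_date", "ship_name", "status"}
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
STATUS_ENUM = {"GREEN", "YELLOW", "RED"}

def validate_dsr(record):
    """Return a list of human-readable errors (empty list = valid)."""
    errors = []
    for field in sorted(REQUIRED - record.keys()):
        errors.append(f"missing required field: {field}")
    date = record.get("report_date")
    if date is not None and not DATE_RE.match(str(date)):
        errors.append("report_date must be ISO format YYYY-MM-DD")
    status = record.get("status")
    if status is not None and status not in STATUS_ENUM:
        errors.append(f"status must be one of {sorted(STATUS_ENUM)}")
    return errors
```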

Batch Processing

Process multiple files with single commands:

  • Filter by date range
  • Filter by ship name
  • Filter by file format
  • Automatic error recovery
  • Progress reporting
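The filters above might combine as in this sketch, where the file-descriptor keys ("key", "ship", "date") are assumed metadata fields, not the tool's actual shape:

```python
def filter_files(files, start=None, end=None, ship=None, fmt=None):
    """Filter file descriptors by date range, ship name, and format.

    `files` is a list of dicts with keys "key", "ship", and "date"
    (a datetime.date); all filters are optional and combine with AND.
    """
    selected = []
    for f in files:
        if start and f["date"] < start:
            continue
        if end and f["date"] > end:
            continue
        if ship and ship.lower() not in f["ship"].lower():
            continue
        if fmt and not f["key"].endswith(fmt):
            continue
        selected.append(f)
    return selected
```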

Human Review

Optional review workflows:

  • Review before saving
  • Review only on warnings
  • Approve/reject/modify options
  • Summary views for quick decisions

Environment Variables

Required for all tools:

# Cloud Object Storage
COS_API_KEY_ID=<your-cos-api-key>
COS_INSTANCE_CRN=<your-cos-instance-crn>
COS_ENDPOINT=https://s3.us-south.cloud-object-storage.appdomain.cloud
COS_BUCKET_NAME=dsr-files-in-cloud-object-storage-cos-standard-7q2
DOWNLOAD_DIR=/tmp/dsr-downloads

# Schema Configuration
DSR_SCHEMA_KB_NAME=C4I_SOT_DSR_unified.schema.v1_2.iso.json.txt

# LLM Configuration (Optional - for intelligent extraction)
LLM_ENABLED=true
LLM_PROVIDER=watsonx  # Options: watsonx, openai, groq, anthropic
WATSONX_API_KEY=<your-watsonx-api-key>
WATSONX_PROJECT_ID=<your-watsonx-project-id>

Notes:

  • The schema is retrieved from WxO Knowledge Base using DSR_SCHEMA_KB_NAME
  • LLM extraction is optional but recommended for complex DOCX files
  • See LLM-EXTRACTION-GUIDE.md for LLM configuration details
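A small loader can fail fast when required variables are missing. This sketch uses the exact variable names from the block above, but is not the shipped configuration code:

```python
import os

def load_cos_config():
    """Collect COS settings from the environment, flagging anything missing.

    DOWNLOAD_DIR is optional and falls back to the documented default.
    """
    required = ["COS_API_KEY_ID", "COS_INSTANCE_CRN", "COS_ENDPOINT", "COS_BUCKET_NAME"]
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise EnvironmentError(f"missing COS settings: {', '.join(missing)}")
    config = {name: os.environ[name] for name in required}
    config["DOWNLOAD_DIR"] = os.environ.get("DOWNLOAD_DIR", "/tmp/dsr-downloads")
    return config
```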

Dependencies

See requirements.txt for complete list:

  • ibm-cos-sdk - IBM Cloud Object Storage SDK
  • jsonschema - JSON schema validation
  • python-docx - DOCX file processing
  • openpyxl - XLSX file processing

Documentation

  • docs/DEPLOYMENT-GUIDE.md - ADK deployment guide
  • docs/MCP-SERVER-GUIDE.md - MCP server guide
  • docs/TESTING-GUIDE.md - Comprehensive testing
  • examples/EXAMPLE-CONVERSATIONS.md - Conversation examples

Comparison with Skill Flows

This Agentic Workflow implementation differs from Skill Flows:

| Feature          | Agentic Workflow          | Skill Flows              |
|------------------|---------------------------|--------------------------|
| Interaction      | Conversational            | Structured forms         |
| Flexibility      | High - natural language   | Low - predefined paths   |
| Human Review     | Built-in, conversational  | Requires explicit steps  |
| Error Handling   | Contextual suggestions    | Fixed error messages     |
| Batch Processing | Natural language commands | Requires loops/iteration |
| Learning Curve   | Lower (natural language)  | Higher (flow design)     |

Use Cases

Daily Operations

  • Process new DSR files as they arrive
  • Validate and archive processed data
  • Generate daily summaries

Batch Processing

  • Process historical data
  • Reprocess files after schema updates
  • Bulk validation of existing files

Quality Control

  • Review files before saving
  • Identify validation issues
  • Suggest corrections

Reporting

  • List processed files
  • Show processing statistics
  • Identify problematic files

Troubleshooting

Common Issues

COS Connection Errors:

  • Verify environment variables
  • Check API key permissions
  • Confirm endpoint URL

Validation Failures:

  • Review error messages
  • Check schema requirements
  • Verify data formats

Knowledge Base Access:

  • Verify schema file uploaded to Knowledge Base
  • Check file name: C4I_SOT_DSR_unified.schema.v1_2.iso.json.txt
  • Ensure Knowledge Base connected to agent
  • Verify DSR_SCHEMA_KB_NAME environment variable
  • Test with action: schema_info parameter

Tool Not Found:

  • Ensure tools are published
  • Check agent configuration
  • Refresh agent

See docs/DEPLOYMENT-GUIDE.md for detailed troubleshooting.

Support

For issues or questions:

  1. Check documentation in docs/ folder
  2. Review example conversations
  3. Consult IBM watsonx Orchestrate documentation
  4. Contact C4I SOT team

Version History

  • v1.2.0 (2026-03-11) - LLM-Powered Extraction

    • Added LLM-powered intelligent extraction for complex DOCX files
    • Multi-provider support: watsonx.ai, OpenAI, Groq, Anthropic
    • New llm_client.py module for configurable LLM providers
    • Enhanced extract_dsr_data.py with LLM extraction capability
    • Comprehensive LLM-EXTRACTION-GUIDE.md documentation
    • Updated requirements.txt with LLM client dependencies
    • Added LLM configuration to .env.example
    • Automatic schema-aware data mapping and normalization
    • Fallback to basic extraction when LLM unavailable
  • v1.1.0 (2026-03-10) - MCP Server Implementation

    • Added MCP (Model Context Protocol) server for Claude Desktop integration
    • New mcp_server.py for standalone MCP deployment
    • New requirements-mcp.txt for MCP dependencies
    • Comprehensive MCP-SERVER-GUIDE.md documentation
    • Updated README with dual deployment options (MCP + WxO)
    • Updated DEPLOYMENT-GUIDE.md to reference MCP option
  • v1.0.1 (2026-03-10) - Knowledge Base Integration

    • Updated to use WxO Knowledge Base for schema storage
    • Schema file in .json.txt format for Knowledge Base compatibility
    • Updated validate_schema tool to use get_knowledge() function
    • Added Knowledge Base upload instructions
    • Enhanced troubleshooting for Knowledge Base access
    • Deprecated DSR_SCHEMA_PATH in favor of DSR_SCHEMA_KB_NAME
  • v1.0.0 (2026-03-10) - Initial release

    • 5 Agentic Workflow tools
    • Complete agent specification
    • Comprehensive documentation
    • Example conversations

License

MIT License - See LICENSE file for details

Authors

C4I SOT Team

Acknowledgments

  • IBM watsonx Orchestrate team
  • ADK Agent framework
  • C4I SOT DSR schema contributors

For detailed deployment instructions, see DEPLOYMENT-GUIDE.md
