Hammerspace Storage Management MCP Server

Hammerspace Storage Management MCP Server

Enables natural language management of Hammerspace storage clusters with automated file ingestion, tagging, tier management, and vector embedding generation. Supports real-time file monitoring, multi-format document processing, and Kubernetes-based ingestion workflows with Milvus integration.

Category
Visit Server

README

MCP 1.5 - Advanced Hammerspace Storage Management with AI

A production-ready Model Context Protocol (MCP) server for Hammerspace storage management with natural language interface, automatic file ingestion, and comprehensive monitoring capabilities.

๐Ÿš€ Quick Start

Prerequisites

  • Python 3.8 or higher
  • Linux system (Ubuntu 20.04+ recommended)
  • Hammerspace storage cluster (HSTK CLI installed)
  • Anthropic API key (for natural language UI)
  • NVIDIA API key (optional, for AI features)
  • Kubernetes cluster (for automatic file ingestion)
  • Milvus vector database (for document embeddings)

Installation

  1. Clone the repository:

    git clone https://github.com/mbloomhammerspace/mcp-1.5.git
    cd mcp-1.5
    
  2. Create virtual environment:

    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    
  3. Configure environment variables:

    cp .env.example .env
    nano .env
    

    Set the following:

    # Hammerspace Configuration
    HS_ANVIL=10.200.120.90
    
    # Anthropic API (for Web UI)
    ANTHROPIC_API_KEY=your_anthropic_key_here
    
    # NVIDIA API (optional)
    NVIDIA_API_KEY=your_nvidia_key_here
    
  4. Start all services with one command:

    ./start_all_services.sh start
    

    Access Web UI at: http://localhost:5000

๐Ÿ“‹ Features

๐Ÿค– Natural Language Interface

  • Web-based UI powered by Claude AI at http://localhost:5000
  • Direct MCP integration for all Hammerspace operations
  • Real-time debug log viewer at /debug
  • File ingest monitor dashboard at /monitor
  • Action-oriented responses - executes commands, reports results

๐Ÿ“ Advanced File Ingestion System

  • Multi-format support - BMP, DOCX, HTML, JPEG, JSON, MD, PDF, PNG, PPTX, SH, TIFF, TXT, MP3
  • Real-time file monitoring - Polling-based detection on NFS 4.2 mounts
  • Folder-based processing - New folders trigger batch processing of all supported files
  • Kubernetes job deployment - Automatic file ingestion jobs with ConfigMaps
  • Milvus vector database integration - Automatic embedding generation and storage
  • Intelligent collection naming - Collections named after folders or sequential numbering
  • Tag-based tier management - Automatic tier0 promotion for embedding files
  • Event streaming - Real-time monitoring of ingestion pipeline

๐Ÿ“Š Comprehensive Monitoring UI

  • Real-time event streaming - Watch files being ingested live
  • Event filtering - Filter by event type, file pattern, or timestamp
  • Toast notifications - Get notified when new files arrive
  • Status dashboard - Monitor service health, file counts, CPU usage
  • Interactive UI - Beautiful, modern interface for tracking ingestion pipelines
  • Multi-service monitoring - Track all MCP servers and services

๐Ÿ”ง Persistent Service Management

  • Unified startup script - Start all services with one command
  • Screen session persistence - Services survive SSH disconnections
  • Auto-restart capability - Services automatically restart if they crash
  • Systemd integration - Optional auto-start on boot
  • Comprehensive logging - Detailed logs for all services
  • Health monitoring - Real-time service status and port monitoring

API Endpoints

  • /api/monitor/status - Get monitor service status
  • /api/monitor/events - Query ingestion events with filters
  • /api/monitor/events/stream - Server-Sent Events for real-time updates
  • /api/chat - Natural language chat interface
  • /api/tools - List available MCP tools
  • /api/logs/stream - Stream debug logs in real-time

Core MCP Tools

All tools use real HSTK CLI commands (no mock data):

  • tag_directory_recursive - Tag files and directories recursively
  • check_tagged_files_alignment - Check if tagged files are aligned to their objectives
  • apply_objective_to_path - Apply Hammerspace objectives (e.g., "Place-on-tier0", "placeonvolumes")
  • remove_objective_from_path - Remove objectives from paths
  • list_objectives_for_path - List all objectives applied to a path
  • ingest_new_files - Find new files by ctime/mtime, tag them, and place on Tier 1
  • refresh_mounts - Refresh NFS mounts to resolve stale file handles

Advanced Features

  • Automatic file monitoring - Polling service that auto-tags new files with MD5 hash and MIME type
  • Automatic stale file handle recovery - Detects and automatically refreshes mounts
  • Share-relative path support - Use /modelstore/dir instead of /mnt/se-lab/modelstore/dir
  • Multi-tier support - Tier 0 (high-performance) and Tier 1 (default storage)
  • Tag-based workflows - Tag files and manage them as collections
  • Intelligent batching - Groups file events (15 sec) or processes immediately if traffic is light

๐Ÿ†• Advanced File Ingestion Workflow

How It Works

  1. File Detection: The file monitor polls NFS mounts every 5 seconds for new files
  2. Tagging: New files are automatically tagged with:
    • user.ingestid=<md5hash> - MD5 hash for deduplication
    • user.mimeid=<mimetype> - MIME type for classification
    • user.embedding - Tag for files requiring embedding
  3. Multi-format Processing: All supported file types trigger Kubernetes ingestion jobs
  4. Collection Management: Files are organized into Milvus collections:
    • Folder-based: Collections named after folder (e.g., cold_0011)
    • Sequential: Collections named intel_1, intel_2, etc.
  5. Tier Management: Files tagged for embedding are automatically promoted to tier0
  6. Embedding Generation: Files are processed into vector embeddings
  7. Storage: Embeddings stored in Milvus for semantic search
  8. Tier Demotion: After embedding, files are moved back to default tier

Folder Processing

When a new folder is detected:

  • All supported files in the folder are processed as a batch
  • A single Milvus collection is created named after the folder
  • Collection names are sanitized (e.g., cold-0011 โ†’ cold_0011)
  • All files in the folder are uploaded to the same collection
  • Files are tagged with embedding and promoted to tier0 for processing
  • After embedding, tier0 objective is removed

Configuration

The file monitor supports:

  • Polling intervals: 5 seconds (fast) during business hours, 30 seconds during retroactive hours
  • Retroactive tagging: Configurable time windows (default: disabled for testing)
  • Path monitoring: Monitors /mnt/anvil/hub/ by default
  • File filtering: Processes all supported file types (BMP, DOCX, HTML, JPEG, JSON, MD, PDF, PNG, PPTX, SH, TIFF, TXT, MP3)
  • Recursive folder tagging: Tags entire folder hierarchies for 40x performance improvement

Kubernetes Integration

  • Job Templates: Uses k8s-templates/ingest.yaml for PDF processing
  • ConfigMaps: File lists passed to jobs via Kubernetes ConfigMaps
  • PVC Integration: Uses hammerspace-hub-pvc for persistent storage
  • Path Mapping: Container paths mapped from /mnt/anvil/hub/ to /data/

๐ŸŽฏ Usage

Example Commands (Web UI)

Try these in the natural language interface:

Tag all files in /modelstore/nvidia-test-thurs as modelsetid=my-demo
Promote all modelsetid=my-demo tagged files to tier0
Check alignment status of files tagged with modelsetid=my-demo
Remove the Place-on-tier0 objective from /modelstore/nvidia-test-thurs
List all objectives for /modelstore/incoming-models
Refresh Hammerspace mounts

๐Ÿ†• File Ingestion Commands

Start the file monitor daemon
Check file monitor status
View recent file ingestion events
Stop the file monitor

Direct MCP Server Usage

Add to your MCP client configuration (e.g., Cursor IDE):

{
  "mcpServers": {
    "hammerspace": {
      "command": "python",
      "args": ["/home/mike/mcp-1.5/src/aiq_hstk_mcp_server.py"],
      "env": {
        "PYTHONPATH": "/home/mike/mcp-1.5"
      }
    }
  }
}

Available MCP Tools

Tag Management

  • tag_directory_recursive - Recursively tag all files in a directory

    {
      "path": "/mnt/se-lab/modelstore/my-models",
      "tag_name": "user.modelsetid",
      "tag_value": "my-tag-value"
    }
    
  • check_tagged_files_alignment - Find and check alignment of tagged files

    {
      "tag_name": "user.modelsetid",
      "tag_value": "my-tag-value",
      "share_path": "/mnt/se-lab/modelstore/"
    }
    

Tier Management

  • apply_objective_to_path - Apply objectives like tier promotion

    {
      "objective_name": "Place-on-tier0",
      "path": "/mnt/se-lab/modelstore/my-models"
    }
    
  • remove_objective_from_path - Remove objectives

    {
      "objective_name": "Place-on-tier0",
      "path": "/mnt/se-lab/modelstore/my-models"
    }
    

File Ingestion

  • ingest_new_files - Find new files, tag them, and place on Tier 1
    {
      "path": "/mnt/se-lab/modelstore/",
      "tag_name": "user.modelsetid",
      "tag_value": "new-batch",
      "age_minutes": 60
    }
    

System Utilities

  • refresh_mounts - Refresh NFS mounts to resolve stale file handles
    {
      "mount_type": "all"
    }
    

File Monitoring & Ingestion

  • start_inotify_monitor - Start automated file monitoring service

    {}
    

    Monitors all Hammerspace shares, automatically tags new files with:

    • user.ingestid=<md5hash> - MD5 hash for deduplication
    • user.mimeid=<mimetype> - MIME type for classification

    Events are batched every 15 seconds or processed immediately if traffic is light.

  • get_file_monitor_status - Get monitor status

    {}
    

    Returns: running state, watched paths, pending events, files tagged, CPU usage

  • get_file_ingest_events - Query file ingest/tagging events

    {
      "limit": 100,
      "event_type": "NEW_FILE",
      "file_pattern": ".pdf",
      "since_timestamp": "2025-10-09T20:00:00"
    }
    

    Returns recent file ingestion events with full metadata including:

    • Event type (NEW_FILE or RETROACTIVE_TAG)
    • File path and name
    • MD5 hash (ingestid)
    • MIME type (mimeid)
    • File size and timestamp

    Use Cases:

    • Track which files have been ingested
    • Build automated workflows based on file events
    • Monitor ingestion pipelines
    • Audit file processing
  • stop_inotify_monitor - Stop monitoring service

    {}
    
  • list_objectives_for_path - List all objectives for a path

    {
      "path": "/mnt/se-lab/modelstore/my-models"
    }
    

๐Ÿ”ง Architecture

Components

  1. MCP Server (src/aiq_hstk_mcp_server.py)

    • Implements MCP protocol
    • Executes Hammerspace HSTK CLI commands
    • Handles automatic retry on stale file handles
    • Uses real HSTK operations (no mock data)
  2. Web UI (web_ui/app.py)

    • Flask-based natural language interface
    • Anthropic Claude integration for NL understanding
    • MCP client for tool execution
    • Real-time debug log streaming
  3. ๐Ÿ†• File Monitor (src/file_monitor.py)

    • Polling-based file detection for NFS 4.2 mounts
    • Automatic tagging with MD5 and MIME type
    • Folder-based batch processing
    • Kubernetes job deployment for PDF ingestion
    • Milvus collection management
  4. ๐Ÿ†• File Monitor Daemon (src/file_monitor_daemon.py)

    • Standalone daemon for running file monitor
    • Background service for continuous monitoring
    • Logging and error handling
  5. Mount Refresh Script (refresh_mounts.sh)

    • Unmounts and remounts Hammerspace NFS shares
    • Resolves stale file handle errors
    • Supports selective mount refresh

Data Flow

User Input (NL) โ†’ Claude AI โ†’ MCP Tool Selection โ†’ HSTK CLI โ†’ Hammerspace
                     โ†“
                 MCP Server
                     โ†“
              Real Operations

๐Ÿ†• File Ingestion Data Flow

New File/Folder โ†’ File Monitor โ†’ Tagging โ†’ Kubernetes Job โ†’ PDF Processing โ†’ Milvus Collection
                     โ†“
                 Event Logging โ†’ Web UI Dashboard โ†’ Real-time Monitoring

๐Ÿงช Testing

Web UI Tests

cd tests
./test_web_ui_curl.sh
python test_web_ui.py

Single Command Test

cd tests
./test_single_question.sh "Tag all files in /modelstore/test as modelsetid=test123"

๐Ÿ†• File Ingestion Tests

# Start file monitor daemon
python3 src/file_monitor_daemon.py

# Check monitor status
curl http://localhost:5000/api/monitor/status

# View ingestion events
curl http://localhost:5000/api/monitor/events

# Test with new files
cp test.pdf /mnt/anvil/hub/

๐Ÿ› ๏ธ Service Management

Unified Service Management

# Start all services with one command
./start_all_services.sh start

# Check service status
./start_all_services.sh status

# Stop all services
./start_all_services.sh stop

# Restart all services
./start_all_services.sh restart

Individual Service Management

# View logs for specific services
./start_all_services.sh logs web-ui
./start_all_services.sh logs file-monitor
./start_all_services.sh logs hammerspace-mcp
./start_all_services.sh logs milvus-mcp

# Attach to service screen sessions
./start_all_services.sh attach web-ui
./start_all_services.sh attach file-monitor

# Clean up old logs
./start_all_services.sh cleanup

Service URLs

Once started, services are available at:

  • Web UI: http://localhost:5000
  • Web UI (LAN): http://10.0.0.236:5000
  • Hammerspace MCP: stdio-based (no HTTP endpoint)
  • Milvus MCP: http://localhost:9902/sse (if Milvus is running)
  • Kubernetes MCP: http://localhost:9903/sse (not implemented)

View Logs

# Web UI logs
tail -f logs/web_ui.log

# MCP Server logs
tail -f logs/aiq_hstk_mcp.log

# File monitor logs
tail -f logs/retroactive_fully_disabled.log

# Real-time streaming (in browser)
# Visit http://localhost:5000/debug

๐Ÿ› Troubleshooting

Stale File Handle Errors

The MCP server automatically detects and retries operations when stale file handles occur. If issues persist:

# Manual mount refresh
./refresh_mounts.sh

# Or via Web UI
"Refresh Hammerspace mounts"

Tag Operations Not Working

  1. Ensure you're targeting a mounted directory
  2. Verify the HSTK CLI is accessible: /home/mike/hs-mcp-1.0/.venv/bin/hs --version
  3. Check mounts: mount | grep hammerspace

๐Ÿ†• File Ingestion Issues

  1. Files not detected: Check if file monitor is running

    ps aux | grep file_monitor_daemon
    
  2. Kubernetes jobs failing: Check pod status

    kubectl get pods -l app=pdf-ingest
    kubectl logs -l app=pdf-ingest
    
  3. Empty Milvus collections: Check ingest service logs

    kubectl logs -l app=ingestor-server
    
  4. NFS mount issues: Verify mount points

    mount | grep anvil
    ls -la /mnt/anvil/hub/
    

Common Tag Formats

Tags in Hammerspace use the format: namespace.key=value

Examples:

  • user.modelsetid=my-demo
  • user.project=gtc-2025
  • user.tier=critical

๐Ÿ“š Documentation

๐Ÿš€ Quick Start

๐Ÿ—๏ธ Architecture & API

๐Ÿ”— Integration Guides

๐Ÿ“– Feature Guides

๐Ÿงช Testing Guides

๐Ÿ”’ Production Deployment

Security Considerations

  • Run the Web UI behind a reverse proxy (nginx, Apache)
  • Use HTTPS for production deployments
  • Restrict access with authentication
  • Review and limit MCP tool permissions

Performance Tuning

  • Adjust max_files_to_check for large filesystems (default: 500)
  • Use specific share_path parameters to limit search scope
  • Monitor log files for performance issues

๐Ÿ“ž Support

For support and questions:

  • Create an issue in the GitHub repository
  • Check the documentation in the docs/ folder
  • Review the troubleshooting section above

๐ŸŽ‰ Recent Updates

๐Ÿ†• October 23, 2025 - Advanced Service Management + Multi-Format Support

  • โœ… NEW: Unified service management script (start_all_services.sh) for all MCP services
  • โœ… NEW: Screen session persistence - services survive SSH disconnections
  • โœ… NEW: Auto-restart capability for crashed services
  • โœ… NEW: Systemd integration for auto-start on boot
  • โœ… NEW: Multi-format file support (BMP, DOCX, HTML, JPEG, JSON, MD, PDF, PNG, PPTX, SH, TIFF, TXT, MP3)
  • โœ… NEW: Tag-based tier management with automatic tier0 promotion/demotion
  • โœ… NEW: Recursive folder tagging for 40x performance improvement
  • โœ… NEW: Enhanced event filtering and monitoring UI
  • โœ… NEW: Comprehensive health monitoring and port conflict resolution
  • โœ… FIXED: Unicode handling in logs and CLI output
  • โœ… FIXED: NFS timing issues with retry mechanisms
  • โœ… FIXED: Hammerspace CLI tag operations with fallback methods
  • โœ… FIXED: Individual file vs folder-level tagging optimization

๐Ÿ†• October 16, 2025 - Automatic File Ingestion + Kubernetes + MCP

  • โœ… NEW: Complete automatic file ingestion system with Kubernetes integration
  • โœ… NEW: Polling-based file monitoring for NFS 4.2 mounts (inotify not supported)
  • โœ… NEW: Folder-based batch processing - new folders trigger collection creation
  • โœ… NEW: Milvus vector database integration for PDF embeddings
  • โœ… NEW: Kubernetes job deployment with ConfigMaps for file processing
  • โœ… NEW: Intelligent collection naming (folder-based and sequential)
  • โœ… NEW: Real-time event streaming and monitoring dashboard
  • โœ… NEW: Standalone file monitor daemon for continuous operation
  • โœ… FIXED: NFS compatibility issues with polling-only approach
  • โœ… FIXED: Kubernetes PVC integration and path mapping
  • โœ… FIXED: Milvus collection naming conventions (underscores vs hyphens)
  • โœ… FIXED: File path handling in container environments
  • โœ… FIXED: Job completion tracking and error handling
  • โœ… FIXED: Retroactive tagging time window configuration
  • โœ… FIXED: Threading and async operation coordination

October 9, 2025 - File Ingestion Event System & Monitoring UI

  • โœ… NEW: Agentic event consumption via get_file_ingest_events MCP tool
  • โœ… NEW: Real-time monitoring dashboard at /monitor with live event streaming
  • โœ… NEW: API endpoints for event querying and monitoring
  • โœ… FIXED: Duplicate file processing - files now tagged exactly once
  • โœ… FIXED: In-memory tracking prevents repeated tagging of same files
  • โœ… Deduplication verification tools: check_duplicates.sh and find-dup.sh
  • โœ… Full event metadata: timestamp, file path, MD5 hash, MIME type, file size
  • โœ… Event filtering: by type, file pattern, or timestamp
  • โœ… Toast notifications for new file arrivals
  • โœ… Automated file monitoring service with MD5/MIME tagging

Previous Updates

  • โœ… Added automatic mount refresh on stale file handle errors
  • โœ… Suppressed Anthropic API deprecation warnings in logs
  • โœ… Added copy buttons to all chat messages
  • โœ… Fixed PARTIALLY ALIGNED status detection
  • โœ… Web-based natural language console with Claude AI

MCP 1.5 - Production-ready Hammerspace storage management with natural language interface and automatic file ingestion.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
E2B

E2B

Using MCP to run code via e2b.

Official
Featured