syncly-dataset-mcp

Enables querying JSON/JSONL social datasets in Claude Desktop via MCP tools, using DuckDB as the backend database.

A local PoC for querying JSON/JSONL social datasets from Claude Desktop through MCP tools.

Architecture: JSON/JSONL → DuckDB → Python MCP server → Claude Desktop

MCP tools (Phase 1, implemented)

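| Tool | Description |
| --- | --- |
| `list_data_queries` | Returns the registered datasets with their row counts and date ranges. Always call this tool first. |
| `describe_data_query` | Returns the schema, row count, date range, and sample rows. |
| `get_metric_summary` | Numeric aggregates (post_count, engagement_sum, etc.) plus distributions (sentiment_distribution, etc.). |
| `search_posts` | Combined keyword and filter search. Runs ILIKE against text, summary, and searchable_columns. |
| `get_posts_by_ids` | Fetches post details for a list of IDs (up to 50). |
| `get_ranked_posts` | Returns posts ranked by a metric such as engagement_count. |
| `search_voc` | VOC search: searches text and summary, filters by sentiment, and returns the top posts per sentiment. |
| `safe_query` | Runs SELECT-only SQL directly (with safeguards). |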

Supported metrics (`get_metric_summary`)

| Type | Names |
| --- | --- |
| Scalar | `post_count`, `engagement_sum`, `avg_engagement`, `like_sum`, `comment_sum`, `share_sum` |
| Distribution | `sentiment_distribution`, `platform_distribution`, `brand_distribution`, `category_distribution` |

Defaults (when metrics is not specified): post_count, engagement_sum, avg_engagement, sentiment_distribution, platform_distribution
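The scalar and distribution metrics can be illustrated with a small pure-Python sketch. It is not the server's implementation (which runs against DuckDB); the `metric_summary` helper and the assumption that `engagement_sum` maps to an `engagement_count` column are hypothetical and only mirror the names used in this README.

```python
from collections import Counter

# Default metric names as listed in this README.
DEFAULT_METRICS = [
    "post_count", "engagement_sum", "avg_engagement",
    "sentiment_distribution", "platform_distribution",
]

def metric_summary(rows, metrics=None):
    """Compute scalar and distribution metrics over a list of post dicts."""
    metrics = metrics or DEFAULT_METRICS
    out = {}
    for name in metrics:
        if name == "post_count":
            out[name] = len(rows)
        elif name.endswith("_sum"):
            # Assumed naming: engagement_sum aggregates the engagement_count column.
            column = name[: -len("_sum")] + "_count"
            out[name] = sum(r.get(column, 0) for r in rows)
        elif name == "avg_engagement":
            total = sum(r.get("engagement_count", 0) for r in rows)
            out[name] = total / len(rows) if rows else 0.0
        elif name.endswith("_distribution"):
            # sentiment_distribution counts values of the sentiment column, and so on.
            column = name[: -len("_distribution")]
            out[name] = dict(Counter(r.get(column) for r in rows))
        else:
            raise ValueError(f"unsupported metric: {name}")
    return out

rows = [
    {"platform": "x", "sentiment": "negative", "engagement_count": 10},
    {"platform": "x", "sentiment": "positive", "engagement_count": 30},
    {"platform": "blog", "sentiment": "negative", "engagement_count": 20},
]
summary = metric_summary(rows)
```

Scalar metrics reduce to a single number, while each distribution metric returns a mapping from a column value to its count.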

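Phase 2 planned tools (not implemented)

| Tool | Description |
| --- | --- |
| `get_period_change` | Metric changes and growth rates across periods (weekly/monthly comparison) |
| `get_top_entities` | Most frequently mentioned entities (brands, products, categories) |
| `compare_entity_sentiments` | Sentiment comparison between entities |
| `get_top_terms` | Extraction of top keywords, hashtags, and recurring expressions |
| `get_top_influencers` | Ranking of influencers and power users |
| `get_entity_voc_blocks` | Per-entity VOC block summaries |
| `get_related_entities` | Exploration of related entities |
| `search_posts_by_summary` | Semantic search over the summary column |
| `search_entities_by_semantic` | Semantic search over entities |
| `get_post_details` | Post details (metadata and features) |
| `get_post_features` | Post feature analysis (sentiment score, topics, etc.) |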

How the project is operated

Directory structure

syncly-dataset-mcp/
├── config/
│   └── datasets.yaml          # Dataset registration allowlist (managed here)
├── data/
│   ├── raw/                   # Where raw JSONL files are kept
│   │   └── sample_social_posts.jsonl
│   └── duckdb/
│       └── syncly_datasets.duckdb   # Generated automatically, excluded from git
├── docs/
│   └── data-prep-prompt.md   # Agent prompt for preprocessing new data
├── src/syncly_dataset_mcp/    # MCP server source
└── tests/

Dataset lifecycle

Raw data                 Preprocessing          Ingestion            Analysis
CSV / JSON array   →   JSONL conversion   →   DuckDB ingest   →   Query from Claude
Schema may differ        Agent-assisted         ingest CLI

Adding a new dataset

Case A: JSONL that already matches the schema

# 1. Place the file
cp my_data.jsonl data/raw/

# 2. Register it in datasets.yaml
#    (edit config/datasets.yaml)

# 3. Ingest
uv run syncly-dataset-ingest --dataset my_dataset

# 4. Verify in a new Claude Desktop conversation
#    list_data_queries → describe_data_query → get_metric_summary
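Before step 3 it can help to confirm that the file really matches the expected schema. The following is a minimal sketch of such a check, not part of the project's CLI; `check_jsonl` is a hypothetical helper, and the required column names are taken from the `id_column`, `text_column`, and `date_column` settings shown in this README.

```python
import json

def check_jsonl(lines, required_columns):
    """Return (row_count, problems) for an iterable of JSONL lines."""
    problems = []
    count = 0
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(row, dict):
            problems.append(f"line {lineno}: not an object")
            continue
        count += 1
        missing = [c for c in required_columns if c not in row]
        if missing:
            problems.append(f"line {lineno}: missing {missing}")
    return count, problems

sample = [
    '{"id": "post_001", "text": "late delivery", "created_at": "2024-03-02"}',
    '{"id": "post_002", "text": "great"}',
    "not json",
]
count, problems = check_jsonl(sample, ["id", "text", "created_at"])
```

For a real file, pass an open file handle instead of the sample list, for example `check_jsonl(open("data/raw/my_data.jsonl", encoding="utf-8"), [...])`.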

Case B: the raw schema differs, or the source is CSV or a JSON array

Use the agent prompt in docs/data-prep-prompt.md.

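1. Copy the full contents of docs/data-prep-prompt.md.
2. Paste it into a Claude conversation and add a sample of the raw data (100 to 200 rows).
3. Claude returns the following:
   - A preprocessing Python script
   - A datasets.yaml configuration block
   - A column mapping analysis
4. Run the script and save the resulting JSONL under data/raw/.
5. Update datasets.yaml, then ingest.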
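The preprocessing script Claude returns will depend on the raw data, but its general shape is usually a column rename plus type coercion. The sketch below assumes a CSV source; the source header names in `COLUMN_MAP` and the `csv_to_jsonl` helper are hypothetical and exist only for illustration.

```python
import csv
import io
import json

# Hypothetical mapping: source CSV header -> target column name.
COLUMN_MAP = {
    "post_id": "id",
    "body": "text",
    "posted_at": "created_at",
    "likes": "like_count",
}
INT_COLUMNS = {"like_count"}

def csv_to_jsonl(csv_file, jsonl_file, column_map=COLUMN_MAP):
    """Rename columns, coerce integer columns, and write one JSON object per line."""
    written = 0
    for record in csv.DictReader(csv_file):
        row = {}
        for src, dst in column_map.items():
            value = record.get(src)
            if dst in INT_COLUMNS:
                value = int(value) if value else 0  # empty cell becomes 0
            row[dst] = value
        # ensure_ascii=False keeps non-ASCII text readable in the output file.
        jsonl_file.write(json.dumps(row, ensure_ascii=False) + "\n")
        written += 1
    return written

source = io.StringIO(
    "post_id,body,posted_at,likes\n"
    "p1,late delivery,2024-03-02,12\n"
    "p2,love it,2024-03-05,\n"
)
target = io.StringIO()
written = csv_to_jsonl(source, target)
lines = target.getvalue().splitlines()
```

In practice `source` and `target` would be real files, with the output written under data/raw/.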

Replacing data (same dataset ID, new file)

# Ingestion always DROPs and recreates the table, so simply rerun it
uv run syncly-dataset-ingest --dataset my_dataset
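The drop-and-recreate behavior means a rerun never leaves stale or duplicated rows behind. A minimal sketch of the pattern follows; it uses the standard-library `sqlite3` module as a stand-in for DuckDB so that it runs without extra dependencies, and the `reload_table` helper and two-column schema are hypothetical.

```python
import sqlite3

def reload_table(con, table, rows):
    """Drop and recreate `table`, then insert rows, so reruns never duplicate data."""
    # `table` must come from a trusted allowlist: identifiers cannot be bound as parameters.
    con.execute(f'DROP TABLE IF EXISTS "{table}"')
    con.execute(f'CREATE TABLE "{table}" (id TEXT PRIMARY KEY, text TEXT)')
    con.executemany(f'INSERT INTO "{table}" (id, text) VALUES (?, ?)', rows)
    con.commit()

con = sqlite3.connect(":memory:")
reload_table(con, "my_dataset", [("p1", "old"), ("p2", "old")])
reload_table(con, "my_dataset", [("p3", "new")])  # second run replaces everything
remaining = con.execute('SELECT id, text FROM "my_dataset"').fetchall()
```

After the second call only the new file's rows remain, which is why replacing a dataset needs no cleanup step.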

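Hiding a dataset

Deleting a dataset's block from datasets.yaml makes it inaccessible to the MCP tools. The DuckDB table remains, but the tools block access to it. To delete it completely:

rm data/duckdb/syncly_datasets.duckdb
uv run syncly-dataset-ingest --dataset all  # re-ingest the remaining datasets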

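Picking up new data without restarting Claude Desktop

_registry is lazy-loaded, so the server does not need a restart; opening a new conversation is enough.

1. After ingestion completes, open a new conversation in Claude Desktop.
2. Call list_data_queries and confirm the new dataset appears.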

Installation

Prerequisites

Python 3.11+

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Install the project

git clone <this repository>
cd syncly-dataset-mcp
uv sync

Data preparation

If the raw data uses a different schema, see docs/data-prep-prompt.md.

1. Configure datasets.yaml

Register datasets in config/datasets.yaml.

datasets:
  social_posts:
    title: "Social post data"
    source_path: "data/raw/social_posts.jsonl"
    table: "social_posts"
    format: "jsonl"              # 'jsonl' or 'json'
    text_column: "text"          # main text column
    id_column: "id"
    date_column: "created_at"
    summary_column: "summary"    # summary column (optional, used by search_voc)
    searchable_columns:
      - text
      - summary
      - author_name
      - brand
      - product
      - platform
    dimensions:                  # columns allowed as filters
      - platform
      - brand
      - sentiment
      - category
    entity_columns:              # columns used for entity analysis
      - brand
      - product
      - category
    metrics:                     # numeric columns available for aggregation
      - engagement_count
      - like_count
      - comment_count
      - share_count
2. Ingest into DuckDB

uv run syncly-dataset-ingest --dataset social_posts

To ingest every dataset:

uv run syncly-dataset-ingest --dataset all

Connecting Claude Desktop

Configuration file location

| OS | Path |
| --- | --- |
| macOS | `~/Library/Application Support/Claude/claude_desktop_config.json` |
| Windows | `%APPDATA%\Claude\claude_desktop_config.json` |

Configuration contents

{
  "mcpServers": {
    "syncly-dataset": {
      "command": "uv",
      "args": [
        "--directory",
        "/ABSOLUTE/PATH/TO/syncly-dataset-mcp",
        "run",
        "syncly-dataset-mcp"
      ],
      "env": {
        "SYNCLY_DB_PATH": "/ABSOLUTE/PATH/TO/syncly-dataset-mcp/data/duckdb/syncly_datasets.duckdb",
        "SYNCLY_CONFIG_PATH": "/ABSOLUTE/PATH/TO/syncly-dataset-mcp/config/datasets.yaml"
      }
    }
  }
}
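Since all three paths derive from the project directory, the entry can be generated rather than typed by hand. This is a small convenience sketch, not something the project ships; `build_config` is a hypothetical helper and the example path is a placeholder.

```python
import json
from pathlib import Path

def build_config(project_dir):
    """Build the mcpServers entry from the project's absolute path."""
    project = Path(project_dir)
    return {
        "mcpServers": {
            "syncly-dataset": {
                "command": "uv",
                "args": ["--directory", str(project), "run", "syncly-dataset-mcp"],
                "env": {
                    "SYNCLY_DB_PATH": str(project / "data" / "duckdb" / "syncly_datasets.duckdb"),
                    "SYNCLY_CONFIG_PATH": str(project / "config" / "datasets.yaml"),
                },
            }
        }
    }

# Placeholder path; substitute the output of `pwd` in the project directory.
config = build_config("/Users/yourname/projects/syncly-dataset-mcp")
print(json.dumps(config, indent=2))
```

If claude_desktop_config.json already lists other servers, merge the `syncly-dataset` entry into the existing `mcpServers` object instead of overwriting the file.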

Check the current path:

pwd
# e.g. /Users/yourname/projects/syncly-dataset-mcp

Quit Claude Desktop completely, restart it, and check the connection in a new conversation.

Example test prompts

Exploring datasets

Show me the list of available data queries

→ calls list_data_queries

Show me the schema and sample data for the social_posts dataset

→ describe_data_query(query_id="social_posts")

Metric summary

Summarize the overall metrics for the social posts

→ get_metric_summary(query_id="social_posts") → post_count, engagement_sum, sentiment/platform distributions

Tell me the number of negative posts and the total engagement for BrandA

→ get_metric_summary(filters={"brand":"BrandA","sentiment":"negative"}, metrics=["post_count","engagement_sum"])

Search

Find posts about shipping

→ search_posts(text_query="shipping")

Show BrandB's negative posts since March 2024, ordered by highest engagement

→ get_ranked_posts(filters={"brand":"BrandB","sentiment":"negative","date_from":"2024-03-01"})

Show the top 5 customer-complaint VOC posts by engagement

→ search_voc(sentiment="negative", limit=5)

Find VOC about refunds

→ search_voc(query="refund")
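A keyword search of this kind typically expands into one ILIKE condition per searchable column. The sketch below shows how such a query could be assembled; `build_search` is a hypothetical helper, the default `limit` of 20 is an assumption, and the cap of 100 rows follows the search limit stated in this README.

```python
def build_search(table, searchable_columns, text_query, limit=20, max_limit=100):
    """Return (sql, params) for a case-insensitive substring search."""
    if not searchable_columns:
        raise ValueError("no searchable columns configured")
    # One ILIKE condition per configured column, joined with OR.
    conditions = " OR ".join(f'"{col}" ILIKE ?' for col in searchable_columns)
    capped = min(int(limit), max_limit)
    sql = f'SELECT * FROM "{table}" WHERE {conditions} LIMIT {capped}'
    # The same pattern is bound once per column; % and _ in the input still act as wildcards.
    params = [f"%{text_query}%"] * len(searchable_columns)
    return sql, params

sql, params = build_search("social_posts", ["text", "summary"], "shipping", limit=500)
```

Table and column names are taken from the configuration, while the search term itself is always passed as a bound parameter.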

ID lookup

Show me the details of post_012 and post_022

→ get_posts_by_ids(query_id="social_posts", post_ids=["post_012","post_022"])

Running SQL directly

SELECT brand, COUNT(*), AVG(engagement_count) FROM social_posts GROUP BY brand

→ safe_query(query_id="social_posts", sql="SELECT ...")

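Safeguards

| Safeguard | Behavior |
| --- | --- |
| SELECT only | Blocks DROP/DELETE/UPDATE/INSERT/CREATE/ALTER/INSTALL/LOAD and similar statements |
| Automatic LIMIT | Adds `LIMIT 500` automatically to queries without a LIMIT |
| Maximum returned rows | Cannot exceed 500 rows |
| ID lookup limit | `get_posts_by_ids` accepts at most 50 IDs |
| Ranking/search limit | At most 100 rows |
| Dataset allowlist | Only tables registered in `config/datasets.yaml` are accessible |
| Sensitive column masking | Columns such as `email`, `phone`, `token`, and `password` are automatically replaced with `***` |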
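The SELECT-only check, the row cap, and column masking can be sketched in a few lines. This is an illustration under stated assumptions, not the server's actual code: `guard_sql` and `mask_row` are hypothetical helpers, the keyword check is a crude token scan (it also rejects a SELECT whose string literal contains a blocked word), the row cap wraps the statement in an outer query instead of appending LIMIT, and masking by substring match on the column name is an assumption.

```python
import re

BLOCKED = {"DROP", "DELETE", "UPDATE", "INSERT", "CREATE", "ALTER", "INSTALL", "LOAD"}
SENSITIVE = ("email", "phone", "token", "password")
MAX_ROWS = 500

def guard_sql(sql):
    """Reject non-SELECT statements and cap the result at MAX_ROWS rows."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    words = re.findall(r"[A-Za-z_]+", statement.upper())
    if not words or words[0] != "SELECT":
        raise ValueError("only SELECT statements are allowed")
    blocked = BLOCKED.intersection(words)
    if blocked:
        raise ValueError(f"blocked keyword: {sorted(blocked)[0]}")
    # Wrapping caps the row count whether or not the query has its own LIMIT.
    return f"SELECT * FROM ({statement}) AS q LIMIT {MAX_ROWS}"

def mask_row(row):
    """Replace values of columns whose name contains a sensitive word with ***."""
    return {
        key: "***" if any(word in key.lower() for word in SENSITIVE) else value
        for key, value in row.items()
    }

guarded = guard_sql("SELECT brand FROM social_posts;")
masked = mask_row({"id": "p1", "author_email": "a@example.com", "text": "hi"})
```

A production guard would more likely rely on a real SQL parser, since keyword scanning can produce both false positives and gaps.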

Troubleshooting

When the MCP server does not appear in Claude Desktop

Try using the absolute path to uv:

which uv   # e.g. /Users/yourname/.local/bin/uv

{
  "command": "/Users/yourname/.local/bin/uv",
  "args": ["--directory", "/path/to/project", "run", "syncly-dataset-mcp"]
}

Run the server directly to check for errors:

cd /path/to/syncly-dataset-mcp
uv run syncly-dataset-mcp

Check the Claude Desktop logs:

tail -f ~/Library/Logs/Claude/mcp*.log

Error saying the DuckDB table does not exist

The data needs to be ingested:

uv run syncly-dataset-ingest --dataset social_posts

Error saying datasets.yaml cannot be found

Specify the path directly with an environment variable:

SYNCLY_CONFIG_PATH=/absolute/path/to/datasets.yaml uv run syncly-dataset-mcp
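For context, a server can resolve this setting by preferring the environment variable and failing early with a clear message. The sketch below is an assumption about how such a lookup might be written; `resolve_config_path` and the relative fallback path are hypothetical, and the real server's default location is not documented here.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical fallback, relative to the working directory.
DEFAULT_CONFIG = Path("config") / "datasets.yaml"

def resolve_config_path(environ=os.environ):
    """Prefer SYNCLY_CONFIG_PATH; otherwise fall back to the relative default."""
    raw = environ.get("SYNCLY_CONFIG_PATH")
    path = Path(raw).expanduser() if raw else DEFAULT_CONFIG
    if not path.is_file():
        raise FileNotFoundError(f"datasets.yaml not found at {path.resolve()}")
    return path

# Demonstration with a temporary file standing in for the real configuration.
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "datasets.yaml"
    config_file.write_text("datasets: {}\n", encoding="utf-8")
    found = resolve_config_path({"SYNCLY_CONFIG_PATH": str(config_file)})
```

A relative fallback only works when the process starts in the project directory, which is one reason an absolute path in the environment variable is the more reliable choice under Claude Desktop.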

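Timeout error (list_data_queries fails after 4 minutes)

1. Quit Claude Desktop completely and restart it.
2. Try again in a new conversation (an existing conversation session does not reuse the server process).
3. Check that the SYNCLY_DB_PATH and SYNCLY_CONFIG_PATH env settings are present in claude_desktop_config.json.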
