---
name: nutrient-document-processing
description: Process, convert, OCR, extract, redact, sign, and fill documents with the Nutrient DWS API. Supports PDF, DOCX, XLSX, PPTX, HTML, and images.
---

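# Nutrient Document Processing

Process documents with the [Nutrient DWS Processor API](https://www.nutrient.io/api/). Capabilities include format conversion, text and table extraction, OCR for scanned files, redaction of personally identifiable information (PII), watermarking, digital signing, and PDF form filling.
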
## Setup

Get a free API key at **[nutrient.io](https://dashboard.nutrient.io/sign_up/?product=processor)**.

```bash
export NUTRIENT_API_KEY="pdf_live_..."
```

Every request is a multipart POST to `https://api.nutrient.io/build` that carries an `instructions` field containing the processing instructions as JSON.

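Because every operation in this document uses the same request shape, the call can be wrapped once. The following is a minimal sketch, not an official client: `nutrient_build` is a hypothetical helper name, and reusing the file's base name as the multipart field name is an assumption that mirrors the curl examples in this document. `--form-string` and `--fail` are standard curl options.

```bash
# nutrient_build is a hypothetical helper, not part of the Nutrient API.
# Usage: nutrient_build <input-file> <instructions-json> <output-file>
nutrient_build() {
  if [ -z "${NUTRIENT_API_KEY:-}" ]; then
    echo "NUTRIENT_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -f "$1" ]; then
    echo "input file not found: $1" >&2
    return 1
  fi
  # Assumption: the form field name matches the name used in the
  # instructions, as in the curl examples in this document.
  # --form-string sends the JSON literally (no special handling of ';').
  # --fail turns HTTP error responses into a non-zero exit status.
  curl --fail --silent --show-error -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$(basename "$1")=@$1" \
    --form-string "instructions=$2" \
    -o "$3"
}
```

With this helper, `nutrient_build document.docx '{"parts":[{"file":"document.docx"}]}' output.pdf` would send the same request as the DOCX to PDF example, while failing early if the key or the input file is missing.
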
## Operations

### Document conversion

```bash
# DOCX to PDF
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.docx=@document.docx" \
  -F 'instructions={"parts":[{"file":"document.docx"}]}' \
  -o output.pdf

# PDF to DOCX
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
  -o output.docx

# HTML to PDF
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "index.html=@index.html" \
  -F 'instructions={"parts":[{"html":"index.html"}]}' \
  -o output.pdf
```

Supported input formats: PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS.

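The three conversion requests differ only in the part key (`html` for HTML input, `file` otherwise) and in whether an `output` type is given. A small sketch that generalizes this pattern follows. `conversion_instructions` is a hypothetical helper name, and the rule "only `.html` inputs use the `html` key" is an assumption drawn from the examples, not a statement about the full API.

```bash
# conversion_instructions is a hypothetical helper, not part of the API.
# Usage: conversion_instructions <uploaded-name> [output-type]
conversion_instructions() {
  case "$1" in
    *.html) part_key="html" ;;  # HTML input is referenced with "html"
    *)      part_key="file" ;;  # assumption: all other inputs use "file"
  esac
  if [ -n "${2:-}" ]; then
    printf '{"parts":[{"%s":"%s"}],"output":{"type":"%s"}}\n' \
      "$part_key" "$1" "$2"
  else
    # No output type given: the examples rely on the default (PDF).
    printf '{"parts":[{"%s":"%s"}]}\n' "$part_key" "$1"
  fi
}
```

The result can be passed straight to curl, for example `-F "instructions=$(conversion_instructions document.pdf docx)"`. The sketch does no JSON escaping, so it assumes file names without quotes or backslashes.
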
### Text and data extraction

```bash
# Extract plain text
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
  -o output.txt

# Extract tables to an Excel file
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
  -o tables.xlsx
```

### OCR for scanned documents

```bash
# Produce a searchable PDF via OCR (more than 100 languages supported)
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "scanned.pdf=@scanned.pdf" \
  -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
  -o searchable.pdf
```

Languages: more than 100 languages are supported, specified with ISO 639-2-style codes (for example `eng`, `deu`, `fra`, `spa`, `jpn`, `kor`, `chi_sim`, `chi_tra`, `ara`, `hin`, `rus`). Full language names such as `english` or `german` also work. See the [OCR language support table](https://www.nutrient.io/guides/document-engine/ocr/language-support/) for all supported codes.

### Redacting sensitive information

```bash
# Preset-based (for example Social Security numbers and email addresses)
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
  -o redacted.pdf

# Regex-based
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
  -o redacted.pdf
```

Presets: `social-security-number`, `email-address`, `credit-card-number`, `international-phone-number`, `north-american-phone-number`, `date`, `time`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`.

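Applying several presets means repeating one action object per preset, which is easy to get wrong by hand. The sketch below builds that `actions` array from a list of preset names and rejects names that are not in the list above. `redaction_instructions` is a hypothetical helper name; the accepted names are copied from this document, so the list may lag behind the API.

```bash
# redaction_instructions is a hypothetical helper, not part of the API.
# Usage: redaction_instructions <uploaded-name> <preset> [<preset> ...]
redaction_instructions() {
  doc_name="$1"
  shift
  actions=""
  for preset in "$@"; do
    # Accept only the preset names listed in this document.
    case "$preset" in
      social-security-number|email-address|credit-card-number|\
      international-phone-number|north-american-phone-number|\
      date|time|url|ipv4|ipv6|mac-address|us-zip-code|vin) ;;
      *) echo "unknown preset: $preset" >&2; return 1 ;;
    esac
    action=$(printf '{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"%s"}}' "$preset")
    # Join the action objects with commas.
    actions="${actions:+$actions,}$action"
  done
  printf '{"parts":[{"file":"%s"}],"actions":[%s]}\n' "$doc_name" "$actions"
}
```

For example, `-F "instructions=$(redaction_instructions document.pdf social-security-number email-address)"` reproduces the preset request shown above. Failing on an unknown name keeps a typo from reaching the API.
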
### Adding a watermark

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
  -o watermarked.pdf
```

### Digital signing

```bash
# Self-signed CMS signature
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
  -o signed.pdf
```

### Filling PDF forms

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "form.pdf=@form.pdf" \
  -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \
  -o filled.pdf
```

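Form values often come from variables rather than literals, and a stray quote in a value would break hand-written JSON. The following sketch builds the `formFields` object from `key=value` arguments. `fill_form_instructions` and `json_escape` are hypothetical helper names, and the escaping covers only backslashes and double quotes, so values containing newlines or other control characters are out of scope.

```bash
# Hypothetical helpers, not part of the Nutrient API.
# json_escape handles only backslashes and double quotes.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: fill_form_instructions <uploaded-name> <key=value> [<key=value> ...]
fill_form_instructions() {
  form_name="$1"
  shift
  fields=""
  for pair in "$@"; do
    case "$pair" in
      *=*) ;;
      *) echo "expected key=value, got: $pair" >&2; return 1 ;;
    esac
    key="${pair%%=*}"   # text before the first '='
    value="${pair#*=}"  # text after the first '='
    field=$(printf '"%s":"%s"' "$(json_escape "$key")" "$(json_escape "$value")")
    fields="${fields:+$fields,}$field"
  done
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"fillForm","formFields":{%s}}]}\n' \
    "$form_name" "$fields"
}
```

Calling `fill_form_instructions form.pdf "name=Jane Smith" "email=jane@example.com" "date=2026-02-06"` yields the same instructions as the request above. For anything beyond simple strings, a real JSON tool would be a safer choice than this sketch.
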
## MCP server (alternative)

For native tool integration, use the MCP server instead of curl commands:

```json
{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}
```

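## When to use

- Converting documents between formats (PDF, DOCX, XLSX, PPTX, HTML, images).
- Extracting text, tables, or key-value pairs from PDFs.
- Running OCR on scanned documents or images.
- Redacting PII before sharing documents.
- Adding watermarks to drafts or confidential documents.
- Digitally signing contracts or agreements.
- Filling PDF forms programmatically.
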
## Related links

- [API Playground](https://dashboard.nutrient.io/processor-api/playground/)
- [Full API documentation](https://www.nutrient.io/guides/dws-processor/)
- [MCP server on npm](https://www.npmjs.com/package/@nutrient-sdk/dws-mcp-server)