Commit bcd744b6 (unverified), authored Sep 28, 2023 by zxhlyh. Committed by GitHub, Sep 28, 2023.
fix: doc (#1256)
Parent: 5e511e01
Showing 3 changed files with 324 additions and 141 deletions:
- web/app/(commonLayout)/datasets/template/template.en.mdx (+161, -69)
- web/app/(commonLayout)/datasets/template/template.zh.mdx (+162, -71)
- web/app/layout.tsx (+1, -1)
web/app/(commonLayout)/datasets/template/template.en.mdx
@@ -71,7 +71,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Query
+ ### Query
  <Properties>
  <Property name='page' type='string' key='page'>
  Page number
@@ -136,7 +136,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  <Col>
  This api is based on an existing dataset and creates a new document through text based on this dataset.
- ### Path Params
+ ### Params
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -153,22 +153,22 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  </Property>
  <Property name='indexing_technique' type='string' key='indexing_technique'>
  Index mode
- - high_quality High quality: embedding using embedding model, built as vector database index
- - economy Economy: Build using inverted index of Keyword Table Index
+ - <code>high_quality</code> High quality: embedding using embedding model, built as vector database index
+ - <code>economy</code> Economy: Build using inverted index of Keyword Table Index
  </Property>
  <Property name='process_rule' type='object' key='process_rule'>
  Processing rules
- - mode (string) Cleaning, segmentation mode, automatic / custom
- - rules (text) Custom rules (in automatic mode, this field is empty)
- - pre_processing_rules (array[object]) Preprocessing rules
- - id (string) Unique identifier for the preprocessing rule
+ - <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
+ - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
+ - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
+ - <code>id</code> (string) Unique identifier for the preprocessing rule
  - enumerate
- - remove_extra_spaces Replace consecutive spaces, newlines, tabs
- - remove_urls_emails Delete URL, email address
- - enabled (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- - segmentation (object) segmentation rules
- - separator Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- - max_tokens Maximum length (token) defaults to 1000
+ - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
+ - <code>remove_urls_emails</code> Delete URL, email address
+ - <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
+ - <code>segmentation</code> (object) segmentation rules
+ - <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
+ - <code>max_tokens</code> Maximum length (token) defaults to 1000
  </Property>
  </Properties>
  </Col>
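For readers who want to see how the `indexing_technique` and `process_rule` parameters documented in this hunk fit together, here is a minimal Python sketch. The field names and allowed values are taken from the parameter list; the helper names `build_process_rule` and `build_document_params`, the nesting of `pre_processing_rules` and `segmentation` under `rules`, and the use of an empty dict for `rules` in automatic mode are assumptions made for illustration, not something this commit confirms.

```python
import json

# Values listed in the parameter descriptions of the documented API.
INDEXING_TECHNIQUES = ("high_quality", "economy")
PRE_PROCESSING_RULE_IDS = ("remove_extra_spaces", "remove_urls_emails")


def build_process_rule(mode="automatic", enabled_rules=(), separator="\n", max_tokens=1000):
    """Hypothetical helper: assemble a process_rule dict from the documented fields."""
    if mode == "automatic":
        # The docs say `rules` is empty in automatic mode; using {} is an assumption.
        return {"mode": "automatic", "rules": {}}
    if mode != "custom":
        raise ValueError(f"mode must be 'automatic' or 'custom', got {mode!r}")
    unknown = set(enabled_rules) - set(PRE_PROCESSING_RULE_IDS)
    if unknown:
        raise ValueError(f"unknown pre-processing rule ids: {sorted(unknown)}")
    return {
        "mode": "custom",
        "rules": {
            # One entry per documented rule id, switched on or off.
            "pre_processing_rules": [
                {"id": rule_id, "enabled": rule_id in enabled_rules}
                for rule_id in PRE_PROCESSING_RULE_IDS
            ],
            # Documented defaults: separator "\n", max_tokens 1000.
            "segmentation": {"separator": separator, "max_tokens": max_tokens},
        },
    }


def build_document_params(indexing_technique, process_rule):
    """Hypothetical helper: combine the two documented parameters into one dict."""
    if indexing_technique not in INDEXING_TECHNIQUES:
        raise ValueError(f"unknown indexing_technique: {indexing_technique!r}")
    return {"indexing_technique": indexing_technique, "process_rule": process_rule}


if __name__ == "__main__":
    rule = build_process_rule("custom", enabled_rules=["remove_extra_spaces"])
    print(json.dumps(build_document_params("high_quality", rule), indent=2))
```

Validating the enumerated values before sending a request keeps typos such as `high-quality` from reaching the server, where they would presumably surface only as an error response.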
@@ -238,7 +238,8 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  <Row>
  <Col>
  This api is based on an existing dataset and creates a new document through a file based on this dataset.
- ### Path Params
+ ### Params
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -259,22 +260,22 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  </Property>
  <Property name='indexing_technique' type='string' key='indexing_technique'>
  Index mode
- - high_quality High quality: embedding using embedding model, built as vector database index
- - economy Economy: Build using inverted index of Keyword Table Index
+ - <code>high_quality</code> High quality: embedding using embedding model, built as vector database index
+ - <code>economy</code> Economy: Build using inverted index of Keyword Table Index
  </Property>
  <Property name='process_rule' type='object' key='process_rule'>
  Processing rules
- - mode (string) Cleaning, segmentation mode, automatic / custom
- - rules (text) Custom rules (in automatic mode, this field is empty)
- - pre_processing_rules (array[object]) Preprocessing rules
- - id (string) Unique identifier for the preprocessing rule
+ - <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
+ - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
+ - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
+ - <code>id</code> (string) Unique identifier for the preprocessing rule
  - enumerate
- - remove_extra_spaces Replace consecutive spaces, newlines, tabs
- - remove_urls_emails Delete URL, email address
- - enabled (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- - segmentation (object) segmentation rules
- - separator Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- - max_tokens Maximum length (token) defaults to 1000
+ - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
+ - <code>remove_urls_emails</code> Delete URL, email address
+ - <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
+ - <code>segmentation</code> (object) segmentation rules
+ - <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
+ - <code>max_tokens</code> Maximum length (token) defaults to 1000
  </Property>
  </Properties>
  </Col>
@@ -338,7 +339,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  <Col>
  This api is based on an existing dataset and updates the document through text based on this dataset.
- ### Path Params
+ ### Params
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -358,17 +359,17 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  </Property>
  <Property name='process_rule' type='object' key='process_rule'>
  Processing rules
- - mode (string) Cleaning, segmentation mode, automatic / custom
- - rules (text) Custom rules (in automatic mode, this field is empty)
- - pre_processing_rules (array[object]) Preprocessing rules
- - id (string) Unique identifier for the preprocessing rule
+ - <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
+ - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
+ - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
+ - <code>id</code> (string) Unique identifier for the preprocessing rule
  - enumerate
- - remove_extra_spaces Replace consecutive spaces, newlines, tabs
- - remove_urls_emails Delete URL, email address
- - enabled (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- - segmentation (object) segmentation rules
- - separator Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- - max_tokens Maximum length (token) defaults to 1000
+ - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
+ - <code>remove_urls_emails</code> Delete URL, email address
+ - <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
+ - <code>segmentation</code> (object) segmentation rules
+ - <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
+ - <code>max_tokens</code> Maximum length (token) defaults to 1000
  </Property>
  </Properties>
  </Col>
@@ -435,7 +436,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  <Col>
  This api is based on an existing dataset, and updates documents through files based on this dataset
- ### Path Params
+ ### Params
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -455,17 +456,17 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  </Property>
  <Property name='process_rule' type='object' key='process_rule'>
  Processing rules
- - mode (string) Cleaning, segmentation mode, automatic / custom
- - rules (text) Custom rules (in automatic mode, this field is empty)
- - pre_processing_rules (array[object]) Preprocessing rules
- - id (string) Unique identifier for the preprocessing rule
+ - <code>mode</code> (string) Cleaning, segmentation mode, automatic / custom
+ - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
+ - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
+ - <code>id</code> (string) Unique identifier for the preprocessing rule
  - enumerate
- - remove_extra_spaces Replace consecutive spaces, newlines, tabs
- - remove_urls_emails Delete URL, email address
- - enabled (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
- - segmentation (object) segmentation rules
- - separator Custom segment identifier, currently only allows one delimiter to be set. Default is \n
- - max_tokens Maximum length (token) defaults to 1000
+ - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, tabs
+ - <code>remove_urls_emails</code> Delete URL, email address
+ - <code>enabled</code> (bool) Whether to select this rule or not. If no document ID is passed in, it represents the default value.
+ - <code>segmentation</code> (object) segmentation rules
+ - <code>separator</code> Custom segment identifier, currently only allows one delimiter to be set. Default is \n
+ - <code>max_tokens</code> Maximum length (token) defaults to 1000
  </Property>
  </Properties>
  </Col>
@@ -527,7 +528,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Params
+ ### Params
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -582,7 +583,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Params
+ ### Params
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -624,14 +625,14 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Params
+ ### Params
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
  </Property>
  </Properties>
- ### Path Query
+ ### Query
  <Properties>
  <Property name='keyword' type='string' key='keyword'>
  Search keywords, currently only search document names(optional)
@@ -699,7 +700,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Params
+ ### Params
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -712,10 +713,9 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  ### Request Body
  <Properties>
  <Property name='segments' type='object list' key='segments'>
- segments (object list) Segmented content
- - content (text) Text content/question content, required
- - answer(text) Answer content, if the mode of the data set is qa mode, pass the value(optional)
- - keywords(list) Keywords(optional)
+ - <code>content</code> (text) Text content/question content, required
+ - <code>answer</code> (text) Answer content, if the mode of the data set is qa mode, pass the value(optional)
+ - <code>keywords</code> (list) Keywords(optional)
  </Property>
  </Properties>
  </Col>
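The `segments` request body documented in this hunk can be sketched in Python as follows. The three field names (`content`, `answer`, `keywords`) and their required/optional status come from the property list; the helper name `build_segments_body`, the `qa_mode` flag, and the choice to drop `answer` outside QA mode are assumptions made for illustration only.

```python
def build_segments_body(items, qa_mode=False):
    """Hypothetical helper: build a {"segments": [...]} request body.

    `content` is required; `answer` is only passed when the dataset is in
    QA mode; `keywords` is optional.
    """
    segments = []
    for index, item in enumerate(items):
        content = item.get("content")
        if not content:
            raise ValueError(f"segment {index} is missing required 'content'")
        segment = {"content": content}
        # Assumption: outside QA mode the answer is simply omitted.
        if qa_mode and item.get("answer") is not None:
            segment["answer"] = item["answer"]
        if item.get("keywords"):
            segment["keywords"] = list(item["keywords"])
        segments.append(segment)
    return {"segments": segments}


if __name__ == "__main__":
    body = build_segments_body(
        [{"content": "Q1", "answer": "A1", "keywords": ["k1", "k2"]}],
        qa_mode=True,
    )
    print(body)
```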
@@ -778,14 +778,106 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  ---
- Error message
- - **document_indexing**: Document indexing failed
- - **provider_not_initialize**: Embedding model is not configured
- - **not_found**, Document does not exist
- - **dataset_name_duplicate**: Duplicate dataset name
- - **provider_quota_exceeded**: Model quota exceeds limit
- - **dataset_not_initialized**: The dataset has not been initialized yet
- - **unsupported_file_type**: Unsupported file types.
- - Currently only supports, txt, markdown, md, pdf, html, htm, xlsx, docx, csv
- - **too_many_files**: There are too many files. Currently, only a single file is uploaded
- - **file_too_large*: The file is too large, support below 15M based on you environment configuration
+ <Row>
+ <Col>
+ ### Error message
+ <Properties>
+ <Property name='code' type='string' key='code'>
+ Error code
+ </Property>
+ </Properties>
+ <Properties>
+ <Property name='status' type='number' key='status'>
+ Error status
+ </Property>
+ </Properties>
+ <Properties>
+ <Property name='message' type='string' key='message'>
+ Error message
+ </Property>
+ </Properties>
+ </Col>
+ <Col>
+ <CodeGroup title="Example">
+ ```json {{ title: 'Response' }}
+ {
+   "code": "no_file_uploaded",
+   "message": "Please upload your file.",
+   "status": 400
+ }
+ ```
+ </CodeGroup>
+ </Col>
+ </Row>
+ <table className="max-w-auto border-collapse border border-slate-400" style={{ maxWidth: 'none', width: 'auto' }}>
+ <thead style={{ background: '#f9fafc' }}>
+ <tr>
+ <th class="p-2 border border-slate-300">code</th>
+ <th class="p-2 border border-slate-300">status</th>
+ <th class="p-2 border border-slate-300">message</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td class="p-2 border border-slate-300">no_file_uploaded</td>
+ <td class="p-2 border border-slate-300">400</td>
+ <td class="p-2 border border-slate-300">Please upload your file.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">too_many_files</td>
+ <td class="p-2 border border-slate-300">400</td>
+ <td class="p-2 border border-slate-300">Only one file is allowed.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">file_too_large</td>
+ <td class="p-2 border border-slate-300">413</td>
+ <td class="p-2 border border-slate-300">File size exceeded.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">unsupported_file_type</td>
+ <td class="p-2 border border-slate-300">415</td>
+ <td class="p-2 border border-slate-300">File type not allowed.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">high_quality_dataset_only</td>
+ <td class="p-2 border border-slate-300">400</td>
+ <td class="p-2 border border-slate-300">Current operation only supports 'high-quality' datasets.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">dataset_not_initialized</td>
+ <td class="p-2 border border-slate-300">400</td>
+ <td class="p-2 border border-slate-300">The dataset is still being initialized or indexing. Please wait a moment.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">archived_document_immutable</td>
+ <td class="p-2 border border-slate-300">403</td>
+ <td class="p-2 border border-slate-300">The archived document is not editable.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">dataset_name_duplicate</td>
+ <td class="p-2 border border-slate-300">409</td>
+ <td class="p-2 border border-slate-300">The dataset name already exists. Please modify your dataset name.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">invalid_action</td>
+ <td class="p-2 border border-slate-300">400</td>
+ <td class="p-2 border border-slate-300">Invalid action.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">document_already_finished</td>
+ <td class="p-2 border border-slate-300">400</td>
+ <td class="p-2 border border-slate-300">The document has been processed. Please refresh the page or go to the document details.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">document_indexing</td>
+ <td class="p-2 border border-slate-300">400</td>
+ <td class="p-2 border border-slate-300">The document is being processed and cannot be edited.</td>
+ </tr>
+ <tr>
+ <td class="p-2 border border-slate-300">invalid_metadata</td>
+ <td class="p-2 border border-slate-300">400</td>
+ <td class="p-2 border border-slate-300">The metadata content is incorrect. Please check and verify.</td>
+ </tr>
+ </tbody>
+ </table>
+ <div class="pb-4" />
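A client consuming the error format added in this hunk might handle it roughly as sketched below. The `code`, `status`, and `message` fields and the code-to-status table are copied from the new documentation; the names `DatasetApiError`, `parse_error`, and `is_worth_retrying`, and the choice of which codes to treat as transient, are assumptions for illustration and are not part of the documented API.

```python
import json

# code -> HTTP status, copied from the error table in the new documentation.
ERROR_STATUS = {
    "no_file_uploaded": 400,
    "too_many_files": 400,
    "file_too_large": 413,
    "unsupported_file_type": 415,
    "high_quality_dataset_only": 400,
    "dataset_not_initialized": 400,
    "archived_document_immutable": 403,
    "dataset_name_duplicate": 409,
    "invalid_action": 400,
    "document_already_finished": 400,
    "document_indexing": 400,
    "invalid_metadata": 400,
}


class DatasetApiError(Exception):
    """Hypothetical exception type carrying the three documented error fields."""

    def __init__(self, code, status, message):
        super().__init__(f"{code} ({status}): {message}")
        self.code = code
        self.status = status
        self.message = message


def parse_error(body):
    """Turn a JSON error body into a DatasetApiError.

    Falls back to the documented table when `status` is absent.
    """
    payload = json.loads(body)
    code = payload.get("code", "unknown")
    status = payload.get("status", ERROR_STATUS.get(code))
    return DatasetApiError(code, status, payload.get("message", ""))


def is_worth_retrying(error):
    # Assumption: only the "still initializing / still indexing" codes are transient.
    return error.code in {"dataset_not_initialized", "document_indexing"}


if __name__ == "__main__":
    example = '{"code": "no_file_uploaded", "message": "Please upload your file.", "status": 400}'
    err = parse_error(example)
    print(err, is_worth_retrying(err))
```

Branching on `code` rather than on `status` seems the safer design here, since most entries in the table share status 400.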
web/app/(commonLayout)/datasets/template/template.zh.mdx
@@ -71,7 +71,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Query
+ ### Query
  <Properties>
  <Property name='page' type='string' key='page'>
  Page number
@@ -136,7 +136,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  <Col>
  This API is based on an existing dataset and creates a new document from text in that dataset
- ### Path Params
+ ### Path
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -153,22 +153,22 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  </Property>
  <Property name='indexing_technique' type='string' key='indexing_technique'>
  Index mode
- - high_quality High quality: embed using an embedding model and build a vector database index
- - economy Economy: build using the inverted index of Keyword Table Index
+ - <code>high_quality</code> High quality: embed using an embedding model and build a vector database index
+ - <code>economy</code> Economy: build using the inverted index of Keyword Table Index
  </Property>
  <Property name='process_rule' type='object' key='process_rule'>
  Processing rules
- - mode (string) Cleaning and segmentation mode: automatic / custom
- - rules (text) Custom rules (in automatic mode, this field is empty)
- - pre_processing_rules (array[object]) Preprocessing rules
- - id (string) Unique identifier of the preprocessing rule
+ - <code>mode</code> (string) Cleaning and segmentation mode: automatic / custom
+ - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
+ - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
+ - <code>id</code> (string) Unique identifier of the preprocessing rule
  - Enum:
- - remove_extra_spaces Replace consecutive spaces, newlines, and tabs
- - remove_urls_emails Delete URLs and email addresses
- - enabled (bool) Whether this rule is selected; when no document ID is passed in, it represents the default value
- - segmentation (object) Segmentation rules
- - separator Custom segment identifier; currently only one delimiter may be set. Default is \n
- - max_tokens Maximum length (tokens); default is 1000
+ - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, and tabs
+ - <code>remove_urls_emails</code> Delete URLs and email addresses
+ - <code>enabled</code> (bool) Whether this rule is selected; when no document ID is passed in, it represents the default value
+ - <code>segmentation</code> (object) Segmentation rules
+ - <code>separator</code> Custom segment identifier; currently only one delimiter may be set. Default is \n
+ - <code>max_tokens</code> Maximum length (tokens); default is 1000
  </Property>
  </Properties>
  </Col>
@@ -239,7 +239,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  <Col>
  This API is based on an existing dataset and creates a new document from a file in that dataset
- ### Path Params
+ ### Path
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -252,30 +252,30 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  Source document ID (optional)
  - Used to re-upload a document or to modify its cleaning and segmentation configuration; missing information is copied from the source document
  - The source document cannot be an archived document
- - When original_document_id is passed in, the document is updated; process_rule is optional, and if omitted the source document's segmentation method is used by default
- - When original_document_id is not passed in, a new document is created; process_rule is required
+ - When <code>original_document_id</code> is passed in, the document is updated; <code>process_rule</code> is optional, and if omitted the source document's segmentation method is used by default
+ - When <code>original_document_id</code> is not passed in, a new document is created; <code>process_rule</code> is required
  </Property>
  <Property name='file' type='multipart/form-data' key='file'>
  The file to upload.
  </Property>
  <Property name='indexing_technique' type='string' key='indexing_technique'>
  Index mode
- - high_quality High quality: embed using an embedding model and build a vector database index
- - economy Economy: build using the inverted index of Keyword Table Index
+ - <code>high_quality</code> High quality: embed using an embedding model and build a vector database index
+ - <code>economy</code> Economy: build using the inverted index of Keyword Table Index
  </Property>
  <Property name='process_rule' type='object' key='process_rule'>
  Processing rules
- - mode (string) Cleaning and segmentation mode: automatic / custom.
- - rules (text) Custom rules (in automatic mode, this field is empty)
- - pre_processing_rules (array[object]) Preprocessing rules
- - id (string) Unique identifier of the preprocessing rule
+ - <code>mode</code> (string) Cleaning and segmentation mode: automatic / custom
+ - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
+ - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
+ - <code>id</code> (string) Unique identifier of the preprocessing rule
  - Enum:
- - remove_extra_spaces Replace consecutive spaces, newlines, and tabs
- - remove_urls_emails Delete URLs and email addresses
- - enabled (bool) Whether this rule is selected; when no document ID is passed in, it represents the default value.
- - segmentation (object) Segmentation rules
- - separator Custom segment identifier; currently only one delimiter may be set, default is \n
- - max_tokens Maximum length (tokens); default is 1000
+ - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, and tabs
+ - <code>remove_urls_emails</code> Delete URLs and email addresses
+ - <code>enabled</code> (bool) Whether this rule is selected; when no document ID is passed in, it represents the default value
+ - <code>segmentation</code> (object) Segmentation rules
+ - <code>separator</code> Custom segment identifier; currently only one delimiter may be set. Default is \n
+ - <code>max_tokens</code> Maximum length (tokens); default is 1000
  </Property>
  </Properties>
  </Col>
@@ -339,7 +339,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  <Col>
  This API is based on an existing dataset and updates a document from text in that dataset
- ### Path Params
+ ### Path
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -359,17 +359,17 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  </Property>
  <Property name='process_rule' type='object' key='process_rule'>
  Processing rules (optional)
- - mode (string) Cleaning and segmentation mode: automatic / custom.
- - rules (text) Custom rules (in automatic mode, this field is empty)
- - pre_processing_rules (array[object]) Preprocessing rules
- - id (string) Unique identifier of the preprocessing rule
+ - <code>mode</code> (string) Cleaning and segmentation mode: automatic / custom
+ - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
+ - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
+ - <code>id</code> (string) Unique identifier of the preprocessing rule
  - Enum:
- - remove_extra_spaces Replace consecutive spaces, newlines, and tabs
- - remove_urls_emails Delete URLs and email addresses
- - enabled (bool) Whether this rule is selected; when no document ID is passed in, it represents the default value.
- - segmentation (object) Segmentation rules
- - separator Custom segment identifier; currently only one delimiter may be set. Default is \n
- - max_tokens Maximum length (tokens); default is 1000
+ - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, and tabs
+ - <code>remove_urls_emails</code> Delete URLs and email addresses
+ - <code>enabled</code> (bool) Whether this rule is selected; when no document ID is passed in, it represents the default value
+ - <code>segmentation</code> (object) Segmentation rules
+ - <code>separator</code> Custom segment identifier; currently only one delimiter may be set. Default is \n
+ - <code>max_tokens</code> Maximum length (tokens); default is 1000
  </Property>
  </Properties>
  </Col>
@@ -436,7 +436,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  <Col>
  This API is based on an existing dataset and updates a document from a file in that dataset.
- ### Path Params
+ ### Path
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -456,17 +456,17 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  </Property>
  <Property name='process_rule' type='object' key='process_rule'>
  Processing rules (optional)
- - mode (string) Cleaning and segmentation mode: automatic / custom.
- - rules (text) Custom rules (in automatic mode, this field is empty)
- - pre_processing_rules (array[object]) Preprocessing rules
- - id (string) Unique identifier of the preprocessing rule
+ - <code>mode</code> (string) Cleaning and segmentation mode: automatic / custom
+ - <code>rules</code> (object) Custom rules (in automatic mode, this field is empty)
+ - <code>pre_processing_rules</code> (array[object]) Preprocessing rules
+ - <code>id</code> (string) Unique identifier of the preprocessing rule
  - Enum:
- - remove_extra_spaces Replace consecutive spaces, newlines, and tabs
- - remove_urls_emails Delete URLs and email addresses
- - enabled (bool) Whether this rule is selected; when no document ID is passed in, it represents the default value
- - segmentation (object) Segmentation rules
- - separator Custom segment identifier; currently only one delimiter may be set, default is \n
- - max_tokens Maximum length (tokens); default is 1000
+ - <code>remove_extra_spaces</code> Replace consecutive spaces, newlines, and tabs
+ - <code>remove_urls_emails</code> Delete URLs and email addresses
+ - <code>enabled</code> (bool) Whether this rule is selected; when no document ID is passed in, it represents the default value
+ - <code>segmentation</code> (object) Segmentation rules
+ - <code>separator</code> Custom segment identifier; currently only one delimiter may be set. Default is \n
+ - <code>max_tokens</code> Maximum length (tokens); default is 1000
  </Property>
  </Properties>
  </Col>
@@ -528,7 +528,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Params
+ ### Path
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -583,7 +583,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Params
+ ### Path
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -625,14 +625,14 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Params
+ ### Path
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
  </Property>
  </Properties>
- ### Path Query
+ ### Query
  <Properties>
  <Property name='keyword' type='string' key='keyword'>
  Search keyword (optional); currently only document names are searched
@@ -700,7 +700,7 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
  />
  <Row>
  <Col>
- ### Path Params
+ ### Path
  <Properties>
  <Property name='dataset_id' type='string' key='dataset_id'>
  Dataset ID
@@ -713,10 +713,9 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
### Request Body
<Properties>
<Property name='segments' type='object list' key='segments'>
-segments (object list) Segment content
-- content (text) Text content / question content, required
-- answer(text) Answer content, optional; pass a value if the dataset is in QA mode
-- keywords(list) Keywords, optional
+- <code>content</code> (text) Text content / question content, required
+- <code>answer</code> (text) Answer content, optional; pass a value if the dataset is in QA mode
+- <code>keywords</code> (list) Keywords, optional
</Property>
</Properties>
</Col>
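A minimal TypeScript sketch of how a client might assemble the `segments` request body described in this hunk. Only the field names (`segments`, `content`, `answer`, `keywords`) come from the documentation; the type names, the `buildSegmentsBody` helper, and the choice to drop `answer` outside QA mode are assumptions made for illustration.

```typescript
// Hypothetical types: only the field names are taken from the documented
// request body; everything else here is an illustrative assumption.
interface SegmentInput {
  content: string; // required: text content, or the question in QA mode
  answer?: string; // optional: documented as relevant for QA-mode datasets
  keywords?: string[]; // optional
}

interface SegmentsRequestBody {
  segments: SegmentInput[];
}

// Builds the request body and rejects segments whose content is blank.
function buildSegmentsBody(inputs: SegmentInput[], qaMode: boolean): SegmentsRequestBody {
  const segments = inputs.map((input, index) => {
    if (input.content.trim() === "") {
      throw new Error(`segments[${index}].content is required`);
    }
    const segment: SegmentInput = { content: input.content };
    // Assumption: the answer is only sent when the dataset is in QA mode.
    if (qaMode && input.answer !== undefined) {
      segment.answer = input.answer;
    }
    // Empty keyword lists are omitted rather than sent as [].
    if (input.keywords !== undefined && input.keywords.length > 0) {
      segment.keywords = [...input.keywords];
    }
    return segment;
  });
  return { segments };
}

const body = buildSegmentsBody(
  [{ content: "Sample question", answer: "Sample answer", keywords: ["sample"] }],
  true,
);
console.log(JSON.stringify(body));
```

Validating `content` on the client is a convenience; the server remains the authority on what it accepts.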
@@ -779,14 +778,106 @@ import { Row, Col, Properties, Property, Heading, SubProperty, Paragraph } from
---
-Error messages
-- **document_indexing**: Document indexing failed
-- **provider_not_initialize**: Embedding model is not configured
-- **not_found**, document does not exist
-- **dataset_name_duplicate**: Duplicate dataset name
-- **provider_quota_exceeded**: Model quota exceeded
-- **dataset_not_initialized**: Dataset has not been initialized yet
-- **unsupported_file_type**: Unsupported file type
-- Currently supported: txt, markdown, md, pdf, html, htm, xlsx, docx, csv
-- **too_many_files**: Too many files; only single-file upload is supported for now
-- **file_too_large*: File too large; files under 15M are supported by default, see the environment variable configuration for the actual limit
+<Row>
+<Col>
+### Error messages
+<Properties>
+<Property name='code' type='string' key='code'>
+Returned error code
+</Property>
+</Properties>
+<Properties>
+<Property name='status' type='number' key='status'>
+Returned error status
+</Property>
+</Properties>
+<Properties>
+<Property name='message' type='string' key='message'>
+Returned error message
+</Property>
+</Properties>
+</Col>
+<Col>
+<CodeGroup title="Example">
+```json {{ title: 'Response' }}
+{
+  "code": "no_file_uploaded",
+  "message": "Please upload your file.",
+  "status": 400
+}
+```
+</CodeGroup>
+</Col>
+</Row>
+<table className="max-w-auto border-collapse border border-slate-400" style={{ maxWidth: 'none', width: 'auto' }}>
+<thead style={{ background: '#f9fafc' }}>
+<tr>
+<th class="p-2 border border-slate-300">code</th>
+<th class="p-2 border border-slate-300">status</th>
+<th class="p-2 border border-slate-300">message</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td class="p-2 border border-slate-300">no_file_uploaded</td>
+<td class="p-2 border border-slate-300">400</td>
+<td class="p-2 border border-slate-300">Please upload your file.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">too_many_files</td>
+<td class="p-2 border border-slate-300">400</td>
+<td class="p-2 border border-slate-300">Only one file is allowed.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">file_too_large</td>
+<td class="p-2 border border-slate-300">413</td>
+<td class="p-2 border border-slate-300">File size exceeded.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">unsupported_file_type</td>
+<td class="p-2 border border-slate-300">415</td>
+<td class="p-2 border border-slate-300">File type not allowed.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">high_quality_dataset_only</td>
+<td class="p-2 border border-slate-300">400</td>
+<td class="p-2 border border-slate-300">Current operation only supports 'high-quality' datasets.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">dataset_not_initialized</td>
+<td class="p-2 border border-slate-300">400</td>
+<td class="p-2 border border-slate-300">The dataset is still being initialized or indexing. Please wait a moment.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">archived_document_immutable</td>
+<td class="p-2 border border-slate-300">403</td>
+<td class="p-2 border border-slate-300">The archived document is not editable.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">dataset_name_duplicate</td>
+<td class="p-2 border border-slate-300">409</td>
+<td class="p-2 border border-slate-300">The dataset name already exists. Please modify your dataset name.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">invalid_action</td>
+<td class="p-2 border border-slate-300">400</td>
+<td class="p-2 border border-slate-300">Invalid action.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">document_already_finished</td>
+<td class="p-2 border border-slate-300">400</td>
+<td class="p-2 border border-slate-300">The document has been processed. Please refresh the page or go to the document details.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">document_indexing</td>
+<td class="p-2 border border-slate-300">400</td>
+<td class="p-2 border border-slate-300">The document is being processed and cannot be edited.</td>
+</tr>
+<tr>
+<td class="p-2 border border-slate-300">invalid_metadata</td>
+<td class="p-2 border border-slate-300">400</td>
+<td class="p-2 border border-slate-300">The metadata content is incorrect. Please check and verify.</td>
+</tr>
+</tbody>
+</table>
+<div class="pb-4" />
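The error body documented in this hunk carries three fields (`code`, `status`, `message`). The following TypeScript sketch shows one way a client could validate and summarize such a payload. The names `ApiError`, `isApiError`, `describeError`, and `RETRYABLE_CODES` are hypothetical, and treating `dataset_not_initialized` and `document_indexing` as retryable is an assumption inferred from their messages, not documented behavior.

```typescript
// Shape of the documented error body: code, status, message.
interface ApiError {
  code: string;
  status: number;
  message: string;
}

// Runtime check, because a parsed JSON payload is untyped.
function isApiError(value: unknown): value is ApiError {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["code"] === "string" &&
    typeof record["status"] === "number" &&
    typeof record["message"] === "string"
  );
}

// Assumption: these codes describe a transient state worth retrying later.
const RETRYABLE_CODES = new Set<string>(["dataset_not_initialized", "document_indexing"]);

// Produces a one-line summary suitable for logs or a toast message.
function describeError(payload: unknown): string {
  if (!isApiError(payload)) return "Unexpected response";
  const hint = RETRYABLE_CODES.has(payload.code) ? " (retry later)" : "";
  return `${payload.status} ${payload.code}: ${payload.message}${hint}`;
}

const sample: unknown = JSON.parse(
  '{"code":"no_file_uploaded","message":"Please upload your file.","status":400}',
);
console.log(describeError(sample)); // prints "400 no_file_uploaded: Please upload your file."
```

Branching on `code` rather than on `status` keeps the handling specific, since several distinct codes in the table share status 400.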
web/app/layout.tsx
@@ -20,7 +20,7 @@ const LocaleLayout = ({
return (
<html lang={locale ?? 'en'} className="h-full">
<body
-className="h-full"
+className="h-full select-auto"
data-api-prefix={process.env.NEXT_PUBLIC_API_PREFIX}
data-pubic-api-prefix={process.env.NEXT_PUBLIC_PUBLIC_API_PREFIX}
data-public-edition={process.env.NEXT_PUBLIC_EDITION}