ai-tech / dify
Commit dd21c0ca, authored Jul 26, 2023 by John Wang

Merge branch 'feat/universal-chat' into deploy/dev

Parents: 457a1c4f, 84932436

Showing 1 changed file with 5 additions and 5 deletions.

--- a/api/core/tool/web_reader_tool.py
+++ b/api/core/tool/web_reader_tool.py
@@ -88,9 +88,9 @@ class WebReaderTool(BaseTool):
             texts = character_splitter.split_text(page_contents)
             docs = [Document(page_content=t) for t in texts]

-            # only use first 10 docs
-            if len(docs) > 10:
-                docs = docs[:10]
+            # only use first 5 docs
+            if len(docs) > 5:
+                docs = docs[:5]

             chain = load_summarize_chain(self.llm, chain_type="refine", callbacks=self.callbacks)
             try:
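
The hunk above halves the number of chunks passed to the "refine" summarize chain. A refine-style chain visits its input documents one after another, so the cap also bounds how many LLM calls one summary can trigger. Below is a minimal sketch of the same truncation as a standalone helper. The names `cap_docs` and `MAX_SUMMARY_DOCS` are hypothetical and do not appear in the commit. Since slicing a shorter list already returns it unchanged, the sketch leaves out the `len(docs) > 5` guard.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical constant; the commit hard-codes the literal 5 instead.
MAX_SUMMARY_DOCS = 5


def cap_docs(docs: Sequence[T], limit: int = MAX_SUMMARY_DOCS) -> List[T]:
    """Return at most `limit` leading items of `docs` as a new list."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    # Slicing never raises when the sequence is shorter than `limit`,
    # so no explicit length check is needed.
    return list(docs[:limit])


if __name__ == "__main__":
    chunks = ["chunk {}".format(i) for i in range(8)]
    print(len(cap_docs(chunks)))      # 8 chunks in, capped to 5
    print(len(cap_docs(chunks[:2])))  # shorter inputs pass through unchanged
```

Keeping the limit in one named constant would keep the comment and the two literals in sync the next time the value changes.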
@@ -124,7 +124,7 @@ def get_url(url: str) -> str:
     }
     supported_content_types = file_extractor.SUPPORT_URL_CONTENT_TYPES + ["text/html"]

-    head_response = requests.head(url, headers=headers, allow_redirects=True, timeout=10)
+    head_response = requests.head(url, headers=headers, allow_redirects=True, timeout=(5, 10))

     if head_response.status_code != 200:
         return "URL returned status code {}.".format(head_response.status_code)
@@ -137,7 +137,7 @@ def get_url(url: str) -> str:
     if main_content_type in file_extractor.SUPPORT_URL_CONTENT_TYPES:
         return FileExtractor.load_from_url(url, return_text=True)

-    response = requests.get(url, headers=headers, allow_redirects=True, timeout=30)
+    response = requests.get(url, headers=headers, allow_redirects=True, timeout=(5, 30))
     a = extract_using_readabilipy(response.text)

     if not a['plain_text'] or not a['plain_text'].strip():
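
Both `get_url` hunks replace a scalar `timeout` with a tuple. In `requests`, a single number applies to both the connect and the read phase, while a 2-tuple is read as `(connect, read)`. The change therefore shortens only the connect timeout, to 5 seconds, and leaves the read timeouts at 10 and 30 seconds. The sketch below illustrates that interpretation. The names `as_connect_read`, `head_status`, `HEAD_TIMEOUT` and `GET_TIMEOUT` are hypothetical and not part of the commit.

```python
from typing import Tuple, Union

Timeout = Union[float, Tuple[float, float]]

# Values taken from the diff; the constant names are illustrative only.
HEAD_TIMEOUT = (5, 10)  # (connect seconds, read seconds)
GET_TIMEOUT = (5, 30)


def as_connect_read(timeout: Timeout) -> Tuple[float, float]:
    """Expand a requests-style timeout into explicit (connect, read) seconds."""
    if isinstance(timeout, tuple):
        connect, read = timeout
        return float(connect), float(read)
    # A scalar is used for both phases.
    return float(timeout), float(timeout)


def head_status(url: str) -> str:
    """Issue a HEAD request and report which timeout phase, if any, fired."""
    import requests  # third-party; imported lazily so the helper above needs no dependency

    try:
        resp = requests.head(url, allow_redirects=True, timeout=HEAD_TIMEOUT)
    except requests.exceptions.ConnectTimeout:
        return "Connection timed out."
    except requests.exceptions.ReadTimeout:
        return "Read timed out."
    return "URL returned status code {}.".format(resp.status_code)


if __name__ == "__main__":
    print(as_connect_read(10))            # old HEAD setting
    print(as_connect_read(HEAD_TIMEOUT))  # new HEAD setting
```

With the tuple form, a host that never accepts the connection fails after about 5 seconds, while a slow but reachable server still gets the full read window.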