ai-tech / dify
Commit 8fe83750 (Unverified), authored Mar 07, 2024 by Yeuoly. Committed by GitHub, Mar 07, 2024.
Fix/jina tokenizer cache (#2735)
parent 1809f059
Showing 1 changed file with 20 additions and 8 deletions
api/core/model_runtime/model_providers/jina/text_embedding/jina_tokenizer.py
 from os.path import abspath, dirname, join
+from threading import Lock

 from transformers import AutoTokenizer


 class JinaTokenizer:
-    @staticmethod
-    def _get_num_tokens_by_jina_base(text: str) -> int:
+    _tokenizer = None
+    _lock = Lock()
+
+    @classmethod
+    def _get_tokenizer(cls):
+        if cls._tokenizer is None:
+            with cls._lock:
+                if cls._tokenizer is None:
+                    base_path = abspath(__file__)
+                    gpt2_tokenizer_path = join(dirname(base_path), 'tokenizer')
+                    cls._tokenizer = AutoTokenizer.from_pretrained(gpt2_tokenizer_path)
+        return cls._tokenizer
+
+    @classmethod
+    def _get_num_tokens_by_jina_base(cls, text: str) -> int:
         """
         use jina tokenizer to get num tokens
         """
-        base_path = abspath(__file__)
-        gpt2_tokenizer_path = join(dirname(base_path), 'tokenizer')
-        tokenizer = AutoTokenizer.from_pretrained(gpt2_tokenizer_path)
+        tokenizer = cls._get_tokenizer()
         tokens = tokenizer.encode(text)
         return len(tokens)

-    @staticmethod
-    def get_num_tokens(text: str) -> int:
-        return JinaTokenizer._get_num_tokens_by_jina_base(text)
\ No newline at end of file
+    @classmethod
+    def get_num_tokens(cls, text: str) -> int:
+        return cls._get_num_tokens_by_jina_base(text)
\ No newline at end of file
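
The change above replaces a per-call `AutoTokenizer.from_pretrained` with a class-level cache guarded by a lock and a double check. The following is a minimal sketch of that pattern in isolation, not the project's code: `CachedLoader`, `WhitespaceTokenizer`, `load_calls`, and `run_concurrently` are hypothetical names, and the slow load is simulated with `time.sleep` so the example runs without `transformers` installed.

```python
import threading
import time


class WhitespaceTokenizer:
    """Toy stand-in for a real tokenizer (assumption for illustration only)."""

    def encode(self, text: str) -> list:
        return text.split()


class CachedLoader:
    """Hypothetical class mirroring the lazy, lock-guarded cache in the diff."""

    _tokenizer = None          # shared cache, filled on first use
    _lock = threading.Lock()   # guards the one-time initialization
    load_calls = 0             # counts how often the expensive load ran

    @classmethod
    def _load(cls) -> WhitespaceTokenizer:
        # Placeholder for an expensive call such as AutoTokenizer.from_pretrained.
        cls.load_calls += 1
        time.sleep(0.05)  # simulate slow loading so that threads overlap
        return WhitespaceTokenizer()

    @classmethod
    def _get_tokenizer(cls) -> WhitespaceTokenizer:
        if cls._tokenizer is None:           # fast path: no lock once cached
            with cls._lock:
                if cls._tokenizer is None:   # re-check: another thread may have loaded it
                    cls._tokenizer = cls._load()
        return cls._tokenizer

    @classmethod
    def get_num_tokens(cls, text: str) -> int:
        return len(cls._get_tokenizer().encode(text))


def run_concurrently(n_threads: int = 8) -> list:
    """Call get_num_tokens from several threads and collect the results."""
    results = []

    def worker() -> None:
        results.append(CachedLoader.get_num_tokens("one two three"))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `run_concurrently()` and then inspecting `CachedLoader.load_calls` is one way to check that the load ran only once even when several threads arrive before the cache is filled. The second `is None` check inside the lock is what prevents threads that were waiting on the lock from loading again.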