Add `.doc` file parser. (#497)
### What problem does this PR solve?
Add a `.doc` file parser, using the `tika` package (Python bindings for Apache Tika).
```
pip install tika
```
```
from io import BytesIO

from tika import parser


def extract_text_from_doc_bytes(doc_bytes):
    # Wrap the raw bytes so Tika can read them like a file.
    file_like_object = BytesIO(doc_bytes)
    parsed = parser.from_buffer(file_like_object)
    # "content" holds the extracted plain text.
    return parsed["content"]
```
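As a rough illustration of how a chunking module might use this, the sketch below detects a `.doc` filename and splits the extracted text into non-empty lines. The helper names `is_legacy_doc`, `split_sections`, and `doc_bytes_to_sections` are hypothetical and are not taken from the changed files. The sketch also assumes that Tika may return no content for some documents.

```python
import re
from io import BytesIO


def is_legacy_doc(filename):
    # Hypothetical helper: match ".doc" only, not ".docx".
    return re.search(r"\.doc$", filename, re.IGNORECASE) is not None


def split_sections(content):
    # Hypothetical helper: turn extracted text into non-empty, stripped lines.
    # Assumes the content may be None when no text could be extracted.
    if not content:
        return []
    return [line.strip() for line in content.split("\n") if line.strip()]


def doc_bytes_to_sections(doc_bytes):
    # Imported lazily so the helpers above work without tika installed.
    from tika import parser

    parsed = parser.from_buffer(BytesIO(doc_bytes))
    return split_sections(parsed.get("content"))
```

Keeping the filename check and the line splitting separate from the Tika call lets them be tested without a running Tika server. Note that `tika` needs a Java runtime to start that server.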
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: chrysanthemum-boy <fannc@qq.com>
Showing
- api/utils/file_utils.py: 1 addition, 1 deletion
- rag/app/book.py: 12 additions, 1 deletion
- rag/app/laws.py: 11 additions, 1 deletion
- rag/app/naive.py: 10 additions, 1 deletion
- rag/app/one.py: 11 additions, 1 deletion
- requirements.txt: 2 additions, 1 deletion