Commit 72384b19, authored by chrysanthemum-boy, committed by GitHub

Add `.doc` file parser. (#497)


### What problem does this PR solve?
Add a `.doc` file parser built on Apache Tika, through the `tika` Python package. The dependency and the extraction helper are:
```
pip install tika
```
```
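from io import BytesIO

from tika import parser


def extract_text_from_doc_bytes(doc_bytes):
    # Wrap the raw bytes in a file-like object for Tika.
    file_like_object = BytesIO(doc_bytes)
    # Send the buffer to the Tika server and get back a dict of results.
    parsed = parser.from_buffer(file_like_object)
    # "content" holds the extracted plain text (None if nothing was extracted).
    return parsed["content"]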
```
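Since `parsed["content"]` can be `None` when Tika extracts no text, callers may want a variant that always returns a string. The following is only a sketch under assumptions, not part of this PR: the name `extract_text_or_empty` and the `parse_buffer` parameter are hypothetical, and the injectable parser exists purely so the helper can be exercised without a running Tika server (the `tika` package itself needs a Java runtime).

```python
from io import BytesIO
from typing import Callable, Optional


def extract_text_or_empty(
    doc_bytes: bytes,
    parse_buffer: Optional[Callable] = None,  # hypothetical injection point
) -> str:
    """Return the extracted text, or "" when the parser yields no content."""
    if parse_buffer is None:
        # Imported lazily so the helper can be tested without Tika installed.
        from tika import parser
        parse_buffer = parser.from_buffer

    parsed = parse_buffer(BytesIO(doc_bytes))
    # Guard against a missing result or a None "content" entry.
    content = parsed.get("content") if parsed else None
    return content.strip() if content else ""
```

With the default argument this calls `parser.from_buffer` exactly as the helper above does; passing a stub makes the empty-content path easy to check in isolation.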
### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chrysanthemum-boy <fannc@qq.com>
parent 0dfc8ddc