  1. Apr 23, 2024
    • fix bug of table in docx (#510) · 369400c4
      KevinHuSh authored
      ### What problem does this PR solve?
      #509 
      ### Type of change
      
      - [x] Bug Fix (non-breaking change which fixes an issue)
    • Add `.doc` file parser. (#497) · 72384b19
      chrysanthemum-boy authored
      
      ### What problem does this PR solve?
      Add `.doc` file parser, using tika.
      ```
      pip install tika
      ```
      ```
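      from io import BytesIO

      from tika import parser

      def extract_text_from_doc_bytes(doc_bytes):
          # Wrap the raw bytes in a file-like object for tika.
          file_like_object = BytesIO(doc_bytes)
          # The result is a dict with "metadata" and "content" keys.
          parsed = parser.from_buffer(file_like_object)
          # "content" may be None when no text is extracted.
          return parsed["content"]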
      ```
      ### Type of change
      
      - [x] New Feature (non-breaking change which adds functionality)
      
      ---------
      
      Co-authored-by: chrysanthemum-boy <fannc@qq.com>
    • enlarge docker memory usage (#501) · 0dfc8ddc
      KevinHuSh authored
      ### What problem does this PR solve?
      
      ### Type of change
      
      - [x] Refactoring
  2. Apr 22, 2024
  3. Apr 19, 2024
  4. Apr 16, 2024
    • fix gb2312 encoding issue (#394) · d4e0bfc8
      KevinHuSh authored
      ### What problem does this PR solve?
      
      Issue link: #384
      ### Type of change
      
      - [x] Bug Fix (non-breaking change which fixes an issue)
  5. Apr 07, 2024
  6. Mar 27, 2024
  7. Mar 22, 2024
  8. Mar 20, 2024
  9. Mar 19, 2024
  10. Mar 08, 2024
  11. Mar 05, 2024
  12. Mar 04, 2024
  13. Mar 01, 2024
  14. Feb 29, 2024
  15. Feb 23, 2024
  16. Feb 21, 2024
  17. Feb 19, 2024
  18. Feb 08, 2024
  19. Feb 05, 2024
  20. Feb 02, 2024