Sha256: 0ef596cabbad528492e3671d7f1e429c6052db11b24b6593bffd5988c5ee5f0e
Contents?: true
Size: 420 Bytes
Versions: 1
Compression:
Stored size: 420 Bytes
Contents
# frozen_string_literal: true

module Baran
  class SentenceTextSplitter < TextSplitter
    def initialize(chunk_size: 1024, chunk_overlap: 64)
      super(chunk_size: chunk_size, chunk_overlap: chunk_overlap)
    end

    def splitted(text)
      # Use a regex to split text based on the specified sentence-ending characters followed by whitespace
      text.scan(/[^.!?]+[.!?]+(?:\s+)/).map(&:strip)
    end
  end
end
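To illustrate how the regex in `splitted` behaves, here is a minimal standalone sketch that applies the same pattern outside the gem. The names `SENTENCE_PATTERN` and `split_sentences` are hypothetical and exist only for this illustration; they are not part of baran. Because the pattern requires at least one whitespace character after the sentence-ending punctuation, a final sentence with no trailing whitespace does not appear to be captured.

```ruby
# Hypothetical standalone helper; mirrors the regex used in `splitted`
# but does not depend on the baran gem or on TextSplitter.
SENTENCE_PATTERN = /[^.!?]+[.!?]+(?:\s+)/

def split_sentences(text)
  # String#scan returns every non-overlapping match of the pattern;
  # strip removes the trailing whitespace the pattern consumed.
  text.scan(SENTENCE_PATTERN).map(&:strip)
end

sample = "Hello world. How are you? Fine!"

# With trailing whitespace, all three sentences match.
with_trailing_space = split_sentences("#{sample} ")

# Without trailing whitespace, the last sentence has no \s+ to match.
without_trailing_space = split_sentences(sample)

p with_trailing_space
p without_trailing_space
```

Callers who need the final sentence could append a whitespace character to the input before splitting, as the first call above does. This is a workaround under the stated assumptions, not behavior documented by the gem.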
Version data entries
1 entry across 1 version & 1 rubygem
Version | Path |
---|---|
baran-0.2.0 | lib/baran/sentence_text_splitter.rb |