Sha256: 0ef596cabbad528492e3671d7f1e429c6052db11b24b6593bffd5988c5ee5f0e

Contents?: true

Size: 420 Bytes

Versions: 1

Compression:

Stored size: 420 Bytes

Contents

# frozen_string_literal: true

module Baran
  class SentenceTextSplitter < TextSplitter
    def initialize(chunk_size: 1024, chunk_overlap: 64)
      super(chunk_size: chunk_size, chunk_overlap: chunk_overlap)
    end

    def splitted(text)
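      # Scan for sentences: a run of characters other than ., ! or ?, then one or more
      # of those terminators, then whitespace. Trailing text without that whitespace
      # (such as a final sentence at the end of the string) is not matched.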
      text.scan(/[^.!?]+[.!?]+(?:\s+)/).map(&:strip)
    end
  end
end
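
The behavior of the pattern in `splitted` can be checked outside the gem. The sketch below copies the regex from the file above into a standalone script; `SENTENCE_PATTERN`, `SENTENCE_PATTERN_WITH_EOS`, and `split_sentences` are hypothetical names used only for illustration, and the second pattern is an assumed variant that is not part of baran 0.2.0.

```ruby
# Pattern copied verbatim from SentenceTextSplitter#splitted (baran 0.2.0).
# The constant name is hypothetical.
SENTENCE_PATTERN = /[^.!?]+[.!?]+(?:\s+)/

# Assumed variant, not part of the gem: also accept end of input after the
# terminator, so a final sentence without trailing whitespace is kept.
SENTENCE_PATTERN_WITH_EOS = /[^.!?]+[.!?]+(?:\s+|\z)/

# Hypothetical helper that mirrors the body of #splitted.
def split_sentences(text, pattern = SENTENCE_PATTERN)
  # String#scan returns whole matches because the only group is non-capturing.
  text.scan(pattern).map(&:strip)
end

text = "Hello world. How are you? Fine!"

p split_sentences(text)
# => ["Hello world.", "How are you?"]

p split_sentences(text, SENTENCE_PATTERN_WITH_EOS)
# => ["Hello world.", "How are you?", "Fine!"]
```

With the original pattern, "Fine!" is dropped because no whitespace follows it. Appending a newline to the input, or using a variant along the lines of the second pattern, would keep it. Neither pattern returns text that contains no terminator at all.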

Version data entries

1 entry across 1 version & 1 rubygem

Version Path
baran-0.2.0 lib/baran/sentence_text_splitter.rb