Sha256: ea4cb9673426e08c0acc438b35d2ffb7d547bb117ff697cca334962f4370d731

Contents?: true

Size: 453 Bytes

Versions: 6

Compression:

Stored size: 453 Bytes

Contents

require_relative './text_splitter'

module Baran
  class CharacterTextSplitter < TextSplitter
    attr_accessor :separator

    def initialize(chunk_size: 1024, chunk_overlap: 64, separator: nil)
      super(chunk_size: chunk_size, chunk_overlap: chunk_overlap)
      @separator = separator || "\n\n"
    end

    def splitted(text)
      splits = separator.empty? ? text.chars : text.split(separator)
      merged(splits, @separator)
    end
  end
end
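
The split step in `splitted` can be tried in isolation. The sketch below is standalone and does not load the baran gem. `split_on` mirrors the ternary shown above. `greedy_merge` is a hypothetical stand-in for `TextSplitter#merged`, whose implementation is not part of this file: it assumes simple greedy packing up to `chunk_size` characters and ignores `chunk_overlap`, so the real method may behave differently.

```ruby
# Mirrors the first line of CharacterTextSplitter#splitted:
# an empty separator splits into characters, otherwise split on the separator.
def split_on(text, separator)
  separator.empty? ? text.chars : text.split(separator)
end

# Hypothetical stand-in for TextSplitter#merged (not shown in this file).
# Assumption: pieces are packed greedily into chunks of at most chunk_size
# characters and rejoined with the separator; chunk_overlap is ignored.
def greedy_merge(pieces, separator, chunk_size)
  chunks = []
  current = nil
  pieces.each do |piece|
    candidate = current.nil? ? piece : current + separator + piece
    if current.nil? || candidate.length <= chunk_size
      current = candidate
    else
      # The candidate would exceed chunk_size, so close the current chunk.
      chunks << current
      current = piece
    end
  end
  chunks << current unless current.nil?
  chunks
end

text = "first paragraph\n\nsecond paragraph\n\nthird"

pieces = split_on(text, "\n\n") # default separator used by the class
chars  = split_on("abc", "")    # empty separator falls back to characters
chunks = greedy_merge(pieces, "\n\n", 24)
```

Here `pieces` holds the three paragraphs, `chars` holds the individual characters, and `chunks` keeps the first paragraph alone while joining the last two, because only that pair fits within 24 characters.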

Version data entries

6 entries across 6 versions & 1 rubygems

Version Path
baran-0.2.1 lib/baran/character_text_splitter.rb
baran-0.2.0 lib/baran/character_text_splitter.rb
baran-0.1.12 lib/baran/character_text_splitter.rb
baran-0.1.11 lib/baran/character_text_splitter.rb
baran-0.1.10 lib/baran/character_text_splitter.rb
baran-0.1.9 lib/baran/character_text_splitter.rb