# How to use ChupaText as a Ruby library

You can use ChupaText as a Ruby library. If you want to extract text
data from many inputs, the `chupa-text` command may be inefficient.
Each invocation of the command processes one input file, so processing
N input files means running the command N times and initializing
ChupaText N times.

You can reduce the number of initializations by using ChupaText as a
Ruby library.

Here is a simple usage:

```
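require "chupa-text"
# Activate the decomposer plugin gem so that it can be found.
gem "chupa-text-decomposer-html"

# Load the available decomposer plugins.
ChupaText::Decomposers.load

# Create one extractor and reuse it for every input.
extractor = ChupaText::Extractor.new
extractor.apply_configuration(ChupaText::Configuration.default)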

extractor.extract("http://ranguba.org/") do |text_data|
  puts(text_data.body)
end
extractor.extract("http://ranguba.org/ja/") do |text_data|
  puts(text_data.body)
end
```
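Because the extractor is created once, it can be reused for any number of inputs. The following is a minimal sketch of that pattern, not part of ChupaText: `extract_bodies` is a hypothetical helper name, and it only assumes that the extractor responds to `extract` and yields objects with a `body`, as in the example above.

```ruby
# Hypothetical helper (not part of ChupaText): run one already-configured
# extractor over many inputs and collect the extracted bodies per input.
def extract_bodies(extractor, inputs)
  inputs.each_with_object({}) do |input, bodies|
    bodies[input] = []
    # The block may be called more than once for a single input,
    # so every yielded body is appended to the list for that input.
    extractor.extract(input) do |text_data|
      bodies[input] << text_data.body
    end
  end
end
```

With the `extractor` built above, `extract_bodies(extractor, ["http://ranguba.org/", "http://ranguba.org/ja/"])` would process both inputs while initializing ChupaText only once.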

It is better to use Bundler to manage decomposer plugins:

```
# Gemfile
source "https://rubygems.org"

gem "chupa-text-decomposer-html"
gem "chupa-text-decomposer-XXX"
# ...
```

Here is an example that uses the Gemfile:

```
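require "bundler/setup"
# "bundler/setup" only sets up load paths, so ChupaText must still be required.
require "chupa-text"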

ChupaText::Decomposers.load

extractor = ChupaText::Extractor.new
extractor.apply_configuration(ChupaText::Configuration.default)

extractor.extract("http://ranguba.org/") do |text_data|
  puts(text_data.body)
end
extractor.extract("http://ranguba.org/ja/") do |text_data|
  puts(text_data.body)
end
```

Use {ChupaText::Data#[]} to get meta-data from extracted text
data. For example, you can get the title of an input HTML document:

```
extractor.extract("http://ranguba.org/") do |text_data|
  puts(text_data["title"])
end
```

Which meta-data is available depends on the decomposer. See each
decomposer's documentation for details.
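
Since the available meta-data varies, it can help to check several names and keep only those that are set. Here is a minimal sketch under stated assumptions: `available_metadata` is a hypothetical helper, not part of ChupaText, and it assumes that looking up a name the decomposer did not set returns `nil`. A plain Hash stands in for the extracted text data, because it also supports `#[]`.

```ruby
# Hypothetical helper (not part of ChupaText): look up several meta-data
# names and keep only those that have a value.
# Assumption: looking up a name that was not set returns nil.
def available_metadata(text_data, names)
  names.each_with_object({}) do |name, found|
    value = text_data[name]
    found[name] = value unless value.nil?
  end
end

# A plain Hash stands in for extracted text data in this sketch.
sample = {"title" => "Ranguba", "author" => nil}
p available_metadata(sample, ["title", "author", "created_time"])
```

Only the `"title"` entry is kept here, because the other two names have no value. Inside an `extract` block, the same helper could be called with `text_data` instead of the sample Hash.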
