Sha256: 8cf755bd419d88f92a9b84f5ded01a20a3fd992b7a8cd5531fa889fb489a34e6

Contents?: true

Size: 529 Bytes

Versions: 2

Compression:

Stored size: 529 Bytes

Contents

require_relative "url_trimmer/version"
require "domain_name"

module URLTrimmer
  class Worker
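    # Captures the host part (everything up to the first "/") of an http(s) URL.
    URL_REGEXP = %r(\Ahttps?://([^/]+))

    # Lowercases, filters, de-duplicates by domain, and sorts +urls+ in place.
    def self.uniq_by_domain(urls)
      urls.map! do |url|
        begin
          url.downcase
        rescue ArgumentError
          # downcase can raise on invalid byte sequences; drop them and retry.
          url.encode("UTF-8", invalid: :replace, undef: :replace, replace: "").downcase
        end
      end
      # Keep only strings that start with http:// or https://.
      urls.select! { |url| url =~ URL_REGEXP }
      # Keep the first URL seen for each domain as reported by DomainName.
      urls.uniq! { |url| DomainName(url[URL_REGEXP, 1]).domain }
      urls.sort!
      urls
    end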
  end
end
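
The pipeline in the file above (lowercase, filter to http(s), keep one URL per domain, sort) can be sketched without the `domain_name` dependency. This is only an illustration under stated assumptions: `naive_domain` and `uniq_by_domain_sketch` are hypothetical names that do not exist in the gem, `naive_domain` simply keeps the last two host labels (so it is not public-suffix aware and would mishandle hosts such as `example.co.uk`), and the sketch returns a new array instead of mutating its argument as the original does.

```ruby
# Same pattern as in the file above: captures the host of an http(s) URL.
URL_REGEXP = %r(\Ahttps?://([^/]+))

# Hypothetical stand-in for DomainName(host).domain.
# Assumption: the last two labels form the domain (wrong for "co.uk" etc.).
def naive_domain(host)
  host.split(".").last(2).join(".")
end

# Non-mutating sketch of the same steps as URLTrimmer::Worker.uniq_by_domain.
def uniq_by_domain_sketch(urls)
  urls
    .map { |url| url.scrub("").downcase }             # drop invalid bytes, lowercase
    .select { |url| url =~ URL_REGEXP }               # keep only http(s) URLs
    .uniq { |url| naive_domain(url[URL_REGEXP, 1]) }  # first URL per domain wins
    .sort
end

urls = [
  "http://www.Example.com/a",
  "https://blog.example.com/b",
  "ftp://example.org/file",
  "https://ruby-lang.org/en/",
]

result = uniq_by_domain_sketch(urls)
puts result
```

With the real gem, the `naive_domain` call would be replaced by `DomainName(host).domain`, as in the original source.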

Version data entries

2 entries across 2 versions & 1 rubygem

Version Path
url_trimmer-0.1.0 lib/url_trimmer.rb
url_trimmer-0.0.2 lib/url_trimmer.rb