# Log-Analyser

## About
A simple Ruby library that reads and parses web-server log files and aggregates pageview data.

### TL;DR
<details>
  <summary>check minimal instructions</summary>

Install the *log-analyser* gem. After instantiating *log-analyser's* `PageviewsLogAggregator` class with the path to the logfile:
</br>- the method `all` returns the pageview count
</br>- the method `unique` returns the unique pageview count.
</details>

### Table of Contents
<details>
  <summary>click to expand the index</summary>

- [Installation](#installation)
  * [Gem](#gem)
  * [Project](#project)
- [Usage](#usage)
- [Logs and Pageviews](#logs-and-pageviews)
  * [Definitions](#definitions)
  * [Log Formatting](#log-formatting)
- [Development](#development)
- [Contributing](#contributing)
- [Next Steps](#next-steps)
- [License](#license)
</details>

## Installation
### Gem
To use *log-analyser* in your application, add this line to your Gemfile:

```ruby
gem 'log-analyser'
```

Or install it yourself as:

    $ gem install log-analyser

#### Gem Usage
```ruby
#!/usr/bin/env ruby

require 'pageviews_log_aggregator'

file_path = '/Users/dmazzei/projects/personal/ruby/sp_test/log-analyser/resources/webserver.log'
log_aggregator = LogAnalyser::PageviewsLogAggregator.new(file_path)

puts "\nAll pageviews"
log_aggregator.all.each do |key, value|
  puts "#{key&.to_s&.ljust(28, '.')} | #{value}"
end

puts "\nUnique pageviews"
log_aggregator.unique.each do |key, value|
  puts "#{key&.to_s&.ljust(28, '.')} | #{value}"
end
```

### Project
Install the Ruby version specified in `.ruby-version`.
</br>
Clone the project and install Bundler:
```
git clone git@github.com:DMazzei/log-analyser.git
cd log-analyser
gem install bundler
```

#### Setup:
Run the initial setup:

    $ bin/setup

> If you need to reinstall dependencies or something alike:
> ```
> $ bundle install
> ```

#### Usage
Call `./bin/parse_pageview_file.rb`, passing a logfile path as an argument; it returns the pageview count ordered from most to least viewed.</br>
Check `--help` for more options.

An example log can be found in the :file_folder:`resources` folder:

    $ ./bin/parse_pageview_file.rb --file 'resources/webserver.log'
    |--------------------------------------------------|
    | All pageviews                                    |
    |--------------------------------------------------|
    | /about/2.................... | 90                |
    | /contact.................... | 89                |
    | /index...................... | 82                |
    | /about...................... | 81                |
    | /help_page/1................ | 80                |
    | /home....................... | 78                |
    |--------------------------------------------------|

The `-u` or `--unique` option will also display the unique pageview count:

    $ ./bin/parse_pageview_file.rb --file 'resources/webserver.log' -u

And any specific page can be filtered with `-p` or `--page`:

    $ ./bin/parse_pageview_file.rb --file 'resources/webserver.log' -p '/index'
    |--------------------------------------------------|
    | View count for page: /index                      |
    |--------------------------------------------------|
    | All pageviews                                    |
    |--------------------------------------------------|
    | /index...................... | 82                |
    |--------------------------------------------------|

## Logs and Pageviews
### Definitions
> :page_facing_up: A pageview is defined as a view of a page on your site that is being tracked by the Analytics tracking code. If a user clicks reload after reaching the page, this is counted as an additional pageview. If a user navigates to a different page and then returns to the original page, a second pageview is recorded as well.

> :page_with_curl: A unique pageview, as seen in the Content Overview report, aggregates pageviews that are generated by the same user during the same session. A unique pageview represents the number of sessions during which that page was viewed one or more times.
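
The difference between the two definitions can be sketched in a few lines of Ruby. This is a minimal illustration and not the gem's actual implementation: `VIEWS`, `count_all` and `count_unique` are hypothetical names, the sample data is made up, and treating the visitor identifier as a stand-in for a session is an assumption.

```ruby
require 'set'

# Hypothetical sample data: [page, visitor identifier] pairs.
VIEWS = [
  ['/home', '184.123.665.067'],
  ['/home', '184.123.665.067'], # reload by the same visitor
  ['/home', '126.318.035.038'],
  ['/contact', '184.123.665.067']
].freeze

# Every entry counts as one pageview, reloads included.
def count_all(views)
  views.each_with_object(Hash.new(0)) do |(page, _visitor), counts|
    counts[page] += 1
  end
end

# Each visitor counts at most once per page (assumption: one identifier, one session).
def count_unique(views)
  visitors = Hash.new { |hash, page| hash[page] = Set.new }
  views.each { |page, visitor| visitors[page] << visitor }
  visitors.transform_values(&:size)
end

count_all(VIEWS).each { |page, count| puts "#{page}: #{count} pageviews" }
count_unique(VIEWS).each { |page, count| puts "#{page}: #{count} unique pageviews" }
```

With this sample, `/home` has three pageviews but only two unique pageviews, because one visitor reloaded the page.
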
### Log Formatting
The library parses text files containing one entry per line, in the format `/page_name identifier`. A space must separate the page name (first column) from the user identifier (e.g. IP address):

```
/help_page/1 126.318.035.038
/contact 184.123.665.067
/home 184.123.665.067
```

## Development
#### Start with the project:
```
$ git clone git@github.com:DMazzei/log-analyser.git
$ cd log-analyser
$ gem install bundler
$ bundle install
```
And the world is your oyster...

You can also run `$ bundle exec console` for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run `$ bundle exec rake install`.

To release a new version, update the version number in `version.rb`, and then run `$ bundle exec rake release`, which will create a git tag for the version, push git commits and tags, and push the `.gem` file to [rubygems.org](https://rubygems.org).

#### Linter (rubocop)
_*Rubocop*_ is used as a code analyser and to maintain code formatting (as well as some best practices).

Use `$ bundle exec rake rubocop` to run the checks.

#### Test coverage
Use `$ bundle exec rspec` or `$ bundle exec rake spec:all` to run all the tests.

:white_check_mark: To run only unit tests:

    $ bundle exec rake spec:unit

:white_check_mark: To run only integration tests:

    $ bundle exec rake spec:integration

The test coverage is handled by `rspec`, `simplecov` and `coveralls`. Status and coverage history can be checked [here](https://coveralls.io/github/DMazzei/log-analyser).
#### Deployment
When a _*Pull Request*_ is created, a CI workflow is triggered in CircleCI; it can be checked [here](https://app.circleci.com/pipelines/github/DMazzei/log-analyser).</br>
This workflow consists of _building_ the library, running _rubocop_ and _rspec_ to validate integrity and code quality, and lastly generating and pushing a _feature-gem_ that can be used for development and tests.

After passing all checks and requirements on GitHub, a *PR* can be merged as soon as it is reviewed and approved.

Merging into the _*master branch*_ triggers the deployment workflow on CircleCI, which builds and tags a new gem version (the _*tagged-gem*_) and pushes it to [rubygems.org](https://rubygems.org/gems/log-analyser).

> :warning: To merge changes into _*master*_, the version must be bumped up, otherwise the deployment will fail!</br>
> The version must be updated in `version.rb`.

## Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/DMazzei/log-analyser.

## Next Steps
- One conundrum faced that can be revisited is deciding between:
  * reading the file whilst aggregating data, preserving memory - e.g. using `Set`;
  * loading data into memory and leaving aggregation and counting to be dealt with later, gaining flexibility and performance;
- Extend the accepted logfile format;
- Add more options for sorting and filtering;
- Automate the library version bump.

## License
The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).