---
title: Manual
---
# Introduction

This manual describes all of webgen in as much detail as possible. If you still find something
missing, don't hesistate to write a mail to the mailing list!

The manual is structured in such a way that it can be read from top to bottom while creating a
webgen website. This means that first the information on how to use the webgen CLI comes, then how a
webgen website is structured and how content can be added and so on.


# The `webgen` command

The executable for webgen is called... webgen ;-) It uses a command style syntax (like Subversion's
`svn` or Rubygem's `gem` commands) through the [cmdparse] library. To get an overview of the
possible commands run `webgen help`.

The main command is the `render` command which does the actual website generation. This command uses
the current working directory as website directory if none was specified via the gloabl `-d` option.

You can invoke a command by specifying its name after the executable name. Also counting the
executable `webgen` as a command, the options for a command are specified directly after the command
name and before the next command or any arguments. For example, all the following command lines are
valid:

    $ webgen
    $ webgen render
    $ webgen -d doc render
    $ webgen -v create -t project new_site
    $ webgen help create

Following is a short overview of the available commands:

*   `create [-t TEMPLATE] [-s STYLE] SITE_DIR`

    Creates a basic webgen website in `SITE_DIR` using the optional template and styles. All
    available templates and styles are listed in the help output for the command.

*   `help`

    Displays usage information. Can be used to show information about a command by using the command
    name as argument, eg. `webgen help create`.

*   `render`

    Renders the given webgen website.

*   `version`

    Displays the version of webgen.

*   `webgui`

    Starts the webgen webgui, a browser based graphical user interface for managing webgen
    websites. First the webgui web application is started and then the webgui is opened in the
    default browser.

[cmdparse]: http://cmdparse.rubyforge.org


# A webgen Website {#website}

webgen needs a special directory structure so that it works out of the box. This directory structure
is automatically created by the `webgen create` command. You can alternatively use the webgui to
create the website directory.

The root directory of webgen website is called the website directory. You can have the following
files and directories under this directory:

* `src`: The source directory in which all the source files for the website are. If this directory
  should not be called `src` or you want to include additional source directories, you need to
  change the `sources` configuration option.

* `out`: This directory is created, if it does not exist, when webgen generates the HTML files. All
  the output files are put into this directory. The name of this directory can be changed by setting
  the `output` configuration option.

* `ext`: The extension directory (optional). You can put self-written extensions into this directory
  so that they are used by webgen. See the [extension directory]({relocatable: '#website-ext'})
  section for more information.

* `config.yaml`: This file can be used to set configuration options for the website. See the
  [Configuration File]({relocatable: '#website-configfile'}) section for more information.

* `Rakefile`: This file is provided for your convenience to execute tasks via `rake` and provides
  some useful tasks out of the box. See the [Rakefile]({relocatable: '#website-rakefile'}) section
  for more information.

## Extension Directory {#website-ext}

The extension directory is used for modifying core behaviour of webgen or adding extensions. It is
called `ext` and has to reside directly under the website directory. All files called `init.rb` in
this directory or any of its subdirectories are loaded when webgen renders the website. So you need
to make sure to either place all extensions in `init.rb` or load them from this file. The latter
approach is better since you can use the lazy loading feature that webgen uses internally,
ie. extensions are loaded only when they are needed.

## Configuration File {#website-configfile}

Many user will want to change some configuration options, for example, the default language of the
website since not all people will want to write websites in English. This is primarily done via the
configuration file.

The configuration file is called `config.yaml` and has to be placed directly under the website
directory. It uses [YAML](http://www.yaml.org) as file format. You can find a list of all available
configuration options that can be set in the [Configuration Options Reference]({relocatable:
reference_configuration.html}).

Each configuration option can be set in the configuration file by specifing the configuration option
name and the new value as a key/value pair. A sample configuration file looks like this:

    website.lang: de
    website.link_to_current_page: true

Since some configuration options have a complex structure, there exist several special configuration
keys to help setting these configuration options:

*   `default_meta_info`: Set the default meta information for one or more source handlers.

    This key needs needs a Hash as value which should be of the following form:

        SOURCE_HANDLER_NAME:
          mi_key: mi_value
          other_key: other_value
          :action: replace

    So each entry has to specify the name of a source handler (or the special value `:all` which
    sets the default meta information for all source handlers) and the meta information items that
    should be set or modified. If you don't want to update the meta information hash but want to
    replace it, you need to add `:action: replace` as meta information entry.

    If a source handler called `Webgen::SourceHandler::SOURCE_HANDLER_NAME` exists, the meta
    information is set for this source handler instead.

    For example, the following snippet of a configuration file sets the meta information item
    `in_menu` to `true` for `Webgen::SourceHandler::Page`:

        default_meta_info:
          Page:
            in_menu: true

*   `default_processing_pipeline`: Set the default processing pipeline for one or more source
    handlers.

    This key needs a Hash as value which should be of the following form:

        SOURCE_HANDLER_NAME: PIPELINE

    Each key-value pair specifies the name of a source handler and the new default processing
    pipeline. The value for the processing pipeline has to consist of the short names of content
    processors separated by commas and normally includes `erb`, `tags`, `blocks` and the name of a
    content processor for a markup language. The short name(s) of a content processor is (are)
    stated on its documentation page. Be aware that the content processors are executed in the order
    in which they are specified!

    If a source handler called `Webgen::SourceHandler::SOURCE_HANDLER_NAME` exists, the meta
    information is set for this source handler instead.

    For example, the following snippet of a configuration file sets the default processing pipeline
    for 'Webgen::SourceHandler::Page':

        default_processing_pipeline:
          Page: erb,tags,textile,blocks

*   `patterns`: Set the used path patterns for one or more source handlers.

    This key needs a Hash value and provides two different ways of setting the path patterns:

        SOURCE_HANDLER_NAME: [**/*.jpg, **/*.css]

        SOURCE_HANDLER_NAME:
          add: [**/*.jpg]
          del: [**/*.js]

    The first form replaces the path patterns for the source handler with the given patterns. The
    second form allows to add or delete individual patterns.

    If a source handler called `Webgen::SourceHandler::SOURCE_HANDLER_NAME` exists, the meta
    information is set for this source handler instead.

    For example, the following snippet of a configuration file adds the pattern `**/*.pdf` to the
    patterns of `Webgen::SourceHandler::Copy`:

        patterns:
          Copy:
            add: [**/*.pdf]

## Rakefile {#website-rakefile}

The Rakefile that is automatically created upon website creation provides a place to specify
recurring task for your website, for example, for deploying the website to a server. It contains
some useful tasks out of the box:

* `webgen`: Renders the webgen website once.
* `auto_webgen`: Automatically renders the website when a source file has changed. This is very
  useful when developing a website because you don't need to change back and forth between your
  website code and the command line to rebuild the site.
* `clobber_webgen`: Removes all webgen generated products (the output files and the cache file).


# All About Paths and Sources   {#source}

A source provides paths that identity files or directories. webgen can use paths from many sources.
The most commonly used source is the file system source which provides paths and information on them
from the file system.


## Path Types {#source-types}

webgen can handle many different types of files through the different source handler classes.

The most important files are the page and template files as they are used to define the content and
the layout of a website. Have a look at the [Webgen Page Format documentation]({relocatable:
webgen_page_format.html}) to see how these files look like and how they are structured. After that
have a look at the documentation of the source handler class SourceHandler::Page and
SourceHandler::Template as they are responsible for handling these page and template files!

You can naturally use any other type of file. However, be aware that some files may not be processed
by webgen when no source handler class for them exists. For example, there is currently no source
handler class for `.svg` files, so those files would be ignored. If you just want to have files
copied from a source to the output directory (like images or CSS files), the SourceHandler::Copy
class is what you need! Look through the documentation of the [availabe source handler
classes]({relocatable: extensions.html}) to get a feeling of what files are handled by webgen.


## Source Paths Naming Convention {#source-naming}

webgen assumes that the paths provided by the sources follow a special naming convention sothat meta
information can be extracted correctly from the path name:

    [sort_info.]basename[.lang].extension

*   `sort_info`

    This part is optional and has to consist of the digits 0 to 9. Its value is used for the meta
    information `sort_info`. If not specified, it defaults to the value zero.

*   `basename`

    This part is used on the one hand to generate the `title` meta information (but with `_` and `-`
    replaced by spaces). And on the other hand, the canonical name is derived from it. `basename`
    must not contain any dots, spaces or any character from the following list: ``; / ? * : ` & = +
    $ ,``. If you do use one of them webgen may not work correctly!

    > If two paths have the same `basename` and `extension` part, they should define the same
    > content but for different languages. This allows webgen to automatically deliver the right
    > language version of the content
    {.information}

*   `lang`

    This part is optional and has to be an [ISO-639-1/2](http://www.loc.gov/standards/iso639-2/)
    language identifier (two or three characters (a-z) long). If not specified, it is assumed that
    the path is language independent (for example, images are normally not specific for a specific
    language). However, this behaviour may be different for some source handler classes (for
    example, all paths handled by SourceHandler::Page are assigned the default language if none is
    set).

    If the language identifier can't be matched to a valid language, it is assumed that this part
    isn't actually a language identifier but a part of the extension. This also means that in the
    special case where the first part of an extension is also a valid language identifier, the first
    part is interpreted as language identifier and not as part of the extension.

*   `extension`

    The file extension can be anything and can include dots.

Following are some examples of source path names:

|Path name                 | Parsed meta information
|--------------------------|------------------------------------------------
|`name.png`                | title: Name, language: none, sort\_info: 0, basename: name, cn: name.png
|`name.de.png`             | title: Name, language: de, sort\_info: 0, basename: name, cn: name.png
|`01.name_of-file.eo.page` | title: Name of file, language: eo, sort\_info: 1, basename: name_of-file, cn: name_of-file.page
|`name.tar.bz2`            | title: Name, language: none, sort\_info: 0, basename: name, cn: name.tar.bz2
|`name.de.tar.bz2`         | title: Name, language: de, sort\_info: 0, basename: name, cn: name.tar.bz2

Notice: The first two and the last two examples define the same content for two different languages
(or more exactly: the first one is unlocalized and the second one localized to German) as they have
the same canonical name.


## Canonical Name of a File ### {#source-cn}

webgen provides the functionality to define the same content in more than one language, ie. to
localize content. This is achieved with the _canonical name_ of a path.

When multiple paths share the same canonical name, webgen assumes that they have the same content
but in different languages. It is also possible to specify a _language independent_ path which is
used as a fallback. Therefore when a path should be resolved using a canonical name and a given
language, it is first tried to get the path in the requested language. If this is not possible
(ie. no such localization exists), the unlocalized path is returned if it exists.

> Directories and fragments are never localized, only files are!
{.exclamation}

It is also possible to use the _localized canonical name_ of a path to resolve it. The localized
canonical name is the same as the canonical name but with a language code inserted before the
extension. If the localized canonical name is used to resolve a path, a possibly additionally
specified language is ignored as it is assumed that the user really only wants the path in the
specified language!

This also means that all paths are not resolved using their real source or output names but using
the (localized) canonical name! This is different from previous webgen versions!


## Output Path Name Construction ### {#source-output}

There is currently only one method, called `standard`, for creating output path names which is
described here. However, webgen allows developers to create other output path name creation
methods. More information on this can be found in the [API documentation]({relocatable: api.html}).

The output path for a given source path is constructed using the meta information
`output_path_style`. This meta information is set to a default value and can be overwritten by
setting it for a specific source handler or for a path individually. The value for this meta
information key is an array which can have the following values:

* strings (for inserting arbitrary text into output names)
* arrays (for grouping values - only interesting for the language part)
* symbols for inserting special values:
  * `:cnbase`: The basename of the path.
  * `:parent`: The parent path.
  * `:lang`: The language.
  * `:ext`: The file extension including the leading dot.
  * `:year`, `:month`, `:day`: The creation year, month and day, respectively, of the source path.
    An error is raised if the needed meta information `created_at` is not set on the path.

> The contructed output path must always be an absolute one, ie. it has to start at the root of the
> output directory. Therefore, the `:parent` part should always be included!
{.information}

Following are some examples of output path names for given source path names (assuming that `en` is
the default language and that the path is under a directory called `/img/`):

*   `output_path_style=[:parent, :cnbase, [., :lang], :ext]` (the default)

    *   `index.jpg` --> `/img/index.jpg`

        Since the source path is unlocalized, no language part is used and the whole sub array with
        the `:lang` symbol is dropped.

    *   `index.en.jpg` --> `/img/index.jpg`

        This happens if the configuration option `sourcehandler.default_lang_in_output_path` is
        `false` and no unlocalized version of this path exists.

    *   `index.en.jpg` --> `/img/index.en.jpg`

        Similar to the last example but this result occurs when there is an unlocalized version of
        the path which is naturally named `index.jpg`!

    *   `index.de.jpg` --> `/img/index.de.jpg`

        Since `de` is not the default language, the language part is always used!

*   `output_path_style=[:parent, :cnbase, :ext, ., :lang]`

    *   `index.jpg` --> `/img/index.jpg.`

        Be aware of the trailing dot since the `:lang` value is not defined in a sub array!

*   `output_path_style=[:parent, :year, /, :month, /, :cnbase, [., :lang], :ext]`

    *   `index.jpg` --> `/img/2008/09/index.jpg`

        This output path style can be used to create blog archive style output names.


## Path Patterns {#source-pathpattern}

Each source handler specifies path patterns which are used to locate the files that the class can
handle. Normally these patterns are used to match file extensions, however, they are much more
powerful. For detailed information on the structure of path patterns have a look at the
[Dir.glob](http://ruby-doc.org/core/classes/Dir.html#M002375) API documentation.

The path patterns that are handled by a particular source handler are stated on its documentation
page. These patterns can be changed by modfying the configuration option `sourcehandler.patterns`
although that is not recommended except in some few cases (for example, it is useful to add some
patterns for SourceHandler::Copy). The information about how these path patterns work are useful for
the usage of webgen because of two reasons:

* so that you know which files will be processed by a specific source handler class

* so that you can specify additional path patterns for some source handlers like the
  SourceHandler::Copy

Here are some example path patterns:

<table class="examples" markdown='1'>
<tr><th>Path Pattern</th><th>Result</th></tr>
<tr>
  <td>`*/*.html`</td>
  <td>All files with the extension `html` in the subdirectories of the source directory</td>
</tr>
<tr>
  <td>`**/*.html`</td>
  <td>All files with the extension `html` in all directories</td>
</tr>
<tr>
  <td>`**/{foo,bar}*`</td>
  <td>All files in all directories which start with `foo` or `bar`</td>
</tr>
<tr>
  <td>`**/???`</td>
  <td>All files in all directories whose file name is exactly three characters long</td>
</tr>
</table>


## Handling of files in the source directory {#source-handling}

Following is the list of rules how source files are handled by webgen:

* All path names of all sources specified in the configuration option `sources` are fetched. Prior
  listed sources have priority over later listed sources if both specify the same path.

* Those paths which match a pattern of the configuration option `sourcehandler.ignore` are excluded.

* The source handler classes are invoked according to the invocation order specified in
  `sourcehandler.invoke` and they use only those paths that match one of their path patterns
  specified in `sourcehandler.patterns`.

As you might have deduced from the processing list above, it is possible that one path is handled by
multiple source handlers. This can be used, for example, to render an XML file as HTML and copy it
verbatim.

Internally a tree structure is created reflecting the source directory hierarchy and each path that
is handled by webgen. Normally a source handler creates one node from one path but it is also
possible that a source handler creates multiple nodes from one path.

> The name used for describing a directory or file once it is placed in the internal tree structure
> is 'node'.
{.information}

After this internal tree structure is created, it is traversed and each node is processed. First the
node is checked if has changed (the notion of 'changed' depends on many things but a node is
changed, for example, if its meta information or the associated source path has changed since the
last webgen run). If it has not changed, nothing needs to be written. Otherwise, the information
needed to write out the node is gathered and its content is written to the output path represented
by the node.


## Path Meta Information {#source-metainfo}

Each path can have meta information, i.e. information about the path itself, associated with it, for
example the title of the path, if it should appear in a menu and so on. This meta information can be
specified in several ways, including:

* Source handlers can provide default meta information for their handled paths (set using the
  configuration option `sourcehandler.default_meta_info`)

* Some file types allow meta information to be specified directly in the file, for example page
  files in [Webgen Page Format]({relocatable: webgen_page_format.html}).

* Meta information can also be specified in special meta information backing files. These files are
  handled by the source handler SourceHandler::Metainfo.

There is clearly defined order in which meta information is applied to a node for a path:

1. The path gets read by a source and some meta information is extracted from the path name.

2. This meta information is overwritten with meta information specified for all source handlers and
   then with meta information specified for the source handler that handles the path.

3. Extensions can now overwrite meta information before the path gets handled by a source
   handler. For example, the SourceHandler::Metainfo assigns meta information specified for paths at
   this stage.

4. Before the node for the path gets created, meta information from the content of the path itself
   may overwrite the current meta information (this is the case, for example, with files in Webgen
   Page Format).

5. The node gets created with the provided meta information.

6. After the node has been created, extensions may overwrite meta information again. For example,
   the SourceHandler::Metainfo assigns meta information specified for nodes at this stage.


# webgen Extensions

webgen comes with many extensions that allow for rapid creation of static websites. The variety of
the extensions allows you to use your tools of choice, for example, your preferred markup
language. And if your choice is not available, you can write the extension for it yourself or make a
feature request!

Extensions are only loaded when they are needed. For example, if you don't use `.feed` files for
automatically generating atom/RSS feeds for your website, the source handler that handles these
files will never be loaded. Therefore you don't need to explicitly enable extensions, they just sit
in the background and wait till the webgen core needs them.

There are several types of extensions:

*   **Content Processors**: These extensions process content, for example, the content of files in
    Webgen Page Format. It is not specified how they have to process the content but this type of
    extension can basically be divided into two groups:

    * Markup processors: Processors like Maruku or RedCloth belong to this group and they convert
      markup text that is easy to read and write to a more structure format like HTML. This allows
      you to write an HTML page without knowing HTML.

    * String replacers: These processors normally process special strings and substitute them with
      other content. For example, the `erb` processors replaces delimited strings interpreted as
      Ruby code with the interpreted value. Another example would be webgen's `tags` processor which
      replaces strings like `\{relocatable: ../index.html}` with a processed value.

*   **Source Handler**: These extensions are used for handling source paths. They read the content of
    a path and produce one or more nodes (the internal representation of an output path, see [source
    handling]({relocatable: '#source-handling'}) for more information on nodes). So a source handler
    can do something simple like just copying a source path to the output directory but they can
    also do complex things like creating a whole image gallery from just one source path.

*   **Tags**: This type of extension allows the easy inclusion of dynamic content in, for example,
    page and template files. Each tag extension is used for a specific task like creating a menu or
    a breadcrumb trail.


# Extending webgen

If you know the programming language Ruby a little bit, you can easily extend webgen and add new
features that you need. All the needed developer documentation is available in the [API
documentation](rdoc/index.html), along with sample implementations for all major types of extensions
(source class, output classes, source handler class, content processors classes, tag classes, ...)
and detailed information about the inner workings of webgen. Have a look at the [extension
directory]({relocatable: '#website-ext'}) section for more information on naming files and where to
put the extensions.