Chapter 5. Summitting Translation Branches

Computer programs (though not only them) are sometimes concurrently developed and released from several branches. For example, there may be one "stable" branch, which sees only small fixes and from which periodical releases are made, and another, "development" branch, which undergoes larger changes and may or may not be periodically released as well; at some point, the development branch will become the new stable branch, and the old stable branch will be abandoned. There may also be more than two branches which see active work, such as "development", "stable", and "old stable".

From the programmers' point of view, working by branches can be very convenient. They can freely experiment with new features in the development branch, without having to worry that they will mess something up in the stable branch, from which periodical releases are made. In the stable branch they may fix some bugs discovered between the releases, or carry over some important and well-tested features from the development branch. For users who want to be on the cutting edge, they may provide experimental releases from the development branch.

For translators, however, having to deal with different branches of the same collection of PO files is rarely a convenience. It is text to be translated just as any other, only duplicated across two or more file hierarchies. This means that translators additionally have to think about how to make sure that new and modified translations made in one branch appear in other branches too. It gets particularly ugly if there are mismatches in PO file collections in different branches, like when a PO file is renamed, split into two or more PO files, or merged into another PO file.[14] Sometimes this branch juggling is not necessary; in a strict two-branch setting, translators may choose to work only on the stable branch, and switch to the next stable branch when it gets created (or switch to the development branch shortly before it becomes stable). Even so, branch switching may not go very smoothly in the presence of mismatches in PO file collections.

Instead, for translators the most convenient would be to work on a single, "supercollection" of PO files, from which new and modified translations would be automatically periodically sent to appropriate PO files in branches. Such a supercollection can be created and maintained by Pology's posummit script. In terms of this script, the supercollection is called the summit, the operation of creating and updating it is called gathering, and the operation of filling out branch PO files is called scattering.

What do summit PO files look like? When all branches contain the same PO file, the counterpart summit PO file is simply the union of all messages from branch PO files. A message in the summit PO file differs from branch messages only by having the special #. +> ... comment, which lists the branches that contain this message. If there were two branches, named with devel and stable keywords, an excerpt from a summit PO file could be:

#. +> devel
#: kdeui/jobs/kwidgetjobtracker.cpp:469
msgctxt "the destination URL of a job"
msgid "Destination:"
msgstr ""

#. +> stable
#: kdeui/jobs/kwidgetjobtracker.cpp:469
msgid "Destination:"
msgstr ""

#. +> devel stable
#: kdeui/jobs/kwidgetjobtracker.cpp:517
msgid "Keep this window open after transfer is complete"
msgstr ""

The first message above exists only in the development branch, the second only in the stable branch, and the third in both branches. The source reference always refers to the source file in the first listed branch. Any other extracted comments (#.) are also taken from the first listed branch.

Note that the first two messages differ only in context. The context was added in the development branch, but not in the stable branch, probably in order not to break the message freeze. However, due to the special ordering of messages in summit PO files, these two messages appear together, allowing the translator to immediately make the correction in the stable branch too if the new context in the development branch shows it to be necessary.
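The union-with-branch-list idea can be sketched in a few lines of Python. This is only a toy model, not how posummit is implemented: the function name gather and the representation of a message as a (msgctxt, msgid) pair are assumptions made for illustration, and the special message ordering described above is not reproduced.

```python
def gather(branches):
    """Toy gather: map each message key to the branches containing it.

    branches maps a branch name to a list of (msgctxt, msgid) keys,
    where msgctxt is None for messages without context.
    """
    summit = {}
    for branch_name, keys in branches.items():
        for key in keys:
            # The first branch to contribute a message is listed first.
            summit.setdefault(key, []).append(branch_name)
    return summit


branches = {
    "devel": [
        ("the destination URL of a job", "Destination:"),
        (None, "Keep this window open after transfer is complete"),
    ],
    "stable": [
        (None, "Destination:"),
        (None, "Keep this window open after transfer is complete"),
    ],
}

# Print a branch comment and the msgid for every summit message.
for (msgctxt, msgid), names in gather(branches).items():
    print("#. +> " + " ".join(names), "|", msgid)
```

Messages that differ only by context get separate entries, each with its own branch list, which matches the three messages in the excerpt.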

When a PO file from one branch has a different name in another branch, or several PO files from one branch are represented with a single PO file in another branch, the summit can still handle it gracefully, by manually mapping branch PO files to summit PO files. One branch PO file can be mapped to one or more summit PO files, and several branch PO files can be mapped to one summit PO file. Usually, but not necessarily, one branch (e.g. the development branch) is taken as reference for the summit file organization, and stray PO files from other branches are mapped accordingly.

If a team of translators works in the summit, it is sufficient that one team member (and possibly another one as backup) manages the summit. After the initial setup, this team member should periodically run posummit to update summit and branch PO files. All other team members can simply translate the summit PO files, oblivious of any summit operations behind the scenes. It is also possible that team members perform summit operations on their own, on a subset of PO files that they are about to work on. It is up to the team to agree upon the most convenient workflow.

5.1. Setting Up The Summit with posummit

There are two major parts in setting up the summit: linking locations and organization of PO files in the branches to that of the summit, and deciding what summit mode will be used.

Great flexibility is possible in linking branches to the summit, but at the expense of possibly heavy configuring. To make it simpler, currently there are two types of branch organization which can be handled automatically, just by specifying a few paths and options. In the by-language branch organization, PO files in branches are grouped by language and their file names reflect their domain names:

devel/                  # development branch
    aa/                 # language A
        alpha.po
        bravo.po
        charlie.po
        ...
    bb/                 # language B
        alpha.po
        bravo.po
        charlie.po
        ...
    ...
    templates/          # templates
        alpha.pot
        bravo.pot
        charlie.pot
        ...
stable/                 # stable branch
    aa/
        ...
    bb/
        ...
    templates/
        ...
...

The other organization that can be automatically handled is by-domain:

devel/                  # development branch
    alpha/              # domain alpha
        aa.po           # language A
        bb.po           # language B
        ...
        alpha.pot       # template
    bravo/
        aa.po
        bb.po
        ...
        bravo.pot
    charlie/
        aa.po
        bb.po
        ...
        charlie.pot
    ...
stable/                 # stable branch
    alpha/
        ...
    bravo/
        ...
    charlie/
        ...
...

In both organizations, there can be any number of subdirectories in the middle, between the branch top directory and the directory where the PO files are. For example, in by-language organization there could be some categorization:

path/to/devel/
    aa/
        utilities/
            alpha.po
            bravo.po
            ...
        games/
            charlie.po
            ...
    bb/
        ...

while in by-domain organization the domain directories could be within their respective sources[15]:

devel/
    appfoo/
        src/
        doc/
        po/
            foo/
                aa.po
                bb.po
                ...
                foo.pot
            libfoo/
                aa.po
                bb.po
                ...
                libfoo.pot
        ...
    appbar/
        ...

There are three possible summit modes: direct summit, summit over dynamic templates, and summit over static templates. In the direct summit, only branch PO files are processed, in that new and modified messages are gathered from them and summit translations scattered to them. In summit over dynamic templates, messages from branch PO files are gathered only once, at creation of the summit; after that, it is branch templates (POT files) that are gathered into summit templates, and then summit PO files are merged with them. Summit templates are not actually seen, but are gathered internally when the merging command is issued and removed after merging is done. Summit over static templates is quite the same, except that summit templates are explicitly gathered and kept around, and merging is done separately.

What is the reason for having three summit modes to choose from? Direct summit mode is there because it is the easiest to explain and understand, and does not require that branches contain templates. It is however not recommended, for two reasons. Firstly, someone may mistakenly translate directly in a branch[16], and those translations may be silently gathered into the summit. This is bad for quality control (review, etc.), as it is expected that the summit is the sole source of translations. Secondly, you may want to perform some automatic modifications on translation when scattering, but not to get those modifications back into the summit on gathering, which would happen with direct summit. These issues are avoided by using summit over dynamic templates, though now branches must provide templates. Finally, summit over static templates makes sense when several language teams share the summit setup: since gathering is the most complicated operation and sometimes requires manual intervention, it can be done once (by one person) on summit templates, while language teams can then merge and scatter their summits in a fully automatic fashion.

There is one important design decision which holds for all summit modes: all summit PO files must be unique by domain name (i.e. base file name without extension), even if they are in different subdirectories within the summit top directory. This in turn means that in automatically supported branch organizations (by-domain and by-language) PO domains should be unique as well.[17] This was done for two reasons. Less importantly, it is convenient to be able to identify a summit PO file simply by its domain name rather than the full path (especially in some posummit invocations). More importantly, uniqueness of domain names allows PO files to be located in different subdirectories in different branches. This happens, for example, in large projects in which code moves between modules. If branches do not satisfy this property, i.e. they contain same-name PO domains with totally different content, it is necessary to define a path transformation (see Section 5.1.4, “Transforming Branch Paths”) which will produce unique domain names with respect to the summit.
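Before setting up a summit, it can be useful to check whether a branch already satisfies the uniqueness property. The following Python sketch does that for a by-language branch, where the file name is the domain name. The helper find_duplicate_domains is hypothetical and not part of Pology.

```python
import os
from collections import defaultdict


def find_duplicate_domains(topdir):
    """Map each domain name found more than once under topdir to its paths."""
    paths_by_domain = defaultdict(list)
    for root, dirs, files in os.walk(topdir):
        for name in files:
            if name.endswith(".po"):
                # Domain name is the base file name without extension.
                domain = name[:-len(".po")]
                paths_by_domain[domain].append(os.path.join(root, name))
    return {domain: sorted(paths)
            for domain, paths in paths_by_domain.items()
            if len(paths) > 1}
```

An empty result means that every PO file under the given directory has a unique domain name; any reported domain needs either a look at why the files repeat or a path transformation.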

The following sections describe how to set up each of the modes, in each of the outlined branch organizations. They should be read in turn up to the mode that you want to use, because they build upon each other.

5.1.1. Setting Up Direct Summit

Let us assume that branches are organized by-language, that branch top directories are in the same parent directory, and that you want the summit top directory to be on the level of branch parent directory. That is:

branches/
    devel-aa/
        alpha.po
        bravo.po
        ...
    stable-aa/
        alpha.po
        bravo.po
        ...
summit-aa/
    alpha.po
    bravo.po
    ...
    summit-config

aa is the language code, which can be added for clarity, but is not necessary. It could also be a subdirectory, as in branches/devel/aa and summit/aa. At start you have the branches/ directory ready; now you create the summit-aa/ directory, and within it the summit configuration file summit-config with the following content:

S.lang = "aa"

S.summit = dict(
    topdir=S.relpath("."),
)

S.branches = [
    dict(id="devel",
         topdir=S.relpath("../branches/devel-aa")),
    dict(id="stable",
         topdir=S.relpath("../branches/stable-aa")),
    # ...and any other branches.
]

S.mappings = [
]

This is all that is necessary to set up a direct summit. The configuration file must be named exactly summit-config, because posummit will look for a file with that name through parent directories and automatically pick it up. As you may have recognized, summit-config is actually a Python source file; posummit will insert the special S object when evaluating summit-config, and it is through this object that summit options are set. S.lang states the language code of the summit. S.summit is a Python dictionary that holds options for the summit PO files (here only its location, through the topdir= key), while S.branches is a list of dictionaries, each specifying options per branch (here the branch identifier by the id= key and the top directory). The S.relpath function is used to make file and directory paths relative to summit-config itself. S.mappings is a list of PO file mappings, for cases of splitting, merging, and renaming between branches. In this example S.mappings is set to empty only to point out its importance, but it does not need to be present if there are no mappings.
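To illustrate the mechanism of a configuration file that is evaluated as Python with an injected S object, here is a minimal sketch. It is not posummit's actual implementation: the StandInConfig class, the load_config function, and the choice to have relpath return a normalized absolute path are all assumptions made for the example, as is the /work/... location.

```python
import os


class StandInConfig:
    """Hypothetical stand-in for the S object; not posummit's real class."""

    def __init__(self, config_path):
        self._config_dir = os.path.dirname(os.path.abspath(config_path))

    def relpath(self, path):
        # Resolve a path relative to the directory of the config file.
        return os.path.normpath(os.path.join(self._config_dir, path))


def load_config(config_path, config_text):
    """Evaluate config_text with S bound to a stand-in object; return it."""
    S = StandInConfig(config_path)
    exec(compile(config_text, config_path, "exec"), {"S": S})
    return S


config_text = '''
S.lang = "aa"
S.summit = dict(topdir=S.relpath("."))
S.branches = [
    dict(id="devel", topdir=S.relpath("../branches/devel-aa")),
    dict(id="stable", topdir=S.relpath("../branches/stable-aa")),
]
'''

S = load_config("/work/summit-aa/summit-config", config_text)
print(S.lang, S.summit["topdir"])
for branch in S.branches:
    print(branch["id"], branch["topdir"])
```

The point of the sketch is only that options become plain attributes on S, and that paths written relative to the configuration file do not depend on the directory from which the command is run.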

If branches are organized by-domain, the summit tree will still look the same, with summit PO files named by domain, while branch PO files are named by language:

branches/
    devel/
        alpha/
            aa.po
            bb.po
            ...
        bravo/
            aa.po
            bb.po
            ...
        ...
    stable/
        alpha/
            aa.po
            bb.po
            ...
        bravo/
            aa.po
            bb.po
            ...
        ...
summit-aa/
    alpha.po
    bravo.po
    ...
    summit-config

The only difference in the summit configuration is the addition of by_lang= keys into the branch dictionaries:

S.branches = [
    dict(id="devel",
         topdir=S.relpath("../branches/devel"),
         by_lang=S.lang),
    dict(id="stable",
         topdir=S.relpath("../branches/stable"),
         by_lang=S.lang),
]

Presence of the by_lang= key signals that the branch is organized by-domain (i.e. PO files named by language), and the value is the language code within the branch. Normally it is set to the previously defined S.lang, but it can also be something else in case different codes are used between the branches or between the branches and the summit.
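The difference between the two organizations comes down to where a given domain's PO file sits inside a branch. A small Python sketch of that rule, using a hypothetical helper branch_po_path that is not part of Pology:

```python
import os


def branch_po_path(topdir, domain, by_lang=None, subdir=""):
    """Hypothetical helper: where a domain's PO file sits in a branch."""
    if by_lang is None:
        # By-language: the file is named after the domain.
        return os.path.join(topdir, subdir, domain + ".po")
    # By-domain: one directory per domain, file named after the language.
    return os.path.join(topdir, subdir, domain, by_lang + ".po")


print(branch_po_path("branches/devel-aa", "alpha"))
print(branch_po_path("branches/devel", "alpha", by_lang="aa"))
```

The optional subdir argument stands for any intermediate subdirectories between the branch top directory and the PO files.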

When the configuration file has been written, the summit can be gathered for the first time (i.e. summit PO files created):

$ cd .../summit-aa/
$ posummit gather --create

The path of each created summit PO file will be written out, along with paths of branch PO files from which messages were gathered into the summit file. After the run is finished, the summit is ready for use.

While this was sufficient to set up a summit, there is a myriad of options available for specialized purposes, which will be presented throughout this chapter. Also, given that the summit configuration file is Python code, you can add into it any scripting that you wish. Some summit options (defined through the S object) even take Python functions as values.

5.1.2. Setting Up Summit over Dynamic Templates

Again consider by-language organization of branches, similar to the direct summit example above, except that now template directories too must be present in branches:

branches/
    devel/
        aa/
            alpha.po
            bravo.po
            ...
        templates/
            alpha.pot
            bravo.pot
            ...
    stable/
        aa/
            alpha.po
            bravo.po
            ...
        templates/
            alpha.pot
            bravo.pot
            ...
summit-aa/
    alpha.po
    bravo.po
    ...
    summit-config

Here the language PO files and templates are put in subdirectories within the branch directory only for convenience, but this is not mandatory. For example, language files could reside in branches/devel-aa and templates in branches/devel-templates; no path connection is required between the two. This is because the template path per branch is explicitly given in summit-config, which would look like this:

S.lang = "aa"
S.over_templates = True

S.summit = dict(
    topdir=S.relpath("."),
)

S.branches = [
    dict(id="devel",
         topdir=S.relpath("../branches/devel/aa"),
         topdir_templates=S.relpath("../branches/devel/templates")),
    dict(id="stable",
         topdir=S.relpath("../branches/stable/aa"),
         topdir_templates=S.relpath("../branches/stable/templates")),
]

S.mappings = [
]

Compared to the configuration of a direct summit, two things are added here. S.over_templates option is set to True to indicate that summit over templates is used. The path to templates is set with topdir_templates= key for each branch.

In by-domain branch organization, the directory tree looks just the same as for direct summit, except that each domain directory also contains the templates:

branches/
    devel/
        alpha/
            aa.po
            bb.po
            ...
            alpha.pot
        bravo/
            aa.po
            bb.po
            ...
            bravo.pot
        ...
    stable/
        alpha/
            aa.po
            bb.po
            ...
            alpha.pot
        bravo/
            aa.po
            bb.po
            ...
            bravo.pot
        ...
summit-aa/
    alpha.po
    bravo.po
    ...
    summit-config

Summit configuration is modified in the same way as it was for the direct summit, by adding the by_lang= key to branch specifications:

S.branches = [
    # In by-domain organization, PO files and templates share
    # the same domain directories under the branch top directory.
    dict(id="devel",
         topdir=S.relpath("../branches/devel"),
         topdir_templates=S.relpath("../branches/devel"),
         by_lang=S.lang),
    dict(id="stable",
         topdir=S.relpath("../branches/stable"),
         topdir_templates=S.relpath("../branches/stable"),
         by_lang=S.lang),
]

Initial gathering of the summit is done slightly differently compared to the direct summit:

$ cd .../summit-aa/
$ posummit gather --create --force

The --force option must be used here because, unlike in direct summit, explicit gathering is not regularly done in summit over dynamic templates.

5.1.3. Setting Up Summit over Static Templates

As mentioned earlier, summit over static templates can be used when several language teams want to share the summit setup, for the reasons of greater efficiency. The branch directory tree looks exactly the same as in summit over dynamic templates (with several languages being present), but the summit tree is somewhat different:

branches/
    # as before, either by-language or by-domain
summit/
    summit-config-shared
    aa/
        alpha.po
        bravo.po
        ...
    bb/
        alpha.po
        bravo.po
        ...
    templates/
        alpha.pot
        bravo.pot
        ...

First of all, there is now the summit/ directory which contains subdirectories by language (the language summits) and one subdirectory for summit templates (the template summit). Then, there is no longer a summit-config file, but summit-config-shared; the name can actually be anything, so long as it is not exactly summit-config. This is in order to prevent posummit from automatically picking it up, as now the configuration is not tied to a single language summit. Instead, the path to the configuration file and the language code are explicitly given as arguments to posummit.

The configuration file for by-language branches looks like this:

S.over_templates = True

S.summit = dict(
    topdir=S.relpath("%s" % S.lang),
    topdir_templates=S.relpath("templates"),
)

S.branches = [
    dict(id="devel",
         topdir=S.relpath("../branches/devel/%s" % S.lang),
         topdir_templates=S.relpath("../branches/devel/templates")),
    dict(id="stable",
         topdir=S.relpath("../branches/stable/%s" % S.lang),
         topdir_templates=S.relpath("../branches/stable/templates")),
]

S.mappings = [
]

Compared to summit over dynamic templates, here S.lang is no longer hardcoded in the configuration file, but set at each run of posummit through the command line. This means that paths of language directories too have to be dynamically adapted based on S.lang, hence the string interpolations "...%s..." % S.lang.

For by-domain branches, again simply by_lang= keys are added to branches:

S.branches = [
    # In by-domain organization the language is selected by by_lang=,
    # so branch paths point to the branch top directories.
    dict(id="devel",
         topdir=S.relpath("../branches/devel"),
         topdir_templates=S.relpath("../branches/devel"),
         by_lang=S.lang),
    dict(id="stable",
         topdir=S.relpath("../branches/stable"),
         topdir_templates=S.relpath("../branches/stable"),
         by_lang=S.lang),
]

In summit over static templates mode, initial gathering is first done for summit templates, like this:

$ cd .../summit/
$ posummit summit-config-shared templates gather --create

The first two arguments are now the path to the configuration file and the language code, where templates is the dummy language code for templates[18]. After this is finished, language summits can be gathered:

$ posummit summit-config-shared aa gather --create --force
$ posummit summit-config-shared bb gather --create --force
$ ...

Note that --force was not needed when gathering templates, because in this mode the template summit is periodically gathered, while language summits are not.

5.1.4. Transforming Branch Paths

When branches contain only PO files which are used natively, by programs fetching translations at runtime, then all branch PO files will be unique by their domain name (as mandated by the Gettext runtime system). It will not happen that two branch subdirectories contain a PO file with the same name. This fits perfectly with the summit requirement that all summit PO files be unique by domain names.

However, if PO files are used as an intermediate to other formats, branches may contain same-name PO files in different subdirectories which otherwise have nothing in common. For example, each subdirectory may contain a PO file named index.po, help.po, etc. If this were left unattended, all the same-name PO files would be collapsed into a single summit PO file, which makes no sense given that they have (almost) no common messages. For this reason, it is possible to define transformations which modify absolute branch paths during processing, such that branch PO files are seen with unique names.

Consider the following example of two branches for language aa (i.e. by-language organization) with PO files non-unique by domain name:

branches/
    devel-aa/
        chapter1/
            intro.po
            glossary.po
            ...
        chapter2/
            intro.po
            glossary.po
            ...
        ...
    stable-aa/
        chapter1/
            intro.po
            glossary.po
            ...
        chapter2/
            intro.po
            glossary.po
            ...
        ...

These branches cover some sort of a book, where each chapter has some standard elements, and thus some same-name PO files with totally different content in each chapter's subdirectory. To have unique domain names in the summit, you might decide upon a flat file tree with chapter in prefix:

summit-aa/
    chapter1-intro.po
    chapter1-glossary.po
    ...
    chapter2-intro.po
    chapter2-glossary.po
    ...
    summit-config

To achieve this, you must first write two Python functions (remember that the summit configuration file is a normal Python source file), one to split branch paths and another to join them, and add them to branch specifications in S.branches.

The function to split branch paths takes a single argument, the branch PO file path relative to the branch top directory, and returns the summit PO domain name and the summit subdirectory. For the example above, the splitting function would look like this:

def split_branch_path (subpath):
    import os
    filename = os.path.basename(subpath)      # get PO file name
    domain0 = filename[:filename.rfind(".")]  # strip .po extension
    subdir0 = os.path.dirname(subpath)        # get branch subdirectory
    domain = subdir0 + "-" + domain0          # set final domain name
    subdir = ""                               # set summit subdirectory
    return domain, subdir

Note that the branch subdirectory was used only to construct the summit domain name, while the summit subdirectory is an empty string because the summit file tree should be flat.

The function to join branch paths takes three arguments. The first two are the summit PO domain name and the summit subdirectory. The third argument is the value of the by_lang= key for the given branch. The return value is the branch PO file path relative to the branch top directory. It would look like this:

def join_branch_path (domain, subdir, bylang):
    import os
    subdir0, domain0 = domain.split("-", 1)    # get branch domain name
                                               # and branch subdirectory
                                               # from summit domain name
    filename = domain0 + ".po"                 # branch PO file name
    subpath = os.path.join(subdir0, filename)  # branch relative path
    return subpath

Here the subdir argument (summit subdirectory) is not used because it is always empty due to the flat summit file tree, and bylang is not used because it is None due to by-language branch organization.

The definitions of splitting and joining functions are written into the summit-config file somewhere before the S.branches branch specification, and added to each branch through transform_path= key:

S.branches = [
    dict(id="devel",
         topdir=S.relpath("../branches/devel-aa"),
         transform_path=(split_branch_path, join_branch_path)),
    dict(id="stable",
         topdir=S.relpath("../branches/stable-aa"),
         transform_path=(split_branch_path, join_branch_path)),
]

This means that it is possible, if necessary, to define different splitting and joining functions per branch.
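Since the two functions must be inverses of each other for the scheme to work, it may be worth checking that before running posummit. The sketch below repeats the two functions from this section in compact form and adds a hypothetical helper, check_round_trip, which is not part of Pology.

```python
import os


def split_branch_path(subpath):
    filename = os.path.basename(subpath)      # get PO file name
    domain0 = filename[:filename.rfind(".")]  # strip .po extension
    subdir0 = os.path.dirname(subpath)        # get branch subdirectory
    return subdir0 + "-" + domain0, ""        # summit domain, flat subdir


def join_branch_path(domain, subdir, bylang):
    subdir0, domain0 = domain.split("-", 1)   # split at the first hyphen
    return os.path.join(subdir0, domain0 + ".po")


def check_round_trip(subpaths):
    """Return the subpaths for which join(split(p)) does not give p back."""
    failures = []
    for subpath in subpaths:
        domain, subdir = split_branch_path(subpath)
        if join_branch_path(domain, subdir, None) != subpath:
            failures.append(subpath)
    return failures


paths = [os.path.join("chapter1", "intro.po"),
         os.path.join("chapter2", "glossary.po")]
print(check_round_trip(paths))
```

The check also exposes a limitation of this particular pair: because joining splits at the first hyphen, a branch subdirectory that itself contains a hyphen (say, part-1/) does not survive the round trip, and a different separator would be needed.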

5.2. Maintaining the Summit

From time to time, summit PO files need to be updated to reflect changes in branch PO files, and scattered so that branch PO files get new translations from the summit. How summit PO files are updated, by whom, and to what extent depends on the summit mode and the organization of the translation team. The same holds for when and by whom the scattering is done.

5.2.1. Centralized Summit Maintenance

The usual maintenance procedure would be for one designated person (e.g. the team coordinator) to update all summit PO files and to scatter new translations to branch PO files, at certain periods of time agreed upon in the translation team.

If there are no mismatches between the branch and summit PO files, the summit update procedure is fully automatic. How the summit is updated depends on the summit mode. In direct summit, the update is performed by gathering:

$ cd $SUMMITDIR
$ posummit gather

In summit over dynamic templates, merging is performed instead:

$ cd $SUMMITDIR
$ posummit merge

Finally, in summit over static templates, first the template summit is gathered, and then language summits are merged:

$ posummit $SOMEWHERE/summit-config-shared templates gather
$ posummit $SOMEWHERE/summit-config-shared aa merge
$ posummit $SOMEWHERE/summit-config-shared bb merge
...

Note that unlike when setting up the summit, no --create or --force options are used. Without them, posummit will warn about any new mismatches between branches and the summit and abort the operation, leaving the user to examine the situation and take corrective measures. Section 5.2.3, “Handling Mismatches Between Branches and Summit” discusses this in detail.
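The update commands for the three modes can be summarized in a small Python sketch, which may be handy when the update is scripted. The function update_commands and the mode labels "direct", "dynamic" and "static" are names chosen for this illustration only; the command lines themselves are the ones shown above.

```python
def update_commands(mode, config=None, langs=()):
    """Return the posummit command lines for a routine summit update."""
    if mode == "direct":
        return [["posummit", "gather"]]
    if mode == "dynamic":
        return [["posummit", "merge"]]
    if mode == "static":
        if config is None:
            raise ValueError("static mode needs the shared config path")
        # Gather the template summit first, then merge each language summit.
        commands = [["posummit", config, "templates", "gather"]]
        commands += [["posummit", config, lang, "merge"] for lang in langs]
        return commands
    raise ValueError("unknown summit mode: %r" % mode)


for command in update_commands("static", "summit-config-shared", ["aa", "bb"]):
    print(" ".join(command))
```

For the direct and dynamic modes the commands are expected to be run from within the summit directory, as in the examples above.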

Scattering to branches is always fully automatic. For direct summit and summit over dynamic templates it is performed with:

$ cd $SUMMITDIR
$ posummit scatter

For summit over static templates, scattering is done for each language summit:

$ posummit $SOMEWHERE/summit-config-shared aa scatter
$ posummit $SOMEWHERE/summit-config-shared bb scatter
...

If summit update (merge, gather, or both, depending on the summit mode) is scheduled to run automatically, the maintainer should make sure to be notified when posummit aborts, so that mismatches can be promptly handled.
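One way to get such notifications is to wrap each update command in a small script. The Python sketch below assumes that posummit exits with a non-zero status when it aborts; this is an assumption about its behavior, not something stated in this chapter, so it should be verified before relying on it. The function run_update and the notify callback are hypothetical.

```python
import subprocess


def run_update(command, notify, cwd=None):
    """Run one update command; call notify(command, output) if it fails."""
    result = subprocess.run(command, cwd=cwd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True)
    if result.returncode != 0:
        # Assumed: a non-zero exit status means that the run was aborted.
        notify(command, result.stdout)
        return False
    return True
```

A scheduled job could then call, for example, run_update(["posummit", "merge"], send_mail, cwd=summit_dir), where send_mail and summit_dir are whatever notification function and summit path suit the team.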

The obvious advantage of this maintenance method is that other team members do not need to know anything about workings of the summit. They only fetch updated summit PO files, translate them, and submit them back. The disadvantage is that summit update may interfere with a particular translator who happened to be working on a PO file which just got updated in the repository, causing merge conflicts when he attempts to submit that PO file.

5.2.2. Distributed Summit Maintenance

In this maintenance mode, each team member performs summit operations on exactly the PO files that he wants to work on. This has the advantage over centralized maintenance in that translators do not interfere in each other's work, as summit PO files get updated only at the request of the translator working on them. Additionally, it may provide faster gather(-merge)-scatter turnaround time. Unfortunately, the disadvantage is that now all team members have to know how the summit is maintained, so this method is likely applicable only to strongly technical teams.

Distributed maintenance is in general the same as centralized, except that now all posummit command lines take extra arguments, namely the selection of PO files to operate on, the so-called operation targets. Operation targets can be given in two ways. One is directly by file or directory paths. For example, in summit over dynamic templates mode, when working on the foobaz.po file, the translator would use the following summit commands to merge it and scatter to the branches:

$ cd $SUMMITDIR
$ posummit merge foosuite/foobaz.po
$ # ...update the translation...
$ posummit scatter foosuite/foobaz.po

To update all files in foosuite/ subdirectory at once, the translator can execute instead:

$ cd $SUMMITDIR
$ posummit merge foosuite/
$ posummit scatter foosuite/

It is also possible to single out a particular branch for scattering, by giving the path to the PO file in that branch instead of the summit. To scatter foobaz.po only to devel branch, in by-language branch organization the translator would use:

$ posummit scatter $SOMEWHERE/devel/aa/foosuite/foobaz.po

and in by-domain branch organization:

$ posummit scatter $SOMEWHERE/devel/foosuite/foobaz/po/foobaz/aa.po

Note that the current working directory still has to be within the summit directory, so that posummit can find the summit configuration file. (This requirement is not present for summit over static templates, as there the path to the configuration file is given on the command line.)

The other kind of operation targets are PO domain names and subdirectory names alone. In this formulation, the first example above could be replaced with:

$ posummit merge foobaz
$ posummit scatter foobaz

Since all summit PO file names are unique, this is sufficient information for posummit to know what it should operate on. To limit operation to a certain branch, the branch name is added in front of the domain names, separated by a colon. To scatter foobaz.po to devel branch:

$ posummit scatter devel:foobaz

and to scatter the complete foosuite/ subdirectory to the same branch:

$ posummit scatter devel:foosuite/

Note that the trailing slash is significant here, since otherwise the argument would be interpreted as a single PO file (posummit would exit with an error, reporting that such a file does not exist). The summit also has a "branch name" assigned for use in operation targets of this kind, and that is +.
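The syntax of this second kind of operation target can be summarized with a short Python sketch. It only mirrors the rules described above (optional branch name before a colon, trailing slash for subdirectories); the function parse_target is hypothetical, it is not how posummit parses its arguments, and it does not handle targets given as file or directory paths.

```python
def parse_target(target):
    """Split a name-based operation target into (branch, name, is_directory)."""
    branch = None
    name = target
    if ":" in target:
        # The branch name comes first, separated by a colon.
        branch, name = target.split(":", 1)
    # A trailing slash marks a subdirectory rather than a PO domain.
    is_directory = name.endswith("/")
    return branch, name.rstrip("/"), is_directory


for target in ["foobaz", "devel:foobaz", "devel:foosuite/", "+:foobaz"]:
    print(target, parse_target(target))
```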

When merging (or gathering in direct summit mode) is attempted, posummit may abort with the report of mismatches between branches and the summit. The translator must then make the adjustments (Section 5.2.3, “Handling Mismatches Between Branches and Summit” describes how, case by case), or report it to someone else to handle.

After selected summit and branch PO files have been updated, the translator can commit them. Alternatively, a half-distributed workflow could be used, where translators only update and commit summit PO files, while scattering to branches is centralized, and automatically performed at a given period. This makes sense because the scattering in no way interferes with translators' workflow and never needs any manual intervention.

5.2.3. Handling Mismatches Between Branches and Summit

When something changes in the PO file tree in one of the branches, posummit will by default abort gathering (or merging in summit over dynamic templates), and present a list of its findings. At this point posummit could be made to continue by issuing the --create option, but then it will resolve mismatches in a simplistic way, which will be wrong in many cases. Instead, you should examine what has happened in the branches, possibly manually perform some operations on summit PO files and possibly add some branch-to-summit mappings, and rerun posummit after the necessary adjustments have been made.

Typical mismatches and their remedies are as follows:

A branch PO file has been moved to another subdirectory (moving).

In a translation project with modules represented by subdirectories, it may happen that a program or a library is moved from one module to another, with its PO files following the move. If this happened in all branches, posummit will report that the summit PO file should be moved as well; it can be rerun with --create to do the move itself, or you can make the move manually. If the move happened in only one of the branches, posummit will not complain at all; more precisely, if at least one branch PO file is in same relative subdirectory as the summit PO file, it is not considered a mismatch.

Another, less obvious case of moving may arise when two same-named branch PO files appear in different subdirectories of the same branch. posummit will by default simply gather them into a single summit PO file, without reporting anything. However, it may be that one of the two subdirectories is of higher priority for translation. Then it would be better if the summit PO file were located in that subdirectory, and if posummit reported when that is not the case, or made the move itself under --create. Subdirectory precedence can be specified through the S.subdir_precedence field, which is simply a list of subdirectories:

S.subdir_precedence = [
    "library",
    "application",
    "plugins/base",
    ...
]

Earlier subdirectories in the list have higher precedence. If a subdirectory is below one of the listed subdirectories, it will have the same precedence as its listed top directory. If a subdirectory is neither listed nor below any of the listed subdirectories, its precedence will be lower than that of all listed subdirectories.
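
The precedence rule can be restated as a small function. This is a minimal sketch of the rule as described here, not the code posummit uses; subdir_rank is a hypothetical name, and a lower rank means higher precedence:

```python
def subdir_rank(subdir, precedence_list):
    """Return the precedence rank of subdir (lower rank wins).

    A subdirectory equal to or below a listed one gets the rank of the
    listed one; anything else ranks after all listed subdirectories.
    """
    parts = subdir.strip("/").split("/")
    for rank, listed in enumerate(precedence_list):
        listed_parts = listed.strip("/").split("/")
        # Compare whole path components, so "libraryx" is not below "library".
        if parts[:len(listed_parts)] == listed_parts:
            return rank
    return len(precedence_list)


precedence = ["library", "application", "plugins/base"]
for subdir in ("library", "plugins/base/extra", "plugins/other"):
    print(subdir, "->", subdir_rank(subdir, precedence))
```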

A totally new branch PO file has been added (addition).

When a piece of software appears (created or imported) in the project, its PO files will appear with it. These PO files are "totally" new, in the sense that they are not derived from any existing PO file. In this case, posummit will report that new branch PO files have no corresponding summit PO files, and expected paths of the missing summit PO files. After having checked that the branch PO files are indeed totally new, you can rerun posummit with --create, or manually copy branch PO files to expected summit paths (they will be equipped with summit-specific information when posummit rolls over them).

A branch PO file has been removed (removal).

A piece of software may be removed from the project (not maintained any more, moved to another project), which will cause its PO files to disappear. posummit will then report that some summit PO files have no corresponding branch PO files. You should check that branch PO files have indeed been simply removed, and then rerun posummit with --create, or manually remove summit PO files.

A branch PO file has been renamed (renaming).

When, for example, a program changes its name, normally its PO file will be renamed as well. What will happen in this case is that posummit will report two problems: a branch PO file without corresponding summit PO file (new name), and a summit PO file without any corresponding branch PO files (old name). When you realize that the cause of these paired reports is indeed renaming (they could also be an unrelated addition and removal), you must rename the summit PO file manually. Note that if you had not done this and issued --create option instead, the existing summit PO file would have been removed, and an empty one with the new name created -- definitely not what was desired.

A more complicated case of renaming is when the name is changed in only one branch. posummit then reports only the branch PO file with the new name as having no summit PO file, since the existing summit PO file matches non-renamed branch PO files. In this case, the usual correction is to rename the summit PO file to new name and map old names from other branches to the new name. If foobaz.po was renamed to fooqwyx.po in devel branch, but kept its name in stable, then the mapping in the summit configuration file would be:

S.mappings = [
    ...
    ("stable", "foobaz", "fooqwyx"),
    ...
]

Each mapping entry is a sequence of strings in parentheses. The first string is the branch name, the second string is the domain name of the PO file in that branch, and the third string is the domain name of the PO file in the summit. When you add this mapping (and rename summit foobaz.po to fooqwyx.po), you can rerun posummit.

If the summit is over static templates, i.e. there are separate template and language summits, then renamings should be done in all of them.

A branch PO file has been split into several files (splitting).

If a single PO file becomes very big, it may be split into several smaller files by categories of messages (e.g. UI and help texts). A program may also be modularized, in which case the factored-out modules may take over part of the messages from the main PO file into their own PO files. Either way, posummit will again simply report that some new branch PO files have appeared and possibly that some have disappeared, and it is up to you to recognize that the cause of this is a splitting. Splitting typically happens in the newest branch, but not in older branches. You should then make the same split in summit PO files and map the monolithic PO file from older branches to the newly split summit files. For example, if foobaz.po in devel branch got split into foobaz.po (of reduced size), libfoobaz.po, and foobaz_help.po, the mapping for the old monolithic PO file in the stable branch would be:

S.mappings = [
    ...
    ("stable", "foobaz", "foobaz", "libfoobaz", "foobaz_help"),
    ...
]

The first string in the mapping is the branch name, the second string is the PO domain name in that branch, and all following strings are the new summit PO domain names which contain part of original messages. The order of summit PO domains is somewhat important: if a message exists only in the monolithic PO file in the stable branch and not in split PO files in devel branch, and summit heuristics detects no appropriate insertion point into one of the summit PO files, that message will be added to the end of the first summit PO file listed.
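
To make the meaning of a mapping entry concrete, here is a minimal sketch of how a branch PO domain could be resolved to its summit PO domains. The function summit_domains is hypothetical and only mirrors the description above; it is not how posummit is implemented:

```python
def summit_domains(branch, domain, mappings):
    """Return the summit PO domains for a PO domain in the given branch.

    Each mapping entry is (branch, branch_domain, summit_domain, ...).
    A domain without a mapping is assumed to keep its name in the summit.
    """
    for entry in mappings:
        if entry[0] == branch and entry[1] == domain:
            return list(entry[2:])
    return [domain]


mappings = [
    ("stable", "foobaz", "foobaz", "libfoobaz", "foobaz_help"),
]
print(summit_domains("stable", "foobaz", mappings))
print(summit_domains("devel", "libfoobaz", mappings))
```

The first summit domain in the returned list corresponds to the "first summit PO file listed", which receives messages that have no better insertion point.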

"Making the same split in summit" deserves some special attention. For the templates summit (which exists in summit over static templates), this simply means adding any new files and removing old ones (posummit will do that itself if run with --create). But for language summits, you should manually copy the original summit PO file to each new name in turn, and then perform gather (direct summit) or merge (summit over templates). In this way no translated messages will be lost in the split.[19]

Several branch PO files have been merged into one (merging).

Sometimes formerly independent pieces of software are joined into a single package, for more effective maintenance and release. This can happen, for example, when selected plugins are taken into the host program distribution as "core plugins". Their separate PO files may then be merged into a single new PO file, or into an existing PO file. Like in the opposite case of splitting, posummit will simply report that some summit PO files no longer have branch counterparts, and possibly that a new branch PO file has appeared. This usually happens in the newest branch first, while older branches retain the separation. Then the same merging should be done in summit too, and mappings added for each of the old separate PO files in other branches. If foobaz_info.po, foobaz_backup.po, and foobaz_filters.po have been merged into existing foobaz.po in devel branch, the following mappings for the stable branch should be added:

S.mappings = [
    ...
    ("stable", "foobaz_info", "foobaz"),
    ("stable", "foobaz_backup", "foobaz"),
    ("stable", "foobaz_filters", "foobaz"),
    ...
]

As for making the same merge in the summit, for templates summit (in summit over static templates) you should manually remove old separate files and possibly add the new monolithic one, or run posummit with --create. In language summits, in order to retain all existing translations, you should manually concatenate separate files into one (using Gettext's msgcat) and then perform gather (direct summit) or merge (summit over templates).

A language branch PO file has appeared in summit over templates (injection).

In summit over templates modes (dynamic or static), the normal way for a language summit PO file to appear is by starting from a clean template, and the corresponding branch PO file is then created on scatter. However, when a program previously developed elsewhere is imported into the project, its PO files are imported too. This will lead to the situation where there are translated branch PO files with no corresponding language summit PO files. This is corrected by forced gathering of the "injected" branch PO file. If the injected file is alien.po, in summit over dynamic templates you would execute:

$ cd $SUMMITDIR
$ posummit gather --create --force alien

and in summit over static templates:

$ posummit $SOMEWHERE/summit-config-shared aa gather --create --force alien
$ posummit $SOMEWHERE/summit-config-shared bb gather --create --force alien
$ ...

The --force option is necessary because, in summit over template modes, language summit PO files are normally gathered just once when the summit is created, and later only merged.

An important thing to note about mismatches is that reports produced by posummit may be misleading, especially in more complicated situations (splitting, merging). This means that you must carefully examine what has actually happened, not based only on the branch file trees themselves, but also by keeping an eye on channels (e.g. mailing lists) where information for translators is most likely to appear.

There is also the possibility to map a whole branch subdirectory to another directory in the summit. Since summit PO files are unique by domain name, the only effect of subdirectory mapping is to prevent posummit from reporting that files should be moved to another subdirectory, and to have it report proper expected summit paths when new branch catalogs are discovered. For example, if the PO files from subdirectory foosuite/ in devel branch and from subdirectory foopack/ in stable branch should both be collected in summit subdirectory foo/, the subdirectory mapping would be:

S.subdir_mappings = [
    ...
    ("devel", "foosuite", "foo"),
    ("stable", "foopack", "foo"),
    ...
]

Subdirectory mappings should be needed rarely compared to file mappings. A tentative example could be when two closely related software forks are translated within the same project, and they have many PO files in their own subdirectories.
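
The effect of a subdirectory mapping on the expected summit path can be sketched as follows. The helper expected_summit_subdir is hypothetical and only restates the rule above, under the assumption that unmapped subdirectories keep their name in the summit:

```python
def expected_summit_subdir(branch, subdir, subdir_mappings):
    """Return the summit subdirectory expected for a branch subdirectory."""
    for map_branch, branch_subdir, summit_subdir in subdir_mappings:
        if map_branch == branch and branch_subdir == subdir:
            return summit_subdir
    # No mapping: the summit is assumed to use the same subdirectory name.
    return subdir


subdir_mappings = [
    ("devel", "foosuite", "foo"),
    ("stable", "foopack", "foo"),
]
print(expected_summit_subdir("devel", "foosuite", subdir_mappings))
print(expected_summit_subdir("stable", "foopack", subdir_mappings))
```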

At some moment translation branches will be "shifted", for example devel will become the new stable, stable may become oldstable (if three branches are maintained), etc. When that happens, mappings should be shifted too. A typical case would be two branches, devel and stable, and some mappings only for stable; then, when the shift comes, all existing mappings would be simply removed.
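
Shifting the mappings by hand is usually simple, but it can also be pictured as a mechanical operation. The following is only a sketch under the assumption that a shift amounts to renaming branches and dropping those that are abandoned; shift_mappings is a hypothetical helper, not something posummit provides:

```python
def shift_mappings(mappings, renames):
    """Rename branches in mapping entries; drop entries of abandoned branches.

    renames maps an old branch name to its new name. Branches that do not
    appear in renames are treated as abandoned after the shift.
    """
    shifted = []
    for entry in mappings:
        new_branch = renames.get(entry[0])
        if new_branch is not None:
            shifted.append((new_branch,) + tuple(entry[1:]))
    return shifted


mappings = [("stable", "foobaz", "fooqwyx")]
# Two branches: devel becomes stable, old stable is abandoned.
print(shift_mappings(mappings, {"devel": "stable"}))
# Three branches: old stable is kept as oldstable.
print(shift_mappings(mappings, {"devel": "stable", "stable": "oldstable"}))
```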

5.2.4. Checking Summit Dependencies

As the number of mappings grows, or if branch path transformation is employed, it may not be readily clear which summit PO files are related to which branch PO files. A translator may need this information to know exactly which summit PO files to work on in order to have some set of branch files fully translated. For this reason, posummit provides the operation mode deps, in which any number of operation targets are given on the command line, and the dependency chains are reported for those targets.

If you recall the example mapping due to merging, you can check the dependency chain for the file foobaz_info.po in stable branch by executing one of:

$ cd $SUMMITDIR
$ posummit deps $STABLEDIR/foobaz_info.po
$ posummit deps stable:foobaz_info

in direct summit or summit over dynamic templates, or

$ posummit $SOMEWHERE/summit-config-shared aa deps $STABLEDIR/foobaz_info.po
$ posummit $SOMEWHERE/summit-config-shared aa deps stable:foobaz_info

in summit over static templates. The output would look like this:

:    summit-dir/foobaz.po  devel-dir/foobaz.po stable-dir/foobaz_info.po \
     stable-dir/foobaz_backup.po stable-dir/foobaz_filters.po

You can see that the complete dependency chain to which foobaz_info.po from stable belongs has been written out. The first path in the chain is always the summit PO file, followed by all mapped PO files from each branch in turn.

If the file for which the dependency is requested is mapped to more than one summit PO file, then the dependency chain for each of them is displayed. In the example of mapping due to splitting, if you request the dependency for the monolithic foobaz.po from the stable branch, you would get three dependency chains:

:    summit-dir/foobaz.po  devel-dir/foobaz.po  stable-dir/foobaz.po
:    summit-dir/libfoobaz.po  devel-dir/libfoobaz.po  stable-dir/foobaz.po
:    summit-dir/foobaz_help.po  devel-dir/foobaz_help.po  stable-dir/foobaz.po
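
How such chains follow from the mappings can be sketched in a few lines. This is an illustration of the relationship only, not the algorithm posummit uses; dependency_chains and the branch_domains dictionary (listing the PO domains assumed to exist in each branch) are hypothetical:

```python
def dependency_chains(branch, domain, mappings, branch_domains):
    """Return one chain of paths per summit PO file that domain maps to."""
    def to_summit(b, d):
        for entry in mappings:
            if entry[0] == b and entry[1] == d:
                return list(entry[2:])
        return [d]  # unmapped domains keep their name in the summit

    chains = []
    for summit_domain in to_summit(branch, domain):
        chain = ["summit-dir/%s.po" % summit_domain]
        # Add every branch PO file that maps to this summit PO file.
        for b, domains in branch_domains.items():
            for d in domains:
                if summit_domain in to_summit(b, d):
                    chain.append("%s-dir/%s.po" % (b, d))
        chains.append(chain)
    return chains


mappings = [("stable", "foobaz", "foobaz", "libfoobaz", "foobaz_help")]
branch_domains = {
    "devel": ["foobaz", "libfoobaz", "foobaz_help"],
    "stable": ["foobaz"],
}
for chain in dependency_chains("stable", "foobaz", mappings, branch_domains):
    print(":    " + "  ".join(chain))
```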

5.3. Elements of Summit Configuration

Other than the main configuration fields for setting the summit type, summit and branch locations, and mappings, there are many other optional configuration fields. They can be used to make the translation workflow yet more efficient, by relieving translators from taking care of various details.

5.3.1. Summit Hooks

Summit operations (gather, merge, scatter) are characterized by having PO files and messages flowing between the summit and branches. It is then natural to think of adding some filtering into these flows. For example, on scatter, one could do small orthographic adjustments in translation, or automatically insert translated UI references.[20]

Filtering is implemented by being able to insert Pology hooks (see Section 9.10, “Processing Hooks”) into various stages of summit operations; a particular stage will admit only certain types of hooks. To fetch and insert translated UI references on scatter, the resolve-ui hook can be added like this:

from pology.uiref import resolve_ui
S.hook_on_scatter_msgstr = [
    (resolve_ui(uicpathenv="UI_PO_DIR"),),
]

S.hook_on_scatter_msgstr is a list of hooks which are applied on translation (msgstr fields) before it is written into branch PO files on scatter. Each element of this list is a tuple of one to three elements. The first element in the tuple is the hook function, here resolve_ui[21]. resolve_ui is an F3C hook, which is the type of hooks expected in S.hook_on_scatter_msgstr list.

The second and third elements in the hook tuple are, respectively, selectors by branch and file. These are used when the hook is not meant to be applied on all branches and all PO files. The selector can be either a regular expression string, which is matched against the branch name or PO domain name (positive match means to apply the hook), or a function (return value evaluating as true means to apply the hook). If it is a function, the branch selector gets the branch name as input argument, and the file selector gets the summit PO domain name and summit subdirectory. For example, to add the specialized resolve_ui_docbook4 hook only to foobaz-manual.po file, and plain resolve_ui to all other files, the hook list would be:

from pology.uiref import resolve_ui, resolve_ui_docbook4

S.hook_on_scatter_msgstr = [
    (resolve_ui_docbook4(uicpathenv="UI_PO_DIR"), "", "-manual$"),
    (resolve_ui(uicpathenv="UI_PO_DIR"), "", "(?<!-manual)$"),
]

The branch selector here is an empty string, which means that both hooks apply to all branches (since an empty regular expression matches any string). The resolve_ui_docbook4 hook has the "-manual$" regular expression as the file selector, which means that it should be applied to all PO domain names ending in -manual. The resolve_ui hook has been given the opposite regular expression, "(?<!-manual)$", which matches any PO domain name not ending in -manual.[22] Regular expressions can quickly become unreadable, so here is how the same selection could be achieved with selector functions:

from pology.uiref import resolve_ui, resolve_ui_docbook4

def is_manual (domain, subdir):
    return domain.endswith("-manual")
def is_not_manual (domain, subdir):
    return not is_manual(domain, subdir)

S.hook_on_scatter_msgstr = [
    (resolve_ui_docbook4(uicpathenv="UI_PO_DIR"), "", is_manual),
    (resolve_ui(uicpathenv="UI_PO_DIR"), "", is_not_manual),
]

When there is more than one hook in the list, they are applied in the order in which they are listed.
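
The selection logic described above can be sketched in plain Python, which is also a convenient way to check a regular expression selector before putting it into the summit configuration. Both selector_matches and select_hooks are hypothetical helpers, the hooks are replaced by plain strings, and the use of re.search for regular expression selectors is an assumption that is consistent with the "-manual$" example:

```python
import re


def selector_matches(selector, *args):
    """Evaluate a selector given as a regular expression string or a function."""
    if callable(selector):
        return bool(selector(*args))
    # Regular expressions are matched against the first argument only
    # (the branch name or the PO domain name).
    return re.search(selector, args[0]) is not None


def select_hooks(hook_list, branch, domain, subdir):
    """Return the hooks that apply to the given branch and file, in list order."""
    selected = []
    for entry in hook_list:
        branch_selector = entry[1] if len(entry) > 1 else ""
        file_selector = entry[2] if len(entry) > 2 else ""
        if (selector_matches(branch_selector, branch)
                and selector_matches(file_selector, domain, subdir)):
            selected.append(entry[0])
    return selected


hook_list = [
    ("resolve_ui_docbook4", "", "-manual$"),
    ("resolve_ui", "", "(?<!-manual)$"),
]
print(select_hooks(hook_list, "devel", "foobaz-manual", "foosuite"))
print(select_hooks(hook_list, "devel", "foobaz", "foosuite"))
```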

This is all there is to say about hook application in general. What follows is a list of all presently defined hook insertion lists, with admissible hook types given in parentheses. Usually paired F* and S* hook types are possible, such that F* hooks are primarily used for modification, while S* hooks could be employed for validation (e.g. writing out warnings).

S.hook_on_scatter_msgstr (F3A, F3C, S3A, S3C)

Applied to the branch translation (msgstr fields) on scatter, before it is written into the branch PO file.

S.hook_on_scatter_msg (F4A, S4A)

Applied to branch message on scatter, before it is written into the branch PO file. These hooks can modify any part of the message, like comments, or even the msgid field.

S.hook_on_scatter_cat (F5A, S5A)

Applied to the branch PO file while still in internal parsed state on scatter, after S.hook_on_scatter_msgstr had been applied to all messages.

S.hook_on_scatter_file (F6A, S6A)

Applied to the branch PO file as raw file on disk on scatter, after S.hook_on_scatter_cat had been applied. If one of the hooks reports non-zero value, the rest of the hooks in the list are not applied to that file.

S.hook_on_scatter_branch

Applied to the complete branch on scatter, after all other hooks on scatter had been applied. Functions used here are not part of the formal hook system. They take a single argument, the branch name, and return a number. If the return value is not zero, rest of the hooks are skipped on that branch.

S.hook_on_gather_file_branch (F6A, S6A)

Applied to the branch PO file as raw file on disk on gather, before S.hook_on_gather_cat_branch is applied. The branch PO file will not be modified for real, but only its temporary copy.

S.hook_on_gather_cat_branch (F5A, S5A)

Applied to the branch PO file while still in internal parsed state on gather, before S.hook_on_gather_msg_branch is applied to all messages.

S.hook_on_gather_msg_branch (F4A, S4A)

Applied to the branch message on gather, before it is used to gather the corresponding summit message.

S.hook_on_gather_msg (F4A, S4A)

Applied to the summit message on gather, after it had been gathered from the corresponding branch messages, but before it is written into the summit PO file.

S.hook_on_gather_cat (F5A, S5A)

Applied to the summit PO file while still in internal parsed state on gather, after S.hook_on_gather_msg had been applied to all messages.

S.hook_on_gather_file (F6A, S6A)

Applied to the summit PO file as raw file on disk on gather, after S.hook_on_gather_cat had been applied.

S.hook_on_merge_head (F4B, S4B)

Applied to summit PO header on merge, after the summit PO file has been merged.

S.hook_on_merge_msg (F4A, S4A)

Applied to summit message on merge, after S.hook_on_merge_head had been applied.

S.hook_on_merge_cat (F5A, S5A)

Applied to the summit PO file while still in internal parsed state on merge, after S.hook_on_merge_msg had been applied to all messages.

S.hook_on_merge_file (F6A, S6A)

Applied to the summit PO file as raw file on disk on merge, after S.hook_on_merge_cat had been applied.

You may notice that some logically possible hook insertion lists are missing (e.g. S.hook_on_merge_msgstr). This is because they are implemented on demand, as the need is observed in practice, and not before the fact.
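
Since the functions for S.hook_on_scatter_branch are described only by their calling convention (one argument, the branch name, and a numeric return value), a small sketch may help. The function names and the driver loop below are hypothetical; the loop only mimics the rule that a non-zero return value skips the remaining hooks for that branch:

```python
scattered = []


def record_branch(branch):
    """Example branch function: remember which branches were scattered."""
    scattered.append(branch)
    return 0  # zero means success, following hooks may run


def refuse_oldstable(branch):
    """Example branch function: report a problem for one particular branch."""
    return 1 if branch == "oldstable" else 0


def run_branch_functions(functions, branch):
    """Call the functions in order; stop at the first non-zero return value."""
    for function in functions:
        if function(branch) != 0:
            return False
    return True


print(run_branch_functions([refuse_oldstable, record_branch], "devel"))
print(run_branch_functions([refuse_oldstable, record_branch], "oldstable"))
print(scattered)
```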

Here is another example of hook interplay. Branch PO files may still rely on embedding the context into the msgid field:

msgid "create new document|New"
msgstr ""

but you would nevertheless like to have proper msgctxt contexts in the summit:

msgctxt "create new document"
msgid "New"
msgstr ""

You can achieve this by writing two small F4A hooks, and inserting them at proper points:

def context_from_embedded (msg, cat):
    if "|" in msg.msgid:
        msg.msgctxt, msg.msgid = msg.msgid.split("|", 1)

def context_to_embedded (msg, cat):
    if msg.msgctxt is not None:
        msg.msgid = "%s|%s" % (msg.msgctxt, msg.msgid)
        msg.msgctxt = None

S.hook_on_gather_msg_branch = [
    (context_from_embedded,),
]

S.hook_on_scatter_msg = [
    (context_to_embedded,),
]

In this way, branch messages will be converted to proper context just before they are gathered into the summit, and the proper context will be converted back into the embedded when the messages are scattered to branches.
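
Before enabling such a pair of hooks, it is worth checking that they really are inverse to each other. The following sketch does that with a stand-in message class; FakeMessage is hypothetical and carries only the two attributes the hooks touch, whereas real hooks receive Pology message and catalog objects:

```python
class FakeMessage:
    """Minimal stand-in for a PO message, for testing the hooks in isolation."""

    def __init__(self, msgid, msgctxt=None):
        self.msgctxt = msgctxt
        self.msgid = msgid


def context_from_embedded(msg, cat):
    if "|" in msg.msgid:
        msg.msgctxt, msg.msgid = msg.msgid.split("|", 1)


def context_to_embedded(msg, cat):
    if msg.msgctxt is not None:
        msg.msgid = "%s|%s" % (msg.msgctxt, msg.msgid)
        msg.msgctxt = None


msg = FakeMessage("create new document|New")
context_from_embedded(msg, None)   # as on gather
print(repr(msg.msgctxt), repr(msg.msgid))
context_to_embedded(msg, None)     # as on scatter
print(repr(msg.msgctxt), repr(msg.msgid))
```

Note that a branch message which legitimately contains the | character in its msgid, without any embedded context, would be split as well; whether that can occur depends on the project.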

5.3.2. Integration with Version Control Systems

Most likely, branch and summit directories will be kept under some sort of version control. This means that when posummit has finished running, any files that it had added, moved or removed, would have to be manually "reported" to the version control system (VCS). To avoid this, you can set in the summit configuration which VCS is used, among those supported by Pology, and posummit will issue proper VCS commands when changing the file tree. Then, after the posummit run, you can simply issue the VCS commit command on appropriate paths.

Since different VCS may be used for the summit and the branches, it is possible to set them separately. For example, if branches are in a Subversion repository and the summit in a Git repository, the summit configuration would contain:

S.summit_version_control = "git"
S.branches_version_control = "svn"

If the same VCS is used for branches and the summit (whether or not they are in the same repository), a single configuration field can be set instead:

S.version_control = "git"

If you would like posummit to execute VCS commands only in the summit and not in branches, then you would set only the S.summit_version_control field.

5.3.3. Text Wrapping in PO Files

While wrapping of text fields in PO files (msgid, msgstr, etc) makes no technical difference, it may be convenient for editing for them to be wrapped in a particular way. Since posummit is anyway modifying PO files both in the summit and branches, it might as well be told what kind of wrapping to use.

For example, a reasonable wrapping setup could be:

S.summit_wrap = False
S.summit_fine_wrap = True
S.branches_wrap = True
S.branches_fine_wrap = False

S.*_wrap fields activate or deactivate basic (column-based) wrapping, while S.*_fine_wrap fields do the same for logical breaks. So in this example, summit messages are wrapped only on logical breaks (may be good for editing), while branch messages are wrapped only on columns (may reduce size of VCS deltas).

If not set, the default is basic wrapping without fine wrapping, for both branches and the summit.

5.3.4. Vivification of Summit PO Files

In direct summit, summit PO files spring into existence by gathering branch PO files. However, in summit over static templates, by default translators have to start a new PO file by copying over the summit template and initializing it. While dedicated PO editors can do this automatically, all translators in the team have to configure their PO editor correctly (language code, plural forms...), and they have to have templates at hand. Furthermore, any statistic processing on the translation project as a whole has to specifically consider templates as empty PO files.

Instead of this, it is possible to tell posummit to automatically initialize summit PO files from summit templates -- to "vivify" them -- when the language summit is merged. There is a summit configuration switch to enable vivification, as well as several fields to specify the information needed to initialize a PO file. Here is an example:

S.vivify_on_merge = True
S.vivify_w_translator = "Simulacrum"
S.vivify_w_langteam = "Nevernissian <l10n@neverwhere.org>"
S.vivify_w_language = "nn"
S.vivify_w_plurals = "nplurals=7; plural=(n==1 ? ...)"

Setting S.vivify_on_merge to True engages vivification. The S.vivify_w_translator field specifies the value of the Last-Translator: header field in the vivified PO file; it can be something catchy rather than a real translator's name, to be able to see later which summit PO files were not yet worked on. S.vivify_w_langteam is the contents of the Language-Team: header field (team's name and email address), S.vivify_w_language of Language: (language code), and S.vivify_w_plurals of Plural-Forms:.
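
The correspondence between the configuration fields and the PO header can be summarized in code. This is a sketch of the mapping only, with the header represented as a plain dictionary; vivify_header is a hypothetical function, and Language-Team: is used as the standard Gettext header field for the team:

```python
def vivify_header(template_fields, translator, langteam, language, plurals):
    """Return header fields of a vivified PO file, starting from the template."""
    fields = dict(template_fields)
    fields.update({
        "Last-Translator": translator,   # from S.vivify_w_translator
        "Language-Team": langteam,       # from S.vivify_w_langteam
        "Language": language,            # from S.vivify_w_language
        "Plural-Forms": plurals,         # from S.vivify_w_plurals
    })
    return fields


template = {"Last-Translator": "FULL NAME <EMAIL@ADDRESS>"}
header = vivify_header(
    template,
    translator="Simulacrum",
    langteam="Nevernissian <l10n@neverwhere.org>",
    language="nn",
    plurals="nplurals=7; plural=(n==1 ? ...)",  # elided as in the example above
)
for name, value in header.items():
    print("%s: %s" % (name, value))
```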

In summit over dynamic templates, vivification is unconditionally active, whether S.vivify_on_merge is set or not. This is because synchronization of branches and the summit is checked by comparing template trees, and summit PO files are the only indicator of "virtual" presence of summit templates (while in summit over static templates, the summit template tree is physically present). Without vivification, it would also be very hard for project-wide statistics to take templates into account as empty summit PO files.

5.3.5. Merging in Branches

By default it is assumed that branch PO files are merged with branch templates using a separate mechanism, which was already in place when the summit was introduced into the workflow. In summit over templates modes, if branch merging is performed asynchronously to summit merging, on scatter it may happen that some messages recently added to branch PO file are not yet present in corresponding summit PO file. In that case, posummit will issue warnings about missing messages in the summit. This is normally not a problem, because merging asynchronicity will stop causing such differences as the pre-release message freeze in the source sets in.

However, on the one hand, warnings about messages missing in the summit may be somewhat disconcerting, or aesthetically offending in the otherwise clean scatter output. On the other hand, perhaps the existing mechanism of merging in branches is not too clean, and it would be nice to replace it with something more thorough. Therefore, in summit over templates modes, it is possible to configure the summit such that on merge, posummit merges not only the summit PO files, but also all branch PO files. This is achieved simply by adding the merge= key to each branch that should be merged:

S.branches = [
    dict(id="devel", ..., merge=True),
    dict(id="stable", ..., merge=True),
]

When merging in branches is activated, it is still possible to merge only the summit, or any single branch. This is done by giving an operation target on merge, either the path to the branch top directory or the branch name. For example, in summit over dynamic templates:

$ cd $SUMMITDIR
$ posummit merge $DEVELDIR/  # merge only the devel branch
$ posummit merge devel:      # same
$ posummit merge .           # merge only the summit
$ posummit merge +:          # same

5.3.6. Propagation of Header Parts

PO headers are treated somewhat differently from PO messages in summit operations:

  • On gather, almost all of the standard header fields of the primary branch PO file are copied into the summit PO file. The primary branch PO file is defined as the first branch PO file (in case of several branch files being mapped onto the same summit PO file) from the first branch (as listed in the branch specification in summit configuration). The only exception is POT-Creation-Date:, which is set to the time of gathering, if there were any other modifications to the summit PO file. Header comments are not copied over, except when the summit PO file is being automatically created for the first time.

  • On merge, the summit PO file is merged with the summit PO template using msgmerge, so its header propagation rules apply. For example, no header comments will be touched, POT-Creation-Date: will be copied over from templates but Last-Translator: will not be touched, etc. This also means that, by default, any non-standard fields in the template (e.g. those starting with X-*) will be silently dropped.

  • On scatter, almost the complete header is copied over from the primary summit PO file into the branch PO file. The primary summit PO file is defined as the first mapped summit PO file, in cases when a single branch PO file has been mapped to several summit PO files. The exceptions are Report-Msgid-Bugs-To: and POT-Creation-Date:, which are preserved as they are in the branch PO file. Also, PO-Revision-Date: is set to that of the primary summit PO file only if there were any other modifications to the branch PO file (because it may happen that all updates to the summit PO file since the last scatter were for messages from other branches).

It is possible to influence this default header propagation. In particular, non-standard header fields may be added into branch and summit PO files and templates by different tools, and it may be significant to preserve and propagate these fields in some contexts. The following summit configuration fields can be used for that purpose:

  • S.header_propagate_fields field can be set to a list of non-standard header field names which should be propagated in gather and merge operations, from branch into summit PO files. For example, to propagate fields named X-Accelerator-Marker: and X-Sources-Root-URL:, the following can be added to summit configuration:

    S.header_propagate_fields = [
        "X-Accelerator-Marker",
        "X-Sources-Root-URL",
    ]
    

    Only the primary branch PO file is considered for determining the presence and getting the values of these header fields.

  • Instead of simply overwriting on scatter most of the branch PO header fields with summit PO header fields, some additional branch fields may be preserved by setting S.header_skip_fields_on_scatter to the list of header field names to preserve. For example, to preserve X-Scheduled-Release-Date: field in branch PO files:

    S.header_skip_fields_on_scatter = [
        "X-Scheduled-Release-Date",
    ]
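
The scatter rule, together with S.header_skip_fields_on_scatter, can be sketched on headers represented as plain dictionaries. The function scatter_header is hypothetical and deliberately ignores the special handling of PO-Revision-Date: and of header comments:

```python
def scatter_header(summit_fields, branch_fields, skip_fields=()):
    """Return branch header fields after scatter (simplified sketch)."""
    # Fields that always stay as they are in the branch, plus configured ones.
    preserved = {"Report-Msgid-Bugs-To", "POT-Creation-Date"} | set(skip_fields)
    result = dict(summit_fields)
    for name in preserved:
        result.pop(name, None)
        if name in branch_fields:
            result[name] = branch_fields[name]
    return result


summit = {
    "Last-Translator": "Summit Translator",
    "POT-Creation-Date": "summit date",
}
branch = {
    "Last-Translator": "Old Translator",
    "POT-Creation-Date": "branch date",
    "X-Scheduled-Release-Date": "branch release date",
}
print(scatter_header(summit, branch, ["X-Scheduled-Release-Date"]))
```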
    

5.3.7. Filtering by Ascription on Scatter

Chapter 6, Ascribing Modifications and Reviews describes a translation review system available in Pology, in which every PO message has its modification and review history kept up to some depth in the past. Based on that history, it is possible to select which messages from working PO files (those under ascription) can be passed into release PO files, provided that these two file trees exist. Summit and branches can be viewed exactly as an instance of such separation, where the summit is the working tree, and each branch a release tree.

In this context, only the summit tree should be kept under ascription. Filtering for release is then, naturally, performed on scatter: to each summit PO message a sequence of one or more ascription selectors is applied, and if the message is selected by the selector sequence, it is passed into the branch PO file. Several selector sequences may be defined, for use in various release situations, through S.ascription_filters configuration field.

For example, to have a single filtering sequence which simply lets through all messages not modified after last review, the following should be added to summit configuration:

S.ascription_filters = [
    ("regular", ["nmodar"]),
]

Each filtering sequence is represented by a two-element tuple. The first element is the name of the filtering sequence, here regular. You can set the name to anything you like; when there is only one filtering sequence, the name is not used anywhere later. The second element of the tuple is a list of ascription selectors, which are given just as the values to -s options in poascribe command line. Here only one selector is issued, nmodar, which is the negation of the modified-after-review selector. This yields the desired filter to pass all messages "not modified after last review".

A more involved example would be that of having one filter for regular scatter, and another "emergency" filter, which relaxes the strictness a bit in case there was no time to properly review all new translations. This emergency filter may let through unreviewed messages if they were modified by a select few persons, who are known to produce translations of sufficient quality on the first attempt. If these persons are, for example, alice and bob (by their ascription user names), then the two-filter setup could look like this:

S.ascription_filters = [
    ("regular", ["nmodar"]),
    ("emergency", ["nmodar:~alice,bob"]),
]

The regular filter is the same as in the previous example. The emergency filter also uses just one nmodar selector, but with an additional argument to consider all users except for alice and bob. Because it is listed first, the regular filter is applied on scatter by default. Application of the emergency filter is requested by issuing the -a/--asc-filter option with filter name as value:

$ cd $SUMMITDIR
$ posummit scatter -a emergency

When scattering is performed under the ascription filter, messages stopped by the filter will be counted and their number (if non-zero) reported per branch PO file.
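
The rule that the first listed filter is the default, and that any other is picked by name, can be modeled in a few lines of Python. This is only an illustrative sketch, not posummit's own code; the pick_filter function is hypothetical and serves just to show how the list of tuples is meant to be read.

```python
# Sketch only: models the filter selection described above,
# it is not how posummit is actually implemented.
ascription_filters = [
    ("regular", ["nmodar"]),
    ("emergency", ["nmodar:~alice,bob"]),
]

def pick_filter(filters, name=None):
    """Return the selector list of the named filter.

    Without a name, the first listed filter is taken as the default.
    """
    if name is None:
        return filters[0][1]
    for filter_name, selectors in filters:
        if filter_name == name:
            return selectors
    raise KeyError("unknown ascription filter: %s" % name)

default_selectors = pick_filter(ascription_filters)
emergency_selectors = pick_filter(ascription_filters, "emergency")
```

Here default_selectors corresponds to a plain scatter, and emergency_selectors to a scatter requested with -a emergency.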

5.3.8. Other Branch Options

Each branch entry in branch specification (S.branches configuration field) can have some keys in addition to those described earlier.

It is possible to exclude some branch PO files from summit operations, or to include only certain branch PO files into summit operations. This is done by setting excludes= and includes= keys. The value is a list of tests on branch PO file absolute path: if any test matches, the file is matched on the whole (logical OR). Each test can be either a regular expression string, or a function taking the file path as argument and returning a truth value. If only excludes= is set, then all files not matched are operated on, and if includes= is set, only matched files are operated on. If both keys are set, then only files matched by includes= and not matched by excludes= are operated on.
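
The combination rules for includes= and excludes= can be sketched as a small stand-alone Python function. This is an illustration of the rules as stated above, not code taken from posummit: the names matches_any and is_operated_on, the example paths and the example tests are all hypothetical, and it is an assumption here that regular expressions are searched for anywhere in the path rather than anchored at its start.

```python
import re

def matches_any(path, tests):
    """True if at least one test matches the path (logical OR).

    A test is either a regular expression string,
    or a function taking the path and returning a truth value.
    """
    for test in tests:
        if callable(test):
            if test(path):
                return True
        elif re.search(test, path):  # assumption: unanchored search
            return True
    return False

def is_operated_on(path, includes=None, excludes=None):
    """Apply the includes/excludes rules to one branch PO file path."""
    if includes is not None and not matches_any(path, includes):
        return False
    if excludes is not None and matches_any(path, excludes):
        return False
    return True

# Hypothetical tests: one regular expression and one function per list.
includes = [r"/messages/"]
excludes = [r"/docs?/", lambda p: p.endswith("_scratch.po")]

kept = is_operated_on("/srv/devel/messages/appfoo.po", includes, excludes)
dropped = is_operated_on("/srv/devel/messages/doc/appfoo.po", includes, excludes)
```

With these example tests, kept is true because the path is matched by includes and by none of the excludes, while dropped is false because the path is matched by the first exclude test.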

If branches are under version control and posummit is told to issue version control commands as appropriate (i.e. S.branches_version_control configuration field is set), it is possible to exclude a specific branch from this, by setting its skip_version_control= key to True.

5.3.9. Other Merge Options

As is usual, merging performed by posummit by default produces fuzzy messages, in summit PO files as well as in branch PO files if merging in branches is enabled. It is possible to prevent fuzzy matching by setting the S.summit_fuzzy_merging and S.branches_fuzzy_merging configuration fields to False. There should be little reason to disable fuzzy matching in summit PO files, but it may be convenient to do so in branch PO files, which are not directly translated. For example, the lack of fuzzy messages will lead to smaller version control deltas.

Fuzzy messages are by default produced by msgmerge alone. This can be more finely tuned by processing the PO file before and after it has been merged, as done by the poselfmerge command. The S.merge_min_adjsim_fuzzy configuration field can be set to a number in range from 0 to 1, having the same effect on fuzzy matching as the -A/--min-adjsim-fuzzy option of poselfmerge. The S.merge_rebase_fuzzy field can be set to True, with the same meaning as the -b/--rebase-fuzzies option of poselfmerge.

Summit PO files may be merged by consulting a compendium, to produce additional exact and fuzzy matches. This possibility also draws on the functionality provided by poselfmerge. The S.compendium_on_merge configuration field is used to set the path to a compendium[23], equivalently to the -C/--compendium option of poselfmerge. Since compendium matches are less likely to be appropriate than own matches, you may set the S.compendium_fuzzy_exact field to True, or the S.compendium_min_words_exact field to a positive integer number, with the same effect as -x/--fuzzy-exact and -W/--min-words-exact options of poselfmerge, respectively.

5.3.10. Other Scatter Options

Sometimes a summit PO file may be "pristine", meaning that all messages in it are clear, neither translated nor fuzzy. Pristine summit PO files may appear, for example, when vivification is active. A pristine summit PO file will by default cause a likewise empty branch PO file to appear on scatter. This may or may not be a problem in a given project. If it is a problem, it is possible to set the minimal translation completeness of a summit PO file at which the branch PO file will be created on scatter. For example:

S.scatter_min_completeness = 0.8

sets the minimum completeness to scatter at 80%. Completeness is taken to be the ratio of the number of translated to all messages in the file (excluding obsolete).

Translation completeness of a summit PO file may deteriorate over time, as it is periodically gathered or merged, and no one comes around to update the translation. At some point, the completeness may become too low to be considered useful, so that it is better to stop releasing remaining translations in that file until it is updated. The completeness at which this happens, at which the branch PO file is automatically cleared of all translations on scatter, can be set through S.scatter_acc_completeness configuration field. The meaning of the value of this field is the same as for S.scatter_min_completeness; in fact, one might ask why not simply use S.scatter_min_completeness for this purpose as well. The reason is that sometimes a higher bar is put for the initial release, and having two separate configuration fields enables you to make this difference.
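
The interplay of the two thresholds can be sketched in Python as follows. This is one reading of the description above, not posummit's actual logic: the function names, the action labels, the treatment of an empty file as 0% complete, and the choice of strict versus non-strict comparisons at the exact threshold values are all assumptions made for illustration.

```python
def completeness(translated, fuzzy, untranslated):
    """Ratio of translated to all messages; obsolete messages are not counted."""
    total = translated + fuzzy + untranslated
    # Assumption: a file without any messages counts as 0% complete.
    return translated / total if total else 0.0

def scatter_action(ratio, already_released, min_completeness, acc_completeness):
    """Decide what scatter does with one branch PO file.

    already_released tells whether the branch PO file already
    carries translations from an earlier scatter.
    """
    if not already_released:
        # Initial release: the (possibly higher) minimum bar applies.
        return "create" if ratio >= min_completeness else "skip"
    # Later releases: translations are kept until completeness
    # drops below the (possibly lower) acceptable bar.
    return "update" if ratio >= acc_completeness else "clear"

# Hypothetical file: 70 translated, 20 fuzzy, 10 untranslated messages.
ratio = completeness(70, 20, 10)
first_time = scatter_action(ratio, False, 0.8, 0.5)
later_on = scatter_action(ratio, True, 0.8, 0.5)
```

In this example the file is 70% complete, so it is not yet released with a minimum of 80% (first_time is "skip"), but had it been released earlier, it would still be updated as long as it stays at or above 50% (later_on is "update").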

5.4. Disadvantages to Summit Workflow and Remedies

Although hopefully outweighed by the advantages, working in summit is not without some disadvantages. These should be weighed when deciding whether to try out the summit workflow.

In the summit over templates modes, any changes made manually in branch PO files will not propagate into the summit, and will soon be lost to scattering. This means that the whole translation team must work in the summit. It is not possible for some members to use the summit and others not. In direct summit mode, modifying branches directly would be even messier, as some changes would find their way into the summit and some not, depending on which branch contains the change and the order of gather and scatter operations.

A summit PO file will necessarily have more messages than any of the branch files. For example, in two successive development-stable branch cyclings within the KDE translation project (at the time about 1100 PO files with 750,000 words), summit PO files were on average 5% bigger (by number of words) than their branch counterparts. This percentage can be taken as the upper limit of possibly wasted translation effort due to messages in the development branch coming and going, given that as the next branch cycling approaches more and more messages become fixed and make it into the next stable branch.

A more pressing issue with increased size of summit PO files is the following scenario: the next stable release is around the corner, and the translation team has no time to update summit PO files fully, but could update only stable messages in them. For example, there are 1000 untranslated and fuzzy messages in the summit, out of which only 50 are coming from the stable branch. A clever dedicated PO editor could allow jumping only through untranslated and fuzzy messages which additionally satisfy a general search criterion, in this case that a comment matches the \+>.*stable regular expression (assuming the stable branch is named stable in summit configuration). Lacking such a feature, it is enough, with some external help, if the editor can merely search through comments. First, Pology's posieve command can be used to equip all untranslated and fuzzy stable messages in summit PO files with an untranslated flag (producing #, ..., untranslated comment):

$ posieve tag-untranslated -sbranch:stable -swfuzzy paths...

Then, in the PO editor you can jump through incomplete stable messages by simply searching for this flag. While doing that, you are not obligated to manually remove the flag: it will either automatically disappear on the next merge, or you can remove all flags afterwards by running:

$ posieve tag-untranslated -sstrip paths...
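
To see what the \+>.*stable expression mentioned above selects, it can be tried out on a few comment strings in Python. The sample comments below are made up for the illustration: they assume that the branch comment of a summit message consists of +> followed by the names of the branches containing the message, and that the branches are named devel and stable in summit configuration.

```python
import re

STABLE_COMMENT = re.compile(r"\+>.*stable")

# Hypothetical branch comments of three summit messages.
comments = [
    "+> devel stable",  # message present in both branches
    "+> devel",         # message only in the development branch
    "+> stable",        # message only in the stable branch
]

# Keep only the comments of messages which come from the stable branch.
from_stable = [c for c in comments if STABLE_COMMENT.search(c)]
```

Only the first and the third comment end up in from_stable. Note that the pattern is not anchored on word boundaries, so it would also match any other branch name which merely contains the string stable.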

There are some organizational issues with starting to use the summit, and, if it turns out to be counter-productive, with ceasing to use it. Team members first have to be reminded not to send in or commit branch PO files, and then, if the summit is disbanded, to be reminded to go back to branch PO files. On the plus side, disbanding the summit is technically simple: removing its top directory and summit configuration file will do the job.



[14] One may think of relying upon the translation memory: translate only PO files from one branch, and batch-apply translation memory to PO files in other branches, accepting only exact matches. This is dangerous, because short messages may need different translations in different PO files, resulting in hilarious mistranslations.

[15] Unfortunately, the following common organization cannot be automatically supported:

path/to/devel/
    appfoo/
        src/
        doc/
        po/
            aa.po
            bb.po
            ...
            # no template!
        ...
    appbar/
        ...

The problem is that there is no way to determine domain names from the file tree alone, and that different handling would be required for sources which actually have multiple PO domains.

[16] New translations do not have to appear in branches only by mistake. For example, some external sources, which have been translated elsewhere, may be integrated into the project.

[17] More precisely, if there are two same-name PO domains inside one branch, they will both be gathered into the same summit PO file. The assumption is that PO files with same domain names have mostly common messages.

[18] It can be changed by assigning another string to S.templates_lang.

[19] One could also skip this and allow immediate loss of translations, and rely on the translation memory when later translating new PO files. But, especially in centralized summit maintenance, it is better to make things right early. Also, translation memory matches may not be as reliable, since they come not only from the original PO file, but from all PO files in the project.

[20] Validation filters are another possibility; they do not modify the text but report possible problems. However, validation rules and the check-rules sieve are likely a better solution.

[21] resolve_ui is not the hook function itself, but a hook factory. It is called with the argument uicpathenv="UI_PO_DIR" to produce the actual hook function. See its documentation for details.

[22] This pattern makes use of a negative lookbehind token, a fairly advanced bit of regular expression syntax.

[23] Here you can also use the S.relpath() function, to have the compendium path be relative to the directory of the summit configuration file.