Module split

Splitting message fields into syntactical elements.


Author: Chusslove Illich (Часлав Илић) <caslav.ilic@gmx.net>

License: GPLv3

Functions
list of strings, list of strings
split_text(text, markup=False, format=None)
Split text into words and intersections.
list of strings
proper_words(text, markup=False, accels=None, format=None)
Mine proper words out of the text.
Variables
  __package__ = 'pology'
Function Details

split_text(text, markup=False, format=None)

Split text into words and intersections.

The text is split into a list of words and a list of intersections (inter-word segments), such that there is always an intersection before the first word and after the last word, even if it is empty. That is, there is always exactly one more intersection than there are words.

The text may contain <...> tags, and may be in a format supported by Gettext (e.g. c-format). If specified, these elements may influence splitting.

Parameters:
  • text (string) - the text to split
  • markup (bool) - whether text contains markup tags
  • format (None or string) - Gettext format flag
Returns: list of strings, list of strings
words and intersections
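
The return contract described above (one more intersection than words) can be illustrated with a minimal sketch. This is not Pology's implementation: split_text_sketch is a hypothetical stand-in that uses a plain regular expression and ignores the markup and format parameters entirely.

```python
import re

def split_text_sketch(text):
    """Hypothetical stand-in for split_text; NOT Pology's implementation.

    It only mimics the documented return shape: a list of words and a
    list of intersections, with one more intersection than words.
    """
    # With a capturing group, re.split keeps the matched words in the
    # result, so the list alternates intersection, word, intersection...
    # and always starts and ends with a (possibly empty) intersection.
    parts = re.split(r"(\w+)", text)
    words = parts[1::2]
    intersections = parts[0::2]
    return words, intersections

words, intersections = split_text_sketch("Hello, world!")
print(words)          # ['Hello', 'world']
print(intersections)  # ['', ', ', '!']
```

Because the two lists alternate, interleaving them (intersection first) reproduces the original text, which is a convenient sanity check on any splitter that follows this contract.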

proper_words(text, markup=False, accels=None, format=None)

Mine proper words out of the text.

Proper words are those one would expect to find in a dictionary, or that at least have that latent quality (jargon, etc.), as opposed to URLs, email addresses, shell variables, and the like.

The text may contain XML-like markup (<...> tags, entities, etc.) or keyboard accelerator markers. It may also be in a format known to Gettext (e.g. c-format). If specified, these elements may influence splitting.

Parameters:
  • text (string) - the text to split
  • markup (bool) - whether text contains markup tags
  • accels (sequence) - accelerator characters to ignore
  • format (None or string) - Gettext format flag
Returns: list of strings
proper words
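
To make the idea of "proper words" concrete, here is a rough sketch of the kind of filtering described above. It is an assumption-laden illustration, not Pology's code: proper_words_sketch and all of its regular expressions are hypothetical, the format parameter is omitted, and the real function may treat markup, accelerators, and special tokens quite differently.

```python
import re

def proper_words_sketch(text, markup=False, accels=None):
    """Hypothetical illustration of proper-word mining; NOT Pology's code."""
    if markup:
        # Assumed handling: blank out <...> tags and &entity; references.
        text = re.sub(r"<[^>]*>", " ", text)
        text = re.sub(r"&[\w#]+;", " ", text)
    # Drop accelerator markers so that "&Open" is read as "Open".
    for accel in accels or ():
        text = text.replace(accel, "")
    # Blank out tokens that are not dictionary material.
    text = re.sub(r"\w+://\S+", " ", text)      # URLs
    text = re.sub(r"\S+@\S+\.\S+", " ", text)   # email addresses
    text = re.sub(r"\$\{?\w+\}?", " ", text)    # shell variables
    # Keep runs of letters only (no digits or underscores).
    return re.findall(r"[^\W\d_]+", text)

sample = "Visit <b>http://example.com</b> or &Open $HOME file"
print(proper_words_sketch(sample, markup=True, accels=["&"]))
# ['Visit', 'or', 'Open', 'file']
```

The order of the steps matters in this sketch: entities are removed before accelerator markers, since both may use the & character.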