ajmc.commons package
ajmc.commons contains all the utilities (helpers, functions, objects, hard-coded variables) that are common to task-specific repositories, i.e. that must be accessible from each of them. These notably include:
- file_management utilities, which make it possible to handle files systematically within ajmc's data organisation (see notebooks/data_organisation.ipnb for more) and to retrieve information from the various project spreadsheets.
- arithmetic.py contains helper maths functions, mainly to deal with intervals, which are a common object in our Canonical format (of which more below).
- docstrings.py centralizes common function and class docstrings in a single place and provides a decorator to retrieve them easily.
- geometry.py provides helper functions and an object, Shape, to deal with geometrical objects such as contours and bounding boxes.
- image.py provides helper functions and an object, AjmcImage, to deal with images.
- miscellaneous.py receives everything which doesn't fit anywhere else. It notably contains generic functions and decorators, lazy objects for efficiency, etc.
- variables contains all the hard-coded variables such as PATHS, COLORS, SPREADSHEET_IDS, CHARSETS and many more.
Submodules
ajmc.commons.arithmetic module
This module contains basic arithmetic functions.
- ajmc.commons.arithmetic.are_intervals_within_intervals(contained: List[Tuple[int, int]], container: List[Tuple[int, int]]) -> bool
Applies is_interval_within_interval on a list of intervals, making sure that each of the contained intervals lies within one of the container intervals.
- ajmc.commons.arithmetic.compute_interval_overlap(i1: Tuple[int, int], i2: Tuple[int, int])
Computes the overlap between two intervals defined by their start and stop, both included.
- Parameters:
i1 – A Tuple[int, int] defining the included boundaries of an interval, with start <= stop.
i2 – A Tuple[int, int] defining the included boundaries of an interval, with start <= stop.
- Returns:
The length of the overlap.
- Return type:
int
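To illustrate the inclusive-boundary convention, the computation can be sketched as follows. This is a minimal sketch, not the package's implementation: the name overlap_sketch is hypothetical, and it assumes that the length counts every integer shared by both intervals, so that (1, 5) and (5, 9) overlap by 1.

```python
from typing import Tuple


def overlap_sketch(i1: Tuple[int, int], i2: Tuple[int, int]) -> int:
    """Hypothetical helper: length of the overlap of two inclusive integer intervals."""
    start = max(i1[0], i2[0])  # the overlap starts at the later of the two starts
    stop = min(i1[1], i2[1])   # and stops at the earlier of the two stops
    # Assumption: both boundaries are included, hence the +1.
    # Disjoint intervals yield a negative value, which is clipped to 0.
    return max(0, stop - start + 1)


print(overlap_sketch((1, 5), (3, 9)))  # shared integers are 3, 4 and 5
print(overlap_sketch((1, 2), (5, 9)))  # disjoint intervals
```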
- ajmc.commons.arithmetic.is_interval_within_interval(contained: Tuple[int, int], container: Tuple[int, int]) -> bool
Checks if the contained interval is included in the container interval.
- Parameters:
container – A Tuple[int, int] defining the included boundaries of an interval, with start <= stop.
contained – A Tuple[int, int] defining the included boundaries of an interval, with start <= stop.
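The two containment checks described in this module can be sketched as below. The names within_sketch and all_within_sketch are hypothetical and the code is only an illustration of the documented behaviour, assuming inclusive boundaries with start <= stop.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def within_sketch(contained: Interval, container: Interval) -> bool:
    """Hypothetical helper: True if `contained` lies entirely inside `container`."""
    return container[0] <= contained[0] and contained[1] <= container[1]


def all_within_sketch(contained: List[Interval], container: List[Interval]) -> bool:
    """Hypothetical helper: True if every contained interval fits in at least one container."""
    return all(any(within_sketch(c, k) for k in container) for c in contained)


print(within_sketch((2, 4), (1, 5)))
print(all_within_sketch([(2, 4), (11, 12)], [(1, 5), (10, 20)]))
print(all_within_sketch([(2, 4), (6, 8)], [(1, 5), (10, 20)]))
```

With these inputs, the first two calls print True and the last one prints False, since (6, 8) fits in neither container.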
ajmc.commons.docstrings module
This file contains generic docstring chunks to be formatted using ``docstring_formatter``.
- ajmc.commons.docstrings.docstring_formatter(**kwargs)
Decorator with arguments used to format the docstring of a function.
docstring_formatter is a decorator with arguments, which means that it takes any set of kwargs as argument and returns a decorator. It should therefore always be called with parentheses (unlike traditional decorators - see below). It follows the grammar of str.format(), i.e. {my_format_value}.
Example
This code snippet:

    @docstring_formatter(greeting='hello')
    def my_func():
        "A simple greeter that says {greeting}"
        # Do your stuff

is actually equivalent to:

    def my_func():
        "A simple greeter that says {greeting}"
        # Do your stuff

    my_func.__doc__ = my_func.__doc__.format(greeting='hello')

Note
Best practice is to name your arguments in compliance with docstrings.docstrings in order to simply call @docstring_formatter(**docstrings.docstrings).
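For readers unfamiliar with decorators that take arguments, the mechanism described above can be reproduced in a few lines. This is a minimal sketch under stated assumptions, not the package's code: docstring_formatter_sketch and shared_chunks are hypothetical names, the latter standing in for a shared dictionary of docstring chunks.

```python
def docstring_formatter_sketch(**kwargs):
    """Hypothetical stand-in: returns a decorator that formats a docstring with `kwargs`."""

    def decorator(func):
        # Functions without a docstring are returned untouched.
        if func.__doc__ is not None:
            func.__doc__ = func.__doc__.format(**kwargs)
        return func

    return decorator


# Hypothetical shared chunks; str.format() silently ignores unused keyword arguments,
# so the whole dictionary can be passed to every decorated function.
shared_chunks = {'greeting': 'hello', 'unused': 'ignored by this docstring'}


@docstring_formatter_sketch(**shared_chunks)
def my_func():
    "A simple greeter that says {greeting}"


print(my_func.__doc__)
```

One consequence of relying on str.format() is that literal braces in a docstring have to be doubled ({{ and }}) to survive formatting.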
ajmc.commons.file_management module
ajmc.commons.geometry module
ajmc.commons.image module
ajmc.commons.miscellaneous module
ajmc.commons.unicode_utils module
This file contains Unicode variables and functions used to process Unicode characters.
- ajmc.commons.unicode_utils.chunk_string_by_charsets(string: str, fallback: str = 'latin')
Chunks a string by character set, returning a list of tuples of the form (chunk, charset).
Example
>>> chunk_string_by_charsets('Hello Γειά σου Κόσμε World')
[('Hello ', 'latin'), ('Γειά σου Κόσμε ', 'greek'), ('World', 'latin')]
- Parameters:
string (str) – The string to chunk.
- Returns:
A list of tuples of the form (chunk, charset).
- Return type:
list
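One way to obtain chunks like those in the example above is sketched here. It is an illustration only: char_charset_sketch and chunk_by_charset_sketch are hypothetical names, the classifier relies on Unicode character names rather than on the package's CHARSETS, and it assumes that characters without a charset (spaces, for instance) are attached to the chunk that precedes them.

```python
import unicodedata
from typing import List, Optional, Tuple


def char_charset_sketch(char: str) -> Optional[str]:
    """Hypothetical classifier based on the Unicode name of the character."""
    name = unicodedata.name(char, '')
    if name.startswith('GREEK'):
        return 'greek'
    if name.startswith('LATIN'):
        return 'latin'
    return None


def chunk_by_charset_sketch(string: str, fallback: str = 'latin') -> List[Tuple[str, str]]:
    chunks: List[Tuple[str, str]] = []
    for char in string:
        charset = char_charset_sketch(char)
        if charset is None:
            # Assumption: unclassified characters inherit the charset of the current chunk,
            # or `fallback` at the very beginning of the string.
            charset = chunks[-1][1] if chunks else fallback
        if chunks and chunks[-1][1] == charset:
            chunks[-1] = (chunks[-1][0] + char, charset)  # extend the current chunk
        else:
            chunks.append((char, charset))  # open a new chunk
    return chunks


text = 'Hello Γειά σου Κόσμε World'
result = chunk_by_charset_sketch(text)
print([charset for _, charset in result])
```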
- ajmc.commons.unicode_utils.count_chars_by_charset(string: str, charset: str) -> int
Counts the characters of string that belong to a given Unicode character set.
Example
count_chars_by_charset('γεια σας, world', 'greek') returns 7 as there are 7 Greek chars in string.
- Parameters:
string – an NFC-normalized string (default). For NFD-normalized strings, use count_chars_by_charset_nfd.
charset – should be 'greek', 'latin', 'numeral' or 'punctuation'.
- Returns:
the number of charset-matching characters in string.
- Return type:
int
- ajmc.commons.unicode_utils.count_chars_by_charset_nfd(string: str, charset: str) -> int
Counts the characters of an NFD-normalized string that belong to a given Unicode character set.
Example
count_chars_by_charset('γεια σας, world', 'greek') returns 7 as there are 7 Greek chars in string.
- Parameters:
string – an NFD-normalized string. For NFC-normalized strings, use count_chars_by_charset.
charset – should be 'greek', 'latin', 'numeral' or 'punctuation'.
- Returns:
the number of charset-matching characters in string.
- Return type:
int
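The distinction between the NFC and NFD variants matters because the same text has a different number of code points under each normalization form. The sketch below illustrates this with the standard library only. GREEK_SKETCH and count_greek_sketch are hypothetical, and the two Unicode blocks used (Greek and Coptic, Greek Extended) are an assumption standing in for the package's 'greek' charset.

```python
import re
import unicodedata

# Assumption: these two blocks approximate the 'greek' charset.
GREEK_SKETCH = re.compile(r'[\u0370-\u03FF\u1F00-\u1FFF]')


def count_greek_sketch(string: str) -> int:
    """Hypothetical counter: number of code points falling in the assumed Greek blocks."""
    return len(GREEK_SKETCH.findall(string))


word = '\u0393\u03b5\u03b9\u03ac'  # 'Γειά' with a precomposed accented alpha
nfc = unicodedata.normalize('NFC', word)
nfd = unicodedata.normalize('NFD', word)  # the accent becomes a separate combining mark

print(len(nfc), len(nfd))                # the NFD form is one code point longer
print(count_greek_sketch('γεια σας, world'))
```

In the NFD form the combining accent falls outside the assumed Greek blocks, so any count or ratio computed over code points has to decide how to treat it.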
- ajmc.commons.unicode_utils.get_all_chars_from_range(start: str, end: str) -> str
Get all characters from a range of unicode characters.
- Parameters:
start (str) – The first character in the range.
end (str) – The last character in the range.
- Returns:
A string containing all characters in the range.
- Return type:
str
- ajmc.commons.unicode_utils.get_all_chars_from_ranges(ranges: List[Tuple[str, str]]) -> str
Get all characters from a list of ranges of unicode characters.
- Parameters:
ranges (list) – A list of tuples of unicode characters ranges.
- Returns:
A string containing all characters in the ranges.
- Return type:
str
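Enumerating a character range comes down to iterating over code points. The following is a minimal sketch with hypothetical names (chars_from_range_sketch, chars_from_ranges_sketch); it assumes that the last character of the range is included, as the parameter description suggests.

```python
from typing import List, Tuple


def chars_from_range_sketch(start: str, end: str) -> str:
    """Hypothetical helper: all characters from `start` to `end`."""
    # Assumption: `end` is included, hence the +1 on the upper bound.
    return ''.join(chr(code_point) for code_point in range(ord(start), ord(end) + 1))


def chars_from_ranges_sketch(ranges: List[Tuple[str, str]]) -> str:
    """Hypothetical helper: concatenation of the characters of each (start, end) range."""
    return ''.join(chars_from_range_sketch(start, end) for start, end in ranges)


print(chars_from_range_sketch('a', 'e'))
print(chars_from_ranges_sketch([('a', 'c'), ('0', '3')]))
```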
- ajmc.commons.unicode_utils.get_char_charset(char: str, fallback: str = 'fallback') -> str
Returns the charset of a character, if any, fallback otherwise.
- ajmc.commons.unicode_utils.get_char_unicode_name(char: str) -> str
Returns the unicode name of a character.
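Unicode names can be looked up with the standard library's unicodedata module. The sketch below uses a hypothetical wrapper, unicode_name_sketch, and assumes that an empty string is an acceptable result for code points that have no name; how the package handles that case is not documented here.

```python
import unicodedata


def unicode_name_sketch(char: str) -> str:
    """Hypothetical wrapper around unicodedata.name()."""
    # The second argument is returned for unnamed code points (e.g. control characters)
    # instead of raising ValueError.
    return unicodedata.name(char, '')


print(unicode_name_sketch('\u03b1'))  # GREEK SMALL LETTER ALPHA
print(unicode_name_sketch('a'))       # LATIN SMALL LETTER A
```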
- ajmc.commons.unicode_utils.get_string_charset(string: str, fallback: str = 'latin') -> str
Returns the charset of a string, if any, fallback otherwise.
- ajmc.commons.unicode_utils.harmonise_unicode(text: str, harmonise_functions: Tuple[Callable[[str], str]] = (harmonise_punctuation, harmonise_miscellaneous_symbols, harmonise_ligatures)) -> str
Harmonise unicode characters.
Note
This function takes an NFC string and returns an NFC string.
- Parameters:
text (str) – The text to harmonise.
harmonise_functions (tuple) – A tuple of functions to apply to the text. Each function should take an NFC string as input and return an NFC string as output.
harmonise_space_chars (bool) – Whether to harmonise space characters.
- Returns:
The harmonised text (an NFC string).
- Return type:
str
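The idea of passing a tuple of harmonising functions can be sketched as a simple pipeline. Everything below is hypothetical: the two small harmonisers only stand in for the package's own functions, whose exact replacements are not listed here, and the final re-normalisation is a safeguard added for the sketch.

```python
import unicodedata
from typing import Callable, Tuple


def harmonise_quotes_sketch(text: str) -> str:
    """Hypothetical harmoniser: maps a few typographic quotes to ASCII quotes."""
    return text.replace('\u2019', "'").replace('\u201c', '"').replace('\u201d', '"')


def harmonise_ligatures_sketch(text: str) -> str:
    """Hypothetical harmoniser: expands the 'fi' and 'fl' ligatures."""
    return text.replace('\ufb01', 'fi').replace('\ufb02', 'fl')


def harmonise_sketch(text: str,
                     functions: Tuple[Callable[[str], str], ...] = (harmonise_quotes_sketch,
                                                                    harmonise_ligatures_sketch)) -> str:
    # Each function receives the output of the previous one.
    for function in functions:
        text = function(text)
    # Assumption: re-normalising keeps the NFC-in, NFC-out contract even if a
    # custom function returns decomposed characters.
    return unicodedata.normalize('NFC', text)


print(harmonise_sketch('\u201c\ufb01n\u201d'))
```

Because the functions are applied in order, a caller can restrict or reorder the pipeline by passing a different tuple.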
- ajmc.commons.unicode_utils.is_charset_string(string: str, charset: str, threshold: float = 0.5, strict: bool = True) -> bool
Returns True if the proportion of chars in string that are in charset exceeds threshold, False otherwise.
- Parameters:
string – the string to test.
charset – should be 'greek', 'latin', 'numeral', 'punctuation' or a valid re pattern, for instance r'([ô-ÿ])'.
threshold – the threshold above which the function returns True.
strict – if True, only chars in charset are considered; if False, chars in charset, 'numeral' and 'punctuation' are considered.
- ajmc.commons.unicode_utils.is_charset_string_nfd(string: str, charset: str, threshold: float = 0.5, strict: bool = True) -> bool
Returns True if the proportion of chars in string that are in charset exceeds threshold, False otherwise.
- Parameters:
string – an NFD-normalized string. For NFC-normalized strings, use is_charset_string.
charset – should be 'greek', 'latin', 'numeral' or 'punctuation'.
threshold – the threshold above which the function returns True.
strict – if True, only chars in charset are considered; if False, chars in charset, 'numeral' and 'punctuation' are considered.
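The interplay between threshold and strict can be made concrete with a small sketch. The patterns in CHARSET_PATTERNS_SKETCH are rough assumptions rather than the package's CHARSETS, is_charset_string_sketch is a hypothetical name, and the sketch only accepts the four named charsets, not arbitrary re patterns.

```python
import re

# Assumption: rough stand-ins for the package's character sets.
CHARSET_PATTERNS_SKETCH = {
    'greek': re.compile(r'[\u0370-\u03FF\u1F00-\u1FFF]'),
    'latin': re.compile(r'[A-Za-z\u00C0-\u024F]'),
    'numeral': re.compile(r'[0-9]'),
    'punctuation': re.compile(r'[.,;:!?()-]'),
}


def is_charset_string_sketch(string: str, charset: str,
                             threshold: float = 0.5, strict: bool = True) -> bool:
    """Hypothetical helper: True if the share of matching chars exceeds `threshold`."""
    if not string:
        return False
    # In non-strict mode, numerals and punctuation also count as matching.
    names = [charset] if strict else [charset, 'numeral', 'punctuation']
    matching = sum(
        1 for char in string
        if any(CHARSET_PATTERNS_SKETCH[name].match(char) for name in names)
    )
    return matching / len(string) > threshold


sample = 'γεια σας, world'  # 15 chars: 7 Greek, 5 Latin, 1 comma, 2 spaces
print(is_charset_string_sketch(sample, 'greek'))                # 7/15 does not exceed 0.5
print(is_charset_string_sketch(sample, 'greek', strict=False))  # 8/15 exceeds 0.5
```

Under these assumptions the comma is what tips the non-strict call over the threshold, which shows how sensitive the result is to the treatment of punctuation in short strings.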