ajmc.commons package
ajmc.commons contains all the utilities (helpers, functions, objects, hard-coded variables) that are common to task-specific repositories, i.e. that must be accessible from all of them. These notably include:
- file_management: utilities for handling files systematically within ajmc's data organisation (see notebooks/data_organisation.ipnb for more) and for retrieving information from the various project spreadsheets.
- arithmetic.py: helper maths functions, mainly for dealing with intervals, which are a common object in our Canonical format (of which more below).
- docstrings.py: centralizes common function and class docstrings in a single place and provides a decorator to retrieve them easily.
- geometry.py: helper functions and an object, Shape, for dealing with geometrical objects such as contours and bounding boxes.
- image.py: helper functions and an object, AjmcImage, for dealing with images.
- miscellaneous.py: everything that doesn't fit anywhere else, notably generic functions and decorators, lazy objects for efficiency, etc.
- variables: all the hard-coded variables such as PATHS, COLORS, SPREADSHEET_IDS, CHARSETS and many more.
Submodules
ajmc.commons.arithmetic module
This module contains basic arithmetic functions.
- ajmc.commons.arithmetic.are_intervals_within_intervals(contained: List[Tuple[int, int]], container: List[Tuple[int, int]]) -> bool
Applies is_interval_within_interval to a list of intervals, making sure that each contained interval lies within one of the container intervals.
- ajmc.commons.arithmetic.compute_interval_overlap(i1: Tuple[int, int], i2: Tuple[int, int])
Computes the overlap between two intervals defined by their start and stop, both included.
- Parameters:
i1 – A Tuple[int, int] defining the included boundaries of an interval, with start <= stop.
i2 – A Tuple[int, int] defining the included boundaries of an interval, with start <= stop.
- Returns:
The length of the overlap.
- Return type:
int
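To illustrate the inclusive-boundary convention, the overlap length can be sketched as below. This is a minimal sketch, not the ajmc implementation: the name interval_overlap_sketch is hypothetical, and it assumes the overlap is the number of integer positions shared by both intervals.

```python
from typing import Tuple


def interval_overlap_sketch(i1: Tuple[int, int], i2: Tuple[int, int]) -> int:
    """Hypothetical stand-in: count the integers shared by two inclusive intervals."""
    start = max(i1[0], i2[0])  # the later of the two starts
    stop = min(i1[1], i2[1])   # the earlier of the two stops
    # With both bounds included, [start, stop] holds stop - start + 1 integers.
    # A negative span means the intervals are disjoint, hence the clamp to 0.
    return max(0, stop - start + 1)


print(interval_overlap_sketch((1, 5), (3, 8)))  # positions 3, 4 and 5 are shared
print(interval_overlap_sketch((1, 2), (4, 6)))  # disjoint intervals
```

Note the `+ 1`: under this assumption, two identical single-point intervals such as (2, 2) and (2, 2) overlap by one position, not zero.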
- ajmc.commons.arithmetic.is_interval_within_interval(contained: Tuple[int, int], container: Tuple[int, int]) -> bool
Checks if the contained interval is included in the container interval.
- Parameters:
container – A Tuple[int, int] defining the included boundaries of an interval, with start <= stop.
contained – A Tuple[int, int] defining the included boundaries of an interval, with start <= stop.
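The two containment checks documented in this module can be sketched as follows. The function names within_sketch and all_within_sketch are hypothetical, and the logic is an assumption derived from the descriptions above, not the ajmc source.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, stop), both included, with start <= stop


def within_sketch(contained: Interval, container: Interval) -> bool:
    """Hypothetical stand-in: is `contained` entirely inside `container`?"""
    return container[0] <= contained[0] and contained[1] <= container[1]


def all_within_sketch(contained: List[Interval], container: List[Interval]) -> bool:
    """Hypothetical stand-in: does every contained interval fit in some container?"""
    return all(
        any(within_sketch(inner, outer) for outer in container)
        for inner in contained
    )


print(within_sketch((2, 4), (1, 5)))
print(all_within_sketch([(2, 3), (11, 12)], [(1, 5), (10, 20)]))
```

In this sketch each contained interval only needs to fit inside one container, and different contained intervals may fit in different containers.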
ajmc.commons.docstrings module
This file contains generic docstring chunks to be formatted using docstring_formatter.
- ajmc.commons.docstrings.docstring_formatter(**kwargs)
Decorator with arguments used to format the docstring of a function.
docstring_formatter is a decorator with arguments, which means that it takes any set of kwargs as arguments and returns a decorator. It should therefore always be called with parentheses (unlike traditional decorators, see below). It follows the grammar of str.format(), i.e. {my_format_value}.
Example
This code snippet:
    @docstring_formatter(greeting='hello')
    def my_func():
        "A simple greeter that says {greeting}"
        # Do your stuff
is equivalent to:
    def my_func():
        "A simple greeter that says {greeting}"
        # Do your stuff

    my_func.__doc__ = my_func.__doc__.format(greeting='hello')
Note
Best practice is to name your arguments in compliance with docstrings.docstrings in order to simply call @docstring_formatter(**docstrings.docstrings).
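A decorator with arguments of this kind can be sketched as below. This is a minimal sketch based only on the equivalence described above; docstring_formatter_sketch is a hypothetical name and the actual ajmc implementation may differ.

```python
def docstring_formatter_sketch(**kwargs):
    """Hypothetical stand-in: build a decorator that formats a docstring with kwargs."""

    def decorator(func):
        # Functions without a docstring are left untouched.
        if func.__doc__ is not None:
            func.__doc__ = func.__doc__.format(**kwargs)
        return func

    return decorator


@docstring_formatter_sketch(greeting="hello")
def my_func():
    "A simple greeter that says {greeting}"


print(my_func.__doc__)
```

Because the outer function returns the actual decorator, writing @docstring_formatter_sketch without parentheses would pass the decorated function where keyword arguments are expected and fail.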
ajmc.commons.file_management module
ajmc.commons.geometry module
ajmc.commons.image module
ajmc.commons.miscellaneous module
ajmc.commons.unicode_utils module
This file contains Unicode variables and functions for processing Unicode characters.
- ajmc.commons.unicode_utils.chunk_string_by_charsets(string: str, fallback: str = 'latin')
Chunks a string by character set, returning a list of tuples of the form (chunk, charset).
Example
>>> chunk_string_by_charsets('Hello Γειά σου Κόσμε World')
[('Hello ', 'latin'), ('Γειά σου Κόσμε ', 'greek'), ('World', 'latin')]
- Parameters:
string (str) – The string to chunk.
- Returns:
A list of tuples of the form (chunk, charset).
- Return type:
list
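The chunking behaviour shown in the example can be approximated with the standard library alone. The sketch below is not the ajmc implementation: simple_script and chunk_by_script_sketch are hypothetical names, scripts are guessed from Unicode character names, and it assumes that neutral characters (spaces, digits, punctuation) stay attached to the chunk they follow.

```python
import unicodedata
from typing import List, Optional, Tuple


def simple_script(char: str) -> Optional[str]:
    """Hypothetical helper: guess a script from the character's Unicode name."""
    name = unicodedata.name(char, "")
    if "GREEK" in name:
        return "greek"
    if "LATIN" in name:
        return "latin"
    return None  # neutral character: space, digit, punctuation, ...


def chunk_by_script_sketch(string: str, fallback: str = "latin") -> List[Tuple[str, str]]:
    """Hypothetical stand-in: split a string into (chunk, script) tuples."""
    chunks: List[Tuple[str, str]] = []
    buffer, buffer_script = "", None
    for char in string:
        script = simple_script(char)
        # Close the current chunk only when a different script starts.
        if script is not None and buffer_script is not None and script != buffer_script:
            chunks.append((buffer, buffer_script))
            buffer, buffer_script = "", None
        buffer += char
        if script is not None:
            buffer_script = script
    if buffer:
        # A chunk made only of neutral characters falls back to `fallback`.
        chunks.append((buffer, buffer_script or fallback))
    return chunks


print(chunk_by_script_sketch("Hello Γειά σου Κόσμε World"))
```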
- ajmc.commons.unicode_utils.count_chars_by_charset(string: str, charset: str) -> int
Counts the number of characters in string that belong to a given Unicode character set.
Example
count_chars_by_charset('γεια σας, world', 'greek') returns 7, as there are 7 Greek chars in string.
- Parameters:
string – an NFC-normalized string (default). For NFD-normalized strings, use count_chars_by_charset_nfd.
charset – should be 'greek', 'latin', 'numeral' or 'punctuation'.
- Returns:
the number of charset-matching characters in string.
- Return type:
int
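Counting characters per charset can be sketched with regular expressions. The patterns below are rough assumptions chosen for illustration; the real charset definitions live in the package's hard-coded variables and may cover different ranges. CHARSET_PATTERNS_SKETCH and count_chars_sketch are hypothetical names.

```python
import re
import string as string_module

# Rough, assumed patterns; not the charsets actually defined in ajmc.
CHARSET_PATTERNS_SKETCH = {
    "greek": re.compile(r"[\u0370-\u03FF\u1F00-\u1FFF]"),  # Greek and Greek Extended blocks
    "latin": re.compile(r"[A-Za-z]"),                      # basic Latin letters only
    "numeral": re.compile(r"[0-9]"),
    "punctuation": re.compile("[" + re.escape(string_module.punctuation) + "]"),
}


def count_chars_sketch(string: str, charset: str) -> int:
    """Hypothetical stand-in: count the characters of `string` matching `charset`."""
    return len(CHARSET_PATTERNS_SKETCH[charset].findall(string))


print(count_chars_sketch("γεια σας, world", "greek"))
print(count_chars_sketch("γεια σας, world", "latin"))
```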
- ajmc.commons.unicode_utils.count_chars_by_charset_nfd(string: str, charset: str) -> int
Counts the number of characters in an NFD-normalized string that belong to a given Unicode character set.
Example
count_chars_by_charset_nfd('γεια σας, world', 'greek') returns 7, as there are 7 Greek chars in string.
- Parameters:
string – an NFD-normalized string. For NFC-normalized strings, use count_chars_by_charset.
charset – should be 'greek', 'latin', 'numeral' or 'punctuation'.
- Returns:
the number of charset-matching characters in string.
- Return type:
int
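The reason for separate NFC and NFD variants is that the two normalization forms encode accented characters differently, which changes character counts. The snippet below only uses the standard library's unicodedata module and shows the difference on a single Greek word.

```python
import unicodedata

word = "\u0393\u03b5\u03b9\u03ac"  # "Γειά", with a precomposed alpha-with-tonos

nfc = unicodedata.normalize("NFC", word)  # accents stay fused with their letter
nfd = unicodedata.normalize("NFD", word)  # accents become separate combining marks

print(len(nfc), len(nfd))
# The extra NFD character is a combining mark (non-zero combining class).
print(unicodedata.combining(nfd[-1]) != 0)
```

A counting function written for NFC input would therefore see one character where an NFD string holds two, a base letter followed by its combining accent.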
- ajmc.commons.unicode_utils.get_all_chars_from_range(start: str, end: str) -> str
Get all characters from a range of unicode characters.
- Parameters:
start (str) – The first character in the range.
end (str) – The last character in the range.
- Returns:
A string containing all characters in the range.
- Return type:
str
- ajmc.commons.unicode_utils.get_all_chars_from_ranges(ranges: List[Tuple[str, str]]) -> str
Get all characters from a list of ranges of unicode characters.
- Parameters:
ranges (list) – A list of tuples of unicode characters ranges.
- Returns:
A string containing all characters in the ranges.
- Return type:
str
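Both range helpers can be sketched with ord and chr. The names chars_from_range_sketch and chars_from_ranges_sketch are hypothetical, and the sketch assumes that the end character is included in the range, as suggested by "the last character in the range".

```python
from typing import List, Tuple


def chars_from_range_sketch(start: str, end: str) -> str:
    """Hypothetical stand-in: all characters from `start` to `end`, both included."""
    return "".join(chr(codepoint) for codepoint in range(ord(start), ord(end) + 1))


def chars_from_ranges_sketch(ranges: List[Tuple[str, str]]) -> str:
    """Hypothetical stand-in: concatenate the characters of several ranges."""
    return "".join(chars_from_range_sketch(start, end) for start, end in ranges)


print(chars_from_range_sketch("a", "e"))
print(chars_from_ranges_sketch([("a", "c"), ("0", "2")]))
```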
- ajmc.commons.unicode_utils.get_char_charset(char: str, fallback: str = 'fallback') -> str
Returns the charset of a character, if any, and fallback otherwise.
- ajmc.commons.unicode_utils.get_char_unicode_name(char: str) -> str
Returns the Unicode name of a character.
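Unicode names can be looked up with the standard library's unicodedata.name. Whether get_char_unicode_name wraps this call is an assumption; the snippet only shows what such a lookup returns.

```python
import unicodedata

alpha_name = unicodedata.name("\u03b1")  # Greek small letter alpha
print(alpha_name)

# Characters without a name (e.g. control characters) raise ValueError
# unless a default value is supplied as second argument.
newline_name = unicodedata.name("\n", "UNKNOWN")
print(newline_name)
```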
- ajmc.commons.unicode_utils.get_string_charset(string: str, fallback: str = 'latin') -> str
Returns the charset of a string, if any, and fallback otherwise.
- ajmc.commons.unicode_utils.harmonise_unicode(text: str, harmonise_functions: Tuple[Callable[[str], str]] = (harmonise_punctuation, harmonise_miscellaneous_symbols, harmonise_ligatures)) -> str
Harmonises Unicode characters.
Note
This function takes an NFC string and returns an NFC string.
- Parameters:
text (str) – The text to harmonise.
harmonise_functions (tuple) – A tuple of functions to apply to the text. Each function should take an NFC string as input and return an NFC string as output.
harmonise_space_chars (bool) – Whether to harmonise space characters.
- Returns:
The harmonised text (an NFC string).
- Return type:
str
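The pipeline described above, a tuple of string-to-string functions applied in turn, can be sketched as follows. The two harmonising functions are simplified, hypothetical stand-ins for the package's own, and the final NFC normalization is an assumption added as a safeguard.

```python
import unicodedata
from typing import Callable, Tuple


def harmonise_quotes_sketch(text: str) -> str:
    """Hypothetical stand-in: map curly single quotes to straight ones."""
    return text.replace("\u2018", "'").replace("\u2019", "'")


def harmonise_ligatures_sketch(text: str) -> str:
    """Hypothetical stand-in: expand the 'fi' and 'fl' ligatures."""
    return text.replace("\ufb01", "fi").replace("\ufb02", "fl")


def harmonise_sketch(
    text: str,
    functions: Tuple[Callable[[str], str], ...] = (
        harmonise_quotes_sketch,
        harmonise_ligatures_sketch,
    ),
) -> str:
    """Hypothetical stand-in: apply each function in order, then re-normalize to NFC."""
    for function in functions:
        text = function(text)
    return unicodedata.normalize("NFC", text)


print(harmonise_sketch("\u2018\ufb01n\u2019"))
```

Passing a different tuple of functions changes which harmonisation steps are applied and in which order.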
- ajmc.commons.unicode_utils.is_charset_string(string: str, charset: str, threshold: float = 0.5, strict: bool = True) -> bool
Returns True if the proportion of chars in string that are in charset exceeds threshold, False otherwise.
- Parameters:
string – the string to test.
charset – should be 'greek', 'latin', 'numeral', 'punctuation' or a valid re pattern, for instance r'([ô-ÿ])'.
threshold – the threshold above which the function returns True.
strict – if True, only chars in charset are considered; if False, chars in charset, 'numeral' and 'punctuation' are considered.
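The threshold test and the effect of strict can be sketched for the Greek case as below. This is a sketch under several assumptions: the patterns are rough approximations, every character of the string (spaces included) counts in the denominator, and non-strict mode simply adds numerals and punctuation to the matching characters. is_greek_string_sketch is a hypothetical name.

```python
import re

# Rough, assumed patterns; not the charsets actually defined in ajmc.
GREEK_SKETCH = re.compile(r"[\u0370-\u03FF\u1F00-\u1FFF]")
NEUTRAL_SKETCH = re.compile(r"[0-9.,;:!?]")  # a few numerals and punctuation marks


def is_greek_string_sketch(string: str, threshold: float = 0.5, strict: bool = True) -> bool:
    """Hypothetical stand-in: is the share of Greek characters above `threshold`?"""
    if not string:
        return False
    matching = len(GREEK_SKETCH.findall(string))
    if not strict:
        # Non-strict mode also counts numerals and punctuation as matching.
        matching += len(NEUTRAL_SKETCH.findall(string))
    return matching / len(string) > threshold


sample = "γεια σας, world"  # 15 characters, 7 of them Greek, 1 comma
print(is_greek_string_sketch(sample))                # 7 / 15 is below 0.5
print(is_greek_string_sketch(sample, strict=False))  # 8 / 15 is above 0.5
```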
- ajmc.commons.unicode_utils.is_charset_string_nfd(string: str, charset: str, threshold: float = 0.5, strict: bool = True) -> bool
Returns True if the proportion of chars in string that are in charset exceeds threshold, False otherwise.
- Parameters:
string – an NFD-normalized string. For NFC-normalized strings, use is_charset_string.
charset – should be 'greek', 'latin', 'numeral' or 'punctuation'.
threshold – the threshold above which the function returns True.
strict – if True, only chars in charset are considered; if False, chars in charset, 'numeral' and 'punctuation' are considered.