ajmc.ocr.pytorch package¶
Submodules¶
ajmc.ocr.pytorch.config module¶
ajmc.ocr.pytorch.ctc_decoder_torch module¶
- class ajmc.ocr.pytorch.ctc_decoder_torch.BeamCTCDecoder(classes, lm_path=None, alpha=0, beta=0, cutoff_top_n=40, cutoff_prob=1.0, beam_width=100, num_processes=4, blank_index=0)[source]¶
Bases: Decoder
- decode(probs, sizes=None)[source]¶
Decodes the probability output using the ctcdecode package.
- Parameters:
probs – Tensor of character probabilities, where probs[c,t] is the probability of character c at time t
sizes – Size of each sequence in the mini-batch
- Returns:
sequences of the model’s best guess for the transcription
- Return type:
string
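A minimal usage sketch (not taken from the package docs): it assumes the probability tensor follows the batch x seq_length x output_dim layout documented for GreedyDecoder below, that the ctcdecode package is installed, and that the character inventory is purely illustrative.

```python
import torch

from ajmc.ocr.pytorch.ctc_decoder_torch import BeamCTCDecoder

# Hypothetical character inventory; index 0 is the CTC blank.
classes = ['_', 'a', 'b', 'c', ' ']

decoder = BeamCTCDecoder(classes,
                         lm_path=None,      # no external language model
                         alpha=0, beta=0,   # LM weight / word bonus, irrelevant without an LM
                         beam_width=100,
                         blank_index=0)

# Dummy network output; the batch x time x classes shape is an assumption.
probs = torch.rand(2, 50, len(classes)).softmax(dim=-1)
sizes = torch.tensor([50, 42])  # valid length of each sequence in the mini-batch

transcriptions = decoder.decode(probs, sizes=sizes)
```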
- class ajmc.ocr.pytorch.ctc_decoder_torch.Decoder(classes, blank_index=0)[source]¶
Bases: object
Basic decoder class from which all other decoders inherit. Implements several helper functions. Subclasses should implement the decode() method.
- Parameters:
classes (list) – mapping from integers to characters.
blank_index (int, optional) – index for the blank ‘_’ character. Defaults to 0.
- decode(probs, sizes=None)[source]¶
Given a matrix of character probabilities, returns the decoder’s best guess of the transcription
- Parameters:
probs – Tensor of character probabilities, where probs[c,t] is the probability of character c at time t
sizes (optional) – Size of each sequence in the mini-batch
- Returns:
sequence of the model’s best guess for the transcription
- Return type:
string
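Since Decoder is the base class, a new decoding strategy only needs to override decode(). Below is a minimal sketch of such a subclass; the tensor layout and the attribute handling are assumptions for illustration, not taken from the package.

```python
from typing import List, Optional

import torch

from ajmc.ocr.pytorch.ctc_decoder_torch import Decoder


class ArgmaxDecoder(Decoder):
    """Toy subclass: picks the most probable class at every time step, keeping blanks and repeats."""

    def __init__(self, classes: List[str], blank_index: int = 0):
        super().__init__(classes, blank_index=blank_index)
        self._classes = list(classes)  # keep an explicit copy rather than assuming a base-class attribute name

    def decode(self, probs: torch.Tensor, sizes: Optional[torch.Tensor] = None) -> List[str]:
        # probs is assumed to be batch x seq_length x output_dim, as documented for GreedyDecoder.
        indices = probs.argmax(dim=-1)  # most probable class index per time step
        lengths = sizes.tolist() if sizes is not None else [indices.shape[1]] * indices.shape[0]
        return [''.join(self._classes[i] for i in row[:length].tolist())
                for row, length in zip(indices, lengths)]
```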
- class ajmc.ocr.pytorch.ctc_decoder_torch.GreedyDecoder(classes, blank_index=0)[source]¶
Bases: Decoder
- decode(probs, sizes=None, remove_repetitions: bool = True) → List[str][source]¶
Returns the argmax decoding given the probability matrix. Removes repeated elements in the sequence, as well as blanks.
- Parameters:
probs – Tensor of character probabilities from the network. Expected shape of batch x seq_length x output_dim
sizes (optional) – Size of each sequence in the mini-batch
remove_repetitions – Whether to remove repeated characters from the decoded sequences
- Returns:
sequences of the model’s best guess for the transcription of the inputs
- Return type:
List[str]
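A hedged usage sketch, assuming the probabilities are a softmax over the network output in the batch x seq_length x output_dim layout described above; the character inventory is hypothetical.

```python
import torch

from ajmc.ocr.pytorch.ctc_decoder_torch import GreedyDecoder

classes = ['_', 'a', 'b', 'c', ' ']  # hypothetical inventory; '_' at index 0 is the CTC blank
decoder = GreedyDecoder(classes, blank_index=0)

# Fake network output: 2 sequences, 50 time steps, one score per class.
probs = torch.rand(2, 50, len(classes)).softmax(dim=-1)
sizes = torch.tensor([50, 42])  # valid length of each sequence in the mini-batch

texts = decoder.decode(probs, sizes=sizes, remove_repetitions=True)
print(texts)  # a list of two decoded strings
```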