Formats
Corpus Gesproken Nederlands
- exception pynlpl.formats.cgn.InvalidFeatureException
- exception pynlpl.formats.cgn.InvalidTagException
- pynlpl.formats.cgn.parse_cgn_postag(rawtag, raisefeatureexceptions=False)
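CGN part-of-speech tags have the shape HEAD(feature,feature,...), for example N(soort,ev,basis,zijd,stan) for a singular common noun. The sketch below uses hypothetical names (split_cgn_tag and InvalidCGNTag are illustrative, not part of pynlpl). It only splits a raw tag into its head and feature list. It does not reproduce what parse_cgn_postag returns, and it does not validate feature values, which is presumably what InvalidFeatureException and raisefeatureexceptions relate to.

```python
import re

# Hypothetical helper, not pynlpl's implementation: matches HEAD(features).
_TAG_RE = re.compile(r"^([A-Z]+)\((.*)\)$")


class InvalidCGNTag(ValueError):
    """Raised for strings that do not look like HEAD(feat,feat,...)."""


def split_cgn_tag(rawtag):
    """Split a raw CGN tag into (head, [feature, ...])."""
    match = _TAG_RE.match(rawtag.strip())
    if match is None:
        raise InvalidCGNTag(rawtag)
    head, body = match.groups()
    # An empty feature list, as in "LET()", yields no features.
    features = [f.strip() for f in body.split(",")] if body else []
    return head, features
```

For example, split_cgn_tag("LET()") gives the head LET with an empty feature list.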
FoLiA
See the FoLiA documentation (folia.html).
GIZA++
- class pynlpl.formats.giza.GizaModel(filename, encoding='utf-8')
- class pynlpl.formats.giza.GizaSentenceAlignment(sourceline, targetline, index)
- getalignedtarget(index)
Returns the target range only if the source index aligns to a single consecutive range of target tokens.
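A minimal sketch of the consecutiveness check described for getalignedtarget. It assumes a hypothetical representation in which alignment[i] lists the 0-based target indices aligned to source token i. The helper name and the inclusive (first, last) return value are illustrative assumptions, not pynlpl's actual return format.

```python
def aligned_target_range(alignment, index):
    """Return (first, last) if source token `index` aligns to one gap-free
    run of target tokens, otherwise None (hypothetical helper)."""
    targets = sorted(set(alignment[index]))
    if not targets:
        return None  # unaligned source token
    first, last = targets[0], targets[-1]
    # A run is consecutive only if it spans exactly len(targets) positions.
    if last - first + 1 != len(targets):
        return None
    return (first, last)
```

A source token aligned to targets 3 and 5 is rejected because position 4 is missing from the run.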
- intersect(other)
- class pynlpl.formats.giza.IntersectionAlignment(source2target, target2source, encoding=False)
- reset()
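Intersecting two directional alignments, keeping only the links found in both directions, is a common way to obtain a high-precision symmetric alignment from GIZA++ output. The sketch assumes a hypothetical representation of each direction as a set of index pairs. It is not the code behind intersect() or IntersectionAlignment.

```python
def intersect_alignments(source2target, target2source):
    """Keep only links present in both alignment directions.

    source2target holds (source_index, target_index) pairs and
    target2source holds (target_index, source_index) pairs (hypothetical
    representation chosen for illustration).
    """
    # Flip target->source links so both sets use (source, target) order.
    flipped = {(s, t) for (t, s) in target2source}
    return set(source2target) & flipped
```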
- class pynlpl.formats.giza.MultiWordAlignment(filename, encoding=False)
Source-to-target alignment: reads source-target.A3.final files, in which each source word may be aligned to multiple target words (adapted from code by Sander Canisius).
- reset()
- targetword(index, targetwords, alignment)
Returns the aligned target word for the specified index in the source words. Multiple target words are concatenated, separated by a single space.
- targetwords(index, targetwords, alignment)
Returns the aligned target words for the specified index in the source words.
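The difference between targetwords and targetword can be sketched as follows. The sketch assumes, hypothetically, that alignment[i] lists the 0-based positions in targetwords aligned to source position i; the function names are illustrative only.

```python
def target_words(index, targetwords, alignment):
    """Return every target word aligned to source position `index`."""
    return [targetwords[t] for t in alignment[index]]


def target_word(index, targetwords, alignment):
    """Join multiple aligned target words with a single space."""
    return " ".join(target_words(index, targetwords, alignment))
```

With this representation, an unaligned source word yields an empty list, and therefore an empty string.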
- class pynlpl.formats.giza.WordAlignment(filename, encoding=False)
Target-to-source alignment: reads target-source.A3.final files, in which each source word is aligned to one target word.
- reset()
- targetword(index, targetwords, alignment)
Returns the aligned target word for the specified index in the source words.
- pynlpl.formats.giza.parseAlignment(tokens)
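In an A3.final file each sentence pair takes three lines: a comment line starting with "# Sentence pair", the plain sentence of one side, and the other side's sentence, in which every word (starting with a NULL token) is followed by ({ ... }) holding the 1-based positions of the words aligned to it. This page does not document what parseAlignment expects or returns, so the sketch below is a hypothetical, standalone parser for that annotated line, not the library's function.

```python
import re

# Matches "word ({ 1 2 })"; the numbers are 1-based positions of aligned words.
_PAIR_RE = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_a3_line(line):
    """Return [(word, [positions...]), ...] for an annotated A3 line."""
    pairs = []
    for word, positions in _PAIR_RE.findall(line):
        pairs.append((word, [int(p) for p in positions.split()]))
    return pairs
```

The NULL entry collects words that are not aligned to any real token on the annotated side.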
Moses
- class pynlpl.formats.moses.PTFactory(phrasetable)
- protocol
Alias of pynlpl.formats.moses.PTProtocol.
- class pynlpl.formats.moses.PTProtocol
- lineReceived(phrase)
Called for each received line; override this to handle it. The argument is the received line with the delimiter removed, as bytes.
- class pynlpl.formats.moses.PhraseTable(filename, quiet=False, reverse=False, delimiter='|||', score_column=3, max_sourcen=0, sourceencoder=None, targetencoder=None, scorefilter=None)
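Moses phrase tables store one phrase pair per line, with fields separated by |||: the source phrase, the target phrase, a space-separated list of scores, and optionally the word alignment and counts. The sketch below parses such a line. It assumes that score_column is a 1-based field index (so the default 3 selects the score field), which is an inference from the default value rather than something this page states; parse_phrase_line is a hypothetical name.

```python
def parse_phrase_line(line, delimiter="|||", score_column=3):
    """Split one phrase-table line into (source, target, scores).

    Assumption: score_column counts fields from 1, as the default of 3
    suggests; this mirrors the PhraseTable parameters only by name.
    """
    fields = [f.strip() for f in line.split(delimiter)]
    source, target = fields[0], fields[1]
    scores = tuple(float(x) for x in fields[score_column - 1].split())
    return source, target, scores
```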
- class pynlpl.formats.moses.PhraseTableClient(host='localhost', port=65432)
- class pynlpl.formats.moses.PhraseTableServer(phrasetable, port=65432)
SoNaR
- class pynlpl.formats.sonar.Corpus(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)
- class pynlpl.formats.sonar.CorpusDocument(filename, encoding='iso-8859-15')
Represents one document/text of the corpus (read-only).
- paragraphs(with_id=False)
Extracts the paragraphs and returns them as a list of plain-text paragraphs.
- sentences()
Iterates over all sentences in the document as (sentence_id, sentence) pairs, where each sentence is a list of 4-tuples (word, id, pos, lemma).
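Because sentences() is documented to yield (sentence_id, sentence) pairs made of 4-tuples, consuming code can unpack each token directly. The sketch below counts POS tags over data of that shape. pos_counts and the sample sentences are hypothetical; in real use the argument would presumably be the result of CorpusDocument.sentences().

```python
from collections import Counter


def pos_counts(sentences):
    """Count POS tags over (sentence_id, [(word, id, pos, lemma), ...]) pairs."""
    counts = Counter()
    for _sentence_id, tokens in sentences:
        for _word, _word_id, pos, _lemma in tokens:
            counts[pos] += 1
    return counts


# Hypothetical sample data in the documented shape.
sample = [
    ("s.1", [("De", "w.1", "LID", "de"), ("kat", "w.2", "N", "kat")]),
    ("s.2", [("Kat", "w.3", "N", "kat")]),
]
print(pos_counts(sample))
```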
- words()
- class pynlpl.formats.sonar.CorpusDocumentX(filename, tree=None, index=True)
Represents one document/text of the corpus, loaded into memory at once and retaining the full structure.
- paragraphs(node=None)
Iterates over paragraphs.
- save(filename=None, encoding='iso-8859-15')
- sentences(node=None)
Iterates over sentences.
- validate(formats_dir='../formats/')
Checks whether the document is valid.
- words(node=None)
Iterates over words.
- xpath(expression)
Executes an XPath expression using the correct namespaces.
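Namespaced XML can only be queried once each namespace is bound to a prefix, which is presumably what xpath() handles for the document's namespaces. As a standalone illustration, the sketch uses the limited XPath subset in Python's xml.etree.ElementTree. The namespace URL, element names and the words_with_pos helper are placeholders, not the actual SoNaR/D-Coi schema.

```python
import xml.etree.ElementTree as ET

# Placeholder prefix table; the URL is not the real D-Coi/SoNaR namespace.
NAMESPACES = {"dcoi": "http://example.org/ns/dcoi"}

# Made-up document in which every element lives in the placeholder namespace.
SAMPLE = """<DCOI xmlns="http://example.org/ns/dcoi">
  <p><s><w pos="N">kat</w><w pos="WW">slaapt</w></s></p>
</DCOI>"""


def words_with_pos(root, pos):
    """Return the text of all w elements whose pos attribute equals `pos`."""
    # ElementTree expands the dcoi: prefix using the mapping passed in.
    path = f".//dcoi:w[@pos='{pos}']"
    return [w.text for w in root.findall(path, NAMESPACES)]


root = ET.fromstring(SAMPLE)
print(words_with_pos(root, "N"))
```

Without the namespace mapping, a plain ".//w" query would find nothing, since every element in the sample is namespaced.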
- class pynlpl.formats.sonar.CorpusFiles(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)
- class pynlpl.formats.sonar.CorpusX(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)
- pynlpl.formats.sonar.ns(namespace)
Resolves the namespace identifier to a full URL.
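ElementTree and lxml name namespaced elements in "{url}localname" form, so resolving a short identifier to its URL is the first step in building such names. The sketch assumes a hypothetical prefix table with placeholder URLs; resolve_ns and clark are illustrative helpers, not pynlpl's ns().

```python
# Hypothetical prefix table; the URLs are placeholders, not SoNaR's real ones.
NAMESPACES = {
    "dcoi": "http://example.org/ns/dcoi",
    "imdi": "http://example.org/ns/imdi",
}


def resolve_ns(prefix):
    """Map a short namespace identifier to its full URL (KeyError if unknown)."""
    return NAMESPACES[prefix]


def clark(prefix, tag):
    """Build an ElementTree/lxml element name in {url}local form."""
    return "{" + resolve_ns(prefix) + "}" + tag
```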