Modifying the es-gl dictionary

Updating the es-gl dictionary

Adding an entry

To add an entry:
Add it to the Galician dictionary
In apertium-es-gl.gl.dix.xml, in the main section:

<section id="main" type="standard"></section>

Towards the end of that section, add an entry:

<e lm="píntega" a="pablof"><i>píntega</i><par n="aba__n"></par></e>

Optional tags

  • "lm": lemma, a label describing the entry
  • "a": author
  • "par": names the paradigm (the inflection rules) the entry follows, in this case aba__n, which is defined near the beginning of the document
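
An entry only works if the paradigm it names exists in the same dictionary, so it can help to look for the paradigm definition before adding the entry. The following is a minimal sketch, assuming paradigms are declared with an opening tag of the form <pardef n="NAME"> written on a single line; has_paradigm is a hypothetical helper name, not part of Apertium.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if dictionary "$1" defines paradigm "$2".
# Assumes the opening tag is written as <pardef n="NAME"...> on one line.
has_paradigm() {
  grep -qF -- "<pardef n=\"$2\"" "$1"
}
# Usage: has_paradigm apertium-es-gl.gl.dix.xml aba__n && echo "aba__n is defined"
```

The closing quote is part of the search string, so a longer paradigm name that merely starts with the same text does not match.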

Add it to the Spanish dictionary
In apertium-es-gl.es.dix.xml, in the main section, towards the end:

<e lm="salamandra" a="pablof"><i>salamandra</i><par n="abundancia__n"></par></e>

The optional tags are the same as in the previous case.
The paradigms work the same way, although each dictionary uses its own names (abundancia__n here, aba__n in the Galician dictionary).

Add it to the translation pairs
In apertium-es-gl.es-gl.dix.xml:

<e><p><l>salamandra<s n="n"></s></l><r>píntega<s n="n"></s></r></p></e>
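
Because the same word has to be added in three files, a quick check before compiling can catch a forgotten step. This is only a sketch: has_text and check_entries are hypothetical helper names, and the search strings are the example entries shown above, matched literally.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if file "$1" contains the literal string "$2".
has_text() {
  grep -qF -- "$2" "$1"
}

# Succeeds only if the example entry is present in all three dictionaries.
# File names and entry text are the ones used in the steps above.
check_entries() {
  has_text apertium-es-gl.gl.dix.xml '<i>píntega</i>' &&
  has_text apertium-es-gl.es.dix.xml '<i>salamandra</i>' &&
  has_text apertium-es-gl.es-gl.dix.xml '<l>salamandra<s n="n"></s></l>'
}
# Usage: check_entries && echo "all three entries found"
```
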

Compile and install

#!/bin/bash
export APERTIUM_CFLAGS=-I/usr/local/apertium/include
export APERTIUM_LIBS=-L/usr/local/apertium/lib/
make uninstall
make clean     
make
make install
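
Once the module is installed, translating the new word is a simple way to confirm the entry was compiled in. The sketch below assumes the apertium command is on the PATH and that the install step registered the es-gl mode; translate_es_gl is a hypothetical wrapper name.

```shell
#!/bin/bash
# Hypothetical helper: translates "$1" from Spanish to Galician.
# Assumes `apertium` is on the PATH and the es-gl mode is installed.
translate_es_gl() {
  if ! command -v apertium >/dev/null 2>&1; then
    echo "apertium command not found" >&2
    return 127
  fi
  printf '%s\n' "$1" | apertium es-gl
}
# Usage: translate_es_gl "salamandra"
```

If the new entry is in place the output should contain píntega; Apertium typically marks words it does not know with a leading asterisk.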
     


Modifying the dictionary from a word list

Dictionaries can easily be modified from a word list. This example uses the apertium-es-gl module (version 1.0.8), a word list in a CSV file (exported from LibreOffice in that format) and a Perl script that converts the CSV file into the XML files Apertium expects.
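
The CSV layout is not spelled out here. Judging from the fields the script reads (es, gl, r_es, r_gl, tipo, etiqueta, autor, sentido), with the first row giving the column names, a file with a single entry might look like the sketch below. The sample row reuses the salamandra/píntega example; the comma separator and the column order are assumptions.

```shell
#!/bin/bash
# Writes a one-entry sample word list. Column names come from the fields
# read by convierteDicionario.pl; the comma separator is an assumption.
cat > lista-termos.csv <<'EOF'
es,gl,r_es,r_gl,tipo,etiqueta,autor,sentido
salamandra,píntega,abundancia__n,aba__n,n,,pablof,
EOF
```

The etiqueta and sentido columns are left empty in this row, so the script falls back to its defaults for them.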

To convert the list and compile the module:

#!/bin/bash
./convierteDicionario.pl lista-termos.csv
make uninstall
make clean
make
make install

The Perl script convierteDicionario.pl is the following (there is plenty of room for improvement):

#!/usr/bin/perl

$| = 1 ;   # unbuffered output, so progress messages appear immediately
use XML::DOM ;
use Text::CSV_XS ;
use File::Copy qw(copy) ;
use Data::Dumper ;

my $PREFIX_DICT = "apertium-es-gl" ;
my $parserES    = new XML::DOM::Parser ;
my $parserGL    = new XML::DOM::Parser ;
my $parserES_GL = new XML::DOM::Parser ;
my $dict_ES     = $PREFIX_DICT . ".es.dix.xml" ;
my $dict_GL     = $PREFIX_DICT . ".gl.dix.xml" ;
my $dict_ES_GL  = $PREFIX_DICT . ".es-gl.dix.xml" ;

# Back up each dictionary on the first run only
if ( ! -e "$dict_ES.orig" ) {
  copy ( $dict_ES, "$dict_ES.orig" ) ;
}
if ( ! -e "$dict_GL.orig" ) {
  copy ( $dict_GL, "$dict_GL.orig" ) ;
}
if ( ! -e "$dict_ES_GL.orig" ) {
  copy ( $dict_ES_GL, "$dict_ES_GL.orig" ) ;
}
  
# Always parse the untouched .orig copies, so repeated runs do not duplicate entries
print STDOUT "Parsing ES file...\n" ;
$docES = $parserES->parsefile ( "$dict_ES.orig", ProtocolEncoding => 'UTF-8' ) ;
print STDOUT "Parsing GL file...\n" ;
$docGL = $parserGL->parsefile ( "$dict_GL.orig" ) ;
print STDOUT "Parsing ES_GL file...\n" ;
$docES_GL = $parserES_GL->parsefile ( "$dict_ES_GL.orig" ) ;

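print STDOUT "Opening CSV modifications...\n" ;
# The CSV file is the first command-line argument
open ( $fd, "<:encoding(UTF-8)", $ARGV[0] ) or die "Cannot open $ARGV[0]: $!" ;
my $csv = Text::CSV_XS->new ( { binary => 1 } ) ;
# The first CSV row holds the column names
$csv->column_names ( $csv->getline ( $fd ) ) ;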

# Each CSV row adds one entry to each of the three dictionaries
while ( $fila = $csv->getline_hr ( $fd )) {
  $r_es    = $fila->{r_es} ;    # paradigm name in the Spanish dictionary
  $r_gl    = $fila->{r_gl} ;    # paradigm name in the Galician dictionary
  $es      = $fila->{es} ;      # Spanish word
  $gl      = $fila->{gl} ;      # Galician word
  $tipo    = $fila->{tipo} ;    # grammatical symbol, e.g. "n"
  # Empty optional columns fall back to defaults; eq compares as strings
  $etiq    = $fila->{etiqueta} // "" ;
  $eti     = ( $etiq eq "" ) ? $es : $etiq ;   # lm of the Spanish entry
  $etiG    = ( $etiq eq "" ) ? $gl : $etiq ;   # lm of the Galician entry
  $autor   = ( ( $fila->{autor} // "" ) eq "" ) ? "osl" : $fila->{autor} ;
  $sentido = $fila->{sentido} // "" ;          # direction restriction, assumed to be LR or RL

  # Spanish entry, appended to the first <section>
  $sec = $docES->getElementsByTagName ( "section")->[0] ;
  $ent = $docES->createElement ( "e" ) ;
  $ent->setAttribute ( "lm", $eti ) ;
  $ent->setAttribute ( "a", $autor ) ;
  $i = $docES->createElement ( "i" ) ;
  $p = $docES->createElement  ( "par" ) ;
  $t = $docES->createTextNode ( $es  ) ;
  $i->appendChild ( $t ) ;
  $p->setAttribute ( "n", $r_es ) ;
  $ent->appendChild ( $i ) ;
  $ent->appendChild ( $p ) ;
  $sec->appendChild ( $ent ) ;

  # Galician entry, appended to the first <section>
  $secG = $docGL->getElementsByTagName ( "section" )->[0] ;
  $entG = $docGL->createElement ( "e" ) ;
  $entG->setAttribute ( "lm", $etiG ) ;
  $entG->setAttribute ( "a", $autor ) ;
  $iG = $docGL->createElement ( "i" ) ;
  $pG = $docGL->createElement  ( "par" ) ;
  $tG = $docGL->createTextNode ( $gl  ) ;
  $iG->appendChild ( $tG ) ;
  $pG->setAttribute ( "n", $r_gl ) ;
  $entG->appendChild ( $iG ) ;
  $entG->appendChild ( $pG ) ;
  $secG->appendChild ( $entG ) ;

  # Translation pair: <e><p><l>es<s n="tipo"/></l><r>gl<s n="tipo"/></r></p></e>
  $secT = $docES_GL->getElementsByTagName ( "section" )->[0] ;
  $entT = $docES_GL->createElement ( "e" ) ;
  $entT->setAttribute ( "r", $sentido ) if $sentido ne "" ;
  $pT = $docES_GL->createElement  ( "p" ) ;
  $lT = $docES_GL->createElement ( "l" ) ;
  $tES = $docES_GL->createTextNode ( $es  ) ;
  $sES = $docES_GL->createElement("s");
  $sES->setAttribute("n",$tipo);
  $lT->appendChild($tES);
  $lT->appendChild($sES);
  $rT = $docES_GL->createElement ( "r" ) ;
  $tGL = $docES_GL->createTextNode ( $gl  ) ;
  $sGL = $docES_GL->createElement("s");
  $sGL->setAttribute("n",$tipo);
  $rT->appendChild($tGL);
  $rT->appendChild($sGL);
  $pT->appendChild($lT);
  $pT->appendChild($rT);
  $entT->appendChild ( $pT ) ;
  $secT->appendChild ( $entT ) ;
}

close ( $fd ) ;

# Do not use ...->printToFile, because it alters the encoding (an XML::DOM bug left unfixed for years)
open ( $fh, ">:encoding(UTF-8)", $dict_ES ) or die "Cannot write $dict_ES: $!" ;
$docES->print($fh);
$docES->dispose;
close ( $fh ) ;
open ( $fh, ">:encoding(UTF-8)", $dict_GL ) or die "Cannot write $dict_GL: $!" ;
$docGL->print($fh);
$docGL->dispose;
close ( $fh ) ;
open ( $fh, ">:encoding(UTF-8)", $dict_ES_GL ) or die "Cannot write $dict_ES_GL: $!" ;
$docES_GL->print($fh);
$docES_GL->dispose;
close ( $fh ) ;
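
After running the script, comparing entry counts against the .orig backups gives a rough sanity check: each dictionary should have gained one entry per CSV data row. This is a sketch under that assumption; count_entries and report_new_entries are hypothetical helper names, and the count relies on grep supporting the -o option.

```shell
#!/bin/bash
# Hypothetical helper: counts <e> entries in dictionary "$1".
# Counts occurrences rather than lines, since several entries may share a line.
count_entries() {
  grep -o '<e[ >]' "$1" | wc -l | tr -d ' '
}

# Prints how many entries each regenerated dictionary gained over its .orig backup.
report_new_entries() {
  local f
  for f in apertium-es-gl.es.dix.xml apertium-es-gl.gl.dix.xml apertium-es-gl.es-gl.dix.xml; do
    echo "$f: $(( $(count_entries "$f") - $(count_entries "$f.orig") )) new entries"
  done
}
# Usage: report_new_entries
```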