<div align="left"><font size="4">From: Helene Mazo &lt;mazo@elda.org&gt;<br />Subject: NEMLAR Arabic Resources in ELRA Catalogue - 08/06</font></div>
<div align="left"><font size="4">ELRA - Language Resources Catalogue - Update</font></div>
<div align="left"><font size="4">We are happy to announce the following Arabic resources, produced within the NEMLAR project (<a href="http://www.nemlar.org/">www.nemlar.org</a>). All three resources are owned and copyrighted by the Nemlar Consortium and are available in our catalogue. To view all available Language Resources, visit our online catalogue at </font><a href="http://www.elra.info/"><font size="4">http://www.elra.info</font></a><font size="4"> or </font><a href="http://www.elda.org/"><font size="4">http://www.elda.org</font></a><font size="4">.</font></div>
<div align="left"><font size="4">ELRA-W0042 NEMLAR Written Corpus</font></div>
<div align="left"><font size="4">This corpus consists of about 500,000 words of Arabic text from 13 different categories. The text is provided in four versions:<br />- Raw text<br />- Fully vowelized text<br />- Text with Arabic lexical analysis<br />- Text with Arabic POS tags</font></div>
<div align="left"><font size="4">The database is distributed on one ISO 9660 CD-ROM.</font></div>
<div align="left"><font size="4">For more information, see </font><a href="http://catalog.elda.org:8080/product_info.php?products_id=873"><font size="4">http://catalog.elda.org:8080/product_info.php?products_id=873</font></a></div>
<div align="left"><font size="4">ELRA-S0219 NEMLAR Broadcast News Speech Corpus</font></div>
<div align="left"><font size="4">The corpus consists of about 40 hours of Arabic speech data (mainly Standard Arabic, from a number of broadcast companies), provided by ELDA. Transcriptions follow the Transcriber conventions as used by ELDA and focus on three levels: orthography, named entities, and speaker/turn segmentation. No phonetic transcription or segmentation is planned.</font></div>
<div align="left"><font size="4">The database is distributed on one ISO 9660 DVD-ROM.</font></div>
<div align="left"><font size="4">For more information, see </font><a href="http://catalog.elda.org:8080/product_info.php?products_id=874"><font size="4">http://catalog.elda.org:8080/product_info.php?products_id=874</font></a></div>
<div align="left"><font size="4">ELRA-S0220 NEMLAR Speech Synthesis Corpus</font></div>
<div align="left"><font size="4">The NEMLAR Speech Synthesis Corpus contains recordings of two native Egyptian speakers (male and female, 35 years old), made in a studio over two channels (voice and laryngograph). Data collection and transcription were performed by RDI (Egypt).</font></div>
<div align="left"><font size="4">Speech samples are stored as signed 24-bit integers at 96 kHz, with the least significant byte first ("lohi" or Intel format).</font></div>
<div align="left"><font size="4">The speakers read 2,032 prompted sentences covering approximately 42,000 words in three categories: transcribed speech (20%), written text (50%), and constructed phrases (30%).</font></div>
<div align="left"><font size="4">The database is provided with orthographic, prosodic, and phonetic transcriptions, the latter in SAMPA. All transcriptions were segmented at the utterance (sentence/command word) level, annotated at the word level, and checked manually. A pronunciation lexicon of 3,589 headwords with SAMPA phonetic transcriptions is also available.</font></div>
<div align="left"><font size="4">The database is distributed on three ISO 9660 DVD-ROMs.</font></div>
<div align="left"><font size="4">For more information, see </font><a href="http://catalog.elda.org:8080/product_info.php?products_id=875"><font size="4">http://catalog.elda.org:8080/product_info.php?products_id=875</font></a></div>
<div align="left"><font size="4">For more information on the catalogue, please contact Valérie Mapelli (</font><a href="mailto:mapelli@elda.org"><font size="4">mapelli@elda.org</font></a><font size="4">).<br />Linguistic Field(s): Computational Linguistics<br />Lexicography<br />Phonetics<br />Text/Corpus Linguistics</font></div>
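<div align="left"><font size="4">The sample format described for ELRA-S0220 (signed 24-bit integers, least significant byte first) can be illustrated with a minimal Python sketch. It assumes headerless raw sample data for a single channel; the function name decode_lohi24 is hypothetical and not part of any ELRA/ELDA tool, and the actual file layout on the DVD-ROMs (headers, channel interleaving) should be checked against the corpus documentation.</font></div>
<pre>
```python
# Minimal sketch: decode raw 24-bit signed little-endian ("lohi"/Intel) samples.
# Assumption (not from the catalogue entry): input is headerless, single-channel
# sample data. decode_lohi24 is a hypothetical helper name.
from __future__ import annotations

BYTES_PER_SAMPLE = 3  # 24 bits = 3 bytes per sample


def decode_lohi24(data: bytes) -> list[int]:
    """Convert raw bytes into signed 24-bit integer samples (LSB first)."""
    if len(data) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of 3")
    return [
        # "little": least significant byte first; signed=True: two's complement
        int.from_bytes(data[i:i + BYTES_PER_SAMPLE], "little", signed=True)
        for i in range(0, len(data), BYTES_PER_SAMPLE)
    ]


if __name__ == "__main__":
    # 01 00 00 -> 1, FF FF FF -> -1, 00 00 80 -> -8388608 (most negative value)
    raw = bytes([0x01, 0x00, 0x00, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x80])
    print(decode_lohi24(raw))  # [1, -1, -8388608]
```
</pre>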




