Institute for Computational Linguistics "A. Zampolli" (ILC-CNR)

The Istituto di Linguistica Computazionale "Antonio Zampolli" (ILC-CNR) has worked in the field of Computational Linguistics since 1967, when a Division of Computational Linguistics was formed at the Centro Nazionale Universitario di Calcolo Elettronico (CNUCE). It was founded as an independent Institute of the CNR in 1978.

ILC-CNR has been one of the major promoters of the notion of language resources as the central component of the "linguistic infrastructure" (aware also of its cultural, economic and political implications), has coordinated the major initiatives relating to language resources and standardisation, and has often promoted new "paradigms" in the field.

ILC-CNR has designed and built several types of corpora and lexicons, together with their ontologies; has developed a complete chain of tools for robust processing of the Italian language, for acquiring information from corpora and for word-sense disambiguation; and has developed technologies for several application domains (question answering, information retrieval, text mining, monolingual and multilingual terminology extraction, ontology acquisition and structuring, summarisation, filtering of Web documents, preservation of the cultural heritage through digital image processing and digital library techniques etc.).

ILC-CNR has participated in 52 EC projects, coordinating 15 of them, and has participated in 4 national projects as coordinator.

ILC-CNR has coordinated numerous international, European and national strategic initiatives and projects, such as:

ENABLER, an EU network of national projects on Human Language Technology aiming at "enabling" the realisation of a framework of co-operation;
the standardisation initiatives EAGLES (funded by the European Commission) and ISLE (co-funded by the European Commission and the National Science Foundation);
the major EU projects on Language Resources and the major EU infrastructural projects (PAROLE, SIMPLE, SPARKLE, RELATOR, NERC, ACQUILEX, ACQUILEX-II etc.);
EU projects on Human Language Technology (POESIA, MUSI etc.);
the WRITE (Written Resources Infrastructure, Technology and Evaluation) Committee;
LREC (Language Resources and Evaluation Conference).

ILC-CNR is represented in many international and national committees, boards and associations (ELRA, ELSNET, ICCL, WRITE, ISO, SIGLEX, SENSEVAL etc.) and has many collaborations: international (89 in 29 countries), national (30), with industry (23 in 11 countries), and with ministries, regions, public administrations etc.

ILC-CNR operates with a structured staff (22 people on open-ended contracts and 7 on fixed-term contracts), a non-structured staff (about 20 people, including junior researchers, grant and scholarship holders, PhD students etc.) and substantial self-financing.

The activities of ILC-CNR are organized into 7 main research lines.

Design of standards and building of computational language resources
More and more products (for e-Commerce, e-Government, Web, office, data mining, digital libraries etc.) include components based on language technologies and resources, whose production requires a co-operative effort that pools expertise, funding and participants.
Objective
To create the linguistic-computational infrastructure of resources and tools indispensable for automating the linguistic operations needed to produce, represent, retrieve, process, acquire, translate, interpret and share knowledge.
ILC-CNR promotes a new paradigm of Open Linguistic Infrastructure at an international level, in order to realize the Semantic Web vision with multilingual and multicultural access, emphasizing the inclusion of Italian in a multilingual network.
Within the sector of language resources, ILC-CNR holds a recognised leading position both internationally and nationally. It also exercises this role through strategic activities: formulating new scientific objectives and international and national research projects, setting up networks, organizing surveys and conferences, and collaborating with the most important public and private groups on every continent.
Effects
Technological effects (the development and evaluation of systems and products are made possible), cultural effects, economic effects (there is a growing market), employment effects and image effects (for instance, the Digital Olympic Games in Beijing).

Models and methods for natural language processing, and application-oriented monolingual and multilingual prototypes
Reading and understanding a newspaper headline, or using a sentence to give somebody an order or a piece of information or to express a desire, are activities that require knowledge of the "rules" of a language.
These rules form the set of "directions for use" of linguistic behaviour.
Despite the extremely natural way in which a child learns a language, neither a general model of linguistic behaviour nor a computer that can simulate this behaviour exists so far.
Objective
To create a cycle of theoretical analysis, planning, experimentation, prototype design and methodology for building advanced prototypes and tools suited to the needs of innovative applications based on natural language processing.
The topics dealt with are: i) design and development of models and methods for sentence analysis and generation in a natural language; ii) design and development of new methods for acquiring linguistic and extra-linguistic knowledge from texts; iii) experimentation with various information extraction techniques ("text mining", "text categorization" etc.), design of a Question-Answering system for searching information on the Web, and design of intelligent monolingual and multilingual, multimodal and multimedia human-machine interfaces; iv) implementation of a multilingual system on the Web, of intelligent interactive training and of multimedia techniques for teaching and for people with disabilities.
Effects
Technological, economic, social and cultural effects (turned into commercial software, some of these tools already run on home and office computers; if integrated into a more general model, they could reveal the secrets of our behaviour as "speaking animals").

Computational methods and tools for research in the humanities, with particular attention to linguistic and literary disciplines and to lexicography
ILC-CNR played a fundamental role in the origins of automatic text processing, serving as a model for companies and research institutions in Italy, in Europe and worldwide.
The development of information technologies and of the Internet has broadened the range of applications, the interaction between different disciplines and the pool of potential users.
In this changed landscape, where technologies are an indispensable instrument for the whole of the Humanities, training young researchers becomes important.
Objective
To develop methodologies, tools and resources to be made available both to the whole scientific community (for more effective and in-depth research) and to industry.
Computational Linguistics now plays a fundamental role in e-Publishing, e-Learning, e-Government and the language industry.
In multilingual environments, the study of texts is of great importance for protecting the specificity and the cultural heritage of every single language in a globalized context.
Effects
Technological effects (mainly for Computer Science for the Humanities), commercial effects (for instance, products of publishing houses) and cultural effects (access to and enjoyment of the cultural heritage).

Library material and Computational Philology
This research line covers the study, development and implementation of systems for processing library material in digital format (texts and images) for philological and linguistic analysis.
It is a model for similar research and university institutions in other European and non-European countries with which collaborative relations have been established.
The following activities relate to this research line: i) development of a Computational Philology system (in stand-alone and Web-based versions) for managing computerised critical apparatus in both papyrology and medieval philology, and its integration with linguistic parsers (morphological systems for classical languages); ii) creation of an OCR (character recognition) module for early printed texts, designed so that it can help complete fragmentary words; iii) analysis of archives of images of ostraka written in Demotic script by means of Artificial Intelligence systems (neural networks); iv) continuation of the activity relating to the BIBLOS initiative (the virtual library of the classical branches of CNR).
Objective
To study innovative methodologies for accessing and making use of Italian library material, including the philological and linguistic study of ancient documents.
Effects
Technological effects (development of new products for libraries and archives) and cultural effects (easier circulation of library material and of the data it transmits).

Linguistic Miner: a virtual observatory of contemporary Italian
The daily flow of Italian texts produced on the Internet is an unlimited source of primary linguistic information, but its sheer size makes traditional methodologies of analysis and classification ineffective.
Objective
To dynamically sample the data available on the Internet, to organize them in a homogeneous and comparable way, to add linguistic and content annotations to them, and to analyse them through the most modern techniques of both quantitative and qualitative automatic analysis.
Doing so could satisfy three complementary needs: i) to analyse contemporary Italian in order to monitor its state in real time; ii) to make the most of the information potential of digital texts by relating them to comparable texts written in other languages; iii) to improve our understanding of how languages work through increasingly reliable linguistic techniques.
For instance, it would be possible to produce constantly updated inventories of Italian terminology, highlighting their points of contact with other languages, or to pinpoint the emerging use of new constructions and characterise them by domain or by the communication medium used.

Architecture of language technologies for the promotion of Italian in the knowledge society
Natural Language Processing, formerly a highly specialized research sector, has developed into a provider of technology of fundamental importance for the information society.
A language is not only a vehicle for and a key of access to information, but also the basis of the cultural heritage of a nation.
Objective
To create the preconditions, in terms of the new competences needed, for the design, promotion and establishment of a basic linguistic architecture (at the level of tools, techniques and components for applications and products) able to process written language, spoken language and multimodal documents: a platform of language technologies allowing everybody to participate in the information society (in all its aspects, from the commercial to the cultural) in a natural way and using his/her own language.
Such a platform is necessary for real integration of the different Natural Language Processing activities in Italy (inside and outside CNR) and for keeping Italian among the technologically advanced languages, as a support for applications for the management of digital contents.

Natural Language Processing and natural access to knowledge
This research line aims to bridge the design and development of models, methods and language resources for Natural Language Processing and the technological and application requirements connected with knowledge extraction and distribution and with pervasive human-machine interfaces.
Objective
To promote: i) the development of adaptive, self-organizing models of language considered as an open communicative system; ii) the development of software applications for the search, acquisition and intelligent management of knowledge across a wide range of domains and services (e-Government, e-Health, e-Learning, defence and security etc.); iii) a sophisticated level of interoperability among information technologies in integrated environments, including Web-based ones, which requires an explicit representation of digital contents; iv) the development of new-generation user interfaces, based on natural language, that promote an increasingly immediate and flexible interaction among human users, interactive systems and services with a high information content.

The main tasks of ILC-CNR are:

to promote basic research for the progress of knowledge in the sector of Natural Language Processing, on topics in which the analysis of the state of the art suggests the necessity and the possibility of significant innovations, fostering the symbiosis among the different disciplinary competences involved;
to study innovative methods and tools and develop technologies and basic linguistic resources which can be used and integrated in services of various types and in application-oriented systems, in order to promote the development of the Italian industry in the sector, in particular by reducing the “start-up” costs of development activities;
to study and develop methods and models for multimodality, through the integration of language technologies with image and speech processing;
to study and build innovative prototypes and systems for using language technologies in support of research and applications in the humanities, of access to the cultural heritage and of the promotion of the Italian language;
to maintain constant relations with industry and carry out technology transfer to it;
to study and adopt procedures for monitoring project activities and for evaluating and validating results by means of state-of-the-art methodologies and international reference systems, including competitive ones;
to promote and participate in the activities and programmes of the European Community and, more generally, of international bodies which involve the use of language technologies;
to ensure that our country is represented in the major international scientific and professional venues;
to provide junior researchers with an appropriate interdisciplinary education in research and technological development, by means of PhD programmes (including European ones), grants and research fellowships (Italian industry has repeatedly identified the difficulty of finding personnel with education and competences specific to Natural Language Processing as a great obstacle to a development of the field appropriate to the strategic needs of the country);
to organize conferences, workshops, international and national meetings on strategic topics in the sector of Computational Linguistics in order to foster the diffusion of scientific knowledge and the creation of synergies among the various communities active in the sector.