Labour History Index Project
In Brief
The Labour History Index project aims to build an integrated, web-based search platform for collection-related data from IALHI institutions. The goal is to make existing data more accessible to users, not to create or manage data.
The first stage of the project is a pilot undertaken together with the following IALHI members:
- Arbejderbevægelsens Bibliotek og Arkiv, Copenhagen,
- AMSAB Instituut voor Sociale Geschiedenis, Ghent,
- Arbetarrörelsens Arkiv och Bibliotek, Stockholm,
- Bibliothèque de documentation internationale contemporaine, Nanterre,
- Friedrich Ebert Stiftung, Bonn,
- International Institute of Social History, Amsterdam,
- John Rylands University Library, Manchester,
- National Library of Scotland, Edinburgh.
The collections held by these institutions come from social movements that were generally international in character. As a result, collections with similar content are scattered among different institutions, and each institution offers access through its own interface with its own functionality. Consequently, these collections are not as well known or as widely used as they could be. A joint search engine with a user-friendly interface could provide a web resource that is relevant to social historians, other historians, and people worldwide with an interest in (social) history.
IALHI institutions provide copies of selected data sources (such as descriptions and inventories of archives). These copies are converted to another format and stored on a central server. On this server, search functionalities will be developed that allow users to search all data sources through a 'Google-like' interface. Search results will refer users, directly or indirectly, to the data in their original location.
With each participating institution a number of relevant data sources will be determined, and ways developed to deliver and update these data to the central server. After uploading the first set of data, the web interface will be developed in Amsterdam and tested by the 'pilot group'. The more definitive search platform will be made accessible to the public and presented to IALHI, with the hope that other institutions will participate. For the second stage additional funding will be required. The costs of the pilot project, however, will be covered by the IISG.
Project Approach
In 2002 the IISG started developing a data platform, based on XML, to create a single search engine for a number of different sources: the library catalog, inventories of archives, image collections, databases on trade unions, strikes and historical occupations, a bibliography on women's history, a biographical dictionary, and the institute's website. These sources are copied, converted to XML files, and delivered to an XML document server. Another set of scripts makes it possible to search these documents by means of a web interface. Users can:
- search all sources at once, from a very simple search screen,
- choose a selection of sources to search,
- search a separate source with some advanced search and sort options,
- mark search results for email,
- contact the IISG for further information.
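The copy-and-convert step described at the start of this section can be illustrated with a minimal sketch. It is not the IISG's actual script: the element names (`record`, `field`), the source name, and the sample record are assumptions made only for illustration. The sketch wraps one record in an XML document and leaves the field values as they were delivered.

```python
import xml.etree.ElementTree as ET

def record_to_xml(source_name, record):
    """Wrap one source record in a small XML document without altering its values."""
    # "record" and "field" are hypothetical element names, not the project's schema.
    root = ET.Element("record", attrib={"source": source_name})
    for field, value in record.items():
        child = ET.SubElement(root, "field", attrib={"name": field})
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample record from a library catalog.
xml_doc = record_to_xml("library_catalog", {"title": "Strikes in Ghent", "year": 1902})
print(xml_doc)
```

Keeping every original field as a generic `field` element is one way to accept very different sources without forcing them into a shared structure first.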
The 'raw' data are not edited or enriched in any way on the XML server; any editing or enriching should take place only in the original data. Language and spelling differences are retained from the originals, and there is no meta-thesaurus. Even if building one had been theoretically possible, it would have been too large and difficult an undertaking. The power and speed of the search engine and the usability of the interface should compensate for this. Almost every study of 'web search engine user behavior' shows that people prefer simple interfaces where they can type a few words, get results quickly, and obtain references to related information, as in Google. The IISG took this as the model for its platform.
A similar approach could be used to build a platform for data from IALHI institutions. Just as in the IISG project:
- data are very diverse in content, structure, and format,
- imposing a joint format on the data within their original repositories is not an option,
- the best way to harvest data can be developed for each data source individually,
- a central repository can be used to transform the data to make them searchable through a single interface and to build indexes for speed of retrieval,
- result lists can refer to full records in their original location or to records in the XML repository,
- as the structure of the data platform is modular, changes in existing data sources and new data sources should be relatively easy to cope with,
- without demanding implementation of standards (such as OAI) from contributing institutions, the data platform itself can adopt a standard.
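As a rough illustration of the central-repository idea in the list above, the following sketch builds a simple word index over records from two sources and returns result lists that point back to each record's original location. It is only a sketch under stated assumptions: the record identifiers, source names, `example.org` URLs, and the plain whitespace tokenization are all hypothetical, and the actual platform uses an XML document server rather than in-memory dictionaries.

```python
from collections import defaultdict

def build_index(records):
    """Map each lowercased word to the set of record ids whose text contains it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for word in record["text"].lower().split():
            index[word].add(record_id)
    return index

def search(index, records, query):
    """Return (source, url) pairs for records that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    # Each hit refers the user to the record in its original location.
    return [(records[r]["source"], records[r]["url"]) for r in sorted(hits)]

# Hypothetical records harvested from two institutions.
records = {
    "a1": {"source": "institute_a", "url": "https://example.org/a/1",
           "text": "Trade union congress minutes"},
    "b7": {"source": "institute_b", "url": "https://example.org/b/7",
           "text": "Strike committee and trade union letters"},
}
index = build_index(records)
results = search(index, records, "trade union")
print(results)
```

Because the index is built once in the central repository, a query touches only the index and not the contributing institutions' own systems, which is where the speed of retrieval comes from.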
Project Plan (Outline)
The Labour History Index project will be coordinated and hosted by IISG. Representatives from the IISG will communicate with representatives of each participating institute to:
- inventory data sources of possible interest. Besides descriptions and inventories of archives, data on other collections can be considered (library holdings, visual, and audiovisual collections), and data closely related to these collections (data on persons and organizations, digitized contents of sources),
- discuss how to harvest the data (directly by a webbot from Amsterdam, by exporting the data to an accessible FTP server, by sending data by email or even on a CD or DVD; as 'full' or 'incremental' deliveries; and at what frequency),
- investigate the structure and contents of the data in order to decide which 'fields' should be mapped to which index,
- discuss and evaluate the web interface and the functionalities to be developed.
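Deciding which 'fields' map to which index could, for example, be recorded as one small mapping table per data source. The sketch below assumes hypothetical source names, field names, and index names (`title`, `creator`); none of them come from the project itself. Fields mapped to `None`, or not listed at all, are simply left out of the shared indexes, and values are passed through unchanged.

```python
# Hypothetical per-source mappings from local field names to shared index names.
FIELD_MAPS = {
    "institute_a": {"titel": "title", "ophav": "creator"},
    "institute_b": {"title": "title", "author": "creator", "shelfmark": None},
}

def map_record(source, record):
    """Group a record's values under shared index names, skipping unmapped fields."""
    mapping = FIELD_MAPS[source]
    indexed = {}
    for field, value in record.items():
        index_name = mapping.get(field)
        if index_name is not None:
            indexed.setdefault(index_name, []).append(value)
    return indexed

entry_a = map_record("institute_a", {"titel": "Fagbevægelsen", "ophav": "N.N."})
entry_b = map_record("institute_b", {"title": "Minutes", "shelfmark": "X 12", "note": "damaged"})
print(entry_a)
print(entry_b)
```

Keeping the mapping as data rather than code means a new data source only needs a new table, which fits the modular structure the project aims for.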
Each institute agrees to deliver data to the Labour History Index, to participate in the brainstorming group, and to test the data and the functionalities. This will not be very time-consuming; communication will be mostly by email and through a project website.
In 2004-2005, IISG representatives will visit all participating institutes. In the first half of 2005 scripts to harvest and transform data will be developed and tested. A server and an XML database will be installed to hold the data. First drafts of the joint data model, the functionalities and the interface will be developed and tested. The first pilot version of the Labour History Index can be shown at the 2005 IALHI congress.
In the second half of 2005, second versions of the scripts, data model, functionalities, and interface will be developed and tested by project group members and a wider group of test users. The Labour History Index may be online by the end of 2005.
After this, more institutions will be invited to participate, and the participants from the first stage will be invited to contribute more data. Funding will also need to be found for this next stage.
See also:
Diagram (PDF file).
The first draft plan, originally written by the International Institute of Social History for the IALHI Automation Group meeting in Milan, November 2000.
More about datasets, PDF file with examples of the transformation of data to XML documents in the Labour History Index.
search.labourhistory.net - Test version of the search interface.
