Title: Genealogy Project
Team: Wei LIU
Opening up data has become both a responsibility and a development opportunity for libraries. Shanghai Library has been using Linked Open Data technologies to reorganize its traditional resources by building a genealogy knowledge service platform (jp.library.sh.cn). Through BIBFRAME-based ontology design, conversion of data from a relational database (RDB) to RDF, system design following the four principles of Linked Data, and system development on a semantic-technology framework, the platform supports bibliographic control in the Internet environment, family-information search for general users, and knowledge mining and value-added knowledge services for humanities scholars.
For the past twenty years, digital library resources have been described according to traditional standards such as MARC. Descriptive information such as title, author, publication details, and carrier is well covered, but records of this kind cannot directly answer queries about the knowledge embedded in the content. By building relationships among resources, linked data technologies offer a better approach to knowledge organization, description, navigation, and retrieval. By reusing and linking to open data, they can enrich the relationships among data, broaden the scenarios in which data is used, unlock its latent value, and provide an architecture for data services on the Web.
Shanghai Library is using linked data technologies to reorganize its traditional resources, both to meet requirements for data sharing, reuse, and bibliographic control in the Internet environment, and to build a historical data service platform that serves the differing needs of its users. First, we designed an ontology based on the Bibliographic Framework (BIBFRAME). Second, we extracted surname, person, place, time, event, and other entities from the metadata records according to the ontology. Third, we cleaned the data through merging, disambiguation, and standardization, and supplemented some important properties (e.g. the origins of surnames and GIS information for places). We then assigned an HTTP URI to each entity and described the entities using the RDF abstract data model. Using RDB2RDF conversion tools that support the W3C R2RML standard, together with the data processing tool OpenRefine, we converted the data from RDB to RDF and loaded it into Virtuoso, an RDF store.
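The merging and standardization part of the cleaning step can be illustrated with a minimal Python sketch. It assumes records are plain dicts with hypothetical `record_id` and `place` fields, and uses a hypothetical variant table (here, two historical romanizations of Suzhou) in place of whatever authority data the project actually used. Disambiguation, which needs context beyond the name string, is not attempted.

```python
from collections import defaultdict

# Hypothetical variant table mapping historical romanizations to a preferred form;
# a real project would draw these from an authority file.
VARIANTS = {
    "Soochow": "Suzhou",
    "Su-chou": "Suzhou",
}


def standardize(name: str) -> str:
    """Collapse whitespace, then map a known variant to its preferred form."""
    cleaned = " ".join(name.split())
    return VARIANTS.get(cleaned, cleaned)


def merge_places(records: list[dict]) -> dict[str, dict]:
    """Merge records whose standardized place names match into one entity."""
    merged: dict[str, dict] = defaultdict(lambda: {"sources": []})
    for rec in records:
        key = standardize(rec["place"])
        entity = merged[key]
        entity["name"] = key
        entity["sources"].append(rec["record_id"])
    return dict(merged)
```

Given records for " Soochow ", "Suzhou", and "Su-chou", the three collapse into a single "Suzhou" entity that keeps all three source record IDs, so provenance survives the merge.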
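The URI assignment and RDB-to-RDF conversion can be sketched without an R2RML processor by showing what a template-based mapping does: each table row yields a subject URI built from a column value, plus triples serialized as N-Triples. The base URI `http://example.org/genealogy/`, the `ex:` vocabulary (`ex:Person`, `ex:bornIn`), and the column names are hypothetical; only the RDF and RDFS namespaces are real. The percent-encoding below is a simplification of R2RML's IRI-safe rule, and this is not the project's actual mapping.

```python
from urllib.parse import quote

BASE = "http://example.org/genealogy/"  # hypothetical URI base
EX = "http://example.org/vocab#"  # hypothetical vocabulary namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def mint_uri(template: str, row: dict) -> str:
    """Fill a {column} template with percent-encoded column values (like rr:template)."""
    encoded = {col: quote(str(val), safe="") for col, val in row.items()}
    return BASE + template.format(**encoded)


def literal(value: str, lang: str = "zh") -> str:
    """Serialize a language-tagged string literal in N-Triples syntax."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"@{lang}'


def person_rows_to_ntriples(rows: list[dict]) -> list[str]:
    """Map rows of a hypothetical person table to N-Triples lines."""
    lines = []
    for row in rows:
        person_uri = mint_uri("person/{person_id}", row)
        place_uri = mint_uri("place/{place_id}", row)
        label = literal(row["name"])
        lines.append(f"<{person_uri}> <{RDF_TYPE}> <{EX}Person> .")
        lines.append(f"<{person_uri}> <{RDFS_LABEL}> {label} .")
        lines.append(f"<{person_uri}> <{EX}bornIn> <{place_uri}> .")
    return lines
```

Writing the returned lines to a `.nt` file gives input that an RDF store such as Virtuoso can bulk-load. Keeping URI minting in one template function makes it easy to keep identifiers stable across reloads.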
Finally, we designed the system according to the four principles of Linked Data and developed it with semantic technologies such as Jena and SPARQL, together with data visualization tools. As a result, the system supports bibliographic control in the Internet environment: users can find where genealogy documents are held among nearly 600 organizations worldwide. All RDF data is openly accessible to machines through simple technologies such as content negotiation and a RESTful API. Easy-to-use search services serve users who simply want to learn the stories of a surname or family, while advanced search services support professional data mining and knowledge discovery. Most importantly, the platform allows authenticated users to contribute content, either by submitting comments and suggestions or by modifying data directly. Modifications are published openly once other experts have confirmed them, and all comments and modifications are recorded automatically.
The linked genealogy data project is the first in the Chinese library community to provide open data services based on linked open data technologies, and it introduces some innovations in its implementation methodology, development process, and use of technical tools. It is, however, only a starting point for Shanghai Library. Much work remains on authority data, which is still insufficient. More external datasets, such as GeoNames, DBpedia, and VIAF, need to be mashed up with the local data. Some problems also remain unresolved, such as authority control of geographical names from a historical perspective.
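SPARQL queries against an RDF store such as Virtuoso can be sent over the standard SPARQL 1.1 Protocol as an HTTP GET with a `query` parameter. The sketch below only builds the request. The endpoint `http://localhost:8890/sparql` (Virtuoso's default port and path) and the `ex:surname` property are assumptions, not the platform's published endpoint or vocabulary.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: Virtuoso listens on port 8890 and serves SPARQL at /sparql by default.
ENDPOINT = "http://localhost:8890/sparql"


def surname_query(surname: str, limit: int = 20) -> str:
    """Build a SELECT query for persons with a given surname (hypothetical ex: vocabulary)."""
    escaped = surname.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?person ?label WHERE {\n"
        f'  ?person a ex:Person ; ex:surname "{escaped}"@zh ; rdfs:label ?label .\n'
        "}\n"
        f"LIMIT {limit}"
    )


def build_request(query: str) -> Request:
    """Encode the query as a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Passing the returned request to `urllib.request.urlopen` would send it and return the results as SPARQL JSON; a Java application would typically do the same through Jena's query API instead.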
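Content negotiation lets a single entity URI serve HTML to browsers and RDF to machines, chosen from the HTTP `Accept` header. The function below is a simplified sketch, assuming the four media types listed in `SUPPORTED` (a hypothetical list, not the platform's confirmed formats). It honours q-values and uses the most specific matching media range, but ignores parameters other than `q` and falls back to HTML instead of returning 406 Not Acceptable.

```python
from typing import List, Optional, Tuple

# Assumed set of representations the server can produce, in server-preference order.
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml", "application/ld+json"]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; other parameters are ignored."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    return prefs


def match_level(media_range: str, offer: str) -> int:
    """Return 2 for an exact match, 1 for type/*, 0 for */*, -1 for no match."""
    if media_range == offer:
        return 2
    if media_range == "*/*":
        return 0
    range_type, _, range_sub = media_range.partition("/")
    if range_sub == "*" and range_type == offer.partition("/")[0]:
        return 1
    return -1


def negotiate(header: Optional[str], default: str = "text/html") -> str:
    """Pick the supported type with the highest q, using the most specific matching range."""
    if not header:
        return default
    prefs = parse_accept(header)
    chosen, chosen_q = None, 0.0
    for offer in SUPPORTED:
        level, q = -1, 0.0
        for media, media_q in prefs:
            this_level = match_level(media, offer)
            if this_level > level:
                level, q = this_level, media_q
        if q > chosen_q:  # ties keep the earlier, server-preferred offer
            chosen, chosen_q = offer, q
    return chosen or default
```

For example, a crawler sending `Accept: */*;q=0.1, application/ld+json` receives JSON-LD, while a browser sending `text/html` receives the HTML page for the same URI.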
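The contribution workflow (a user submits a change, another expert confirms it, the change is published, and every action is logged) can be modelled as a small state machine. This is a hypothetical sketch: the `EditQueue` class, its fields, and the rule that authors cannot confirm their own edits are assumptions for illustration, and authentication is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Edit:
    edit_id: int
    author: str
    entity_uri: str
    prop: str
    new_value: str
    status: str = "pending"  # becomes "published" after expert confirmation


@dataclass
class EditQueue:
    experts: set[str]
    edits: dict[int, Edit] = field(default_factory=dict)
    # Audit log entries: (UTC timestamp, actor, edit_id, action)
    log: list[tuple[str, str, int, str]] = field(default_factory=list)
    _next_id: int = 1

    def _record(self, actor: str, edit_id: int, action: str) -> None:
        self.log.append((datetime.now(timezone.utc).isoformat(), actor, edit_id, action))

    def submit(self, author: str, entity_uri: str, prop: str, new_value: str) -> int:
        """Store a pending edit and log the submission."""
        edit = Edit(self._next_id, author, entity_uri, prop, new_value)
        self.edits[edit.edit_id] = edit
        self._next_id += 1
        self._record(author, edit.edit_id, "submitted")
        return edit.edit_id

    def confirm(self, expert: str, edit_id: int) -> None:
        """Publish a pending edit once an expert other than its author confirms it."""
        edit = self.edits[edit_id]
        if expert not in self.experts:
            raise PermissionError(f"{expert} is not an expert reviewer")
        if expert == edit.author:
            raise PermissionError("authors cannot confirm their own edits")
        if edit.status != "pending":
            raise ValueError(f"edit {edit_id} is already {edit.status}")
        edit.status = "published"
        self._record(expert, edit_id, "confirmed")

    def published(self) -> list[Edit]:
        """Return only the edits that are visible to the public."""
        return [e for e in self.edits.values() if e.status == "published"]
```

Keeping the log append-only and separate from the edit records means the history of who proposed and who confirmed each change remains available even after later edits overwrite a value.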