Feeding the GND into SolRDF

During my Easter holiday last week I spent some time with Andrea Gazzarini's Solr plugin SolRDF, which builds a SPARQL endpoint on top of RDF data you feed into Solr. As a first attempt I fed it a small excerpt of our research bibliography metadata, which I had converted to BiBo, just to see how it works. After these preliminary tests I set out to throw something bigger at it: the data dump of the GND, a whopping 7 GB collection of authority data collected by the German National Library.

After spending quite some time repairing the data, which currently has some serious problems with invalid dates, I could finally load it on a pretty decent machine with an SSD and sufficient RAM. The import finished successfully after 66 minutes, and I had a high-performance SPARQL endpoint with which to map roughly 10,000 subject heading literals from the aforementioned dataset to GND identifiers. I put that Solr instance behind an nginx proxy and started to throw these subject headings against the index. From the Solr logs I could see that none of my requests took more than 5 ms to finish.
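
To give an idea of what such a lookup can look like, here is a minimal sketch in Python. It is not the script I actually used. It assumes that the subject headings are stored as plain literals under the GND ontology property gndo:preferredNameForTheSubjectHeading, that the endpoint accepts the standard SPARQL protocol parameter "query" (SolRDF may expect a different parameter name) and that it can return SPARQL JSON results. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: namespace of the GND ontology used in the dump.
GNDO = "http://d-nb.info/standards/elementset/gnd#"


def escape_literal(text):
    """Escape characters that would break a double-quoted SPARQL literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def build_query(heading):
    """Build a SELECT query that matches one subject heading exactly."""
    # Assumption: headings are plain literals without a language tag.
    return (
        "PREFIX gndo: <%s>\n"
        "SELECT ?gnd WHERE {\n"
        '  ?gnd gndo:preferredNameForTheSubjectHeading "%s" .\n'
        "} LIMIT 10" % (GNDO, escape_literal(heading))
    )


def extract_uris(results):
    """Pull the ?gnd bindings out of a SPARQL JSON result document."""
    return [b["gnd"]["value"] for b in results["results"]["bindings"]]


def lookup(endpoint, heading, timeout=10):
    """Send the query to the endpoint and return the matching GND URIs."""
    # Assumption: the endpoint understands the "query" parameter.
    params = urllib.parse.urlencode({"query": build_query(heading)})
    request = urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_uris(json.load(response))
```

With a hypothetical endpoint URL, a call would look like lookup("http://localhost:8080/solr/store/sparql", "Semantic Web"). A heading with no exact match simply yields an empty list, so unmatched headings are easy to collect for manual review.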

When I chatted with my colleague Hans-Georg this morning, he told me that he had done the same thing with SolRDF and had seen similar results. For some time now he has been using LMF as his storage and retrieval backend, so he has some experience to compare the two. We both find SolRDF very exciting and are planning to use it on a project we are currently working on...
