



Welcome to the next generation of spatial data infrastructures.

A new vision for the production, management and distribution of very large spatial datasets, in which grid technology joins geographical data inventory and query systems, developed in collaboration with well-known open source projects.

Terradue works to exploit and strengthen best practices in distributed data processing, archiving and discovery for Earth sciences, finance and medicine. Our emphasis is on the immediate delivery of robust operational systems while keeping a concrete roadmap to build the next generation data processing and storage systems.





Sand and dust from the Sahara Desert blowing across the Atlantic

 




Stepwise Geospatial OpenSearch

2008-06-12

 
Previously we looked at how, in the abstract, an OpenSearch-based approach could meet the INSPIRE requirements for discovery services. Here we offer some reflections on how we got here and how a prototype worked.

I first heard of OpenSearch when Stefan Keller pointed out Andrew Turner's work on the OpenSearch geospatial extensions. This offered a simplified, "neogeography"-inspired approach to spatial search. I liked the OpenSearch Geo extensions because they looked so similar to other proposals I'd seen for simple search of spatial data: the first specification for WFS-Basic, and earlier work on spatial RSS and RDF aggregation. They seemed a good fit for the "Simple Catalog Interface" that has been wanted for so long. Best of all, OpenSearch-Geo comes from a context that isn't specifically geospatial: it adds spatial search to a general means of querying any kind of index or archive. It takes the interface approach of web-based search engines and attaches it to machine-readable results; even library information scientists saw OpenSearch as a "friendly" discovery alternative to complex protocols.

As Andrew said when promoting this work in the "Geospatial Mass Market" community, "I imagine it would be more likely that someone who uses & understands CSW may want to add OpenSearch than the reverse." Essentially, one can request all records found within a spatial area defined as a point plus radius, a bounding box, or a polygon. The flexibility of the OpenSearch protocol allows search results to be returned in any format that a client can be persuaded to understand. While Atom is common, results in KML or JSON are equally possible, as Andrew illustrated in his article. A JSON result set could supply configuration options to a JavaScript client such as OpenLayers, or feed the same results to a human-readable interface.
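As a rough illustration of the client side, the sketch below fills in an OpenSearch URL template. The endpoint is made up, and the parameter names (geo:box, geo:lat, geo:lon, geo:radius) are assumptions modelled on the Geo extensions draft; check the draft itself for the authoritative names. Following the OpenSearch template convention, parameters marked with "?" are optional and are replaced by an empty string when the client has no value.

```python
import re
from urllib.parse import quote

# Hypothetical template, in the style of an OpenSearch description document.
TEMPLATE = ("http://example.org/search?q={searchTerms}"
            "&bbox={geo:box?}&lat={geo:lat?}&lon={geo:lon?}&r={geo:radius?}"
            "&format=atom")

# Matches {name} (required) or {prefix:name?} (optional).
PARAM = re.compile(r"\{([^}?]+)(\??)\}")


def fill_template(template, values):
    """Substitute template parameters from a dict of values.

    Missing required parameters raise KeyError; missing optional
    parameters become empty strings.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            # Keep commas readable so bounding boxes stay legible.
            return quote(str(values[name]), safe=",")
        if optional:
            return ""
        raise KeyError(f"missing required parameter: {name}")

    return PARAM.sub(substitute, template)


# Bounding box as west,south,east,north in decimal degrees (assumed order).
url = fill_template(TEMPLATE, {"searchTerms": "meris",
                               "geo:box": "-5.98,27.95,36.91,44.47"})
print(url)
```

The same template could just as easily advertise format=kml or format=json; the client only needs to know which parameters to fill.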

In our prototype implementation of an OpenSearch interface to a geospatial metadata repository, a summary of results is offered in Atom format. Each entry in the result set links to more "complete", alternate versions of the appropriate metadata - in RDF using Dublin Core, and in a simple XML serialisation of ISO 19115.
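A minimal sketch of what one such summary entry might look like, built with Python's standard ElementTree. The record URLs, the file suffixes and the use of application/xml for the ISO 19115 serialisation are assumptions for illustration; only the Atom namespace and the rel="alternate" link convention come from the Atom format itself.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialise Atom as the default namespace


def atom_entry(record_id, title, updated, base_url):
    """Build an Atom <entry> summarising one metadata record.

    Alternate links point at fuller representations of the same record;
    the URL layout and media types here are hypothetical.
    """
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = f"{base_url}/{record_id}"
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    alternates = (
        ("application/rdf+xml", "rdf"),       # Dublin Core in RDF
        ("application/xml", "iso19115.xml"),  # simple ISO 19115 XML
    )
    for media_type, suffix in alternates:
        ET.SubElement(entry, f"{{{ATOM}}}link", {
            "rel": "alternate",
            "type": media_type,
            "href": f"{base_url}/{record_id}.{suffix}",
        })
    return entry


entry = atom_entry("rec-42", "MERIS scene over the Mediterranean",
                   "2008-06-12T00:00:00Z", "http://example.org/records")
print(ET.tostring(entry, encoding="unicode"))
```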

So far so good, but of course there has to be a catch or two. How far can one push OpenSearch into use for structured data search, beyond the "q=cat" level of keyword searching that text-based indexes offer?

box contains what?

One significant catch is the lack of a standard way to specify which spatial relation (geometry query type) applies to a bounding box or polygon given via the OpenSearch Geo extensions. The draft spec assumes the query is for objects wholly Within the area: "box contains what?"
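To make the difference concrete, here is a minimal sketch of the two relations for axis-aligned bounding boxes. It assumes coordinates in decimal degrees ordered west, south, east, north, and it ignores boxes that cross the antimeridian; the names within and overlaps are illustrative, not taken from any spec.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned box in decimal degrees."""
    west: float
    south: float
    east: float
    north: float


def within(record: BBox, query: BBox) -> bool:
    """True if the record's extent lies wholly inside the query box."""
    return (query.west <= record.west and record.east <= query.east
            and query.south <= record.south and record.north <= query.north)


def overlaps(record: BBox, query: BBox) -> bool:
    """True if the boxes share any area (touching edges count)."""
    return (record.west <= query.east and query.west <= record.east
            and record.south <= query.north and query.south <= record.north)


query = BBox(-5.98, 27.95, 36.91, 44.47)   # example query box
regional = BBox(10.0, 35.0, 20.0, 40.0)    # inside the query box
straddling = BBox(30.0, 40.0, 45.0, 50.0)  # crosses the query's eastern edge

print(within(regional, query), overlaps(regional, query))      # True True
print(within(straddling, query), overlaps(straddling, query))  # False True
```

Under "within" semantics the straddling record disappears, even though it covers part of the area asked about.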

A likely case in which one wants to see more results is the set of relevant objects whose area Overlaps the specified box. One might also want to qualify this with a scale difference between the query and the result: if searching for overlaps with a small box, I probably don't want to see global data sets. It may be appropriate to specify the cartographic scale of the desired result sets. However, this risks confusing or repelling non-specialist users of a search service. It also risks generating a lot of "false negatives" where metadata is incomplete.
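One way to express the scale-difference idea, sketched under loose assumptions: compare the area of a record's extent with the area of the query box, and drop records whose extent is many times larger. The max_ratio threshold is hypothetical, and areas in square degrees are only a crude proxy for cartographic scale. Records with no extent are kept rather than excluded, so that incomplete metadata does not turn into false negatives.

```python
from typing import Optional, Tuple

# (west, south, east, north) in decimal degrees.
Box = Tuple[float, float, float, float]


def degree_area(box: Box) -> float:
    """Area in square degrees; only good enough for comparing extents."""
    west, south, east, north = box
    return max(0.0, east - west) * max(0.0, north - south)


def scale_compatible(record: Optional[Box], query: Box,
                     max_ratio: float = 100.0) -> bool:
    """Reject records whose extent is far larger than the query box.

    max_ratio is a hypothetical tuning knob, not part of any OpenSearch
    spec. A record without an extent (None) is kept.
    """
    if record is None:
        return True
    query_area = degree_area(query)
    if query_area == 0.0:  # point-like query: no meaningful ratio
        return True
    return degree_area(record) / query_area <= max_ratio


small_query = (12.0, 41.0, 13.0, 42.0)  # a 1 x 1 degree box
global_set = (-180.0, -90.0, 180.0, 90.0)
regional_set = (10.0, 40.0, 15.0, 45.0)

print(scale_compatible(global_set, small_query))    # False
print(scale_compatible(regional_set, small_query))  # True
print(scale_compatible(None, small_query))          # True
```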

As a web-inspired search service, we want to err on the side of returning irrelevant results rather than missing relevant ones. At least initially, while users are new to the service and the index may not yet be large or comprehensive, the risk of missing results that *are* of interest, because we treat the relevant metadata too rigidly, is too great.

This may largely be a user interface challenge, or one of "locative literacy" amongst a user base, not a problem that tinkering with the tech will really help solve. However, assuming "within" behaviour for a shape-based search risks reducing its usefulness. We can change the way a shape-based search works with extra parameters, and submit successful changes as additions to the Geo extensions. As the appeal of OpenSearch lies in its simplicity, this is a balancing act.

AND or OR?

For me, a more important catch is uncertainty about how key/value search constraints combine. For example, I want to find data from the MERIS sensor within a given bounding box and a given date range. I know to expect a keyword, so I'd look for ?q=meris&bbox=-5.98,27.95,36.91,44.47. If I find a dataset that matches the keyword but not the bounding box, should it be in the returned results or not?
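To see why the answer matters, the sketch below evaluates the same three constraints (keyword, bounding box, date range) with AND and with OR semantics over a few records. The records, their field names and the date range are invented for the example, and the bounding-box test assumes "within" behaviour.

```python
from datetime import date

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"id": "a", "keywords": {"meris"}, "bbox": (0, 30, 10, 40), "date": date(2008, 3, 1)},
    {"id": "b", "keywords": {"meris"}, "bbox": (100, -10, 110, 0), "date": date(2008, 3, 1)},
    {"id": "c", "keywords": {"aster"}, "bbox": (0, 30, 10, 40), "date": date(2007, 1, 1)},
]


def checks(rec, q, box, start, end):
    """Evaluate each constraint separately: keyword, bbox (within), date range."""
    w, s, e, n = rec["bbox"]
    return [
        q in rec["keywords"],
        box[0] <= w and e <= box[2] and box[1] <= s and n <= box[3],
        start <= rec["date"] <= end,
    ]


def search(records, q, box, start, end, combine=all):
    """combine=all gives AND semantics; combine=any gives OR semantics."""
    return [r["id"] for r in records if combine(checks(r, q, box, start, end))]


box = (-5.98, 27.95, 36.91, 44.47)  # west, south, east, north
span = (date(2008, 1, 1), date(2008, 6, 30))

print(search(RECORDS, "meris", box, *span, combine=all))  # ['a']
print(search(RECORDS, "meris", box, *span, combine=any))  # ['a', 'b', 'c']
```

Both answers are defensible; nothing in the query string itself says which one the client meant.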

Well, sometimes it depends on the application, sometimes on the user. Looking again at the example of web search, it's common to get back a long list of partially relevant results appended to a few completely relevant results. A view of partial relevance can be very useful in making a follow-up query. Getting no results back, even when there are "close" matches, makes for a lackluster user experience.
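One common web-search answer is to rank rather than filter: score each record by how many constraints it satisfies, return complete matches first and partial matches after them. A minimal sketch, with hypothetical records and constraints expressed as plain Python predicates; a real service would likely weight constraints rather than simply counting them.

```python
def ranked_search(records, constraints):
    """Return (id, score) pairs for records meeting at least one constraint.

    The score is the number of constraints satisfied; complete matches
    sort first, partial matches follow, and ties are broken by id.
    """
    scored = []
    for rec in records:
        score = sum(1 for check in constraints if check(rec))
        if score > 0:
            scored.append((rec["id"], score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Hypothetical records and constraints for illustration only.
records = [
    {"id": "a", "keywords": {"meris"}, "year": 2007},
    {"id": "b", "keywords": {"meris"}, "year": 2008},
    {"id": "c", "keywords": {"aster"}, "year": 2008},
    {"id": "d", "keywords": {"aster"}, "year": 2005},
]
constraints = [
    lambda r: "meris" in r["keywords"],  # keyword constraint
    lambda r: r["year"] == 2008,         # stand-in for a date range
]

print(ranked_search(records, constraints))
# [('b', 2), ('a', 1), ('c', 1)]
```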

The OpenSearch template syntax doesn't state any requirements for how queries with multiple parameters are evaluated. We could assume that all key-value requests are ANDed together, but there's no explicit basis for that assumption. There's no way to introduce operators between sets of key-value parameters - and requiring that may introduce more complexity into the OpenSearch protocol than it was ever intended to handle.

If we write the client as well as the server, the lack of explicit support for operators isn't a problem, because we can rely on implicit assumptions about how both ends will behave. Being able to transfer those assumptions to others is a different question.

Distributed, cascaded?

Many potential applications, including those outlined in the INSPIRE Discovery drafts, would benefit from the ability to "cascade" one search client request through multiple different servers; to "federate" the ability to produce answers across multiple installations.
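A naive cascade can be sketched as fan-out and merge: send the same query to several endpoints in parallel, keep the first copy of each record id, and note which servers failed. The backends here are hypothetical stand-in functions rather than real OpenSearch clients; a real cascade would also have to deal with ranking across servers, paging, and loops between servers that cascade to each other.

```python
from concurrent.futures import ThreadPoolExecutor


def cascade(query, backends, timeout=10.0):
    """Send one query to several backends and merge their results.

    backends maps a name to a callable that takes the query and returns a
    list of (record_id, title) pairs; in practice it would wrap an HTTP
    request to another search endpoint. Duplicate ids are kept once, and a
    failing backend is recorded rather than aborting the whole search.
    """
    merged, seen, failures = [], set(), []
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                results = future.result(timeout=timeout)
            except Exception as exc:  # network errors, timeouts, bad responses
                failures.append((name, repr(exc)))
                continue
            for record_id, title in results:
                if record_id not in seen:
                    seen.add(record_id)
                    merged.append((record_id, title))
    return merged, failures


# Hypothetical stand-ins for remote catalogues.
def national(q):
    return [("rec-1", "Coastline"), ("rec-2", "Land cover")]


def regional(q):
    return [("rec-2", "Land cover"), ("rec-3", "Elevation model")]


def broken(q):
    raise ConnectionError("server unavailable")


merged, failures = cascade("land cover", {"national": national,
                                          "regional": regional,
                                          "broken": broken})
print(merged)
print(failures)
```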

OpenSearch doesn't appear to have been designed with this in mind; its origin at Amazon's A9 was as a protocol allowing data from many smaller sites to be scooped up by one very large aggregator, which presents a single face for the whole collection. In essence, it is a smarter and more "standards"-oriented version of Google Sitemaps, reusable for semi-structured data. For a "Geospatial one-stop" setup, which is how most national and regional spatial data portals are envisaged, this would work out fine.

There is the potential for finding a "middle way", making what use we can of OpenSearch for duetopia, and that's a subject for future discussion here.


Posted by Jo Walsh in: duetopia, OpenSearch





duetopia released as Open Source (GPL3)

2008-05-23

 
Terradue has released the very first public version of duetopia as open source, hosted on Google Code. This version presents the initial implementation of the libraries that import from and export to common standard formats (ISO 19115, ISO 19139, FGDC, RDF/XML) through a shared model [...]


BEinGRID Industry Days in Barcelona

2008-04-11

 
From the 3rd to the 5th of June 2008, the BEinGRID project will host its Industry Days in Barcelona, Spain, sharing a venue at the Barceló Hotel Sants with the 23rd Open Grid Forum Conference (OGF23). The event will highlight the best of BEinGRID, the European Union’s largest integrated project funded through the Information Society Technologies (IST) research programme of the Sixth Framework Programme (FP6), including selected demos of its 18 current Business Experiments, as well as key guest speakers from the software and services industry. [...]


New endeavour to save Earth Science data

2008-03-10

 
The amount of information being generated about our planet is increasing at an exponential rate, but it must be easily accessible if it is to be applied to global needs relating to the state of the Earth. GENESI-DR (Ground European Network for Earth Science Interoperations - Digital Repositories), an ESA-led, European Commission (EC)-funded two-year project, is taking the lead in providing reliable, easy, long-term access to Earth Science data via the Internet. [...]





© terradue srl 2006-2008 | powered by gridify™