{toc:maxLevel=2} h1. Solr schema The schema XML is on Github here: https://github.com/sul-dlss/geohydra/blob/master/solr/kurma-app-test/conf/schema.xml. h2. Primary key * *uuid*: Unique Identifier. Examples: ** [http://purl.stanford.edu/vr593vj7147], ** [http://ark.cdlib.org/ark:/28722/bk0012h535q], ** urn:geodata.tufts.edu:Tufts.CambridgeGrid100_04. h2. Dublin Core {note} See the [Dublin Core Elements Guide|http://dublincore.org/documents/dcmi-terms/] for semantic descriptions of all of these fields. We're using both DC Elements and DC Terms {note} * *dct_spatial_sm*: Coverage, placenames. Multiple values allowed. Example: "Paris, France". * *dct_temporal_sm*: Coverage, years. Multiple values allowed. Example: "2010". * *dc_creator_sm*: Author(s). Example: "Washington, George". * *dct_issued_dt*: Date in Solr syntax. Example: "2001-01-01T00:00:00Z". * *dc_description_s*: Description. * *dc_format_s*: File format (not MIME types). Valid values: ** "Shapefile" ** "GeoTIFF" * *dc_identifier_s*: Unique identifier. Same as UUID. * *dc_language_s*: Language. Example: "English". * *dc_publisher_s*: Publisher. Example: "ML InfoMap (Firm)". * *dct_references_sm*: URLs to referenced resources. Used scheme and url parameters. scheme values are based on [CatInterop|https://github.com/OSGeo/Cat-Interop/blob/master/link_types.csv] Multiple values allowed. Example: scheme="urn:ogc:serviceType:WebFeatureService" url="http://geowebservices-restricted.stanford.edu/geoserver /wfs" * *dc_rights_s*: Rights for access. Valid values: ** "Restricted" ** "Public" * *dct_provenance_s*: Source institution: Examples: ** Berkeley ** Harvard ** MassGIS ** MIT ** Stanford ** Tufts * *dc_subject_sm*: Subject. Multiple values allowed. Example: "Human settlements", "Census". * *dc_title_s*: Title. * *dc_type_s*: Resource type. dc:type=Dataset for georectified images, dc:type=Image for digitaized, non-georectified images, or dc:type=PhysicalObject for paper maps (no digitization). * *dct_isPartOf_sm*: Collection to which the layer belongs. h2. GeoRSS metadata * *georss_point_s*: Point representation for layer -- i.e., centroid? * *georss_box_s*: Bounding box as maximum values for S W N E. Example: "12.62309 76.76 19.91705 84.76618" * *georss_polygon_s*: Shape of the layer as a Polygon. Example: "n w n e s e s w n w" h2. Layer-specific metadata * *layer_slug_s*. Unique identifier visible to the user, used for Permalinks. * Example: stanford-vr593vj7147. * *layer_id_s*. The complete identifier for the WMS/WFS/WCS layer. Example: "druid:vr593vj7147", * *layer_geom_type_s*. Valid values are: "Point", "Line", "Polygon", and "Raster". h2. Derived metadata used by Solr index * *solr_bbox*: Bounding box as maximum values for W S E N. Example: "76.76 12.62309 84.76618 19.91705" * *solr_geom*: Shape of the layer as a Point, LineString, or Polygon WKT. Example: "POLYGON((76.76 19.91705, 84.76618 19.91705, 84.76618 12.62309, 76.76 12.62309, 76.76 19.91705))" * *solr_ne_pt* (from solr_bbox). North-eastern most point of the bounding box, as (y, x). Example: "83.1,-128.5" * *solr_sw_pt* (from solr_bbox). South-western most point of the bounding box, as (y, x). Example: "81.2,-130.1" * *solr_year_i* (from dc_coverage_temporal_sm): Year for which layer is valid. Example: 2012. h2. Solr schema syntax See complete schema on https://github.com/sul-dlss/geomdtk/blob/master/solr/kurma-app-test/conf/schema.xml Note on the types: || Suffix || Solr data type using dynamicField || | \_s | String | | \_sm | String, multivalued | | \_t | Text, English | | \_i | Integer | | \_dt | Date time | | \_url | URL as a non-indexed String | | \_bbox | Spatial bounding box, Rectangle as (w, s, e, n) | | \_pt | Spatial point as (y,x) | | \_geom | Spatial shape as WKT | {code:xml} uuid ... ... {code} ---- h1. Solr queries * Use the Solr query interface with LatLon data on [sul-solr-a|http://sul-solr-a/solr/#/] to try these using ogp core. * For the polygon or JTS queries use [ogpapp-test|http://localhost:8983/solr/#/] via ssh tunnel to jetty 8983. h2. Solr 3: Pseudo-spatial using _solr.LatLon_ {warning} solr.LatLonType does not correctly work across the international dateline in these queries. _latlon in these examples are assumed to be solr.LatLonType. {warning} h3. Search for point within 50 km of N40 W114 Note: Solr _bbox_ uses circle with radius not rectangles. {code:xml} 50 *:* solr_latlon 40,-114 {!geofilt} {code} h3. Search for single point _within_ a bounding box of SW=40,-120 NE=50,-110 {code:xml} *:* solr_latlon:[40,-120 TO 50,-110] {code} h3. Search for bounding box _within_ a bounding box of SW=20,-160 NE=70,-70 {code:xml} *:* solr_sw_latlon:[20,-160 TO 70,-70] AND solr_ne_latlon:[20,-160 TO 70,-70] {code} h2. Solr 4 Spatial -- non JTS {warning} _pt and _bbox in these examples are assumed to be solr.SpatialRecursivePrefixTreeFieldType. {warning} h3. Search for point _within_ a bounding box of SW=20,-160 NE=70,-70 {code:xml} *:* solr_pt:"Intersects(-160 20 -70 70)" {code} h3. Search for bounding box _within_ a bounding box of SW=20,-160 NE=70,-70 {code:xml} *:* solr_sw_pt:[20,-160 TO 70,-70] AND solr_ne_pt:[20,-160 TO 70,-70] {code} h3. Solr 4: ... using polygon intersection {code:xml} *:* solr_bbox:"Intersects(-160 20 -70 70)" {code} h3. Solr 4: ... using polygon containment {code:xml} *:* solr_bbox:"IsWithin(-160 20 -150 30)" {code} h3. Solr 4: ... using polygon containment for spatial relevancy {code:xml} solr_bbox:"IsWithin(-160 20 -150 30)"^10 railroads solr_bbox:"Intersects(-160 20 -150 30)" {code} h2. Solr 4 Spatial -- JTS {warning} This query requires [JTS|http://tsusiatsoftware.net/jts/main.html] installed in Solr 4, where the spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory" for the solr.SpatialRecursivePrefixTreeFieldType field class. {warning} h3. Search for bbox _intersecting_ bounding box of SW=20,-160 NE=70,-70 using polygon intersection {code:xml} *:* solr_bbox:"Intersects(POLYGON((-160 20, -160 70, -70 70, -70 20, -160 20)))" {code} h2. Scoring formula {code} text^1 dc_description_ti^2 dc_creator_tmi^3 dc_publisher_ti^3 dct_isPartOf_tmi^4 dc_subject_tmi^5 dct_spatial_tmi^5 dct_temporal_tmi^5 dc_title_ti^6 dc_rights_ti^7 dct_provenance_ti^8 layer_geom_type_ti^9 layer_slug_ti^10 dc_identifier_ti^10 {code} h2. Facets {code:xml} dct_spatial_sm dc_format_s dc_language_s dc_publisher_s dc_rights_s dct_provenance_s dc_subject_sm dct_isPartOf_sm layer_geom_type_s solr_year_i {code} ---- h1. Solr example documents See [https://github.com/sul-dlss/geohydra/blob/master/ogp/transform.rb]. These metadata would be generated from the OGP Schema, or MODS, or FGDC, or ISO 19139. {code} "uuid": "http://purl.stanford.edu/zy658cr1728", "dc_description_s": "This point dataset shows village locations with socio-demographic and economic Census data f or 2001 for the Union Territory of Andaman and Nicobar Islands, India linked to the 2001 Census. Includes village s ocio-demographic and economic Census attribute data such as total population, population by sex, household, literac y and illiteracy rates, and employment by industry. This layer is part of the VillageMap dataset which includes soc io-demographic and economic Census data for 2001 at the village level for all the states of India. This data layer is sourced from secondary government sources, chiefly Survey of India, Census of India, Election Commission, etc. T his map Includes data for 547 villages, 3 towns, 2 districts, and 1 union territory.; This dataset is intended for researchers, students, and policy makers for reference and mapping purposes, and may be used for village level demo graphic analysis within basic applications to support graphical overlays and analysis with other spatial data.; ", "dc_format_s": "Shapefile", "dc_identifier_s": "http://purl.stanford.edu/zy658cr1728", "dc_language_s": "English", "dc_publisher_s": "ML InfoMap (Firm)", "dc_rights_s": "Restricted", "dc_subject_sm": [ "Human settlements", "Villages", "Census", "Demography", "Population", "Sex ratio", "Housing", "Labor supply", "Caste", "Literacy", "Society", "", "Location" ], "dc_title_s": "Andaman and Nicobar, India: Village Socio-Demographic and Economic Census Data, 2001", "dc_type_s": "Dataset", "dct_isPartOf_sm": "My Collection", "dct_references_sm": [ "scheme=\"urn:ogc:serviceType:WebFeatureService\" url=\"http://geowebservices-restricted.stanford.edu/geoserver/wfs\"", "scheme=\"urn:ogc:serviceType:WebMapService\" url=\"http://geowebservices-restricted.stanford.edu/geoserver/wms\"", "scheme=\"urn:iso:dataFormat:19139\" url=\"http://purl.stanford.edu/zy658cr1728.iso19139\"", "scheme=\"urn:x-osgeo:link:www\" url=\"http://purl.stanford.edu/zy658cr1728\"", "scheme=\"urn:loc:dataFormat:MODS\" url=\"http://purl.stanford.edu/zy658cr1728.mods\"", "scheme=\"urn:x-osgeo:link:www-thumbnail\", url=\"http://example.com/preview.jpg\"" ], "dct_spatial_sm": [ "Andaman and Nicobar Islands", "Andaman", "Nicobar", "Car Nicobar Island", "Port Blair", "Indira Point", "Diglipur", "Nancowry Island" ], "dct_temporal_sm": "2001-01-01T00:00:00Z", "dct_issued_dt": "2000-01-01T00:00:00Z", "dct_provenance_s": "Stanford", "georss_box_s": "6.761581 92.234924 13.637013 94.262535", "georss_polygon_s": "13.637013 92.234924 13.637013 94.262535 6.761581 94.262535 6.761581 92.234924 13.637013 92.234924", "layer_slug_s": "stanford-zy658cr1728", "layer_id_s": "druid:zy658cr1728", "layer_srs_s": "EPSG:4326", "layer_geom_type_s": "Point", "solr_bbox": "92.234924 6.761581 94.262535 13.637013", "solr_ne_pt": "13.637013,94.262535", "solr_sw_pt": "6.761581,92.234924", "solr_geom": "POLYGON((92.234924 13.637013, 94.262535 13.637013, 94.262535 6.761581, 92.234924 6.761581, 92.234924 13.637013))" "score": 1.6703978 } {code} h1. Links * Solr 4: [http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4] * Solr 3: [http://wiki.apache.org/solr/SpatialSearch] * JTS: [http://tsusiatsoftware.net/jts/main.html]