Combining MGRS, Python and elasticsearch



Posted on 12 Dec 2013 19:47


Note that this entry uses Google Widgets to visually demonstrate what we are doing with Google Maps. If you have blockers like Ghostery that prevent these from working you will not be able to see the examples.

I recently had to implement some search functionality in elasticsearch that involved the MGRS coordinate system. The application itself was written in Django, and while none of elasticsearch, MGRS or Django is new, I figured it might be worthwhile to talk a little about using them together. MGRS is a UTM-based grid system used primarily by the military, but in reality the approach here can be applied to any grid-based system.


The MGRS system.

I probably can't improve on the resources that are already out there about this system, so I won't try; I'll just reference them:

  • Of course, the Wikipedia article
  • There are many videos on YouTube, but this one is short and to the point.
  • Here is a pretty dry but informative Word document. Why you would serve up Docs on the web is beyond me.



It's actually a very simple system, but a key difference you need to know when exposing a search API to users is that GPS is a point-based system while MGRS is really an area-based system. At its finest resolution an MGRS string expresses an area of one square meter. In practice this is better than you will get from most GPS receivers, which are usually accurate to 3–4 meters, and more rarely 2 (with some help). The point is that a 1-meter square and a point with a 3-meter tolerance are practically the same for most purposes unless you are surveying land. Once you start removing precision from your MGRS string, the two systems are no longer equivalent. The other caveat is that a GPS tolerance is a radius around a point, whereas an MGRS string names the southwest corner of a square. We'll talk about how to reconcile these differences in code a little later.
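To make the area-versus-point distinction concrete, here is a small sketch (not part of the project code) that reads the digits of a compact MGRS string and reports the side of the square it names. It assumes the compact form with no spaces, `mgrs_cell_size_m` is a hypothetical helper name, and it does not validate the zone or square letters.

```python
import re

# Compact MGRS: zone number + band letter + two-letter 100 km square, then digits.
COMPACT_MGRS = re.compile(r'^(\d{1,2}[A-Z]{3})(\d*)$')


def mgrs_cell_size_m(mgrs_string):
    """Return the side, in meters, of the square an MGRS string names.

    The digits split evenly between easting and northing: 5 digits per
    coordinate is a 1 m square, 4 is 10 m, 3 is 100 m, and so on. The
    string itself only locates the south-west corner of that square.
    """
    match = COMPACT_MGRS.match(mgrs_string.upper())
    if match is None:
        raise ValueError("not a compact MGRS string: %r" % mgrs_string)
    digits = match.group(2)
    if len(digits) % 2 or len(digits) > 10:
        raise ValueError("expected an even number of digits, at most 10")
    return 10 ** (5 - len(digits) // 2)


print(mgrs_cell_size_m('16SGC4194137390'))  # 1
print(mgrs_cell_size_m('16SGC419373'))      # 100
```

A string with no digits at all, such as 16SGC, names the whole 100 km square.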

An example data model

The first thing is to come up with a test data model for elasticsearch. Your location data will probably not sit at the top level of your documents (as it does in many examples); you may even need to express your model using sub-documents. So let's do something like that and add some data to it:

curl -XPUT 'http://localhost:9200/deliveries' -d '{
    "mappings" : {
        "driver" : {
            "properties" : {
                "id" : {"type" : "long"},
                "name" : {"type" : "string"},
                "drop_points" : {
                    "type" : "nested",
                    "properties" : {
                        "contact" : {"type" : "string"},
                        "drop_reference" : {"type" : "string"},
                        "gps_coords" : {"type" : "geo_point"}
                    }
                }
            }
        }
    }
}'
echo
echo "Putting Shady Susan..."
curl -XPUT 'http://localhost:9200/deliveries/driver/2' -d '{
    "id" : 2,
    "name" : "Shady Susan",
    "drop_points" : [
        {
            "contact" : "John P.",
            "gps_coords" : {
                "lat" : 33.748825,
                "lon" : -84.386263
            }
        },
        {
            "contact" : "Javier",
            "gps_coords" : {
                "lat" : 33.748196,
                "lon" : -84.387845
            }
        }
    ]
}'
echo
echo "Putting Sly Bob..."
curl -XPUT 'http://localhost:9200/deliveries/driver/1' -d '{
    "id" : 1,
    "name" : "Sly Bob",
    "drop_points" : [
        {
            "contact" : "Ray",
            "gps_coords" : {
                "lat" : 33.749168,
                "lon" : -84.388816
            }
        },
        {
            "contact" : "Theo",
            "gps_coords" : {
                "lat" : 33.750598,
                "lon" : -84.386032
            }
        },
        {
            "contact" : "Amy",
            "gps_coords" : {
                "lat" : 33.749556,
                "lon" : -84.387312
            }
        }
    ]
}'

echo
echo "Refreshing..."
curl -XPOST localhost:9200/deliveries/_refresh

OK, we have a mapping and some data. Of course this is a ridiculous data model, but the nesting adds another layer of complexity to defining and querying it. I had to use sub-documents in my project, so it seemed helpful to include them in the example.

The indexer knows nothing about MGRS, so we use the built-in geo_point type to index our location data. If your source data is MGRS, you will have to convert it to geo_point to index it. Our source MGRS data is all 1-meter precision, so a point was appropriate; otherwise we would have stored a geo_shape (a 10 m, 100 m or 1 km square). This conversion is the same process used to convert incoming MGRS query strings before we build the elasticsearch query document. We will cover the details below.
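For coarser source data, the square itself would be indexed as a geo_shape instead of a point, in a field mapped with "type" : "geo_shape". The sketch below is an illustration under stated assumptions, not code from this project: it takes the square's south-west corner (as the MGRS conversion returns it) and the cell size, offsets north and east with a spherical-earth approximation rather than Vincenty, treats the square as aligned with true north, and emits elasticsearch's envelope form, which lists the upper-left and lower-right corners as [lon, lat] pairs. `mgrs_cell_envelope` and the example corner are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # approximate mean radius; spherical-earth assumption


def mgrs_cell_envelope(sw_lat, sw_lon, cell_size_m):
    """Build an elasticsearch geo_shape envelope for a grid square.

    sw_lat/sw_lon is the square's south-west corner in degrees. The square
    is treated as aligned with true north, and the offsets use a spherical
    approximation, which is reasonable for small squares.
    """
    north_lat = sw_lat + math.degrees(cell_size_m / EARTH_RADIUS_M)
    east_lon = sw_lon + math.degrees(
        cell_size_m / (EARTH_RADIUS_M * math.cos(math.radians(sw_lat))))
    return {
        'type': 'envelope',
        # Upper-left corner, then lower-right corner, each as [lon, lat].
        'coordinates': [[sw_lon, north_lat], [east_lon, sw_lat]],
    }


# Hypothetical south-west corner of a 100 m square, for illustration only.
print(mgrs_cell_envelope(33.7488, -84.3876, 100))
```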

Putting it together

So we have GPS data of type geo_point in our index. Now we need to bridge the two, and the best way to explain that is to show you the code. The heavy lifting is done by two readily available Python packages, mgrs and geopy, plus a little glue code. If you are not using Django, or even Python, don't sweat it: the Python packages wrap or implement well-known algorithms, and there are equivalents in other languages. The calculations are the same.

from math import sqrt
from re import compile
from sys import argv

from geopy import Point
from geopy.distance import VincentyDistance  # removed in geopy 2.0
from mgrs import MGRS
from pyelasticsearch import ElasticSearch

# Grid zone designator, 100 km square ID, then 2-10 easting/northing digits.
MGRS_FORMAT = compile(r'(\d{1,2}[A-Za-z]) *([A-Za-z]{2}) *(\d{2,10})')
SOUTH_WEST = 225  # compass bearing in degrees

def parse_coordinates(mgrs_string):
    """Return the square's south-west corner (a geopy Point) and the digits per coordinate."""
    match = MGRS_FORMAT.match(mgrs_string)
    if match is None:
        raise ValueError("Not an MGRS string: %r" % mgrs_string)
    precision_digits = match.group(3)
    if len(precision_digits) % 2 != 0:
        raise ValueError("The string must have an equal number of easting and northing digits.")
    # Hand the converter the compact form, without any spaces the user typed.
    compact = ''.join(match.groups()).upper()
    converter = MGRS()
    gps_point = Point(*converter.toLatLon(compact.encode('ascii', 'ignore')))
    return gps_point, len(precision_digits) // 2

def build_query_document(top_left, bottom_right):
    # A driver matches if any of its nested drop points lies inside the box.
    return {
        'fields': ['name'],
        'query': {
            'nested': {
                'path': 'drop_points',
                'filter': {
                    'geo_bounding_box': {
                        'gps_coords': {
                            'top_left': {
                                'lat': top_left.latitude,
                                'lon': top_left.longitude
                            },
                            'bottom_right': {
                                'lat': bottom_right.latitude,
                                'lon': bottom_right.longitude
                            }
                        }
                    }
                }
            }
        }
    }

def perform_search(mgrs_string, tolerance_m=None):
    gps_point, precision = parse_coordinates(mgrs_string)
    # geopy treats plain numbers as kilometers; the fence defaults to 1 m.
    padding_km = float(tolerance_m or 1.0) / 1000.0
    # Side of the MGRS square: 5 digits per coordinate = 1 m, 3 = 100 m, ...
    envelope_km = 10 ** (5 - precision) / 1000.0
    distance = VincentyDistance()
    # Step diagonally out of the south-west corner so the fence starts
    # padding_km south and padding_km west of the square.
    bottom_left = distance.destination(gps_point, SOUTH_WEST, padding_km * sqrt(2))
    side_km = envelope_km + 2 * padding_km
    top_left = distance.destination(bottom_left, 0, side_km)       # due north
    bottom_right = distance.destination(bottom_left, 90, side_km)  # due east

    query_document = build_query_document(top_left, bottom_right)
    elasticsearch = ElasticSearch('http://127.0.0.1:9200')
    return elasticsearch.search(query_document, index='deliveries', doc_type='driver')['hits']['hits']

if __name__ == '__main__':
    # Arguments: MGRS string, then an optional fence spacing in meters.
    mgrs_string = argv[1]
    tolerance = argv[2] if len(argv) > 2 else None
    for hit in perform_search(mgrs_string, tolerance):
        print(hit['fields'])
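The envelope arithmetic in perform_search is easy to sanity-check by hand: with a 10-meter square and 100 meters of padding, the box is 10 + 2 × 100 = 210 meters on a side, and its south-west corner sits 100 meters south and 100 meters west of the square's. The sketch below reproduces that geometry with only the standard library, swapping the Vincenty calculation for a spherical-earth meters-per-degree approximation (reasonable at these distances, but not what the code above uses). `fenced_bounding_box` and the example corner are hypothetical, and like the code above it treats the square as aligned with true north.

```python
import math

EARTH_RADIUS_M = 6371000.0  # approximate mean radius; spherical-earth assumption


def fenced_bounding_box(sw_lat, sw_lon, cell_size_m, padding_m):
    """Return the body of a geo_bounding_box filter for a padded grid square.

    The square runs cell_size_m north and east of its south-west corner;
    the fence adds padding_m on every side, so each side of the box is
    cell_size_m + 2 * padding_m long.
    """
    m_per_deg_lat = math.radians(1) * EARTH_RADIUS_M
    m_per_deg_lon = m_per_deg_lat * math.cos(math.radians(sw_lat))
    side_m = cell_size_m + 2 * padding_m
    south = sw_lat - padding_m / m_per_deg_lat
    west = sw_lon - padding_m / m_per_deg_lon
    return {
        'top_left': {'lat': south + side_m / m_per_deg_lat, 'lon': west},
        'bottom_right': {'lat': south, 'lon': west + side_m / m_per_deg_lon},
    }


# Hypothetical corner; a 10 m square with a 100 m fence gives a 210 m box.
print(fenced_bounding_box(33.7490, -84.3880, 10, 100))
```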


One other note: this code is simplified to keep it short. I left out a lot of string checking and edge-case detection for the various formats users may enter, and there are a few variants out there. Here is a more inclusive regexp that allows whitespace, or "N" and "E", to separate the easting and northing coordinates. This also allows separate precisions for the two coordinates:

(\d{1,2}[A-Za-z]) *([A-Za-z]{2}) *(?:(?:([Nn]|[Ee])? *(\d{1,5})) *(?:([Nn]|[Ee])? *(\d{1,5})))?
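Here is one way the more inclusive pattern might be put to work: a Python 3 sketch (the helper name `split_mgrs` is hypothetical) that returns the easting and northing digit strings, letting N/E labels override the default easting-first order. One wrinkle worth knowing: when the digits are run together with no separator, the pattern splits them greedily (up to five digits, then the rest), so 16SGC419373 would come out as 41937 and 3; the sketch re-splits such runs in half.

```python
import re

# The more inclusive pattern above, written as a raw string.
LABELED_MGRS = re.compile(
    r'(\d{1,2}[A-Za-z]) *([A-Za-z]{2}) *'
    r'(?:(?:([Nn]|[Ee])? *(\d{1,5})) *(?:([Nn]|[Ee])? *(\d{1,5})))?')


def split_mgrs(mgrs_string):
    """Return (zone, square, easting digits, northing digits).

    Each coordinate keeps its own digit count, so the two precisions may
    differ when the input separates them with whitespace or N/E labels.
    """
    match = LABELED_MGRS.fullmatch(mgrs_string.strip())
    if match is None:
        raise ValueError("unrecognized MGRS string: %r" % mgrs_string)
    zone, square, label1, first, label2, second = match.groups(default='')
    if not label1 and not label2 and match.end(4) == match.start(6):
        # Digits run together: the pattern split them greedily, so halve the run.
        run = first + second
        if len(run) % 2:
            raise ValueError("unlabeled digits must split evenly")
        first, second = run[:len(run) // 2], run[len(run) // 2:]
    elif label1.upper() == 'N' or label2.upper() == 'E':
        # The labels say the northing came first.
        first, second = second, first
    return zone.upper(), square.upper(), first, second


print(split_mgrs('16SGCE419N373'))    # ('16S', 'GC', '419', '373')
print(split_mgrs('16SGC N373 E419'))  # ('16S', 'GC', '419', '373')
print(split_mgrs('16SGC 4194 37'))    # ('16S', 'GC', '4194', '37')
```

In the last example the easting is given to 10 m and the northing to 1 km, the kind of mixed precision the separated formats make possible.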

Querying

For MGRS, or any grid-based system, the geo_bounding_box filter is a natural fit for deciding whether a point falls within the search area, mostly because a grid square translates directly into a box.
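The matching rule is worth spelling out, since it is what makes the nested mapping matter: a driver is a hit when any one of its nested drop points falls inside the box. This plain-Python sketch mimics that over the test data indexed above. It is only an approximation of what the geo_bounding_box filter inside a nested query does (it ignores boxes that cross the 180th meridian), the function names are hypothetical, and the box in the example is arbitrary, not one derived from an MGRS string.

```python
# Drop points copied from the test data indexed above, as (lat, lon).
DRIVERS = {
    'Shady Susan': {'John P.': (33.748825, -84.386263),
                    'Javier': (33.748196, -84.387845)},
    'Sly Bob': {'Ray': (33.749168, -84.388816),
                'Theo': (33.750598, -84.386032),
                'Amy': (33.749556, -84.387312)},
}


def in_bounding_box(lat, lon, top_left, bottom_right):
    """Point-in-box test in the spirit of geo_bounding_box.

    Corners are (lat, lon) tuples; boxes crossing the 180th meridian
    are not handled.
    """
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


def matching_drivers(top_left, bottom_right):
    """Map each matching driver to the drop points that put it in the box."""
    matches = {}
    for driver, drop_points in DRIVERS.items():
        inside = sorted(contact for contact, (lat, lon) in drop_points.items()
                        if in_bounding_box(lat, lon, top_left, bottom_right))
        if inside:
            matches[driver] = inside
    return matches


# An arbitrary box for illustration, not one computed from an MGRS string.
print(matching_drivers((33.7500, -84.3890), (33.7490, -84.3870)))
# {'Sly Bob': ['Amy', 'Ray']}
```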

Let's say I wanted to define a search area that was, for all practical purposes, a point in the middle of the Georgia capitol building. It's not very useful, but to demonstrate: the 1-meter MGRS coordinate for that is 16SGC4194137390.

In these examples the green triangles are the actual coordinates and the box is the area represented by the coordinate string. The data model above sets up the example as follows:

  • Sly Bob has three delivery points: Amy, Ray and Theo
  • Shady Susan has two: Javier and John P.

We want to query our index to find out which delivery people (if any) match our query.



Yeah, that's way too small and misses everyone; you have to zoom way in just to see the square on the capitol building. Maybe what we really meant to say is “I want to find points in the area around the capitol building.” We can start by removing some precision from our string, which gives us 16SGC419373, a 100-meter square. That is closer to a city block. Here is what this search looks like:


Well, that sort of works, but not really, since the predefined grid rarely lines up with what we want to search. We almost got Javier but no one else; even Ray is left out, and he is sitting at the capitol. Also, moving one meter in any direction may produce a totally different coordinate string if you happen to be searching on the boundary of a major zone, so unlike GPS you can't always just increase or decrease a number. In my opinion this is why MGRS is great for communicating a known coordinate alongside a map but not so good for querying around an unknown location on a computer. What we really need when querying is GPS-style radius searching. So we will take the grid square defined in the search, whatever precision it represents, and place a fence around it at a given spacing. To demonstrate, we will use a 10-meter square primary search area with 100 meters of spacing on every side.


Results are:

{u'name': u'Sly Bob'}

That's much better. Amy and Ray are matching sub-documents, and therefore Sly Bob matches as a driver. The fence is calculated from the grid square and will always be oriented with it. The alternative would be to drop one more digit from each coordinate of the 100-meter string and make the primary search a 1-km square, which is obviously too much for what we want here.
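Removing precision has to happen to the easting and northing halves separately. Chopping characters off the end of the whole string scrambles them: dropping the last four characters of 16SGC4194137390 gives 16SGC419413, which reads as northing 413, not 373. A minimal sketch, assuming the compact form without spaces (`truncate_mgrs` is a hypothetical helper, not part of the mgrs package):

```python
import re

# Compact MGRS: zone number + band letter + 100 km square letters, then digits.
COMPACT_MGRS = re.compile(r'^(\d{1,2}[A-Z]{3})(\d*)$')


def truncate_mgrs(mgrs_string, digits_per_coordinate):
    """Coarsen a compact MGRS string by dropping trailing digits.

    Easting and northing are truncated separately. Truncating (rather than
    rounding) yields the south-west corner of the larger square that
    contains the original one.
    """
    match = COMPACT_MGRS.match(mgrs_string.upper())
    if match is None or len(match.group(2)) % 2:
        raise ValueError("not a compact MGRS string: %r" % mgrs_string)
    prefix, digits = match.groups()
    half = len(digits) // 2
    if not 0 <= digits_per_coordinate <= half:
        raise ValueError("truncation can only remove precision")
    easting, northing = digits[:half], digits[half:]
    return prefix + easting[:digits_per_coordinate] + northing[:digits_per_coordinate]


print(truncate_mgrs('16SGC4194137390', 3))  # 16SGC419373 (100 m square)
print(truncate_mgrs('16SGC419373', 2))      # 16SGC4137 (1 km square)
```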

Precision and Padding

One tricky part of this is interpreting the MGRS query string itself. As we saw, we can increase the area by removing (truncating) digits from the string, but this raises a question when someone searches with the query string 16SGC419373: it defines a point on the grid at the southwest corner of a 100-meter, 10-meter or 1-meter square. Which one?
The generally accepted answer is to rely on placeholders, so 16SGC41903730 would mean a 10-meter square and 16SGC4190037300 a 1-meter square. I have also seen schemes that use N for northing and E for easting, giving something like 16SGCE419N373, and examples where the easting and northing had different precisions, which this (or a similar) scheme would let you express. Otherwise the digits have to come in pairs. It just depends on what you need; supporting all of these formats is pretty easy with a little help from regular expressions, such as the more inclusive regexp shown earlier.
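Under the placeholder convention the digit count alone fixes the size of the square, so writing the same south-west corner at a finer precision just means appending zeros to each half. This hypothetical `pad_mgrs` helper sketches that for compact strings and reproduces the two placeholder examples above.

```python
import re

# Compact MGRS: zone number + band letter + 100 km square letters, then digits.
COMPACT_MGRS = re.compile(r'^(\d{1,2}[A-Z]{3})(\d*)$')


def pad_mgrs(mgrs_string, digits_per_coordinate=5):
    """Re-write a compact MGRS string with zero placeholders.

    The south-west corner stays the same; only the size of the square the
    string names shrinks (5 digits per coordinate = 1 m).
    """
    match = COMPACT_MGRS.match(mgrs_string.upper())
    if match is None or len(match.group(2)) % 2:
        raise ValueError("not a compact MGRS string: %r" % mgrs_string)
    prefix, digits = match.groups()
    half = len(digits) // 2
    if not half <= digits_per_coordinate <= 5:
        raise ValueError("placeholders can only add precision, up to 5 digits")
    easting, northing = digits[:half], digits[half:]
    return (prefix + easting.ljust(digits_per_coordinate, '0')
            + northing.ljust(digits_per_coordinate, '0'))


print(pad_mgrs('16SGC419373', 4))  # 16SGC41903730 (10 m square)
print(pad_mgrs('16SGC419373'))     # 16SGC4190037300 (1 m square)
```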

MGRS, Python and Windows.

This part is less about MGRS and more about deploying the mgrs package in a Windows environment with Django; if you are not doing that, you can skip it. The package behaves beautifully on Unix and, by extension, OS X. I started having problems when trying to use it on Windows. To be fair to the maintainer(s) of mgrs, Python on Windows is a royal pain. If you are interested, you can check out the issue for yourself.
Since the package is really just a wrapper around the GDAL code that works with MGRS, it's just one file. I moved that file, along with __init__.py, into an internally maintained package and made the following changes in core.py:

This basically lets you use an unchanged pip installation and the DLL that comes with it (on Windows).

The next problem you are likely to encounter, if you are using the standard build of Apache 2.2.x to front Django, is that the manifest for that executable no longer contains a reference to the VC90 runtime. On Windows, if a process hosts another program, the runtime manifest of the hosted program is ignored, and mod_wsgi is just such a host. The Python executable's manifest lists VC90 as a requirement, but since Python is invoked by Apache via mod_wsgi, that manifest is ignored and the result is a

WindowsError: [Error 126] The specified module could not be found

which is Python not being able to find VC90, not libmgrs.dll. You can read a verbose explanation of this issue (even though we are not using psycopg) in this post. To fix it you need the Windows manifest tool (mt.exe), which ships with the Windows SDK. Here are the commands to fix the problem:
mt.exe -inputresource:"C:\Python27\python.exe";#1 -out:"C:\Python27\python.exe.manifest"
copy C:\Python27\python.exe.manifest C:\Apache2.2\bin\httpd.exe.manifest
copy C:\Apache2.2\bin\httpd.exe C:\Apache2.2\bin\httpd.exe.bak
mt.exe -manifest "C:\Apache2.2\bin\httpd.exe.manifest" -outputresource:"C:\Apache2.2\bin\httpd.exe";1

Unless otherwise stated, the content of this page is licensed under GNU Free Documentation License.