Stop the tiling terror!

Being Smart about Big Geo Data

Ever since Google Maps was introduced, followed by Google Earth, everybody has gotten used to having maps readily available, everywhere and always, fast and at high resolution. The professional geo community has embraced this source of information as well, and the use of web services for geographic content has since taken a giant leap.

What Google has achieved is an impressive piece of work: putting geo on the map so prominently, in such a short timeframe. In fact, they have been much faster than us, the traditional geospatial companies. I believe the most important reason for this success is that Google offers fast, worldwide coverage through an intuitive user interface.

It allows Google to connect advertisement channels and send these to a large audience, using ‘location’ as its vehicle. The map has become the means to get to the products advertised. And it works. Let’s see where the nearest fast food restaurant or ATM is, and while we’re at it, give me directions on how to get there. And it’s all available at lightning speed.

The ‘Google’ experience for professionals

As said before, the professional geo community, too, has gotten used to Google's maps and has (intentionally or unintentionally) set them as its benchmark for map serving speeds. And this is where the shoe pinches a little.

The professional geo community has demands beyond speed when it comes to mapping: being up to date, being accurate, having full control over the data that is being used, and so on. What many people don't know, or at least tend to forget when they declare the Google services the benchmark, is that the speed one experiences is achieved through a 'brute force' approach. Google simply crams huge warehouses with servers, so every individual server carries a limited load and can send out previously rendered map tiles fast. This is hardly comparable to the average IT infrastructure we find at a municipality or province. In most cases, all they have to work with is one or a few servers at best, with limited capacity. Matching Google's serving performance then becomes a significant challenge, to say the least.

Tiling as the answer to speed expectations

The huge-warehouse-of-servers strategy is not an option for these organizations, but the other part of the solution, tiling, is being adopted. And it does indeed deliver a significant speed improvement compared to rendering each map image as it is requested. Great, one might think. And it is, as long as the amount of data you need to serve is fairly limited, does not change very often, and there is no limit to the available storage space. That last aspect is something many people don't realize when they opt full force for a tiling strategy.

On average, it's fair to say that the footprint of your data expands by a factor of 2 to 2.5 compared to the original. So when you start with a 1 TB file, you will need a capacity of roughly 2.5 TB to include the tile set. That's quite a lot. And it's not that strange: in essence, tiling is nothing more than copying your data into smaller, and smaller, and smaller pieces. The cost of storage has decreased somewhat over the past years, but it's certainly not free. Especially when you use hosted storage (provided and maintained by a third party), the cost per terabyte can run into several thousand euros a year.
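The arithmetic above can be put in a small sketch. The 2.5x expansion factor comes from the text; the EUR 2,000 per terabyte per year hosting price is an assumed, illustrative figure in the "several thousand euros" range mentioned above:

```python
# Back-of-the-envelope sketch of the storage cost of a tiling strategy.
# The 2.5x expansion factor is taken from the article; the per-terabyte
# hosting price is an assumption for illustration only.

def tiled_footprint_tb(original_tb: float, expansion_factor: float = 2.5) -> float:
    """Total storage needed: the original data plus the generated tile set."""
    return original_tb * expansion_factor

def annual_hosting_cost_eur(total_tb: float, eur_per_tb_year: float = 2000.0) -> float:
    """Yearly hosted-storage cost at an assumed price per terabyte."""
    return total_tb * eur_per_tb_year

total = tiled_footprint_tb(1.0)            # 1 TB source -> 2.5 TB total
print(total)                               # 2.5
print(annual_hosting_cost_eur(total))      # 5000.0
```

So a single 1 TB aerial photo mosaic, once tiled, already claims a few thousand euros of hosted storage every year, before any newer acquisitions are added.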

Developments in the amount of geo-data that’s being collected

The world of sensors is developing constantly. Where in the late '90s we were very happy to have a 50 cm resolution aerial photograph, today the standard lies at 10 cm, and many municipalities have even gone up to 7.5 or 5 cm. The level of detail in this data is wonderful, of course, and very useful for things like mapping and change detection. More and more organizations are also discovering the added value that extra spectral bands can bring, such as infrared (which is collected automatically by virtually all modern aerial cameras). The collection frequency has also gone up in recent years: from once every 5 years, to every 2 years, and nowadays every year, sometimes even twice a year. Very useful for maintaining Dutch base maps like the BAG and BGT.

Now back to storage. All the factors mentioned above have one consequence: the amount of geospatial data we collect annually is growing exponentially. This in itself is quite a challenge for the IT departments inside an organization. When you add the 2.5x tiling factor on top of that, it simply becomes impossible to manage. And worse still: unaffordable.

Big Data management: What's the trick?

This is where a format like ECW, combined with APOLLO Essentials, proves its power. It delivers the same speed, using the same open WMTS protocol, to the same end-user applications. But instead of growing the original data by a factor of 2.5, ECW achieves a 94% reduction of the original data, without visual loss. That 1 TB is then reduced to about 60 GB, with the same end-user experience. Compared to a tiling strategy, that is a difference of roughly 4,167% (2.5 TB versus 60 GB)! Now this is a strategy that enables us to keep managing these datasets in the coming years.
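The comparison follows directly from the figures in the text (a 94% ECW reduction and a 2.5x tiling expansion of the same 1 TB source):

```python
# The arithmetic behind the comparison: 94% ECW reduction versus a
# 2.5x tiling expansion of the same 1 TB source. All factors are
# taken from the article.

original_gb = 1000.0                       # 1 TB source file
ecw_gb = original_gb * (1 - 0.94)          # 94% reduction -> ~60 GB
tiled_gb = original_gb * 2.5               # tiling footprint -> 2500 GB

# The tiled footprint expressed as a percentage of the ECW footprint:
difference_pct = tiled_gb / ecw_gb * 100   # ~4167%
print(round(ecw_gb), round(tiled_gb), round(difference_pct))
```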

An added advantage is the time it takes to achieve this. As an example, take the raster image of the whole of Germany. This file, with an original footprint of 38 TB (40 cm resolution), was fitted with a tile cache of up to 19 levels in 152 days, growing the footprint to 71 TB. The ECW of the same original data totalled 0.85 TB and was created in only 7 days. So in this respect, too, enormous savings can be achieved.
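Reduced to its numbers, the Germany example looks like this (all figures come from the paragraph above):

```python
# The Germany example in numbers: 38 TB source, 71 TB tile cache built
# in 152 days, versus a 0.85 TB ECW created in 7 days.

source_tb, tiled_tb, tile_days = 38.0, 71.0, 152
ecw_tb, ecw_days = 0.85, 7

storage_ratio = tiled_tb / ecw_tb          # tile cache vs ECW on disk, ~84x
time_ratio = tile_days / ecw_days          # production time, ~22x faster
reduction_pct = (1 - ecw_tb / source_tb) * 100  # ~97.8% compression

print(round(storage_ratio, 1), round(time_ratio, 1), round(reduction_pct, 1))
```

Note that at this scale the compression is even better than the 94% quoted earlier, since ECW ratios vary with the imagery.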

Use a server for what it's made for

Serving a raster dataset places a significant load on an all-round GIS server. The bigger these datasets get, the harder such a server has to work, and ultimately overall performance suffers, including that of the other services that have nothing to do with the raster data.

ERDAS APOLLO Essentials was developed as a serving solution with only one goal: serving raster data as fast and efficiently as possible. By optimizing its processes down to the level of machine code, APOLLO is capable of serving hundreds of users from a single server without strain. In the further development of APOLLO Essentials, speed is always at the very top of the list of requirements. Not every new release will therefore have extensive new capabilities, but there is a constant effort to serve even bigger datasets, even faster. And at the end of the day, this is where the savings are for end users.

The optimal architecture to fit this is a 'best of breed' approach: let APOLLO Essentials, as a dedicated raster server, manage and serve the enormous amounts of raster data, and use your traditional GIS server for the vector layers. This might sound like a heavy configuration, but the cost of expanding your traditional GIS server to achieve the same performance is higher in almost all cases.

On our way to smart management and delivery of ‘Big Geo Data’

The tiling strategy advised by many geospatial companies can be seen as the same 'brute force' approach that Google uses with its server warehouses. The developments described above, and the associated cost levels, should make clear that this is not a feasible long-term strategy. The combination of compression and smart serving techniques in APOLLO allows for the same user experience at a fraction of the cost. With this in mind, we can step into the future of geospatial data with confidence, without the danger of cost and management getting out of control. That, to me, is a comforting thought!