Brazil’s National Institute for Space Research and NASA are partnering to monitor events in the Brazilian rain forest
The great thing about the Internet of Things (IoT) is that it takes communication to unprecedented heights. That’s also its weakness. Those ‘Things’ never shut up, babbling away in tongues and confusing everything.
In theory, the global village of IoT devices could save the world by helping us use our precious food, water and energy wisely. However, keeping up with all this unstructured talk could consume more of the earth’s resources than it saves. You’d need to burn tons of fossil fuels to power the data centres that support all this ‘eco’ data. Or you would have, until the SciDB system came along.
For example, the National Aeronautics and Space Administration (NASA) has to manage and process a massive influx of radar data as part of its study of the intensity and frequency of storms. The technology for gathering this data is constantly evolving, so the feeds arrive in all kinds of file formats, at levels of detail ranging from the microscopic to the macroscopic. The representation of the data is equally diverse, from raw signal data to wind speeds.
This jumble meant that NASA’s scientists were in danger of missing the bigger picture, as they’d be too busy writing programs to reconcile all the different data sources.
So US company Paradigm4 created SciDB, a specialized database management system for scientific use cases that gives scientists a consistent, abstracted representation of their data. This spares NASA’s scientists from having to hand-code a file-based data management system, so they can get straight on with examining multiple lines of IoT evidence by running ad hoc queries. The more lines of inquiry they pursue, the more likely they are to crack an investigation.
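SciDB’s own interfaces aren’t described here, so the following is only a minimal Python sketch of the general idea of a “consistent, abstracted representation”. Two made-up feeds, one CSV with wind speeds in knots and one JSON in metres per second, are parsed into a single record type keyed by time, latitude and longitude, in SI units. The feed layouts and the `Cell`, `from_csv_knots`, `from_json_ms` and `load_all` names are hypothetical assumptions, not Paradigm4’s API.

```python
import csv
import io
import json
from dataclasses import dataclass

KNOTS_TO_MS = 1852 / 3600  # 1 knot = 1 nautical mile (1852 m) per hour


@dataclass(frozen=True)
class Cell:
    """Common (hypothetical) representation: one wind speed at (time, lat, lon), in m/s."""
    t: int          # hypothetical time index, e.g. hours since an epoch
    lat: float
    lon: float
    wind_ms: float


def from_csv_knots(text: str) -> list[Cell]:
    """Hypothetical feed A: CSV with columns t,lat,lon,wind_kt (speeds in knots)."""
    return [
        Cell(int(r["t"]), float(r["lat"]), float(r["lon"]), float(r["wind_kt"]) * KNOTS_TO_MS)
        for r in csv.DictReader(io.StringIO(text))
    ]


def from_json_ms(text: str) -> list[Cell]:
    """Hypothetical feed B: JSON list of {"time", "pos": [lat, lon], "speed"} in m/s."""
    return [
        Cell(int(o["time"]), float(o["pos"][0]), float(o["pos"][1]), float(o["speed"]))
        for o in json.loads(text)
    ]


def load_all(feeds) -> list[Cell]:
    """Parse every (parser, text) pair and return one list ordered by coordinates."""
    cells = [cell for parser, text in feeds for cell in parser(text)]
    return sorted(cells, key=lambda c: (c.t, c.lat, c.lon))


# Usage with two tiny invented feeds.
feed_a = "t,lat,lon,wind_kt\n0,10.0,20.0,10\n"
feed_b = '[{"time": 1, "pos": [10.0, 20.0], "speed": 6.0}]'
cells = load_all([(from_csv_knots, feed_a), (from_json_ms, feed_b)])
print(cells[0].wind_ms, cells[1].wind_ms)
```

Once every feed is translated into the same shape, a query only has to be written once, against `Cell`, rather than once per file format.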
Brazil’s National Institute for Space Research (INPE) uses data from NASA’s MODIS (Moderate Resolution Imaging Spectroradiometer) satellite instrument in its rainforest ecology research. Its teams use satellite data to monitor events in the Brazilian rain forest as climate change alters rainfall patterns. They also combine data from multiple satellites and incorporate a large volume of “ground truth” sensor data.
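Checking satellite readings against “ground truth” sensors can be sketched in a few lines. The Python below assumes a simple regular latitude/longitude grid (real MODIS products use their own projections and tiling) and a handful of hypothetical rain gauges; it finds the grid cell each station falls in and reports the mean bias and mean absolute error. The `Grid` class, the `compare_with_ground` function and all the numbers are illustrative assumptions, not INPE’s method or data.

```python
import math
from dataclasses import dataclass


@dataclass
class Grid:
    """Hypothetical regular lat/lon grid (a simplification of real satellite products)."""
    lat0: float                 # latitude of the top edge, in degrees
    lon0: float                 # longitude of the left edge, in degrees
    cell_deg: float             # cell size, in degrees
    values: list[list[float]]   # values[row][col]; row 0 sits at the top edge

    def cell_at(self, lat: float, lon: float):
        """Return (row, col) of the cell containing the point, or None if outside."""
        row = int((self.lat0 - lat) // self.cell_deg)
        col = int((lon - self.lon0) // self.cell_deg)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[0]):
            return row, col
        return None


def compare_with_ground(grid: Grid, stations):
    """stations: (lat, lon, measured) tuples. Returns (n_matched, mean_bias, mae)."""
    diffs = []
    for lat, lon, measured in stations:
        rc = grid.cell_at(lat, lon)
        if rc is None:
            continue  # station lies outside the satellite scene
        row, col = rc
        diffs.append(grid.values[row][col] - measured)  # satellite minus ground
    if not diffs:
        return 0, math.nan, math.nan
    n = len(diffs)
    return n, sum(diffs) / n, sum(abs(d) for d in diffs) / n


# Usage: a 2x2 grid of invented rainfall values (mm) and three invented gauges.
grid = Grid(lat0=0.0, lon0=-60.0, cell_deg=0.5, values=[[10.0, 12.0], [8.0, 9.0]])
stations = [(-0.25, -59.75, 11.0), (-0.75, -59.25, 8.0), (5.0, -59.0, 3.0)]
print(compare_with_ground(grid, stations))
```

The third gauge falls outside the scene and is skipped, which is the kind of bookkeeping that multiplies once several satellites and sensor networks are combined.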
Databases struggle to keep up with the pace of processing needed in the cloud-driven, mobile world. Now that the cloud allows computing power to be ramped up and down at will in response to demand, the sheer liquidity of CPU, storage and memory far outstrips the speed at which databases can respond. Ironically, it is now the software, specifically the database, that is the bottleneck, while the hardware has become so fast and fluid that it can shapeshift to meet all eventualities.
This is why the database industry has had to rethink its approach, to get around the limitations of two-dimensional structures of rows and tables stored in a single location.
Getting data from different systems, and interrogating it, has become a massive problem thanks to the cloud and the IoT, mainly because the data exists in so many formats. This hits scientists particularly hard, because so many different fields of information are gathered across so many different disciplines. If, for example, researchers wanted to cross-compare genomics data with patient records and, say, occupational information and the geographic distribution of the subjects, they would need collaboration between multiple specialities, each with its own uniquely shaped, intransigent systems for storing and presenting data.
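At its simplest, that kind of cross-comparison is a join on a shared subject identifier that each source spells differently. The hypothetical Python sketch below normalises the key in each source and keeps only subjects present in all of them. The `index_by` and `cross_compare` names, the field names, IDs and values are all invented for illustration and don’t reflect any real clinical or genomic schema.

```python
def index_by(records, key_field):
    """Index a list of dicts by one field, normalising the ID (trimmed, upper-case)."""
    out = {}
    for rec in records:
        rec = dict(rec)  # copy so the caller's data is left untouched
        subject = str(rec.pop(key_field)).strip().upper()
        out[subject] = rec
    return out


def cross_compare(*sources):
    """sources: (records, key_field) pairs. Inner-join on the normalised subject ID.

    Fields with the same name in later sources overwrite earlier ones; a real
    pipeline would need to resolve such clashes deliberately.
    """
    indexed = [index_by(recs, key) for recs, key in sources]
    common = set(indexed[0]).intersection(*indexed[1:])
    joined = {}
    for subject in sorted(common):
        merged = {}
        for idx in indexed:
            merged.update(idx[subject])
        joined[subject] = merged
    return joined


# Usage: three invented sources, each with its own name and spelling for the ID.
genomics = [{"sample_id": "s01", "variant": "BRCA2"}, {"sample_id": "s02", "variant": None}]
patients = [{"patient": "S01", "age": 54}, {"patient": "S03", "age": 61}]
jobs = [{"id": " s01 ", "occupation": "miner", "region": "MG"}]
result = cross_compare((genomics, "sample_id"), (patients, "patient"), (jobs, "id"))
print(result)
```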
At the risk of over-simplifying the rationale behind SciDB, it appears to industrialise the process of discovery by automating the work of extracting data and connecting the relevant parts.
“An epidemiologist doesn’t need to spend time learning about four different file formats, multiple interfaces and three new programming languages,” says Paul Brown, chief scientist at Paradigm4.
An additional advantage of SciDB is that it lets the team exploit the array data model to reduce the complexity of the data scientist’s analytic operations. Satellite and IoT data arrive naturally ordered, by time and usually by location, which suits an array data model. SQL-based database management systems, whose tables are inherently unordered, cannot exploit that ordering to set up parallel access to the information in the same way. This is one of many areas where SciDB reorganises information and prepares it for a new way of working.
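The benefit of order can be illustrated without SciDB itself. In this minimal Python sketch, which is not SciDB’s implementation or query language, a time series stored as an ordered array is split into contiguous chunks by index, and each chunk computes a trailing moving average independently, reading earlier cells directly by position. With an unordered table, the rows would first have to be put in time order before any window could be formed. Threads are used only to show the decomposition: Python’s global interpreter lock means they won’t speed up this pure-Python arithmetic, whereas a distributed array store would spread chunks across machines. `chunk_bounds` and `moving_average` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n, chunk):
    """Split the index range [0, n) into contiguous chunks; array order defines them."""
    return [(start, min(start + chunk, n)) for start in range(0, n, chunk)]


def moving_average(series, window, chunk=4, workers=4):
    """Trailing moving average over an ordered 1-D array, computed chunk by chunk.

    Because positions are array coordinates, each chunk reads the window - 1
    cells before its start directly by index; no sort or join is needed first.
    """
    def work(bounds):
        start, end = bounds
        out = []
        for i in range(start, end):
            lo = max(0, i - window + 1)   # window is truncated at the array start
            vals = series[lo:i + 1]
            out.append(sum(vals) / len(vals))
        return out

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the chunks were submitted
        parts = pool.map(work, chunk_bounds(len(series), chunk))
        return [v for part in parts for v in part]


# Usage: six invented readings, windows of three, chunks of two cells each.
print(moving_average([1, 2, 3, 4, 5, 6], window=3, chunk=2))
```

Because each chunk only needs its own cells plus a fixed number of neighbours addressed by index, the chunks can be handed to independent workers without first reshuffling the whole dataset.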
The IoT can save the world. But first we need to save the data scientists!