Many analytics and machine learning use cases connect to data stored in data warehouses or data lakes, run algorithms on complete data sets or a subset of the data, and compute results on cloud architectures. This approach works well when the data doesn’t change frequently. But what if the data does change frequently?
Today, more businesses need to process data and compute analytics in real time. IoT drives much of this paradigm shift, as data streaming from sensors requires immediate processing and analytics to control downstream systems. Real-time analytics is also important in many industries including healthcare, financial services, manufacturing, and advertising, where small changes in the data can have significant financial, health, safety, and other business impacts.
If you’re interested in enabling real-time analytics, and in emerging technologies that leverage a mix of edge computing, AR/VR, IoT sensors at scale, and machine learning at scale, then understanding the design considerations for edge analytics is important. Edge computing use cases such as autonomous drones, smart cities, retail chain management, and augmented reality gaming networks all target deploying large-scale, highly reliable edge analytics.
Edge analytics, streaming analytics, and edge computing
Several different analytics, machine learning, and edge computing paradigms are related to edge analytics:
- Edge analytics refers to analytics and machine learning algorithms deployed outside of the cloud, “on the edge,” in geographically localized infrastructure close to where the data is generated.
- Streaming analytics refers to computing analytics in real time as data is processed. Streaming analytics can be done in the cloud or on the edge depending on the use case.
- Event processing is a way to process data and drive decisions in real time. This processing is a subset of streaming analytics, and developers use event-driven architectures to identify events and trigger downstream actions.
- Edge computing refers to deploying computation to edge devices and network infrastructure.
- Fog computing is a more generalized architecture that splits computation among edge, near edge, and cloud computing environments.
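To make the distinction between streaming analytics and event processing concrete, here is a minimal Python sketch. It computes a rolling mean over incoming sensor readings (the streaming analytic) and yields an alert whenever that mean crosses a threshold (the event that would trigger a downstream action). The function name, window size, threshold, and temperature values are all illustrative assumptions, not part of any particular product.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_mean_alerts(
    readings: Iterable[float], window: int, threshold: float
) -> Iterator[Tuple[int, float]]:
    """Yield (index, rolling_mean) whenever the rolling mean exceeds threshold."""
    recent = deque(maxlen=window)  # keeps only the newest `window` readings
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) < window:
            continue  # not enough data yet for a full window
        mean = sum(recent) / window  # the streaming analytic
        if mean > threshold:
            yield index, mean  # the event a downstream system would act on


if __name__ == "__main__":
    # Hypothetical temperature readings arriving one at a time.
    temperatures = [20.0, 21.0, 22.0, 30.0, 35.0, 36.0, 22.0]
    for index, mean in rolling_mean_alerts(temperatures, window=3, threshold=28.0):
        print(f"reading {index}: rolling mean {mean:.1f} exceeds threshold")
```

Because the function only holds one window of readings in memory, the same logic could run in the cloud or on a constrained edge device; where it runs is a deployment decision rather than a change to the algorithm.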
When designing solutions requiring edge analytics, architects must consider physical and power constraints, network costs and reliability, security considerations, and processing requirements.
Reasons to deploy analytics on the edge
You might ask why you would deploy infrastructure to the edge for analytics. Technical, cost, and compliance considerations all factor into these decisions.
Applications that impact human safety and require resiliency in the computing architecture are one use case for edge analytics. Applications that require low latency between data sources, such as IoT sensors, and the analytics computing infrastructure are a second. Examples of these use cases include:
- Self-driving cars, automated machines, or any transportation where control systems are automating all or parts of the navigation.
- Smart buildings with real-time security controls that must let people enter and exit safely without depending on network and cloud infrastructure.
- Smart cities that track public transportation, deploy smart meters for utility billing, and run smart waste management solutions.
Cost considerations are a significant factor in using edge analytics in manufacturing systems. Consider a set of cameras scanning the manufactured products for defects while on fast-moving conveyor belts. It can be more cost-effective to deploy edge computing devices in the factory to perform the image processing, rather than having high-speed networks installed to transmit video images to the cloud.
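A rough back-of-the-envelope calculation shows why the network side of this trade-off matters. The Python sketch below compares the uplink bandwidth needed to stream compressed video from every camera to the cloud with the bandwidth needed to send only defect events after processing images on-site. Every number here (resolution, frame rate, compression ratio, camera count, event size and rate) is a hypothetical assumption chosen for illustration.

```python
def video_stream_mbps(width: int, height: int, fps: int, bits_per_pixel: float) -> float:
    """Approximate bitrate of one video stream in megabits per second."""
    return width * height * fps * bits_per_pixel / 1_000_000


def event_stream_mbps(events_per_second: float, bytes_per_event: int) -> float:
    """Bitrate of small defect-event messages in megabits per second."""
    return events_per_second * bytes_per_event * 8 / 1_000_000


if __name__ == "__main__":
    cameras = 8  # hypothetical number of cameras on the line
    # Assumed: 1080p at 30 fps, compressed to about 0.1 bits per pixel.
    video_total = cameras * video_stream_mbps(1920, 1080, 30, 0.1)
    # Assumed: 2 defect events per second per camera, 512 bytes each.
    event_total = cameras * event_stream_mbps(2, 512)
    print(f"streaming video to the cloud: {video_total:.1f} Mbps")
    print(f"sending defect events only:   {event_total:.3f} Mbps")
```

The absolute figures depend entirely on the assumptions, but the gap of several orders of magnitude is what makes on-site image processing attractive when high-speed links are costly to install.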
I spoke with Achal Prabhakar, VP of engineering at Landing AI, an industrial AI company with solutions that focus on computer vision. “Manufacturing plants are quite different from mainstream analytics applications and therefore require rethinking AI including deployment,” Prabhakar told me. “A big focus area for us is deploying complex deep learning vision models with continuous learning directly on production lines using capable but commodity edge devices.”
Deploying analytics to remote areas such as construction and drilling sites also benefits from using edge analytics and computing. Instead of relying on expensive and potentially unreliable wide area networks, engineers deploy edge analytics infrastructure on-site to support the required data and analytics processing. For example, an oil and gas company deployed a streaming analytics solution with an in-memory distributed computing platform to the edge and reduced the drilling time by as much as 20 percent, from a typical 15 days to 12 days.
Compliance and data governance are another reason for edge analytics. Deploying localized infrastructure can help meet GDPR compliance and other data sovereignty regulations by storing and processing restricted data in the countries where the data is collected.
Designing analytics for the edge
Unfortunately, taking models and other analytics and deploying them to edge computing infrastructure isn’t always trivial. Models that process large data sets through computationally intensive algorithms may need re-engineering before they can be deployed and run on edge computing infrastructure.
For one thing, many developers and data scientists now take advantage of the higher-level analytics platforms that are available on public and private clouds. By contrast, IoT devices and sensors often run embedded applications written in C/C++, which may be unfamiliar and challenging terrain for cloud-native data scientists and engineers.
Another issue may be the models themselves. When data scientists work in the cloud and scale computing resources on demand at relatively low cost, they are able to develop complex machine learning models, with many features and parameters, to fully optimize the results. But when deploying models to edge computing infrastructure, an overly complex algorithm could dramatically increase the cost of infrastructure, size of devices, and power requirements.
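One simple way to see how model complexity translates into device requirements is to estimate the memory needed just to hold a model’s parameters. The Python sketch below does that arithmetic for a hypothetical model and a hypothetical device memory budget. The parameter count, the bytes-per-parameter values, and the 64 MB budget are illustrative assumptions, and the estimate ignores activations, runtime overhead, and compute cost.

```python
def model_size_mb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Memory needed to store the parameters alone, in mebibytes."""
    return num_parameters * bytes_per_parameter / (1024 ** 2)


def fits_on_device(num_parameters: int, bytes_per_parameter: int, budget_mb: float) -> bool:
    """True if the parameters alone fit within the device's memory budget."""
    return model_size_mb(num_parameters, bytes_per_parameter) <= budget_mb


if __name__ == "__main__":
    params = 25_000_000  # hypothetical model with 25 million parameters
    budget_mb = 64.0     # hypothetical memory budget on the edge device
    # 4 bytes per parameter (32-bit floats) versus 1 byte (8-bit storage).
    for label, width in (("32-bit", 4), ("8-bit", 1)):
        size = model_size_mb(params, width)
        verdict = "fits" if fits_on_device(params, width, budget_mb) else "does not fit"
        print(f"{label}: {size:.1f} MB, {verdict} in {budget_mb:.0f} MB")
```

A check like this is only a first filter, but it illustrates why a model that is inexpensive to run in the cloud may force larger, costlier, or more power-hungry hardware at the edge unless its size is reduced.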
I discussed the challenges of deploying AI models to the edge with Marshall Choy, VP of product at SambaNova Systems. “Model developers for edge AI applications are increasingly focusing more on highly-detailed models to achieve improvements in parameter reduction and compute requirements,” he noted. “The training requirements for these smaller, highly-detailed models remains daunting.”
Another consideration is that deploying a highly reliable and secure edge analytics system requires designing and implementing highly fault-tolerant architectures, systems, networks, software, and models.
I spoke with Dale Kim, senior director of product marketing at Hazelcast, about use cases and constraints when processing data at the edge. He commented that, while equipment optimizations, preventive maintenance, quality assurance checks, and critical alerts are all available at the edge, there are new challenges like limited hardware space, limited physical accessibility, limited bandwidth, and greater security concerns.
“This means that the infrastructure you’re accustomed to in your data center won’t necessarily work,” Kim said. “So you need to explore new technologies that are designed with edge computing architectures in mind.”
The next frontier in analytics
The more mainstream use cases for edge analytics today are data processing functions, including data filtering and aggregations. But as more companies deploy IoT sensors at scale, the need to apply analytics, machine learning, and artificial intelligence algorithms in real time will require more deployments on the edge.
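As an illustration of the filtering and aggregation described above, the Python sketch below drops out-of-range sensor readings on the device and reduces each window of readings to a small summary that could be sent upstream instead of the raw data. The `Summary` type, the function name, the valid range, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass
class Summary:
    count: int
    minimum: float
    maximum: float
    average: float


def summarize_window(readings: Iterable[float], low: float, high: float) -> Optional[Summary]:
    """Filter out readings outside [low, high], then aggregate what remains."""
    valid = [r for r in readings if low <= r <= high]  # filtering step
    if not valid:
        return None  # nothing worth sending upstream for this window
    return Summary(len(valid), min(valid), max(valid), mean(valid))  # aggregation step


if __name__ == "__main__":
    # Hypothetical window of temperature readings with two sensor glitches.
    window = [20.5, -999.0, 21.0, 21.5, 500.0]
    print(summarize_window(window, low=-40.0, high=85.0))
```

Sending one summary per window rather than every reading reduces network traffic, which is one reason these functions are often the first analytics moved to the edge.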
The possibilities at the edge make for a very exciting future of smart computing as sensors become cheaper, applications require more real-time analytics, and developing optimized, cost-effective algorithms for the edge becomes easier.