Remote Plants, Real Risks: Closing the Observability Gap in Industrial Networks

By Staff

When a production line at a remote facility goes down, manufacturers typically focus on the impact to operations. What they can overlook is how little visibility they have into their remote IT and OT systems, and the role that visibility plays in preventing outages.

These sites, often reliant on legacy systems, short on IT resources and increasingly dependent on IIoT, are becoming network blind spots. This can create the perfect storm for outages that start at the edge and ripple across the entire organization.

In this Q&A, Eileen Haggerty, the area vice president of product and solutions marketing at Netscout, explores why these blind spots lead to costly shutdowns, how limited visibility delays a response and what IT leaders can do to boost observability without sacrificing production.

Nolan Beilstein (NB): What does a typical observability gap look like at smaller or remote facilities?

Eileen Haggerty (EH): There are several noticeable clues for employees and IT professionals when observability is lacking in smaller or remote facilities.

For instance, employees may have user experience issues with Salesforce or Office 365, or suffer poor call quality in Teams or Zoom meetings. They might have trouble accessing the VPN or VDI hosted in colocation sites. Or they may find the company’s critical business apps in the public cloud have become painfully slow.

This becomes very problematic when, for instance, they are trying to perform key business transactions, such as entering production specifications at their factory for a particular custom order or checking raw materials inventory, and they experience a major delay over the network. This impacts the business and could potentially cause a production slowdown or delay.

The longer the problems persist, and the more challenging it is for the IT organization to pinpoint their source, the greater the impact on the business: costly overtime to complete production, customers frustrated by delayed orders, loss of confidence in the business and lost employee productivity.

NB: How do these network blind spots initially develop?

EH: These blind spots in the OT network have developed over time. Through digital transformation, many of the applications and services that manufacturing employees use every day were gradually moved out of corporate data centers, where many organizations already have observability in place.

Instead, these applications have been migrated to the cloud or colocation sites or are hosted by SaaS and UCaaS vendors. All of these services are delivered by third-party vendors. However, when an employee has a problem accessing them or experiences a slowdown, they still call the corporate IT helpdesk because it is a “corporate service.”

This is true whether it is a SaaS- or UCaaS-based service, an access technology like VPN or VDI hosted in a colocation site, or a critical business application that has been migrated to one of the public cloud providers. In most cases, IT organizations have not included these third-party vendors in their observability strategy.

And the challenges don’t end there. The communications paths from remote offices to the colocation, cloud and SaaS provider environments traverse WAN links such as internet, MPLS or SD-WAN services. Any one of these comes from yet another third-party vendor.
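To make the idea of per-segment visibility concrete, here is a minimal sketch of a reachability and latency check a remote site could run against each third-party segment it depends on. All hostnames below are hypothetical placeholders, and the sketch illustrates the concept rather than anything Netscout's products actually do:

```python
import socket
import time

# Hypothetical endpoints standing in for each third-party segment a
# remote site depends on; none of these hostnames come from the article.
ENDPOINTS = {
    "SaaS CRM": ("crm.example.com", 443),
    "UCaaS voice": ("voice.example.com", 443),
    "Colocation VDI broker": ("vdi.colo.example.com", 443),
    "Public cloud app": ("app.cloud.example.com", 443),
}

def tcp_connect_ms(host: str, port: int, timeout: float = 3.0) -> float | None:
    """Time a TCP handshake to one endpoint; None means unreachable."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000
    except OSError:
        return None

for name, (host, port) in ENDPOINTS.items():
    latency = tcp_connect_ms(host, port)
    status = f"{latency:.1f} ms" if latency is not None else "UNREACHABLE"
    print(f"{name:24s} {host}:{port:<5d} {status}")
```

A slow or unreachable endpoint narrows down which third-party segment deserves scrutiny, though a check like this still cannot see inside the WAN path itself, which is exactly the gap Haggerty describes.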

Frankly, IT organizations just don’t have visibility and control over all these potential points of failure, which is how the blind spot for problems at small and remote locations has developed.

NB: What’s the most common misconception companies have about remote site observability and security?

EH: One of the biggest misconceptions companies, or IT organizations, have about remote site observability for performance and security is that each vendor’s own point tools will be sufficient to cover their observability needs.

If a problem emerges, an IT or NetOps team may try to use a point tool offered by their cloud provider, colocation vendor or SaaS/UCaaS partner to research it, and that tool indicates the vendor’s service is operating just fine. Yet the company’s users and employees are still complaining of a slowdown or lack of access to a critical business service.
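One way to capture that user-side view is a simple synthetic check run from the remote site itself, comparing what employees actually experience against the vendor's "all green" dashboard. The URL and threshold below are hypothetical, and this is a sketch of the concept, not a substitute for a full observability platform:

```python
import time
import urllib.request

# Hypothetical URL for a critical business app; substitute a real health
# or login page. The provider's dashboard may report "all green" while
# the path from this site is slow, which this check would surface.
URL = "https://app.cloud.example.com/health"
SAMPLES = 5
SLOW_THRESHOLD_MS = 1500  # illustrative threshold, not from the article

timings = []
for _ in range(SAMPLES):
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            resp.read()
        timings.append((time.perf_counter() - start) * 1000)
    except OSError as exc:  # URLError subclasses OSError
        print(f"request failed: {exc}")
    time.sleep(1)

if timings:
    avg = sum(timings) / len(timings)
    worst = max(timings)
    print(f"avg {avg:.0f} ms, worst {worst:.0f} ms over {len(timings)} samples")
    if worst > SLOW_THRESHOLD_MS:
        print("user-perceived latency exceeds threshold; escalate with path data")
```

Evidence like this, gathered from the user's side of the path, is what turns "your service looks fine to us" into a productive conversation with the vendor.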

What often happens next is that the company calls for a “war room” with all of its vendors, including WAN and cloud providers, along with several members of its IT organization.

However, these war rooms are rarely productive. Each vendor spends valuable time demonstrating that its part of the communications path is operating well and pointing fingers at the others as the source of the problem. This point-tool strategy lacks an ecosystem-wide view of the communications path: it may rule out parts of the network where a slowdown or outage could exist, but it cannot pinpoint the source.

NB: What can IT leaders do to improve visibility without slowing production and overwhelming staff?

EH: Manufacturers need an ecosystem-wide observability approach, from the factory and/or sales and support offices, across the WAN, to wherever the application is hosted: in the cloud, a colocation site, a private data center or with a SaaS/UCaaS provider. Anything less will prolong troubleshooting, increase security vulnerabilities and intensify the potential for negative business outcomes.

Observability strategies that began with coverage in private data centers, application server farms and the WAN access points at data centers and headquarters locations were a good start. But remote factories and smaller locations are critical parts of the manufacturing OT environment, and it is important to have observability there as well.

Leveraging a single solution for the observability strategy is critical for two reasons. First, to obtain the invaluable ecosystem-wide views and analysis needed to truly pinpoint the source of a problem, not just rule out one area. Second, to reduce the costs and delays associated with tool clutter (too many tools that conflict with one another).

This question is interesting in that it asks about not slowing production, which is, of course, essential. The goal of the above recommendation is exactly that: avoid slowing production by identifying the source of slowdowns faster. It is worth pointing out that in some cases, ecosystem-wide observability has identified, in under two hours, problems that had persisted for weeks.

The second point, about not wanting to overwhelm staff, is also important. There are many situations where smaller locations, a factory for instance, lack IT professionals on site or have very limited staff.

In those cases, centralized IT will likely be even more overwhelmed trying to resolve user- and business-impacting problems at the remote location, because they lack the very thing they need to be successful: consistent observability for all the networked applications critical to keeping employees and remote locations contributing at a high level.
