What is Utility Data?

If utilities are the lifeblood of our modern civilization, then utility data is the lifeblood of our future planning.

That might sound obvious in a world underpinned by data in all forms, from Internet communication to logistics networks.

But there’s one big caveat: when it comes to data, the utility industry is practically in the Dark Ages. Utility data may be crucial to our future, but in most cases it is incomplete, unreliable, and incompatible—if it can even be found.

In this post we’ll cover:
  • What is data in general?
  • Utility data vs. Utility guesswork
  • Key aspects of utility data
  • Data gathering and data generation
  • Geospatial, physical, and societal data
  • Levels of reliability and completeness
  • Static, up-to-date, and obsolete data
  • Utility data location and data storage
  • Compatibility between different pieces of data
  • Why is utility data important?

What is data in general?

Data is easy to define: a collection of qualitative and quantitative observations, organized in a way that allows for systematic analysis to serve a specific purpose, whether that is prediction, evidence, strategic action, or recorded longevity.

Today, data is so abundant, and so intensively processed using higher-level analysis, that we tend to think of “raw” data as something that simply exists in the wild, waiting to be transformed into sophisticated insights.

That assumption is captured in the well-known saying: “data is the new oil.” Many CEOs and magazine editors have used those words over the past 10 years, but the earliest recorded use comes from a 2006 blog post by Michael Palmer, where he attributes the quote to British data scientist Clive Humby. Palmer adds an important caveat:

Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value.

The most important but least-mentioned aspect of data is that it has to be generated by human actions in the first place. It has to be discoverable and accessible from public or private sources. It has to be standardized to a degree that allows for useful comparison but does not remove meaningful detail. And it should have an estimated threshold of accuracy based on the sources from which the data was generated.

Utility data vs. Utility guesswork

Utility data is the basis for all informed interactions with our infrastructure, whether that entails planning and designing a new project, constructing or altering utilities or the areas around them, or managing or assessing areas of land.

That being said, the general definition of data does not necessarily apply to all information collected about utilities. In fact, an important distinction needs to be made between utility data and utility guesswork.

Utility data is collected and recorded in an organized manner at regular intervals and shared transparently with stakeholders. For example, a customer utility bill is a miniature data report including type of service, metered usage, geographic point of service, unit pricing, ownership, contact information, etc. Central utility operations depend on huge amounts of data to monitor network usage, fluctuations, efficiencies, outages or leaks, historic patterns, safety levels, etc.

On the other hand, there is a huge range of information gathering that would best be described as utility guesswork. Here’s a scenario: walk to any point outside and ask yourself: what utilities are buried under my feet? At what depth are they located? What material are they made of or encased in? What contents do they convey? How old are they? Which nearby facilities are they connected to? Are they in use or abandoned? How many people will lose service if they are cut? What is the predicted risk of working in their vicinity? What is the financial penalty for disrupting their service? Who owns these utilities? What is the owner’s contact information?

The uncomfortable truth: across the world (outside of a few countries like Singapore or Japan and specific cities, counties, or states), there is no systematic collection, recording, and sharing of unified utility data that could be used to answer these questions. They are mostly answered in an ad hoc manner, for specific sites, by specific stakeholders, at specific moments in time, and new findings are rarely recorded in a retrievable database. What’s more, there are significant costs, risks, and logistical and legal implications associated with collecting high-quality data on-site through geophysical methods or exposure.

Nevertheless, this is the common standard for the way we gather urgently needed information about existing utility infrastructure. Working with data is different for utilities than for most other industries because no single body (neither a public authority nor a specific utility) holds a comprehensive and high-quality database, and a significant amount of essential utility data has never been collected or is not found in any records.

Key aspects of utility data

Still, there is value in understanding the current context of utility data in greater detail. As Donald Rumsfeld famously said in 2002:

…there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don't know we don't know … it is the latter category that tends to be the difficult ones.

Utility data deals with all three categories, and it’s useful to know what level of certainty and comprehensiveness to expect from each method of data collection and interpretation. Here are some key considerations:

Data gathering and data generation

Existing data is gathered from various sources and analyzed, while original data is generated through sensing or exposing the utilities on site and interpreting the findings. Gathered data tends to give more comprehensive depictions of entire sites or utility lines, but there is no guarantee that these depictions are correct. Many as-builts show intended design schemes without including any changes made during construction. Even if the data was accurate at the time it was recorded, it will not reflect any alterations or additions to the site that took place later.

On the other hand, original data generation can give an up-to-date reading of the site, but only at the points or lines where it is collected. The tools must be selected and calibrated skillfully to collect useful data, and geophysical data must be interpreted further to give reliable indications of the buried utilities. Exposing utilities can generate clear and concrete findings, but may require ground surface disruption. All newly generated data must be GIS-captured as well as mapped in order to be accessible off-site and durable beyond a limited time span.

Geospatial, physical, and societal data

There are many ways to describe existing infrastructure with collected data. Geospatial data defines the precise location of facilities in a specific mapping reference system, such as latitude and longitude, as well as height or depth. Geospatial data is often based on a set of known locations, which are used to interpolate data points that describe linear utilities over large territories. Physical data, meanwhile, describes the qualities and makeup of the infrastructure artifacts—both the container (material, insulation, diameter, wall thickness, resonating frequency, tracing elements, strength, age, etc.) and the contents (chemical or energy contents, flow/transmission rate, pressurization, etc.). Finally, societal data relates to the management of the utility, such as ownership, access, liability, active or inactive status, health and environmental hazards, etc.
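One way to picture the three categories is as a single record with three parts. The following is only a minimal sketch: the class and field names are hypothetical, chosen for illustration, and do not follow any particular industry schema. Unknown values are stored as None rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeospatialData:
    latitude: float                  # decimal degrees
    longitude: float                 # decimal degrees
    depth_m: Optional[float] = None  # depth below surface; None if unknown


@dataclass
class PhysicalData:
    material: Optional[str] = None       # the container
    diameter_mm: Optional[float] = None  # the container
    contents: Optional[str] = None       # what the utility conveys


@dataclass
class SocietalData:
    owner: Optional[str] = None
    active: Optional[bool] = None  # None means the status is unknown


@dataclass
class UtilityRecord:
    geospatial: GeospatialData
    physical: PhysicalData
    societal: SocietalData


# Example values are invented for illustration only.
record = UtilityRecord(
    geospatial=GeospatialData(latitude=40.0, longitude=-75.0, depth_m=1.2),
    physical=PhysicalData(material="cast iron", diameter_mm=200.0, contents="water"),
    societal=SocietalData(owner=None, active=None),
)
```

Keeping "unknown" distinct from "inactive" or "no owner" matters here: a record like this one can be precise about location while saying nothing about who is responsible for the pipe.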

Levels of reliability and completeness

When dealing with a utility dataset, it’s important to understand not only what data is included but also what data has been excluded or could be missing. This can be understood in two ways: reliability and completeness.

The reliability of a data set depends on accuracy and precision. For example, what is the tolerance of the tool used to measure the location of a specific utility? When interpolating linear or area geometries from point measurements, what is the potential for error in the calculations? Another way to look at reliability is through the quality of the source material. For example, what are the potentials for inaccuracy or deviation from reality when taking data from an as-built? How reliable is the person who provided the data based on their training, skillset, and experience? Who is liable for the accuracy of the data, and does that mean the data has been produced and checked to a higher standard of scrutiny?
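To make the interpolation question concrete, here is a rough sketch of estimating a point between two measured locations. The SurveyPoint type, its fields, and the rule for carrying tolerance forward are all assumptions made for this example. In particular, reporting the larger of the two endpoint tolerances is only a lower bound on the real uncertainty, because a buried line may bend or change depth between measurements.

```python
from dataclasses import dataclass


@dataclass
class SurveyPoint:
    x_m: float          # easting in a local planar system (assumed)
    y_m: float          # northing in a local planar system (assumed)
    depth_m: float
    tolerance_m: float  # stated tolerance of the measuring tool


def interpolate(a: SurveyPoint, b: SurveyPoint, t: float) -> SurveyPoint:
    """Estimate a point between a and b; t=0 gives a, t=1 gives b."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")

    def lerp(p: float, q: float) -> float:
        return p + (q - p) * t

    return SurveyPoint(
        x_m=lerp(a.x_m, b.x_m),
        y_m=lerp(a.y_m, b.y_m),
        depth_m=lerp(a.depth_m, b.depth_m),
        # Lower bound only: the straight-line assumption adds unknown error.
        tolerance_m=max(a.tolerance_m, b.tolerance_m),
    )


start = SurveyPoint(x_m=0.0, y_m=0.0, depth_m=1.0, tolerance_m=0.1)
end = SurveyPoint(x_m=10.0, y_m=0.0, depth_m=2.0, tolerance_m=0.3)
midpoint = interpolate(start, end, 0.5)
```

The interpolated midpoint looks just as precise as the measured endpoints, which is exactly why the source of each point should be recorded alongside its coordinates.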

The completeness of a data set depends on how, when, and by whom it was produced. Many utilities hold records only for their own facilities, but not the surrounding infrastructure. Government authorities may hold comprehensive records for the utilities in the public right of way, but not the facilities located on private property. Each data set may exclude utilities that were installed after the data was recorded, that were abandoned before it, or that were not detected or documented at the time. Geophysical data may be reliable for geospatial position, but lack any identifying information about what the detected utility is and who operates it. Crucially, the completeness of a data set usually has to be inferred by the viewer based on context.

A data set may be highly reliable but only for a limited geographical section or a specific type of utility, and therefore completeness can only be achieved by aggregating data from multiple sources. Even highly comprehensive datasets have the possibility of missing data from undetectable utilities.
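As a simplified illustration of aggregation, the sketch below merges records from two sources and lists which fields are still missing afterwards. The field names, the sample values, and above all the shared utility IDs are assumptions: in practice, matching the same utility across sources is itself a difficult step that this example skips.

```python
# Fields this hypothetical project needs for every utility.
REQUIRED_FIELDS = ("location", "depth_m", "utility_type", "owner")


def merge_sources(*sources):
    """Combine per-utility records; later sources only fill gaps."""
    merged = {}
    for source in sources:
        for utility_id, fields in source.items():
            target = merged.setdefault(utility_id, {})
            for key, value in fields.items():
                # Keep the first known value; never overwrite it.
                if value is not None and target.get(key) is None:
                    target[key] = value
    return merged


def missing_fields(record):
    """List the required fields that are still unknown."""
    return [name for name in REQUIRED_FIELDS if record.get(name) is None]


# Geophysics: good position data, no identification.
geophysics = {
    "U1": {"location": (40.0, -75.0), "depth_m": 1.2,
           "utility_type": None, "owner": None},
}
# Owner records: identification, but no surveyed position for U2.
owner_records = {
    "U1": {"utility_type": "gas", "owner": "Example Gas Co."},
    "U2": {"utility_type": "water", "owner": "Example Water"},
}

merged = merge_sources(geophysics, owner_records)
gaps = {utility_id: missing_fields(rec) for utility_id, rec in merged.items()}
```

Here U1 becomes complete only after both sources are combined, while U2 still lacks a location and depth. A utility that appears in neither source does not show up in the gap report at all, which is the limit of any completeness check of this kind.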

Static, up-to-date, and obsolete data

Each data set represents a particular snapshot in time, as well as a particular frequency of renewal or verification based on the needs of the data recorder or records holder. It is therefore necessary to know not only when the data was produced, but in what part of the utility lifespan it was collected, from design through construction and operation.

Some data points may be up to date, but other data may no longer be true or may never have been true. For example, an as-built drawing may portray the intended design scheme without including all of the alterations carried out on site during actual construction. A report of geophysics detection carried out in one project phase will not include new utilities added during later phases.

The more time has passed since the data was recorded, the greater the probability that changes have been made to the site. But even obsolete data can offer some value, because abandoned utilities are often removed from current records once they are no longer in use, even when they remain in the ground. Therefore, a pipeline that is shown in archival records but is not known to have been fully decommissioned has a high likelihood of being found below the surface.
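That reasoning amounts to a comparison between record sets, sketched below. The function name and the idea of a stable utility ID shared by archival and current records are assumptions made for the example; real archives rarely line up this neatly.

```python
def possibly_abandoned_in_place(archival_ids, current_ids, confirmed_removed_ids):
    """Utilities in old records, absent from current ones, and never
    confirmed as physically removed. These may still be in the ground."""
    candidates = set(archival_ids) - set(current_ids)
    return sorted(candidates - set(confirmed_removed_ids))


# Sample IDs are invented for illustration only.
archival = ["gas-01", "water-02", "telecom-03", "sewer-04"]
current = ["water-02", "sewer-04"]
confirmed_removed = ["telecom-03"]

suspects = possibly_abandoned_in_place(archival, current, confirmed_removed)
```

The result is a list of candidates to investigate on site, not a confirmation that anything is present.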

Utility data location and data storage

Data is only useful if it is stored in a known location in an accessible format. Unfortunately, a significant amount of utility data falls short of those conditions.

Some data is only found in paper records, but scanned drawings or digital plans can be equally inaccessible. Even when the data is available, it can be difficult to find where it is located and how to request access to it. Each source has a different information management system, so the search and request method must be adapted to find the relevant data for a specific site.

Many utility companies keep their records private due to competition or security concerns, and will only share their data when it is absolutely necessary, e.g. shortly before construction begins. Some companies publish a limited amount of utility data on their websites, such as maps of outages or service coverage. However, extracting information from JPEGs, PDFs, or digital map interfaces is too labor-intensive for such low-quality data.

Public authorities also keep records on some utilities, especially in the right of way. This data can be requested through the One-Call system before excavation on site. However, the data is not shared directly, but rather used in combination with geophysics tools to physically mark the utility locations in colored lines on the ground. These marks are temporary, so they must be mapped independently to store the data for future access.

Utility data may be gathered from other sources, such as as-built plans or design documents kept by engineering firms or clients, or in development plans or environmental studies. However, there is no reliable way to search for these sources, and their data may be out of date and unreliable.

Compatibility between different pieces of data

Utility data can be extracted from multiple sources and integrated into a comprehensive map or record, as long as it is compatible. If not, this process can introduce errors and give a false depiction of reality.

Unfortunately, utility data formats and standards vary significantly between different utilities and public agencies. Most data is not collected to GIS standards, and each source may use a different base map. Some utility data is not recorded to scale or referenced to the surrounding geography. This data may be useful, but a great deal of time and skill is needed to make it compatible with other data, and the process is inherently unreliable.
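To give a sense of what "making data compatible" can involve, here is a minimal sketch of fitting a drawing with its own local coordinates onto a reference map using two control points, meaning features whose position is known in both. This is a simplification under stated assumptions: both systems are treated as flat planar coordinates in consistent units, the function name is hypothetical, and two points can only recover shift, rotation, and uniform scale. A drawing that is stretched or not to scale needs more control points and proper GIS tooling, and even then the result carries error.

```python
def fit_similarity(p1, q1, p2, q2):
    """Build a drawing-to-map transform from two control points.

    p1 and p2 are (x, y) in drawing coordinates; q1 and q2 are the same
    two features in map coordinates. Points are treated as complex
    numbers so that rotation and scaling become one multiplication.
    """
    zp1, zp2 = complex(*p1), complex(*p2)
    zq1, zq2 = complex(*q1), complex(*q2)
    if zp1 == zp2:
        raise ValueError("control points must be distinct")
    a = (zq2 - zq1) / (zp2 - zp1)  # rotation and uniform scale
    b = zq1 - a * zp1              # translation

    def transform(point):
        z = a * complex(*point) + b
        return (z.real, z.imag)

    return transform


# Invented control points: the drawing is rotated 90 degrees and
# drawn at half the map's scale.
to_map = fit_similarity(p1=(0.0, 0.0), q1=(100.0, 200.0),
                        p2=(1.0, 0.0), q2=(100.0, 202.0))
valve_on_map = to_map((0.0, 1.0))
```

Any error in the two control points is spread across every transformed feature, which is one reason the integration process described above cannot be fully trusted.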

Why is utility data important?

Our modern infrastructure and society are now so complex that we need constant data collection to use scarce resources as intelligently as possible.

Utilities lag behind many other industries, from manufacturing to communication, in terms of productivity and optimization, due to incomplete, unreliable, or unfindable data.

Insufficient utility data can have disastrous consequences, from hazardous explosions and devastated nature to personal injuries and company bankruptcies.

Without good utility data, it is impossible to plan for our future societal needs. And as more utilities accumulate underground over time, the lack of utility data poses a growing risk.

Project planning, design, and construction benefit from utility data when it is reliable, easy to access, and affordable. A comprehensive one-stop source for utility data is not only convenient, but also makes the information more valuable and easier to use productively.

Tamar Shafrir

A dedicated researcher who doesn’t stop investigating until she reaches the truth, no matter how hard it is to accept or comprehend (and there are a lot of those in our industry). Tamar took her first career steps in architecture and design, both as a practitioner and a journalist. Throughout her journey, her curiosity has taken her all across the globe, from North America through Europe to the Middle East, discovering and explaining the micro and macro challenges of the industry. Today she focuses most of her efforts on unlocking the challenges of the subsurface, through research and education. If you’re not following her on LinkedIn yet, you should.