Environmental Data Management (EDM) Best Practices

Case Study: USGS Challenges with secondary use of multi-source water quality monitoring data

1 INTRODUCTION

Over the past 30 years, the U.S. Geological Survey (USGS) lost many stream monitoring sites, primarily to budget cuts. To help offset the loss, USGS investigated using existing, publicly accessible water quality monitoring data from all 50 states to conduct nutrient trend analyses. The effort is documented in a peer-reviewed paper, “Challenges with secondary use of multi-source water-quality data in the United States” (Sprague, Oelsner, and Argue 2017). Key takeaways are:

  • USGS surveyed 25 million records from 322,000 river sites and 488 environmental organizations, including federal, regional, state, tribal, county, academic, nongovernmental, volunteer, and private.
  • Of those records, 14.5 million (more than half) had missing or unclear metadata and could not be used.
  • USGS estimated that the unusable data were worth $12 billion.

2 SECONDARY DATA USE

Most environmental monitoring data are collected for a specific purpose, known as primary data use. Primary data users are familiar with their own data. Even if they don’t use a standard metadata format, they know, for example, that “nitrate” means nitrate in the NO3 form.

Publicly accessible environmental data management systems (EDMS) allow data to be used beyond their primary purpose. Using existing data for new purposes is secondary data use. This can be beneficial for many types of studies, like the USGS trend analyses. However, the data’s value is reduced if they lack standardized, essential metadata and valid values. If used with incorrect assumptions, these data can lead to flawed conclusions. For example, without proper metadata, a secondary user of nitrate data wouldn’t know whether a result was reported in the N or NO3 form.

3 PROBLEM METADATA AREAS

USGS found issues with the following types of results-level metadata.

3.1 Parameter Name

A single parameter (characteristic; analyte) can have many names or synonyms (aliases). A parameter name can also contain additional information, such as chemical form (for example, “nitrate as N”). USGS counted 1,046 unique nutrient parameter names in the data they surveyed. They reduced these to 10 commonly monitored nutrient parameters, including ammonia, nitrate, and orthophosphate. Most names were reconciled after thorough examination, but 115 names could not be reconciled and therefore remained unique. 

Example: Do “phosphorus” and “phosphate-phosphorus” in Figure 1 mean “orthophosphate,” or do they mean “total phosphorus, mixed forms”?

Figure 1: The same parameter under two different names can interfere with data analyses.
Source: Neumiller & Shumway 2017

Knowing the analytical method might help resolve some ambiguities, but it’s not always included with the data set.
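The reconciliation of parameter names described above can be pictured as a synonym lookup. The following is a minimal Python sketch, not the method USGS used: the synonym table, the function names, and the choice to leave “phosphorus” unmapped are illustrative assumptions. A real table would be built by reviewing each name, and often the analytical method, with the data provider.

```python
# Hypothetical synonym table (an assumption for illustration only).
# Keys are lower-cased raw names; values are harmonized parameter names.
SYNONYMS = {
    "orthophosphate": "orthophosphate",
    "ortho-phosphate": "orthophosphate",
    "ortho phosphate": "orthophosphate",
    "ammonia": "ammonia",
    "nitrate": "nitrate",
}


def harmonize_name(raw_name):
    """Return the harmonized name, or None if the raw name is not recognized."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(raw_name.lower().split())
    return SYNONYMS.get(key)


def split_reconciled(raw_names):
    """Separate names that map to a harmonized parameter from those that do not."""
    reconciled = {}
    unreconciled = []
    for name in raw_names:
        target = harmonize_name(name)
        if target is None:
            unreconciled.append(name)  # needs manual review; do not guess
        else:
            reconciled[name] = target
    return reconciled, unreconciled


if __name__ == "__main__":
    print(split_reconciled(["Ortho-Phosphate", "  AMMONIA ", "phosphorus"]))
```

Ambiguous names such as “phosphorus” are deliberately returned as unreconciled rather than forced into a category, which mirrors the names that could not be reconciled in the study.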

3.2 Sample Fraction

Sample fraction (fraction analyzed; filtration status) is the portion of a water sample that was analyzed. Filtering or centrifuging removes particulates from water samples. Water samples may be filtered (dissolved portion), unfiltered (whole sample), or particulate (unfilterable or residual portion).

Missing or ambiguous sample fraction metadata affected roughly 12 million records (about 48% of the 25 million surveyed) in the USGS study. Sometimes the sample fraction was part of the parameter name (for example, “dissolved Kjeldahl nitrogen”). Other times it was recorded as a sample fraction code, such as “F,” “D,” or “U” (representing filtered, dissolved, and unfiltered samples, respectively), that would not be clear to a secondary data user.

USGS also found that the use of “total” in parameter names creates ambiguity. Total can mean an unfiltered sample or a parameter made up of multiple species. 

Example: Nitrogen can include ammonia (NH3), ammonium (NH4), organic nitrogen, nitrite (NO2), and nitrate (NO3). How are data users supposed to know? USGS suggested using “mixed forms” to indicate a parameter with multiple species (that is, “total nitrogen, mixed forms” instead of “total nitrogen”).
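As a rough illustration of how sample fraction codes might be mapped onto a small set of valid values, here is a Python sketch. The code table is an assumption based on the codes mentioned above (“F,” “D,” “U”); actual codes vary by organization and should be confirmed with the data provider. The function name `normalize_fraction` is hypothetical.

```python
# Valid values recommended for a dedicated sample fraction field.
VALID_FRACTIONS = {"filtered", "unfiltered", "particulate"}

# Hypothetical code table (an assumption). "D" (dissolved) is mapped to
# "filtered" because the filtered portion of a sample is the dissolved portion.
FRACTION_CODES = {
    "f": "filtered",
    "d": "filtered",
    "dissolved": "filtered",
    "u": "unfiltered",
}


def normalize_fraction(code):
    """Map a raw code to a valid value, or return None if it is ambiguous."""
    if code is None:
        return None
    key = code.strip().lower()
    if key in VALID_FRACTIONS:
        return key
    # Unknown codes, including the ambiguous "total", are left unresolved.
    return FRACTION_CODES.get(key)


if __name__ == "__main__":
    for raw in ["D", " U ", "Filtered", "total", None]:
        print(repr(raw), "->", normalize_fraction(raw))
```

Returning `None` for “total” keeps the ambiguity visible instead of silently assuming an unfiltered sample.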

3.3 Speciation

Certain nutrient parameters (nitrate, nitrite, ammonia, and orthophosphate) can be reported in elemental or molecular forms. This is known as speciation (method speciation; chemical form).

Example: For nitrate, “nitrate as NO3” is the molecular form (counts nitrogen and oxygen atoms) and “nitrate as N” is the elemental form (counts only nitrogen atoms) (Figure 2). Nitrogen makes up about 22.6% of the mass of nitrate (roughly 14.0 of 62.0 grams per mole). So 45 milligrams per liter (mg/L) of “nitrate as NO3” equals 45 mg/L × 0.226, or about 10 mg/L, of “nitrate as N.” This is why the U.S. Environmental Protection Agency (USEPA) maximum contaminant level (MCL) for nitrate can be stated as 45 mg/L “as NO3” or 10 mg/L “as N” (Figure 3).
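The conversion in this example can be checked with a few lines of Python. This is a sketch using standard atomic masses (about 14.007 for nitrogen and 15.999 for oxygen); the function names are illustrative assumptions.

```python
# Standard atomic masses in grams per mole.
N_ATOMIC_MASS = 14.007
O_ATOMIC_MASS = 15.999

# Molar mass of nitrate (one nitrogen atom, three oxygen atoms).
NO3_MOLAR_MASS = N_ATOMIC_MASS + 3 * O_ATOMIC_MASS

# Mass fraction of nitrogen in nitrate, roughly 0.226.
N_FRACTION = N_ATOMIC_MASS / NO3_MOLAR_MASS


def no3_to_n(mg_per_l_as_no3):
    """Convert a nitrate result reported "as NO3" to "as N"."""
    return mg_per_l_as_no3 * N_FRACTION


def n_to_no3(mg_per_l_as_n):
    """Convert a nitrate result reported "as N" to "as NO3"."""
    return mg_per_l_as_n / N_FRACTION


if __name__ == "__main__":
    print(f"N fraction of nitrate: {N_FRACTION:.3f}")
    print(f"45 mg/L as NO3 = {no3_to_n(45):.1f} mg/L as N")
    print(f"10 mg/L as N = {n_to_no3(10):.1f} mg/L as NO3")
```

The factor of roughly 4.4 between the two forms shows how far an analysis can be off if the wrong form is assumed.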

Lab-reported speciation is independent of the analytical method. Some methods specify what form to report in, but labs don’t always follow that.

Labs often report the molecular form without any qualification (that is, nitrate missing “as NO3”). They usually indicate the elemental form (that is, “as N”). However, a secondary data user can’t assume this.

Incorrect speciation assumptions can dramatically skew data analyses because the conversion factors are large. USGS found that over 4 million records had missing or ambiguous speciation.

Figure 2: Nitrate as NO3 and as N, with and without oxygen atoms.
Source: Neumiller & Shumway 2017
Figure 3: You can’t compare apples to oranges. MCL for nitrate as NO3 and nitrate as N.
Source: Neumiller & Shumway 2017

3.4 Data Qualifiers

Data qualifiers (qualifier codes; remark codes; detection condition) explain censored data (that is, values less than reporting or detection limits). One of the most common data qualifiers is “non-detect,” a value below the detection limit. The code is usually “U.” 

USGS found almost 600 unique data qualifiers or data quality remarks in the data they surveyed. Many of these were in comment fields rather than a dedicated field, which makes them difficult to find. Others were ambiguous. Censored data were completely missing from the data sets of 53 organizations. All of these situations create potential bias in data analyses.

USGS found missing or ambiguous data qualifiers for 124,523 records. 

3.5 Missing or Zero Values

USGS found over 600,000 records with values of zero, negative values, or censored data with no values. Because nutrient concentrations can’t be negative, negative values were assumed to be errors. Zero values aren’t possible either; these may have been non-detects. Some censored data lacked values, detection limits, or reporting limits. USGS found other combinations of ambiguous reporting as well. There was no way to confidently use these data.
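A screening step for these cases might look like the Python sketch below. It is an illustration only, not the USGS procedure: the function name, the labels it returns, and the assumption that “U” marks a non-detect are all hypothetical and would need to match the conventions of the data set being screened.

```python
def classify_result(value, qualifier=None, detection_limit=None):
    """Label a result value so ambiguous records can be reviewed, not guessed.

    Assumes (for illustration) that the qualifier "U" marks a non-detect.
    """
    censored = qualifier is not None and qualifier.strip().upper() == "U"
    if censored:
        # A censored result is only interpretable if a limit is reported.
        if detection_limit is None:
            return "censored_no_limit"
        return "censored"
    if value is None:
        return "missing"
    if value < 0:
        return "negative"  # not physically possible for a concentration
    if value == 0:
        return "zero"  # possibly an unflagged non-detect
    return "ok"


if __name__ == "__main__":
    records = [
        (1.2, None, None),
        (0.0, None, None),
        (-0.1, None, None),
        (None, "U", 0.05),
        (None, "U", None),
    ]
    for value, qualifier, limit in records:
        print(value, qualifier, limit, "->", classify_result(value, qualifier, limit))
```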

4 USGS CONCLUSIONS

USGS concluded that the adoption of standardized metadata practices across organizations would dramatically increase the ability to reuse and share data. They cited these points:

  • Consistently use parameter naming conventions (or create the ability to accommodate synonyms).
  • Limit use of “total” to parameters that contain multiple species (for example, total nitrogen, mixed forms).
  • Create a dedicated field for sample fraction, using valid values “filtered,” “unfiltered,” and “particulate.”
  • Consistently report speciation and units.
  • Create a dedicated field for data qualifiers and adopt consistent valid values.
  • Discontinue improper use of zero, negative, and missing values.
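The recommendations above can be read as a record layout with dedicated fields and valid values. The Python sketch below shows one way to express that. The class name, field names, and the list of valid qualifiers are assumptions for illustration, not a published schema, and the check that every record has a speciation is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional

VALID_FRACTIONS = {"filtered", "unfiltered", "particulate"}
# Hypothetical qualifier list: empty string for a detected value, "U" for non-detect.
VALID_QUALIFIERS = {"", "U"}


@dataclass
class NutrientResult:
    parameter: str          # harmonized parameter name
    sample_fraction: str    # dedicated field, one of VALID_FRACTIONS
    speciation: str         # for example "as N" or "as NO3"
    units: str              # for example "mg/L"
    value: Optional[float]  # reported value (or limit, for a non-detect)
    qualifier: str = ""     # dedicated field, one of VALID_QUALIFIERS


def validate(result: NutrientResult) -> List[str]:
    """Return the names of fields that fail the checks (empty list if none)."""
    problems = []
    if result.sample_fraction not in VALID_FRACTIONS:
        problems.append("sample_fraction")
    if not result.speciation:
        problems.append("speciation")
    if not result.units:
        problems.append("units")
    if result.qualifier not in VALID_QUALIFIERS:
        problems.append("qualifier")
    if result.value is None or result.value <= 0:
        problems.append("value")
    return problems


if __name__ == "__main__":
    good = NutrientResult("nitrate", "filtered", "as N", "mg/L", 1.2)
    bad = NutrientResult("total nitrogen", "total", "", "mg/L", 0.0, "ND")
    print(validate(good))
    print(validate(bad))
```

Returning a list of failing fields, rather than raising on the first problem, lets a data manager see every issue with a record in one pass.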

Although USGS didn’t evaluate them for this report, the authors noted the importance of documenting other information in the metadata, such as geographic information, sampling methods, and laboratory methods.

5 FURTHER ACTION

USGS’s findings led to a year-long work group on nutrient data management. It included staff from USGS and the USEPA Water Quality eXchange (WQX), along with representatives of several state EDMSs. Their effort resulted in a document, Best Practices for Submitting Nutrient Data to the Water Quality eXchange (WQX) (USEPA 2017), which is published on the USEPA WQX website.

6 REFERENCES AND ACRONYMS

The references cited in this fact sheet, and the other ITRC EDM Best Practices fact sheets, are included in one combined list that is available on the ITRC web site. The combined acronyms list is also available on the ITRC web site.

Permission is granted to refer to or quote from this publication with the customary acknowledgment of the source (see suggested citation and disclaimer). This web site is owned by ITRC • 1250 H Street, NW • Suite 850 • Washington, DC 20005 • (202) 266-4933. ITRC is sponsored by the Environmental Council of the States.