How to Work With Satellite-Based Sustainability Data

by Jeremy Tamanini (Founder, Dual Citizen LLC)

Data. It is a word we hear a lot in sustainability circles. Big data. Climate data. Data analytics. Data-driven insights. Data collection. Satellite-based data. Less often, though, do we delve deeper into where these data come from, how to assess their relevance or accuracy, and what questions they are best suited to help answer. I have been guilty of this lately with “satellite-based” data, a collection method that has long monitored climate and environmental metrics and has recently been enhanced by artificial intelligence and machine learning applications. Dual Citizen clients and the datasets referenced in the Global Green Economy Index are increasingly integrating these satellite-based data. This brief insight shares our recent experience working with these datasets, ways to validate their utility, and how to best activate them to promote sustainability agendas.

Some quick background on what got me excited about this topic. Publishing the Global Green Economy Index brings me close to country-level data related to various aspects of the green economy. The traditional “bottom-up” methods for generating these data (e.g. country reporting, sector-based estimates, and modeled datasets) are not always as accurate as required, and often lack timeliness and granularity. Recently, “top-down” methods for data collection from satellites, sensors and other technology-enabled tools have introduced a new approach to collecting and analyzing these data. AI plays a central role in this process, automating systems for data capture and teaching machines to translate images and observations into datasets related to different green economy topics. These topics include GHG emissions, land-use patterns, and site-specific readings from company assets generating power, manufacturing goods, or extracting raw materials. New initiatives employing AI related to GHG emissions include Climate TRACE, Kayrros, CarbonMapper, and GHGSat; platforms focused on agriculture, land-use patterns, and biodiversity include SkyWatch, Planet, and Gro Intelligence.

Bottom-up methods for measuring emissions

When working with datasets generated by satellites, it is valuable to consider the bigger picture of how emission measurements are generally derived. Most emission inventories – linked to power generation, transportation, buildings or land use – are based on modeled estimates. Modeled emissions data linked to thousands of individual sites in a given city, region or country are aggregated to constitute these inventories. In the simplest terms: data on the types of emission-producing activities are combined with the emissions typically associated with those activities. This process of collecting, modeling, and validating data is time-consuming and obviously imperfect. Regulations can help: in the United States, Environmental Protection Agency (EPA) guidelines often require a continuous emission monitoring system (CEMS) to measure active GHG emissions from a specific source. For example, the EPA requires many industrial facilities to install CEMS to ensure compliance with the Clean Air Act. Monitoring components like sensors offer critical inputs to the models that track the typical emissions associated with different activities (e.g. power generation).
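To illustrate the bottom-up logic, here is a minimal sketch that multiplies hypothetical activity data for a few sites by equally hypothetical emission factors and aggregates the results into an inventory total. Real inventories involve far more activity categories, adjustments, and validation steps; every site, value, and factor below is illustrative only.

```python
# Minimal sketch of a "bottom-up" emissions inventory: activity data for each
# site is multiplied by a typical emission factor, then aggregated.
# All sites, activity amounts, and emission factors are hypothetical.

# Emission factors: tonnes CO2e per unit of activity (illustrative values)
EMISSION_FACTORS = {
    "coal_power_mwh": 1.0,   # t CO2e per MWh generated
    "gas_power_mwh": 0.4,    # t CO2e per MWh generated
    "cement_tonne": 0.6,     # t CO2e per tonne of cement produced
}

# Activity data reported or estimated for individual sites
sites = [
    {"name": "Plant A", "activity": "coal_power_mwh", "amount": 120_000},
    {"name": "Plant B", "activity": "gas_power_mwh",  "amount": 300_000},
    {"name": "Mill C",  "activity": "cement_tonne",   "amount": 80_000},
]

def site_emissions(site: dict) -> float:
    """Modeled emissions for one site: activity amount x emission factor."""
    return site["amount"] * EMISSION_FACTORS[site["activity"]]

inventory_total = sum(site_emissions(s) for s in sites)
print(f"Modeled inventory total: {inventory_total:,.0f} t CO2e")
```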

Satellites and Top-down methods for measuring emissions

Satellites add a different layer to this data collection process. Satellites can have “eyes” on these emission-producing activities continuously. This means that the granularity of these data improves: satellites can observe emissions from multiple sites in real time. But how do these more complete observations translate into actual datasets? Just because satellites provide more observational granularity doesn’t mean that these readings magically translate into actual data measurements.

This is where the traditional “bottom-up” approach to emissions measurement and the “top-down” one associated with satellites intersect, often through the application of artificial intelligence (AI). Emissions values from actual sites provide “ground-truth data” that help translate satellite-based observations into actual measurements. For example, a satellite may observe smokestacks from a coal-fired power plant, but it is only with the integration of actual emissions values associated with the site (or ones similar to it) that the visual observation can be transformed into a dataset. AI matters because a single site or type of site (e.g. a coal-fired power plant) can have many emissions observations globally. These various “ground-truth data” become the training inputs to AI models that, over time, produce their own (also imperfect) measurements of site-based emissions.
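A toy illustration of this idea: train a simple regression model on sites where ground-truth emissions exist, using satellite-derived features, then apply it to a site with no ground monitoring. The features, values, and model choice below are purely illustrative assumptions and are not how any particular provider builds its estimates.

```python
# Illustrative sketch of training on "ground-truth" emissions and predicting
# for unmonitored sites. All feature names and numbers are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Satellite-derived features per monitored site (hypothetical):
# [observed plume intensity, plant capacity in MW, hours visibly active per year]
X_train = np.array([
    [0.8, 500, 7000],
    [0.3, 200, 4000],
    [0.6, 350, 6500],
    [0.1, 100, 2000],
])
# Ground-truth emissions from CEMS or reported inventories (t CO2e, hypothetical)
y_train = np.array([3_000_000, 700_000, 1_900_000, 150_000])

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Satellite observations for a site that has no ground-based monitoring
X_unmonitored = np.array([[0.5, 300, 5500]])
estimate = model.predict(X_unmonitored)[0]
print(f"Estimated emissions for unmonitored site: {estimate:,.0f} t CO2e")
```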

Case study: Climate TRACE

Climate TRACE is a non-profit coalition of organizations building an inventory of where exactly GHG emissions are coming from, often enhanced by satellite-based measurements and AI. The coalition approach is intuitive: Climate TRACE is in effect a consortium of organizations collecting data on emissions from a variety of sectors, including power (WattTime), shipping (OceanMind), and cement and steel (TransitionZero). These member organizations employ different approaches to data collection, often blending bottom-up and top-down methods. For example, OceanMind collects shipping data from Automatic Identification Systems (AIS), which provide information on a ship’s location, speed, engine power, size and additional factors relevant to predicting emissions. TransitionZero fills in gaps in emissions reporting on cement and steel production by training models to detect production “hotspots” in satellite imagery, providing tracking of these activities that did not exist before. And WattTime tracks site-specific GHG emissions from power plants.
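To make the AIS example concrete, here is a highly simplified sketch of how a ship’s speed, engine power, and time underway might be converted into a CO2 estimate. This is not OceanMind’s methodology; the cube-law load approximation and the fuel and emission constants below are round, illustrative figures only.

```python
# Highly simplified ship CO2 estimate from AIS-style inputs. Real methods use
# far more detailed load, fuel, and hull models; constants are illustrative.

SFOC_G_PER_KWH = 190.0     # assumed specific fuel consumption (g fuel per kWh)
CO2_PER_TONNE_FUEL = 3.1   # approximate t CO2 per tonne of heavy fuel oil

def ship_co2_tonnes(installed_power_kw: float,
                    design_speed_kn: float,
                    observed_speed_kn: float,
                    hours_underway: float) -> float:
    """Estimate CO2 using a cube-law approximation of engine load from speed."""
    load = min((observed_speed_kn / design_speed_kn) ** 3, 1.0)
    energy_kwh = installed_power_kw * load * hours_underway
    fuel_tonnes = energy_kwh * SFOC_G_PER_KWH / 1e6
    return fuel_tonnes * CO2_PER_TONNE_FUEL

# Example: a vessel observed via AIS at 12 knots for 240 hours in a month
print(f"{ship_co2_tonnes(15_000, 16, 12, 240):,.0f} t CO2")
```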

These efforts by various consortium members underlie the Climate TRACE emissions database, which as of December 2023 tracked more than 352 million assets globally. These individual assets – ranging from industrial activities (power plants, steel mills, ships and oil refineries) to land-use ones (fertilizer application, deforestation, wildfires) – are aggregated to produce emissions estimates at the country and sector level. They are also represented in a global map, where users can conveniently click on an asset and explore available data. Living in New York City, I was curious about some of the assets covered in my neighborhood. The closest asset was a power plant on 59th Street in Manhattan, where Climate TRACE reported 23,000 metric tons of CO2 equivalent in 2022. In addition to this value, Climate TRACE shares a level of data confidence for each asset, on attributes like emissions, capacity, capacity factor and activity. This helps users assess the veracity of the data measurement when integrating these data into decision-making.

Issues to Consider Working with Satellite-Based Data

Coverage: The intersection of satellite-based data and “ground-truth reporting” is rapidly expanding the site-specific coverage of GHG emissions measurements. This provides much more granular climate data to market actors as well as watchdog groups keen on keeping companies honest about their scope 1, 2, and 3 emissions. However, it is important to remember that databases like Climate TRACE are still early in their development, and it will take time for the quality and coverage of the ground-truth data to be sufficient to train AI algorithms to estimate emissions with consistently higher levels of confidence.

Key takeaway: check the confidence level of any measurements derived from a platform like Climate TRACE and validate them against measurements from other sources, as in the sketch below.
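As a minimal sketch of that takeaway, the snippet below keeps only records above a chosen confidence level and flags assets where a satellite-derived value diverges sharply from an independent reference inventory. The file names, column names, and threshold are hypothetical stand-ins for whatever your exports actually contain.

```python
# Sanity check before using asset-level values: filter by confidence, then
# compare against an independent source. All names below are placeholders.
import pandas as pd

satellite = pd.read_csv("satellite_assets.csv")      # e.g. an asset-level export
reference = pd.read_csv("reference_inventory.csv")   # e.g. a national inventory

# Keep only records with adequate confidence (labels are hypothetical)
usable = satellite[satellite["confidence"].isin(["high", "medium"])]

merged = usable.merge(reference, on="asset_id", suffixes=("_sat", "_ref"))
merged["pct_diff"] = (
    (merged["emissions_t_sat"] - merged["emissions_t_ref"]).abs()
    / merged["emissions_t_ref"] * 100
)

# Flag assets where the two sources diverge by more than an arbitrary threshold
print(merged.loc[merged["pct_diff"] > 25, ["asset_id", "pct_diff"]])
```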

Compatibility: As users gain sophistication with sustainability data, a common mistake is integrating data from various sources that appear to be measuring the same attribute but actually are not. This pitfall is particularly salient when working with satellite-based data alongside datasets from more traditional sources. A traditional source may label a site-specific emissions value as reference year 2022 when in fact the value is based on measurements from 2000. Satellite-based data for the same site could also be labeled as reference year 2022, yet in reality be an average of emissions values observed during 2022. It would be a mistake to assume these two data sources are measuring the same thing, as they each reference different time periods.

Key takeaway: carefully check the underlying source, methodology and time period for all data used in proprietary models, and clearly notate the results of this check to promote transparency and confidence in your model’s output and associated conclusions.
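One lightweight way to operationalize this is to attach provenance metadata to every dataset before merging, as in the sketch below. The fields and values are illustrative; the point is that a shared reference year alone does not make two values comparable.

```python
# Minimal sketch of notating provenance before blending datasets, so that a
# "2022" label backed by a 2000-era model is never silently merged with a
# genuine 2022 satellite average. Field values are illustrative.
from dataclasses import dataclass

@dataclass
class DatasetMetadata:
    name: str
    source: str
    methodology: str           # e.g. "modeled estimate", "satellite average"
    measurement_period: str    # when the underlying observations were made
    reference_year: int        # the year the value is labeled with

a = DatasetMetadata("plant_x_bottom_up", "national inventory",
                    "modeled estimate", "2000 survey, scaled", 2022)
b = DatasetMetadata("plant_x_satellite", "satellite provider",
                    "satellite average", "Jan-Dec 2022", 2022)

def compatible(x: DatasetMetadata, y: DatasetMetadata) -> bool:
    """Same reference year is not enough; measurement periods must also match."""
    return (x.reference_year == y.reference_year
            and x.measurement_period == y.measurement_period)

if not compatible(a, b):
    print("Warning: same reference year but different measurement periods.")
```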

Continuous Learning: AI models are continuously learning, and over time the relationship between different sites or assets and the amounts of different GHGs they emit will be estimated with higher confidence. This process could happen quite rapidly, with fast improvements in the accuracy of asset-level measurements. This means that values derived from a platform like Climate TRACE are always changing and updating, hopefully mostly in the direction of higher-confidence readings. When using these data for internal purposes, users should formalize a plan for integrating these updates so that any models relying on them take full advantage of the improving timeliness and accuracy.

Key takeaway: make a plan and confirm internal capacity for integrating frequent updates to data derived from real-time sources like satellites. Failing to do this leaves your company or organization at a competitive disadvantage: after all, the main benefits of these data are granularity, timeliness and coverage, so configure your team to fully leverage them!
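A simple starting point is a refresh routine that pulls the latest asset records on a schedule and stores dated snapshots, so any model output can be traced back to the data vintage it used. In the sketch below, the directory path and the fetch function are placeholders for your own data source and tooling.

```python
# Lightweight refresh sketch: re-pull a frequently updated dataset, save a
# dated snapshot, and report what was saved. Paths and loader are placeholders.
import json
from datetime import date
from pathlib import Path

DATA_DIR = Path("data/satellite_assets")

def refresh(fetch_latest) -> Path:
    """fetch_latest is any callable returning the newest records as a list of dicts."""
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    snapshot = DATA_DIR / f"assets_{date.today().isoformat()}.json"
    records = fetch_latest()
    snapshot.write_text(json.dumps(records))
    print(f"Saved {len(records)} records to {snapshot}")
    return snapshot

# Usage with a stand-in fetcher (replace with your real data pull):
refresh(lambda: [{"asset_id": 1, "emissions_t": 23_000}])
```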

Helping you Navigate

The intersection of artificial intelligence and sustainability has been on my mind since 2020, when we published an insight on the topic, authored by strategic advisor Karuna Ramakrishnan. To follow up, we convened a webinar with experts employing AI in the realm of waste management (Conor Riffle, Rubicon), finance (Faiz Sayed, Aquantix AI), and nature-based solutions (Nan Pond, formerly of NCX). In 2020, the foundation was being set for AI in the sustainability space, and our research identified multiple ways in which these new applications could accelerate green breakthroughs in the 2020s.

In the 2020s, AI is everywhere and holds huge potential to accelerate sustainability and climate action. Finding the right balance between human and AI-centered decision-making will be paramount for all companies and organizations, with data collection, management and reporting at the center of this challenge. For more information on how to integrate AI tools with your sustainability strategy, take a look at a recent webinar with Knowledge Group Consulting (Abu Dhabi) here, as well as our work with clients in this space, including recent remarks on AI & Building and Construction. And be in touch if you would like to talk more about how to advance AI x sustainability in your company or organization. Contact me here.
