Four Ways in which we’ve Improved the Data in v6

NEM-Review has been used by clients in the NEM since as early as 2001.

Over the 10 years since NEM-Review was first introduced, the NEM has developed significantly, and so have the needs of our clients with respect to the NEM.  Hence, with v6, we have invested many person-weeks in a “Data Cleansing” process designed to ensure that NEM-Review 6 keeps pace with the times.

This Data Cleansing process has been designed to deliver added value in four ways:

  1. We have updated a number of factors held at a Station or Unit level.
  2. We have filled a small number of gaps in the data that had inadvertently slipped through the update process.
  3. We have, in a small number of cases, corrected data.
  4. We have implemented a framework for testing coded calculations.

In this post we will explain each in turn:

1)  Updating Generator Factors and Information

New to v6, we’ve added a series of derived data sets, such as unit revenue, CO2 emissions, and coal burn.  These derived data sets can be trended and analysed in the same way as the raw AEMO data.

To facilitate this, we’ve utilized a series of estimated factors that are contained in the NEM-Review database.  You can view these factors in the Generator Catalogue (another new addition to v6).

These factors were present in v5, but not used within the core application.  Given that they will be much more heavily used in v6, we’ve invested time in updating these factors with the following objectives in mind:

(a)    Wherever possible, utilize sources for these factors in the public domain.
(b)    Wherever possible, utilize a single source for these factors, to deliver comparability of the results obtained from one unit to another.

As such, we’ve standardized on the following references for many of the factors included in the database:

ACIL Tasman.  April 2009.  “Fuel resource, new entry and generation costs in the NEM”
located at http://www.aemo.com.au/planning/419-0035.pdf

and

AEMO.  14 May 2010.  “Registration and exemption listing”
located at http://www.aemo.com.au/registration/registration.html

The following tables provide an indication of what factors are included in the NEM-Review 6 database, why they are included, and how we have reviewed the data as part of the cleansing process:

(a)  Factors At a Unit Level

Some of the factors are stored in the NEM-Review database at a unit level, meaning that they can vary between the units in a given station:

Factor | What’s the purpose for inclusion? | How have we cleansed the data?
Fuel Type | This is used as one of several methods by which the available data sets can be sorted and totaled. | We’ve reviewed the AEMO document, and corrected any errors.
Generation Technology | This is used as one of several methods by which the available data sets can be sorted and totaled. | We’ve reviewed the AEMO document, and corrected any errors.
NEMMCO Classification | This is used as one of several methods by which the available data sets can be sorted and totaled. | We’ve reviewed the AEMO document, and corrected any errors.
OEM Turbine Supplier | This is used as one of several methods by which the available data sets can be sorted and totaled. | We have reviewed a wide range of sources.
Participant Type | This is used as one of several methods by which the available data sets can be sorted and totaled. | We’ve reviewed the AEMO document, and corrected any errors.
Marginal Loss Factor | The MLF (which varies each financial year) was obtained from AEMO. | We’ve sourced this data directly from AEMO.
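
To illustrate what “sorted and totaled” by one of these classifications means in practice, here is a minimal sketch in Python.  It is not the NEM-Review implementation: the function name, the attribute names, the unit IDs and the values are all hypothetical, chosen only to show the grouping idea.

```python
from collections import defaultdict


def total_by(attribute, unit_catalogue, unit_output_mw):
    """Total unit output under each value of one classification attribute."""
    totals = defaultdict(float)
    for unit_id, output_mw in unit_output_mw.items():
        label = unit_catalogue[unit_id][attribute]
        totals[label] += output_mw
    return dict(totals)


# Hypothetical catalogue entries: the unit IDs and values are made up.
catalogue = {
    "UNIT_A": {"fuel_type": "Black Coal", "technology": "Steam"},
    "UNIT_B": {"fuel_type": "Black Coal", "technology": "Steam"},
    "UNIT_C": {"fuel_type": "Natural Gas", "technology": "Open Cycle Gas Turbine"},
}
output_mw = {"UNIT_A": 500.0, "UNIT_B": 450.0, "UNIT_C": 120.0}

by_fuel = total_by("fuel_type", catalogue, output_mw)
by_technology = total_by("technology", catalogue, output_mw)
```

Because every unit carries its own value for each classification, the same output data can be regrouped by fuel type, technology or participant type without touching the underlying time series.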

(b)  Factors At a Station Level

Some of the factors are stored in the NEM-Review database at a station level, meaning that they are uniform for all units within a station:

Factor | What’s the purpose for inclusion? | How have we cleansed the data?
Bidders | The Bidders classification is used to represent the organization responsible for bidding the station’s output into the NEM, and factors in the Non-Scheduled class. This is used as one of several methods by which the available data sets can be sorted and totaled. | We’ve reviewed the AEMO document, and other sources, to make a professional judgement.
Owners | The Owners classification has been used to identify the ultimate owner (or owners) of the asset. This is used as one of several methods by which the available data sets can be sorted and totaled. | We’ve reviewed the AEMO document, and other sources, to make a professional judgement.
Traders | The Traders classification is used to represent the organization that is paid the spot price for the output, and factors in the Non-Market class. This is used as one of several methods by which the available data sets can be sorted and totaled. | We’ve reviewed the AEMO document, and other sources, to make a professional judgement.
Auxiliary Factors | These are used to calculate sent-out generation on a unit basis. | We’ve used the factors in the ACIL Tasman document, where they exist, and other sources where they don’t.
Coal Quality | These are used to calculate coal burn (i.e. for coal plant). | We’ve used the factors in the ACIL Tasman document, where they exist, and other sources where they don’t.
Heat Rate | These are used to calculate energy burn (i.e. for thermal plant). | We’ve used the factors in the ACIL Tasman document, where they exist, and other sources where they don’t.
Emissions Factors | These are used to calculate CO2 emissions on a unit basis. | We’ve used the factors in the ACIL Tasman document, where they exist, and other sources where they don’t.
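
As a rough illustration of how factors like these can turn metered generation into derived data sets, here is a minimal sketch in Python.  The formulas, the units and every name in it are our assumptions for the purpose of the example (for instance, a heat rate expressed in GJ per MWh sent out, and an emissions factor in kg of CO2 per GJ of fuel).  The calculations inside NEM-Review may well differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationFactors:
    """Hypothetical container for station-level factors (names and units assumed)."""
    auxiliary_fraction: float    # share of generated energy used on site, e.g. 0.07
    heat_rate_gj_per_mwh: float  # GJ of fuel per MWh sent out (assumed basis)
    coal_energy_gj_per_t: float  # coal quality: GJ per tonne of coal
    emissions_kg_per_gj: float   # kg of CO2 per GJ of fuel burnt


def derive_interval(generated_mwh, factors):
    """Derive sent-out energy, energy burn, coal burn and CO2 for one interval."""
    sent_out_mwh = generated_mwh * (1.0 - factors.auxiliary_fraction)
    energy_burn_gj = sent_out_mwh * factors.heat_rate_gj_per_mwh
    coal_burn_t = energy_burn_gj / factors.coal_energy_gj_per_t
    co2_t = energy_burn_gj * factors.emissions_kg_per_gj / 1000.0  # kg to tonnes
    return {
        "sent_out_mwh": sent_out_mwh,
        "energy_burn_gj": energy_burn_gj,
        "coal_burn_t": coal_burn_t,
        "co2_t": co2_t,
    }


# Made-up factors for a hypothetical coal station.
example = derive_interval(100.0, StationFactors(0.1, 10.0, 20.0, 90.0))
```

Each derived value is a simple function of the generation figure and one or two factors, which is why the quality of the factors matters so much more in v6 than it did in v5.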

2)  Filling gaps in the database

Calculations in NEM-Review are performed on raw data sets that each hold 17,520 time points per year (48 half-hourly points for each of 365 days), over more than 11 years, for every raw data set published by AEMO and loaded into NEM-Review.

On a small number of occasions in the past, the data update process used by previous versions of NEM-Review has missed individual data points.  As a result we have:

(a)  Corrected the data update process, to reduce the risk of this happening in future;
(b)  Filled in gaps in the data from the source CSV files obtained from AEMO; and
(c)  Developed a testing tool that we will continue to use to ensure that no further gaps are introduced in future.
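
The idea behind a gap check of this kind can be sketched in a few lines of Python.  This is not our testing tool, only an illustration under simple assumptions: timestamps are plain `datetime` values on a fixed half-hourly grid, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

HALF_HOUR = timedelta(minutes=30)


def expected_points(start, end):
    """Number of half-hourly points in the half-open range [start, end)."""
    return (end - start) // HALF_HOUR


def find_gaps(timestamps, start, end):
    """Return every half-hourly point in [start, end) absent from timestamps."""
    present = set(timestamps)
    missing = []
    point = start
    while point < end:
        if point not in present:
            missing.append(point)
        point += HALF_HOUR
    return missing


# A non-leap year holds 365 * 48 = 17,520 half-hourly points.
points_in_2010 = expected_points(datetime(2010, 1, 1), datetime(2011, 1, 1))

# Made-up sample with the 01:00 point missing.
sample = [
    datetime(2010, 1, 1, 0, 0),
    datetime(2010, 1, 1, 0, 30),
    datetime(2010, 1, 1, 1, 30),
]
gaps = find_gaps(sample, datetime(2010, 1, 1, 0, 0), datetime(2010, 1, 1, 2, 0))
```

Running a check like this over each data set after every update makes a missed data point visible immediately, rather than years later.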

This data cleansing was done with respect to the following data sets:

(a)  Regional pool price
(b)  Regional cumulative price
(c)  Regional available generation
(d)  Regional demand
(e)  Regional intermittent generation
(f)  Interconnector flow and loss
(g)  Interconnector import and export limit
(h)  Generator unit available generation
(i)  Generator unit output
(j)  FCAS data

As a result of our efforts, we are confident that the database supplied with NEM-Review 6 is complete with respect to all of the data supplied to us.  The exception is a set of known data gaps, which still exist in NEM-Review 6 because of gaps in the original source files supplied to us by AEMO.  We are following this up and will feed through the missing data when it becomes available to us.

3)  Correcting data errors

In v5, price revisions were processed manually.  As a result, there were a couple of instances where price revisions were missed.

As a result we have:

(a)  Corrected the data update process, to reduce the risk of this happening in future;
(b)  Reprocessed the data from the source CSV files obtained from AEMO; and
(c)  Benchmarked the price data in NEM-Review 6 against the average daily prices published in the AEMO Average Price Tables, to confirm that they were identical (within a tolerance allowed for rounding).
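
A benchmark of this kind can be sketched as follows.  This is an illustration only, written in Python with hypothetical function names.  It assumes that the caller has already assigned each half-hourly price to a day, and that the published averages are rounded to two decimal places (hence the default tolerance, which is our assumption rather than a documented figure).

```python
from collections import defaultdict
from datetime import date


def daily_averages(half_hourly):
    """Average (day, price) pairs into a {day: mean price} mapping."""
    buckets = defaultdict(list)
    for day, price in half_hourly:
        buckets[day].append(price)
    return {day: sum(prices) / len(prices) for day, prices in buckets.items()}


def mismatches(half_hourly, published, tolerance=0.005):
    """List the days whose computed average is missing or outside the tolerance."""
    computed = daily_averages(half_hourly)
    bad = []
    for day, expected in sorted(published.items()):
        actual = computed.get(day)
        if actual is None or abs(actual - expected) > tolerance:
            bad.append((day, actual, expected))
    return bad


# Made-up prices: the second day disagrees with its "published" average.
prices = [
    (date(2010, 5, 1), 20.0),
    (date(2010, 5, 1), 30.0),
    (date(2010, 5, 2), 40.0),
]
published = {date(2010, 5, 1): 25.0, date(2010, 5, 2): 41.0}
problems = mismatches(prices, published)
```

An empty result means every day agrees with the published figure to within rounding.  Any entry in the list points to a day where a price revision may have been missed.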

4)  Testing Data Selection and Calculation

To check that calculations had not been broken in the development of v6, we ran 50 queries in both NEM-Review v5 and v6 (on the same set of data), and compared the results.

For queries where the output from v5 and v6 differed, the results were analysed to determine the cause of the differences.

(a)  In some cases the cause was a known issue in v5 that had been fixed in v6.

(b)  In cases where the differences were due to calculation errors in v6, these were corrected by our development team.

This has been an effective test because the v5 and v6 code bases are entirely different, built on a different architecture and written in a different language, so the two are unlikely to share the same coding error.

This set of tests covered several of the more commonly used types of query in NEM-Review:

  • Regional pool prices and demand
  • Interconnector flow and loss
  • Generator output and available generation
  • Intermittent generation
  • Trading data, time of day, daily, weekly, monthly, quarterly, seasonal, yearly (calendar and financial) calculated data
  • Mean, max, min and total calculations
  • A variety of date ranges, both long (start of the NEM until now) and short
  • Limited by days of the week or times during the day
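
The comparison of v5 and v6 query results described above can be sketched as follows.  This is a simplified illustration in Python, not our test framework: the function name, the result format (a mapping from a period label to a number) and the tolerances are all assumptions.

```python
import math


def compare_results(v5, v6, rel_tol=1e-9, abs_tol=1e-6):
    """Return (key, v5 value, v6 value) for every key where the results differ.

    A key present in only one result set is reported with None for the other.
    """
    diffs = []
    for key in sorted(set(v5) | set(v6)):
        old, new = v5.get(key), v6.get(key)
        if old is None or new is None:
            diffs.append((key, old, new))
        elif not math.isclose(old, new, rel_tol=rel_tol, abs_tol=abs_tol):
            diffs.append((key, old, new))
    return diffs


# Made-up monthly averages standing in for the output of one query.
v5_result = {"2010-01": 25.0, "2010-02": 30.0, "2010-03": 28.0}
v6_result = {"2010-01": 25.0, "2010-02": 30.5, "2010-04": 1.0}
differences = compare_results(v5_result, v6_result)
```

Each reported difference still needs a person to decide whether it reflects a known v5 issue or a new error in v6.  The comparison only narrows down where to look.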

* * *

5)  For more information

If you’d like to know more detail about our testing procedures or think we’ve missed something, please leave us a comment below or contact us directly.
