# Data Cleaning

The data provided by the Alaska Energy Authority (AEA) is in raw form, with no quality control applied. It contains many user entry errors, impossible entries, and automatically filled entries, which show up as negative numbers when a variable was not entered into the form.

This page describes the data cleaning rules that were enforced to produce the cleaned version of the data. The uncleaned version of the data is also available on the download page.

```{seealso}
These data cleaning rules are inspired by the work of [Meléndez 2014](https://github.com/acep-uaf/pce-database/blob/main/docs/PCE_DataCleaningImputation_MethodologyNotes_09172014.pdf). That work showcased data cleaning methods for a previous version of the PCE UMR data that was last updated in 2014. Since then, the type of data we receive has changed slightly, and some cleaning rules have been modified to account for this.
```

## User Experience

The following improvements were made to the dataset to facilitate a better user experience.

- Community codes (PCE reporter ID) are added to each record.
- Column names are lowercase with underscores.
- A calendar date is added to each record.
- Intertie information is added to each record where applicable.
- Duplicate records are not allowed.

## Illogical or Erroneous Data

The following are logical rules for the data. When a rule is violated, we remove the data point in question (replace it with a NULL value).

- Residential rates ($/kWh) cannot be less than or equal to zero, or greater than $9.98/kWh.
- PCE rates ($/kWh) cannot be less than or equal to zero.
- Effective rates ($/kWh) cannot be less than zero.
- Non-fuel expenses ($) cannot be less than or equal to zero.
- Powerhouse consumption (kWh) (station service) cannot be less than or equal to zero.
- Monthly peak consumption (kW) cannot be less than or equal to zero.
- Fuel prices ($/gal) cannot be less than or equal to zero.
- Fuel use (gal) cannot be less than or equal to zero, or be reported along with a diesel efficiency (kWh/gal) value outside of 5 kWh/gal to 25 kWh/gal.
- Fuel cost ($) cannot be less than or equal to zero, or be reported along with a diesel efficiency (kWh/gal) value outside of 5 kWh/gal to 25 kWh/gal.
- Diesel efficiency (kWh/gal) cannot be outside of 5 kWh/gal to 25 kWh/gal.
- Diesel generation (kWh) cannot be negative or zero. However, zero is allowed if the community is intertied with another.
- Hydropower generation (kWh) cannot be negative.
- Other (1) generation (kWh) cannot be negative.
- Other (2) generation (kWh) cannot be negative.
- Residential customer sales (kWh) cannot be less than or equal to zero.
- Commercial, community, government, and unbilled sales (kWh) cannot be negative.
- Residential customers (accounts) cannot be less than or equal to zero.
- Commercial, community, government, unbilled, and other customers (accounts) cannot be negative.
- PCE eligible sales (kWh) for the residential and community facility customer classes cannot be negative. Total PCE eligible sales (kWh) cannot be negative either.

## Data Modifications

There are some instances where we are reasonably confident in modifying the data.

- PCE eligible sales (kWh) for the commercial customer class should always be zero for our data years. We enforce this rule throughout the dataset.
- The Alaska Village Electric Cooperative (AVEC) and the Alaska Power Company (APC) report non-fuel expenses for each community on an annual basis. When this occurs, we distribute the annual amount equally across each month of the data.
- The other generation types have been replaced with the consistent set of fuel types used in Catalyst Cooperative's [PUDL dataset](https://catalystcoop-pudl.readthedocs.io/en/stable/index.html), derived from EIA data.
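
To make the rules on this page concrete, the following is a minimal Python sketch of how a few of them could be applied to a single record. It is not the actual cleaning pipeline. The column names (`residential_rate`, `fuel_used_gal`, `diesel_kwh`, `intertied`, and so on) and the function names are hypothetical. The sketch also assumes that the 5 to 25 kWh/gal efficiency bounds are inclusive, and that an annual non-fuel expense is split over 12 months.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]

# Thresholds taken from the rules on this page.
RESIDENTIAL_RATE_MAX = 9.98  # $/kWh
DIESEL_EFF_MIN = 5.0         # kWh/gal (inclusive bound is an assumption)
DIESEL_EFF_MAX = 25.0        # kWh/gal (inclusive bound is an assumption)

# Hypothetical column names, grouped by the rule that applies to them.
STRICTLY_POSITIVE = ["pce_rate", "fuel_price", "fuel_used_gal", "fuel_cost"]
NON_NEGATIVE = ["effective_rate", "hydro_kwh"]


def _null_if(record: Record, key: str, is_invalid: Callable[[float], bool]) -> None:
    """Replace record[key] with None (NULL) when the value violates a rule."""
    value = record.get(key)
    if value is not None and is_invalid(value):
        record[key] = None


def clean_record(record: Record) -> Record:
    """Return a copy of the record with rule-violating values set to None."""
    cleaned = dict(record)

    # Residential rate must be greater than zero and at most $9.98/kWh.
    _null_if(cleaned, "residential_rate",
             lambda v: v <= 0 or v > RESIDENTIAL_RATE_MAX)

    for key in STRICTLY_POSITIVE:
        _null_if(cleaned, key, lambda v: v <= 0)
    for key in NON_NEGATIVE:
        _null_if(cleaned, key, lambda v: v < 0)

    # An out-of-range diesel efficiency also invalidates the fuel use and
    # fuel cost reported alongside it.
    eff = cleaned.get("diesel_efficiency")
    if eff is not None and not (DIESEL_EFF_MIN <= eff <= DIESEL_EFF_MAX):
        for key in ("diesel_efficiency", "fuel_used_gal", "fuel_cost"):
            if key in cleaned:
                cleaned[key] = None

    # Diesel generation may be zero only for intertied communities.
    intertied = bool(cleaned.get("intertied", False))
    _null_if(cleaned, "diesel_kwh",
             lambda v: v < 0 or (v == 0 and not intertied))

    return cleaned


def distribute_annual_expense(annual_total: float, months: int = 12) -> List[float]:
    """Split an annually reported expense equally across the given months."""
    return [annual_total / months] * months
```

For example, `clean_record({"residential_rate": -1.0, "diesel_kwh": 0.0, "intertied": True})` would null the residential rate but keep the zero diesel generation, because the community is intertied.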