database.resources.cleaning
===========================

.. py:module:: database.resources.cleaning

.. autoapi-nested-parse::

   This script contains data cleaning functions.


Functions
---------

.. autoapisummary::

   database.resources.cleaning.add_calendar_dates
   database.resources.cleaning.add_community_codes
   database.resources.cleaning.add_interties
   database.resources.cleaning.apply_cleaning_rules
   database.resources.cleaning.apply_pre_cleaning_rules
   database.resources.cleaning.clean_commercial_customers
   database.resources.cleaning.clean_commercial_sales
   database.resources.cleaning.clean_community_customers
   database.resources.cleaning.clean_community_sales
   database.resources.cleaning.clean_diesel_efficiency
   database.resources.cleaning.clean_diesel_generation
   database.resources.cleaning.clean_effective_rate
   database.resources.cleaning.clean_fuel_cost
   database.resources.cleaning.clean_fuel_price
   database.resources.cleaning.clean_fuel_use
   database.resources.cleaning.clean_government_customers
   database.resources.cleaning.clean_government_sales
   database.resources.cleaning.clean_hydropower
   database.resources.cleaning.clean_nonfuel_expenses
   database.resources.cleaning.clean_nonfuel_expenses_distribution
   database.resources.cleaning.clean_other_customers
   database.resources.cleaning.clean_other_gen_types
   database.resources.cleaning.clean_other_generation_1
   database.resources.cleaning.clean_other_generation_2
   database.resources.cleaning.clean_pce_eligible_commercial_sales
   database.resources.cleaning.clean_pce_eligible_sales
   database.resources.cleaning.clean_pce_rate
   database.resources.cleaning.clean_peak_consumption
   database.resources.cleaning.clean_powerhouse_consumption
   database.resources.cleaning.clean_residential_customers
   database.resources.cleaning.clean_residential_rates
   database.resources.cleaning.clean_residential_sales
   database.resources.cleaning.clean_unbilled_customers
   database.resources.cleaning.clean_unbilled_sales
   database.resources.cleaning.normalize_column_names
   database.resources.cleaning.reformat_interties
   database.resources.cleaning.remove_duplicates


Module Contents
---------------

.. py:function:: add_calendar_dates(dataframe: polars.DataFrame)

   Adds a calendar datetime column to the data.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: PCE data with calendar dates.
   :rtype: pl.DataFrame

   .. note:: Currently implements a manual fix for a data entry error.


.. py:function:: add_community_codes(pce_dataframe: polars.DataFrame, identity_dataframe: polars.DataFrame)

   Adds the community codes (PCE Reporter ID) to the dataframe.

   :param pce_dataframe: Original PCE data.
   :type pce_dataframe: pl.DataFrame
   :param identity_dataframe: Identity lookup table.
   :type identity_dataframe: pl.DataFrame

   :returns: PCE data with community codes.
   :rtype: pl.DataFrame


.. py:function:: add_interties(pce_dataframe: polars.DataFrame, intertie_dataframe: polars.DataFrame)

   Adds intertie data to the PCE data.

   :param pce_dataframe: Power Cost Equalization data.
   :type pce_dataframe: pl.DataFrame
   :param intertie_dataframe: Intertie data.
   :type intertie_dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with interties.
   :rtype: pl.DataFrame

   .. note::

      The PCE data should already have normalized column names and calendar dates.
      The intertie data should be the output of :func:`reformat_interties`.


.. py:function:: apply_cleaning_rules(dataframe: polars.DataFrame, interties: polars.DataFrame, gen_type_map: polars.DataFrame)

   Applies the data cleaning rules.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame
   :param interties: Intertie data.
   :type interties: pl.DataFrame
   :param gen_type_map: Lookup table mapping the original other generation type names to the standardized names.
   :type gen_type_map: pl.DataFrame

   :returns: Cleaned Power Cost Equalization data.
   :rtype: pl.DataFrame


.. py:function:: apply_pre_cleaning_rules(raw_pce_dataframe: polars.DataFrame, community_identity: polars.DataFrame, intertie_historical: polars.DataFrame, intertie_coordinates: polars.DataFrame)

   Applies the pre-cleaning rules to the original data.

   :param raw_pce_dataframe: Original PCE UMR data.
   :type raw_pce_dataframe: pl.DataFrame
   :param community_identity: Community-code lookup table.
   :type community_identity: pl.DataFrame
   :param intertie_historical: Historical intertie table.
   :type intertie_historical: pl.DataFrame
   :param intertie_coordinates: Intertie coordinates table.
   :type intertie_coordinates: pl.DataFrame

   :returns: PCE UMR data with column normalizations and .
   :rtype: pl.DataFrame


.. py:function:: clean_commercial_customers(dataframe: polars.DataFrame)

   Sets commercial customers to NULL based on the cleaning rule.
   Commercial customers cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned commercial customers.
   :rtype: pl.DataFrame


.. py:function:: clean_commercial_sales(dataframe: polars.DataFrame)

   Sets commercial sales to NULL based on the cleaning rule.
   Commercial sales cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned commercial sales.
   :rtype: pl.DataFrame


.. py:function:: clean_community_customers(dataframe: polars.DataFrame)

   Sets community facility customers to NULL based on the cleaning rule.
   Community facility customers cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned community facility customers.
   :rtype: pl.DataFrame


.. py:function:: clean_community_sales(dataframe: polars.DataFrame)

   Sets community sales to NULL based on the cleaning rule.
   Community sales cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned community sales.
   :rtype: pl.DataFrame

.. py:function:: clean_diesel_efficiency(dataframe: polars.DataFrame)

   Sets the diesel efficiency to NULL based on the cleaning rule.
   Diesel efficiency cannot be less than 5 kWh/gal or more than 25 kWh/gal.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned diesel efficiency.
   :rtype: pl.DataFrame


.. py:function:: clean_diesel_generation(dataframe: polars.DataFrame)

   Sets the diesel generation to NULL based on the cleaning rule.
   Diesel generation cannot be negative, and it can be zero only if the
   community is interconnected to another community.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned diesel generation.
   :rtype: pl.DataFrame


.. py:function:: clean_effective_rate(dataframe: polars.DataFrame)

   Sets the effective rate to NULL based on the cleaning rule.
   Effective rates cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned effective rates.
   :rtype: pl.DataFrame


.. py:function:: clean_fuel_cost(dataframe: polars.DataFrame)

   Sets the fuel cost to NULL based on the cleaning rule.
   Fuel cost cannot be less than or equal to zero if diesel efficiency is less
   than 5 kWh/gal or more than 25 kWh/gal.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned fuel cost.
   :rtype: pl.DataFrame


.. py:function:: clean_fuel_price(dataframe: polars.DataFrame)

   Sets the fuel price to NULL based on the cleaning rule.
   Fuel price cannot be less than or equal to zero.

   .. math:: \{x \in \mathbb{R} | 0 < x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned fuel prices.
   :rtype: pl.DataFrame

   .. note:: This needs to be updated to include lower and upper bound thresholds as well.


.. py:function:: clean_fuel_use(dataframe: polars.DataFrame)

   Sets the fuel use to NULL based on the cleaning rule.
   Fuel use cannot be less than or equal to zero if diesel efficiency is less
   than 5 kWh/gal or more than 25 kWh/gal.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned fuel use.
   :rtype: pl.DataFrame


.. py:function:: clean_government_customers(dataframe: polars.DataFrame)

   Sets government facility customers to NULL based on the cleaning rule.
   Government facility customers cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned government facility customers.
   :rtype: pl.DataFrame


.. py:function:: clean_government_sales(dataframe: polars.DataFrame)

   Sets government sales to NULL based on the cleaning rule.
   Government sales cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned government sales.
   :rtype: pl.DataFrame


.. py:function:: clean_hydropower(dataframe: polars.DataFrame)

   Sets hydropower generation to NULL based on the cleaning rule.
   Hydropower generation cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned hydropower generation.
   :rtype: pl.DataFrame


.. py:function:: clean_nonfuel_expenses(dataframe: polars.DataFrame)

   Sets the nonfuel expenses to NULL based on the cleaning rule.
   Nonfuel expenses cannot be less than or equal to zero.

   .. math:: \{x \in \mathbb{R} | 0 < x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned nonfuel expenses.
   :rtype: pl.DataFrame

.. py:function:: clean_nonfuel_expenses_distribution(dataframe: polars.DataFrame)

   Distributes the annualized non-fuel expenses for AVEC and APC.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with distributed annualized non-fuel expenses.
   :rtype: pl.DataFrame

   .. note::

      The Alaska Village Electric Cooperative (AVEC) and the Alaska Power Company (APC)
      report non-fuel expenses annually, and the annual value is recorded in June.
      The annual amount is distributed equally over the twelve months of the year.


.. py:function:: clean_other_customers(dataframe: polars.DataFrame)

   Sets other customers to NULL based on the cleaning rule.
   Other customers cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned other customers.
   :rtype: pl.DataFrame


.. py:function:: clean_other_gen_types(dataframe: polars.DataFrame, gen_type_map: polars.DataFrame)

   Corrects typos and inconsistencies in the other generation types by replacing
   them with a standard set that was defined by hand and stored as JSON.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame
   :param gen_type_map: Two-column lookup table of the original and standardized type names.
   :type gen_type_map: pl.DataFrame

   :returns: Power Cost Equalization data with standardized other generation types.
   :rtype: pl.DataFrame


.. py:function:: clean_other_generation_1(dataframe: polars.DataFrame)

   Sets other (1) generation to NULL based on the cleaning rule.
   Other generation cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned other (1) generation.
   :rtype: pl.DataFrame


.. py:function:: clean_other_generation_2(dataframe: polars.DataFrame)

   Sets other (2) generation to NULL based on the cleaning rule.
   Other generation cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned other (2) generation.
   :rtype: pl.DataFrame


.. py:function:: clean_pce_eligible_commercial_sales(dataframe: polars.DataFrame)

   Sets PCE eligible sales for commercial customers to zero based on the cleaning rule.
   PCE eligible sales for commercial customers must be zero.

   .. math:: \{x \in \mathbb{R} | x = 0 \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned PCE eligible sales to commercial customers.
   :rtype: pl.DataFrame


.. py:function:: clean_pce_eligible_sales(dataframe: polars.DataFrame)

   Sets PCE eligible sales to NULL based on the cleaning rule.
   PCE eligible sales cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned PCE eligible sales.
   :rtype: pl.DataFrame

   .. note:: Loops through the residential and community facility customer columns, as well as the total.


.. py:function:: clean_pce_rate(dataframe: polars.DataFrame)

   Sets the PCE rate to NULL based on the cleaning rule.
   Rates cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned PCE rates.
   :rtype: pl.DataFrame


.. py:function:: clean_peak_consumption(dataframe: polars.DataFrame)

   Sets the peak consumption to NULL based on the cleaning rule.
   Peak consumption cannot be less than or equal to zero.

   .. math:: \{x \in \mathbb{R} | 0 < x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned peak consumption.
   :rtype: pl.DataFrame


.. py:function:: clean_powerhouse_consumption(dataframe: polars.DataFrame)

   Sets the powerhouse consumption to NULL based on the cleaning rule.
   Powerhouse consumption cannot be less than or equal to zero.

   .. math:: \{x \in \mathbb{R} | 0 < x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned powerhouse consumption.
   :rtype: pl.DataFrame


.. py:function:: clean_residential_customers(dataframe: polars.DataFrame)

   Sets residential customers to NULL based on the cleaning rule.
   Residential customers cannot be less than or equal to zero.

   .. math:: \{x \in \mathbb{R} | 0 < x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned residential customers.
   :rtype: pl.DataFrame


.. py:function:: clean_residential_rates(dataframe: polars.DataFrame)

   Sets residential rates to NULL based on the cleaning rule.
   Rates cannot be less than or equal to zero, or greater than $9.98/kWh.

   .. math:: \{x \in \mathbb{R} | 0 < x \le 9.98 \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned residential rates.
   :rtype: pl.DataFrame


.. py:function:: clean_residential_sales(dataframe: polars.DataFrame)

   Sets residential sales to NULL based on the cleaning rule.
   Residential sales cannot be less than or equal to zero.

   .. math:: \{x \in \mathbb{R} | 0 < x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned residential sales.
   :rtype: pl.DataFrame


.. py:function:: clean_unbilled_customers(dataframe: polars.DataFrame)

   Sets unbilled customers to NULL based on the cleaning rule.
   Unbilled customers cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned unbilled customers.
   :rtype: pl.DataFrame


.. py:function:: clean_unbilled_sales(dataframe: polars.DataFrame)

   Sets unbilled sales to NULL based on the cleaning rule.
   Unbilled sales cannot be less than zero.

   .. math:: \{x \in \mathbb{R} | 0 \le x \}

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with cleaned unbilled sales.
   :rtype: pl.DataFrame


.. py:function:: normalize_column_names(dataframe: polars.DataFrame)

   Converts column headers to a standardized format.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: PCE data with normalized column names.
   :rtype: pl.DataFrame

   .. note:: Currently implements a manual fix for a column misspelling.


.. py:function:: reformat_interties(lookup: polars.DataFrame, coordinates: polars.DataFrame)

   Reformats the intertie lookup table for use in the database.

   :param lookup: Historical intertie lookup data.
   :type lookup: pl.DataFrame
   :param coordinates: Intertie-community coordinate lookup data.
   :type coordinates: pl.DataFrame

   :returns: Interconnection data better suited for integration.
   :rtype: pl.DataFrame

   .. note::

      The historical intertie lookup table was taken from the 2011-21 Alaska
      Electricity Trends Report. This table needs to be reformatted so that it
      can be integrated into the rest of the dataset.


.. py:function:: remove_duplicates(dataframe: polars.DataFrame)

   Removes duplicate records.

   :param dataframe: Power Cost Equalization data.
   :type dataframe: pl.DataFrame

   :returns: Power Cost Equalization data with no duplicate rows.
   :rtype: pl.DataFrame