database.resources.cleaning
This script contains data cleaning functions.
Functions
| add_calendar_dates | Adds calendar datetime object to the data. |
| add_community_codes | Adds the community codes (PCE Reporter ID) to the dataframe. |
| add_interties | Adds intertie data to the PCE data. |
| apply_cleaning_rules | Applies the data cleaning rules. |
| apply_pre_cleaning_rules | Applies the pre-cleaning rules for the original data. |
| clean_commercial_customers | Sets commercial customers to NULL based on cleaning rule. |
| clean_commercial_sales | Sets commercial sales to NULL based on cleaning rule. |
| clean_community_customers | Sets community facility customers to NULL based on cleaning rule. |
| clean_community_sales | Sets community sales to NULL based on cleaning rule. |
| clean_diesel_efficiency | Sets the diesel efficiency to NULL based on cleaning rule. |
| clean_diesel_generation | Sets the diesel generation to NULL based on cleaning rule. |
| clean_effective_rate | Sets the effective rate to NULL based on cleaning rule. |
| clean_fuel_cost | Sets the fuel cost to NULL based on cleaning rule. |
| clean_fuel_price | Sets the fuel price to NULL based on cleaning rule. |
| clean_fuel_use | Sets the fuel use to NULL based on cleaning rule. |
| clean_government_customers | Sets government facility customers to NULL based on cleaning rule. |
| clean_government_sales | Sets government sales to NULL based on cleaning rule. |
| clean_hydropower | Sets hydropower generation to NULL based on cleaning rule. |
| clean_nonfuel_expenses | Sets the nonfuel expenses to NULL based on cleaning rule. |
| clean_nonfuel_expenses_distribution | Distributes the annualized non-fuel expenses for AVEC and APC. |
| clean_other_customers | Sets other customers to NULL based on cleaning rule. |
| clean_other_gen_types | Corrects typos and inconsistencies in the other generation types by replacing them with a standard set which was defined by hand and stored as JSON. |
| clean_other_generation_1 | Sets other (1) generation to NULL based on cleaning rule. |
| clean_other_generation_2 | Sets other (2) generation to NULL based on cleaning rule. |
| clean_pce_eligible_commercial_sales | Sets PCE eligible sales for commercial customers to zero based on cleaning rule. |
| clean_pce_eligible_sales | Sets PCE eligible sales to NULL based on cleaning rule. |
| clean_pce_rate | Sets the PCE rate to NULL based on cleaning rule. |
| clean_peak_consumption | Sets the peak consumption to NULL based on cleaning rule. |
| clean_powerhouse_consumption | Sets the powerhouse consumption to NULL based on cleaning rule. |
| clean_residential_customers | Sets residential customers to NULL based on cleaning rule. |
| clean_residential_rates | Sets residential rates to NULL based on cleaning rule. |
| clean_residential_sales | Sets residential sales to NULL based on cleaning rule. |
| clean_unbilled_customers | Sets unbilled customers to NULL based on cleaning rule. |
| clean_unbilled_sales | Sets unbilled sales to NULL based on cleaning rule. |
| normalize_column_names | Converts column headers to a standardized format. |
| reformat_interties | Reformats the intertie lookup table for use in the database. |
| remove_duplicates | Removes duplicate records. |
Module Contents
- database.resources.cleaning.add_calendar_dates(dataframe: polars.DataFrame)
Adds calendar datetime object to the data.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
PCE data with calendar dates.
- Return type:
pl.DataFrame
Note
Currently implements a manual fix for a data entry error.
- database.resources.cleaning.add_community_codes(pce_dataframe: polars.DataFrame, identity_dataframe: polars.DataFrame)
Adds the community codes (PCE Reporter ID) to the dataframe.
- Parameters:
pce_dataframe (pl.DataFrame) – Original PCE data.
identity_dataframe (pl.DataFrame) – Identity lookup table.
- Returns:
PCE data with community codes.
- Return type:
pl.DataFrame
- database.resources.cleaning.add_interties(pce_dataframe: polars.DataFrame, intertie_dataframe: polars.DataFrame)
Adds intertie data to the PCE data.
- Parameters:
pce_dataframe (pl.DataFrame) – Power Cost Equalization data.
intertie_dataframe (pl.DataFrame) – Intertie data.
- Returns:
Power Cost Equalization data with interties.
- Return type:
pl.DataFrame
Note
PCE data should have normalized column names and calendar dates. Intertie data should be the output of reformat_interties().
- database.resources.cleaning.apply_cleaning_rules(dataframe: polars.DataFrame, interties: polars.DataFrame, gen_type_map: polars.DataFrame)
Applies the data cleaning rules.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
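interties (pl.DataFrame) – Intertie data.
gen_type_map (pl.DataFrame) – Two-column table mapping original generation type names to standardized names.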
- Returns:
Cleaned Power Cost Equalization data.
- Return type:
pl.DataFrame
- database.resources.cleaning.apply_pre_cleaning_rules(raw_pce_dataframe: polars.DataFrame, community_identity: polars.DataFrame, intertie_historical: polars.DataFrame, intertie_coordinates: polars.DataFrame)
Applies the pre-cleaning rules for the original data.
- Parameters:
raw_pce_dataframe (pl.DataFrame) – Original PCE UMR data.
community_identity (pl.DataFrame) – Community-code lookup table.
intertie_historical (pl.DataFrame) – Historical intertie table.
intertie_coordinates (pl.DataFrame) – Coordinates for interties table.
- Returns:
PCE UMR data with normalized column names and the other pre-cleaning steps applied.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_commercial_customers(dataframe: polars.DataFrame)
Sets commercial customers to NULL based on cleaning rule.
- Commercial customers cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned commercial customers.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_commercial_sales(dataframe: polars.DataFrame)
Sets commercial sales to NULL based on cleaning rule.
- Commercial sales cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned commercial sales.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_community_customers(dataframe: polars.DataFrame)
Sets community facility customers to NULL based on cleaning rule.
- Community facility customers cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned community facility customers.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_community_sales(dataframe: polars.DataFrame)
Sets community sales to NULL based on cleaning rule.
- Community sales cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned community sales.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_diesel_efficiency(dataframe: polars.DataFrame)
Sets the diesel efficiency to NULL based on cleaning rule.
Diesel efficiency cannot be less than 5 kWh/gal or more than 25 kWh/gal.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned diesel efficiency.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_diesel_generation(dataframe: polars.DataFrame)
Sets the diesel generation to NULL based on cleaning rule.
Diesel generation cannot be negative. It can be zero only if the community is interconnected to another community; otherwise it cannot be less than or equal to zero.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned diesel generation.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_effective_rate(dataframe: polars.DataFrame)
Sets the effective rate to NULL based on cleaning rule.
Effective rates cannot be less than zero.
\[\{x \in \mathbb{R} \mid 0 \le x \}\]
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned effective rates.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_fuel_cost(dataframe: polars.DataFrame)
Sets the fuel cost to NULL based on cleaning rule.
Fuel cost cannot be less than or equal to zero if diesel efficiency is less than 5 kWh/gal or more than 25 kWh/gal.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned fuel cost.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_fuel_price(dataframe: polars.DataFrame)
Sets the fuel price to NULL based on cleaning rule.
- Fuel price cannot be less than or equal to zero.
0 < x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned fuel prices.
- Return type:
pl.DataFrame
Note
This needs to be updated to include lower and upper bound thresholds as well.
- database.resources.cleaning.clean_fuel_use(dataframe: polars.DataFrame)
Sets the fuel use to NULL based on cleaning rule.
Fuel use cannot be less than or equal to zero if diesel efficiency is less than 5 kWh/gal or more than 25 kWh/gal.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned fuel use.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_government_customers(dataframe: polars.DataFrame)
Sets government facility customers to NULL based on cleaning rule.
- Government facility customers cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned government facility customers.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_government_sales(dataframe: polars.DataFrame)
Sets government sales to NULL based on cleaning rule.
- Government sales cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned government sales.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_hydropower(dataframe: polars.DataFrame)
Sets hydropower generation to NULL based on cleaning rule.
- Hydropower generation cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned hydropower generation.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_nonfuel_expenses(dataframe: polars.DataFrame)
Sets the nonfuel expenses to NULL based on cleaning rule.
Nonfuel expenses cannot be less than or equal to zero.
\[\{x \in \mathbb{R} \mid 0 < x \}\]
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned nonfuel expenses.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_nonfuel_expenses_distribution(dataframe: polars.DataFrame)
Distributes the annualized non-fuel expenses for AVEC and APC.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with distributed annualized non-fuel expenses.
- Return type:
pl.DataFrame
Note
The Alaska Village Electric Cooperative and the Alaska Power Company report non-fuel expenses annually and the annual value is recorded in the month of June. The annual amount is distributed equally over twelve months of the year.
- database.resources.cleaning.clean_other_customers(dataframe: polars.DataFrame)
Sets other customers to NULL based on cleaning rule.
- Other customers cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned other customers.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_other_gen_types(dataframe: polars.DataFrame, gen_type_map: polars.DataFrame)
Corrects typos and inconsistencies in the other generation types by replacing them with a standard set which was defined by hand and stored as JSON.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
gen_type_map (pl.DataFrame) – Two-column table mapping original names to transformed names.
- Returns:
Power Cost Equalization data with standardized other generation types.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_other_generation_1(dataframe: polars.DataFrame)
Sets other (1) generation to NULL based on cleaning rule.
- Other generation cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned other (1) generation.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_other_generation_2(dataframe: polars.DataFrame)
Sets other (2) generation to NULL based on cleaning rule.
- Other generation cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned other (2) generation.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_pce_eligible_commercial_sales(dataframe: polars.DataFrame)
Sets PCE eligible sales for commercial customers to zero based on cleaning rule.
- PCE eligible sales for commercial customers must be zero.
0 == x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned PCE eligible sales for commercial customers.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_pce_eligible_sales(dataframe: polars.DataFrame)
Sets PCE eligible sales to NULL based on cleaning rule.
- PCE eligible sales cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned PCE eligible sales.
- Return type:
pl.DataFrame
Note
Applies the rule to PCE eligible sales for residential and community facility customers, and to the total.
- database.resources.cleaning.clean_pce_rate(dataframe: polars.DataFrame)
Sets the PCE rate to NULL based on cleaning rule.
Rates cannot be less than zero.
\[\{x \in \mathbb{R} \mid 0 \le x \}\]
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned PCE rates.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_peak_consumption(dataframe: polars.DataFrame)
Sets the peak consumption to NULL based on cleaning rule.
- Peak consumption cannot be less than or equal to zero.
0 < x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned peak consumption.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_powerhouse_consumption(dataframe: polars.DataFrame)
Sets the powerhouse consumption to NULL based on cleaning rule.
- Powerhouse consumption cannot be less than or equal to zero.
0 < x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned powerhouse consumption.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_residential_customers(dataframe: polars.DataFrame)
Sets residential customers to NULL based on cleaning rule.
- Residential customers cannot be less than or equal to zero.
0 < x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned residential customers.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_residential_rates(dataframe: polars.DataFrame)
Sets residential rates to NULL based on cleaning rule.
Rates cannot be less than or equal to zero, or larger than $9.98/kWh.
\[\{x \in \mathbb{R} \mid 0 < x \le 9.98 \}\]
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned residential rates.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_residential_sales(dataframe: polars.DataFrame)
Sets residential sales to NULL based on cleaning rule.
- Residential sales cannot be less than or equal to zero.
0 < x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned residential sales.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_unbilled_customers(dataframe: polars.DataFrame)
Sets unbilled customers to NULL based on cleaning rule.
- Unbilled customers cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned unbilled customers.
- Return type:
pl.DataFrame
- database.resources.cleaning.clean_unbilled_sales(dataframe: polars.DataFrame)
Sets unbilled sales to NULL based on cleaning rule.
- Unbilled sales cannot be less than zero.
0 <= x
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with cleaned unbilled sales.
- Return type:
pl.DataFrame
- database.resources.cleaning.normalize_column_names(dataframe: polars.DataFrame)
Converts column headers to a standardized format.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
PCE data with normalized column names.
- Return type:
pl.DataFrame
Note
Currently implements a manual fix for a misspelled column name.
- database.resources.cleaning.reformat_interties(lookup: polars.DataFrame, coordinates: polars.DataFrame)
Reformats the intertie lookup table for use in the database.
- Parameters:
lookup (pl.DataFrame) – Historical intertie lookup data.
coordinates (pl.DataFrame) – Intertie-community coordinate lookup data.
- Returns:
Interconnection data better suited for integration.
- Return type:
pl.DataFrame
Note
The historical intertie lookup table was taken from the 2011-21 Alaska Electricity Trends Report. This table needs to be reformatted so that it may be integrated further into the dataset.
- database.resources.cleaning.remove_duplicates(dataframe: polars.DataFrame)
Removes duplicate records.
- Parameters:
dataframe (pl.DataFrame) – Power Cost Equalization data.
- Returns:
Power Cost Equalization data with no duplicate rows.
- Return type:
pl.DataFrame