Datasets
CceeOpenDataEnergyGc(ccee)
Class used for handling CCEE Energy at Gravity Center data. It can be accessed via ccee.opendata.energygc.
Values come from the "geracao_horaria_usina" dataset, which is available at https://dadosabertos.ccee.org.br/dataset/geracao_horaria_usina.
Previously this data was acquired from the "parcela_usina_montante_mensal" dataset, but it was moved to an hourly dataset to have more granularity.
Source code in echo_ons/ccee_root.py
def __init__(self, ccee: e_o.Ccee) -> None:
    """Base class that all subclasses should inherit from.

    Parameters
    ----------
    ccee : Ccee
        Top level object carrying all functionality and the connection handler.
    """
    self._ccee: e_o.Ccee = ccee
get(period, spes=None, ccee_names=None, columns='spe', get_details=False)
Get the Energy at Gravity Center.
Values are returned in MWavg per hour. The timestamp hour represents the start of the interval.
Most useful columns are:
- spe: SPE name / CCEE Name / Site code, depending on the columns parameter; will be the columns if get_details is False;
- timestamp: Timestamp of the value; Will be the index if get_details is False;
- value: GERACAO_CENTRO_GRAVIDADE;
- internal_loss_factor: FATOR_PERDA_INTERNA (only acquired if get_details is True);
- shared_loss_factor: FATOR_RATEIO_PERDA_GERACAO (only acquired if get_details is True);
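To illustrate the interval-start convention: the source of get derives each timestamp from the dataset's DATA field (dd/mm/YYYY) and the PERIODO_COMERCIALIZACAO counter, which the source comments describe as the hour of the month. The following is a minimal standard-library sketch of that mapping. to_timestamp is a hypothetical helper written for illustration and is not part of echo_ons.

```python
from datetime import datetime, timedelta


def to_timestamp(data: str, periodo: int) -> datetime:
    """Combine a DATA string (dd/mm/YYYY) with a 1-based hourly period number."""
    # period 1 -> hour 0, period 24 -> hour 23, period 25 -> hour 0 of the next day's rows
    hour_of_day = (periodo - 1) % 24
    day = datetime.strptime(data, "%d/%m/%Y")
    return day + timedelta(hours=hour_of_day)


print(to_timestamp("02/01/2024", 25))  # 2024-01-02 00:00:00
print(to_timestamp("02/01/2024", 48))  # 2024-01-02 23:00:00
```

A value stamped 23:00 therefore covers the interval from 23:00 to 24:00 of that day.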
Parameters:
- period (DateTimeRange) – Period to get the data for. The start is moved to 00:00:00 and the end to 23:59:59 of their respective days. The dataset is split into monthly resources, so every month touched by the period is read and the result is then filtered to the requested range.
- spes (list[str] | None, default: None) – List of SPEs to filter by. It is the name as registered in the performance database. If set to None will get all SPEs, by default None.
  If both spes and ccee_names are None, will get all SPEs.
  This cannot be used with ccee_names at the same time.
  If spes is used, the SPEs must have the attribute ccee_spe_name set with the correct value in the performance database.
- ccee_names (list[str] | None, default: None) – List of CCEE Names (SIGLA_USINA) to filter by. This will search for the name in the data, allowing to get data for SPEs that are not in the performance database. If set to None will get all CCEE Names, by default None.
  This cannot be used with spes at the same time.
- columns (Literal['spe', 'ccee_name', 'cod_ativo'], default: 'spe') – What value will be used as column names in the DataFrame. Options are:
  - "spe": SPE name as in the performance database. Only available if spes is not None.
  - "ccee_name": CCEE Name (SIGLA_USINA).
  - "cod_ativo": Site code (CODIGO_PARCELA_USINA).
  Not available if get_details is True.
- get_details (bool, default: False) – Whether to get the details columns (internal_loss_factor and shared_loss_factor). By default False.
  If set to True, columns will be spe, timestamp, value, internal_loss_factor and shared_loss_factor.
Returns:
- DataFrame – DataFrame with the data.
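The two return shapes can be pictured with plain Python containers. With get_details=True the result stays in long form (one row per SPE and timestamp), while with get_details=False it is pivoted so that timestamps index the rows and each SPE becomes a column. The sketch below only mimics that reshaping. The records and the to_wide helper are hypothetical, and the real method returns a pandas DataFrame.

```python
from datetime import datetime

# Hypothetical long-format records: (spe, timestamp, value in MWavg)
records = [
    ("SPE_A", datetime(2024, 1, 1, 0), 10.0),
    ("SPE_B", datetime(2024, 1, 1, 0), 4.5),
    ("SPE_A", datetime(2024, 1, 1, 1), 11.0),
]


def to_wide(rows):
    """Pivot (spe, timestamp, value) rows into {timestamp: {spe: value}}."""
    wide = {}
    for spe, ts, value in rows:
        wide.setdefault(ts, {})[spe] = value
    return wide


wide = to_wide(records)
print(wide[datetime(2024, 1, 1, 0)])  # {'SPE_A': 10.0, 'SPE_B': 4.5}
print(wide[datetime(2024, 1, 1, 1)])  # {'SPE_A': 11.0}
```

In this sketch SPE_B is simply absent from the second hour. In a pivoted DataFrame the same gap would appear as a missing value in the SPE_B column.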
Source code in echo_ons/ccee_opendata_energy_gc.py
@validate_call
def get(
    self,
    period: DateTimeRange,
    spes: list[str] | None = None,
    ccee_names: list[str] | None = None,
    columns: Literal["spe", "ccee_name", "cod_ativo"] = "spe",
    get_details: bool = False,
) -> pd.DataFrame:
    """Get the Energy at Gravity Center.

    Values are returned in MWavg per hour. The timestamp hour represents the start of the interval.

    Most useful columns are:

    - spe: SPE name / CCEE Name / Site code, depending on the `columns` parameter; will be the columns if `get_details` is False;
    - timestamp: Timestamp of the value; Will be the index if `get_details` is False;
    - value: GERACAO_CENTRO_GRAVIDADE;
    - internal_loss_factor: FATOR_PERDA_INTERNA (only acquired if get_details is True);
    - shared_loss_factor: FATOR_RATEIO_PERDA_GERACAO (only acquired if get_details is True);

    Parameters
    ----------
    period : DateTimeRange
        Period to get the data for. The start is moved to 00:00:00 and the end to 23:59:59 of their respective days. The dataset is split into monthly resources, so every month touched by the period is read and the result is then filtered to the requested range.
    spes : list[str] | None, optional
        List of SPEs to filter by. It is the name as registered in the performance database. If set to None will get all SPEs, by default None

        If both `spes` and `ccee_names` are None, will get all SPEs.

        This cannot be used with `ccee_names` at the same time.

        If `spes` is used, the SPEs must have the attribute `ccee_spe_name` set with the correct value in the performance database.
    ccee_names : list[str] | None, optional
        List of CCEE Names (SIGLA_USINA) to filter by. This will search for the name in the data, allowing to get data for SPEs that are not in the performance database. If set to None will get all CCEE Names, by default None

        This cannot be used with `spes` at the same time.
    columns : Literal["spe", "ccee_name", "cod_ativo"], optional
        What value will be used as column names in the DataFrame. Options are:

        - "spe": SPE name as in the performance database. Only available if `spes` is not None.
        - "ccee_name": CCEE Name (SIGLA_USINA)
        - "cod_ativo": Site code (CODIGO_PARCELA_USINA).

        Not available if `get_details` is True.
    get_details : bool, optional
        Whether to get the details columns (internal_loss_factor and shared_loss_factor). By default False

        If set to True, columns will be spe, timestamp, value, internal_loss_factor and shared_loss_factor.

    Returns
    -------
    DataFrame
        DataFrame with the data.
    """
    # normalizing inputs to whole days
    period.start = period.start.replace(hour=0, minute=0, second=0)
    period.end = period.end.replace(hour=23, minute=59, second=59)

    # getting SPEs in case spes and ccee_names are None
    if spes is None and ccee_names is None:
        spes = self._ccee._perfdb.objects.instances.get(  # noqa: SLF001
            object_models=["wind_farm", "solar_farm"],
            output_type="DataFrame",
            get_attributes=True,
            attribute_names=["ccee_spe_name"],
        )
        # checking if any SPE does not have ccee_spe_name
        if spes["ccee_spe_name"].isna().any():
            wrong_spes = spes[spes["ccee_spe_name"].isna()].index.to_list()
            logger.warning(f"The following SPEs do not have the ccee_spe_name attribute: {wrong_spes}")
        spes = spes[spes["ccee_spe_name"].notna()].index.to_list()

    if columns == "spe" and spes is None:
        raise ValueError("Cannot use 'spe' as columns if spes is None.")

    # getting the CCEE names from the database in case spes is not None
    if spes is not None:
        objs = self._ccee._perfdb.objects.instances.get(  # noqa: SLF001
            object_names=spes,
            object_models=["wind_farm", "solar_farm"],
            get_attributes=True,
            output_type="DataFrame",
            attribute_names=["ccee_spe_name"],
        )
        # validating that the SPEs exist
        wrong_spes = set(spes) - set(objs.index)
        if wrong_spes:
            raise ValueError(f"The following SPEs do not exist in the performance database: {wrong_spes}")
        # validating that all SPEs have ccee_spe_name attribute
        if "ccee_spe_name" not in objs.columns:
            raise ValueError("The SPEs do not have the ccee_spe_name attribute.")
        if objs["ccee_spe_name"].isna().any():
            wrong_spes = objs[objs["ccee_spe_name"].isna()].index.to_list()
            raise ValueError(f"The following SPEs do not have the ccee_spe_name attribute: {wrong_spes}")
        # getting the CCEE names
        ccee_names = objs["ccee_spe_name"].to_list()

    # getting list of resource names that will be used based on period (one resource per month, suffix YYYYMM)
    resource_names = list(self._ccee.opendata.datasets.resources.get(dataset_name="geracao_horaria_usina").keys())
    wanted_months = period.split_multiple(separator=relativedelta(months=1), normalize=True)
    wanted_months = [f"{subperiod.start.year}{subperiod.start.month:02d}" for subperiod in wanted_months]
    resource_names = [resource_name for resource_name in resource_names if resource_name[-6:] in wanted_months]

    # getting the df
    df = self._ccee.opendata.datasets.resources.values.get(
        resource_names=resource_names,
        dataset_name="geracao_horaria_usina",
        filters={"SIGLA_USINA": ccee_names},
    )

    # checking if found data for all the ccee_names
    missing_ccee_names = set(ccee_names) - set(df["SIGLA_USINA"].unique())
    if missing_ccee_names:
        # getting the SPEs that are missing
        if spes is not None:
            spes_with_missing_ccee_names = objs[objs["ccee_spe_name"].isin(missing_ccee_names)]
            logger.warning(
                f"The following SPEs were not found in the data: {spes_with_missing_ccee_names.index.to_list()} - {missing_ccee_names}",
            )
        else:
            logger.warning(f"The following CCEE names were not found in the data: {missing_ccee_names}")

    columns_mapping = {
        "spe": "SIGLA_USINA",
        "ccee_name": "SIGLA_USINA",
        "cod_ativo": "CODIGO_PARCELA_USINA",
    }
    df = df[
        [
            columns_mapping[columns],
            "DATA",
            "PERIODO_COMERCIALIZACAO",
            "GERACAO_CENTRO_GRAVIDADE",
            "FATOR_PERDA_INTERNA",
            "FATOR_RATEIO_PERDA_GERACAO",
        ]
    ]
    if not get_details:
        df = df.drop(columns=["FATOR_PERDA_INTERNA", "FATOR_RATEIO_PERDA_GERACAO"])

    # in case columns is "spe", changing the values of the SIGLA_USINA column to the SPE names
    if columns == "spe":
        ccee_name_remap = objs["ccee_spe_name"].to_dict()
        ccee_name_remap = {v: k for k, v in ccee_name_remap.items()}
        df["SIGLA_USINA"] = df["SIGLA_USINA"].map(ccee_name_remap)

    # converting PERIODO_COMERCIALIZACAO so that it represents the hour of the day (currently it's the hour of the month)
    df["PERIODO_COMERCIALIZACAO"] = (df["PERIODO_COMERCIALIZACAO"] - 1) % 24  # 1-based hour of month -> hour of day (0-23)
    # creating a date column in string format to later convert to datetime
    df["timestamp"] = df["DATA"].astype(str) + df["PERIODO_COMERCIALIZACAO"].astype(str).str.zfill(2)
    df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d/%m/%Y%H")
    # dropping unwanted columns
    df = df.drop(columns=["DATA", "PERIODO_COMERCIALIZACAO"])

    # renaming columns
    col_remap = {
        columns_mapping[columns]: "spe",
        "GERACAO_CENTRO_GRAVIDADE": "value",
        "FATOR_PERDA_INTERNA": "internal_loss_factor",
        "FATOR_RATEIO_PERDA_GERACAO": "shared_loss_factor",
    }
    df = df.rename(columns=col_remap)

    # dropping unwanted dates
    df = df[(df["timestamp"] >= period.start) & (df["timestamp"] <= period.end)]

    # pivoting the df
    if not get_details:
        df = df.pivot(index="timestamp", columns="spe", values="value")
        # removing column index name
        df.columns.name = None

    # checking if found the correct amount of spes/ccee_names
    if not get_details:
        wanted_cols = len(ccee_names) if spes is None else len(spes)
        if len(df.columns) != wanted_cols:
            logger.warning(f"Found data for {len(df.columns)} SPEs/CCEE Names, but was expecting data for {wanted_cols}.")

    # sorting columns in case get_details is True
    if get_details:
        df = df[["spe", "timestamp", "value", "internal_loss_factor", "shared_loss_factor"]]

    # returning the df
    return df
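The resource selection step in the get source above keeps only the monthly resources whose names end in a YYYYMM suffix matching a month touched by the requested period. A rough standard-library sketch of that idea follows. It replaces period.split_multiple with a plain loop, and both months_between and the resource names are hypothetical, chosen only to show the suffix match.

```python
from datetime import date


def months_between(start: date, end: date) -> list[str]:
    """Return every calendar month touched by [start, end] as 'YYYYMM' strings."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year}{month:02d}")
        # roll over to January of the next year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


# Hypothetical resource names; only the trailing YYYYMM suffix matters here
resource_names = [
    "geracao_horaria_usina_202311",
    "geracao_horaria_usina_202312",
    "geracao_horaria_usina_202401",
    "geracao_horaria_usina_202402",
]

wanted = months_between(date(2023, 12, 15), date(2024, 1, 10))
selected = [name for name in resource_names if name[-6:] in wanted]
print(wanted)    # ['202312', '202401']
print(selected)  # ['geracao_horaria_usina_202312', 'geracao_horaria_usina_202401']
```

Because whole monthly resources are read, the later filter on the timestamp column is what trims the result to the exact requested days.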
import_database(period, spes=None, on_conflict='ignore')
Imports the CCEE Energy at Gravity Center data for a given period to the database.
The values acquired from the CCEE API are hourly averages in MW. They are converted to kW and stored as hourly ActivePowerGC_1h.AVG feature values on each SPE's power meter object (the SPE name with the "-SMF1" suffix). The hourly values are then summed per day to obtain the daily energy in kWh, which is stored with the "Gravity Center" measurement point.
Parameters:
- period (DateTimeRange) – Desired period to import the data for. The start and end are adjusted to the beginning and end of their respective days.
- spes (list[str] | None, default: None) – List of SPEs to import the data for. If set to None all will be imported. By default None.
- on_conflict (Literal['ignore', 'update'], default: 'ignore') – What to do in case of conflict. Can be one of ["ignore", "update"]. By default "ignore".
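The unit handling in import_database can be checked with a small calculation. An average power in kW sustained over one hour equals the same number of kWh, so summing the hourly kW values of a day gives that day's energy in kWh. The sketch below uses made-up hourly values for a single SPE and plain Python instead of the pandas resample used in the source.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly values in MWavg for one SPE
hourly_mw = {
    datetime(2024, 1, 1, 0): 10.0,
    datetime(2024, 1, 1, 1): 12.0,
    datetime(2024, 1, 2, 0): 8.0,
}

# MW -> kW; each value is the average power over a one-hour interval
hourly_kw = {ts: mw * 1000 for ts, mw in hourly_mw.items()}

# average kW over one hour equals kWh for that hour, so a daily sum gives kWh per day
daily_kwh = defaultdict(float)
for ts, kw in hourly_kw.items():
    daily_kwh[ts.date()] += kw

for day, energy in sorted(daily_kwh.items()):
    print(day, energy)
# 2024-01-01 22000.0
# 2024-01-02 8000.0
```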
Source code in echo_ons/ccee_opendata_energy_gc.py
@validate_call
def import_database(
    self,
    period: DateTimeRange,
    spes: list[str] | None = None,
    on_conflict: Literal["ignore", "update"] = "ignore",
) -> None:
    """Imports the CCEE Energy at Gravity Center data for a given period to the database.

    The values acquired from the CCEE API are hourly averages in MW. They are converted to kW and stored as hourly ActivePowerGC_1h.AVG feature values on each SPE's power meter object (the SPE name with the "-SMF1" suffix). The hourly values are then summed per day to obtain the daily energy in kWh, which is stored with the "Gravity Center" measurement point.

    Parameters
    ----------
    period : DateTimeRange
        Desired period to import the data for. The start and end are adjusted to the beginning and end of their respective days.
    spes : list[str] | None, optional
        List of SPEs to import the data for. If set to None all will be imported. By default None
    on_conflict : Literal["ignore", "update"], optional
        What to do in case of conflict. Can be one of ["ignore", "update"].
        By default "ignore"
    """
    # getting the possible SPEs to import
    all_spes = list(self._ccee._perfdb.objects.instances.get(object_models=["wind_farm", "solar_farm"]).keys())  # noqa: SLF001
    if spes is not None and not set(spes).issubset(all_spes):
        wrong_spes = set(spes) - set(all_spes)
        raise ValueError(f"The following SPEs do not exist in the performance database: {wrong_spes}")
    if spes is None:
        spes = all_spes

    # getting the data
    df = self.get(period, spes=spes)

    # skipping if no data was found
    if df.empty:
        logger.warning(f"No data found for the period {period}.")
        return

    # converting the values to kW
    df = df * 1000  # converting from MW to kW

    # creating the DataFrame to save ActivePowerGC_1h.AVG features in the database
    features_df = df.copy()
    features_df.index.name = "timestamp"
    # adding -SMF1 to the SPE names to convert to the power meter object names
    features_df.columns = [f"{spe}-SMF1" for spe in features_df.columns]
    # converting columns to a MultiIndex with the feature name
    features_df.columns = pd.MultiIndex.from_product(
        [features_df.columns, ["ActivePowerGC_1h.AVG"]],
        names=["object_name", "feature_name"],
    )
    # inserting into the database
    self._ccee._perfdb.features.values.series.insert(  # noqa: SLF001
        df=features_df,
        on_conflict=on_conflict,
    )

    # now calculating daily production in kWh (sum of hourly average kW over each day)
    df = df.resample("D").sum()
    # converting the DataFrame to have columns object_name, date, measurement_point (Gravity Center), energy
    df.index.name = "date"
    df = df.reset_index()
    df = df.melt(id_vars="date", var_name="object_name", value_name="energy")
    df["measurement_point"] = "Gravity Center"
    # dropping rows with NaN values in the energy column
    df = df.dropna(subset=["energy"])

    # saving the data to the database
    self._ccee._perfdb.kpis.energy.values.insert(df=df, on_conflict=on_conflict)  # noqa: SLF001

    imported_period = DateTimeRange(df["date"].min(), df["date"].max())
    logger.info(f"Imported CCEE Energy at Gravity Center data for the period {imported_period} to the database. SPEs: {spes}")