Datasets¶

`CceeOpenDataEnergyGc(ccee)` ¶

Class used for handling CCEE Energy at Gravity Center. Can be accessed via ccee.opendata.energygc.

Values come from the "geracao_horaria_usina" dataset, which is available at https://dadosabertos.ccee.org.br/dataset/geracao_horaria_usina.

Previously this data was acquired from the "parcela_usina_montante_mensal" dataset, but we moved to an hourly dataset to get more granularity.

Source code in echo_ons/ccee_root.py

def __init__(self, ccee: e_o.Ccee) -> None:
    """Base class that all subclasses should inherit from.

    Parameters
    ----------
    ccee : Ccee
        Top level object carrying all functionality and the connection handler.
    """
    self._ccee: e_o.Ccee = ccee

`get(period, spes=None, ccee_names=None, columns='spe', get_details=False)` ¶

Get the Energy at Gravity Center.

Values are returned in MWavg per hour. The hour in each timestamp marks the start of the interval.
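As a minimal sketch of how an hourly timestamp can be derived, the snippet below mirrors the `(PERIODO_COMERCIALIZACAO - 1) % 24` step visible in the source further down, using only the standard library. The helper name `to_timestamp` and the sample values are illustrative assumptions, not part of echo_ons.

```python
from datetime import datetime


def to_timestamp(data: str, periodo: int) -> datetime:
    """Combine a DATA string (dd/mm/YYYY) with a 1-based period counter.

    Hypothetical helper for illustration only.
    """
    # Map the 1-based counter to an hour of the day: 1 -> 0, 24 -> 23, 25 -> 0
    hour = (periodo - 1) % 24
    return datetime.strptime(f"{data}{hour:02d}", "%d/%m/%Y%H")


print(to_timestamp("01/01/2024", 1))
print(to_timestamp("02/01/2024", 48))
```

The resulting timestamp is the start of the hour, so a value stamped 23:00 covers the interval from 23:00 to midnight.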

Most useful columns are:

- spe: SPE name / CCEE Name / Site code, depending on the columns parameter; will be the columns if get_details is False;
- timestamp: Timestamp of the value; will be the index if get_details is False;
- value: GERACAO_CENTRO_GRAVIDADE;
- internal_loss_factor: FATOR_PERDA_INTERNA (only acquired if get_details is True);
- shared_loss_factor: FATOR_RATEIO_PERDA_GERACAO (only acquired if get_details is True).

Parameters:

period ¶
(DateTimeRange) –

Period to get the data for. The start is moved to 00:00:00 of its day and the end to 23:59:59 of its day; the monthly resource files covering the period are read and then trimmed to this range.
spes ¶
(list[str] | None, default: None ) –

List of SPEs to filter by. It is the name as registered in performance database. If set to None will get all SPEs, by default None

If both spes and ccee_names are None, will get all SPEs.

This cannot be used with ccee_names at the same time.

If spes is used, the SPEs must have the attribute ccee_spe_name set to the correct value in the performance database.
ccee_names ¶
(list[str] | None, default: None ) –

List of CCEE Names (SIGLA_USINA) to filter by. This will search for the name in the data, which allows getting data for SPEs that are not in the performance database. If set to None will get all CCEE Names, by default None

This cannot be used with spes at the same time.
columns ¶
(Literal['spe', 'ccee_name', 'cod_ativo'], default: 'spe' ) –

What value will be used as column names in the DataFrame. Options are:

- "spe": SPE name as in the performance database. Only available if spes is not None.
- "ccee_name": CCEE Name (SIGLA_USINA).
- "cod_ativo": Site code (CODIGO_PARCELA_USINA).

Not available if get_details is True.
get_details ¶
(bool, default: False ) –

Whether to get the details columns (internal_loss_factor and shared_loss_factor). By default False

If set to True, columns will be value, internal_loss_factor and shared_loss_factor.
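The rules above on how spes, ccee_names and columns interact can be sketched as a small check. `check_arguments` is a hypothetical helper, not part of echo_ons; the first check reflects the documented rule that the two filters are mutually exclusive, which the source listed on this page does not appear to enforce explicitly, while the second mirrors the `ValueError` raised in the source.

```python
def check_arguments(spes, ccee_names, columns="spe"):
    """Hypothetical helper mirroring the documented argument rules."""
    # Documented rule: the two filters cannot be combined.
    if spes is not None and ccee_names is not None:
        raise ValueError("spes and ccee_names cannot be used at the same time.")
    # When both filters are None the method loads every SPE itself,
    # so "spe" columns are only a problem when filtering by ccee_names.
    if columns == "spe" and spes is None and ccee_names is not None:
        raise ValueError("Cannot use 'spe' as columns if spes is None.")


check_arguments(spes=["SPE_A"], ccee_names=None, columns="spe")  # accepted
check_arguments(spes=None, ccee_names=["PLANT_X"], columns="ccee_name")  # accepted

try:
    check_arguments(spes=None, ccee_names=["PLANT_X"], columns="spe")
except ValueError as error:
    print(error)
```

The names `SPE_A` and `PLANT_X` are placeholders; real values depend on the performance database and the CCEE data.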

Returns:

DataFrame –

DataFrame with the data.
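The source below keeps only the resources whose names end in the YYYYMM of a month touched by the period. A rough standard-library sketch of that selection follows; the function `month_suffixes` and the resource names are assumptions made for illustration, and the real code relies on `DateTimeRange.split_multiple` instead.

```python
from datetime import date


def month_suffixes(start: date, end: date) -> list[str]:
    """Return the YYYYMM string of every month touched by [start, end]."""
    suffixes = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        suffixes.append(f"{year}{month:02d}")
        # Step to the next month, rolling over the year after December
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return suffixes


# Hypothetical resource names; only the trailing YYYYMM matters here.
resources = [
    "geracao_horaria_usina_202311",
    "geracao_horaria_usina_202312",
    "geracao_horaria_usina_202401",
    "geracao_horaria_usina_202402",
]
wanted = month_suffixes(date(2023, 12, 15), date(2024, 1, 10))
selected = [name for name in resources if name[-6:] in wanted]
print(selected)
```

Because whole monthly files are read, rows outside the requested period still have to be dropped afterwards, which the source does with a timestamp filter.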

Source code in echo_ons/ccee_opendata_energy_gc.py

@validate_call
def get(
    self,
    period: DateTimeRange,
    spes: list[str] | None = None,
    ccee_names: list[str] | None = None,
    columns: Literal["spe", "ccee_name", "cod_ativo"] = "spe",
    get_details: bool = False,
) -> pd.DataFrame:
    """Get the Energy at Gravity Center.

    Values are returned in MWavg per hour. The hour in each timestamp marks the start of the interval.

    Most useful columns are:
    - spe: SPE name / CCEE Name / Site code, depending on the `columns` parameter; will be the columns if `get_details` is False;
    - timestamp: Timestamp of the value; Will be the index if `get_details` is False;
    - value: GERACAO_CENTRO_GRAVIDADE;
    - internal_loss_factor: FATOR_PERDA_INTERNA (only acquired if get_details is True);
    - shared_loss_factor: FATOR_RATEIO_PERDA_GERACAO (only acquired if get_details is True);

    Parameters
    ----------
    period : DateTimeRange
        Period to get the data for. The start is moved to 00:00:00 of its day and the end to 23:59:59 of its day; the monthly resource files covering the period are read and then trimmed to this range.
    spes : list[str] | None, optional
        List of SPEs to filter by. It is the name as registered in performance database. If set to None will get all SPEs, by default None

        If both `spes` and `ccee_names` are None, will get all SPEs.

        This cannot be used with `ccee_names` at the same time.

        If `spes` is used, the SPEs must have the attribute `ccee_spe_name` set to the correct value in the performance database.
    ccee_names : list[str] | None, optional
        List of CCEE Names (SIGLA_USINA) to filter by. This will search for the name in the data, which allows getting data for SPEs that are not in the performance database. If set to None will get all CCEE Names, by default None

        This cannot be used with `spes` at the same time.
    columns : Literal["spe", "ccee_name", "cod_ativo"], optional
        What value will be used as column names in the DataFrame. Options are:
        - "spe": SPE name as in the performance database. Only available if `spes` is not None.
        - "ccee_name": CCEE Name (SIGLA_USINA)
        - "cod_ativo": Site code (CODIGO_PARCELA_USINA).

        Not available if `get_details` is True.
    get_details : bool, optional
        Whether to get the details columns (internal_loss_factor and shared_loss_factor). By default False

        If set to True, columns will be value, internal_loss_factor and shared_loss_factor.

    Returns
    -------
    DataFrame
        DataFrame with the data.
    """
    # Normalizing inputs
    period.start = period.start.replace(hour=0, minute=0, second=0, microsecond=0)
    period.end = period.end.replace(hour=23, minute=59, second=59)

    # getting SPEs in case spes and ccee_names are None
    if spes is None and ccee_names is None:
        spes = self._ccee._perfdb.objects.instances.get(  # noqa: SLF001
            object_models=["wind_farm", "solar_farm"],
            output_type="DataFrame",
            get_attributes=True,
            attribute_names=["ccee_spe_name"],
        )
        # checking if any SPE does not have ccee_spe_name
        if spes["ccee_spe_name"].isna().any():
            wrong_spes = spes[spes["ccee_spe_name"].isna()].index.to_list()
            logger.warning(f"The following SPEs do not have the ccee_spe_name attribute: {wrong_spes}")
        spes = spes[spes["ccee_spe_name"].notna()].index.to_list()

    if columns == "spe" and spes is None:
        raise ValueError("Cannot use 'spe' as columns if spes is None.")

    # getting the CCEE names from the database in case spes is not None
    if spes is not None:
        objs = self._ccee._perfdb.objects.instances.get(  # noqa: SLF001
            object_names=spes,
            object_models=["wind_farm", "solar_farm"],
            get_attributes=True,
            output_type="DataFrame",
            attribute_names=["ccee_spe_name"],
        )
        # validating that the SPEs exist
        wrong_spes = set(spes) - set(objs.index)
        if wrong_spes:
            raise ValueError(f"The following SPEs do not exist in the performance database: {wrong_spes}")
        # validating that all SPEs have ccee_spe_name attribute
        if "ccee_spe_name" not in objs.columns:
            raise ValueError("The SPEs do not have the ccee_spe_name attribute.")
        if objs["ccee_spe_name"].isna().any():
            wrong_spes = objs[objs["ccee_spe_name"].isna()].index.to_list()
            raise ValueError(f"The following SPEs do not have the ccee_spe_name attribute: {wrong_spes}")

        # getting the CCEE names
        ccee_names = objs["ccee_spe_name"].to_list()

    # getting list of resource names that will be used based on period
    resource_names = list(self._ccee.opendata.datasets.resources.get(dataset_name="geracao_horaria_usina").keys())
    wanted_months = period.split_multiple(separator=relativedelta(months=1), normalize=True)
    wanted_months = [f"{subperiod.start.year}{subperiod.start.month:02d}" for subperiod in wanted_months]
    resource_names = [resource_name for resource_name in resource_names if resource_name[-6:] in wanted_months]

    # getting the df
    df = self._ccee.opendata.datasets.resources.values.get(
        resource_names=resource_names,
        dataset_name="geracao_horaria_usina",
        filters={"SIGLA_USINA": ccee_names},
    )

    # checking if found data for all the ccee_names
    missing_ccee_names = set(ccee_names) - set(df["SIGLA_USINA"].unique())
    if missing_ccee_names:
        # getting the SPEs that are missing
        if spes is not None:
            spes_with_missing_ccee_names = objs[objs["ccee_spe_name"].isin(missing_ccee_names)]
            logger.warning(
                f"The following SPEs were not found in the data: {spes_with_missing_ccee_names.index.to_list()} - {missing_ccee_names}",
            )
        else:
            logger.warning(f"The following CCEE Names were not found in the data: {missing_ccee_names}")

    columns_mapping = {
        "spe": "SIGLA_USINA",
        "ccee_name": "SIGLA_USINA",
        "cod_ativo": "CODIGO_PARCELA_USINA",
    }
    df = df[
        [
            columns_mapping[columns],
            "DATA",
            "PERIODO_COMERCIALIZACAO",
            "GERACAO_CENTRO_GRAVIDADE",
            "FATOR_PERDA_INTERNA",
            "FATOR_RATEIO_PERDA_GERACAO",
        ]
    ]

    if not get_details:
        df = df.drop(columns=["FATOR_PERDA_INTERNA", "FATOR_RATEIO_PERDA_GERACAO"])
    # in case spes is not None, changing the values of the SIGLA_USINA column to the SPE names
    if columns == "spe":
        ccee_name_remap = objs["ccee_spe_name"].to_dict()
        ccee_name_remap = {v: k for k, v in ccee_name_remap.items()}
        df["SIGLA_USINA"] = df["SIGLA_USINA"].map(ccee_name_remap)

    # converting PERIODO_COMERCIALIZACAO so that it represents the hour of the day (currently it's the hour of the month)
    df["PERIODO_COMERCIALIZACAO"] = (df["PERIODO_COMERCIALIZACAO"] - 1) % 24  # changing from 1-24 to 0-23

    # creating a date column in string format to later convert to datetime
    df["timestamp"] = df["DATA"].astype(str) + df["PERIODO_COMERCIALIZACAO"].astype(str).str.zfill(2)
    df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d/%m/%Y%H")

    # dropping unwanted columns
    df = df.drop(columns=["DATA", "PERIODO_COMERCIALIZACAO"])

    # renaming columns
    col_remap = {
        columns_mapping[columns]: "spe",
        "GERACAO_CENTRO_GRAVIDADE": "value",
        "FATOR_PERDA_INTERNA": "internal_loss_factor",
        "FATOR_RATEIO_PERDA_GERACAO": "shared_loss_factor",
    }
    df = df.rename(columns=col_remap)

    # dropping unwanted dates
    df = df[(df["timestamp"] >= period.start) & (df["timestamp"] <= period.end)]

    # pivoting the df
    if not get_details:
        df = df.pivot(index="timestamp", columns="spe", values="value")

    # removing column index name
    df.columns.name = None

    # checking if found the correct amount of spes/ccee_names
    if not get_details:
        wanted_cols = len(ccee_names) if spes is None else len(spes)
        if len(df.columns) != wanted_cols:
            logger.warning(f"Found data for {len(df.columns)} SPEs/CCEE Names, but was expecting data for {wanted_cols}.")

    # sorting columns in case get_details is True
    if get_details:
        df = df[["spe", "timestamp", "value", "internal_loss_factor", "shared_loss_factor"]]

    # returning the df
    return df

`import_database(period, spes=None, on_conflict='ignore')` ¶

Imports the CCEE Energy at Gravity Center data for a given period to the database.

The values acquired from the CCEE API are in MWavg per hour. They are converted to kW and stored as the hourly feature ActivePowerGC_1h.AVG, then summed per day to obtain the daily energy in kWh.
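To make the unit handling concrete, here is a minimal standard-library sketch of the conversion seen in the source below: hourly MWavg values are multiplied by 1000 to get kW, and since each value holds for one hour, summing them per day gives kWh. The readings are made-up numbers, and the real implementation uses pandas `resample("D").sum()` rather than this loop.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hourly readings in MWavg for one plant
# (each timestamp marks the start of its hour).
hourly_mw = {
    datetime(2024, 1, 1, 0): 10.0,
    datetime(2024, 1, 1, 1): 12.0,
    datetime(2024, 1, 2, 0): 8.0,
}

daily_kwh = defaultdict(float)
for timestamp, mw in hourly_mw.items():
    kw = mw * 1000  # MW -> kW (average power over the hour)
    daily_kwh[timestamp.date()] += kw * 1  # kW held for 1 h -> kWh

for day, energy in sorted(daily_kwh.items()):
    print(day, energy)
```

Summing averages works here only because every interval has the same one-hour length; with mixed interval lengths each value would need to be weighted by its duration.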

Parameters:

period ¶
(DateTimeRange) –

Desired period to import the data for. The start is moved to 00:00:00 of its day and the end to 23:59:59 of its day, as done by get.
spes ¶
(list[str] | None, default: None ) –

List of SPEs to import the data. If set to None all will be imported. By default None
on_conflict ¶
(Literal['ignore', 'update'], default: 'ignore' ) –

What to do in case of conflict. Can be one of ["ignore", "update"]. By default "ignore"

Source code in echo_ons/ccee_opendata_energy_gc.py

@validate_call
def import_database(
    self,
    period: DateTimeRange,
    spes: list[str] | None = None,
    on_conflict: Literal["ignore", "update"] = "ignore",
) -> None:
    """Imports the CCEE Energy at Gravity Center data for a given period to the database.

    The values acquired from the CCEE API are in MWavg per hour. They are converted to kW and stored as the hourly feature ActivePowerGC_1h.AVG, then summed per day to obtain the daily energy in kWh.

    Parameters
    ----------
    period : DateTimeRange
        Desired period to import the data for. The start is moved to 00:00:00 of its day and the end to 23:59:59 of its day, as done by `get`.
    spes : list[str] | None, optional
        List of SPEs to import the data. If set to None all will be imported. By default None
    on_conflict : Literal["ignore", "update"], optional
        What to do in case of conflict. Can be one of ["ignore", "update"].
        By default "ignore"
    """
    # getting the possible SPEs to import
    all_spes = list(self._ccee._perfdb.objects.instances.get(object_models=["wind_farm", "solar_farm"]).keys())  # noqa: SLF001
    if spes is not None and not set(spes).issubset(all_spes):
        wrong_spes = set(spes) - set(all_spes)
        raise ValueError(f"The following SPEs do not exist in the performance database: {wrong_spes}")
    if spes is None:
        spes = all_spes

    # getting the data
    df = self.get(period, spes=spes)

    # skipping if no data was found
    if df.empty:
        logger.warning(f"No data found for the period {period}.")
        return

    # converting the values to kW
    df = df * 1000  # converting from MW to kW

    # creating the DataFrame to save ActivePowerGC_1h.AVG features in the database
    features_df = df.copy()
    features_df.index.name = "timestamp"
    # adding -SMF1 to the SPE names to convert to the power meter object names
    features_df.columns = [f"{spe}-SMF1" for spe in features_df.columns]
    # converting columns to a MultiIndex with the feature name
    features_df.columns = pd.MultiIndex.from_product(
        [features_df.columns, ["ActivePowerGC_1h.AVG"]],
        names=["object_name", "feature_name"],
    )

    # inserting into the database
    self._ccee._perfdb.features.values.series.insert(  # noqa: SLF001
        df=features_df,
        on_conflict=on_conflict,
    )

    # now calculating daily production in kWh
    df = df.resample("D").sum()

    # converting the DataFrame to have columns object_name, date, measurement_point (Gravity Center), energy
    df.index.name = "date"
    df = df.reset_index()
    df = df.melt(id_vars="date", var_name="object_name", value_name="energy")
    df["measurement_point"] = "Gravity Center"

    # dropping rows with NaN values in the energy column
    df = df.dropna(subset=["energy"])

    # saving the data to the database
    self._ccee._perfdb.kpis.energy.values.insert(df=df, on_conflict=on_conflict)  # noqa: SLF001

    imported_period = DateTimeRange(df["date"].min(), df["date"].max())

    logger.info(f"Imported CCEE Energy at Gravity Center data for the period {imported_period} to the database. SPEs: {spes}")
