Datasets¶
CceeOpenDataDatasets(ccee)
¶
Class used for handling CCEE datasets. Can be accessed via ccee.opendata.datasets.
Parameters:
-
(ccee¶Ccee) –Top level object carrying all functionality.
Source code in echo_ons/ccee_opendata_datasets.py
def __init__(self, ccee: e_o.Ccee) -> None:
"""Class used for handling CCEE datasets. Can be accessed via `ccee.opendata.datasets`.
Parameters
----------
ccee : Ccee
Top level object carrying all functionality.
"""
super().__init__(ccee)
# * subclasses
self.resources = CceeOpenDataDatasetsResources(ccee)
list()
¶
List all datasets available.
Returns:
-
list[str]–List of all datasets available.
Source code in echo_ons/ccee_opendata_datasets.py
def list(
self,
) -> list[str]:
"""List all datasets available.
Returns
-------
list[str]
List of all datasets available.
"""
endpoint = "api/3/action/package_list"
response = self._ccee._opendata_conn.get(endpoint) # noqa: SLF001
if not response.status_code == 200:
raise ValueError(f"Failed to get dataset list from CCEE open data. {response.text}")
return response.json()["result"]
search(field_name, field_value, output_type='dict')
¶
Search for a dataset by a specific field.
This uses ckan API to search for datasets by a specific field. More documentation at https://docs.ckan.org/en/2.11/api/index.html?highlight=package_search#ckan.logic.action.get.package_search
The most useful key is the "resources", which contais a list of dictionaries with the most important keys being "name" and "id", where the last one will be used to query the data by other APIs.
Parameters:
-
(field_name¶str) –Name of the field to search by, such as "title", "name", etc.
-
(field_value¶str) –Value of the field to search by.
-
(output_type¶Literal['dict', 'DataFrame'], default:'dict') –Type of the output, by default "dict"
Returns:
-
dict[str, dict[str, Any]]–If output_type is "dict", returns a dictionary with the results. The keys are the dataset names.
-
DataFrame–If output_type is "DataFrame", returns a DataFrame with the results. The index is the dataset names.
Source code in echo_ons/ccee_opendata_datasets.py
@validate_call
def search(
self,
field_name: str,
field_value: str,
output_type: Literal["dict", "DataFrame"] = "dict",
) -> dict[str, dict[str, Any]] | DataFrame:
"""Search for a dataset by a specific field.
This uses ckan API to search for datasets by a specific field. More documentation at https://docs.ckan.org/en/2.11/api/index.html?highlight=package_search#ckan.logic.action.get.package_search
The most useful key is the "resources", which contais a list of dictionaries with the most important keys being "name" and "id", where the last one will be used to query the data by other APIs.
Parameters
----------
field_name : str
Name of the field to search by, such as "title", "name", etc.
field_value : str
Value of the field to search by.
output_type : Literal["dict", "DataFrame"], optional
Type of the output, by default "dict"
Returns
-------
dict[str, dict[str, Any]]
If output_type is "dict", returns a dictionary with the results. The keys are the dataset names.
DataFrame
If output_type is "DataFrame", returns a DataFrame with the results. The index is the dataset names.
"""
endpoint = "api/3/action/package_search"
payload = {
"q": f"{field_name}:{field_value}",
"rows": 1000,
"start": 0,
}
got_all = False
page_num = 0
results = []
while not got_all:
payload["start"] += page_num * 1000
response = self._ccee._opendata_conn.post(endpoint, json=payload) # noqa: SLF001
if not response.status_code == 200:
raise ValueError(f"Failed to get data from CCEE open data. {response.text}")
response = response.json()
if len(response["result"]["results"]) < 1000:
got_all = True
results.extend(response["result"]["results"])
page_num += 1
# converting to output_type
if output_type == "dict":
return {result["name"]: result for result in results}
df = DataFrame.from_dict(results, orient="columns")
df = df.set_index("name")
return df