{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8884b6fe",
   "metadata": {},
   "source": [
    "(data-download)=\n",
    "# Data download and preparation\n",
    "\n",
    "Archived data are available from the [World Data Center for Climate](https://www.wdc-climate.de/) (WDCC): [https://www.wdc-climate.de/ui/project?acronym=unseen-awg](https://www.wdc-climate.de/ui/project?acronym=unseen-awg).\n",
    "\n",
    "The data are split into two \"Experiments\":\n",
    "- [\"An instance of the analog weather generator unseen-awg with precomputed similarities and a large set of generated weather data\"](https://doi.org/10.26050/WDCC/unsawg_wg)\n",
    "- [\"Preprocessed atmospheric circulation and impact-relevant variables from ERA5 and \"Extended ensemble forecast hindcast\" (ECMWF) for unseen-awg simulations\"](https://doi.org/21.14106/43da459a79e9f91e817e0b8690d494e8e91a00a5)\n",
    "\n",
    "Together, these data allow you to:\n",
    "- Use an existing large dataset of *unseen-awg* simulations. The available impact-relevant variables are daily minimum, maximum, and mean temperature, and daily total precipitation sums.\n",
    "- Generate new simulations.\n",
    "- Create new *unseen-awg* instances.\n",
    "\n",
    "No bias correction was applied to these datasets. The provided {ref}`Snakemake workflow <snakemake-workflow>` can apply a simple mean bias correction (per lead time, location, and day of year) to the temperature variables; more elaborate bias computation and correction methods can also be implemented.\n",
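    "\n",
    "To illustrate what a mean bias correction per lead time, location, and day of year amounts to, the sketch below subtracts from each forecast value the difference between the forecast mean for its group and the reference (e.g., ERA5) mean for the same location and day of year. It is a minimal, hypothetical sketch on plain Python tuples, not the workflow's implementation: the function name and data layout are assumptions, and in practice the means would typically be estimated over a training period (and possibly smoothed across days of year).\n",
    "\n",
    "```python\n",
    "from collections import defaultdict\n",
    "from statistics import fmean\n",
    "\n",
    "\n",
    "# Hypothetical helper, not part of the provided workflow.\n",
    "# forecasts:    list of (lead, loc, doy, value) tuples\n",
    "# observations: list of (loc, doy, value) tuples from the reference data\n",
    "def mean_bias_correct(forecasts, observations):\n",
    "    # Reference mean per (location, day of year).\n",
    "    obs_groups = defaultdict(list)\n",
    "    for loc, doy, value in observations:\n",
    "        obs_groups[(loc, doy)].append(value)\n",
    "    obs_mean = {key: fmean(vals) for key, vals in obs_groups.items()}\n",
    "\n",
    "    # Forecast mean per (lead time, location, day of year).\n",
    "    fc_groups = defaultdict(list)\n",
    "    for lead, loc, doy, value in forecasts:\n",
    "        fc_groups[(lead, loc, doy)].append(value)\n",
    "    fc_mean = {key: fmean(vals) for key, vals in fc_groups.items()}\n",
    "\n",
    "    # Subtract the group-wise mean bias from every forecast value.\n",
    "    corrected = []\n",
    "    for lead, loc, doy, value in forecasts:\n",
    "        bias = fc_mean[(lead, loc, doy)] - obs_mean[(loc, doy)]\n",
    "        corrected.append((lead, loc, doy, value - bias))\n",
    "    return corrected\n",
    "```\n",
    "\n",
    "For example, `mean_bias_correct([(1, 'A', 10, 12.0), (1, 'A', 10, 14.0)], [('A', 10, 10.0), ('A', 10, 12.0)])` returns `[(1, 'A', 10, 10.0), (1, 'A', 10, 12.0)]`: the forecast mean (13.0) is shifted onto the reference mean (11.0).\n",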
    "\n",
    "(extracting-data)=\n",
    "## Data extraction\n",
    "\n",
    "The archived data consist of NetCDF4 files and Zarr stores. For practical reasons, the Zarr stores are archived as multi-part (split) zip files. When using a Zarr store, you must unzip all partial files belonging to that dataset; otherwise, the extracted store will be incomplete.\n",
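    "\n",
    "Before unzipping, it can help to check that no part of a split archive is missing. The sketch below assumes Info-ZIP-style split naming (`<stem>.z01`, `<stem>.z02`, ..., `<stem>.zip`); this naming is an assumption about the archived files, and the function name is hypothetical. Because it only inspects file names, it cannot detect missing parts numbered above the highest part present.\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "# Hypothetical helper; assumes Info-ZIP split naming <stem>.z01, ..., <stem>.zip.\n",
    "# Returns {stem: problem description} for archives that look incomplete.\n",
    "def find_incomplete_split_archives(directory):\n",
    "    directory = Path(directory)\n",
    "    parts = {}\n",
    "    for path in directory.iterdir():\n",
    "        suffix = path.suffix  # e.g. '.z01' or '.zip'\n",
    "        if suffix == '.zip':\n",
    "            parts.setdefault(path.stem, set())\n",
    "        elif suffix.startswith('.z') and suffix[2:].isdigit():\n",
    "            parts.setdefault(path.stem, set()).add(int(suffix[2:]))\n",
    "\n",
    "    problems = {}\n",
    "    for stem, numbers in sorted(parts.items()):\n",
    "        if not (directory / f'{stem}.zip').exists():\n",
    "            problems[stem] = 'missing final .zip part'\n",
    "            continue\n",
    "        # Numbered parts must form a gap-free sequence 1..max.\n",
    "        expected = set(range(1, max(numbers, default=0) + 1))\n",
    "        if numbers != expected:\n",
    "            problems[stem] = f'missing parts: {sorted(expected - numbers)}'\n",
    "    return problems\n",
    "```\n",
    "\n",
    "An empty result means every archive found in the directory has a final `.zip` part and a gap-free sequence of numbered parts.\n",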
    "\n",
    "We provide a script `unzip_provided_datasets.sh` to facilitate unzipping the data. The general syntax is:\n",
    "```Bash\n",
    "./unzip_provided_datasets.sh <dir_containing_zipped_files> <dir_target>\n",
    "```\n",
    "\n",
    "(snakemake-workflow)=\n",
    "## Integrating extracted data into the provided Snakemake workflow\n",
    "\n",
    "The directory `workflow/` contains the [Snakemake](https://snakemake.readthedocs.io/) workflow we built for our study. It can be used to build new weather generators or to generate bias-corrected datasets of impact-relevant variables.\n",
    "\n",
    "Unzipping the data results in a directory structure and file names that facilitate integration into this workflow (e.g., file names include parameter hashes).\n",
    "\n",
    "To use the workflow, the unzipped data must be placed in specific directories defined in `configs/paths.yaml`, which sets the important base directories (as described {ref}`here <configs-paths>`). The weather generator's directory must be placed in `<dir_wgs>` and the input datasets in `<dir_preprocessed_datasets>`:\n",
    "\n",
    "```Bash\n",
    "./unzip_provided_datasets.sh <dir_with_partial_weather_generator_files> <dir_wgs>\n",
    "./unzip_provided_datasets.sh <dir_with_partial_input_files> <dir_preprocessed_datasets>\n",
    "```\n",
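    "\n",
    "If you prefer to drive the extraction from Python, the two calls can be derived from the base directories in `configs/paths.yaml`. This is a hypothetical sketch: the key names `dir_wgs` and `dir_preprocessed_datasets` are assumed to match the placeholders used in this section, and the configuration is passed in as an already-loaded dictionary (for example, obtained with PyYAML's `yaml.safe_load`).\n",
    "\n",
    "```python\n",
    "import subprocess\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "# Hypothetical helper: build the two unzip_provided_datasets.sh calls.\n",
    "# The keys 'dir_wgs' and 'dir_preprocessed_datasets' are assumed.\n",
    "def unzip_commands(paths_config, wg_parts_dir, input_parts_dir,\n",
    "                   script='./unzip_provided_datasets.sh'):\n",
    "    dir_wgs = Path(paths_config['dir_wgs']).expanduser()\n",
    "    dir_pre = Path(paths_config['dir_preprocessed_datasets']).expanduser()\n",
    "    return [\n",
    "        # Weather generator parts go to <dir_wgs>.\n",
    "        [script, str(wg_parts_dir), str(dir_wgs)],\n",
    "        # Preprocessed input dataset parts go to <dir_preprocessed_datasets>.\n",
    "        [script, str(input_parts_dir), str(dir_pre)],\n",
    "    ]\n",
    "\n",
    "\n",
    "def run_unzip(paths_config, wg_parts_dir, input_parts_dir):\n",
    "    for cmd in unzip_commands(paths_config, wg_parts_dir, input_parts_dir):\n",
    "        subprocess.run(cmd, check=True)  # raise if the script fails\n",
    "```\n",
    "\n",
    "Keeping the command construction separate from execution makes it easy to print and inspect the calls before running them.\n",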
    "\n",
    "After extraction, your resulting directory paths should look like this:\n",
    "\n",
    "    <dir_wgs>/wg_reforecasts_5e06172f_f40e9460_1e69bda9\n",
    "    <dir_preprocessed_datasets>/preprocessed_impact_variables_reforecasts/combined_7d1d3d97.zarr\n",
    "    <dir_preprocessed_datasets>/preprocessed_impact_variables_era5/combined_facc0e91.zarr\n",
    "    <dir_preprocessed_datasets>/preprocessed_circulation_reforecasts/combined_5e06172f.zarr\n",
    "    <dir_preprocessed_datasets>/preprocessed_circulation_era5/combined_f3d1f2f7.zarr\n",
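    "\n",
    "A quick way to confirm that the extraction produced the expected layout is to test for the paths listed above. The sketch below (the function name is hypothetical) only checks that each path exists; it does not verify that the Zarr stores are complete.\n",
    "\n",
    "```python\n",
    "from pathlib import Path\n",
    "\n",
    "# Expected locations after extraction, relative to the two base directories\n",
    "# (copied from the listing above).\n",
    "EXPECTED_WGS = ['wg_reforecasts_5e06172f_f40e9460_1e69bda9']\n",
    "EXPECTED_PREPROCESSED = [\n",
    "    'preprocessed_impact_variables_reforecasts/combined_7d1d3d97.zarr',\n",
    "    'preprocessed_impact_variables_era5/combined_facc0e91.zarr',\n",
    "    'preprocessed_circulation_reforecasts/combined_5e06172f.zarr',\n",
    "    'preprocessed_circulation_era5/combined_f3d1f2f7.zarr',\n",
    "]\n",
    "\n",
    "\n",
    "# Hypothetical helper: return the expected paths that do not exist (empty = OK).\n",
    "def missing_paths(dir_wgs, dir_preprocessed_datasets):\n",
    "    expected = [Path(dir_wgs) / p for p in EXPECTED_WGS]\n",
    "    expected += [Path(dir_preprocessed_datasets) / p for p in EXPECTED_PREPROCESSED]\n",
    "    return [p for p in expected if not p.exists()]\n",
    "```\n",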
    "\n",
    "The provided *unseen-awg* simulations can be stored under any path; no location is prescribed for them."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "unseen-awg",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}