
Ingest datasets

You can upload your own datasets to the EOTDL platform.

The following constraints apply to the dataset name:

  • It must be unique
  • It must be between 3 and 45 characters long
  • It can only contain alphanumeric characters and dashes.

CLI

The CLI is the most convenient way to ingest datasets. You can ingest a dataset using the following CLI command:

eotdl datasets ingest -p dataset-path

Where dataset-path is the path to a folder containing your dataset.

A file named README.md is expected in the root of the folder. This file should contain the following information:

---
name: dataset-name
authors: 
  - author 1 name
  - author 2 name
  - ...
license: dataset license
source: link to source
thumbnail: link to thumbnail (optional)
---

some markdown content (titles, text, links, code, images, ...)

If this file is not present, the ingestion process will fail.

After uploading a dataset with the CLI, you can edit this information by visiting the dataset page on the website.

You can update your dataset in multiple ways. If you modify your local folder and run the ingest command again, a new version will be created reflecting the new data structure and files.
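For example, after adding, removing, or modifying files in the local folder, simply re-run:

eotdl datasets ingest -p dataset-path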

If the metadata in the README.md file is not consistent with the metadata on the platform (either because you edited the file or because you edited the dataset on the platform), use one of the following flags, as shown in the examples after this list:

  • the --force flag to overwrite the metadata in the platform with the one in the README.md file.
  • the --sync flag to update your file with the metadata in the platform.
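For example, to overwrite the platform metadata with your local file, or to pull the platform metadata into it:

eotdl datasets ingest -p dataset-path --force
eotdl datasets ingest -p dataset-path --sync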

For Q1+ datasets, a file called catalog.json containing the STAC metadata for your dataset is expected in the root of the folder; it will be used as the entrypoint to ingest all the assets.
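As an illustration, a minimal catalog.json could look like the following, using the standard STAC Catalog layout; the id, description, and linked item paths are placeholders, not values mandated by EOTDL:

{
  "type": "Catalog",
  "stac_version": "1.0.0",
  "id": "dataset-name",
  "description": "STAC catalog for dataset-name",
  "links": [
    { "rel": "root", "href": "./catalog.json", "type": "application/json" },
    { "rel": "item", "href": "./item-1/item-1.json", "type": "application/geo+json" }
  ]
}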

Library

You can ingest datasets using the following Python code:

from eotdl.datasets import ingest_dataset

ingest_dataset("dataset-path")

The library also enables the ingestion of “virtual datasets”, that is, datasets where only the metadata is ingested while the assets live somewhere else (such as remote storage, a cloud bucket, third-party repositories, etc.). The only requirement is that the assets are accessible via a public URL. The following example shows how to ingest a virtual dataset:

from eotdl.datasets import ingest_virtual_dataset

links = [
    'https://link1.com',
    'https://link2.com',
    'https://link3.com',
]

ingest_virtual_dataset("dataset-path", links)

where “dataset-path” is the path to a folder where the metadata will be stored. If the path already exists and contains a valid README.md file, the ingestion will work as usual. Otherwise, either create the README.md or pass its content as an additional argument.
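As a minimal sketch of the second case, assuming the additional argument accepts the README.md frontmatter fields as a dict (the exact shape of this argument is an assumption, not a documented signature):

from eotdl.datasets import ingest_virtual_dataset

# assumption: the metadata dict mirrors the README.md frontmatter fields
metadata = {
    'name': 'dataset-name',
    'authors': ['author 1 name'],
    'license': 'dataset license',
    'source': 'link to source',
}

links = [
    'https://link1.com',
    'https://link2.com',
]

# assumption: the README.md content is passed as a third argument
ingest_virtual_dataset("dataset-path", links, metadata)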

Learn more about virtual datasets in this tutorial.

