.. _tutorial_dataset_from_csv: 

Tutorial: Creating a ``DatasetFromCSV`` Dataset
===============================================

The ``XAI.FNFGradCAM.do_gradcam()`` and ``XAI.FSGradCAM.do_gradcam()`` functions
perform Grad-CAM on a PyTorch dataset rather than on individual images. Refer to
:ref:`Step 2, Method 1<tutorial_gradcam_fnf_method1.2>` of
:ref:`tutorial_gradcam_fnf` and *Step 2, Method 1 of Tutorial: Doing Grad-CAM on
the Food Scoring (FS) model* for more details on how the dataset is used.

This tutorial covers how to create this custom PyTorch dataset using the
``XAI.utils.datasets.DatasetFromCSV()`` class.

**Note:**

- The same dataset can be used for both the FNF and FS models.

-----

The Required ``.csv`` File
--------------------------

As the name ``XAI.utils.datasets.DatasetFromCSV()`` suggests, this custom
PyTorch dataset is created from a ``.csv`` file. For applications of Grad-CAM
on FoodDX models, this ``.csv`` is expected to already exist as part of the
various FoodDX pipelines and should be the ``data.csv`` file. Specifically, it
should look something like the following:

.. table::
    :widths: auto
    
    +--------------------------------------+-----------+---------------------------------------------------------------+---------------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+------------+------------+-------------------+----------+-----------+
    | id                                   | source_id | s3_path_fns                                                   | s3_path_fbd                                                   | path_fns                                                 | path_fbd                                                 | annot_json | food_score | food_type         | label_fs | label_fnf |
    +======================================+===========+===============================================================+===============================================================+==========================================================+==========================================================+============+============+===================+==========+===========+
    | 164c729b-86d2-11eb-b774-06d7ab6752a4 | local_HK  | s3://path/to/fns_img/164c729b-86d2-11eb-b774-06d7ab6752a4.png | s3://path/to/fbd_img/164c729b-86d2-11eb-b774-06d7ab6752a4.png | path/to/fns_img/164c729b-86d2-11eb-b774-06d7ab6752a4.png | path/to/fbd_img/164c729b-86d2-11eb-b774-06d7ab6752a4.png |            | 1          | ChineseShortbread | 1        | 1         |
    +--------------------------------------+-----------+---------------------------------------------------------------+---------------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+------------+------------+-------------------+----------+-----------+
    | ...                                  | ...       | ...                                                           | ...                                                           | ...                                                      | ...                                                      | ...        | ...        | ...               | ...      | ...       |
    +--------------------------------------+-----------+---------------------------------------------------------------+---------------------------------------------------------------+----------------------------------------------------------+----------------------------------------------------------+------------+------------+-------------------+----------+-----------+


When performing Grad-CAM on FNF and FS models, the ``id``, ``path_fns``,
``food_score`` and ``label_fnf`` columns must be present. All other columns can
be omitted.
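
Before creating the dataset, it can be useful to confirm that the ``.csv``
header contains the four compulsory columns. The following is a minimal sketch
using only the Python standard library. The helper ``missing_columns()`` and the
in-memory sample are hypothetical and are not part of the ``XAI`` package.

.. code:: python3

    import csv
    import io

    # Columns the tutorial lists as compulsory for Grad-CAM on FNF and FS models.
    REQUIRED_COLUMNS = ("id", "path_fns", "food_score", "label_fnf")


    def missing_columns(csv_file):
        """Return the required columns absent from the header of an open CSV file."""
        header = next(csv.reader(csv_file), [])
        return [name for name in REQUIRED_COLUMNS if name not in header]


    # Hypothetical in-memory CSV; in practice, pass an open ``data.csv`` file.
    sample = io.StringIO(
        "id,path_fns,food_score,label_fnf\n"
        "164c729b-86d2-11eb-b774-06d7ab6752a4,"
        "path/to/fns_img/164c729b-86d2-11eb-b774-06d7ab6752a4.png,1,1\n"
    )
    print(missing_columns(sample))  # an empty list means nothing is missing

An empty list indicates that every compulsory column is present. Any names
returned are the columns to add before using the file.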

-----

Creating the Dataset
--------------------

To create the dataset, run the following:

.. code:: python3
    :number-lines:

    import XAI

    ds = XAI.utils.datasets.DatasetFromCSV(path_to_csv="/path/to/desired/csv/data.csv")

The ``XAI.utils.datasets.DatasetFromCSV()`` class also takes a ``transform``
argument, which must be a callable. When it is specified (i.e. not ``None``),
the transformation is applied to each image on the fly as it is loaded for
Grad-CAM. By default, ``transform=XAI.utils.datasets.data_transforms()``.

This default transformation function (``data_transforms()``) converts each
image from a numpy array into a PyTorch tensor and then normalises its pixel
values to the range [-1, 1]. Refer to the documentation of
``XAI.utils.datasets.data_transforms()`` for more information.
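
The implementation of ``data_transforms()`` is not shown in this tutorial. As an
illustration of the arithmetic only, the sketch below maps an 8-bit RGB image to
the range [-1, 1] with numpy. It assumes the common "scale to [0, 1], then
subtract 0.5 and divide by 0.5" scheme and a channels-first output, which may
differ from what the library actually does. The function ``to_unit_range()`` is
hypothetical and returns a numpy array, not a PyTorch tensor.

.. code:: python3

    import numpy as np


    def to_unit_range(image_hwc):
        """Map an 8-bit H x W x C image to a float32 C x H x W array in [-1, 1]."""
        scaled = image_hwc.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]
        centred = (scaled - 0.5) / 0.5                 # [0, 1]   -> [-1, 1]
        return np.transpose(centred, (2, 0, 1))        # move channels to the front


    # Hypothetical 299 x 299 RGB image filled with random pixel values.
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(299, 299, 3), dtype=np.uint8)
    out = to_unit_range(image)
    print(out.shape, out.min() >= -1.0, out.max() <= 1.0)

A custom ``transform`` would follow the same pattern: a callable that accepts
one image and returns the transformed image.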

-----

Exploring the Dataset
---------------------

Since the dataset ``ds`` was created using the
``XAI.utils.datasets.DatasetFromCSV()`` class, it has the following type:

>>> type(ds)
XAI.utils.datasets.DatasetFromCSV

To get the length of the dataset, use the ``len()`` function as follows:

>>> len(ds)
10

The dataset returns the following **5** items for each image:

1. The image itself
2. The image ID
3. The path to the image
4. The ground truth FNF label (``0`` = non-food, ``1`` = food)
5. The ground truth FS label (food score = ``{1, 2, 3, 4, 5}``)

You can iterate through the dataset and unpack the 5 items as follows. The loop
below prints only the first entry of the dataset:

>>> for img_rgb, img_id, img_path, img_label_fnf, img_food_score in ds:
...     print("img.shape     :", img_rgb.shape)
...     print("img_id        :", img_id)
...     print("img_path      :", img_path)
...     print("img_label_fnf :", img_label_fnf)
...     print("img_food_score:", img_food_score)
...     break
img.shape     : torch.Size([3, 299, 299])
img_id        : 164c729b-86d2-11eb-b774-06d7ab6752a4
img_path      : path/to/fns_img/164c729b-86d2-11eb-b774-06d7ab6752a4.png
img_label_fnf : 1
img_food_score: 1

Notice that:

- The default transformation function ``XAI.utils.datasets.data_transforms()``
  has been applied to the image, resulting in the image being a torch tensor as
  opposed to a numpy array.
- The dataset ``ds`` can be used for both FNF and FS models because both ground
  truth labels (``img_label_fnf``, ``img_food_score``) are returned by the
  dataset.
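
To make the 5-item layout concrete, the sketch below builds a small stand-in
that reads a ``.csv`` and returns tuples in the same order as above. It is not
the ``XAI`` implementation: the class ``CsvBackedDataset``, its ``load_image``
argument and the sample rows are hypothetical, and image loading is skipped so
that the example runs with the standard library alone.

.. code:: python3

    import csv
    import io


    class CsvBackedDataset:
        """Hypothetical stand-in that yields the same 5-tuple layout from a CSV."""

        def __init__(self, csv_file, load_image=None, transform=None):
            self.rows = list(csv.DictReader(csv_file))
            self.load_image = load_image  # callable: path -> image, or None to skip
            self.transform = transform    # callable applied to each loaded image

        def __len__(self):
            return len(self.rows)

        def __getitem__(self, index):
            row = self.rows[index]  # IndexError past the last row ends a for loop
            image = self.load_image(row["path_fns"]) if self.load_image else None
            if image is not None and self.transform is not None:
                image = self.transform(image)
            return (image, row["id"], row["path_fns"],
                    int(row["label_fnf"]), int(row["food_score"]))


    # Hypothetical rows containing only the compulsory columns.
    sample = io.StringIO(
        "id,path_fns,food_score,label_fnf\n"
        "a1,path/to/fns_img/a1.png,1,1\n"
        "b2,path/to/fns_img/b2.png,4,1\n"
    )
    ds_sketch = CsvBackedDataset(sample)
    for _, img_id, img_path, label_fnf, food_score in ds_sketch:
        print(img_id, img_path, label_fnf, food_score)

Defining ``__len__`` and ``__getitem__`` is what allows both ``len(ds)`` and a
plain ``for`` loop to work, which is the same protocol a map-style PyTorch
dataset follows.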