Pytorch Tutorial Part 1: Dataset and Dataloader

Explore image classification and segmentation with PyTorch, covering dataset setup and data loaders in this beginner-friendly guide.

Pytorch Tutorial Part 1: Dataset and Dataloader

I was initially a big fan of TensorFlow 2.0, particularly because of its integration with Keras, which made building neural networks straightforward and user-friendly. However, as TensorFlow's popularity began to wane and Google's support seemed to diminish, I felt disheartened and decided to switch to PyTorch.

Many experts praise PyTorch for its dynamic computational graph and greater flexibility compared to TensorFlow. PyTorch's dynamic nature allows for more intuitive and interactive model development, which can be beneficial for research and experimentation. However, for lazy person like me who prefers simplicity and ease of use, this flexibility came with a lot of painful efforts.

In this article we will explore first two steps of image classification and segmentation tasks:

  1. Define Datasets
  2. Dataloader

Of course, the first step is to install and import the necessary libraries. Let's start with that.

pip install torch torchvision pandas pillow

Now import libraries:

import os
import pandas as pd
from PIL import Image
from torchvision import transforms
from torch.utils.data import Dataset, DataLoader

Define Dataset

There are two methods to organize data:

Method 1

Make folder in the name of every label and save the respected images there. For example you have 2 classes CAT and DOG, so inside Train and Test folder, you will make two folders naming CAT and DOG and put the images of cats and dogs inside the folder. As shown here:

Train 
|--CAT - images_1, images_2  ........etc
|--DOG - images_1, images_2 ........etc
Test
|--CAT - images_1, images_2 ........etc
|--DOG - images_1, images_2 ........etc

transform = transforms.Compose([
    transforms.Resize((128, 128)),
    transforms.ToTensor()
])

train_dataset = datasets.ImageFolder(root='train', transform=transform)
test_dataset = datasets.ImageFolder(root='test', transform=transform)

Method 2

This method is used quite well among Deep Learning practitioners and easy to develop, here we will talk about two methods as an example.

For image classification

Save Images in a folder and you have 2 .csv files in which you have image's name in first column and their labels in second column, as shown here:

Train - images_1, images_2 ........etc
Test - images_1, images_2 ........etc
Train.csv
Test.csv
class ImageDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.labels_df = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.labels_df)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir, self.labels_df.iloc[idx, 0])
        image = Image.open(img_name)
        label = self.labels_df.iloc[idx, 1]
        
        if self.transform:
            image = self.transform(image)
        
        return image, label

# train and test dataset
train_dataset = ImageDataset(csv_file='train_labels.csv', root_dir='train', transform=transform)

test_dataset = ImageDataset(csv_file='test_labels.csv', root_dir='test', transform=transform)

For Segmentation

For Segmentation the data structure should be same. But there will be some changes according to the given data structure:

Train 
|--Image - images_1, images_2  ........etc
|--mask - images_1, images_2 ........etc
Test
|--Image - images_1, images_2 ........etc
|--mask - images_1, images_2 ........etc
import os
from PIL import Image
from torch.utils.data import Dataset
import pandas as pd

class ImageDataset(Dataset):
    def __init__(self, image_dir, mask_dir, transform=None):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.transform = transform
        self.image_names = sorted(os.listdir(image_dir))

    def __len__(self):
        return len(self.image_names)

    def __getitem__(self, idx):
        img_name = os.path.join(self.image_dir, self.image_names[idx])
        mask_name = os.path.join(self.mask_dir, self.image_names[idx])
        
        image = Image.open(img_name)
        mask = Image.open(mask_name)
        
        if self.transform:
            image = self.transform(image)
            mask = self.transform(mask)
        
        return image, mask

# Example usage
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

# train and test dataset
train_image_dir = 'Train/Image'
train_mask_dir = 'Train/mask'
test_image_dir = 'Test/Image'
test_mask_dir = 'Test/mask'

train_dataset = ImageDataset(image_dir=train_image_dir, mask_dir=train_mask_dir, transform=transform)
test_dataset = ImageDataset(image_dir=test_image_dir, mask_dir=test_mask_dir, transform=transform)

Pros and cons of both methods

Here are the pros and cons of both methods:

Method 1

Pros: The code is short and straightforward to implement. It is easy to write for a small number of classes.
Cons: For a larger number of classes, such as in datasets with 100 or more classes, it becomes more difficult to divide the data properly. As per my knowledge there is no inbuilt dataset class for segmentation dataset.

Method 2

Pros: For a larger number of classes, such as in datasets with 100 or more classes, it is easy to write.
Cons: The code can be a bit complicated to write.

Dataloader

Dataloader is very simple, just call the dataset inside the dataloader function.

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

That's it, in next tutorial we will talk about model and training. See you soon πŸ˜„