If you use pytorch as your deep learning framework, it's likely that you'll need to use DataLoader in your model training loop.

In this tutorial, you'll learn about

  • How to construct a custom Dataset class
  • How to use DataLoader to split a dataset into batches
  • How to randomize a dataset in DataLoader
  • How to return the randomized index in DataLoader

What are pytorch DataLoader and Dataset

A Dataset is a container of your data that

  • has a length attribute, i.e. you can do len(dataset)
  • is indexable, i.e.you can do dataset[index]


Creating a Custom Dataset

A custom Dataset class must have three functions:

  • __init__ : instantiates the Dataset object
  • __len__: returns the number of samples in the dataset
  • __getitem__: loads and returns a sample from the dataset at the given index idx

Here's an example custom dataset that takes in a pandas DataFrame with columns "text" and "label".

from torch.utils.data import Dataset

class NewsDataset(Dataset):
    def __init__(self, df):
        self.df = df
        self.texts = df['text']
        self.labels = df['label']

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        text = self.texts.iloc[idx]
        label = self.labels.iloc[idx]
        return  text,label


Creating a DataFrame with news data

To see how this custom dataset works, let's create a DataFrame with some news data from agnews dataset and put it into a NewsDataset we just created.

You can refer here for a more detailed tutorial on loading datasets from the Hugging Face Datasets.

from datasets import load_dataset
agnews = load_dataset('ag_news')
agnews.set_format(type="pandas")
news_df = agnews['train'][:25]

Using custom data configuration default
Reusing dataset ag_news (/root/.cache/huggingface/datasets/ag_news/default/0.0.0/bc2bcb40336ace1a0374767fc29bb0296cdaf8a6da7298436239c54d79180548)
news_df.head()
text label
0 Wall St. Bears Claw Back Into the Black (Reute... 2
1 Carlyle Looks Toward Commercial Aerospace (Reu... 2
2 Oil and Economy Cloud Stocks' Outlook (Reuters... 2
3 Iraq Halts Oil Exports from Main Southern Pipe... 2
4 Oil prices soar to all-time record, posing new... 2

Initiating a NewsDataset

news_dataset = NewsDataset(news_df)

Check if news_dataset has a length

len(news_dataset)
25

This means that the news_dataset has 25 news data. Now check if the news_dataset is indexable.

news_dataset[7]
('Fed minutes show dissent over inflation (USATODAY.com) USATODAY.com - Retail sales bounced back a bit in July, and new claims for jobless benefits fell last week, the government said Thursday, indicating the economy is improving from a midsummer slump.',
 2)

We can see that the index 7 of news_dataset has a piece of news with label 2.


Initiating a DataLoader

A DataLoader splits a dataset into batches which can be used for training.

You can adjust the batch size with the parameter batch_size.

from torch.utils.data import DataLoader
news_dataloader = DataLoader(news_dataset,batch_size=4)

To load the data in batches, we can do

for batch_index, (batch,label) in enumerate(news_dataloader):
  print(f'batch index: {batch_index},\n label: {label},\n batch: {batch}')
batch index: 0,
 label: tensor([2, 2, 2, 2]),
 batch: ("Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again.", 'Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\\which has a reputation for making well-timed and occasionally\\controversial plays in the defense industry, has quietly placed\\its bets on another part of the market.', "Oil and Economy Cloud Stocks' Outlook (Reuters) Reuters - Soaring crude prices plus worries\\about the economy and the outlook for earnings are expected to\\hang over the stock market next week during the depth of the\\summer doldrums.", 'Iraq Halts Oil Exports from Main Southern Pipeline (Reuters) Reuters - Authorities have halted oil export\\flows from the main pipeline in southern Iraq after\\intelligence showed a rebel militia could strike\\infrastructure, an oil official said on Saturday.')
batch index: 1,
 label: tensor([2, 2, 2, 2]),
 batch: ('Oil prices soar to all-time record, posing new menace to US economy (AFP) AFP - Tearaway world oil prices, toppling records and straining wallets, present a new economic menace barely three months before the US presidential elections.', 'Stocks End Up, But Near Year Lows (Reuters) Reuters - Stocks ended slightly higher on Friday\\but stayed near lows for the year as oil prices surged past  #36;46\\a barrel, offsetting a positive outlook from computer maker\\Dell Inc. (DELL.O)', "Money Funds Fell in Latest Week (AP) AP - Assets of the nation's retail money market mutual funds fell by  #36;1.17 billion in the latest week to  #36;849.98 trillion, the Investment Company Institute said Thursday.", 'Fed minutes show dissent over inflation (USATODAY.com) USATODAY.com - Retail sales bounced back a bit in July, and new claims for jobless benefits fell last week, the government said Thursday, indicating the economy is improving from a midsummer slump.')
batch index: 2,
 label: tensor([2, 2, 2, 2]),
 batch: ('Safety Net (Forbes.com) Forbes.com - After earning a PH.D. in Sociology, Danny Bazil Riley started to work as the general manager at a commercial real estate firm at an annual base salary of  #36;70,000. Soon after, a financial planner stopped by his desk to drop off brochures about insurance benefits available through his employer. But, at 32, "buying insurance was the furthest thing from my mind," says Riley.', "Wall St. Bears Claw Back Into the Black  NEW YORK (Reuters) - Short-sellers, Wall Street's dwindling  band of ultra-cynics, are seeing green again.", "Oil and Economy Cloud Stocks' Outlook  NEW YORK (Reuters) - Soaring crude prices plus worries  about the economy and the outlook for earnings are expected to  hang over the stock market next week during the depth of the  summer doldrums.", "No Need for OPEC to Pump More-Iran Gov  TEHRAN (Reuters) - OPEC can do nothing to douse scorching  oil prices when markets are already oversupplied by 2.8 million  barrels per day (bpd) of crude, Iran's OPEC governor said  Saturday, warning that prices could fall sharply.")
batch index: 3,
 label: tensor([2, 2, 2, 2]),
 batch: ('Non-OPEC Nations Should Up Output-Purnomo  JAKARTA (Reuters) - Non-OPEC oil exporters should consider  increasing output to cool record crude prices, OPEC President  Purnomo Yusgiantoro said on Sunday.', "Google IPO Auction Off to Rocky Start  WASHINGTON/NEW YORK (Reuters) - The auction for Google  Inc.'s highly anticipated initial public offering got off to a  rocky start on Friday after the Web search company sidestepped  a bullet from U.S. securities regulators.", "Dollar Falls Broadly on Record Trade Gap  NEW YORK (Reuters) - The dollar tumbled broadly on Friday  after data showing a record U.S. trade deficit in June cast  fresh doubts on the economy's recovery and its ability to draw  foreign capital to fund the growing gap.", "Rescuing an Old Saver If you think you may need to help your elderly relatives with their finances, don't be shy about having the money talk -- soon.")
batch index: 4,
 label: tensor([2, 2, 2, 2]),
 batch: ('Kids Rule for Back-to-School The purchasing power of kids is a big part of why the back-to-school season has become such a huge marketing phenomenon.', "In a Down Market, Head Toward Value Funds There is little cause for celebration in the stock market these days, but investors in value-focused mutual funds have reason to feel a bit smug -- if only because they've lost less than the folks who stuck with growth.", 'US trade deficit swells in June The US trade deficit has exploded 19 to a record \\$55.8bn as oil costs drove imports higher, according to a latest figures.', "Shell 'could be target for Total' Oil giant Shell could be bracing itself for a takeover attempt, possibly from French rival Total, a  press report claims.")
batch index: 5,
 label: tensor([2, 2, 2, 2]),
 batch: ("Google IPO faces Playboy slip-up The bidding gets underway for Google's public offering, despite last-minute worries over an interview with its bosses in Playboy magazine.", 'Eurozone economy keeps growing Official figures show the 12-nation eurozone economy continues to grow, but there are warnings it may slow down later in the year.', 'Expansion slows in Japan Economic growth in Japan slows down as the country experiences a drop in domestic and corporate spending.', 'Rand falls on shock SA rate cut Interest rates are trimmed to 7.5 by the South African central bank,  but the lack of warning hits the rand and surprises markets.')
batch index: 6,
 label: tensor([2]),
 batch: ('Car prices down across the board The cost of buying both new and second hand cars fell sharply over the past five years, a new survey has found.',)


The data loading order and Sampler

A DataLoader leverages a Sampler to generate indices of dataset for each batch. We can use either the parameter shuffle or sampler to adjust the order of indices.


SequentialSampler

By default, the parameter shuffle is False. And the DataLoader will use a SequentialSampler that returns indices sequentially.

news_dataloader = DataLoader(news_dataset,batch_size=4)
type(news_dataloader.sampler)
torch.utils.data.sampler.SequentialSampler

We can also explicitly create a SequentialSampler and pass it into a DataLoader using the parameter sampler.

# initiate a SequentialSampler
from torch.utils.data import SequentialSampler
sequential_news_sampler = SequentialSampler(news_dataset)

# let's print out the indices in the SequentialSampler
for i in sequential_news_sampler:
  print(i)

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
news_dataloader = DataLoader(news_dataset,batch_size=4,sampler = sequential_news_sampler )

for batch_index, (batch,label) in enumerate(news_dataloader):
  print(f'batch index: {batch_index},\n label: {label},\n batch: {batch}')
batch index: 0,
 label: tensor([2, 2, 2, 2]),
 batch: ("Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again.", 'Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\\which has a reputation for making well-timed and occasionally\\controversial plays in the defense industry, has quietly placed\\its bets on another part of the market.', "Oil and Economy Cloud Stocks' Outlook (Reuters) Reuters - Soaring crude prices plus worries\\about the economy and the outlook for earnings are expected to\\hang over the stock market next week during the depth of the\\summer doldrums.", 'Iraq Halts Oil Exports from Main Southern Pipeline (Reuters) Reuters - Authorities have halted oil export\\flows from the main pipeline in southern Iraq after\\intelligence showed a rebel militia could strike\\infrastructure, an oil official said on Saturday.')
batch index: 1,
 label: tensor([2, 2, 2, 2]),
 batch: ('Oil prices soar to all-time record, posing new menace to US economy (AFP) AFP - Tearaway world oil prices, toppling records and straining wallets, present a new economic menace barely three months before the US presidential elections.', 'Stocks End Up, But Near Year Lows (Reuters) Reuters - Stocks ended slightly higher on Friday\\but stayed near lows for the year as oil prices surged past  #36;46\\a barrel, offsetting a positive outlook from computer maker\\Dell Inc. (DELL.O)', "Money Funds Fell in Latest Week (AP) AP - Assets of the nation's retail money market mutual funds fell by  #36;1.17 billion in the latest week to  #36;849.98 trillion, the Investment Company Institute said Thursday.", 'Fed minutes show dissent over inflation (USATODAY.com) USATODAY.com - Retail sales bounced back a bit in July, and new claims for jobless benefits fell last week, the government said Thursday, indicating the economy is improving from a midsummer slump.')
batch index: 2,
 label: tensor([2, 2, 2, 2]),
 batch: ('Safety Net (Forbes.com) Forbes.com - After earning a PH.D. in Sociology, Danny Bazil Riley started to work as the general manager at a commercial real estate firm at an annual base salary of  #36;70,000. Soon after, a financial planner stopped by his desk to drop off brochures about insurance benefits available through his employer. But, at 32, "buying insurance was the furthest thing from my mind," says Riley.', "Wall St. Bears Claw Back Into the Black  NEW YORK (Reuters) - Short-sellers, Wall Street's dwindling  band of ultra-cynics, are seeing green again.", "Oil and Economy Cloud Stocks' Outlook  NEW YORK (Reuters) - Soaring crude prices plus worries  about the economy and the outlook for earnings are expected to  hang over the stock market next week during the depth of the  summer doldrums.", "No Need for OPEC to Pump More-Iran Gov  TEHRAN (Reuters) - OPEC can do nothing to douse scorching  oil prices when markets are already oversupplied by 2.8 million  barrels per day (bpd) of crude, Iran's OPEC governor said  Saturday, warning that prices could fall sharply.")
batch index: 3,
 label: tensor([2, 2, 2, 2]),
 batch: ('Non-OPEC Nations Should Up Output-Purnomo  JAKARTA (Reuters) - Non-OPEC oil exporters should consider  increasing output to cool record crude prices, OPEC President  Purnomo Yusgiantoro said on Sunday.', "Google IPO Auction Off to Rocky Start  WASHINGTON/NEW YORK (Reuters) - The auction for Google  Inc.'s highly anticipated initial public offering got off to a  rocky start on Friday after the Web search company sidestepped  a bullet from U.S. securities regulators.", "Dollar Falls Broadly on Record Trade Gap  NEW YORK (Reuters) - The dollar tumbled broadly on Friday  after data showing a record U.S. trade deficit in June cast  fresh doubts on the economy's recovery and its ability to draw  foreign capital to fund the growing gap.", "Rescuing an Old Saver If you think you may need to help your elderly relatives with their finances, don't be shy about having the money talk -- soon.")
batch index: 4,
 label: tensor([2, 2, 2, 2]),
 batch: ('Kids Rule for Back-to-School The purchasing power of kids is a big part of why the back-to-school season has become such a huge marketing phenomenon.', "In a Down Market, Head Toward Value Funds There is little cause for celebration in the stock market these days, but investors in value-focused mutual funds have reason to feel a bit smug -- if only because they've lost less than the folks who stuck with growth.", 'US trade deficit swells in June The US trade deficit has exploded 19 to a record \\$55.8bn as oil costs drove imports higher, according to a latest figures.', "Shell 'could be target for Total' Oil giant Shell could be bracing itself for a takeover attempt, possibly from French rival Total, a  press report claims.")
batch index: 5,
 label: tensor([2, 2, 2, 2]),
 batch: ("Google IPO faces Playboy slip-up The bidding gets underway for Google's public offering, despite last-minute worries over an interview with its bosses in Playboy magazine.", 'Eurozone economy keeps growing Official figures show the 12-nation eurozone economy continues to grow, but there are warnings it may slow down later in the year.', 'Expansion slows in Japan Economic growth in Japan slows down as the country experiences a drop in domestic and corporate spending.', 'Rand falls on shock SA rate cut Interest rates are trimmed to 7.5 by the South African central bank,  but the lack of warning hits the rand and surprises markets.')
batch index: 6,
 label: tensor([2]),
 batch: ('Car prices down across the board The cost of buying both new and second hand cars fell sharply over the past five years, a new survey has found.',)


RandomSampler

To randomize the data loading order in news_dataloader, we can

1. set the parameter shuffle to True

news_dataloader = DataLoader(news_dataset,batch_size=4,shuffle=True)
for batch_index, (batch,label) in enumerate(news_dataloader):
  print(f'batch index: {batch_index},\n label: {label},\n batch: {batch}')
batch index: 0,
 label: tensor([2, 2, 2, 2]),
 batch: ('US trade deficit swells in June The US trade deficit has exploded 19 to a record \\$55.8bn as oil costs drove imports higher, according to a latest figures.', 'Kids Rule for Back-to-School The purchasing power of kids is a big part of why the back-to-school season has become such a huge marketing phenomenon.', 'Car prices down across the board The cost of buying both new and second hand cars fell sharply over the past five years, a new survey has found.', 'Fed minutes show dissent over inflation (USATODAY.com) USATODAY.com - Retail sales bounced back a bit in July, and new claims for jobless benefits fell last week, the government said Thursday, indicating the economy is improving from a midsummer slump.')
batch index: 1,
 label: tensor([2, 2, 2, 2]),
 batch: ('Safety Net (Forbes.com) Forbes.com - After earning a PH.D. in Sociology, Danny Bazil Riley started to work as the general manager at a commercial real estate firm at an annual base salary of  #36;70,000. Soon after, a financial planner stopped by his desk to drop off brochures about insurance benefits available through his employer. But, at 32, "buying insurance was the furthest thing from my mind," says Riley.', 'Rand falls on shock SA rate cut Interest rates are trimmed to 7.5 by the South African central bank,  but the lack of warning hits the rand and surprises markets.', "Shell 'could be target for Total' Oil giant Shell could be bracing itself for a takeover attempt, possibly from French rival Total, a  press report claims.", "Rescuing an Old Saver If you think you may need to help your elderly relatives with their finances, don't be shy about having the money talk -- soon.")
batch index: 2,
 label: tensor([2, 2, 2, 2]),
 batch: ("Google IPO Auction Off to Rocky Start  WASHINGTON/NEW YORK (Reuters) - The auction for Google  Inc.'s highly anticipated initial public offering got off to a  rocky start on Friday after the Web search company sidestepped  a bullet from U.S. securities regulators.", 'Eurozone economy keeps growing Official figures show the 12-nation eurozone economy continues to grow, but there are warnings it may slow down later in the year.', "Google IPO faces Playboy slip-up The bidding gets underway for Google's public offering, despite last-minute worries over an interview with its bosses in Playboy magazine.", 'Oil prices soar to all-time record, posing new menace to US economy (AFP) AFP - Tearaway world oil prices, toppling records and straining wallets, present a new economic menace barely three months before the US presidential elections.')
batch index: 3,
 label: tensor([2, 2, 2, 2]),
 batch: ("Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again.", "No Need for OPEC to Pump More-Iran Gov  TEHRAN (Reuters) - OPEC can do nothing to douse scorching  oil prices when markets are already oversupplied by 2.8 million  barrels per day (bpd) of crude, Iran's OPEC governor said  Saturday, warning that prices could fall sharply.", "Oil and Economy Cloud Stocks' Outlook  NEW YORK (Reuters) - Soaring crude prices plus worries  about the economy and the outlook for earnings are expected to  hang over the stock market next week during the depth of the  summer doldrums.", 'Stocks End Up, But Near Year Lows (Reuters) Reuters - Stocks ended slightly higher on Friday\\but stayed near lows for the year as oil prices surged past  #36;46\\a barrel, offsetting a positive outlook from computer maker\\Dell Inc. (DELL.O)')
batch index: 4,
 label: tensor([2, 2, 2, 2]),
 batch: ('Iraq Halts Oil Exports from Main Southern Pipeline (Reuters) Reuters - Authorities have halted oil export\\flows from the main pipeline in southern Iraq after\\intelligence showed a rebel militia could strike\\infrastructure, an oil official said on Saturday.', "Oil and Economy Cloud Stocks' Outlook (Reuters) Reuters - Soaring crude prices plus worries\\about the economy and the outlook for earnings are expected to\\hang over the stock market next week during the depth of the\\summer doldrums.", 'Expansion slows in Japan Economic growth in Japan slows down as the country experiences a drop in domestic and corporate spending.', "Wall St. Bears Claw Back Into the Black  NEW YORK (Reuters) - Short-sellers, Wall Street's dwindling  band of ultra-cynics, are seeing green again.")
batch index: 5,
 label: tensor([2, 2, 2, 2]),
 batch: ('Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\\which has a reputation for making well-timed and occasionally\\controversial plays in the defense industry, has quietly placed\\its bets on another part of the market.', "Money Funds Fell in Latest Week (AP) AP - Assets of the nation's retail money market mutual funds fell by  #36;1.17 billion in the latest week to  #36;849.98 trillion, the Investment Company Institute said Thursday.", "Dollar Falls Broadly on Record Trade Gap  NEW YORK (Reuters) - The dollar tumbled broadly on Friday  after data showing a record U.S. trade deficit in June cast  fresh doubts on the economy's recovery and its ability to draw  foreign capital to fund the growing gap.", "In a Down Market, Head Toward Value Funds There is little cause for celebration in the stock market these days, but investors in value-focused mutual funds have reason to feel a bit smug -- if only because they've lost less than the folks who stuck with growth.")
batch index: 6,
 label: tensor([2]),
 batch: ('Non-OPEC Nations Should Up Output-Purnomo  JAKARTA (Reuters) - Non-OPEC oil exporters should consider  increasing output to cool record crude prices, OPEC President  Purnomo Yusgiantoro said on Sunday.',)

Note that the dataloader will automatically use RandomSampler as its sampler when shuffle is True

type(news_dataloader.sampler)
torch.utils.data.sampler.RandomSampler

We can also

2. Pass a RandomSampler to a DataLoader

RandomSampler returns random indices.

# initiate a RandomSampler
from torch.utils.data import RandomSampler
random_news_sampler = RandomSampler(news_dataset)

# print out the indices in the RandomSampler
for i in random_news_sampler:
  print(i)

15
23
16
8
1
3
7
20
12
13
9
24
0
19
5
22
6
18
21
11
10
2
17
4
14

We can then pass random_news_sampler to the dataloader using the parameter sampler.

random_news_dataloader = DataLoader(news_dataset,batch_size=4,sampler=random_news_sampler)

for batch_index, (batch,label) in enumerate(random_news_dataloader):
  print(f'batch index: {batch_index},\n label: {label},\n batch: {batch}')
batch index: 0,
 label: tensor([2, 2, 2, 2]),
 batch: ('Car prices down across the board The cost of buying both new and second hand cars fell sharply over the past five years, a new survey has found.', "In a Down Market, Head Toward Value Funds There is little cause for celebration in the stock market these days, but investors in value-focused mutual funds have reason to feel a bit smug -- if only because they've lost less than the folks who stuck with growth.", "Oil and Economy Cloud Stocks' Outlook  NEW YORK (Reuters) - Soaring crude prices plus worries  about the economy and the outlook for earnings are expected to  hang over the stock market next week during the depth of the  summer doldrums.", 'Safety Net (Forbes.com) Forbes.com - After earning a PH.D. in Sociology, Danny Bazil Riley started to work as the general manager at a commercial real estate firm at an annual base salary of  #36;70,000. Soon after, a financial planner stopped by his desk to drop off brochures about insurance benefits available through his employer. But, at 32, "buying insurance was the furthest thing from my mind," says Riley.')
batch index: 1,
 label: tensor([2, 2, 2, 2]),
 batch: ('Fed minutes show dissent over inflation (USATODAY.com) USATODAY.com - Retail sales bounced back a bit in July, and new claims for jobless benefits fell last week, the government said Thursday, indicating the economy is improving from a midsummer slump.', 'Stocks End Up, But Near Year Lows (Reuters) Reuters - Stocks ended slightly higher on Friday\\but stayed near lows for the year as oil prices surged past  #36;46\\a barrel, offsetting a positive outlook from computer maker\\Dell Inc. (DELL.O)', 'US trade deficit swells in June The US trade deficit has exploded 19 to a record \\$55.8bn as oil costs drove imports higher, according to a latest figures.', 'Eurozone economy keeps growing Official figures show the 12-nation eurozone economy continues to grow, but there are warnings it may slow down later in the year.')
batch index: 2,
 label: tensor([2, 2, 2, 2]),
 batch: ('Iraq Halts Oil Exports from Main Southern Pipeline (Reuters) Reuters - Authorities have halted oil export\\flows from the main pipeline in southern Iraq after\\intelligence showed a rebel militia could strike\\infrastructure, an oil official said on Saturday.', 'Oil prices soar to all-time record, posing new menace to US economy (AFP) AFP - Tearaway world oil prices, toppling records and straining wallets, present a new economic menace barely three months before the US presidential elections.', 'Non-OPEC Nations Should Up Output-Purnomo  JAKARTA (Reuters) - Non-OPEC oil exporters should consider  increasing output to cool record crude prices, OPEC President  Purnomo Yusgiantoro said on Sunday.', "No Need for OPEC to Pump More-Iran Gov  TEHRAN (Reuters) - OPEC can do nothing to douse scorching  oil prices when markets are already oversupplied by 2.8 million  barrels per day (bpd) of crude, Iran's OPEC governor said  Saturday, warning that prices could fall sharply.")
batch index: 3,
 label: tensor([2, 2, 2, 2]),
 batch: ("Oil and Economy Cloud Stocks' Outlook (Reuters) Reuters - Soaring crude prices plus worries\\about the economy and the outlook for earnings are expected to\\hang over the stock market next week during the depth of the\\summer doldrums.", 'Expansion slows in Japan Economic growth in Japan slows down as the country experiences a drop in domestic and corporate spending.', "Shell 'could be target for Total' Oil giant Shell could be bracing itself for a takeover attempt, possibly from French rival Total, a  press report claims.", 'Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\\which has a reputation for making well-timed and occasionally\\controversial plays in the defense industry, has quietly placed\\its bets on another part of the market.')
batch index: 4,
 label: tensor([2, 2, 2, 2]),
 batch: ("Google IPO faces Playboy slip-up The bidding gets underway for Google's public offering, despite last-minute worries over an interview with its bosses in Playboy magazine.", "Dollar Falls Broadly on Record Trade Gap  NEW YORK (Reuters) - The dollar tumbled broadly on Friday  after data showing a record U.S. trade deficit in June cast  fresh doubts on the economy's recovery and its ability to draw  foreign capital to fund the growing gap.", 'Rand falls on shock SA rate cut Interest rates are trimmed to 7.5 by the South African central bank,  but the lack of warning hits the rand and surprises markets.', "Google IPO Auction Off to Rocky Start  WASHINGTON/NEW YORK (Reuters) - The auction for Google  Inc.'s highly anticipated initial public offering got off to a  rocky start on Friday after the Web search company sidestepped  a bullet from U.S. securities regulators.")
batch index: 5,
 label: tensor([2, 2, 2, 2]),
 batch: ("Rescuing an Old Saver If you think you may need to help your elderly relatives with their finances, don't be shy about having the money talk -- soon.", "Money Funds Fell in Latest Week (AP) AP - Assets of the nation's retail money market mutual funds fell by  #36;1.17 billion in the latest week to  #36;849.98 trillion, the Investment Company Institute said Thursday.", "Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again.", 'Kids Rule for Back-to-School The purchasing power of kids is a big part of why the back-to-school season has become such a huge marketing phenomenon.')
batch index: 6,
 label: tensor([2]),
 batch: ("Wall St. Bears Claw Back Into the Black  NEW YORK (Reuters) - Short-sellers, Wall Street's dwindling  band of ultra-cynics, are seeing green again.",)

Get the random indices from RandomSampler

Sometimes, we might want to get the random indices from the DataLoader.

One way to do this is to let the __getitem__ method in Dataset return a triple (index, data, label)

Define a new NewsDataset class:

class NewsDataset(Dataset):
    def __init__(self, df):
        self.df = df
        self.texts = df['text']
        self.labels = df['label']

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        text = self.texts.iloc[idx]
        label = self.labels.iloc[idx]
        return idx,text,label # <--- make the dataset return the index as well

Let's see if the NewsDataset works as desired.

# Initiate a NewsDataset
news_dataset = NewsDataset(news_df)
# Access the news of index 8
news_dataset[8]
(8,
 'Safety Net (Forbes.com) Forbes.com - After earning a PH.D. in Sociology, Danny Bazil Riley started to work as the general manager at a commercial real estate firm at an annual base salary of  #36;70,000. Soon after, a financial planner stopped by his desk to drop off brochures about insurance benefits available through his employer. But, at 32, "buying insurance was the furthest thing from my mind," says Riley.',
 2)

You can see in the output that now the news_dataset also returns the index. We can then create a DataLoader with this news_dataset.

# initiate a news RandomSampler
random_news_sampler = RandomSampler(news_dataset)
# initiate a news DataLoader
random_news_dataloader = DataLoader(news_dataset,batch_size = 4,sampler=random_news_sampler)

When looping through the batches, use the according triple (index, batch, label) to store what we get form the dataloader.

for batch_index, (index,batch,label) in enumerate(random_news_dataloader):
  print(f'batch index: {batch_index},\n index: {index},\n label: {label},\n batch: {batch}')
batch index: 0,
 index: tensor([ 1,  4,  7, 13]),
 label: tensor([2, 2, 2, 2]),
 batch: ('Carlyle Looks Toward Commercial Aerospace (Reuters) Reuters - Private investment firm Carlyle Group,\\which has a reputation for making well-timed and occasionally\\controversial plays in the defense industry, has quietly placed\\its bets on another part of the market.', 'Oil prices soar to all-time record, posing new menace to US economy (AFP) AFP - Tearaway world oil prices, toppling records and straining wallets, present a new economic menace barely three months before the US presidential elections.', 'Fed minutes show dissent over inflation (USATODAY.com) USATODAY.com - Retail sales bounced back a bit in July, and new claims for jobless benefits fell last week, the government said Thursday, indicating the economy is improving from a midsummer slump.', "Google IPO Auction Off to Rocky Start  WASHINGTON/NEW YORK (Reuters) - The auction for Google  Inc.'s highly anticipated initial public offering got off to a  rocky start on Friday after the Web search company sidestepped  a bullet from U.S. securities regulators.")
batch index: 1,
 index: tensor([20, 11, 21, 17]),
 label: tensor([2, 2, 2, 2]),
 batch: ("Google IPO faces Playboy slip-up The bidding gets underway for Google's public offering, despite last-minute worries over an interview with its bosses in Playboy magazine.", "No Need for OPEC to Pump More-Iran Gov  TEHRAN (Reuters) - OPEC can do nothing to douse scorching  oil prices when markets are already oversupplied by 2.8 million  barrels per day (bpd) of crude, Iran's OPEC governor said  Saturday, warning that prices could fall sharply.", 'Eurozone economy keeps growing Official figures show the 12-nation eurozone economy continues to grow, but there are warnings it may slow down later in the year.', "In a Down Market, Head Toward Value Funds There is little cause for celebration in the stock market these days, but investors in value-focused mutual funds have reason to feel a bit smug -- if only because they've lost less than the folks who stuck with growth.")
batch index: 2,
 index: tensor([23, 22, 19, 14]),
 label: tensor([2, 2, 2, 2]),
 batch: ('Rand falls on shock SA rate cut Interest rates are trimmed to 7.5 by the South African central bank,  but the lack of warning hits the rand and surprises markets.', 'Expansion slows in Japan Economic growth in Japan slows down as the country experiences a drop in domestic and corporate spending.', "Shell 'could be target for Total' Oil giant Shell could be bracing itself for a takeover attempt, possibly from French rival Total, a  press report claims.", "Dollar Falls Broadly on Record Trade Gap  NEW YORK (Reuters) - The dollar tumbled broadly on Friday  after data showing a record U.S. trade deficit in June cast  fresh doubts on the economy's recovery and its ability to draw  foreign capital to fund the growing gap.")
batch index: 3,
 index: tensor([ 6, 12,  2,  5]),
 label: tensor([2, 2, 2, 2]),
 batch: ("Money Funds Fell in Latest Week (AP) AP - Assets of the nation's retail money market mutual funds fell by  #36;1.17 billion in the latest week to  #36;849.98 trillion, the Investment Company Institute said Thursday.", 'Non-OPEC Nations Should Up Output-Purnomo  JAKARTA (Reuters) - Non-OPEC oil exporters should consider  increasing output to cool record crude prices, OPEC President  Purnomo Yusgiantoro said on Sunday.', "Oil and Economy Cloud Stocks' Outlook (Reuters) Reuters - Soaring crude prices plus worries\\about the economy and the outlook for earnings are expected to\\hang over the stock market next week during the depth of the\\summer doldrums.", 'Stocks End Up, But Near Year Lows (Reuters) Reuters - Stocks ended slightly higher on Friday\\but stayed near lows for the year as oil prices surged past  #36;46\\a barrel, offsetting a positive outlook from computer maker\\Dell Inc. (DELL.O)')
batch index: 4,
 index: tensor([ 0, 15, 24,  9]),
 label: tensor([2, 2, 2, 2]),
 batch: ("Wall St. Bears Claw Back Into the Black (Reuters) Reuters - Short-sellers, Wall Street's dwindling\\band of ultra-cynics, are seeing green again.", "Rescuing an Old Saver If you think you may need to help your elderly relatives with their finances, don't be shy about having the money talk -- soon.", 'Car prices down across the board The cost of buying both new and second hand cars fell sharply over the past five years, a new survey has found.', "Wall St. Bears Claw Back Into the Black  NEW YORK (Reuters) - Short-sellers, Wall Street's dwindling  band of ultra-cynics, are seeing green again.")
batch index: 5,
 index: tensor([ 8,  3, 18, 16]),
 label: tensor([2, 2, 2, 2]),
 batch: ('Safety Net (Forbes.com) Forbes.com - After earning a PH.D. in Sociology, Danny Bazil Riley started to work as the general manager at a commercial real estate firm at an annual base salary of  #36;70,000. Soon after, a financial planner stopped by his desk to drop off brochures about insurance benefits available through his employer. But, at 32, "buying insurance was the furthest thing from my mind," says Riley.', 'Iraq Halts Oil Exports from Main Southern Pipeline (Reuters) Reuters - Authorities have halted oil export\\flows from the main pipeline in southern Iraq after\\intelligence showed a rebel militia could strike\\infrastructure, an oil official said on Saturday.', 'US trade deficit swells in June The US trade deficit has exploded 19 to a record \\$55.8bn as oil costs drove imports higher, according to a latest figures.', 'Kids Rule for Back-to-School The purchasing power of kids is a big part of why the back-to-school season has become such a huge marketing phenomenon.')
batch index: 6,
 index: tensor([10]),
 label: tensor([2]),
 batch: ("Oil and Economy Cloud Stocks' Outlook  NEW YORK (Reuters) - Soaring crude prices plus worries  about the economy and the outlook for earnings are expected to  hang over the stock market next week during the depth of the  summer doldrums.",)

You can see that now the batches also contain the information of indices.