How to load a custom dataset with tf.data [Tensorflow]

Daniel Persson

6 years ago

50,750 views

We look into how to create TFRecords and handle images from a custom dataset.
Later we load these records into a model and do some predictions.
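A minimal TF2 sketch of the write-then-read round trip described above. It is not the repository's code: it assumes small synthetic `uint8` images instead of a real dataset and the feature keys `image_raw` and `label`, and the helper names (`write_records`, `load_dataset`) are hypothetical.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def _bytes_feature(value):
    # Wrap a single bytes value in a tf.train.Feature
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _int64_feature(value):
    # Wrap a single integer in a tf.train.Feature
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def write_records(path, images, labels):
    # images: uint8 arrays of identical shape; labels: integer class ids
    with tf.io.TFRecordWriter(path) as writer:
        for img, label in zip(images, labels):
            example = tf.train.Example(features=tf.train.Features(feature={
                "image_raw": _bytes_feature(img.tobytes()),
                "label": _int64_feature(int(label)),
            }))
            writer.write(example.SerializeToString())


def load_dataset(path, shape, batch_size=2):
    feature_spec = {
        "image_raw": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }

    def parse(record):
        parsed = tf.io.parse_single_example(record, feature_spec)
        # Raw bytes carry no shape information, so restore it explicitly
        image = tf.io.decode_raw(parsed["image_raw"], tf.uint8)
        image = tf.reshape(image, shape)
        image = tf.cast(image, tf.float32) / 255.0  # scale to [0, 1]
        return image, tf.cast(parsed["label"], tf.int32)

    return tf.data.TFRecordDataset(path).map(parse).batch(batch_size)


path = os.path.join(tempfile.mkdtemp(), "train.tfrecords")
images = [np.full((4, 4, 3), i * 50, dtype=np.uint8) for i in range(4)]
write_records(path, images, labels=[0, 1, 0, 1])
for batch_images, batch_labels in load_dataset(path, (4, 4, 3)):
    print(batch_images.shape, batch_labels.numpy())
```

The resulting dataset of `(image, label)` batches can be passed directly to a Keras `Model.fit` call.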

Github repository:
https://github.com/kalaspuffar/tensorflow-data

Join the channel to get access to more perks:
https://www.youtube.com/channel/UCnG-TN23lswO6QbvWhMtxpA/join

Or visit my blog at:
https://danielpersson.dev

Outro music: Danomate
- http://danomate.com
- https://www.youtube.com/user/danomate1

#tensorflow #tfdata #dataset

Comments:

Richard Yang - 04.11.2021 16:02

Hi, great video. I'm trying to set up a dataset for multiclass classification. Would this process work for that?

urna kundu - 15.03.2021 16:56

Hi Daniel, could you please share a similar exercise on tabular data with mixed data types, where the preprocessing steps are also done in TensorFlow instead of on another platform? I tried to follow the TensorFlow documentation, but it did not clarify much, as the examples worked on one dataset but failed on another. A guide similar to this one would be helpful.

Alex Waber - 23.08.2020 05:47

This is fantastic, but I can't figure out how to use my new TFRecords in the MNIST DCGAN tutorial. Do you know if this is possible?

Aditya Shukla - 25.06.2020 16:49

First of all, thank you Daniel for this excellent resource. Like others, I also found the TensorFlow documentation very difficult to understand. Secondly, please do a video walk-through of the code for implementing tf.keras.backend.ctc_decode as well.

prasanna annadevara - 17.06.2020 13:27

Thanks a lot!
The only update needed for TF2 in createdataset.py (on GitHub) is changing "tf.python_io.TFRecordWriter" to "tf.io.TFRecordWriter".

sai trinath dubba - 14.06.2020 02:21

Thank you, and subscribed! :)

David Bacelj - 17.05.2020 16:45

Awesome!

Bilal Chandia Baloch - 17.05.2020 00:50

How do I create a text-based TensorFlow dataset?
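One way to build a text dataset with `tf.data`, sketched under the assumption of a hypothetical tab-separated file with one example per line in the form `<text>\t<label>`; the file name and contents are made up for the example.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical input file: one "<text>\t<label>" pair per line
path = os.path.join(tempfile.mkdtemp(), "reviews.txt")
with open(path, "w") as f:
    f.write("great video\t1\nhard to follow\t0\n")


def split_line(line):
    # Split a scalar string into its text and label parts
    parts = tf.strings.split(line, "\t")
    return parts[0], tf.strings.to_number(parts[1], out_type=tf.int32)


dataset = tf.data.TextLineDataset(path).map(split_line)
for text, label in dataset:
    print(text.numpy(), label.numpy())
```

The text can then be turned into token ids, for example with a Keras `TextVectorization` layer, before training.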

H Dubbs - 08.05.2020 16:44

Thank you so much for uploading this! I've been stuck for a few days trying to import pictures for my thesis, and the TF documentation is lousy...

Rody El Hamod - 03.04.2020 15:12

Thank you, it was very helpful, but I have a question. In this video you demonstrated how to transform a dataset of images into TFRecords, but I'm having trouble transforming a dataset of videos into TFRecords. Do you have any idea how to define the features for this task?
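One option for video (not covered in the video itself) is `tf.train.SequenceExample`, which stores per-frame features in a feature list alongside per-video context features such as the label. The sketch below assumes every frame is an RGB `uint8` array of the same size and PNG-encodes each frame; the keys `frames` and `label` and the function names are illustrative.

```python
import numpy as np
import tensorflow as tf


def video_to_sequence_example(frames, label):
    # frames: list of uint8 arrays (H, W, 3); each frame is stored PNG-encoded
    encoded = [tf.io.encode_png(f).numpy() for f in frames]
    frame_list = tf.train.FeatureList(feature=[
        tf.train.Feature(bytes_list=tf.train.BytesList(value=[b]))
        for b in encoded
    ])
    return tf.train.SequenceExample(
        context=tf.train.Features(feature={
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        }),
        feature_lists=tf.train.FeatureLists(feature_list={"frames": frame_list}),
    )


def parse_video(serialized):
    # Returns (context dict, sequence dict)
    context, sequence = tf.io.parse_single_sequence_example(
        serialized,
        context_features={"label": tf.io.FixedLenFeature([], tf.int64)},
        sequence_features={"frames": tf.io.FixedLenSequenceFeature([], tf.string)},
    )
    # Decode every encoded frame back into an (H, W, 3) uint8 tensor
    frames = tf.map_fn(
        lambda b: tf.io.decode_png(b, channels=3),
        sequence["frames"],
        fn_output_signature=tf.uint8,
    )
    return frames, context["label"]


frames = [np.full((8, 8, 3), i, dtype=np.uint8) for i in range(5)]
example = video_to_sequence_example(frames, label=2)
decoded, label = parse_video(example.SerializeToString())
print(decoded.shape, int(label))
```

Serialized `SequenceExample`s are written with `tf.io.TFRecordWriter` exactly like ordinary `Example`s; videos of different lengths simply produce feature lists of different lengths.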

gearstil - 26.03.2020 20:56

Thank you for taking the time to make this video. It was exactly what I needed to start my development, like a spark to start an engine; before, I wasn't even able to start a simple training. Now I have discovered, for example, that to get smaller record files you can read the already-encoded image file with tf.gfile.GFile and store those bytes instead of the decoded pixels. Thanks again!
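A minimal TF2 sketch of that trick (`tf.gfile.GFile` is `tf.io.gfile.GFile` in TF2). The compressed file bytes are stored unchanged and only decoded when the records are read, so each record stays close to the size of the image file, at the cost of decoding on every read. The feature keys, function names and synthetic PNG are assumptions for the example.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def encoded_image_example(image_path, label):
    # Store the compressed file bytes as-is instead of decoded pixels
    with tf.io.gfile.GFile(image_path, "rb") as f:
        encoded = f.read()
    return tf.train.Example(features=tf.train.Features(feature={
        "image_encoded": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))


def parse_encoded(record):
    parsed = tf.io.parse_single_example(record, {
        "image_encoded": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    # decode_image handles JPEG, PNG, GIF and BMP; expand_animations=False
    # guarantees a 3-D (H, W, C) result
    image = tf.io.decode_image(parsed["image_encoded"], channels=3,
                               expand_animations=False)
    return image, parsed["label"]


path = os.path.join(tempfile.mkdtemp(), "sample.png")
pixels = np.zeros((64, 64, 3), dtype=np.uint8)  # a flat black test image
tf.io.write_file(path, tf.io.encode_png(pixels))

example = encoded_image_example(path, label=1)
image, label = parse_encoded(example.SerializeToString())
# Size of the serialized record vs. the raw pixel buffer
print(len(example.SerializeToString()), pixels.nbytes)
```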

Preetham Rakshith - 24.03.2020 19:26

You are an absolute legend. Thank you!

Equity, Truth and Justice - 23.03.2020 10:43

This video has won a subscriber. I was literally pulling my hair out. Thanks a million.

Yoav Bachrach - 31.01.2020 12:15

Hi, how do I create a TensorFlow dataset (tfds) with pictures that I already have? I currently have around 16,000 images on my computer, but I don't know how to use them with the TensorFlow code. I'm doing binary classification, testing whether or not there is a human in the picture. Thank you

MakerDude - 17.01.2020 19:37

Great video Daniel. Very clear. I would've loved more depth on the TF specifics like the AdamOptimizer, convolution/max pooling, etc. But solid work.

Ömer Çiftci - 28.11.2019 21:11

Hello Daniel, I have a dataset that contains very similar features: I am trying to recognize granite tiles, and when I use this model my accuracy is stuck at 8.33%. I couldn't find your email; I would be very happy if you could help.

SysAdmin - 21.07.2019 00:56

Does this work for video data sets? Haven’t completed the vid yet

Ting Chan - 26.04.2019 19:14

Hi Daniel,
I'm new to TensorFlow.
I don't understand why num_classes = 3 in the GitHub code when there are only 2 classes (Cat and Dog).
Could you tell me the reason? Thanks!

Pavan Kumar - 08.02.2019 21:00

Hi Daniel, I find other TensorFlow programs really hard to understand because data preparation and modelling are written in a single file. Here they are separated using TFRecords. Thank you so much!! I am now trying to extend it to include TensorBoard. Do you have any TensorBoard samples or examples to share?

MrBombastic199 - 27.01.2019 23:23

I was searching for this, but for non-image data =(

tr rt - 19.01.2019 12:01

Amazing! Do you have a similar example of TFRecords and training with TensorFlow for object detection (detecting more than one label in one image)?

A S - 10.01.2019 09:22

Hi Daniel,
Thanks for the videos. Really helpful. I believe you have a lot of experience, so allow me to ask some questions. Let's say I want to make a binary classifier. What's the minimum number of pictures I should collect to train a convolutional neural network (CNN) classifier and get above 90% accuracy? Also, for multi-class classification, what's the minimum number of pictures I should have in each class to also get above 90% accuracy? Another question: does increasing the number of classes negatively affect the accuracy? Last question: let's say I don't have enough pictures for a CNN classifier. Do you suggest other methods?
Many thanks for your support

Eila Oriel Research - 22.12.2018 02:17

Hi Daniel, thank you so much for your video. Could you please take a look at these attempts and let me know what I am doing wrong? As you said, it is not well documented. Below is code that works with tf.data.TextLineDataset and code that fails with tf.contrib.data.make_csv_dataset.

This is the CSV file (the 3 images were copied to the local directory; once the code works, they will be moved back to gs://):

image,label,feat1,feat2
image1.PNG,1,x1,y1
image2.PNG,2,x2,y2
image3.PNG,3,x3,y3

This code works:

import tensorflow as tf  # TF 1.x API
tf.reset_default_graph()
dataset = tf.data.TextLineDataset('file1.txt').skip(1)  # skip the header row
dataset = dataset.map(lambda row: tf.string_split([row], delimiter=",").values)
dataset = dataset.map(lambda row: tf.image.decode_png(tf.read_file(row[0]), channels=3))
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
with tf.Session() as sess:
    print(sess.run(next_element))

This code doesn't work:

import tensorflow as tf  # TF 1.x API
tf.reset_default_graph()
dataset = tf.contrib.data.make_csv_dataset('file1.txt', batch_size=1, header=True)
dataset = dataset.map(lambda row: tf.image.decode_png(tf.read_file(row['image']), channels=3))
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()
with tf.Session() as sess:
    print(sess.run(next_element))

I get the following error:

ValueError: Shape must be rank 0 but is rank 1 for 'ReadFile' (op: 'ReadFile') with input shapes: [1].
What am I missing (syntax? access to the tensor values?)

Thanks, eilalan
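The error comes from `make_csv_dataset` always returning batched elements: even with `batch_size=1`, `row['image']` is a rank-1 string tensor of shape `[1]`, whereas `read_file` expects a scalar (rank 0), which is exactly what the `ValueError` reports. A minimal TF2 sketch of one fix, assuming a simplified CSV with only `image` and `label` columns and synthetic 2x2 PNGs in a temporary folder: call `unbatch()` before mapping, and pass `num_epochs=1`, because by default the file is repeated forever.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Build a tiny stand-in for the CSV file and its PNG images
workdir = tempfile.mkdtemp()
rows = []
for i in range(1, 4):
    name = os.path.join(workdir, f"image{i}.png")
    tf.io.write_file(name, tf.io.encode_png(np.full((2, 2, 3), i, dtype=np.uint8)))
    rows.append(f"{name},{i}")
csv_path = os.path.join(workdir, "file1.csv")
with open(csv_path, "w") as f:
    f.write("image,label\n" + "\n".join(rows) + "\n")

# make_csv_dataset yields batches: row["image"] has shape [batch_size]
dataset = tf.data.experimental.make_csv_dataset(
    csv_path, batch_size=1, num_epochs=1, shuffle=False, header=True)
# unbatch() turns each column back into scalars, which read_file expects
dataset = dataset.unbatch().map(
    lambda row: tf.io.decode_png(tf.io.read_file(row["image"]), channels=3))

for image in dataset:
    print(image.shape)
```

An alternative is to keep the batches and decode each path inside the batch with `tf.map_fn`, which is useful when the batch size is larger than 1.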

Zehra B. - 04.12.2018 01:29

very useful, thanks a lot

Pavan Kumar - 19.11.2018 15:54

How many epochs are used in this algorithm? I don't see the number of epochs defined anywhere.
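With `tf.data`, the number of epochs is usually expressed in one of two places: `Dataset.repeat(count)` on the input pipeline (calling `repeat()` without an argument repeats forever, so training then stops only after a fixed number of steps), or the `epochs` argument of Keras `Model.fit`. A small sketch on a toy dataset; `NUM_EPOCHS` is a hypothetical value, not taken from the video's code.

```python
import tensorflow as tf

NUM_EPOCHS = 3  # hypothetical value chosen for the example

# A toy dataset of 4 elements in batches of 2, i.e. 2 batches per epoch
dataset = tf.data.Dataset.range(4).batch(2)

# Option 1: repeat the input pipeline; iteration stops after NUM_EPOCHS passes
repeated = dataset.repeat(NUM_EPOCHS)
print(sum(1 for _ in repeated))  # 6 (2 batches x 3 epochs)

# Option 2, with a Keras model: keep the dataset finite and let fit() loop
# model.fit(dataset, epochs=NUM_EPOCHS)
```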

insanity - 17.11.2018 21:46

'image_raw': _bytes_feature(img.tostring())
In this line, why are you converting the image with img.tostring()?
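A `BytesList` feature can only hold raw byte strings, so the image array has to be flattened into bytes before it is stored; `ndarray.tostring()` is a deprecated alias of `ndarray.tobytes()`. A minimal NumPy sketch of what that conversion keeps and what it loses (on the TensorFlow side, `tf.io.decode_raw` plays the role of `np.frombuffer`):

```python
import numpy as np

img = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

# BytesList only stores byte strings, so the array is flattened to bytes.
raw = img.tobytes()
print(len(raw))  # 18: one byte per uint8 value; shape and dtype are not stored

# Reading back requires the original dtype and shape to be known
restored = np.frombuffer(raw, dtype=np.uint8).reshape(img.shape)
print(np.array_equal(restored, img))  # True
```

This is also why the parsing side has to reshape the decoded bytes to a known image size.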

alexanderk - 14.11.2018 18:40

Hi Daniel, thank you for this video!

Sorry for this question, but how can I run this model on an image after training?
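The video's model code is not reproduced on this page, so this is only a generic TF2/Keras sketch: a hypothetical untrained stand-in model takes the place of the trained one, and `IMG_SIZE` and `NUM_CLASSES` are assumptions. The key points are to apply the same preprocessing as during training and to add a batch dimension before calling `predict`.

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 64     # assumption: must match the size used for training
NUM_CLASSES = 2   # assumption: cat vs. dog

# Hypothetical stand-in for the trained model; in practice it would be
# loaded, e.g. with tf.keras.models.load_model(...)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(IMG_SIZE, IMG_SIZE, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])


def predict_image(model, image):
    # Same preprocessing as in training: resize and scale to [0, 1]
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = tf.cast(image, tf.float32) / 255.0
    # The model expects a batch, so add a leading batch dimension of 1
    probs = model.predict(image[tf.newaxis, ...], verbose=0)[0]
    return int(np.argmax(probs)), probs


photo = np.random.randint(0, 256, size=(120, 90, 3), dtype=np.uint8)
label, probs = predict_image(model, photo)
print(label, probs)
```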

Vinzenz Baur - 12.11.2018 13:33

Hi Daniel,
Thank you very much for making this video and for publishing the code on GitHub! This is exactly what I was looking for.

I'm new to python and neural networks and have a problem:
When I'm running create_dataset, I get this Error: [Errno 2] No such file or directory

I saved the folder "PetImages" in my project folder, which I suppose is my current working directory.
I also used the same code you did.
What did I do wrong? Do I have to save the folder somewhere else?

Sorry to bother you with something like this, but I would be glad, if you could help me

Woj paw - 05.11.2018 14:44

Fantastic thank you!

İdris KARAALİ - 26.10.2018 09:39

It is helpful. Thanks for it! :)

Ki Ki - 26.10.2018 04:19

Hi, Daniel.
This video is great and helpful. Thank you. I have a problem with the filenames: should we put the file path before the filenames? When I run my code to print the images and labels, I get 'InvalidArgumentError: Expected image (JPEG, PNG, or GIF), got empty file'.
import os

import tensorflow as tf  # TF 1.x API
from PIL import Image

def train_input_fn():
    filenames = ["mytrain.tfrecords"]
    dataset = tf.data.TFRecordDataset(filenames)

    def parser(record):
        keys_to_features = {
            "image_data": tf.FixedLenFeature((), tf.string, default_value=""),
            "date_time": tf.FixedLenFeature((), tf.int64, default_value=0),
            "label": tf.FixedLenFeature((), tf.int64,
                                        default_value=tf.zeros([], dtype=tf.int64)),
        }
        parsed = tf.parse_single_example(record, keys_to_features)

        image = tf.image.decode_jpeg(parsed["image_data"])
        image = tf.reshape(image, [128, 128, 3])
        label = tf.cast(parsed["label"], tf.int32)

        return {"image_data": image, "date_time": parsed["date_time"]}, label

    dataset = dataset.map(parser)
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(32)
    dataset = dataset.repeat(1)
    iterator = dataset.make_one_shot_iterator()

    features, labels = iterator.get_next()
    return features, labels

features, labels = train_input_fn()
cwd = os.getcwd()  # folder in which the decoded images are saved

# I am trying to print the images and labels.
# (tf.data iterators need no queue runners, and there are no variables to initialize.)
with tf.Session() as sess:
    try:
        for i in range(230):
            batch_features, batch_labels = sess.run([features, labels])
            # Each run returns a batch of 32; save the first image of the batch
            image = batch_features["image_data"][0]
            label = batch_labels[0]
            Image.fromarray(image, 'RGB').save(
                os.path.join(cwd, str(i) + '_Label_' + str(label) + '.jpg'))
            print(image.shape, label)
    except tf.errors.OutOfRangeError:
        pass  # repeat(1): the data ran out before 230 batches
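One likely cause, which is an assumption since the code that wrote mytrain.tfrecords is not shown, is a key mismatch: because `image_data` has `default_value=""`, a record written under a different key parses to an empty string, and `decode_jpeg` then fails with exactly this "empty file" error. A small TF2 sketch for checking which keys a TFRecord file really contains; the helper name `feature_keys` and the sample record are hypothetical.

```python
import os
import tempfile

import tensorflow as tf


def feature_keys(tfrecord_path, max_records=1):
    # Parse the first records as generic tf.train.Example protos to see
    # which keys (and value types) were actually written
    keys = {}
    for raw in tf.data.TFRecordDataset(tfrecord_path).take(max_records):
        example = tf.train.Example.FromString(raw.numpy())
        for key, feature in example.features.feature.items():
            keys[key] = feature.WhichOneof("kind")
    return keys


# A sample file whose image is stored under "image_raw", not "image_data"
path = os.path.join(tempfile.mkdtemp(), "mytrain.tfrecords")
with tf.io.TFRecordWriter(path) as writer:
    example = tf.train.Example(features=tf.train.Features(feature={
        "image_raw": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"\x00"])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    }))
    writer.write(example.SerializeToString())

print(feature_keys(path))

# Parsing with the wrong key silently falls back to the default value
record = next(iter(tf.data.TFRecordDataset(path)))
parsed = tf.io.parse_single_example(record, {
    "image_data": tf.io.FixedLenFeature([], tf.string, default_value=""),
})
print(parsed["image_data"].numpy())  # b''
```

Dropping `default_value` for required features makes such a mismatch fail loudly at parse time instead.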

Jack wang - 20.10.2018 18:23

Very helpful video!!
You're certainly right about the TensorFlow docs, haha.
Instead of struggling through all the pages, I'd recommend your video.
Nice work :)

can uzun - 15.10.2018 16:32

Hi Daniel, thanks for this great video. I am trying to build a GAN with a custom dataset. Can I use train.tfrecord as input data for the GAN? If yes, how can I load the dataset for it?

Pavan Kumar - 12.10.2018 19:47

Thank you so much. This presentation helped me get going. While trying your example, I noticed that tf.identity() was used. May I ask what the purpose of tf.identity() is in this example? Thank you so much
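This page does not show why the example uses `tf.identity`. In general, `tf.identity` returns a tensor with the same value and shape as its input; in graph code it is mostly used to give a tensor a chosen name so that it can be fetched or logged by that name. A small TF2 sketch that shows the named op inside a traced `tf.function` graph (the function and op names are illustrative):

```python
import tensorflow as tf


@tf.function
def scaled_scores(x):
    # tf.identity passes the value through unchanged but creates an op with
    # a chosen name that graph-mode code can look up or log
    return tf.identity(x * 2.0, name="doubled")


print(scaled_scores(tf.constant([1.0, 2.0])).numpy())  # [2. 4.]

graph = scaled_scores.get_concrete_function(
    tf.TensorSpec([None], tf.float32)).graph
names = [op.name for op in graph.get_operations() if op.type == "Identity"]
print(names)
```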

Paul Zimmer - 12.10.2018 12:35

Sorry to bother you again Daniel,
I got everything to work, but I'm getting an accuracy of around 95% straight off the bat, which seems unreasonable to me. Could that be right?
Also, you mentioned in the video that you printed the images; if possible, could you please put that code on your GitHub?
Thank you :)

saurabh dasgupta - 11.10.2018 18:15

Thank you so much. After more than 20 years of developer experience, TF makes me feel that I know nothing, I feel wasted.

Prince Canuma - 05.10.2018 00:33

Hi, how can I load PNG images into a tf.data pipeline without the fuss of converting them to strings and back again? Also, is there an easier way to create TFRecords?
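TFRecords are optional: for images that already sit on disk, a `tf.data` pipeline can read and decode the files directly. A minimal TF2 sketch, assuming the file paths and integer labels are already known; the function name `png_dataset` and the 64x64 target size are illustrative.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def png_dataset(paths, labels, size=(64, 64), batch_size=32):
    # Load PNG files straight from disk; no TFRecord conversion needed
    def load(path, label):
        image = tf.io.decode_png(tf.io.read_file(path), channels=3)
        image = tf.image.resize(image, size) / 255.0  # float32 in [0, 1]
        return image, label

    return (tf.data.Dataset.from_tensor_slices((paths, labels))
            .shuffle(len(paths))
            .map(load, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))


# A few synthetic PNG files standing in for a real image folder
workdir = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(workdir, f"img{i}.png")
    tf.io.write_file(p, tf.io.encode_png(np.zeros((10, 20, 3), dtype=np.uint8)))
    paths.append(p)

for images, labels in png_dataset(paths, [0, 1, 0], batch_size=2):
    print(images.shape, labels.numpy())
```

TFRecords tend to pay off mainly for very large datasets or remote storage, where reading a few large files is faster than opening many small ones.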

Samuel Pearce-Davies - 30.09.2018 14:33

This is great stuff, thank you! :) I'm a bit of a Python newbie (more of a coding newbie in general), and for my Master's I'm trying to set up a basic deep learning system with TensorFlow that learns from a custom dataset. I'm learning from a Pluralsight tutorial on basic TensorFlow usage, half of which I don't actually understand, but I'm hoping I can make some tweaks to get it working without a 100% understanding of what every step is doing.
I'm preparing image files for the dataset (non-standard size of approx 25x256 pixels, greyscale), and hoping I can find a fairly easy way to use them in the tutorial code instead of the MNIST dataset already being used.
Should I be able to use the code from your video to prepare the files and then load them into the tutorial code I'm working with (making a few changes to account for the different size of the images) and have it work properly? Or is it not as simple as that? Sorry if this is a stupid question.

Paul Zimmer - 28.09.2018 22:09

I'm sorry if this seems like a rather trivial question; I am very new to both Python and TensorFlow. Are "addr" and "value" supposed to be replaced with actual values and addresses? I have a gut feeling they shouldn't be, in which case I don't really understand where the algorithm loads the images.

Thank you so much for the video, and for taking the time!

Reddy Tintaya - 24.08.2018 02:21

Does it work with more than just 2 types of images?
I want to create a dataset with 26 types of images.

shubham tawade - 22.08.2018 16:31

For more than two classes, what changes should be made to the above program?

CHETTOUR HAMZA - 03.08.2018 16:29

Hey Daniel, awesome vid, thank you!
I have a question: image = tf.decode_raw(parsed["image_raw"], tf.uint8)
Why don't you set tf.float32 as an output format directly ?
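`tf.decode_raw` (`tf.io.decode_raw` in TF2) does not convert values; it reinterprets the byte string as a sequence of the requested type. Assuming the pixels were written as `uint8` (one byte each, as images read with OpenCV or PIL usually are), decoding as `tf.float32` would group every 4 bytes into one meaningless float, so the usual pattern is to decode as `uint8` and cast afterwards. The sketch below shows the same behaviour with NumPy's `np.frombuffer`, which reinterprets bytes in the same way.

```python
import numpy as np

pixels = np.array([0, 128, 255, 64], dtype=np.uint8)
raw = pixels.tobytes()  # 4 bytes, one per pixel

# Like decode_raw, frombuffer reinterprets bytes; it does not convert values
as_uint8 = np.frombuffer(raw, dtype=np.uint8)
as_float32 = np.frombuffer(raw, dtype=np.float32)  # 4 bytes -> a single float
print(as_uint8.tolist())  # [0, 128, 255, 64]
print(as_float32.size)    # 1

# The usual approach: decode with the dtype that was written, then cast
scaled = as_uint8.astype(np.float32) / 255.0
```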

Vinayak Gosale - 24.07.2018 09:46

I found the TensorFlow documentation quite hard to follow in general.

Tony Li - 28.06.2018 20:14

You look like an absolute unit

Peter Collins - 16.06.2018 01:34

Very helpful!

juan pablo ruiz rodriguez - 16.06.2018 00:49

I was on the edge of having a mental breakdown until I saw your video.

Nguyen Mau Dung - 09.06.2018 18:34

Thank you so much. I'm really interested in this video. Hope to see more videos that explain tensorflow from you in the future! ^^

Siya Shabir - 03.06.2018 21:44

Your video is really nice. I need help setting labels. I want to use string labels for up to 26 classes, but you were only setting 0 and 1. How can I change that in the code? I have not worked much in Python...
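A common approach, sketched here with hypothetical class names, is to map each string label to an integer index, store that integer in the TFRecord (an `Int64List` feature), and give the output layer as many units as there are classes (26 in this case). With `sparse_categorical_crossentropy` the integer labels can be used directly; `tf.one_hot` is only needed for `categorical_crossentropy`.

```python
import tensorflow as tf

# Hypothetical class names; in practice these could be folder names
class_names = sorted(["A", "B", "C"])
label_to_index = {name: i for i, name in enumerate(class_names)}
string_labels = ["B", "A", "C", "B"]

# Store the integer index in the TFRecord, not the string
indices = [label_to_index[name] for name in string_labels]
print(indices)  # [1, 0, 2, 1]

# For a softmax output with len(class_names) units, one-hot encode at read time
one_hot = tf.one_hot(indices, depth=len(class_names))
print(tuple(one_hot.shape))  # (4, 3)
```

Sorting the class names keeps the mapping stable between the script that writes the records and the one that trains the model.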
