
Get class labels from Keras functional model

I have a functional model in Keras (Resnet50 from repo examples). I trained it with ImageDataGenerator and flow_from_directory data and saved model to .h5 file. When I call model.predict I get an array of class probabilities. But I want to associate them with class labels (in my case - folder names). How can I get them? I found that I could use model.predict_classes and model.predict_proba, but I don't have these functions in Functional model, only in Sequential.


Emilia Apostolova
y_prob = model.predict(x) 
y_classes = y_prob.argmax(axis=-1)

As suggested here.
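A minimal sketch of what argmax does here, using a hand-written array in place of a real model.predict(x) result (the probabilities below are made up for illustration). It also shows that for a 2-D (samples, classes) output, axis=-1 and axis=1 pick the same axis:

```python
import numpy as np

# Hypothetical output of model.predict(x): 3 samples, 4 classes.
y_prob = np.array([
    [0.10, 0.70, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

# axis=-1 is the last axis, which is the class axis here.
y_classes = y_prob.argmax(axis=-1)
print(y_classes)  # [1 0 3]

# For a 2-D array the last axis is axis 1, so both calls agree.
same = np.array_equal(y_classes, y_prob.argmax(axis=1))
```

Using axis=-1 keeps working if the output has extra leading dimensions, which is presumably why it is preferred over a hard-coded 1.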


This gives me the offset, but I already had a way to figure that out... how do I get the label name?
Keras sorts the labels (names of folders in the train directory) by alphabetical order. If you have a list of labels called labels, the predicted label name will be: predicted_label = sorted(labels)[y_classes]
you can also call model.predict_classes to retrieve the highest probability class in a multi-class output vector
Hey @Guillaume could you please point me to the docs where this alphabetical ordering is mentioned? This is a very crucial info that I just don't seem to find anywhere. thanks
Never mind, found it in directory_iterator.py in keras_preprocessing, with this code:

    classes = []
    for subdir in sorted(os.listdir(directory)):
        if os.path.isdir(os.path.join(directory, subdir)):
            classes.append(subdir)
Lokesh Kumar

When one uses flow_from_directory, the problem is how to interpret the probability outputs: how to map them to the class labels, since the way flow_from_directory builds the one-hot vectors is not known a priori.

We can get a dictionary that maps the class labels to the index of the prediction vector that we get as the output when we use

generator= train_datagen.flow_from_directory("train", batch_size=batch_size)
label_map = (generator.class_indices)

The label_map variable is a dictionary like this

{'class_14': 5, 'class_10': 1, 'class_11': 2, 'class_12': 3, 'class_13': 4, 'class_2': 6, 'class_3': 7, 'class_1': 0, 'class_6': 10, 'class_7': 11, 'class_4': 8, 'class_5': 9, 'class_8': 12, 'class_9': 13}

Then from this the relation can be derived between the probability scores and class names.
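One way to derive that relation, sketched with a small made-up dictionary standing in for generator.class_indices (the class names and predicted indices below are hypothetical): invert the mapping so that a predicted index looks up the folder name.

```python
# Hypothetical class_indices mapping, shaped like the one from flow_from_directory.
label_map = {"cat": 0, "dog": 1, "horse": 2}

# Invert it: index -> folder name.
index_to_label = {index: name for name, index in label_map.items()}

# Hypothetical predicted indices, e.g. from y_prob.argmax(axis=-1).
y_classes = [2, 0, 1, 1]

# int() also handles NumPy integer types coming out of argmax.
predicted_labels = [index_to_label[int(i)] for i in y_classes]
print(predicted_labels)  # ['horse', 'cat', 'dog', 'dog']
```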

Basically, you can create this dictionary by this code.

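import os
train_dir = "train"  # the same directory that was passed to flow_from_directory
# Keep only the sub-folders (one per class) and sort them alphabetically
class_names = sorted(d for d in os.listdir(train_dir) if os.path.isdir(os.path.join(train_dir, d)))
name_id_map = dict(zip(class_names, range(len(class_names))))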

The variable name_id_map in the above code contains the same dictionary as the one obtained from the class_indices attribute of the generator returned by flow_from_directory.

Hope this helps!


In my interpretation this answers the actual question - getting the class labels
I agree, I think this should be the accepted answer.
Saved my day <3
Shouldn't we define the dictionary the other way around? dict(zip(range(len(class_names)), class_names)) so that you can directly use the output of argmax as a key?
Bohumir Zamecnik

UPDATE: This is no longer valid for newer Keras versions. Please use argmax() as in the answer from Emilia Apostolova.

The functional API models have just the predict() function which for classification would return the class probabilities. You can then select the most probable classes using the probas_to_classes() utility function. Example:

y_proba = model.predict(x)
y_classes = keras.np_utils.probas_to_classes(y_proba)

This is equivalent to model.predict_classes(x) on the Sequential model.

The reason for this is that the functional API supports a more general class of tasks, where predict_classes() would not make sense.

More info: https://github.com/fchollet/keras/issues/2524


Currently, the code for np_utils.py (see github.com/fchollet/keras/blob/master/keras/utils/np_utils.py) doesn't have a probas_to_classes method. Did they change this into some other function? Please help me.
I have the same issue as @noobalert mentioned, it doesn't have the function.
use y_classes = y_proba.argmax(axis=-1) instead
AttributeError: module 'keras' has no attribute 'np_utils'
@Zach Why axis = -1 and not 1 ?
Hemerson Tacon

In addition to @Emilia Apostolova's answer, to get the ground-truth labels from

generator = train_datagen.flow_from_directory("train", batch_size=batch_size)

just call

y_true_labels = generator.classes

This doesn't seem to give the label names which is what the OP is asking.
As I said, it's a complement to @Emilia Apostolova's answer: using this in addition to what she said, you can get the label names just by using map. In particular, I used this to build the confusion matrix. Back when I posted this, I didn't have the reputation to comment on her answer, which is why I posted it here.
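A rough sketch of the confusion-matrix use mentioned above, with made-up arrays standing in for generator.classes and for the argmax of the predictions. It assumes the generator was created with shuffle=False, since otherwise the order of generator.classes would not line up with the order of the predictions:

```python
import numpy as np

# Hypothetical stand-ins: y_true for generator.classes,
# y_pred for model.predict(...).argmax(axis=-1).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
num_classes = 3

# Rows are true classes, columns are predicted classes.
confusion = np.zeros((num_classes, num_classes), dtype=int)
# np.add.at accumulates correctly when the same (row, col) pair repeats.
np.add.at(confusion, (y_true, y_pred), 1)
print(confusion)
```

Row and column i then correspond to the class whose value in class_indices is i.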
Joel Carneiro

You must use the label index you already have. Here is what I do for text classification:

# data labels = [1, 2, 1...]
labels_index = { "website" : 0, "money" : 1 ....} 
# to feed model
label_categories = to_categorical(np.asarray(labels)) 

Then, for predictions:

texts = ["hello, rejoins moi sur skype", "bonjour comment ça va ?", "tu me donnes de l'argent"]

sequences = tokenizer.texts_to_sequences(texts)

data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)

predictions = model.predict(data)

for text, probs in zip(texts, predictions):
    print("Prediction for \"%s\": " % (text))
    # Look up each label's score by its index rather than by dict iteration order
    for label, index in labels_index.items():
        print("\t%s ==> %f" % (label, probs[index]))

This gives:

Prediction for "hello, rejoins moi sur skype": 
    website ==> 0.759483
    money ==> 0.037091
    under ==> 0.010587
    camsite ==> 0.114436
    email ==> 0.075975
    abuse ==> 0.002428
Prediction for "bonjour comment ça va ?": 
    website ==> 0.433079
    money ==> 0.084878
    under ==> 0.048375
    camsite ==> 0.036674
    email ==> 0.369197
    abuse ==> 0.027798
Prediction for "tu me donnes de l'argent": 
    website ==> 0.006223
    money ==> 0.095308
    under ==> 0.003586
    camsite ==> 0.003115
    email ==> 0.884112
    abuse ==> 0.007655

Fedor Petrov

It is possible to save a "list" of labels in keras model directly. This way the user who uses the model for predictions and does not have any other sources of information can perform the lookup himself. Here is a dummy example of how one can perform an "injection" of labels

import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense, Lambda

# assume we get labels as list
labels = ["cat", "dog", "horse", "tomato"]
# here we start building our model with input image 299x299 and one output layer
xx = Input(shape=(299, 299, 3))
flat = Flatten()(xx)
output = Dense(4)(flat)
# here we perform injection of labels: one row of labels, tiled once per sample in the batch
tf_labels = tf.constant([labels], dtype=tf.string)
output_labels = Lambda(lambda x: tf.tile(tf_labels, [tf.shape(x)[0], 1]), name="label_injection")(xx)
# and finally creating a model
model = tf.keras.Model(xx, [output, output_labels])

When used for prediction, this model returns a tensor of scores and a tensor of string labels. A model like this can be saved to h5, in which case the file contains the labels. The model can also be exported to saved_model and used for serving in the cloud.


Peter

To map predicted classes and filenames using ImageDataGenerator, I use:

import numpy as np
from keras.preprocessing.image import ImageDataGenerator

# Data generator and prediction
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
        inputpath,
        target_size=(150, 150),
        batch_size=20,
        class_mode='categorical',
        shuffle=False)
# On newer Keras versions, where predict_generator was removed, call model.predict(test_generator, ...) instead
pred = model.predict_generator(test_generator, steps=len(test_generator), verbose=0)
# Get classes by max element in np (as a list)
classes = list(np.argmax(pred, axis=1))
# Get filenames (shuffle=False in the generator is important, so the order matches pred)
filenames = test_generator.filenames

I can loop over predicted classes and the associated filename using:

for f in zip(classes, filenames):
    ...
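As a sketch of what the loop body might do, the predicted indices can be turned into label names while pairing them with filenames. All values below are made up. The class_indices dictionary is assumed to come from the generator used for training, not from test_generator, since the test directory may contain only a single placeholder subdirectory:

```python
# Hypothetical stand-ins for the variables computed above.
classes = [1, 0, 1]
filenames = ["temp/a.jpg", "temp/b.jpg", "temp/c.jpg"]

# Hypothetical class_indices taken from the training generator.
class_indices = {"cat": 0, "dog": 1}
index_to_label = {index: name for name, index in class_indices.items()}

# Pair each file with the name of its predicted class.
results = [(f, index_to_label[int(c)]) for c, f in zip(classes, filenames)]
for filename, label in results:
    print(filename, "->", label)
```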

Addendum:

The path inputpath must contain a subdirectory in which the images are actually stored, because the generator looks for subdirectories. The generator prints a message like this during prediction:

Found 283 images belonging to 1 classes.

The 1 classes part refers to the one subdirectory (this comes from the generator and is unrelated to the actual prediction).

So when your inputpath is (for example) C:/images/, the actual images are located in C:/images/temp/.


Could I ask a question about this? I've asked the full question here: stackoverflow.com/questions/71125533/… but for this, is inputpath one directory with a mix of two-class test images, or is it a directory, with two sub-directories, one per class?
@Slowat_Kela: See my update
Thanks a mil. Thought that would help but when I run your code exactly as above, they're all assigned to one class. if I read in each image with image.load_img and img_to_array and predict, they go to different classes. will make it a new question, thanks.
The "one class" comes from the generator (which thinks "this must be one class since it is one subdir"). However, this part is only for data staging. The actual prediction follows after the generator step and will assign classes according to the model prediction.
