
How can I use a pre-trained neural network with grayscale images?

I have a dataset containing grayscale images and I want to train a state-of-the-art CNN on them. I'd very much like to fine-tune a pre-trained model (like the ones here).

The problem is that almost all models I can find the weights for have been trained on the ImageNet dataset, which contains RGB images.

I can't use one of those models because their input layer expects a batch of shape (batch_size, height, width, 3), or (64, 224, 224, 3) in my case, but my image batches are (64, 224, 224).

Is there any way that I can use one of those models? I've thought of dropping the input layer after I've loaded the weights and adding my own (like we do for the top layers). Is this approach correct?

You can try removing the input layer and adding your own. Then you can attempt training only that layer. If you do not see the loss decreasing with all other layers frozen, this approach isn't going to work out for you.
Don't ask us whether this approach is correct: ask the computer! Try it! Another approach is to triple the input vectors: feed the grayscale values to all three color channels.
My personal feeling is that this is not going to work out for you. These classification networks are definitely using interrelationships between colors to classify objects, and this information is deeply ingrained in the weights of the intermediate layers.
@Prune Training these models can take days, I'd appreciate a bit of insight if anyone has encountered this problem before...
As mentioned by others, it's feasible to stack 3 identical greyscale arrays as input, but I would explore this as an opportunity to implement more data augmentation: apply image filters to the original greyscale image and randomly assign them to the 3 channels.

Djib2011

The model's architecture cannot be changed because the weights have been trained for a specific input configuration. Replacing the first layer with your own would pretty much render the rest of the weights useless.

-- Edit: elaboration suggested by Prune-- CNNs are built so that as they go deeper, they can extract high-level features derived from the lower-level features that the previous layers extracted. By removing the initial layers of a CNN, you are destroying that hierarchy of features because the subsequent layers won't receive the features that they are supposed to as their input. In your case the second layer has been trained to expect the features of the first layer. By replacing your first layer with random weights, you are essentially throwing away any training that has been done on the subsequent layers, as they would need to be retrained. I doubt that they could retain any of the knowledge learned during the initial training. --- end edit ---

There is an easy way, though, to make your model work with grayscale images: you just need to make the images appear to be RGB. The easiest way to do so is to repeat the image array 3 times on a new dimension. Because you will have the same image over all 3 channels, the performance of the model should be the same as it was on RGB images.

In numpy this can be easily done like this:

import numpy as np

print(grayscale_batch.shape)  # (64, 224, 224)
# add a channel axis, then repeat the single channel 3 times along it
rgb_batch = np.repeat(grayscale_batch[..., np.newaxis], 3, -1)
print(rgb_batch.shape)  # (64, 224, 224, 3)

The way this works is that it first creates a new dimension (to place the channels) and then it repeats the existing array 3 times on this new dimension.

I'm also pretty sure that keras' ImageDataGenerator can load grayscale images as RGB.
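For example, a minimal sketch (not part of the original answer; train_dir is a hypothetical directory of grayscale image files): flow_from_directory with color_mode='rgb' (the default) loads single-channel files as 3 identical channels.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# 'train_dir' is a hypothetical directory of grayscale image files
datagen = ImageDataGenerator(rescale=1. / 255)
train_gen = datagen.flow_from_directory(
    'train_dir',
    target_size=(224, 224),
    color_mode='rgb',   # default; grayscale files are converted to 3 channels when loaded
    batch_size=64,
)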


Stacking 1-channel images is easy to do, but the question isn't how to make an image 3-channel; it's whether he can use a pretrained model for classification when his original images are 1-channel, and I think the answer is probably no.
This is pretty much the default approach when dealing with grayscale images. I've done it a couple of times and it works fine; it's even the default behavior in Keras' ImageDataGenerator to load the grayscale image repeated 3 times. Think of it as a reverse RGB -> grayscale transform (where gray = (R + G + B) / 3).
This shows how to make the second attempt I suggested; it does not answer the original question. Will this result in valid fine-tuning on the gray-scale input?
The first paragraph of your answer is the direct part: can you elaborate on that to convince OP?
"Replacing the first layer with your own would pretty much render the rest of the weights useless." - are you sure about that? An experiment to check this would be to train a neural network e.g. on ImageNet and see how long it "typically" needs to get to a certain accuracy. Then re-initialize the input layer and see how long it takes to get to that accuracy again. I'm convinced that it would take a lot less time with the initialized network.
rwightman

Converting grayscale images to RGB as per the currently accepted answer is one approach to this problem, but not the most efficient. You most certainly can modify the weights of the model's first convolutional layer and achieve the stated goal. The modified model will both work out of the box (with reduced accuracy) and be finetunable. Modifying the weights of the first layer does not render the rest of the weights useless as suggested by others.

To do this, you'll have to add some code where the pretrained weights are loaded. In your framework of choice, you need to figure out how to grab the weights of the first convolutional layer in your network and modify them before assigning to your 1-channel model. The required modification is to sum the weight tensor over the dimension of the input channels. The way the weights tensor is organized varies from framework to framework. The PyTorch default is [out_channels, in_channels, kernel_height, kernel_width]. In TensorFlow I believe it is [kernel_height, kernel_width, in_channels, out_channels].

Using PyTorch as an example, in a ResNet50 model from Torchvision (https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py), the shape of the weights for conv1 is [64, 3, 7, 7]. Summing over dimension 1 results in a tensor of shape [64, 1, 7, 7]. At the bottom I've included a snippet of code that would work with the ResNet models in Torchvision assuming that an argument (inchans) was added to specify a different number of input channels for the model.

To prove this works I did three runs of ImageNet validation on ResNet50 with pretrained weights. There is a slight difference in the numbers for run 2 & 3, but it's minimal and should be irrelevant once finetuned.

Unmodified ResNet50 w/ RGB Images: Prec @1: 75.6, Prec @5: 92.8
Unmodified ResNet50 w/ 3-chan Grayscale Images: Prec @1: 64.6, Prec @5: 86.4
Modified 1-chan ResNet50 w/ 1-chan Grayscale Images: Prec @1: 63.8, Prec @5: 86.1

from torch.utils import model_zoo

# Assumes ResNet, Bottleneck and model_urls are imported from torchvision's resnet.py,
# with ResNet modified to accept an `inchans` argument for its first conv layer.

def _load_pretrained(model, url, inchans=3):
    state_dict = model_zoo.load_url(url)
    if inchans == 1:
        conv1_weight = state_dict['conv1.weight']
        # sum the pretrained RGB kernel over the input-channel dim: (64, 3, 7, 7) -> (64, 1, 7, 7)
        state_dict['conv1.weight'] = conv1_weight.sum(dim=1, keepdim=True)
    elif inchans != 3:
        assert False, "Invalid number of inchans for pretrained weights"
    model.load_state_dict(state_dict)

def resnet50(pretrained=False, inchans=3):
    """Constructs a ResNet-50 model.
    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        inchans (int): number of input channels (1 or 3)
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], inchans=inchans)
    if pretrained:
        _load_pretrained(model, model_urls['resnet50'], inchans=inchans)
    return model
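For frameworks that store conv kernels as [kernel_height, kernel_width, in_channels, out_channels] (e.g. Keras/TensorFlow), the same channel-summing idea might look like the sketch below. This is not part of the original answer; it's a minimal sketch assuming a Keras ResNet50 where the first conv kernel is the only weight whose shape differs between the RGB and 1-channel models.

import tensorflow as tf

# RGB model with ImageNet weights, and a 1-channel model with the same architecture
rgb_model = tf.keras.applications.ResNet50(weights='imagenet')
gray_model = tf.keras.applications.ResNet50(weights=None, input_shape=(224, 224, 1))

for rgb_layer, gray_layer in zip(rgb_model.layers, gray_model.layers):
    new_weights = []
    for rgb_w, gray_w in zip(rgb_layer.get_weights(), gray_layer.get_weights()):
        if rgb_w.shape != gray_w.shape:
            # first conv kernel: (7, 7, 3, 64) -> (7, 7, 1, 64), summed over in_channels
            rgb_w = rgb_w.sum(axis=2, keepdims=True)
        new_weights.append(rgb_w)
    gray_layer.set_weights(new_weights)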

Sounds super cool! So this works because r*w0+g*w1+b*w2 is equivalent to r*(w0+w1+w2) when r = g = b (grayscale)?
So, this looks like exactly what I am interested in; however, it's not clear whether we will get a 3x speed-up in processing. I have an RGB image where the green channel is much better than the others, and I was hoping that using just green would be much faster. Can you clarify what the times for this were, and whether the aim of this approach is indeed to speed up training and prediction? Thanks so much!
@Cat you may observe a minimal speedup, but nowhere close to 3x, because this is just one layer and the rest of the network remains the same.
This answer contradicts the accepted answer. If this answer is correct (which I think it is), it should be marked as the accepted one. Otherwise, people would be misled after reading the accepted answer.
mmrbulbul

A simple way to do this is to add a convolution layer before the base model and then feed the output to the base model. Like this:

from keras.models import Model
from keras.layers import Input, Conv2D
from keras.applications.resnet50 import ResNet50

resnet = ResNet50(weights='imagenet', include_top=True)

input_tensor = Input(shape=(IMG_SIZE, IMG_SIZE, 1))
x = Conv2D(3, (3, 3), padding='same')(input_tensor)   # x now has shape (IMG_SIZE, IMG_SIZE, 3)
out = resnet(x)

model = Model(inputs=input_tensor, outputs=out)



ValueError: You are trying to load a weight file containing 13 layers into a model with 14 layers. Any idea how to avoid this?
@Madara Can you please share more details? Also please check whether you are including the final layer or not.
With input_shape = (64, 64, 1): input_tensor = Input(shape=image_shape); model_input = Conv2D(filters=3, kernel_size=3, padding='same', name="input_conv")(input_tensor). Now I import the VGG16 model: vgg16 = VGG16(include_top=False, weights='imagenet', input_tensor=model_input). Then I try to add a Conv layer: X = Conv2D(channels, kernel_size=3, padding='same')(vgg16.output); X = Activation('tanh')(X). And the final model: model = Model(inputs=input_tensor, outputs=X)
Just split the line vgg16 = VGG16(include_top=False, weights='imagenet', input_tensor=model_input) into vgg16 = VGG16(include_top=False, weights='imagenet') followed by vgg16 = vgg16(model_input). That should work.
Hu Xixi

Why not try converting the grayscale image to an RGB image?

tf.image.grayscale_to_rgb(
    images,
    name=None
)
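A minimal usage sketch (the shapes here are only illustrative): the op expects the last dimension to be 1 and tiles it to 3.

import tensorflow as tf

gray_batch = tf.random.uniform((64, 224, 224, 1))  # illustrative batch of 1-channel images
rgb_batch = tf.image.grayscale_to_rgb(gray_batch)
print(rgb_batch.shape)  # (64, 224, 224, 3)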

NielsSchneider

Dropping the input layer will not work out; it will cause all of the following layers to suffer.

What you can do is concatenate 3 copies of the black-and-white image to expand the channel dimension:

import tensorflow as tf
from tensorflow.keras.applications import ResNet50

img_input = tf.keras.layers.Input(shape=(img_size_target, img_size_target, 1))
img_conc = tf.keras.layers.Concatenate()([img_input, img_input, img_input])

model = ResNet50(include_top=True, weights='imagenet', input_tensor=img_conc)

While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. See more details on how to answer at this link: stackoverflow.com/help/how-to-answer
hafiz031

I faced the same problem while working with VGG16 and grayscale images. I solved it as follows:

Let's say our grayscale training images are in train_gray_images. If we pass it directly to the fit function, it will raise an error because the model expects a 3-channel (RGB) dataset instead of a grayscale one. So before passing it to fit, do the following:

Create a dummy RGB dataset (here dummy_RGB_images) with the same shape as the grayscale dataset, except that the number of channels is 3.

dummy_RGB_images = np.ndarray(shape=(train_gray_images.shape[0], train_gray_images.shape[1], train_gray_images.shape[2], 3), dtype= np.uint8) 

Then copy the grayscale data into each of the 3 channels of dummy_RGB_images. (Here the dimensions are [no_of_examples, height, width, channel].)

dummy_RGB_images[:, :, :, 0] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 1] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 2] = train_gray_images[:, :, :, 0]

Finally, pass dummy_RGB_images instead of the grayscale dataset, like:

model.fit(dummy_RGB_images,...)

Ahmed Baruwa

numpy's depth-stack function, np.dstack((img, img, img)), is a natural way to go.
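For example, a minimal sketch (the image shape is only illustrative):

import numpy as np

gray_img = np.zeros((224, 224), dtype=np.uint8)   # illustrative single grayscale image
rgb_img = np.dstack((gray_img, gray_img, gray_img))
print(rgb_img.shape)  # (224, 224, 3)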


They basically yield the same result; np.dstack just appears to be a little more straightforward.
pepe

If you're already using scikit-image, you can get the desired result by using gray2rgb.

from skimage.color import gray2rgb
rgb_img = gray2rgb(gray_img)

Julian

I believe you can use a pretrained ResNet with 1-channel grayscale images without repeating the image 3 times.

What I have done is replace the first layer (this is PyTorch, not Keras, but the idea should be similar):

(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

With the following layer:

(conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

Then copy the sum (over the channel axis) of the original pretrained weights into the new layer. For example, the shape of the original weights was:

torch.Size([64, 3, 7, 7])

So I did:

resnet18.conv1.weight.data = resnet18.conv1.weight.data.sum(axis=1).reshape(64, 1, 7, 7)

Then check that the output of the new model is the same as the output of the original model on the grayscale image:

y_1 = model_resnet_1(input_image_1)
y_3 = model_resnet_3(input_image_3)
print(torch.abs(y_1).sum(), torch.abs(y_3).sum())
(tensor(710.8860, grad_fn=<SumBackward0>),
 tensor(710.8861, grad_fn=<SumBackward0>))

input_image_1: one channel image

input_image_3: 3 channel image (gray scale - all channels equal)

model_resnet_1: modified model

model_resnet_3: Original resnet model


Ali Amini Bagh

It's really easy! An example for resnet50: before the change you have

import torch.nn as nn
import torchvision

resnet_50 = torchvision.models.resnet50(pretrained=True)   # load the ImageNet-pretrained weights
print(resnet_50.conv1)

Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

Just do this: sum the pretrained conv1 weights over the channel dimension, replace the layer with a 1-channel version, and copy the summed weights into it. (Note that assigning into the dict returned by state_dict() would not update the model.)

conv1_weight = resnet_50.conv1.weight.data.sum(dim=1, keepdim=True)   # (64, 3, 7, 7) -> (64, 1, 7, 7)
resnet_50.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
resnet_50.conv1.weight.data = conv1_weight

so if you run

print(resnet_50.conv1)

the result would be:

Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

As you can see, the input channel now matches grayscale images.


Sian Cao

What I did is simply expand the grayscale images into RGB by using the following transform stage:

import torchvision as tv

tv.transforms.Compose([
    tv.transforms.ToTensor(),
    # after ToTensor a grayscale image is a (1, H, W) tensor; broadcast it to (3, H, W)
    tv.transforms.Lambda(lambda x: x.broadcast_to((3, x.shape[1], x.shape[2]))),
])

Tufail Waris

You can use OpenCV to convert grayscale to RGB.

import cv2

rgb_image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)

The COLOR_GRAY2RGB (or COLOR_GRAY2BGR) conversion basically fills all of the R, G, B channels with the gray value Y, so R = G = B = Y.
jae heo

When you add the ResNet to your model, you should pass input_shape in the ResNet definition, like:

model = ResNet50(include_top=True,input_shape=(256,256,1))



This doesn't run: ValueError: The input must have 3 channels
