
Image Captioning Paper Collection

Posted: 2018-11-02 18:52:10

 Image Caption

Automatically describing the content of an image

CV+NLP

Datasets: Flickr8k, Flickr30k, MSCOCO, Visual Genome

Evaluation metrics: BLEU, METEOR, CIDEr, ROUGE
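As a rough illustration of how these metrics score a candidate caption against a reference, here is a minimal BLEU-1 sketch (clipped unigram precision with a brevity penalty). Real evaluations use multiple references and higher-order n-grams, typically via the coco-caption toolkit; the sentences below are made up.

```python
# Minimal BLEU-1 sketch: clipped unigram precision times a brevity
# penalty. Illustrative only; not the full multi-reference n-gram BLEU.
import math
from collections import Counter

def bleu1(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clipped matches: a word counts at most as often as it appears
    # in the reference.
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = overlap / len(cand)
    # Brevity penalty discourages overly short captions.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("a dog runs on the grass", "a dog is running on the grass")
```

Here 5 of 6 candidate words match, and the brevity penalty slightly discounts the shorter candidate.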

Learning to Evaluate Image Captioning (CVPR 2018)


Show and Tell: A Neural Image Caption Generator (CVPR 2015)

Directly maximize the probability of the correct description given the image, using the following formulation:

θ* = arg max_θ Σ_{(I,S)} log p(S | I; θ)

where θ are the parameters of the model, I is an image, and S its correct transcription.
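By the chain rule, log p(S | I; θ) decomposes into a sum of per-token log-probabilities, so maximizing it is the same as minimizing token-level cross-entropy during training. A NumPy toy sketch (the vocabulary and probabilities are made up, standing in for the LSTM's softmax outputs):

```python
# Toy sketch of the training objective: the chain rule factorizes
# log p(S|I) into per-token terms; training minimizes the negative sum.
import numpy as np

def caption_log_prob(step_probs, token_ids):
    """log p(S|I) = sum_t log p(S_t | I, S_0..S_{t-1})."""
    return float(sum(np.log(p[t]) for p, t in zip(step_probs, token_ids)))

# Hypothetical 4-word vocabulary; one softmax distribution per time step.
probs = [np.array([0.7, 0.1, 0.1, 0.1]),   # p(S_1 | I)
         np.array([0.2, 0.6, 0.1, 0.1])]   # p(S_2 | I, S_1)
nll = -caption_log_prob(probs, [0, 1])      # training minimizes this
```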

技术分享图片

Encoder: Inception-V2

Decoder: LSTM

Inference: Beam Search
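A minimal beam search sketch for inference: `step_fn(seq)` stands in for the trained LSTM decoder, returning (token, log-prob) pairs for the next word given the tokens so far. Keeping the top-k partial captions at each step approximates argmax_S log p(S|I) better than greedy decoding. The toy decoder below is purely illustrative.

```python
# Minimal beam search sketch over a stand-in decoder.
import heapq
import math

def beam_search(step_fn, beam_size, max_len, eos):
    beams = [(0.0, [])]                       # (cumulative log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq and seq[-1] == eos:        # finished captions carry over
                candidates.append((score, seq))
                continue
            for tok, logp in step_fn(seq):
                candidates.append((score + logp, seq + [tok]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda b: b[0])
    return max(beams, key=lambda b: b[0])[1]

# Toy decoder over token ids: two choices per step, then <eos> (id 9).
def toy_step(seq):
    if len(seq) >= 2:
        return [(9, 0.0)]
    return [(1, math.log(0.6)), (2, math.log(0.4))]

best = beam_search(toy_step, beam_size=2, max_len=3, eos=9)
```

With beam_size=1 this reduces to greedy decoding; the paper reports that a beam (e.g. size 20) yields better captions.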

 

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML 2015)

Highlight: Attention Mechanism (Soft & Hard)


"Soft" attention: attends to all subregions, weighting different parts differently.

"Hard" attention: attends to only one subregion, chosen stochastically.
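The distinction can be sketched in NumPy: both variants start from the same attention distribution over L image subregions; soft attention takes the expectation (a weighted average of region features, hence differentiable), while hard attention samples a single region (hence trained with REINFORCE in the paper). Shapes and scores here are illustrative stand-ins, not from a trained model.

```python
# Soft vs. hard attention over L=4 subregion feature vectors.
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))          # L=4 regions, D=8 features
scores = np.array([2.0, 0.5, 0.1, -1.0])    # alignment scores e_ti

alpha = np.exp(scores) / np.exp(scores).sum()      # attention weights

# Soft attention: deterministic expected context vector.
soft_context = alpha @ features

# Hard attention: stochastic choice of one subregion.
hard_context = features[rng.choice(4, p=alpha)]
```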

 

Summary:

1. Attention focuses on certain parts of the input.

2. Soft attention is deterministic; hard attention is stochastic.

3. Attention is also used in NMT, AttnGAN, and teaching machines to read.

Image Captioning with Semantic Attention (CVPR 2016)

 

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning (CVPR 2017)

Highlight: Spatial and Channel-Wise Attention
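A rough sketch of the two attention types on a CNN feature map of shape (C, H, W): channel-wise attention reweights whole feature channels (roughly "which detectors fire"), while spatial attention reweights locations ("where to look"). The weight vectors below are arbitrary placeholders for what SCA-CNN would predict from the decoder's hidden state.

```python
# Channel-wise then spatial attention on a toy (C, H, W) feature map.
import numpy as np

C, H, W = 3, 2, 2
fmap = np.arange(C * H * W, dtype=float).reshape(C, H, W)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

beta = softmax(np.array([1.0, 0.2, -0.5]))       # one weight per channel
channel_attended = fmap * beta[:, None, None]

gamma = softmax(np.zeros(H * W)).reshape(H, W)   # spatial weights (uniform here)
spatial_attended = channel_attended * gamma[None, :, :]
```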


Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning (CVPR 2017)

Highlight: Adaptive Attention

Semantic Compositional Networks for Visual Captioning (CVPR 2017)


Deep Reinforcement Learning-based Image Captioning with Embedding Reward (CVPR 2017)

A decision-making framework for image captioning.

A "policy network" and a "value network" collaboratively generate captions.
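In this decision-making view, the policy network gives a local probability for each next word, while the value network scores the resulting partial caption globally; the paper fuses both signals during lookahead inference. A toy sketch (the linear combination and all numbers here are illustrative stand-ins, not the paper's learned networks):

```python
# Toy fusion of policy (local) and value (global) signals when
# choosing the next word. Numbers are made up for illustration.
import math

def combined_score(policy_logp, value, lam=0.4):
    # Hypothetical linear combination mirroring policy + value fusion.
    return lam * policy_logp + (1 - lam) * value

# word -> (log-prob under the policy, value-network score)
candidates = {"dog": (math.log(0.5), 0.9), "cat": (math.log(0.4), 0.3)}
best = max(candidates, key=lambda w: combined_score(*candidates[w]))
```

Here "cat" is nearly as likely locally, but the value network's lookahead score tips the choice to "dog".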

 

 

 

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (CVPR 2018)

In the human visual system, attention can be focused volitionally by top-down signals determined by the current task (e.g., looking for something), and automatically by bottom-up signals associated with unexpected, novel or salient stimuli.

Top-down: attention mechanisms driven by non-visual or task-specific context; predicts feature weights.

Bottom-up: purely visual feed-forward attention; Faster R-CNN proposes image regions, each represented by a feature vector.
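The combination can be sketched in NumPy: a detector (Faster R-CNN in the paper) yields k region feature vectors bottom-up, and a top-down signal (the captioning LSTM's hidden state) then weights those regions. The query and features below are random stand-ins for the learned model's values.

```python
# Top-down weighting of bottom-up region proposals.
import numpy as np

rng = np.random.default_rng(1)
regions = rng.normal(size=(5, 16))   # k=5 proposals, 16-d features
query = rng.normal(size=16)          # top-down signal (LSTM hidden state)

scores = regions @ query                        # task-driven relevance
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                            # softmax over regions
attended = alpha @ regions                      # weighted region feature
```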


Source: https://www.cnblogs.com/czhwust/p/imagecaption.html
