
Image Captioning Paper Collection

Posted: 2018-11-02 18:52:10

 Image Caption

Automatically describing the content of an image

CV+NLP

Datasets: Flickr8k, Flickr30k, MSCOCO, Visual Genome

Evaluation metrics: BLEU, METEOR, CIDEr, ROUGE
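As a rough illustration of how these metrics score a candidate caption against a reference, here is a minimal BLEU-1 sketch (clipped unigram precision with a brevity penalty). Real evaluations use multiple references and higher-order n-grams, typically via the coco-caption toolkit; the sentences below are made up.

```python
# Minimal BLEU-1 sketch: clipped unigram precision times a brevity
# penalty. Illustrative only; not the full multi-reference n-gram BLEU.
import math
from collections import Counter

def bleu1(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clipped matches: a word counts at most as often as it appears
    # in the reference.
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = overlap / len(cand)
    # Brevity penalty discourages overly short captions.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("a dog runs on the grass", "a dog is running on the grass")
```

Here 5 of 6 candidate words match, and the brevity penalty slightly discounts the shorter candidate.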

Learning to Evaluate Image Captioning (CVPR 2018)


Show and Tell: A Neural Image Caption Generator (CVPR 2015)

Directly maximize the probability of the correct description given the image, using the following formulation:

θ* = arg max_θ Σ_{(I,S)} log p(S | I; θ)

where θ are the parameters of the model, I is an image, and S its correct transcription.
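By the chain rule, log p(S | I; θ) decomposes into a sum of per-token log-probabilities, so maximizing it is the same as minimizing token-level cross-entropy during training. A NumPy toy sketch (the vocabulary and probabilities are made up, standing in for the LSTM's softmax outputs):

```python
# Toy sketch of the training objective: the chain rule factorizes
# log p(S|I) into per-token terms; training minimizes the negative sum.
import numpy as np

def caption_log_prob(step_probs, token_ids):
    """log p(S|I) = sum_t log p(S_t | I, S_0..S_{t-1})."""
    return float(sum(np.log(p[t]) for p, t in zip(step_probs, token_ids)))

# Hypothetical 4-word vocabulary; one softmax distribution per time step.
probs = [np.array([0.7, 0.1, 0.1, 0.1]),   # p(S_1 | I)
         np.array([0.2, 0.6, 0.1, 0.1])]   # p(S_2 | I, S_1)
nll = -caption_log_prob(probs, [0, 1])      # training minimizes this
```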

技术分享图片

Encoder: Inception-V2

Decoder: LSTM

Inference: Beam Search
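A minimal beam search sketch for inference: `step_fn(seq)` stands in for the trained LSTM decoder, returning (token, log-prob) pairs for the next word given the tokens so far. Keeping the top-k partial captions at each step approximates argmax_S log p(S|I) better than greedy decoding. The toy decoder below is purely illustrative.

```python
# Minimal beam search sketch over a stand-in decoder.
import heapq
import math

def beam_search(step_fn, beam_size, max_len, eos):
    beams = [(0.0, [])]                       # (cumulative log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq and seq[-1] == eos:        # finished captions carry over
                candidates.append((score, seq))
                continue
            for tok, logp in step_fn(seq):
                candidates.append((score + logp, seq + [tok]))
        beams = heapq.nlargest(beam_size, candidates, key=lambda b: b[0])
    return max(beams, key=lambda b: b[0])[1]

# Toy decoder over token ids: two choices per step, then <eos> (id 9).
def toy_step(seq):
    if len(seq) >= 2:
        return [(9, 0.0)]
    return [(1, math.log(0.6)), (2, math.log(0.4))]

best = beam_search(toy_step, beam_size=2, max_len=3, eos=9)
```

With beam_size=1 this reduces to greedy decoding; the paper reports that a beam (e.g. size 20) yields better captions.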

 

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML 2015)

Highlight: Attention Mechanism (Soft & Hard)


"Soft" attention: attends to all subregions, weighting different parts differently.

"Hard" attention: attends to only one subregion, chosen stochastically.
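The distinction can be sketched in NumPy: both variants start from the same attention distribution over L image subregions; soft attention takes the expectation (a weighted average of region features, hence differentiable), while hard attention samples a single region (hence trained with REINFORCE in the paper). Shapes and scores here are illustrative stand-ins, not from a trained model.

```python
# Soft vs. hard attention over L=4 subregion feature vectors.
import numpy as np

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))          # L=4 regions, D=8 features
scores = np.array([2.0, 0.5, 0.1, -1.0])    # alignment scores e_ti

alpha = np.exp(scores) / np.exp(scores).sum()      # attention weights

# Soft attention: deterministic expected context vector.
soft_context = alpha @ features

# Hard attention: stochastic choice of one subregion.
hard_context = features[rng.choice(4, p=alpha)]
```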

 

Summary:

1. Attention focuses on certain parts of the input.

2. Soft attention is deterministic; hard attention is stochastic.

3. Attention is also used in NMT, AttnGAN, and teaching machines to read.

Image Captioning with Semantic Attention (CVPR 2016)

 

SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning (CVPR 2017)

Highlight: Spatial and Channel-Wise Attention
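A rough sketch of the two attention types on a CNN feature map of shape (C, H, W): channel-wise attention reweights whole feature channels (roughly "which detectors fire"), while spatial attention reweights locations ("where to look"). The weight vectors below are arbitrary placeholders for what SCA-CNN would predict from the decoder's hidden state.

```python
# Channel-wise then spatial attention on a toy (C, H, W) feature map.
import numpy as np

C, H, W = 3, 2, 2
fmap = np.arange(C * H * W, dtype=float).reshape(C, H, W)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

beta = softmax(np.array([1.0, 0.2, -0.5]))       # one weight per channel
channel_attended = fmap * beta[:, None, None]

gamma = softmax(np.zeros(H * W)).reshape(H, W)   # spatial weights (uniform here)
spatial_attended = channel_attended * gamma[None, :, :]
```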


Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning (CVPR 2017)

Highlight: Adaptive Attention

Semantic Compositional Networks for Visual Captioning (CVPR 2017)


Deep Reinforcement Learning-based Image Captioning with Embedding Reward (CVPR 2017)

A decision-making framework for image captioning.

A "policy network" and a "value network" collaboratively generate captions.
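In this decision-making view, the policy network gives a local probability for each next word, while the value network scores the resulting partial caption globally; the paper fuses both signals during lookahead inference. A toy sketch (the linear combination and all numbers here are illustrative stand-ins, not the paper's learned networks):

```python
# Toy fusion of policy (local) and value (global) signals when
# choosing the next word. Numbers are made up for illustration.
import math

def combined_score(policy_logp, value, lam=0.4):
    # Hypothetical linear combination mirroring policy + value fusion.
    return lam * policy_logp + (1 - lam) * value

# word -> (log-prob under the policy, value-network score)
candidates = {"dog": (math.log(0.5), 0.9), "cat": (math.log(0.4), 0.3)}
best = max(candidates, key=lambda w: combined_score(*candidates[w]))
```

Here "cat" is nearly as likely locally, but the value network's lookahead score tips the choice to "dog".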

 

 

 

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (CVPR 2018)

In the human visual system, attention can be focused volitionally by top-down signals determined by the current task (e.g., looking for something), and automatically by bottom-up signals associated with unexpected, novel or salient stimuli.

Top-down: attention mechanisms driven by non-visual or task-specific context; predicts feature weights.

Bottom-up: purely visual feed-forward attention; Faster R-CNN proposes image regions, each represented by a feature vector.
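The combination can be sketched in NumPy: a detector (Faster R-CNN in the paper) yields k region feature vectors bottom-up, and a top-down signal (the captioning LSTM's hidden state) then weights those regions. The query and features below are random stand-ins for the learned model's values.

```python
# Top-down weighting of bottom-up region proposals.
import numpy as np

rng = np.random.default_rng(1)
regions = rng.normal(size=(5, 16))   # k=5 proposals, 16-d features
query = rng.normal(size=16)          # top-down signal (LSTM hidden state)

scores = regions @ query                        # task-driven relevance
alpha = np.exp(scores - scores.max())
alpha /= alpha.sum()                            # softmax over regions
attended = alpha @ regions                      # weighted region feature
```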


Source: https://www.cnblogs.com/czhwust/p/imagecaption.html
