TransVG: End-to-End Visual Grounding with Transformers

Time: 2021-04-20 14:14:12

2021-04-20 10:37:54

Paper: https://arxiv.org/abs/2104.08541

Code: Not available yet

1. Background and Motivation:

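This paper proposes the first Transformer-based framework for visual grounding. As the figure below shows, it consists of three modules: a language Transformer, an image Transformer, and a vision-language Transformer. The authors' experiments indicate that structured fusion modules are not necessary, because simply stacking Transformer encoder layers already gives good results: the attention layers model both intra-modal and inter-modal correspondences without any dedicated fusion module. The authors also find that directly regressing the bounding-box coordinates works better than any of the previous approaches.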

[Figure: overview of the TransVG framework]

2. Approach:

2.1.

Source: https://www.cnblogs.com/wangxiaocvpr/p/14680131.html
