Agent-guided video re-localization network

GUO Axin; ZHOU Yuan; HUO Shuwei; LI Shuoshi


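Supervised by: Ministry of Industry and Information Technology of the People's Republic of China. Sponsored by: Harbin Institute of Technology. Editor-in-Chief: LI Longqiu. ISSN 0367-6234. CN 23-1235/T.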


Cite this article: GUO Axin, ZHOU Yuan, HUO Shuwei, LI Shuoshi. Agent-guided video re-localization network[J]. Journal of Harbin Institute of Technology, 2026, 58(3): 120. DOI: 10.11918/202308059

Affiliation: School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China

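Abstract:

Video re-localization aims to locate, in an untrimmed reference video, the segment that is semantically related to a given query video. The task meets users' practical browsing needs and plays an important role in many application scenarios. Because video carries richer information than other data types such as images and text, accurately identifying the target segment in a long video and determining its temporal boundaries is challenging. This work treats video re-localization as a sequential decision process and applies reinforcement learning to achieve efficient and accurate localization. Specifically, it proposes an agent-guided localization network (AGLN), in which an agent is trained to execute actions step by step under a learned policy, refining the temporal boundaries of the localized segment and thereby finding the segment most relevant to the query video. In addition, AGLN fuses reinforcement learning with supervised learning in a multi-task learning framework, which helps the agent explore the environment more effectively and learn the optimal policy. Experiments on the ActivityNet-VRL dataset show that AGLN outperforms existing methods on video re-localization, reaching an average retrieval accuracy of 25.9%, which is 0.2 percentage points above the best existing method.

Keywords: video re-localization; reinforcement learning; agent; supervised learning; multi-task learning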
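
The abstract states that reinforcement learning and supervised learning are combined in one multi-task framework but does not give the objective. One generic way such a combination is often written is shown below. Every symbol here is an assumption introduced for illustration (policy parameters \theta, per-step reward r_t, discount \gamma, episode length T, weighting \lambda, and an unspecified supervised term such as a boundary regression loss); none of it is taken from the paper.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\mathrm{RL}}(\theta) + \lambda \, \mathcal{L}_{\mathrm{sup}}(\theta),
\qquad
\mathcal{L}_{\mathrm{RL}}(\theta)
  = - \, \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=1}^{T} \gamma^{\,t-1} r_t \right]
```

Under this reading, minimizing the first term maximizes the expected discounted return of the boundary-refinement policy, while the second term supplies a direct training signal at every step, which is one plausible way a supervised branch could help exploration.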

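DOI: 10.11918/202308059

CLC number: TP391.4

Document code: A

Funding: National Key Research and Development Program of China (2020YFC1523204); National Natural Science Foundation of China (62171320, U2006211)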

Agent-guided video re-localization network

GUO Axin,ZHOU Yuan,HUO Shuwei,LI Shuoshi

(School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China)

Abstract:

Video re-localization aims to localize, in an untrimmed reference video, the moment that semantically corresponds to a given query video. This task not only meets the practical browsing needs of users but also plays an important role in various application scenarios. Since videos contain richer information than other data forms such as images and text, accurately identifying the target moment in a long video and determining its temporal boundaries is highly challenging. This paper regards the video re-localization task as a sequential decision-making process and applies reinforcement learning to achieve efficient and accurate localization. Specifically, it proposes an agent-guided localization network (AGLN), which trains an agent to progressively refine the temporal boundaries of the localized moment based on the learned policy, thereby finding the moment most relevant to the query video. Additionally, AGLN combines reinforcement learning with supervised learning in a multi-task learning framework, helping the agent explore the environment more effectively and learn the optimal policy. Experimental results on the ActivityNet-VRL dataset demonstrate that AGLN outperforms existing methods on the video re-localization task. The average retrieval accuracy of AGLN is 25.9%, which is 0.2 percentage points higher than the best existing method.

Key words: video re-localization; reinforcement learning; agent; supervised learning; multi-task learning
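
To make the "sequential decision-making" framing concrete, the following minimal Python sketch shows how a candidate window can be refined step by step by boundary-moving actions. It is not the authors' AGLN: the action set, the fixed step size, the function names, and the `oracle_policy` (which peeks at the ground-truth segment and greedily raises temporal IoU) are all hypothetical stand-ins for the learned policy the abstract describes.

```python
from typing import Callable, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds

def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical action set: each action moves one or both boundaries by d seconds.
ACTIONS = {
    "shift_left":   lambda s, e, d: (s - d, e - d),
    "shift_right":  lambda s, e, d: (s + d, e + d),
    "extend_start": lambda s, e, d: (s - d, e),
    "shrink_start": lambda s, e, d: (s + d, e),
    "shrink_end":   lambda s, e, d: (s, e - d),
    "extend_end":   lambda s, e, d: (s, e + d),
}

def apply_action(seg: Segment, name: str, step: float, duration: float) -> Segment:
    """Apply one action, clamp to [0, duration], and reject empty windows."""
    s, e = ACTIONS[name](seg[0], seg[1], step)
    s = min(max(s, 0.0), duration)
    e = min(max(e, 0.0), duration)
    return (s, e) if e > s else seg

def refine(seg: Segment, duration: float, policy: Callable[[Segment], str],
           step: float = 1.0, max_steps: int = 50) -> Segment:
    """Roll out the policy until it emits 'stop' or the step budget runs out."""
    for _ in range(max_steps):
        action = policy(seg)
        if action == "stop":
            break
        seg = apply_action(seg, action, step, duration)
    return seg

def oracle_policy(target: Segment, duration: float,
                  step: float = 1.0) -> Callable[[Segment], str]:
    """Illustration only: uses the ground truth, which a real agent cannot see."""
    def policy(seg: Segment) -> str:
        best, best_iou = "stop", temporal_iou(seg, target)
        for name in ACTIONS:
            iou = temporal_iou(apply_action(seg, name, step, duration), target)
            if iou > best_iou:  # move only on strict improvement
                best, best_iou = name, iou
        return best
    return policy

target = (12.0, 20.0)   # hypothetical ground-truth segment
start = (0.0, 30.0)     # hypothetical initial window
result = refine(start, 60.0, oracle_policy(target, 60.0))
print(result, temporal_iou(result, target))
```

In a trained system the policy would instead map query and reference features to an action, and the change in temporal IoU could serve as a reward signal during training; both points are common design choices rather than details confirmed by the abstract.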
