Google applies a multimodal large language model to video generation tasks
· 2023-12-22

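Most current video generation models are diffusion models. Google's VideoPoet is instead a multimodal large language model that can carry out a wide range of video generation tasks and produce high-quality video, and a single model can generate both the video and its soundtrack.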

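VideoPoet can predict the next 1 second of video from the preceding second, and by chaining these predictions it can generate longer videos. This approach not only extends a video effectively, it also keeps the appearance of the video's subject unchanged after many iterations. Videos generated by VideoPoet can also be edited interactively, for example by changing the motion of an object so that it performs a different action. Editing can start from the first frame or from a frame in the middle of the video, which gives a high degree of editing control. Users can also specify the desired camera motion through text prompts to control camera movement precisely.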
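The chained prediction described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not VideoPoet's actual implementation: `extend_video`, `predict_next_second`, and `toy_predictor` are hypothetical names, frames are represented by plain integers, and the toy predictor merely continues a count where a real model would generate new frames.

```python
from typing import Callable, List, Sequence

Frame = int  # placeholder; a real system would use image frames or tokens


def extend_video(
    seed_frames: Sequence[Frame],
    predict_next_second: Callable[[Sequence[Frame]], List[Frame]],
    frames_per_second: int,
    extra_seconds: int,
) -> List[Frame]:
    """Extend a clip by repeatedly conditioning on its final second."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    if len(seed_frames) < frames_per_second:
        raise ValueError("need at least one second of seed frames")

    video = list(seed_frames)
    for _ in range(extra_seconds):
        # Only the most recent second is used as context for the next one.
        context = video[-frames_per_second:]
        video.extend(predict_next_second(context))
    return video


def toy_predictor(context: Sequence[Frame]) -> List[Frame]:
    # Hypothetical stand-in for the model: keep counting upward.
    last = context[-1]
    return [last + i + 1 for i in range(len(context))]


clip = extend_video([0, 1, 2, 3], toy_predictor, frames_per_second=4, extra_seconds=2)
print(clip)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Because each step sees only the final second of what has been generated so far, the clip can in principle be extended for as many iterations as desired; whether the subject stays consistent depends entirely on the model behind `predict_next_second`.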

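In evaluation, VideoPoet handled video generation tasks well and outperformed other models on several benchmarks. The researchers asked evaluators to choose examples according to their preference. For text fidelity, on average 24%-35% of VideoPoet's examples were judged to follow the prompt more closely, compared with 8%-11% for other models. Evaluators also favored VideoPoet for motion, judging 41%-54% of its examples to show more interesting motion, compared with only 11%-21% for other models (see the figure below).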

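VideoPoet's research contribution lies in showing that large language models are also capable of generating highly competitive video, particularly in terms of high-quality motion. The researchers note that, in future work, their framework will move toward generating content in any format from input in any format.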
