Building tools and models at the intersection of computer vision, generative AI, and video understanding.
Thoughts on AI engineering and tooling
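Starting from Mitchell Hashimoto's definition, a deep dive into the Agent = Model + Harness paradigm. Analyzes how WorkBuddy runs as a complete harness, and how a self-hosted OpenClaw integrates OpenHarness to build a reliable AI coding agent.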
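A hands-on implementation of a minimal working Agent Harness: multi-provider support, an agent loop, AGENTS.md context injection, MEMORY.md cross-session memory, permission boundaries, and lifecycle hooks. Includes full source code and three examples.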
Original open-source work
Automated arXiv paper tracking hub for Video, World Models, Agents & Tone/Color research. Daily updates with translated abstracts.
Tool
Minimal Agent Harness POC: Agent = Model + Harness. About 300 lines of Python with multi-provider support, memory, permissions, and hooks.
Reproductions and experiments — I've tried these
Unified video editing with temporal reasoner. State-of-the-art framework for coherent video manipulation.
ICLR 2026
Key photo restoration in Live Photos via reference-guided diffusion. Reselect and restore your best moments.
SIGGRAPH Asia 2025
Augmenting real videos with dynamic visual content. Seamlessly add effects to existing footage.
CVPR
Region-constrained in-context generation for instructional video editing with precise spatial control.
CVPR 2023 · 1st Place
Winning solution for the CVPR 2023 1st Foundation Model Challenge, Track 2 (cross-modal track champion).