Video Generation Pipeline
Manuscript in preparation.
Multi-model video-generation pipeline combining YOLO11 object detection, BLIP-2 captioning, and AnimateDiff diffusion. It uses confidence-weighted keyword fusion and a temporal-chaining scoring function to encourage cross-frame visual consistency.
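The two scoring ideas named above can be illustrated with a minimal sketch. This is not the project's implementation. The function names (fuse_keywords, temporal_chain_score), the fixed source weights, the single per-caption confidence, and the use of mean cosine similarity between consecutive frame feature vectors are all hypothetical choices made for illustration. The sketch does not call the YOLO11, BLIP-2, or AnimateDiff APIs; it only assumes their outputs have already been reduced to (label, confidence) pairs, caption keywords, and per-frame feature vectors.

```python
import math


def fuse_keywords(detections, caption_words, caption_conf,
                  det_weight=0.6, cap_weight=0.4):
    """Hypothetical confidence-weighted fusion of detector labels and caption keywords.

    detections:    list of (label, confidence) pairs from an object detector.
    caption_words: keywords extracted from a generated caption.
    caption_conf:  one confidence value for the whole caption (an assumption).
    det_weight / cap_weight: illustrative source weights, not tuned values.
    Returns (keyword, score) pairs sorted by score, highest first.
    """
    # Keep only the strongest detection per label so repeated boxes do not inflate scores.
    det_best = {}
    for label, conf in detections:
        det_best[label] = max(det_best.get(label, 0.0), conf)

    # Every caption keyword inherits the caption-level confidence.
    cap = {word: caption_conf for word in caption_words}

    # Weighted sum over both sources; a keyword missing from one source contributes 0 there.
    keys = set(det_best) | set(cap)
    fused = {k: det_weight * det_best.get(k, 0.0) + cap_weight * cap.get(k, 0.0)
             for k in keys}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def temporal_chain_score(frame_features):
    """Hypothetical chaining score: mean similarity of each frame to the next one.

    A single frame (or none) is treated as trivially consistent.
    """
    if len(frame_features) < 2:
        return 1.0
    sims = [cosine(frame_features[i], frame_features[i + 1])
            for i in range(len(frame_features) - 1)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    print(fuse_keywords([("dog", 0.9), ("dog", 0.7), ("ball", 0.5)], ["dog", "park"], 0.8))
    print(temporal_chain_score([[1, 0], [1, 0], [0, 1]]))
```

In this toy setup, a keyword confirmed by both the detector and the caption ("dog") outranks keywords seen by only one source. A frame sequence whose last frame diverges from the one before it scores lower than a perfectly stable sequence. Scoring only adjacent frame pairs keeps the cost linear in clip length. A real pipeline would likely derive the feature vectors from an image encoder rather than hand-written lists.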