Computer Vision & Graphics

by Synced 2025-01-06 44

Nvidia Intensifies Robot Push with New Humanoid Platform as Industry Giants Eye Lucrative Future

Nvidia will launch Jetson Thor for humanoid robots in H1 2025, entering a growing market where Google is also active. The robotics sector is projected to grow substantially. Nvidia offers integrated hardware and software solutions. Simultaneously, China’s rapidly developing domestic humanoid robot market presents emerging competition.

by Synced 2024-09-24 5

AI Asia China Computer Vision & Graphics Global News Press Release Research

ByteDance Disrupts Video Generation Race with Breakthrough in Multi-Subject Interaction

On September 24, ByteDance’s technology arm, Volcano Engine, introduced two state-of-the-art video generation models, PixelDance and Seaweed, which significantly enhance...

by Synced 2024-07-25 4

AI Computer Vision & Graphics Machine Learning & Data Science Research

Automating Video Highlights: Breakthrough Unsupervised Method Leverages Audio and Visual Cues

A research team from the University of Saskatchewan and Google introduces an innovative unsupervised method for automatic video highlight detection, eliminating the need for manual annotations while achieving superior performance compared to previous methods.

by Synced 2023-12-26 2

AI Computer Vision & Graphics Machine Learning & Data Science Research

Reconstructing Videos In Just 14 Seconds: Meta AI’s Fairy Accelerates Video Synthesis by 44×

A Meta GenAI research team introduces Fairy, a versatile and efficient video-to-video synthesis framework. Fairy stands out for its ability to generate high-quality videos at remarkable speed, producing 120-frame 512×384 videos in just 14 seconds, surpassing previous works by a factor of at least 44×.

by Synced 2023-11-13 6

AI Computer Vision & Graphics Machine Learning & Data Science Research

Adobe & ANU’s LRM Reconstructs Models For Single Image to 3D in 5s

In a new paper LRM: Large Reconstruction Model for Single Image to 3D, a research team from Adobe Research and Australian National University introduces an innovative Large Reconstruction Model (LRM). This model can predict a 3D model of an object from a single input image in just 5 seconds.

by Synced 2023-08-16 5

AI Computer Vision & Graphics Machine Learning & Data Science Research

MIT & Harvard’s Open-Source FAn System Enables Real-Time Any Objects Detection, Tracking, and Following

In a new paper Follow Anything: Open-set detection, tracking, and following in real-time, a research team from MIT and Harvard University presents the follow anything system (FAn), an open-set, real-time object-following framework that can detect, segment, track, and follow any object, and can adapt to new objects using text, image, or click queries.

by Synced 2023-07-20 5

AI Computer Vision & Graphics Machine Learning & Data Science Research

Objaverse-XL: Unleashing 10M+ 3D Objects for Advanced 3D Vision

In a new paper Objaverse-XL: A Universe of 10M+ 3D Objects, a research team from the Allen Institute for AI, University of Washington, Columbia University, Stability AI, California Institute of Technology and LAION presents Objaverse-XL, a large-scale, web-crawled dataset of 3D assets that offers substantially richer variety and quality and aims to boost the performance of state-of-the-art 3D models.

by Synced 2023-07-17 10

AI Computer Vision & Graphics Machine Learning & Data Science Research

DeepMind Proposes Novel Vision Transformer for Arbitrary Size & Resolution

In a new paper Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution, a Google DeepMind research team further improves ViT with Native Resolution ViT (NaViT), which is able to process input sequences of arbitrary resolutions and aspect ratios.

by Synced 2023-07-14 2

AI Computer Vision & Graphics Machine Learning & Data Science Research

Shanghai AI Lab, CUHK & Stanford U Extend Personalized Text-to-Image Diffusion Models Into Animation Generators Without Tuning

In a new paper AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning, a research team presents AnimateDiff, a general and practical framework that is able to generate animated images for any personalized text-to-image (T2I) model, without any extra training or model-specific tuning.

by Synced 2023-06-26 8

AI Computer Vision & Graphics Machine Learning & Data Science Research

DeepMind Unlocks Web-Scale Training for Open-World Detection

In a new paper Scaling Open-Vocabulary Object Detection, a DeepMind research team introduces OWLv2, an optimized architecture with improved training efficiency, and applies an OWL-ST self-training recipe to it to substantially improve detection performance, achieving state-of-the-art results on open-vocabulary detection tasks.

by Synced 2023-06-19 2

AI Computer Vision & Graphics Machine Learning & Data Science Research

DeepMind Claims Image Captioner Alone Is More Powerful Than Previously Believed, Competing with CLIP

In a new paper Image Captioners Are Scalable Vision Learners Too, a DeepMind research team presents CapPa, an image captioning-based pretraining strategy that can compete with CLIP and exhibits favorable model and data scaling properties, verifying that plain image captioning can be a competitive pretraining strategy for vision backbones.

by Synced 2023-05-10 2

AI Computer Vision & Graphics Machine Learning & Data Science Research

Georgia Tech’s ZipIt! Effectively Merges Vision Models Trained on Disjoint Tasks Without Additional Training

In the new paper ZipIt! Merging Models from Different Tasks Without Training, a Georgia Tech research team proposes ZipIt!, a general method that exploits redundant features to combine two or more models with the same architecture but trained on different tasks into one multi-task model without additional training.

by Synced 2023-04-24 0

AI Computer Vision & Graphics Machine Learning & Data Science Research

Look Again, YOLO: Baidu’s RT-DETR Detection Transformer Achieves SOTA Results on Real-Time Object Detection

In the new paper DETRs Beat YOLOs on Real-Time Object Detection, a Baidu Inc. research team presents Real-Time Detection Transformer (RT-DETR), a real-time end-to-end object detector that leverages a hybrid encoder and novel IoU-aware query selection to address inference speed delay issues. RT-DETR outperforms YOLO object detectors in both accuracy and speed.

by Synced 2023-04-18 6

AI Computer Vision & Graphics Machine Learning & Data Science Research

Microsoft & Bath U’s SpectFormer Significantly Improves Vision Transformers via Frequency and Attention

In the new paper SpectFormer: Frequency and Attention Is What You Need in a Vision Transformer, a research team from Microsoft and the University of Bath proposes SpectFormer, a novel transformer architecture that combines spectral and multi-headed attention layers to better capture appropriate feature representations and improve performance.

by Synced 2023-04-10 12

AI Computer Vision & Graphics Machine Learning & Data Science Research

UC Berkeley’s Instruct-NeRF2NeRF Edits 3D Scenes With Text Instructions

In the new paper Instruct-NeRF2NeRF: Editing 3D Scenes With Instructions, a UC Berkeley research team presents Instruct-NeRF2NeRF, an approach for editing 3D NeRF scenes through natural language text instructions. The proposed method can edit large-scale, real-world 3D scenes with improved ease of use and realism.

by Synced 2023-02-23 3

AI Computer Vision & Graphics Machine Learning & Data Science Research

Oxford U Presents RealFusion: 360° Reconstructions of Any Object from a Single Image

In the new paper RealFusion: 360° Reconstruction of Any Object from a Single Image, an Oxford University research team leverages a diffusion model to generate 360° reconstructions of objects from a single image. Their RealFusion approach achieves state-of-the-art performance on monocular 3D reconstruction benchmarks.

by Synced 2023-02-07 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google & HUJI Present Dreamix: The First Diffusion Model for General Video Editing

In the new paper Dreamix: Video Diffusion Models Are General Video Editors, a team from Google Research and the Hebrew University of Jerusalem presents Dreamix, a novel approach that leverages a video diffusion model (VDM) to enable text-based motion and appearance video editing.

by Synced 2023-01-17 14

AI Computer Vision & Graphics Machine Learning & Data Science Research

CMU’s DensePose From WiFi: An Affordable, Accessible and Secure Approach to Human Sensing

In the new paper DensePose From WiFi, a Carnegie Mellon University research team proposes WiFi-based DensePose, a neural network architecture capable of estimating human dense pose using only WiFi signals in scenarios with occlusion and multiple people.

by Synced 2022-12-27 4

AI Computer Vision & Graphics Machine Learning & Data Science Research

OpenAI’s Point·E: Generating 3D Point Clouds From Complex Prompts in Minutes on a Single GPU

In the new paper Point-E: A System for Generating 3D Point Clouds from Complex Prompts, an OpenAI research team presents Point·E, a system for text-conditional synthesis of 3D point clouds that leverages diffusion models to generate diverse and complex 3D shapes conditioned on complex text prompts in minutes on a single GPU.

by Synced 2022-12-21 0

AI Computer Vision & Graphics Machine Learning & Data Science Research

Meet Google’s FlexiViT: A Flexible Vision Transformer for All Patch Sizes

In the new paper FlexiViT: One Model for All Patch Sizes, a Google Research team presents FlexiViT, a flexible ViT that performs well across a wide range of patch sizes, matching or outperforming standard fixed-patch ViT performance at no extra cost.

by Synced 2022-12-18 10

AI Computer Vision & Graphics Machine Learning & Data Science Research

Maryland U & NYU’s Visual Exploration Reveals What Vision Transformers Learn

In the new paper What Do Vision Transformers Learn? A Visual Exploration, a research team from the University of Maryland and New York University uses large-scale feature visualizations from a wide range of vision transformers to gain insights into what they learn from images and how they differ from convolutional neural networks.

by Synced 2022-11-22 12

AI Computer Vision & Graphics Machine Learning & Data Science Research

Moody Moving Faces: NVIDIA’s SPACEx Delivers High-Quality Portrait Animation with Controllable Expression

In the new paper SPACEx: Speech-driven Portrait Animation with Controllable Expression, an NVIDIA research team introduces SPACEx, a speech-driven portrait animation framework that generates high-resolution and expressive facial videos with control over subject pose, emotion and expression intensity.

by Synced 2022-10-11 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Maximizing FLOPS Utilization: DeepMind & NYU Propose Efficiency Evaluations for Visual Pretraining Methods

In the new paper Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods, DeepMind and NYU Center for Neural Systems researchers introduce computational efficiency evaluation approaches designed to aid in the selection of optimal methods, datasets and models for pretraining visual tasks on a fixed FLOP budget.

by Synced 2022-08-31 3

AI Computer Vision & Graphics Machine Learning & Data Science Research

Princeton U & Adobe’s 3D-FM GAN Enables Precise 3D-Controllable Face Manipulation

In the new paper 3D-FM GAN: Towards 3D-Controllable Face Manipulation, a team from Princeton University and Adobe Research presents 3D-FM GAN, a novel conditional GAN framework that enables precise 3D-controllable face manipulation with high photorealism and strong identity preservation without requiring any manual tuning or optimizations.

by Synced 2022-08-30 16

AI Computer Vision & Graphics Machine Learning & Data Science Popular Research

Microsoft’s BEiT-3 Foundation Model: A ‘Big Convergence of Language, Vision, and Multimodal Pretraining’ That Achieves SOTA Results on Popular Benchmarks

In the new paper Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks, a Microsoft research team presents BEiT-3, a general-purpose state-of-the-art multimodal foundation model for both vision and vision-language tasks that advances the big convergence of backbone architectures, pretraining tasks, and model scaling.

by Synced 2022-08-24 0

AI Computer Vision & Graphics Machine Learning & Data Science Research

Adobe and ANU’s Paint2Pix: Intent-Accurate Image Synthesis from Simple Brushstroke Inputs

In the new paper Paint2Pix: Interactive Painting based Progressive Image Synthesis and Editing, a research team from Adobe Research and Australian National University presents paint2pix, a novel model that learns to predict users’ intentions and produce photorealistic images from primitive and coarse human brushstroke inputs.

by Synced 2022-08-09 35

AI Computer Vision & Graphics Machine Learning & Data Science Research

NVIDIA’s Minimal Video Instance Segmentation Framework Achieves SOTA Performance Without Video-Based Training

In the new paper MinVIS: A Minimal Video Instance Segmentation Framework Without Video-based Training, an NVIDIA research team presents MinVIS, a minimal video instance segmentation framework that outperforms state-of-the-art VIS approaches without requiring video-based training.

by Synced 2022-08-03 3

AI Computer Vision & Graphics Machine Learning & Data Science Research

IITM & UT Austin’s Generalizable NeRF Transformer Demonstrates Transformers’ Capabilities for Graphical Rendering

In the new paper Is Attention All NeRF Needs?, a research team from the Indian Institute of Technology Madras and the University of Texas at Austin proposes Generalizable NeRF Transformer (GNT), a pure and universal transformer-based architecture for efficient on-the-fly reconstruction of NeRFs. The work demonstrates that a pure attention mechanism suffices for learning a physically-grounded rendering process.

by Synced 2022-08-02 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google Introduces the First Effective Face-Motion Deblurring System for Mobile Phones

In the new paper Face Deblurring Using Dual Camera Fusion on Mobile Phones, a Google team proposes a novel dual camera fusion technique that achieves robust face deblurring in diverse motion and lighting conditions.

by Synced 2022-07-12 3

AI Computer Vision & Graphics Machine Learning & Data Science Popular Research

Academia Sinica’s YOLOv7 Outperforms All Object Detectors, Reduces Costs by 50%

In the new paper YOLOv7: Trainable Bag-Of-Freebies Sets New State-Of-The-Art for Real-Time Object Detectors, an Academia Sinica research team releases YOLOv7. This latest YOLO version introduces novel “extend” and “compound scaling” methods that effectively utilize parameters and computation, and it surpasses all known real-time object detectors in speed and accuracy.

by Synced 2022-06-29 195

AI Computer Vision & Graphics Machine Learning & Data Science Research

NVIDIA’s Global Context ViT Achieves SOTA Performance on CV Tasks Without Expensive Computation

In the new paper Global Context Vision Transformers, an NVIDIA research team proposes the Global Context Vision Transformer, a novel yet simple hierarchical ViT architecture comprising global self-attention and token generation modules that enables the efficient modelling of both short- and long-range dependencies without costly compute operations while achieving SOTA results across various computer vision tasks.

by Synced 2022-06-15 11

AI Computer Vision & Graphics Machine Learning & Data Science Popular Research

Apple’s MobileOne Backbone Reduces Inference Time to Under One Millisecond on an iPhone 12 and Reaches 75.9% Top-1 Accuracy on ImageNet

In the new paper An Improved One millisecond Mobile Backbone, an Apple research team presents MobileOne, a novel mobile backbone that cuts inference time to under one millisecond on an iPhone 12 and reaches 75.9 percent top-1 accuracy on ImageNet.

by Synced 2022-06-06 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Snap & NEU’s EfficientFormer Models Push ViTs to MobileNet Speeds While Maintaining High Performance

In the new paper EfficientFormer: Vision Transformers at MobileNet Speed, a research team from Snap Inc. and Northeastern University proposes EfficientFormer, a vision transformer that runs as fast as MobileNet while maintaining high performance.

by Synced 2022-06-02 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google Brain’s UViM: A Unified Approach for Modelling Diverse Vision Tasks Without Modifications

In the new paper UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes, a Google Brain research team proposes UViM, a unified approach that leverages language modelling and discrete representation learning to enable the modelling of a wide range of computer vision tasks without task-specific modifications.

by Synced 2022-05-11 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Microsoft Azure Introduces i-Code: A General Framework That Enables Flexible Multimodal Representation Learning

In the new paper i-Code: An Integrative and Composable Multimodal Learning Framework, a Microsoft Azure Cognitive Services Research team presents i-Code, a self-supervised pretraining framework that enables the flexible integration of vision, speech, and language modalities and learns their vector representations in a unified manner.

by Synced 2022-05-10 4

AI Computer Vision & Graphics Machine Learning & Data Science Research

LSTM Is Back! A Deep Implementation of the Decades-old Architecture Challenges ViTs on Long Sequence Modelling

A research team from Rikkyo University and AnyTech Co., Ltd. examines the suitability of different inductive biases for computer vision and proposes Sequencer, an architectural alternative to ViTs that leverages long short-term memory (LSTM) rather than self-attention layers to achieve ViT-competitive performance on long sequence modelling.

by Synced 2022-04-25 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Baidu’s PP-Matting: Trimap-Free High-Accuracy Natural Image Matting

In the new paper PP-Matting: High-Accuracy Natural Image Matting, a Baidu research team proposes PP-Matting, a trimap-free architecture that combines a high-resolution detail branch and a semantic context branch to achieve state-of-the-art performance on natural image matting tasks.

by Synced 2022-04-20 0

AI Computer Vision & Graphics Machine Learning & Data Science Research

UC Berkeley & Intel’s Photorealistic Denoising Method Boosts Video Quality on Moonless Nights

In the new paper Dancing Under the Stars: Video Denoising in Starlight, a research team from UC Berkeley and Intel Labs leverages a GAN-tuned, physics-based noise model to represent camera noise under low light conditions and trains a novel denoiser that, for the first time, achieves photorealistic video denoising in starlight.

by Synced 2022-02-24 0

AI Computer Vision & Graphics Machine Learning & Data Science Research

DeepMind’s Upgraded Hierarchical Perceiver Is Faster, Scales to Larger Data Without Preprocessing, and Delivers Higher Resolution and Accuracy

DeepMind researchers propose Hierarchical Perceiver (HiP), a model that retains the original Perceiver’s ability to process arbitrary modalities but is faster, can scale up to even more inputs/outputs, reduces the need for input engineering, and improves both efficiency and accuracy on classical computer vision benchmarks.

by Synced 2022-02-23 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Tsinghua & NKU’s Visual Attention Network Combines the Advantages of Convolution and Self-Attention, Achieves SOTA Performance on CV Tasks

In the new paper Visual Attention Network, a research team from Tsinghua University and Nankai University introduces a novel large kernel attention (LKA) mechanism for an extremely simple and efficient Visual Attention Network (VAN) that significantly outperforms state-of-the-art vision transformers and convolutional neural networks on various computer vision tasks.