computer vision

by Synced 2025-01-06 48

Nvidia Intensifies Robot Push with New Humanoid Platform as Industry Giants Eye Lucrative Future

Nvidia will launch Jetson Thor for humanoid robots in H1 2025, entering a growing market where Google is also active. The robotics sector is projected to grow substantially. Nvidia offers integrated hardware and software solutions. Simultaneously, China’s rapidly developing domestic humanoid robot market presents emerging competition.

by Synced 2024-08-27 4

AI Machine Learning & Data Science Research

Meta’s Sapiens: Revolutionizing Human Pose, Segmentation, and Depth Estimation with Vision Transformers

In a new paper Sapiens: Foundation for Human Vision Models, a Meta research team introduces Sapiens, a suite of models designed to address four core human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction.

by Synced 2024-04-18 3

AI Machine Learning & Data Science Research

87% ImageNet Accuracy, 3.8ms Latency: Google’s MobileNetV4 Redefines On-Device Mobile Vision

A Google research team unveils the latest iteration of MobileNets: MobileNetV4 (MNv4). The model reaches 87% ImageNet-1K accuracy with a Pixel 8 EdgeTPU runtime of just 3.8ms.

by Synced 2023-11-30 4

AI Machine Learning & Data Science Research

Spatial-Temporal Innovation: STLVQE Redefines Real-Time Video Enhancement for an Unmatched Viewing Experience

A paper titled “Online Video Quality Enhancement with Spatial-Temporal Look-up Tables” introduces a novel method, STLVQE. This research, conducted by a team from Tongji University and Microsoft Research Asia, pioneers the exploration of the online video quality enhancement problem and presents the first method achieving real-time processing speed.

by Synced 2023-08-16 5

AI Computer Vision & Graphics Machine Learning & Data Science Research

MIT & Harvard’s Open-Source FAn System Enables Real-Time Any Objects Detection, Tracking, and Following

In a new paper Follow Anything: Open-set detection, tracking, and following in real-time, a research team from MIT and Harvard University presents the follow anything system (FAn), an open-set, real-time framework for following any object that can detect, segment, track, and follow objects, and can adapt to new objects using text, image, or click queries.

by Synced 2023-07-20 5

AI Computer Vision & Graphics Machine Learning & Data Science Research

Objaverse-XL: Unleashing 10M+ 3D Objects for Advanced 3D Vision

In a new paper Objaverse-XL: A Universe of 10M+ 3D Objects, a research team from the Allen Institute for AI, University of Washington, Columbia University, Stability AI, California Institute of Technology and LAION joins forces to present Objaverse-XL, a large-scale, web-crawled dataset of 3D assets that offers substantially richer variety and higher-quality data, with the aim of boosting the performance of state-of-the-art 3D models.

by Synced 2023-01-17 15

AI Computer Vision & Graphics Machine Learning & Data Science Research

CMU’s DensePose From WiFi: An Affordable, Accessible and Secure Approach to Human Sensing

In the new paper DensePose From WiFi, a Carnegie Mellon University research team proposes WiFi-based DensePose, a neural network architecture capable of estimating human dense pose using only WiFi signals in scenarios with occlusion and multiple people.

by Synced 2022-12-21 0

AI Computer Vision & Graphics Machine Learning & Data Science Research

Meet Google’s FlexiViT: A Flexible Vision Transformer for All Patch Sizes

In the new paper FlexiViT: One Model for All Patch Sizes, a Google Research team presents FlexiViT, a flexible ViT that performs well across a wide range of patch sizes, matching or outperforming standard fixed-patch ViT performance with no extra costs.

by Synced 2022-06-02 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google Brain’s UViM: A Unified Approach for Modelling Diverse Vision Tasks Without Modifications

In the new paper UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes, a Google Brain research team proposes UViM, a unified approach that leverages language modelling and discrete representation learning to enable the modelling of a wide range of computer vision tasks without task-specific modifications.

by Synced 2022-01-24 2

AI Computer Vision & Graphics Machine Learning & Data Science Research

Meta AI’s OMNIVORE: A Modality-Agnostic Single Vision Model With Cross-Modal Generalization

A Meta AI research team presents OMNIVORE, a single vision model for various visual modalities that can perform cross-modal generalization and achieves performance on par with or better than traditional modality-specific models of the same size.

by Synced 2021-11-29 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Microsoft’s ‘Florence’ General-Purpose Foundation Model Achieves SOTA Results on Dozens of CV Benchmarks

In the paper A New Foundation Model for Computer Vision, a Microsoft research team proposes Florence, a novel foundation model for computer vision that significantly outperforms previous large-scale pretraining approaches and achieves new SOTA results across a wide range of visual and visual-linguistic benchmarks.

by Synced 2021-11-15 2

AI Company Computer Vision & Graphics Global News Research US & Canada

Google’s Pet Portraits Will Find Art Doubles for Your Pets

Google recently launched an adorable new feature for its Arts and Culture app named Pet Portraits, which compares your pet’s photo to artworks from museums worldwide to find its art double.

by Synced 2021-11-15 2

AI Computer Vision & Graphics Machine Learning & Data Science Popular Research

A Leap Forward in Computer Vision: Facebook AI Says Masked Autoencoders Are Scalable Vision Learners

In a new paper, a Facebook AI team brings autoencoding methods to the computer vision field and shows that masked autoencoders (MAE) are scalable self-supervised learners.

by Synced 2021-10-28 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Softmax-free Vision Transformer With Linear Complexity: Achieving a Superior Accuracy/Complexity Trade-off

Researchers from Fudan University, University of Surrey and Huawei Noah’s Ark Lab identify the limitations of quadratic complexity for vision transformers (ViTs) as rooted in keeping the softmax self-attention during approximations. The team proposes the first softmax-free transformer (SOFT), which reduces the self-attention computation to linear complexity, achieving a superior trade-off between accuracy and complexity.

by Synced 2021-10-27 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google Open-Sources SCENIC: A JAX Library for Rapid Computer Vision Model Prototyping and Cutting-Edge Research

A research team from Google Brain and Google Research introduces SCENIC, an open-source JAX library for fast and extensible computer vision research and beyond. SCENIC currently supports implementations of state-of-the-art vision models such as ViT, DETR and MLP Mixer, and more open-sourced cutting-edge projects will be added in the near future.

by Synced 2021-10-13 0

AI Community Computer Vision & Graphics Global Global News Research

ICCV 2021 Best Papers Announced

On October 13, ICCV 2021 announced its Best Paper Awards, honourable mentions, and Best Student Paper.

by Synced 2021-10-06 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google Significantly Improves Visual Representations by Adding Explicit Information Compression

A Google Research team presents compressive variants of SimCLR and BYOL that yield better and more robust visual representations.

by Synced 2021-10-04 3

AI Computer Vision & Graphics Machine Learning & Data Science Research

Debiasing Image Datasets: Oxford University Presents PASS, an ImageNet Replacement for Self-Supervised Pretraining

An Oxford University research team presents PASS, a large (1.28M) image collection excluding humans, created as an ImageNet replacement for self-supervised pretraining without technical, ethical or legal issues.

by Synced 2021-08-23 3

AI Computer Vision & Graphics Machine Learning & Data Science Research

UCSD & Microsoft Improve Image Recognition With Extremely Low FLOPs

A research team from the University of California San Diego and Microsoft proposes Micro-Factorized Convolution (MF-Conv), a novel approach that can deal with extremely low computational costs (4M–21M FLOPs) and achieves significant performance gains over state-of-the-art models in the low FLOP regime.

by Synced 2021-07-08 1

AI Computer Vision & Graphics Machine Learning & Data Science Research

Meet CLIPDraw: Text-to-Drawing Synthesis via Language-Image Encoders Without Model Training

A research team from Cross Compass Ltd, Massachusetts Institute of Technology, Tokyo Institute of Technology and University of Tokyo presents CLIPDraw, an algorithm that synthesizes drawings based on natural language input without the need for any training.

by Synced 2021-07-01 2

AI Computer Vision & Graphics Machine Learning & Data Science Research

Video Swin Transformer Improves Speed-Accuracy Trade-offs, Achieves SOTA Results on Video Recognition Benchmarks

A research team from Microsoft Research Asia, University of Science and Technology of China, Huazhong University of Science and Technology, and Tsinghua University takes advantage of the inherent spatiotemporal locality of videos to present a pure-transformer backbone architecture for video recognition that leads to a better speed-accuracy trade-off.

by Synced 2021-06-02 2

AI Computer Vision & Graphics Machine Learning & Data Science Research

Google & Rutgers’ Aggregating Nested Transformers Yield Better Accuracy, Data Efficiency and Convergence

A research team from Google Cloud AI, Google Research and Rutgers University simplifies vision transformers’ complex design, proposing nested transformers (NesT) that simply stack basic transformer layers to process non-overlapping image blocks individually. The approach achieves superior ImageNet classification accuracy and improves model training efficiency.

by Synced 2021-04-30 2

AI Computer Vision & Graphics Machine Learning & Data Science Research

Yann LeCun Team’s Novel End-to-End Modulated Detector Captures Visual Concepts in Free-Form Text

A research team from NYU and Facebook proposes MDETR, an end-to-end modulated detector that identifies objects in images conditioned on a raw text query and is able to capture a long tail of visual concepts expressed in free-form text.

by Synced 2021-04-01 2

AI Computer Vision & Graphics Research

Google Research’s SOTA GNN ‘Reasons’ Interactions over Time to Boost Video Understanding

A research team from Google Research proposes a message-passing graph neural network that can explicitly model spatio-temporal relations, use either implicit or explicit representations of objects, and generalize previous structured models for video understanding.

by Synced 2021-03-31 2

AI Computer Vision & Graphics Research

Google Research’s Novel High Efficient Neural Volumetric Representation Enables Real-Time View Synthesis

A Google Research team accelerates Neural Radiance Fields’ rendering procedure for view-synthesis tasks, enabling it to work in real-time while retaining its ability to represent fine geometric details and convincing view-dependent effects.

by Synced 2021-03-15 3

AI Computer Vision & Graphics Research

Yann LeCun Team’s Barlow Twins Method Boosts SSL in Image Representation via Redundancy Reduction

Yann LeCun and a team of researchers propose Barlow Twins, a method that learns self-supervised representations through a joint embedding of distorted images, with an objective function that can make the embedding vectors almost identical while reducing redundancy between their components.

by Synced 2021-03-08 1

AI Computer Vision & Graphics Research

Meet Transformer in Transformer: A Visual Transformer That Captures Structural Information From Images

A team from Huawei, ISCAS and UCAS proposes the novel Transformer-iN-Transformer (TNT) for modelling both patch-level and pixel-level representations.

by Synced 2021-03-02 30

AI Computer Vision & Graphics Research Share My Research

Fast Video Object Segmentation using the Global Context Module

A novel module that effectively and efficiently propagates information through an arbitrarily long video, with constant complexity w.r.t. number of frames and linear complexity w.r.t. resolution.

by Synced 2021-02-17 3

AI Computer Vision & Graphics Research

UC Berkeley & Google’s BoTNet Applies Self-Attention to CV Bottlenecks

Researchers from UC Berkeley and Google Research have introduced BoTNet, a “conceptually simple yet powerful” backbone architecture that boosts performance on computer vision (CV) tasks such as image classification, object detection and instance segmentation.

by Synced 2021-01-12 2

Computer Vision & Graphics Machine Learning & Data Science Research

VisualVoice Uses Facial Appearance to Boost SOTA in Speech Separation

Recent AI research on speech separation has explored ways to associate lip motions in videos with audio, but this approach suffers when speakers’ lips are occluded, which they often are in busy multi-speaker environments.

by Synced 2021-01-11 3

Computer Vision & Graphics Machine Learning & Data Science Research

StyleGAN-Based VOGUE Is a SOTA AI-Powered Fitting Room

VOGUE is an AI-powered optimization method that deforms garments according to a given body shape while preserving pattern and material details to deliver state-of-the-art photorealistic, high-resolution try-on images.

by Synced 2021-01-07 3

Computer Vision & Graphics Machine Learning & Data Science Research

Columbia University Model Learns Predictability From Unlabelled Video

Researchers propose a novel framework and hierarchical predictive model that learns to identify what is predictable from unlabelled video.

by Synced 2021-01-05 6

Computer Vision & Graphics Machine Learning & Data Science Research

‘Neural Body’ Reconstructs Dynamic Human Bodies From Sparse Camera Views

The novel approach tackles dynamic 3D human-body synthesis from a sparse set of camera views, bettering existing methods on key metrics by significant margins.

by Synced 2021-01-04 4

Computer Vision & Graphics Machine Learning & Data Science Research

PGDrive Simulator Generates Unlimited Diverse Driving Environments

Researchers proposed PGDrive, a driving simulator designed to evaluate and improve end-to-end driving agents’ generalization abilities.

by Synced 2020-12-23 9

Computer Vision & Graphics Machine Learning & Data Science Popular Research

2020 in Review: 10 AI-Powered Art Projects

As part of our year-end series, Synced highlights 10 AI-powered art projects that inspired and entertained us in 2020.

by Synced 2020-12-21 2

Computer Vision & Graphics Machine Learning & Data Science Popular Research

Heidelberg University Researchers Combine CNNs and Transformers to Synthesize High-Resolution Images

Researchers combine the effectiveness of the inductive bias in CNNs with the expressivity of transformers to model and synthesize high-resolution images.

by Synced 2020-12-18 15

Computer Vision & Graphics Machine Learning & Data Science Popular Research

‘We Can Do It’ – Geoffrey Hinton and UBC, UT, Google & UVic Team Propose Unsupervised Capsule Architecture for 3D Point Clouds

In the new paper Canonical Capsules: Unsupervised Capsules in Canonical Pose, Turing Award Honoree Dr. Geoffrey Hinton and a team of researchers propose an architecture for unsupervised learning with 3D point clouds based on capsules.

by Synced 2020-12-16 1

Computer Vision & Graphics Machine Learning & Data Science Research

Facebook AI & University of Notre Dame Propose Multi-Face Pose Estimation Without Face Detection

Researchers from the University of Notre Dame and Facebook AI propose Img2pose, a real-time 6DoF 3D face pose estimation method that works without face detection or landmark localization.

by Synced 2020-12-15 5

Machine Learning & Data Science Popular Research

NeurIPS 2020 | Teaching Transformers New Tricks

This year, 22 Transformer-related research papers were accepted by NeurIPS, the world’s most prestigious machine learning conference. Synced has selected ten of these works to showcase the latest Transformer trends.

by Synced 2020-12-09 5

Computer Vision & Graphics Machine Learning & Data Science Popular Research

This Pizza Does Not Exist: StyleGAN2-Based Model Generates Photo-Realistic Pizza Images

The new AI-powered Multi-Ingredient Pizza Generator (MPG) can deliver all these mouth-watering pies and many more.