Introducing Qian Yan, Baidu’s New Plan to Build 100 Chinese NLP Datasets in Three Years
Qian Yan, a plan to jointly develop the world’s largest Chinese natural language processing database.
AI Technology & Industry Review
Researchers from the University of Bristol, the University of Toronto and the University of Catania explain how they created EPIC-Kitchens and introduce new baselines that emphasize the multimodal nature of the largest egocentric video benchmark of its kind.
The study introduces an Event Recognition in Aerial video (ERA) dataset comprising 2,866 aerial videos collected from YouTube, each annotated with one of 25 class labels corresponding to an event that unfolds over a five-second period.
AI systems are already helping farmers with soil analysis, planting, animal husbandry, water conservation and more.
In the conclusion to our year-end series, Synced spotlights ten datasets that were open-sourced in 2019.
An R&D team from Japanese multinational conglomerate Hitachi has proposed a sound-based technique for identifying malfunctioning industrial robots on a factory floor.
Synced Global AI Weekly September 29th
In an attempt to unlock the science behind hugs, a team of researchers from Arizona State University slapped wearable sensors on 33 humans and collected data on their more than 350 hugs with the humanoid remote-controlled robot “Baxter.”
At last week’s Re•Work AI in Finance Conference in New York, researchers and engineers from banks and academia alike shared their thoughts on current AI research and applications in the finance world.
A team of researchers from the Norwegian University of Science and Technology recently proposed a new architecture that can automatically anonymize faces in images while preserving the original data distribution.
In collaboration with Partnership on AI, Microsoft, and academics from top universities, Facebook today announced the Deepfake Detection Challenge (DFDC) with the aim of finding innovative deepfake detection solutions to help the media industry spot videos that have been morphed by AI models.
Synced Global AI Weekly August 25th
Researchers from Two Six Labs and Stanford Schnitzer Lab have developed a deep learning system designed to explore the workings of the mouse mind and predict behavior by processing brain-based electrical activity with a neural network.
In a bid to combine anime and cartoon culture with machine learning, a research team from China’s leading video streaming service iQIYI has introduced a novel large unconstrained cartoon dataset they call “iCartoonFace.”
This paper presents a new large-scale multilingual video description dataset covering 41,250 videos and 825,000 captions in both Chinese and English.
Do you dream of Asuna Yuuki? Do you long to escape to a fantasy world with a beautiful anime partner? If so there’s a new artificial intelligence system just for you — the “Waifu Vending Machine” can create a highly customized anime companion in minutes.
The Facebook AI Research team shows how it trained a large convolutional network to predict hashtags on some 3.5 billion social media images. The research achieved a state-of-the-art top-1 accuracy of 85.4 percent on ImageNet.
Researchers from New York University and Facebook AI Research recently added 50,000 test samples to the MNIST dataset. Facebook Chief AI Scientist Yann LeCun, who co-developed MNIST, tweeted his approval: “MNIST reborn, restored and expanded.”
Alphabet’s autonomous driving unit Waymo surprised many by releasing a new high-quality multimodal sensor dataset for autonomous driving. The Waymo Open Dataset was introduced at the top AI conference Computer Vision and Pattern Recognition (CVPR) 2019 in Long Beach, California.
Building large datasets is a time-consuming and labor-intensive task which challenges entities with limited budgets. There are hundreds of open visual datasets out there, but searching across them and their millions of entries is not a simple task.
Researchers from Sri Lanka’s University of Moratuwa and the University of Sydney in Australia have proposed a technique for generating new handwritten character training samples from existing samples.
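The paper’s own generation technique is not reproduced here; as a rough, hypothetical illustration of the general idea, the sketch below expands a single grayscale character image into several new training samples by applying small random affine distortions (the function name and parameter ranges are assumptions, not the authors’ method):

```python
# Rough illustration (not the authors' method): synthesizing extra handwritten
# character samples from an existing one with small random affine distortions.
import numpy as np
from scipy import ndimage


def augment(image: np.ndarray, n_samples: int = 5, seed: int = 0) -> list:
    """Return n_samples lightly perturbed copies of a 2-D grayscale character image."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        angle = rng.uniform(-10, 10)           # small random rotation (degrees)
        dx, dy = rng.uniform(-2, 2, size=2)    # small random translation (pixels)
        out = ndimage.rotate(image, angle, reshape=False, order=1)
        out = ndimage.shift(out, (dy, dx), order=1)
        samples.append(np.clip(out, 0.0, 1.0))
    return samples


# Example: expand a single 28x28 character into five new training samples.
original = np.zeros((28, 28), dtype=np.float32)
original[6:22, 13:15] = 1.0                    # a crude vertical stroke
new_samples = augment(original)
print(len(new_samples), new_samples[0].shape)  # 5 (28, 28)
```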
Google today announced the release of a new and improved landmark recognition dataset. Google-Landmarks-v2 includes over 5 million images, doubling the number in the landmark recognition dataset the tech giant released last year. The dataset now covers more than 200,000 different landmarks, a sevenfold increase over the first version.
If we ask one of today’s AI-powered voice assistants like Alexa and Siri to tell a joke, it might very well come up with something that puts a smile on our face. If however we then asked “Why do you think that joke is funny?” the bot would be stuck for a response. AI researchers want to change that.
The Stanford ML Group led by Andrew Ng has released its MRNet Dataset, which contains more than 1,000 annotated knee MRI scans, and announced an associated public model competition.
Synced Global AI Weekly April 14th
Chinese technology giant Tencent has open-sourced its face detection algorithm DSFD (Dual Shot Face Detector). The related paper, DSFD: Dual Shot Face Detector, reports state-of-the-art performance on the WIDER FACE and FDDB benchmarks and has been accepted by the top computer vision conference CVPR 2019.
A collaboration between researchers from China’s Beihang University and Microsoft Research Asia has produced TableBank, a new image-based dataset for table detection and recognition built with novel weak supervision from Word and LaTeX documents on the Internet.
Chinese AI unicorn Megvii Technology has proposed a new single-path, one-shot NAS design approach which makes various applications more convenient and achieves state-of-the-art performance on the large-scale ImageNet dataset.
Synced Global AI Weekly April 7th
“CityFlow,” a city-scale traffic camera dataset paper from NVIDIA researchers, has been accepted by CVPR 2019 for an oral session, earning two “Strong Accepts” and one “Accept” from reviewers.
Chinese AI company iFLYTEK has bested the SQuAD2.0 challenge once again. The model “BERT + DAE + AoA” submitted by the joint iFLYTEK Research and HIT (Harbin Institute of Technology) laboratory HFL outperformed humans on both the EM (exact match) and F1-score (fuzzy match) metrics to top the SQuAD2.0 leaderboard.
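For readers unfamiliar with the two leaderboard metrics, here is a minimal sketch of how SQuAD-style EM and token-level F1 are typically computed. It is a simplified illustration, not the official SQuAD 2.0 evaluation script, which additionally normalizes articles and punctuation and handles unanswerable questions:

```python
# Simplified SQuAD-style metrics: exact match (EM) and token-level F1.
from collections import Counter


def exact_match(prediction: str, ground_truth: str) -> float:
    """EM: 1.0 if the normalized prediction equals the normalized answer, else 0.0."""
    return float(prediction.strip().lower() == ground_truth.strip().lower())


def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-level F1 ("fuzzy match"): harmonic mean of precision and recall
    over the overlapping tokens between prediction and gold answer."""
    pred_tokens = prediction.strip().lower().split()
    gold_tokens = ground_truth.strip().lower().split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Harbin Institute of Technology", "harbin institute of technology"))  # 1.0
print(round(f1_score("the Harbin Institute", "Harbin Institute of Technology"), 2))     # 0.57
```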
There are some things that some people just don’t want showing up on their websites, and this has spawned a wide range of activities and technologies that fall under “content review.”
The San Francisco-based AI non-profit, however, has raised eyebrows in the research community with its unusual decision not to release the language model’s code and training dataset. In a statement sent to Synced, OpenAI explained the choice was made to prevent malicious use: “it’s clear that the ability to generate synthetic text that is conditioned on specific subjects has the potential for significant abuse.”
Uber has unveiled Ludwig, a new TensorFlow-based toolkit that enables users to train and test deep learning models without writing any code. The toolkit will help non-experts understand models and accelerate their iterative development by simplifying the prototyping process and data processing.
CUHK researchers recently teamed up with Chinese AI giant SenseTime to develop a greatly improved iteration, DeepFashion2, a large-scale benchmark with comprehensive tasks and annotations for fashion image understanding.
In December Synced reported on a hyperrealistic face generator developed by US chip giant NVIDIA. The GAN-based model performs so well that most people can’t distinguish the faces it generates from real photos. This week NVIDIA announced that it is open-sourcing the nifty tool, which it has dubbed “StyleGAN”.
To help keep our readers abreast of the trend, Synced has identified five high-quality open-source datasets that were released this month (January 2019) and that AI researchers and engineers might find useful in their work.
The proliferation of social media in our daily lives has profoundly changed the way we work and play with others. It has also created an entirely new job: thousands of people worldwide now work on the “Community Operations Teams” of Google, Facebook and Twitter. Whenever a user flags content as offensive, it’s sent to these teams for review.
Facebook AI Research (FAIR) and the New York University (NYU) School of Medicine’s Center for Advanced Imaging Innovation and Research (CAI2R) announced today that they are sharing a standardized set of AI tools, baselines and MRI data as part of their joint research project fastMRI.
If you’ve ever wondered whether Dota 2 or League of Legends is the most popular multiplayer online battle arena game, or how long you’d need to spend on a treadmill to burn off that party size bag of chips you just ate, you know that you can probably find the answer by accessing a couple of relevant information sources and then applying what seems like a natural and straightforward reasoning process.