IEEE Region 8 Today - Alphabet's DeepMind Makes a Key Advance in Computer Vision

Researchers at Alphabet’s DeepMind today described a method that they say can construct a three-dimensional layout from just a handful of two-dimensional snapshots.

So far the method, based on deep neural networks, has been confined to virtual environments, they write in Science magazine. Natural environments are still too hard for current algorithms and hardware to handle.

The article doesn’t speculate on commercial applications, and the authors weren’t available for interview. That gives me license to speculate: The new method might be useful for any surveillance system that has to reconstruct a crime from a few snapshots. Self-driving cars and household robots would also seem likely beneficiaries of the technique.

What’s key is that the system learns a lot from very little—in these experiments it never got more than five snapshots to work with. And, the researchers write, it does the job by observation alone, without anyone having to first label the objects and “without any prior specification of the laws of perspective, occlusion, or lighting.”

The researchers use two neural networks, a representation network and a generation network. The pairing loosely recalls other two-network designs, such as the “policy” and “value” networks described in DeepMind’s 2016 paper on AlphaGo, the Go-playing machine, although the roles here are different.

The representation network reduces perceived objects to a very simplified abstraction, leaving it to the generation network to fill in the details. The researchers give, as an example, a robot arm that can be abstracted as a simple articulation, with several joints, which is then constructed using data on form, color and so forth.

By manipulating the abstraction first and filling in details later, the system can work much faster than rendering systems that attempt to manipulate huge sets of three-dimensionally related points. The researchers add that the division of labor also makes the method much better at representing soft objects, like animals and vegetables.
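To make the division of labor concrete, here is a toy sketch in Python. It is not DeepMind's implementation: the layer sizes, the random untrained weights, the function names, and the choice to combine snapshots by summing their codes are all assumptions made for illustration. It shows only the shape of the idea, in which a handful of snapshots are reduced to one compact scene code that is then used to predict a view from a camera position the system has not seen.

```python
import math
import random

# All sizes are hypothetical, chosen only to keep the sketch small.
IMAGE_SIZE = 4   # pixels per toy snapshot
VIEW_SIZE = 3    # numbers describing a camera position
CODE_SIZE = 8    # length of the abstract scene code


def random_weights(rows, cols, rng):
    """Untrained weights; a real system would learn these from data."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def represent(image, viewpoint, weights):
    """Toy representation network: one (image, viewpoint) pair -> short code."""
    features = list(image) + list(viewpoint)
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def aggregate(codes):
    """Combine per-snapshot codes by element-wise sum (an assumption here)."""
    return [sum(column) for column in zip(*codes)]


def generate(scene_code, query_viewpoint, weights):
    """Toy generation network: scene code + new viewpoint -> predicted pixels."""
    features = list(scene_code) + list(query_viewpoint)
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, features))))
            for row in weights]


rng = random.Random(0)
rep_weights = random_weights(CODE_SIZE, IMAGE_SIZE + VIEW_SIZE, rng)
gen_weights = random_weights(IMAGE_SIZE, CODE_SIZE + VIEW_SIZE, rng)

# Five snapshots, the most the article says the system was given.
snapshots = [
    ([rng.random() for _ in range(IMAGE_SIZE)],
     [rng.random() for _ in range(VIEW_SIZE)])
    for _ in range(5)
]

scene_code = aggregate([represent(img, view, rep_weights) for img, view in snapshots])
predicted = generate(scene_code, [0.1, 0.5, 0.9], gen_weights)
print(len(scene_code), len(predicted))
```

Because the weights are random, the predicted pixels mean nothing; the point is that everything downstream works from the small scene code rather than from the raw snapshots.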

Alphabet’s DeepMind, based in London, is still best known for its out-of-the-blue advance in cracking the game of Go. That problem, long seen as a Holy Grail of computing, yielded in 2016 when the company’s AlphaGo program beat one of the world’s best players. Last year DeepMind built a second machine that utterly crushed the first one after mere weeks of self-teaching.

Alphabet, the umbrella corporation that owns DeepMind, gets almost all its revenue from Google, and it has been pushing to generate new streams of income from its other units. DeepMind did discover a way to save substantial energy in Google’s server farms, and earlier this year a method for improving a text-to-speech product went to market.

But if the company can extend its achievement in machine vision to real-life optical feeds, it just might bring the wave of applications Alphabet is hoping for. A good deal of what’s called thinking can be put down to sheer perception, and perception itself is harder for computers than is generally understood.

There is a standard psychological test of the human ability to mentally rotate objects that uses geometric figures known as Shepard-Metzler objects. In a video supplied by DeepMind, the neural nets classify these objects as one of two kinds: Either they are versions of a template that has been rotated in one or more planes, or they are mirror images of that template. The DeepMind networks do the job well. Not all people can say the same.

Image: DeepMind. GQN agent performing the Shepard-Metzler object rotation task.
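The difference between a rotated copy and a mirror image can be stated exactly: a rotation preserves an object's handedness, while a reflection flips it. The short Python sketch below illustrates that geometric fact with four labeled corner points. It is not how the DeepMind networks work; they see only images, whereas this sketch assumes the matching points on both objects are already known. The function names and the sample shape are hypothetical.

```python
def signed_volume(p0, p1, p2, p3):
    """Signed volume (times 6) of the tetrahedron spanned by four 3-D points."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    # 3x3 determinant of the edge vectors a, b, c.
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def same_handedness(shape_a, shape_b):
    """True if two shapes (four matching points each) share handedness."""
    return signed_volume(*shape_a) * signed_volume(*shape_b) > 0


def rotate_z_90(p):
    """Rotate a point 90 degrees about the z axis."""
    x, y, z = p
    return (-y, x, z)


def mirror_x(p):
    """Reflect a point through the plane x = 0."""
    x, y, z = p
    return (-x, y, z)


template = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
rotated = [rotate_z_90(p) for p in template]
mirrored = [mirror_x(p) for p in template]

print(same_handedness(template, rotated))   # rotated copy of the template
print(same_handedness(template, mirrored))  # mirror image of the template
```

The hard part of the real task, for people and networks alike, is that the matching points are not given and must be inferred from the images.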

It’s the human ability to do this sort of thing, as well as to figure out what must lie behind a barrier to vision—like a lock of hair or a branch of a tree—that explains why we can navigate complex environments so well. A human being knows, from simple experience of the world, that a person who is in the sitting position is almost always to be found on a chair (and only very rarely on thin air, as circus mimes might do).

Such common-sense knowledge has been unheard of in robots. So far.

Source: IEEE Spectrum
