{"id":4650,"date":"2024-09-26T13:12:58","date_gmt":"2024-09-26T17:12:58","guid":{"rendered":"https:\/\/blog.daed.com\/?p=4650"},"modified":"2024-10-01T15:08:59","modified_gmt":"2024-10-01T19:08:59","slug":"how-machine-vision-learns-depth-perception-from-humans","status":"publish","type":"post","link":"https:\/\/blog.daed.com\/?p=4650","title":{"rendered":"How Machine Vision Learns Depth Perception From Humans"},"content":{"rendered":"<p>&nbsp;<\/p>\n<p>At Daedalus, machine vision is more than just a fascinating technological advancement\u2014it plays a crucial role in the solutions we research, design, and engineer. By integrating machine vision systems into our projects, we&#8217;re able to enhance precision, automation, and data analysis for projects across various industries. This allows us to create smarter, more efficient products that solve real-world problems. Our commitment to staying at the forefront of such technologies ensures that we deliver innovative and cutting-edge solutions to our clients.<\/p>\n<p>Machine vision is a subfield of computer vision that enables computers to interpret and understand visual information, much like human eyes and brains do. It involves using cameras, sensors, and software to detect, analyze, and process visual data, and it plays a vital role in AI, robotics, and automation.<\/p>\n<p>One of the core challenges in machine vision is enabling computers to perceive depth\u2014the distance between objects in their view. Understanding how humans perceive depth helps us build similar systems for computers, which leads us to explore stereo vision technology, a method inspired by our own biological depth perception.<\/p>\n<p>With that said, let\u2019s first go back to where machine vision got its inspiration: humans, and our eyes!<\/p>\n<p>&nbsp;<\/p>\n<h2>How do humans perceive distance?<\/h2>\n<p>For people, estimating the distance of objects is so automatic that we almost never think of it in our daily lives. 
So what goes on in our eyes and in our brains when we do this essential task?<\/p>\n<figure id=\"attachment_4656\" aria-describedby=\"caption-attachment-4656\" style=\"width: 306px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-4656\" src=\"http:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/Descartes_Coordination_of_muscle_and_visual_mechanisms._Wellcome_L0002392.jpg\" alt=\"Descartes' hypothesis for how humans perceived depth\" width=\"306\" height=\"425\" srcset=\"https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/Descartes_Coordination_of_muscle_and_visual_mechanisms._Wellcome_L0002392.jpg 861w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/Descartes_Coordination_of_muscle_and_visual_mechanisms._Wellcome_L0002392-216x300.jpg 216w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/Descartes_Coordination_of_muscle_and_visual_mechanisms._Wellcome_L0002392-737x1024.jpg 737w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/Descartes_Coordination_of_muscle_and_visual_mechanisms._Wellcome_L0002392-768x1068.jpg 768w\" sizes=\"auto, (max-width: 306px) 100vw, 306px\" \/><figcaption id=\"caption-attachment-4656\" class=\"wp-caption-text\">Image from Wikimedia Commons: https:\/\/commons.wikimedia.org\/wiki\/File:Descartes;_Coordination_of_muscle_and_visual_mechanisms._Wellcome_L0002392.jpg<\/figcaption><\/figure>\n<p>This has been a topic of interest for scientists and philosophers going back millennia. 
Descartes, a philosopher and mathematician from the 1600s known for inventing the Cartesian x-y plane, believed that when people look at an object in their field of view, they can figure out the distance of the object \u201cby the relation which the two eyes have to each other.\u201d That is, by using the angle that the eyes form as they move inward or outward to see nearer or farther objects, we unconsciously, \u201cas though by a natural geometry,\u201d estimate the distance of objects from us (Dioptrique 140-41).<\/p>\n<p>This idea from hundreds of years ago that it\u2019s the combination of information from both eyes is intuitive today. If you cover one of your eyes, you\u2019ll generally find that it\u2019s harder to perceive depth.\u00a0But where does the actual calculation take place?<\/p>\n<p>Descartes definitely didn\u2019t have the scientific resources we have today to give a hard answer, but he thought that there must be some sort of sensory organ that receives images from both eyes, and that this organ unconsciously does this distance estimation. 
He believed this organ to be the pineal gland, a part of the brain he considered so important for processing sensory information that it must even be the \u201cprincipal seat of the soul, and the place in which all our thoughts are formed.\u201d<\/p>\n<p>Modern neuroscience has debunked this particular model of perception: we now know that visual information is processed in a part of the brain called the occipital lobe, which is divided into left and right halves.<\/p>\n<p>Despite his theory being off, you\u2019ll see next how Descartes\u2019 model is closely related to computational depth perception.<\/p>\n<p>&nbsp;<\/p>\n<h2>How do computers perceive distance?<\/h2>\n<p>Computer scientists and engineers have taken inspiration from centuries of research into the human vision system to design computer vision systems.<\/p>\n<p>One of the main strategies for computational depth perception is remarkably similar to Descartes\u2019 model. Instead of two eyes connected to a gland of the brain that combines the information, processes it, and forms a guess about what\u2019s happening, scientists and engineers use two cameras connected to a computer. The cameras serve as the eyes, and the computer as the brain. These are called stereo vision systems.<\/p>\n<p>Stereo vision-based depth algorithms rely on the left and right images from the stereo cameras, as well as information about the cameras themselves and their positions relative to each other, to determine depth. 
Algorithms of this type have been researched for decades and are called stereo depth algorithms.<\/p>\n<p>&nbsp;<\/p>\n<h2>How Stereo Depth Algorithms create \u201cDepth Maps\u201d<\/h2>\n<p>Stereo depth algorithms take in two images from horizontally aligned cameras (stereo cameras) and output what is called a \u201cdepth map,\u201d which is a black-and-white image where each pixel value represents how far the object at that pixel is from the cameras.<\/p>\n<table class=\" aligncenter\" style=\"border-collapse: collapse; width: 77.382%; height: 537px;\">\n<tbody>\n<tr style=\"height: 18px;\">\n<td style=\"width: 37.9423%; text-align: center; height: 18px;\"><em>Left Camera<\/em><\/td>\n<td style=\"width: 37.5412%; text-align: center; height: 18px;\"><em>Right Camera<\/em><\/td>\n<\/tr>\n<tr style=\"height: 258px;\">\n<td style=\"width: 37.9423%; height: 258px;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4652\" src=\"http:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp1.png\" alt=\"\" width=\"324\" height=\"258\" srcset=\"https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp1.png 1390w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp1-300x240.png 300w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp1-1024x818.png 1024w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp1-768x613.png 768w\" sizes=\"auto, (max-width: 324px) 100vw, 324px\" \/><\/td>\n<td style=\"width: 37.5412%; height: 258px;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4653\" src=\"http:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp5.png\" alt=\"\" width=\"324\" height=\"259\" srcset=\"https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp5.png 1390w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp5-300x240.png 300w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp5-1024x818.png 1024w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/disp5-768x613.png 
768w\" sizes=\"auto, (max-width: 324px) 100vw, 324px\" \/><\/td>\n<\/tr>\n<tr style=\"height: 261px;\">\n<td style=\"width: 37.9423%; height: 261px;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4654\" src=\"http:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view1.png\" alt=\"\" width=\"346\" height=\"277\" srcset=\"https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view1.png 1390w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view1-300x240.png 300w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view1-1024x818.png 1024w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view1-768x613.png 768w\" sizes=\"auto, (max-width: 346px) 100vw, 346px\" \/><\/td>\n<td style=\"width: 37.5412%; height: 261px;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-4655\" src=\"http:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view5.png\" alt=\"\" width=\"322\" height=\"257\" srcset=\"https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view5.png 1390w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view5-300x240.png 300w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view5-1024x818.png 1024w, https:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/view5-768x613.png 768w\" sizes=\"auto, (max-width: 322px) 100vw, 322px\" \/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p style=\"padding-left: 40px;\"><span style=\"font-size: 8pt;\"><em>Images taken from\u00a0 H. Hirschm\u00fcller and D. Scharstein. <a href=\"http:\/\/www.cs.middlebury.edu\/~schar\/papers\/evalCosts_cvpr07.pdf\">Evaluation of cost functions for stereo matching<\/a>.<\/em><\/span><\/p>\n<p style=\"padding-left: 40px;\"><span style=\"font-size: 8pt;\"><em>In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), Minneapolis, MN, June 2007. 
<a href=\"https:\/\/vision.middlebury.edu\/stereo\/data\/\">https:\/\/vision.middlebury.edu\/stereo\/data\/<\/a>\u00a0<\/em><\/span><\/p>\n<p>&nbsp;<\/p>\n<p><strong>The steps for creating a &#8220;depth map&#8221; are as follows:<\/strong><\/p>\n<ol>\n<li aria-level=\"1\"><strong>Rectification of images:<\/strong> Rectification is the process of undistorting images and aligning them so that objects seen by both cameras are on the same horizontal line.<\/li>\n<li aria-level=\"1\"><strong>Computation of matching pixels:<\/strong> Once the images are aligned, the next step is to find matching pixels between the left and right images. That is, for every pixel in the left image, we want to find the pixel in the right image that corresponds to it. For example, if I have left and right images of a dog, and I have a pixel in the left image that corresponds to the end of the dog\u2019s tail, then I\u2019ll have to find the pixel in the right image that corresponds to the end of the dog\u2019s tail. Stereo vision algorithms use a variety of techniques to compute matching pixels.<\/li>\n<li aria-level=\"1\"><strong>Calculation of disparity:<\/strong> Once we know the matching pixels, we obtain the disparity by subtracting the x-coordinate of the pixel in the right image from the x-coordinate of the matching pixel in the left image. The result will be an array (which can be visualized as a black-and-white image) the size of our image(s) that contains the pixel-wise disparity between the left and right images. This array is called a disparity map.<\/li>\n<li aria-level=\"1\"><strong>Filtering or fusing with other inputs (optional):<\/strong> Post-processing filters can be applied, and\/or the disparity map can be fused with other inputs such as IR projectors to enhance the depth map. 
Possible enhancements include reducing noise, filling in missing values, and clarifying edges of objects.<\/li>\n<li aria-level=\"1\"><strong>Translation to real-world coordinates:<\/strong> Using the baseline, which is the distance between the two cameras, and the cameras\u2019 calibration data (which is also used in Step 1, and isn\u2019t discussed here since the calculation of such data happens prior to execution of the algorithm), each value in the disparity map is converted from pixel units to real-world units, e.g., meters. For rectified cameras, the depth at a pixel is the focal length multiplied by the baseline and divided by the disparity, so larger disparities mean closer objects. We end up with a depth map of real-world distances of pixels (or the objects they represent) from the stereo cameras.<\/li>\n<\/ol>\n<p>&nbsp;<\/p>\n<h2>Improving computer vision\u2019s accuracy over time<\/h2>\n<p>While Descartes thought people used a type of \u201cnatural geometry\u201d to estimate depth, computers instead use actual geometry (and trigonometry, optimization, and other mathematical techniques). And just as biological scientists have continued to study human vision and develop more sophisticated models, computer scientists have continued to study computational stereo vision.<\/p>\n<p>We humans aren\u2019t estimating depth from scratch every time we blink; we\u2019ve learned over our lifetimes how to improve our depth perception. 
Similarly, for the past decade or so, computer scientists have been researching ways to use machine learning and deep learning to improve the accuracy and reliability of stereo depth algorithms.<\/p>\n<p><strong>Two main approaches computer scientists take to improve stereo depth algorithms\u2019 accuracy are:<\/strong><\/p>\n<ol>\n<li aria-level=\"1\">Using machine learning to determine parameters that go into an existing stereo depth algorithm based on scene characteristics or other variables<\/li>\n<li aria-level=\"1\">Using machine learning to come up with a better model to compute matching pixels directly<\/li>\n<\/ol>\n<p>Traditional stereo vision algorithms usually determine the best (right) match for a (left) pixel based on a combination of pixel brightnesses from a window of nearby (left) pixels, but the machine learning algorithms in the second approach use a learned model to determine these best matches. Scientists are investigating ways to improve the speed and generalizability of these models, which are often trained using standardized, publicly available datasets that only have stereo images with a particular baseline.<\/p>\n<p>&nbsp;<\/p>\n<p>All of this raises another question, though.<\/p>\n<p><em><span style=\"font-weight: 400;\">Is it possible to judge depth with just one camera?\u00a0<\/span><\/em><\/p>\n<p><span style=\"font-weight: 400;\">To answer this question, let\u2019s turn again to human vision.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2>Monocular depth<\/h2>\n<p><span style=\"font-weight: 400;\">Seeing with one eye is called <\/span><i><span style=\"font-weight: 400;\">monocular vision.<\/span><\/i><span style=\"font-weight: 400;\"> Biologists know that it is possible to judge the <\/span><i><span style=\"font-weight: 400;\">relative<\/span><\/i><span style=\"font-weight: 400;\"> depth of objects just using monocular vision. 
That is, given several objects in our field of view, we can use context to determine which of them are closer or farther away.\u00a0 We use cues like whether one object overlaps another (judging the overlapped object to be farther away) and linear perspective (we know that parallel lines converge at a far distance). <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/books\/NBK11512\/\">Monocular depth cues<\/a> are heavily rooted in our past experience and familiarity with similar objects and situations. (It still takes two eyes on the same plane to make precise depth judgments without relying on the position of other objects in our fields of view.)<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Likewise, as computer scientists begin to investigate monocular depth estimation, they look to machine learning to help computers learn from past experience how to judge depth from a <em>single<\/em> image instead of two. These techniques are much newer in development compared to stereo depth techniques, which have been researched for decades and don\u2019t have to rely on machine learning. <\/span><\/p>\n<p>&nbsp;<\/p>\n<h2>Machine Vision at Daedalus<\/h2>\n<p>At Daedalus, we\u2019re not only fascinated by the improvements in technology like this over time \u2013 but we also use machine vision in several projects we work on day-to-day. From data collection, to training an AI model, to analyzing results of projects over time, tools like stereo vision and stereo depth algorithms play a large part in research, design, and engineering efforts.<\/p>\n<p>Are you looking for the right design firm to help with your next software development project, or need a vendor familiar with using software for tracking project success? 
<a href=\"https:\/\/daed.com\/contact\">Get in touch<\/a> to see how we can help.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; At Daedalus, machine vision is more than just a fascinating technological advancement\u2014it plays a crucial role in the solutions we research, design, and engineer. By integrating machine vision systems &#8230;<\/p>\n","protected":false},"author":1,"featured_media":4663,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[222],"tags":[],"class_list":["post-4650","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-software-engineering"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.10 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How Machine Vision Learns Depth Perception From Humans - daed.com<\/title>\n<meta name=\"description\" content=\"Machine vision perceives depth much like how humans do: through the combination of information from both eyes. In modern computer vision systems, two cameras are used like human eyes to create stereo vision, generating a &quot;depth map&quot; by calculating the difference in position between corresponding pixels in two images.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.daed.com\/?p=4650\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How Machine Vision Learns Depth Perception From Humans - daed.com\" \/>\n<meta property=\"og:description\" content=\"Machine vision perceives depth much like how humans do: through the combination of information from both eyes. 
In modern computer vision systems, two cameras are used like human eyes to create stereo vision, generating a &quot;depth map&quot; by calculating the difference in position between corresponding pixels in two images.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/blog.daed.com\/?p=4650\" \/>\n<meta property=\"og:site_name\" content=\"daed.com\" \/>\n<meta property=\"article:published_time\" content=\"2024-09-26T17:12:58+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-10-01T19:08:59+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/How-Computers-Detect-Depth-Perception.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1000\" \/>\n\t<meta property=\"og:image:height\" content=\"750\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Daedalus\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Daedalus\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/blog.daed.com\/?p=4650#article\",\"isPartOf\":{\"@id\":\"https:\/\/blog.daed.com\/?p=4650\"},\"author\":{\"name\":\"Daedalus\",\"@id\":\"https:\/\/blog.daed.com\/#\/schema\/person\/ffe3d55f759956aa85792c64b0d0f984\"},\"headline\":\"How Machine Vision Learns Depth Perception From Humans\",\"datePublished\":\"2024-09-26T17:12:58+00:00\",\"dateModified\":\"2024-10-01T19:08:59+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/blog.daed.com\/?p=4650\"},\"wordCount\":1726,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/blog.daed.com\/#organization\"},\"articleSection\":[\"Software Engineering\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/blog.daed.com\/?p=4650#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.daed.com\/?p=4650\",\"url\":\"https:\/\/blog.daed.com\/?p=4650\",\"name\":\"How Machine Vision Learns Depth Perception From Humans - daed.com\",\"isPartOf\":{\"@id\":\"https:\/\/blog.daed.com\/#website\"},\"datePublished\":\"2024-09-26T17:12:58+00:00\",\"dateModified\":\"2024-10-01T19:08:59+00:00\",\"description\":\"Machine vision perceives depth much like how humans do: through the combination of information from both eyes. 
In modern computer vision systems, two cameras are used like human eyes to create stereo vision, generating a \\\"depth map\\\" by calculating the difference in position between corresponding pixels in two images.\",\"breadcrumb\":{\"@id\":\"https:\/\/blog.daed.com\/?p=4650#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.daed.com\/?p=4650\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.daed.com\/?p=4650#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.daed.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How Machine Vision Learns Depth Perception From Humans\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.daed.com\/#website\",\"url\":\"https:\/\/blog.daed.com\/\",\"name\":\"daed.com\",\"description\":\"research, design, and engineering thinking\",\"publisher\":{\"@id\":\"https:\/\/blog.daed.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.daed.com\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/blog.daed.com\/#organization\",\"name\":\"daed.com\",\"url\":\"https:\/\/blog.daed.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.daed.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/blog.daed.com\/wp-content\/uploads\/2019\/10\/White_Daedalus.png\",\"contentUrl\":\"https:\/\/blog.daed.com\/wp-content\/uploads\/2019\/10\/White_Daedalus.png\",\"width\":5249,\"height\":745,\"caption\":\"daed.com\"},\"image\":{\"@id\":\"https:\/\/blog.daed.com\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.daed.com\/#\/schema\/person\/ffe3d55f759956aa85792c64b0d0f984\",\"name\":\"Daedalus\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.daed.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/8bf29d93ebf6e34ef10f067eac16bcb56fdf8a13ae323fdda0b07b25d15d6c61?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/8bf29d93ebf6e34ef10f067eac16bcb56fdf8a13ae323fdda0b07b25d15d6c61?s=96&d=mm&r=g\",\"caption\":\"Daedalus\"},\"url\":\"https:\/\/blog.daed.com\/?author=1\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How Machine Vision Learns Depth Perception From Humans - daed.com","description":"Machine vision perceives depth much like how humans do: through the combination of information from both eyes. 
In modern computer vision systems, two cameras are used like human eyes to create stereo vision, generating a \"depth map\" by calculating the difference in position between corresponding pixels in two images.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.daed.com\/?p=4650","og_locale":"en_US","og_type":"article","og_title":"How Machine Vision Learns Depth Perception From Humans - daed.com","og_description":"Machine vision perceives depth much like how humans do: through the combination of information from both eyes. In modern computer vision systems, two cameras are used like human eyes to create stereo vision, generating a \"depth map\" by calculating the difference in position between corresponding pixels in two images.","og_url":"https:\/\/blog.daed.com\/?p=4650","og_site_name":"daed.com","article_published_time":"2024-09-26T17:12:58+00:00","article_modified_time":"2024-10-01T19:08:59+00:00","og_image":[{"width":1000,"height":750,"url":"http:\/\/blog.daed.com\/wp-content\/uploads\/2024\/09\/How-Computers-Detect-Depth-Perception.jpg","type":"image\/jpeg"}],"author":"Daedalus","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Daedalus","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/blog.daed.com\/?p=4650#article","isPartOf":{"@id":"https:\/\/blog.daed.com\/?p=4650"},"author":{"name":"Daedalus","@id":"https:\/\/blog.daed.com\/#\/schema\/person\/ffe3d55f759956aa85792c64b0d0f984"},"headline":"How Machine Vision Learns Depth Perception From Humans","datePublished":"2024-09-26T17:12:58+00:00","dateModified":"2024-10-01T19:08:59+00:00","mainEntityOfPage":{"@id":"https:\/\/blog.daed.com\/?p=4650"},"wordCount":1726,"commentCount":0,"publisher":{"@id":"https:\/\/blog.daed.com\/#organization"},"articleSection":["Software Engineering"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/blog.daed.com\/?p=4650#respond"]}]},{"@type":"WebPage","@id":"https:\/\/blog.daed.com\/?p=4650","url":"https:\/\/blog.daed.com\/?p=4650","name":"How Machine Vision Learns Depth Perception From Humans - daed.com","isPartOf":{"@id":"https:\/\/blog.daed.com\/#website"},"datePublished":"2024-09-26T17:12:58+00:00","dateModified":"2024-10-01T19:08:59+00:00","description":"Machine vision perceives depth much like how humans do: through the combination of information from both eyes. 
In modern computer vision systems, two cameras are used like human eyes to create stereo vision, generating a \"depth map\" by calculating the difference in position between corresponding pixels in two images.","breadcrumb":{"@id":"https:\/\/blog.daed.com\/?p=4650#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.daed.com\/?p=4650"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/blog.daed.com\/?p=4650#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.daed.com\/"},{"@type":"ListItem","position":2,"name":"How Machine Vision Learns Depth Perception From Humans"}]},{"@type":"WebSite","@id":"https:\/\/blog.daed.com\/#website","url":"https:\/\/blog.daed.com\/","name":"daed.com","description":"research, design, and engineering thinking","publisher":{"@id":"https:\/\/blog.daed.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.daed.com\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/blog.daed.com\/#organization","name":"daed.com","url":"https:\/\/blog.daed.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.daed.com\/#\/schema\/logo\/image\/","url":"https:\/\/blog.daed.com\/wp-content\/uploads\/2019\/10\/White_Daedalus.png","contentUrl":"https:\/\/blog.daed.com\/wp-content\/uploads\/2019\/10\/White_Daedalus.png","width":5249,"height":745,"caption":"daed.com"},"image":{"@id":"https:\/\/blog.daed.com\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/blog.daed.com\/#\/schema\/person\/ffe3d55f759956aa85792c64b0d0f984","name":"Daedalus","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.daed.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/8bf29d93ebf6e34ef10f067eac16bcb56fdf8a13ae323fdda0b07b25d15d6c61?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/8bf29d93ebf6e34ef10f067eac16bcb56fdf8a13ae323fdda0b07b25d15d6c61?s=96&d=mm&r=g","caption":"Daedalus"},"url":"https:\/\/blog.daed.com\/?author=1"}]}},"_links":{"self":[{"href":"https:\/\/blog.daed.com\/index.php?rest_route=\/wp\/v2\/posts\/4650","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.daed.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.daed.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.daed.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.daed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4650"}],"version-history":[{"count":4,"href":"https:\/\/blog.daed.com\/index.php?rest_route=\/wp\/v2\/posts\/4650\/revisions"}],"predecessor-version":[{"id":4660,"href":"https:\/\/blog.daed.com\/index.php?rest_route=\/wp\/v2\/posts\/4650\/revisions\/4660"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.daed.com\/index.php?rest_
route=\/wp\/v2\/media\/4663"}],"wp:attachment":[{"href":"https:\/\/blog.daed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4650"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.daed.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4650"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.daed.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4650"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}