Painter’s greatest achievement is to make the flat surface show convex bodies protruding from this plane (Leonardo da Vinci, 1792/2006)

We live in a space defined by three dimensions: horizontal (right or left), vertical (up or down) and depth (front or back). In this space we see objects and can determine their location on each of these dimensions with reasonable accuracy. Of course, this is true provided that, firstly, our visual system functions well along its entire length, from the retinas of both eyes to the dorsal pathway; secondly, we occupy a central (egocentric) position in this space; and thirdly, we are sufficiently experienced in moving in this space (Milner and Goodale, 2008).

The horizontal and vertical dimensions define a plane perpendicular to the visual axis, and the depth dimension defines a line parallel to it. Seeing a scene in three dimensions therefore means seeing a set of planes perpendicular to the visual axis, located at different distances from us along the third dimension. The minimum condition for seeing an object in space is therefore to see any of its surfaces in such a way that its isometric projection on a plane perpendicular to the visual axis can be described by non-zero values on both the horizontal and vertical dimensions. We do not see those objects whose isometric projections on a plane perpendicular to the visual axis assume a zero value on either of these two dimensions.

The situation with depth vision is a bit more complicated. Like everything that evolution has equipped us with, the mechanisms of seeing in depth are primarily subordinated to ensuring safety, i.e. survival. In natural conditions, the visual system pays special attention to the area up to approx. 6 m around us. Intrusion of someone or something into this space requires precise reactions, e.g. defence, and there can be no room for any mistakes in assessing the distance of that person or object from us.

In an even smaller area around us, that is, in the space delimited by the reach of our hands, high precision is necessary in assessing the distance of objects or their fragments from us. Probably thanks to the development of the ability to precisely manipulate objects within a short distance from the eyes, it became possible for us to manufacture precise tools and sculptures. Perhaps the carving of the first Venus in the Hohle Fels cave, 35-40 thousand years ago, was just a manifestation of the warlike nature of our ancestors aimed at development of modern killing technologies, but how beneficial it was for the development of human spiritual culture. Either way, in relation to the nearest space around us, there are several neurocognitive mechanisms controlling the presence of objects deep in the watched scenes. The most important of them are related to binocular vision.

Awareness of the order of objects along the third dimension at shorter distances, i.e. roughly up to the limit of 6 m, is primarily the result of binocular vision, and in particular of so-called binocular disparity and vergence. The experience of seeing the spatiality of the visual scene arises from neurophysiological mechanisms that process data on the responses of photoreceptors excited by the two-dimensional images projected on the retinas of both eyes of the observer. If there are differences in photoreceptor response between the projections of these images on the spatially corresponding receptive fields of both retinas, then in the cortical sections of the visual pathway this data is integrated into one image containing information about the order of the planes seen in depth.

It is not difficult to guess that the first signals regarding binocular disparity are analysed in the V1 cortex, in which data from the right and left eye are ordered within one anatomical structure. Research on this subject was conducted several years after Hubel and Wiesel’s discoveries regarding the structure of the V1 cortex, among others by Horace B. Barlow, Colin Blakemore and Jack Pettigrew (1967). Further analyses of binocular disparity are conducted in the structures of the parietal lobes on the dorsal pathway, especially in the V5 area.

If the binocular disparity is too large, i.e. if the images projected on the retinas of both eyes of the observer are completely different, then so-called binocular rivalry occurs. In this situation, differences between the images are not interpreted as a depth indicator; only one of these images, most often the one projected onto the retina of the dominant eye, is treated as the correct one, and the other is ignored.

The second group of mechanisms associated with stereoscopic vision concerns vergence, i.e. the convergent and divergent movements of the eyeballs. This issue has already been discussed in the chapter on framing the visual scene. Convergent eye movement indicates interest in objects closer to the observer, while divergent movement indicates interest in objects somewhat further away, but only up to approx. 6 m, because beyond that limit the visual axes of the two eyes are almost parallel to each other.
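The 6 m limit can be illustrated with simple geometry. The sketch below is only an illustration, not a model taken from the literature cited here: it assumes an object placed straight ahead of the observer and an interpupillary distance of 63 mm (an assumed, illustrative value), and computes the angle between the two visual axes for several fixation distances. The function name is hypothetical.

```python
import math

# Assumed interpupillary distance in metres (illustrative value, not from the text).
IPD_M = 0.063


def vergence_angle_deg(distance_m, ipd_m=IPD_M):
    """Angle between the two visual axes when both eyes fixate
    a point straight ahead at the given distance."""
    return math.degrees(2 * math.atan(ipd_m / (2 * distance_m)))


if __name__ == "__main__":
    # The angle shrinks quickly with distance; at 6 m it is already
    # well below one degree, so the visual axes are nearly parallel.
    for d in (0.3, 1.0, 6.0, 20.0):
        print(f"{d:5.1f} m -> {vergence_angle_deg(d):6.3f} deg")
```

Under these assumptions, moving an object from 6 m to 20 m changes the vergence angle by only a fraction of a degree, which is consistent with the claim that vergence carries little usable depth information beyond this range.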

Because in this book I focus on viewing flat images, binocular cues of depth perception, i.e. binocular disparity and vergence reflexes, will not be discussed further. From the point of view of the issues discussed here, those mechanisms of depth vision that do not require binocularity are much more interesting. They are much more universal, because they not only allow us to assess the order of objects in depth in a three-dimensional scene, regardless of whether they are close to or far from us, but also form the basis of the illusion of the third dimension in paintings. They are the so-called non-stereoscopic or monocular depth cues.

Space seen through the Cyclops’ eye

At the same time, monocular depth cues are the basis for understanding spatial relationships inwards between objects depicted in a flat image or in three-dimensional space, practically regardless of the distance from the observer. The accuracy of the interpretation of these indicators is the result of learning how specific patterns of planes, their contours and colors, and their variability in motion, represent real relationships in three-dimensional space. This hypothesis is supported by the results of intercultural research, in which the stimulus was illustrations (paintings, drawings, photos) depicting scenes containing depth cues unknown in other cultures (e.g. Duncan, Gourlay and Hudson, 1973; Gibson, 1988; Hudson, 1960; Jahoda, Deregowski, Ampene and Williams, 1977; Jahoda and McGurk, 1974; Liddell, 1996; 1997; Serpell and Deregowski, 1980).

Robert Serpell and Jan B. Deregowski (1980) show that adding to a hunting scene depth cues that are obvious to people from the Western European cultural circle, such as the horizon, converging road edges and the correct size proportions of objects in the foreground and background (Fig. 134), by no means helps the indigenous people of Ghana, Zambia or Uganda to read the space of the scene. African viewers consistently interpreted this scene as an elephant hunt.

Figure 134. Illustration from W. Hudson’s depth perception test (1960). Graphic design: P.F. Based on Hudson (1960)

Depth perception in a flat painting, based on its monocular cues, is the result of the activity of neurophysiological mechanisms that develop on similar principles to, e.g., the mechanism that enables us to separate an object from its shadow. With regard to new objects, we can easily mistake a shadow for an actual part of the object. Through numerous experiences we find out that the contour and the color of a visual object and of its shadow have slightly different properties. Perceiving, classifying and remembering them are the basis for accurate recognition of objects in the future, both in the three-dimensional world and in images.

It is similar with depth perception. We know from experience that, for example, the size of the same objects in a visual scene most often depends on their distance from the observer. And although this observation has certainly been available to humanity almost since the dawn of time, it took over 30,000 years (counting from the creation of rock paintings in Lascaux, Altamira and Chauvet) for the principles of linear perspective to be consciously used by artists in Western culture to depict three-dimensional visual scenes in flat paintings (Alberti, 1963; Janowski, 1997). It was only at the beginning of the 15th century that Filippo Brunelleschi codified the principles of linear perspective, and through centuries of cultural transmission they were included in the canon of basic monocular depth cues. Acquiring the ability to use this cue required time and training, i.e. simply learning how it reveals the spatiality of the presented visual scene in the third dimension.

Although the Renaissance solution of Brunelleschi seems to be the most obvious form of presenting three-dimensional space in a picture, it is surely not the only perspective accepted by the human mind. This is shown by the very interesting graphic and painting propositions of Dick Termes, who is fascinated not by a one-point or two-point perspective, but by a six-point spherical perspective (Fig. 135). Both his experiments and studies of ways of imaging space in the past show that we do not have a good answer to the question of how perspective lines run when we view three-dimensional scenes. We cannot even agree whether these lines are straight or curved. Drawing on knowledge from optics, it is easier for us to think about straight visual axes and straight, converging perspective lines than about lines running along the surface of a sphere.

Figure 135. From one-point to six-point (spherical) perspective of Dick Termes

Currently, we know many monocular depth cues that form the basis for seeing the third dimension in flat paintings. The functionality of all these cues is verified by countless daily visual and movement experiences. Looking at a flat image of three-dimensional space, we use them automatically. They create an unquestioned illusion of three-dimensionality, and it is only when the artist breaks the principle underlying one of these cues that we are forced to revise our belief in the spatiality of what we are looking at. The illusory spaces of Maurits Cornelis Escher provide excellent examples confirming this supposition (Fig. 136).

Figure 136. Maurits Cornelis Escher, Day and Night (1938). National Gallery of Art, Washington, DC, USA [39.2 x 67.6 cm] 

Three groups of monocular depth cues

The first group of monocular depth cues includes those that define depth on the basis of the relationships between elements in the visual scene depicted in the image, or between those elements and its frame. These are: interposition or occlusion, i.e. the covering of one object by another; transposition (elevation), which determines the relationship between the position of objects in the image relative to the horizon and their illusory distance from the observer; and linear perspective (as well as curvilinear perspective), whether zero-, one- or multi-point.

The second group of cues is based on the blurring of the outlines of objects as a result of the increasing density of the texture gradient or the loss of sharpness of details. The basic interpretation rule for these cues is: the less differentiated the surface of the objects presented in the image, the greater their distance from the observer.

Finally, the third group of depth cues is connected with the luminance and color of the presented objects. The most important cues in this group include: chiaroscuro, aerial perspective and color separation. The distribution of light and the intensity of the shading of the surfaces of objects and of the space around them are basically interpreted in accordance with the following principle: the brighter, the closer; the darker, the further away. The assumption regarding the position of the light source relative to the object, without which the object casts no shadow, is also significant for interpreting chiaroscuro as a depth cue.

Watching distant objects, e.g. mountains on the horizon, we generally see them much less clearly than nearby objects, but above all, we see them as if behind a blue-grey haze. It is the density of this haze that defines aerial perspective. The last cue, i.e. color separation, is the most controversial one. According to Pablo Picasso, for example, colors do not take part in the coding of space at all. They are only symbolic. On the other hand, according to Paul Cézanne or 17th-century academics, it is completely different: warm colors are closer to the observer, and cold ones further away.

Of the depth cues listed above, occlusion is the most obvious and also the least controversial. We may have doubts whether the artist correctly reflected the perspective of the depicted space, or whether, in accordance with convention, he or she used one or another color to emphasise the depth, but we have no doubt about one thing: the obscured object cannot be closer to us than the one obscuring it. The results of intercultural studies indicate that African children in the 1960s had the least problems with correct depth interpretation based on the occlusions presented in the pictures of the Hudson test (1960). On the one hand, we almost exclusively look at objects that are either obscured or obscure others, and we always interpret this observation as a reliable indicator of their distance from us. On the other hand, a closer look at the mechanisms responsible for understanding the relationship of obscuring reveals their complexity. For these reasons, in the next chapter I will discuss this depth cue in more detail.

The identity of objects as the foundation of seeing along the third dimension

Regardless of the depth cues listed, which can be used more or less correctly on the image to give the scene a three-dimensional plasticity, they all have one thing in common: they are image properties. In other words, by drawing or painting a picture we can use each of them to emphasise the spatiality of the scene inwards. However, how the given depth cue is interpreted is no longer a feature of the image but of the observer, or more precisely his or her ability to perceive objects as identical regardless of perceptual conditions (Lawson, 1999).

None of the three dimensions of space is as destructive to the constancy of seeing objects as the depth dimension. Spatiality not only affects the size of the objects seen but, above all, means that we are constantly dealing with their changing appearance. This variability results both from the observer taking different points of view and from occlusion, which mercilessly provides us with only fragments of complete images. The set of mental features that counteract the destructive effects of depth is referred to as visual object constancy, two aspects of which are particularly relevant to images: shape constancy and size constancy. I will start discussing the issue of depth in flat images with these two.


Constancy of shape

The perception of complex shapes of objects as the same, despite the fact that they are seen from a new or unusual point of view, in unfavourable lighting conditions, or partially obscured, is called shape constancy (Palmer, 1999; Pizlo, 2008; Vogels and Orban, 1996). Shape constancy is one of the basic factors allowing us to recognise an object, a person or a relationship between them, regardless of whether their image projected on the retina at a given moment is the same as or different from the one projected earlier (Pizlo, Sawada, Li, Kropatsch et al., 2010).

The sense of shape constancy is one of the pillars of vision. In particular it refers to the perception of depth because the three-dimensionality of space is one of those factors that significantly affect the changeability of shapes of objects, understood as their isometric projections on the surfaces of the retinas. After all, we must remember that although we live in a three-dimensional world in which the dimension “inwards” is most often interpreted as “in front” or “in front of us”, in the act of seeing data about it is reduced to two-dimensional retinal images. The experience of the third dimension is not given to us directly in such a way as e.g. the experience of seeing light, but it is a kind of conclusion resulting from many premises contained in flat retinal images.

An example illustrating the constancy of shape is the experience of viewing the same object from different points of view (Fig. 137 A).

Figure 137 A. Illustration of the experience of shape stability despite the different appearance of the object viewed from different perspectives. Graphic design: P.F.

If we look at the contours that will most likely be captured on the basis of their images projected on the retinas, we will see that they are fundamentally different (Fig. 137 B). It does not bother us, however, to think of this object as one and the same. And this is the wonderful experience of shape constancy. It is really hard to imagine life without this ability to grasp the essence of objects despite the variability of their appearance.

Figure 137 B. The contour version of Fig. 137 A. Graphic design: P.F. Procedure: (1) Image/Mode/Lab Color [Luminance]; (2) Image/Image Size [Resolution: 150]; (3) Filter/Stylize/Find Edges ; (4) Image/Image Size [Resolution: 300]; (5) Image/Adjustments/Brightness/Contrast [Brightness: 50; Contrast 100]

Shape constancy generally does not depend on how large the images projected by the same object onto the retina are, e.g. due to viewing it from different distances. Of course, growing distance between the object and the observer is one of the factors that cause a decrease in visual acuity, and after a certain threshold is exceeded, the size of the image projected on the retina starts having a negative effect on shape constancy. In the 1970s, Hershel W. Leibowitz, Stephen B. Wilcox and Robert B. Post (1978) studied the effect of image blurring on shape and size constancy. They found that an increase in image blurring reduces shape constancy, but does not affect size constancy (which will be discussed later in the next chapter).

Moses W. Chan, Adam K. Stevenson, Yunfeng Li and Zygmunt Pizlo (2006) stated that the shape constancy of three-dimensional objects is closely related to such features as symmetry, visibility of the planar contours defining the planes and volume. Symmetry plays a particularly important role in maintaining shape constancy. Asymmetry makes recognition of the object viewed from different points of view difficult. Similarly, smaller shape constancy is characteristic of those objects in which it is difficult to identify planes and that are devoid of volume.

The three-dimensional objects shown in Fig. 138 differ in terms of shape constancy. Although both are asymmetrical, object A has clearly visible planes that, when connected together, suggest it is a solid figure, and therefore it has volume, while object B does not resemble a solid figure. It would be much easier for us to recognise object A from a different point of view than object B.

Figure 138. Three-dimensional objects presented on a plane, characterised by high shape constancy (A) and low shape constancy (B)

Zygmunt Pizlo, Yunfeng Li and Robert M. Steinman (2008) claim that, for three-dimensional objects viewed from a short distance, the difference between the images on the retinas of both eyes registered by the brain (binocular disparity) is neither a necessary nor a sufficient condition for shape constancy. Shape constancy is based primarily on the features of these objects listed above, as seen in a two-dimensional plane.

It is worth recalling one more concept at this point, which addresses the issue of shape constancy as the principle of perceptual categorization. It is the so-called canonical perspective, from which the object is seen. The creators of this concept are Stephen E. Palmer, Eleanor H. Rosch and Paul Chase (1981). They presented participants with photographs of natural objects captured from different points of view and asked them to recognise those objects. It turned out that for each object there is a most preferred point of view, from which it is most accurately recognised. From this point of view, the appearance (shape) of the object reveals the most important details necessary to recognise it, and it is also the point from which the object is most often seen in everyday life (Blanz, Tarr and Bulthoff, 1999). In other words, the shape of objects presented from a canonical perspective is a prototype shape, with which the mind compares the currently seen shapes of objects when determining their identity.

Two brain structures lying on the ventral pathway are responsible for perceiving complex shapes of things as identical, or in other words, for their constancy: the inferior temporal gyrus (IT) and the fusiform gyrus lying just above it on the inner side of the cortex, to which visual data flows mainly from the V4 field (Tanaka, 1993; 1996). It is now also known that the constancy of seeing the shapes of letters and words is associated with the activity of neurons in the fusiform gyrus, but only on the left side of the brain, in the visual word form area (VWFA) (Dehaene and Cohen, 2011). It is also known that the constancy of seeing the shapes of faces, which underlies their recognition, is related to the activity of neurons located in the fusiform gyrus on the right side of the brain or on both sides of it (Farah, 1996; Feinberg, Schindler, Ochoa, Kwan et al., 1994; Kanwisher, McDermott and Chun, 1997; McCarthy, Puce, Gore and Allison, 1997; Fig. 139).

Figure 139. Brain structures responsible for the constancy of shape of the objects seen, including letters and words, and faces. Graphic design: P.A. based on Tanaka (1996), Dehaene and Cohen (2011) and McCarthy, Puce, Gore and Allison (1997)

Neurons in the area of IT and fusiform gyrus react similarly to the same shapes of seen objects regardless of whether their contours are encoded by the contrast of brightness, texture diversity, or movement (Sáry, Vogels, Kovács, and Orban, 1995), regardless of the size and position of these objects in the field of vision (Ito, Tamura, Fujita, and Tanaka, 1995; Logothetis, Pauls and Poggio 1995), and irrespective of whether they are fully visible or partially obscured (Kovács, Vogels, and Orban, 1995; Missal, Vogels, and Orban, 1997).

Alan Slater and Victoria Morison (1985) and Alan Slater, Scott P. Johnson, Elizabeth Brown, and Marion Badenoch (1996) studied the interest of young children in the presented shapes of various figures. They concluded that the mechanism responsible for coding complex shapes and, consequently, for the constancy of seeing them, is innate. Even in infants only a few days old they found habituation, i.e. loss of interest in familiar shapes, and increased interest in figures with shapes unknown to them.

At the other extreme, due to the degenerative processes of the aging brain, this mechanism may become gradually dysfunctional, which is manifested by the deepening symptoms of shape agnosia, i.e. the inability to correctly recognise and reproduce the things being seen (Farah, 1990; Tippett, Blackwood and Farah, 2003). The breakdown of the shapes of things seen that accompanies neurodegenerative disorders is perfectly reflected in the self-portraits of William Utermohlen (Fig. 140 A-D).

Figure 140 A. William Utermohlen, Self Portrait with Saw (1997). Galerie Beckel Odille Boïcos, Paris, France [35.5 x 35.5 cm]
Figure 140 B. William Utermohlen, Self Portrait with Easel (1998). Galerie Beckel Odille Boïcos, Paris, France [35.5 x 25 cm]
Figure 140 C. William Utermohlen, Erased Self Portrait (1999). Galerie Beckel Odille Boïcos, Paris, France [45.5 x 35.5 cm]
Figure 140 D. William Utermohlen, Self Portrait Drawing (2000). Galerie Beckel Odille Boïcos, Paris, France [40.5 x 33 cm]

The pictures in Fig. 140 are presented chronologically. They were painted by Utermohlen between sixty-three and sixty-six years of age, while Alzheimer’s disease gradually led to the degeneration of brain structures in both hemispheres (Crutch, Isaacs and Rossor, 2001).

On the one hand, neuroscience provides many examples of painters who suffered from migraine, epilepsy, stroke or other brain damage, as well as neurodegenerative diseases. In 2006, the entire volume 74 of the International Review of Neurobiology was devoted to this issue. On the other hand, contemporary art is full of examples of deliberate and even programmatic violation of the principle of shape constancy. Classics in this field should certainly include painters associated with surrealism (Fig. 141), expressionism (Fig. 142) and cubism (Fig. 143). Despite their many differences, they have one thing in common: a radical departure from the typical shape of things as a means of artistic expression.

Figure 141. Salvador Dali, The Temptation of St. Anthony (1946). Musée Royaux des Beaux-Arts, Brussels, Belgium [89.7 x 119.5 cm]
Figure 142. Francis Bacon, Three Studies for Figures at the Base of a Crucifixion (1944). Tate Modern, London, United Kingdom [94 cm x 74 cm each]
Figure 143. Pablo Picasso, Les Demoiselles d’Avignon (1907). Museum of Modern Art, New York, USA [243.9 × 233.7 cm] 

Size constancy

Size constancy is the ability of the mind to accurately assess the size of perceived objects regardless of their distance from the observer, in other words no matter how big the image they project on the retina is (Gibson, 1979; Kaufman and Kaufman, 2000; Konkle and Oliva, 2011; Palmer, 1999). Size constancy is based on the knowledge and experience of the observer, which tell him or her that the perceived size of known objects is a derivative of their distance from the observer. It is so obvious that we do not even realize how huge the impact of distance in depth is on the size of the images projected on the retina by objects located closer to and further away from the observer.
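The scale of this effect can be illustrated with the standard geometric relation between object size, viewing distance and visual angle. The sketch below is only an illustration under stated assumptions: the person's height and both distances are hypothetical values chosen for the example, and the function names are made up. It does not claim that the visual system performs this computation.

```python
import math


def visual_angle_deg(object_size_m, distance_m):
    """Visual angle subtended by an object of the given size,
    viewed frontally from the given distance."""
    return math.degrees(2 * math.atan(object_size_m / (2 * distance_m)))


def inferred_size_m(angle_deg, distance_m):
    """Invert the relation: recover physical size from the visual
    angle when the viewing distance is known (or assumed)."""
    return 2 * distance_m * math.tan(math.radians(angle_deg) / 2)


if __name__ == "__main__":
    HEIGHT_M = 1.75  # hypothetical height of a standing person
    for d in (2.0, 17.0):  # hypothetical foreground and background distances
        angle = visual_angle_deg(HEIGHT_M, d)
        print(f"distance {d:4.1f} m: angle {angle:6.2f} deg, "
              f"size recovered with known distance {inferred_size_m(angle, d):.2f} m")
```

The same person subtends a much smaller angle at the larger distance, yet combining that angle with the distance returns the same physical height in both cases. In this simplified sense, size constancy amounts to interpreting retinal size together with distance rather than on its own.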

A pair of young people in the upper right corner of Fig. 144 (arrow A) is located no more than 15 metres behind the person in the foreground, but only bringing these two planes together gives an idea of the scale of the difference. The height of the people in the background does not exceed 2/3 of the height of the head of the person in the foreground (cf. Fig. 144, arrow B). Despite such a large disproportion between the sizes of the people photographed, we do not have the impression that the people in the background are particularly short.

Figure 144. Illustration of the stability of the size of people at different distances from the observer into the visual scene. Graphic design: P.F.

The sense of size constancy across distance in depth is, however, an even more extraordinary experience. Here is a painting by Paolo Uccello, Scenes from the Life of the Holy Hermits, in which the distances between the different planes are at least several dozen metres (Fig. 145). In each of these planes we see characters, and we accept their sizes as easily as in Fig. 144. The difference between the two pictures is, however, fundamental. The figure of a kneeling hermit seen from a distance of 60-80 metres in the farthest plane is about half the height of the largest figure in the foreground, the visionary hermit sitting on the bench on the left. If this scene were photographed in nature, the kneeling hermit would become a small point on the horizon. And then we would certainly also accept the scene as correctly reflecting the size of the people and objects in it. So, what about size constancy in paintings?

Figure 145. Paolo Uccello, Scenes from the Life of the Holy Hermits (1460). Galleria dell’Accademia, Florence, Italy [81 cm x 110 cm]

Talia Konkle and Aude Oliva (2011), in addition to the idea of size constancy, introduce the concept of canonical visual size, analogous to the concept of the canonical shape of objects perceived from a particular point of view. While the size constancy of a known object allows the observer to assess the distance between himself or herself and that object in a real situation, the canonical visual size of the object is its most preferred size in an image. These two concepts are not contradictory, although they do not have to be compatible with each other. When viewing an image, the observer can use knowledge of the actual size of the objects to assess the spatial relationships in depth. At the same time, this knowledge is subject to a specific adjustment, precisely because the image is a representation and not a real situation in the three-dimensional world.

Two effects found by Konkle and Oliva (2011) reveal the specifics of this adjustment. The first effect relates to the relativization of the size of the drawn object to the space of the image, determined by its frame. It turns out that although a fish, a chair and a truck, drawn on three separate sheets of paper of the same size, occupy a successively larger area, the increase in size is not linear (as it would be in reality), but logarithmic (Fig. 146 A).
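What a logarithmic relation implies can be seen in a small numerical sketch. Everything specific in it is an assumption made for illustration: the intercept and slope are hypothetical constants, not values reported by Konkle and Oliva (2011), the function name is made up, and the real-world sizes are round illustrative numbers. The only property taken from the text is that drawn size grows with the logarithm of real size.

```python
import math

# Hypothetical coefficients, chosen only so that the output stays
# between 0 and 1 for everyday object sizes; not empirical values.
INTERCEPT = 0.45
SLOPE = 0.15


def drawn_fraction(real_size_m, intercept=INTERCEPT, slope=SLOPE):
    """Fraction of the sheet taken up by a drawn object, assuming it
    grows linearly with the base-10 logarithm of real-world size."""
    return intercept + slope * math.log10(real_size_m)


if __name__ == "__main__":
    # Illustrative real-world sizes in metres: each is ten times the previous.
    for label, size_m in (("small object", 0.1),
                          ("medium object", 1.0),
                          ("large object", 10.0)):
        print(f"{label:13s} {size_m:5.1f} m -> {drawn_fraction(size_m):.2f} of the sheet")
```

Under this assumption, each tenfold increase in real size adds the same fixed amount to the drawn size, instead of multiplying it by ten. This is why a truck drawn on the same sheet as a fish looks larger, but nowhere near as much larger as it is in reality.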

Fig. 146 A. The effect of logarithmic increase in size of the drawn object in relation to the fixed paper size. Graphic design: P.F. based on Konkle and Oliva (2011)
Fig. 146 B. The effect of proportional increase in size of the drawn object in relation to the changing paper size. Graphic design: P.F. based on Konkle and Oliva (2011)

The second effect found by Konkle and Oliva (2011) relates to the relationship between the size of the image surface determined by its frame and the size of the drawing of the same object. The proportion of the size of the car in Fig. 146 B to the surface of the sheet of paper on which it was drawn is nearly the same, although the increase in the size of the image surface compared to the increase in the size of the drawn object is slightly larger with each subsequent paper size. It can therefore be expected that for each object depicted in a drawing with a certain surface there is a certain most preferred (accepted) size.

In order to understand the relationship between the size of the object depicted in the image and the size of its surface, it is still necessary to realize that the real picture frame, usually rectangular in form, is not the only reference frame for the objects depicted in it. The size of people, both in the photo in Fig. 144 and in the painting by Uccello in Fig. 145, can be observed not so much through the prism of seeing space in depth, but through the prism of many planes, in some sense independent of each other, or in other words, images within the image. Each of these sub-images has its own space of a specific size, in relation to which the size of the objects contained in it is relative.

Let’s use the example of different scenes from the Scenes from the Life of the Holy Hermits, which I took from three different planes of the whole painting (Fig. 147). It turns out that the proportions of the size of the character to the size of the space of individual scenes are more or less the same, regardless of the plane in which they are painted. What is more, the principle of size constancy is respected in each of these scenes. Further objects are relatively smaller than the closer ones.

Figure 147. Three fragments taken from three planes of the painting by Paolo Uccello, Scenes from the Life of the Holy Hermits. Graphic design: P.F. based on Fig. 145

The possibility of manipulating the relationship between size constancy and canonical visual size is of particular interest to some outstanding contemporary painters. Chuck Close and Andy Warhol experimented with the canonical visual size of the faces portrayed in their paintings, rescaling them to dimensions strongly different from typical photo formats or photocopies. Not only can the image surface exceed 10 m², but the face painted on it also fills it almost entirely. The full artistic effect is also achieved because these paintings are viewed from a short distance in the museum space.

Figure 148 A. Chuck Close, Frank (1969). The Minneapolis Institute of Arts, Minneapolis, Minnesota, USA [274 x 213 cm]
Figure 148 B. Andy Warhol, Mao (1973). Hamburger Bahnhof Museum für Gegenwart, Berlin, Germany [448 x 346 cm]

An example of a radical breach of the principle of size constancy in painting can be found, in turn, in the paintings of René Magritte from the series Surreal interior, in which the proportion of easily recognisable objects of relatively small size was seriously disturbed both in relation to the space in which they are presented and in relation to the size of the space determined by the picture frame (Fig. 149). The removal of the normal relationship between objects and the spaces in which they are found, combined with the unusual titles of these works, is a source of completely new meanings.

Figure 149 A. René Magritte, The Listening Room (1952). Menil Collection, Houston, Texas, USA [55 x 45 cm]
Figure 149 B. René Magritte, The Tomb of the Wrestlers (1960). Private collection [35 x 45 cm]

As with shape constancy, the mechanism responsible for the perception of size constancy is most likely innate. This hypothesis is supported by the results of research on the habituation of newborn babies (Granrud, 1987; Slater, Mattock and Brown, 1990) and 4-6-month-old infants (Day and McKenzie, 1981; Granrud, 2006; McKenzie, Tootell and Day, 1980) to the size of the objects shown to them. Developmental studies in older children (5-11 years) reveal the relative stability of size constancy, with the accuracy of size judgements for objects seen at different distances increasing with age (Granrud, 2009).

These studies on the effects of image blurring on shape and size constancy led Hershel W. Leibowitz and Robert B. Post (1982) to the conclusion that while the collection and maintenance of shape information is handled by the ventral path responsible for object recognition (which has already been confirmed many times in other studies), data about the size of the object is processed in the dorsal pathway, which deals with object’s location, which is the basis for performing motor actions involving the said object. Similar conclusions were reached a few years later by Hide-aki Saito et al. (1986), who found that about 15% of neurons in the medial superior temporal area (MST) located on the dorsal path are sensitive to changes in stimulus size. Still, the problem of location of the function of size constancy is much less certain than the neuroanatomical location of the function of shape constancy.

As a result of the research conducted on the patient D.F., which led Melvin A. Goodale and A. David Milner to specify the function of two visual pathways: ventral and dorsal, it was also found that despite the lack of damage to the dorsal path, the patient has shown problems related to size constancy, while performing tasks using only one eye (Marotta, Behrmann and Goodale, 1997). It turns out that damage to the centres responsible for recognizing the complex shapes of objects on the ventral path in conjunction with monoscopic (monocular) vision also causes difficulties in correctly assessing their size.

Allan C. Dobbins, Richard M. Jeo, József Fiser and John M. Allman (1998) also claim that structures on both the dorsal path (MT) and the ventral path, especially in the area of V4, are responsible for the effect of size constancy. To sum up, Helen Ross and Cornelis Plug (2002) believe that there is still too little data to indicate with high probability those cortical structures that are responsible for size constancy, and surely we should not expect them to be located exclusively in one of the mentioned visual pathways.

Notwithstanding previous findings, the results of the latest fMRI research conducted by Talia Konkle and Aude Oliva (2012) reveal that small objects, such as a coin, pipe, leaf or mug, activate neurons mainly in the inferior temporal cortex and lateral occipital cortex, while large objects, such as an armchair, chest of drawers or lawn mower, activate neurons in the parahippocampal cortex (Fig. 150). This means that different brain structures respond to objects at opposite ends of the size dimension, just as different brain structures store data on, and are responsible for differentiating between, animate objects (faces and body parts) and inanimate ones (Kriegeskorte, Mur, Ruff, Kiani et al., 2008), faces and other body parts (Peelen and Downing, 2005), and visual scenes and objects isolated from them (Epstein and Kanwisher, 1998).

Figure 150. Active areas of the temporal cortex in response to large (blue) and small (yellow-orange) stimuli in two projections of the left and right hemispheres of the brain: A – lateral, B – lower. Graphic design: P.F. based on Konkle and Oliva (2012)

An interesting result of the study of Talia Konkle and Aude Oliva (2012) is also that the activity of these brain structures is independent of how large the retinal image of these objects is. In other words, regardless of whether the same object projected a 4° or 11° angle of view on the retina, it activated the same parts of the temporal cortex. The results suggest a significant relationship between knowledge about the size of objects seen and the activation of specific structures in the temporal and occipital cortex.
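The relationship between physical size, viewing distance and the angle of view mentioned above can be illustrated with a few lines of Python. This is only a minimal sketch of the standard visual-angle geometry, not a reconstruction of the procedure used by Konkle and Oliva (2012); the 0.10 m mug and the function names are assumptions introduced purely for illustration.

```python
import math

def visual_angle_deg(size, distance):
    """Angle (in degrees) subtended by a frontally seen object of a given
    physical size at a given distance (both in the same units)."""
    return math.degrees(2 * math.atan(size / (2 * distance)))

def distance_for_angle(size, angle_deg):
    """Viewing distance at which an object of the given size subtends angle_deg."""
    return size / (2 * math.tan(math.radians(angle_deg) / 2))

def physical_size(angle_deg, distance):
    """Physical size recovered from the visual angle and the distance."""
    return 2 * distance * math.tan(math.radians(angle_deg) / 2)

# Hypothetical object: a mug assumed to be 0.10 m tall.
mug = 0.10
near = distance_for_angle(mug, 11.0)  # distance at which the mug subtends 11 degrees
far = distance_for_angle(mug, 4.0)    # distance at which the mug subtends 4 degrees

print(f"11 degrees at {near:.2f} m, 4 degrees at {far:.2f} m")
print(f"recovered size: {physical_size(11.0, near):.2f} m and {physical_size(4.0, far):.2f} m")
```

The retinal image differs almost threefold between the two viewing conditions, yet the size recovered from angle and distance is the same, which is the geometric core of size constancy.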

Breaking the principle of size constancy is one of the means used in visual arts to emphasise the importance of particular objects or people. In ancient Egypt, the size of the depicted characters was interpreted symbolically as a sign of importance in the social hierarchy. Here, for example, the pharaoh Akhenaten, his wife, queen Nefertiti, and their daughter make an offering to the god Aten (solar disk) (Fig. 151 A). The size of the characters depicted on the papyrus is not determined by their distance from the observer, but by their rank in the state and family.

Figure 151 A. Pharaoh Amenhotep IV (Akhenaten), his wife Queen Nefertiti and daughter, making sacrifice to god Aten (the disc of the sun), papyrus

Almost exactly the same pattern of importance reflected by the different sizes of the presented characters can be found in Western European Christian art (Fig. 151 B). Were it not for the giant Archangel Michael in the middle of the scene, one would think that the differences in the size of the figures of Christ and the resurrected people are caused by the viewpoint of the observer (from heaven towards earth). However, the size of the people around Christ also reveals a physically unjustified disproportion.

Figure 151 B. Hans Memling, The Last Judgment (1467-1470). National Museum in Gdańsk [221 x 160 cm]

By the way, far fewer instances of breaking the principle of size constancy are found in the visual art of Asia, South America and Australia, and even in East European icon art.

Among modern paintings, we can also find examples of the symbolic use of size as a guide to the importance of the objects presented, which break the rules of size constancy. One of the masters of such anecdotes is undoubtedly René Magritte (Fig. 152).

Figure 152. René Magritte, The Giant (1929). Museum Ludwig, Cologne, Germany [54 x 73 cm] and text of the poem entitled La Géante (The Giantess) (1857) by Charles Baudelaire, translated by William Aggeler (The Flowers of Evil. Fresno, CA: Academy Library Guild, 1954)

Magritte’s painting The Giant is a surreal vision inspired by the poem of the same title by Charles Baudelaire (1857, translated by M. Jastrun).


Main depth indicator

Interposition, or occlusion, is the most obvious and also the most important indicator of depth, both in relation to the three-dimensional scene and the flat image that represents it. It means the mutual obscuring of non-transparent objects located at different depths in the visual scene being watched. The obscured object is perceived as being further away from the observer than the object that obscures it (Fig. 153 A). Despite the obviousness of this experience, a closer look at the perceptual mechanism that underlies it reveals its complexity. First of all, before the observer’s cognitive system determines which object is closer and which one is further away, it must answer a more basic question: whether what is being seen is one object in a single plane perpendicular to the visual axis, or many objects positioned at different distances along the depth dimension.

According to Gaetano Kanizsa (1979), interposition is characterised by the intersections of the “L” type and the “T” type contours (intra- and inter-object). From the point of view of separating objects from each other, the key element is the perception of inter-object intersections of “T” type. There is no interposition phenomenon in relation to transparent objects (Fig. 153 B). Typical cross-contour intersections between transparent objects are “X” intersections.

Figure 153 A. Interposition of two non-transparent objects; blue circles indicate intersections of “L” type contours, red – “T” type (intra-object) and yellow – “T” type (inter-object); B. Lack of interposition of two transparent shapes; green circles indicate the intersection of “X” type contours. Graphic design: P.F.
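The three types of contour intersections can be told apart by a very simple geometric rule, sketched below in Python. This is an illustrative simplification and not a procedure proposed by Kanizsa: it assumes that each intersection is described by the directions (in degrees) in which contour segments leave the intersection point, and the function name and the 10° tolerance are arbitrary assumptions.

```python
def classify_junction(arm_angles_deg, tol=10.0):
    """Classify a contour intersection as 'L', 'T' or 'X' from the directions
    of the contour segments (arms) that leave the intersection point."""

    def collinear(a, b):
        # Two arms form one continuous contour when they point
        # in (nearly) opposite directions.
        return abs((a - b) % 360 - 180) <= tol

    n = len(arm_angles_deg)
    collinear_pairs = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if collinear(arm_angles_deg[i], arm_angles_deg[j])
    )

    if n == 2 and collinear_pairs == 0:
        return "L"      # a corner of a single contour
    if n == 3 and collinear_pairs == 1:
        return "T"      # one contour runs on, another stops at it
    if n == 4 and collinear_pairs == 2:
        return "X"      # two contours cross and both run on
    return "other"

print(classify_junction([0, 90]))            # corner
print(classify_junction([0, 180, 270]))      # one contour ends on another
print(classify_junction([0, 90, 180, 270]))  # two contours cross
```

In the usual reading of a “T” intersection, the contour that runs on (the bar of the T) belongs to the obscuring object, while the contour that stops (the stem) belongs to the obscured one, which is why this type of intersection is so informative about depth order.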

The issue of interposition is most often considered in two contexts:

  • Gestalt principles of perceiving the figure and background, which define the relationship between the obscuring and the obscured, and
  • perceptual completeness of what is obscured and the cognitive ability to reconstruct the unseen part.

To understand what essentially the phenomenon of interposition is, you must first remember how the visual system recognises the shapes of individual things in the visual scene. The basis for seeing each object is to isolate it from the background and separate it from other objects in the scene based on the planes and contours that surround it.

However, while the neural mechanisms that allow for the identification of contours are well known, we know much less about the neural mechanisms that are responsible for determining on which side of the contour the object is and on which side is the background.

Back in 1988, David Hubel wrote: “Many people, including myself, still have trouble accepting the idea that the surface [lying between the contours of seen objects – P.F] is not able to stimulate neurons in our brain – that our awareness of the interior as white, black or colored, depends only on the cells sensitive to the edges” (Hubel, 1988, p. 87). Gestalt theorists also noticed this difficulty and proposed the concept of object’s border ownership, which indicates the plane belonging to the figure, as opposed to the background plane or other objects (Koffka, 1935).

Neural basis of interposition

To this day, the neuronal mechanism responsible for determining on which side of a contour the plane owned by the object lies (border ownership) is not known. The results of research conducted by Hong Zhou, Howard S. Friedman and Rüdiger von der Heydt (2000) on monkeys and by Rüdiger von der Heydt, Tod Macuda and Fangtu Qiu (2005) on humans revealed, however, that at the early stages of cortical processing of visual signals, specifically in the V2 area, more than 50% of the cells in the receptive field that are involved in contour coding react more intensively on the inside of the presented figure than on the outside.

Philip O’Herron and Rüdiger von der Heydt (2009; 2011) presented to the subjects two types of stimuli, shown in Fig. 154. They differ in how clearly the figure inside the circle is defined. In panel A there is a white square with contours clearly separated from the grey background of the circle. In panel B, however, we cannot define which part of the circle belongs to the figure and which part is the background. The oval shape lying at the junction of the planes marks the receptive field of the ganglion cell encoding the contour. Red means high activity of neurons in the receptive field in the V2 area, and blue means low activity.

Figure 154. Stimuli used in research on the identification of the border ownership of objects. A – easily recognizable figure (square) against the background of a dark grey circle and B – the circle divided into two equal parts, without any indication which part is the figure and which part is the background. Graphic design: P.F. based on O’Herron and von der Heydt (2009)

O’Herron and von der Heydt confirmed the results of earlier studies according to which cells lying in the receptive field on the figure side were more active (red) than on the background (blue) (Fig. 154 A). They also determined that the intense cell activity on both sides of the contour in the receptive field in the V2 cortex persists much longer, even more than one second, when the figure cannot be clearly distinguished within the stimulus, as in Fig. 154 B. The results of the cited studies show that there must be specialised cells in the V2 cortex that identify the outline on the side of the figure by identifying the contour, but so far we have no idea what constitutes the basis of their “knowledge”.

What connects Strzemiński, the artists from Chauvet and Leonardo da Vinci?

An interesting example of artistic variation on the space signalled by the contours of recognizable things and colorful spots are the temperas of Władysław Strzemiński (Fig. 155 A and B).

Figure 155 A. Władysław Strzemiński, The Unemployed (1934). Museum of Art in Łódź, Poland [19.5 x 25.5 cm]
Figure 155 B. Władysław Strzemiński, Łódź Landscape (1932). Museum of Art in Łódź, Poland [20 x 24.4 cm]

The space of a city, in which some objects are obscuring other objects is defined in Fig. 155 B using two types of contours: transparent, full “X” intersections and surrounding colored spots, which in a complicated way connect with each other and with a free contour line, drawn as if regardless of them.

The spatiality of the scene depicted in the painting can be read in several ways. On the one hand, some houses obscure each other (interposition), while others are transparent (visible, but without planes). Some of them are indicated only by a contour line, others by colored spots. On the other hand, in the painting we can separate from each other a layer of a free contour line that is in front of a layer of colored spots. However, this line does not belong to only one plane. On the contrary, the contours of the windows are much closer than the chimneys on the horizon.

The situation in the painting entitled The Unemployed is even more complicated (Fig. 155 A). Contour lines and colored spots intertwine here completely freely. However, we have no doubt that the scene represents a group of people. On the other hand, the measures used by Strzemiński emphasise its shapelessness, the irrelevance of the explicit order of obscuring, and mobility, features so characteristic when expressing the idea of a crowd.

From the point of view of the task that the visual system performs when examining the depth of a visual scene, one can find some analogy between Strzemiński’s paintings and the oldest surviving evidence of human painting activity. The photo in Fig. 156 shows a fragment of cave drawings from the Chauvet cave. Most likely, they come from the Palaeolithic era, about 30 thousand years ago.

Figure 156. Cave drawings from the Chauvet cave, France, approx. 30,000 B.C.

Looking at these, as well as many other cave drawings, it cannot be clearly stated to what extent the artists who created them consciously applied the principle of interposition. On the one hand, the group of animals in the upper part of the wall is presented in such a way that we have no doubt that the artist used the interposition to present the order of the animals inwards. On the other hand, the contours of two groups of animals at the bottom of the wall intertwine and we are not so sure whether they present a three-dimensional scene. Don’t the cave drawings of animals resemble the effects Strzemiński achieved in his visual experiments? Perhaps this is a poor analogy, but it is possible that the cave drawings, like Strzemiński’s unism variations, are a manifestation of momentous discoveries in the field of presenting what is seen on the plane of the picture.

Another interpretation of these prehistoric works of art is also possible. These, as well as many other cave drawings, resemble sketches, studies of an animal’s head or body, analogous to those that fill the sketchbooks of almost all painters. Although the location of the heads of animals drawn by Leonardo da Vinci and an unknown artist from thousands of years ago may suggest interposition, in fact each of them is autonomous, and their accumulation next to each other is motivated by saving space rather than by an attempt to recreate the depth dimension (Fig. 157 A and B). The thing is that we are not sure about the motives for creating cave drawings.

Figure 157 A. Leonardo da Vinci, Study of Horses for The Battle of Anghiari (1503). Royal Collection, Windsor Castle, Windsor, United Kingdom [19.6 x 30.8 cm]
Figure 157 B. Sketches of horses’ and other animals’ heads from Chauvet Cave, France, approx. 30,000 B.C.

Jean Clottes and David Lewis-Williams (2009), David Lewis-Williams (2002) and Michael Winkelman (2002) claim that these are imagined visions (hallucinations) of objects of special worship and respect, which the artist, most likely a shaman, admired during a trance. Certainly, they included large animals, such as horses, bison, rhinos or tigers. The motives for immortalizing them on the walls of the cave could therefore have had little to do with the intention to depict reality, in the sense in which, for example, Bernardo Bellotto (Canaletto) painted Warsaw from different points of view. In any case, this puzzle has not been resolved to this day.

Reconstructing the unseen according to gestaltists

Another important issue related to interposition is the question of how the visual system interprets the relationship between two objects whose retinal image may suggest that one of them obscures the other. Kanizsa (1979) calls the reconstruction of the unseen amodal, emphasising that knowledge of what cannot be seen in an obscured object cannot be verified by any of the senses. Seeing incomplete objects, partly obscured by others, is one of the most common perceptual experiences, and it does not seem difficult for people to recognize which object is obscuring and which is obscured, or what the obscured object looks like. Accurate reconstruction of the invisible parts of the obscured object is the basis for seeing the order of objects in depth.

Considering this issue, Rob van Lier, Peter van der Helm and Emanuel Leeuwenberg (1994) conducted an interesting analysis verifying the accuracy of the basic principles of perception formulated by Gestalt psychologists. One of them is the principle of good continuation, according to which the shape of the obscured part of the figure is defined by extending the visible contour lines of this figure in their direction.

A condition for good continuation is therefore a local analysis of the most likely contact points of two figures identified, for instance, on the basis of “T-intersections.”

Seeing the two figures on the left in Fig. 158 A, people generally have no doubt that the rectangle obscures the square (solution: b), rather than touching an irregular figure along two edges (solution: a). Considering this scene in three dimensions, we are more likely to think that the rectangle is closer to us than the square, and not that the rectangle and the irregular figure lie in the same plane, touching each other with the edges.

Figure 158. A – preferred interpretation of the relationship between the two figures based on the principle of good continuation – solution (b), B – preferred interpretation of the relationship between the two figures based on the principle of similarity and regularity – solution (a) rather than good continuation – solution (b), and C – preferred interpretation of the relationship between the two figures based on the principle of good continuation – solution (b) rather than similarity and regularity – solution (a). Graphic design: P.F. based on van Lier, van der Helm and Leeuwenberg (1994)

In contrast to the concepts emphasising the local analysis of the points where the surfaces of two figures intersect, Gestalt theorists also stress the importance of global principles of figure perception, such as similarity and regularity, e.g. symmetry. As an example of the implementation of these perceptual principles in discovering the obscured part of one of the figures, Rob van Lier et al. (1994) present the illustration in Fig. 158 B. It turns out that people generally think that they see a cross and a square which, lying in the same plane, are in contact with each other along two edges (solution: a) rather than that the square obscures the irregular figure (solution: b). It is worth noting that solution (b) is based on the principle of good continuation, but the interpretation this time is determined by the similarity and symmetry of the figure that is potentially obscured.

It turns out, however, that the principles of regularity and similarity cannot be accepted either as a satisfactory explanation of the interpretation adopted regarding the likely shape of the covered part of the figure. The third example cited by Rob van Lier et al. (1994) illustrates this difficulty perfectly. Looking at the figures on the left in Fig. 158 C, people generally prefer solution (b), which is based on the principle of good continuation, to solution (a), which refers to the principles of similarity and regularity. It turns out that in this example it is easier to see two less regular and more complex figures that lie one on top of the other than two regular and much simpler figures lying next to each other. To sum up, neither the local nor the global principles of figure perception formulated by Gestalt psychologists can be considered sufficient to explain the decisions regarding the preferred shape of the obscured figure.

Perceptual complexity vs interpretation memory complexity

Rob van Lier et al. (1994) suggested that perhaps a solution lies in the distinction between perceptual complexity and memory complexity of interpretation. The perceptual complexity of interpretation refers to the process of reaching the preferred solution to the problem of relations between figures, while the memory complexity of interpretation refers only to the final state, which is precisely the solution that is preferred. The process of determining whether two figures overlap and testing the rules that justify this solution can be more or less complicated. Similarly, the solution itself may be more or less complex, but there are indications that simpler (more economical) solutions are preferred above all, regardless of whether the path to them was based on simple or complex principles (Hatfield and Epstein, 1985).

Van Lier et al. (1994) focused mainly on the analysis of perceptual complexity. However, from the point of view of the problem of interposition, understood as an indicator of depth, the issue of the memory complexity of interpretation of the relationship between the two figures seems much more interesting.

First, although van Lier et al. (1994) introduced this concept and distinguished it from the concept of perceptual complexity, they stopped at its general definition. It draws on the ideas of Gary Hatfield and William Epstein (1985) and the research of Fred Attneave (1954) on the relationship between complexity/simplicity and probability and redundancy, i.e. the repetition of certain perceptual patterns.

The complexity/simplicity of the visual interpretation of the visual scene reflects the likelihood or repetition of a particular solution. This means that the preferred solutions for the relationship between the figures shown in Fig. 158 are a manifestation of a simpler, i.e. more likely memory interpretation, referring to previous experiences with similar figures. In this context, it is worth commenting on the preference of solution (b) in Fig. 158 C. Instead of referring to the results of complex analyses of the perceptual complexity of both figures and their potential relationships, it is enough to pay attention to the fact that the probability of the visual scene, in which the shapes of the two figures perfectly match each other, as in solution (a), is much lower than the probability of the scene, in which two more complicated shapes just cover each other.
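The probabilistic argument above can be made tangible with a toy count, sketched below in Python. It is only an illustration under strong simplifying assumptions that are not taken from the cited studies: both figures are equal squares on an integer grid, the second square is shifted to every possible offset, and the function name is hypothetical. The sketch compares how many placements make the squares overlap with how many make their edges fit together exactly.

```python
def count_configurations(n, max_offset):
    """Shift square B (side n) to every integer offset (dx, dy) relative to
    square A (side n) and count two kinds of arrangement."""
    overlapping = 0  # the interiors intersect: one square partly covers the other
    abutting = 0     # no overlap, but the squares touch along part of an edge
    for dx in range(-max_offset, max_offset + 1):
        for dy in range(-max_offset, max_offset + 1):
            if abs(dx) < n and abs(dy) < n:
                overlapping += 1
            elif (abs(dx) == n and abs(dy) < n) or (abs(dy) == n and abs(dx) < n):
                abutting += 1
    return overlapping, abutting

# A larger n stands for the same square measured on a finer grid.
for n in (5, 10, 50):
    over, abut = count_configurations(n, 2 * n)
    print(n, over, abut, round(abut / over, 3))
```

The finer the grid, the smaller the share of arrangements in which the edges meet exactly, which is consistent with the idea that a perfect fit between two figures is an unlikely coincidence, whereas one figure simply covering another is the common case.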

Secondly, it is also worth noting that most of the experiments that study how people predict the shape of obscured parts of figures are based on the analysis of two-dimensional figures, sometimes familiar, like a square, triangle or circle, and sometimes less familiar, invented solely for the study. In natural life situations, and especially when viewing images, we usually deal with known objects (perhaps except for abstract paintings). This means that the problem of restoring invisible parts of objects should be considered in the context of the concept of shape constancy. According to this principle, a partially obscured object is still the same object, and it can be expected that the parts that are visually accessible are sufficient to induce its complete representation in memory. This hypothesis is supported by the results of studies showing that when viewing partially obscured objects, exactly the same brain structures on the ventral path are activated, especially in the area of the inferior temporal gyrus, as when viewing the complete version (see e.g., Kovács, Sáry, Köteles, Chadaide et al., 2003; Kovács, Vogels and Orban, 1995; Missal, Vogels and Orban, 1997).

Phidias certainly knew about it, but did René Magritte?

Watching the reliefs on the frieze of the Parthenon, we have no doubt that, when carving them, Phidias knew exactly what interposition was and how to use it to create the illusion of depth in a bas-relief (Fig. 159). Although, with the exception of the foreground characters, all the others are only signalled by larger or smaller fragments, the image is fully understandable and logical. Most likely, this is because the visible fragments are sufficient to elicit their complete representations in the memory of observers. Interposition, therefore, appears as the result of a discrepancy between the complete representation of the object induced in memory based on its part, and its incomplete image projected on the retina. This discrepancy is interpreted by the perceptual system as an indicator of depth.

Depicting the depth on a flat image using interposition is one of the most obvious and most commonly used techniques in visual arts.

Figure 159. Phidias, Riders (approx. 440 B.C.). Fragment of the northern frieze of the Parthenon, Athens, Greece. The British Museum, London, United Kingdom.

For modern painters, however, interposition is also an excuse to experiment. René Magritte painted the Amazon, breaking the rule of interposition and making it an unreal phenomenon of the world of dreams (Fig. 160). The Amazon penetrates the third dimension in the same way that the needle of a careless seamstress pierces the canvas, regardless of the order of the vertically arranged threads. As a result, some of them undergo unnatural bends. Although this is not a major obstacle to reading the image correctly, such a representation of the visual scene forces the viewer to reflect on the nature of space, and especially its depth dimension.

Figure 160. René Magritte, La Carte Blanche (1965). National Gallery of Art, Washington, DC, USA [81 x 65 cm]

The eminent Italian psychologist and artist Gaetano Kanizsa (1985) also drew attention to the painting by Magritte. In an article devoted to the relationship between vision and thinking, he published his, to put it mildly, somewhat coarse version of La Carte Blanche (Fig. 161), noting in it an analogy to the curvatures we know, as he said, from the views of woven baskets.

Figure 161. Photo montage of Gaetano Kanizsa, inspired by Magritte’s painting. Graphic design: P.F. based on Kanizsa (1985)