Other work has demonstrated that processing costs interact with congruity (Manfredi et al., 2018): as long as multimodal combinations are congruous, they are processed more easily. The data and analyses have been made available in an online data repository (…). This trend in general may suggest that a more overt cue is not beneficial when combined with other techniques; possibly, such integration is more costly. Finally, by directly comparing inferential techniques, we sought to examine whether their proposed underlying features indeed function as psychological constructs. However, a trend arose: viewing times seemed to increase slightly for original panels and action stars when combined with sound effects, and to decrease for echoic onlookers and metaphors with sound effects. Picture perfect peaks: comprehension of inferential … Department of Communication and Cognition, Tilburg School of Humanities and Digital Sciences, Tilburg University, Tilburg, The Netherlands. Figure 1b shows a common inferential technique that substitutes for the main action. There was no main effect of Modality, nor an interaction (all p > 0.576). Such semantic cues may mediate comprehension through the explicitness of inferential techniques. Even though blank panels contained no visual information at all, action star panels were viewed faster than blank panels. Figure 2 shows the viewing times for all six sequence types at both panel positions (critical panel and critical panel +1). Specifically, action stars were rated more comprehensible than echoic onlookers and metaphors, even though action stars are the least explicit and thus leave readers more room to fill in the meaning. The text required longer viewing times than the visuals, suggesting that switching modalities may require more effort than processing unimodal sequences. At the Peak panel, differences in viewing times between inferential techniques did not necessarily indicate variance in inference generation. For comprehensibility ratings, there were no significant correlations. Therefore, this study examines to what extent processing differs across conventionalized inferential techniques. Experiment 2 combined inferential techniques to investigate their effect on inferential processing and comprehension, and the influence of the (combined) features.
Thus, while panels with onomatopoeia are multimodal, sound effects may not incur as much of a modality-switching cost as other text. This experiment compared the self-paced viewing times of inferential techniques in visual narratives. The sample consisted of 70 participants with a mean age of 29.73 years (SD = 10.98, range: 17–61; 32 male, 35 female, 3 other). As in Experiment 1, all panels following an inferential technique required more time than panels following the original Peaks. Viewing times for the panels following metaphors also correlated with comic reading expertise, such that more fluent comic readers spent more time on them. Viewing times were then analyzed at the third panel position, the critical Peak panel. Last, in these results, [blend] also affected the subsequent panel and ratings, but consistently negatively rather than with a reversed effect across panels. This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (…). © The Author(s), 2022. Such fixation zoom panels were easier to understand if they included informative rather than uninformative cues of a scene. Therefore, the final analysis included only the features [blend], [framing], and [explicit]. Overview of viewing times at the critical Peak panel and subsequent panel for all six sequence types; the error bars represent standard errors. Overview of t-values and p-values of each feature per dependent variable. While various studies have investigated the processing of visual narratives when events are fully missing (Hutson et al., 2018; Magliano et al., 2016, 2017), comparisons of inferential techniques remain limited.
Beta-weights from a regression examining the influence of different features on the viewing times and self-rated comprehension of the sequence. There, an omitted event motivated a more intense search for cues at the final panel to facilitate generating the inference. One of the most prominent features affecting inferences and comprehensibility was explicitness, and this feature also underlies the onomatopoeia, an inferential technique readily combinable with other panels. For all 30 strips, a cloze probability score (see Coderre et al., 2020) and an inference assessment score (see Cohn, 2021) were measured in rating studies. In Experiment 2, the [explicit] effect was even more overt, extending the pattern to the Peak as well. At the subsequent panel, the explicitness of onomatopoeias should lead to shorter viewing times, further facilitating bridging inferences and thus leading to higher comprehensibility ratings. While the processing of some inferential techniques has been explored, little research has compared their comprehension. Figure 1f illustrates a metaphor, which reflects the event with an abstract depiction of a comparable event. Likewise, with six conditions, we required F-values above 2.23. Thus, one could interpret action stars as mere signals to draw an inference, and comprehensibility ratings could indicate how easily readers inferred those events. Experiment 2 used the same 30 strips from Experiment 1, along with two additional strips to counterbalance the number of strips per condition. [Text] indicates the presence of text.
In these contrasts, the features now predicted an even more recognizable trade-off, in which more explicit cues at the Peak facilitate viewing times at the subsequent panel and ultimately yield higher comprehensibility ratings, as in previous studies (Cohn & Kutas, 2015; Cohn & Paczynski, 2013). Thus, comprehensibility may not always align with incremental panel-to-panel processing. These features were then also correlated against one another to test their relationship, and they appeared to be valid predictors, with a shared variance of .25 at most. VLFI scores were correlated with the difference between the viewing times of each inferential sequence type and those of the original sequence, to test for an influence of comic expertise. [Framing] and [blend] both predict low scores, consistent with the low ratings for onlookers and metaphors. There was no main effect, F(3, 280) = 0.48, p = 0.698, suggesting no differences between sequence types. Experiment 1 showed differences in processing and comprehensibility between inferential techniques. Moreover, all multimodal panels were processed longer than their unimodal versions, but this did not create differences at the subsequent panel. Table 2. Rather, this non-sequitur information needs to be resolved with the sequence's events in order to remain congruent. Despite these similarities, metonymies appear less complex than metaphors and are comprehended more easily (Rundblad & Annaz, 2010). At the critical Peak panel, VLFI scores correlated with the differences between original event panels and multimodal action stars, and between original event panels and unimodal metaphors (r(68) = 0.24, p = 0.040, and r(68) = 0.24, p = 0.041, respectively). Rather, the most salient difference between techniques was the faster viewing times for action star and onomatopoeia panels.
At the subsequent panel, onlooker panels evoked brain responses different from those to explicit depictions, in a way that suggested working memory processes may be involved in inference. [Explicit] techniques depict or describe aspects of the actual event. Still, metaphoric images from advertisements incur greater processing costs than literal advertisement images (Ortiz et al., 2017). However, no relations emerged between these ratings and the inference assessment scores. As in Experiment 1 and previous work (Cohn & Kutas, 2015), the original event panels were rated as the most comprehensible. Analysis of inferential features suggested that the explicitness of the inferential technique led to greater processing demands, which later facilitated inference generation and comprehensibility. Bridging inferences reflect such an update to the situation model when meaning is missing and needs to be filled in. The data and analyses are accessible in an online data repository (…). The more incongruent the incoming stimulus, the greater the update that is required (Huff et al., 2014; Magliano & Zacks, 2011). First, we analyzed viewing times in a 2 (Position: critical panel and critical panel +1) × 6 (Sequence Type: original event panel, action star, onomatopoeia, echoic onlooker, metonymic selective framing, and metaphor) factorial analysis of variance (ANOVA) to explore whether the position and the type of Peak panel affect viewing times.
Both SPECT and PINS establish that each consecutive panel is integrated into the existing situation model via updating processes. During the subsequent experimental part, the stimuli were presented in a self-paced viewing set-up via Qualtrics, using the lab.js JavaScript plugin (Henninger et al., 2022). Visual metaphors in general have only recently begun receiving empirical attention, and many studies focus on comparisons to verbal metaphors (Ojha et al., 2019) rather than to other visual techniques. At the critical panel +1, VLFI scores correlated with the difference between original event panels and action stars in both versions (multimodal: r(68) = 0.30, p = 0.012; unimodal: r(68) = 0.31, p = 0.008), and between original event panels and metaphors in both versions (multimodal: r(68) = 0.25, p = 0.035; unimodal: r(68) = 0.28, p = 0.020). Action stars are rated relatively highly, most likely because they are a familiar part of the visual lexicon of comics (Cohn, 2021; Cohn & Wittenberg, 2015). Sequences with original event panels were rated most comprehensible (all p < 0.001), followed by sequences with onomatopoeias, which were rated higher than those with other inferential techniques (all p < 0.021). All participants gave their informed written consent according to protocols approved by the Tilburg University School of Humanities and Digital Sciences Research Ethics and Data Management Committee. There was also a main effect of sequence type, F(3, 1,104) = 11.83, p < 0.001, partial η² = 0.03. As reinforced by the ratings, the onomatopoeia could be easily deleted with no consequences for the comprehensibility of the sequence.
One possibility is that, while differences may persist between techniques, they may be motivated by the features (as in Table 1) used to describe their abstract similarities and differences (Cohn, 2019). For the inference assessment score, 49 participants (39 female; mean age: 21.3, range: 18–35; mean VLFI: 11.3, range: 1.5–38.5) viewed the same 30 strips with the Peak omitted (always the third panel). The faster viewing times for action stars and onomatopoeias specifically seemed related to the visual complexity of the panels themselves (Cohn, 2021; Cohn & Wittenberg, 2015).