The modality effect in multimedia learning holds that learning outcomes are enhanced when the words accompanying pictures are presented in an auditory format rather than a visual one (see Mayer, 2009). However, research on the reverse modality effect has found the opposite result (e.g., Crooks, Cheon, Inan, Ari, & Flores, 2012; Tabbers, Martens, & van Merriënboer, 2004). A meta-analysis of 91 empirical studies was conducted to investigate the effect of text modality on both retention and transfer tests. After preliminary pooling of the data, 94 independent effect sizes (8088 participants) were included in the retention-related meta-analysis and 83 independent effect sizes (6664 participants) in the transfer-related meta-analysis. The results suggested that participants who learned from narration outperformed those who learned from visual text on both the retention test (d_retention = 0.24) and the transfer test (d_transfer = 0.25), with effect sizes that differed from those reported by Ginns (2005). Further moderator analyses indicated that the modality effect on learning outcomes was significantly moderated by the pace of presentation, the dynamism of the pictures and the duration of the learning materials. Specifically, the modality effect occurred mainly under conditions of system-paced presentation (d_retention = 0.43, d_transfer = 0.44), dynamic pictures (d_retention = 0.50, d_transfer = 0.59) and short learning materials (d_retention = 0.38, d_transfer = 0.33). None of the results revealed a reverse modality effect. The replicated strong modality effect suggested that performance on recall and comprehension tasks was better when words and pictures were presented in dual modalities rather than a single one, which supports Mayer’s Cognitive Theory of Multimedia Learning (CTML). Moreover, the pace of presentation, the dynamism of the pictures and the duration of the learning materials should be considered vital boundary conditions of the modality effect.