This study investigates the consistency and divergence between language and gesture in the expression of spatial orientations in the metaphorical conceptualization of sequence time, and the influence of the diverse reading and writing practices used in Taiwan on the spatialization of earlier and later events across modalities. The study is based on Chinese conversational data from face-to-face communication. The spontaneous gestures that accompany speech reveal real-time metaphorical conceptualization in the context of use. The spatial orientations that are consistent between the two modalities bear out the online activation of the universal front-back and the culture-specific up-down concepts in the source domains. When speech and gesture are not redundant, the divergence reflects a more complex temporal spatialization involving either two timelines or different orientations on the same timeline. The most preferred cross-modal combination of two timelines is the co-occurrence of lateral gestures with front-back spatial words. Finally, the two different directions in which Chinese characters can be read and written were found to affect people’s conceptualization of the earlier or later event as being rightward or leftward.