The current study investigates a complete course of action for the joint construction of meaning, and the way mimicked gestures are used along with speech to accomplish this joint action in Mandarin Chinese conversation. The domain of analysis is a stretch of talk that spans the joint action from beginning to end, during which similar gestures are produced by different speakers across turns. Within this stretch of talk, the joint action begins with the ‘presentation phase’, during which a speaker presents meaning. A variety of situations were found to prompt another participant to join in creating meaning. The joint action ends with the ‘completion phase’, during which the new meaning is recognized and the collaboration ends. In between is the ‘collaboration phase’, during which the joint action starts and develops with the use of cross-modal resources. In conversation, one way to accomplish the joint action is through gestural repetition with slight modification, as in a discussion about size. For other types of semantic information, speech and gesture are more often involved together: the second speaker mimics the gesture of the previous speaker to form a semantic foundation shared by the participants, and then conveys new meaning with a new lexical expression on the basis of this semantic common ground. The use of cross-modal resources thus facilitates the simultaneous realization of shared knowledge in gesture and new meaning in speech within a clausal unit.