The perception of speech sounds is affected by neighboring speech or nonspeech context. Several theories have been put forward to explain these context effects. The gesture theory holds that context effects reflect perceptual compensation for coarticulation, and that speech perception implicitly takes into account the articulatory dynamics of speech production. The auditory theory holds that context effects are a kind of spectral contrast effect and result simply from general auditory processes. Using Chinese stop-vowel-stop sequences as materials, the present study tested the explanatory power of these theories and explored the mechanisms underlying context effects in such sequences. In Experiment 1, synthetic /pa/, /pi/, and /pu/ served as context sounds, and a /ta/-/ka/ continuum served as target sounds. Thirty Chinese undergraduate listeners were asked to identify the target sounds. Given the acoustic cues and places of articulation of the three syllables, the auditory theory and the gesture theory made different predictions about the context effects. The results showed significantly more /ta/ responses in the /pi/ context and more /ka/ responses in the /pa/ and /pu/ contexts. This pattern was partly consistent with the prediction of spectral contrast, whereas the gesture theory failed to predict it. Experiment 2 used tone analogs of the F2 trajectories of /pa/, /pi/, and /pu/ (the crucial acoustic cue distinguishing the three context syllables) as context sounds. Twenty-six Chinese undergraduate listeners participated. The F2 analog of /pi/ led to more /ta/ responses, and the F2 analogs of /pa/ and /pu/ produced more /ka/ responses. The overall similarity between the results of Experiments 1 and 2 indicated that the differences in context effects among the three syllables resulted mainly from differences in this critical acoustic cue.
The results also showed that acoustic cues far from the critical frequency region of the /ta/-/ka/ series (the F2 analogs of /pa/ and /pu/) could influence identification of the /ta/-/ka/ continuum. Despite this overall similarity, some differences remained between Experiments 1 and 2. Experiment 3 used sine-wave speech (SWS) versions of /pa/, /pi/, and /pu/ as context sounds; sine-wave speech models all formant trajectories of a speech sound. The aim of Experiment 3 was to test whether the differences between Experiments 1 and 2 were due to other acoustic cues (F1, F3, and F4). Twenty-one Chinese listeners participated. The /pa/-SWS context produced the most /ka/ (“ga”) responses, followed by the /pu/-SWS context, and the /pi/-SWS context produced the fewest. These results were similar to those of Experiment 2. The differences between Experiments 1 and 2 were therefore not due to other acoustic cues and were more likely due to phonetic category perception of the three syllables. Taken together, the results indicate that differences in context effects among stop-vowel syllables originate mainly from differences in critical acoustic cues, which supports the auditory-based explanation of context effects. The gesture theory failed to anticipate the pattern of effects across the three syllables. The present results also extend the range of known auditory interaction types: in addition to spectral contrast effects, acoustic cues in context sounds far from the critical frequency channels can also promote identification of a specific phonetic category.