In our daily lives, people integrate various kinds of cues from different modalities to perceive events or objects. These modalities represent the properties and positions of objects by adopting different reference frameworks. Moreover, the reliability of each cue varies as environment changes. However, human brains could integrate these cues effectively to perceive objects exactly. We highlight some of the models that underlie the integration of the modalities in human brain and mainly summarize the statistic optimization model based on the theory of Bayesian. The related research in the future could be studied by combining the virtual reality and neuroimaging techniques to investigate the mechanisms of multisensory cues integration.