Although visual performance in virtual reality has been widely investigated, it remains unclear whether the semantic reliability of sound can also facilitate visual performance in virtual reality. We investigated behavioral categorization performance for living and non-living outline drawings accompanied by a semantically reliable or unreliable sound. Categorization performance for living and non-living pictures was evaluated under three multisensory semantic conditions: semantically reliable, semantically unreliable, and semantic control (a mix of semantically reliable and unreliable stimuli). Reaction times showed faster categorization in the semantically reliable multisensory condition than in the semantically unreliable condition, regardless of whether the outline drawing depicted a living or non-living object. This result indicates that multisensory integration facilitates visual categorization performance. Additionally, non-living pictures were categorized significantly faster than living pictures under the semantically unreliable multisensory condition, indicating that non-living objects have more robust multisensory representations than living objects. This study provides potential theoretical support for the development of new audiovisual virtual reality devices and applications.