Previous research has shown that, during mental imagery, participants look back to areas visited during encoding, but it is unclear what happens when the auditory and visual information presented during encoding is incongruent. To investigate this question, we presented 30 participants with incongruent audio-visual associations (e.g., the image of a car paired with the sound of a cat) and later asked them to create a congruent mental representation based on the auditory cue (e.g., to create a mental representation of a cat while hearing the sound of a cat). The results revealed that participants spent more time in the areas where they had previously seen the object, and that incongruent audio-visual information during encoding did not appear to interfere with the generation and maintenance of mental images. This finding suggests that eye movements can be flexibly employed during mental imagery, depending on the demands of the task.