Understanding Multimodal Composition