1. Overview of multi-modal learning
Modalities in multi-modal learning
- A modality is a distinct type of data, e.g., vision (images/video), text, and audio; multi-modal learning builds models that relate or combine several modalities

2. Multi-modal tasks
2-1. Text embedding
- Raw characters are hard to use directly in machine learning
- Instead, words are mapped to dense vectors (embeddings)
- Surprisingly, learning such dense representations yields generalization power: semantically similar words end up close together in the embedding space (as in the sketch below)
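
A minimal sketch of the idea, using a hypothetical toy vocabulary and randomly initialized vectors (in practice the vectors are learned from a corpus, e.g., by word2vec below):

```python
import numpy as np

# Toy vocabulary and random vectors (both illustrative assumptions;
# real embeddings are learned, not sampled).
vocab = {"cat": 0, "dog": 1, "car": 2}
dim = 4                                    # embedding dimension

rng = np.random.default_rng(0)
W = rng.normal(size=(len(vocab), dim))     # one dense row vector per word

def embed(word):
    """Look up the dense vector for a word (a row of W)."""
    return W[vocab[word]]

def cosine(u, v):
    """Similarity measure: after training, nearby vectors mean related words."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(embed("cat"), embed("dog")))  # would be high after training
```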

Word2vec - skip-gram model
- Trained to learn two weight matrices, $W$ and $W^{\prime}$
- Rows in $W$ represent word embedding vectors
- Learns to predict the $N$ neighboring words of each center word, which captures relationships between words (see the sketch below)
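
A minimal skip-gram training step, with assumed illustrative sizes for the vocabulary and embedding dimension; this is a bare-bones sketch, without the negative sampling and other speed-ups used in real word2vec implementations:

```python
import torch
import torch.nn as nn

# Illustrative sizes; a real run derives the vocabulary from a corpus.
V, D = 5000, 128                       # vocabulary size, embedding dimension

W = nn.Embedding(V, D)                 # input matrix W: rows are word vectors
W_prime = nn.Linear(D, V, bias=False)  # output (context) matrix W'

opt = torch.optim.SGD(
    list(W.parameters()) + list(W_prime.parameters()), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

def train_step(center, context):
    """One skip-gram step: predict a neighboring word from the center word."""
    logits = W_prime(W(center))        # (batch, V) scores over the vocabulary
    loss = loss_fn(logits, context)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy (center, context) pair; real pairs come from sliding a window of
# N neighbors over the training text.
print(train_step(torch.tensor([10]), torch.tensor([42])))
```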

2-2. Joint embedding
- Map inputs from different modalities into a shared embedding space, so that matching pairs end up close to each other

Application - Image tagging
- Bidirectional retrieval: find images from tags (tags → image) and tags from images (image → tags)
- Built by combining pre-trained unimodal models, with each modality projected into the shared joint space (see the sketch below)
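
A minimal sketch of such a joint embedding, assuming a torchvision ResNet-18 as the pre-trained image encoder and a hypothetical tag-embedding table for the text side; the projection dimensions and tag vocabulary size are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

# Pre-trained image encoder (downloads ResNet-18 weights) plus an assumed
# tag-embedding table; sizes here are illustrative assumptions.
img_encoder = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
img_encoder.fc = nn.Identity()          # expose the 512-d image features
img_encoder.eval()

tag_embed = nn.Embedding(10000, 300)    # assumed tag vocabulary / dimension

D = 256                                 # shared joint-embedding dimension
img_proj = nn.Linear(512, D)            # projection heads into the joint space
tag_proj = nn.Linear(300, D)

def joint_embed(image, tag_id):
    """Map both modalities into the same D-dimensional space."""
    z_img = F.normalize(img_proj(img_encoder(image)), dim=-1)
    z_tag = F.normalize(tag_proj(tag_embed(tag_id)), dim=-1)
    return z_img, z_tag

# Matching score = cosine similarity in the joint space. Training would pull
# matching (image, tag) pairs together and push mismatched pairs apart,
# e.g., with a triplet or contrastive loss; retrieval then works both ways.
image = torch.randn(1, 3, 224, 224)     # dummy image tensor
z_img, z_tag = joint_embed(image, torch.tensor([7]))
print((z_img * z_tag).sum(dim=-1))      # similarity score
```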