MMArt-ACM '21: Proceedings of the 2021 International Joint Workshop on Multimedia Artworks Analysis and Attractiveness Computing in Multimedia 2021
SESSION: Invited Talk
Session details: Invited Talk
- Min-Chun Hu
Automatic Music Composition with Transformers
- Yi-Hsuan Yang
In this talk, I will first give a brief overview of recent deep learning-based approaches
for automatic music generation in the symbolic domain. I will then talk about our
own research that employs self-attention based architectures, a.k.a. Transformers,
for symbolic music generation. A naive approach with Transformers would treat music
as a sequence of text-like tokens. However, our research demonstrates that Transformers
can generate higher-quality music when music is not treated simply as text. In particular,
our Pop Music Transformer model, published at ACM Multimedia 2020, employs a novel
beat-based representation of music that informs self-attention models with the bar-beat
metrical structure present in music. This approach greatly improves the rhythmic structure
of the generated music. A more recent model we published at AAAI 2021, named the Compound
Word Transformer, exploits the fact that a musical note is associated with multiple
attributes such as pitch, duration, and velocity. Instead of predicting tokens corresponding
to these different attributes one by one at inference time, the Compound Word Transformer
predicts them jointly, greatly reducing the sequence length needed to model a full-length
song and also making it easier to model the dependencies among these attributes.
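As a rough illustration of the compound-word idea described above (not the authors' implementation; the token names are invented), the following Python sketch contrasts a flat, text-like token stream with a grouped, one-token-per-note encoding:

```python
# Illustrative sketch (not the authors' code): contrast a flat, text-like token
# stream with a compound-word grouping of per-note attributes.

notes = [
    {"pitch": 60, "duration": 4, "velocity": 80},
    {"pitch": 64, "duration": 2, "velocity": 72},
    {"pitch": 67, "duration": 2, "velocity": 75},
]

# Flat encoding: one token per attribute, so the sequence length grows with
# the number of attributes per note.
flat_tokens = []
for note in notes:
    flat_tokens += [f"PITCH_{note['pitch']}",
                    f"DUR_{note['duration']}",
                    f"VEL_{note['velocity']}"]

# Compound encoding: one grouped token per note, whose attribute sub-tokens
# would be predicted jointly at inference time.
compound_tokens = [(f"PITCH_{n['pitch']}",
                    f"DUR_{n['duration']}",
                    f"VEL_{n['velocity']}") for n in notes]

print(len(flat_tokens))      # 9 tokens in the flat stream
print(len(compound_tokens))  # 3 grouped tokens, one per note
```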
SESSION: Main Session
Session details: Main Session
- Kensuke Tobitani
Color-Grayscale-Pair Image Sentiment Dataset and Its Application to Sentiment-Driven
Image Color Conversion
- Atsushi Takada
- Xueting Wang
- Toshihiko Yamasaki
In this study, we focused on the fact that the color information of an image has a
significant effect on the emotions it evokes, and we created a dataset with discrete
and continuous emotion labels for color and grayscale image pairs, which are not available
in existing emotion image datasets. Analysis of the continuous-valued emotion labels
showed that the color images evoked a wider range of both positive and negative emotions.
We also conducted experiments to convert the color of an image toward a target emotion,
using the emotion label and the image as input, and succeeded in changing the color
of the entire image according to the input emotion value. In another experiment, we
converted the emotion category and confirmed that the generated image evoked a different
emotion from the original image.
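A minimal sketch of what one record in such a color-grayscale-pair sentiment dataset might look like (field names and values are assumptions for illustration, not the authors' released schema):

```python
# Hypothetical record layout (field names assumed, not the authors' schema):
# each entry pairs a color image with its grayscale counterpart and carries
# both a discrete emotion label and a continuous-valued one.
sample = {
    "color_image": "images/000123_color.jpg",
    "grayscale_image": "images/000123_gray.jpg",
    "discrete_emotion": "awe",        # one of a fixed set of categories (assumed)
    "continuous_emotion": 0.7,        # signed value, positive vs. negative (assumed)
}
```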
Ketchup GAN: A New Dataset for Realistic Synthesis of Letters on Food
- Gibran Benitez-Garcia
- Keiji Yanai
This paper introduces a new dataset for the realistic synthesis of letters on food.
Specifically, the "Ketchup GAN" dataset consists of real-world images of egg omelettes
decorated with ketchup letters. Our dataset is of sufficient size and variety to
train and evaluate deep learning-based generative models. In addition, we generate
a synthetic ketchup-free set, which enables us to train paired generative adversarial
networks (GANs). The Ketchup GAN dataset comprises more than two thousand images of
omelette dishes collected from Twitter. Automatically generated segmentation masks
of egg and ketchup are also provided as part of the dataset. Thus, we can evaluate
generative models based on segmentation inputs as well. With our dataset, two state-of-the-art
GAN models (Pix2Pix and SPADE) are evaluated on photorealistic ketchup letter synthesis.
We finally present an automatic omelette-decoration application that takes ketchup text
input from users. The dataset and further details are publicly available at https://mm.cs.uec.ac.jp/omrice/.
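The following sketch shows one way the described pairings could be organized for paired GAN training (file names and the helper function are illustrative, not the released dataset layout):

```python
# Rough sketch of how the described pairings could be assembled
# (file names and helper are illustrative, not the dataset's actual layout).

def make_pairs(real_images, ketchup_free_images, segmentation_masks):
    """Build (input, target) pairs for the two GAN settings described above."""
    # Pix2Pix-style paired translation: synthetic ketchup-free omelette image
    # -> real omelette decorated with ketchup letters.
    image_pairs = list(zip(ketchup_free_images, real_images))
    # SPADE-style semantic synthesis: egg/ketchup segmentation mask -> real image.
    mask_pairs = list(zip(segmentation_masks, real_images))
    return image_pairs, mask_pairs

image_pairs, mask_pairs = make_pairs(
    real_images=["omelette_0001.jpg", "omelette_0002.jpg"],
    ketchup_free_images=["omelette_0001_clean.png", "omelette_0002_clean.png"],
    segmentation_masks=["omelette_0001_mask.png", "omelette_0002_mask.png"],
)
```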
Estimating Groups of Featured Characters in Comics with Sequence of Characters' Appearance
- Kodai Imaizumi
- Ryosuke Yamanishi
- Yoko Nishihara
- Takahiro Ozawa
This paper proposes a method to estimate a group of featured characters during an
arbitrary period in comics. Comics have many attractive aspects, such as illustrations
and quotes. The storyline, which is one of the attractive aspects of comics, enables
us to enjoy the drama in comics. In comics, the story is driven by characters' activities
represented in multimedia forms: emotion, speech, and other actions. As a first step
toward recognizing the storyline, this paper tackles the estimation of a group of
featured characters in a given period. To compute the storyline of comics easily,
the proposed method uses a sequence of characters' appearances for each page. The
experiment showed that the proposed method outperformed the comparative methods in
estimating groups of featured characters, with an average F-value of 0.82, while the
comparative methods achieved 0.67 and 0.48. The results showed that sequences of
characters' appearances, which are relatively easy to obtain, were sufficient to
capture the rough storyline, i.e., the featured characters in a story.
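As a simple illustration of working with per-page appearance sequences (a frequency-counting baseline, not the paper's proposed method; all names are invented), one could estimate the featured characters of a page range as follows:

```python
from collections import Counter

# Frequency-counting baseline (not the paper's method): given per-page lists
# of appearing characters, estimate the featured characters of a page range.

pages = [
    ["Alice", "Bob"],           # page 1
    ["Alice", "Carol"],         # page 2
    ["Alice", "Bob", "Carol"],  # page 3
    ["Dave"],                   # page 4
]

def featured_characters(pages, start, end, top_k=2):
    """Count appearances within pages[start:end] and return the top-k characters."""
    counts = Counter(c for page in pages[start:end] for c in page)
    return [name for name, _ in counts.most_common(top_k)]

print(featured_characters(pages, 0, 3))  # ['Alice', 'Bob']
```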