Ryoto Ishizuka

This is a portfolio written in English.


Hi, I'm zuka.

I'm interested in Automatic Music Transcription and in "humanizing" musical patterns through machine learning. I also like skiing, playing the piano, taking pictures, and drinking strong coffee.

Thank you for visiting my portfolio. Below, I introduce my background, research interests, projects, extracurricular activities, and other activities. At the end, I share some tips along with beautiful pictures of Kyoto. If you have any questions, please contact me by e-mail or SNS.


Academic Background

  • B.E. in Cognitive and Information Sciences at Kyoto University
    [2015.4 – 2019.3]
  • International Business Management at VNUK, the University of Danang
    [2018.9 – 2018.12]
  • M.E. in Intelligence Science and Technology at Kyoto University
    [2019.4 – 2021.3]

Work Experience (including Internships)

  • Nomura Research Institute, IT solution course
    Multi cloud integration division
  • Sony Corporation, R&D center
    Speech information processing technology department
    [2019.8 – 2019.9]
  • Recruit Holdings,
    Media & solutions project
  • OngaACCEL office assistant
    [2019.10 – 2021.3]


  • IT
    • Fundamental information technology engineer [2019.5]
    • Applied information technology engineer [2020.10]
  • Ski
    • Crown prize [2017.3]
    • Associate Ski Instructor [2020.2]



  • Python:★★★★☆
  • Java:★★★☆☆
    • Apache / Tomcat
    • Seasar2 / SAStruts



  • JavaScript:★★★☆☆
    • node.js
  • C++:★★☆☆☆


  • Academic
    • The 82nd National Convention of the Information Processing Society of Japan (IPSJ) [2020.3]
      • Student award
    • The 129th meeting of the IPSJ Special Interest Group on Music and Computer [2020.11]
      • Student award
    • The 82nd National Convention of the Information Processing Society of Japan (IPSJ) [2020.3]
      • Student award (co-author)
  • Ski
    • The 7th Summer Ski Student Convention [2017.8]
      • 1st place (individual PISLAB)
      • 6th place (team)
    • The 40th Ski Student Convention in the Kansai Region [2018.3]
      • 1st place (men's individual)
      • 1st place (men's team)
    • The 26th International Ski Convention in Kyoto [2019.1]
      • 5th place (men's individual)
    • The 46th International Iwatake Ski Student Convention [2019.2]
      • 5th place (men's team)
    • The 40th Hakuba Goryu Ski Student Convention [2020.2]
      • 3rd place (men's individual)


International Journals

  1. Ryoto Ishizuka, Ryo Nishikimi, and Kazuyoshi Yoshii:
    Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms.
    Signals (ISSN 2624-6120), August 2021.
    [DOI] [arxiv]

International Conferences

  1. Ryoto Ishizuka, Ryo Nishikimi, Eita Nakamura, and Kazuyoshi Yoshii:
    Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training.
    Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), December 2020.
    [pdf] [arxiv] [slide] [video]
  2. Takehisa Oyama, Ryoto Ishizuka, and Kazuyoshi Yoshii:
    Phase-Aware Joint Beat and Downbeat Estimation Based on Periodicity of Metrical Structure.
    International Society for Music Information Retrieval Conference (ISMIR), November 2021.
  3. Moyu Terao, Yuki Hiramatsu, Ryoto Ishizuka, Yiming Wu, and Kazuyoshi Yoshii:
    Difficulty-Aware Neural Band-to-Piano Score Arrangement Based on Note- and Statistic-Level Criteria.
    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), January 2022.


Automatic Music Transcription

It has been some time since artificial intelligence became able to "recognize" and "make" art at a level comparable to humans. In music, the field that covers both of these approaches is called Music Information Processing. On the recognition side, transcription is the main task and also the most difficult one. If a system could transcribe scores from music accurately enough, it could be used to support musical practice and composition. On the generation side, music generation is also actively researched. However, generated music is hard to evaluate because there is no clear criterion for what "good" music is. Now, and probably in the future as well, judging generated music depends on subjective evaluation experiments.

In my work, music recognition and generation interact within a single framework as an acoustic model and a language model. The former represents how well an output fits the input features, while the latter represents how plausible the output is as a musical structure. Although each has developed rapidly, the two models have long been studied separately. Integrating the acoustic model and the language model is therefore meaningful, and it should open up new perspectives in Music Information Processing.

Sound Source Separation

We talk to each other surrounded by various kinds of background noise. It is well known that people can pick out their own name even in a noisy environment. This is called the "Cocktail-Party Effect". Similarly, music usually consists of many instruments, such as piano, drums, and bass. In those situations, sound source separation is used to isolate the desired source. Non-negative Matrix Factorization (NMF) is commonly used for sound source separation because of two mathematical properties: sparsity and low rank. The separated components should be sparse, and, especially in music, the sources are low rank because of their repeated structure. However, NMF has some problems, such as dependence on initial values and limited performance. To tackle these issues, many derivative models have been proposed. Recently, more and more researchers are using deep learning for sound source separation, which requires preparing sufficiently large datasets for those tasks.
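
The NMF idea described above can be sketched in a few lines. This is only an illustrative sketch, not the model used in any of the publications listed here: it assumes the classic multiplicative update rules for the Euclidean cost, a toy "spectrogram" built from two made-up sources with repeating activations, and hypothetical function names (nmf, separate, reconstruction_error).

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-10):
    """Factorize a non-negative matrix V (freq x time) as W @ H.

    Uses multiplicative updates for the Euclidean cost. The result
    depends on the random initialization, controlled here by `seed`.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral bases
    H = rng.random((rank, n_time)) + eps   # temporal activations
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def separate(V, W, H, eps=1e-10):
    """Split V into one part per component with soft (ratio) masks."""
    V_hat = W @ H + eps
    return [V * (np.outer(W[:, k], H[k]) / V_hat) for k in range(W.shape[1])]

def reconstruction_error(V, W, H):
    """Relative Frobenius error of the approximation W @ H."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# Toy data (an assumption for illustration): two sources whose
# activations repeat every four frames, so V has rank 2.
rng = np.random.default_rng(1)
W_true = rng.random((20, 2)) + 0.1
H_true = np.tile(np.array([[1.0, 0.1, 0.1, 0.1],
                           [0.1, 0.1, 1.0, 0.1]]), (1, 8))
V = W_true @ H_true

W, H = nmf(V, rank=2)
parts = separate(V, W, H)
print("relative error:", reconstruction_error(V, W, H))
```

Changing `seed` gives a different factorization, which is the initial value dependence mentioned above. On real audio, V would be a magnitude spectrogram rather than this synthetic matrix.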

Data Hiding

Data hiding is the process of hiding data inside cover data. It is mainly divided into 2 types: Digital Watermarking and Steganography. Watermarking is usually used to protect copyright or to carry metadata such as the author, publisher, and release date. When designing a watermarking system, it is crucial that users cannot perceive the watermark. That is, in Digital Watermarking the media that carries the embedded metadata is regarded as the important part. This is because we assume that watermarking is mainly used in the music market. On the other hand, steganography is used to hide secret information in cover data, so it is required that others cannot read the secret information. The secret information can be read only with the secret key that was used for embedding. To protect the secret information from attackers, we should construct a system in which attackers cannot understand the information even if they can perceive that it is there.
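
As a rough illustration of key-based embedding, the sketch below hides message bits in the least significant bits (LSBs) of integer audio samples, at positions chosen by a pseudo-random generator seeded with the key. Everything here is an assumption made for illustration: the function names (embed, extract), the LSB scheme itself, and the use of Python's `random` module, which is not cryptographically secure and would not be suitable for a real system.

```python
import random

def _positions(key, n_samples, n_bits):
    # The key seeds a PRNG that decides which samples carry the payload.
    # Mersenne Twister is NOT cryptographically secure; illustration only.
    return random.Random(key).sample(range(n_samples), n_bits)

def embed(samples, message, key):
    """Return a copy of `samples` with `message` (bytes) hidden in LSBs."""
    bits = [(byte >> i) & 1 for byte in message for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("cover data is too short for this message")
    out = list(samples)
    for pos, bit in zip(_positions(key, len(out), len(bits)), bits):
        out[pos] = (out[pos] & ~1) | bit   # overwrite the lowest bit only
    return out

def extract(samples, n_bytes, key):
    """Read `n_bytes` back from the positions selected by `key`."""
    pos = _positions(key, len(samples), n_bytes * 8)
    bits = [samples[p] & 1 for p in pos]
    return bytes(
        sum(bits[8 * j + i] << i for i in range(8)) for j in range(n_bytes)
    )

# Hypothetical cover data standing in for 16-bit audio samples.
cover = list(range(-500, 500))
secret = b"kyoto"
stego = embed(cover, secret, key="my-secret-key")
recovered = extract(stego, len(secret), key="my-secret-key")
print(recovered)
```

Each sample changes by at most 1, which keeps the modification hard to perceive, and a reader without the key does not know which samples to look at.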


Demo Ski

Both our men's and women's teams won the championship in the Kansai region, a first in our history. I belonged to the demo ski team for 5 years and served as captain.

We have over 50 members, and our dream was to become the strongest team in our 48-year history. In the end, we achieved that goal: we won the championship in the Kansai region and reached the podium at the national convention.


I run several web media for my academic output and personal branding. "Beginaid" is the main one, and over 20,000 people visit it every month. The title combines two words: "Beginners" and "Band-aid". At first, I wanted to help lonely beginners on the Internet. There is a lot of information on the Internet, but much of it lacks sources for its quotes or "reshapes" the original information for the writer's own sake.

In other cases, we cannot find the information we are looking for at all. In other words, beginners struggle to solve their problems on the Internet because websites do not offer enough information. To help those beginners, and to give them the strongest band-aid, I write accurate and original articles. The site has over 650 articles, all written by me.

Cram School

I have worked at a cram school for 4 years because my dream was to be a teacher. Through studying, I wanted students to discover how fun it is to gain knowledge and how important it is to be independent in all areas of life. In line with this policy, we ran the cram school as a "3rd place" for students.

The 1st place is certainly family, and the 2nd place is school. Generally speaking, these are the two places we have in childhood. However, in some cases, they are not enough to satisfy children's curiosity or to help them become independent. Therefore, our cram school seeks to be a 3rd place that is neither family nor school, playing the role that the other two places cannot.

Exchange program in Vietnam

I spent 4 months in DaNang, Vietnam, as an exchange student. I belonged to the International Business Management course and learned how developing countries like Vietnam should work alongside developed countries. There were various classes, such as "Leadership", "Cross-cultural management", and "Contemporary Issues of International Business". In those classes, I learned what leadership is, what makes a "good" leader, how relationships between leaders and members should be built, what culture is, how capitalism and socialism differ, how design thinking can be used in business, and how emerging countries differ from developed countries.

I also had a really important experience there. I was the first student from Kyoto University to go to DaNang, so I had a hard time living there at first. Many friends helped me during my time in DaNang and even took me sightseeing. At first, I couldn’t understand why they welcomed me so much. I was just one exchange student from Japan, and I thought they didn’t have to devote so much time to hanging out with me. However, the more I learned about Vietnam, the more I understood. Vietnamese people care deeply about their families, and they welcomed me in the same way. I felt as if I were a member of their families, and I am truly honored.


Digital contents

Alexa Skill Award 2019

PRML exercises

Pseudo EternityII