
For comparison, GPT-2 had 1,000 timesteps and OpenAI Five took tens of thousands of timesteps per game. A typical 4-minute song at CD quality (44 kHz, 16-bit) has over 10 million timesteps. Generating music at the audio level is challenging since the sequences are very long.

A different approach is to model music directly as raw audio. This has led to impressive results like producing Bach chorals, polyphonic music with multiple instruments, as well as minute long musical pieces.īut symbolic generators have limitations-they cannot capture human voices or many of the more subtle timbres, dynamics, and expressivity that are essential to music. A prominent approach is to generate music symbolically in the form of a piano roll, which specifies the timing, pitch, velocity, and instrument of each note to be played. Automatic music generation dates back to more than half a century.
