AI + Music | Dialogue with Suno CEO: How Was the Breakout Music Generation Product Born?
The first transformer-based music foundation model.
The music generation product Suno recently broke out with the release of its V3. After trying the product when it first launched on December 20th last year, we concluded that, with creative barriers so sharply lowered, music creation and consumption could become a consumer-level content form, and we published our first research article on Suno.
We have now compiled further insights from Suno CEO Mikey Shulman's latest interview and the Rolling Stone report, covering the background of Suno's birth, its methods of music creation, and the possibilities for consumption.
Key Takeaway
The points below are our own extrapolation and reflection based on the source material, and we welcome discussion.
Suno AI, by integrating artificial intelligence technology with music creation, provides users with a brand-new music creation platform. The application of this technology not only lowers the threshold for music production but also offers new creative tools for music enthusiasts and professionals, thereby promoting the democratization of music creation.
Suno AI could serve as an educational tool, helping learners understand music structure and the creative process. Through practical operation, learners can master the skills of music creation more quickly, which may change traditional music education models.
The emergence of tools like Suno AI may alter the way the music industry operates. From music production to distribution, the application of AI technology could lead to new business models and services, thereby affecting the interactions between music creators, publishers, and consumers.
Suno
Product: Suno
Founding Year: 2022
Founding Team
Mikey Shulman: Serving as CEO, he was formerly the Head of Machine Learning at Kensho, a lecturer at MIT Sloan School of Management, and holds a Bachelor's degree in Applied Physics from Columbia University and a Ph.D. in Physics from Harvard University;
Georg Kucsko: Chief Architect at Kensho Technologies, joined the Suno team after graduating from Harvard.
Product Introduction:
Suno AI is a powerful AI music generator developed by an MIT team. Users can create high-quality music in various styles and voices through simple text prompts.
Financing Situation:
2023: Suno.ai closed a seed round of financing, raising $1.4 million.
1. Founder's Background and Path to Music
Suno, a pioneering AI music creation tool, stands out in the rapidly growing field of AI music generators. Unlike tools that focus mainly on instrumental works, Suno can create complete songs that include lyrics and vocals. This capability sets Suno apart from competitors like Google's MusicFX and Meta's AudioGen.
Founded by a team of experts specializing in AI and machine learning, who previously worked at Kensho Technologies, Suno aims to democratize music creation. The founders, including Mikey Shulman with a Ph.D. in Physics from Harvard, see Suno as a tool to address the imbalance between music listeners and creators. Their vision is to enable a billion people worldwide to unleash their potential musical talents by providing a simple platform for song creation accessible to all.
Mikey Shulman, one of the co-founders and CEO of Suno, reflected on his musical journey in an interview.
He learned multiple instruments from a young age and played bass in small clubs in New York during high school and college. Although the performances were not always successful, he found great joy in making music with other people.
Mikey: "Music has always been a significant part of my life. I learned violin and guitar as a child and formed a band in college."
Later, Mikey studied physics at Harvard but ultimately shifted his career towards artificial intelligence and machine learning.
Mikey: "Although I studied physics, music has always been my passion. I've recorded an EP. As a hobbyist musician, studio work is much more tedious than live performances. I remember once in a recording session, I accidentally slipped off my chair, causing a great take to be scrapped and re-recorded. Such incidents would never happen during live shows."
After completing his graduate studies, Mikey started working at Kensho Technologies, which was later acquired by S&P Global.
Mikey: "In my last year of grad school, I met some people from Kensho, one of whom, Martin, is now my co-founder. During lunch, they asked when I could come for an interview. I said that, as a student, I was available anytime. They suggested doing it right then, so I followed them upstairs for the interview. Although I didn't do well, they decided to give me a chance."
At Kensho, Mikey and his team began exploring the possibilities of audio AI. A speech-to-text project they worked on at Kensho sparked their keen interest in audio AI. Although the project focused on the financial domain, they recognized the potential of audio AI in the broader field of music creation.
Mikey: "Kensho mainly used NLP and machine learning to process vast amounts of financial documents. It was acquired by S&P Global in 2018. After the acquisition, we got access to a wealth of financial document data, which was a dream come true. We also did an audio project: transcribing earnings calls from public companies into text in real time. This was Kensho's first foray into the field of audio AI. We found that with decades of high-quality transcription data and machine learning algorithms, we could significantly improve accuracy, far exceeding the existing speech-to-text services on the market."
Mikey and his partners realized that pursuing audio AI within a financial services company might not be appropriate. They believed there were greater opportunities to utilize AI technology in music creation. Mikey specifically pointed out that although they were not sure about the specific form of the product at first, they knew this direction was full of opportunities and challenges.
Mikey: "Kensho mainly focused on text-related projects. This audio project started a year after the acquisition. Although there were indeed many areas worth exploring in finance with audio, I believe audio AI has broader applications beyond finance. Moreover, the financial industry tends to be more conservative in innovation due to risk considerations. There were also too many interesting text projects to invest in, which made it difficult to divert effort to audio."
An early milestone for the Suno team was the release of an open-source text-to-speech project called "Bark." The project received widespread attention from the community, and the team found that people were most interested in music generation, not just text-to-speech. Users were trying to generate music with Bark even though the model had not been trained for that purpose. This realization further motivated the team to develop Suno and focus on music generation.
Mikey: "We carefully evaluated various opportunities in the audio field and found that most people do not like dealing with audio data, which might be our unique advantage (laughs). We decided from the start to follow the foundation model approach because, in the long run, it is the most promising direction, even though there were hardly any precedents for using transformers in audio at the time. To be honest, when we first left Kensho, we were not entirely sure whether to focus on speech or music. After all, we had more experience in the speech field, and many people advised us that the speech market is larger and not to venture into music.
But two things changed our minds:
As music enthusiasts, we couldn't help but try our hand at music;
After we open-sourced a speech model called Bark, it gained a lot of attention on GitHub. Through a survey form, we found that users were actually most interested in music, not speech."
Mikey hopes to redefine the experience of music creation and consumption with Suno, making it a new way for both music professionals and ordinary users who have never tried creation to express their emotions and tell their life stories.
Mikey: "Our goal is to enable everyone to transform the musical inspiration in their minds into actual musical works through simple and easy-to-use tools. Whether you are a music professional or an ordinary user who has never tried creation, Suno aims to be a new way for you to express your emotions and tell your life stories. In the future, we also plan to explore more interesting human-computer interaction methods. For example, you might hum a melody, and the model could generate a complete song based on that inspiration; or you could upload some images or videos, share moments from your life, and the model could create background music to match. We hope to fully leverage the capabilities of AI to stimulate people's creativity and make music creation a new lifestyle. Of course, we are also well aware of the music industry's emphasis on intellectual property rights. Suno is committed to developing music AI in a legal and compliant manner. We only used royalty-free music data for training the model. The generated songs have also been carefully filtered to ensure they do not infringe on any artist's rights. The copyright of the songs created by users will be entirely owned by the users themselves. Our goal is to become a bridge connecting musicians and AI, allowing both parties to benefit from this technology rather than replacing each other."
2. Suno's Creative Method
Suno's uniqueness lies in its ability to generate not only the melody and accompaniment of music but also lyrics and vocals. This means it can produce complete songs, including all necessary musical elements. It also covers different languages and dialects: for example, it can generate songs in Cantonese and in Sichuan dialect.
When using Suno, users can choose between two modes.
Basic Mode: Users provide a text prompt describing the song and can opt for the song to remain an instrumental piece without lyrics.
Custom Mode: Users can use their own lyrics, set various genre styles for the music, and even name the song.
The creative process with Suno is straightforward. Users first select the generation mode (basic or custom), then enter the relevant prompt information, such as the song's theme, style, or specific lyrics. Suno uses these inputs to generate a complete song, providing an audio track for users to preview, along with lyrics and an image representing the song's theme.
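To make the two modes concrete, the workflow above can be sketched as a small data model. This is only an illustration under our own assumptions: the names `Mode`, `SongRequest`, `SongResult`, and `validate` are hypothetical and do not correspond to any real Suno API. The sketch simply encodes which inputs each mode expects and which outputs the product returns, as described in this article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    BASIC = "basic"    # a single text prompt, optionally instrumental
    CUSTOM = "custom"  # user-supplied lyrics, style, and song title


@dataclass
class SongRequest:
    """Hypothetical request object; not part of any real Suno API."""
    mode: Mode
    prompt: Optional[str] = None   # basic mode: description of the song
    instrumental: bool = False     # basic mode: skip lyrics and vocals
    lyrics: Optional[str] = None   # custom mode: the user's own lyrics
    style: Optional[str] = None    # custom mode: genre or style
    title: Optional[str] = None    # custom mode: optional song name

    def validate(self) -> None:
        """Raise ValueError if the inputs do not fit the chosen mode."""
        if self.mode is Mode.BASIC:
            if not self.prompt:
                raise ValueError("basic mode needs a text prompt")
        elif not (self.lyrics and self.style):
            raise ValueError("custom mode needs lyrics and a style")


@dataclass
class SongResult:
    """Hypothetical result holding the three outputs listed above."""
    audio_path: str
    lyrics: Optional[str]          # None for an instrumental piece
    cover_image_path: str


# Usage: a basic-mode instrumental request passes validation.
request = SongRequest(mode=Mode.BASIC, prompt="a calm song about rain",
                      instrumental=True)
request.validate()
```

Keeping the mode as an explicit field, rather than inferring it from which inputs are present, mirrors the product flow in which the user picks the mode first and only then fills in the prompt.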
Suno generates songs in an end-to-end manner, meaning that the vocals, instruments, and all other parts of the song are generated at once. The decision to have Suno create lyrics and vocals significantly increased the complexity of model training, and the founding team had to invest considerable effort to ensure the model could understand and generate human singing and lyrics.
Suno's latest version, v3, produces vocals that sound more natural and less auto-tuned, and adds new features that give users finer control over the creative process. For example, users can lock the parts of a song they like and regenerate the parts that did not meet expectations.
3. The Significance of Suno's Creation
Suno offers a path for people to explore music through text, allowing anyone to enjoy the fun of making music, regardless of whether the creator has a musical background. Suno aims to empower more people to transform the melodies in their hearts into reality, making music creation not limited to professional musicians.
Suno is not only focused on creating music in new ways but also committed to exploring how to experience and share music in new ways. The emergence of Suno may herald a transformation in the way music is consumed and shared, where music is no longer just a passive object of consumption but an experience that can be dynamically generated based on the listener's specific emotions and needs.
By making music creation more accessible and personalized, Suno has the potential to change people's relationship with music and promote the development of music as a means of expression and communication. Sharing music created from natural-language prompts may give rise to a new social model: music socializing.
Music socializing is not just about sharing music itself; it is a new way of interacting that allows people to connect with and understand each other through music. On such a platform, users can not only publish their own musical works but also discover creators from around the world and their creations, sparking new ideas and creativity. This cross-cultural exchange and collaboration could drive innovation in music, making it a truly global language.
As the Suno community continues to grow, we expect to see more music-based social activities and events. From online concerts to creative workshops, Suno provides a platform for users to take part in and enjoy music together. This not only deepens the connections between community members but also brings new experiences and opportunities for music enthusiasts.
Reference materials:
https://www.tomsguide.com/ai/i-tried-the-radio-quality-suno-ai-music-generator-heres-how-it-sounds
https://www.rollingstone.com/music/music-features/suno-ai-chatgpt-for-music-1234982307/
https://gosummarize.com/youtube/@lightspeedvp/mikey-shulman-suno-and-the-sound-of-ai-music