Meta is making it easier for artists and sound designers to produce audio using only AI. The Facebook owner has released AudioCraft, an open source kit that bundles three of its existing generative AI models for creating sounds from text descriptions. AudioGen and MusicGen respectively produce sound effects and music, while EnCodec, a neural audio codec, compresses audio so the models can generate it with fewer artifacts. In principle, a musician or sound designer has everything needed to compose a piece.
The release includes pre-trained AudioGen models for those who want to start quickly, and tinkerers will have access to the entire AudioCraft code and model weights. The open source debut gives pros and researchers a chance to train the models using their own data, Meta says. All the pre-trained models use either public or Meta-owned material, which should head off copyright disputes.
The tech firm characterizes AudioCraft as a way to make generative AI audio simpler and more accessible. While AI-generated images and text have taken off, Meta believes audio has lagged "a bit behind," since existing projects tend to be complicated and frequently closed off. In theory, the new kit gives creators the opportunity to shape their own models and otherwise stretch what's possible.
This isn’t the only open text-to-audio AI on the market. Google opened up its MusicLM model in May. Meta’s system also isn’t designed for everyday users — you’ll still need to be technically inclined to use AudioCraft properly. This is more for research, the company says. The developers are also trying to improve the performance and control methods for these models, expanding their potential.
Even in its current state, though, AudioCraft may hint at the future of AI’s role in music. While you won’t necessarily see artists using AI to completely replace their own creativity (even experimenters like Holly Herndon are still highly involved), they’re getting more tools that let them create backing tracks, samples and other elements with relatively little effort.