Voice muzzle for humans

7/31/2023

Hey HN, we are Mahmoud and Hammad, co-founders of Play.ht, a text-to-speech synthesis platform. We're building Large Language Speech Models across all languages with a focus on voice expressiveness and control. Today, we are excited to share beta access to our latest model, Parrot, which can clone any voice from a few seconds of audio and generate expressive speech from text. The model also captures accents well and is able to speak in all English accents. Even more interesting, it can make non-English speakers speak English while preserving their original accent. Just upload a clip of a non-English speaker and try it yourself.

Existing text-to-speech models lack expressiveness, control, or directability of the voice: for example, making a voice speak in a specific way, or emphasizing a certain word or part of the speech. Our goal is to solve these problems across all languages.

Since the voices are built on LLMs, they are able to express emotions based on the context of the text. Our previous speech model, Peregrine, which we released last September, is able to laugh, scream, and express other emotions. With Parrot, we've taken a slightly different approach and trained it on a much larger data set. Both Parrot and Peregrine only speak English at the moment, but we are working on other languages and are seeing impressive early results that we plan to share soon.

Content creators of all kinds (gaming, media production, e-learning) spend a lot of time and effort recording and editing high-quality audio. We solve that and make it as simple as writing and editing text. Our users range from individual creators looking to voice their videos, podcasts, etc. to teams at various companies creating dynamic audio content.

We initially built this product for ourselves, to listen to books and articles online. We found the quality of TTS was very low, so we kept working on the product until eventually we trained our own models and built a business around it. There are many robotic TTS services out there, but ours lets people generate truly human-level expressive speech and lets anyone clone voices instantly with strong resemblance.

We initially used existing TTS models and APIs, but when we started talking to our customers in gaming, media production, and other fields, people didn't like the monotone, robotic TTS style. So we doubled down on training a new model based on the emerging architectures that use transformers and self-supervised learning.

On our platform, we offer two types of voice cloning: high-fidelity and zero-shot. High-fidelity voice cloning requires around 20 minutes of audio data and creates an expressive voice that is more robust and captures the accent of the target voice with all its nuances. Zero-shot cloning needs only a few seconds of audio and captures most of the accent and tone, but isn't as nuanced because it has less data to work with. We also offer a diverse library of over a hundred voices for various use cases.

We offer two ways to use these models on the platform: (1) our text to voice editor, which lets users create and manage their audio files in projects, etc. The API supports streaming and polling, and we are working on reducing the latency to make it real time. We have a free plan and transparent pricing available for anyone to upgrade.

We are thrilled to be sharing our new model, and look forward to feedback!

Sure, why not? If you could earn more money and produce more value to society with the same amount of labor, and the legal/regulatory environment supported it, I wouldn't see a reason not to. If you had a solo contracting business, and the technology existed to fully outsource a development project to AI based on carefully documented requirements, using it would be a cheaper alternative to subcontracting. Rather than writing every line of code by hand, you would transition to becoming an architect, project manager, code reviewer, and QA tester. Now you're one person with the resources and earning potential of an entire development shop. I have my fair share of complaints about AI coding tools, but that isn't one of them.
