This is my first post here, and I'm happy to be joining the community.
As a personal exercise, I'd like to build an app similar to the endless Biden vs. Trump debate at https://www.twitch.tv/trumporbiden2024, just with different avatars. Let's assume I have a couple of hours of video footage of the person I'd like to turn into an avatar. What kind of model is capable of doing this? My first goal is simply to have the avatar speak some text given to it; it doesn't need to be super fancy. I think the audio part needs text-to-speech with voice cloning, but I'm unsure about the video synthesis (only head movements are needed). Is there a Hugging Face pipeline for this? Sorry for the open-ended question; any reading suggestions are welcome as well.
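
To make the question more concrete, here is a rough sketch of the two-stage structure I have in mind: text goes into a voice-cloned text-to-speech stage, and the resulting audio drives a talking-head video stage. All names here (`AvatarPipeline`, `synthesize_speech`, `animate_head`) are hypothetical placeholders of mine, not functions from any real library; the actual models are exactly what I'm asking about.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (my own placeholders, not a real API).
SpeechFn = Callable[[str], bytes]   # text -> audio bytes in the cloned voice
VideoFn = Callable[[bytes], bytes]  # audio bytes -> talking-head video bytes


@dataclass
class AvatarPipeline:
    """Chains a text-to-speech stage and a head-animation stage."""

    synthesize_speech: SpeechFn
    animate_head: VideoFn

    def speak(self, text: str) -> bytes:
        # Reject empty input early so neither model is called for nothing.
        if not text.strip():
            raise ValueError("text must not be empty")
        audio = self.synthesize_speech(text)
        return self.animate_head(audio)
```

With this shape, either stage could be swapped independently, for example by passing in whatever voice-cloning model and whatever head-animation model turn out to be suitable.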