SignLLM – AI sign language video generation


A multilingual sign language model that generates sign language videos from text descriptions, converting input text or prompts into the corresponding sign language gesture videos.

It supports eight sign languages, including American Sign Language (ASL) and German Sign Language (GSL).

Paper: SignLLM: Sign Languages Production Large Language Models

SignLLM is a large language model for multilingual sign language generation. It is based on a new large-scale multilingual sign language dataset called Prompt2Sign and can generate sign language action videos in multiple languages from input text or prompts.
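Generating sign language from text can be framed as a sequence-to-sequence task in which each training example pairs a text prompt with a sequence of pose frames. The sketch below shows one way such a pair might be built. The record layout (`SignSample`), the `<asl>`-style language tag prefix, and the flattening of keypoints into one vector per frame are all assumptions for illustration; they are not Prompt2Sign's documented schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SignSample:
    """Hypothetical record layout; Prompt2Sign's real schema may differ."""
    language: str             # sign language code, e.g. "ASL" or "GSL"
    text: str                 # input sentence or prompt
    poses: List[List[Point]]  # frames -> keypoints -> (x, y)


def to_seq2seq_pair(sample: SignSample) -> Tuple[str, List[List[float]]]:
    """Turn one record into a (source text, target frame vectors) pair."""
    # Prefix a language tag so a single model can serve several sign languages.
    source = f"<{sample.language.lower()}> {sample.text}"
    # Flatten each frame's keypoints into one vector per time step.
    target = [[coord for point in frame for coord in point] for frame in sample.poses]
    return source, target


sample = SignSample("ASL", "hello", [[(0.1, 0.2), (0.3, 0.4)], [(0.5, 0.6), (0.7, 0.8)]])
src, tgt = to_seq2seq_pair(sample)
print(src)  # <asl> hello
print(tgt)  # [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
```

Tagging the source text with the target language is a common way to let one model switch between output languages; whether SignLLM conditions on language this way is not stated in the source.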

The Prompt2Sign dataset integrates public sign language data covering American Sign Language (ASL) and seven other languages, and converts it into a format suitable for training sequence-to-sequence and text-to-text models. SignLLM proposes two new multilingual sign language generation modes, which can use new loss functions and reinforcement learning-based modules to accelerate training and improve the model's ability to sample high-quality data on its own.

The webpage also reports SignLLM's benchmark results, in which the model leads on multiple sign language generation tasks. In addition, it gives an overview of the dataset and the main methods: the dataset structure, the interaction principles of Text to Language Gloss (Text2LangGloss) and the Multi-Language Switching Framework (MLSF), and how SignLLM's output is converted into multiple pose representation formats and then rendered into realistic human images by style transfer models or generative models fine-tuned for the task.
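As an illustration of converting model output into one pose representation, the sketch below maps normalized 2D keypoints to pixel coordinates laid out like OpenPose's `pose_keypoints_2d` arrays (flat x, y, confidence triples). The assumption that the model emits coordinates normalized to [0, 1], the constant confidence value, and the function names are hypothetical; real pipelines would also carry hand and face keypoints, which matter for signing.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def to_pixel_keypoints(frame: List[Point], width: int, height: int,
                       confidence: float = 1.0) -> List[float]:
    """Map normalized (x, y) keypoints to a flat [x, y, c, x, y, c, ...] list in pixels."""
    flat: List[float] = []
    for x, y in frame:
        # Assumes coordinates are normalized to [0, 1] relative to the image size.
        flat.extend([x * width, y * height, confidence])
    return flat


def to_openpose_style(poses: List[List[Point]], width: int, height: int) -> List[Dict]:
    """Wrap each frame in a simplified OpenPose-like dict holding a single person."""
    return [{"people": [{"pose_keypoints_2d": to_pixel_keypoints(frame, width, height)}]}
            for frame in poses]


frames = to_openpose_style([[(0.5, 0.25), (0.0, 1.0)]], width=640, height=480)
print(frames[0]["people"][0]["pose_keypoints_2d"])  # [320.0, 120.0, 1.0, 0.0, 480.0, 1.0]
```

A per-frame layout like this is the kind of intermediate that a pose-conditioned renderer could consume; which formats and renderers SignLLM actually targets is described only at a high level in the source.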

SignLLM extends the Text2Gloss framework by introducing a gloss tag that supplies the necessary language properties, and it represents deep features through the network variables Vt and Xu. It also introduces five key elements (users, agents, environments, iterative update processes, and PLCs) that together outline a reinforcement learning process for sequence prediction.
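The source does not specify how the PLC decides which data to train on. As a hedged illustration of a model steering its own data sampling, the sketch below uses generic prioritized sampling: the model plays the agent, the dataset the environment, and each iteration's per-sequence loss is fed back to update sampling priorities. The `PrioritySampler` class, the use of loss as the priority, and the `alpha`/`eps` parameters are assumptions borrowed from prioritized replay, not SignLLM's actual PLC.

```python
import random
from typing import List, Sequence


class PrioritySampler:
    """Draw training-example indices with probability proportional to priority ** alpha.

    Generic prioritized sampling, used only to illustrate the idea of a model
    choosing its own training data; it is not SignLLM's actual PLC.
    """

    def __init__(self, size: int, alpha: float = 0.6, eps: float = 1e-6, seed: int = 0):
        self.priorities: List[float] = [1.0] * size  # start uniform
        self.alpha = alpha  # 0 gives uniform sampling; 1 gives fully proportional sampling
        self.eps = eps      # keeps every example reachable
        self.rng = random.Random(seed)

    def update(self, indices: Sequence[int], scores: Sequence[float]) -> None:
        # Feedback from the training step (e.g. per-sequence loss) becomes the new priority.
        for i, score in zip(indices, scores):
            self.priorities[i] = abs(score) + self.eps

    def probabilities(self) -> List[float]:
        weights = [p ** self.alpha for p in self.priorities]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self, k: int) -> List[int]:
        return self.rng.choices(range(len(self.priorities)), weights=self.probabilities(), k=k)


sampler = PrioritySampler(size=3, alpha=1.0)
sampler.update([0, 1, 2], [1.0, 1.0, 2.0])            # example 2 had the largest loss
print([round(p, 3) for p in sampler.probabilities()])  # [0.25, 0.25, 0.5]
batch = sampler.sample(k=4)                            # indices for the next training step
```

Raising priorities to `alpha` lets the sampler interpolate between uniform sampling and fully loss-driven sampling, which limits how strongly a few hard examples can dominate training.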

The empirical studies compare models for American Sign Language Production (ASLP) and German Sign Language Production (GSLP), and ablation and training-efficiency studies further demonstrate SignLLM's effectiveness.
