AudioBookSlides – Audiobook generated video slideshows

AudioBookSlides is an open source, AI-based project that generates video slideshows and subtitles from audiobooks.
It lets users extract audio from audiobooks, automatically generate matching slideshows and subtitles, and then assemble them into videos. It supports multiple models, can run locally, and integrates image generation and text-to-speech technology.

AudioBookSlides: Create an AI generated slide show and subtitles from an audio book.

 


This sample was created without the GPT API. You can find more free audiobooks here.

 01VenomLethal_G_vcs_01.mp4 

This is a demo contact sheet showing the images generated for “Venom Lethal Protector”. (Made with VideoCS, not this program.)

 v17_out_01.mp4 

Sample from “Venom Lethal Protector”. Click the muted speaker icon to enable audio when playing.

2024-2-21 Version 1.1.0 Update

 

  • Removed the GPT requirement. Characters and scenes can now be generated programmatically.
  • Randomized default actor assignments.
  • Changed the default Stable Diffusion model to photon_v1.safetensors. It generates about 10 images per minute on my NVIDIA 3060 12GB GPU (released February 25, 2021 at $329; $289 on Amazon, with other sellers listing it for under $100, possibly refurbished). The 1,221 images for the 5-hour Venom book took 2 hours to generate.
  • SD LCM support requires existing installations to update ComfyUI from the Manager menu.
  • default_config.yaml setting “keep_actors: 1” sets how many characters are allowed in a scene.
  • default_config.yaml setting “actor_priority: creature, actress, female” ensures creatures and women get priority over men.
  • default_config.yaml setting “LUFS_target: -17” automatically increases the volume of input files that measure under -17 LUFS (Loudness Units Full Scale).
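The LUFS_target setting can be read as a simple gain rule. The sketch below is an assumption about the arithmetic, not the project's own code: the function name gain_to_target is hypothetical, and it presumes the loudness of the input file has already been measured by some other tool.

```python
def gain_to_target(measured_lufs: float, target_lufs: float = -17.0) -> float:
    """Return the gain in dB needed to raise a file to the target loudness.

    Files already at or above the target get 0.0, since the setting is
    described as only increasing the volume of quieter inputs.
    """
    if measured_lufs >= target_lufs:
        return 0.0
    return target_lufs - measured_lufs


if __name__ == "__main__":
    # Hypothetical measurements, used only to show the rule.
    for measured in (-23.0, -17.0, -12.5):
        print(f"{measured} LUFS -> +{gain_to_target(measured):.1f} dB")
```

Under this reading, a file measured at -23 LUFS would be boosted by 6 dB, while a file at -12.5 LUFS would be left untouched.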

Installation of AudioBookSlides

 

  • You must set the path to your Stable Diffusion (ComfyUI or A1111) output folder in default_config.yaml.
  • Consider creating a conda (or mamba) environment for the installation.
  • The requirements are minimal with one specific version requirement: openai==0.28 (compatible with LM-Studio).
  • Python versions 3.9 and 3.10 have been used successfully.

Windows:

Windows Installation Steps

Linux:

WSL Installation Steps [whisperX update]

Optional GPT API Setup

 

  • 2024-2-21 Version 1.1.0 Update. GPT is no longer required. Scripts are now included to generate character and scene data locally. (GPT does a better job though.)
  • To use the GPT API, you need to sign up for an API Key. Register and get your key here.
  • Save your API key in a file named ABS_API_KEY.txt in the application folder.
  • The cost is approximately $1 for a 12-hour audiobook. New sign-ups might receive $20 free credit.
  • Alternatively, use the free LM-Studio Local GPT server. It’s about 3 times slower (1 hour vs 20 minutes) and less accurate. The recommended model is here.
  • Note: Not all requests have been optimized for LM-Studio. Some results may be suboptimal. This feature was utilized during development but has not been fully verified with the current installation.
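A minimal sketch of the key-file convention described above, assuming the app simply reads ABS_API_KEY.txt from its own folder. The helper name load_api_key is hypothetical and is not taken from the project. With openai==0.28 the returned value would typically be assigned to openai.api_key, and a local LM-Studio server would be selected by setting openai.api_base; neither step is shown here so the example runs without the package installed.

```python
from pathlib import Path
from typing import Optional


def load_api_key(app_dir: str, filename: str = "ABS_API_KEY.txt") -> Optional[str]:
    """Return the key stored in app_dir/filename, or None if absent or empty."""
    key_file = Path(app_dir) / filename
    if not key_file.is_file():
        return None
    # Strip whitespace so a trailing newline in the file does no harm.
    key = key_file.read_text(encoding="utf-8").strip()
    return key or None


if __name__ == "__main__":
    key = load_api_key(".")
    print("GPT API key found" if key else "No key file: falling back to local generation")
```

Returning None instead of raising keeps the GPT path optional, which matches the 1.1.0 note that GPT is no longer required.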

Overview of Processing

 

  • The application keeps track of its workflow and can be stopped or restarted at any time.
  • If it stops or you interrupt it, you can relaunch it and it will resume from where it left off.
  • The app will connect to the ChatGPT API to identify characters if you have configured an API key.
  • It may connect to GPT again to extract the scene/setting information for the image prompts.
  • The process will pause so you can edit, or keep unchanged, the default file used to replace characters with actors.
  • Default lists of actors are provided. By default, the app picks the replacement actor randomly.
  • You can use custom actor lists by editing the books\bookname\bookname.yaml file.
  • You can add guidance to the actor description, such as “long blond hair”, “20yo”, NSFW, etc.
  • You must save books/bookname/bookname_ts_p_actors_EDIT.txt as books/bookname/bookname_ts_p_actors.txt, then relaunch the app so it can continue.
  • The process will pause once the image generation requests have been submitted. Wait for all images to be created, then rename the ComfyUI “output” folder to “bookname” before continuing.
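The stop-and-resume behaviour described above can be sketched with a common pattern: each step writes one output file, and a step is skipped when its file already exists. This is only an illustration of that pattern, not the app's actual mechanism; run_resumable and the step file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# A step is (output file name, function that creates that file).
Step = Tuple[str, Callable[[Path], None]]


def run_resumable(book_dir: Path, steps: List[Step]) -> List[str]:
    """Run each step whose output file is missing; return the steps that ran."""
    book_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for output_name, action in steps:
        target = book_dir / output_name
        if target.exists():
            continue  # produced by an earlier run, so skip it
        action(target)
        executed.append(output_name)
    return executed


def write_placeholder(path: Path) -> None:
    """Stand-in for real work such as transcription or prompt building."""
    path.write_text("done", encoding="utf-8")


if __name__ == "__main__":
    book = Path(tempfile.mkdtemp()) / "bookname"
    # Hypothetical file names, chosen only for the demo.
    steps = [("bookname.srt", write_placeholder),
             ("bookname_prompts.txt", write_placeholder)]
    print(run_resumable(book, steps))  # first launch runs both steps
    print(run_resumable(book, steps))  # relaunch finds nothing left to do
```

One consequence of this design is that deleting an output file is enough to force that step to run again on the next launch.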

The finished files will be in a folder under the installation directory books\bookname:

$abs/
├── books
│   ├── BookName1
│   │   ├── Bookname1.avi
│   │   └── Bookname1.srt
│   ├── BookName2
├── ...

Tips on Managing Actors

 

  • Define each actor entry only once and map every variant of a name to that single chosen name. This reduces name collision issues. See the edited example below.
  • Characters are replaced with actors to keep each character's appearance consistent. This approach is simpler than trying to describe a particular character in detail.
  • Character names will be replaced with actor names from .csv files configured in default_config.yaml. The file will be sorted with actors on top and actresses below.
  • Due to the audiobook being transcribed with speech-to-text, actor names may often be misheard or misspelled. They might also be spoken in various forms, such as “John Smith”, “John”, “Smith”, or “Mr. Smith.”
  • You are responsible for identifying these cases and assigning a single actor to all variations.
  • Well-known characters like “Vampire”, “Santa Claus”, “Peter Rabbit”, or any other character the AI already knows how to render, can be omitted.
  • You are not required to enter actual actors. You can enter any name from popular culture the AI may recognize, as I did below with ‘Daenerys Targaryen’.
  • The “depth” key in default_config.yaml controls the minimum number of times a name must appear to be included in the initial replacement list. The default value is 4.
  • While you might not need to provide an actor name for a character mentioned only once, that single instance may actually be a misspelling of the main character. Therefore, you might prefer to set the depth to 1 in the config file and manually remove any unrecognized names.
  • It is important to order the various actor names from the longest to the shortest character length. For example, if the names “John” and “John Smith” both occur, place “John Smith” before “John” to ensure “John Smith” is correctly replaced.
  • Exercise caution when using actor names that are also names of characters in your book. If overlooked, you might end up replacing a part of an actor’s name with a different actor.
  • Actor replacements are case-sensitive: the names in the list are written with the same capitalization in which they appear in the transcript.
  • Be careful when replacing very short names that may be part of other words. For instance, entering “Pat ” (with a space after) will ensure that the letters in “Patterns danced…” are not replaced.
  • The format of this file is: count<tab>character name, _solo [age] <gender> [actor | actress] actor name @. Do not remove or alter the “count<tab>character,” portion of the text.
  • The delimiters _…@ are included in case you want to make targeted replacements or corrections to the final prompt file, specifically for actors and not other spoken text.
  • For non-GPT processing, if you suspect a character’s name is missing from your actor file, please follow these steps for possible correction:
    1. Open the file tokenizer_vocab_2.txt. This file acts as an English dictionary from which numerous names have been removed, including names scraped from U.S. baby names, census data, and 200,000 lines of eBooks. If the missing name is still listed there, remove it. For instance, I recently had to remove the name “Holmes” from this file.
    2. If you believe there are still names missing after the initial edit, you can set ‘use_dictionary: 0’ in the config file. Proceed by deleting all files except for bookname.mp3 and bookname.srt, and re-run the generation script (for example, abs 01SherlockHolmes). This process ensures that no names from the dictionary are filtered out during name generation.
    3. If you’re not satisfied with the results, you can set ‘use_speech_verbs: 0’ in the config file. This adjustment bypasses the validation check that requires proper names to be immediately followed by one of 500 different verbs, which indicate actions performed by a character. After this change, the sole criterion for a word to be considered a potential character name is capitalization, though this may result in some place names being included in your actor list. Ideally, you should review the list to identify and remove any such instances. (This match only needs to occur once, so for a prominent character, it is very likely to happen at least once.)
    4. I do not recommend setting both of these values to 0. If you do, you will probably get every word from the beginning of every sentence. Execution time will also increase.
    5. Or you can just add the missing character to the bookname_ts_p_actors_EDIT.txt file by copying and pasting an existing line, then changing the name, so that the delimiters stay intact.
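The ordering and short-name tips above can be illustrated with a small sketch. Where the app relies on the order of the lines in the actor file, this example sorts the names by length itself and replaces them in a single pass; replace_names and the sample names are hypothetical and not taken from the project.

```python
import re
from typing import Dict


def replace_names(text: str, replacements: Dict[str, str]) -> str:
    """Replace character names with actor text, trying longer names first."""
    if not replacements:
        return text
    # Longest first, so "John Smith" is tried before "John".
    ordered = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    # One pass over the text: inserted actor text is never scanned again,
    # and matching is case-sensitive by default.
    return pattern.sub(lambda match: replacements[match.group(0)], text)


if __name__ == "__main__":
    mapping = {
        "John": "actor A",
        "John Smith": "actor A",
        "Pat ": "actress B ",  # trailing space keeps "Patterns" untouched
    }
    print(replace_names("John Smith met Pat near the Patterns. John waved.", mapping))
```

Because the text is scanned only once, an actor name that happens to contain a character name is not replaced a second time, which is the collision the tips warn about.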

This is an example of a default vs. manually edited actor list.

 

[Default] books/01ThisHour/01ThisHour_ts_p_actors_EDIT.txt

100 Theodora, _solo female actress Anna Paquin @
62 Moxie, _solo 25yo female actress Naomi Watts long blond hair @
50 Ellington, _solo 25yo female actress Florence Pugh @
40 Snicket, _solo 30yo male actor Al Pacino @
18 Ellington Faint, _solo 30yo female actress Meg Donnelly @
17 Pip, _solo 30yo male actor Matthew Lewis @
16 Hangfire, _solo male actor George Clooney @
13 Qwerty, _solo 25yo male actor Keanu Reeves @
12 S. Theodora Markson, _solo 40yo female actress Summer H. Howell @
12 Hector, _solo 12yo male actor Colin Farrell @
11 Prosper Lost, _solo 12yo male actor Christopher Walken @
10 Lemony Snicket, _solo 45yo male actor John Barrowman @
7 Harvey Mitchum, _solo 40yo male actor Anthony Heald @
7 Stu, _solo male actor Dwayne Johnson @
7 Mrs. Sallis, _solo 50yo female actress Markella Kavenagh @
7 Squeak, _solo 30yo male actor Michael Caine @
6 Harvey, _solo 35yo male actor Cary Grant @
6 Mimi Mitchum, _solo 30yo female actress Madison Lintz @
6 Mitchum, _solo 40yo male actor Gene Hackman @
5 Malahan, _solo 45yo male actor David Harbour @
5 Father, _solo 50yo male actor Henry Cavill @
4 Mother, _solo 30yo female actress Maia Mitchell @
4 Mrs. Salas, _solo 60yo female actress Amelia Clarkson @
4 Quirty, _solo female actress Camila Morrone @
4 Mr. Snicket, _solo 40yo male actor Javier Bardem @

[Edited] books/01ThisHour/01ThisHour_ts_p_actors.txt

10 Lemony Snicket, Snicket
4 Mr. Snicket, Snicket
40 Snicket, _solo 30yo male actor Patrick Warburton@
12 S. Theodora Markson, Theodora
100 Theodora, _solo 25yo green eyed female Daenerys Targaryen@
62 Moxie, _solo 25yo female actress Naomi Watts curly hair@
18 Ellington Faint, Ellington
50 Ellington, _solo 25yo sexy female actress Margot Robbie@
7 Stu Mitchum, _solo 12yo male actor Brad Pitt@
7 Stu , _solo 12yo male actor Brad Pitt@
Harvey Mitchum, Harvey
Harvey, _solo 35yo male actor Gene Hackman@
6 Mimi Mitchum, _solo 30yo female actress Angelina Jolie@
Quirty, Qwerty
13 Qwerty, _solo 25yo male actor Keanu Reeves@
4 Murphy Sallis, Murphy
7 Sally Murphy, Murphy
7 Mrs. Sallis, Murphy
7 Mrs. Salas, Murphy
4 Murphy, _solo 55yo female actress Sharon Stone@
17 Pip , Peuchet
17 Pecuchet, _solo 30yo male actor Jet Li@
7 Bouvard, Squeak
7 Squeak, _solo 30yo male actor Ken Watanabe@
11 Prosper Lost, _solo 55yo male actor Christopher Walken@
16 Hangfire, _solo male actor Rich Litle@
12 Hector, _solo 12yo male actor Colin Farrell@
5 Malahan, _solo 45yo male actor David Harbour@
5 Father, _solo 50yo male actor George Clooney@
4 Mother, _solo 40yo female actress Michael Pfeiffer@
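To make the line format concrete, here is a hedged sketch of a parser for one line of the actor file. It assumes the count and the character name are separated by a real tab, as the format description states, and it treats any replacement that is not wrapped in the _…@ delimiters as an alias for another name. ActorLine and parse_actor_line are hypothetical names, not part of the project.

```python
from typing import NamedTuple, Optional


class ActorLine(NamedTuple):
    count: Optional[int]  # None when the line carries no count
    character: str        # kept verbatim: a trailing space can be deliberate
    replacement: str
    is_alias: bool        # True when the line maps one name to another name


def parse_actor_line(line: str) -> ActorLine:
    """Split 'count<tab>character, replacement' into its parts."""
    line = line.rstrip("\n")
    head, tab, rest = line.partition("\t")
    if tab and head.strip().isdigit():
        count: Optional[int] = int(head)
    else:
        count, rest = None, line
    # The character name is everything before the first comma.
    character, _, replacement = rest.partition(",")
    replacement = replacement.strip()
    is_alias = not (replacement.startswith("_") and replacement.endswith("@"))
    return ActorLine(count, character, replacement, is_alias)


if __name__ == "__main__":
    print(parse_actor_line("40\tSnicket, _solo 30yo male actor Patrick Warburton@"))
    print(parse_actor_line("10\tLemony Snicket, Snicket"))
```

The character field is deliberately not stripped, so an entry such as “Stu ” keeps the trailing space that protects longer words from partial replacement.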

TODO List for AudioBookSlides

 

  •  1) Test wildcard input folder with multiple MP3 files.
    • Verified 3 files (chapters) concatenated:
      python abs.py 01ThisHour "E:\Media\AudioBooks\Lemony Snicket\All the Wrong Questions 1 - Who Could That Be at This Hour\*.mp3"
      
  •  2) Test spaces in BookName and MP3 path.
  •  3) Test on Windows Subsystem for Linux (WSL).
  •  4) Test on system A1111 (note: some manual steps required).
  •  5) Test input with different audio formats (.WAV, .AAC). (ffmpeg does not support .m4b containing images so rename those to .aac and they will work)
  •  6) Finish Win/Whisper upgrade
  •  7) Enable Tortoise-TTS text-to-speech to convert text eBooks to .mp3 with AI narrator. Sample: Cave Johnson from “Portal” video game reads “Oil Slick” by Warren Murphy.
 016OilSlick_cave_G_01.mp4 
