use Edge
========
scroll the page automatically, every 500 ms (paste into the DevTools console):

setInterval(function() { window.scrollBy(0, 100); }, 500);

then save the page as complete, with images

use tesseract to convert the images to text
===========================================
# tesseract appends .txt to the output base name itself, so pass "output/${file%.*}", not "output/${file%.*}.txt"
# the [ -e "$file" ] test skips patterns that matched no files
mkdir -p output && for file in *.jpg *.jpeg *.png; do [ -e "$file" ] || continue; tesseract "$file" "output/${file%.*}"; done

# cat *.txt concatenates all .txt files in the current directory (run this inside output/)
# sed 's/flat/pancake/g' replaces all occurrences of "flat" with "pancake"
# > merged.txt redirects the output to a file named "merged.txt"
# && is a logical AND that only runs the next command if the previous one succeeded
# espeak-ng -f merged.txt -w output.wav converts the text in "merged.txt" into a .wav file named "output.wav"
cat *.txt | sed 's/flat/pancake/g' > merged.txt && espeak-ng -f merged.txt -w output.wav

if no substitution is needed, plain concatenation also works: cat *.txt > merged.txt

convert wav to mp3
====================
lame output.wav output.mp3
(an ffmpeg alternative is sketched below)

using Coqui TTS (the maintained fork of Mozilla TTS), see https://docs.coqui.ai/en/latest/inference.html
====================================================================
pip install TTS

import torch
from TTS.api import TTS

# Pick a device (the Coqui docs default to GPU when available)
device = "cuda" if torch.cuda.is_available() else "cpu"

# List available models
print(TTS().list_models())

# Init TTS with a single-speaker English model
tts = TTS(model_name="tts_models/en/ljspeech/tacotron2-DDC", progress_bar=False).to(device)

# Read the text file
with open('input.txt', 'r') as file:
    text = file.read()

# Run TTS
tts.tts_to_file(text=text, file_path="output_moz.wav")

# With a multilingual voice-cloning model (e.g. xtts), speaker_wav and language must be set:
# wav = tts.tts(text="Hello world!", speaker_wav="~/cloning/audio.wav", language="en")  # returns a list of amplitude values
# tts.tts_to_file(text="Hello world!", speaker_wav="~/cloning/audio.wav", language="en", file_path="output.wav")

re-number files 1 to total
==========================
count=1; for file in *.txt; do mv -- "$file" "$count.txt"; ((count++)); done
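
zero-padded variant (a sketch, not in the original notes): plain 1.txt, 2.txt, ... sorts lexicographically, so cat *.txt reads 10.txt before 2.txt; zero-padding keeps glob order and page order the same. Assumes the files' current glob order already matches the page order.

count=1; for file in *.txt; do printf -v name '%04d.txt' "$count"; mv -- "$file" "$name"; ((count++)); done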
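
ffmpeg instead of lame
======================
a hypothetical drop-in if lame is not installed (not from the original notes); ffmpeg does the same wav-to-mp3 conversion, and -b:a sets the audio bitrate:

ffmpeg -i output.wav -b:a 192k output.mp3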
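
whole pipeline in one script
============================
a sketch stitching the steps above together, using only the commands from these notes; assumes tesseract, espeak-ng and lame are on PATH and at least one image is in the current directory (otherwise cat receives no files):

#!/usr/bin/env bash
set -euo pipefail
shopt -s nullglob                          # unmatched globs expand to nothing
mkdir -p output
for file in *.jpg *.jpeg *.png; do
    tesseract "$file" "output/${file%.*}"  # tesseract appends .txt itself
done
cat output/*.txt | sed 's/flat/pancake/g' > merged.txt  # optional substitution; glob order is lexicographic
espeak-ng -f merged.txt -w output.wav      # -v and -s would change voice and speed
lame output.wav output.mp3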