@Nbasder9 There is, but it might not be as reliable as you'd expect, and it has a few drawbacks.
First, it requires an Nvidia GPU with at least 3GB of VRAM.
Second, it does not support language detection; you must manually set the source language for each video.
Third, it sometimes fails to read the audio directly from a video file, so you have to convert the video to an audio file (mp3 or wav) first.
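If you need to do that conversion, one option is ffmpeg. It isn't part of WhisperDesktop, so treat it as my own suggestion. Here's a rough Python sketch that assumes ffmpeg is installed and on your PATH. The function names are made up for illustration. It drops the video stream and writes 16 kHz mono WAV, since Whisper models work on 16 kHz audio internally anyway.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that discards the video stream and writes
    16 kHz, mono, 16-bit PCM WAV audio."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it already exists
        "-i", video_path,     # input video
        "-vn",                # discard the video stream
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit PCM, the usual WAV encoding
        wav_path,
    ]


def extract_audio(video_path: str) -> Path:
    """Run ffmpeg (assumed to be installed and on PATH) and return the WAV path.
    The WAV is written next to the video with the same base name."""
    wav_path = Path(video_path).with_suffix(".wav")
    subprocess.run(build_ffmpeg_cmd(video_path, str(wav_path)), check=True)
    return wav_path
```

For example, `extract_audio("my_video.mp4")` (hypothetical file name) should leave `my_video.wav` next to the original, which you can then open in WhisperDesktop instead of the video.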
This is for Windows users; if you're on another platform, you'll need a different program. If you want to give it a go, download "WhisperDesktop.zip" here: https://github.com/Const-me/Whisper/releases
Then download a language model from here: https://huggingface.co/ggerganov/whisper.cpp/tree/main
I recommend "ggml-medium.bin", as it offers decent transcription and works well on low-VRAM cards. If you have a high-end card with 8GB or more, you can try "ggml-large-v3.bin". The larger model gives more accurate results but takes slightly more time to transcribe.
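If you'd rather script the model choice and download, here's a small Python sketch. The 3GB and 8GB thresholds come from the advice above. The helper names are hypothetical, and the download URL is my assumption: it uses Hugging Face's usual `/resolve/main/` direct-download form of the page linked above.

```python
import urllib.request
from pathlib import Path

# Assumed direct-download base for the whisper.cpp model repo linked above.
BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/"


def pick_model(vram_gb: float) -> str:
    """Pick a model file following the advice above (hypothetical helper)."""
    if vram_gb < 3:
        raise ValueError("WhisperDesktop needs at least 3 GB of VRAM")
    if vram_gb >= 8:
        return "ggml-large-v3.bin"  # more accurate, a bit slower
    return "ggml-medium.bin"        # decent quality, fine on low-VRAM cards


def model_url(filename: str) -> str:
    """Build the direct-download URL for a model file."""
    return BASE_URL + filename


def download_model(vram_gb: float, dest_dir: str = ".") -> Path:
    """Download the recommended model into dest_dir and return its path."""
    filename = pick_model(vram_gb)
    dest = Path(dest_dir) / filename
    urllib.request.urlretrieve(model_url(filename), str(dest))
    return dest
```

The model files are large (several hundred MB to a few GB), so downloading them through the browser works just as well. This sketch only saves you from picking the file by hand.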