OpenAI Whisper - MultiLingual AI Speech Recognition Live App Tutorial

1littlecoder

1 year ago

40,891 views

Comments:

layla Bitar
layla Bitar - 31.05.2023 10:15

How can you have it process multilingual audio?

George Mathew
George Mathew - 25.05.2023 02:20

Very cool. Can we use OpenAI Whisper for IVR telephony? It needs to address clients in multiple languages, like Hindi, Telugu, Malayalam, Tamil, and English, and respond accordingly.

Bassam
Bassam - 17.05.2023 17:56

It is so confusing.

Anukiran Ghosh
Anukiran Ghosh - 12.05.2023 15:28

Do you have a tutorial for translation? Can you please help me out? Just like the `def transcribe` function, I want a `translate` function that I can integrate with Gradio, but I can't make the translation work. Can you please share the code for translation?

Rayden X
Rayden X - 19.04.2023 02:35

I would recommend Streamlit for building the front-end interface.

Tapan Ray
Tapan Ray - 12.03.2023 23:34

Hello, the video is really helpful for me. I am trying to build ASR for the Sanskrit language, but it is not working. Could you explain how to train on Sanskrit data, or point me to any videos that will help me build a Sanskrit ASR? I have parallel Sanskrit data.

George Patronus
George Patronus - 11.02.2023 12:51

To run the OpenAI Whisper large model, how does the RTX 4090 compare to this setup on AWS: an NVIDIA A10G Tensor Core GPU (g5.xlarge with 16GB RAM)? Can I expect faster or slower transcription with the 4090?

George Patronus
George Patronus - 11.02.2023 12:10

Can the RTX 4090 run the OpenAI Whisper large model well on a 12th Gen i9 rig with a 1TB NVMe SSD and 64GB of DDR5 RAM?

Dos Hanif
Dos Hanif - 07.02.2023 05:13

Hi, thank you for your tutorial. I've tried your web UI and hope it can help me transcribe some of the discussions related to my job. Unfortunately, every time I tried to use it, it failed. Is it because of the size of the recorded audio (more than one hour)? Please help. Thank you.

David Thompson
David Thompson - 17.01.2023 17:37

Great video, thanks for sharing.
Non-coder here, but I see great application for this in terms of improving productivity. I was wondering:
1. How straightforward would it be for a non-coder to install it on Windows?
2. I see it cannot currently differentiate between two different speakers; is that something in the pipeline?

Use case: I have been looking for a tool that will take my recorded meeting conversations and transcribe them with proper formatting, differentiating between the participants. I wonder if it's possible to achieve this with Whisper or another tool?

Thanks

Anna Acedo Ortega
Anna Acedo Ortega - 12.01.2023 14:24

Hello, thank you so much for your tutorial. I am trying to use Whisper for my master's thesis in translation technologies. The only issue I had was that after importing Gradio and recording a short clip live for Whisper to transcribe, it doesn't work; it just keeps loading forever, even if it's only a 6-second audio clip. What do you suggest I do? Thank you again from Spain!

Tejas Narola
Tejas Narola - 06.01.2023 14:51

Best content! Thanks.
Can we calculate a confidence interval for each transcribed word?

Avijit barua
Avijit barua - 25.12.2022 07:25

Hello sir, I have watched all your videos.
I am a big follower of yours!
Please tell me how I can convert a long Bangla-language MP3 to Bengali text.
Please make a video about this topic.

REAL VIBES TV
REAL VIBES TV - 22.12.2022 11:31

Can you use this in Unreal Engine?

Engg M. Ali Mirza Short Clips, Whatsapp Status
Engg M. Ali Mirza Short Clips, Whatsapp Status - 12.12.2022 12:34

love from Pakistan :)

App Stuff
App Stuff - 10.12.2022 14:55

May I ask: once the web demo with a basic UI is done in Gradio, how can we migrate it to a proper standalone web app? Can you please guide a little?

App Stuff
App Stuff - 10.12.2022 12:32

Thank you for this. Subbed!

Gowtham Dora
Gowtham Dora - 18.11.2022 20:47

Bro, really amazing content, hats off to you.

IdeaAi
IdeaAi - 18.11.2022 20:37

Hi! Do you know if it's possible to do it in Node.js? How can you use Whisper in a web app?

ABHIGNA CONSCIENCE
ABHIGNA CONSCIENCE - 04.11.2022 01:20

Can it do real-time transcription instead of processing an audio file?

Danish a
Danish a - 26.10.2022 22:19

Hey @1littlecoder, can we train this model on our own dataset?

Dimoris Chinyui
Dimoris Chinyui - 30.09.2022 00:12

Hey guys, can anyone please help me with this issue? I am trying to run Whisper on my machine and I am getting this error in cmd: `UserWarning: FP16 is not supported on CPU; using FP32 instead`.
I use Windows 10 with an RTX 2060 GPU, and it seems to run on my CPU instead of the NVIDIA GPU. For more detail: I created a Python virtual environment and pip-installed Whisper into it.

Arun Kumar
Arun Kumar - 28.09.2022 12:41

Indian accent vs. British accent: does it show any difference, or does it just recognize both as the English language?

concretec0w
concretec0w - 27.09.2022 15:43

Love the channel, you should have many more subs! ❤

Shyam Siddarth
Shyam Siddarth - 27.09.2022 12:57

Thanks to OpenAI and thanks to you, boss. If that 5-second limitation weren't there, I was thinking we could use this for our podcast transcription. Thanks for making this video.

Abhilekh Kalita
Abhilekh Kalita - 26.09.2022 09:11

Thanks for sharing.

Ashutosh Kumar
Ashutosh Kumar - 26.09.2022 08:00

Great work !

Homeless in America
Homeless in America - 26.09.2022 01:19

Thank you!

I have two questions:

1) When I try to run the notebook, it says "no such file" after I upload my audio file. How do I make sure it can access the audio file?

2) When I check model.device, it returns cpu instead of cuda. How do I change this?

byGDur
byGDur - 25.09.2022 00:28

Kudos to you if you prepared the Colab files!

Flawed Thoughts
Flawed Thoughts - 23.09.2022 04:27

This is a great demo, thank you!

I am new to programming. Can our local machines handle this, or should we do it in Google Colab?

Alastair van Heerden
Alastair van Heerden - 22.09.2022 23:52

Thank you! Are you able to explain how to do simple voice activity detection with this model?

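Whisper itself does not expose voice activity detection, but a crude energy-based VAD can decide which chunks are worth sending to the model. A self-contained sketch of that idea (in practice a trained detector such as Silero VAD is far more robust):

```python
import numpy as np


def voice_activity(samples: np.ndarray, sr: int = 16000,
                   frame_ms: int = 30, threshold: float = 0.02):
    """Return (start_sec, end_sec) spans whose RMS energy exceeds threshold.

    A crude energy-based VAD over fixed-size frames; `threshold` is relative
    to full-scale float audio in [-1, 1] and needs tuning per recording.
    """
    frame = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame
    spans, start = [], None
    for i in range(n_frames):
        chunk = samples[i * frame:(i + 1) * frame]
        rms = float(np.sqrt(np.mean(chunk ** 2)))
        if rms > threshold and start is None:
            start = i * frame / sr          # speech begins
        elif rms <= threshold and start is not None:
            spans.append((start, i * frame / sr))  # speech ends
            start = None
    if start is not None:                   # audio ended mid-speech
        spans.append((start, n_frames * frame / sr))
    return spans
```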
Ayush Singhal
Ayush Singhal - 22.09.2022 15:44

Can I integrate Whisper into my Android application? Are there any API keys for this?

CosmicVibing
CosmicVibing - 22.09.2022 13:58

Hello, how do you import audio files? I'm stuck on the 4th step.

Chris Lloyd
Chris Lloyd - 22.09.2022 13:07

How do I download the models and weights without using Colab, e.g. into a local conda env? I can't see any way to do this on the GitHub page.

Sathish Kumar
Sathish Kumar - 22.09.2022 04:39

Superb

fedahumada
fedahumada - 22.09.2022 03:13

Hi, and thank you! I find your content so inspiring! Definitely trying this app.

Azmo
Azmo - 22.09.2022 01:01

A comparison to speech recognition on the Google Pixel 6 Pro would be interesting.

Chrontexto
Chrontexto - 22.09.2022 00:21

Thank you for the tutorial.

When I tried to step through your Gradio app, I got errors when trying to import your audio clips.
When I disconnected and copied your code to my own Google Drive, I was able to at least record audio with my own microphone and see Whisper transcribe up to 30 seconds.

chaithanya vamshi
chaithanya vamshi - 22.09.2022 00:20

Golden Content! Just started working on a project and this is a very helpful resource to implement. Thank you!

musicspinner
musicspinner - 22.09.2022 00:00

If it could distinguish and tag/timestamp multiple speakers in a recording (e.g. of a meeting), that would be awesome.
