On-premise video indexing with visual data
Current video indexing is mostly done by converting speech to text and then indexing the transcript. This is done mainly to save compute resources, and because algorithms that index video by its visual content are not as mature as those for text.
e.g. Searching for a video of an old man with red shoes wouldn't yield results under speech-to-text indexing unless the audio in the video happened to describe that scene.
It is said that even YouTube hasn't indexed all of its videos by visual data, due to the sheer number of videos. There are APIs available, like Microsoft's, which claim to enable video indexing with visual metadata, but they are not on-premise solutions, so using them can rack up large bills when the number of videos is high.
There is a need gap for on-premise video indexing technology.
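To make the contrast with transcript search concrete, here's a toy sketch of what a visual index could look like: an inverted index mapping visual labels (as an on-premise object detector might emit per frame) to timestamps. The class name, labels, and detector output format are all illustrative assumptions, not any particular product's API:

```python
from collections import defaultdict

class VisualIndex:
    """Toy inverted index: visual label -> list of (video_id, timestamp)."""

    def __init__(self):
        self.index = defaultdict(list)

    def add_frame(self, video_id, timestamp, labels):
        # 'labels' would come from an on-premise object detector in practice.
        for label in labels:
            self.index[label].append((video_id, timestamp))

    def search(self, query_labels):
        # Return locations where ALL query labels co-occur in one frame.
        hits = None
        for label in query_labels:
            locs = set(self.index.get(label, []))
            hits = locs if hits is None else hits & locs
        return sorted(hits or [])

idx = VisualIndex()
idx.add_frame("vid1", 12.0, ["old man", "red shoes", "park bench"])
idx.add_frame("vid1", 40.5, ["dog"])
idx.add_frame("vid2", 3.2, ["red shoes"])

print(idx.search(["old man", "red shoes"]))  # → [('vid1', 12.0)]
```

A speech-to-text index has the same shape but is keyed on transcribed words, so the "old man with red shoes" query only matches if someone said those words on the audio track.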
I was just noodling over something similar the other day with regard to the "speech-to-text" bit. I know that when you speed up videos on YouTube, the audio speeds up as well. I always wondered whether the speech-to-text algorithms were able to transcribe that. My first thought was "the NSA can probably do this".
YouTube does use speech-to-text for indexing its videos. The transcriptions are associated with timestamps, so if playback is sped up the text still syncs fine with the audio.