yeah i don't know for story yet, but it will come. again, we need small models that are locally installed, they will be a far cry from chatgpt, but they don't need to be anything like it either, they just need to be better than what we have currently in open world games. Would be nice to be able to ask your companion to do anything you can do in the game. e.g "go fight X" or "go sell Y for me" etc.
It's a bit of a shame that talking isn't used as a control scheme. It's quite a nice (and natural) way to control stuff, and we've had great speech recognition for 20+ years.
That’s nonsense. Maybe if you know English, and have no accent, and live in the quietest area. But otherwise, speech to text has been pretty terrible until recent years. They still fail, for example, if wind blows in the direction of my headphones. Imagine playing a game, trying to bark commands, but those fail because it’s hot in your country and the fan is blowing in your face.
Even if STT is perfect, having to speak to play is not always feasible. What if you live in a small apartment, where your partner or kids are trying to sleep? Or you are trying to play on the train on your Deck? What if you are a console moron without a single original thought in your single brain cell, not wanting to think about sentences (“if I wanted to speak, I’d read a book”) and just want to sit on the couch?
i used it a ton around 20 years ago (if not more!). i controlled games like SWAT 3 with it, for example, and had 0 issues. the program i used was called Game Commander. You trained it with your own voice, so it depended very much on how well you trained it. you could record the same word several times, perhaps 4 times, each a bit differently, so that even if you kind of mumbled it would still get it.
Yes it has to be at least somewhat quiet, yes you need a proper mic that won't pick up too much background noise (like game noise, if you're not using headphones). But again, it was 20 years ago, so i'm sure it must have evolved somewhat.
edit: it was 22 years ago, because SWAT 3 came out in 2001.
of course it's not good for all games, or for all commands, i shouldn't need to tell you that
I guess that was not real speech recognition, they probably had precalculated DFT signatures of possible words the game recognizes, and then just compared it with DFTs of your words.
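A rough sketch of that guess, for illustration only: store the DFT magnitudes of each trained take as a signature, then pick the word whose signature is closest to the new utterance. This is an assumption about the general idea, not how Game Commander actually worked. The names `dft_magnitudes`, `train`, `recognize` and `tone`, the 16 bins and the 0.8 threshold are all made up, and synthetic tones stand in for recorded words. A real recognizer would also compare short frames over time rather than one spectrum per utterance.

```python
import cmath
import math


def dft_magnitudes(samples, bins=16):
    """Magnitudes of the first `bins` DFT coefficients (naive DFT, stdlib only)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(bins)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two signatures; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train(recordings, bins=16):
    """Turn {word: [take, take, ...]} into {word: [signature, ...]}.

    Keeping one signature per take means several differently spoken
    takes of the same word can all match later.
    """
    return {
        word: [dft_magnitudes(take, bins) for take in takes]
        for word, takes in recordings.items()
    }


def recognize(samples, signatures, threshold=0.8, bins=16):
    """Return the best-matching trained word, or None if nothing reaches the threshold."""
    probe = dft_magnitudes(samples, bins)
    best_word, best_score = None, threshold
    for word, sigs in signatures.items():
        for sig in sigs:
            score = cosine_similarity(probe, sig)
            if score >= best_score:
                best_word, best_score = word, score
    return best_word


def tone(cycles, n=64, amp=1.0):
    """Synthetic stand-in for a recorded word: a sine with `cycles` periods in n samples."""
    return [amp * math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


# Two takes of "go" at different loudness, one take of "stop".
signatures = train({"go": [tone(3), tone(3, amp=0.7)], "stop": [tone(7)]})
```

Calling `recognize(tone(3, amp=0.9), signatures)` picks "go" even though that loudness was never trained, while a tone that matches no trained word falls below the threshold and gives `None`.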
it's speech recognition of course. but yes that's pretty much how it works here.
The game itself does not recognize words at all because there was no such thing. You needed Game Commander (or similar) for it to work, it binds speech recognition to keys.
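The "binds speech recognition to keys" part can be sketched like this. The phrases, the key names and the `send_key` callback are hypothetical placeholders, not real Game Commander or SWAT 3 bindings; the point is only that the game sees an ordinary key press and never the voice.

```python
# Hypothetical phrase-to-key table; a real setup would use the game's own bindings.
KEY_BINDINGS = {
    "stack up": "F1",
    "breach": "F2",
    "fall in": "F3",
}


def command_to_key(recognized, bindings=KEY_BINDINGS):
    """Look up the key bound to a recognized phrase, or None if it is unbound."""
    if recognized is None:
        return None
    return bindings.get(recognized.strip().lower())


def dispatch(recognized, send_key, bindings=KEY_BINDINGS):
    """Send the bound key through `send_key` (a placeholder for whatever
    actually injects key presses) and return it; do nothing if unbound."""
    key = command_to_key(recognized, bindings)
    if key is not None:
        send_key(key)
    return key
```

With a list's `append` standing in for `send_key`, `dispatch("Breach", pressed.append)` records "F2", and an unknown phrase records nothing.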
it's all in the implementation, it can be done poorly and it can be done well. i have it for my TV and it's terrible, my father has it in his car for the GPS, it's terrible too. if it's gonna be good it probably needs to be trained on your specific voice for a bit + you need a good mic, the latter is extremely important of course. I've always had good mics around since i've been doing music for 35 years, so yes it can be a bit of an investment. but then again, any controller is..
State of the art personal assistants in $1500+ phones can’t really perform speech to text properly, but surely your two and a half commands in some silly app worked flawlessly. Sure, why not. You fail to appreciate the orders-of-magnitude difference in complexity between these shitty late 90s/early 2000s “command recognizers” and the general purpose speech to text required for something like what we are discussing.
So what, those phones still don't have professional microphones and the speech recognition is likely the usual garbage where it doesn't train on your voice first. Thus it's a pretty terrible implementation of it, it's just a quick hack job that might work under perfect conditions.
We've come a really long way in recent years when it comes to extracting / examining frequencies. SpectraLayers can extract e.g. vocals from a track rather easily... a total fantasy 22 years ago.
and no, i understand the difference and i understand it's been 22 years of advancement, we can absolutely do perfect speech recognition, the problem lies in cost / implementation, not that it would be impossible to do. A garbage mic isn't gonna cut it because it's going to pick up noise and it won't be great for interpreting frequencies in great detail either.
I don't think it's the mic or frequencies. If two humans are able to speak on the phone, in the same windy condition, and understand each other, then the algorithms are still not good enough. These days, even earbuds and headphones have OK mics. Nobody but tardtubers are going to buy a small diaphragm shotgun mic, installed on a vibration reducing shock mount, with a large deadcat and a pop filter, just so they can yell at their video game.
My guess as to why speech to text is not yet properly solved in 2023 is that not enough attention has been paid to it. A lot more attention has been paid in the last ten years to computer vision algorithms, for example. Who knows, maybe video games will be the catalyst for advancements in this field.
There was this RTS EndWar ~15 years ago where you could give commands to your units by voice. Tried that, but it was still way quicker and more convenient to do it "the old way".
Who actually wants to talk to their computer though?
I certainly would never use this feature
feels weird at first but it's super convenient. especially so in games that are complex and have a lot of commands, e.g. you have companions you can ask to do things, but with a keyboard you'll have to remember perhaps 15 commands. with voice you never need to look at a cheat sheet.
you also free up your hands, so if it's an especially intense game you can execute complex commands and still be on the move, shoot etc.
i think one of the reasons games are often limited like that is because the keyboard itself is a limited controller, and you would scare away players if you told them "yes, you have to remember these 20 keyboard shortcuts for your companion!" it wouldn't work well unless it's a slow game or a game with a big on-screen interface where you can just click the commands.
@LeoNatan i think for current speech to text with an "unlimited" number of words, it needs to be really accurate to not mix things up, might even need training on your voice to be really accurate. But for something similar to the program i explained you just need a good mic that doesn't pick up noise. my success was likely due to having a really good mic, while most people use shit mics. or i was good at training it, or it was a mix of both these things. i do remember going in with 0 expectations of it, and was really surprised when it worked the way it did.
Rockstar uploaded a video and set it to private immediately a couple of days ago. Everyone thought it was the first trailer for Grand Theft Auto VI but I guess it was this
"There is much more still to come, including ongoing weekly special events and bonuses, festive celebrations, gifts, surprises, as well as plans to bring the much-requested PlayStation 5 and Xbox Series X|S features of GTA Online to the PC platform in the new year. Please stay tuned to the Rockstar Games Newswire for details."