Based on a demo Thijs built two years ago, I built some examples: https://mydemoversion8-sandbox.mxapps.io/p/speech. They cover recording, playback, and text-to-speech. Speech-to-text is not yet in the Marketplace, and the module Thijs refers to has yet to be published. In the meantime, you can try to build something yourself using another speech-to-text provider such as https://get.otter.ai/ or the Web Speech API (as implemented in Google Chrome).
There is a module that converts speech to text. Once you have the text, you can apply string operations to recognize the commands and then attach a certain action to each command.
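As a rough illustration of the string-operations idea, here is a minimal TypeScript sketch. It assumes you already have the recognized text as a plain string. The command names, the trigger phrases, and the helpers `normalize` and `matchCommand` are all made up for this example and are not part of any Mendix module.

```typescript
// Hypothetical command table: each command name maps to the phrases that trigger it.
type CommandTable = { [name: string]: string[] };

const commands: CommandTable = {
  openOrders: ["open orders", "show orders"],
  createCustomer: ["new customer", "create customer"],
};

// Lowercase the transcript and strip punctuation so matching is less brittle.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Return the name of the first command whose phrase occurs in the transcript,
// or null when nothing matches.
function matchCommand(transcript: string, table: CommandTable): string | null {
  const cleaned = normalize(transcript);
  for (const name of Object.keys(table)) {
    const phrases = table[name];
    if (phrases.some((phrase) => cleaned.indexOf(phrase) !== -1)) {
      return name;
    }
  }
  return null;
}
```

With this table, `matchCommand("Please, open orders!", commands)` returns `"openOrders"`, and a transcript with no known phrase returns `null`. In the app you would then map the returned name to whatever action you want to trigger. Plain substring matching is the simplest option and will miss paraphrases, so treat it as a starting point only.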
I hope you are doing well and have already found a good answer to this request. If you are looking for the best voice API, I have shared a list of the top 11 voice API providers here,