Create an AI Voice Assistant using Python

You have watched Iron Man, right? Have you ever wondered what it would be like to have your own JARVIS, your own AI voice assistant? Just imagine how easy it would be to search Wikipedia or Google, play videos on YouTube, or even send e-mails with a single voice command. In this article, I will show you how to make your own AI personal assistant using Python.

What can the Assistant do?

It can play music for you.
It can do Wikipedia and Google searches for you.
It is capable of opening websites like Google, YouTube, etc., in a web browser.
It is capable of opening your Applications with a single voice command.
And more.

Without wasting much of your time, let’s start making your A.I.!

Open your IDE

Open your preferred IDE. I am going to use VS Code, but you can use any editor. Start a new project and create a file named assistant.py.

Speak Function

For our AI assistant to talk, it needs a way to turn text into speech. For that, we will define a speak() function. It will take the text to say as an argument (named audio) and then pronounce it.

def speak(audio):
    pass  # placeholder; we will write the body later

Next, we need something that can turn that text into sound. For that, we will use the Python module called pyttsx3.

What is pyttsx3?

pyttsx3 is a Python text-to-speech library: it converts text to speech, and it works offline.

To install it, open cmd or a terminal and type the following command.

pip install pyttsx3

If you get an error such as

No module named win32com.client
No module named win32
No module named win32api

then install pypiwin32:

pip install pypiwin32

After successfully installing pyttsx3, import this module into your program.

import pyttsx3

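engine = pyttsx3.init('sapi5')  # 'sapi5' selects the Windows speech driver
voices = engine.getProperty('voices')  # list of the voices installed on the system
engine.setProperty('voice', voices[0].id)  # use the first installed voice

Now, what is sapi5?
- SAPI (Speech API) is an API developed by Microsoft to allow the use of speech recognition and speech synthesis within Windows applications. Because of that, this line works on Windows only.
What is VoiceId?
- The voice id lets us select one of the voices installed on the system.
- voices[0].id = usually a male voice on a default Windows installation
- voices[1].id = usually a female voice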

Writing our speak() function

Let’s program our speak() function that we created earlier, so that it will convert our text to speech.

def speak(audio):
    engine.say(audio)  # queue the text to be spoken
    engine.runAndWait()  # without this call, the speech will not be audible

Now that’s done, let’s create our main() function

main() Function

Now we will create the entry point of our assistant, the if __name__ == "__main__" block, and call our speak() function in it.

if __name__ == "__main__":
    speak("Hello, Geek!")

Now, whatever text you pass to the speak() function will be converted into speech.

Woohoo 🥳🥳! Now our assistant has its own voice and can speak.

Creating different functions that our AI can perform

wishme() Function

After starting the AI, we want it to greet us first, right? For that, we will create a wishme() function, so that it can greet us according to the time of day.

To provide the current time and date to our AI, we will import a module called datetime.

import datetime

Now, define the wishme() function.

def wishme():
    hour = int(datetime.datetime.now().hour)  # current hour, 0-23

We have stored the current hour in a variable called “hour”. We will use this value inside an if-else statement to choose the greeting.
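As a rough sketch of that if-else logic, the greeting can be chosen by a small helper that receives the hour as a number. The helper name greeting_for_hour is made up for this illustration; the time ranges and greeting texts are the ones used in the complete code at the end of this article.

```python
import datetime


def greeting_for_hour(hour):
    """Return a greeting for an hour on the 24-hour clock (0-23)."""
    if 0 <= hour < 12:
        return "Good Morning!"
    elif 12 <= hour < 18:
        return "Good Afternoon!"
    return "Good Evening!"


# Choose the greeting for the current system time.
current_greeting = greeting_for_hour(datetime.datetime.now().hour)
print(current_greeting)
```

Inside the greeting function, the returned text would simply be passed to speak().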

Creating a Function that takes Command Input

As a voice assistant, it needs to take commands through the system’s microphone. For that, we will create a takecommand() function, which takes microphone input from the user and returns it as a string.

First, we need to install a module named SpeechRecognition. Microphone input also requires the PyAudio package.

pip install SpeechRecognition
pip install pyaudio

After installation, import the module into the program. Note that the import name is speech_recognition, which differs from the name used with pip.

import speech_recognition as sr

Let’s define our takecommand() function

def takecommand():
    #It takes microphone input from the user and returns string output

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)

The first half of takecommand() is done. Now we will add a try and except block to the same function, so that a failed recognition does not crash the program.

    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-in')  # uses Google's online speech recognition, so it needs an internet connection
        print(f"User said: {query}\n")  # the recognized query is printed

    except Exception as e:
        # print(e)
        print("Say that again please...")  # printed when the speech could not be recognized
        return "None"  # the string "None", not the None object
    return query
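Because a failed recognition returns the string "None", the calling code has to tell that value apart from a real command. A minimal sketch of one way to do this is shown here; normalize_query is a hypothetical helper written for this article, not part of the speech_recognition library.

```python
def normalize_query(raw_query):
    """Lowercase and trim a recognized query; map the "None" sentinel to ""."""
    cleaned = raw_query.strip().lower()
    if cleaned == "none":
        return ""  # recognition failed, so there is no command to act on
    return cleaned


print(repr(normalize_query("  Open YouTube ")))  # 'open youtube'
print(repr(normalize_query("None")))             # ''
```

The main loop could skip to the next iteration whenever the result is an empty string. One trade-off of this sketch is that a user who really says only "none" is ignored as well.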

Defining Tasks for our AI.

Now that our AI is ready to take commands, let us create some tasks it can perform, e.g. Wikipedia searches, Google searches, opening applications, etc.

Task 1 – Wikipedia Search

For our AI to perform a Wikipedia search, we have to install and import a module called wikipedia.

pip install wikipedia

import wikipedia

After that, write the logic for the task

if __name__ == "__main__":
    wishme()
    while True:
        query = takecommand().lower()  # convert the user query to lower case

        # Logic for executing tasks based on query
        if 'wikipedia' in query:  # this block runs if "wikipedia" is in the query
            speak('Searching Wikipedia...')
            query = query.replace("wikipedia", "")
            results = wikipedia.summary(query, sentences=4)  # first 4 sentences of the summary
            speak("According to Wikipedia")
            print(results)
            speak(results)

In the above code, we used an if statement to check whether “wikipedia” is in the user’s query. If it is, the word is removed from the query, and the first few sentences of the summary of the matching Wikipedia page are converted to speech with the help of the speak function.
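Removing the keyword with replace() can leave stray spaces behind, or an empty string if the user said nothing else. The sketch below shows one possible way to clean up the search topic before it is sent to Wikipedia; extract_topic is a hypothetical helper name, and it uses only the standard library.

```python
def extract_topic(query, keyword="wikipedia"):
    """Remove the trigger keyword from the query and collapse extra spaces."""
    without_keyword = query.lower().replace(keyword, " ")
    return " ".join(without_keyword.split())


print(repr(extract_topic("Alan Turing wikipedia")))         # 'alan turing'
print(repr(extract_topic("wikipedia   python language")))   # 'python language'
print(repr(extract_topic("wikipedia")))                     # ''
```

If the result is empty, the assistant could ask the user to repeat the request instead of running a search.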

Task 2 – Opening YouTube in Web-Browser

To open YouTube or any other website with our AI, we need to import a module called webbrowser.

import webbrowser

It is a built-in module, so there is no need to install it.

        elif 'open youtube' in query:
            webbrowser.open("https://youtube.com")  # include https:// so the address is treated as a web URL

Here, we used an elif branch to check whether “open youtube” is in the query. If it is, the AI uses the webbrowser module to open the site in the system’s default web browser. You can use the same logic for any other website.
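If you want to support several websites, one option is to keep them in a dictionary instead of writing one elif per site. This is only a sketch: the names SITES and find_site_url are invented for this example, and the list of sites is up to you.

```python
# Hypothetical lookup table: spoken site name -> URL to open.
SITES = {
    "youtube": "https://www.youtube.com",
    "google": "https://www.google.com",
}


def find_site_url(query, sites=SITES):
    """Return the URL for the first 'open <site>' phrase found in the query."""
    query = query.lower()
    for name, url in sites.items():
        if f"open {name}" in query:
            return url
    return None  # no known site was mentioned


print(find_site_url("please open youtube"))  # https://www.youtube.com
print(find_site_url("what is the time"))     # None
```

In the main loop, a URL returned by find_site_url(query) would be passed to webbrowser.open(), and a None result would mean the query is some other command.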

Task 3 – Play Music

For our AI to be able to play music we have to import another module called os.

import os

        elif 'play music' in query:
            music_dir = 'PATH TO YOUR MUSIC DIRECTORY'  # enter the path of your music directory
            songs = os.listdir(music_dir)  # names of all files in the directory
            print(songs)
            os.startfile(os.path.join(music_dir, songs[0]))  # Windows only: opens the first file with its default app

In the above code, we first list all the files in the music directory with the help of the os module. Then os.startfile opens a file with its default application, which plays the song. Note that os.startfile is available on Windows only. The above code always plays the first song in the list. However, you can also pick a random song with the help of the random module, so that every time you ask for music, the AI plays a random song from the directory.
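The random-song idea could look roughly like this. The helper name pick_random_song and the list of audio extensions are assumptions made for this sketch; adjust the extensions to the files you actually have.

```python
import os
import random

# Assumed set of audio file extensions; change it to match your music files.
AUDIO_EXTENSIONS = (".mp3", ".wav", ".flac")


def pick_random_song(music_dir, extensions=AUDIO_EXTENSIONS):
    """Return the full path of a random audio file in music_dir, or None."""
    songs = [
        name for name in os.listdir(music_dir)
        if name.lower().endswith(extensions)
    ]
    if not songs:
        return None  # nothing playable in this directory
    return os.path.join(music_dir, random.choice(songs))
```

On Windows, the returned path can then be handed to os.startfile(). Filtering by extension also avoids opening files such as cover images that happen to sit in the same folder.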

Task 5 – Know Time

        elif 'the time' in query:
            strTime = datetime.datetime.now().strftime("%H:%M:%S")  # 24-hour format, hours:minutes:seconds
            speak(f"Sir, the time is {strTime}")

The above code calls datetime.datetime.now() and stores the current system time, formatted as hours:minutes:seconds, in a variable called strTime. We then pass this variable to the speak function, which converts the time string into speech.
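The format string decides how the time is read out. As a small variation, the sketch below uses the 12-hour clock with AM/PM and leaves out the seconds, which may sound more natural when spoken. The function name spoken_time is an assumption for this example.

```python
import datetime


def spoken_time(moment):
    """Build the sentence to speak for a given datetime, using a 12-hour clock."""
    # %I = hour (01-12), %M = minutes, %p = AM/PM marker
    return f"Sir, the time is {moment.strftime('%I:%M %p')}"


print(spoken_time(datetime.datetime.now()))
```

The returned sentence would be passed to speak() in place of the f-string used above.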

Task 6 – To Open an Application

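        elif 'open notepadplusplus' in query or 'start notepadplusplus' in query:
            app = r"C:\Tools\Notepad++\notepad++.exe"  # add the path of your app; the r prefix keeps the backslashes literal
            os.startfile(app)

Here, we are again using the os module to open the app. First, we store the path of the target file in a string called ‘app’. Then we open the file using os.startfile. Note that each phrase needs its own “in query” check, and that the path is written as a raw string so that sequences like \n are not read as escape characters.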

You can use the same logic for any other app you want to open.

Replace ‘notepadplusplus’ with the name of the app you want to open.
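When a command should react to several phrases, a small helper keeps the condition readable and avoids a common pitfall with the or operator. This is a sketch; matches_any and OPEN_PHRASES are names invented for this example.

```python
def matches_any(query, phrases):
    """Return True if at least one of the phrases occurs in the query."""
    return any(phrase in query for phrase in phrases)


OPEN_PHRASES = ("open notepadplusplus", "start notepadplusplus")

print(matches_any("please start notepadplusplus", OPEN_PHRASES))  # True
print(matches_any("what is the time", OPEN_PHRASES))              # False

# Pitfall: this expression is truthy for every query, because Python reads it
# as 'open notepadplusplus' or ('start notepadplusplus' in query).
query = "what is the time"
print(bool('open notepadplusplus' or 'start notepadplusplus' in query))  # True
```

In the main loop, the condition would then read elif matches_any(query, OPEN_PHRASES):.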

How to get the path of the app

Right-click on the app and select “Open file location”.
In the folder that opens, right-click on the application and select “Properties”.
Copy the content of “Location” under the General tab, and add the file name of the application to the end of it.

Let’s see what we have done so far.

First, we created a speak() function that gives our A.I. a voice.
Then, we created a wishme() function that lets our A.I. greet us according to the system time.
After the wishme() function, we created a takecommand() function to help our A.I. take commands from the user. This function is also responsible for returning the user’s query as a string.
We developed code to search Wikipedia, open websites like YouTube, play music, and tell the time.
We developed code to open any application.

Complete code (in this listing, the two functions are named wishMe() and takeCommand()):

import pyttsx3 #pip install pyttsx3
import speech_recognition as sr #pip install speechRecognition
import datetime
import wikipedia #pip install wikipedia
import webbrowser
import os

engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
# print(voices[1].id)
engine.setProperty('voice', voices[0].id)


def speak(audio):
    engine.say(audio)
    engine.runAndWait()

def wishMe():
    hour = int(datetime.datetime.now().hour)
    if hour>=0 and hour<12:
        speak("Good Morning!")
    elif hour>=12 and hour<18:
        speak("Good Afternoon!")   
    else:
        speak("Good Evening!")  
    speak("I am your assistant. Please tell me how may I help you")

def takeCommand():
    #It takes microphone input from the user and returns string output

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)

    try:
        print("Recognizing...")    
        query = r.recognize_google(audio, language='en-in')
        print(f"User said: {query}\n")

    except Exception as e:
        # print(e)    
        print("Say that again please...")  
        return "None"
    return query


if __name__ == "__main__":
    wishMe()
    while True:
    # if 1:
        query = takeCommand().lower()

        # Logic for executing tasks based on query
        if 'wikipedia' in query:
            speak('Searching Wikipedia...')
            query = query.replace("wikipedia", "")
            results = wikipedia.summary(query, sentences=2)
            speak("According to Wikipedia")
            print(results)
            speak(results)

        elif 'open youtube' in query:
            webbrowser.open("https://youtube.com")

        elif 'open google' in query:
            webbrowser.open("https://google.com")
        
        elif 'play music' in query:
            music_dir = 'PATH TO YOUR MUSIC DIRECTORY' #Enter the path of your music directory
            songs = os.listdir(music_dir)
            print(songs)    
            os.startfile(os.path.join(music_dir, songs[0]))

        elif 'the time' in query:
            strTime = datetime.datetime.now().strftime("%H:%M:%S")    
            speak(f"Sir, the time is {strTime}")

        elif 'open notepadplusplus' in query:
            app = r"C:\Tools\Notepad++\notepad++.exe"  # raw string, so the backslashes are not read as escapes
            os.startfile(app)

Is it really like Tony Stark’s JARVIS?

Many people will argue that the virtual assistant we have created is not an A.I., but just the output of a bunch of statements we wrote. But what is an A.I., basically? The purpose of A.I. is to develop machines that can perform human tasks as effectively as humans, or even more effectively. And our “AI” is efficient at doing that.

Congratulations! You have successfully made your very first virtual assistant. Explore and try to add other functionalities to your A.I. I hope you all liked this tutorial.