Installing NLTK in Windows
Installing Python in Windows
Installing NLTK in Mac/Linux
Installing NLTK through Anaconda
NLTK Dataset
How to Download all packages of NLTK
Running the NLP Script
How to Run NLTK Script
Installing NLTK in Windows
In this part, we will learn how to set up NLTK via the terminal (Command Prompt in Windows). The instructions below assume that you don't have Python installed, so the first step is to install Python.
Installing Python in Windows:
Step 1) Go to https://www.python.org/downloads/ and select the latest version for Windows.
Note: If you don’t want to download the latest version, you can visit the download tab and see all releases.
Step 2) Open the downloaded installer file.
Step 3) Select Customize Installation
Step 4) Click NEXT
Step 5) In the next screen, select the advanced options and give a custom install location (in my case, a folder on the C drive is chosen for ease of operation). Then click Install.
Step 6) Click the Close button once the installation is done.
Step 7) Copy the path of your Scripts folder (inside the install location chosen in Step 5; it contains pip).
Step 8) In the Windows command prompt, navigate to the location of the pip folder and enter the command to install NLTK:
pip3 install nltk
The installation should complete successfully.
NOTE: For Python 2, use the command pip2 install nltk
Step 9) In the Windows Start Menu, search for and open the Python shell.
Step 10) Verify that the installation succeeded by entering the following command:
import nltk
If you see no error, the installation is complete.
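Beyond a bare import nltk, a slightly more informative check reports which Python version is running and which NLTK version (if any) that interpreter can see. The sketch below uses only the standard library (importlib.util.find_spec and importlib.metadata.version, available from Python 3.8), so it also runs on a machine where NLTK is missing. The function name check_installation is our own, not part of NLTK.

```python
import importlib.metadata
import importlib.util
import sys
from typing import Optional, Tuple


def check_installation(package: str = "nltk") -> Tuple[str, Optional[str]]:
    """Return (python_version, package_version); package_version is None if not installed."""
    python_version = "{}.{}.{}".format(*sys.version_info[:3])
    # find_spec returns None when this interpreter cannot locate the package
    if importlib.util.find_spec(package) is None:
        return python_version, None
    try:
        # Read the installed distribution's version without importing the package
        return python_version, importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no installed metadata (e.g. a source checkout on sys.path)
        return python_version, "unknown"


if __name__ == "__main__":
    py, nltk_version = check_installation()
    print("Python", py)
    print("NLTK", nltk_version if nltk_version else "is not installed")
```

If NLTK shows as not installed even though pip reported success, pip most likely installed it for a different Python interpreter than the one you are running.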
Installing NLTK in Mac/Linux
Installing NLTK in Mac/Linux requires the Python package manager pip. If pip is not installed, follow the instructions below; the apt commands shown apply to Debian-based Linux distributions such as Ubuntu.
Step 1) Update the package index by typing the following command:
sudo apt update
Step 2) Install pip for Python 3:
sudo apt install python3-pip
Alternatively, you can install pip using easy_install (note that easy_install is deprecated in current versions of setuptools):
sudo apt-get install python-setuptools python-dev build-essential
Once easy_install is installed, run the following command to install pip:
sudo easy_install pip
Step 3) Use the following command to install NLTK:
sudo pip install -U nltk
or, for Python 3:
sudo pip3 install -U nltk
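When a machine has several Python installations, the pip and pip3 commands may not belong to the interpreter you actually run. A common way around this is to invoke pip as a module of a specific interpreter (python3 -m pip install -U nltk). The sketch below builds that command for the interpreter running the script and only executes it when dry_run is False; the helper names are our own, not part of pip or NLTK.

```python
import subprocess
import sys
from typing import List


def pip_install_command(package: str, upgrade: bool = True) -> List[str]:
    """Build a 'python -m pip install' command tied to the current interpreter."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("-U")  # upgrade if already installed, like the -U flag above
    cmd.append(package)
    return cmd


def install(package: str, dry_run: bool = True) -> List[str]:
    """Print the command; run it only when dry_run is False."""
    cmd = pip_install_command(package)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.check_call(cmd)  # raises CalledProcessError if pip fails
    return cmd


if __name__ == "__main__":
    install("nltk")  # pass dry_run=False to actually install
```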
Installing NLTK through Anaconda
Step 1) Install Anaconda (which can also be used to install other packages) from https://www.anaconda.com/products/individual, selecting the Python version you need.
Note: Refer to this tutorial for detailed steps to install Anaconda.
Step 2) In the Anaconda prompt, enter the command:
conda install -c anaconda nltk
Review the package upgrade, downgrade, and install information, and enter yes. NLTK is then downloaded and installed.
NLTK Dataset
The NLTK module has many datasets that you need to download before you can use them. A text dataset of this kind is technically called a corpus (plural: corpora). Some examples are stopwords, gutenberg, framenet_v15, large_grammars, and so on.
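Instead of downloading everything, you can fetch only the packages a project needs by passing their identifiers (such as stopwords or gutenberg) to nltk.download. The wrapper below is a hypothetical helper: it calls nltk.download(package_id, quiet=True) for each identifier and records whether the call reported success (in the NLTK versions we know of, nltk.download returns True on success and False when it hits an error). The downloader parameter exists only so the helper can be exercised without network access.

```python
from typing import Callable, Dict, Iterable, Optional


def download_packages(package_ids: Iterable[str],
                      downloader: Optional[Callable[..., bool]] = None) -> Dict[str, bool]:
    """Download the given NLTK data packages; map each identifier to success or failure."""
    if downloader is None:
        import nltk  # imported lazily so the helper can be tested without NLTK
        downloader = nltk.download
    results: Dict[str, bool] = {}
    for package_id in package_ids:
        # quiet=True suppresses the per-package progress output
        results[package_id] = bool(downloader(package_id, quiet=True))
    return results
```

For example, download_packages(["stopwords", "gutenberg"]) fetches just those two packages and returns a dictionary you can check before running the rest of a script.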
How to Download all packages of NLTK
Step 1) Run the Python interpreter in Windows or Linux.
Step 2) Enter the commands:
import nltk
nltk.download()
The NLTK Downloader window opens. Click the Download button to download the dataset. This may take a while, depending on your internet connection.
NOTE: You can change the download location by clicking File > Change Download Directory.
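The same download can be done without the window, for example on a server with no display: nltk.download accepts the identifier "all" together with an optional download_dir argument. The sketch below (the function name download_all is our own) creates the target directory first when one is given.

```python
import os
from typing import Callable, Optional


def download_all(target_dir: Optional[str] = None,
                 downloader: Optional[Callable[..., bool]] = None) -> bool:
    """Download every NLTK data package without opening the downloader window."""
    if downloader is None:
        import nltk  # imported lazily so the helper can be tested without NLTK
        downloader = nltk.download
    if target_dir is not None:
        os.makedirs(target_dir, exist_ok=True)
    # "all" selects every package in the NLTK data index; this is a large download
    # download_dir=None lets NLTK choose its default location
    return bool(downloader("all", download_dir=target_dir))
```

If you choose a non-default directory, NLTK only finds the data when that directory is on nltk.data.path, for example by setting the NLTK_DATA environment variable to it.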
Step 3) To test the installed data, use the following code:
from nltk.corpus import brown
brown.words()
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
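To check whether a resource is installed without loading it, you can use nltk.data.find, which raises LookupError when the resource cannot be found. The wrapper below is a hypothetical helper that turns that exception into True/False; the path corpora/brown follows NLTK's category/name layout for corpus data. The finder parameter exists only so the helper can be tested without NLTK.

```python
from typing import Callable, Optional


def has_resource(resource_path: str,
                 finder: Optional[Callable[[str], object]] = None) -> bool:
    """Return True if the NLTK resource (e.g. 'corpora/brown') is installed."""
    if finder is None:
        import nltk  # imported lazily so the helper can be tested without NLTK
        finder = nltk.data.find
    try:
        finder(resource_path)
    except LookupError:
        # nltk.data.find raises LookupError when the resource is not on nltk.data.path
        return False
    return True
```

For example, has_resource("corpora/brown") should return True once the download in Step 2 has completed.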
Running the NLP Script
In this section, we discuss how to execute an NLP script on your local PC. Many libraries for natural language processing are available, so choose one that fits your requirements.
How to Run NLTK Script
Step 1) In your favorite code editor, copy the following code and save the file as "NLTKsample.py":
from nltk.tokenize import RegexpTokenizer
# \w+ matches runs of letters, digits and underscores, so punctuation is dropped
tokenizer = RegexpTokenizer(r'\w+')
filteredText = tokenizer.tokenize('Hello Guru99, You have build a very good site and I love visiting your site.')
print(filteredText)
Code Explanation:
In this program, the objective is to remove all punctuation from the given text. We imported RegexpTokenizer, a class from NLTK's nltk.tokenize module. It splits text into tokens using the regular expression you pass to its constructor; with the pattern \w+, each token is a run of word characters (letters, digits, and underscores), so punctuation and spaces are discarded. We then tokenized the text by calling the tokenizer's tokenize() method, stored the result in the filteredText variable, and printed it using print().
Step 2) In the command prompt, navigate to the location where you saved the file and run the command:
python NLTKsample.py
This shows the following output:
['Hello', 'Guru99', 'You', 'have', 'build', 'a', 'very', 'good', 'site', 'and', 'I', 'love', 'visiting', 'your', 'site']
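To see exactly which characters the \w+ pattern keeps, you can reproduce the tokenization with Python's standard re module. As far as we understand NLTK's implementation, RegexpTokenizer with its default gaps=False returns the matches of the pattern, much like re.findall; the sketch below relies on that assumption and does not need NLTK installed. The names WORD_PATTERN and word_tokens are our own.

```python
import re
from typing import List

WORD_PATTERN = r"\w+"  # same pattern passed to RegexpTokenizer in NLTKsample.py


def word_tokens(text: str) -> List[str]:
    """Return every run of word characters (letters, digits, underscore)."""
    return re.findall(WORD_PATTERN, text)


sentence = "Hello Guru99, You have build a very good site and I love visiting your site."
print(word_tokens(sentence))
# The apostrophe is not a word character, so contractions are split apart:
print(word_tokens("I don't mind."))  # ['I', 'don', 't', 'mind']
```

If you need contractions kept whole, a different pattern such as r"[\w']+" can be passed to RegexpTokenizer instead.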