Stanford CoreNLP is a great Natural Language Processing (NLP) tool for analysing text. Given a paragraph, CoreNLP splits it into sentences and then analyses each one, returning the base forms (lemmas) of its words, their dependencies, parts of speech, named entities and much more. Besides English, Stanford CoreNLP supports five other languages: Arabic, Chinese, French, German and Spanish. You can try out Stanford CoreNLP in its online demo.
Stanford CoreNLP is implemented in Java. In some cases (e.g. your main code-base is written in a different language, or you simply do not feel like coding in Java), you can set up a Stanford CoreNLP Server and then access it through an API. In this post, I will show how to set up a Stanford CoreNLP Server locally and access it using Python.
1) Download Stanford CoreNLP
To download Stanford CoreNLP, go to https://stanfordnlp.github.io/CoreNLP/index.html#download and click on “Download CoreNLP”. The latest version of Stanford CoreNLP at the time of writing is v3.8.0 (2017-06-09).
Once the download has completed, unzip the archive from the terminal by running unzip followed by the name of the downloaded .zip file.
2) Install Java 8 (if not installed)
Stanford CoreNLP is written for Java 8, so you need at least that version to use it. You can check which version of Java you have by running java -version in the terminal. If it reports version 1.8 or higher, you are good to go. Otherwise, you need to install Java 8.
I will cover installing Java 8 on Mac and on Linux (locally, i.e. in your own home directory rather than system-wide, so no sudo rights are needed).
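Since this post ends up driving CoreNLP from Python, the same version check can be scripted. The snippet below is a minimal sketch, not part of CoreNLP itself: the helper names parse_java_major and installed_java_major are made up for illustration, and it assumes java -version prints a line containing version "x.y..." (on stderr, as the Oracle and OpenJDK builds I know of do).

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier report themselves as 1.x (e.g. "1.8.0_144"),
    # while later releases put the major version first (e.g. "11.0.2").
    if first == 1 and second is not None:
        return int(second)
    return first


def installed_java_major():
    """Run `java -version` and parse whatever it prints; None if Java is missing."""
    try:
        result = subprocess.run(
            ["java", "-version"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except FileNotFoundError:
        return None
    # The version banner normally goes to stderr; stdout is appended just in case.
    return parse_java_major(result.stderr + result.stdout)
```

If installed_java_major() returns None or a number below 8, follow one of the installation routes described next.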
Installing Java 8 on Mac
Installing Java 8 on Mac is dead easy using brew. In the terminal, run the commands below and you will have Java 8 installed in no time.
brew update
brew install jenv
brew cask install java
Installing Java 8 on Linux (locally)
Before installing, you need to download Java from Oracle’s website. You have to accept their license agreement before proceeding with the download. Once you have downloaded Java 8 for your platform, extract it using tar -xzvf jdk-8u144-linux-x64.tar.gz (adjust the file name to match the version you downloaded). Whenever you want to use Java 8, you must add the “bin” folder inside the extracted Java 8 folder to your PATH environment variable. For example, if you extracted the archive in ~/java8, run export PATH=~/java8/jdk1.8.0_144/bin:$PATH in the terminal.
3) Running Stanford CoreNLP Server
Now that our environment is ready, we can fire up the Stanford CoreNLP Server. To do so, go to the directory of the unzipped Stanford CoreNLP and execute the command below:
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -annotators "tokenize,ssplit,pos,lemma,parse,sentiment" -port 9000 -timeout 30000
Voilà! You now have a Stanford CoreNLP server running on your machine, listening on port 9000.
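To check that the server is actually answering, you can talk to it over plain HTTP before installing any wrapper. The sketch below uses only the Python standard library and assumes the server was started with the command above (localhost, port 9000). It relies on the server accepting the text as the POST body and the annotation settings as a JSON "properties" URL parameter. The function names build_request_url and annotate_over_http are my own, not part of CoreNLP.

```python
import json
import urllib.parse
import urllib.request


def build_request_url(host="http://localhost", port=9000,
                      annotators="tokenize,ssplit,pos"):
    """Build the server URL with the annotation settings as a JSON query parameter."""
    props = {"annotators": annotators, "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    return "{}:{}/?{}".format(host, port, query)


def annotate_over_http(text, **url_options):
    """POST the raw text to the server and return the decoded JSON annotation."""
    url = build_request_url(**url_options)
    # Passing data makes urllib send a POST request with the text as its body.
    request = urllib.request.Request(url, data=text.encode("utf-8"))
    # 30 seconds mirrors the -timeout 30000 (milliseconds) given to the server.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, annotate_over_http("Stanford CoreNLP is running.") should return a dictionary whose "sentences" entry holds the tokens of each sentence. If the call raises a connection error, the server is not up or is listening on a different port.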
4) Accessing Stanford CoreNLP Server using Python
You can access a Stanford CoreNLP Server from many programming languages other than Java, as third-party wrappers exist for almost all commonly used ones.
For simplicity, I will demonstrate how to access Stanford CoreNLP with Python. In this post, I use the Python package “stanfordcorenlp” to connect to the server and analyse some text.
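Here is a minimal sketch of such a client. It assumes the package has been installed with pip install stanfordcorenlp and that the server from step 3 is listening on localhost:9000. The helper names analyse and extract_tokens are illustrative and not part of the package. Only StanfordCoreNLP, its annotate method and close come from the wrapper.

```python
import json


def analyse(text, host="http://localhost", port=9000):
    """Send text to a running CoreNLP server and return its JSON reply as a string."""
    # Imported here so the parsing helper below works without the package installed.
    from stanfordcorenlp import StanfordCoreNLP  # third-party wrapper

    nlp = StanfordCoreNLP(host, port=port)
    try:
        props = {
            "annotators": "tokenize,ssplit,pos,lemma",
            "pipelineLanguage": "en",
            "outputFormat": "json",
        }
        return nlp.annotate(text, properties=props)
    finally:
        # Always release the connection, even if the request fails.
        nlp.close()


def extract_tokens(annotation_json):
    """Return (word, lemma, part-of-speech) triples from CoreNLP's JSON output."""
    document = json.loads(annotation_json)
    return [
        (token["word"], token.get("lemma"), token.get("pos"))
        for sentence in document["sentences"]
        for token in sentence["tokens"]
    ]
```

With the server running, extract_tokens(analyse("Stanford CoreNLP is a great tool.")) gives one triple per token. The annotators listed in props must be ones the server can load. Adding "parse" or "sentiment" works with the command from step 3, but makes each request slower.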
In conclusion, Stanford CoreNLP is a very useful toolkit for analysing and annotating text, and it is widely used by researchers and enterprises. This post is a basic tutorial for setting up Stanford CoreNLP and using it to analyse some text. I hope it made the setup process easier for you. Finally, have fun processing and analysing texts 🙂