Set Up and Run a Local LLM Using Ollama
In this blog, we'll guide you through installing Ollama and running a local LLM on a Linux-based operating system. Our setup involves a single node.
Let's go through the process step by step, from creating a virtual machine on the Alces Cloud platform to installing Ollama and running an LLM.
Launch the Instance
All the steps to launch and connect to an instance are provided in the link.
Note
Make sure ports 22 and 11434 are opened as ingress rules in the security group attached to the instance. The Ollama API endpoint listens on port 11434 by default, so if firewalld is enabled on the server, you need to allow external connections to that port by running the command below:
$ sudo firewall-cmd --zone=public --add-port=11434/tcp --permanent
To apply the change, reload firewalld:
$ sudo firewall-cmd --reload
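You can confirm that the port is now open by listing the ports configured for the zone; with only the rule above added, the output should look like this:
$ sudo firewall-cmd --zone=public --list-ports
11434/tcp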
Set Up Ollama
Ollama can be installed quickly using a one-line command:
$ curl -fsSL https://ollama.com/install.sh | sh
Note
To install Ollama on other operating systems, use the link.
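On Linux, the install script also registers Ollama as a systemd service, so a quick sanity check after installation looks like the sketch below. Note that the server binds to 127.0.0.1 by default; if the API should be reachable from outside the instance (the reason port 11434 was opened earlier), one way is to set the OLLAMA_HOST environment variable via a systemd override:
$ ollama --version
$ systemctl status ollama
# To change the bind address, add the two lines below in the editor opened by:
$ sudo systemctl edit ollama
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
$ sudo systemctl daemon-reload
$ sudo systemctl restart ollama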
Execute Model
In our example, we will pull and run the gemma:2b
model using Ollama. Gemma is a family of lightweight, state-of-the-art open models built by Google DeepMind. These models run on diverse devices and are adaptable through fine-tuning, making AI accessible to more users.
Models Supported by Ollama
Ollama provides a selection of models accessible via ollama.com/library.
Below are a few sample models that users can run:
| Model | Parameters | Size | Download |
|---|---|---|---|
| Llama 2 | 7B | 3.8GB | ollama run llama2 |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Dolphin Phi | 2.7B | 1.6GB | ollama run dolphin-phi |
| Phi-2 | 2.7B | 1.7GB | ollama run phi |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| Llama 2 13B | 13B | 7.3GB | ollama run llama2:13b |
| Llama 2 70B | 70B | 39GB | ollama run llama2:70b |
| Orca Mini | 3B | 1.9GB | ollama run orca-mini |
| Vicuna | 7B | 3.8GB | ollama run vicuna |
| LLaVA | 7B | 4.5GB | ollama run llava |
| Gemma | 2B | 1.4GB | ollama run gemma:2b |
| Gemma | 7B | 4.8GB | ollama run gemma:7b |
Note
To effectively operate the 7B models, a minimum of 8 GB of RAM is recommended, while 16 GB is advisable for the 13B models, and 32 GB is necessary for the 33B models.
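Before choosing a model, you can check how much memory is available on your instance with a standard Linux command:
$ free -h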
Pull Model
- To begin, let's ensure that the model is available on the local machine. Run the following command:

$ ollama list
NAME    ID    SIZE    MODIFIED
- If the model is not present, we can fetch it locally using the command:

$ ollama pull gemma:2b
- Once the model has been pulled, verify its presence on the machine by running:

$ ollama list
NAME        ID              SIZE      MODIFIED
gemma:2b    b50d6c999e59    1.7 GB    About a minute ago
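- Optionally, you can inspect how the pulled model is assembled; for example, the show subcommand can print its Modelfile:

$ ollama show gemma:2b --modelfile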
Run Model
- To execute the model, use the following command. If the model is not already available locally, it will be pulled automatically before execution starts:

$ ollama run gemma:2b
>>> What is linux?
Linux is an open-source operating system that is used on a wide range of computers, including desktops, laptops, and servers. It is known for its flexibility, reliability, and security. Linux is also used in a variety of software applications, such as web browsers, operating systems, and productivity tools.
- The above command starts the model and opens an interactive prompt in the terminal, allowing users to input queries that are answered by the large language model (LLM). Enter /bye to exit the session.
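- Because the Ollama server also exposes an HTTP API on port 11434 (the port opened earlier), the same model can be queried programmatically. Below is a minimal sketch using the /api/generate endpoint, assuming the server is reachable at its default address and gemma:2b has already been pulled:

$ curl http://localhost:11434/api/generate -d '{
  "model": "gemma:2b",
  "prompt": "What is Linux?",
  "stream": false
}'

With "stream" set to false, the response arrives as a single JSON object whose response field contains the generated text.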
Note
For more details on Ollama, please refer to the link.