
Setup and Run Local LLM using Ollama

In this blog, we'll guide you through installing Ollama and running a local LLM on a Linux-based operating system. Our setup involves a single node.

Let's walk through the process step by step, from creating a virtual machine on the Alces Cloud platform to installing Ollama and running an LLM.

Launch the Instance

All the steps to launch and connect to an instance are provided in link.

Note

Make sure ports 22 and 11434 are opened as ingress rules in the security group attached to the instance. The Ollama API endpoint listens on port 11434 by default, so if firewalld is enabled on the server, allow external connections to that port by running the command below:

$ sudo firewall-cmd --zone=public --add-port=11434/tcp --permanent

To apply the changes, reload firewalld with the command below:

$ sudo firewall-cmd --reload
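
To confirm the rule was applied, you can list the ports currently opened in the zone; the output should include 11434/tcp:

$ sudo firewall-cmd --zone=public --list-ports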

Setup Ollama

Ollama can be installed quickly with a one-line command:

$ curl -fsSL https://ollama.com/install.sh | sh
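
Once the script finishes, you can verify the installation by checking the client version and, on systemd-based distributions, the status of the ollama service that the install script typically sets up:

$ ollama --version
$ systemctl status ollama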

Note

To install on other operating systems, use the link.

Execute Model

In our example we will pull and run the gemma:2b model using Ollama. Gemma is a family of lightweight, state-of-the-art open models built by Google DeepMind. These models run on diverse devices and are adaptable through fine-tuning, making AI accessible to more users.

Models Supported by Ollama

Ollama provides a selection of models accessible via ollama.com/library.

Below are a few sample models that users can try:

Model                 Parameters   Size     Download
Llama 2               7B           3.8GB    ollama run llama2
Mistral               7B           4.1GB    ollama run mistral
Dolphin Phi           2.7B         1.6GB    ollama run dolphin-phi
Phi-2                 2.7B         1.7GB    ollama run phi
Neural Chat           7B           4.1GB    ollama run neural-chat
Starling              7B           4.1GB    ollama run starling-lm
Code Llama            7B           3.8GB    ollama run codellama
Llama 2 Uncensored    7B           3.8GB    ollama run llama2-uncensored
Llama 2 13B           13B          7.3GB    ollama run llama2:13b
Llama 2 70B           70B          39GB     ollama run llama2:70b
Orca Mini             3B           1.9GB    ollama run orca-mini
Vicuna                7B           3.8GB    ollama run vicuna
LLaVA                 7B           4.5GB    ollama run llava
Gemma                 2B           1.4GB    ollama run gemma:2b
Gemma                 7B           4.8GB    ollama run gemma:7b

Note

To effectively operate the 7B models, a minimum of 8 GB of RAM is recommended, while 16 GB is advisable for the 13B models, and 32 GB is necessary for the 33B models.
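
Before pulling a larger model, it is worth confirming how much memory your instance actually has, for example with:

$ free -h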

Pull Model

  • To begin, let's check whether the model is already available on the local machine. Run the following command:

    $ ollama list
    NAME        ID              SIZE    MODIFIED
    

  • If the model is not present, we can fetch it locally using the command:

    $ ollama pull gemma:2b
    

  • Once the model has been pulled, verify its presence on the machine by running:

    $ ollama list
    NAME        ID              SIZE    MODIFIED
    gemma:2b    b50d6c999e59    1.7 GB  About a minute ago
    

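Optionally, you can inspect details of the pulled model, such as its parameters and template. Depending on your Ollama version, this is exposed through the show command:

$ ollama show gemma:2b
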
Run Model

  • To execute the model, simply use the following command. If the model is not already available locally, it will be automatically pulled before initiating the execution:

    $ ollama run gemma:2b
    >>> What is linux?
    Linux is an open-source operating system that is used on a wide range of computers, including desktops, laptops, and servers. It is known for its flexibility, reliability, and security. Linux is also used in
    a variety of software applications, such as web browsers, operating systems, and productivity tools.
    

  • The above command starts the model and provides an interactive prompt in the terminal, allowing users to input queries for the LLM to answer.
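
Because the Ollama server listens on port 11434 (the port we opened as an ingress rule earlier), the model can also be queried over the HTTP API rather than the interactive prompt. Below is a minimal sketch using curl against the generate endpoint; replace localhost with your instance's address when calling from another machine:

$ curl http://localhost:11434/api/generate -d '{
    "model": "gemma:2b",
    "prompt": "What is linux?",
    "stream": false
  }'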

Note

For more details on Ollama, please refer to link.