# What is RAG — and why your company bot answers like a stranger.


*AI · Infrastructure · April 2026 · 7 min read*


A practical guide to Retrieval-Augmented Generation: how your bot reads documents before answering, and why it costs 10× less than fine-tuning.

Khalid, IT director at an Omani logistics firm, asked the internal bot: "What was the cost of shipping container 4429 to Salalah port last quarter?" The bot replied: "I am a large language model; I do not have access to your company data." Khalid was angry, not because the bot was ignorant, but because the company had paid eighty thousand riyals for a smart bot that could not even read the internal pricing sheet.

This story repeats every month, and the reason is not the model. The model was trained on the public internet, and we are then surprised when it does not understand our internal context.


## What RAG is — retrieval before generation.
RAG stands for Retrieval-Augmented Generation [1]. The practical meaning: the bot reads your documents before it answers. It does not memorize them. It retrieves them.

The large language model (LLM) knows how to speak. But it knows nothing about your contracts, your prices, or your internal staff policies. RAG gives it external memory: a document database searched before generation. The result: an answer grounded in a real document, not a general guess [2].
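A minimal sketch of the "retrieve, then generate" idea: the retrieved passages are pasted into the prompt ahead of the question, so the model answers from them rather than from memory. The function name `build_prompt`, the prompt wording, and the sample passage are illustrative assumptions, not a prescribed template.

```python
def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt: retrieved passages first, then the question."""
    # Number each passage so the answer can point back to its source.
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical passage, invented for illustration only.
passages = ["Pricing sheet, last quarter: shipments to Salalah port are billed per container."]
prompt = build_prompt("What did container 4429 cost to ship to Salalah port?", passages)
print(prompt)
```

The string returned here is what gets sent to the language model; the model itself is left unchanged.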


## RAG vs Fine-tuning — a decision framework, not a technical comparison.
Fine-tuning means retraining the model on your data. It costs tens of thousands of dollars. It takes weeks. And it produces a static model: if you change your contract next month, the model does not know. You must retrain [3].

RAG means leaving the model as-is and connecting it to a document repository. The cost is ten times lower. Updates are near-instant: when a contract changes, you upload the new file, and once it is re-indexed the bot answers from it [4].
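To make the update story concrete, here is a toy in-memory index, assuming documents are stored under a stable id. `DocumentIndex` and the sample contract text are hypothetical. A production setup would also recompute embeddings for the new chunks.

```python
class DocumentIndex:
    """Toy in-memory index: document id -> list of text chunks."""

    def __init__(self) -> None:
        self._chunks: dict[str, list[str]] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Writing under the same id drops the old chunks, so stale
        # contract terms can no longer be retrieved after an upload.
        self._chunks[doc_id] = [p.strip() for p in text.split("\n\n") if p.strip()]

    def all_chunks(self) -> list[str]:
        return [c for chunks in self._chunks.values() for c in chunks]


index = DocumentIndex()
index.upsert("contract-2026", "Payment due in 30 days.")
index.upsert("contract-2026", "Payment due in 45 days.")  # new version replaces the old
print(index.all_chunks())  # ['Payment due in 45 days.']
```

No model weights change at any point, which is the practical difference from fine-tuning.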

> Fine-tuning teaches the model how to speak in your dialect. RAG gives it what to say from your documents. Most Arab companies need RAG, not fine-tuning.


## The RAG pipeline in Oman — how we actually build it.
At Nuqta, we build RAG for clients in Muscat in four stages: extract text from PDF, Word and Excel files; split documents into small chunks; convert the chunks into vectors (embeddings) using an Arabic model; then run semantic search and generate the answer [5].
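The second stage, splitting, can be sketched as a sliding window over words with a small overlap, so a sentence cut at a chunk boundary still appears whole in one of the chunks. The function `chunk_words` and the default `size` and `overlap` values are assumptions for illustration, not the settings used in the pipeline described above.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, size=4, overlap=1)
print(chunks)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Splitting on words is the simplest option; splitting on headings or paragraphs often preserves meaning better when the source documents are well structured.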

The third stage is the backbone. The Arabic model reads each document chunk and converts it into a mathematical vector. When an employee asks a question, the question is converted into a vector too, and the database searches for the nearest stored vectors. The result: the most relevant passages, even if the question does not use the same words.
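"Nearest vectors" is usually measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors in place of real embeddings, which typically have hundreds of dimensions. The texts, the numbers, and the helper names `cosine` and `top_k` are all illustrative assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, index, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy vectors standing in for embeddings of three document chunks.
index = [
    ("40-foot container rates", [0.9, 0.1, 0.0]),
    ("Annual leave policy", [0.0, 0.2, 0.9]),
    ("Port handling fees", [0.7, 0.6, 0.1]),
]
query = [1.0, 0.2, 0.0]  # stands in for the embedded question about a heavy shipment

results = top_k(query, index, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

The query shares no words with "40-foot container rates", yet that chunk ranks first because the vectors point in nearly the same direction. A vector database performs the same comparison at scale with an approximate index.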

*[Figure: FIG. 1 — RAG PIPELINE FOR OMANI ENTERPRISE]*


## Real costs — not cheap, but ten times cheaper than fine-tuning.
On a private stack in Muscat, the cost of running RAG for a medium-sized company breaks down as follows. CPU server for extraction and splitting: already in place. Small GPU for embeddings: NVIDIA T4 or A10, rental cost ~$400–800 per month [6]. Vector database: Qdrant or Chroma, free or nominal price. Language model: runs on the same GPU or is called via an internal API.

Total: less than one thousand dollars per month. By comparison, fine-tuning costs ten thousand dollars for a single run and needs repeating every month [3].
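The comparison can be checked with a few lines of arithmetic, using only the figures quoted in this section: a ceiling of $1,000 per month for RAG, and $10,000 per fine-tuning run repeated monthly. The variable names are illustrative, and real budgets will vary.

```python
RAG_MONTHLY_CEILING_USD = 1_000   # "less than one thousand dollars per month"
FINE_TUNE_RUN_USD = 10_000        # cost of a single fine-tuning run
FINE_TUNE_RUNS_PER_YEAR = 12      # assumes one retraining run per month

rag_yearly = RAG_MONTHLY_CEILING_USD * 12
fine_tune_yearly = FINE_TUNE_RUN_USD * FINE_TUNE_RUNS_PER_YEAR
ratio = fine_tune_yearly / rag_yearly

print(f"RAG:         ${rag_yearly:,} per year (upper bound)")
print(f"Fine-tuning: ${fine_tune_yearly:,} per year")
print(f"Ratio:       {ratio:.0f}x")
```

Because the RAG figure is an upper bound, the real gap is at least this wide under these assumptions.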


## Honesty: RAG is not magic.
Do not believe anyone who says the bot now knows everything. RAG fails in three clear cases: unstructured documents, questions outside the document scope, and dialectal Arabic [7].

Most embedding models understand formal Arabic, but struggle with Omani colloquial dialect. At Nuqta we are testing local Arabic models to solve this.
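For the second failure case, questions outside the document scope, a common mitigation is to refuse to answer when no retrieved passage is similar enough to the question. A minimal sketch follows; the function name `answer_or_decline` and the threshold of 0.75 are assumptions that would need tuning against real questions.

```python
def answer_or_decline(scored_passages, min_score=0.75):
    """Keep passages whose similarity clears the threshold, best first.

    scored_passages: list of (similarity, text) pairs in any order.
    Returns None when nothing is relevant enough, so the caller can
    reply "not found in the documents" instead of letting the model guess.
    """
    relevant = [(s, t) for s, t in scored_passages if s >= min_score]
    if not relevant:
        return None
    return [t for s, t in sorted(relevant, key=lambda pair: pair[0], reverse=True)]


# Hypothetical scores for an out-of-scope and an in-scope question.
print(answer_or_decline([(0.41, "Leave policy"), (0.38, "Port fees")]))  # None
print(answer_or_decline([(0.80, "Port fees"), (0.90, "Container rates"), (0.20, "Leave policy")]))
```

A threshold set too high makes the bot refuse valid questions, and one set too low lets weak matches through, so it is worth calibrating on a sample of real employee queries.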


## The invitation.
If you are building an internal bot for your company in Muscat, Riyadh or Dubai, do not start with the model. Start with the documents. Sit with your team and ask: which ten documents do our employees answer questions from thousands of times per year? That is where RAG begins. The rest is technology we build for you.

If you want a real assessment of a RAG architecture that fits your company, with no quote and no contract, write to us at hello@nuqtai.com. We sit for an hour, review your documents, and give you a roadmap.


## Frequently asked questions.
What is the difference between RAG and a normal database? A normal database searches by keywords. RAG searches by meaning. If you ask "what is the cost of the heavy shipment?", a keyword search may fail if the word "heavy" is not in the record. RAG recognizes that "heavy" is close in meaning to "40-foot container" and retrieves the right record [4].

Does RAG need the internet? No. It can run entirely on internal servers in Oman. This is what we build at Nuqta under the name Private AI. Your data never leaves your control [7].

How many documents are needed for RAG? There is no minimum. We started with 50 pages for one client. But quality matters more than quantity. One clear document is better than a thousand messy pages.

Can RAG be connected to WhatsApp Business API? Yes. This is exactly what we do in our Al-Dhaki product. The customer sends a question via WhatsApp, and the bot searches company documents and replies in Gulf dialect [8].

How long does it take to build RAG? Four to eight weeks for a working prototype. The version launched to employees — after internal testing — may need two more weeks.