Who owns your embeddings? Fine-tuning and PDPL reality.
Faisal Al-Anqoodi · Founder & CEO
Embeddings and fine-tuned weights are not ordinary files. They are processing outputs that can redefine what your data means — and contracts often discuss the base model while ignoring what was generated for you.
A telecom signed a contract to fine-tune a model on its support tickets. Six months later, it asked to move the weights in-house. The vendor replied that the base model belonged to the vendor and the customer owned only the outputs. Only then did the conversation that was missing from the contract appendix begin.
At Nuqta, we split three artifacts: stored embeddings, fine-tuned weights or adapters, and training logs. Each has a compliance path under Oman PDPL and an IP clause that must be explicit — not implied [1][2].
Embeddings vs fine-tuned weights.
Embeddings are vector representations of text or documents; they can reflect sensitive content if sources are sensitive [3]. Fine-tuned weights adjust model parameters for behavior; they can carry training-data influence indirectly [4].
Lumping the two together in a contract breeds disputes: who may reuse each artifact, and who may delete it on exit?
Owning the base model does not mean owning what your data produced. Write one contract sentence per artifact — or accept the unknown.
Qualitative risk matrix.
Negotiation playbook.
- Ask for an artifact list: files, sizes, formats, storage location.
- Define export rights at contract end with a numeric deadline, not "by mutual agreement later."
- Tie deletion to verification: a wipe certificate or proof of key destruction.
- Separate sandbox from production; never fine-tune on production data without explicit consent [1].
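The first three playbook items can be kept as a small machine-checkable inventory. What follows is a minimal sketch, not a legal template: the `Artifact` fields, the helper names `contract_gaps` and `export_due_date`, and the two deletion-proof labels are all hypothetical choices made for illustration, and any real deadline or proof type has to come from your own contract.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical labels for the two deletion proofs named in the playbook.
ACCEPTED_DELETION_PROOFS = {"wipe_certificate", "key_destruction"}


@dataclass
class Artifact:
    """One row of the artifact list requested from the vendor (illustrative fields)."""
    name: str
    kind: str                 # "embeddings", "weights_or_adapters", or "training_logs"
    file_format: str
    size_bytes: int
    storage_location: str
    owner: Optional[str] = None                  # party named as owner in the contract
    export_deadline_days: Optional[int] = None   # days after contract end
    deletion_proof: Optional[str] = None         # one of ACCEPTED_DELETION_PROOFS


def contract_gaps(artifact: Artifact) -> List[str]:
    """Return the playbook items this artifact's contract terms leave open."""
    gaps = []
    if not artifact.owner:
        gaps.append("owner not named")
    if artifact.export_deadline_days is None:
        gaps.append("no numeric export deadline")
    if artifact.deletion_proof not in ACCEPTED_DELETION_PROOFS:
        gaps.append("deletion not tied to verification")
    return gaps


def export_due_date(contract_end: date, artifact: Artifact) -> Optional[date]:
    """Date by which the export must be delivered, or None if no deadline is set."""
    if artifact.export_deadline_days is None:
        return None
    return contract_end + timedelta(days=artifact.export_deadline_days)


if __name__ == "__main__":
    # Made-up example rows; names, sizes, and formats are placeholders.
    inventory = [
        Artifact("ticket-embeddings", "embeddings", "parquet",
                 2_000_000_000, "vendor-cloud"),
        Artifact("support-adapter", "weights_or_adapters", "safetensors",
                 150_000_000, "vendor-cloud", owner="customer",
                 export_deadline_days=30, deletion_proof="wipe_certificate"),
    ]
    for item in inventory:
        print(item.name, contract_gaps(item) or "no gaps found")
```

Any artifact that returns a non-empty list is a clause still to be negotiated. The check covers only what the contract says on paper; it does not verify that the vendor actually exported or deleted anything.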
Closing.
Private AI starts with owning the data path — then owning processing outputs. If that stays fuzzy, you pay twice: once to the vendor, once to counsel later.
Before any embedding or fine-tuning project this month, send the vendor a two-row table: who owns the file, who owns deletion rights. If they cannot answer on one page, the answer is rarely in your favor.
Frequently asked questions.
- Are embeddings personal data? They can be if re-identification is easy; treat them cautiously and involve counsel [1].
- Can I port weights to another model? It depends on the base model's license and your contract; do not assume [4].
- What about cloud? Demand region clauses for embeddings as you do for raw data [2].
- How does this tie to RAG? Embeddings are part of the index; read the RAG guide.
- What is the first vendor question? Show me the embedding path from document to vector — on one page.
Sources.
[1] Sultanate of Oman — Personal Data Protection Law (Royal Decree 6/2022).
[2] Sultanate of Oman — Executive Regulation to the Personal Data Protection Law (Ministerial Decision 34/2024).
[4] Hugging Face — Model cards and licensing guidance.
[5] Nuqta — internal IP boundary notes for training outputs, April 2026.