SLIIT · Department of Information Technology · April 2026

Small Language Models
for Sri Lankan Legal Applications

AI-Powered Legal Intelligence — Accessible, Accurate & Affordable

This research presents an AI-driven framework leveraging Small Language Models (SLMs) integrated with Retrieval-Augmented Generation (RAG) and agentic architectures to democratize legal knowledge in Sri Lanka. The system spans four specialized domains: Labour & Employment law guidance, Property & Family law advisory, Criminal case outcome prediction, and intelligent Deed document verification.

4 Research Components · 4 Team Members · 93% System Accuracy · SLM Technology Core

Research Domain

A comprehensive exploration of legal AI systems tailored for the Sri Lankan legal context, combining SLMs, RAG frameworks, and agentic workflows.

Background & Literature Survey

The legal system plays a critical role in maintaining justice, fairness, and social order. In Sri Lanka, however, legal knowledge is largely confined to professionals or locked in complex texts, creating significant barriers for ordinary citizens. Labour disputes, property transfers, family matters, and criminal litigation all require specialized understanding that most citizens lack.

Recent advancements in AI, particularly Natural Language Processing (NLP), offer promising solutions. The digitization of court judgments and legal documents has created opportunities for computational analysis. Research in Legal NLP has evolved from keyword-based systems to sophisticated transformer-based models like BERT, LEGAL-BERT, and domain-specific LLMs.

However, most existing systems — LawLLM (US), LawGPT (China), Swiss-BERT variants — are jurisdiction-specific and computationally expensive, limiting their applicability to Sri Lanka's unique legal ecosystem, which blends Roman-Dutch law, English common law, and customary traditions.

Key Findings from Literature

LLM-based approaches demonstrate strong reasoning but suffer from jurisdictional overfitting, high computational cost, and hallucination risks. They are not directly transferable to Sri Lanka.

RAG-based systems improve factual grounding but often lack structured output generation, validation mechanisms, and user-friendly interfaces essential for non-expert users.

Small Language Models offer a compelling balance — lower computational overhead, efficient fine-tuning via LoRA/QLoRA, and strong domain adaptation capabilities when trained on curated legal datasets.
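
The efficiency claim for LoRA comes down to simple arithmetic. LoRA freezes a pretrained weight matrix W of shape d × k and learns a low-rank update ΔW = BA, where B is d × r and A is r × k, so only r(d + k) parameters are trained instead of d·k. The sketch below computes that ratio for one matrix; the 4096 × 4096 shape and rank 16 are illustrative assumptions, not the configuration used in this project.

```python
def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters in LoRA's low-rank factors B (d x r) and A (r x k)."""
    return r * (d + k)


def full_finetune_params(d: int, k: int) -> int:
    """Parameters updated when the dense d x k weight matrix is fine-tuned directly."""
    return d * k


# Illustrative, assumed values: one 4096 x 4096 projection matrix and LoRA rank 16.
d, k, r = 4096, 4096, 16
lora = lora_trainable_params(d, k, r)
full = full_finetune_params(d, k)
print(f"LoRA trains {lora:,} of {full:,} parameters ({lora / full:.2%}) for this matrix")
```

Because the adapter cost grows with d + k rather than d·k, the relative saving increases with matrix size. QLoRA trains the same kind of adapters but additionally stores the frozen base weights in 4-bit precision, which reduces GPU memory further.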

Feature                    Quick Check   LawRec       Legal Query RAG   Our System
Transformer Models         No            Yes (BERT)   Yes               Yes
RAG Integration            No            No           Yes               Yes
Sri Lankan Focus           No            No           No                Yes
Natural Language Queries   Partial       Partial      Yes               Yes
Structured Legal Output    No            No           Partial           Yes
Scalability                Medium        Medium       Medium            High

Identified Research Gaps

Despite significant advances in legal AI globally, critical gaps remain for the Sri Lankan context:

  • No fine-tuned transformer models specifically for Sri Lankan Labour, Property, Family, or Criminal Law
  • Absence of RAG frameworks adapted to Sri Lankan legal datasets
  • No structured criminal outcome classification studies for Sri Lanka
  • Inability of existing systems to process natural language queries with high precision in the Sri Lankan legal context
  • Lack of structured legal output (Act name, Section, Year, Case references)
  • No scenario-based or step-by-step legal explanation for non-experts
  • Absence of deed verification systems tailored to Sri Lankan property documents
  • Limited accessibility for non-expert users due to complex interfaces

Jurisdictional Concentration Problem

Most Legal NLP research concentrates on the United States Supreme Court, European Court of Human Rights, Chinese criminal courts, and Swiss Federal Supreme Court. These systems benefit from well-digitized databases and large labeled datasets. Sri Lanka, with its hybrid Roman-Dutch and English common law system, represents a significantly underexplored jurisdiction with unique challenges: limited digitized data, multilingual content (Sinhala, Tamil, English), inconsistent document formats, and no standard benchmark datasets.

Research Problem Statement

Despite the increasing need for efficient and accessible legal information systems, Sri Lanka currently lacks AI-driven legal frameworks that integrate modern NLP, transformer-based models, and Retrieval-Augmented Generation — specifically tailored for its unique legal domains.

This absence creates significant barriers to legal accessibility, reduces efficiency in legal research and decision-making, and contributes to inequality in access to legal knowledge among citizens and professionals. A survey of 40 participants (lawyers, law students, general public) revealed:

  • Over 90% of respondents identified the need for an intelligent legal decision-support system
  • Significant delays in accessing relevant legal documents
  • Difficulty interpreting fragmented and technical legal language
  • Lack of user-friendly systems supporting natural language queries

Main Objective

To develop specialized AI-based legal assistance systems for Sri Lankan law domains that provide reliable, context-aware, and structured legal guidance — combining fine-tuned Small Language Models with retrieval-augmented mechanisms grounded in authoritative legal sources.

Specific Objectives

  • Transform unstructured Sri Lankan legal resources into structured, machine-readable training datasets
  • Fine-tune transformer-based SLMs (Qwen3-8B, Qwen3-1.7B, LEGAL-BERT-SMALL) on domain-specific datasets
  • Design consistent structured output formats ensuring legal elements (Act, Section, Year) are always present
  • Integrate RAG frameworks grounding outputs in verified legal documents via FAISS vector search
  • Enable practical recommendations for real-world workplace, property, family, and criminal disputes
  • Develop scalable deployment pipelines ensuring low-latency, production-level accessibility
  • Build user-friendly web interfaces accessible to non-expert users
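
The objective of always emitting Act, Section and Year can be enforced with a post-generation check. The sketch below is a minimal illustration, assuming the model returns a JSON object with hypothetical keys `act`, `section`, `year` and `explanation`; the field names, types and year range are assumptions, not the project's actual schema. It also computes a schema-compliance rate of the kind reported for the Labour Law system.

```python
import json
from typing import Iterable

# Hypothetical output schema: required keys and their expected Python types.
REQUIRED_FIELDS = {"act": str, "section": str, "year": int, "explanation": str}


def is_compliant(raw_output: str) -> bool:
    """True if a model response is a JSON object carrying every required legal field."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    for key, expected_type in REQUIRED_FIELDS.items():
        value = data.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected_type) or isinstance(value, bool):
            return False
        if isinstance(value, str) and not value.strip():
            return False  # an empty string does not count as "present"
    # Assumed plausibility range for an enactment year.
    return 1800 <= data["year"] <= 2100


def compliance_rate(outputs: Iterable[str]) -> float:
    """Share of responses that pass is_compliant (0.0 for an empty batch)."""
    outputs = list(outputs)
    if not outputs:
        return 0.0
    return sum(is_compliant(o) for o in outputs) / len(outputs)
```

Non-compliant responses can then be regenerated or flagged before they reach a user, rather than relying on the model to follow the format every time.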

System Methodology Overview

All four research components follow a unified, multi-layered methodology that integrates legal data engineering, model adaptation, retrieval design, system integration, and rigorous evaluation. An Agile development process supports iterative improvement, with measurable artifacts produced at each stage.

Phase 1 — Data Collection & OCR Processing

Weeks 1–4

Collection of legal materials from digital repositories, law books, and physical archives. OCR-based digitization of scanned documents with quality scoring. Multilingual handling (Sinhala, Tamil, English).
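
Multilingual handling typically begins by identifying which script a page or line is written in, so it can be routed to the appropriate OCR language model. The minimal sketch below uses Unicode block ranges (Sinhala U+0D80 to U+0DFF, Tamil U+0B80 to U+0BFF, ASCII letters for English). The punctuation set and the `recognised_ratio` quality proxy are assumptions for illustration, not the project's actual OCR quality scoring.

```python
from collections import Counter
from typing import Optional

# Unicode blocks for the two non-Latin scripts used in Sri Lankan legal texts.
SCRIPT_RANGES = {
    "sinhala": (0x0D80, 0x0DFF),
    "tamil": (0x0B80, 0x0BFF),
}
# Assumed set of punctuation that legitimately appears in legal text.
LEGAL_PUNCTUATION = set(".,;:()-/'\"")
# Zero-width joiners occur inside Sinhala conjuncts and are not OCR noise.
JOINERS = {"\u200c", "\u200d"}


def char_script(ch: str) -> Optional[str]:
    """Map a single character to 'sinhala', 'tamil', 'english', or None."""
    cp = ord(ch)
    for name, (lo, hi) in SCRIPT_RANGES.items():
        if lo <= cp <= hi:
            return name
    if ch.isascii() and ch.isalpha():
        return "english"
    return None


def dominant_script(text: str) -> Optional[str]:
    """Most frequent script among the letters in text, e.g. to pick an OCR language."""
    counts = Counter(s for s in map(char_script, text) if s is not None)
    return counts.most_common(1)[0][0] if counts else None


def recognised_ratio(text: str) -> float:
    """Crude OCR-quality proxy (assumed): share of non-space characters that are
    script letters, digits, joiners, or expected punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    ok = sum(
        1 for c in chars
        if char_script(c) is not None or c.isdigit() or c in LEGAL_PUNCTUATION or c in JOINERS
    )
    return ok / len(chars)
```

Pages with a low ratio are candidates for re-scanning or manual review before they enter the dataset.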

Phase 2 — Dataset Construction & Governance

Weeks 5–8

Cleaning, normalization, and JSONL formatting. Schema validation ensuring consistent instruction-context-output structure. Train/validation/test splitting with leakage prevention.
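
The two Phase 2 safeguards can be sketched together: each JSONL line is checked for the instruction-context-output structure, and records are split by source document rather than individually, so passages from the same judgment or Act never appear in both training and test data. The `source_id` field, the 80/10/10 ratios and the hash-based assignment are assumptions for illustration.

```python
import hashlib
import json
from typing import Iterable, Iterator

REQUIRED_KEYS = ("instruction", "context", "output")


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield records that follow the instruction-context-output schema.
    Raises ValueError naming the first malformed line."""
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        missing = [k for k in REQUIRED_KEYS if not isinstance(record.get(k), str)]
        if missing:
            raise ValueError(f"line {lineno}: missing or non-string keys {missing}")
        yield record


def split_for(source_id: str, train_pct: int = 80, val_pct: int = 10) -> str:
    """Assign a whole source document to one split by hashing its id.
    Splitting by document, not by record, is what prevents leakage."""
    bucket = int(hashlib.sha256(source_id.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + val_pct:
        return "validation"
    return "test"


def split_records(records: Iterable[dict]) -> dict[str, list[dict]]:
    splits: dict[str, list[dict]] = {"train": [], "validation": [], "test": []}
    for record in records:
        # "source_id" is a hypothetical field identifying the originating document.
        splits[split_for(record["source_id"])].append(record)
    return splits
```

Hashing makes the assignment deterministic, so re-running the pipeline after adding new documents does not reshuffle existing ones between splits.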

Phase 3 — Model Fine-Tuning

Weeks 9–14

LoRA/QLoRA-based fine-tuning using Unsloth. Domain adaptation of Qwen3-8B (labour law recommendation), Qwen3-1.7B (property and family law guidance), and LEGAL-BERT-SMALL (criminal outcome prediction). Structured output alignment training.

Phase 4 — RAG & Vector Indexing

Weeks 13–18

FAISS index construction from legal document embeddings. Document-diverse reranking. Agentic RAG with LangGraph orchestration: classify → retrieve → grade → generate → validate.
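
The classify → retrieve → grade → generate → validate flow is essentially a small state machine. The sketch below writes it in plain Python rather than LangGraph so that it runs without dependencies; in LangGraph each function would become a node and the retry would become a conditional edge. Every node is a hypothetical stub (keyword router, in-memory placeholder corpus, word-overlap grader, extractive generator), and the widen-the-search retry is an assumed policy, not necessarily how the project's graph is wired.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder corpus standing in for the per-domain FAISS indexes.
CORPUS = {
    "labour": ["Placeholder provision: an employer must give written notice before termination."],
    "property": ["Placeholder provision: a deed of transfer must be attested by a notary."],
}


@dataclass
class State:
    query: str
    domain: str = ""
    docs: list[str] = field(default_factory=list)
    answer: str = ""
    attempts: int = 0
    valid: bool = False


def tokens(text: str) -> set[str]:
    """Lower-cased content words (longer than three characters)."""
    return {w.strip(".,:;?").lower() for w in text.split() if len(w) > 3}


def classify(s: State) -> State:
    # Stand-in for a model that routes the query to a legal domain.
    s.domain = "property" if tokens(s.query) & {"deed", "land", "property"} else "labour"
    return s


def retrieve(s: State) -> State:
    s.attempts += 1
    if s.attempts == 1:
        s.docs = list(CORPUS.get(s.domain, []))
    else:  # assumed fallback: widen the search to every domain
        s.docs = [doc for docs in CORPUS.values() for doc in docs]
    return s


def grade(s: State) -> State:
    # Keep only documents that share at least one content word with the query.
    q = tokens(s.query)
    s.docs = [d for d in s.docs if q & tokens(d)]
    return s


def generate(s: State) -> State:
    # Stand-in for the fine-tuned SLM: answer only from graded context.
    s.answer = s.docs[0] if s.docs else "No grounded answer found."
    return s


def validate(s: State) -> State:
    # Stand-in check: valid only if the answer is grounded in retrieved text.
    s.valid = bool(s.docs) and s.answer in s.docs
    return s


def run(query: str, max_attempts: int = 2) -> State:
    s = classify(State(query))
    while True:
        s = grade(retrieve(s))
        if s.docs or s.attempts >= max_attempts:
            break  # relevant context found, or retry budget spent
    return validate(generate(s))
```

For example, `run("Who must attest a deed?")` is routed to the property domain and returns a grounded answer, while a query with no matching context ends with `valid` set to `False` after two retrieval attempts.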

Phase 5 — System Integration & Evaluation

Weeks 19–24

FastAPI backend with modular microservices. React frontend. Multi-layer evaluation (model-level, retrieval-level, system-level). End-to-end testing and iterative refinement.

Core Technologies

The research employs a carefully selected technology stack balancing capability, efficiency, and deployability.

Qwen3-8B · LEGAL-BERT-SMALL · LoRA / QLoRA · Unsloth · FAISS Vector DB · RAG Framework · LangGraph · FastAPI · PostgreSQL · React 18 · Sentence Transformers · PyTorch · Hugging Face · Modal (GPU) · Ollama · OCR Pipeline · Gemini Embeddings · AdamW Optimizer · TanStack Query · Tailwind CSS · Pydantic · LangChain
01

Labour & Employment Law Recommendation

SLM + RAG system accepting natural language queries, outputting structured legal recommendations with applicable Act, Section, Year, and analogous case scenarios.

IT22322326 — E. Niruththika
02

Criminal Case Outcome Prediction

LEGAL-BERT-SMALL fine-tuned on 890 Sri Lankan criminal judgments (2021–2025) for multi-class outcome classification — convicted, acquitted, sentence reduced, etc.

IT22049322 — Abiramy.T
03

Property & Family Law Guidance

Agentic RAG system providing step-by-step legal guidance for Property Law and Family Law — fine-tuned Qwen3-1.7B with 4,700+ structured JSONL entries.

IT22177032 — E.S. Mathusigan
04

Deed Document Verification Agent

Multi-agent template matching for 5 deed types (Sale, Gift, Mortgage, Power of Attorney, Testamentary). 99.13% classification accuracy with rule-based legal validation.

IT22030412 — A. Thuvaraga
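
Rule-based legal validation after classification can be pictured as a per-deed-type checklist applied to the fields extracted from a document: the classifier decides the deed type, and the rule layer checks that the fields that type needs are present. The field names and checklists below are hypothetical, chosen only to illustrate the mechanism; they are neither the project's validation rules nor a statement of Sri Lankan conveyancing requirements.

```python
# Hypothetical required fields per deed type; illustrative only, not legal requirements.
COMMON_FIELDS = {"parties", "execution_date", "attestation"}
REQUIRED_BY_TYPE = {
    "sale": COMMON_FIELDS | {"consideration", "property_description"},
    "gift": COMMON_FIELDS | {"property_description", "acceptance"},
    "mortgage": COMMON_FIELDS | {"loan_amount", "property_description"},
    "power_of_attorney": COMMON_FIELDS | {"powers_granted"},
    "testamentary": COMMON_FIELDS | {"beneficiaries"},
}


def validate_deed(deed_type: str, extracted: dict[str, str]) -> list[str]:
    """Return the sorted list of required fields that are missing or blank.
    An empty list means the deed passed this rule layer."""
    if deed_type not in REQUIRED_BY_TYPE:
        raise ValueError(f"unknown deed type: {deed_type!r}")
    return sorted(
        name for name in REQUIRED_BY_TYPE[deed_type]
        if not extracted.get(name, "").strip()
    )
```

Keeping the checklists as data rather than code means a legal reviewer could amend the rules for one deed type without retraining or touching the classifier.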

Project Milestones

Track the progression of our research through key assessment milestones and deliverables.

Project Proposal

Initial research proposal outlining problem statement, objectives, and planned approach

August 2025
Completed / Submitted

The project proposal established the foundational research framework for all four components. It defined the research problem — the lack of AI-driven legal systems tailored for Sri Lanka — and proposed an integrated approach combining Small Language Models with RAG architectures.

  • Defined research objectives and scope for all four sub-projects
  • Conducted preliminary literature review across Legal NLP, SLMs, and RAG systems
  • Proposed system architectures for each domain component
  • Identified data sources: Sri Lankan court databases, law books, regulatory documents
  • Received supervisor approval from Dr. Prasanna Sumathipala and Ms. Karthiga Rajendran

Progress Presentation I

First progress evaluation demonstrating initial implementation and data preparation

November 2025
Completed / Evaluated

The first progress presentation demonstrated the data pipeline, initial model experiments, and early system prototypes for all four research components.

  • Labour Law: FAISS vector database with 6,313 embeddings (100 documents indexed)
  • Criminal Law: 890-case dataset collected and structured into JSON format
  • Property/Family Law: 4,700 JSONL dataset entries prepared from legal materials
  • Deed Verification: 1,500+ deed samples labeled across 5 deed types
  • Initial fine-tuning experiments completed with baseline evaluations
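
A FAISS flat inner-product index over unit-normalised embeddings performs an exact cosine-similarity search. Because 6,313 chunks come from only 100 documents, a few long documents can dominate the top results, which is what document-diverse reranking addresses. The dependency-free sketch below does the search by brute force and then caps how many chunks any single document may contribute; the cap of two and the toy 2-D vectors are assumptions, and in the real pipeline the vectors would come from an embedding model and the search from FAISS.

```python
import math
from typing import NamedTuple


class Chunk(NamedTuple):
    doc_id: str
    text: str
    vector: tuple[float, ...]


def normalise(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v) if norm else tuple(v)


def search(query_vec, chunks, k=10):
    """Exact top-k by cosine similarity, i.e. inner product on unit vectors."""
    q = normalise(query_vec)
    scored = [(sum(a * b for a, b in zip(q, normalise(c.vector))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def diverse_rerank(scored, top_n=5, per_doc_cap=2):
    """Keep the highest-scoring chunks, but at most per_doc_cap from any one document."""
    taken, per_doc = [], {}
    for score, chunk in scored:
        if per_doc.get(chunk.doc_id, 0) < per_doc_cap:
            taken.append((score, chunk))
            per_doc[chunk.doc_id] = per_doc.get(chunk.doc_id, 0) + 1
        if len(taken) == top_n:
            break
    return taken
```

With three near-identical chunks from one Act and one from another, a cap of two lets the second document into the top three instead of a third, largely redundant passage from the first.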

Progress Presentation II

Second evaluation showing system integration, testing results, and refined models

January 2026
Completed / Evaluated

Demonstrated functional prototypes with integrated RAG pipelines, agent-based workflows, and initial evaluation metrics across all components.

  • Labour Law system: 93.5/100 end-to-end score, 100% schema compliance, 90% retrieval accuracy
  • Criminal system: LEGAL-BERT-SMALL achieving 67% accuracy, 0.61 Macro F1 across 11 classes
  • Property/Family Law: RAG system achieving the best balance of accuracy and usability among the evaluated configurations
  • Deed Agent: Fine-tuned classifier reaching 99.13% accuracy across 5 deed types
  • FastAPI backends operational with full LangGraph orchestration
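
The criminal-prediction results report both accuracy and Macro F1 because, on an imbalanced 11-class problem, the two measure different things: accuracy is dominated by the frequent outcomes, while Macro F1 averages per-class F1 so that rare outcomes count equally. A small dependency-free sketch of both metrics follows; the toy labels are made up for illustration and are not the project's data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN) over every label seen."""
    labels = set(y_true) | set(y_pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true class t was missed
    f1s = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1s.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


# Toy, made-up labels: a model that always predicts the majority outcome.
y_true = ["convicted"] * 8 + ["acquitted"] * 2
y_pred = ["convicted"] * 10
```

Here the majority-class predictor reaches 80% accuracy but only about 0.44 Macro F1, since it never identifies an acquittal; this is why the two numbers are reported together.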

Final Assessment

Complete system evaluation, final report submission, and comprehensive demonstration

April 2026
Submitted / April 2026

Final submission of all four research components with complete documentation, evaluation reports, and fully deployed web applications.

  • All four final reports submitted: Labour Law, Criminal Prediction, Property/Family Law, Deed Verification
  • Complete system testing: 103 tests executed, with an overall score of 420/450 (93%, Excellent grade)
  • Deployed web applications for all components with React frontends and FastAPI backends
  • Comprehensive evaluation reports with multi-layer validation methodology
  • Research paper manuscripts prepared for academic submission

Research Viva

Oral defense and examination of the research work by panel

TBD — 2026
Upcoming / Scheduled

The research viva will involve a comprehensive oral examination by an academic panel evaluating the depth, validity, and significance of all four research components.

  • Presentation of full system capabilities and research contributions
  • Technical defense of methodology, model choices, and evaluation metrics
  • Discussion of limitations, ethical considerations, and future scope
  • Demonstration of live system across all four legal domains

Project Documents

All research documents produced throughout the project lifecycle, available as PDF files.

📋

Project Charter

Formal project initiation document outlining scope, stakeholders, objectives, and governance structure for all four research components.

PDF · Charter Available
📄

Project Proposal Document

Comprehensive research proposal covering literature review, problem statement, research objectives, methodology, and feasibility analysis.

PDF · Proposal Available
🧾

Proposal Report — Labour Law (IT22322326)

E. Niruththika's individual project proposal report for the Labour & Employment Law recommendation system.

PDF · Proposal Available
🧾

Proposal Report — Criminal Prediction (IT22049322)

Abiramy.T's individual project proposal report for the criminal case outcome prediction system.

PDF · Proposal Available
🧾

Proposal Report — Property & Family Law (IT22177032)

E.S. Mathusigan's individual project proposal report for the property and family law guidance system.

PDF · Proposal Available
🧾

Proposal Report — Deed Verification (IT22030412)

A. Thuvaraga's individual project proposal report for the deed document verification agent.

PDF · Proposal Available
📊

Final Report — Labour Law (IT22322326)

E. Niruththika's final research report on the Labour and Employment Law Recommendation System using Qwen3-8B + RAG.

PDF · Final Report Available
⚖️

Final Report — Criminal Prediction (IT22049322)

Abiramy.T's final research report on criminal judicial outcome prediction using LEGAL-BERT-SMALL on Sri Lankan High Court judgments.

PDF · Final Report Available
🏠

Final Report — Property & Family Law (IT22177032)

E.S. Mathusigan's report on step-by-step legal guidance for Property and Family Law using Agentic RAG with Qwen3-1.7B.

PDF · Final Report Available
📜

Final Report — Deed Verification (IT22030412)

A. Thuvaraga's report on the multi-agent deed template matching system achieving 99.13% classification accuracy.

PDF · Final Report Available

Checklist Documents

Assessment checklists and progress-tracking documents for all project milestones and deliverables.

PDF · Checklist Available
📈

Status Document — Progress Report

Consolidated progress status document covering all four sub-projects with current development milestones and results summary.

PDF · Status In Progress

Presentation Slides

Slide decks from all research presentations across the project lifecycle.

Proposal Presentation

Initial research proposal presentation — problem, objectives, methodology overview

Progress Presentation I

First milestone presentation — dataset preparation, initial models, early results

Progress Presentation II

Second milestone — integrated systems, evaluation metrics, refined architectures

Final Presentation

Complete research findings, system demonstrations, conclusions, and future scope

Our Team

A dedicated research team from the Department of Information Technology, Sri Lanka Institute of Information Technology (SLIIT), working to make legal knowledge accessible to all Sri Lankans.

Supervisor

Dr. Prasanna Sumathipala

Department of Information Technology

Sri Lanka Institute of Information Technology

Co-Supervisor

Ms. Karthiga Rajendran

Department of Information Technology

Sri Lanka Institute of Information Technology

E. Niruththika
IT22322326

B.Sc. (Hons) Information Technology

Research Focus: Labour & Employment Law Recommendation System — Fine-tuned Qwen3-8B with FAISS-based RAG for structured legal recommendations including Act, Section, and Year identification.

✉ it22322326@my.sliit.lk
Abiramy.T
IT22049322

B.Sc. (Hons) Information Technology

Research Focus: Criminal Case Outcome Prediction — LEGAL-BERT-SMALL fine-tuned on 890 Sri Lankan criminal judgments for 11-class judicial outcome classification (67% accuracy, 0.61 Macro F1).

✉ it22049322@my.sliit.lk
E.S. Mathusigan
IT22177032

B.Sc. (Hons) Information Technology

Research Focus: Property & Family Law Step-by-Step Guidance — Qwen3-1.7B with Agentic RAG (LangGraph), 4,700+ JSONL training samples, three-backend comparative evaluation (SLM / RAG / Agentic RAG).

✉ it22177032@my.sliit.lk
A. Thuvaraga
IT22030412

B.Sc. (Hons) Information Technology

Research Focus: Deed Document Template Matching Agent — Multi-agent SLM system for 5 deed types (Sale, Gift, Mortgage, Power of Attorney, Testamentary). 99.13% classification accuracy with rule-based legal validation.

✉ it22030412@my.sliit.lk

Get In Touch

We'd love to hear from you

For research enquiries, collaboration opportunities, or questions about our legal AI systems, please reach out through any of the following channels.

🏛
Institution

Sri Lanka Institute of Information Technology (SLIIT)

📚
Department

Department of Information Technology

🌐
Project Website

cdap.sliit.lk

📧
Research Supervisor

Dr. Prasanna Sumathipala — SLIIT

📅
Academic Year

2025 / 2026 — Final Year Research Project