Edge-Ready Semantic Enrichment via Quantization, Pruning, and Distillation

Authors

  • Javad Rahebi, Department of Computer Engineering, Isfahan University, Isfahan, Iran

Keywords:

On-device NLP, edge AI, quantization, pruning, knowledge distillation, entity linking, approximate nearest neighbors, energy-efficiency

Abstract

Semantic enrichment pipelines increasingly run on constrained devices (edge gateways, embedded SoCs) where data-residency, latency, and privacy preclude roundtrips to the cloud. Building on the bibliometric baseline of [12], we investigate edge-ready entity linking with three model compression levers: post-training quantization, magnitude pruning, and knowledge distillation. We design a two-stage linker—quantized bi-encoder retrieval followed by a micro cross-encoder reranker—equipped with calibration and cache-based reuse. Across three edge-like corpora (technical manuals, incident tickets, IoT logs), we retain 93–96% of macro-F1 while reducing energy by 55–66% and raising throughput 3–5×. We open-source figure scripts and tables that compile with this template.
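The two-stage linker described in the abstract can be illustrated with a minimal sketch: symmetric int8 post-training quantization of entity embeddings for the bi-encoder retrieval stage, followed by a rerank over the top-k candidates. All names, vectors, and the scoring stub below are hypothetical stand-ins, not the paper's actual implementation; a real deployment would use an ANN index and a trained micro cross-encoder instead of the brute-force scan and callback used here.

```python
# Hypothetical sketch of a two-stage entity linker: int8 post-training
# quantization of entity embeddings for bi-encoder retrieval, then a
# (stubbed) cross-encoder rerank over the top-k candidates.

def quantize_int8(vec):
    """Symmetric post-training quantization: floats -> int8 codes plus a scale."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0
    return [round(x / scale) for x in vec], scale

def approx_dot(q, scale_q, d, scale_d):
    """Approximate dot product recovered from the int8 codes and scales."""
    return scale_q * scale_d * sum(a * b for a, b in zip(q, d))

# Toy entity table: name -> float embedding (stand-ins for encoder output).
entities = {
    "router_ac1200": [0.9, 0.1, 0.0],
    "sensor_th22":   [0.1, 0.8, 0.2],
    "gateway_gx":    [0.7, 0.3, 0.1],
}
index = {name: quantize_int8(v) for name, v in entities.items()}

def link(mention_vec, rerank, k=2):
    q, sq = quantize_int8(mention_vec)
    # Stage 1: quantized bi-encoder retrieval (brute force here; ANN in practice).
    candidates = sorted(index, key=lambda n: -approx_dot(q, sq, *index[n]))[:k]
    # Stage 2: cross-encoder rerank (stub scoring callback stands in for the model).
    return max(candidates, key=rerank)

best = link([0.85, 0.15, 0.05], rerank=len)
print(best)  # → router_ac1200
```

Storing only int8 codes and one scale per vector cuts the index footprint roughly 4x versus float32, which is what makes the retrieval stage fit on edge-class memory budgets.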

Published

2023-08-18

Section

Articles

How to Cite

Edge-Ready Semantic Enrichment via Quantization, Pruning, and Distillation. (2023). International Journal of Industrial Engineering and Construction Management (IJIECM), 1(1), 22-32. https://www.ijiecm.com/index.php/ijiecm/article/view/63
