Linpeng Tang

Hi, I am an AI researcher and engineer at the Shanghai Institute of Advanced Algorithms Research, focusing on the intersection of LLMs and data systems (Data-Centric AI).

Previously, I co-founded the AI startup Moqi Technology and served as its CTO, where I led the development of MyScaleDB, a highly efficient SQL-vector AI database, and of large-scale label-free fingerprint systems. Prior to that, I consulted for the systems team at Meta (Facebook), developing high-performance systems that supported massive global multimedia distribution.

I have a long-standing commitment to the deep integration of AI and systems. I hold a Ph.D. in Computer Science from Princeton University, where I was advised by Prof. Kai Li, and my work has been recognized with honors including the WAIC SAIL Award and 1st place in KDDCup.

Latest Posts

04/26/26 | Agentic RL: A New Paradigm for Self-Evolving Large Models (Part I)

A deep dive into Agentic Models, exploring the necessity of Reinforcement Learning, the evolution of reward engineering, and the progression of RL algorithms.

Experience

Shanghai Institute of Advanced Algorithms Research
Data & AI Center | 2024 – Present

Moqi Technology
Co-founder & CTO | 2016 – 2024

Meta
Research Consultant | 2013 – 2016

HP Labs Beijing
Research Intern | 2011 – 2012

Education

Princeton University
Ph.D. in Computer Science | 2012 – 2018
Advisor: Prof. Kai Li (Member of the National Academy of Engineering, Foreign Member of the Chinese Academy of Engineering)

Shanghai Jiao Tong University
B.S. in Computer Science, ACM Class | 2008 – 2012

Products & Projects

Data-Centric AI Platform

2024 – Present

  • Led overall product architecture design and key project delivery, guiding the team to build a new generation of Agentic LLM & Data infrastructure.
  • Pioneered the development and implementation of a multimodal data intelligence pipeline system based on agents and the DataFlow data preparation framework. With 150+ built-in intelligent operators, it supports automated pipeline orchestration through natural-language conversation, enabling efficient and flexible processing of massive heterogeneous data.
  • To address the high-risk hallucination problem of LLMs in scientific and industrial scenarios, built a high-fidelity data synthesis and feedback system based on multi-level environments (including rule filtering, knowledge graphs, simulations, and external system verification).
  • Disrupted the traditional data engineering paradigm that consumes 90% of human effort, significantly lowering the barrier to producing AI-ready datasets. Deployed in multiple benchmark scenarios such as industrial manufacturing, multimodal corpus management, and scientific corpora, substantially reducing the cost for enterprises to build specialized agents and LLMs.

MyScaleDB AI Database

2020 – Present

  • Responsible for defining the product's technical architecture and leading core vector search algorithm design and core engine R&D, creating a world-leading open-source AI database system.
  • Pioneered the concept of an AI database in the industry, achieving integrated management and joint retrieval of PB-level structured and unstructured data (vectors, graphs, text, spatio-temporal, etc.) within a single SQL kernel built on a columnar data engine.
  • Developed the MSTG vector engine in-house and tightly integrated it with a high-performance NVMe SSD memory caching mechanism for software-hardware co-optimization. Achieved a 10x increase in vector data storage density while keeping complex joint queries at millisecond-level latency.
  • Deployed in large-scale knowledge base construction for industrial manufacturing, AI for Science, and financial decision support, providing exceptional cost-effectiveness for massive corpora, and widely used in a global SaaS offering.

Contactless Fingerprint & Palmprint Capture Device

2018 – 2022

  • Led the product definition of the world's first large-area, high-quality contactless fingerprint and palmprint capture terminal. Guided the team to overcome core technical challenges such as 3D reconstruction and complex optical image enhancement.
  • Combined binocular vision with a self-developed structured light system to achieve sub-millimeter 3D reconstruction of fingers. Introduced multi-source, multi-band optical designs and deep learning image enhancement algorithms to overcome ambient light interference.
  • Addressed the pain points and technical bottlenecks of traditional contact-based capture by launching contactless capture terminals, driving a generational upgrade in biometric capture hardware for security applications.

Massive Fingerprint Identification System

2015 – 2022

  • Responsible for core system architecture design and deep learning model R&D for massive fingerprint and palmprint matching.
  • Pioneered a multi-scale vector representation scheme and introduced an Active Deep Learning mechanism to drive model self-optimization and iteration. With joint CPU and GPU acceleration, overcame the technical bottleneck of indexing multi-scale features at 100-billion scale.
  • Improved the speed, accuracy, and automation of massive complex biometric feature retrieval by over 100 times. Successfully deployed at the National Fingerprint Center, generating significant social impact.

Video Popularity Prediction System

2015 – 2017

  • Responsible for high-performance algorithm design and implementation for Facebook's massive-scale video traffic trend prediction.
  • Developed a high-performance time-series probabilistic prediction model and tightly integrated it with the underlying video compression strategy workflow and real-time cache scheduling pipeline.
  • Achieved accurate real-time prediction of large-scale video popularity, improving prediction accuracy by over 10%. Supported Facebook in adopting smarter video compression and conversion schemes and more efficient cache scheduling, reducing system resource consumption while improving the viewing experience on the platform.

RIPQ Caching System

2013 – 2015

  • Responsible for core algorithm design and system implementation of a large-scale cache scheduling system based on SSD storage.
  • Pioneered the Restricted Insertion Priority Queue (RIPQ) caching algorithm, which addresses at a fundamental level the write amplification caused by non-sequential writes and the sharp performance drops that traditional cache eviction mechanisms incur on Solid State Drives (SSDs).
  • Built a next-generation caching system with extremely low write amplification and high throughput. Deployed in Facebook's global CDN edge nodes and core caching systems, increasing cache hit rates by over 20% in large-scale concurrent environments, reducing network request latency, and saving substantial bandwidth costs.

Awards & Honors

  • First Prize, World Artificial Intelligence Conference (WAIC) (2024)
  • First Prize, HICOOL Global Entrepreneur Summit (2022)
  • Best Student Paper Award, CIKM (2012)
  • 1st Place, KDDCup Data Mining Competition (2012)
  • Fu Di Scholarship (2011)
  • National Scholarship (2010)
  • Schneider Scholarship (2009)