Limin Fu (傅利民)

Welcome to my home page. My name is Limin Fu, and this page introduces me and my work. I have a multidisciplinary background spanning mathematics, physics, computer science and bioinformatics. I have broad interests and enjoy using programming to build interesting and challenging things.

Some Work

  • Dao - Programming Language and Virtual Machine
  • DaoStudio - Integrated Development Environment
  • DaoGraphics - Graphics Engine
  • CD-HIT - High Performance Sequence Clustering
  • FLAME - Data Clustering Algorithm


I am from south-central China (Huarong, in northern Hunan, about 20 km from the Yangtze River).

I studied Mathematics at Fudan University (Shanghai) from 1996 to 2000. I then spent about a year in Fudan's graduate program in Physics before moving to ICTP (Trieste, Italy) in 2001 for a multidisciplinary master's program. Two years later, I went to the University of Turin (Italy) for my PhD, doing bioinformatics research in a lab at the university's Institute for Cancer Research and Treatment.

After finishing my PhD at the end of 2007, I worked for about a year and a half in the group led by Prof. Riccardo Zecchina at the ISI Foundation (Turin, Italy); the group has since moved to the Polytechnic of Turin. In 2009 I moved to UC San Diego as a postdoctoral researcher. In 2015, I joined the Research Center for High Performance Computing at the Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences.

Education:

  • 2003-2007, University of Turin (Italy), Bioinformatics (PhD)
  • 2001-2003, ICTP/SISSA (Trieste, Italy), Modeling and Simulation (MS)
  • 2000-2001, Fudan University (Shanghai), Physics
  • 1996-2000, Fudan University (Shanghai), Mathematics (BS)

Work Experience:

  • University of California, San Diego, USA
    Postdoctoral Associate, 2009-2013

    Developed a novel parallelization technique that achieves quasi-linear speedup on multi-core machines for CD-HIT, a highly sequential sequence clustering algorithm; the technique is general and can be used to accelerate other algorithms with similar sequential characteristics;

    Developed a new genome sequence assembly algorithm based on a probabilistic model, as well as a series of efficient tools for sequence clustering and duplicate detection;

    Worked on new de novo RNA-Seq assembly and quantification algorithms;

    Developed an LLVM-based just-in-time (JIT) compiler and a Clang-based tool for automatically wrapping C/C++ libraries; also developed a graphics engine supporting arbitrary-precision 2D vector graphics and basic 3D graphics, as well as several modules and tools for the Dao language.

  • Institute for Scientific Interchange (ISI), Turin, Italy
    Junior Researcher, 2008

    Developed a nonparametric version of the Belief Propagation algorithm (NBP) that represents continuous probability distributions with Gaussian mixtures, together with an efficient Gaussian Mixture Reduction (GMR) algorithm;

    Successfully applied this NBP algorithm to the stereo matching problem in computer vision and to the alignment of 2D electrophoresis gel images;

    Continued designing and developing the Dao programming language, adding many new features and improvements, and developed a preliminary version of a Qt4-based integrated development environment.

  • Institute for Cancer Research and Treatment (IRCC, from the Italian abbreviation), University of Turin, Italy
    PhD Student, 2003-2007

    Worked on clustering algorithms and microarray gene expression data analysis;

    Developed a novel fuzzy clustering algorithm named FLAME (Fuzzy clustering by Local Approximation of MEmberships). FLAME builds a neighborhood graph of the data points and computes each point's optimal fuzzy membership vector by local approximation over this graph, so that, in a way similar to LLE (Locally Linear Embedding), the clustering result preserves the local structure of the original data when mapping from the feature space to the fuzzy membership space;

    Developed a gene expression data analysis and visualization software with a graphical interface, which implements FLAME and several other standard clustering algorithms, as well as functional analysis based on Gene Ontology;

    Started the open source Dao programming language project and designed and implemented a register-based virtual machine, on top of which I implemented optional type annotations with type inference, BNF-like syntax macros, and concurrent garbage collection.

  • The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy
    MS Student, 2001-2003

    Worked on the identification of "transcription modules" in large gene expression datasets;

    Worked on modeling and optimizing the decaffeination process (summer project).
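In nonparametric belief propagation, multiplying Gaussian-mixture messages multiplies their numbers of components, so some Gaussian Mixture Reduction (GMR) step is needed to keep mixtures small. The sketch below is a minimal 1-D illustration of one standard GMR scheme, not necessarily the algorithm developed at ISI. Components are assumed to be `(weight, mean, variance)` triples. Pairs are merged in a moment-preserving way, and the pair to merge is chosen with a Runnalls-style KL-divergence bound. The names `merge`, `merge_cost` and `reduce_mixture` are hypothetical.

```python
import math
from itertools import combinations


def merge(c1, c2):
    """Moment-preserving merge of two weighted 1-D Gaussians (w, mean, var)."""
    w1, m1, v1 = c1
    w2, m2, v2 = c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    # Merged variance = weighted within-component variance + spread of the means.
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def merge_cost(c1, c2):
    """Runnalls-style upper bound on the KL discrepancy caused by merging c1 and c2."""
    w, _, v = merge(c1, c2)
    return 0.5 * (w * math.log(v) - c1[0] * math.log(c1[2]) - c2[0] * math.log(c2[2]))


def reduce_mixture(components, target):
    """Greedily merge the cheapest pair until only `target` components remain."""
    comps = list(components)
    while len(comps) > target:
        i, j = min(combinations(range(len(comps)), 2),
                   key=lambda p: merge_cost(comps[p[0]], comps[p[1]]))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps
```

For example, reducing `[(0.25, 0.0, 1.0), (0.25, 0.1, 1.0), (0.5, 10.0, 1.0)]` to two components merges the two nearly identical Gaussians near 0 and keeps the distant one. The overall weight and mean of the mixture are preserved.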

Open Source Projects:

Programming has been an integral part of my work (and my hobby :)) since graduate school, and over the years I have created a number of open source projects. The largest is the Dao programming language, and most of the others are related to it; building modules, bindings and tools for Dao has led me to work with many different kinds of libraries and applications.

Dao - Programming Language and Virtual Machine

Dao is a lightweight, optionally typed programming language with many interesting features, including features that make concurrent programming much simpler. It has well-designed C programming interfaces for easy embedding and extending.

  • Optional typing with type inference and static type checking;
  • Object-Oriented Programming (OOP) with classes and interfaces;
  • Code section methods as a better alternative to functional methods;
  • Native support for concurrent programming;
  • Concurrent garbage collection;
  • Support for closures and anonymous functions;
  • Designed and implemented as a register-based virtual machine;
  • Portable implementation using standard C;
  • Simple C programming interfaces for easy embedding and extending;

Size: about 60K lines of C code.
Links: daoscript.org, github.com/daokoder/dao.
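Dao is described above as a register-based virtual machine. The toy interpreter below is a generic illustration of that idea. It is not Dao's actual instruction set or implementation, and all opcode names (`LOADK`, `ADD`, `LE`, `JMPF`, `JMP`, `RET`) are hypothetical. Each instruction names its operand and result registers directly, so `total += i` is a single `ADD` rather than the push/push/add/store sequence a stack machine would need.

```python
def run(program, num_registers=8):
    """Interpret a list of register-machine instructions; return the RET value."""
    regs = [0] * num_registers
    pc = 0
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "LOADK":        # LOADK dst, constant
            dst, k = args
            regs[dst] = k
        elif op == "ADD":        # ADD dst, a, b  ->  regs[dst] = regs[a] + regs[b]
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "LE":         # LE dst, a, b  ->  regs[dst] = regs[a] <= regs[b]
            dst, a, b = args
            regs[dst] = regs[a] <= regs[b]
        elif op == "JMPF":       # JMPF cond, target  ->  jump if regs[cond] is false
            cond, target = args
            if not regs[cond]:
                pc = target
        elif op == "JMP":        # JMP target
            (target,) = args
            pc = target
        elif op == "RET":        # RET src
            (src,) = args
            return regs[src]
        else:
            raise ValueError(f"unknown opcode {op}")


# Sum 1..10. Registers: r0 = i, r1 = n, r2 = total, r3 = constant 1, r4 = loop test.
SUM_PROGRAM = [
    ("LOADK", 0, 1),     # 0: i = 1
    ("LOADK", 1, 10),    # 1: n = 10
    ("LOADK", 2, 0),     # 2: total = 0
    ("LOADK", 3, 1),     # 3: r3 = 1
    ("LE", 4, 0, 1),     # 4: r4 = i <= n
    ("JMPF", 4, 9),      # 5: leave the loop when r4 is false
    ("ADD", 2, 2, 0),    # 6: total += i
    ("ADD", 0, 0, 3),    # 7: i += 1
    ("JMP", 4),          # 8: back to the loop test
    ("RET", 2),          # 9: return total
]
```

Calling `run(SUM_PROGRAM)` returns 55.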

DaoJIT - Just-In-Time (JIT) Compiler Using LLVM

ClangDao - Automatic Binding Tool Using Clang Frontend

ClangDao is a tool that automates the generation of Dao language bindings from the header files of C/C++ libraries. It uses the Clang (clang.org) frontend to parse header files, and it has been used to generate bindings for over a dozen libraries.

Size: about 8K lines of C++ code.
Links: daoscript.org, github.com/daokoder/dao-tools.

DaoStudio - Integrated Development Environment for Dao

DaoStudio is an integrated development environment for Dao. It uses the Qt4 framework.

Size: about 11K lines of C++ code.
Links: daoscript.org, github.com/daokoder/daostudio.

DaoGraphics - Dao Graphics Engine

DaoGraphics is a lightweight graphics engine written in C with interfaces to Dao.

  • Support for both 2D and 3D graphics;
  • Resolution-independent 2D vector graphics;
  • Support for animation, particle systems, terrain generation, etc.;
  • Support for OpenGL 3.1+ and OpenGL ES3;
  • Minimal dependencies (Dao and GLFW3);

Size: about 22K lines of C code.
Links: daoscript.org, github.com/daokoder/DaoGraphics.

DaoSQL - Module for Accessing SQL Databases

CD-HIT - DNA/Protein Sequence Clustering Program

CD-HIT is a popular program for clustering DNA/protein sequences, originally developed by Dr. Weizhong Li. It implements a greedy algorithm that clusters sequences incrementally and efficiently. I took over its development in 2009 and rewrote most of it. I also developed a novel parallelization technique that speeds up the clustering algorithm almost linearly on multi-core machines.

Links: cd-hit.org, cdhit.googlecode.com.
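The greedy incremental scheme can be sketched in a few lines. Sequences are processed from longest to shortest. Each one joins the first existing cluster whose representative is similar enough, or otherwise becomes a new representative. The sketch makes several assumptions:

- `identity` is a crude stand-in based on Python's `difflib`, not CD-HIT's word filtering and alignment.
- The batch scheme is one generic way to parallelize the sequential loop while producing exactly the same clusters as the serial version. It is not necessarily the technique used in CD-HIT.
- In CPython the threads give little real speedup for this pure-Python comparison because of the GIL, so only the structure is meant to carry over.

All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def identity(a, b):
    """Matched characters / length of the shorter sequence (crude stand-in)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def first_match(seq, reps, seqs, threshold):
    """Index of the first representative similar enough to seq, or None."""
    for r in reps:
        if identity(seqs[r], seq) >= threshold:
            return r
    return None


def greedy_cluster(seqs, threshold=0.9, batch_size=4, workers=4):
    """Greedy incremental clustering with batched parallel comparisons."""
    # Longest sequences first, so the longest sequence seeds the first cluster.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    reps, assignment = [], {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            frozen = list(reps)  # representatives fixed before this batch
            # Parallel phase: compare every batch member with the frozen representatives.
            hits = list(pool.map(
                lambda i: first_match(seqs[i], frozen, seqs, threshold), batch))
            # Sequential phase: in batch order, check representatives created
            # earlier in this batch, which the serial algorithm would also have seen.
            new_reps = []
            for i, hit in zip(batch, hits):
                if hit is None:
                    hit = first_match(seqs[i], new_reps, seqs, threshold)
                if hit is None:
                    new_reps.append(i)
                    hit = i
                assignment[i] = hit
            reps.extend(new_reps)
    return reps, assignment
```

Because every sequence still sees the representatives in the same order as in the serial loop, `batch_size=1` and any larger batch size give identical clusters. Only the comparisons against already-fixed representatives run concurrently.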

FLAME - Fuzzy clustering by Local Approximation of MEmberships

This FLAME is NOT the malware that surfaced in 2012 :); it is a data clustering algorithm I developed a few years ago. FLAME defines clusters in the dense parts of a dataset and performs cluster assignment solely based on the neighborhood relationships among objects.

Links: flame-clustering.googlecode.com.
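The steps behind FLAME can be illustrated with a simplified sketch:

1. Build a k-nearest-neighbor graph and estimate each object's density as the inverse of its mean distance to its neighbors.
2. Take as cluster supporting objects (CSOs) the objects that are denser than all their neighbors, and give each CSO a fixed, full membership in its own cluster.
3. Repeatedly set every other object's fuzzy membership vector to a weighted average of its neighbors' memberships (the local approximation).
4. Assign each object to the cluster with the highest membership.

This is an assumption-laden illustration rather than the reference implementation. The outlier group of the full algorithm is omitted, and the inverse-distance neighbor weights are an arbitrary choice. The function name `flame` and its parameters are hypothetical.

```python
import math


def flame(points, k=3, iterations=100):
    """Simplified FLAME sketch (no outlier group); returns one label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Step 1: k-nearest-neighbour graph and a density estimate per point.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    density = [k / (sum(dist[i][j] for j in knn[i]) + 1e-12) for i in range(n)]
    # Step 2: cluster supporting objects are denser than all of their neighbours.
    csos = [i for i in range(n) if all(density[i] > density[j] for j in knn[i])]
    if not csos:
        raise ValueError("no cluster supporting objects found")
    c = len(csos)
    # CSOs get fixed one-hot memberships; every other point starts uniform.
    memb = [[1.0 / c] * c for _ in range(n)]
    for label, i in enumerate(csos):
        memb[i] = [1.0 if m == label else 0.0 for m in range(c)]
    fixed = set(csos)
    # Neighbour weights: inverse distance, normalised to sum to 1 (assumption).
    weights = []
    for i in range(n):
        raw = [1.0 / (dist[i][j] + 1e-12) for j in knn[i]]
        total = sum(raw)
        weights.append([w / total for w in raw])
    # Step 3: local approximation of memberships, updated in place.
    for _ in range(iterations):
        for i in range(n):
            if i in fixed:
                continue
            memb[i] = [sum(w * memb[j][m] for w, j in zip(weights[i], knn[i]))
                       for m in range(c)]
    # Step 4: each point goes to the cluster with the highest membership.
    return [max(range(c), key=lambda m: memb[i][m]) for i in range(n)]
```

Take two well-separated groups, each made of four corners of a unit square plus its center. With `k=4`, each center is the densest point in its neighborhood and becomes a CSO. All points of the first group receive label 0 and all points of the second group receive label 1.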

GEDAS - Gene Expression Data Analysis Studio



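DaoStudio - Integrated Development Environment for Dao

DaoStudio is an integrated development environment for the Dao language. It is based on a server-client model and provides a debugger, syntax highlighting in both the editor and the console, and a VIM-like editing mode. It is developed with the Qt4 framework.

Size: about 11K lines of C++ code.
Links: daoscript.org, github.com/daokoder/daostudio.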


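DaoMake - Dao-Based Build Tool

DaoMake is a build tool similar to CMake, but based on Dao. Its main advantages are a simple and convenient syntax, a relatively clean design, easy customization for different build platforms and environments, and automatic generation of install and uninstall targets.

Size: about 4K lines of C code.
Links: same as Dao (distributed as a standard Dao tool in dao/tools/daomake).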


Fields of Interest:

Machine learning, pattern recognition, combinatorial optimization, computer graphics, computer vision and their applications in bioinformatics (and other exciting fields).

Mostly as a hobby, I am also interested in programming language design and implementation. Recently I have also found robotics very interesting; maybe it will become another major hobby of mine :).





Please see my Google Scholar page, or its mirror in China.

Preferred Tools

  • Computer Desktop: Mac OS X;
  • Programming Platform: Linux (Unix-like);
  • Programming Languages: C, C++ and Dao;
  • GUI/Application Framework: Qt4;
  • Version Control System: Fossil;
  • Text/Code Editor: VIM;
  • Image Editor: GIMP;
  • Document Preparation System: LaTeX;

Preferred TV Series

I have a habit of playing TV series while programming at home; I find it good for my eyes :). These are my favourite series:

  • The Big Bang Theory;
  • Friends;
  • Star Trek (The Original Series and The Next Generation);
  • Stargate SG-1;

Things I like

  • Paintings: I have loved paintings since childhood, so much that I have even seen paintings in my dreams! My favourite paintings are by Van Gogh;
  • SF Novels: My favourite is I, Robot by Isaac Asimov. It is one of the few books I have managed to read in Italian :);
  • Chinese Go: I sometimes feel it is the best board game, and the kind of game that should be introduced to geek communities.


Copyright (C) 2013-2016, Limin Fu