Welcome to my home page. My name is Limin Fu.
This page introduces me and my work.
I have a multidisciplinary background spanning mathematics, physics,
computer science and bioinformatics.
I am interested in many things, and I enjoy programming to build cool things.
Project areas: programming language and virtual machine; integrated development environment; graphics engine; high-performance sequence clustering; data clustering algorithm.
Keep a simple heart, do simple things, and deal simply with the world.
I am from south central China (Huarong in northern Hunan, about 20 km from the Yangtze River).
In 1996, I started studying Mathematics at Fudan University (Shanghai). I then studied Physics in graduate school for about a year before moving, in 2001, to ICTP (Trieste, Italy) to join a multidisciplinary master's program. Two years later, I went to the University of Turin (Italy) for my PhD, working in a lab located in a cancer research center.
After my PhD, I worked for about a year and a half in the group led by Prof. Riccardo Zecchina at the ISI Foundation (Turin, Italy). In 2009 I moved to UC San Diego, and the group moved to the Polytechnic University of Turin. In 2015, I joined the Research Center for High Performance Computing at the Shenzhen Institutes of Advanced Technology (SIAT).
University of California, San Diego, USA
Postdoctoral Associate, 2009-Now
Developed a novel parallelization technique for CD-HIT that can achieve quasi-linear speedup on multi-core machines;
Developed tools for sequence clustering, duplicate detection, etc.;
Worked on RNA-Seq assembly, quantification, etc.;
Institute for Science Interchange (ISI), Turin, Italy
Junior Researcher, 2008
Worked on a nonparametric belief propagation (NBP) algorithm based on a Gaussian mixture reduction (GMR) algorithm.
Applied and tested this NBP algorithm on the computer vision stereo matching problem and on 2D electrophoresis gel image alignment.
Institute for Cancer Research and Treatment (IRCC, from the Italian acronym), University of Turin, Italy
PhD Student, 2003-2007
Worked on clustering algorithms and microarray gene expression data analysis;
Developed a novel fuzzy clustering algorithm named FLAME that preserves the local structure of a dataset when mapping from the feature space to the fuzzy membership space, in a way similar to LLE (Locally Linear Embedding);
Developed software with a graphical interface implementing FLAME and several other clustering algorithms, along with several gene expression data analysis and visualization techniques.
The Abdus Salam International Centre for Theoretical Physics (ICTP), Trieste, Italy
MS Student, 2001-2003
Worked on the identification of "transcription modules" in large gene expression datasets;
Worked on decaffeination process modeling and optimization (summer project).
Postdoctoral Associate, 2009-2013
Institute for Science Interchange (ISI), Turin, Italy
Junior Researcher, 2008
Institute for Cancer Research and Treatment (IRCC), University of Turin, Italy
PhD Student, 2003-2007
Programming has been an integral part of my work (and my hobby :)) since graduate school,
so over the years I have created a number of open source projects.
Most of them are related to Dao, but in an attempt to create various
modules and bindings for Dao, I expanded the projects to
touch various types of libraries and applications.
Dao is a lightweight, optionally typed programming language with many interesting features, including some that make concurrent programming much simpler. It has well-designed programming interfaces for easy embedding and extension.
DaoJIT is a standard module for Dao that provides Just-In-Time (JIT) compilation. It is based on LLVM (llvm.org).
ClangDao is a tool that automates the generation of Dao language bindings from the header files of C/C++ libraries. It uses the Clang (clang.org) frontend to parse header files. It has been used to generate bindings for over a dozen libraries.
DaoStudio is an integrated development environment for Dao. It uses the Qt4 framework.
DaoGraphics is a lightweight graphics engine written in C with interfaces to Dao.
DaoSQL allows mapping Dao classes to SQL database tables, and provides a simpler way to access those tables through Dao class instances. Currently it supports PostgreSQL, MySQL (MariaDB) and SQLite3 backends.
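The class-to-table mapping that DaoSQL provides can be illustrated outside Dao with a small Python analogue built on the standard `sqlite3` and `dataclasses` modules. This is only a sketch of the idea and does not reproduce DaoSQL's API; the helper names `create_table`, `insert` and `select_all` and the example `Gene` class are hypothetical. Each dataclass field becomes a table column, and rows are read back as class instances.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields
from typing import List, Type, TypeVar

T = TypeVar("T")

# Python type -> SQLite column type (only simple scalar fields supported).
SQL_TYPES = {int: "INTEGER", float: "REAL", str: "TEXT"}


def create_table(conn: sqlite3.Connection, cls: type) -> None:
    """Create a table named after the class, with one column per field."""
    cols = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    conn.execute(f"CREATE TABLE IF NOT EXISTS {cls.__name__} ({cols})")


def insert(conn: sqlite3.Connection, obj: object) -> None:
    """Store one instance as a row; values go through ? placeholders."""
    names = [f.name for f in fields(obj)]
    marks = ", ".join("?" for _ in names)
    conn.execute(f"INSERT INTO {type(obj).__name__} ({', '.join(names)}) "
                 f"VALUES ({marks})", astuple(obj))


def select_all(conn: sqlite3.Connection, cls: Type[T]) -> List[T]:
    """Read every row back, in insertion order, as an instance of cls."""
    names = ", ".join(f.name for f in fields(cls))
    rows = conn.execute(f"SELECT {names} FROM {cls.__name__} ORDER BY rowid")
    return [cls(*row) for row in rows]


@dataclass
class Gene:  # hypothetical example record
    id: int
    symbol: str
    expression: float


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_table(conn, Gene)
    insert(conn, Gene(1, "TP53", 2.5))
    print(select_all(conn, Gene))
    # [Gene(id=1, symbol='TP53', expression=2.5)]
```

Table and column names are interpolated into the SQL text because they come from the class definition, while row values are always passed as parameters.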
CD-HIT is a popular program for clustering DNA/protein sequences, originally developed by Dr. Weizhong Li. It implements a greedy algorithm that clusters sequences incrementally and efficiently. I took over its development in 2009 and rewrote most parts. In addition, I developed a novel parallelization technique that speeds up the clustering algorithm almost linearly on multi-core machines.
This FLAME is NOT the malware that surfaced last year :); it is a data clustering algorithm I developed a few years ago. FLAME defines clusters in the dense parts of a dataset and performs cluster assignment solely based on the neighborhood relationships among objects.
GEDAS is software I developed alongside the FLAME algorithm. It provides user-friendly interfaces for analyzing and visualizing gene expression data. Note: GEDAS is no longer maintained.
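The greedy incremental scheme can be illustrated with a short Python sketch. It is not CD-HIT's implementation: CD-HIT computes sequence identity with a short-word filter followed by alignment, whereas here a hypothetical `similarity` function (the fraction of the shorter sequence's k-mers found in the longer one) stands in for it. The names `greedy_incremental_cluster`, `similarity` and `threshold` are illustrative assumptions.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(short: str, long: str, k: int = 3) -> float:
    """Hypothetical stand-in for sequence identity: the fraction of the
    shorter sequence's k-mers that also occur in the longer sequence."""
    short_kmers = kmers(short, k)
    if not short_kmers:  # sequence shorter than k
        return 1.0 if short in long else 0.0
    return len(short_kmers & kmers(long, k)) / len(short_kmers)


def greedy_incremental_cluster(seqs: Dict[str, str],
                               threshold: float = 0.9) -> Dict[str, List[str]]:
    """Visit sequences from longest to shortest. Each sequence joins the
    first representative it is similar enough to; otherwise it becomes
    the representative of a new cluster."""
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: Dict[str, List[str]] = {}  # representative -> members
    for name in order:
        for rep in clusters:
            if similarity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:  # no existing representative matched
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    seqs = {
        "s1": "MKTAYIAKQRQISFVKSHFSRQ",
        "s2": "MKTAYIAKQRQISFVKSHF",
        "s3": "GGGGCCCCAAAATTTT",
    }
    print(greedy_incremental_cluster(seqs))
    # {'s1': ['s1', 's2'], 's3': ['s3']}
```

Processing the longest sequences first means every representative is at least as long as the sequences compared against it, which is why the sketch measures similarity relative to the shorter sequence.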
Size: about 15K lines of C++ code.
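ClangDao is a tool that can automatically (or semi-automatically) generate Dao language bindings from the header files of C/C++ libraries. It uses the Clang (clang.org) frontend to parse C/C++ header files. It has been used to generate binding modules for more than a dozen libraries.
DaoStudio is an integrated development environment for the Dao language. It is based on a server-client model, includes a debugger, provides syntax highlighting in both the editor and the console, and supports a VIM-like editing mode. It is developed with the Qt4 framework.
DaoMake is a build tool similar to CMake, but based on Dao. Its main advantages are: simple and convenient syntax; a relatively clean design; easy customization according to the build platform and environment; and automatic generation of install and uninstall targets.
The project is hosted together with Dao, as a standard Dao tool (dao/tools/daomake). Size: about 4K lines of C code.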
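CD-HIT is a popular program for clustering DNA and protein sequences. It was originally developed by Dr. Weizhong Li and uses a greedy algorithm to perform efficient incremental clustering. Since 2009 I have been improving the algorithms in CD-HIT, and I have redesigned and reimplemented most of the program. In this work I developed a new parallelization technique that lets CD-HIT achieve quasi-linear speedup on multi-core computers.
FLAME is a new data clustering algorithm I developed during my PhD. Its novelty lies in building a neighborhood graph of the data points and computing the optimal fuzzy membership vector of each point by locally approximating the membership vectors over that graph; in this way, the final clustering result preserves the local topological structure of the original data.
GEDAS is software for the analysis and visualization of gene expression data, which I developed alongside the FLAME algorithm. It supports several standard clustering algorithms and some functional analyses based on Gene Ontology. GEDAS is no longer maintained.
Size: about 15K lines of C++ code.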
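The FLAME procedure (dense-region cluster seeds, neighborhood graph, local approximation of fuzzy memberships) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the original implementation: it uses a brute-force k-nearest-neighbor search, takes each point's density as the inverse of its mean distance to its k neighbors, and uses normalized inverse-distance neighbor weights; these choices, the function name `flame_sketch` and its parameters are hypothetical. Points denser than all their neighbors seed the clusters with fixed full membership, sparse points below `outlier_threshold` are fixed to an outlier group, and every other point's membership vector is repeatedly replaced by the weighted average of its neighbors' vectors.

```python
import math
from typing import Dict, List, Sequence


def flame_sketch(points: Sequence[Sequence[float]], k: int = 3,
                 outlier_threshold: float = 0.0,
                 iterations: int = 100) -> List[int]:
    """Simplified FLAME-style fuzzy clustering (illustrative sketch).

    Returns one label per point: a cluster index, or -1 for outliers.
    Assumes len(points) > k.
    """
    n = len(points)
    eps = 1e-12  # guards against division by zero for duplicate points

    # Step 1: k-nearest-neighbor graph (brute force) and local density.
    neighbors = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        neighbors.append(dists[:k])
    density = [1.0 / (sum(d for d, _ in nb) / k + eps) for nb in neighbors]

    # Cluster seeds: points denser than all of their neighbors.
    seeds = [i for i in range(n)
             if all(density[i] > density[j] for _, j in neighbors[i])]
    # Outliers: sparser than all neighbors and below the threshold.
    outliers = [i for i in range(n)
                if density[i] < outlier_threshold
                and all(density[i] < density[j] for _, j in neighbors[i])]

    m = len(seeds) + 1  # one column per cluster; last column = outlier group
    fixed: Dict[int, int] = {i: c for c, i in enumerate(seeds)}
    fixed.update({i: m - 1 for i in outliers})

    # Seeds and outliers get fixed full memberships; others start uniform.
    memb = []
    for i in range(n):
        if i in fixed:
            row = [0.0] * m
            row[fixed[i]] = 1.0
        else:
            row = [1.0 / m] * m
        memb.append(row)

    # Normalized inverse-distance neighbor weights (an assumption).
    weights = []
    for nb in neighbors:
        inv = [1.0 / (d + eps) for d, _ in nb]
        total = sum(inv)
        weights.append([w / total for w in inv])

    # Step 2: local approximation; each free point's membership vector is
    # replaced by the weighted average of its neighbors' vectors.
    for _ in range(iterations):
        new_memb = []
        for i in range(n):
            if i in fixed:
                new_memb.append(memb[i])
                continue
            row = [0.0] * m
            for w, (_, j) in zip(weights[i], neighbors[i]):
                for c in range(m):
                    row[c] += w * memb[j][c]
            new_memb.append(row)
        memb = new_memb

    # Step 3: assign each point to the group with the highest membership.
    labels = []
    for row in memb:
        best = max(range(m), key=lambda c: row[c])
        labels.append(-1 if best == m - 1 else best)
    return labels
```

Because the fuzzy memberships are kept until the last step, a point lying between two dense regions keeps mixed membership values, which is what lets the approach reflect the local structure rather than only hard boundaries.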
Machine learning, pattern recognition, combinatorial optimization, computer graphics, computer vision and their applications in bioinformatics (and any other exciting fields).
Mostly as a hobby, I am also interested in programming language design and implementation. Recently I have also found robotics very interesting; it may become another hobby of mine :).
I have a habit of playing TV series while programming at home; I find it good for my eyes :). The following are my favorite TV series: