Research of I-Chen Wu

(updated on January 1, 1997)

  • Research interests
  • Developed software packages
  • Current research plans
  • Research experiences by 1996
  • Past research experiences at CMU (for Ph.D)
  • Past research experiences at NTU (for MS)

  • Researh Interests

    Internet/Network Programming and Network Centric Computing
    Multimedia Computing
    Parallel/distributed programming and computing
    Parallel complexity and algorithms


    Developed Systems and Software Packages

    High performance computing:
    A dynamic load balancing package for branch-and-bound and divide-and-conquer problems, 1993-1994.
    Distributed Information computing:
    A WWW authoring package, 1994-6.
    Multimedia computing:
    A high performance software MPEG player, 1995-6.
    Generic game servers over Internet:
    Game providers can easily provide games on top of this system, 1996-7
    Multimedia computing:
    An audio broadcast system over Internet, 1996-7.

    Research Experiences by 1996

    My research experiments before 1996.

    Distributed Information Systems

    With the emergence of world wide computer networks, the demand of accessing information all over the world has increased tremendously. Under such a demand, many global information systems have been developed, such as FTP, Gopher, Wais, Archie, Webster, and WWW. Among them, WWW (World Wide Web) is the most attractive because it allows different protocols to be used and provides hypermedia capacity. Besides, WWW can integrate a general protocol for many users and offer multimedia functionality. Interested readers can also read my article which introduces WWW. In this area, I have worked on a WWW authoring system.

    Multimedia Computing

    In this part, I am especially interested in MPEG decoding/encoding techniques. Our Lab has successfully designed a very fast MPEG-1 decoder, which can compete with Xing's product. We also find the work full of interesting research topics. Our future work is to research on the techniques of encoding and to extend the current work to MPEG-2.

    Parallel Programming System

    In my Ph.D. work, I have successfully proposed a new approach, called multilist scheduling, which can greatly reduce the effort of developing dynamic load balancing routines for parallel/distributed programs. For this approach, I did not only examine it from the aspect of theory, but I also pratically implemented it. The success of this work will have a significant impact on parallel programming and will change the way people write parallel/distributed programs.

    The first version of this system (with 20,000 lines of C++ code) is operational at CMU. At this point, although the system has not been formally publicized, this work has attracted attention of many reseachers, which are requesting/using the code of this system and committing the collaborative work. For example, the Nectar group and the EDRC center at CMU are using this software package for solid modeling problems. In the Northeast parallel architecture center at Syracuse, one group (led by Dr. Nikos Chrisochoides) will use this package to develop many important simulation problems and plan to have collaborative work with me, in CSIE, NCTU. Researchers at Harvard Univ., as well as some other institutes, are planning to port my system on their high-speed network systems and parallel systems. With the very encouraging response, I plan to continue to work on the topic and any related topics with any interested colleagues and students. I can also expect fruitful research results and many good Ph.D. thesis topics in this research direction.

    I will extend the work in the following directions.

    Apply this new system to more applications. For example,
  • A large number of the operations research (OR) applications in industry, such as traveling salesperson problem.
  • Difficult dyanmic alpha-beta search problems, such as computer Go (³ò´Ñ) and Computer Chess (or Chinese Chess, ¶H´Ñ).
  • A large number of simulation applications, e.g., weather prediction and wind tunnel simulation (in the defense industry).
  • Those computations in a distributed computing environment or database environment.
  • A large number of real-time applications, such as sonar-detection (in the defense industry).
  • Others.
  • Investigate more language/interface issues.
    For example,
  • The technique to pass parameters (such as a tree or a list) to a remote procedure.
  • The technique to handle global variables (consider some techniques used in the ISIS system).
  • Add fault tolerance to this system.
    It is especially important for a network to tolerate processor failure. For example, a remote workstation may be turned off unexpectedly.
    Support some tools on top of this package.
    For example,
  • Debugging tool.
  • Graphical assistance tool (help design parallel programs).
  • Performance monitoring (help us monitor the performance).
  • Port our system to other parallel systems.
    For example,
  • Ethernet (avaiable everywhere).
  • FDDI (available in the High-Speed Computing Center (HSCC) in Taiwan).
  • Nectar (CMU).
  • CM5 (available in NPAC, Syracuse).
  • Cray (available in HSCC(?) or at NTU)
  • nCube (available at NCTU(?) and Acadamia)
  • Port the programming system to other network interfaces.
    For example,
  • PVM. (My current work.)
  • Sockets of TCP/IP.
  • Active Message (proposed at Berkeley)
  • Integrate the scheduling part of our system with operating systems (OSs).
    For example, can OS directly support load balancing on my system?

    Other Interesting Topics

    Theory:
  • Study the theory for task scheduling.
  • Processor-load Tradeoff for parallel divide-and-conquer.
  • Parallel Selection Problem.
  • Many other problems.
  • Computer Games (e.g., ³ò´Ñ).
  • Parallelize a two-player program based on my system.
  • Design a strong Go (³ò´Ñ) program. (Note the second item is my side interest. Any ³ò´Ñ-holic is welcome to work in this area.)

  • Past Research Experiences at CMU

    The following sections will summarize the research that I have done at CMU over the past 7 years. Most of my research was involved in the Warp and Nectar projects at CMU. Warp is a 100-FLOPS systolic array machine. Nectar is a high-bandwidth and low-latency computer network for connecting various hosts, e.g., Cray, iWarp, and workstations. The Nectar network is made of fiber-optic links, large crossbar switches, and dedicated network coprocessors, called CABs. The current Nectar prototype uses 100 Mbits/s links and 16x16 crossbar switches.

    Multilist Scheduling: A New Parallel Programming Model (Thesis Work)

    Parallel programming requires task scheduling to optimize performance; this primarily involves balancing the load over the processors. In many cases, it is critical to perform task scheduling at runtime. For example, (1) in many parallel applications the task load cannot be accurately predicted a priori; (2) in a network-based multicomputer the computational power of each processor may not remain constant. In order to support dynamic task scheduling, the programmer usually needs to design and implement a complex set of scheduling routines, e.g., routines for maintaining task lists and handling interprocessor communication for load balancing. Unfortunately, it is very difficult and time-consuming to write and debug all of these scheduling routines.

    This thesis proposes a new approach which can greatly reduce the effort of developing efficient dynamic task scheduling routines. In our new approach, we decompose task scheduling into two parts --- the specification of scheduling policies and the implementation of supportive scheduling operations --- and then hide the latter from the programmer. We call this approach multilist scheduling, because it is based on a uniform scheduling model involving the use of multiple scheduling lists.

    This thesis analyzes three main features of the new multilist scheduling model: ease of use, generality, and efficiency.

    Ease of use:
    Programmers only need to specify scheduling policies, not the details of supportive scheduling routines. Typically, this involves writing only tens of lines of C code, as opposed to tens of thousands of lines of code for the supportive scheduling routines.
    Generality:
    We show that this model results in no loss of generality. We also illustrate the generality of the model by designing several scheduling algorithms, including those for parallel divide-and-conquer (D&C) and best-first search (BFS).
    Efficiency:
    We propose some efficient techniques for implementing scheduling lists, and also show that our general approach incurs no significiant performance overhead, at least for the parallel D&C and BFS scheduling algorithms. In addition, we also demonstrate good performance results for some applications that are based on parallel D&C and BFS, such as the set covering problem.

    Traditionally, it has been difficult to efficiently support both parallel D&C and BFS in a uniform framework. We believe that our system is the first system that do so.

    Multilist scheduling is the first model which can hide the details of dynamic task scheduling routines while simultaneously supporting general task scheduling. I expect that this model will have a significant impact on parallel programming, especially in the domain of multicomputers. In fact, researchers in the Engineering Design Research Center (EDRC) at CMU have, already, started using my system to develop parallel programs for their solid modeling systems. I expect my system will be widely used in the near future.

    Parallel D&C (1991)

    I proposed a new scheduling algorithm for parallel D&C. This scheduling algorithm minimizes the communication cost while balancing the load well. I also proved that, among all the scheduling algorithms that can split the load nearly evenly, my scheduling algorithm is optimal with respect to the communication cost. This algorithm is a very good example for showcasing the system proposed in my thesis.

    Thread and Register Windows: a Case Study (Fall 1990)

    Several computer architectures provide register windows to support fast procedure calls. Large numbers of registers, however, increase the size of the on-chip process state, and thus the process switch time. This overhead is negligible for Unix-style processes, but it can be significant for light-weight processes or threads. In this work, I quantified the effect of register windows on the performance of threads and interrupt handlers. The context is the nectar software on CAB, which is built around the SPARC CPU. I also looked at a number of possible strategies in which register windows are used to support fast thread switching. For some alternative strategies, I had to use modified Gnu C-Compilers to produce the code and had to trace the code of C-threads in Mach OS. All the experiments were performed based on the CAB functional simulator.

    Parallelizing Noodles (Fall 1990)

    Noodles, a solid modeling program developed in EDRC, uses a sophisticated data structure with complicated links (pointers) to represent a solid model. In order to parallelize this complex system, I implemented remote procedure calls in a way similar to the Sun RPC package and the Matchmaker package in Mach OS. I also implemented routines to balance the load at runtime. Basically, the load balancing routine can be viewed as a client/server application: the load balancer server and each processor client. Finally, I successfully demonstrated a speedup of 3.8 on 4 processors in a Nectar Demo meeting. Note that there were only 4 processors available on Nectar at that time. Since I spent about two months on implementing load balancing routines, this gave me a strong motivation to provide a parallel programming language/system to hide the details of load balancing routines in my thesis.

    Interprocessor Communication on Nectar (Spring 1989)

    In this project, I studied how the interprocessor communication systems of the 4.3 BSD UNIX OS, the Amoeba distributed OS and the V kernel could be implemented on Nectar. UNIX is a standard and widely used OS, while Amoeba and the V kernel are operating systems with fast communication protocols. Note that the interprocessor communication in Amoeba is based on the client/server model. In this study, I learned how to ``off-load'' protocols such as TCP/IP to the network coprocessor CAB and the implications of doing so.

    CAB Hardware (summer 1988)

    I was a member of the team building the CAB hardware. First, I implemented the functional simulator of CAB on Nectar. Since a CAB includes a Sparc CPU, I incorporated the Sparc functional simulator (obtained from Sun Microsystems) into my functional simulator. In addition, I wrote about 70 diagnostic programs to detect the correctness of the CAB hardware by checking the consistency between the CAB functional simulator and the CAB logical simulator. This experience greatly enhanced my knowledge of Nectar hardware (CAB) and hardware/software interface, thus enabling me to study threads and interprocessor communication on Nectar (see preceding paragraphs).

    Apply Compiler on Warp (1987-1988)

    Apply is a specialized programming language for low-level vision operations. I implemented the compiler to generate code on Warp by using the lex/yacc packages and the Lisp language. I also successfully developed an efficient algorithm, called cyclic scroll buffering, for Apply. The algorithm is very efficient, but hard to be written without the help of compiler. Interestingly, researchers in the Robotics Institute, at CMU, found that the performance of the compiled code are about 2-3 times faster than those of their hand-written code. As far as I know, this technique has been widely used in low-level vision programmers. I was also motivated to work on the parallel/distributed computing paradigm, from this point on.

    Graph Matching Problem on Warp (fall 1986)

    Warp was a participant in the DARPA Architecture Workshop Benchmark Study, which compared performance of various architectures for image processing. I implemented the graph matching problem, one of the benchmarks. Since the problem was too large and too complicated for Warp, it was almost impossible to implement it on Warp, especially without debugging tools. However, I noticed that my sequential program on a Sun3 workstation already outperformed other participants because I devised a new pruning technique for the graph matching problem. This work provided us with one good reason to build a network-based multicomputer, i.e., the current Nectar system (which I helped to build later).


    Past Research Experiences at NTU

    Area-Time Tradeoffs in the VLSI complexity

    I proposed a generic method to obtain the area-time tradeoffs for a class of problems, e.g., matrix multiplication, sorting, and convolution.

    Systolic Arrays

    I proposed a new transformation technique which the designers can use to simplify their design. I also designed some efficient systolic arrays for integer multiplication, and matrix multiplication.