Second Forum on Parallel Computing Curricula
Sunday, June 22, 1997
Newport, RI
We argue that an undergraduate course in parallel algorithms is an
excellent opportunity for the application of
an
Active Learning is a name for a collection of approaches to educating students that give the students a more active role in the learning process. The common element in these approaches is that the instructor is removed from his or her role of standing at the front of a classroom and presenting the material; rather, the students are placed into the position of teaching themselves, and the instructor is converted into a coach and a helper in this process.
The most fundamental justification for taking an active-learning approach to delivery of course material is the widely agreed-upon assertion that the degree to which students understand a concept is in direct proportion to the amount of personal energy they have expended in trying to master it. (We of course do not claim that this is the only element in a successful learning process.}. The importance of an active student role has long been accepted in mathematics and science education, as evidenced by the prominent role of the problem set in courses in these disciplines. Although almost all successful educational endeavors require an active role on the part of the students, in this paper we use the phrase active learning to refer to a more radical approach, in which a significant part of the entire educational process is placed, with sufficient guidance, in the hands of the student. Proponents of this approach argue that it leads to a better and more meaningful learning experience for the students [Baldwin96a,Davidson90,Joyce92].
A second justification for students learning in this fashion is that it more closely models what students will need to do when they leave the university and enter the workplace, whether it be in a development or research setting. Much of one's professional technical life involves identifying problems, learning somehow about appropriate relevant solution techniques, and applying and/or extending them. Usually a dedicated instructor is unavailable to aid in this process; we therefore greatly benefit our students by giving them significant experience with the learning technique that they must utilize for the rest of their lives.
An additional attractive element of this approach is that it is easily combined with group or collaborative learning , in which students are organized into groups to solve problems or develop projects. In our courses we did combine these two approaches, but we do not focus on the group element in this paper.
Baldwin notes that active learning, which he calls Discovery Learning , has been applied in other disciplines, but that, although ``opportunities for discovery are implicit in many of the activities computer scientists use to teach our discipline, those opportunities are seldom exploited,'' and therefore ``specific documented uses ... in college-level computer science are rare''[Baldwin96a]. He reports on his experiences in employing discovery learning in two separate courses in the CS curriculum; we discuss elements of his work throughout this paper.
This short paper has three goals, The first goal is to argue briefly that active learning is an important educational paradigm, and that the teaching of parallel and distributed algorithms is an excellent area in which to experiment with an active learning approach. The second goal is to describe our experience in the Fall of 1996 and Spring of 1997 in teaching courses based on this philosophy at Polytechnic University. The final goal is to seek feedback, criticism, and pointers to other such experiments.
We also note what this paper is not . It is not an argument that this is the best manner in which to teach parallel algorithms, nor is it an argument that parallel algorithms is the only topic in the undergraduate computer science curriculum that can be taught in such a fashion. It is not a scientific evaluation of the success or failure of the courses, and finally, it is certainly not complete in its references to relevant literature on both teaching parallel computing and on active learning.
In Section 2 we discuss our reasons as to why we believe that the design of parallel algorithms is an excellent topic to which to take an active learning approach. In Sections 3 and 4 we make this more concrete by discussing our course strategy and our experience with it.
As necessary background, we distinguish between two approaches to active learning, that we call the course-by-problem-set approach and the course-by-project approach. In both approaches the goal is for the students to learn a specified body of material, although the specification of the body is likely to be looser than in a traditional course.
In the course-by-problem-set approach, the instructor limits lectures to a minimum of necessary introductory and background material, and then prepares a well-designed set of exercises and problem sets through the successful solution of which the student learns the techniques and results of the field. Obviously we can not expect students to recreate on their own the cumulative efforts of years of scientific research in the course of a semester; the problem sets must break down complex results into appropriately-sized pieces, and the instructor must be vigilant to provide hints at appropriate times, lectures when necessary, and in general to interact with the students to keep them pointed in the correct direction. In this approach the majority of class time is used for the students to collaborate (in groups) on problem solving with the instructor present. In an ideal setting, the students in such a course would have little reading material available, beyond background information and clear summaries of what has been ``discovered'' as the semester progresses.
In the course-by-project approach, one or several significant projects are designed, which, in order to complete, the student must understand a significant subset of the course material. In this approach, as illustrated in the work of Baldwin [Baldwin96a], the instructor provides the students with a textbook, a list of secondary sources of information, and sets the students loose. Class meeting times are used for discussion initiated by the students.
It is our feeling that either active learning approach is most likely to succeed in delivering a course that meets the following criteria.
In the Fall of 1996 we taught a course based largely on these principles to 14 undergraduates at Polytechnic University; the students were juniors and seniors, some of whom had taken an introductory course on parallel and distributed systems, and some of whom had not. Programming assignments were carried out in MPI in a laboratory of 15 Sparc5 SUN workstations connected by ethernet. The goal of the course was to introduce the students to the basics of parallel algorithm design in the PRAM model, the logP model and the fixed-interconnection network model, and to complement this with several MPI programming assignments on our network of workstations. In addition, in the Spring of 1997 we taught a survey course on parallel and distributed computing to 30+ undergraduates (sophomores, juniors and seniors).
Due to a concern that the approach might ``bomb,'' we designed the first course to deliver some of the material in a traditional fashion and some of it in an active fashion; we omit discussion of the material that was presented in a traditional manner. Approximately 60% of class time was spent on ``active exercises'' during which the students worked in groups of 3-4 students; they were of course expected to meet in groups outside of class time as well. During these active exercises the instructor jumped from group to group, discussing ideas and providing pointers. Of the five active exercises, I would categorize four as from the ``course by problem set'' approach, and the last being a longer ``course by project'' exercise.
In the second course we only spent aproximately three-four weeks on algorithms, and used the active approach for much of this material, integrating elements of Exercises 1,2 and 4 into the course.We now briefly discuss the specific active exercises; this will hopefully make concrete our general statements to this point. In the Appendix we give some other examples of possible active exercises for the material. Copies of the actual handouts for these exercises are available at URL http://ebbets.poly.edu/PDC-lab/active.html
1. Sorting on a Linear Array: The first exercise covered the material contained in pp. 4-29 of the textbook by Leighton [Leighton91], although we attempted to keep the students uninformed about the source of the material. The students, after an introduction to the fixed-interconnection model, were asked to design and evaluate sorting algorithms on the linear array, and then to evaluate whether their algorithms were optimal. This latter part of the exercise led to a discussion of speedup and efficiency, and fundamental lower bounds in this model (I/O bandwidth, network diameter, bisection width) and then to a formalization of these notions.
This exercise was a huge success in both classes. All of the groups were successful in developing appropriate algorithms and in inferring, with a few pointers, what quantities to look at as the fundamental limitations of the computational ability of networks, and how to understand speedup and efficiency.
2. Sorting on a Mesh: We then moved on to a more complex problem: sorting efficiently on a mesh. Our hope was that the students would use the linear-array algorithms as a subroutine, and arrive at the idea of alternating phases of sorting the rows and sorting the columns (which is the basis of the shearsort algorithm [Leighton91], with some concrete ideas on how to analyze this. This exercise was somewhat of a dissapointment; some of this was my fault as I did not allocate enough class time to the students to really develop their ideas. The students did generate a number of interesting ideas, but noone really came up with an algorithm that they had any idea how to analyze, even with some helpful prompting from the instructor. Their discussions on how they might analyze their proposed algorithms, however, and their struggles with this issue, set the stage perfectly for the introduction of the 0-1 Sorting Lemma, and as a result the lecture on this topic was listened to intently.
Exercises 1 and 2 were chosen because they could be constructed as a series of sub-questions that build upon each other, and because they deal with a problem that the students knew well: sorting. We note however that, while many of the students attempted to apply sorting algorithms with which they were familiar from the sequential setting, for the array and mesh-based algorithms this, to some extent, leads them in the wrong direction. Although they recover from this rather quickly and generate other ideas, and the realization that one can't just apply sequential techniques and be done is a valuable one, in a future course I might start with an exercise that had a more direct application of sequential ideas, in order to start more gradually.
3. Summation and Broadcast in the logP Model: This exercise was a combination of theory and practice. In the context of their understanding of broadcast and summation in the PRAM model the students were asked to design optimal algorithms for these problems in the logP model and then to implement these algorithms in MPI and evaluate their performance against the MPI primitives for these problems.The major purpose of this exercise was to get the students to understand the logP model and its differences with the PRAM model, and to understand some of the issues in moving a theoretical result in a theoretical model to an implementation. (We note that any course that covers more than one model has many opportunities for ``active learning'' in having the students consider re-solving as problem in one model given the solution in a different model. We give another specific example in the Appendix.)
This exercise was a success; each group was able to come up with the correct algorithms, and some groups were able to come up with the correct recurrence equations that describe the running times of these algorithms.
4. Load Balancing in Parallel Loop Scheduling: The fourth active exercise involved the parallelization of nested loops such as the following:
FOR i = 1 TO n DO FOR j = 1 to m DO FOR k = 1 to p DO Execute (Independent) Iteration(i,j,k) // Different iterations may run for very different times.This problem amounts to a collection of independent tasks that must be assigned in a load-balanced fashion to the p processors. The goal of the exercise was for the students to develop and experiment with various algorithms for load balancing, and to implement them in MPI. Pizza and free books, the latter supplied by our friendly Addison-Wesley representative, were supplied to the 2 groups with the fastest implementations.
This exercise was ideally suited for active learning, for two reasons, First, there are a large variety of good intuitive methods for doing load balancing, so there is much lying around to be discovered.
Second, the traditional approaches to load balancing lend themselves to the sort of ``step-by-step'' exposition we discussed earlier. Consider, for example, centralized approaches with a global task queue. If there are N total iterations, assigning N/p to each processor is a low-overhead algorithm with potentially horrific load imbalance. In contrast, each processor taking one iteration at a time from the global task queue is a high-overhead algorithm with good load balance.
A good compromise is fixed-size chunking, in which each processor takes jobs in chunks of fixed size from the global queue [KruskalW85]. A further improvement to this approach is to have the processors take work in chunks whose size decreases throughout the computation, and there are a variety of approaches as to the best way to determine the chunk size [FlynnHS92,PolychronopoulosK87}]. Thus, this class of approaches provides a nice path of increasingly sophisticated results that the students may follow. A similar phenomenon can be described in distributed, local-control approaches to the problem.
This exercise was a great success; the students invested exceptional energy and came up with an assortment of approaches, some of which were quite clever. In addition, the exploration went in unexpected (but useful) directions. As one example, an artifact of the assignment was that, although the iterations were of varying size, the students could determine their relative running times, and they noted that in centralized approaches it was helpful to sort the iterations by running time first. This (unexpected) observation led me to present Graham's classical result that list scheduling is a 2-approximation algorithm for multiprocessor scheduling, whereas prior sorting of the list improves the performance guarantee to 4/3 [Graham66,Graham69] giving some more general analytical support to their observations.
5: Database Operations:In contrast to the first four exercises, which I see as of the ``course-by-problem set'' school, this last exercise was a longer one, and was an experiment with the ``course-by-project'' approach. I prepared a database-based project, which in order to complete the students would need to develop and implement a fast parallel sorting algorithm, a fast parallel join algorithm, and some distributed methods of dealing with mutual exclusion. It quickly became clear that this last part was too ambitious for the time remaining, and we cut back to the first two parts.
In addition, we handed out a copy of Blelloch et. al.'s 1991 paper about designing sorting algorithms for the CM-2 [BlellochLMPSZ91] and expected the students to try and absorb it on their own and use it as background in their implementations. We also handed out some information from [CullerKPSSSSvE93] about mapping butterfly-network based algorithms into the logP model to aid their thinking about implementing Odd-Even Merge Sort in our computing environment.
This exercise was a qualified success; the sorting part went well and the students gained a good understanding of various possible approaches to parallel sorting in a distributed memory setting. I did have the impression, however, that the students enjoyed the first four exercises more, especially coming off of the load-balancing exercise, in which they had the opportunity to design their algorithms essentially from scratch, and not apply other's ideas.
The join part of the project was carried out in the last week of the semester, and little that was interesting resulted from it, although they did produce decent implementations: the students simply were too busy preparing for finals to give this last part the energy and time that they gave the other active exercises.
In general I viewed the active-learning element of this course as a significant success. During certain of the active exercises the intellectual energy and excitement generated by the various groups working was superior to anything that had happened in any other course I have taught; it was an exhilarating experience. Most of the students enjoyed the process, and most seemed to leave the course with a solid understanding of the basic concepts in parallel algorithms. The first class was am above-average group of students, but nonetheless it was encouraging that all (based on my impressions and the final examination) the students left the course with a good understanding of the material. I was also very pleased with the understanding displayed by the students from the second course.
That being said, there were a few problems in the teaching of the course. The first was simply that our student body is largely made up of commuters, and it was difficult for the students to find time to meet together in groups outside of class. I believe that this slowed down the pace of the course.
Secondly, for this approach to truly work, the students must be energetic and prepared to work and think when they come to class. There are a variety of reasons why an undergraduate would come to class unprepared to think and work hard, and if the majority of a group arrived in this state, their class time was rather wasted, although no more wasted than in a lecture. Fortunately, this seemed to happen rather infrequently in our course.
Finally, for the first four exercises, and to some extent the fifth, the exercise would ideally be carried out in the absence of outside sources, maximizing the students' opportunity to struggle with the material on their own. This created an odd tension: in most courses I'm happy to suggest related readings. In this course I rejected textbooks that covered the material I wished the students to learn, and rather chose one that covered the basic ideas and related topics.
We thank David Chang, for introducing us to the notion of active learning, and for helpful discussions. We also thank Willard Korfhage and Susan Flynn Hummel for many helpful discussions and for collaboration in setting up the laboratory in which this course was taught. We thank Doug Baldwin for a very helpful discussion about his experiences in employing this approach in a different topic area. We also thank Lisa Hellerstein, David Shmoys and Eric Torng for helpful comments. Finally, we thank the students of CS341 and CS342, whose energy and enthusiasm made this a very enjoyable experience. Research partially supported by NSF Research Initiation Award CCR-9211494, NSF grant CCR-9626831, NSF grant DUE-9551416, SUN Microsystems, and a grant from the New York State Science and Technology Foundation, through its Center for Advanced Technology in Telecommunications.
We see the opportunity to develop a variety of active exercises on the design of parallel algorithms, and provide here a few more ideas along these lines.
One sort of exercise arises from the mapping of algorithms for one model/architecture into another. An excellent example is the mapping of the FFT (a butterfly-based algorithm) into the logP model [CullerKPSSSSvE93]. This is a lovely step-by-step argument in which two possible layouts are considered, and a compromise position in which one switches between the two and then refines the method of switching.
A second sort of exercise arises when a well-known sequential algorithm lends itself to parallelization. One example is the minimum-spanning tree problem. A second example is sorting: one can parallelize quicksort, and from this one can motivate the sample sort approach. There are other examples of this sort as well (such as the calculation of the convex hull in computational geometry) where a divide and conquer approach can be simply parallelized, but in fact an initial splitting into a number of subproblems rather than 2 yields a more efficient solution.
Finally, many of the more sophisticated and efficient PRAM algorithms use as subroutines simpler PRAM algorithms; once the students have mastered the building-block algorithms they can be led to combine them.
In conclusion, we note that many courses have employed the device of giving more advanced results through step-by-step problem set exercises; it is not a new idea. It is our feeling, however, that the study of parallel algorithms, grounded on a firm understanding of sequential algorithms, seems full of opportunities for teaching in such a fashion, and that it is possible to successfully deliver an entire course in this fashion.