
Abstract:

This talk introduces search and optimization techniques for auto-tuning nearest-neighbor computations on GPUs. We auto-generate multi-kernel code for heterogeneous accelerator environments that delivers performance close to that of hand-crafted code. Next, we introduce a hierarchical data-parallel language that improves coding productivity for hierarchical data parallelism while maintaining performance within a source-to-source translation scheme. Finally, we show that the trade-offs between scratch-pad memories and caches on GPUs are application dependent and require a close understanding of the underlying architecture, thread parallelism, and memory hierarchy.

Bio:

Frank Mueller (mueller@cs.ncsu.edu) is a Professor in Computer Science and a member of multiple research centers at North Carolina State University. Previously, he held positions at Lawrence Livermore National Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from Florida State University in 1994. He has published papers in the areas of parallel and distributed systems, embedded and real-time systems, and compilers. He is a member of ACM SIGPLAN and ACM SIGBED, a senior member of the ACM and the IEEE Computer Society, and an ACM Distinguished Scientist. He is a recipient of an NSF CAREER Award, an IBM Faculty Award, a Google Research Award, and two fellowships from the Humboldt Foundation.

A paper on this topic is available here:

http://moss.csc.ncsu.edu/~mueller/ftp/pub/mueller/papers/cgo12.pdf