About this Course
This five-day instructor-led course provides students with the knowledge and skills to develop high-performance computing (HPC) applications for Microsoft Windows HPC Server 2008 R2. Students learn how to design, debug, tune and run high-performance computing applications under Windows HPC Server 2008 R2. Students also learn the most compelling technologies for building HPC applications, including parametric sweep, multi-threading, MPI, SOA, and Excel. Students program in Visual C++ as well as C#, and work with both managed and unmanaged code.
Audience Profile
This course is intended for software developers who need to develop long-running, compute-intensive, or data-intensive apps targeting multi-core and cluster-based hardware. No experience in the field of high-performance computing is required.
At Course Completion
After completing this course, students will be able to:
•Understand the goals of the high-performance computing (HPC) field.
•Measure and evaluate the performance of HPC apps.
•Design HPC apps using a variety of technologies: parametric sweep, tasks, MPI, and SOA.
•Design HPC apps targeting a variety of hardware: from single-core to multi-core to cluster-based.
•Implement HPC apps using C++ or C#.
•Integrate HPC apps with Windows HPC Server 2008 R2, including a client-friendly front-end.
•Tune the performance of HPC applications under Windows HPC Server 2008 R2.
•Design HPC apps that take advantage of Azure compute resources.
•Design HPC apps that take advantage of HPC Services for Excel 2010.
Course Outline
Module 1: Introduction to High-Performance Computing and HPC Server 2008 R2
This module introduces the field of high-performance computing, the product Microsoft Windows HPC Server 2008 R2, and developing software for HPCS-based clusters.
Lessons
•Motivation for HPC
•Brief product history of CCS, HPCS, and HPCS 2008 R2
•Brief overview of HPC Server 2008 R2 — components, job submission, scheduler
•Product differentiators
•Measuring performance — linear speedup
•Predicting performance — Amdahl’s law
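The two performance lessons above come down to a few lines of arithmetic. The sketch below is illustrative only and is not taken from the course materials; the function names are hypothetical. Speedup is the serial time divided by the parallel time, and speedup is linear when it equals the number of cores. Amdahl's law predicts the best possible speedup on n cores as 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized.

```cpp
#include <cassert>

// Measured speedup: serial run time divided by parallel run time.
// (Hypothetical helper names, chosen for illustration.)
double speedup(double serial_seconds, double parallel_seconds) {
    assert(serial_seconds > 0.0 && parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}

// Efficiency: speedup per core. A value of 1.0 means linear speedup.
double efficiency(double serial_seconds, double parallel_seconds, int cores) {
    assert(cores > 0);
    return speedup(serial_seconds, parallel_seconds) / cores;
}

// Amdahl's law: predicted speedup on `cores` cores when only
// `parallel_fraction` of the work (0.0 to 1.0) can run in parallel.
double amdahl_speedup(double parallel_fraction, int cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores > 0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}
```

For example, an app that is 90% parallelizable is predicted to run about 5.26 times faster on 10 cores, and can never exceed 10 times faster no matter how many cores are added, because the remaining 10% stays serial.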
Lab: Introduction to HPC and Windows HPC Server 2008
•Submitting and monitoring jobs
•Running an HPC app
•Measuring performance
•Measuring the importance of data locality
Module 2: Developing Software for HPC Server 2008 R2
This module presents designs, technologies, and challenges when developing HPC software. It also presents the use of Parametric Sweep to solve a real-world HPC problem.
Lessons
•Design challenges
•Design patterns
•Common problem decompositions
•Common communication patterns
•Computation vs. communication
•Available HPC technologies: multi-threading, GPUs, MPI, SOA, etc.
•Data-mining and Parametric Sweep
Module 3: The HPCS Job Scheduler
This module introduces the heart of HPCS-based clusters — the Job Scheduler.
Lessons
•Throughput vs. performance
•Nodes vs. sockets vs. cores
•Jobs vs. Tasks
•Job and task states
•Default scheduling policies
•The impact of job priorities and job preemption
•Job resources and dynamic growing / shrinking
•Submission and activation filters
Lab: Working with the Job Scheduler
•Environment variables in HPC Server 2008 R2
•Exit codes and denoting success / failure
•Checkpointing in case of failure
•Multi-task jobs and task dependencies
Module 4: Multicore for Performance
This module provides an overview of Microsoft’s multicore libraries available for C# and C++.
Lessons
•Parallel, multicore programming for responsiveness and performance
•Structured, fork-join parallelism
•Multicore in C# using the Task Parallel Library in .NET 4
•Multicore in VC++ using OpenMP and the Parallel Patterns Library
•Scheduling parallel, multicore apps on Windows HPC Server
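As a rough illustration of the structured, fork-join style listed above, the sketch below uses an OpenMP `parallel for` with a `reduction` clause in C++. It is a generic textbook-style example, not code from the course; the function name `estimate_pi` is hypothetical. OpenMP must be enabled at compile time (for example `/openmp` in Visual C++ or `-fopenmp` in GCC and Clang); without the switch the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>

// Approximates pi by midpoint-rule integration of 4 / (1 + x^2) over [0, 1].
// (Hypothetical example function, chosen for illustration.)
double estimate_pi(int steps) {
    assert(steps > 0);
    const double width = 1.0 / steps;
    double sum = 0.0;

    // Fork: the loop iterations are divided among a team of threads.
    // reduction(+ : sum) gives each thread a private partial sum and
    // combines the partial sums when the threads join at the loop's end.
    #pragma omp parallel for reduction(+ : sum)
    for (int i = 0; i < steps; ++i) {
        const double x = (i + 0.5) * width;
        sum += 4.0 / (1.0 + x * x);
    }

    // Join has happened here: execution is single-threaded again.
    return sum * width;
}
```

The reduction clause is the important design choice: updating a shared `sum` from several threads without it would be a data race.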
Lab: OPTIONAL: Multicore Programming in C# using Task Parallel Library
•Creating a parallel, multicore app in C# and .NET 4
•Running and measuring performance locally
•Running and measuring performance on the cluster
Module 5: Interfacing with HPCS-based Clusters
This module demonstrates the various ways you can interface with Windows HPC Server 2008 R2, in particular using the HPC Server 2008 API.
Lessons
•Cluster Manager
•Job Manager
•Job Description Files
•clusrun
•Console window
•PowerShell
•Scripts
•Programmatic access via HPCS API v2.0
•Showing job progress
•Implementing job fault tolerance
Lab: Interfacing with Windows HPC Server 2008
•Clusrun is your friend
•Scripting
•Using the HPCS API to submit and monitor a job
Module 6: Introduction to MPI
This module introduces *the* most common approach to developing cluster-wide, high-performance applications: the Message-Passing Interface.
Lessons
•Shared-memory vs. distributed-memory
•The essence of MPI programming — message-passing SPMD
•Microsoft MPI
•Using MSMPI in Visual Studio with VC++
•Execution model
•MPI Send and Receive
•mpiexec
•Scheduling MPI apps on Windows HPC Server
Lab: Introduction to MPI
•Creating a simple MPI app using Send and Receive
•Running and measuring performance locally
•Running and measuring performance on the cluster
Module 7: MPI on the Microsoft Platform
This module discusses MSMPI and data parallelism, in particular how best to build data parallel MPI apps using its collective operations.
Lessons
•MSMPI: Microsoft MPI
•Data parallelism in MPI
•A real world example
•Broadcast
•Scatter
•Gather
•Barriers
•Reductions
•Defining your own reduction operator
•Common pitfalls
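Data-parallel MPI programs typically split an array into one contiguous block per rank before scattering it. The sketch below shows only that bookkeeping in plain C++, with no MPI calls, so it can be compiled without an MPI installation. The `Partition` struct and `block_partition` function are hypothetical names; the count and displacement arrays they produce have the shape that variable-count collectives such as MPI_Scatterv and MPI_Gatherv expect.

```cpp
#include <cassert>
#include <vector>

// Per-rank layout of a block-distributed array (hypothetical helper type).
struct Partition {
    std::vector<int> counts;         // number of elements owned by each rank
    std::vector<int> displacements;  // starting offset of each rank's block
};

// Splits `total_elements` as evenly as possible across `ranks` processes.
// When the division is uneven, the first (total % ranks) ranks get one extra.
Partition block_partition(int total_elements, int ranks) {
    assert(total_elements >= 0 && ranks > 0);
    Partition p;
    p.counts.resize(ranks);
    p.displacements.resize(ranks);

    const int base = total_elements / ranks;
    const int remainder = total_elements % ranks;

    int offset = 0;
    for (int r = 0; r < ranks; ++r) {
        p.counts[r] = base + (r < remainder ? 1 : 0);
        p.displacements[r] = offset;
        offset += p.counts[r];
    }
    return p;
}
```

Splitting 10 elements across 4 ranks this way yields counts of 3, 3, 2, 2 and displacements of 0, 3, 6, 8, so no rank holds more than one element beyond any other.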
Lab: Data Parallelism with MPI’s Collective Operations
•Parallelizing an existing MPI application
•Mapping Sends and Receives to Broadcast, Scatter, Gather, and Allreduce
•Running and measuring performance locally
•Running and measuring performance on the cluster
Module 8: MPI Debugging, Tracing, and Performance Tuning
This module dives into the practical realities of using MPI — debugging, tracing, and performance tuning.
Lessons
•Local MPI debugging with Visual Studio 2010
•Remote MPI debugging with Visual Studio 2010
•General MPI tracing
•Tracing with ETW (Event Tracing for Windows)
•Trace visualization
•Other tools for MPI developers: perfmon, resmon, and xperf
•Common performance problems in MPI
Module 9: MPI Application Design
This module presents the most common design issues facing MPI developers.
Lessons
•Hiding latency by overlapping computation and communication
•Non-blocking communication
•Safety: detecting and avoiding deadlock
•MPI.NET
•Hybrid designs involving both MPI and OpenMP
•Buffering, error handling, I/O, and large datasets
•Remote memory access
Module 10: Intro to SOA with HPC Server 2008 R2
This module presents one of the most interesting and unique features of Windows HPC Server 2008 R2 — service-oriented HPC.
Lessons
•Service-oriented architectures
•SOA and WCF
•Mapping SOA onto Jobs and the Job Scheduler
•Private vs. shared sessions
•Secure vs. insecure sessions
•Volatile vs. durable sessions
Lab: Consuming an HPC-based SOA Service
•Deploying a SOA service
•Building a desktop client to communicate with a SOA service
•Working with volatile SOA sessions
•Working with durable SOA sessions
Module 11: Create SOA-based Apps with HPC Server 2008 R2
This module presents the details of building a SOA-based HPC app, from start to finish.
Lessons
•Service-side programming
•Service configuration
•Client-side programming options
•Proxies: Async WCF vs. HPC BrokerClient, Enumeration vs. Callback
Lab: SOA-based HPC with HPCS and WCF
•Creating a SOA service from scratch
•Deploying a SOA service
•Developing a client-side app that communicates through a WCF proxy
•Developing a client-side app that communicates through an HPC BrokerClient proxy
•Calling a service with multiple entry points concurrently
Module 12: SOA Debugging, Tracing, and Performance Tuning
This module discusses debugging, tracing, and performance tuning strategies for SOA-based HPC apps on Windows.
Lessons
•Local SOA debugging with Visual Studio 2010
•Remote SOA debugging with Visual Studio 2010
•Low-level tracing with WCF
•Enabling WCF tracing via HPC Cluster Manager
•Common performance problems with SOA-based HPC apps
•Troubleshooting SOA services
Module 13: HPC Services for Excel 2010
This module presents techniques for bringing the potential of high-performance computing to the world of spreadsheets.
Lessons
•Excel as a computation engine
•Performing Excel computations on Windows HPC Server 2008 R2
•Using HPC Services for Excel 2010 to run workbooks on the cluster
•Using Excel UDFs to run parallel computations on the cluster
Module 14: Designing for Workstation and Azure nodes
This module discusses the design of applications that take advantage of on-premises Windows 7 workstations, as well as off-premises Azure nodes.
Lessons
•Additional compute resources: on-premises workstations and off-premises Azure nodes
•Taking advantage of Workstation nodes
•Taking advantage of Azure nodes
Module 15: Supporting and Emerging Technologies
This module presents a brief overview of supporting and emerging technologies around HPC Server 2008 R2.
Lessons
•GPU computing with CUDA and HPC Server 2008 R2
•Virtual Shared Memory with AppFabric Caching
•Large data processing with Dryad and DryadLINQ
•Cluster data reporting via the HPC Server API