
Wednesday, March 2
 

8:00am

8:30am

Tutorial: 'Performance Analysis of MPI+OpenMP Programs with HPCToolkit', John Mellor-Crummey, Rice University
The number of hardware threads per processor on multicore and manycore processors is growing rapidly. Fully exploiting emerging scalable parallel systems will require programs to use threaded programming models at the node level. OpenMP is the leading model for multithreaded programming. This tutorial will give a hands-on introduction to using Rice University's open-source HPCToolkit performance tools to analyze the performance of programs that employ MPI + OpenMP to harness the power of scalable parallel systems. See http://hpctoolkit.org for more information about HPCToolkit.


Wednesday March 2, 2016 8:30am - 10:00am
BioScience Research Collaborative Building (BRC), Room 280

8:30am

Tutorial: 'PETSc: The Portable, Extensible Toolkit for Scientific Computation', Matthew Knepley, Rice University
PETSc is a suite of data structures and routines for the scalable parallel solution of nonlinear equations, often arising from partial differential equations or boundary integral equations. PETSc has been used for years in the oil and gas industry, including development contributed back from WesternGeco and Shell. It supports MPI, shared-memory pthreads, and GPUs, as well as hybrid MPI-pthreads or MPI-GPU parallelism. In this brief tutorial, we will highlight basic sparse parallel linear algebra, linear and nonlinear algebraic solvers, structured and unstructured meshes, and timestepping. We will show how optimal, hierarchical, multilevel solvers for complex, multiphysics problems can be dynamically assembled using the PETSc object system.

Speakers

Wednesday March 2, 2016 8:30am - 10:00am
BioScience Research Collaborative Building (BRC), Room 282

10:00am

10:30am

Tutorial: 'Introduction to OpenMP 4.0 and 4.5', Barbara Chapman, Stony Brook University
For over a decade, OpenMP has been the de facto standard for parallel programming on shared memory systems. It has continued to evolve to meet the programming needs of a diversity of application developers, and to handle the requirements of new generations of computer architecture. In this tutorial we give a brief overview of the basics of OpenMP and then introduce the new features of OpenMP 4.0 and 4.5, with short examples to illustrate their usage.


Wednesday March 2, 2016 10:30am - 12:00pm
BioScience Research Collaborative Building (BRC), Room 280

10:30am

12:00pm

1:00pm

Opening Remarks

Speakers

Jan Odegard

Executive Director, Ken Kennedy Institute for Information Technology, Rice University
Jan E. Odegard joined the Ken Kennedy Institute for Information Technology (formerly Computer and Information Technology Institute) at Rice University as Executive Director in 2002. In this role he led the development and deployment of large-scale computing resources in support of research. Today, the computational resources deployed at Rice support the research of over 100 faculty members and close to 500 users. The majority of users are...



Wednesday March 2, 2016 1:00pm - 1:15pm
BioScience Research Collaborative Building (BRC), Room 103

1:15pm

Keynote: 'How is High Performance Computing Reshaping O&G Exploration and Production?', Francois Alabert, Total
PRESENTATION NOT AVAILABLE

For several decades, the Oil and Gas industry has been committed to producing more and more hydrocarbons in response to the growing world demand for energy. Always seeking deeper and farther, exploration and development have become economically challenging as a result of increased geological and above-ground complexity, stronger environmental constraints and pressure on costs. In this presentation, we will review Total’s experience with High Performance Computing, how it has dramatically improved the efficiency of exploration and reservoir management, and how it might be reshaping our ways of working.

Significantly enhanced computational algorithms and more powerful computers have provided a much better understanding of the distribution and description of complex geological structures, opening new frontiers to unexplored geological areas as well as helping limit the risks and overall costs of deep and ultra-deep offshore drilling.

The progress of multi-component seismic data acquisition and the fast evolution of new technologies in the rock physics labs provide the opportunity to develop new families of algorithms which include more complex physics. With an order of magnitude increase in computing capability, reaching exascale in a few years, the reduction of computing time combined with new generations of algorithms will offer new perspectives and require adapting science and engineering workflows. While next-generation codes are developed to give access to new information of unrivaled quality, these codes will also present new challenges in taking advantage of the increased complexity of the new supercomputers and will require integrated teams that mix geoscientist researchers with computational scientists and engineers.

Due to a new generation of sensors for field and reservoir monitoring, part of the emergence of connected systems recording continuous streams of information, the amount of data resulting from simulations, lab measurements, and field measurements will grow exponentially. Analyzing these data to extract pertinent information will be critical to the prediction, anticipation, optimization and reduction of the costs and risks associated with a variety of processes in O&G exploration and production.  This volume of data opens an era of data analytics and deep learning. By taking advantage of next generation high performance computing capabilities, data analytics and deep learning will become an important driver for the evolution of O&G technology.

Modeling with improved and new physics, multi-physics integration and data analytics, together with the possibility to improve uncertainty and risks assessment, will be reshaping competency requirements of specialists and the way they work together, for safer, cheaper, faster and better Oil and Gas exploration and production.

Speakers

Francois Alabert

Vice-President of Geo-Technology Solutions, Total
François Alabert is currently leading the group of geoscience technology for exploration and reservoir within the Total Exploration & Production branch. The group of 450 engineers and technicians provides high-technology solutions in the domains of geology, geophysics and geosciences data management and computing. With 30 years of experience in O&G, particularly in reservoir engineering and geo-modeling, reserves evaluations and...


Wednesday March 2, 2016 1:15pm - 2:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:00pm

Plenary: 'Computational Science and Engineering Applications at Exascale: Challenges and Opportunities', Doug Kothe, ORNL
WATCH THE PRESENTATION

The computer and computational science and engineering communities in the public, private, and government sectors have arguably been thinking about exascale-class modeling and simulation technologies and capabilities for almost a decade. With exascale platforms becoming more certain and finally within sight, application developers and users must “get real” now to adequately take advantage of this opportunity. The hardware and software technologies currently envisioned in exascale platforms will present new challenges for application developers that could be disruptive relative to current approaches. New algorithms, for example, that communicate infrequently and store very little may be critical for applications to move forward or even “hold pace”. Hybrid node architectures with hierarchical memory and compute technologies will likely be the norm, and applications may face comprehensive restructuring to exploit more appropriate task-based programming models and new data structures.

 

Given these challenges, tremendous opportunity nevertheless exists for science-based computational applications that can deliver, through effective exploitation of exascale HPC technology, breakthrough modeling and simulation solutions that yield high-confidence insights and answers to the nation’s most critical problems and challenges in scientific discovery, energy assurance, economic competitiveness, and national security. I will survey these application opportunities within the science/energy/national security mission space of the Department of Energy, where I will also touch upon challenges, decadal challenge problems, and prospective outcomes and impact.

Speakers

Doug Kothe

Oak Ridge National Laboratory
Douglas B. Kothe (Doug) has over three decades of experience in conducting and leading applied R&D in computational applications designed to simulate complex physical phenomena in the energy, defense, and manufacturing sectors. Doug is currently the Deputy Associate Laboratory Director of the Computing and Computational Sciences Directorate (CCSD) at Oak Ridge National Laboratory (ORNL). Prior positions for Doug at ORNL, where he has been...



Wednesday March 2, 2016 2:00pm - 2:30pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Disruptive Technology 1: Orchestrating Containers within Production Oil and Gas HPC Workloads and Workflows
WATCH THE PRESENTATION

Docker is an open platform that allows developers and sysadmins to build, ship and execute distributed applications. This is particularly appealing in cases where lightweight, easy-to-use, well-contained technologies are well matched with rapidly evolving needs and fast-paced innovation. Not surprisingly, numerous organizations are successfully evaluating Docker containers in proof-of-concept initiatives and/or pilot projects. The transition to production use, however, introduces additional requirements, as Docker containers need to be incorporated into existing IT infrastructures and (ultimately) integrated into application workflows. Simply put, organizations need to be able to manage Docker containers in the same way they have become accustomed to managing other types of workloads and workflows. In other words, the need to launch, execute, control (including limit) and account for Docker containers in production environments is evident; complicating these requirements is the need to move data into and out of containers that may need to provide interactive-execution modalities. Although early adopters report “easier replication, faster deployment and lower configuration and operating costs” for workflows involving Docker containers, it is clear that more complete IT infrastructure integrations are called for. After reviewing selected use cases, attention shifts to ongoing and future efforts aimed at fully integrating Docker containers within on-premises and/or cloud-based IT infrastructures from a workload orchestration and container optimization perspective for the oil and gas industry.

Speakers

Ian Lumb

Solution Architect, Navops by Univa
As an HPC specialist, Ian Lumb has spent about two decades at the global intersection of IT and science. Ian received his B.Sc. from Montreal's McGill University, and then an M.Sc. from York University in Toronto. Although his undergraduate and graduate studies emphasized geophysics, Ian’s current interests include workload orchestration and container optimization for HPC to Big Data Analytics in clusters and clouds. Ian enjoys discussing...



Wednesday March 2, 2016 2:30pm - 3:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Disruptive Technology 2: Bringing Insight to Seismic Storage Repositories for faster Time-To-Oil
WATCH THE PRESENTATION

Finding a needle in an ocean of seismic data can be a costly process. Over the last several years, IOCs and NOCs have amassed petabytes of seismic data and stored it on bespoke storage systems that make processing and prioritizing that data challenging. As Shared-Nothing Object-Oriented Storage architectures increase in use, opportunities will emerge to improve Oil and Gas workflows by leveraging idle compute for analytics on the data in the repository, to gain early insights into the data stored there.

This presentation will discuss how to leverage scale-out x86 architectures (on-premises or in the Cloud) to deliver both cost-effective storage and compute for initial analytics on seismic repositories, without moving the data to large HPC systems, improving time to market and the discovery of other insights.

Speakers


Wednesday March 2, 2016 2:30pm - 3:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Disruptive Technology 3: BeeGFS - A Parallel File System to Solve I/O Problems
WATCH THE PRESENTATION

With the increasing size of parallel computers and the increasing speed of individual nodes (CPU+GPU), the challenges for parallel file systems with respect to I/O patterns, bandwidth, latency, robustness and scalability are becoming more obvious. When the first dual-core CPUs hit the market, we started to develop a parallel file system from scratch with full scalability for data & metadata, ease of use, robustness and high flexibility in mind. As the CPU roadmap was clearly pointing towards many-core CPUs, one central development requirement was to follow a strict multithreaded approach to keep the software overhead low and allow the software to run on dedicated servers or on the compute nodes, and to adapt to new architectures on the rise, like ARM and its variants.

Our own test cases for the development of BeeGFS (formerly FhGFS) during the past 10 years have been a broad range of O&G codes, mostly developed next door. The paper presents an architectural overview of BeeGFS with special focus on scalability, metadata performance and reliability in large installations. As the BeeGFS server components are efficient multithreaded user-space programs which work on every underlying POSIX file system, BeeGFS supports a variety of hardware and software solutions. As a special use case, the paper will explain BeeOND: the BeeGFS on-demand file system. SSDs (NVRAM) in every compute node deliver high-speed, low-latency I/O. With BeeOND we create a private parallel file system (../myscratch/ ) for every compute job on the corresponding nodes that fully utilizes the NVRAM capabilities and acts as a burst buffer for most of the temporary I/O behavior present in today's applications. The paper will report on BeeOND O&G use cases and present benchmarks.

As the amount of storage grows, data resilience and self-healing capabilities are essential requirements in a storage system. BeeGFS has its own approach to this topic based on software robustness and its built-in data mirroring capabilities. The paper briefly covers these HA aspects of BeeGFS and outlines the future BeeGFS roadmap, which includes erasure coding as well as a non-POSIX interface. The last section of the talk covers the BeeGFS approach to Exascale.

Speakers
CM

Christian Mohrbacher

Fraunhofer ITWM
Christian Mohrbacher studied computer sciences and afterwards joined Fraunhofer's Competence Center for High Performance Computing in 2008. He is currently part of the parallel file system group, which drives the development of BeeGFS.



Wednesday March 2, 2016 2:30pm - 3:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Disruptive Technology 4: PCIeArch for RTM
Seismic imaging is a standard data-processing technique for creating an image of the Earth's subsurface structures from surface recordings of seismic waves generated by various sound energy sources.

Reverse Time Migration (RTM) is an advanced migration algorithm that solves wave equations both downward and upward through the earth model. The most popular ways to solve the resulting wave equations are stencil-based and FFT-based methods.

Dell has a unique architecture in its PowerEdge C4130, employing a flexible PCIe architecture which allows different configurations between CPU and GPU. The architecture allows you to connect two GPUs per socket directly, or four GPUs per socket via a PCIe switch. It also allows GPUDirect using InfiniBand adapters between different nodes.

A couple of highlights:
Flexible PCIe architecture to support various CPU:GPU configurations.
Allows PCIe switches to be inserted in the architecture, enabling low-latency peer-to-peer traffic.
Allows GPUDirect using InfiniBand adapters for multi-node scale-out.

There are several different configurations that can be exploited for RTM, and our paper will go in depth on which configuration might be better suited for RTM.

The C4130 supports many different configurations, but the goal of this paper is to start with the topologies that we think will benefit RTM and build from there.

Speakers

Bhavesh A Patel

Sr. Principal System Engineer, Dell
I am part of the Server Advanced Engineering group at Dell and work on server architectures for HPC workloads. I am primarily looking at different PCIe architectures for GPU applications.


Wednesday March 2, 2016 2:30pm - 3:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Disruptive Technology 5: Why Wait? Unleash Your Compute Power with Intel® SSDs
Andrey Kudryavtsev, HPC Solution Architect for the Intel® Non-Volatile Memory Solutions Group (NSG), will discuss advancements in Intel SSD technology that are unleashing the power of the CPU and Moore’s Law. He will dive into the benefits of Intel® NVMe SSDs, a standard specification interface for SSDs, and show how NVMe SSDs can greatly benefit HPC-specific performance and workloads in oil and gas exploration. He will also share the HPC performance benefits that Intel has already seen with its customers today, and how adoption of current NVMe SSD technology sets the foundation for Intel’s next generation of memory using Intel® 3D XPoint™ technology, which will be incorporated into SSDs with Intel Optane™ Technology.

Speakers

Andrey Kudryavtsev

SSD Solution Architect, Intel Corporation
Let's talk about new non-volatile memory technologies by Intel and how they can help improve your current applications and solve performance problems.


Wednesday March 2, 2016 2:30pm - 3:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Disruptive Technology 6: Solving the Oil & Gas Dilemma with Burst Buffer
WATCH THE PRESENTATION

The largest compute systems in Oil and Gas have become nearly 20 times faster over the past five years, which means there is a growing performance gap between compute and I/O that poses a threat to scaling performance to Exascale levels. Traditional architectures further exacerbate I/O bottlenecks. Increasingly, leading Oil and Gas companies are looking at Burst Buffer based architectures that eliminate these bottlenecks. By buffering and aligning I/O, a Burst Buffer can drive a parallel file system at close to maximum hardware speeds and sustain peak workload performance requirements, making it a logical fit for extreme-scale Oil & Gas applications. Most importantly, Burst Buffers have demonstrated unprecedented acceleration for key oil and gas codes, accelerating Reverse Time Migration by 300% without any code modifications.

In this presentation, we will discuss the motivations for using a Burst Buffer, provide an overview of recent results achieved on production Reverse Time Migration (RTM), and discuss how this relates to important Oil and Gas applications. Furthermore, we will explain why Oil & Gas is likely to be the first commercial sector to achieve Exascale computing by leveraging a Burst Buffer approach.

Speakers

Robert McMillen

Robert McMillen is a senior system architect at DDN, where he works out of the office of the CTO to analyze the data flow of key applications to identify opportunities to optimize the company's products. He has extensive experience in inventing (21 patents issued), developing, and managing the creation of innovative products based on computer technology. This has spanned applications from high-performance parallel-processing database computer hardware...



Wednesday March 2, 2016 2:30pm - 3:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Disruptive Technology 7: IBM Accelerates Big Data Analytics
Current compute technologies and scientific methods for Big Data analysis are demanding more compute cycles per processor than ever before, with extreme I/O performance also required. Intel Hyper-Threading offers only two simultaneous threads per core, limiting the results of these recent advances in Big Data analysis techniques.

Despite multiple prior Hadoop genome-analysis attempts on an existing Intel-based LSU HPC cluster, a large metagenome dataset could not be analyzed in a reasonable period of time on existing LSU resources. Knowing the extraordinary capabilities for big data analysis offered by IBM Power Systems, LSU computational center staff and researchers turned to IBM for help.

This presentation will discuss how IBM helped LSU render a 3.2TB metagenome dataset in Hadoop in just 6.25 hours using only 40 compute nodes, whereas the same analysis took 20+ hours on 120+ x86 nodes. This technology has a direct parallel to Oil and Gas applications that can take advantage of simultaneous threading and a parallel file system.


Wednesday March 2, 2016 2:30pm - 3:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Disruptive Technology 8: Machine Learning Support for Full Waveform Inversion
WATCH THE PRESENTATION

In theory, full waveform inversion (FWI) is a non-linear, global optimization algorithm that seeks to find a high-fidelity, high-resolution quantitative model of the subsurface by using all information in the recorded seismic waveforms. In practice, FWI is implemented as part of a workflow that contains iterative modeling and migration steps together with constrained parameter management.

Productivity enhancements to FWI can be obtained by applying deep neural nets to the initial and final stages of the workflow: dynamic recurrent neural nets for time-series analysis, and convolutional neural nets for the characterization of features in the high-dimensional pre/post-stack image cubes. A few general-purpose Deep Learning frameworks exist and will be reviewed against the requirements for an industry-specific implementation.
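
The optimization loop at the heart of FWI can be illustrated with a deliberately tiny sketch. Everything below is a toy stand-in: the "forward model" is a single travel-time formula and the model a single scalar velocity, whereas real FWI simulates full waveforms with a wave-equation solver and inverts millions of parameters.

```python
# Toy illustration of the FWI idea: iteratively update a subsurface model
# to minimize the misfit between simulated and observed data.
# (Hypothetical toy forward model; real FWI uses a wave-equation solver.)

DEPTH = 2000.0  # assumed reflector depth in metres

def forward(velocity):
    """Predicted two-way travel time (s) for a reflector at DEPTH."""
    return 2.0 * DEPTH / velocity

def invert(observed_time, v0=1500.0, lr=1e5, iters=500):
    """Gradient descent on the least-squares misfit 0.5*(forward(v) - d)^2."""
    v = v0
    for _ in range(iters):
        residual = forward(v) - observed_time
        grad = residual * (-2.0 * DEPTH / v ** 2)  # d(forward)/dv
        v -= lr * grad
    return v

d_obs = forward(2500.0)  # "observed" data from a true velocity of 2500 m/s
v_est = invert(d_obs)
print(round(v_est))      # → 2500
```

The non-linearity and local-minimum pitfalls that make real FWI hard are exactly what the neural-net stages described above aim to mitigate.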

Speakers

Geert Wenes

Sr. Practice Leader/Architect, Cray, Inc.



Wednesday March 2, 2016 2:30pm - 3:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Disruptive Technology 9: Disruptive HPC with the elastic AWS Cloud

Speakers

Tim DiLauro

Solutions Architect, Amazon Web Services
Timothy DiLauro is an AWS Solutions Architect responsible for providing architectural assistance and technical guidance for customers running enterprise solutions in the AWS cloud. Timothy supports AWS customers across Texas, including many in the Oil and Gas industry. Timothy has over fifteen years of experience in the technology industry, focused primarily on cloud and infrastructure architecture.


Wednesday March 2, 2016 2:30pm - 3:00pm
BioScience Research Collaborative Building (BRC), Exhibit Hall

3:00pm

3:30pm

Algorithms & Accelerators I: Performance of DGTD Finite Element Methods for the RTM Procedure on GPU Clusters
WATCH THE PRESENTATION

Nodal discontinuous Galerkin time-domain (DGTD) methods exhibit attractive features for the large scale simulation of seismic waves in complex media. First, such methods provide accurate wavefield solutions for complicated geological structures thanks to the use of unstructured meshes and high-degree discontinuous basis functions. Additionally, the dense algebraic operations required per element and the weak element-to-element coupling of DGTD methods make them suitable schemes for efficient computations on modern clusters with massively parallelized many-core devices, such as GPUs. Both these aspects, accuracy and computational performance, are very important for seismic imaging in the Oil and Gas industry.

In collaboration with Shell, we have conceived a high-performance tool for seismic migration that can be run on clusters of GPUs [*]. This tool, named RiDG, includes reverse time migration (RTM) capabilities and multiple wave models. The model solver is based on a high-order DGTD method for first-order systems which uses unstructured meshes and multi-rate local time-stepping to efficiently deal with multi-scale solutions. Imaging conditions based on vertical characteristics provide improved RTM images.

We adopted the MPI+X approach for distributed programming together with OCCA, a unified framework for making use of major multi-threading languages (e.g. OpenMP, OpenCL and CUDA), offering a flexible approach to handling the multi-threading X. While the RTM procedure generally has extensive data storage requirements with slow I/O, the low storage requirements for DGTD boundary data allow halo trace data to be stored in memory rather than relying on disk-based checkpointing. The load balancing of our implementation reduces both device-host data movement and MPI node-to-node communication.

In this talk, we present the main features of our RTM implementation and recent results for GPU computing. In particular, the computational performance of the DGTD solver is analysed using the roofline model and compared with alternative strategies. The strong scalability of the implementation is tested using a three-dimensional RTM synthetic case on a GPU cluster. These results confirm the quality of the RiDG implementation and the relevance of our programming strategies.

[*] A. Modave, A. St-Cyr, W.A. Mulder, and T. Warburton. A nodal discontinuous Galerkin method for reverse-time migration on GPU clusters. Geophysical Journal International, 203(2):1419–1435, 2015.

Speakers

Axel Modave

Postdoctoral Associate, Virginia Tech

Tim Warburton

John K. Costain Chair & Professor of Mathematics, Virginia Tech


3:30pm

Data Analytics Approaches & Tools: Handling Clusters with a Task-based Runtime System: Application to Geophysics
WATCH THE PRESENTATION

Many paradigms of parallelism have been derived from MPI to form MPI+X combinations in order to improve, for instance, load-imbalance issues. Unfortunately, these solutions are difficult to develop, port and optimize, since they involve different programming levels and because they generally use a static mapping in the MPI layer. We propose to use a single task-based paradigm which offers dynamism through work-stealing and which can tackle distributed heterogeneous machines using advanced runtime systems. The ease of portability comes from the powerful DAG description, which hides the hardware and avoids the use of explicit communications. We compared MPI-based and task-based versions on geophysics simulations, especially on the DIVA code of Total. Our previous studies demonstrated the superiority of the task-based paradigm on shared memory architectures (CPU or MIC); we are now working on distributed and heterogeneous architectures (CPUs+MICs) and, according to our preliminary results, the performance is still better than the MPI version.
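
The contrast with a static MPI mapping can be sketched in a few lines of plain Python: work is expressed as a DAG of tasks, and a thread-pool scheduler (standing in for a real task-based runtime such as the ones discussed in the talk) decides execution order from the dependencies alone. The task names and DAG below are invented for illustration.

```python
# Sketch of task-based execution: describe work as a DAG, let a scheduler
# (here a simple thread pool) run tasks as their dependencies resolve.
# Production runtimes add work-stealing and heterogeneous scheduling.
from concurrent.futures import ThreadPoolExecutor

def run_dag(tasks, deps, max_workers=4):
    """tasks: name -> callable; deps: name -> prerequisite names."""
    futures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        def submit(name):
            if name not in futures:
                prereqs = [submit(d) for d in deps.get(name, [])]
                # Each task waits on its prerequisites' results, so the
                # pool needs enough workers for the DAG's depth.
                futures[name] = pool.submit(
                    lambda f=prereqs, n=name: tasks[n](*[p.result() for p in f]))
            return futures[name]
        for name in tasks:
            submit(name)
        return {name: f.result() for name, f in futures.items()}

# Invented toy DAG: two independent partial sums feed a final reduction.
tasks = {
    "left":  lambda: sum(range(100)),
    "right": lambda: sum(range(100, 200)),
    "total": lambda a, b: a + b,
}
deps = {"total": ["left", "right"]}
print(run_dag(tasks, deps)["total"])  # → 19900
```

Because only the dependency edges constrain ordering, "left" and "right" can run on any free worker, which is the dynamism the abstract contrasts with a fixed MPI decomposition.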

Speakers

Helene Barucq

Senior Research Scientist, Inria

Lionel Boillot

Expert Engineer, Inria

Henri Calandra

Total
Henri Calandra obtained his M.Sc. in mathematics in 1984 and a Ph.D. in mathematics in 1987 from the Universite des Pays de l’Adour in Pau, France. He joined Cray Research France in 1987 and worked on seismic applications. In 1989 he joined the applied mathematics department of the French Atomic Agency. In 1990 he started working for Total SA. After 12 years of work in high performance computing and as project leader for Pre-stack Depth...



Wednesday March 2, 2016 3:30pm - 3:50pm
BioScience Research Collaborative Building (BRC), Room 103

3:50pm

Algorithms & Accelerators I: Efficient Reverse Time Migration on APU Clusters
WATCH THE PRESENTATION

Reverse Time Migration (RTM) is a numerical method for seismic imaging that is widely used in the Oil & Gas industry. RTM uses the two-way travel time of wave propagation to place (or migrate) dipping temporal events in their true subsurface spatial locations. Processing these reflections produces a synthetic image of the subsurface geologic structure. The RTM workflow is very time-consuming and resource-demanding in terms of compute power, memory bandwidth and storage capacity, as prodigious amounts of data (terabytes) are generated during computations (namely wavefield snapshots). Thanks to the advent of high performance computing facilities, RTM has seen significant acceleration using clusters of multi-core CPUs [1, 2, 3] and GPUs (Graphics Processing Units) [4, 5, 6, 7]. Although CPU clusters have shown performance gains by spreading large datasets over connected compute nodes, and even larger performance enhancements can be achieved with GPU clusters thanks to the massively parallel architecture of GPUs, GPU-based solutions suffer from some limitations, namely small GPU memory capacities, overheads incurred by the PCI interconnect between CPU and GPU that may bottleneck seismic imaging applications, and high power consumption.

Recently, AMD released a new architecture, the Accelerated Processing Unit (APU), that combines CPU cores and GPU cores in the same silicon die. This hardware design benefits from both CPU and GPU advantages and suppresses the PCI Express interconnect between the CPU and the GPU. The APU hardware design can thus be an attractive solution for efficient depth imaging. Besides, the APU can be considered a low-power HPC chip (between 60 and 95 watts of TDP). However, the integrated GPUs are about one order of magnitude less computationally powerful and have less internal memory bandwidth than high-end discrete GPUs. Moreover, multiple compute nodes are required in order to process realistic RTM cases: the impact of the APU architecture on communications must therefore also be considered.

We focus our interest on the implementation and deployment of the 3D acoustic RTM in isotropic media on an APU cluster using Fortran+OpenCL+MPI. We rely on a three-dimensional 8th order finite difference approximation of the acoustic wave equation to simulate the wave propagation during both the forward and backward sweeps of the RTM algorithm, and use the selective checkpointing method (with a data checkpointing frequency equal to 10) to reconstruct the source wavefield.

In this talk, we give an overview of the application implementation details, with a particular emphasis on the impact of the APU's new unified memory model on the RTM workflow. Then, we present the OpenCL single-precision performance results and the power-efficiency estimation of the RTM on a 16-node cluster, each node having an A10-7850 APU (code-named Kaveri). We show the relevance of APUs in a seismic imaging context by presenting the pros and cons of such HPC platforms for the RTM, and compare them against traditional solutions by means of strong and weak scaling tests.
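
The two kernels at the core of this workflow, the finite-difference time step and the imaging correlation, can be sketched in miniature. This is a 1D, 2nd-order, pure-Python illustration of the structure only; the talk's implementation is a 3D 8th-order scheme in Fortran+OpenCL+MPI with selective checkpointing.

```python
# Miniature sketch of two RTM building blocks: a leapfrog finite-difference
# step for the 1D acoustic wave equation, and a zero-lag cross-correlation
# imaging condition. (1D and 2nd-order for clarity; the real code is 3D
# and 8th-order.)

def fd_step(u_prev, u_curr, c2dt2_dx2=0.25):
    """One leapfrog step of u_tt = c^2 u_xx; endpoints held fixed."""
    nxt = u_curr[:]
    for i in range(1, len(u_curr) - 1):
        lap = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        nxt[i] = 2.0 * u_curr[i] - u_prev[i] + c2dt2_dx2 * lap
    return nxt

def propagate(u0, nsteps):
    """Forward sweep: return all wavefield snapshots (source wavefield)."""
    fields = [u0[:], u0[:]]
    for _ in range(nsteps):
        fields.append(fd_step(fields[-2], fields[-1]))
    return fields

def image(src_fields, rcv_fields):
    """Zero-lag cross-correlation: I[x] = sum over t of S[x,t] * R[x,t]."""
    npts = len(src_fields[0])
    return [sum(s[i] * r[i] for s, r in zip(src_fields, rcv_fields))
            for i in range(npts)]

# Propagate a pulse and (as a placeholder) correlate it with itself:
u0 = [0.0] * 21
u0[10] = 1.0
snaps = propagate(u0, 30)
img = image(snaps, snaps)   # energy concentrates near the source index
```

Storing every snapshot, as `propagate` does here, is exactly the terabyte-scale burden the abstract mentions; selective checkpointing keeps only every N-th snapshot (N = 10 in the talk) and recomputes the rest during the backward sweep.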

Speakers

Henri Calandra

Total
Henri Calandra obtained his M.Sc. in mathematics in 1984 and a Ph.D. in mathematics in 1987 from the Universite des Pays de l’Adour in Pau, France. He joined Cray Research France in 1987 and worked on seismic applications. In 1989 he joined the applied mathematics department of the French Atomic Agency. In 1990 he started working for Total SA. After 12 years of work in high performance computing and as project leader for Pre-stack Depth...
avatar for Issam Said

Issam Said

Computational Scientist, TOTAL/LIP6
My current research interests include high performance computing (HPC), numerical methods and applied geophysics. I specialize in adapting scientific applications to hardware accelerators, mainly Graphics Processing Units (GPUs). It implies improving the performance of numerical solvers, surveying cutting edge hardware/architectures and studying the viability of scientific friendly programming models such as OpenACC. My current work involves...



3:50pm

Data Analytics Approaches & Tools: Reverse Time Migration via Resilient Distributed Datasets: Towards In-Memory Coherence of Seismic-Reflection Wavefields Using Thunder via Apache Spark

The need to cross-correlate two wavefields in the application of Reverse Time Migration’s imaging condition remains one of two fundamental challenges with use of the method in practice (e.g., Liu et al., Computers & Geosciences 59, 17–23, 2013). In a significant departure from previous approaches, this computational challenge is addressed here through the introduction of Resilient Distributed Datasets (RDDs) for RTM’s precomputed source wavefields. RDDs are a relatively recent abstraction for in-memory computing ideally suited to distributed computing environments like clusters (Zaharia et al., NSDI 2012, http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf). Originally introduced for Big Data Analytics and popularized (e.g., Lumb, “8 Reasons Apache Spark is So Hot”, insideBIGDATA, http://insidebigdata.com/2015/03/06/8-reasons-apache-spark-hot/, 2015) through the open-source implementation known as Apache Spark (https://spark.apache.org/), RDDs also appear promising in recontextualizing RTM’s imaging condition.

Recent work has already indicated that seismic reflection data in accepted industry formats can be distributed in memory across a cluster using Apache Spark (Yan et al., “A Spark-based Seismic Data Analytics Cloud”, 2015 Rice Oil & Gas Workshop, Houston, TX, http://rice2015.og-hpc.org/technical-program/). And although Lumb (“RTM Using Hadoop Is There a Case for Migration?”, 2015 Rice Oil & Gas Workshop, Houston, TX, http://rice2015.og-hpc.org/technical-program/) has indicated that RDDs and Spark appear promising for impacting RTM in a number of ways (e.g., in allowing for the implementation of imaging conditions using alternatives to cross-correlation), attention here focuses on the use of RDDs for facilitating the assessment of coherence between seismic-reflection wavefields in memory. More specifically, an algorithm that significantly reduces the impact of disk I/O in the wavefield manipulations required by RTM is proposed, based on RDDs, and subsequently prototyped using the open-source Thunder (http://thunder-project.org/) via Apache Spark.
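The in-memory coherence assessment described above amounts to a keyed join of two wavefield collections followed by a reduction. The sketch below expresses RTM's zero-lag cross-correlation imaging condition that way; plain Python dicts keyed by time step stand in for Spark RDDs (with Spark this would roughly be `source_rdd.join(receiver_rdd)` followed by a map and reduce), and the tiny arrays are made up.

```python
# Sketch of the cross-correlation imaging condition as a keyed join of two
# in-memory wavefield collections. Dicts keyed by time step stand in for RDDs.

def cross_correlate(source, receiver):
    """Zero-lag cross-correlation image: sum over shared time steps."""
    shared = source.keys() & receiver.keys()   # the 'join' on time-step keys
    nx = len(next(iter(source.values())))
    image = [0.0] * nx
    for t in shared:
        for i in range(nx):
            image[i] += source[t][i] * receiver[t][i]
    return image

source_wavefield   = {0: [1.0, 2.0], 1: [3.0, 4.0]}
receiver_wavefield = {0: [2.0, 0.5], 1: [1.0, 1.0]}
print(cross_correlate(source_wavefield, receiver_wavefield))  # [5.0, 5.0]
```

Keeping the source wavefields resident in memory across the cluster is exactly what removes the disk I/O that this join would otherwise incur.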

Speakers

Ian Lumb

Solution Architect, Navops by Univa
As an HPC specialist, Ian Lumb has spent about two decades at the global intersection of IT and science. Ian received his B.Sc. from Montreal's McGill University, and then an M.Sc. from York University in Toronto. Although his undergraduate and graduate studies emphasized geophysics, Ian’s current interests include workload orchestration and container optimization for HPC to Big Data Analytics in clusters and clouds. Ian enjoys discussing...



Wednesday March 2, 2016 3:50pm - 4:10pm
BioScience Research Collaborative Building (BRC), Room 103

4:10pm

Algorithms & Accelerators I: GPU-accelerated Discontinuous Galerkin Methods on Hybrid Meshes

Time-domain discontinuous Galerkin (DG) solvers combine high order accurate approximations with unstructured meshes, and are effective for the simulation of seismic wave propagation. DG solutions are allowed to be discontinuous across elements, and neighboring elements are coupled weakly through a numerical flux. The use of unstructured meshes in tandem with discontinuous approximations makes it possible to accurately capture sharp interfaces and geological features. Additionally, due to this weak coupling, DG solvers exhibit high parallel scalability, and benefit greatly from acceleration using Graphics Processing Units (GPUs). Finally, GPU-accelerated DG solvers have shown promise as efficient propagators for reverse time migration.

It is well known that high order DG on hexahedral meshes yields very efficient computational kernels due to the tensor product structure of hexahedra. However, producing high quality hexahedral meshes for complex domains is presently a difficult and non-robust procedure. Hybrid meshes, which consist of wedge and pyramidal elements in addition to hexahedra and tetrahedra, have been proposed to leverage the efficiency of hexahedral elements for more general geometries. We extend efficient DG solvers to hybrid meshes containing multiple types of elements, which have the potential to produce propagation models with improved accuracy at reduced computational cost. We propose efficient, low-storage implementations of DG on GPUs for each type of element and discuss the extension of multi-rate time stepping strategies for acoustic wave propagation to hybrid meshes.
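The multi-rate time stepping mentioned above hinges on grouping elements by a stable local time step, so that small elements sub-step while large ones do not. The sketch below shows one common binning rule (power-of-two levels below a global step); the size-proportional local time step is a simplification — real DG solvers derive it from a CFL condition per element type.

```python
# Sketch of the element grouping behind multi-rate time stepping on a hybrid
# mesh: bin elements by a stable local dt, each bin advancing with
# dt_max / 2^level. Purely illustrative; real dt comes from a CFL estimate.

import math

def multirate_levels(element_sizes, dt_max):
    levels = {}
    for eid, h in enumerate(element_sizes):
        dt_local = h          # stand-in for a CFL-based local time step
        level = max(0, math.ceil(math.log2(dt_max / dt_local)))
        levels.setdefault(level, []).append(eid)
    return levels

# Element 2 is 4x smaller, so it lands two levels down and sub-steps 4 times
# per global step:
groups = multirate_levels([1.0, 0.9, 0.25], dt_max=1.0)
print(groups)  # {0: [0], 1: [1], 2: [2]}
```

On a hybrid mesh the payoff is that a few small tetrahedra near geometry no longer force the global time step taken by the bulk hexahedra.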

Speakers

Tim Warburton

John K. Costain Chair & Professor of Mathematics, Virginia Tech



4:10pm

Data Analytics Approaches & Tools: Applying Big Data Analytics to Seismic Interpretation

Machine learning, including deep learning, is the core technology in big data analytics. The petroleum industry is one of the big data domains facing the challenges of rapidly increasing volume and velocity of data. In this paper, we attempt to demonstrate the applicability of machine learning technology to identifying geological features from seismic data volumes. We compare the differences between traditional methods and machine learning methods in our test cases. We also present our seismic data analytics platform, built on top of Hadoop and Spark, which provides a productive and scalable environment for tackling these big data challenges.

Speakers

Ted Clee

President, TEC Applications Analysis
Ted is active in the development of seismic applications on parallel distributed systems.

Lei Huang

Assistant Professor, Prairie View A&M University



Wednesday March 2, 2016 4:10pm - 4:30pm
BioScience Research Collaborative Building (BRC), Room 103

4:30pm

Algorithms & Accelerators I: Parallelizing Seismic One-Way Based Migration for GPUs Using OpenACC

One-Way Migration is a popular algorithm used in the industry for migrating seismic data. A parallel version of this algorithm using MPI is widely implemented. GPUs have become popular in the past few years in HPC due to their high computational throughput at reasonable power consumption, and hybrid CPU-GPU architectures are seen as a stepping stone towards next generation supercomputing. In this talk, we describe our experience in using OpenACC to parallelize One-Way Migration on NVIDIA GPUs.

In seismic applications, input data is typically made up of 'shots' which are processed independently using MPI tasks. One-Way Migration uses Fourier Finite Differencing. 'Phase-Shift' and 'Wide-Angle Correction' form the bulk of the computation for every task, taking as much as 80% of the computation time. These components form good candidates for computation on GPUs and thereby reduce application runtime by a significant amount.
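The 'Phase-Shift' component above can be sketched compactly: each wavenumber component of the wavefield spectrum is advanced one depth step by multiplying with exp(i·kz·dz), where kz² = (ω/v)² − kx². This is the standard phase-shift relation; all numerical values below are made up for illustration.

```python
# Sketch of the 'phase-shift' depth-extrapolation step at the heart of
# Fourier finite-difference one-way migration. Toy 1D spectrum; illustrative.

import cmath, math

def phase_shift(spectrum, kx_values, omega, velocity, dz):
    out = []
    for P, kx in zip(spectrum, kx_values):
        kz2 = (omega / velocity) ** 2 - kx ** 2
        if kz2 > 0:                       # propagating wave: pure phase rotation
            P = P * cmath.exp(1j * math.sqrt(kz2) * dz)
        else:                             # evanescent wave: damp it
            P = P * math.exp(-math.sqrt(-kz2) * dz)
        out.append(P)
    return out

spec = phase_shift([1.0 + 0j, 1.0 + 0j], [0.0, 5.0], omega=20.0, velocity=10.0, dz=0.1)
assert abs(abs(spec[0]) - 1.0) < 1e-12   # propagating: amplitude preserved
assert abs(spec[1]) < 1.0                # evanescent: amplitude decays
```

Because this step is applied independently per wavenumber per depth level, it maps naturally onto the massively parallel GPU threads targeted by the OpenACC port.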

Traditionally, low-level programming languages and extensions such as CUDA are used for programming on GPUs. However, this is non-trivial and the resulting code is not portable between different GPU architectures. As different accelerator technologies are being evaluated as the path forward towards exascale computing, code portability is highly desired. OpenACC is an emerging directive-based programming model for accelerators, similar to OpenMP. Application users can annotate their code using pragmas that instruct the compiler to generate appropriate device code.

We use OpenACC to parallelize One-Way Migration, which involves optimizing kernels built around FFT operations and the solution of tridiagonal sparse linear systems. The process of optimizing applications for GPUs is discussed, along with challenges and potential pitfalls for application users. We discuss our experience with different compilers and the evolution of OpenACC as a standard. Using acoustic isotropic data, we are able to improve the performance of the application by a factor of 3 using the NVIDIA K20X GPU on the Titan supercomputer at Oak Ridge National Lab, as compared with the CPU-only version of the application run on an 8-core Sandy Bridge CPU. However, the performance of an application that uses OpenACC does not yet match that of one written in CUDA, and it would be beneficial for future compiler versions to focus on reducing this gap.
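For reference, the tridiagonal systems mentioned above are classically solved with the Thomas algorithm, sketched below in plain Python. This is a generic textbook solver, not the authors' GPU kernel; it assumes a diagonally dominant system (no pivoting), which is typical for the wide-angle correction step.

```python
# Sketch of the Thomas algorithm for tridiagonal systems.
# a = sub-diagonal (a[0] unused), b = main diagonal, c = super-diagonal
# (c[-1] unused), d = right-hand side.

def thomas_solve(a, b, c, d):
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# System [[2,1,0],[1,2,1],[0,1,2]] x = [4,8,8] has solution x = [1,2,3]:
x = thomas_solve([0.0, 1.0, 1.0], [2.0, 2.0, 2.0], [1.0, 1.0, 0.0], [4.0, 8.0, 8.0])
assert all(abs(xi - ei) < 1e-9 for xi, ei in zip(x, [1.0, 2.0, 3.0]))
```

The sequential data dependence in both sweeps is exactly why tridiagonal solves are awkward on GPUs and a focus of kernel optimization.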

Speakers

Kshitij Mehta

HPC R&D Scientist, Total E&P



4:30pm

Data Analytics Approaches & Tools: SweetSpot Identification Using Machine Learning for Unconventionals

Reducing cost in well drilling and completion while improving the productivity of unconventional (UNC) reservoirs is vitally important. The physics model based simulation technologies that achieved great success in exploring conventional reservoirs have not been as effective when applied to UNC plays. The best methodology to determine where to drill and how to complete remains elusive. It is challenging to accurately and rapidly characterize the high EUR regions of a UNC play with early exploration data to provide guidance on where to drill new wells.

Machine learning (ML) techniques are data driven and can incorporate pertinent information from input data sources and learn the underlying complex and hidden inter-relationships and patterns. ML provides a promising means to tackle the complex exploration and production problems arising from UNC plays, where the underlying physics is not well known, or where the physical models are highly uncertain.

This contribution describes ML methodologies to tackle a particular challenge: based on the available exploration and production data (often scarce) from a play, can we accurately predict the emerging top productive areas, the so-called sweetspots? The workflow has two stages: (i) data integration and preprocessing, which generates a set of feature variables (or predictors) from the original data; and (ii) predictive modeling, where a predictive model is built from the predictors and production data using machine learning algorithms. The workflow has been applied to unconventional datasets for sweetspot identification, and the results show that the methodology has promising potential.
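The two-stage workflow above can be sketched end to end in miniature: stage (i) turns raw well records into predictors, stage (ii) fits a model on predictor/production pairs. A least-squares line in pure Python stands in for the (unspecified) ML algorithm, and every field name and number below is made up for illustration.

```python
# Sketch of the two-stage sweetspot workflow: feature generation, then a
# simple predictive model. All attributes and values are hypothetical.

def make_features(wells):
    """Stage (i): derive a predictor per well from raw attributes."""
    return [(w["porosity"] * w["thickness_ft"], w["eur_mboe"]) for w in wells]

def fit_line(points):
    """Stage (ii): least-squares y = a*x + b on (predictor, EUR) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

wells = [
    {"porosity": 0.05, "thickness_ft": 100, "eur_mboe": 300},
    {"porosity": 0.08, "thickness_ft": 120, "eur_mboe": 560},
    {"porosity": 0.10, "thickness_ft": 150, "eur_mboe": 860},
]
a, b = fit_line(make_features(wells))
predict = lambda porosity, thickness: a * porosity * thickness + b
# Higher porosity-thickness product -> higher predicted EUR ("sweeter" spot).
assert predict(0.10, 150) > predict(0.05, 100)
```

In a real play the predictor set is far richer and the model nonlinear, but the separation of concerns between the two stages is the same.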

Speakers

Mingqi Wu

Statistical Consultant, Shell
Statistical Consultant / Data Scientist


Wednesday March 2, 2016 4:30pm - 4:50pm
BioScience Research Collaborative Building (BRC), Room 103

4:50pm

Algorithms & Accelerators I: GpuWrapper: A Portable API for Heterogeneous Programming at CGG

To increase the portability of our GPU-accelerated applications, we designed an API for heterogeneous programming that abstracts CUDA and OpenCL. Applications targeting the GpuWrapper API can thus run on NVIDIA GPUs and on all devices supporting OpenCL. Moreover, this common API should future-proof our applications against uncertainty in hardware and programming model evolution.

Speakers

Victor Arslan

High Performance Computing Research Engineer, CGG
High Performance Computing Research Engineer. I graduated with a Master's in Applied Mathematics in 2009 and specialize in software programming on massively parallel architectures. I work at CGG on technology forecasting for computational accelerators, including the Intel MIC architecture (Many Integrated Core).

Jean-Yves Blanc

Chief IT Architect, CGG

Marc Tchiboukdjian

IT Architect, CGG



4:50pm

Data Analytics Approaches & Tools: Scalable data-driven predictive model application for real-time operations monitoring

Oil and gas mission-critical operations surveillance has transitioned from mere real-time monitoring to a more proactive scheme. Increased leverage of data-driven models and machine learning techniques has enabled early notification and lead-time prediction of undesirable events.

The conventional model learning, verification, and testing process is inherently biased due to subjective sampling of the population space or restrictive computational resource constraints. From a practical perspective, scaling model operability beyond a sub-space of the global operations footprint is a desired business objective, one that stretches the limits of training and maintaining the validity of such models in a continuously evolving environment with growing big data feeds.

We present some of the lessons learned, challenges, and practical implications of designing scalable data-driven models and visualizations that leverage an integrated big data streaming infrastructure in a distributed setting. The heterogeneous infrastructure supports a more active learning approach, adapting to the environment and process dynamics in a continuously evolving system.

Speakers

Mohamed Sidahmed

Data Analytics Scientist, BP


Wednesday March 2, 2016 4:50pm - 5:10pm
BioScience Research Collaborative Building (BRC), Room 103

5:10pm

 
Thursday, March 3
 

7:30am

8:30am

Message from Organizer

Speakers

Jan Odegard

Executive Director, Ken Kennedy Institute for Information Technology, Rice University
Jan E. Odegard joined the Ken Kennedy Institute for Information Technology (formerly the Computer and Information Technology Institute) at Rice University as Executive Director in 2002. In this role he led the development and deployment of large-scale computing resources in support of research. Today, the computational resources deployed at Rice support the research of over 100 faculty members and close to 500 users. The majority of users are...



Thursday March 3, 2016 8:30am - 8:45am
BioScience Research Collaborative Building (BRC)

8:45am

Keynote: 'Exploration seismology and the return of the supercomputer', Sverre Brandsberg-Dahl, PGS

Speakers

Sverre Brandsberg-Dahl

Sverre Brandsberg-Dahl is the Global Chief Geophysicist for the Imaging and Engineering Division at PGS. This division is responsible for delivering data processing and imaging services to customers around the world. It is also the home of PGS’ R&D organization where all aspects of the seismic value chain are addressed, helping put PGS in a leading position as a marine acquisition and seismic imaging company. Seismic processing and...


Thursday March 3, 2016 8:45am - 9:30am
BioScience Research Collaborative Building (BRC), Room 103

9:30am

Plenary: 'HPC I/O today and the Road Ahead', Brent Gorda, Intel

The world of HPC I/O & storage is active and changing for the better. On both evolutionary and revolutionary paths, storage is evolving due to changing needs and the introduction of disruptive hardware. Lustre, the popular open source scale-out parallel file system, is advancing in response to modern workloads and will continue to be the choice for network-based high performance/capacity storage. However, new solid state storage is poised to disrupt the memory/storage hierarchy. Storage hardware and software have always been important to HPC, but at this point in time the community is making great advances that will benefit HPC for a long time to come.

Speakers

Brent Gorda

General Manager, HPC Storage, Intel
Brent Gorda is the General Manager of HPC Storage at Intel.  In 2010, Brent started Whamcloud to focus on the longevity of Lustre and sold the company to Intel in 2012.  An industry veteran, Brent has several decades of experience in HPC, leading projects such as the BlueGene architecture work at the Lawrence Livermore National Laboratory.  Brent serves on the SC Conference Series Steering Committee and initiated the Student...


Thursday March 3, 2016 9:30am - 10:00am
BioScience Research Collaborative Building (BRC), Room 103

10:00am

10:30am

Algorithms & Accelerators II: A High Performance Reservoir Simulator on GPU

The resolution and complexity of reservoir simulation models are increasing continuously in order to capture the geologic heterogeneity and multiphase physics of a reservoir with more fidelity. In addition, scenarios designed to investigate uncertainty quantification, history matching, and production optimization can take considerable time and computational power on these large-scale models. The excessive run times not only limit the ability to simulate multiple realizations, but also lead to longer project timelines and can cause costly delays. Thus, an ultra-fast reservoir simulator offers significant value for efficient workflows that enable rapid, high-precision decision making.

Stone Ridge Technology and Marathon Oil Company have developed ECHELON, a state-of-the-art GPU-based reservoir simulator. The GPU implementation provides an extremely dense and efficient computational platform that can reduce the required hardware footprint and power usage. ECHELON executes all major computational tasks on the GPU, including property evaluation, construction and assembly of the Jacobian, and the CPR-AMG linear solver. GPUs enable multi-million-cell models to run at least an order of magnitude faster than current parallel CPU codes.

In this talk, we give field example models for assessing both performance and accuracy of ECHELON. We address several potential bottlenecks on performance and our strategies to resolve them. We will also discuss our attempt to efficiently scale reservoir simulation to the GPU cluster using a combination of CUDA and MPI. Furthermore, we discuss the workflow challenges and requirements which need to be addressed to make the best use of this high performance simulator along with other existing tools.



Thursday March 3, 2016 10:30am - 10:50am
BioScience Research Collaborative Building (BRC), Room 280 & 282

10:30am

Facilities, Infrastructure & Visualization: Experiences with Oil Immersion Cooling in a Seismic Processing Datacenter

CGG installed its first rack of oil immersion cooled compute systems in June 2011. A variety of lessons have been learned in that time. This presentation covers some of those lessons including: actual cost savings (CapEx, OpEx), equipment failure rates, thermal performance, and operational issues. The presentation begins with an outline of the specific business scenario that led to considering oil immersion cooling. It closes with an outline of possible next steps and remaining hurdles. Before going into those details, our current thinking on the feasibility of oil immersion cooling can be summarized as follows:

• Oil immersion cooling provides significant ROI on a case by case basis depending on the specific business scenario.
• CapEx savings are realized as ‘deferring’ expenditures for a number of years.
• OpEx power savings are approximately 30% for standard high density air cooled servers.
• Our current oil immersion datacenter has an ‘Equivalent PUE’ of 1.05.
• There are specific equipment failure modes and drawbacks to oil immersion but these can be dealt with successfully.
• Significant additional savings remain to be exploited.

Business Scenario
The business scenario involved upgrading a PUE ~ 2 legacy datacenter. An upgrade to high efficiency air cooling and conversion to oil cooling were evaluated. Oil was chosen because of lower OpEx and the ability to defer CapEx.

Cost Savings
The OpEx part of cost savings comes to approximately 30% of the power consumed by air cooled equipment in high efficiency (PUE = 1.35) datacenter. The CapEx savings is significant, but because it involves NPV calculations and other business considerations, is not evaluated explicitly here.

Oil Immersion Challenges
There are unique challenges raised by oil immersion cooling. These can be dealt with successfully. They include material degradation (plastics & silicones degrade in one way or another in oil), lower equipment density, and an ‘oily’ work environment. Component failure rates are similar in oil and air.

Oil Immersion Bonuses
There are several new benefits provided by oil. Decreased sound levels and greater thermal inertia provide operational benefits. An increase in thermal headroom of 20C is observed for the hottest components.

Next Steps
There are significant opportunities arising from increased thermal headroom. A simple thermal model shows how server density can be increased and the feasibility of warm water cooling.

Speakers

Cemil Ozyalcin

Data Center Infrastructure Engineer, CGG



Thursday March 3, 2016 10:30am - 10:50am
BioScience Research Collaborative Building (BRC), Room 103

10:50am

Algorithms & Accelerators II: Experience with Two-stage Constraint Pressure Residual Preconditioning in Production Reservoir Simulation

Numerical reservoir simulation is an important tool for the oil and gas industry. It is used to aid the development and forecasting of hydrocarbon reservoirs. With recent advances in parallel reservoir simulation, we have the capability to simulate larger and more realistic problems in a much shorter time frame than ever before. In a fully-implicit simulator, most of the computational time is spent solving a sequence of large-scale, ill-conditioned Jacobian systems resulting from the discretization of the material balance equations. Consequently, an efficient preconditioning strategy is required.

In this talk, we share our experience on using CPR (Constrained Pressure Residual) and BILU (Block Incomplete LU) preconditioners for large scale linear systems resulting from black-oil and compositional reservoir simulations on massively parallel computing architectures. Our numerical findings demonstrate that the CPR preconditioner is more efficient than BILU when running on a moderate core count, while the superior scalability of BILU makes it the preferred choice on larger core counts. This illustrates the importance of having multiple preconditioning options.
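The two-stage structure of a CPR application can be sketched on a toy system: stage one solves the restricted pressure block (the job AMG does in production), stage two applies a cheap full-system smoother. In the sketch below a direct 2x2 solve stands in for AMG and one Gauss-Seidel sweep stands in for BILU; dimensions and values are made up, and a production CPR looks nothing like this in scale.

```python
# Sketch of a two-stage CPR preconditioner application on a tiny system with
# 2 cells x (pressure, saturation) unknowns. Illustrative only.

def solve2x2(M, b):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * b[0] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]

def cpr_apply(A, r, pressure_idx):
    """One CPR application: pressure-block solve, then one smoothing sweep."""
    n = len(r)
    # Stage 1: restrict to the pressure unknowns and solve that block.
    p = pressure_idx
    Ap = [[A[i][j] for j in p] for i in p]
    xp = solve2x2(Ap, [r[i] for i in p])
    x = [0.0] * n
    for k, i in enumerate(p):
        x[i] = xp[k]
    # Stage 2: one Gauss-Seidel sweep on the full system (stand-in for BILU).
    for i in range(n):
        res = r[i] - sum(A[i][j] * x[j] for j in range(n))
        x[i] += res / A[i][i]
    return x

A = [[4.0, 1.0, 1.0, 0.0],
     [1.0, 4.0, 0.0, 1.0],
     [1.0, 0.0, 4.0, 1.0],
     [0.0, 1.0, 1.0, 4.0]]
x_true = [1.0, 2.0, 3.0, 4.0]
r = [sum(A[i][j] * x_true[j] for j in range(4)) for i in range(4)]
x = cpr_apply(A, r, pressure_idx=[0, 2])
# One application already lands much closer to x_true than the zero guess:
assert max(abs(u - v) for u, v in zip(x, x_true)) < 1.0
```

The point of the decomposition is that the near-elliptic pressure part, which dominates the conditioning, gets a strong solver, while the remaining coupled unknowns only need cheap local smoothing.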


Thursday March 3, 2016 10:50am - 11:10am
BioScience Research Collaborative Building (BRC), Room 280 & 282

10:50am

Facilities, Infrastructure & Visualization: Visualization of Massive Seismic Data in HPC

Seismic data processing is a very important tool for revealing geological structures, lithology properties, and fluid contents from seismic field data. It is also a computationally intensive operation when applied to the multiple terabytes of data acquired in a modern 3D survey. In many cases, the processing steps consist of applying relatively simple algorithms to these massive datasets. To obtain results in a timely fashion, the data is commonly processed in a High Performance Computing (HPC) center, and the processing results then need to be visualized and verified. In this abstract, we describe how the graphics server concept has been adapted to provide interactive 2D and 3D visualization of these massive datasets in HPC. Researchers transparently use the same SSH security as the HPC center for visualization. A web portal lets users easily obtain a remote desktop running on an HPC graphics server, and multiple users can share a graphics server. As a result, researchers can interactively visualize hundreds of gigabytes of 2D datasets that could not be displayed previously. With graphics hardware in the server, 3D visualization performance is much better than on a local workstation.


Thursday March 3, 2016 10:50am - 11:10am
BioScience Research Collaborative Building (BRC), Room 103

11:10am

Algorithms & Accelerators II: An Efficient High Accuracy Discretization and Direct Solution Technique for Variable Coefficient Partial Differential Equations

The ability to efficiently and accurately solve variable coefficient partial differential equations (PDEs) is critical for numerical simulations in seismic imaging. In this talk, we present a high-order accurate discretization technique for these challenging problems that comes complete with an efficient and robust direct solver. The method utilizes local high-order discretizations, gluing neighboring regions together with continuum operators. The resulting sparse linear system is inherently amenable to a direct solver similar to nested dissection, whose precomputation cost scales no worse than O(N^{3/2}), where N is the number of discretization points. The cost of applying the solver scales at worst as O(N log(N)), with a tiny constant. The result is a method ideally suited for the ill-conditioned problems with many right-hand sides that consistently arise in the seismic imaging community. For applications where the coefficients of the PDE change locally in the geometry, such as in many inversion algorithms, the proposed method is naturally able to re-use information from the static regions, making local updates extremely inexpensive. Numerical results will illustrate the performance of the proposed method.
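The economics quoted above are worth making concrete: a one-time O(N^{3/2}) factorization followed by O(N log N) applies amortizes quickly over many right-hand sides. The back-of-the-envelope model below sets all constants to 1 and only illustrates the asymptotic trade-off; real costs depend heavily on those constants.

```python
# Back-of-the-envelope cost model: one expensive factorization plus cheap
# repeated applications, versus paying a heavy per-solve cost for each of
# the nrhs right-hand sides. Constants normalized to 1; illustrative only.

import math

def direct_solver_cost(N, nrhs):
    factor = N ** 1.5                  # one-time nested-dissection-style setup
    apply_ = nrhs * N * math.log(N)    # cheap repeated application
    return factor + apply_

N = 10 ** 6
# With 100 right-hand sides (routine in seismic imaging), the factorization
# amortizes: total cost is far below 100 standalone O(N^{3/2}) solves.
assert direct_solver_cost(N, 100) < 100 * N ** 1.5
```

This is also why iterative methods struggle in this setting: an ill-conditioned system makes each iterative solve expensive, and that expense recurs for every right-hand side.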

Speakers

Thursday March 3, 2016 11:10am - 11:30am
BioScience Research Collaborative Building (BRC), Room 280 & 282

11:10am

Facilities, Infrastructure & Visualization: Big Seismic Data: Increase Performance for HPC and Interpretation, and Reduce Infrastructure Cost

Oil companies and service companies amass seismic data at the rate of hundreds of terabytes or petabytes per year. Driven by an increase in resolution and new acquisition methods, seismic data sets are larger than ever before. As a consequence, data storage system sales are the fastest growing segment, according to official numbers.
Another consequence is that internal networks quickly become a bottleneck. As users require access to increasingly large datasets, whether pre-stack or post-stack, networks are constantly overloaded. We will present a new software-based approach to significantly improve storage capacity and simultaneously increase the effective network bandwidth between central storage and consumers of seismic data. The described approach does not require upgrades or modifications to hardware, and thus enables the oil company or service company to leverage previous investments. Compared to commonly available commercial software, Hue’s implementation is approximately 25 times faster on compression and more than 200 times faster on decompression. This speed ensures that the overhead of compression and decompression is minimized. In fact, it becomes much faster to access compressed data than the original data, providing a significant I/O boost to any application using the compressed data. As will be described, the proposed approach is generally transparent to end users.

Speakers

Michele Isernia

VP Strategy & Alliances, Hue Technology N.A
Ideas and Innovation grounded to global business development, mostly in "enterprise" type businesses.



Thursday March 3, 2016 11:10am - 11:30am
BioScience Research Collaborative Building (BRC), Room 103

11:30am

Algorithms & Accelerators II: Hybrid Parallel Implementation of the DG Method

The majority of supercomputers today are built using the so-called hybrid architecture, where the central processing units are complemented by one or more accelerator units. In order to achieve optimal performance on these hybrid architectures, many parallel applications must be modified to support more than one level of parallelism. This requirement can be realized by combining multiple parallel programming models and their implementations, each designed to perform best on its targeted architecture. In this work, we evaluate the parallel scalability of the Discontinuous Galerkin (DG) method using a hybrid parallel programming model. We combine three levels of parallelism to achieve the best performance on a GPU-enabled supercomputer. We also compare the performance of a purely distributed implementation with our hybrid implementation.

Speakers

Nabil Chaabane

Postdoc fellow, Rice University
Numerical methods for PDEs, High performance computing, MPI, openMP, GPU

Beatrice Riviere

Noah Harding Chair and Professor, Rice University



Thursday March 3, 2016 11:30am - 11:50am
BioScience Research Collaborative Building (BRC), Room 280 & 282

11:30am

Facilities, Infrastructure & Visualization: Data Centric Optimizations of Seismic Natural Migration Algorithm at Scale on Parallel File Systems and Burst Buffer

Parallel I/O is an integral component of modern high performance computing, especially for storing and processing very large datasets, as in seismic imaging applications. The storage hierarchy nowadays includes additional layers, the latest being SSD-based storage used as a burst buffer for I/O acceleration.
We analyze here the performance of an I/O-intensive seismic application, the natural migration algorithm at scale, on a large installation of the Lustre parallel file system and an SSD-based burst buffer. Our results show a significant performance improvement from tuning the Lustre stripe count and its counterpart in the burst buffer technology for various node counts. The advantage of the burst buffer is demonstrated with up to 34% performance improvement compared to Lustre.

Speakers

Abdullah Altheyab

King Abdullah University of Science and Technology

Saber Feki

Computational Scientist, KAUST Supercomputing Laboratory
Saber Feki received his PhD and M.S in computer science at the University of Houston in 2008 and 2010 respectively. In 2011, he joined the oil and gas industry with TOTAL as an HPC Research Scientist working on seismic imaging applications using different programming models including CAF, OpenACC and HMPP. Saber currently holds the position of a computational scientist at the KAUST Supercomputing Laboratory where he was part of the technical...



Thursday March 3, 2016 11:30am - 11:50am
BioScience Research Collaborative Building (BRC), Room 103

11:50am

Algorithms & Accelerators II: A Survey of Sparse Matrix-Vector Multiply Performance on Large Matrices

Iterative linear solvers are popular in large-scale computing as they consume less memory than direct solvers. Contrary to direct linear solvers, iterative solvers approach the solution gradually, requiring the computation of sparse matrix-vector (SpMV) products. The evaluation of SpMV products can emerge as a bottleneck for computational performance within the context of the simulation of large problems. In this work, we focus on a linear system arising from the discretization of the Cahn-Hilliard equation, which is a fourth order nonlinear parabolic partial differential equation that governs the separation of a two-component mixture into phases [3]. The underlying spatial discretization is performed using the discontinuous Galerkin method and Newton's method. A number of parallel algorithms and strategies have been evaluated in this work to accelerate the evaluation of SpMV products.
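For reference, the SpMV kernel being benchmarked is usually expressed over a compressed sparse row (CSR) matrix. The sketch below is the textbook CSR kernel in pure Python for clarity; the evaluations surveyed in the talk would use tuned C/CUDA variants of this same loop, which is the bandwidth-bound bottleneck in question.

```python
# Sketch of the CSR (compressed sparse row) sparse matrix-vector product.
# Pure Python for readability; real kernels are tuned C/CUDA.

def spmv_csr(row_ptr, col_idx, values, x):
    """y = A @ x for A stored in CSR form."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values  = [2.0, 1.0, 3.0, 4.0, 5.0]
print(spmv_csr(row_ptr, col_idx, values, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The irregular, indirect access to `x` through `col_idx` is what makes this kernel memory-bound and sensitive to matrix structure, which is precisely what a performance survey across large matrices measures.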

Speakers

Mauricio Araya

Senior Researcher Computer Science, Shell Intl. E&P Inc.

Florian Frank

Rice University


Thursday March 3, 2016 11:50am - 12:10pm
BioScience Research Collaborative Building (BRC), Room 280 & 282

11:50am

Facilities, Infrastructure & Visualization: Advanced Parallel IO Libraries Study for Seismic Depth Imaging Applications
WATCH THE PRESENTATION

Seismic applications such as Reverse Time Migration (RTM) are very demanding on HPC resources, be it compute, memory, or storage. Using those resources efficiently is critical on an industrial production system, where a full processing campaign can take up to several months of intensive computation. Hence, extracting maximum performance from every part of a seismic processing application is a necessity.

IO operations are a critical part of an HPC seismic application, and IO in this domain shows great diversity in access patterns: serial, parallel on a shared file, or parallel with one file per process. Optimizing IO can become complex when considering the multiple levels of storage available in parallel at the local or system level.

Parallel IO in HPC environments is typically achieved through a mix of MPI-IO (Thakur et al., 1997) for shared-file IO and POSIX-IO for single-file-per-process data accesses. Extracting good performance from the underlying file system at scale is difficult and requires a lot of optimization, tuning, and boilerplate code. For this reason, advanced parallel IO libraries have attracted increasing interest from the oil and gas industry, thanks to the advanced data management semantics they offer, their implementation simplicity, and their high performance.

We propose here a study of the performance of two of those libraries, parallel HDF5 and ADIOS, for checkpointing and shared-file access in parallel. This study has been carried out on several HPC systems in an industrial environment. We show that using advanced parallel IO libraries provides a good trade-off in terms of performance for seismic software.
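The shared-file access pattern the abstract contrasts with file-per-process IO can be illustrated with a small sketch: each writer lands at a disjoint, rank-dependent offset, which is the core idea behind MPI-IO and parallel HDF5 shared-file writes. Plain POSIX-style file IO stands in for the MPI ranks here; all names are illustrative, not from the paper.

```python
# Sketch: N "ranks" writing disjoint fixed-size blocks into one shared file.
import os
import tempfile

def write_block(path, rank, block):
    """Write one rank's fixed-size block into the shared file."""
    with open(path, "r+b") as f:
        f.seek(rank * len(block))       # disjoint offset per rank: no overlap
        f.write(block)

nranks, block_size = 4, 8
path = os.path.join(tempfile.mkdtemp(), "checkpoint.bin")

with open(path, "wb") as f:             # pre-size the shared checkpoint file
    f.truncate(nranks * block_size)

for rank in range(nranks):              # sequential stand-in for concurrent ranks
    write_block(path, rank, bytes([rank]) * block_size)

with open(path, "rb") as f:
    data = f.read()
assert data == b"".join(bytes([r]) * block_size for r in range(nranks))
```

Because the offsets never overlap, no locking between writers is needed; real MPI-IO and HDF5 additionally coordinate ranks to aggregate such writes for the file system.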


Thursday March 3, 2016 11:50am - 12:10pm
BioScience Research Collaborative Building (BRC), Room 103

12:10pm

1:30pm

Plenary: 'The Evolution of a Comprehensive Computation and Data Infrastructure at the Texas Advanced Computing Center', Dan Stanzione, TACC

Speakers

Dan Stanzione

Executive Director, Texas Advanced Computing Center, The University of Texas at Austin
Dr. Stanzione is the Executive Director of the Texas Advanced Computing Center (TACC) at The University of Texas at Austin. A nationally recognized leader in high-performance computing, Stanzione served as Deputy Director since June 2009 and assumed the Executive Director post on July 1, 2014. He is the principal investigator (PI) for several leading projects including a multimillion-dollar National Science Foundation (NSF) grant... Read More →



Thursday March 3, 2016 1:30pm - 2:00pm
BioScience Research Collaborative Building (BRC), Room 103

2:00pm

Plenary: 'HPC Workforce Challenges', Barbara Chapman, Stony Brook University & University of Houston
WATCH THE PRESENTATION

Simulation and computing are essential to a significant fraction of today’s research in academia, government laboratories and in industry. They are also the basis for the design and development of many products. With the increasing reliance of US industry on computing for its business, demand for HPC-related skills in a range of disciplines including Computer Science, Applied Mathematics, Statistics and domain sciences, is expected to grow.  In this presentation, we discuss the findings of a report from the US Department of Energy on HPC workforce challenges.

Speakers


Thursday March 3, 2016 2:00pm - 2:30pm
BioScience Research Collaborative Building (BRC), Room 103

2:30pm

Keynote: 'The Path to Capable Exascale Computing', Paul Messina, ANL
WATCH THE PRESENTATION

Exascale computing has been the subject of study and analysis for almost ten years. Dozens of voluminous reports have been published. R&D related to the many issues and challenges involved in achieving usable and affordable exascale computers has resulted in thousands of papers and presentations. The time has come to mount a focused effort that applies the insights learned from those studies to build exascale systems. President Obama's Executive Order of July 2015 established the National Strategic Computing Initiative, a key objective of which is to accelerate delivery of a capable exascale computing system. The Executive Order assigns the lead role for pursuing that objective to the US Department of Energy Office of Science and the DOE National Nuclear Security Administration.

This talk will present an overview of the efforts that are underway to put in place a joint DOE-SC and NNSA project that will result in a capable exascale ecosystem and prepare mission critical scientific and engineering applications to take advantage of that ecosystem.

Speakers

Paul Messina

Argonne National Laboratory (ANL)
Dr. Paul Messina is a senior strategic advisor and Argonne Distinguished Fellow at Argonne National Laboratory. During 2008-2015 he served as Director of Science for the Argonne Leadership Computing Facility, and in 2002-2004 as Distinguished Senior Computer Scientist at Argonne and as Advisor to the Director General at CERN (European Organization for Nuclear Research). From 1987-2002, Dr. Messina served as founding Director of California... Read More →


Thursday March 3, 2016 2:30pm - 3:15pm
BioScience Research Collaborative Building (BRC), Room 103

3:15pm

Poster: GPU Accelerated Hermite Methods for the Simulation of Waves
Speakers

Arturo Vargas

Graduate Student, Rice University
Graduate student working on numerical methods for hyperbolic equations.


3:15pm

Poster: Optimizations of Explicitly Parallel Programs Using Polyhedral Techniques
Speakers

Prasanth Chatarasi

Graduate Student, Rice University
Parallel computing; optimizing compilers; performance; OpenMP


3:15pm

Poster: Towards PETSc-based OPM Upscaling of Relative Permeability as a Cloud Service
Speakers

Anne Elster

Professor/ Visiting Scholar, NTNU Trondheim, Norway and UT Austin
Founder and Director – HPC-Lab, Dept. of Computer & Info. Science NTNU, which started as my research group when I joined NTNU in 2001, and was founded as a lab in 2008. We are now an established research lab in Heterogeneous and Parallel Computing, consisting of myself, several Post Docs, PhD students and Master students as well as both national and international visiting colleagues and scientists who want to learn about and experiment with... Read More →