Education
Publications and Reports
| Adaptive
Inhomogeneous PDE Solvers for Complex Geometries
H. Langston, L. Greengard, and D. Zorin
In Preparation, 2012
 |
A Free-Space Adaptive FMM-Based PDE Solver in
Three Dimensions
H. Langston, L. Greengard, and D. Zorin
Communications in Applied Mathematics and
Computational Science, vol. 6, no. 1, 2011, pp. 79–122
We present a kernel-independent, adaptive fast
multipole method (FMM) of arbi- trary order
accuracy for solving elliptic PDEs in three
dimensions with radiation and periodic boundary
conditions. The algorithm requires only the
ability to evaluate the Green's function for the
governing equation and a representation of the
source distribution (the right-hand side) that
can be evaluated at arbitrary points. The
performance is accelerated in three ways. First,
we construct a piecewise polynomial
approximation of the right-hand side and compute
far-field expansions in the FMM from the
coefficients of this approximation. Second, we
precompute tables of quadratures to handle the
near-field interactions on adaptive octree data
structures, keeping the total storage
requirements in check through the exploitation
of symmetries. Third, we employ shared-memory
parallelization methods and load-balancing
techniques to accelerate the major algorithmic
loops of the FMM. We present numerical examples
for the Laplace, modified Helmholtz and Stokes equations.
 | A
massively parallel adaptive fast-multipole method on
heterogeneous architectures
I. Lashuk, A. Chandramowlishwaran, H. Langston,
T.-A. Nguyen, R. S. Sampath, A. Shringarpure,
R. W. Vuduc, L. Ying, D. Zorin, and G. Biros
Proceedings of SC09 ACM/ICEE SCXY Conference Series,
pp. 1-11, 2009, Best Paper Finalist
We present new scalable algorithms and a new
implementation of our kernel-independent fast
multipole method (Ying et al. ACM/IEEE SC '03),
in which we employ both distributed memory
parallelism (via MPI) and shared
memory/streaming parallelism (via GPU
acceleration) to rapidly evaluate two-body
non-oscillatory potentials. On traditional
CPU-only systems, our implementation scales well
up to 30 billion unknowns on 65K cores
(AMD/CRAYbased Kraken system at NSF/NICS) for
highly non-uniform point distributions. We
achieve scalability to such extreme core counts
by adopting a new approach to scalable MPI-based
tree construction and partitioning, and a new
reduction algorithm for the evaluation
phase. Taken together, these components show
promise for ultrascalable FMM in the petascale
era and beyond.
|  | Cash:
Distributed Cooperative Buffer Caching
C. Decoro, H. Langston, J. Weinberger
Technical Report, 2004
Modern servers pay a heavy price in block access
time on diskbound workloads when the working set
is greater than the size of the local buffer
cache. We provide a mechanism for cooperating
servers to coordinate and share their local
buffer caches. The coordinated buffer cache can
handle working sets on the order of the
aggregate cache memory, greatly imptoving
performance on diskbound workloads. This
facility is provided with minimal communication
overhead, no penalty for local cache hits, and
without any explicit kernel support.
|  | A
New Parallel Kernel-Independent Fast Multipole
Method
L. Ying, G. Biros, H. Langston, and
D. Zorin
Proceedings of SC03 ACM/ICEE SCXY Conference Series,
pp. 1-11, 2003, Best Student Paper, Gordon Bell
prize finalist
We present a new adaptive
fast multipole algorithm and its parallel
implementation. The algorithm is
kernel-independent in the sense that the
evaluation of pairwise interactions does not rely
on any analytic expansions, but only utilizes
kernel evaluations. The new method provides the
enabling technology for many important problems in
computational science and engineering. Examples
include viscous flows, fracture mechanics and
screened Coulombic interactions. Our MPI-based
parallel implementation logically separates the
computation and communication phases to avoid
synchronization in the upward and downward
computation passes, and thus allows us to fully
exploit computation and communication
overlapping. We measure isogranular and fixed-size
scalability for a variety of kernels on the
Pittsburgh Supercomputing Center's TCS-1
Alphaserver on up to 3000 processors. We have
solved viscous flow problems with up to 2.1
billion unknowns and we have achieved 1.6 Tflops/s
peak performance and 1.13 Tflops/s sustained performance.
| | |
Research Projects and Codes
3D Kernel-Independent FMM-Based Free-Space and Periodic Volume Solver
This volume code will be posted soon!
|
Brain Tumor and Heart Image Registration Experiments
Images, movies and results of work with Rahul
Sampath and George Biros. Being processed and will
be posted soon.
|
Kernel Independent 3D Fast Multipole Method
This is the original KIFMM3d code
of Lexing
Ying, for
which documentation is provided here.
|
Fluid
Dynamics Visualization
Scientific visualization performed with George
Biros and Lexing Ying,
using their embedded integral solver for the Stokes
equations. Full videos and pictures for some
results as appeared at SC03.
|
Line Integral
Convolution (LIC) code
Working with Denis Zorin, George Biros and Lexing Ying to
visualize 2d fluid dynamics simulations, specifically Navier Stokes
problems in both steady and unsteady-states.
Resulting images include 2d unsteady cavity problems
for different Reynold's numbers. Code is attached
as well as some movies formed from static images.
|
Vortex Methods
Code and images/movies for simulating fast-moving
fluids areound abjects in two and three
dimensions. Code currently being processed for
distribution and will be posted soon.
|
Fast Poisson Solver in 2D
Fast Fourier Transform based Poisson equation
solver in a regular two dimensional domain with
inhomogeneous Dirichlet boundary conditions. All
code in Matlab.
|
HotShell
A customizable Unix shell, designed to sit on top
of an existing shell with augmented commands in
Perl and Korn shell scripting languages.
|
File-Derived Motion Machine
A motion machine for animating an articulated
figure, in this case a human figure, derived with
simple cubes. Examples for specific text-file
inputs and JAVA code attached.
|
Motion-Capture Experiments
Simple motion capture and image processing
experiments and projects.
|
|