The Berkeley UPC Compiler: Implementation and Performance
Wei Chen, the LBNL/Berkeley UPC Group

Overview of Berkeley UPC Compiler

Two goals: portability and high performance. Compilation is all encapsulated in one "upcc" command.

[Figure: layered design. Layers, top to bottom: UPC Code, Translator, Translator Generated C Code, Berkeley UPC Runtime System, GASNet Communication System, Network Hardware. Side labels: platform-independent, network-independent, compiler-independent, language-independent.]

- UPC to C translator: translates UPC code into C, inserting calls to the runtime library for parallel features.
- UPC runtime: allocates and initializes shared data, performs operations on pointer-to-shared.
- GASNet: a uniform interface for low-level communication primitives.
- GASNet itself has a layered design with a small core.
- Native C compiler optimizes serial code.
- Translator can perform communication optimizations.

Implementing the UPC to C Translator

[Figure: translator pipeline. Source File, UPC front end, Whirl w/ shared types, Backend lowering, Whirl w/ runtime calls, Whirl2c, ANSI-compliant C Code.]

- Communicate with runtime via a standard API and configuration files.
- Will present our implementation in Open64 workshop in March.
- UPC extensions to C: shared qualifier, block size, forall loops, builtin functions and values (THREADS, memget, etc.), strict/relaxed.
- Parses and type-checks UPC code, generates Whirl, with UPC-specific information available in the symbol table.
- Transform shared reads and writes into calls into the runtime library. Calls can be blocking/non-blocking/bulk/register-based.
- Apply standard optimizations and analyses.
- Convert Whirl back to C, with shared variables declared as opaque pointer-to-shared types.

UPC has three different kinds of distributed arrays: block-cyclic, cyclic, and indefinite.

- Indefinite (local to allocating thread): shared [] double *a = (shared [] double *) upc_alloc(n)
- A pointer needs a "phase" to keep track of where it is in a block.
- Source of overhead for updating and dereferencing.
- Indefinite blocked pointers only have one block.
- Don't need to keep phase in pointer operations for cyclic and indefinite pointers (phaseless).
- Don't need to update thread id for indefinite.

Accessing Shared Memory in UPC

[Figure: a pointer-to-shared with fields Address, Thread, Phase, pointing into shared memory laid out across Thread 0, Thread 1, ..., Thread N-1. Labels: start of array object, block size, phase, and the values 0, addr, 2.]

- Important to performance, since it affects all shared operations.
- Shared pointer representation trade-offs:
  - Use of scalar types rather than a struct may improve backend code quality.
    - Faster pointer manipulation, e.g., ptr+int as well as dereferencing.
    - These are important in C, because array references are based on pointers.
  - Smaller pointer size may help performance.
    - Use of packed 8-byte format may allow pointers to reside in a single register.
    - But very large machines may require a longer representation.
- Compiler offers two pointer-to-shared configurations:
  - Packed 8-byte format that gives better performance.
  - Struct format for large-scale programs.
- Portability and performance balance in UPC compiler:
  - Representation is hidden in the runtime layer.
  - Can easily switch at compiler installation time.
  - Modular design means easy to add new representations (packed format done in one day).
  - May have a different representation for phaseless pointers (skipping the phase field).

- Compaq AlphaServer in ORNL, with Quadrics conduit.
- Compaq C compiler for the translated C code.
- Measures the cost of UPC language features and constructs:
  - Shared pointer arithmetic, forall, allocation, etc.
  - Vector addition: no remote communication.
- Performance-tuning benchmarks (Costin):
  - Measure the effectiveness of various communication optimizations.
  - Scale: test message pipelining and software pipelining.

Performance of Shared Pointer Arithmetic

[Figure: performance chart. 1 cycle = 1.5 ns.]

- Phaseless pointers are an important optimization.
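To make the pointer-to-shared discussion in these slides more concrete, here is a minimal C sketch of a struct format, a packed 8-byte format, and ptr+int for the block-cyclic, cyclic, and indefinite cases. It is an illustration under stated assumptions, not the Berkeley UPC runtime's code: all names (`sptr_struct`, `sptr_packed`, `sptr_add_blocked`, and so on) are hypothetical, the 10/10/44 bit split is an arbitrary choice, `addr` is taken to be a byte offset into the owning thread's shared segment, and only non-negative increments are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Struct format (hypothetical layout): wide fields for very large machines. */
typedef struct {
    uint64_t addr;   /* byte offset in the owning thread's shared segment */
    uint32_t thread; /* owning thread */
    uint32_t phase;  /* element position inside the current block */
} sptr_struct;

/* Packed format: one 64-bit scalar. The bit widths are an assumption. */
typedef uint64_t sptr_packed;
enum { PHASE_BITS = 10, THREAD_BITS = 10, ADDR_BITS = 44 };

static sptr_packed sptr_pack(sptr_struct p) {
    assert(p.phase < (1u << PHASE_BITS));
    assert(p.thread < (1u << THREAD_BITS));
    assert(p.addr < ((uint64_t)1 << ADDR_BITS));
    return ((uint64_t)p.phase << (THREAD_BITS + ADDR_BITS))
         | ((uint64_t)p.thread << ADDR_BITS)
         | p.addr;
}

static sptr_struct sptr_unpack(sptr_packed w) {
    sptr_struct p;
    p.addr = w & (((uint64_t)1 << ADDR_BITS) - 1);
    p.thread = (uint32_t)((w >> ADDR_BITS) & ((1u << THREAD_BITS) - 1));
    p.phase = (uint32_t)(w >> (THREAD_BITS + ADDR_BITS));
    return p;
}

/* ptr + inc for a block-cyclic array: phase, thread, and addr all change. */
static sptr_struct sptr_add_blocked(sptr_struct p, uint64_t inc,
                                    uint64_t elem_size, uint64_t block_size,
                                    uint32_t threads) {
    uint64_t total = p.phase + inc;
    uint64_t phase = total % block_size;
    uint64_t blocks = total / block_size; /* block boundaries crossed */
    uint64_t t = p.thread + blocks;
    uint64_t rows = t / threads;          /* wraps past the last thread */
    sptr_struct r;
    r.phase = (uint32_t)phase;
    r.thread = (uint32_t)(t % threads);
    /* Rewind to the start of the current block, then step forward. */
    r.addr = p.addr - (uint64_t)p.phase * elem_size
           + (rows * block_size + phase) * elem_size;
    return r;
}

/* Cyclic (block size 1): phase stays 0, so no phase bookkeeping is needed. */
static sptr_struct sptr_add_cyclic(sptr_struct p, uint64_t inc,
                                   uint64_t elem_size, uint32_t threads) {
    uint64_t t = p.thread + inc;
    sptr_struct r = { p.addr + (t / threads) * elem_size,
                      (uint32_t)(t % threads), 0 };
    return r;
}

/* Indefinite: a single block on one thread, so only addr moves. */
static sptr_struct sptr_add_indefinite(sptr_struct p, uint64_t inc,
                                       uint64_t elem_size) {
    p.addr += inc * elem_size;
    return p;
}
```

In this sketch the blocked case needs two divisions and two modulo operations per increment, while the cyclic case needs one of each and the indefinite case needs none, which is one way to see why skipping the phase (and, for indefinite pointers, the thread update) can pay off.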