3. Samples List
3.1. Introduction
Basic TopsCC samples for beginners that illustrate key concepts of using TopsCC and the TopsCC runtime APIs.
3.1.1. simpleStreams
This sample illustrates the use of TOPS streams to overlap kernel execution with device-to-host memory copies. The kernel initializes an array to a specific value, after which the array is copied to host (CPU) memory. To increase performance, multiple kernel/memcopy pairs are launched asynchronously, each pair in its own stream. Kernels are serialized, so if n pairs are launched, the streamed approach can reduce the exposed memcopy cost to 1/n of the cost of a single copy of the entire data set. Additionally, this sample uses TOPS events to measure elapsed time. Elapsed times are averaged over nreps repetitions (10 by default).
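The 1/n estimate can be checked with a small timing model. The sketch below is plain host-side C++, not TopsCC code, and the function names are hypothetical. It assumes that kernels run back to back, that copies share a single copy engine, and that each chunk's copy starts only after its own kernel has finished.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical timing model (not part of the TopsCC samples).
// kernelTotal: time to run the kernel over the whole data set.
// copyTotal:   time to copy the whole data set in a single memcopy.
// n:           number of kernel/memcopy pairs, one pair per stream.
double streamedTotalTime(double kernelTotal, double copyTotal, int n) {
    assert(n > 0);
    const double k = kernelTotal / n;  // kernel time per chunk
    const double c = copyTotal / n;    // copy time per chunk
    double kernelEnd = 0.0;            // finish time of the latest kernel
    double copyEnd = 0.0;              // finish time of the latest copy
    for (int i = 0; i < n; ++i) {
        kernelEnd += k;  // assumption: kernels are serialized
        // A copy waits for its own kernel and for the previous copy.
        copyEnd = std::max(kernelEnd, copyEnd) + c;
    }
    return copyEnd;
}

// Non-streamed baseline: one kernel followed by one full copy.
double serialTotalTime(double kernelTotal, double copyTotal) {
    return kernelTotal + copyTotal;
}
```

With kernelTotal = 8, copyTotal = 4 and n = 4, the model gives 9 time units for the streamed version against 12 for the baseline, so only one quarter of the copy time remains exposed. The estimate holds only while a chunk's copy is no slower than a chunk's kernel. Otherwise the copies queue up and dominate the total.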
3.1.3. simplePrintf
This is a simple example of using printf() inside a kernel.
3.1.4. simpleAssert
This is a simple example of using assert() inside a kernel.
3.1.5. simpleTemplates
This sample is a templatized version of the template project.
3.1.6. simpleMultiThread
This sample demonstrates how to launch TopsCC kernels from multiple threads.
3.1.7. simpleZeroCopy
This sample illustrates how to use Zero MemCopy, in which kernels can read and write directly to pinned system memory.
3.1.8. simpleMultiGCU
This sample illustrates how to use the TOPS API to work with multiple GCUs.
3.1.9. simpleMultiCopy
This sample illustrates how to use TOPS streams to overlap kernel execution with data copies to and from the device.
3.1.10. simpleVectorAdd
This sample implements element-by-element vector addition.
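For orientation, element-by-element addition computes out[i] = a[i] + b[i] for every index i. The following is a minimal host-side C++ reference, not the sample's kernel. The function name is hypothetical, and a reference like this would typically be used to check device results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not TopsCC code): out[i] = a[i] + b[i].
std::vector<float> vectorAddReference(const std::vector<float>& a,
                                      const std::vector<float>& b) {
    assert(a.size() == b.size());  // both inputs must have the same length
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}
```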
3.1.11. asyncAPI
This sample illustrates the use of TOPS events for both GCU timing and overlapping CPU and GCU execution. Events are inserted into a stream of calls. Since stream calls are asynchronous, the CPU can perform computations while the GCU is executing. The CPU can query the events to determine whether the GCU has completed its tasks.
3.1.12. simpleP2P
This sample demonstrates how to copy device memory directly from peer to peer (P2P) and how to use remote device memory in a kernel. It also measures the bandwidth of a P2P memory copy.
3.1.13. simpleIPC
This is a very basic sample that demonstrates inter-process communication (IPC), with one process per GCU for computation.
3.1.14. simpleRTC
This sample demonstrates how to use the topsrtc mode.
To run the RTC cases, set the CAPS_HOME environment variable to the TopsCC installation location:
export CAPS_HOME=/topscc/location
The default location is /opt/tops.
3.2. Utilities
Utility samples that demonstrate how to query device capabilities and measure GCU/CPU bandwidth.
3.2.1. deviceQuery
This sample queries the properties of the GCU devices present in the system via the Host Runtime API.
3.2.2. kernelEfficiency
This sample measures the elapsed time to launch an empty kernel.
3.3. Concepts and Techniques
Samples that demonstrate TopsCC-related concepts and common problem-solving techniques.
3.3.1. simpleElemwiseAdd
This sample illustrates multiple implementations of element-by-element addition, including scalar and vector algorithms, and measures the performance of each implementation.
3.3.2. simpleReductionAdd
This sample illustrates multiple implementations of reduction addition (scalar, vector, and vector_async) and measures the performance of each implementation.
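A reduction addition collapses an array into a single sum. The sketch below is host-side C++ rather than TopsCC code, and it is not the sample's scalar, vector, or vector_async implementation. It contrasts a sequential loop with a pairwise (tree) reduction, the pattern that parallel implementations commonly build on. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sequential reference: add the elements one after another.
long long reduceAddSequential(const std::vector<int>& data) {
    long long sum = 0;
    for (int x : data) {
        sum += x;
    }
    return sum;
}

// Hypothetical pairwise (tree) reduction: in each pass, element i absorbs
// element i + stride, and the stride doubles until one value remains.
long long reduceAddTree(const std::vector<int>& data) {
    if (data.empty()) {
        return 0;
    }
    std::vector<long long> buf(data.begin(), data.end());
    for (std::size_t stride = 1; stride < buf.size(); stride *= 2) {
        for (std::size_t i = 0; i + stride < buf.size(); i += 2 * stride) {
            buf[i] += buf[i + stride];
        }
    }
    return buf[0];
}
```

Within a single pass of the tree version, the additions are independent of one another, which is what makes the pattern suitable for parallel hardware. The sequential version is the simplest baseline to compare results against.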
3.4. TOPS Features
Samples that demonstrate TopsCC and runtime features (scatter memory, executables, etc.).
3.4.1. memoryUsage
This sample demonstrates how to get the memory usage of a specific device.
3.4.3. scatterMemory
This sample demonstrates how to use scatter memory to slice/deslice memory and make better use of device memory on different MCs.
3.4.4. resourceBundle
This sample demonstrates how to create 1pg/3pg/6pg resource bundles and use them in multiple threads.
3.4.5. executableDump
This sample demonstrates how to create an executable from a prebuilt binary. The executable can be created directly from a file or from a preloaded binary buffer.
4. References
TopsCC Programming Guide