NVIDIA CUDA Toolkit

CUDA® is a parallel computing platform and programming model developed by NVIDIA. It allows you to significantly improve computing performance by harnessing the power of the graphics processing unit (GPU).

CUDA was developed with several design goals in mind:

  • Provide a small set of extensions to standard programming languages such as C that enable a straightforward implementation of parallel algorithms. With CUDA C/C++, programmers can focus on the task of parallelizing algorithms rather than spending time on their implementation.
  • Support heterogeneous computing, where applications use both the CPU and GPU. The sequential portions of an application run on the CPU, while the parallel portions are offloaded to the GPU. In this way, CUDA can be applied incrementally to existing applications. The CPU and GPU are treated as separate devices that have their own memory spaces. This configuration also allows simultaneous computation on the CPU and GPU without contention for memory resources.

GPUs that support CUDA have hundreds of cores that can collectively run thousands of compute threads. These cores share resources, including a register file and shared memory. The on-chip shared memory allows parallel tasks running on these cores to share data without sending it over the system memory bus.

This guide shows you how to install the CUDA development tools and verify that they are working properly.

Features and highlights

    • GPU Timestamp: the start timestamp of the method
    • Method: the name of the GPU method. This is either “memcpy*” for memory copies or the name of the GPU kernel. Memory copies have a suffix that describes the type of memory transfer; for example, “memcpyDToHasync” means an asynchronous transfer from device memory to host memory
    • GPU Time: the time it takes to execute the method on the GPU
    • CPU Time: the sum of the GPU time and the CPU overhead of launching the method. At the level of driver-generated data, CPU time is only the CPU overhead of launching the method for non-blocking methods; for blocking methods, it is the sum of GPU time and CPU overhead. All kernel launches are non-blocking by default, but if any profiler counters are enabled, kernel launches are blocking. Asynchronous memory copy requests in different streams are non-blocking
    • Stream ID: identification number for the stream
    • Columns for kernel methods only:
        • Occupancy: the ratio of the number of active warps per multiprocessor to the maximum number of active warps
        • Profiler Counters: for a list of supported counters, see Profiler Counters
        • grid size: the number of blocks in the grid along dimensions X, Y, and Z, displayed as [num_blocks_X num_blocks_Y num_blocks_Z] in one column
        • block size: the number of threads in a block along dimensions X, Y, and Z, displayed as [num_threads_X num_threads_Y num_threads_Z] in one column
        • dyn smem per block: size of dynamic shared memory per block, in bytes
        • sta smem per block: size of static shared memory per block, in bytes
        • registers per thread: number of registers per thread
    • Columns for memcopy methods only:
        • memory transfer size: size of the memory transfer, in bytes
        • host memory transfer type: indicates whether the memory transfer uses pageable or page-locked memory
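
As a rough illustration of how a few of these columns relate to each other, the following Python sketch computes occupancy from warp counts, the total number of threads in a launch from the grid and block sizes, and the CPU overhead of a blocking method (the CPU-side cost of launching it) as CPU time minus GPU time. The function names and all sample numbers are hypothetical; they are not part of any CUDA tool or of real profiler output.

```python
from math import prod


def occupancy(active_warps, max_warps):
    """Ratio of active warps per multiprocessor to the maximum number of active warps."""
    if max_warps <= 0:
        raise ValueError("max_warps must be positive")
    return active_warps / max_warps


def total_threads(grid_size, block_size):
    """Threads in one launch: blocks in the grid times threads per block.

    grid_size  is (num_blocks_X, num_blocks_Y, num_blocks_Z)
    block_size is (num_threads_X, num_threads_Y, num_threads_Z)
    """
    return prod(grid_size) * prod(block_size)


def cpu_overhead(cpu_time, gpu_time):
    """For a blocking method, CPU time = GPU time + CPU overhead."""
    return cpu_time - gpu_time


if __name__ == "__main__":
    # Hypothetical values, chosen only to make the arithmetic easy to follow.
    print(occupancy(32, 64))                          # 0.5
    print(total_threads((128, 1, 1), (256, 1, 1)))    # 32768
    print(cpu_overhead(120.0, 100.0))                 # 20.0
```

An occupancy of 0.5 here means that half of the warps a multiprocessor could keep active are in use.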

Verifying the installation

Follow these steps to verify the installation:

Step 1 − Check the version of the CUDA Toolkit by entering nvcc -V at the command line.
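
This check could be automated with a short Python sketch like the one below, assuming nvcc is on the PATH. The helper names are hypothetical, and the regular expression assumes the version line has the form "Cuda compilation tools, release 9.1, V9.1.85", which may differ between toolkit versions.

```python
import re
import shutil
import subprocess


def parse_release(nvcc_output):
    """Return the release string (for example '9.1') found in `nvcc -V` output, or None."""
    match = re.search(r"release\s+(\d+(?:\.\d+)*)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc -V` and return the release string, or None if nvcc is not on the PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "-V"], capture_output=True, text=True, check=False
    )
    return parse_release(result.stdout)


if __name__ == "__main__":
    # Assumed sample line; real output contains additional copyright and build lines.
    sample = "Cuda compilation tools, release 9.1, V9.1.85"
    print(parse_release(sample))
    print(installed_cuda_release())
```

Returning None instead of raising keeps the "nvcc not found" case easy to report as a failed installation check.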

Step 2 − Run the deviceQuery sample located at C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\bin\win64\Release to view information about your video card. A successful run lists the detected device and its properties and ends with Result = PASS.

Step 3 − Run the bandwidthTest sample located at C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\bin\win64\Release. This verifies that the host and device can communicate with each other correctly. A successful run reports the measured transfer bandwidths and ends with Result = PASS.

If any of the above tests fail, it means that the toolkit was not installed properly. Repeat the installation following the instructions above.
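
Steps 2 and 3 could be scripted along the following lines. This is only a sketch under stated assumptions: the executable names deviceQuery.exe and bandwidthTest.exe, and the "Result = PASS" line used as the success marker, are assumptions about the prebuilt samples, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Sample directory from the steps above; adjust it for your toolkit version.
SAMPLES_DIR = Path(
    r"C:\ProgramData\NVIDIA Corporation\CUDA Samples\v9.1\bin\win64\Release"
)


def sample_passed(output):
    """Assumption: a successful sample run prints a 'Result = PASS' line."""
    return "Result = PASS" in output


def run_sample(name, samples_dir=SAMPLES_DIR):
    """Run one sample executable.

    Returns True or False for pass or fail, or None if the executable is missing.
    """
    exe = Path(samples_dir) / name
    if not exe.exists():
        return None
    result = subprocess.run([str(exe)], capture_output=True, text=True, check=False)
    return sample_passed(result.stdout)


if __name__ == "__main__":
    # Assumed executable names for the two samples used in steps 2 and 3.
    for sample in ("deviceQuery.exe", "bandwidthTest.exe"):
        print(sample, run_sample(sample))
```

A result of None means the sample was not found at that path, which usually points to a different toolkit version or an installation without the samples.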

