CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK |
The maximum number of threads per block, beyond which a launch of the function would fail. This number depends on both the function and the device on which the function is currently loaded. |
CU_FUNC_ATTRIBUTE_SHARED_SIZE_BYTES |
The size in bytes of statically-allocated shared memory required by this function. This does not include dynamically-allocated shared memory requested by the user at runtime. |
CU_FUNC_ATTRIBUTE_CONST_SIZE_BYTES |
The size in bytes of user-allocated constant memory required by this function. |
CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES |
The size in bytes of local memory used by each thread of this function. |
CU_FUNC_ATTRIBUTE_NUM_REGS |
The number of registers used by each thread of this function. |
CU_FUNC_ATTRIBUTE_PTX_VERSION |
The PTX virtual architecture version for which the function was compiled. This value is the major PTX version * 10 + the minor PTX version, so a function compiled for PTX version 1.3 would return the value 13. Note that this may return the undefined value of 0 for cubins compiled prior to CUDA 3.0. |
CU_FUNC_ATTRIBUTE_BINARY_VERSION |
The binary architecture version for which the function was compiled. This value is the major binary version * 10 + the minor binary version, so a function compiled for binary version 1.3 would return the value 13. Note that this will return a value of 10 for legacy cubins that do not have a properly-encoded binary architecture version. |
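
The version encoding used by CU_FUNC_ATTRIBUTE_PTX_VERSION and CU_FUNC_ATTRIBUTE_BINARY_VERSION (major * 10 + minor) can be unpacked with integer division and remainder. The following is a minimal sketch in plain C. The `ArchVersion` type and the helpers `decode_arch_version` and `ptx_version_is_known` are hypothetical names introduced here for illustration and are not part of the CUDA driver API; the integer passed in is assumed to be a value already obtained from an attribute query.

```c
#include <stdbool.h>

/* Hypothetical helper type for illustration; not part of the CUDA driver API. */
typedef struct {
    int major;
    int minor;
} ArchVersion;

/* Splits an encoded attribute value (major * 10 + minor) into its parts.
 * For example, an encoded value of 13 yields major 1 and minor 3. */
static ArchVersion decode_arch_version(int encoded)
{
    ArchVersion v;
    v.major = encoded / 10;  /* tens and above hold the major version */
    v.minor = encoded % 10;  /* last decimal digit holds the minor version */
    return v;
}

/* A PTX version of 0 is the undefined value that may be reported for
 * cubins compiled prior to CUDA 3.0, so it should not be decoded as 0.0. */
static bool ptx_version_is_known(int encoded_ptx)
{
    return encoded_ptx != 0;
}
```

A caller would typically check `ptx_version_is_known` before decoding a PTX version. For the binary version, keep in mind that a decoded result of 1.0 may come from a legacy cubin without a properly-encoded version rather than from a genuine 1.0 target.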