
skcuda.cublas.cublasSgeam(handle, transa, transb, m, n, alpha, A, lda, beta, B, ldb, C, ldc)

Matrix-matrix addition/transposition (single precision real).

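Computes C = alpha*op(A) + beta*op(B), the sum of two scaled single precision real matrices, where op(X) is X, its transpose, or its conjugate transpose, as selected by transa and transb. For real matrices the conjugate transpose is the same as the transpose.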

  • handle (int) – CUBLAS context
  • transa, transb (str) – Single-character flags for A and B respectively: ‘t’ if the matrix is transposed, ‘c’ if it is conjugate transposed, ‘n’ otherwise.
  • m (int) – Number of rows in op(A) and C.
  • n (int) – Number of columns in op(B) and C.
  • alpha (numpy.float32) – Constant by which to scale A.
  • A (ctypes.c_void_p) – Pointer to first matrix operand (A).
  • lda (int) – Leading dimension of A.
  • beta (numpy.float32) – Constant by which to scale B.
  • B (ctypes.c_void_p) – Pointer to second matrix operand (B).
  • ldb (int) – Leading dimension of B.
  • C (ctypes.c_void_p) – Pointer to result matrix (C).
  • ldc (int) – Leading dimension of C.
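
To make the roles of transa, transb, alpha, and beta concrete, the following is a minimal CPU sketch of the same arithmetic in NumPy. The helper name geam_reference is hypothetical and not part of skcuda; it ignores leading dimensions and device pointers and only illustrates C = alpha*op(A) + beta*op(B).

```python
import numpy as np

def geam_reference(transa, transb, alpha, A, beta, B):
    """CPU reference for C = alpha*op(A) + beta*op(B).

    Hypothetical helper for illustration only; not part of skcuda.
    """
    def op(flag, M):
        if flag == 'n':
            return M            # use the matrix as is
        if flag == 't':
            return M.T          # transpose
        if flag == 'c':
            return M.conj().T   # conjugate transpose (same as 't' for real data)
        raise ValueError("trans flag must be 'n', 't' or 'c'")

    opA, opB = op(transa, A), op(transb, B)
    if opA.shape != opB.shape:
        raise ValueError("op(A) and op(B) must both be m x n")
    return alpha * opA + beta * opB

alpha, beta = np.float32(2.0), np.float32(0.5)
A = np.arange(6, dtype=np.float32).reshape(2, 3)  # 2 x 3
B = np.ones((3, 2), dtype=np.float32)             # 3 x 2
C = geam_reference('t', 'n', alpha, A, beta, B)   # op(A) = A.T is 3 x 2
```

Here m and n are the shape of C (3 and 2), which matches op(A) and op(B) rather than A itself.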


>>> import pycuda.autoinit
>>> import pycuda.gpuarray as gpuarray
>>> import numpy as np
>>> from skcuda.cublas import cublasCreate, cublasSgeam, cublasDestroy
>>> alpha = np.float32(np.random.rand())
>>> beta = np.float32(np.random.rand())
>>> a = np.random.rand(2, 3).astype(np.float32)
>>> b = np.random.rand(2, 3).astype(np.float32)
>>> c = alpha*a+beta*b
>>> a_gpu = gpuarray.to_gpu(a)
>>> b_gpu = gpuarray.to_gpu(b)
>>> c_gpu = gpuarray.empty(c.shape, c.dtype)
>>> h = cublasCreate()
>>> cublasSgeam(h, 'n', 'n', c.shape[0], c.shape[1], alpha, a_gpu.gpudata, a.shape[0], beta, b_gpu.gpudata, b.shape[0], c_gpu.gpudata, c.shape[0])
>>> np.allclose(c_gpu.get(), c)
True
>>> a = np.random.rand(2, 3).astype(np.float32)
>>> b = np.random.rand(3, 2).astype(np.float32)
>>> c = alpha*a.T+beta*b
>>> a_gpu = gpuarray.to_gpu(a.T.copy())
>>> b_gpu = gpuarray.to_gpu(b.T.copy())
>>> c_gpu = gpuarray.empty(c.T.shape, c.dtype)
>>> transa = 'c' if np.iscomplexobj(a) else 't'
>>> cublasSgeam(h, transa, 'n', c.shape[0], c.shape[1], alpha, a_gpu.gpudata, a.shape[0], beta, b_gpu.gpudata, b.shape[0], c_gpu.gpudata, c.shape[0])
>>> np.allclose(c_gpu.get().T, c)
True
>>> cublasDestroy(h)
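
The second example uploads a.T.copy() rather than a because CUBLAS reads buffers in column-major order, while NumPy arrays default to row-major. The sketch below imitates that reading on the CPU, assuming a contiguous buffer; column_major_view is a hypothetical helper, not an skcuda or CUBLAS function.

```python
import numpy as np

def column_major_view(buf, rows, cols, ld):
    """Read a buffer the way a column-major BLAS would.

    Hypothetical helper for illustration only.
    """
    flat = np.ascontiguousarray(buf).ravel()  # elements in memory order
    out = np.empty((rows, cols), dtype=flat.dtype)
    for j in range(cols):
        # Column j starts ld elements after the start of column j-1.
        out[:, j] = flat[j * ld : j * ld + rows]
    return out

a = np.arange(6, dtype=np.float32).reshape(2, 3)

# Uploading a.T.copy() stores the columns of a contiguously, so with
# ld = a.shape[0] the buffer is read back as the original 2 x 3 matrix a.
seen_transposed_upload = column_major_view(a.T.copy(), rows=2, cols=3, ld=a.shape[0])

# Uploading a itself with the same arguments is read as a different matrix.
seen_direct_upload = column_major_view(a, rows=2, cols=3, ld=a.shape[0])
```

The first example gets away with uploading a and b directly because no transposition is requested and all three buffers share one layout, so the elementwise sum lands in the right memory positions either way.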

