Sha256: 67f54e6424722662da6b270bbd70374048d01395545ff5f4e21af8b84f50745c
Contents?: true
Size: 984 Bytes
Versions: 2
Compression:
Stored size: 984 Bytes
Contents
// First naive implementation
% c_dtype = dtype_to_c_type(dtype)
__kernel void gemm_<%= dtype %>(const int M, const int N, const int K,
                      const __global <%= c_dtype %>* A,
                      const __global <%= c_dtype %>* B,
                      __global <%= c_dtype %>* C) {

    // Get the index of the current element to be processed
    const int index = get_global_id(0);
    const int globalRow = get_global_id(1); // Row ID of C (0..M)
    const int globalCol = get_global_id(2); // Col ID of C (0..N)

    // Compute a single element (loop over K)
    <%= c_dtype %> acc = 0.0f;
    for (int k=0; k<K; k++) {
        int a_index = globalRow*K + k;
        int b_index = k*N + globalCol;
        <% if ta %>a_index = M*k + globalRow;<% end %>
        <% if tb %>b_index = globalCol*K + k;<% end %>
        acc += A[a_index + index * <%= n_a %>] * B[b_index + index * <%= n_b %>];
    }

    // Store the result
    C[index * <%= n %> + globalRow*N + globalCol] = acc;
}
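The kernel's index arithmetic can be checked against a plain CPU loop. The following is a minimal sketch in C, not part of the gem: `gemm_reference` is a hypothetical name, the `batch` loop stands in for work-item dimension 0, and the strides assume that the template values `n_a`, `n_b` and `n` are the per-matrix element counts `M*K`, `K*N` and `M*N`. The source does not state what those values are, so treat that as an assumption. The `ta` and `tb` flags mirror the template's transposed-index branches.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference that mirrors the kernel's indexing (hypothetical helper).
 * ASSUMPTION: n_a = M*K, n_b = K*N, n = M*N (per-matrix element counts). */
void gemm_reference(int batch, int M, int N, int K, int ta, int tb,
                    const float *A, const float *B, float *C)
{
    assert(batch > 0 && M > 0 && N > 0 && K > 0);

    const size_t n_a = (size_t)M * K; /* assumed stride between A matrices */
    const size_t n_b = (size_t)K * N; /* assumed stride between B matrices */
    const size_t n   = (size_t)M * N; /* assumed stride between C matrices */

    for (int index = 0; index < batch; index++) {   /* get_global_id(0) */
        for (int row = 0; row < M; row++) {         /* get_global_id(1) */
            for (int col = 0; col < N; col++) {     /* get_global_id(2) */
                float acc = 0.0f;
                for (int k = 0; k < K; k++) {
                    /* Same index formulas as the kernel, with the
                     * transposed variants selected at run time. */
                    size_t a_index = ta ? (size_t)M * k + row
                                        : (size_t)row * K + k;
                    size_t b_index = tb ? (size_t)col * K + k
                                        : (size_t)k * N + col;
                    acc += A[a_index + index * n_a] * B[b_index + index * n_b];
                }
                C[index * n + (size_t)row * N + col] = acc;
            }
        }
    }
}
```

In this sketch the transpose flags are runtime arguments, whereas the template resolves `ta` and `tb` when the kernel source is rendered, so each generated kernel contains only the index formula it needs.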
Version data entries
2 entries across 2 versions & 1 rubygem
Version | Path |
---|---|
tensor_stream-opencl-0.3.2 | lib/tensor_stream/opencl/kernels/gemm.cl |
tensor_stream-opencl-0.3.1 | lib/tensor_stream/opencl/kernels/gemm.cl |