Two-dimensional (2-D) convolution is a common operation in a wide range of signal and image processing applications such as edge detection, sharpening, and blurring. In the hardware implementation of these applications, 2d convolution is one of the most challenging parts because it is a compute-intensive and memory-intensive operation. To address these challenges, several design techniques such as pipelining, constant multiplication, and time-sharing have been applied in the literature which leads to convolvers with different implementation features. In this paper, based on design techniques, we classify these convolvers into four classes named Non-Pipelined Convolver, Reduced-Bandwidth Pipelined Convolver, MultiplierLess Pipelined Convolver, and Time-Shared Convolver. Then, implementation features of these classes, such as critical path delay, memory bandwidth, and resource utilization, are analytically discussed for different convolution kernel sizes. Finally, an instance of each class is captured in Verilog and their features are evaluated by implementing them on a Virtex-7 FPGA and reported confirming the analytical discussions.
Keywords: Convolution, Compute-Intensive, Critical Path, MemoryIntensive, Resource Utilization