Softcore stream processor for FPGA-based DSP / Peng Wang

  • Peng Wang

Student thesis: Doctoral Thesis, Doctor of Philosophy


Modern DSP applications impose increasingly high computational requirements and continue to evolve. In these applications (e.g. high-definition video and wireless communication), fundamental DSP kernels play central roles, including the Fast Fourier Transform (FFT), Matrix Multiplication (MM), Motion Estimation (ME) and Multiple-Input Multiple-Output (MIMO) decoders. Implementing these kernels is substantially challenging, because the underlying DSP devices must deliver not only sufficiently high throughput but also the flexibility to adapt as standards evolve. Current DSP applications are implemented predominantly on Application Specific Integrated Circuits (ASICs) for the sake of performance and energy efficiency. However, ASICs have long design cycles and fabrication costs that can reach millions of dollars. Furthermore, because each ASIC is designed for a specific application, it is difficult to modify or reuse after fabrication. For these reasons, DSP implementations are shifting towards more flexible solutions. Modern Field Programmable Gate Arrays (FPGAs), which combine vast arrays of logic, hardwired DSP slices and memory resources with reconfigurability, have emerged as a promising platform for DSP implementations. However, to provide the computational capacity demanded by modern DSP applications, FPGAs are still programmed mainly by designing dedicated circuits, and FPGAs have grown so complex that such designs, expressed at the Register Transfer Level (RTL), have become very large. This has prompted the emergence of 'soft' processor architectures hosted on the FPGA's reconfigurable fabric. However, existing softcore processors remain constrained in performance, resource efficiency and applicability. In this thesis, these issues are addressed by the proposed Softcore Stream Processor (SSP).
The SSP features a streaming processing architecture optimised for FPGA-based DSP, together with a variety of configurable aspects for application-specific optimisation. It combines high performance with low resource cost. The SSP is used to create the first software-defined, high-performance FFT processor on FPGA with resource efficiency comparable to that of dedicated circuitry. In addition, an SSP-based software-defined FFT processor is demonstrated for the IEEE 802.11ac FFT, yielding the first recorded software-defined 802.11ac FFT architecture capable of real-time processing for 8 channels at all required bandwidths. More importantly, this design not only offers flexible, real-time processing but also reduces resource cost by 65% on average compared with dedicated Xilinx FFT designs. These implementations strongly indicate the SSP's suitability and potential for existing and emerging DSP applications. Sliding-window applications, an important subdomain of DSP, are also targeted in this thesis. Such applications commonly involve complex data addressing and tightly nested loops. By introducing dedicated addressing and hardware loop execution into the SSP, the highest-performance softcore-based implementations of large matrix multiplication and motion estimation are achieved. These implementations achieve more than an order of magnitude higher resource efficiency than the current best metrics achieved by soft vector processors, clearly showing the advantage of the proposed softcore approach in terms of performance and resource efficiency. In addition to the novel softcore architecture, a model-level SSP platform synthesis flow is presented that generates high-quality real-time DSP implementations on the SSP systematically and automatically. By raising the design level to the arithmetic level, this flow greatly reduces the effort of exploiting the highly configurable, multiprocessor SSP architecture.
Using the platform synthesis flow, a real-time MIMO decoder is generated automatically for the first time on record, and real-time H.264 ME is also demonstrated.
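To illustrate the "complex data addressing and tightly nested loops" typical of sliding-window kernels such as ME, here is a minimal sketch of full-search block-matching motion estimation using the sum of absolute differences (SAD). It is a hypothetical Python illustration, not the SSP implementation: the function names `sad` and `full_search`, the block size `n`, the search range `r` and the test frames are all assumptions. The four nested loops and the 2-D index arithmetic such as `ref[by + dy + y][bx + dx + x]` are the kind of loop control and address generation that the SSP offloads to dedicated addressing and hardware loop execution.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):          # inner loop nest: one pass over the block
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, r):
    """Exhaustively test every displacement in [-r, r] x [-r, r] and return
    (best_sad, (dx, dy)). Candidate blocks that fall outside `ref` are skipped;
    ties keep the first displacement visited."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):  # outer loop nest: slide the window over the search area
        for dx in range(-r, r + 1):
            inside = 0 <= by + dy and by + dy + n <= h and 0 <= bx + dx and bx + dx + n <= w
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Hypothetical 8x8 test frames: `cur` is `ref` shifted by one column and two rows.
W = H = 8
ref = [[(7 * x + 13 * y) % 31 for x in range(W)] for y in range(H)]
cur = [[ref[y + 2][x + 1] if y + 2 < H and x + 1 < W else 0 for x in range(W)]
       for y in range(H)]
print(full_search(cur, ref, bx=2, by=2, n=3, r=2))  # (0, (1, 2))
```

Because the current frame is the reference frame shifted by one column and two rows, the search recovers the displacement (dx, dy) = (1, 2) with a SAD of zero. On a conventional processor, every iteration of this loop nest spends instructions on counter updates, bounds tests and address arithmetic in addition to the actual subtract-and-accumulate work.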
Date of Award: Jul 2014
Original language: English
Awarding Institution
  • Queen's University Belfast
