Montium® Tile Processor

The Montium® Tile Processor (TP) is a programmable architecture that achieves significantly lower energy consumption than DSPs for fixed-point digital signal processing algorithms. It targets computationally intensive algorithm kernels that dominate both power consumption and execution time.

The Tile Processor is typically used as an accelerator core in combination with a lightweight general-purpose processor. In contrast to a conventional DSP, the Montium TP does not have a fixed instruction set, but is configured with the functionality required by the algorithm at hand. In particular, the tile processor does not have to fetch instructions and, hence, does not suffer from the von Neumann bottleneck. Once configured, the Montium TP resembles an ASIC more than a DSP.

Reconfiguration of the tile processor is almost instant, as the size of the configuration binaries is very small. The size of a typical Montium configuration binary is less than 1 KB and reconfiguration typically takes less than 5 µs using a 100 MHz clock for configuration.
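The reconfiguration figures above can be turned into a cycle budget with a little arithmetic. The sketch below is illustrative only: it takes the quoted limits at face value and assumes that 1 KB means 1024 bytes. The function names are hypothetical and not part of any Recore tool.

```python
# Figures quoted in the text above, taken at their stated upper limits.
CONFIG_CLOCK_HZ = 100e6    # 100 MHz configuration clock
RECONFIG_TIME_S = 5e-6     # "less than 5 µs"
CONFIG_SIZE_BYTES = 1024   # "less than 1 KB", assumed to mean 1024 bytes


def reconfig_cycles(time_s: float, clock_hz: float) -> int:
    """Number of configuration clock cycles that fit in the given time."""
    return round(time_s * clock_hz)


def bytes_per_cycle(size_bytes: int, cycles: int) -> float:
    """Average configuration bytes loaded per clock cycle."""
    return size_bytes / cycles


cycles = reconfig_cycles(RECONFIG_TIME_S, CONFIG_CLOCK_HZ)
rate = bytes_per_cycle(CONFIG_SIZE_BYTES, cycles)

print(f"Reconfiguration budget: {cycles} cycles")
print(f"Average load rate: {rate:.2f} bytes per cycle")
```

Under these assumptions the budget is 500 configuration cycles, or roughly two bytes per cycle. Both input figures are upper limits, so this is an order-of-magnitude estimate rather than a specification of the configuration interface.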

By virtue of its small core, the tile processor has a low silicon cost. For instance, the silicon area of a single Montium TP with 20 KB of embedded SRAM is 2 mm² in 0.13 µm CMOS technology. The power consumption depends on the application and ranges from 0.1 to 0.5 mW/MHz in 0.13 µm CMOS technology (including memory access).
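A power figure in mW/MHz is equivalent to energy per clock cycle: 1 mW/MHz equals 1 nJ per cycle. The sketch below uses that identity to estimate power at a given clock and energy for one kernel. The 100 MHz core clock is an assumption made purely for illustration (this page only quotes 100 MHz for the configuration clock), and the cycle count is the 1024-point FFT figure listed under Performance Benchmarks. Whether the full power range applies to that particular kernel is not stated here.

```python
# Figures quoted on this page.
POWER_MW_PER_MHZ = (0.1, 0.5)   # range in 0.13 µm CMOS, including memory access
FFT_1024_CYCLES = 5140          # 1024-point FFT/IFFT benchmark

# Assumption for illustration only: the page does not give a core clock.
ASSUMED_CLOCK_MHZ = 100.0


def power_mw(mw_per_mhz: float, clock_mhz: float) -> float:
    """Power in mW at a given clock frequency."""
    return mw_per_mhz * clock_mhz


def energy_nj(mw_per_mhz: float, cycles: int) -> float:
    """Energy in nJ for a kernel: 1 mW/MHz equals 1 nJ per clock cycle."""
    return mw_per_mhz * cycles


for rate in POWER_MW_PER_MHZ:
    p = power_mw(rate, ASSUMED_CLOCK_MHZ)
    e = energy_nj(rate, FFT_1024_CYCLES)
    print(f"{rate} mW/MHz: {p:.0f} mW at {ASSUMED_CLOCK_MHZ:.0f} MHz, "
          f"{e / 1000:.2f} µJ per 1024-point FFT")
```

Because energy per kernel depends only on the cycle count and the mW/MHz figure, the estimate does not change with the clock frequency chosen; only the instantaneous power does.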

The Montium TP is programmed using Recore's MontiumC programming language. Recore also provides complete solutions, so a system engineer does not need to program the Montium directly.

Architecture

The hardware organization is very regular. Five identical processing parts in a tile exploit spatial concurrency to enhance performance. This parallelism demands a very high memory bandwidth, which is obtained by having 10 local memories in parallel. The local memories are also motivated by the locality of reference principle, a guiding principle for achieving energy efficiency.

The datapath width of a tile processor core and the memory capacity are customizable at design time and depend on the computational requirements. The arithmetic and logic units (ALUs) support both signed integer and signed fixed-point arithmetic. Input registers provide the most local level of storage.
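To make the relationship between processing parts, local memories and datapath width concrete, here is a minimal model of a tile. The counts of five processing parts and ten memories come from the text. The 16-bit datapath, the 100 MHz clock, the even split of memories across processing parts and the one-word-per-memory-per-cycle access rate are all assumptions chosen for illustration, and `TileConfig` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileConfig:
    processing_parts: int = 5   # from the text
    local_memories: int = 10    # from the text
    datapath_bits: int = 16     # hypothetical; chosen at design time
    clock_mhz: float = 100.0    # hypothetical core clock

    def memories_per_part(self) -> float:
        """Memories per processing part, assuming an even split."""
        return self.local_memories / self.processing_parts

    def peak_bits_per_cycle(self) -> int:
        """Peak bandwidth, assuming one word per memory per cycle."""
        return self.local_memories * self.datapath_bits

    def peak_gbit_per_s(self) -> float:
        """Peak aggregate bandwidth in Gbit/s at the assumed clock."""
        return self.peak_bits_per_cycle() * self.clock_mhz * 1e6 / 1e9


tile = TileConfig()
print(f"{tile.memories_per_part():.0f} memories per processing part")
print(f"{tile.peak_bits_per_cycle()} bits per cycle")
print(f"{tile.peak_gbit_per_s():.1f} Gbit/s at {tile.clock_mhz:.0f} MHz")
```

Changing `datapath_bits` shows how the design-time width choice scales the local memory bandwidth available to the processing parts.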

The five processing parts together are called the Processing Part Array (PPA). A relatively simple sequencer controls the entire PPA by selecting configurable PPA instructions.

The Communication and Control Units (CCU) are network interface controllers and provide communication and configuration services to the reconfigurable tile. CCU interfaces for on-chip bus and network-on-chip are available.

Performance Benchmarks

  • 1024-point FFT/IFFT: 5140 cycles
  • 25-tap FIR filter (512 samples): 2562 cycles
  • 32x32 matrix product: 8192 cycles
  • 32x32 matrix-vector product: 256 cycles
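The cycle counts above can be compared against nominal operation counts to estimate how much work is done per cycle. The sketch below uses textbook counts, which are assumptions rather than figures from this page: a radix-2 FFT with (N/2)·log2(N) butterflies, one multiply-accumulate (MAC) per tap per sample for the FIR filter, and N³ or N² MACs for the matrix kernels. The actual algorithms and mappings used for the benchmarks are not stated here.

```python
def radix2_butterflies(n: int) -> int:
    """Butterflies in a radix-2 FFT of size n (n must be a power of two)."""
    return (n // 2) * (n.bit_length() - 1)


def fir_macs(taps: int, samples: int) -> int:
    """MACs for a direct-form FIR filter: one per tap per sample."""
    return taps * samples


def matmul_macs(n: int) -> int:
    """MACs for an n x n by n x n matrix product."""
    return n ** 3


def matvec_macs(n: int) -> int:
    """MACs for an n x n matrix times an n-element vector."""
    return n ** 2


# (benchmark, cycles from the list above, nominal operation count, unit)
benchmarks = [
    ("1024-point FFT/IFFT", 5140, radix2_butterflies(1024), "butterflies"),
    ("25-tap FIR filter (512 samples)", 2562, fir_macs(25, 512), "MACs"),
    ("32x32 matrix product", 8192, matmul_macs(32), "MACs"),
    ("32x32 matrix-vector product", 256, matvec_macs(32), "MACs"),
]

for name, cycles, ops, unit in benchmarks:
    print(f"{name}: {ops} {unit} / {cycles} cycles "
          f"= {ops / cycles:.2f} {unit} per cycle")
```

Under these assumptions the FIR filter sustains close to five MACs per cycle, which matches the five processing parts, the matrix kernels sustain four MACs per cycle, and the FFT completes about one radix-2 butterfly per cycle. These ratios are inferences from nominal operation counts, not published figures.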

 

Find out more

Download the Montium Tile Processor data sheet.

Applications

  • Wireless communication systems for cellphones and base station applications
  • Digital radio for terrestrial and satellite applications
  • Digital TV receivers for fixed and mobile applications
  • Image processing applications
  • Speech and audio signal processing applications
  • Radar signal processing applications such as tracking algorithms and adaptive beamforming
  • Biomedical signal processing applications