The CDSP-k2 is the second member of the CDSP family of
high-performance, customizable fixed-point DSP cores, featuring high
execution speeds for both signal-processing algorithms and standard
microprocessor applications. It is meant to be used as an embedded
cell in ASICs developed on most technologies of 0.6u and below.
It is highly customizable and can be targeted at a large number of
technologies thanks to its parameterized, HDL-only design.
The CDSP-k2 includes an integrated, customizable hardware acceleration
unit that has been optimized for MAC/FIR/correlation, FFT/iFFT, and
matrix/vector operations, thus allowing the processor to be optimized
for many common DSP algorithms. The modular design of the core allows
stripped-down versions to be obtained easily and makes it easy to tune
the design to match the user's specifications. The user-guided
customization process can thus achieve a highly efficient, low-power,
small-area implementation, making the CDSP-k2 well suited for
high-volume, low-cost applications while also delivering world-class
performance.
Architectural features:
- Single-cycle execution for most instructions
- Two-operand instruction set with one operand residing in memory
and the other in a register
- A dual-operation instruction-word option enabling sustained rates
of two operations per cycle in memory-access-intensive algorithms such
as buffered image processing and adaptive filtering
- Four internal data buses enabling up to four internal data
transfers per cycle
- Zero-cycle block-repeat capability plus a standard looping
instruction
- Special bank-based memory architecture enabling efficient usage
of data types that are smaller than a processor word
- Very compact code and large addressing space
- Eight logical shifts and four arithmetic shifts
- Configurable hardware multiplier
- Provision for full accommodation of MAC(s) by mapping one MAC
input and the output onto the register file and providing the full
range of indexed addressing modes for the second input
- Configurable butterfly unit enabling execution speeds comparable
with those of cutting-edge parallel DSP processors on the market
- Up to three index registers, each with modulo and bit-reversed
post-increment addressing capability
- A constant-memory table-lookup pointer with
post-increment/post-decrement options
- Synchronous program memory implementable as a RAM/ROM
combination, giving the DSP run-time programmability via the
communication ports
- Response in less than one cycle when in wait mode, allowing fast
synchronization with predictable asynchronous events
- Option for shadow registers allowing zero-cycle context saving
for one or more levels of interrupts
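As an illustration of the modulo and bit-reversed post-increment
addressing modes listed above, the following Python sketch models the
two address sequences in software. It is not CDSP-k2 code: the function
names, buffer base, buffer length and step size are assumptions chosen
only for illustration.

```python
def modulo_post_increment(base, length, start, step, count):
    """Yield `count` addresses inside the circular buffer [base, base + length).

    Each address is used first and then advanced by `step`, wrapping at
    the buffer boundary (post-increment with modulo addressing).
    """
    offset = start % length
    for _ in range(count):
        yield base + offset
        offset = (offset + step) % length


def bit_reversed_post_increment(base, bits, count):
    """Yield `count` addresses whose offsets are a bit-reversed counter.

    For a 2**bits-point FFT this is the order used to reorder the data.
    """
    for i in range(count):
        rev = 0
        for b in range(bits):
            if i & (1 << b):
                rev |= 1 << (bits - 1 - b)
        yield base + rev


# Hypothetical example: a 4-word delay line at address 100, and the
# access order for an 8-point FFT buffer at address 0.
print(list(modulo_post_increment(100, 4, 0, 1, 6)))
print(list(bit_reversed_post_increment(0, 3, 8)))
```

A hardware address generator would perform the same update in parallel
with the data access, which is what makes the post-increment free in
terms of cycles.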
Customizable features include:
- The size of the processor word (up to 64 bits) and the
point-position within the fixed-point registers
- The RAM and ROM sizes
- The number of integer and fixed-point registers
- The number and choice of shadow registers
- The number of index registers and the features of the address
generators, including modulo and bit-reversed addressing modes
- The amount of shifting for the shift instructions
- The addressing space (up to 2 Gwords)
- The number, size and operation mode of the communication ports
- The performance of the hardware multiplier, ranging from one
result bit per cycle up to pipelined single-cycle operation
- Two instruction-set implementation options, targeting lower power
consumption and higher maximum execution speed, respectively
- And more...
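To make the first customizable parameter concrete, the sketch below
shows how a word size and a point position together define a signed
fixed-point format. It is a software model only: the function names are
hypothetical, and the saturating behaviour on overflow is an assumption,
since the text above does not state how the core handles overflow.

```python
def to_fixed(value, word_bits, frac_bits):
    """Quantize a real number to a signed fixed-point word.

    `word_bits` is the processor word size and `frac_bits` is the point
    position (number of bits to the right of the binary point).
    Out-of-range values saturate (an assumption for this sketch).
    """
    scaled = int(round(value * (1 << frac_bits)))
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    return max(lowest, min(highest, scaled))


def to_real(word, frac_bits):
    """Convert a fixed-point word back to a real number."""
    return word / (1 << frac_bits)


# Hypothetical 16-bit word with 15 fractional bits.
half = to_fixed(0.5, 16, 15)
print(half, to_real(half, 15))
```

Moving the point position trades range for resolution: with the same
16-bit word, 8 fractional bits would cover roughly -128 to +128 in steps
of 1/256.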
Performance for a typical 0.6u/5V technology implementation:
- 75 MHz or 133 MHz operation, depending on the instruction-set
implementation option
- Sustained 100 MIPS performance, leading to various execution
speeds depending on the architecture variant and the specific algorithm
being implemented. Peak performance for typical DSP algorithms is
higher than:
  - 100 MOPS for the basic architecture
  - 200 MOPS with the dual-operation instruction-set option
  - 300 MOPS with the dual-operation instruction-set option and a
  register-mapped MAC
- As an example, an 8-channel ITU-T G.726 ADPCM algorithm can be
implemented on a 60 MHz basic architecture
- A rate of 10,000 256-point FFTs per second can be obtained at
75 MHz operation by using the dedicated butterfly unit
- The basic architecture, without the MAC(s) and the butterfly
unit, has a 130 MIPS/Watt ratio, leading to less than 300 mW power
dissipation for a 40-bit processor running at 50 MHz
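The FFT and ADPCM figures above can be turned into per-operation cycle
budgets with simple arithmetic, sketched below in Python. The clock
rates, FFT rate, FFT size and channel count come from the list above;
the radix-2 butterfly count and the 8 kHz sample rate (the standard rate
for G.726) are assumptions of this sketch, not statements about how the
core is implemented.

```python
def fft_cycle_budget(clock_hz, ffts_per_second, points):
    """Return (cycles per FFT, radix-2 butterflies per FFT, cycles per butterfly)."""
    cycles_per_fft = clock_hz / ffts_per_second
    stages = points.bit_length() - 1      # log2(points) for a power of two
    butterflies = (points // 2) * stages  # assumes a radix-2 decomposition
    return cycles_per_fft, butterflies, cycles_per_fft / butterflies


def adpcm_cycle_budget(clock_hz, channels, sample_rate_hz=8000):
    """Return the clock cycles available per sample per channel."""
    return clock_hz / (channels * sample_rate_hz)


# 10,000 256-point FFTs per second at 75 MHz.
print(fft_cycle_budget(75_000_000, 10_000, 256))
# 8 ADPCM channels on a 60 MHz core, assuming 8 kHz sampling.
print(adpcm_cycle_budget(60_000_000, 8))
```

Under these assumptions the FFT figure corresponds to 7,500 cycles per
transform, or a little over 7 cycles per radix-2 butterfly including all
overhead, and the ADPCM figure leaves fewer than 1,000 cycles per sample
per channel.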